How to split a large PDF into multiple documents with keyword? Split PDF based on keyword

I want to know how to divide a PDF file into multiple files with unique keywords?

I have a large PDF that has been combined from multiple documents.

How can I split the PDF back into multiple documents with a keyword delimiter?

Customer
---------------------------------------------
PDF Content Split is a powerful tool for splitting PDF files into pages based on their content.

  • A standalone program that requires no Adobe Acrobat.
  • Splits a batch of PDF files at the same time with ease.
  • User-friendly interface and simple operation.
  • Accurately recognize and identify content within PDF file.
  • Packed with varied practical and impressive features.

VeryPDF Content Splitter Command Line is a utility that lets you split Acrobat PDF files into multiple smaller pdf files based on location and text information within the files. It can be used to split composite PDF documents (such as invoices, reports or payroll) into separate files by keywords such as invoice number, account number or employee name.

PDF Content Splitter Command Line can be downloaded from this web page:

http://www.verypdf.com/app/pdf-content-splitter/try-and-buy.html#buy-cmd
http://www.verypdf.com/dl2.php/pdf-content-splitter-cmd.zip

After you download it, run the following commands to find the keyword and its position on each page:

  pdfcs.exe -searchtext "keyword" C:\in.pdf
  pdfcs.exe -searchtext2 "keyword" C:\in.pdf
  pdfcs.exe -searchtext2 "keyword" -opw 123 -upw 456 C:\in.pdf
  pdfcs.exe -searchtext2 "keyword" -casesensitive C:\in.pdf

For example:

pdfcs.exe -searchtext location E:\pdf-content-splitter-cmd\test.pdf
===== Search keyword in page 1 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 2 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 3 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Found Nothing for keyword (location) in page 4 =====
===== Search keyword in page 5 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 6 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 7 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 8 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
===== Search keyword in page 9 =====
[90.00, 215.64, 135.31, 226.44] 'Location'
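
When this step is scripted, the bounding-box lines in the output above can be parsed directly. The following C function is a minimal sketch that assumes each hit is printed exactly as in the sample, that is [x0, y0, x1, y1] followed by the quoted text. The TextBox type and the parse_box_line name are made up for illustration and are not part of pdfcs.exe. This page does not state which corner of the box, or which coordinate origin, the -x and -y options expect, so check the values against your own file before splitting.

```c
#include <stdio.h>

/* One bounding box as printed by -searchtext, for example:
   [90.00, 215.64, 135.31, 226.44] 'Location'
   (field names are illustrative; the tool does not name them) */
typedef struct {
    double x0, y0, x1, y1;
} TextBox;

/* Parse one line of search output.
   Returns 1 and fills *box if the line starts with four bracketed numbers,
   returns 0 for any other line (page headers, "Found Nothing" lines, ...). */
int parse_box_line(const char *line, TextBox *box)
{
    return sscanf(line, " [%lf ,%lf ,%lf ,%lf",
                  &box->x0, &box->y0, &box->x1, &box->y1) == 4;
}
```

Calling parse_box_line on every line of the captured output and keeping only the lines that return 1 gives one box per page on which the keyword was found.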

After you get the position of this keyword, you can run one of the following commands to split the PDF file by the text at that location:

pdfcs.exe -$ XXXXXXXXXXXXX -mode 0 -x 422 -y 139 test.pdf "_out.pdf"

pdfcs.exe -$ XXXXXXXXXXXXX -mode 1 -x 422 -y 139 test.pdf "_out.pdf"

pdfcs.exe -$ XXXXXXXXXXXXX -mode 2 -x 422 -y 139 test.pdf "_out.pdf"

PDF Content Splitter Command Line is a standalone program. It does NOT require Adobe Acrobat Pro, which costs hundreds of dollars.

image

The following is the full list of command line parameters included in the PDF Content Splitter Command Line application:

C:\pdf-content-splitter-cmd\pdfcs.exe
Web: http://www.verypdf.com
Web: http://www.verydoc.com
Support: http://support.verypdf.com
Email: support@verypdf.com
Build date: Feb 16 2017
VeryPDF PDF Content Splitter Command Line v3.50
Batch split and group PDF pages by keywords.
Copyright 1996-2017 VeryPDF.com Inc.
===========================================
Support input format:
  1. PDF: Adobe Acrobat PDF file format
Support output format:
  1. PDF: Adobe Acrobat PDF file format
===========================================
Usage: pdfcs.exe [options] <Input-file> <Output-file>
  -opw <string>          : owner password (for encrypted files)
  -upw <string>          : user password (for encrypted files)
  -listtext              : list text lines in all PDF pages or selected pages
  -searchtext <string>   : search text in PDF pages and show result to console
  -searchtext2 <string>  : search text in PDF pages, display one word by one word
  -casesensitive         : compare strings with case-sensitive method
  -x <fp>                : set X position to locate text
  -y <fp>                : set Y position to locate text
  -mode <int>            : set PDF Content Splitter mode,
    -mode 0: Group continuous PDF pages which contain same text at special position
    -mode 1: Group valid text at special position and extract pages
    -mode 2: Group all PDF pages which contain same text at special position
  -skip                  : don't overwrite an output file if it already exists
  -h                     : print usage information
  -help                  : print usage information
  --help                 : print usage information
  -?                     : print usage information
  -$ <string>            : input your license key
Examples:
   pdfcs.exe -$ XXXXXXXXXXXXXXXX
   pdfcs.exe -listtext C:\in.pdf
   pdfcs.exe -searchtext "keyword" C:\in.pdf
   pdfcs.exe -searchtext2 "keyword" C:\in.pdf
   pdfcs.exe -searchtext2 "keyword" -opw 123 -upw 456 C:\in.pdf
   pdfcs.exe -searchtext2 "keyword" -casesensitive C:\in.pdf
   pdfcs.exe -x 227 -y 34 -mode 0 C:\in.pdf
   pdfcs.exe -x 227 -y 34 -mode 1 C:\in.pdf
   pdfcs.exe -x 227 -y 34 -mode 2 C:\in.pdf

Batch process examples:
   for %F in (D:\temp\*.pdf) do pdfcs.exe -x 227 -y 34 -mode 0 "%F" "out_%~nF.pdf"
   for %F in (D:\temp\*.pdf) do pdfcs.exe -x 227 -y 34 -mode 1 "C:\test\%~nF.pdf"
   for %F in (D:\temp\*.pdf) do pdfcs.exe -x 227 -y 34 -mode 2 "%F" "C:\test\%~nF.pdf"

Keywords:

PDF Split
PDF Size Splitter
PDF Invoice Split
PDF Page Content Split
PDF Payroll Split
PDF Reports Splitter

VeryPDF


How to extract table and text contents from a PNG image file? Use OCR to convert PNG image to Excel Spreadsheet

I tried your Trial Version to convert a PNG file to Excel. It returned a blank Excel document.

Input: Chose #2 (English)
Output: Tried both xls and xlsx
Attached PNG file is what I was attempting to convert

Does the Trial Version not work with file type PNG?

image

Customer
-----------------------------------------
You can use the "VeryPDF Table Extractor OCR" software to extract the table from this PNG image file. Please download and install it from this web page:

http://www.verypdf.com/app/pdf-to-table-extractor-ocr/try-and-buy.html
http://www.verypdf.com/dl2.php/verypdf-table-extractor-ocr.exe

Please look at attached screenshot for the extracted table.

image

To increase the OCR accuracy, also apply the following settings to the input PNG image file:

1. Click the drop-down button to the right of the OCR button.

2. Click the "Quality" option and select the 300% item. This option increases the DPI of the input image file.

3. Click the "Threshold" option and set it to a value such as 180, 190 or 210. This option converts the color image to a black and white image, which also improves the OCR accuracy.
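
The Threshold setting can be pictured as a simple binarization of the image. The sketch below only illustrates that idea in C and is not the code used by VeryPDF Table Extractor OCR. It assumes an 8-bit grayscale buffer, and it assumes that pixels at or above the threshold become white and all others black. The function name binarize is hypothetical.

```c
#include <stddef.h>

/* Convert 8-bit grayscale pixels to pure black (0) or pure white (255),
   in place. Assumption: values >= threshold count as background (white). */
void binarize(unsigned char *pixels, size_t count, unsigned char threshold)
{
    for (size_t i = 0; i < count; ++i) {
        pixels[i] = (pixels[i] >= threshold) ? 255 : 0;
    }
}
```

Under these assumptions, a threshold of 180 turns a light gray value such as 200 white, while a threshold of 210 turns the same value black, so a higher value keeps fainter strokes as black.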

image

VeryPDF


In VeryPDF PDF Extract Tool Command Line software, are the X and Y coordinates expressed in points, and can I apply a standard factor to convert them to cm?

If I buy the developer license, do I need to integrate the PDF extract code into my software, or can I just distribute it together with my software?

I would prefer not to do any code integration, in order to save time.

Customer
-----------------------------------------------------

PDF Extract Tool Command Line can be downloaded from this web page:

http://www.verypdf.com/app/pdf-extract-tool/index.html

image

After you purchase the developer license, you only need to integrate the DLL or Command Line into your software. You can then redistribute it along with your software to your customers. This can be done easily.

VeryPDF
-----------------------------------------------------
May I know whether the X and Y coordinates are expressed in points, and whether I can apply a standard factor to convert them to cm?

Or does the conversion factor to cm vary?

Customer
-----------------------------------------------------
Yes, the X and Y coordinates are expressed in points. You can convert from points to mm (and so to cm, since 1 cm = 10 mm) with a fixed factor. The details are below:

const float MM2INCH = 0.03937007874015748031496062992126f; // (1 / 25.4f)
const float MM2POINT = 2.8346456692913385826771653543307f; // (1 / 25.4f) * 72

const float INCH2MM = 25.4f;
const float POINT2MM = (25.4f / 72.0f);

The unit is the point (1/72 inch). You can convert from inches or mm to points yourself.

For example, to set the page width to 800 pixels and the page height to 600 pixels (at 72 DPI, where 1 pixel equals 1 point):
pageWidth = 800;
pageHeight= 600;

For example, to set the page width to 8.5 inches and the page height to 11 inches:
pageWidth = 8.5 * 72;
pageHeight= 11 * 72;

For example, to set the page width to 210 mm and the page height to 297 mm:
pageWidth = 210 / 25.4 * 72;
pageHeight= 297 / 25.4 * 72;

Convert your inch or mm values to points in this way and try again.
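
For the cm question specifically, the same fixed factors apply, because 1 inch is 72 points and also 2.54 cm. The following C helpers are a small sketch of that arithmetic. The names point_to_cm and cm_to_point are illustrative only and are not part of PDF Extract Tool.

```c
/* 1 inch = 72 points = 2.54 cm, so the factor between points and cm is fixed. */
#define POINTS_PER_INCH 72.0
#define CM_PER_INCH     2.54

/* Convert a length in points to centimeters. */
double point_to_cm(double pt)
{
    return pt * CM_PER_INCH / POINTS_PER_INCH;
}

/* Convert a length in centimeters to points. */
double cm_to_point(double cm)
{
    return cm * POINTS_PER_INCH / CM_PER_INCH;
}
```

For example, an X value of 90.00 points corresponds to 3.175 cm, and an A4 page width of 21 cm is about 595.28 points.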

VeryPDF
-----------------------------------------------------
May I know the command to extract only the text to the output file?

I don't need any other information in the output.

Customer
-----------------------------------------------------
You can run the following commands to extract only the text and its coordinates from a PDF file to a text file:

pdfextract.exe -textpos D:\in.pdf D:\out.txt
pdfextract.exe -textpos -nopgbrk D:\in.pdf D:\out.txt

If you do not need the coordinates, you can use the "PDF to Text OCR Converter Command Line" software instead of the "PDF Extract Command Line" software. It extracts text contents only and ignores coordinates:

http://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
http://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip

VeryPDF


How to Build a Powerful Mobile Document Scanner in Just 5 Minutes? Mobile Capture SDK turns smartphones and tablets into scanners — Mobile Scanning Solution

VeryPDF released a Mobile Scanner Framework for iOS (PDF Scanner SDK for iOS) today. You can download and purchase it from these web pages:

http://www.verypdf.com/app/mobile-pdf-scanner-sdk/index.html
http://www.verypdf.com/app/mobile-pdf-scanner-sdk/try-and-buy.html#buy
http://www.verypdf.com/dl2.php/PDFScannerSDK-iOS.zip

VeryPDF PDF Scanner SDK for iOS (Mobile Scanner Framework) turns your iPhone into a full-featured and powerful scanner for documents, receipts, books, photos, whiteboards, and other text. Using just your iPhone or iPad, you can quickly scan multipage documents into high-quality PDFs, then edit, store and send them anywhere.

image

VeryPDF PDF Scanner SDK for iOS (Mobile Scanner Framework) uses advanced fast algorithms to accurately auto-detect document edges, straighten the documents (correct perspective), eliminate shadows and set a perfect contrast for text - black on white.

VeryPDF PDF Scanner SDK for iOS (Mobile Scanner Framework) boasts a powerful yet easy to use interface. Get instant one-tap brightness, rotation and color controls all on one screen!

Our scanner technology includes smart page detection, perspective correction and image enhancement. The batch scanning lets you scan dozens of pages in a matter of seconds.

All processing happens on your iPhone, and the confidentiality of your data is never compromised (no Internet connection needed.)

VeryPDF PDF Scanner SDK for iOS enables your application to automatically detect page borders and smartly remove background.

image

VeryPDF PDF Scanner SDK for iOS enables your application to enhance images with additional modes that make the contents clearer and more readable.

image

In VeryPDF PDF Scanner SDK, scanning a document is broken down into three simple steps:

Step 1: Detect edges.
Step 2: Use the edges in the image to find the contour (outline) representing the piece of paper being scanned.
Step 3: Apply a perspective transform to obtain the top-down view of the document.
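
Step 1 can be illustrated with a classic Sobel filter. This is a generic textbook sketch in C and not the algorithm used inside the SDK, which is not documented here. It assumes a row-major 8-bit grayscale image, and the function names and the threshold parameter are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Sobel gradient strength (|gx| + |gy|) at an interior pixel (x, y)
   of a row-major 8-bit grayscale image that is `width` pixels wide. */
static int sobel_at(const unsigned char *img, int width, int x, int y)
{
    const unsigned char *p = img + (size_t)y * width + x;
    /* Horizontal gradient: right column minus left column. */
    int gx = (p[-width + 1] + 2 * p[1] + p[width + 1])
           - (p[-width - 1] + 2 * p[-1] + p[width - 1]);
    /* Vertical gradient: row below minus row above. */
    int gy = (p[width - 1] + 2 * p[width] + p[width + 1])
           - (p[-width - 1] + 2 * p[-width] + p[-width + 1]);
    return abs(gx) + abs(gy);
}

/* Write 1 into edges[] where the gradient strength exceeds `threshold`
   and 0 elsewhere. Border pixels are always left at 0. */
void detect_edges(const unsigned char *img, unsigned char *edges,
                  int width, int height, int threshold)
{
    memset(edges, 0, (size_t)width * (size_t)height);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            if (sobel_at(img, width, x, y) > threshold) {
                edges[y * width + x] = 1;
            }
        }
    }
}
```

The resulting edge map is the kind of input that a contour search (Step 2) would work on. Steps 2 and 3 are not sketched here.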

VeryPDF Mobile Scanner Framework does all of this work for you. You only need a few lines of code to call it from your Xcode project, which can be done within 5 minutes.

Our mobile scanning solution turns your mobile phone into an office tool that you can use to scan any document.

image

We suggest that you download the trial version and try it first:

http://www.verypdf.com/app/mobile-pdf-scanner-sdk/index.html
http://www.verypdf.com/app/mobile-pdf-scanner-sdk/try-and-buy.html#buy
http://www.verypdf.com/dl2.php/PDFScannerSDK-iOS.zip

If you encounter any problem with VeryPDF Mobile Scanner SDK, please feel free to let us know and we will assist you as soon as possible:

http://support.verypdf.com/open.php


Printing PDF Files on Mac OS X and Linux

Mac OS X includes built-in support for printing PDF files. Modern versions of Linux (with CUPS) also include PDF printing support. This page provides sample code for printing PDF files on these two systems.

For Windows (which does not have built-in PDF support), we recommend our PDFPrint Command Line software.

Printing PDF files on Mac OS X

The following code uses Core Printing and CGPDFDocument (from Core Graphics) to print PDF files. It uses all of the default settings for the specified printer, and prints all pages of the PDF file.

#include <stdbool.h>   // bool, when compiled as C
#include <string.h>    // strcmp
#include <CoreFoundation/CoreFoundation.h>
#include <ApplicationServices/ApplicationServices.h>

bool printPDF(const char *pdfFileName, const char *printerName) {
  CFStringRef s;
  CFURLRef url;
  CGPDFDocumentRef pdfDoc;
  CGPDFPageRef pdfPage;
  PMPrintSession session;
  PMPrintSettings printSettings;
  CFArrayRef printerList;
  PMPrinter printer;
  char prtName[512];
  PMPageFormat pageFormat;
  CGContextRef ctx;
  int nPages, pg, i;

  //--- load the PDF file
  s = CFStringCreateWithCString(NULL, pdfFileName, kCFStringEncodingUTF8);
  url = CFURLCreateWithFileSystemPath(NULL, s, kCFURLPOSIXPathStyle, false);
  CFRelease(s);
  pdfDoc = CGPDFDocumentCreateWithURL(url);
  CFRelease(url);
  if (!pdfDoc) {
    return false;
  }

  //--- create the Session and PrintSettings
  if (PMCreateSession(&session)) {
    CFRelease(pdfDoc);
    return false;
  }
  if (PMCreatePrintSettings(&printSettings)) {
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  if (PMSessionDefaultPrintSettings(session, printSettings)) {
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  s = CFStringCreateWithCString(NULL, pdfFileName, kCFStringEncodingUTF8);
  PMPrintSettingsSetJobName(printSettings, s);
  CFRelease(s);

  //--- set the printer
  if (PMServerCreatePrinterList(kPMServerLocal, &printerList)) {
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  printer = NULL;
  for (i = 0; i < CFArrayGetCount(printerList); ++i) {
    printer = (PMPrinter)CFArrayGetValueAtIndex(printerList, i);
    s = PMPrinterGetName(printer);
    if (CFStringGetCString(s, prtName, sizeof(prtName),
                           kCFStringEncodingUTF8)) {
      if (!strcmp(prtName, printerName)) {
        break;
      }
    }
  }
  if (i >= CFArrayGetCount(printerList)) {
    CFRelease(printerList);
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  if (PMSessionSetCurrentPMPrinter(session, printer)) {
    CFRelease(printerList);
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  CFRelease(printerList);

  //--- get the PageFormat
  if (PMCreatePageFormat(&pageFormat)) {
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  if (PMSessionDefaultPageFormat(session, pageFormat)) {
    PMRelease(pageFormat);
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }

  //--- print
  nPages = CGPDFDocumentGetNumberOfPages(pdfDoc);
  if (PMSetPageRange(printSettings, 1, nPages)) {
    PMRelease(pageFormat);
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  if (PMSessionBeginCGDocumentNoDialog(session, printSettings, pageFormat)) {
    PMRelease(pageFormat);
    PMRelease(printSettings);
    PMRelease(session);
    CFRelease(pdfDoc);
    return false;
  }
  for (pg = 1; pg <= nPages; ++pg) {
    if (PMSessionBeginPageNoDialog(session, pageFormat, NULL) ||
        PMSessionGetCGGraphicsContext(session, &ctx) ||
        !(pdfPage = CGPDFDocumentGetPage(pdfDoc, pg))) {
      PMSessionEndDocumentNoDialog(session);
      PMRelease(pageFormat);
      PMRelease(printSettings);
      PMRelease(session);
      CFRelease(pdfDoc);
      return false;
    }
    CGContextDrawPDFPage(ctx, pdfPage);
    if (PMSessionEndPageNoDialog(session)) {
      PMSessionEndDocumentNoDialog(session);
      PMRelease(pageFormat);
      PMRelease(printSettings);
      PMRelease(session);
      CFRelease(pdfDoc);
      return false;
    }
  }

  PMSessionEndDocumentNoDialog(session);
  PMRelease(pageFormat);
  PMRelease(printSettings);
  PMRelease(session);
  CFRelease(pdfDoc);
  return true;
}

Printing PDF files on Linux

The following code uses the CUPS API to print PDF files. It uses all of the default settings for the specified printer, and prints all pages of the PDF file.

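#include <stdbool.h>   // bool, when compiled as C
#include <cups/cups.h>

bool printPDF(const char *pdfFileName, const char *printerName)
{
  // cupsPrintFile returns the new job ID, or 0 on error;
  // the file name is also used as the job title
  return cupsPrintFile(printerName, pdfFileName, pdfFileName, 0, NULL) != 0;
}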

Printing PDF files on Windows

On Windows, you can print PDF files using the VeryPDF PDFPrint Command Line application:

http://www.verypdf.com/app/pdf-print-cmd/index.html
