How to batch convert scanned PDF files to searchable PDF files and remove the background color from the newly created (OCRed) PDF files?

I'm looking for a way to convert thousands of PDFs to searchable PDFs. I've used an OCR program. However, you can't select a folder; you have to go into each subfolder, select the files to convert, and then go to the next folder.

What is another way to convert a large number of PDFs to searchable PDFs?

I haven't had any suggestions. Surely there must be a way to batch convert PDFs?

Customer
-------------------------------------

This is a good question. VeryPDF has a solution that batch converts all the PDF files in a folder and its subfolders to searchable PDF files with a single command line. It is easy and quick.

In general, the "Image to PDF OCR Converter Command Line" software alone is enough for this work. However, some PDF files contain color information. To remove the color and grayscale background, we first use PDF to Image Converter Command Line to convert the PDF files to black and white images, and then use "Image to PDF OCR Converter Command Line" to convert the modified image files to searchable PDF files.

Please follow these steps:

1. Download the PDF to Image Converter Command Line software from this web page:

http://www.verypdf.com/app/pdf-to-image-converter/try-and-buy.html#buy-cmd
http://www.verypdf.com/dl2.php/pdf2image_win.zip

After you download and unzip it to a folder, you can run the following command line to convert a PDF file to a black and white TIFF file and remove the color background from the TIFF file:

pdf2img.exe -$ "XXXXXXXXXXXXXX" -r 300 -threshold 180 "D:\verypdf.pdf" "D:\out.tif"

The "-threshold 180" option removes colors whose threshold value is less than 180, so the background color is removed automatically.

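To make the idea of a threshold more concrete, here is a minimal Python sketch of a global black and white threshold. It is not the code inside pdf2img.exe, and the exact rule that "-threshold 180" applies is not documented in this post. The sketch assumes the common convention for 8-bit grayscale pixels: values at or above the threshold (light background tints) become white, and everything darker (the text) becomes black. The function name to_black_and_white and the sample pixel values are made up for illustration.

```python
def to_black_and_white(gray_rows, threshold=180):
    """Binarize 8-bit grayscale pixels.

    Assumed convention: values >= threshold become white (255),
    all darker values become black (0).
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_rows]


# Hypothetical 2x3 scan: dark text (20, 60), a tinted background (200, 230)
# and paper white (255).
page = [[20, 200, 255],
        [230, 60, 200]]

print(to_black_and_white(page))  # [[0, 255, 255], [255, 0, 255]]
```

Under this assumption, a lower threshold turns more of the page white, and a higher one keeps more of the darker shades as black.
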
2. Download the "Image to PDF OCR Converter Command Line" software from this web page:

http://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html#buy-ocr-cmd
http://www.verypdf.com/tif2pdf/image2pdf_cmd_ocr_trial.zip

After you download and unzip it to a folder, you can run the following command line to convert and combine the modified TIFF files into a multi-page PDF file with the OCR option:

img2pdfnew.exe -$ XXXXXXXXXXXXXXXXXX -width 595 -height 842 -ocr 1 -tsocr -tsocrlang eng "D:\out*.tif" "D:\VeryPDF.pdf"

With the two steps above, you will be able to remove the background color from a PDF file and create a searchable PDF file.

If you have thousands of PDF files in a folder, you can use the following .bat file to batch convert all the PDF files in this folder to searchable PDF files on the fly:
-------------------------------------
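ECHO ON
REM Convert every PDF in InputFolder to a searchable PDF in OutputFolder.
set "InputFolder=D:\downloads\pdf"
set "OutputFolder=D:\downloads\pdfocr"
set "TempFolder=D:\test"

if not exist "%OutputFolder%" mkdir "%OutputFolder%"
if not exist "%TempFolder%" mkdir "%TempFolder%"

for %%F in ("%InputFolder%\*.pdf") do (

REM Remove TIFF files left over from an earlier run.
del /Q "%TempFolder%\%%~nF*.tif" 2>nul

REM Step 1: render the PDF to TIFF at 300 DPI. The -threshold 180 option from step 1 removes the background; drop it to keep the original colors.
.\pdf2image_win\pdf2img.exe -$ "XXXXXXXXXXXXXX" -r 300 -threshold 180 "%%F" "%TempFolder%\%%~nF.tif"

REM Step 2: OCR the TIFF pages and combine them into one searchable PDF.
.\img2pdfnew.exe -$ XXXXXXXXXXXXXXXXXX -width 595 -height 842 -ocr 1 -tsocr -tsocrlang eng "%TempFolder%\%%~nF*.tif" "%OutputFolder%\%%~nF.pdf"

)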
-------------------------------------

The .bat file above works on one folder at a time. If you want to process subfolders automatically as well, you can use the following .bat file:
-------------------------------------
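ECHO ON
REM Convert every PDF in InputFolder and its subfolders. Each result is saved next to its source file with -ocr added to the name.
set "InputFolder=D:\downloads\pdf"
set "TempFolder=D:\test"

if not exist "%TempFolder%" mkdir "%TempFolder%"

for /r "%InputFolder%" %%F in (*.pdf) do (

REM Remove TIFF files left over from an earlier file with the same name.
del /Q "%TempFolder%\%%~nF*.tif" 2>nul

REM Step 1: render the PDF to TIFF at 300 DPI. The -threshold 180 option from step 1 removes the background; drop it to keep the original colors.
.\pdf2image_win\pdf2img.exe -$ "XXXXXXXXXXXXXX" -r 300 -threshold 180 "%%F" "%TempFolder%\%%~nF.tif"

REM Step 2: OCR the TIFF pages. The for variable needs a doubled percent sign inside a .bat file.
.\img2pdfnew.exe -$ XXXXXXXXXXXXXXXXXX -width 595 -height 842 -ocr 1 -tsocr -tsocrlang eng "%TempFolder%\%%~nF*.tif" "%%~dpnF-ocr.pdf"

)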
-------------------------------------

Even if you have thousands and thousands of PDF files in a folder and its subfolders, the .bat script above will OCR all of them with one command line.
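If you prefer a script over a .bat file, the same recursive loop can be sketched in Python. This is only a sketch under assumptions: it reuses the two command lines from steps 1 and 2 as shown (including "-threshold 180"), it assumes both tools sit at the same relative locations as in the .bat files, and the names build_commands, convert_tree and OCR_SUFFIX are made up for illustration. It also skips files that already end in "-ocr.pdf", which is an added design choice so that a second run does not OCR its own output.

```python
import glob
import ntpath  # Windows path rules on any OS; the same module as os.path on Windows
import os
import subprocess

PDF2IMG = r".\pdf2image_win\pdf2img.exe"   # assumed location, as in the .bat files
IMG2PDF = r".\img2pdfnew.exe"              # assumed location, as in the .bat files
OCR_SUFFIX = "-ocr.pdf"


def build_commands(pdf_path, temp_folder,
                   pdf2img_key="XXXXXXXXXXXXXX",
                   img2pdf_key="XXXXXXXXXXXXXXXXXX"):
    """Return the two argument lists (render, then OCR) for one PDF file."""
    folder, file_name = ntpath.split(pdf_path)
    stem = ntpath.splitext(file_name)[0]
    tiff_path = ntpath.join(temp_folder, stem + ".tif")
    tiff_pattern = ntpath.join(temp_folder, stem + "*.tif")
    output_pdf = ntpath.join(folder, stem + OCR_SUFFIX)
    render = [PDF2IMG, "-$", pdf2img_key, "-r", "300", "-threshold", "180",
              pdf_path, tiff_path]
    ocr = [IMG2PDF, "-$", img2pdf_key, "-width", "595", "-height", "842",
           "-ocr", "1", "-tsocr", "-tsocrlang", "eng",
           tiff_pattern, output_pdf]
    return render, ocr


def convert_tree(input_folder, temp_folder):
    """Convert every PDF under input_folder that is not already an OCR output."""
    os.makedirs(temp_folder, exist_ok=True)
    for folder, _subfolders, names in os.walk(input_folder):
        for name in names:
            lower = name.lower()
            if not lower.endswith(".pdf") or lower.endswith(OCR_SUFFIX):
                continue
            render, ocr = build_commands(os.path.join(folder, name), temp_folder)
            tiff_pattern = ocr[-2]
            for stale in glob.glob(tiff_pattern):  # TIFFs left by an earlier file
                os.remove(stale)
            subprocess.run(render, check=True)     # step 1: PDF to TIFF
            subprocess.run(ocr, check=True)        # step 2: TIFF to searchable PDF
```

On Windows it could be started with convert_tree(r"D:\downloads\pdf", r"D:\test"). Because check=True is set, the loop stops at the first file that either tool fails on, which makes a problem file easy to find.
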


We have another option that monitors a folder and its subfolders automatically: once a PDF file is copied into the monitored folder, the monitor application converts it to a searchable PDF file automatically. This can be done with the "PHP Folder Watcher" application, which you can download and buy from this web page:

https://veryutils.com/php-folder-watcher

PHP Folder Watcher is a PHP script that monitors folders recursively. It also supports an xcopy function to back up files.

PHP Folder Watcher is a convenient, automated way to monitor folders in the background. It monitors one or more folders on your computer for new files. When a file is added to a monitored folder, it calls an external script or application to process the new file automatically.

PHP Folder Watcher is especially useful for quickly posting scanned documents to a folder. If you save a scanned document to a folder that is being monitored by PHP Folder Watcher, the file posting workflow begins automatically.
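The general idea of watching a folder tree and calling a handler for each new file can be sketched in a few lines of Python. This is not how PHP Folder Watcher itself is implemented; it is a simple polling sketch, and the names find_new_pdfs and watch, the 5 second interval and the max_polls parameter are assumptions made for illustration.

```python
import os
import time


def find_new_pdfs(root, seen):
    """Return PDF paths under root that are not in `seen`, and add them to `seen`."""
    new_files = []
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            if name.lower().endswith(".pdf") and path not in seen:
                seen.add(path)
                new_files.append(path)
    return new_files


def watch(root, handle, interval_seconds=5.0, max_polls=None):
    """Poll root and call handle(path) once for every PDF that appears.

    max_polls=None polls forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_pdfs(root, seen):
            handle(path)
        polls += 1
        time.sleep(interval_seconds)
```

For example, watch(r"D:\downloads\pdf", print) would print each new PDF path as it shows up. Note that the first poll reports every PDF already in the tree, that a file which is still being copied may be reported before it is complete, and that a handler writing its output PDF into the watched tree would see that output reported as a new file unless it filters it out.
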

Enjoy!

VeryPDF

Posted in Image to PDF Converter

Does PDF Text Replacer software support regular expressions?

hi,

Is it possible to use regular expressions in PDF Text Replacer?

do you have some examples?

regards,
Customer
---------------------------------------

image
Thanks for your message. PDF Text Replacer doesn't support regular expressions currently; it supports simple text replacement only.
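To show what this difference means in practice, here is a small Python illustration of simple (literal) text replacement versus regular expression replacement. It does not use PDF Text Replacer at all; the sample text and the date pattern are made up for illustration.

```python
import re

text = "Invoice 2016-04-01, Invoice 2017-11-30"

# Simple text replacement: only the exact string given is replaced.
literal = text.replace("2016-04-01", "DATE")

# Regular expression replacement: every string matching the pattern is replaced.
pattern = re.sub(r"\d{4}-\d{2}-\d{2}", "DATE", text)

print(literal)  # Invoice DATE, Invoice 2017-11-30
print(pattern)  # Invoice DATE, Invoice DATE
```

With simple text replacement, each distinct string to be replaced has to be listed explicitly.
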

By the way, we also have PDF Text Replacer Command Line software. The command line version works better than the PDF Text Replacer GUI version. You may download PDF Text Replacer Command Line from our website to try:

http://www.verypdf.com/app/pdf-text-replacer/try-and-buy.html#buy-cmd
http://www.verypdf.com/dl2.php/pdftextreplacer_cmd.zip

You may also look at the following web pages for more information about the PDF Text Replacer Command Line software:

http://www.verypdf.com/app/pdf-text-replacer/search-and-replace-pdf-text-command-line.html

http://www.verypdf.com/wordpress/201303/verypdf-releases-pdf-text-replacer-command-line-software-35653.html

VeryPDF

Posted in PDF Text Replacer

Error 429, ActiveX component can't create object. It happens when VB tries to run the htmltools.exe application

Hello

I have downloaded your trial version of htmltools.exe.

I wish to use it in a VB6 program that I use. In my program, I can only generate .rtf files, but my client now wants .pdf.

I tried to use your included VB example, but I get the following error when I try to run it.

Error 429, ActiveX component can't create object. It happens when VB tries to run the line

Set HTMLConverter = CreateObject("HtmlSchee.HtmlShell")

The program is being tested on a Windows 10 PC.

Thank you in advance

All the best.
Customer
------------------------------------------
I tested your VB sample program on an old Windows XP PC and it works fine. I think it’s a Windows 10 issue.

Customer
------------------------------------------


http://www.verypdf.com/app/html-converter/try-and-buy.html

Thanks for your message. On a Windows 10 system, you need to run the software with administrator privileges; it will then work properly.

VeryPDF

Posted in HTML Converter (htmltools)

Regarding Subscription to PDF Editor OCX controls (AxPDFOCXLib library) from C# source code

Hi Team,

Good Morning...!!!

We are developing some .NET applications for our clients and need to disable the Print/Save As options in PDF files.

We have studied and analysed your PDF Editor OCX ActiveX control, which almost meets our requirements, and have decided to subscribe to it.

Before that, we need to clarify some queries about the ActiveX control and would like a small demo of how it integrates and works.

So, could you please provide a specific point of contact on your team who can help us with this?

Customer
--------------------------------------


Thanks for your message. We suggest you download the trial version of PDF Editor OCX Control (ActiveX) (Developer License, $2999.00) from this web page to try:

http://www.verypdf.com/app/pdf-editor/try-and-buy.html#buy-dev
http://www.verypdf.com/pdf-editor/pdfeditor_ocx.zip

This demo package contains VB, VC, C# and VB.NET examples; you can compile and run these demo projects easily.

We also have PDF Viewer OCX Control (ActiveX), which can only be used to view PDF files. You may download PDF Viewer OCX Control from this web page to try:

http://www.verypdf.com/dl2.php/pdfviewerocx.zip

If you encounter any problems with these products, please feel free to let us know and we will assist you as soon as possible.

VeryPDF
--------------------------------------
Hi Team,

Thanks for your response.

I've already downloaded and tried your trial version of PDF Editor OCX Control (ActiveX), but I'm getting an issue with the "AxPDFOCXLib.dll" and "PDFOCXLib.dll" files (Error CS0246: The type or namespace name 'AxPDFOCXLib' could not be found). I tried to copy the DLLs from the other projects in your zip file but couldn't find them. I tried adding the "AxInterop.PDFOCXLib.dll" and "Interop.PDFOCXLib.dll" files from the PDFAnnotator-C# project, but still had no luck. I also tried to download those DLLs separately from your site and from the internet but couldn't find them.

I've attached a screenshot of the solution for your reference.

Could you please share those DLLs with us so that we can compile and run the trial version of PDF Editor OCX Control (ActiveX)?

Regards,
Customer
--------------------------------------
You need to open a CMD window with administrator privileges first, and then run the following command line to register pdfocx.ocx on your system:

regsvr32 pdfocx.ocx

After pdfocx.ocx is registered successfully, you will be able to use 'AxPDFOCXLib' properly.

VeryPDF

See Also:

http://www.verypdf.com/wordpress/201604/royalty-free-pdf-annotator-ocx-activex-control-for-c-and-net-developers-component-to-view-and-annotate-pdf-documents-42537.html

Posted in PDF Editor

I am interested in your PDF Editor Toolkit SDK to assist me in converting batches of PDF files to searchable PDF

Dear Verypdf,

I am interested in your PDF Editor Toolkit SDK to assist me in converting batches of PDF files to searchable PDF.

I need to download the trial version to confirm whether it will address my challenge, but I am having difficulty downloading it.

Kindly assist me with an active download link so that I can test the application before placing an order.

Regards,
Customer

--------------------------------------------------

Thanks for your message. The following products can all convert scanned PDF, TIFF, JPG, PNG and other raster files to searchable PDF files. The output PDF files all contain a hidden text layer, so you can open the OCRed PDF files in Adobe Reader and search their text content:

Image to PDF OCR Converter Command Line,
http://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html#buy-ocr-cmd
http://www.verypdf.com/tif2pdf/image2pdf_cmd_ocr_trial.zip

PDF to Text OCR Converter Command Line,
http://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
http://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip

VeryPDF OCR to Any Converter Command Line,
http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html
http://www.verypdf.com/dl2.php/ocr2any_cmd.zip

Please look at the following web pages for more information:

http://www.verypdf.com/wordpress/201211/convert-scanned-pdf-to-searchable-pdf-without-losing-color-32937.html

http://www.verypdf.com/wordpress/201312/bulk-scanned-pdf-files-to-searchable-pdf-files-batch-converter-40025.html

http://www.verypdf.com/wordpress/201211/convert-image-and-scanned-pdf-to-searchable-pdf-32896.html

VeryPDF

Posted in OCR Products, VeryPDF SDK & COM