ocr products, pdf to text converter

disable OCR function in pdftotxtocr.exe

I am trying out pdftotxtocr.exe and was wondering if there is a switch to disable OCR processing. I tried running it without the "-ocr" switch, but the program still proceeds to the step "This PDF file seems not contain text contents. We will use OCR technology to recognize this PDF file continue..."
========================
Thanks for your message. The free trial version of PDF to Text OCR Converter has no option to disable the OCR function. However, after you purchase it, please email us your Order ID and we will send you a new version. The new version has a "-disableocr" parameter which can be used to disable the OCR function completely.

VeryPDF
=========================
Thanks for your response. I will send you the order ID soon.

Another question. I am trying to extract text from the attached scanned sketch but got a garbled result. What are the correct switches to get the best OCR recognition results?
=====================
Our OCR engine doesn't support handwritten characters; it supports printed characters only, so it can't convert the handwritten characters in your PDF file into an editable Word document. Please take note of this limitation.

VeryPDF
=====================
If the OCR engine does not support handwritten characters, then why does the program still generate an output file with many non-alphanumeric characters (see attached)? Is there a way to instruct the program to simply output a blank txt file in this situation?
=====================
Our OCR engine does not support handwritten characters, so it produces garbage characters when it encounters them. We have no way to simply output a blank txt file in this situation, sorry.

VeryPDF
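
Since the converter itself cannot emit a blank file, one possible workaround is to post-process the generated .txt file in your own code. The sketch below is only an illustration of that idea: it measures how many bytes look like ordinary text and returns an empty string when the share is too low. The function names, the punctuation set, and the 0.8 threshold are assumptions made for this example and are not part of any VeryPDF product.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for bytes that commonly appear in printed English text.
bool is_plausible_char(unsigned char ch) {
    static const std::string punctuation = ".,;:!?'\"()-";
    return std::isalnum(ch) || std::isspace(ch) ||
           punctuation.find(static_cast<char>(ch)) != std::string::npos;
}

// Fraction of bytes that look like ordinary text.
// Returns 1.0 for an empty string so that empty output is never flagged.
double readable_ratio(const std::string& text) {
    if (text.empty()) return 1.0;
    std::size_t readable = 0;
    for (unsigned char ch : text) {
        if (is_plausible_char(ch)) ++readable;
    }
    return static_cast<double>(readable) / static_cast<double>(text.size());
}

// Hypothetical policy: treat the text as OCR garbage below min_ratio.
bool looks_like_ocr_garbage(const std::string& text, double min_ratio = 0.8) {
    return readable_ratio(text) < min_ratio;
}

// Returns the text to write to disk: unchanged if plausible, empty if garbage.
std::string filter_ocr_output(const std::string& text, double min_ratio = 0.8) {
    return looks_like_ocr_garbage(text, min_ratio) ? std::string() : text;
}
```

You would read the converter's output file into a string, pass it through filter_ocr_output(), and write the result back. The threshold will likely need tuning against your own documents.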

 

pdf to image converter

PDF to Image converter: Quality of generated TIFF images

I use the PDF to image converter DLL to generate TIFF group 4 fax encoding from PDF documents in one of our applications and I noticed that the image quality is rather poor compared to other PDF to TIFF conversion tools, but the text conversion is correct. The following code extract shows how I use it:

PDFToImageEnableErrorDiffusion( TRUE );
PDFToImageSetCode("XXXXXXXXXXXXX");
PDFToImageSetFileNameSuffix( "%08d" );

m_progress.Create( "Generating images from PDF...", 100, m_list.size() * 2, TRUE, 0 );

g_pageList = this;
PDFToImageSetProgressCBFunc( PageProgressCBFunc );

PDFToImageConverter2( info.m_ctrlfile.c_str(), ( info.m_imageloc + ".tif" ).c_str(), NULL, NULL, 300, 300, 1, COMPRESSION_CCITTFAX4, 100, FALSE, 0, m_list[0].PhyNum(), m_list[m_list.size() - 1].PhyNum() );

Below you can find a small print-screen of the image generated with the pdf2image DLL and an example generated with another application (XXXXX). You can see that the text quality is equal…

Above is generated with pdf2image...

… and this is generated with XXXXX.

But from the same run, you can see that the image quality is far less…

Above is generated with pdf2image…

… and this is from the run with XXXXX.

Is there a way I can improve the quality of the generated TIFF group 4 fax encoded images?
================================
Could you please email us your sample PDF file so that we can check it?

VeryPDF
================================
Attached you can find a zip file with 5 documents.
UZBE_11.pdf                - The pdf file for which I’m trying to generate tiff files (1 page per file)
XXXXX_00000310.tif           - The tiff file generated with XXXXX for page 310 in the document
XXXXX_00000311.tif           - The tiff file generated with XXXXX for page 311 in the document
verypdf_00000310.tif     - The tiff file generated with the verypdf library for page 310 in the document
verypdf_00000311.tif     - The tiff file generated with the verypdf library for page 311 in the document
The verypdf_* files are generated using the code further down in this mail.
The XXXXX_* files are generated with XXXXX using the following settings:
*    TIFF image, black&white, 1 bit
*    High quality dithering
*    CCITT group 4 encoding
As you can see, there is a big difference in quality between the verypdf generated images and the images generated with XXXXX. How can I improve the quality of the images generated with verypdf?


================================
Thanks for your sample files. We have checked them and found that the difference is caused by halftoning: XXXXX and VeryPDF use different halftone patterns. This is not a defect. We will try to use different halftone patterns for converting color PDF pages to black and white TIFF files in future releases.

VeryPDF
================================
I still have a couple of questions.

1. Which dithering will be used (halftoning, Floyd-Steinberg,...)? Personally, I would suggest using Floyd-Steinberg since this produces the best quality for photographic images and it has a very good approximation for shades of gray.

2. Is there already a timeline for this implementation? I need to know this to keep my customers informed about this improvement. It is also on my list of hot topics. For one of my customers this is a showstopper.
=============================
We are using a halftoning algorithm. I know Floyd-Steinberg works better, but it is slower. However, we will try to provide an option to choose between halftoning and Floyd-Steinberg in the next version of the PDF2Image SDK product.

Thanks for your message. Our engineers are busy with some critical projects now, so we don't have an approximate release date for the next version at the moment. However, we will let you know once the new version is ready. Thanks for your patience.

VeryPDF
=============================
Continuing from the previous email: you can use the following function call to convert your PDF file to a high quality black and white TIFF file,

nResult = PDFToImageConverter("C:\\test.pdf", "C:\\out.tif", NULL, NULL, 200, 200, 24, COMPRESSION_CCITTFAX4, 100, TRUE, TRUE, -1, -1);

If you set "bitcount" to 24 and "compression" to COMPRESSION_CCITTFAX4, the PDFToImageConverter() function will create a high quality black and white TIFF file using Floyd-Steinberg dithering. Please try the source code above.

VeryPDF
=============================

docprint pro

Installation problems pdf-converter-creator

Product's name: PDF Converter and Creator
Product's version number: v2.1
Operating System: WIN7
Order ID: XXXXXXXX

The problem is that, after a working installation, I needed to re-install the software.
But now the printer driver does not get installed. Shortly before the installation finishes, I see a window with an error message (see attachment).
====================
We apologize for any inconvenience this may have caused you. Please try again using the following steps,

1. please uninstall PDF Printer from your system first,

2. reboot the computer,

3. reinstall PDF Printer again,

We hope the steps above will solve the problem for you.

VeryPDF

docprint pro

VeryPDF / DocPrint Support, how to set output PDF file name

Hello - we instant messaged each other and I'm trying to get docprint to work in Visual FoxPro with my Omni ActiveX control.

I have found that this command works just fine:

docPrint = CreateObject("DocPrintCom.docPrint")
But - I need to send printer output to it.  I'm not sure if I need to change the default printer or not.  Currently, I can print the ActiveX control like this (where oForm is set to the OmniForm activeX control)

thisform.oForm.printform(1, thisform.oForm.pagecount, 1)

The parameters for the method printform are  printform(frompage, topage, copies)

That command will print to the default printer.  How can I get it to print to a PDF using your control and name it something like  C:\TEST.PDF  ?

Thank you.
===============================
You can call the SetOutputFileName_docPrintPDFDriver() function from your code to set the output filename for docPrint PDF Driver. Please refer to the following sample code,

****************************************************
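'Default output filename, you can change it to anything you want
Const szOutputFileName = "C:\docPrint_output.pdf"

'You can select either the docPrint or the docPrint PDF Driver printer here
Const sPrinterName = "docPrint PDF Driver"

'Registry constants and API declarations used by SaveString and SaveStringLong below
Private Const HKEY_CURRENT_USER = &H80000001
Private Const REG_SZ = 1
Private Const REG_DWORD = 4

Private Declare Function RegCreateKey Lib "advapi32.dll" Alias "RegCreateKeyA" (ByVal hKey As Long, ByVal lpSubKey As String, phkResult As Long) As Long
Private Declare Function RegSetValueEx Lib "advapi32.dll" Alias "RegSetValueExA" (ByVal hKey As Long, ByVal lpValueName As String, ByVal Reserved As Long, ByVal dwType As Long, lpData As Any, ByVal cbData As Long) As Long
Private Declare Function RegCloseKey Lib "advapi32.dll" (ByVal hKey As Long) As Long
Private Declare Function SetDefaultPrinter Lib "winspool.drv" Alias "SetDefaultPrinterA" (ByVal pszbuffer As String) As Long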

Sub SaveString(hKey As Long, strPath As String, strValue As String, strData As String)
Dim Ret
'Create a new key
RegCreateKey hKey, strPath, Ret
'Save a string to the key
RegSetValueEx Ret, strValue, 0, REG_SZ, ByVal strData, LenB(strData)
'close the key
RegCloseKey Ret
End Sub

Sub SaveStringLong(hKey As Long, strPath As String, strValue As String, strData As Long)
Dim Ret
'Create a new key
RegCreateKey hKey, strPath, Ret
'Set the key's value
RegSetValueEx Ret, strValue, 0, REG_DWORD, strData, 4
'close the key
RegCloseKey Ret
End Sub

Private Sub SetOutputFileName_docPrintPDFDriver(ByVal m_ptrOutputFile As String)
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticOutput", 1
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticValue", 2
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutoView", 0

'SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "EmbedNum", 0
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "Unit", 3

SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageSelect", 10
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageSize", 7

SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "Bitcount", 1
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "xResolution", 300
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "yResolution", 300
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageW", 0
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageH", 0

SaveString HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticDirectory", m_ptrOutputFile
End Sub

Private Sub PrintWord_Click()
' MS Word must be installed for this function to work

SetDefaultPrinter sPrinterName
SetOutputFileName_docPrintPDFDriver szOutputFileName

On Error GoTo FileOpenDlg_ErrHandler
FileOpenDlg.CancelError = True
FileOpenDlg.Flags = cdlOFNFileMustExist Or cdlOFNPathMustExist Or cdlOFNExplorer Or cdlOFNLongNames
FileOpenDlg.Filter = "MS Word documents (*.doc)|*.doc"
FileOpenDlg.FilterIndex = 1
FileOpenDlg.ShowOpen

On Error Resume Next
Dim wordApp As Object
Dim wDoc As Object

Set wordApp = CreateObject("Word.Application")

Err = 0
Set wDoc = wordApp.Documents.Open(FileOpenDlg.FileName, , 1)

If Err = 0 Then
wordApp.ActivePrinter = sPrinterName
Call wordApp.PrintOut(False)

wDoc.Close
Set wDoc = Nothing
End If

Call wordApp.Quit
Set wordApp = Nothing

FileOpenDlg_ErrHandler:
Exit Sub

End Sub
****************************************************

You can call SetDefaultPrinter() to make "docPrint PDF Driver" the default printer and SetOutputFileName_docPrintPDFDriver() to set the output filename. When you then print a document to the "docPrint PDF Driver" printer, the PDF file is created at the specified path.

VeryPDF
====================================
I saw this example:

Private Sub Command1_Click()
Set docPrint = CreateObject("DocPrintCom.docPrint")
nRet = docPrint.docPrintCOM_Register("XXXXXXXXXXXXXX", "VeryPDF.com Company")
nRet = docPrint.RunCmd("-i https://www.verypdf.com -o C:\output.pdf -* XXXXXXXXXXXXXX -d -O 2 -s ShowHTMLStatusBar=1 -l 10000", 0)
MsgBox "Return value = " & Str(nRet)
End Sub

But how do I print to a file from the Omni control with:  thisform.oForm.printform(1, thisform.oForm.pagecount, 1)

Also, I have the trial version so do I just skip the register command?

Thanks,
====================================
You can call the SetOutputFileName_docPrintPDFDriver() function from your code to set the output filename for docPrint PDF Driver. Please refer to the following sample code,

====================================
'Default output filename, you can change it to anything that you want
Const szOutputFileName = "C:\docPrint_output.pdf"

'You can select either the docPrint or the docPrint PDF Driver printer here
Const sPrinterName = "docPrint PDF Driver"

Private Declare Function SetDefaultPrinter Lib "winspool.drv" Alias "SetDefaultPrinterA" (ByVal pszbuffer As String) As Long

Private Sub SetOutputFileName_docPrintPDFDriver(ByVal m_ptrOutputFile As String)
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticOutput", 1
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticValue", 2
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutoView", 0

'SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "EmbedNum", 0
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "Unit", 3

SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageSelect", 10
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageSize", 7

SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "Bitcount", 1
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "xResolution", 300
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "yResolution", 300
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageW", 0
SaveStringLong HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "PageH", 0

SaveString HKEY_CURRENT_USER, "Software\verypdf\pdfcamp", "AutomaticDirectory", m_ptrOutputFile
End Sub

Private Sub PrintForm_Click()

SetDefaultPrinter sPrinterName
SetOutputFileName_docPrintPDFDriver szOutputFileName

thisform.oForm.printform(1, thisform.oForm.pagecount, 1)
End Sub
====================================

You can call SetDefaultPrinter() to make "docPrint PDF Driver" the default printer and SetOutputFileName_docPrintPDFDriver() to set the output filename. When you then print a document to the "docPrint PDF Driver" printer, the PDF file is created at the specified path.

In the example above, calling “thisform.oForm.printform(1, thisform.oForm.pagecount, 1)” prints the document to docPrint PDF Driver and creates the PDF file. Please give it a try.

VeryPDF
====================================

postscript to text/pdf/image

PS to PDF Converter supports PostScript Type 1

Does this software support the conversion of PostScript Type 1, Type 2, and Type 3 to PDF?  Or does this software only convert Type 3?

Thanks,
===================
Our PS to PDF Converter can convert PostScript Type 1, Type 2, and Type 3 to PDF. You can download the trial version of PS to PDF Converter from our website to try it,

http://www.verydoc.com/ps-to-pdf.html

VeryPDF
===================
PostScript font background

The PostScript page description language and PostScript fonts were developed by Adobe Systems. The language is described in the Adobe `red book'

@String{pub-AW = "Ad\-di\-son-Wes\-ley"}

@String{pub-AW:adr = "Reading, MA, USA"}

@Book{Adobe:1990:PLR,
author = "Adobe Systems",
title = "\POSTSCRIPT Language Reference Manual",
publisher = pub-AW,
address = pub-AW:adr,
edition = "Second",
pages = "viii + 764",
year = "1990",
ISBN = "0-201-18127-4",
LCCN = "QA76.73.P67 P67 1990",
bibdate = "Tue Dec 14 22:33:36 1993",
acknowledgement = ack-nhfb,
}

and the Type 1 font format is described in the Adobe `black book'

@Manual{Adobe:1990:ATFa,
author = "Adobe Systems",
title = "Adobe Type 1 font format",
organization = pub-ADOBE,
address = pub-ADOBE:adr,
pages = "iii + 101",
year = "1990",
bibdate = "Sun Feb 11 07:52:15 MST 1996",
acknowledgement = ack-nhfb,
annote = "Includes index. ``Version 1.0''--verso t.p. ``Part
number: LPS0064''--verso t.p.",
keywords = "PostScript (Computer program language)",
}

Prior to the publication of the black book, the font format and the necessary decryption key had been secret and proprietary to Adobe, but the pressure of competition from the Apple/Microsoft TrueType font development led them to document and publish the format, permitting other typesetter and font vendors to convert their own fonts to Type 1 format, with the result that there are now on the order of 10,000 Type 1 fonts commercially available from many vendors.

PostScript font formats

Adobe Type 1 fonts are stored in two common formats, .pfa (PostScript Font ASCII) and .pfb (PostScript Font Binary). These contain descriptions of the character shapes, with each character being generated by a small program that calls on other small programs to compute common parts of the characters in the font. In both cases, the character descriptions are encrypted.
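
The encryption mentioned here is the simple byte-wise scheme published in the Adobe Type 1 font format book. The following is a minimal sketch of it, with function names chosen for this example only. The constants 52845 and 22719 and the starting keys 55665 (eexec section) and 4330 (charstrings) are the published values; the first few plaintext bytes (4 by default) are random padding that the reader discards.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t kC1 = 52845;
constexpr std::uint16_t kC2 = 22719;
constexpr std::uint16_t kEexecKey = 55665;      // key for the eexec section
constexpr std::uint16_t kCharstringKey = 4330;  // key for each charstring

// Advances the 16-bit key from the current cipher byte (arithmetic mod 65536).
std::uint16_t next_key(std::uint8_t cipher_byte, std::uint16_t r) {
    return static_cast<std::uint16_t>(
        (static_cast<std::uint32_t>(cipher_byte) + r) * kC1 + kC2);
}

// Encrypts plain, which must already start with the padding bytes.
std::vector<std::uint8_t> type1_encrypt(const std::vector<std::uint8_t>& plain,
                                        std::uint16_t key) {
    std::vector<std::uint8_t> cipher;
    cipher.reserve(plain.size());
    std::uint16_t r = key;
    for (std::uint8_t p : plain) {
        const std::uint8_t c = static_cast<std::uint8_t>(p ^ (r >> 8));
        r = next_key(c, r);
        cipher.push_back(c);
    }
    return cipher;
}

// Decrypts cipher and drops the first `skip` padding bytes of the result.
std::vector<std::uint8_t> type1_decrypt(const std::vector<std::uint8_t>& cipher,
                                        std::uint16_t key, std::size_t skip = 4) {
    std::vector<std::uint8_t> plain;
    std::uint16_t r = key;
    for (std::size_t i = 0; i < cipher.size(); ++i) {
        const std::uint8_t c = cipher[i];
        const std::uint8_t p = static_cast<std::uint8_t>(c ^ (r >> 8));
        r = next_key(c, r);
        if (i >= skip) plain.push_back(p);
    }
    return plain;
}
```

In a .pfa file the encrypted section is normally written as hexadecimal text, so it has to be converted back to bytes before type1_decrypt() is applied; in a .pfb file it is stored as raw bytes.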

Before such a font can be used, it must be rendered into dots in a bitmap, either by the PostScript interpreter or by a specialized rendering engine, such as Adobe Type Manager, which is used to generate low-resolution screen fonts on Apple Macintosh and Microsoft Windows systems.

The Type 1 outline files do not contain enough information for typesetting with the font, because they have only limited metric data, and nothing about kerning (position adjustments of certain adjacent characters) or ligatures (replacement of adjacent characters by a single character glyph, those for fi, ffi, fl, and ffl being the most common in English typography).

This missing information is supplied in additional files, called .afm (Adobe Font Metric) files. These are ASCII files with a well-defined, easy-to-parse structure. Some font vendors, such as Adobe, allow them to be freely distributed; others, such as Bitstream, consider them to be restricted by a font license which must be purchased.
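
To give an idea of how easy the .afm structure is to parse, here is a small sketch that reads one character-metrics line and one kerning-pair line. The key letters (C, WX, N, B, L, KPX) come from the AFM format; the struct and function names are assumptions for this example, and the numbers in the comments are made-up sample values rather than data from a real font.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct CharMetric {
    int code = -1;       // character code (C), -1 if the glyph is unencoded
    double width = 0.0;  // horizontal advance (WX), in 1/1000 of the font size
    std::string name;    // glyph name (N)
    // Ligature rules (L): (following glyph, resulting ligature glyph).
    std::vector<std::pair<std::string, std::string>> ligatures;
};

// Parses a line such as "C 102 ; WX 333 ; N f ; B 20 0 383 683 ; L i fi ;".
CharMetric parse_char_metric(const std::string& line) {
    CharMetric m;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ';')) {
        std::istringstream in(field);
        std::string key;
        if (!(in >> key)) continue;  // empty field after the last ';'
        if (key == "C") {
            in >> m.code;
        } else if (key == "WX") {
            in >> m.width;
        } else if (key == "N") {
            in >> m.name;
        } else if (key == "L") {
            std::string successor, ligature;
            if (in >> successor >> ligature) {
                m.ligatures.emplace_back(successor, ligature);
            }
        }
        // Other keys, such as B (bounding box), are ignored in this sketch.
    }
    return m;
}

struct KernPair {
    std::string left;
    std::string right;
    double dx = 0.0;  // horizontal adjustment, in 1/1000 of the font size
};

// Parses a line such as "KPX A V -135"; returns false for any other line.
bool parse_kern_pair(const std::string& line, KernPair& out) {
    std::istringstream in(line);
    std::string key;
    return (in >> key) && key == "KPX" &&
           static_cast<bool>(in >> out.left >> out.right >> out.dx);
}
```

A typesetting program would collect these records into lookup tables, then add the kerning adjustment whenever the two named glyphs appear next to each other and substitute the ligature glyph where a rule matches.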

PostScript printers normally contain from a dozen to a hundred fonts in .pfb (or equivalent) format in ROM, or in some cases, on disk. However, none that I'm aware of contain the .afm files, so in order to use the printer-resident fonts with your typesetting software, you need to get those .afm files from your printer vendor. Several printer vendors now make these files available on CD-ROMs and at their World-Wide Web sites, together with .ppd (PostScript Printer Description) files that printer and typesetting software can use to obtain additional information about the fonts and features of a particular printer model.

If you are interested in seeing what these files look like, here are some sample font files in the formats described above, using the Nimbus Roman No9 L Regular font (visually identical to Times Roman) kindly released for free public use by URW Software, one of the veteran font vendors. For the binary .pfb file, your Web browser will probably ask for a place to store it on disk, instead of displaying it in the browser window.

.afm file
.disasm file
.pfa file
.pfb file
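
If you do save a .pfb file to disk, its layout is straightforward to walk. As a hedged sketch (names are assumptions for this example): the file is a sequence of segments, each introduced by the byte 0x80 and a type byte, where 1 marks ASCII text, 2 marks binary data, and 3 marks the end of the file. Types 1 and 2 are followed by a 4-byte little-endian length and then that many bytes of data.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct PfbSegment {
    int type;                        // 1 = ASCII text, 2 = binary data
    std::vector<std::uint8_t> data;  // segment payload without the header
};

// Splits the raw bytes of a .pfb file into its segments.
// Throws std::runtime_error if the structure is malformed.
std::vector<PfbSegment> split_pfb(const std::vector<std::uint8_t>& file) {
    std::vector<PfbSegment> segments;
    std::size_t pos = 0;
    while (pos + 2 <= file.size()) {
        if (file[pos] != 0x80) throw std::runtime_error("missing 0x80 marker");
        const int type = file[pos + 1];
        pos += 2;
        if (type == 3) return segments;  // end-of-file segment
        if (type != 1 && type != 2) throw std::runtime_error("unknown segment type");
        if (pos + 4 > file.size()) throw std::runtime_error("truncated length field");
        // The length is stored least significant byte first.
        const std::uint32_t length =
            static_cast<std::uint32_t>(file[pos]) |
            (static_cast<std::uint32_t>(file[pos + 1]) << 8) |
            (static_cast<std::uint32_t>(file[pos + 2]) << 16) |
            (static_cast<std::uint32_t>(file[pos + 3]) << 24);
        pos += 4;
        if (length > file.size() - pos) throw std::runtime_error("truncated segment");
        PfbSegment seg;
        seg.type = type;
        seg.data.assign(file.begin() + static_cast<std::ptrdiff_t>(pos),
                        file.begin() + static_cast<std::ptrdiff_t>(pos + length));
        segments.push_back(seg);
        pos += length;
    }
    throw std::runtime_error("missing end-of-file segment");
}
```

Writing the ASCII segments out unchanged and the binary segments as hexadecimal text is, in essence, how a .pfb file is turned into the equivalent .pfa file.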

Bitmap, TrueType, and PostScript Fonts

Suitcases may contain only bitmap and TrueType fonts; PostScript printer fonts are separate files. A suitcase could contain just bitmap fonts, bitmap fonts along with a corresponding TrueType font, or bitmap fonts that match a separate PostScript printer font.
