Can I use HTML Converter Command Line (htmltools.exe) to convert RTF files to PDF files?

Hi Support Team,

We are interested in buying an RTF to PDF converter to convert RTF documents to PDF in real time. It is going to be a very busy interface that will send RTF documents back to back to your command line converter for conversion to PDF. We are planning to deploy it on one of our servers for this purpose. Could you please confirm whether the Server License (USD 399) will serve our purpose with full functionality? Also, could you please confirm whether it is a commercial license, and list all the extra features it provides compared to the trial version?

Thank You.
Customer
------------------------------------------------


Thanks for your message. We suggest you download "VeryDOC DOC to Any Converter Command Line" from the following web page and try it:

http://www.verydoc.com/doc-to-any.html
http://www.verydoc.com/doc2any_cmd.zip

You can use "VeryDOC DOC to Any Converter Command Line" to convert RTF files to PDF files easily.

VeryPDF
------------------------------------------------
Thanks for your response.

We will buy this license based on our understanding from your response that we can use it within our organization (commercial use).

Also, just to reiterate, this will be a busy interface, as we will be converting a lot of RTF files into PDF files.

Let us know if you need any more info from our side.

Thanks!!!
Customer
------------------------------------------------
Could you please let us know the difference between "VeryDOC DOC to Any Converter Command Line" and "HTML Converter Command Line (htmltools.exe)"? The Server License prices of the two products differ.

Thank You.
Customer
------------------------------------------------
htmltools.exe is a great tool for converting HTML files and web pages to PDF files. It supports only simple RTF formatting; it does not support RTF files created by MS Word.

doc2any.exe is a professional tool for converting Office files to PDF files, but it does not convert HTML files to PDF files very well.

If you wish to convert RTF files to PDF files, doc2any.exe is your best choice.

You may also look at the following web pages for more information:

http://www.verydoc.com/blog/use-htmltools-exe-or-doc2any-exe-to-convert-rtf-files-to-pdf-files-2.html

http://www.verypdf.com/wordpress/201201/how-to-call-doc2any-exe-or-htmltools-exe-from-a-service-20896.html

http://www.verypdf.com/wordpress/201106/rtf-to-pdf-question-by-htmltools-exe-application-879.html

VeryPDF

Posted in HTML Converter (htmltools)

Can VeryPDF PDF to Text OCR SDK read into variables?

Dear VeryPDF team

I am looking for an OCR component to integrate into our ASP.NET/C# product that can extract text not only from images but also from PDF files with text and/or image content, and return it as a string variable. From what I've seen of your sample products so far, the recognition looks fine (except for our German ä, ö and ü, but I suppose there is a localization setting I haven't found yet), but the output always seems to be a file.

Is it possible to keep the output string only in a variable for further use in the program?

And is there some sort of online reference so I can take a look at the component's object model?

Thank you in advance.
Customer
--------------------------------


Thanks for your message. We can add this function to the "OCR to Any Converter SDK/COM" product for you. This product extracts text from both scanned images and PDF files: you will get each character together with its X, Y, Width and Height attributes, and we will return this information to you through API functions. Will this solution work for you?

You can download and try the "OCR to Any Converter SDK/COM" product from the following web page:

http://www.verypdf.com/app/ocr-to-any-converter-cmd/try-and-buy.html#buysdk

You can also download and try our "Image to PDF OCR Converter SDK for .NET Developers" product. This product also returns text and positions to you through API functions rather than writing them to a disk file:

http://www.verypdf.com/app/image-to-pdf-ocr-converter/try-and-buy.html#buy-com
http://www.verypdf.com/dl2.php/image2pdfocrsdk.zip

Please refer to the following web page for the API functions:

http://www.verypdf.com/wordpress/201303/how-to-call-image-to-pdf-ocr-sdk-from-c-source-code-35235.html

VeryPDF

Posted in OCR Products

Page Layout Analysis for Scanned PDF and TIFF files. Generic layout analysis library or tool not based on OCR.

Layout analysis is a processing step of OCR that is important when recognizing complex documents with multiple columns, tables or embedded images. During layout analysis, the OCR software examines the structure of the document, distinguishes between images and text, and tries to recognize the text flow of the document. Modern OCR software with good layout analysis can replicate the document structure almost identically to the original and save it in an editable format (e.g. DOC, HTML or PDF).

Layout analysis, the division of page images into text blocks and lines and the determination of their reading order, is a major performance-limiting step in large-scale document digitization projects.
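To make "the division of page images into text blocks" concrete, here is a minimal C# sketch of recursive XY-cut, a classic textbook layout analysis technique that splits a page wherever a wide enough band of empty rows or columns separates the ink. It is swapped in purely for illustration and is not the algorithm used by any VeryPDF product. The `Block` record, the `XyCut` class and the `minGap` parameter are hypothetical names, and the input is assumed to be an already binarized page (`true` = dark pixel).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// A rectangular page region in pixel coordinates; Bottom and Right are exclusive.
public record Block(int Top, int Left, int Bottom, int Right);

public static class XyCut
{
    // Splits the inked part of a binary page image into blocks.
    // ink[y, x] == true marks a dark pixel; minGap is the smallest run of empty
    // rows or columns (in pixels) that may separate two blocks.
    public static List<Block> Segment(bool[,] ink, int minGap)
    {
        var blocks = new List<Block>();
        Split(ink, new Block(0, 0, ink.GetLength(0), ink.GetLength(1)), minGap, blocks);
        return blocks;
    }

    static void Split(bool[,] ink, Block region, int minGap, List<Block> blocks)
    {
        Block? tight = Trim(ink, region);
        if (tight is null) return; // no ink in this region
        // Try a horizontal cut (gap between rows) first, then a vertical cut.
        if (TryCut(ink, tight, minGap, true, out var first, out var second) ||
            TryCut(ink, tight, minGap, false, out first, out second))
        {
            Split(ink, first, minGap, blocks);  // top or left part first
            Split(ink, second, minGap, blocks);
        }
        else
        {
            blocks.Add(tight); // no gap wide enough: this is a leaf block
        }
    }

    // Shrinks a region to the bounding box of its ink, or returns null if it has none.
    static Block? Trim(bool[,] ink, Block r)
    {
        int top = int.MaxValue, left = int.MaxValue, bottom = -1, right = -1;
        for (int y = r.Top; y < r.Bottom; y++)
            for (int x = r.Left; x < r.Right; x++)
                if (ink[y, x])
                {
                    top = Math.Min(top, y); bottom = Math.Max(bottom, y);
                    left = Math.Min(left, x); right = Math.Max(right, x);
                }
        return bottom < 0 ? null : new Block(top, left, bottom + 1, right + 1);
    }

    // Finds the widest run of empty lines along one axis (r is already trimmed, so it
    // starts and ends with ink) and splits r around it if the run is at least minGap wide.
    static bool TryCut(bool[,] ink, Block r, int minGap, bool horizontal,
                       out Block first, out Block second)
    {
        first = second = r;
        int start = horizontal ? r.Top : r.Left;
        int end = horizontal ? r.Bottom : r.Right;
        int bestStart = 0, bestLen = 0, runStart = -1;
        for (int i = start; i < end; i++)
        {
            if (IsEmptyLine(ink, r, i, horizontal))
            {
                if (runStart < 0) runStart = i;
            }
            else if (runStart >= 0)
            {
                if (i - runStart > bestLen) { bestLen = i - runStart; bestStart = runStart; }
                runStart = -1;
            }
        }
        if (bestLen == 0 || bestLen < minGap) return false;
        int gapEnd = bestStart + bestLen;
        (first, second) = horizontal
            ? (r with { Bottom = bestStart }, r with { Top = gapEnd })
            : (r with { Right = bestStart }, r with { Left = gapEnd });
        return true;
    }

    // True when row i (horizontal) or column i (vertical) has no ink inside r.
    static bool IsEmptyLine(bool[,] ink, Block r, int i, bool horizontal)
    {
        if (horizontal)
        {
            for (int x = r.Left; x < r.Right; x++) if (ink[i, x]) return false;
        }
        else
        {
            for (int y = r.Top; y < r.Bottom; y++) if (ink[y, i]) return false;
        }
        return true;
    }
}
```

Trying horizontal cuts before vertical ones and always cutting at the widest gap is a simple heuristic; real scanned pages usually also need deskewing and noise filtering before this step, which the sketch does not attempt.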

Document analysis, or more precisely document image analysis, is the process that performs the overall interpretation of document images. This process is the answer to the question, "How is everything that is known about language, document formatting, image processing and character recognition combined in order to deal with a particular application?" Thus document analysis is concerned with the global issues involved in recognizing written language in images. It adds to OCR a superstructure that establishes the organization of the document and applies outside knowledge in interpreting it.

The process of determining document structure may be viewed as guided by a model, explicit or implicit, of the class of documents of interest. The model describes the physical appearance of, and the relationships between, the entities that make up the document. OCR is often at the final level of this process, i.e., it provides a final encoding of the symbols contained in a logical entity such as a paragraph or table, once the latter has been isolated by other stages. However, it is important to realize that OCR can also participate in determining document layout. For example, as part of extracting a newspaper article, the system may have to recognize the character string "continued on page 5" at the bottom of a page image in order to locate the entire text.

In practice then, a document analysis system performs the basic tasks of image segmentation, layout understanding, symbol recognition and application of contextual rules in an integrated manner. Current work in this area can be summarized under four main classes of applications.

Here is a question from a customer,

I am looking for layout analysis libraries or tools that can be applied to text PDFs to identify main text content versus sidebars, chapter headings, section headings (possibly even fancy ones with decorations, shading and underlines), etc. Are there libraries that can do this WITHOUT OCR? It is possible to extract text and images from text PDFs and give the tool an input that contains the positions of the text and images; using OCR for such files would be rather circuitous.

VeryPDF Layout Analysis SDK is a Page Layout Analysis SDK (library) that analyzes pages without OCR processing. It can be downloaded from the following web page:

http://www.verypdf.com/app/papertools/try-and-buy.html
http://www.verypdf.com/dl2.php/papertoolssdk.zip

The following screenshot was produced with VeryPDF Layout Analysis SDK; as you can see, it recognizes the text and image areas properly:

image

If you encounter any problem with VeryPDF Layout Analysis SDK, please feel free to let us know,

http://support.verypdf.com/open.php

Posted in VeryPDF SDK & COM

VeryPDF Layout Analysis SDK, Document layout analysis for OCR, Document Layout Analysis and Optical Character Recognition System, Document Structure and Layout Analysis

VeryPDF Layout Analysis SDK can be downloaded from the following web page:

http://www.verypdf.com/app/papertools/try-and-buy.html
http://www.verypdf.com/dl2.php/papertoolssdk.zip

VeryPDF Layout Analysis SDK analyzes the layout of any document using complex algorithms and recognizes the different kinds of areas on the page with high accuracy.

VeryPDF Layout Analysis SDK identifies the following types of areas:

  • Text
  • Inverted text
  • Noise
  • Images (pictures or drawings)
  • Tables (rows, columns and cells)
  • Horizontal and vertical lines


After layout analysis, it is possible to apply a sub-classification by defining rules according to the kind of document being analyzed. For example, on a newspaper page, a text area could be classified as "Title", "Header" or "Footer".
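A rule set of the kind described above might look like the following minimal C# sketch. It assumes, hypothetically, that each recognized text area is available as a vertical position plus an estimated line height (the `TextArea` record); the 8% header/footer band and the "strictly largest line height is the title" rule are illustrative choices for a newspaper-like page, not rules taken from VeryPDF Layout Analysis SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A text area found by layout analysis: vertical extent in page pixels plus an
// estimated text line height. This shape is hypothetical, not the SDK's output format.
public record TextArea(int Top, int Bottom, int LineHeight);

public static class AreaRules
{
    // Labels each text area with simple positional rules:
    //   entirely inside the top band    -> "Header"
    //   entirely inside the bottom band -> "Footer"
    //   remaining area with a strictly larger line height than all others -> "Title"
    //   anything else -> "Body"
    // bandFraction is the height of each band as a fraction of the page height.
    public static List<string> Classify(IReadOnlyList<TextArea> areas, int pageHeight,
                                        double bandFraction = 0.08)
    {
        int headerLimit = (int)(pageHeight * bandFraction);
        int footerLimit = pageHeight - headerLimit;

        var labels = areas
            .Select(a => a.Bottom <= headerLimit ? "Header"
                       : a.Top >= footerLimit ? "Footer"
                       : "Body")
            .ToList();

        // Promote a body area to "Title" only if its line height beats every other body area.
        var body = Enumerable.Range(0, areas.Count).Where(i => labels[i] == "Body").ToList();
        if (body.Count > 1)
        {
            int best = body.OrderByDescending(i => areas[i].LineHeight).First();
            if (body.All(i => i == best || areas[i].LineHeight < areas[best].LineHeight))
                labels[best] = "Title";
        }
        return labels;
    }
}
```

For a different kind of document (invoices, forms, books) the same structure applies, but the rules and thresholds would be chosen for that document class.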


The following C# source code example runs layout analysis on a scanned image file:

// Requires: using System.IO; using System.Windows.Forms;
// Create the PaperTools COM object through its registered ProgID.
System.Type VeryPDFType = System.Type.GetTypeFromProgID("VeryPDF.PaperToolsCom");
VeryPDF.PaperToolsCom VeryPDFCom = (VeryPDF.PaperToolsCom)System.Activator.CreateInstance(VeryPDFType);

// The sample files live in the "sample" folder next to the application's folder.
string appFolder = Path.GetDirectoryName(Application.ExecutablePath);
string strFolder = Directory.GetParent(appFolder).FullName;

string strReturn = "";
int nFileIndex = 0;
VeryPDFCom.EnableDebugLog(true);

string strInFile = Path.Combine(strFolder, "sample", "test_table_ocr.tif");
string strOutDir = Path.Combine(strFolder, "sample", "output");
Directory.CreateDirectory(strOutDir);
string strOutFile = Path.Combine(strOutDir, "_output_" + nFileIndex + ".png");

// "-$" is followed by the registration key (masked here); "-layout" runs layout analysis.
string strCmd = "-$ XXXXXXXXXXXXXXXXXX -layout \"" + strInFile + "\" \"" + strOutFile + "\"";
strReturn += VeryPDFCom.PaperTools(strCmd);

If you encounter any problem with VeryPDF Layout Analysis SDK, please feel free to let us know.

Posted in VeryPDF SDK & COM

VeryPDF Black Border Removal SDK, Black Border Removal Library. Remove black borders after scanning. Remove black border lines around images after scanning.

VeryPDF Black Border Removal SDK can be downloaded from the following web page:

http://www.verypdf.com/app/papertools/try-and-buy.html
http://www.verypdf.com/dl2.php/papertoolssdk.zip

VeryPDF Black Border Removal SDK (API) performs automatic black border detection and removal in monochrome or gray-scale images. Black borders appear in images acquired by scanners when the paper is smaller than the scanning area, and in images acquired from microfilm, microfiches and aperture cards. Removing the border is a very important pre-processing step: it improves the compression rate (reducing file size) and the visual appearance of the image.

VeryPDF Black Border Removal SDK can remove black borders from monochrome, gray-scale and color images.
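The basic detection idea can be illustrated with a minimal C# sketch on a gray-scale image held in a `byte[,]` array (0 = black, 255 = white): it peels rows and columns off each edge while they are mostly dark, then either paints the peeled area white or crops it away. All names and thresholds (`BorderRemoval`, `Content`, `darkThreshold`, `minDarkFraction`) are hypothetical. This is not how VeryPDF Black Border Removal SDK works internally, and it only handles borders that run parallel to the image edges; skewed, wedge-shaped borders would need deskewing first.

```csharp
using System;

// Content rectangle left after the border is stripped (Bottom and Right exclusive).
public record Content(int Top, int Left, int Bottom, int Right);

public static class BorderRemoval
{
    // A row or column counts as border when at least minDarkFraction of its pixels
    // are darker than darkThreshold.
    public static Content Detect(byte[,] gray, byte darkThreshold = 64, double minDarkFraction = 0.8)
    {
        int top = 0, left = 0, bottom = gray.GetLength(0), right = gray.GetLength(1);
        bool changed = true;
        // Peel at most one line per edge per pass so the four sides shrink together;
        // this limits how much a wide border on one side inflates the dark fraction
        // measured on the other sides.
        while (changed && top < bottom && left < right)
        {
            changed = false;
            if (DarkFraction(gray, top, left, right, true, darkThreshold) >= minDarkFraction)
            { top++; changed = true; }
            if (top < bottom && DarkFraction(gray, bottom - 1, left, right, true, darkThreshold) >= minDarkFraction)
            { bottom--; changed = true; }
            if (top < bottom && DarkFraction(gray, left, top, bottom, false, darkThreshold) >= minDarkFraction)
            { left++; changed = true; }
            if (top < bottom && left < right && DarkFraction(gray, right - 1, top, bottom, false, darkThreshold) >= minDarkFraction)
            { right--; changed = true; }
        }
        return new Content(top, left, bottom, right);
    }

    // Fraction of pixels in row (row == true) or column `index`, over positions
    // from..to-1, that are darker than threshold.
    static double DarkFraction(byte[,] g, int index, int from, int to, bool row, byte threshold)
    {
        int dark = 0;
        for (int i = from; i < to; i++)
            if ((row ? g[index, i] : g[i, index]) < threshold) dark++;
        return (double)dark / (to - from);
    }

    // Option 1: paint everything outside the content rectangle white, in place.
    public static void Whiten(byte[,] gray, Content c)
    {
        for (int y = 0; y < gray.GetLength(0); y++)
            for (int x = 0; x < gray.GetLength(1); x++)
                if (y < c.Top || y >= c.Bottom || x < c.Left || x >= c.Right)
                    gray[y, x] = 255;
    }

    // Option 2: return a cropped copy containing only the content rectangle.
    public static byte[,] Crop(byte[,] gray, Content c)
    {
        var result = new byte[c.Bottom - c.Top, c.Right - c.Left];
        for (int y = c.Top; y < c.Bottom; y++)
            for (int x = c.Left; x < c.Right; x++)
                result[y - c.Top, x - c.Left] = gray[y, x];
        return result;
    }
}
```

Whitening keeps the page dimensions unchanged, which matters when pages must stay the same size; cropping gives smaller files but changes the geometry.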


The following C# example source code removes black borders from scanned image files:

// Requires: using System.IO; using System.Windows.Forms;
// Create the PaperTools COM object through its registered ProgID.
System.Type VeryPDFType = System.Type.GetTypeFromProgID("VeryPDF.PaperToolsCom");
VeryPDF.PaperToolsCom VeryPDFCom = (VeryPDF.PaperToolsCom)System.Activator.CreateInstance(VeryPDFType);

string appFolder = Path.GetDirectoryName(Application.ExecutablePath);
string strFolder = Directory.GetParent(appFolder).FullName;
string strInFile = Path.Combine(strFolder, "sample", "test_table_ocr.tif");
string strOutDir = Path.Combine(strFolder, "sample", "output");
Directory.CreateDirectory(strOutDir);

string strReturn = "";
int nFileIndex = 0;
VeryPDFCom.EnableDebugLog(true);

// Pass 1: the line-removal options from the original sample (-removeshortline / -removelongline).
string strOutFile = Path.Combine(strOutDir, "_output_" + nFileIndex++ + ".png");
string strCmd = "-$ XXXXXXXXXXXXXXXXXX -removeshortline 3 -removelongline 0 \"" + strInFile + "\" \"" + strOutFile + "\"";
strReturn += VeryPDFCom.PaperTools(strCmd);

// Pass 2: black border removal, written to a separate file so pass 1's output is kept.
strOutFile = Path.Combine(strOutDir, "_output_" + nFileIndex++ + ".png");
strCmd = "-$ XXXXXXXXXXXXXXXXXX -removeborder \"" + strInFile + "\" \"" + strOutFile + "\"";
strReturn += VeryPDFCom.PaperTools(strCmd);

If you encounter any problem with VeryPDF Black Border Removal SDK, please feel free to let us know.

------------------------------------------------------

Questions:

I scanned some books, and on random pages there is a black border on different sides of the page because of skew (probably because the page was not cut straight).

What I would like to do is either:

1. Automatically color the black borders white
2. Automatically cut off the black borders

I've attached an example showing the problem that I have.

image

Customer

------------------------------------------------------

You can use VeryPDF Black Border Removal SDK to remove the black borders from your scanned image file. Please look at the following modified image, from which the black borders were removed:

image

VeryPDF

Posted in VeryPDF SDK & COM