Language support in PDF to Text OCR Converter SDK

Hi

I have downloaded a trial version of PDF to Text OCR Converter SDK.

I've managed to get your example code up and running in Visual Studio (C#). In the solution I have set the Interop.pdfcom reference to "Copy Local", because in the scenario where we are considering your product, our application will be installed as a client on the end users' machines.

Now I have a couple of questions:

Where do I put language packages in our scenario?

When I convert a PDF file made from a scan to a text-based PDF file, the result looks fine at first glance. But when I try to select some of the text, not all of it is highlighted. If I copy the selection and paste it somewhere else, all the text I tried to select is pasted. I have attached a screenshot where I try to select a whole line; as you can see, not all the text is highlighted.

I also attached the converted file.

VeryPDF
-----------------------------------------------------

We suggest downloading the latest version of "PDF to Text OCR Converter Command Line" from our website and trying it. The latest version downloads the necessary language data automatically:

http://www.verypdf.com/app/pdf-to-text-ocr-converter/try-and-buy.html#buy
http://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip

for example,

pdf2txtocr.exe -ocr -lang deu D:\downloads\print1.pdf D:\downloads\print1.txt

The above command line will download the "deu" (German) language data automatically.

By the way, "PDF to Text OCR Converter Command Line" supports the following OCR languages:

http://www.verypdf.com/pdf2txt/ocr-language.htm

You can use the "-lang" parameter to specify the language for the OCR engine, e.g.,

pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang eng -ocrmode 0 C:\in.pdf C:\out.txt
pdf2txtocr.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang fra -ocrmode 1 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang ita -ocrmode 1 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang nld -ocrmode 1 C:\in.pdf C:\out.pdf
pdf2txtocr.exe -ocr -lang spa -ocrmode 1 C:\in.pdf C:\out.pdf
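If the command line is driven from another program, the argument list can be assembled in code rather than by string concatenation. The following is only a sketch in Python: the function name build_ocr_command and its parameters are hypothetical, and it uses only the options that appear in the examples above (-ocr, -res, -lang, -ocrmode, -outboxfile), in the same order. It does not run the converter; it only builds the argument list.

```python
from typing import List, Optional


def build_ocr_command(
    input_pdf: str,
    output_file: str,
    lang: Optional[str] = None,
    ocrmode: Optional[int] = None,
    res: Optional[int] = None,
    outboxfile: bool = False,
    exe: str = "pdf2txtocr.exe",  # assumed to be on PATH or given as a full path
) -> List[str]:
    """Build an argument list that mirrors the example command lines.

    Hypothetical helper: only options shown in the examples are emitted,
    and options that are not set are left out entirely.
    """
    args = [exe, "-ocr"]
    if res is not None:
        args += ["-res", str(res)]
    if lang is not None:
        args += ["-lang", lang]
    if ocrmode is not None:
        args += ["-ocrmode", str(ocrmode)]
    if outboxfile:
        args.append("-outboxfile")
    # Input and output paths come last, as in every example above.
    args += [input_pdf, output_file]
    return args


if __name__ == "__main__":
    cmd = build_ocr_command(r"C:\in.pdf", r"C:\out.pdf", lang="deu", ocrmode=1)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where pdf2txtocr.exe is installed. Passing a list instead of a single string avoids quoting problems with paths that contain spaces.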

The "-lang" option can be set to one of the following OCR languages. If a language package
does not exist in the OCR Data Folder, pdf2txtocr.exe will download the necessary language
package from the VeryPDF site automatically.

bul: Bulgarian language
cat: Catalan language
ces: Czech language
chi_sim: Chinese (Simplified) language
chi_tra: Chinese (Traditional) language
chr: Cherokee language
dan: Danish language
deu: German language
ell: Greek language
eng: English language
fin: Finnish language
fra: French language
hun: Hungarian language
ind: Indonesian language
ita: Italian language
jpn: Japanese language
kor: Korean language
lav: Latvian language
lit: Lithuanian language
nld: Dutch language
nor: Norwegian language
pol: Polish language
por: Portuguese language
ron: Romanian language
rus: Russian language
slk: Slovak language
slv: Slovenian language
spa: Spanish language
srp: Serbian language
swe: Swedish language
tgl: Tagalog language
tur: Turkish language
ukr: Ukrainian language
vie: Vietnamese language
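
An application that lets end users choose the OCR language may want to check the code against the list above before calling the converter. The following Python sketch shows one way to do that. The names OCR_LANGUAGES and check_lang are hypothetical, the table is copied from the list above, and lowercasing the input is an assumption of this sketch, not documented behavior of pdf2txtocr.exe.

```python
# Language codes copied from the list above (code -> language name).
OCR_LANGUAGES = {
    "bul": "Bulgarian", "cat": "Catalan", "ces": "Czech",
    "chi_sim": "Chinese (Simplified)", "chi_tra": "Chinese (Traditional)",
    "chr": "Cherokee", "dan": "Danish", "deu": "German", "ell": "Greek",
    "eng": "English", "fin": "Finnish", "fra": "French", "hun": "Hungarian",
    "ind": "Indonesian", "ita": "Italian", "jpn": "Japanese", "kor": "Korean",
    "lav": "Latvian", "lit": "Lithuanian", "nld": "Dutch", "nor": "Norwegian",
    "pol": "Polish", "por": "Portuguese", "ron": "Romanian", "rus": "Russian",
    "slk": "Slovak", "slv": "Slovenian", "spa": "Spanish", "srp": "Serbian",
    "swe": "Swedish", "tgl": "Tagalog", "tur": "Turkish", "ukr": "Ukrainian",
    "vie": "Vietnamese",
}


def check_lang(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not listed.

    Hypothetical helper: trims whitespace and lowercases the input
    (an assumption of this sketch) before looking it up.
    """
    key = code.strip().lower()
    if key not in OCR_LANGUAGES:
        raise ValueError(f"Unsupported OCR language code: {code!r}")
    return key


if __name__ == "__main__":
    lang = check_lang("deu")
    print(lang, "=", OCR_LANGUAGES[lang])
```

The value returned by check_lang can then be passed as the argument to "-lang".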

VeryPDF
