We are looking for a command-line application for converting PDF to text. Can the VeryPDF converter extract text in Unicode? We deal with scientific documents that contain characters such as α, β, γ, etc.
The application we currently use does this very efficiently, but we are planning to replace it because it can't maintain the formatting.
It would be kind of you to answer my queries. Also, please let me know if a demo is available for us to test.
Yes, our PDF2TXT software supports both the command line and Unicode.
Please try the following command line to convert your PDF file to a text file (the -breaker parameter inserts page breaks into the converted .txt file):
"C:\Program Files\VeryPDF PDF2TXT v3.2\pdf2txt.exe" C:\in.pdf C:\out.txt -unicode -breaker
You can also run the following command line to convert a PDF file to a text file without page break symbols:
"C:\Program Files\VeryPDF PDF2TXT v3.2\pdf2txt.exe" C:\in.pdf C:\out.txt -unicode
We hope the "-unicode" parameter will work better for you; please give it a try.
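One way to check whether characters such as α, β, γ survived the conversion is to read the converted file back and list the Greek letters it contains. The sketch below is only an illustration: the thread does not say which encoding -unicode produces, so it assumes the file starts with a UTF-8 or UTF-16 byte-order mark and falls back to UTF-8 otherwise. The function names are hypothetical.

```python
import codecs
from pathlib import Path

# Byte-order marks this sketch recognizes (UTF-32 is not handled).
# ASSUMPTION: the encoding written by -unicode is not stated in the thread.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def read_converted_text(path):
    """Decode a converted .txt file, using its byte-order mark if present."""
    raw = Path(path).read_bytes()
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # Both codecs consume the BOM, so it does not appear in the text.
            return raw.decode(encoding)
    # No BOM found: fall back to UTF-8 (an assumption) and keep going on errors.
    return raw.decode("utf-8", errors="replace")


def greek_letters(text):
    """Return the distinct characters from the Greek and Coptic block, sorted."""
    return sorted({ch for ch in text if "\u0370" <= ch <= "\u03ff"})
```

For example, calling greek_letters(read_converted_text(r"C:\out.txt")) after a conversion shows which Greek letters made it into the output. If the list is empty for a document that should contain them, the text was probably lost or re-encoded.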
Thanks for your reply.
Another query: can it convert a directory containing PDF files, with something like:
C:\MyPDFFiles\*.pdf D:\MyConvertedTXT\ -unicode
Or do we have to make a .bat file with a command for each individual file?
As per your instructions, I shall try it out and see the output.
One more query: if we go for the OCR command-line version, will it be able to extract the text from PDFs that have embedded fonts?
You can run the following command line to batch convert all PDF files in a folder to text files:
for %F in (C:\MyPDFFiles\*.pdf) do "C:\Program Files\VeryPDF PDF2TXT v3.2\pdf2txt.exe" "%F" "D:\MyConvertedTXT\%~nF.txt" -unicode -breaker
If you wish to put the above command line into a .bat file, you need to use %% instead of the % character:
for %%F in (C:\MyPDFFiles\*.pdf) do "C:\Program Files\VeryPDF PDF2TXT v3.2\pdf2txt.exe" "%%F" "D:\MyConvertedTXT\%%~nF.txt" -unicode -breaker
Yes, the PDF to Text version is able to extract text from PDFs that have embedded fonts; that's no problem.
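If a script is preferred over the for loop, the same batch conversion can be sketched in Python. This is only an illustration under stated assumptions: it uses the install path and the -unicode and -breaker parameters quoted in this thread, and the function names build_commands and convert_folder are hypothetical. Because the thread does not document pdf2txt.exe exit codes, the sketch treats a missing output file as a failure instead.

```python
import subprocess
from pathlib import Path

# Install path copied from this thread; adjust it if yours differs.
PDF2TXT = r"C:\Program Files\VeryPDF PDF2TXT v3.2\pdf2txt.exe"


def build_commands(src_dir, dst_dir, breaker=True):
    """Build one pdf2txt command per PDF in src_dir, writing .txt into dst_dir."""
    commands = []
    for pdf in sorted(Path(src_dir).iterdir()):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue  # skip subfolders and non-PDF files
        out = Path(dst_dir) / (pdf.stem + ".txt")
        cmd = [PDF2TXT, str(pdf), str(out), "-unicode"]
        if breaker:
            cmd.append("-breaker")  # insert page breaks, as described in the thread
        commands.append(cmd)
    return commands


def convert_folder(src_dir, dst_dir, breaker=True):
    """Run every command and return the input PDFs that produced no output file."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_commands(src_dir, dst_dir, breaker):
        subprocess.run(cmd)
        # ASSUMPTION: a missing output file means the conversion failed.
        # A stale file left over from an earlier run would hide a failure.
        if not Path(cmd[2]).exists():
            failed.append(cmd[1])
    return failed
```

Calling convert_folder(r"C:\MyPDFFiles", r"D:\MyConvertedTXT") converts the folder and returns the PDFs that need a second look. Passing the arguments as a list lets subprocess handle the quoting of paths that contain spaces.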
I tried a sample PDF with the PDF to TXT demo, and the output was jumbled. Can you please have a look and see why it fails? Please also look at the layout.
You can run the following command line to convert your PDF file to a text file properly:
pdf2txtocr.exe -ocr -bitcount 1 "D:\temp\EKA_US_EN_48.pdf" "D:\temp\EKA_US_EN_48.pdf.txt"
D:\temp>"E:\pdf2txtocrcmd\pdf2txtocr.exe" -ocr -bitcount 1 "D:\temp\EKA_US_EN_48.pdf" "D:\temp\EKA_US_EN_48.pdf.txt"
You have 297 times to evaluate this product, you may purchase a full version from 'http://www.verypdf.com'.
The test version can only convert PDF files in the first few pages, if you need
to convert more of the page, please purchase the full version from
[OCR] Processing page 1 of 3...
[OCR] Processing page 2 of 3...
[OCR] Processing page 3 of 3...
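Since the trial version only converts the first few pages, it can help to confirm from the console output that every page was processed. The sketch below parses the "[OCR] Processing page N of M..." lines exactly as they appear in the transcript above. Other versions of the tool may print a different format, so treat the pattern as an assumption; the function names are hypothetical.

```python
import re

# Pattern taken from the transcript above; other versions may log differently.
PROGRESS = re.compile(r"\[OCR\] Processing page (\d+) of (\d+)")


def ocr_progress(log_text):
    """Return (pages_seen, total_pages) parsed from captured console output."""
    pages = []
    total = None
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            pages.append(int(match.group(1)))
            total = int(match.group(2))
    return pages, total


def all_pages_processed(log_text):
    """True only if pages 1..total were each reported once, in order."""
    pages, total = ocr_progress(log_text)
    return total is not None and pages == list(range(1, total + 1))
```

The console output can be captured with subprocess.run(..., capture_output=True, text=True) and passed to all_pages_processed to flag documents that stopped early.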