I have a problem converting PDF files to text with pdf2txt.
To demonstrate this, I have attached three files:
1) Converted by pdf2txt: page0144_by_pdf2txt.txt
2) Converted by Adobe Acrobat Reader 9: page0144_by_adobe_acrobat_reader.txt
3) A screenshot of PDF page 144
The original PDF file is not attached because it is about 20 MB in size.
On PDF page 144, the text is laid out as a two-column table.
Pdf2txt.exe correctly converts it into two equivalent columns. However, the "-format" option has no influence on the text output: leaving out "-format" produces the same two-column output.
Take a look at the text output produced by Adobe Acrobat Reader 9.
That output file contains the text in the order in which it is read.
This is what I expected the output to be when using pdf2txt without the "-format" option.
Otherwise, when building a search engine on these text files, single words or phrases can't be found in the correct context of their sentences.
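As a stopgap for the search-engine problem, column-formatted output can sometimes be reordered after conversion. The sketch below is hypothetical and not part of pdf2txt: it assumes the page was extracted as fixed-width text with exactly two columns that always split at the same character position (`split_col`), and it reads the left column top to bottom, then the right column. It does not handle hyphenated words, tables with more columns, or columns whose boundary shifts from line to line.

```python
def to_reading_order(text: str, split_col: int) -> str:
    """Reorder fixed-width two-column text into reading order (hypothetical helper).

    Assumes every character before split_col belongs to the left column and
    everything from split_col onward belongs to the right column.
    """
    left, right = [], []
    for line in text.splitlines():
        left_part = line[:split_col].strip()
        right_part = line[split_col:].strip()
        if left_part:
            left.append(left_part)
        if right_part:
            right.append(right_part)
    # Joining with spaces rebuilds sentences that were broken across lines,
    # so words can be searched in their original context.
    return " ".join(left + right)


# Illustrative two-column sample, padded so the right column starts at position 20.
rows = [("The quick brown", "Lorem ipsum"),
        ("fox jumps over", "dolor sit"),
        ("the lazy dog.", "amet.")]
sample = "\n".join(left.ljust(20) + right for left, right in rows)
print(to_reading_order(sample, 20))
```

A fixed split position is the simplest possible heuristic; real pages usually need the column boundary detected per page, which is why a converter that emits reading order directly is preferable.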
I used this command line to extract page 144 from the PDF:
"C:/Programme/PDF2TXT/pdf2txt.exe" "page0144.pdf" "page0144.txt" -format -unicode -first 144 -last 144
Your quick help is very much appreciated.
We suggest you download PDF to Text OCR Converter Command Line v2.0 from the following URL and try it.
You can run the following command line to convert your PDF file to a text file in reading order:
pdf2txtocr.exe C:\in.pdf C:\out.txt
If you wish to keep the layout in the output text file, you can add the -layout option, for example:
pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
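If many pages or files need converting for a search index, the two commands above can be scripted. This is a minimal sketch, assuming `pdf2txtocr.exe` is on the PATH (or its full path is passed as `exe`); the helper names are hypothetical, and the only option used is `-layout`, exactly as shown in the commands above.

```python
import subprocess
from typing import List


def build_pdf2txtocr_command(pdf_path: str, txt_path: str,
                             keep_layout: bool = False,
                             exe: str = "pdf2txtocr.exe") -> List[str]:
    """Build the argument list for the pdf2txtocr.exe commands shown above."""
    cmd = [exe]
    if keep_layout:
        # -layout keeps the column layout; without it, text is written in reading order.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path: str, txt_path: str, keep_layout: bool = False) -> None:
    """Run the converter; raises CalledProcessError if it exits with an error code."""
    subprocess.run(build_pdf2txtocr_command(pdf_path, txt_path, keep_layout), check=True)


if __name__ == "__main__":
    print(build_pdf2txtocr_command(r"C:\in.pdf", r"C:\out.txt"))
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces, such as `C:/Programme/...`.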
This text file was converted by VeryPDF PDF to Text Converter in multiple-column format.
This text file was converted in reading-order format. As you can see, you can read this text file or easily copy and paste it into MS Word, Excel, or other applications.