Convert PDF file to text file in reading order

Hi,

I have a problem converting PDF files to text with pdf2txt.

To demonstrate this, I have attached three files:

1) Converted by pdf2txt: page0144_by_pdf2txt.txt
2) Converted by Adobe Acrobat Reader v9: page0144_by_adobe_acrobat_reader.txt
3) Screenshot of PDF page 144

The original PDF file is not attached because it is about 20 MB in size.

Problem:

On PDF page 144, the text is laid out in a two-column table.
pdf2txt.exe reproduces this faithfully as two side-by-side columns. However, the "-format" option
has no effect on the converted text: omitting "-format" produces the same two-column output.

Take a look at the text output produced by Adobe Acrobat Reader v9.
This output file contains the words in reading order, i.e. in the order of their appearance.

This is what I expected the output to be when using pdf2txt without the "-format" option.
Otherwise, single words or phrases cannot be found in the context of their sentences when a search engine is built on these text files.
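
To make the difference between the two orderings concrete, here is a toy Python sketch. It does not show how pdf2txt or Acrobat Reader work internally; the sample rows and the function names layout_order and reading_order are made up for illustration. It only shows why a phrase search fails when a two-column page is written out row by row:

```python
# Toy two-column page: each entry is one visual row, (left cell, right cell).
# The sample text is invented purely for illustration.
rows = [
    ("The quick brown", "Search engines index"),
    ("fox jumps over", "words together with"),
    ("the lazy dog.", "their neighbours."),
]

def layout_order(rows):
    """Emit text row by row, as a layout-preserving conversion would."""
    return " ".join(f"{left} {right}" for left, right in rows)

def reading_order(rows):
    """Emit the whole left column first, then the whole right column."""
    left = " ".join(l for l, _ in rows)
    right = " ".join(r for _, r in rows)
    return f"{left} {right}"

phrase = "brown fox jumps"
print(phrase in layout_order(rows))   # phrase is split across rows
print(phrase in reading_order(rows))  # phrase stays contiguous
```

In the row-by-row output, text from the right column is wedged between "brown" and "fox", so a search for the phrase misses it even though it is on the page.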
 
I used this command line to extract page 144 from the PDF:

"C:/Programme/PDF2TXT/pdf2txt.exe" "page0144.pdf" "page0144.txt" -format -unicode -first 144 -last 144

Your quick help is very much appreciated.
====================================
We suggest you download PDF to Text OCR Converter Command Line v2.0 from the following URL and try it:

https://www.verypdf.com/pdf2txt/pdf2txtocrcmd.zip

You can run the following command line to convert your PDF file to a text file in reading order:

pdf2txtocr.exe C:\in.pdf C:\out.txt

If you wish to keep the layout in the output text file, add the -layout option, for example:

pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
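
If you need to convert many files for a search index, the two command lines above could be wrapped in a small script. The following Python sketch is only an assumption about how you might do that: the helper names build_command and convert_folder are hypothetical, it assumes pdf2txtocr.exe is on your PATH, and it uses only the arguments shown above (the optional -layout switch, then the input and output paths).

```python
import subprocess
from pathlib import Path

# Assumption: pdf2txtocr.exe is on PATH; otherwise put the full install path here.
PDF2TXTOCR = "pdf2txtocr.exe"

def build_command(pdf_path, txt_path, keep_layout=False):
    """Build the argument list in the same order as the command lines above."""
    cmd = [PDF2TXTOCR]
    if keep_layout:
        cmd.append("-layout")  # the only option used here, taken from the example above
    cmd += [str(pdf_path), str(txt_path)]
    return cmd

def convert_folder(folder, keep_layout=False):
    """Convert every *.pdf in `folder` to a .txt file with the same base name."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # check=True raises CalledProcessError if the converter exits with an error
        subprocess.run(build_command(pdf, txt, keep_layout), check=True)

if __name__ == "__main__":
    # Show the command that would be run for a single file.
    print(build_command(r"C:\in.pdf", r"C:\out.txt"))
```

Calling convert_folder with keep_layout left at False gives the reading order output, which is the form suited to phrase searches.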

This text file was converted by VeryPDF PDF to Text Converter in multi-column format:

image

This text file was converted in reading order format. As you can see, you can easily read it or copy and paste it into MS Word, Excel, or other applications:

image

VeryPDF
