I recently purchased a copy of PDF2TXT because I needed to convert PDF files to text while keeping the formatting. I tried it on some documents and it seemed to work OK, but I've just realised that it is failing on most of them. There is obviously something unusual about the format of these PDFs, because I also tried Adobe and it fails to save them as text as well.
I’ve attached a few sample files for you to have a look at.
================================
We have just double-checked your PDF file. It contains embedded fonts, and the characters rendered with those embedded fonts cannot be copied out as text. You can confirm this yourself: open the PDF file in Adobe Reader, press CTRL+A and CTRL+C to copy all of the text, then press CTRL+V to paste it into Notepad. You will see that the pasted text is not readable. For the same reason, our PDF2TXT cannot convert this PDF file to a readable text file either; we hope you understand.
Please refer to item No. 4 in the FAQ list:
https://www.verypdf.com/pdf2txt/support/index.html#4
Additionally, you can download our "PDF to Text OCR Converter Command Line" product from our website and try it. This product can convert this type of PDF file to a text file properly:
https://www.verypdf.com/pdf2txt/pdf-to-text-ocr-converter.htm
For example:
pdf2txtocr.exe -ocr D:\temp3\xp5.pdf D:\temp3\xp5.txt
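If you have many files, you may want to decide automatically which ones need the OCR route. The following is only a minimal Python sketch, not part of any VeryPDF product: `looks_readable` and `build_ocr_command` are hypothetical helper names, the 0.7 threshold is an arbitrary assumption, and the check is biased towards English/ASCII text. The only thing taken from the example above is the `pdf2txtocr.exe -ocr <input> <output>` command form.

```python
import string

# Characters we treat as "normal" readable output (assumption: mostly ASCII text).
_READABLE = set(string.ascii_letters + string.digits + string.punctuation)


def looks_readable(text, threshold=0.7):
    """Heuristic: True if most non-whitespace characters are plain ASCII.

    Text extracted from a PDF whose fonts cannot be mapped back to real
    characters tends to come out as control characters or odd symbols.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False  # nothing extracted at all
    readable = sum(1 for c in chars if c in _READABLE)
    return readable / len(chars) >= threshold


def build_ocr_command(pdf_path, txt_path, exe="pdf2txtocr.exe"):
    """Build the OCR command line in the same form as the example above."""
    return [exe, "-ocr", pdf_path, txt_path]
```

When the text produced by the normal conversion fails `looks_readable`, the list returned by `build_ocr_command` can be passed to `subprocess.run` to run the OCR converter on that file instead.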
VeryPDF
Hi,
I am looking to convert a PDF file to XML. How can I do that?
I am also looking to take the information from the PDF (all the fields, such as company name, product name, etc.) and load it into my database. Is there any solution available?
Thanks,
Mohammed Aejaz
Please download the following products from our website to try:
http://www.verydoc.com/pdf2xmlsdk.html
http://www.verydoc.com/pdfparsersdk.html
Both of these products can convert PDF files to XML files. We hope they will be useful to you.
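Once you have an XML file, loading the fields into a database is a separate step that you can script yourself. The sketch below is only an illustration in Python using the standard library. The XML layout in `SAMPLE_XML`, the `pdf_fields` table, and the function names are all hypothetical; the actual XML written by the products above will be structured differently, so the parsing step must be adapted to the real output.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout, for illustration only.
# The real converter output will not look like this.
SAMPLE_XML = """<document>
  <field name="company_name">Example Co</field>
  <field name="product_name">Example Product</field>
</document>"""


def fields_from_xml(xml_text):
    """Collect every <field name="..."> element into a dict."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): (f.text or "").strip() for f in root.iter("field")}


def store_fields(conn, source, fields):
    """Insert one row per field into a simple name/value table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_fields "
        "(source TEXT, name TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO pdf_fields (source, name, value) VALUES (?, ?, ?)",
        [(source, name, value) for name, value in fields.items()],
    )
    conn.commit()
```

Here SQLite stands in for your own database; the same two steps (parse the XML, then insert rows with parameterised queries) apply with any other database driver.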