How to replace umlaut characters in a PDF file with VeryPDF PDF Text Replacer Command Line (pdftr.exe) software?

Hi!

Years and years ago, I paid for PDF Text Replacer (both GUI and command line). The GUI version replaces only one word at a time, making it useless for me, and the command line version cannot handle German Umlaut characters (äöü ÄÖÜ) and other extended characters, so I found another way.

Now I *need* to replace text containing umlauts in a PDF. Is there an update that handles this?

https://www.verypdf.com/app/pdf-text-replacer/try-and-buy.html
https://www.verypdf.com/wordpress/201303/verypdf-releases-pdf-text-replacer-command-line-software-35653.html

Thanks,
Customer
-------------------------
Sorry for the typo in the title: instead of "ITF-8", please read "UTF-8".

I am attaching "umlaut.pdf", a PDF I just created directly in Acrobat which contains lots of "ä" and also one each of "ö", "ü", and "ß".

I run this command:

pdftr -$ "XXXXXXXXXXXXXXX" -oldtext "ä" -newtext "!!" umlaut.pdf out.pdf

and I get this output:

[Message] Try to replace text in page contents...
[ReplaceText] ä=>!!
[ContentParserExport] Processing page 1 of 1...
[Message] Output to "c:\temp\vpdf-24244-1715130280-27637-3.txt" file.
[Not Found in] [Subset Font] '.''
[Message] Open "c:\temp\vpdf-24244-1715130280-27637-3.txt" file.
[Warning] Failed to search keywords in PDF pages, we will switch to 'overlay' mode to try again.
[Message] Create "tt.pdf" file successful.

P.S.: What must I do to buy an upgrade to the correct version of pdftr?

Attached is the text file I used to create the Acrobat PDF attached to my previous comment. This UTF-8 encoded file contains only the special characters ä, ö, ü and ß; the "ff" and "st" ligatures are inserted by Acrobat when I put the text into a PDF.
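The difference between the UTF-8 text file and what ends up in the PDF can be checked quickly. The following minimal Python sketch prints the UTF-8 bytes of each special character next to its single-byte code page 1252 value; the latter matches the <E4>, <F6>, <FC> and <DF> codes that pdftr.exe -listtext reports for this PDF later in this thread. That the PDF font uses a cp1252-like (WinAnsi) encoding is an assumption based on this one sample, and show_encodings is a hypothetical helper name.

```python
# Sketch: UTF-8 bytes vs. single-byte cp1252 codes for the sample characters.
# Assumption: the PDF font encodes text like cp1252 (WinAnsi), which is what
# the <E4>/<F6>/<FC>/<DF> codes in the -listtext output suggest.

def show_encodings(chars: str) -> list[tuple[str, str, str]]:
    """Return (char, UTF-8 bytes as hex, single-byte code in <XX> form)."""
    rows = []
    for ch in chars:
        utf8 = ch.encode("utf-8").hex(" ").upper()  # two bytes for each of these characters
        single = ch.encode("cp1252")[0]             # exactly one byte in cp1252
        rows.append((ch, utf8, f"<{single:02X}>"))
    return rows

for ch, utf8, single in show_encodings("äöüß"):
    print(f"{ch}  UTF-8: {utf8}  cp1252: {single}")
```

For "ä" this prints C3 A4 for UTF-8 but <E4> for cp1252. A search string whose bytes do not match the codes stored in the PDF is one plausible reason the first command reported "[Not Found in]".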

Thanks,
Customer
-------------------------
Thanks for your message. You can first run the following command line to list all text contents of the PDF file:

pdftr.exe -listtext D:\Downloads\umlaut.pdf

You will see the following text contents:

===== Search keyword in page 1 =====
[62.77, 73.84, 74.32, 95.39] 'Umlaut-a has its own glyph: <22><E4><22> (Unicode code point: U+00E4).'
[62.77, 112.24, 74.32, 133.79] 'Unicode also o<00>era the possibility of using two glyphs: <22>a<22> (lowercase <22>a<22>,'
[62.77, 131.44, 74.32, 152.99] 'U+0061) followed by <22><A8><22> (<22>combining dieresis<22>, U+0308), which are'
[62.77, 150.64, 68.99, 172.19] 'separate characters but are visually shown on top of each other. As'
[62.77, 169.84, 71.67, 191.39] 'Po<06>script cannot handle the <22>combining<22> characters in Unicode, we look'
[62.77, 189.04, 70.77, 210.59] 'only at the single-glyph characters.'
[62.77, 227.44, 74.32, 248.99] 'Anyway, <22><E4><22> is the code point U+00E4. In UTF-8, all code points >= U'
[62.77, 246.64, 71.79, 268.19] '+0080 (that is, all characters that are not pure ASCII) are represented as a'
[62.77, 265.84, 68.99, 287.39] 'series of bytes. See https://en.wikipedia.org/wiki/UTF-8#Encoding for an'
[62.77, 285.04, 69.87, 306.59] 'example. My umlaut-a (<22><E4><22>, U+00E4) becomes 0xC3 0xA4 in the UTF-8'
[62.77, 304.24, 69.87, 325.79] 'encoding,'
[62.77, 342.64, 71.67, 364.19] 'So here is our te<06> <06>ring:'
[62.77, 381.04, 74.32, 402.59] 'Gef<E4>hrliche Kr<F6>ten f<FC>hren zu gro<DF>en Problemen.'
[62.77, 419.44, 72.55, 440.99] 'This text contains four characters that are not plain 7-bit ASCII: <E4> <F6> <FC> <DF>.'
[62.77, 457.84, 68.10, 479.39] 'In addition, there are ligatures (single characters to represent <22><00><22> or <22><06><22>,'
[62.77, 477.04, 68.10, 498.59] 'for example) that Acrobat uses automatically in words like <22>o<00>er<22> or'
[62.77, 496.24, 69.30, 517.79] '<22><06>ring<22>.'
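If you want to scan this -listtext output programmatically, the following minimal Python sketch parses lines of the form [x1, y1, x2, y2] 'text' and turns the <XX> escapes back into readable characters. The line format is inferred from the sample above, not from VeryPDF documentation, and decoding assumes a cp1252-like single-byte font encoding; glyph codes such as <00> and <06> (the ligatures in this subset font) will not decode to meaningful text. The function names are hypothetical.

```python
import re

# Sketch: parse lines of "pdftr.exe -listtext" output such as
#   [62.77, 381.04, 74.32, 402.59] 'Gef<E4>hrliche Kr<F6>ten ...'
# Assumption: the format is inferred from the sample above, not from a spec,
# and <XX> escapes are decoded as cp1252 bytes.
LINE_RE = re.compile(r"^\[([\d.]+), ([\d.]+), ([\d.]+), ([\d.]+)\] '(.*)'$")
HEX_RE = re.compile(r"<([0-9A-Fa-f]{2})>")

def parse_listtext(output: str):
    """Yield ((x1, y1, x2, y2), raw_text) for each text line."""
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            bbox = tuple(float(v) for v in m.groups()[:4])
            yield bbox, m.group(5)

def decode_escapes(raw: str, codec: str = "cp1252") -> str:
    """Turn <XX> escapes back into characters; bytes undefined in the codec become U+FFFD."""
    return HEX_RE.sub(
        lambda m: bytes([int(m.group(1), 16)]).decode(codec, errors="replace"), raw
    )

sample = """===== Search keyword in page 1 =====
[62.77, 381.04, 74.32, 402.59] 'Gef<E4>hrliche Kr<F6>ten f<FC>hren zu gro<DF>en Problemen.'
[62.77, 189.04, 70.77, 210.59] 'only at the single-glyph characters.'"""

# Show only the lines that contain the escape we want to replace.
for bbox, raw in parse_listtext(sample):
    if "<E4>" in raw:
        print(bbox, decode_escapes(raw))
```

Filtering on the escaped form ("<E4>") rather than on "ä" mirrors how the search string is written in the replacement command.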

From the above text contents, we know that "ä" is stored as "<E4>" in the PDF contents, so you can run the following command line to replace "ä" with "!!":

pdftr.exe -searchandoverlaytext "<E4>=>!!" D:\Downloads\umlaut.pdf D:\Downloads\umlaut-out.pdf

This is the log output from the above command line:

[Message] Working in "Evaluation" mode1.
[Message] Trial version has some restrictions, please purchase full version to remove the restrictions.
[Message] You have 198 times to evaluate this software, you may purchase a full version from "http://www.verypdf.com" web site.
[Message] Working in "Evaluation" mode.
[Message] Try to replace text by overlay mode...
[ReplaceText] <E4>=>!!
[ContentParserExport] Processing page 1 of 1...
[Found and Overlay] [INFO] We will find '<E4>' and overlay with '!!'...
[Found and Overlay] 'Umlaut-a has its own glyph: <22><E4><22> (Unicode code point: U+00E4).'=>'!!'
[Found and Overlay] 'Anyway, <22><E4><22> is the code point U+00E4. In UTF-8, all code points >= U'=>'!!'
[Found and Overlay] 'example. My umlaut-a (<22><E4><22>, U+00E4) becomes 0xC3 0xA4 in the UTF-8'=>'!!'
[Found and Overlay] 'Gef<E4>hrliche Kr<F6>ten f<FC>hren zu gro<DF>en Problemen.'=>'!!'
[Found and Overlay] 'This text contains four characters that are not plain 7-bit ASCII: <E4> <F6> <FC> <DF>.'=>'!!'
[Message] Create "D:\Downloads\umlaut-out.pdf" file successful.

pdftr.exe worked fine with the above command line.
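To build the escaped search string for other characters (for example ö, ü or ß), a small helper can convert readable text into the <XX> form. This Python sketch is hypothetical and not part of pdftr; it assumes the PDF font uses a cp1252-like single-byte encoding and escapes characters the way -listtext displayed them in this thread (non-ASCII characters, plus the straight quote shown as <22>). Only the -searchandoverlaytext flag and the "old=>new" syntax come from the command above.

```python
# Hypothetical helper (not part of pdftr): build the "<XX>=>new" argument for
# pdftr.exe -searchandoverlaytext from readable text.
# Assumptions: the PDF font uses a cp1252-like single-byte encoding, and
# characters are written as <XX> exactly as -listtext displayed them
# (non-ASCII characters, plus the straight quote shown there as <22>).

def to_pdftr_hex(text: str, escape: str = '"', codec: str = "cp1252") -> str:
    """Replace non-ASCII characters (and those listed in `escape`) with <XX> byte codes."""
    parts = []
    for ch in text:
        if ch in escape or not ch.isascii():
            # ch.encode raises UnicodeEncodeError if the codec cannot represent ch
            parts.append("".join(f"<{b:02X}>" for b in ch.encode(codec)))
        else:
            parts.append(ch)
    return "".join(parts)

def build_overlay_command(old: str, new: str, src: str, dst: str) -> list[str]:
    """Argument list mirroring the -searchandoverlaytext command shown above."""
    return ["pdftr.exe", "-searchandoverlaytext", f"{to_pdftr_hex(old)}=>{new}", src, dst]

args = build_overlay_command("ä", "!!", r"D:\Downloads\umlaut.pdf", r"D:\Downloads\umlaut-out.pdf")
print(args)
```

Passing this list to subprocess.run(args, check=True) on Windows should run the same command without relying on cmd.exe quoting. Whether pdftr expects any other ASCII characters to be escaped is not covered by the example in this thread.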


Please run the above command lines and try again, and feel free to let us know if you still have the same problem.

VeryPDF
