In this article, we will walk through how to use PDFextract, a command-line tool, to extract XML data from ZUGFERD invoices and address some common questions that users may have regarding its usage. Whether you are working with a few invoices or processing hundreds of them across multiple workstations, PDFextract can be a useful tool for automating ZUGFERD data extraction.
https://www.verypdf.com/app/pdf-extract-tool/index.html
https://www.verypdf.com/app/pdf-extract-tool/try-and-buy.html#buy
What is ZUGFERD?
ZUGFERD (officially written ZUGFeRD, short for Zentraler User Guide des Forums elektronische Rechnung Deutschland) is a standard for electronic invoices in Germany. A ZUGFERD invoice is a PDF document with a machine-readable XML file embedded inside it, so the same file can be read by a person and processed by software. The XML data is typically used for automated accounting and invoice processing.
Step 1: Using PDFextract to Extract XML from a ZUGFERD PDF
PDFextract can extract several kinds of content from a PDF file, including text, fonts, and embedded XML data. For ZUGFERD invoices, this means the tool can extract the embedded XML. It does not, however, have a built-in option to extract only the XML: it pulls all available content from the document.
How to Run PDFextract from the Command Line
Here’s the command to extract content from a ZUGFERD PDF file:
pdfextract.exe -outfolder C:\Path\to\output_folder C:\Path\to\input_ZUGFERD_invoice.pdf
This will extract all contents, including the XML, text, fonts, and other embedded files from the PDF. However, if you only want to extract the XML data, you will need a custom version of PDFextract that supports this feature.
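Real invoice paths often contain spaces, which would split the command above into extra arguments. The following is a minimal sketch of how the command line could be assembled from C++ with both paths quoted. The helper names quotePath and buildExtractCommand are hypothetical, and the sketch assumes that PDFextract parses quoted arguments the way standard Windows C runtime programs do; only the -outfolder option shown above is used.

```cpp
#include <string>

// Wrap a path in double quotes so that a path containing spaces is passed
// as one argument. Trailing backslashes are doubled, because under the usual
// Windows argument-parsing rules a backslash directly before the closing
// quote would otherwise escape it.
std::string quotePath(const std::string& path) {
    std::string::size_type trailing = 0;
    while (trailing < path.size() &&
           path[path.size() - 1 - trailing] == '\\') {
        ++trailing;
    }
    return "\"" + path + std::string(trailing, '\\') + "\"";
}

// Build the command line: the -outfolder option from this article,
// followed by the input PDF.
std::string buildExtractCommand(const std::string& outputFolder,
                                const std::string& inputPdf) {
    return "pdfextract.exe -outfolder " + quotePath(outputFolder) +
           " " + quotePath(inputPdf);
}
```

For example, buildExtractCommand("C:\\out dir", "C:\\in\\invoice 1.pdf") yields a command in which each path is a single quoted argument.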
Custom Solution for XML-only Extraction
If extracting only the XML is critical for your workflow, the VeryPDF team offers to create a custom version of the tool that extracts just the XML data. If you are interested in this custom-built version, which would extract the XML data files without any additional files, please contact the VeryPDF Support Team.
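If a custom build is not an option, one possible workaround is to run the standard tool and then pick the XML files out of the output folder. The sketch below does this with std::filesystem (C++17). It rests on an assumption that this article does not confirm: that PDFextract saves the embedded invoice data with an .xml file extension. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if the file name ends in ".xml", ignoring case.
bool hasXmlExtension(const fs::path& file) {
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".xml";
}

// Collect every regular .xml file below outputFolder, sorted by path.
// Assumption: the extracted invoice data is written as *.xml.
std::vector<fs::path> collectXmlFiles(const fs::path& outputFolder) {
    std::vector<fs::path> xmlFiles;
    for (const auto& entry : fs::recursive_directory_iterator(outputFolder)) {
        if (entry.is_regular_file() && hasXmlExtension(entry.path())) {
            xmlFiles.push_back(entry.path());
        }
    }
    std::sort(xmlFiles.begin(), xmlFiles.end());
    return xmlFiles;
}
```

The returned paths can then be copied elsewhere or handed to an XML parser, and the remaining extracted files can be ignored or deleted.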
Step 2: Running PDFextract in Quiet Mode
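To run PDFextract in quiet mode, you can append > nul to the command. This discards the application's standard output, so nothing is printed while it runs. Note that this only silences the output; it does not hide the command-line window itself, which is covered in Step 3. To discard error messages as well, append > nul 2>&1 instead.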
Example Command for Quiet Mode
pdfextract.exe -outfolder D:\Downloads\1 D:\Downloads\EN16931_Einfach.pdf > nul
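The > nul redirection is handled by the Windows command interpreter, not by PDFextract, so it works at a prompt, in a batch file, or in any call that goes through cmd.exe. As a rough sketch, the same quiet command could be issued from C++ through std::system, which on Windows passes the string to the command interpreter. The function names here are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Append the redirection that discards both standard output and
// standard error when the command is run by cmd.exe.
std::string withSilencedOutput(const std::string& command) {
    return command + " > nul 2>&1";
}

// Run the command through the command interpreter and return the
// value reported by std::system.
int runSilenced(const std::string& command) {
    return std::system(withSilencedOutput(command).c_str());
}
```

This approach hides the output only. When the calling program is not itself a console application, a console window may still appear briefly while the command runs.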
Step 3: Automating PDF Extraction Without a Console Window
If you prefer to automate the process of extracting XML from a ZUGFERD PDF without showing the console window, you can use C++ code to execute the command. The example code provided below demonstrates how to use the Windows API to run PDFextract in the background and capture its output.
C++ Code Example to Run PDFextract
#include <windows.h>
#include <iostream>
#include <string>

int main() {
    // Command to run. CreateProcessA takes a non-const buffer (and the
    // Unicode variant may modify it), so use a writable array rather
    // than a string literal.
    char command[] = "pdfextract.exe -outfolder D:\\Downloads\\1 D:\\Downloads\\EN16931_Einfach.pdf";

    // Create necessary structures (ANSI variants, to match the char command line)
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    SECURITY_ATTRIBUTES sa;

    // Zero out the structures
    ZeroMemory(&si, sizeof(si));
    ZeroMemory(&pi, sizeof(pi));
    ZeroMemory(&sa, sizeof(sa));

    // Set the SECURITY_ATTRIBUTES for pipe
    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE; // Allow handles to be inherited

    // Create a pipe for capturing output
    HANDLE hStdOutRead, hStdOutWrite;
    if (!CreatePipe(&hStdOutRead, &hStdOutWrite, &sa, 0)) {
        std::cerr << "CreatePipe failed with error: " << GetLastError() << std::endl;
        return 1;
    }
    // Only the write end should be inherited by the child process
    SetHandleInformation(hStdOutRead, HANDLE_FLAG_INHERIT, 0);

    // Set up the STARTUPINFO structure
    si.cb = sizeof(si);
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdOutput = hStdOutWrite; // Redirect standard output to pipe
    si.hStdError = hStdOutWrite;  // Redirect standard error to pipe

    // Create the process
    if (!CreateProcessA(
            NULL,             // Application name (NULL uses command line)
            command,          // Command line
            NULL,             // Process security attributes
            NULL,             // Thread security attributes
            TRUE,             // Inherit handles
            CREATE_NO_WINDOW, // Do not create a console window for the child
            NULL,             // Environment variables
            NULL,             // Current directory
            &si,              // Startup information
            &pi               // Process information
        )) {
        std::cerr << "CreateProcess failed with error: " << GetLastError() << std::endl;
        CloseHandle(hStdOutRead);
        CloseHandle(hStdOutWrite);
        return 1;
    }

    // Close the write end of the pipe, we only need to read from it.
    // Without this, ReadFile would never see the end of the output.
    CloseHandle(hStdOutWrite);

    // Read the output from the pipe until the child closes its end
    DWORD dwRead;
    CHAR chBuf[4096];
    std::string output;
    while (ReadFile(hStdOutRead, chBuf, sizeof(chBuf), &dwRead, NULL) && dwRead > 0) {
        output.append(chBuf, dwRead); // Append exactly the bytes that were read
    }

    // Print the captured output
    std::cout << "Captured Output:\n" << output << std::endl;

    // Wait for the process to finish
    WaitForSingleObject(pi.hProcess, INFINITE);

    // Clean up
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    CloseHandle(hStdOutRead);
    return 0;
}
This code runs PDFextract silently, capturing any output it generates and waiting for the process to complete.
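When a whole folder of invoices has to be processed, the same launch can be repeated once per file. The sketch below only plans that work: it lists the PDF files in a folder and pairs each one with its own output subfolder, named after the invoice, so that files extracted from different invoices do not overwrite each other. The names ExtractionJob, outputFolderFor, and planExtractionJobs are hypothetical, and the one-subfolder-per-invoice layout is an assumption of this example rather than something PDFextract requires.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One planned run of the tool: an input PDF and its own output folder.
struct ExtractionJob {
    fs::path inputPdf;
    fs::path outputFolder;
};

// True if the file name ends in ".pdf", ignoring case.
bool isPdf(const fs::path& file) {
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == ".pdf";
}

// Output folder for one invoice: <outputRoot>/<file name without extension>.
fs::path outputFolderFor(const fs::path& outputRoot, const fs::path& inputPdf) {
    return outputRoot / inputPdf.stem();
}

// List the PDFs directly inside invoiceFolder and plan one job per file,
// sorted by input path so that runs are reproducible.
std::vector<ExtractionJob> planExtractionJobs(const fs::path& invoiceFolder,
                                              const fs::path& outputRoot) {
    std::vector<ExtractionJob> jobs;
    for (const auto& entry : fs::directory_iterator(invoiceFolder)) {
        if (entry.is_regular_file() && isPdf(entry.path())) {
            jobs.push_back({entry.path(), outputFolderFor(outputRoot, entry.path())});
        }
    }
    std::sort(jobs.begin(), jobs.end(),
              [](const ExtractionJob& a, const ExtractionJob& b) {
                  return a.inputPdf < b.inputPdf;
              });
    return jobs;
}
```

Each job's inputPdf and outputFolder can then be placed into the command line and launched in the same way as the single-file example.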
Step 4: Licensing for Multiple Workstations
For organizations needing to process ZUGFERD invoices across multiple workstations, there are two main licensing options available:
- Server License: This option costs USD 299.95 per server. This license allows you to install PDFextract on one server. You will need to buy a separate server license for each server that you intend to run the software on.
- Developer License: This option costs USD 1499.95 per developer and grants more flexibility, especially if you are developing software that integrates with PDFextract.
For processing on 1-5 workstations or servers, the Server License is the most cost-effective choice: five Server Licenses total USD 1,499.75, just under the price of one Developer License. However, if you are integrating PDFextract into your own software or custom workflows, or plan to run PDFextract on more than 5 servers, the Developer License may be more suitable.
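The break-even point between the two options can be checked with a few lines of arithmetic. The sketch below uses the two list prices quoted above, stored in US cents to avoid floating-point rounding. It assumes that a single Developer License would cover all of the servers in question, which should be confirmed with VeryPDF before relying on the comparison; the function names are hypothetical.

```cpp
#include <cstdint>

// List prices from this article, in US cents.
constexpr std::int64_t kServerLicenseCents = 29995;     // USD 299.95
constexpr std::int64_t kDeveloperLicenseCents = 149995; // USD 1499.95

// Total cost of buying one Server License per server.
constexpr std::int64_t serverLicensesCostCents(int serverCount) {
    return kServerLicenseCents * serverCount;
}

// Smallest number of servers at which per-server licensing costs more
// than one Developer License (assuming one such license is sufficient).
constexpr int breakEvenServerCount() {
    return static_cast<int>(kDeveloperLicenseCents / kServerLicenseCents) + 1;
}
```

With these prices, five servers cost 149975 cents (USD 1,499.75) and six cost 179970 cents (USD 1,799.70), so the sixth server is the point at which the per-server total passes the Developer License price.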
Conclusion
PDFextract is a powerful tool for extracting XML data from ZUGFERD PDF invoices, and with the right configuration, you can automate the process without displaying a console window. Although the tool does not natively support extracting only XML data, a custom version can be developed to meet this need. If you're processing invoices on multiple workstations, choosing the appropriate license can help ensure you get the best value for your organization.
For more details on purchasing and licensing, visit VeryPDF's Buy Page: https://www.verypdf.com/app/pdf-extract-tool/try-and-buy.html#buy