The technology behind automatic PDF recognition: OCR, AI and the self-learning system for invoices and purchase orders.
The Intelligent Document Recogniser (IDR) is eConnect's proprietary technology that converts PDF documents, images and receipts into validated electronic documents. The system processes 100% of submitted documents and improves continuously through use.
In addition to invoices, the IDR can also recognize and convert purchase orders. This functionality uses the same three technology layers as invoice recognition. Order recognition is currently available as a beta and is actively being developed.
Three layers, one result. The IDR combines OCR text recognition, LLM interpretation and proprietary document interpretation into an accurate, self-learning system that correctly recognizes more than 98% of all core fields.
The IDR combines three layers that work together to recognize and process documents:
OCR layer: The first step is pure text recognition via OCR. This layer reads the text on the document, regardless of language (over 200 languages are supported) or format.
LLM layer: A Large Language Model helps interpret the recognized text. The model understands the context of the information and assists in assigning text to the correct invoice elements.
Document interpretation: This is the core of the IDR and entirely proprietary eConnect technology. This layer determines what the document means, which fields are relevant, how the data should be structured and whether everything is correct. The intelligence embedded here, including the self-learning system, validation rules and feedback loop, is eConnect's differentiating capability.
The OCR and LLM layers are commodity services. The real value lies in the interpretation and validation layer that eConnect builds and maintains.
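To make the layering concrete, here is a minimal Python sketch of how three such stages could be chained. It is an illustration only, not eConnect's implementation: every name in it (`run_pipeline`, `stub_ocr`, `stub_interpret`, `require_core_fields`) is hypothetical, and the OCR and LLM stages are replaced by trivial stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

Fields = dict[str, str]


@dataclass
class PipelineResult:
    fields: Fields
    problems: list[str]

    @property
    def needs_review(self) -> bool:
        # Any validation problem means the document is not fully trusted.
        return bool(self.problems)


def run_pipeline(
    document: bytes,
    ocr: Callable[[bytes], str],
    interpret: Callable[[str], Fields],
    validate: Callable[[Fields], list[str]],
) -> PipelineResult:
    text = ocr(document)             # layer 1: raw text recognition
    candidates = interpret(text)     # layer 2: text -> candidate fields
    problems = validate(candidates)  # layer 3: interpretation and validation
    return PipelineResult(candidates, problems)


# Hypothetical stand-ins for the real services (assumptions, not real APIs).
def stub_ocr(document: bytes) -> str:
    return document.decode("utf-8")


def stub_interpret(text: str) -> Fields:
    fields: Fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def require_core_fields(fields: Fields) -> list[str]:
    required = ("invoice_number", "total")
    return [f"missing {name}" for name in required if name not in fields]


result = run_pipeline(
    b"invoice_number: 2024-001\ntotal: 121.00",
    stub_ocr, stub_interpret, require_core_fields,
)
```

Passing the stages in as functions keeps the first two layers swappable, which matches the idea that they are commodity services while the validation step carries the document-specific logic.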
The diagram below shows how a PDF invoice is processed through the three layers of the IDR into a validated e-invoice.
The IDR automatically recognizes all relevant invoice elements:
Since July 2024, the IDR also offers line recognition: splitting PDF invoice lines into individual transaction lines. Per line, the description, price, quantity, line amount and reference fields are recognized.
By default, 80% of invoice lines are recognized automatically. The remaining 20% falls back to standard processing, where the subtotal per VAT rate is used. The "No Lines No Pay" guarantee applies: you only pay for lines that are successfully recognized.
Optionally, you can activate line validation. The eConnect validation team then reviews the lines that were not automatically recognized, giving you a 100% guarantee on invoice recognition at line level.
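The fallback to subtotals per VAT rate can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function name `build_transaction_lines`, the dictionary keys, and the idea that the fallback is decided per invoice and that fallback lines count as zero billable lines are all assumptions made for the example, not documented IDR behaviour.

```python
from decimal import Decimal


def build_transaction_lines(recognized_lines, subtotals_per_vat_rate):
    """Return (lines, billable_line_count) for one invoice.

    recognized_lines: list of dicts produced by line recognition (may be empty).
    subtotals_per_vat_rate: mapping of VAT rate (percent) to subtotal amount.
    """
    if recognized_lines:
        # Line recognition succeeded: use the lines as they are.
        return list(recognized_lines), len(recognized_lines)

    # Fallback: one summary line per VAT rate, built from the subtotals.
    fallback = [
        {
            "description": f"Subtotal at {rate}% VAT",
            "vat_rate": rate,
            "line_amount": amount,
        }
        for rate, amount in sorted(subtotals_per_vat_rate.items())
    ]
    # Assumption: summary lines are not billed as recognized lines.
    return fallback, 0


lines, billable = build_transaction_lines(
    [],
    {Decimal("21"): Decimal("100.00"), Decimal("9"): Decimal("50.00")},
)
```

In this example no lines were recognized, so the invoice ends up with two summary lines (9% and 21%) and a billable count of zero.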
The IDR improves as more invoices are processed. This works through a feedback loop:
The result is a system that continuously improves. An error that occurs once doesn't persist; it is resolved and the system learns from it.
The IDR scores high on accuracy:
The architecture prevents structural errors by validating each recognized value against another source (document-internal or external). When in doubt, the document is routed to the QC team, providing an additional safety layer.
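As an illustration of a document-internal cross-check, the Python sketch below compares recognized amounts against each other and routes the document to quality control when they disagree. The function names, the tolerance of one cent and the two routing labels are assumptions chosen for the example; the actual validation rules of the IDR are not described here.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed rounding tolerance


def find_inconsistencies(net, vat, gross, line_amounts=()):
    """Check recognized amounts against each other and list any mismatches."""
    problems = []
    if abs(net + vat - gross) > TOLERANCE:
        problems.append("net + VAT does not equal gross")
    if line_amounts and abs(sum(line_amounts, Decimal("0")) - net) > TOLERANCE:
        problems.append("line amounts do not add up to net")
    return problems


def route(net, vat, gross, line_amounts=()):
    """Return 'qc' when any cross-check fails, otherwise 'auto'."""
    return "qc" if find_inconsistencies(net, vat, gross, line_amounts) else "auto"


decision = route(
    Decimal("100.00"), Decimal("21.00"), Decimal("121.00"),
    [Decimal("60.00"), Decimal("40.00")],
)
```

Because each value is confirmed by a second, independent figure on the same document, a single misread digit shows up as a mismatch instead of passing through silently.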
The pure processing time for automatically processed documents averages 5 to 30 seconds. The total turnaround time including queue wait is approximately 3.5 minutes on average. With the priority option (available with higher subscriptions), documents are given precedence.
The IDR also recognizes country-specific payment references:
+++XXX/XXXX/XXXXX+++ with built-in checksum validation
These recognition features are part of the Professional version of the IDR.
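The +++XXX/XXXX/XXXXX+++ pattern above matches the Belgian structured communication format, in which the last two of the twelve digits are a modulo-97 check on the first ten (a remainder of 0 is written as 97). Assuming that is the reference meant here, a checksum validation could look like the Python sketch below. The function name `is_valid_structured_reference` is hypothetical and this is not the IDR's own code.

```python
import re

# Three groups of 3, 4 and 5 digits between the +++ markers.
_PATTERN = re.compile(r"^\+\+\+(\d{3})/(\d{4})/(\d{5})\+\+\+$")


def is_valid_structured_reference(reference: str) -> bool:
    """Validate the format and the modulo-97 check digits of a reference."""
    match = _PATTERN.match(reference.strip())
    if match is None:
        return False
    digits = "".join(match.groups())       # 12 digits in total
    base, check = int(digits[:10]), int(digits[10:])
    expected = base % 97 or 97              # remainder 0 is encoded as 97
    return check == expected


ok = is_valid_structured_reference("+++123/4567/89002+++")
```

Here the first ten digits, 1234567890, leave a remainder of 2 when divided by 97, so the check digits `02` are accepted; changing them to `03` would make the reference invalid.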
Want to experience how the IDR works? Submit a test invoice and view the result in your Inbox.