IDR via the API: PDF-to-XML conversion

Automatically recognise and convert PDF invoices to e-invoices via recognize hooks.

The Intelligent Document Recognizer (IDR) automatically converts PDF invoices into structured e-invoices. Through the PSB API you control the IDR with recognize hooks, which let you configure the recognition process: quality level, priority, extraction features and party details. This article describes how to use the IDR via the API.

How does IDR work via the API?

The IDR operates as a hook in the PSB hook system. You register a recognize hook that listens on a topic (e.g. InvoiceReceived). When a PDF document arrives via that topic, the PSB automatically forwards it to the IDR for recognition. After processing, the IDR publishes the result on a callback topic.

The hook action for the IDR has the following format:

recognize://idr?quality={quality}&priority={priority}&features={features}&data={base64-data}
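
Because the action is a URL-like template, it can be assembled in code. The Python sketch below is a minimal illustration, not part of the PSB API. The function name `build_recognize_action` is hypothetical, and it checks each parameter against the values documented in this article before filling in the template. The base64 `data` value is inserted exactly as the format shows. The article does not say whether it must also be percent-encoded, so treat that as an open assumption.

```python
# Allowed values as listed in this article (quality, priority, features).
QUALITY_LEVELS = {"default", "hq", "lq"}
PRIORITIES = {"high", "medium", "low"}
FEATURES = {"iban", "g-account", "order-reference", "project-reference", "contract-reference"}


def build_recognize_action(quality: str, priority: str, features: list[str], data_b64: str) -> str:
    """Hypothetical helper: fill in the recognize://idr action template."""
    if quality not in QUALITY_LEVELS:
        raise ValueError(f"unknown quality level: {quality!r}")
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    unknown = [f for f in features if f not in FEATURES]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    # Features are comma-separated; the base64 data is inserted as-is
    # (ASSUMPTION: no additional percent-encoding is required).
    return (
        f"recognize://idr?quality={quality}&priority={priority}"
        f"&features={','.join(features)}&data={data_b64}"
    )
```

For example, `build_recognize_action("default", "medium", ["iban", "order-reference"], encoded)` yields an action in the same shape as the hook registered further down. Validating client-side only catches typos early; the PSB remains the authority on which values it accepts.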
Parameters
quality

The quality level determines how strictly the IDR evaluates the recognition result:

Value | Description
default | Standard quality level (recommended for most scenarios)
hq | High confidence: only results with high confidence are accepted
lq | Lower quality allowed: more results, but with lower confidence
priority

The priority determines the processing order in the IDR queue:

Value | Description
high | Given priority in the queue
medium | Standard priority
low | Background processing when capacity is available
features

Features activate additional extraction capabilities. You can combine multiple features (comma-separated):

Feature | Description
iban | IBAN extraction from the PDF
g-account | G-account split recognition
order-reference | Order reference extraction
project-reference | Project reference extraction
contract-reference | Contract reference extraction

Example: features=iban,g-account,order-reference

data (party details)

The data field contains base64-encoded JSON with details of the receiving organisation. The IDR uses this information to enrich and validate the recognition result. The JSON contains names, identifiers, email address and addresses of the organisation.

Example of the JSON before base64 encoding:

{
  "names": ["Company Name B.V."],
  "identifiers": [
    { "type": "KVK", "value": "12345678" }
  ],
  "email": "invoices@example.com",
  "addresses": [
    {
      "street": "Example Street 1",
      "postcode": "1234 AB",
      "city": "Utrecht",
      "country": "NL"
    }
  ]
}
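
A minimal Python sketch of producing the `data` value from such a JSON object. It assumes standard (not URL-safe) base64 over UTF-8 encoded JSON. The article does not specify the alphabet or character encoding, so confirm both against the API reference. The helper name `encode_party_details` and the e-mail address are placeholders.

```python
import base64
import json

# Party details of the receiving organisation, mirroring the example JSON above.
party_details = {
    "names": ["Company Name B.V."],
    "identifiers": [{"type": "KVK", "value": "12345678"}],
    "email": "invoices@example.com",  # placeholder address
    "addresses": [
        {"street": "Example Street 1", "postcode": "1234 AB", "city": "Utrecht", "country": "NL"}
    ],
}


def encode_party_details(details: dict) -> str:
    """Serialise to compact UTF-8 JSON and return standard base64 text (ASSUMED encoding)."""
    raw = json.dumps(details, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


encoded = encode_party_details(party_details)
# Sanity check: decoding gives back the same structure before the value goes into a hook.
assert json.loads(base64.b64decode(encoded)) == party_details
```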
Registering a recognize hook

A complete recognize hook looks like this:

{
  "action": "recognize://idr?quality=default&priority=medium&features=iban,order-reference&data={base64-encoded-party-details}",
  "topics": ["InvoiceReceived"]
}

Register this hook via the Hook endpoint:

POST /api/v1/hook

Or include the hook directly in an Enrollment request.
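
The registration call can be sketched with Python's standard library. The base URL `https://psb.example.com` and the bearer-token `Authorization` header are hypothetical placeholders, because this article does not describe authentication. Only the `POST /api/v1/hook` path and the `action`/`topics` body come from the text above. The function prepares the request without sending it, and the Enrollment route is not covered.

```python
import json
import urllib.request

# HYPOTHETICAL values: not taken from this article; replace them with
# the base URL and credentials of your PSB environment.
BASE_URL = "https://psb.example.com"
API_TOKEN = "your-api-token"


def build_hook_request(action: str, topics: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/v1/hook request for a recognize hook."""
    body = json.dumps({"action": action, "topics": topics}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/hook",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # ASSUMED auth scheme
        },
    )


req = build_hook_request(
    "recognize://idr?quality=default&priority=medium&features=iban,order-reference&data=eyJuYW1lcyI6W119",
    ["InvoiceReceived"],
)
# To actually register the hook: urllib.request.urlopen(req)
```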

Callback topics

After processing, the IDR publishes the result on one of the following topics:

Topic | Description
PurchaseInvoiceRecognized | PDF was successfully recognised and converted to an e-invoice
PurchaseInvoiceRecognizedPending | Recognition is awaiting quality control (manual review)
PurchaseInvoiceRecognizedRejected | Recognition was rejected after quality control
PurchaseInvoiceRecognizedError | An error occurred during recognition
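
Code that consumes these callbacks usually branches on the topic. This Python sketch maps the four documented topics to outcomes. The `Outcome` enum and the helper names are illustrative. Treating `PurchaseInvoiceRecognizedPending` as the only non-final state is an inference from the descriptions (a pending result still awaits quality control), not a documented guarantee.

```python
from enum import Enum


class Outcome(Enum):
    RECOGNIZED = "recognized"     # e-invoice is available
    PENDING_REVIEW = "pending"    # awaiting manual quality control
    REJECTED = "rejected"         # rejected after quality control
    ERROR = "error"               # recognition failed


# The four callback topics listed above.
CALLBACK_TOPICS = {
    "PurchaseInvoiceRecognized": Outcome.RECOGNIZED,
    "PurchaseInvoiceRecognizedPending": Outcome.PENDING_REVIEW,
    "PurchaseInvoiceRecognizedRejected": Outcome.REJECTED,
    "PurchaseInvoiceRecognizedError": Outcome.ERROR,
}


def classify(topic: str) -> Outcome:
    """Map a callback topic to an outcome; reject anything that is not an IDR topic."""
    try:
        return CALLBACK_TOPICS[topic]
    except KeyError:
        raise ValueError(f"not an IDR callback topic: {topic!r}") from None


def is_final(outcome: Outcome) -> bool:
    # ASSUMPTION: a pending result may still change after manual review;
    # all other outcomes are treated as final.
    return outcome is not Outcome.PENDING_REVIEW
```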

Set up a webhook or mail hook on these topics to receive the result. For example:

{
  "action": "https://api.company.nl/idr/callback",
  "topics": ["PurchaseInvoiceRecognized", "PurchaseInvoiceRecognizedError"]
}
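
The webhook URL itself must accept the IDR result. The sketch below uses Python's built-in `http.server` to accept POST requests on `/idr/callback` and acknowledge them. The article does not document the callback payload, so reading the topic from a JSON field named `topic` is an assumption. The sketch also serves plain HTTP. A production endpoint such as `https://api.company.nl/idr/callback` would run behind TLS and be secured as described in the article on configuring and securing webhooks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Callback payloads collected by the handler (illustration only; use a queue or database in practice).
received: list[tuple] = []


class IdrCallbackHandler(BaseHTTPRequestHandler):
    """Accepts POSTed IDR results on /idr/callback and acknowledges them with 204."""

    def do_POST(self) -> None:
        if self.path != "/idr/callback":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            self.send_error(400, "Body is not valid JSON")
            return
        # ASSUMPTION: the topic is carried in a "topic" field; the real callback
        # schema is not described in this article.
        topic = payload.get("topic") if isinstance(payload, dict) else None
        received.append((topic, payload))
        self.send_response(204)  # acknowledge without a response body
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


# To run locally: HTTPServer(("0.0.0.0", 8080), IdrCallbackHandler).serve_forever()
```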
File size limit and supported formats

The maximum file size for IDR uploads is 15 MB. Uploading a larger file returns HTTP 413 (Content Too Large).

The IDR supports the following file types:

Format | Notes
PDF | Primary format, both scanned and born-digital
JPEG / PNG | Images of invoices (photos, scans)
TIFF | Multi-page scans

Other file types (Word, Excel, HTML) are not supported and result in an IDR422 Invalid PDF Content error. Password-protected or DRM-secured PDFs produce a similar error message.
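
Oversized and unsupported files are rejected with HTTP 413 or IDR422, so a client can catch the obvious cases before uploading. This Python sketch checks the file size and the file's leading magic bytes against the formats listed above. It is only a heuristic. The function name `precheck` is hypothetical, reading 15 MB as 15 * 1024 * 1024 bytes is an assumption, and it cannot detect password-protected or DRM-secured PDFs.

```python
from pathlib import Path

# 15 MB limit from this article; ASSUMPTION: 1 MB = 1024 * 1024 bytes.
MAX_BYTES = 15 * 1024 * 1024

# Leading "magic" bytes of the supported formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF",  # little-endian TIFF
    b"MM\x00*": "TIFF",  # big-endian TIFF
}


def precheck(path: Path) -> str:
    """Return the detected format, or raise ValueError if the IDR would likely reject the file."""
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name}: {size} bytes exceeds the 15 MB limit (HTTP 413)")
    with path.open("rb") as f:
        head = f.read(8)  # enough bytes for the longest signature (PNG)
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    raise ValueError(f"{path.name}: unsupported file type (expect IDR422 Invalid PDF Content)")
```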

Processing flow

The complete IDR process via the API follows four steps:

  1. A PDF document arrives via the PSB (upload or receipt via Peppol/SFTP/email)
  2. The recognize hook sends the document to the IDR with the configured parameters
  3. The IDR processes the document and publishes the result on the appropriate callback topic
  4. You receive the recognised document via your webhook or retrieve it via the PurchaseInvoice endpoints

Want to learn more about the PSB hook architecture? Read the article on configuring and securing webhooks.
