Document Understanding User Guide for Modern Projects
The trainable splitter uses the Helix Classifier model to split and classify multi-document packets. It detects document boundaries automatically and assigns a document type to each detected sub-document.
The model is available only for tenants in Europe and the US.
Use the trainable splitter in the following scenarios:
- Mortgage applications: Split packets containing IDs, application forms, and bank statements.
- Healthcare onboarding: Verify the presence of required documents such as medical certificates, NPI forms, and IDs.
- Insurance claims: Separate claim forms, medical records, and receipts.
- Invoice processing: Handle multi-vendor invoice packets.
- Document cleanup: Remove irrelevant pages so that only relevant content is processed downstream.
When you create a new project, you can enable the new splitter and classifier model if your tenant is located in Europe or the US. Once trained, the model splits and classifies complex document packets.
Follow the instructions on this page to create a Document Understanding™ project and enable the new splitter and classifier model.
Prerequisites
Before you begin, make sure the following conditions are met:
- Your tenant is located in Europe or the US.
- IntelligentOCR.Activities version 6.27.0 or later is installed.
- Modern Projects is enabled in your Automation Cloud tenant.
- You have sample document packets representative of your production use case.
- Open Document Understanding.
- Select Create Project.
- Enter the desired project name.
- Select Modern to use the modern experience.
- Configure Advanced options if needed.
- Switch on the Enable splitting toggle to allow the model to split documents into individual files before classification. You can also enable this option from the Project settings screen.
Important: When the Enable splitting option is turned off, the model runs in classification-only mode:
- The splitting annotation interface is unavailable.
- Documents cannot be split manually.
- For training, upload single-page or multi-page documents of the same type.
- All other functionality remains unchanged.
- Select the OCR method from the OCR method drop-down list.
- Enter the OCR API Key.
Note: This field is populated automatically if you select a UiPath® OCR.
- Enter the OCR URL. For the full list of URLs for UiPath OCRs, see the Public Endpoints page.
- Choose whether to Apply OCR on PDFs. The default is Auto.
- Select Create.
Result
Your project is created. The Build section becomes available, where you can upload documents for extraction or classification.
Select one of the two available options:
- Extract data from documents: Pulls specific fields from your documents, such as invoice numbers, dates, and totals.
- Classify and split documents: Sorts documents by type and separates multiple documents within a single file.
- Select a document type.
- Select Upload, or drag and drop your files onto the new document type. Wait for the upload to finish.
Certain complex files contain multiple document types. The trainable splitter detects where each sub-document starts and ends, and classifies each section accordingly.
- Select Classify and split documents.
- Upload your document packets. Wait for the upload and processing to finish.
- Select a document from the upload section.
- Select Split. The splitting annotation interface opens.
Note: If the project already has a trained model, uploaded documents are pre-annotated using that model. This helps speed up annotation and lets you review prediction results on new documents.
- Select New document type to create a document type for each item in your taxonomy. Choose a predefined document type or create a custom one.
For custom document types, provide the following:
- Name: A clear, descriptive name for the document type.
- Description: One to three sentences explaining the document's purpose and what makes it distinct from similar types.
- Key indicators: Comma-separated fields or terms that uniquely identify this document type.
Descriptions and key indicators directly affect model accuracy. If classification scores are low, refine descriptions before adding more training data.
Example for an Invoice document type:
- Description: A formal payment request issued by a seller to a buyer, listing line items, quantities, and total amounts due.
- Key indicators: invoice number, invoice date, total amount, seller information, buyer information, payment terms
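The three fields of a custom document type can be kept alongside your project notes as structured data, which makes it easier to review them before entering them in the interface. The following is a minimal sketch, not part of the product: the class `DocumentTypeDefinition` and its methods are hypothetical names, and the sentence count is a rough heuristic based on the "one to three sentences" guidance above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DocumentTypeDefinition:
    """Hypothetical container for the fields of a custom document type."""
    name: str
    description: str
    key_indicators: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return issues found when checking the fields against the guidance above."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        # Rough sentence count: split on '.', '!' or '?' followed by whitespace or end of text.
        parts = re.split(r"[.!?](?:\s+|$)", self.description.strip())
        sentences = [p for p in parts if p]
        if not 1 <= len(sentences) <= 3:
            issues.append(f"description has {len(sentences)} sentences, expected 1 to 3")
        if not self.key_indicators:
            issues.append("no key indicators")
        return issues

    def key_indicators_text(self) -> str:
        """Join the indicators into the comma-separated form used by the field."""
        return ", ".join(self.key_indicators)


# The Invoice example from this section.
invoice = DocumentTypeDefinition(
    name="Invoice",
    description=(
        "A formal payment request issued by a seller to a buyer, "
        "listing line items, quantities, and total amounts due."
    ),
    key_indicators=[
        "invoice number", "invoice date", "total amount",
        "seller information", "buyer information", "payment terms",
    ],
)
print(invoice.problems())
print(invoice.key_indicators_text())
```

A check like this only catches formal gaps. Whether a description actually distinguishes two similar document types still has to be judged from the classification results.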
Tips for writing effective descriptions:
- Include terminology specific to the document type.
- If two document types are frequently confused, add distinguishing details to both descriptions.
- Assign pages not needed for downstream processing to the Unknown type. This includes cover pages, blank pages, and separator sheets. The model predicts these pages as Unknown at runtime.
- Select the boundaries between document types to indicate where each document starts and ends.
- Assign each page range to a document type using the drop-down menu.
- Select Confirm when you have finished annotating the document.
Result
Each sub-document appears under its corresponding document type in the Build section. Each sub-document is pre-annotated with the schema of its assigned document type.
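One way to think about a finished annotation is as an ordered list of page ranges, each carrying a document type. The sketch below illustrates that idea only. The `SubDocument` class and both helper functions are hypothetical, and the rule that every page belongs to exactly one range is an assumption made for this example, not a documented constraint.

```python
from dataclasses import dataclass

UNKNOWN = "Unknown"  # type used for cover pages, blank pages, and separator sheets


@dataclass(frozen=True)
class SubDocument:
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive
    doc_type: str


def validate_annotation(sub_docs: list[SubDocument], page_count: int) -> None:
    """Raise ValueError unless the ranges cover pages 1..page_count once, in order."""
    expected_start = 1
    for sd in sub_docs:
        if sd.first_page != expected_start:
            raise ValueError(
                f"expected a range starting at page {expected_start}, got {sd.first_page}"
            )
        if sd.last_page < sd.first_page:
            raise ValueError(f"range {sd.first_page}-{sd.last_page} is empty")
        expected_start = sd.last_page + 1
    if expected_start != page_count + 1:
        raise ValueError(f"pages from {expected_start} onward are not assigned")


def relevant_sub_documents(sub_docs: list[SubDocument]) -> list[SubDocument]:
    """Drop Unknown ranges so that only relevant content goes downstream."""
    return [sd for sd in sub_docs if sd.doc_type != UNKNOWN]


# A 12-page mortgage packet with a cover page.
packet = [
    SubDocument(1, 1, UNKNOWN),
    SubDocument(2, 3, "ID"),
    SubDocument(4, 9, "Application form"),
    SubDocument(10, 12, "Bank statement"),
]
validate_annotation(packet, page_count=12)
kept = relevant_sub_documents(packet)
print([sd.doc_type for sd in kept])
```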
Train on original, unsplit production document packets, not on pre-split individual documents.
The model learns document-bundling patterns from the context around each document type: what appears before it and after it in a real packet. Training on pre-split documents removes this context and reduces splitting accuracy.
Recommended approach:
- Upload production packets that contain multiple document types.
- Include packets that represent the range of orderings and document counts seen in production.
- Aim for a balanced dataset across all document types.
Model training starts automatically after both of the following conditions are met:
- At least five sub-documents have been created and annotated.
Note: The five sub-documents can come from one or more files. A single PDF must contain at least five sub-documents. With two PDFs, the total must reach five, for example two sub-documents in one file and three in the other.
- At least one document has been confirmed.
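The two conditions can be expressed as a small check. This is a sketch for illustration only: the function name and its inputs are hypothetical, and it assumes that sub-documents are counted across all uploaded files.

```python
def training_can_start(annotated_sub_documents_per_file: dict[str, int],
                       confirmed_documents: int) -> bool:
    """Return True when both conditions listed above are met.

    annotated_sub_documents_per_file maps a file name to the number of
    sub-documents created and annotated in that file (hypothetical input).
    """
    total_sub_documents = sum(annotated_sub_documents_per_file.values())
    return total_sub_documents >= 5 and confirmed_documents >= 1


# One PDF with five sub-documents, confirmed.
print(training_can_start({"packet.pdf": 5}, confirmed_documents=1))
# Two PDFs with two and three sub-documents, one of them confirmed.
print(training_can_start({"a.pdf": 2, "b.pdf": 3}, confirmed_documents=1))
# Only four sub-documents in total.
print(training_can_start({"a.pdf": 2, "b.pdf": 2}, confirmed_documents=1))
```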
The training status is visible in the upper right corner of the Classification pane.
Training data requirements
| Requirement | Details |
|---|---|
| Minimum document types | 1 |
| Minimum total samples | 5 documents across all document types |
| Minimum samples per type | 1 |
| Recommended for reliable results | 50 to 100 packets |
| Maximum document size | 160 MB or 500 pages |
| Train/test split | Automatic: 80% training, 20% test |
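Before uploading, you may want to check a candidate dataset against the figures in the table. The sketch below is one possible way to do that and rests on several assumptions: all function names are hypothetical, a document is assumed to have to satisfy both the size and the page limit, and the rounding used for the 80/20 split is a guess, because this page does not state how the automatic split rounds.

```python
from collections import Counter

# Figures copied from the table above.
MIN_TOTAL_SAMPLES = 5
MIN_SAMPLES_PER_TYPE = 1
RECOMMENDED_MIN_PACKETS = 50
MAX_SIZE_MB = 160
MAX_PAGES = 500


def within_size_limits(size_mb: float, pages: int) -> bool:
    """Assumes a document must satisfy both limits, not just one of them."""
    return size_mb <= MAX_SIZE_MB and pages <= MAX_PAGES


def check_dataset(taxonomy: list[str], sample_types: list[str],
                  packet_count: int) -> list[str]:
    """Return findings; sample_types holds one type name per annotated sample."""
    counts = Counter(sample_types)
    findings = []
    if len(sample_types) < MIN_TOTAL_SAMPLES:
        findings.append(f"only {len(sample_types)} samples, need {MIN_TOTAL_SAMPLES}")
    for doc_type in taxonomy:
        if counts[doc_type] < MIN_SAMPLES_PER_TYPE:
            findings.append(f"no samples for '{doc_type}'")
    if packet_count < RECOMMENDED_MIN_PACKETS:
        findings.append(
            f"{packet_count} packets is below the recommended {RECOMMENDED_MIN_PACKETS}"
        )
    return findings


def approximate_split(sample_count: int) -> tuple[int, int]:
    """Estimate (training, test) counts for an 80/20 split, rounding test down."""
    test = sample_count // 5
    return sample_count - test, test


print(check_dataset(["Invoice", "Receipt"], ["Invoice"] * 4, packet_count=60))
print(approximate_split(50))
```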
Improving training results
When performance is unsatisfactory, use one of these approaches:
- Refine the descriptions and key indicators of underperforming document types.
- Add more training samples for document types with low accuracy.
Whenever a new model is trained, all documents in the project receive predictions from the trained model. This lets you review the performance of the classification model.
The Type column displays the ground truth, that is, the document type as annotated. The Predicted type column shows the type predicted by the model.
By default, only document packets are displayed. To view sub-documents within each packet, select View and check Include sub-documents.
Predictions are also available in the annotation interface by enabling the Show Prediction toggle.
Select the Measure tab to review model performance.
| Metric | What it measures | What to do if low |
|---|---|---|
| Splitting F1 | Accuracy of document boundary detection, independent of classification | Add training data with more varied boundary examples |
| Classification F1 | Accuracy of document type assignment, independent of boundaries | Add more training pages for underperforming document types |
| Overall F1 | Combined score: boundary and type assignment must both be correct | Identify whether splitting or classification is lower and address that first |
A sub-document is counted as correct only when both the boundary detection and the type assignment are correct.
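To build intuition for how the three scores relate, the sketch below computes them for a single packet. The exact formulas used by the product are not given on this page, so every definition here is an assumption: splitting is scored on exact page ranges, classification on per-page type labels, and the overall score on exact range-and-type matches.

```python
def f1(predicted: set, actual: set) -> float:
    """Standard F1 over two sets: harmonic mean of precision and recall."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def page_labels(sub_docs):
    """Expand (first_page, last_page, doc_type) tuples into (page, doc_type) pairs."""
    return {(page, doc_type)
            for first, last, doc_type in sub_docs
            for page in range(first, last + 1)}


def scores(predicted, actual):
    """Illustrative scores; sub-documents are (first_page, last_page, doc_type)."""
    return {
        # Boundaries only: the type is ignored.
        "splitting_f1": f1({(a, b) for a, b, _ in predicted},
                           {(a, b) for a, b, _ in actual}),
        # Types only: compared page by page, so boundaries do not matter.
        "classification_f1": f1(page_labels(predicted), page_labels(actual)),
        # Both the range and the type must match.
        "overall_f1": f1(set(predicted), set(actual)),
    }


actual = [(1, 2, "ID"), (3, 6, "Application form"), (7, 8, "Bank statement")]
# The model places the second boundary one page too early.
predicted = [(1, 2, "ID"), (3, 5, "Application form"), (6, 8, "Bank statement")]
print(scores(predicted, actual))
```

In this example a single misplaced boundary leaves seven of eight pages with the right type, so the classification score stays high while the splitting and overall scores drop. That pattern points to boundary examples, not type descriptions, as the first thing to improve.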
Via IntelligentOCR activities
Use the Document Understanding Project Classifier activity from the IntelligentOCR package. When splitting is enabled in the project, the activity returns multiple ClassificationResults, one per detected sub-document. Iterate over the results to perform validation or extraction on each sub-document.
Via DocumentUnderstanding activities
Use the Classify Document activity.
Via API
Use the classify endpoint. When splitting is enabled in the project version, the endpoint performs splitting and returns classification results for each identified sub-document.
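Whichever route you use, the consuming code receives one result per sub-document and typically routes each one by type. The sketch below shows that routing step only. The result shape (`document_type`, `pages`, `confidence`) and the `min_confidence` threshold are hypothetical and chosen for illustration; they are not the actual response format of the classify endpoint, which this page does not show.

```python
from collections import defaultdict

# Hypothetical, simplified results: one entry per detected sub-document.
sample_results = [
    {"document_type": "Unknown", "pages": [1], "confidence": 0.98},
    {"document_type": "Claim form", "pages": [2, 3], "confidence": 0.95},
    {"document_type": "Receipt", "pages": [4], "confidence": 0.62},
    {"document_type": "Receipt", "pages": [5], "confidence": 0.91},
]


def route_results(results, min_confidence=0.8):
    """Group page lists by document type, skip Unknown, flag uncertain items.

    min_confidence is an illustrative threshold, not a product setting.
    """
    by_type = defaultdict(list)
    needs_review = []
    for result in results:
        if result["document_type"] == "Unknown":
            continue  # irrelevant pages are not processed downstream
        if result["confidence"] < min_confidence:
            needs_review.append(result)
        else:
            by_type[result["document_type"]].append(result["pages"])
    return dict(by_type), needs_review


by_type, needs_review = route_results(sample_results)
print(by_type)
print(needs_review)
```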
Exporting a trained model dataset
If a project version contains a trained splitter classifier, two export options are available:
- Document Type Dataset Export: Standard export of annotated data.
- Splitter and Classifier Export: Full project export including the trained model.
Only project versions with a trained splitter classifier appear in the Splitter and Classifier Export drop-down list.
Importing into a new project
The import option is available on the empty classification page. Importing a zip file assigns documents to their document types and triggers training automatically.