Document Understanding User Guide
Export documents
The Export files dialog box enables you to easily export data for training ML models.
Click the Export button from the management bar.
The dialog box contains three tabs:
The Export now tab allows you to:
- Download to Excel - Download the data locally in an Excel format.
- Download - Download the data locally.
- Export to AI Center - Export the data to AI Center. The exported folders can be found in AI Center under the export folder (Datasets > dataset_name > export).
If no schema is defined, all export options are disabled.
If a schema is defined, you must enter a name for your export; otherwise, the Download and Export buttons are disabled. A valid name can have up to 24 characters and must not contain special characters.
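The naming rule above can be checked before typing a name into the dialog. The following is a minimal sketch, not the product's own validation: the function name `is_valid_export_name` is hypothetical, and because the text does not say which characters count as special, the sketch assumes that only letters, digits, underscores, and hyphens are allowed.

```python
import re

# Limit stated in the text above.
MAX_NAME_LENGTH = 24

# Assumption: "special characters" means anything other than
# letters, digits, underscores, and hyphens. The product may differ.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_export_name(name: str) -> bool:
    """Return True if the name is non-empty, at most 24 characters long,
    and made only of the characters assumed to be allowed."""
    if not 0 < len(name) <= MAX_NAME_LENGTH:
        return False
    return _ALLOWED_NAME.fullmatch(name) is not None
```

For example, `is_valid_export_name("invoices_batch_01")` passes under these assumptions, while a name containing `!` or a name longer than 24 characters does not.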
You can export or download a schema even if it includes multivalued fields.
You can choose to export one of the following options:
- Current search results - the labeled documents filtered by a predefined keyword/named batch or by a text query. If no filter is applied, all labeled documents in the current view are exported.
- All labelled - all documents with at least one labeled field, of any kind; more precisely, the documents from the labelled filter.
- Schema - a zip file containing the fields and their configurations which can be imported into a different Document Manager session.
- All - all documents, whether or not labels are applied.
The Backwards-compatible export checkbox applies the legacy export behavior, which exports each page as a separate document. Try this if a model trained on the default export performs below expectations. Leave the checkbox unchecked to export the documents in their original multi-page form.
To export a dataset, every field must be labeled on at least 10 different pages. Otherwise, the export fails with an error message.
For Classification fields, there is an additional requirement: each option must be labeled in at least one document. Otherwise, the export fails with an error message.
When exporting only Evaluation set data, all validations are disabled.
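The labeling requirements described above can be pre-checked against your own labeling counts before you start an export. This is only a sketch under stated assumptions: the function `find_export_problems`, its input dictionaries, and the wording of the returned messages are all hypothetical and are not the messages that Document Manager displays.

```python
# Threshold stated in the text above: each field on at least 10 pages.
MIN_PAGES_PER_FIELD = 10


def find_export_problems(fields, labeled_pages, classification_options, labeled_options):
    """Return a list of human-readable problems that would block an export.

    All inputs are hypothetical structures you build yourself:
      fields: list of field names in the schema
      labeled_pages: dict mapping field name -> set of page ids where it is labeled
      classification_options: dict mapping classification field -> list of its options
      labeled_options: dict mapping classification field -> set of options
                       labeled in at least one document
    """
    problems = []

    # Rule 1: every field must be labeled on at least 10 different pages.
    for field in fields:
        page_count = len(labeled_pages.get(field, set()))
        if page_count < MIN_PAGES_PER_FIELD:
            problems.append(
                f"{field}: labeled on {page_count} pages, needs {MIN_PAGES_PER_FIELD}"
            )

    # Rule 2: every option of a classification field must be labeled at least once.
    for field, options in classification_options.items():
        seen = labeled_options.get(field, set())
        for option in options:
            if option not in seen:
                problems.append(f"{field}: option '{option}' is not labeled in any document")

    return problems
```

An empty list means both rules are satisfied for the counts you supplied. As noted above, these validations do not apply when exporting only Evaluation set data.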
The exported dataset coming from Document Manager is a folder that includes:
- schema.json: a file containing the fields to be extracted and their types
- split.csv: a file containing the split per each document that will be used either for TRAIN or VALIDATE during the Training Pipeline
- images: a folder containing images of all the labeled pages
- latest: a folder containing .json files with the labeled data from each page
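After downloading and unpacking an export, a quick sanity check of the folder layout listed above can catch an incomplete dataset before training. The sketch below is illustrative: `summarize_export` is a hypothetical helper, and because the column layout of split.csv is not described here, it makes no assumption about columns and simply counts rows in which any cell equals TRAIN or VALIDATE.

```python
import csv
import json
from pathlib import Path


def summarize_export(export_dir):
    """Summarize an unpacked export folder (schema.json, split.csv, images/, latest/)."""
    root = Path(export_dir)

    # schema.json holds the fields to extract and their types; only parse it here.
    with open(root / "schema.json", encoding="utf-8") as f:
        schema = json.load(f)

    # split.csv assigns each document to TRAIN or VALIDATE.
    # The column layout is not assumed: a row counts if any cell matches.
    with open(root / "split.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    train_rows = sum(1 for row in rows if any(c.strip() == "TRAIN" for c in row))
    validate_rows = sum(1 for row in rows if any(c.strip() == "VALIDATE" for c in row))

    # images/ holds the labeled page images; latest/ holds one .json per labeled page.
    image_count = sum(1 for p in (root / "images").iterdir() if p.is_file())
    label_count = len(list((root / "latest").glob("*.json")))

    return {
        "schema": schema,
        "train_rows": train_rows,
        "validate_rows": validate_rows,
        "image_count": image_count,
        "label_count": label_count,
    }
```

Calling `summarize_export("path/to/unpacked/export")` returns a dictionary with the parsed schema and the four counts; a large mismatch between `image_count` and `label_count` may be worth investigating before running a pipeline.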
The Schedule Export feature is documented here.
The Logs tab displays the latest log on export.
In case of a successful export, the log shows the number of processed documents and the export duration.
In case of a successful schema export, the log shows the export duration.
During the file export, you can check the status of the export. This is particularly useful for large exports.
Error messages are also displayed in the Logs tab.
In case of a successful auto-retraining, the import logs from the fine-tune folder of the dataset are displayed as well.