Document Understanding User Guide
Import documents
The Import data dialog box enables you to easily import new documents to be labeled or revised.
Click the Import button from the management bar.
The dialog box contains the following controls:
- Batch name text field - a name for your import is mandatory; until you enter one, the Browse or drop files section is disabled. A valid name can have up to 24 characters and must not contain special characters.
- Make this an evaluation set checkbox - if selected, the dataset is used for evaluation purposes.
- Browse or drop files section - click Browse files to navigate through your directory, or drag and drop the files inside the frame.
- Status section - click (load previous import log) to check the status of the latest import. When you upload data, the Status section shows an overview of your files and prompts you to proceed with the import by clicking YES or to abort it by clicking CANCEL.
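The batch name rule can be checked before you open the dialog box. The sketch below is only an illustration: the documentation does not list which characters count as special, so it assumes that letters, digits, spaces, hyphens, and underscores are allowed. The helper is_valid_batch_name is hypothetical and not part of Document Manager.

```python
import re

MAX_BATCH_NAME_LENGTH = 24  # limit stated for the Batch name text field

# Assumption: only letters, digits, spaces, hyphens, and underscores are allowed.
_ALLOWED = re.compile(r"[A-Za-z0-9 _-]+")


def is_valid_batch_name(name: str) -> bool:
    """Return True if the name is non-empty, at most 24 characters, and has no special characters."""
    if not name or len(name) > MAX_BATCH_NAME_LENGTH:
        return False
    return _ALLOWED.fullmatch(name) is not None
```

If Document Manager accepts a wider or narrower character set, only the pattern needs to change.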
Document Manager supports four types of import:
- Schema import
- Raw documents import (max 2000 pages and 4000 MiB per import)
- Document Manager dataset import (4000 MiB per import)
- Validation Station dataset import (max 2000 pages and 4000 MiB per import)
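A quick pre-check of the size limit can avoid a rejected upload. The following is a minimal sketch, assuming the 4000 MiB limit is inclusive and applies to the combined size of the files on disk. It does not check the 2000-page limit, since counting pages requires a document parser. Both helper names are hypothetical.

```python
from pathlib import Path

MIB = 1024 * 1024
MAX_IMPORT_BYTES = 4000 * MIB  # 4000 MiB per import, as listed above


def total_size_bytes(paths):
    """Sum the on-disk size, in bytes, of the given files."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_size_limit(size_bytes: int) -> bool:
    """Return True if a batch of this total size is within the 4000 MiB limit (assumed inclusive)."""
    return size_bytes <= MAX_IMPORT_BYTES
```

For example, fits_size_limit(total_size_bytes(files)) tells you whether a candidate batch should be divided before import.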
If you would like to launch a new Document Manager session using the same schema as in an existing session, you can follow these steps:
- Click the Export button from the management bar.
- In the Export files dialog box, check the Schema option.
- Click the Export button inside the dialog box. A .zip file is exported.
- Click the Import button from the management bar.
- Upload or drag and drop the .zip file directly into the new Document Manager session (do not unzip). In this step, you can also upload a predefined schema.
- Click YES in the Status section to proceed with the import. The schema is imported.
Schema import can also be applied for multi-value fields.
Raw documents import accepts the following file types: .pdf, .tiff, .png, .jpg. .zip files are not supported for raw documents import.
OCR settings need to be configured before import.
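Before a raw documents import, it can help to separate the files Document Manager accepts from those it does not. The sketch below uses only the four extensions named above. Matching extensions case-insensitively is an assumption, and variants such as .jpeg or .tif are treated as unsupported because they are not listed. The helper split_by_support is hypothetical.

```python
from pathlib import Path

# Extensions listed for raw documents import; .zip is explicitly not supported.
SUPPORTED_EXTENSIONS = {".pdf", ".tiff", ".png", ".jpg"}


def split_by_support(filenames):
    """Partition file names into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for name in filenames:
        # Assumption: the extension check ignores letter case.
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```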
Follow the steps below:
- Use the .zip file that was originally exported and import it directly into the new Document Manager instance.
If your new Document Manager instance is completely empty (no data and no fields defined), then both the documents with labels and the schema are imported.
If your new Document Manager instance already has fields defined, then the newly imported dataset needs to have the same fields, or a subset of those fields. Otherwise, the import is rejected.
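The acceptance rule above amounts to a subset test on field names. The sketch below illustrates it under the assumption that fields are compared by name only; the actual check may also compare field types or other properties. The helper can_import_dataset and the field names in the usage note are hypothetical.

```python
def can_import_dataset(existing_fields, incoming_fields) -> bool:
    """Return True if the incoming dataset's fields are acceptable for the target instance.

    An instance with no fields accepts any dataset; otherwise the incoming
    fields must be the same as, or a subset of, the existing fields.
    """
    existing = set(existing_fields)
    incoming = set(incoming_fields)
    if not existing:
        return True
    return incoming <= existing
```

For instance, an instance with the fields total and date accepts a dataset that defines only total, but rejects one that also defines vendor.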
- Unzip the dataset file.
- Edit the schema.json file from the archive.
- Remove all display_name properties from the JSON file, then save it.
- Zip the dataset back, and import it into the on-premises session.
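Removing the display_name properties by hand is error-prone for a large schema, so the edit step can be scripted. The following is a minimal sketch that removes every display_name key at any nesting depth. The layout of the schema file is not described here, so the code makes no assumption about where the properties appear. Both function names are hypothetical.

```python
import json


def strip_display_names(node):
    """Recursively drop every "display_name" key from a parsed JSON value."""
    if isinstance(node, dict):
        return {
            key: strip_display_names(value)
            for key, value in node.items()
            if key != "display_name"
        }
    if isinstance(node, list):
        return [strip_display_names(item) for item in node]
    return node


def clean_schema_file(path):
    """Rewrite the schema file at the given path without display_name properties."""
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_display_names(schema), f, indent=2, ensure_ascii=False)
```

Run clean_schema_file on the schema file inside the unzipped dataset folder, then zip the folder again before importing.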
Split large dataset .zip files into multiple .zip files that are smaller than 1 GB and contain fewer than 1500 files.
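One way to plan such a split is to group files greedily until either limit would be reached. The sketch below is an approximation under stated assumptions: it reads 1 GB as 10**9 bytes and uses uncompressed file sizes as a stand-in for the final .zip size. It also treats each item as independent, whereas a real dataset may keep several related files per document that must stay in the same archive. The function plan_batches is hypothetical.

```python
MAX_BATCH_BYTES = 1_000_000_000  # assumption: "1 GB" read as 10**9 bytes
MAX_BATCH_FILES = 1500           # each batch must hold fewer files than this


def plan_batches(files):
    """Greedily group (name, size_in_bytes) pairs into batches that stay under both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        if size >= MAX_BATCH_BYTES:
            raise ValueError(f"{name} alone reaches the size limit")
        # Start a new batch if adding this file would reach either limit.
        if current and (
            current_bytes + size >= MAX_BATCH_BYTES
            or len(current) + 1 >= MAX_BATCH_FILES
        ):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each returned list of names can then be written to its own .zip file and imported separately.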
As your RPA workflow processes documents using an existing ML model, some documents may require human validation using the Validation Station activity (available on attended bots or in the browser using Orchestrator Action Center).
The validated data generated in Validation Station can be exported using Machine Learning Extractor Trainer activity and can be used to train ML models using the feature described below.
Follow the steps below:
- Configure the Machine Learning Extractor Trainer to output data into a folder with path <Trainer/Output/Folder> (use any empty folder path).
- Run an RPA workflow including Validation Station and Machine Learning Extractor Trainer.
- Machine Learning Extractor Trainer creates three subfolders: documents, metadata, and predictions inside of the output folder.
- Zip the <Trainer/Output/Folder> to obtain a .zip file, for instance TrainerOutputFolder.zip.
- Import the .zip file into Document Manager, which detects that the import contains data produced by Machine Learning Extractor Trainer and imports the data accordingly.
If there are missing fields required by the dataset, an error message is displayed in the import dialog box.
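The zipping step for the Trainer output folder can be scripted with the Python standard library. The sketch below first checks that the three subfolders exist, then creates the archive. It assumes the subfolders should sit at the top level of the .zip file, which the steps above do not state explicitly. The function zip_trainer_output is hypothetical.

```python
import shutil
from pathlib import Path

EXPECTED_SUBFOLDERS = ("documents", "metadata", "predictions")


def zip_trainer_output(output_folder, archive_base):
    """Zip the Trainer output folder after checking that the expected subfolders exist.

    archive_base is the target path without the .zip extension and should be
    located outside output_folder. Returns the path of the created .zip file.
    """
    folder = Path(output_folder)
    missing = [name for name in EXPECTED_SUBFOLDERS if not (folder / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing subfolders in {folder}: {', '.join(missing)}")
    # make_archive appends ".zip" to archive_base and returns the full archive path.
    # Assumption: the three subfolders belong at the top level of the archive.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))
```

Calling zip_trainer_output with the Trainer output folder and a base name such as TrainerOutputFolder produces TrainerOutputFolder.zip, ready to import.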