UiPath Document Understanding

Import Documents

AI Center does not support filenames that contain special characters. Before importing documents into Data Manager, we strongly recommend making sure their names contain only Latin letters, digits, dashes (-), and underscores (_).
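You can check the recommendation above before uploading with a small script. The following is a minimal Python sketch and is not part of Data Manager or AI Center. The helper names `is_safe_filename` and `sanitize_filename` are hypothetical. The sketch assumes the rule applies to the name without its extension, so the dot before the extension is allowed.

```python
import re
from pathlib import PurePath

# Allowed characters per the recommendation: Latin letters, digits, dash, underscore.
# Assumption: the rule applies to the stem; the extension (e.g. ".pdf") is kept as-is.
_ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")
_DISALLOWED_CHAR = re.compile(r"[^A-Za-z0-9_-]")


def is_safe_filename(name: str) -> bool:
    """Return True if the file stem uses only the recommended characters."""
    stem = PurePath(name).stem
    return bool(_ALLOWED_STEM.fullmatch(stem))


def sanitize_filename(name: str) -> str:
    """Replace every character outside the recommended set with '_'."""
    path = PurePath(name)
    return _DISALLOWED_CHAR.sub("_", path.stem) + path.suffix


# Example usage (hypothetical filenames)
print(is_safe_filename("invoice_2023-01.pdf"))      # True
print(sanitize_filename("Rechnung März 2023.pdf"))  # Rechnung_M_rz_2023.pdf
```

The sketch replaces disallowed characters with `_` instead of deleting them. This keeps names such as `a b.pdf` and `ab.pdf` from colliding after renaming.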

Data Manager supports four types of import:

  • Schema Import
  • Raw documents import
  • Data Manager dataset import
  • Machine Learning Extractor Trainer dataset import (PREVIEW feature)

Schema import

To launch a new Data Manager instance that uses the same schema as an existing instance, follow these steps:

  1. In the existing instance, enter a random string in the filter so that no documents remain in the view.
  2. Click Export. A zip file is exported.
  3. Import the zip file directly into the new Data Manager instance, without unzipping it. The schema is imported.

You may also use one of the predefined schemas provided in the Configuring Data Manager section of this documentation.

Raw documents import

The following document types can be imported for labeling: .pdf, .tiff, .png, and .jpg. To import them:

  1. Click Import. The Import Data window is displayed.
  2. Provide a batch name in the Batch Name field. This enables you to easily filter and find these documents using the Filter drop-down later on.
  3. If you want to use this document batch for training an ML model, leave the Make this a test set checkbox unselected.
  4. If you want to use this document batch for evaluating an ML model (i.e. measuring its performance), select the Make this a test set checkbox. This ensures the data is ignored by the training pipelines.
  5. Upload or drag & drop a file or set of files into the Browse or drop files section.
    You can upload any type of file. The application inspects the files and indicates how many of them can be imported. .zip files are also accepted: the application unzips the archive and goes through its folders recursively to find all files inside.
    Importing a dataset zip file exported from another Data Manager instance imports the documents together with their labels. This works only if the dataset schema is the same as, or a subset of, the schema already defined in this Data Manager instance.
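Before uploading, you can estimate how many files in a local batch match the supported types. The hypothetical Python sketch below approximates the inspection described above and does not reproduce Data Manager's actual logic. It assumes extensions are matched case-insensitively against the four listed types. It does not expand zip files nested inside other zip files.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Document types listed above; case-insensitive matching is an assumption.
SUPPORTED_SUFFIXES = {".pdf", ".tiff", ".png", ".jpg"}


def is_supported(name: str) -> bool:
    """Return True if the file name ends with one of the supported extensions."""
    return PurePosixPath(name).suffix.lower() in SUPPORTED_SUFFIXES


def count_importable(paths: list[str]) -> tuple[int, int]:
    """Return (importable, total) for a mix of plain files and .zip archives.

    Every file entry in every folder of a .zip archive is inspected;
    archives nested inside a zip are counted as plain (unsupported) files.
    """
    importable = total = 0
    for p in map(Path, paths):
        if p.suffix.lower() == ".zip":
            with zipfile.ZipFile(p) as archive:
                for info in archive.infolist():
                    if info.is_dir():  # skip folder entries, count only files
                        continue
                    total += 1
                    importable += is_supported(info.filename)
        else:
            total += 1
            importable += is_supported(p.name)
    return importable, total
```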

Data Manager dataset import

To import a dataset that was previously labeled in another Data Manager instance, obtain the zip file that was originally exported from that instance and import it directly into the new Data Manager instance. If the new instance is completely empty (no data and no fields defined), both the data and the schema are imported. If the new instance already has fields defined, the imported dataset must have the same fields or a subset of them. Otherwise, the import is rejected.
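The acceptance rule for dataset imports can be summarized as a small decision function. The Python sketch below is illustrative only. `dataset_import_outcome` and `ImportOutcome` are hypothetical names, and comparing schemas by field name alone is a simplifying assumption, since a real schema carries more than names. The rule above does not describe an instance that has documents but no fields, so the sketch raises an error for that case instead of guessing.

```python
from enum import Enum


class ImportOutcome(Enum):
    DATA_AND_SCHEMA = "data and schema imported"
    DATA_INTO_EXISTING_SCHEMA = "data imported into the existing schema"
    REJECTED = "import rejected"


def dataset_import_outcome(instance_fields: set[str],
                           instance_has_documents: bool,
                           dataset_fields: set[str]) -> ImportOutcome:
    """Apply the dataset import rule, comparing schemas by field name only."""
    if not instance_fields:
        if not instance_has_documents:
            # Completely empty instance: both data and schema are imported.
            return ImportOutcome.DATA_AND_SCHEMA
        # Documents but no fields: not covered by the documented rule.
        raise ValueError("behavior not described for documents without fields")
    if dataset_fields <= instance_fields:
        # Same fields or a subset of the existing fields.
        return ImportOutcome.DATA_INTO_EXISTING_SCHEMA
    return ImportOutcome.REJECTED
```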

Validation Station dataset import (Preview feature)

As your RPA workflow processes documents with an existing ML model, some documents may require human validation in the Validation Station activity (available on attended bots or in the browser through Orchestrator Action Center).

The validated data produced in Validation Station can be exported with the Machine Learning Extractor Trainer activity and then used to train ML models, as described below.

The steps involved are:

  1. Configure ML Extractor Trainer to write its output to an empty folder, referred to here as <Trainer/Output/Folder>.
  2. Run the RPA workflow that includes Validation Station and ML Extractor Trainer.
  3. ML Extractor Trainer creates three subfolders inside the output folder: documents, metadata, and predictions.
  4. Zip the <Trainer/Output/Folder> to obtain a zip file such as
  5. Import the zip file into Data Manager. Data Manager detects that the import contains data produced by ML Extractor Trainer and imports it accordingly.
  6. Export the data as usual and upload it to AI Center.
  7. Launch a Training pipeline or a Full pipeline, making sure to select the ML Package and version that you want to fine-tune.
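Steps 3 and 4 can be scripted. The Python sketch below checks that the three subfolders exist and then zips the folder with `shutil.make_archive`. The function name `zip_trainer_output` is hypothetical. Placing the subfolders at the root of the archive is an assumption about the layout Data Manager expects. Write the archive outside <Trainer/Output/Folder> so that it is not included in itself.

```python
import shutil
from pathlib import Path

# Subfolders that ML Extractor Trainer creates (step 3 above).
EXPECTED_SUBFOLDERS = ("documents", "metadata", "predictions")


def zip_trainer_output(output_folder: str, archive_base: str) -> str:
    """Verify the trainer output folder and zip it for import into Data Manager.

    Assumption: the three subfolders sit at the root of the archive.
    `archive_base` is the archive path without ".zip"; keep it outside
    `output_folder`. Returns the path of the created .zip file.
    """
    root = Path(output_folder)
    missing = [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing subfolders in {root}: {', '.join(missing)}")
    # root_dir makes archive entries relative to the output folder.
    return shutil.make_archive(archive_base, "zip", root_dir=str(root))
```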
