UiPath AI Fabric

Using Data Manager

Importing

Data Manager supports four types of import:

  • Schema Import
  • Raw documents import
  • Data Manager dataset import
  • Machine Learning Extractor Trainer dataset import (PREVIEW feature)

Schema import

If you would like to launch a new instance of Data Manager using the same schema as an existing instance, you can follow these steps:

  1. Enter a random string in the filter of the existing instance, so that no documents remain in the view.
  2. Click the Export button. A zip file is exported.
  3. Import the zip file directly into the new instance of Data Manager (do not unzip it). The schema is imported.

You may also use one of the predefined schemas provided in the Configuring Data Manager section of this documentation.
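Before importing, you can sanity-check that the exported zip is intact, since it must be imported without being unzipped. A minimal Python sketch (the `check_export_zip` helper is illustrative, not part of Data Manager):

```python
import zipfile

def check_export_zip(path: str) -> int:
    """Verify an exported Data Manager zip is readable before importing
    it directly (unextracted) into another instance. Returns the number
    of entries in the archive."""
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # returns the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in export: {bad}")
        return len(zf.namelist())
```

This only confirms the archive itself is valid; Data Manager still validates the schema on import.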

Raw documents import

The types of documents that can be imported for labeling are: .pdf, .tiff, .png, .jpg. The steps are:

  1. Click Import. The Import Data window is displayed.
  2. Provide a batch name in the Batch Name field. This enables you to easily filter and find these documents using the Filter drop-down later on.
  3. If you want to use this document batch for training an ML model, leave the Make this a test set checkbox unselected.
  4. If you want to use this document batch for evaluating an ML model (i.e. measuring its performance), select the Make this a test set checkbox. This ensures the data is ignored by the training pipelines.
  5. Upload or drag & drop a file or set of files into the Browse or drop files section.
    Any type of file is accepted. The application inspects them and indicates how many of them can be imported. .zip files are also accepted. The application unzips the archive and goes through folders recursively to find all files inside.
    Importing a dataset zip file exported from another Data Manager instance will import the documents with the labels. This works only if the dataset schema is the same or is a subset of the pre-existing schema in the Data Manager.
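The recursive scan described in step 5 can be mirrored locally to estimate how many files in a folder will be importable. A minimal sketch (the `count_importable` helper is an illustration, not Data Manager code):

```python
import os

# File types Data Manager accepts for labeling
IMPORTABLE_EXTENSIONS = {".pdf", ".tiff", ".png", ".jpg"}

def count_importable(root: str) -> int:
    """Walk `root` recursively, the way Data Manager walks an unzipped
    archive, and count files that can be imported for labeling."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in IMPORTABLE_EXTENSIONS:
                count += 1
    return count
```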

Data Manager dataset import

To import a dataset that was previously labeled on another instance of Data Manager, take the zip file which was originally exported and import it directly into the new Data Manager instance. If your new Data Manager instance is completely empty (no data and no fields defined), both the data and the schema are imported. If your new Data Manager instance already has fields defined, the newly imported dataset needs to have the same fields, or a subset of those fields; otherwise the import is rejected.
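The schema-compatibility rule above amounts to a subset check. A hedged sketch (field names are hypothetical; Data Manager performs this validation itself on import):

```python
def import_allowed(existing_fields: set[str], imported_fields: set[str]) -> bool:
    """An import is accepted when the target instance is empty, or when
    the imported dataset's fields are the same as, or a subset of, the
    fields already defined in the target instance."""
    if not existing_fields:  # empty instance: the schema is imported too
        return True
    return imported_fields <= existing_fields
```

For example, importing a dataset with fields `{"total", "date"}` into an instance that defines `{"total", "date", "vendor"}` is accepted, while one introducing a new field is rejected.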

Machine Learning Extractor Trainer dataset import (PREVIEW feature)

As your RPA workflow processes documents using an existing ML model, some documents may require human validation using the Validation Station activity (available on attended bots or in the browser using Orchestrator Action Center). The validated data generated in Validation Station can be exported using Machine Learning Extractor Trainer activity, and can be used to train ML models using the feature described here. The steps involved are:

  1. Configure ML Extractor Trainer to output data into a folder with path <Trainer/Output/Folder> (use any empty folder path).
  2. Run RPA workflow including Validation Station and ML Extractor Trainer.
  3. ML Extractor Trainer will create 3 subfolders named: documents, metadata and predictions inside of the output folder.
  4. Zip the <Trainer/Output/Folder> to obtain a zip file such as TrainerOutputFolder.zip
  5. Import zip file into Data Manager. The Data Manager will detect that the import contains data produced by ML Extractor Trainer and will import the data accordingly.
  6. Export data as usual, and upload to AI Fabric.
  7. Launch Training pipeline or Full pipeline and make sure to select the ML Package and version which you would like to fine tune.

Adding and Configuring Fields

Fields cannot be deleted or renamed, so please think carefully before adding new fields. If, however, there are fields which you later decide you do not want to use for training an ML model, you can always hide them using the Hidden checkbox in the Edit Field window.
Click here for details about fields, their meaning and when to use them.

Column Fields

A line item Description or Unit Price on an invoice document would be examples of Column fields.

  1. Click in the table section at the top of the page to add a new Column field. The Create Column Field window is displayed.
  2. In the Enter Unique Field Name field, fill in a unique name for the field. The field does not accept uppercase letters.
  3. Click Create. The Edit Field window is displayed.
  4. From the Content Type drop-down, select the content type.
  5. From the Scoring drop-down, select the measure used to determine accuracy when running evaluations of model predictions.
  6. Click the Hotkey field and press a key on your keyboard to automatically populate it.
  7. Fill in the hex code of the desired field color on the Color field.
  8. Select the Multi line checkbox if the field to be checked against might span across multiple text lines, such as addresses or descriptions. If this option is not selected, only the first line is returned.
  9. Select the Split items checkbox if you want this field to be used as a delimiter between line items or rows in a table. Any line on which this field appears is considered to be a new line item or row in the table. Most commonly this is used on Line Amount fields on Invoice line items.
  10. Select the Hidden checkbox if you do not want this field to be part of exported datasets.
  11. Click Save to save your settings.

Regular Fields

These are fields which appear only once on a given document. The Invoice Number or Total Amount on an invoice document would be examples of Regular fields.

  1. Click on the right pane in the Regular Fields section. The Create Regular Field window is displayed.
  2. Fill in a unique name for the field in the Enter Unique Field Name field. The field does not accept uppercase letters.
  3. Click Create. The Edit Field window is displayed.
  4. Select the content type from the Content Type drop-down.
  5. Select the post processing mechanism in case the model predicts more than one instance of a field on a given page from the Post processing drop-down.
  6. Click the Hotkey field and press a key on your keyboard to automatically populate it.
  7. In the Color field, fill in the hex code of the desired field color.
  8. From the Multi page drop-down, select the data retrieval strategy. When a field appears on several different pages of a multi-page document, this option defines which occurrence the model returns.
  9. From the Scoring drop-down, select the measure used to determine accuracy when running evaluations of model predictions.
  10. Select the Multi line checkbox if the field to be checked against might span across multiple text lines, such as addresses or descriptions. If this option is not selected, only the first line is returned.
  11. Select the Hidden checkbox if you do not want this field to be part of exported datasets.
  12. Click Save to save your settings.

Classification Fields

Data points which refer to a document as a whole. For instance, the Expense Type of a receipt (Food, Hotel, Airline, Transportation) or the Currency of an invoice (USD, EUR, JPY) would be examples of Classification fields.

  1. Click on the right pane in the Classification Fields section. The Create Classification Field window is displayed.
  2. Fill in a unique name for the field in the Enter Unique Field Name field. The field does not accept uppercase letters.
  3. Click Create. The Edit Field window is displayed.
  4. In the text area, fill in the class names as a comma-separated list.
  5. Click Save to save your settings.

Labeling Data


See below the main actions you need to perform when labeling documents. A given field may be labeled in multiple places on the same page.

  1. Label field
    • Select words by dragging mouse (rubber banding) or by clicking on them, holding down Shift to select multiple words.
    • Tap the shortcut key to label the field
  2. Remove label
    • Select words, then tap the Delete or Backspace key on your keyboard.
  3. Group table row
    • After you have labeled some Column fields, and only if some rows span multiple lines of text, then you may group them together by using the “/” key to indicate that they are part of the same table row. A green box will appear around the group.
  4. Ungroup table row
    • Select the group and tap “/” again
  5. Make correction to OCR
    • Right-click on the word and edit the text in the tooltip that appears
  6. Make correction to labeled value
    • Click on the text in the sidebar or the top bar and edit the content. A small lock will appear to indicate the field has been manually edited.
  7. Reset labeled value to auto-extracted value
    • Click on the lock, and the field will revert to its auto-extracted value.

Exporting Labeled Documents

A labeled image is an image with at least one labeled field, of any kind. You can see how many images are visible at the top-left of the page. The Export button enables you to easily export data for training ML models.

Exporting labeled documents takes the active filter into consideration.

  • If no filter is applied, all labeled images visible in the current view, except for test set images, are exported.
  • If a filter is applied, all labeled images visible in the view, including test set images, are exported.
  • To export all test set images, select the test-set option from the Filter drop-down.

🚧

Export requirements

Exporting a dataset requires that the following conditions be satisfied:

  • each Regular or Column field is labeled on at least 10 different images
  • each class of any one Classification field appears at least once
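If you track label counts yourself, these requirements can be checked programmatically before attempting an export. A sketch assuming hypothetical per-field counters (this is not a Data Manager API; Data Manager enforces the rules itself on export):

```python
def export_ready(field_image_counts: dict[str, int],
                 class_counts: dict[str, dict[str, int]]) -> list[str]:
    """Return a list of problems blocking export; an empty list means
    the dataset meets the export requirements.

    field_image_counts: number of images on which each Regular or
        Column field is labeled.
    class_counts: per Classification field, occurrences of each class.
    """
    problems = []
    for field, n_images in field_image_counts.items():
        if n_images < 10:
            problems.append(
                f"field '{field}' labeled on only {n_images} images (need 10)")
    for field, classes in class_counts.items():
        for cls, n in classes.items():
            if n < 1:
                problems.append(f"class '{cls}' of field '{field}' never appears")
    return problems
```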

Uploading dataset to AI Fabric

The dataset is exported as a zip file together with a log file. Before you can use it in AI Fabric, you need to unzip the file. The extracted folder can then be uploaded as a new dataset or as a subfolder on an existing dataset as described here.
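The unzip step can be scripted before uploading. A minimal sketch (the `unzip_export` helper is illustrative):

```python
import zipfile

def unzip_export(zip_path: str, dest_folder: str) -> None:
    """Extract an exported dataset zip so the resulting folder can be
    uploaded to AI Fabric as a dataset (or a subfolder of one)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_folder)
```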
