UiPath Document Understanding

Export documents

The Export files dialog box lets you export data for training ML models.

To open it, click the Export button in the management bar.

The dialog box contains three tabs: Export now, Schedule, and Logs.

Export now

The Export now tab allows you to:

  • Download to Excel - Download the data locally in an Excel format.
  • Download - Download the data locally.
  • Export to AI Center - Export the data to AI Center. The exported folders can be found in AI Center under the export folder (Datasets > dataset_name > export).

Note:

The Download to Excel function cannot be used if the Schema option or the Backwards-compatible export option is selected.

If no schema is defined, all export options are disabled.

If a schema is defined, you must enter a name for your export; otherwise, the Download and Export buttons are disabled. A valid name has at most 24 characters and must not contain special characters.

You can export or download a schema even if it includes multivalued fields.
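The availability rules described so far can be modeled as a small function. This is a sketch under stated assumptions, not Document Manager's implementation: the ExportDialogState type and the enabled_actions and is_valid_export_name functions are hypothetical, "special characters" is assumed to mean anything other than ASCII letters, digits, hyphens, and underscores, and Download to Excel is assumed to require a valid name as well, which this page does not state.

```python
import string
from dataclasses import dataclass

MAX_NAME_LENGTH = 24
# Assumption: letters, digits, "-" and "_" are the only non-special characters.
ALLOWED_NAME_CHARS = set(string.ascii_letters + string.digits + "-_")


def is_valid_export_name(name: str) -> bool:
    """Non-empty, at most 24 characters, and only characters from the assumed allowed set."""
    return 0 < len(name) <= MAX_NAME_LENGTH and all(c in ALLOWED_NAME_CHARS for c in name)


@dataclass
class ExportDialogState:
    # Hypothetical model of the dialog inputs; field names are illustrative only.
    schema_defined: bool
    export_name: str
    option: str  # "Current search results", "All labelled", "Schema", or "All"
    backwards_compatible: bool


def enabled_actions(state: ExportDialogState) -> set:
    """Return the Export now actions that the documented rules would leave enabled."""
    if not state.schema_defined:
        return set()  # no schema defined: all export options are disabled
    if not is_valid_export_name(state.export_name):
        return set()  # missing or invalid name: Download and Export stay disabled
    actions = {"Download", "Export to AI Center"}
    # Download to Excel is unavailable for Schema or Backwards-compatible exports.
    if state.option != "Schema" and not state.backwards_compatible:
        actions.add("Download to Excel")
    return actions
```

For example, a state with a defined schema, the name invoices_v2, the All labelled option, and the checkbox cleared would leave all three actions enabled, while selecting the Schema option would remove only Download to Excel.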

You can choose one of the following export options:

  • Current search results - the labeled documents filtered by a predefined keyword/named batch or by a text query. If no filter is applied, all labeled documents in the current view are exported.
  • All labelled - all documents with at least one labeled field of any kind, that is, the documents matched by the labelled filter.
  • Schema - a ZIP file containing the fields and their configurations, which can be imported into a different Document Manager session.
  • All - all documents, whether or not they are labeled.
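The sketch below restates the option list as a selection function, to make explicit which documents each option includes. The Document record and select_documents function are hypothetical, "labeled" is taken to mean at least one labeled field of any kind, and search_results stands in for whatever documents the active keyword, named batch, or text query matches.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document record; real Document Manager data is richer.
    name: str
    labeled_fields: list = field(default_factory=list)


def select_documents(docs, option, search_results=None):
    """Return the documents a given export option would include (illustrative only)."""
    if option == "All":
        return list(docs)  # labeled or not
    if option == "All labelled":
        return [d for d in docs if d.labeled_fields]
    if option == "Current search results":
        # With no filter applied, fall back to all labeled documents in the current view.
        pool = docs if search_results is None else search_results
        return [d for d in pool if d.labeled_fields]
    if option == "Schema":
        return []  # a schema export carries field definitions, not documents
    raise ValueError(f"unknown export option: {option!r}")
```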

The Backwards-compatible export checkbox applies the legacy export behavior, which exports each page as a separate document. Try this option if a model trained on the default export performs below expectations. Leave the checkbox cleared to export documents in their original multi-page form.
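To illustrate the difference between the two behaviors, the following sketch computes the units each mode would export from a map of document names to page counts. The export_units function and the "#page" naming of split documents are hypothetical; only the one-document-per-page versus whole-document distinction comes from the description above.

```python
def export_units(page_counts, backwards_compatible):
    """Return (unit name, page numbers) pairs; names are illustrative, not the real export format."""
    units = []
    for name, pages in page_counts.items():
        if backwards_compatible:
            # Legacy behavior: every page becomes its own single-page document.
            units.extend((f"{name}#page{p}", [p]) for p in range(1, pages + 1))
        else:
            # Default behavior: the document keeps all of its pages together.
            units.append((name, list(range(1, pages + 1))))
    return units
```

With the checkbox selected, a six-page document yields six single-page units; with it cleared, it stays one six-page unit.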

Warning

The 2021.10 release of Document Manager supports labeling multi-page documents. This is a major change from previous releases, in which each page was labeled separately. Labeling and exporting multi-page documents assumes that each document represents a single logical document. For instance, a six-page document may contain a single six-page invoice, but it should not contain three different two-page invoices. This is particularly important for evaluation sets.

This requirement does not apply to Backwards-compatible exports.

Export validation

To export a dataset, every field must be labeled on at least 10 different pages. Otherwise, the export fails and an error message is displayed.

Classification fields have an additional requirement: each option must be labeled in at least one document. Otherwise, the export fails and an error message is displayed.

When exporting only Evaluation set data, all validations are disabled.
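The validation rules above can be checked ahead of time with a sketch like the following. It is not Document Manager's validator: the validate_export function, the (document, page, field, value) label tuples, and the problem messages are hypothetical, and the messages do not reproduce the actual error text. A "page" is identified here by its document and page number, so that pages from different documents count separately.

```python
from collections import defaultdict

MIN_PAGES_PER_FIELD = 10


def validate_export(labels, schema_fields, classification_options, evaluation_only=False):
    """
    labels: iterable of (document_id, page_number, field_name, value) tuples.
    schema_fields: names of all fields in the schema.
    classification_options: dict mapping each Classification field to its options.
    Returns a list of problems; an empty list means the export would pass.
    """
    if evaluation_only:
        return []  # validations are disabled for evaluation-set-only exports

    pages_per_field = defaultdict(set)
    docs_per_option = defaultdict(set)
    for doc_id, page, field_name, value in labels:
        pages_per_field[field_name].add((doc_id, page))
        if field_name in classification_options:
            docs_per_option[(field_name, value)].add(doc_id)

    problems = []
    for name in schema_fields:
        count = len(pages_per_field[name])
        if count < MIN_PAGES_PER_FIELD:
            problems.append(f"field {name!r} is labeled on {count} pages; {MIN_PAGES_PER_FIELD} required")
    for name, options in classification_options.items():
        for option in options:
            if not docs_per_option[(name, option)]:
                problems.append(f"option {option!r} of field {name!r} is not labeled in any document")
    return problems
```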

Dataset format

The dataset exported from Document Manager is a folder that contains:

  • schema.json: a file containing the fields to be extracted and their types
  • split.csv: a file that assigns each document to either the TRAIN or the VALIDATE split used by the Training Pipeline
  • images: a folder containing images of all the labeled pages
  • latest: a folder containing .json files with the labeled data from each page
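Before using an export, you may want to confirm that the folder has the expected layout. The sketch below checks for the four entries listed above and tallies the split values. The check_export_folder and count_splits functions are hypothetical, and because the column layout of split.csv is not documented here, the sketch assumes only that TRAIN and VALIDATE appear as literal cell values (compared case-insensitively).

```python
import csv
from collections import Counter
from pathlib import Path

# Entries listed in the Dataset format section, with the kind of path expected.
EXPECTED_ENTRIES = {"schema.json": "file", "split.csv": "file", "images": "dir", "latest": "dir"}


def check_export_folder(root):
    """Return the names of expected entries that are missing from an exported dataset folder."""
    root = Path(root)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        path = root / name
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(name)
    return missing


def count_splits(split_csv):
    """Count TRAIN and VALIDATE cells in split.csv without assuming its column names."""
    counts = Counter()
    with open(split_csv, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            for cell in row:
                value = cell.strip().upper()
                if value in ("TRAIN", "VALIDATE"):
                    counts[value] += 1
    return counts
```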

Schedule (Public Preview)

The Schedule Export feature is documented separately.

Logs

The Logs tab displays the log of the latest export.

In case of a successful export, the log shows the number of processed documents and the export duration.

In case of a successful schema export, the log shows the export duration.

While a file export is running, you can check its status here. This is particularly useful for large exports.

Error messages are also displayed in the Logs tab.

After a successful auto-retraining, the import logs from the dataset's fine-tune folder are displayed as well.
