The Export files dialog box enables you to easily export data for training ML models.
To open it, click the Export button on the management bar.
The dialog box contains three tabs: Export now, Schedule Export, and Logs.
The Export now tab allows you to:
- Download the data locally using the Download button.
- Export the data to AI Center using the Export button. The exported folders are stored in AI Center under the dataset's export folder (Datasets > dataset_name > export).
If no schema is defined, all export options are disabled.
If a schema is defined, you must enter a name for your export; otherwise, the Download and Export buttons are disabled. A valid name can have up to 24 characters and must not contain special characters.
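The naming rule can be checked before opening the dialog box. The sketch below is a minimal client-side check, assuming that "special characters" means anything other than ASCII letters and digits; the documentation does not define the exact allowed set, so the hypothetical `is_valid_export_name` helper may be stricter or looser than Data Manager itself.

```python
import re

MAX_EXPORT_NAME_LENGTH = 24
# Assumption: only ASCII letters and digits are allowed; everything else is
# treated as a "special character". Data Manager's real rule may differ.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_export_name(name: str) -> bool:
    """Return True if the name is non-empty, at most 24 characters long,
    and contains only characters from the assumed allowed set."""
    return (
        0 < len(name) <= MAX_EXPORT_NAME_LENGTH
        and _ALLOWED_NAME.fullmatch(name) is not None
    )


print(is_valid_export_name("invoices2021"))   # True
print(is_valid_export_name("invoices-2021"))  # False: hyphen treated as special here
print(is_valid_export_name(""))               # False: a name is mandatory
print(is_valid_export_name("a" * 25))         # False: longer than 24 characters
```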
You can choose to export one of the following options:
- Current search results - the labeled documents filtered by a predefined keyword/named batch or by a text query. If no filter is applied, all labeled documents in the current view are exported.
- All labelled - all documents with at least one labeled field of any kind, that is, the documents matched by the labelled filter.
- Schema - a zip file containing the fields and their configurations which can be imported into a different Data Manager session.
The Backwards-compatible export checkbox enables you to apply the legacy export behavior, which exports each page as a separate document. Try this option if a model trained on the default export performs below expectations. Leave it unchecked to export the documents in their original multi-page form.
The 2021.10 release of Data Manager supports labeling multi-page documents. This is a major change from previous releases, in which each page was labeled separately. Labeling and exporting multi-page documents assumes that each document represents a single logical document. For instance, a six-page document may contain a single six-page invoice, but it should not contain three different two-page invoices. This is particularly important for evaluation sets.
This requirement is not relevant for Backwards-compatible exports.
To export a dataset, each field must be labeled in at least 10 different documents. Otherwise, the export fails with the following messages:
For Classification fields, there is an additional requirement: each option needs to be labeled in at least one document. Otherwise, the export fails with the following message:
When exporting only Evaluation set data, all validations are disabled.
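The three rules above can be expressed as a pre-export check. The following is a minimal sketch under stated assumptions: the `Document` model (a mapping from field name to labeled value, where a classification field's value is the chosen option) and the `validate_export` function are hypothetical, and the problem messages are illustrative wording, not the messages Data Manager actually shows.

```python
from collections import Counter
from dataclasses import dataclass, field

MIN_DOCUMENTS_PER_FIELD = 10


@dataclass
class Document:
    # Hypothetical model: field name -> labeled value.
    # For classification fields, the value is the selected option.
    labels: dict = field(default_factory=dict)


def validate_export(documents, fields, classification_options, evaluation_set_only=False):
    """Return a list of problems; an empty list means the export may proceed."""
    if evaluation_set_only:
        # Exports containing only Evaluation set data skip all validations.
        return []
    problems = []
    # Each document's labels are dict keys, so every field counts once per document.
    field_counts = Counter(name for doc in documents for name in doc.labels)
    for name in fields:
        if field_counts[name] < MIN_DOCUMENTS_PER_FIELD:
            problems.append(
                f"Field '{name}' is labeled in {field_counts[name]} documents; "
                f"at least {MIN_DOCUMENTS_PER_FIELD} are required."
            )
    # Classification fields: every option must appear in at least one document.
    for name, options in classification_options.items():
        used = {doc.labels[name] for doc in documents if name in doc.labels}
        for option in options:
            if option not in used:
                problems.append(
                    f"Option '{option}' of classification field '{name}' "
                    "is not labeled in any document."
                )
    return problems


docs = [Document({"total": "100", "type": "invoice"}) for _ in range(10)]
print(validate_export(docs, ["total", "type"], {"type": ["invoice", "receipt"]}))
```

With ten documents that all use the `invoice` option, the field-count rule passes but the unused `receipt` option is reported, mirroring the additional requirement for Classification fields.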
The export produces a folder containing the dataset exported from Data Manager. It includes:
- schema.json: a file containing the fields to be extracted and their types
- split.csv: a file specifying, for each document, whether it is used for TRAIN or VALIDATE during the Training Pipeline
- images: a folder containing images of all the labeled pages
- latest: a folder containing .json files with the labeled data from each page
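After downloading an export locally, you may want to confirm that it has the structure listed above before using it for training. This is a minimal sketch: the `check_export_folder` and `count_labeled_json` helpers are hypothetical, and it assumes the labeled `.json` files sit directly inside `latest` rather than in nested subfolders.

```python
from __future__ import annotations

from pathlib import Path

# Entries listed for an exported dataset folder.
EXPECTED_FILES = ("schema.json", "split.csv")
EXPECTED_DIRS = ("images", "latest")


def check_export_folder(root: str | Path) -> list[str]:
    """Return the names of expected entries missing from the export folder."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def count_labeled_json(root: str | Path) -> int:
    """Count the .json files directly inside latest/ (0 if the folder is absent)."""
    latest = Path(root) / "latest"
    if not latest.is_dir():
        return 0
    return sum(1 for _ in latest.glob("*.json"))
```

For example, `check_export_folder("path/to/export")` returns an empty list for a complete export, and `count_labeled_json` gives a quick count of the per-page labeled data files.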
The Schedule Export tab is documented on a separate page.
The Logs tab displays the latest log on export.
In case of a successful export, the log shows the number of processed documents and the export duration.
In case of a successful schema export, the log shows the export duration.
While an export is running, you can check its status in this tab. This is particularly useful for large exports.
Error messages are also displayed in Logs, for instance:
After a successful auto-retraining, the import logs from the dataset's fine-tune folder are also displayed: