# Import documents

> The **Import data** dialog box enables you to easily import new documents to be labeled or revised.

Select the **Import** button ![Import icon](https://dev-assets.cms.uipath.com/assets/images/document-understanding/document-understanding-import-icon-dm_import-84304910-5b8d6edc.webp) from the management bar.

The dialog box contains the following controls:

* **Batch name** text field - a name for the import is mandatory; without one, the **Browse or drop files** section is disabled. A valid name has up to 24 characters and contains no special characters.
* **Make this an evaluation set** checkbox - if selected, the dataset is used for evaluation purposes.
* **Browse or drop files** section - select **Browse files to upload** to navigate through your directory or simply drag and drop the files inside the frame.
* **Status** section - select **(load previous import log)** to check the status of the latest import. When you upload data, the **Status** section shows an overview of your files and prompts you to proceed with the import by selecting **YES** or to abort it by selecting **CANCEL**.

  ![Screenshot of the Import data interface.](https://dev-assets.cms.uipath.com/assets/images/document-understanding/document-understanding-screenshot-of-the-import-data-interface-116559-ad0afa2c-bf53c797.webp)
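
The batch name rule can be checked before opening the dialog box. The following is a minimal sketch, not Document Manager's own validation: the function name `is_valid_batch_name` is hypothetical, and the set of allowed characters is an assumption, because the page does not define which characters count as special.

```python
import re

MAX_BATCH_NAME_LENGTH = 24  # limit stated for the Batch name field

# Assumption: "special characters" means anything other than ASCII letters,
# digits, spaces, underscores, and hyphens. Document Manager may differ.
_ALLOWED = re.compile(r"[A-Za-z0-9 _-]+")


def is_valid_batch_name(name: str) -> bool:
    """Return True if name is non-empty, at most 24 characters, and allowed."""
    if not name or len(name) > MAX_BATCH_NAME_LENGTH:
        return False
    return _ALLOWED.fullmatch(name) is not None
```

For example, `is_valid_batch_name("invoices_2024")` passes, while a 25-character name or a name such as `batch#1` is rejected under these assumptions.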

## Import types

Document Manager supports four import types:

* Schema import
* Raw documents import (max 2000 pages and 4000 MiB per import)
* Document Manager dataset import (max 4000 MiB per import)
* Validation Station dataset import (max 2000 pages and 4000 MiB per import)
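
The size limit can be checked locally before starting an upload. This is a minimal sketch under stated assumptions: the helper names `total_size_bytes` and `fits_size_limit` are hypothetical, and the sketch covers only the 4000 MiB limit. It does not count pages, which would require a PDF or image library.

```python
from pathlib import Path

MIB = 1024 * 1024
MAX_IMPORT_BYTES = 4000 * MIB  # 4000 MiB per import, as listed above


def total_size_bytes(paths):
    """Sum the on-disk size of the given files."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_size_limit(paths, limit=MAX_IMPORT_BYTES):
    """Return True if the combined size of the files is within the limit."""
    return total_size_bytes(paths) <= limit
```

Calling `fits_size_limit(list_of_paths)` on the files you plan to import indicates whether the batch should be divided into several imports.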

### Schema import

If you would like to launch a new Document Manager session using the same schema as in an existing session, you can follow these steps:

1. Select the **Export** button from the management bar.
2. In the **Export files** dialog box, check the **Schema** option.
3. Select the **Export** button inside the dialog box. A `.zip` file is exported.
4. Select the **Import** button from the management bar.
5. Upload or drag & drop the `.zip` file directly into the new Document Manager session (do not unzip). In this step, you can also upload a predefined schema.
6. Select **YES** in the **Status** section to proceed with the import. The schema is imported.

Schema import also works for multi-value fields.

:::important
Multi-value fields are compatible only with model versions 2022.10 or higher.
:::

### Raw documents import

The types of documents that can be imported for labeling are: `.pdf`, `.tiff`, `.png`, `.jpg`.

`.zip` files are not supported for raw documents import.
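
A quick local filter can catch unsupported files before upload. The sketch below is illustrative only: `split_by_support` is a hypothetical helper, and matching the extension case-insensitively is an assumption about how Document Manager treats names such as `SCAN.PDF`.

```python
from pathlib import Path

# Extensions listed above. Variants such as ".tif" or ".jpeg" are not
# listed on this page, so this sketch treats them as unsupported.
SUPPORTED_EXTENSIONS = {".pdf", ".tiff", ".png", ".jpg"}


def split_by_support(filenames):
    """Return (supported, unsupported) lists of file names."""
    supported, unsupported = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

With this helper, a `.zip` file in the input list ends up in the second list, which matches the restriction stated above.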

OCR settings need to be configured before import.

1. Select the **Import** button ![Import icon](https://dev-assets.cms.uipath.com/assets/images/document-understanding/document-understanding-import-icon-dm_import-84304910-5b8d6edc.webp). The **Import data** dialog box is displayed.
2. Provide a batch name in the **Batch name** field. This enables you to easily filter and find these documents using the **Search** drop-down later on.
   * If you want to use this document batch for training an ML model, leave the **Make this an evaluation set** checkbox unselected.
   * If you want to use this document batch for evaluating an ML model (i.e. measuring its performance), select the **Make this an evaluation set** checkbox. This ensures the data is ignored by the Training Pipelines.
3. Upload or drag & drop a file or set of files into the **Browse or drop files** section.
4. Select **YES**. The file or set of files is imported.

### Document Manager dataset import

To import a dataset that was previously labeled in another Document Manager session, you need to get the `.zip` file which was exported originally, and import it directly into the new Document Manager instance.

If your new Document Manager instance is **completely empty** (no data and no fields defined), then both the documents **with labels** and the schema are imported.

If your new Document Manager instance already has fields defined, then the newly imported dataset needs to have the same fields, or a subset of those fields. Otherwise, the import is rejected.

If you export a dataset from an Automation Cloud™ environment and then import it into an on-premises deployment, follow these steps:

1. Unzip the dataset file.
2. Edit the `schema.json` file from the archive.
3. Remove all `display_name` properties from the `json` file, then save it.
4. Zip the dataset back, and import it into the on-premises session.
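
The four steps above can be scripted. The following is a minimal sketch rather than an official tool: the function names are hypothetical, the schema file name is passed in as a parameter instead of being hard-coded, and the sketch assumes that the schema file is valid JSON and that `display_name` keys can be dropped at any nesting depth.

```python
import json
import shutil
import tempfile
import zipfile
from pathlib import Path


def strip_display_names(node):
    """Recursively remove every "display_name" key from parsed JSON."""
    if isinstance(node, dict):
        return {k: strip_display_names(v) for k, v in node.items() if k != "display_name"}
    if isinstance(node, list):
        return [strip_display_names(item) for item in node]
    return node


def convert_for_on_premises(src_zip, dst_zip, schema_name):
    """Unzip src_zip, clean the schema file, and write the result to dst_zip."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(src_zip) as archive:
            archive.extractall(tmp)
        # Assumption: the first file with this name is the dataset schema.
        schema_path = next(Path(tmp).rglob(schema_name), None)
        if schema_path is None:
            raise FileNotFoundError(f"{schema_name} not found in {src_zip}")
        schema = json.loads(schema_path.read_text(encoding="utf-8"))
        cleaned = json.dumps(strip_display_names(schema), indent=2, ensure_ascii=False)
        schema_path.write_text(cleaned, encoding="utf-8")
        # make_archive appends ".zip" itself, so pass the name without it.
        base = Path(dst_zip).resolve().with_suffix("")
        shutil.make_archive(str(base), "zip", root_dir=tmp)
```

Pass the exported archive as `src_zip`, a new file name ending in `.zip` as `dst_zip`, and the name of the schema file as it appears in your archive. The original export is left untouched, so you can compare the two archives before importing.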

#### Split large datasets

To import Document Manager datasets that are larger than 1 GB or contain more than 1500 files, we recommend using this [script](https://github.com/UiPath/du-customer-scripts/tree/master/datamanager/cloud/split_large_zip), which splits a `.zip` file into multiple `.zip` files that are each smaller than 1 GB and contain fewer than 1500 files.
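
To decide whether a dataset needs splitting at all, you can inspect the archive first. This sketch is separate from the linked script and does not reproduce it: `needs_split` is a hypothetical helper, and reading "1 GB" as 1024³ bytes is an assumption.

```python
import os
import zipfile

MAX_BYTES = 1024 ** 3  # Assumption: "1 GB" is interpreted as 1 GiB
MAX_FILES = 1500


def needs_split(zip_path, max_bytes=MAX_BYTES, max_files=MAX_FILES):
    """Return True if the archive exceeds the size or file-count threshold."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries are skipped so that only files are counted.
        file_count = sum(1 for info in archive.infolist() if not info.is_dir())
    return os.path.getsize(zip_path) > max_bytes or file_count > max_files
```

If `needs_split("dataset.zip")` returns `True`, run the split script on the archive before importing the resulting parts one by one.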

### Validation Station dataset import

As your RPA workflow processes documents **using an existing ML model**, some documents may require human validation using the [Validation Station](https://docs.uipath.com/activities/other/latest/document-understanding/present-validation-station) activity (available on attended bots or in the browser using Orchestrator Action Center).

The validated data generated in **Validation Station** can be exported using the [Machine Learning Extractor Trainer](https://docs.uipath.com/activities/other/latest/document-understanding/machine-learning-extractor-trainer) activity and then used to train ML models.

:::note
For Validation Station dataset import, it is mandatory to have a schema defined.
:::

1. Configure the **Machine Learning Extractor Trainer** to output data into a folder with path `<Trainer/Output/Folder>` (use any empty folder path).
2. Run an RPA workflow including **Validation Station** and **Machine Learning Extractor Trainer**.
3. **Machine Learning Extractor Trainer** creates three subfolders inside the output folder: documents, metadata, and predictions.
4. Zip the `<Trainer/Output/Folder>` to obtain a `.zip` file, for instance **TrainerOutputFolder.zip**.
5. Import the `.zip` file into **Document Manager**, which detects that the import contains data produced by **Machine Learning Extractor Trainer** and imports the data accordingly.
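
The check for the three subfolders and the zip step can be scripted. The sketch below is illustrative: `zip_trainer_output` is a hypothetical helper, and placing the three subfolders at the root of the archive is an assumption about the layout that Document Manager expects.

```python
import shutil
from pathlib import Path

EXPECTED_SUBFOLDERS = ("documents", "metadata", "predictions")


def zip_trainer_output(output_folder, archive_base):
    """Zip the Trainer output folder and return the path of the .zip file."""
    folder = Path(output_folder)
    missing = [name for name in EXPECTED_SUBFOLDERS if not (folder / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing subfolders in {folder}: {', '.join(missing)}")
    # Assumption: the three subfolders sit at the root of the archive.
    # make_archive appends ".zip" to archive_base and returns the full path.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))
```

Choose an `archive_base` outside the output folder, for example a sibling path ending in `TrainerOutputFolder`, so that the archive is not written into the folder that is being zipped.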

If there are missing fields required by the dataset, an error message is displayed in the import dialog box.

   ![Screenshot of the Import data interface.](https://dev-assets.cms.uipath.com/assets/images/document-understanding/document-understanding-screenshot-of-the-import-data-interface-118101-61ffccf2-8da1f8f1.webp)
