Document Understanding
PREVIEW
Document Understanding User Guide for Modern Experience
Last updated Apr 26, 2024

Build

In the Build section, you can do the following:
  • Upload documents and classify them automatically.
  • Upload documents straight into document types.
  • Manage project files (add or remove files, and add or change tags).
  • Annotate documents.
  • Add or remove fields.
  • Add or remove business rules.
  • Follow the recommendations for a guided experience in training classification and extraction models.

Upload Documents

After successfully creating your project, you can upload your documents from the Build section.
  1. Open your project.
  2. Drag and drop the first batch of your sample documents in the Upload sample documents section.
    Tip: You can use the suggestions from the Recommendations sections. These suggestions will guide you through the process.
    The uploaded files are automatically processed (uploaded, digitized, classified, annotated).
  3. Upload the next batch of your sample documents by clicking Upload.
    Tip: You can check the Recommendations sections for suggestions on what else to upload. For example, if there are too few utility_bills documents, a suggestion is displayed:

    utility_bills has too few samples. Add at least 150 documents for an optimal data set size.

    There are two types of recommendations, one for classification and one for extraction models.


  4. Review the uploaded documents.
    1. Expand the needed section (for example, invoices, receipts).
    2. Click on a document name.
    3. Check that the automatically filled-in Document type is correct. If it is not, change the document type using the drop-down list.
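
If you want to track sample counts yourself before uploading the next batch, the idea behind the "too few samples" recommendation in step 3 can be sketched as follows. This is only an illustration: the function name `sample_shortfalls` and the `target` parameter are hypothetical, and the product computes its own data set size estimates, which this sketch does not reproduce.

```python
def sample_shortfalls(counts: dict[str, int], target: int) -> dict[str, int]:
    """Return how many more samples each document type needs to reach `target`.

    `counts` maps a document type name to the number of uploaded samples.
    `target` is a hypothetical data set size chosen by the caller; it is
    NOT the estimate that the Recommendations section calculates.
    """
    if target < 0:
        raise ValueError("target must not be negative")
    return {
        doc_type: target - count
        for doc_type, count in counts.items()
        if count < target  # only report document types that are short
    }


# Example with made-up counts and an assumed target of 150 documents.
shortfalls = sample_shortfalls({"utility_bills": 40, "invoices": 200}, target=150)
for doc_type, missing in shortfalls.items():
    print(f"{doc_type}: {missing} more documents needed")
```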

Annotate Documents

After successfully creating your project and uploading your documents, you can annotate them from the Build section.

You can start annotating documents from a document type section by clicking Annotate.

You can also annotate a specific document by clicking the three-dot icon next to the document name and selecting Annotate.
Tip: Uploaded documents are automatically processed (uploaded, digitized, classified, and annotated). To improve the overall performance of your model, follow the suggestions from the Recommendations section.
Note: Custom document types are not automatically annotated. You need to manually annotate documents that are part of a custom document type.


Validating Pre-Labeled Documents

Uploaded documents that are part of a known document type are automatically pre-labeled. You can validate the pre-labeling from the Annotate view.

During validation, you can have the following situations:
  • Pre-labeling is correct and should be validated.
  • Pre-labeling is missing and should be marked as such.
  • Pre-labeling is not correct and should be edited.

If all fields from a document are labeled correctly, click Confirm to validate all the fields at once.

Once a document is validated, it will be marked with a green shield in the document list.



Correct pre-labeling

If the pre-labeled field is correct, select the checkbox next to the field to validate it. In our example, the first field, Vendor Name, is pre-labeled correctly, so you select its checkbox.


Missing pre-labeling

If there is no pre-label related to that field, click the three-dot icon next to the field name and select Mark as missing.
Important: You can also mark wrong fields as missing. For example, if you do not have a Vendor Address in your document but during processing a different field was pre-labeled as Vendor Address, you can just mark it as missing during validation.


Incorrect pre-labeling

If the pre-labeling is not correct, you can correct the field manually.

To label the field manually, create a new field: drag a selection box over the needed information directly on the document, then select the desired Field Name from the drop-down list.

Note: All fields that are annotated manually get validated automatically.
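
The three validation outcomes described above can be pictured as a small state model. The sketch below is an illustration only and does not reflect how the product stores validation data: the names `FieldStatus`, `Field`, and `is_document_validated` are hypothetical, the field name Total is made up, and the rule that a document counts as validated once every field has been resolved is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class FieldStatus(Enum):
    UNREVIEWED = "unreviewed"  # pre-labeled, not yet checked by a person
    CONFIRMED = "confirmed"    # pre-labeling was correct and its checkbox was selected
    MISSING = "missing"        # marked as missing (absent or wrongly pre-labeled)
    MANUAL = "manual"          # labeled manually, which validates the field automatically


@dataclass
class Field:
    name: str
    status: FieldStatus = FieldStatus.UNREVIEWED


def is_document_validated(fields: list[Field]) -> bool:
    """Assumed rule: a document is validated once no field is left unreviewed."""
    return bool(fields) and all(f.status is not FieldStatus.UNREVIEWED for f in fields)


document = [
    Field("Vendor Name", FieldStatus.CONFIRMED),
    Field("Vendor Address", FieldStatus.MISSING),
    Field("Total", FieldStatus.MANUAL),  # hypothetical field name
]
print(is_document_validated(document))
```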

Document Type Settings

You can change the document type settings from the Annotate view.

To do so, click on the three-dot icon on the right side of the document type name and select Settings.



You can change the following settings:
  • Base model: Dataset size estimations used in the Recommended Actions depend on the base model used to train. Using the most similar base model to your Document Type will reduce the amount of annotation work required.
  • Number of layouts: Dataset size estimations used in the Recommended Actions depend on the number of layouts in the dataset. More layouts generally require annotating more data.
  • Number of languages: Dataset size estimations used in the Recommended Actions depend on the number of languages in the dataset. More languages generally require annotating more data.

Search Documents

You can search uploaded documents by document name. To do so, use the search bar in the left corner of the Build section. For a more efficient search, use the Filter feature to filter by:
  • Document type: choose the desired document type from the drop-down list.
  • Upload date: choose a date interval when the document was uploaded.
  • Status: choose the status of the document.


Project and Model Score

You can check your project's overall score from the top right corner. This score factors in the classifier and extractor scores for all document types. Click Project score to display the Measure section. You can check more in-depth performance measurements in that section.

You can check the score for each document type separately from the Document type section. This score factors in the overall performance of the model, as well as the size and quality of the dataset.

Note: You need to upload at least 10 documents to get a project score. For a document type score, you need at least 10 documents under the same document type.
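
As a rough illustration of the thresholds in the note above, the following sketch checks which scores you could expect to see for a given set of document counts. The function name `score_availability` is hypothetical, and the sketch assumes that the project threshold applies to the total number of documents across all document types.

```python
MIN_DOCUMENTS_FOR_SCORE = 10  # threshold stated in the note above


def score_availability(docs_per_type: dict[str, int]) -> tuple[bool, dict[str, bool]]:
    """Return (project score available, {document type: score available}).

    Assumption: the project threshold counts all uploaded documents,
    regardless of document type.
    """
    total = sum(docs_per_type.values())
    project_ok = total >= MIN_DOCUMENTS_FOR_SCORE
    per_type_ok = {
        doc_type: count >= MIN_DOCUMENTS_FOR_SCORE
        for doc_type, count in docs_per_type.items()
    }
    return project_ok, per_type_ok


# Made-up counts: enough documents overall, but too few receipts.
project_ok, per_type_ok = score_availability({"invoices": 12, "receipts": 4})
print(project_ok, per_type_ok)
```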


To check the model rating of your models, select the score tag. The model rating is intended to help you visualize the performance of a classification model. It is expressed as a model score from 0 to 100, as follows:
  • Poor (0-49)
  • Average (50-69)
  • Good (70-89)
  • Excellent (90-100)
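
If you handle model scores outside the product, for example in a report, the rating bands above can be expressed as a simple lookup. This is a minimal sketch: the function name `model_rating` is hypothetical, and it assumes whole-number scores as shown in the list.

```python
def model_rating(score: int) -> str:
    """Map a model score from 0 to 100 to the rating labels listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Poor"


for score in (42, 65, 88, 95):
    print(score, model_rating(score))
```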

Select Detailed model scores to go to the Measure section for detailed information.


