Document Understanding
2022.10
Document Understanding User Guide
Last updated Apr 19, 2024

Dataset Diagnostics

Training a new model from scratch can sometimes be a very demanding job.

The Dataset Diagnostics feature helps you build effective datasets by providing feedback and hints about the steps needed to achieve good accuracy for the trained model.

Located in the Management Bar of the Document Manager, Dataset Diagnostics provides visual and written guidance throughout the whole process of training a new model.



There are three dataset status levels shown in the Management Bar:

  • Red - More labelled training data is required.
  • Orange - More labelled training data is recommended.
  • Green - The needed level of labelled training data is achieved.

If no fields are created in the session, the dataset status level is grey.
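The status levels above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the product's actual logic: the function name `dataset_status` and the two thresholds `required_pages` and `recommended_pages` are hypothetical, since this guide does not state the real page counts or how they are computed.

```python
from enum import Enum


class Status(Enum):
    GREY = "grey"
    RED = "red"
    ORANGE = "orange"
    GREEN = "green"


def dataset_status(field_count, labelled_pages, required_pages, recommended_pages):
    """Map a dataset to a status level.

    required_pages and recommended_pages are hypothetical thresholds;
    the real values used by Dataset Diagnostics are not documented here.
    """
    if field_count == 0:
        # No fields created in the session: status is grey.
        return Status.GREY
    if labelled_pages < required_pages:
        # More labelled training data is required.
        return Status.RED
    if labelled_pages < recommended_pages:
        # More labelled training data is recommended.
        return Status.ORANGE
    # The needed level of labelled training data is achieved.
    return Status.GREEN
```

For example, under these assumed thresholds, `dataset_status(5, 30, 20, 50)` falls between the required and recommended counts and therefore maps to `Status.ORANGE`.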

More information on each status is available in the Dataset Diagnostics popup menu. Click on the Dataset Diagnostics button to open it.



Dataset Diagnostics Menu

Dataset Tab

Provides information about the documents used for training the model, the total number of imported pages and the total number of labelled pages.

The color segments of the status bar are determined by the recommended number of labelled pages needed for training the model and by the current state of your dataset, including both labelled and unlabelled data. Hovering over a color on the status bar displays a tooltip with extra information about that status.

The numbers available on the Dataset tab are calculated based on the number of regular fields and item fields from the training session.

  • Red - The dataset requires more labelled data for training the model.



  • Orange - More labelled data is recommended to increase the accuracy of the trained model. You can choose to proceed with the current data, but the accuracy may not be as high as desired.



  • Green - The dataset contains enough labelled data to train the model and obtain accurate results.



Fields Tab

Provides information about each labelled field: the number of training pages on which the field is labelled, the number of evaluation documents on which the field is labelled, and the field's status for the current training set.



  • Field - The name of the labelled field.
  • Training Pages - The number of pages in the Training+Validation set on which the field is labelled.
  • Evaluation Documents - The number of documents in the Evaluation set on which this field is labelled.
  • Status - The status of each field, which is one of three options: Red, Orange, or Green.

Here are all the options available for the Status bar:

  • Red - There is insufficient data about the field; more labels are required.



  • Orange - More pages need to be labelled for the results to be relevant.



  • Green - There are enough labelled pages for the results to be relevant.
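As an illustration of how the Fields tab columns can guide labelling work, the sketch below models one row of the table and lists the fields that still need labels, Red before Orange. The `FieldRow` class and the `fields_needing_labels` helper are hypothetical names for illustration only; Dataset Diagnostics does not expose them as an API.

```python
from dataclasses import dataclass


@dataclass
class FieldRow:
    """One row of the Fields tab (hypothetical representation)."""
    field: str                 # name of the labelled field
    training_pages: int        # pages in the Training+Validation set with this label
    evaluation_documents: int  # documents in the Evaluation set with this label
    status: str                # "red", "orange", or "green"


def fields_needing_labels(rows):
    """Return rows that are not yet green, most urgent first.

    Red fields come before orange ones; within a status, fields with
    fewer training pages come first.
    """
    priority = {"red": 0, "orange": 1}
    pending = [r for r in rows if r.status in priority]
    return sorted(pending, key=lambda r: (priority[r.status], r.training_pages))
```

Given rows for three fields with statuses green, orange, and red, the helper returns the red field first, then the orange one, and omits the green field.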



The Refresh and Close buttons apply to both tabs. For example, clicking the Refresh button on the Dataset tab also refreshes the Fields tab.

  • Refresh - Use the Refresh button after changes have been made to the dataset, whether to the total number of pages or to the number of labelled pages. The popup menu refreshes automatically every few minutes, on both tabs simultaneously. Use this button when a refresh is needed before the next automatic one.
  • Close - Once you have gathered all the needed information, close the menu by clicking the Close button. The entire popup menu closes, regardless of the tab from which the button is clicked.

© 2005-2024 UiPath. All rights reserved.