Document Understanding User Guide
Fine-tuning
AI Center can fine-tune ML models using data that a human has validated in Validation Station.
As your RPA workflow processes documents using an existing ML model, some documents may require human validation using the [Present Validation Station](https://docs.uipath.com/activities/docs/present-validation-station) activity (available on attended bots or in the browser using Orchestrator Action Center).
The validated data generated in Validation Station can be exported using the Machine Learning Extractor Trainer activity and then used to fine-tune ML models in AI Center.
We do not recommend using data from Validation Station to train ML models from scratch (i.e. the DocumentUnderstanding ML Package). Use it only to fine-tune existing ML models (including out-of-the-box models).
- For the detailed steps involved in fine-tuning an ML model, see the Import Documents section of the Document Manager documentation.
- For more details about how to build a dataset for fine-tuning, go here.
Important: Always add Validation Station data to the same dataset and train on ML Package minor version 0
It is often wrongly assumed that the way to use Validation Station data is to retrain the previous model version iteratively: the current batch is used to train package X.1 to obtain X.2, the next batch trains on X.2 to obtain X.3, and so on. This is the wrong way to use the product. Each Validation Station batch needs to be imported into the same Document Manager session as the original manually labeled data, making a larger dataset, which must always be used to train on the X.0 ML Package version.
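The bookkeeping behind this rule can be sketched in a few lines of Python. This is only a conceptual model, not UiPath code: `DocumentManagerSession`, `import_batch`, `plan_training_run`, the document names, and the major version `5` are all hypothetical and do not correspond to any AI Center or Document Manager API. It shows that every training run starts from minor version 0 and uses the whole accumulated dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentManagerSession:
    """Hypothetical stand-in for one Document Manager dataset (not a UiPath API)."""
    documents: List[str] = field(default_factory=list)

    def import_batch(self, batch: List[str]) -> None:
        # Each Validation Station batch is added to the SAME dataset.
        self.documents.extend(batch)


def plan_training_run(session: DocumentManagerSession, major_version: int) -> Dict[str, object]:
    """Describe one training run: always start from minor version 0
    and train on the full accumulated dataset."""
    return {
        "base_version": f"{major_version}.0",
        "training_documents": list(session.documents),
    }


# Start with the original manually labeled data (names are illustrative).
session = DocumentManagerSession(documents=["manual_1", "manual_2"])

runs = []
for batch in (["vs_batch1_a", "vs_batch1_b"], ["vs_batch2_a"]):
    session.import_batch(batch)
    # Assumed major version 5; every run trains on 5.0, never on 5.1, 5.2, ...
    runs.append(plan_training_run(session, major_version=5))

for run in runs:
    print(run["base_version"], len(run["training_documents"]))
```

In this sketch the dataset grows with every batch while the base version stays fixed at X.0, which is the opposite of the iterative X.1 to X.2 to X.3 pattern described above.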