Document Understanding User Guide
The Auto-Fine-tuning Loop (Public Preview)
When training or retraining an ML Model, the first thing to keep in mind is that the best results come from accumulating all data into a single large and, ideally, carefully curated dataset. Training on dataset A and then retraining the resulting model on dataset B produces much worse results than training on the combined dataset A+B.
The second thing to keep in mind is that not all data is the same. Data labeled in a dedicated tool like Document Manager is generally of better quality, and results in a better-performing model, than data labeled in tools with a different focus, such as Validation Station. Data from Validation Station may be high quality from a business process point of view, but less so from a model training point of view, because an ML Model needs data in a very specific form, which is almost always different from the form needed by business processes. For instance, on a 10-page invoice, the invoice number may appear on each page. In Validation Station it is sufficient to indicate it on the first page, while in Document Manager you would label it on every page. In this case, 90% of the correct labels are missing from the Validation Station data. For this reason, Validation Station data has limited utility for model training.
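The arithmetic behind the invoice example can be sketched in a few lines of Python. This is only an illustration of the 90% figure: the function name and its parameters are hypothetical and do not correspond to any Document Understanding API.

```python
# Hypothetical illustration of the invoice example; not a UiPath API.

def missing_label_fraction(pages_with_field: int, pages_labeled: int) -> float:
    """Fraction of a field's true occurrences that carry no label."""
    if pages_with_field <= 0:
        raise ValueError("pages_with_field must be positive")
    if not 0 <= pages_labeled <= pages_with_field:
        raise ValueError("pages_labeled must be between 0 and pages_with_field")
    return (pages_with_field - pages_labeled) / pages_with_field


# A 10-page invoice where the invoice number appears on every page.
# Validation Station: the field is indicated on the first page only.
validation_station = missing_label_fraction(pages_with_field=10, pages_labeled=1)
# Document Manager: the field is labeled on every page.
document_manager = missing_label_fraction(pages_with_field=10, pages_labeled=10)

print(f"Validation Station: {validation_station:.0%} of labels missing")
print(f"Document Manager:   {document_manager:.0%} of labels missing")
```

The same calculation applies to any field that repeats across pages: the more often a field recurs, the larger the share of occurrences that a single business-level confirmation leaves unlabeled.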
To train an ML Model effectively, you need a single, well-rounded, high-quality, and representative dataset. A cumulative approach, therefore, is to keep adding data to the input dataset so that the ML Model is trained on a larger dataset each time. One way to do this is to use the Auto-Fine-tuning loop.
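As a rough sketch of the cumulative idea, the snippet below merges each new batch into the dataset collected so far, so that every training round sees all of the data rather than the latest batch alone. The document IDs, field names, and the `accumulate` helper are assumptions made for illustration; the snippet does not show how the Auto-Fine-tuning loop is implemented.

```python
# Hypothetical sketch: IDs, fields, and the helper are illustrative assumptions.

def accumulate(dataset: dict[str, dict], new_batch: dict[str, dict]) -> dict[str, dict]:
    """Return a new dataset containing every document seen so far.

    Keys are document IDs; a re-labeled document replaces its older version.
    The input dataset is left unmodified.
    """
    merged = dict(dataset)
    merged.update(new_batch)
    return merged


dataset_a = {"doc-1": {"invoice_no": "A-100"}, "doc-2": {"invoice_no": "A-101"}}
dataset_b = {"doc-3": {"invoice_no": "B-200"}}

# Train on A+B together instead of training on A and then retraining on B.
training_set = accumulate(dataset_a, dataset_b)
print(sorted(training_set))
```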
To get a better understanding of this feature, let's see where Auto-Fine-tuning fits into the ML Model lifecycle.