Document Understanding
2022.10
Document Understanding User Guide
Last updated Apr 19, 2024

The Auto-Fine-tuning Loop (Public Preview)

When training/retraining an ML Model, the first thing to keep in mind is that best results are obtained by accumulating all data into a single large and, ideally, carefully curated dataset. Training on dataset A, and then retraining the resulting model on dataset B will produce much worse results than training on the combined dataset A+B.

The second thing to keep in mind is that not all data is the same. Data labeled in a dedicated tool like Document Manager is generally of better quality, and results in a better-performing model, than data labeled in tools with a different focus, such as Validation Station. Data from Validation Station may be high quality from a business process point of view, but less so from a model training point of view, because an ML Model needs data in a very specific form, which is almost always different from the form needed by business processes. For instance, on a 10-page invoice, the invoice number may appear on each page. In Validation Station it is sufficient to indicate it on the first page, while in Document Manager you would label it on every page. In this case, 9 of the 10 occurrences are unlabeled, so 90% of the correct labels are missing from the Validation Station data. For this reason, Validation Station data has limited utility for model training.
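The arithmetic behind the 90% figure can be checked with a short Python sketch. The function name `missing_label_fraction` is hypothetical and used only for illustration; it is not part of any UiPath tool or API.

```python
def missing_label_fraction(total_occurrences: int, labeled_occurrences: int) -> float:
    """Return the share of field occurrences that carry no label."""
    if total_occurrences <= 0:
        raise ValueError("total_occurrences must be positive")
    if not 0 <= labeled_occurrences <= total_occurrences:
        raise ValueError("labeled_occurrences must be between 0 and total_occurrences")
    return (total_occurrences - labeled_occurrences) / total_occurrences


# 10-page invoice: the invoice number appears once per page,
# but only the first-page occurrence was indicated.
fraction = missing_label_fraction(total_occurrences=10, labeled_occurrences=1)
print(f"{fraction:.0%} of the correct labels are missing")
```

If every page were labeled, as you would do in Document Manager, the same call with `labeled_occurrences=10` would return 0.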

To effectively train an ML Model, you need a single, well-rounded, high-quality, and representative dataset. A cumulative approach is therefore to keep adding data to the input dataset, so that the ML Model is trained on a larger dataset each time. One way to do this is to use the Auto-Fine-tuning loop.
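The cumulative idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `CumulativeDataset`, `add_batch`, and `train_from_scratch` are hypothetical names, and the training function is a stand-in rather than a real training pipeline or the Auto-Fine-tuning loop itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CumulativeDataset:
    """Hypothetical container that only grows: document id -> list of labels."""
    documents: Dict[str, List[str]] = field(default_factory=dict)

    def add_batch(self, batch: Dict[str, List[str]]) -> int:
        """Merge a newly labeled batch and return the number of new documents."""
        added = sum(1 for doc_id in batch if doc_id not in self.documents)
        self.documents.update(batch)  # re-labeled documents overwrite old labels
        return added


def train_from_scratch(dataset: CumulativeDataset) -> str:
    """Stand-in for a real training run; it always consumes the full dataset."""
    return f"model trained on {len(dataset.documents)} documents"


dataset = CumulativeDataset()

# Batch A
dataset.add_batch({"inv-001": ["invoice-no"], "inv-002": ["invoice-no", "total"]})
model_v1 = train_from_scratch(dataset)

# Batch B is merged into the same dataset, so the next run sees A+B,
# instead of retraining the previous model on B alone.
dataset.add_batch({"inv-003": ["total"]})
model_v2 = train_from_scratch(dataset)

print(model_v1)
print(model_v2)
```

The design point is that each training run reads the whole accumulated dataset, which mirrors the recommendation to train on the combined dataset A+B.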

To get a better understanding of this feature, let's see where Auto-Fine-tuning fits into the ML Model lifecycle.

© 2005-2024 UiPath. All rights reserved.