October 2021
Improvements
- Fields with fewer than 10 labeled documents can now be deleted without confirmation.
Bug fixes
- Fixed a bug that affected imported files with the same name.
- Fixed a bug in Google OCR that caused an error on documents with empty pages.
- Fixed a bug that displayed an incorrect file count in the Import data dialog box for Validation Station or Data Manager dataset imports.
Known issues
- Default export (document level) only works with ML Packages version 21.10 or later in AI Center. The version appears in the Change Log column of the ML Packages view in AI Center. For older versions, please use the Backwards-compatible export checkbox on the Export files dialog box.
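The version rule above can be sketched in a few lines of Python. This is only an illustration, assuming the version is available as a plain dotted string such as the one shown in the Change Log column; the names `parse_version`, `MIN_DOCUMENT_LEVEL_VERSION`, and `choose_export_mode` are hypothetical and not part of any UiPath API.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '21.10' or '21.10.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Hypothetical constant: first ML Package version that accepts document-level exports.
MIN_DOCUMENT_LEVEL_VERSION = (21, 10)


def choose_export_mode(ml_package_version: str) -> str:
    """Return which export option fits the given ML Package version."""
    if parse_version(ml_package_version) >= MIN_DOCUMENT_LEVEL_VERSION:
        return "default (document level)"
    return "backwards-compatible (page level)"


print(choose_export_mode("21.10"))
print(choose_export_mode("21.4"))
```

Comparing the parts as integers matters here: read as a decimal number, 21.4 would look newer than 21.10, whereas as a version it is older.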
Multi-page document support
Data Manager now supports multi-page documents. This is a major update impacting every aspect of a Machine Learning flow:
- Import: you can upload documents of up to 150 pages. To bypass this limit, at the risk of an unstable labeling experience, select the Enable large documents checkbox in the Import data dialog box.
- Prelabeling: the document is prelabeled as a whole, producing the same results as in an RPA workflow, but prelabeling takes longer for larger documents. See also Known issues below.
- Labeling: labeling is more convenient thanks to natural scrolling through document pages.
- Export: documents are exported at document level by default. To export at page level, select the Backwards-compatible export checkbox in the Export files dialog box; this is also recommended if the accuracy of a model trained on the default export is below expectations.
- Training: in most scenarios, models trained on the new document-level exports should perform the same as models trained on the page-level Backwards-compatible export. However, if a model performs below expectations, we recommend retrying the training with a Backwards-compatible export, which might produce better results.
- Evaluation: this is the main motivation for multi-page document support, since Evaluation scores now reflect run-time performance more accurately. Note that this assumes each multi-page file contains a single logical document. For instance, if you import 20-page file packets containing 10 invoices of 2 pages each, do not use them in Evaluation sets. You can still use them in Training sets, but only if you export with the Backwards-compatible export option enabled.
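The guidance on file packets can be expressed as a small decision helper. The sketch below is illustrative only: the `ImportedFile` class and the `allowed_usage` function are hypothetical, and it assumes you already know how many logical documents each file contains, since Data Manager is not described as detecting this.

```python
from dataclasses import dataclass


@dataclass
class ImportedFile:
    name: str
    page_count: int
    logical_documents: int  # how many separate documents the file contains


def allowed_usage(file: ImportedFile) -> dict:
    """Apply the guidance above: packets stay out of Evaluation sets and
    need the Backwards-compatible (page-level) export for training."""
    is_packet = file.logical_documents > 1
    return {
        "evaluation_set": not is_packet,
        "training_set": True,
        "requires_backwards_compatible_export": is_packet,
    }


# The example from the text: a 20-page packet holding 10 two-page invoices.
packet = ImportedFile("invoices_batch.pdf", page_count=20, logical_documents=10)
single = ImportedFile("invoice_0001.pdf", page_count=2, logical_documents=1)

print(allowed_usage(packet))
print(allowed_usage(single))
```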
Improvements
- Added support for exporting the schema, selectable with a radio button in the Export files dialog box.
- Increased the maximum import size to 2 GB or 2000 pages.
- Renamed Test set to Evaluation set for consistency with AI Center Evaluation Pipelines.
- The Predict button now appears by default in the management bar, but Prelabelling settings must be configured before the button is enabled.
- Removed all restrictions on the number of samples per field from exports of Evaluation sets.
- Added the Data Manager session name next to the file name in the management bar, so you can more easily identify the session you are working on when multiple Data Manager tabs are open.
- Chinese-language documents are now supported.
- Accessibility improvements.
- Added localization for Portuguese (Portugal), Russian, and Turkish.
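If you prepare imports with a script, the size limits mentioned in these notes could be checked up front. The following is a rough sketch under several assumptions that the notes do not confirm: that the 150-page limit applies per document, that the 2 GB / 2000 page maximum applies to one import as a whole, and that 2 GB means binary gigabytes. The `check_import` function and its constants are hypothetical.

```python
DEFAULT_PAGE_LIMIT = 150        # per document, unless Enable large documents is selected
MAX_IMPORT_PAGES = 2000         # assumed to apply to the whole import
MAX_IMPORT_BYTES = 2 * 1024**3  # 2 GB, assuming binary gigabytes


def check_import(documents, enable_large_documents=False):
    """Return a list of problems; an empty list means no limit was exceeded.

    `documents` is a list of (name, page_count, size_in_bytes) tuples.
    """
    problems = []
    if not enable_large_documents:
        for name, pages, _size in documents:
            if pages > DEFAULT_PAGE_LIMIT:
                problems.append(
                    f"{name}: {pages} pages exceeds the {DEFAULT_PAGE_LIMIT}-page limit"
                )
    total_pages = sum(pages for _name, pages, _size in documents)
    total_bytes = sum(size for _name, _pages, size in documents)
    if total_pages > MAX_IMPORT_PAGES:
        problems.append(f"import has {total_pages} pages, above {MAX_IMPORT_PAGES}")
    if total_bytes > MAX_IMPORT_BYTES:
        problems.append(f"import is {total_bytes} bytes, above {MAX_IMPORT_BYTES}")
    return problems


batch = [("contract.pdf", 180, 40_000_000), ("invoice.pdf", 2, 300_000)]
print(check_import(batch))
print(check_import(batch, enable_large_documents=True))
```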
Known issues
- The Invoices China model does not format Chinese-style dates in the standard yyyy-mm-dd format. This will be improved in upcoming releases.
- Data Manager parses dates inconsistently with the ML models at run time. If you notice dates being parsed incorrectly in Data Manager, they are likely to be parsed correctly in the model prediction at run time. This is a known issue and will be resolved in an upcoming patch.
- At the moment, using the Predict option with Public Endpoints prelabels only the first 10 pages of a document. This is a known issue and an enhancement will be included in an upcoming patch. Using the Predict option with ML Skills in AI Center, however, does not impose such a limitation.
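Until the Invoices China model formats dates itself, Chinese-style dates could be normalized in a post-processing step. The sketch below is one possible workaround, not part of the product: the `normalize_chinese_date` function is hypothetical, and it assumes dates arrive as text in the common 年/月/日 form written with Arabic numerals (for example `2021年10月5日`). Dates written with Chinese numerals are not handled.

```python
import re
from datetime import date
from typing import Optional

# Matches dates such as "2021年10月5日", with optional spaces around the markers.
_CHINESE_DATE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")


def normalize_chinese_date(text: str) -> Optional[str]:
    """Return the first Chinese-style date in `text` as yyyy-mm-dd, or None."""
    match = _CHINESE_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() rejects impossible dates such as 30 February.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


print(normalize_chinese_date("2021年10月5日"))
```

Returning `None` for unmatched or impossible dates leaves it to the caller to decide whether to keep the original value or flag it for manual review.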