Release Notes
October 2021
Improvements
Fields with fewer than 10 labeled documents can be deleted without confirmation.
Bug Fixes
- Fixed a bug that affected imported files with the same name.
- Fixed a bug in Google OCR that was throwing an error on documents with empty pages.
- Fixed a bug that wrongly displayed the count of files in the Import data dialog box for Validation Station or Data Manager dataset imports.
Known Issues
- Default export (document level) only works with ML Packages version 21.10 or later in AI Center. The version appears in the Change Log column of the ML Packages view in AI Center. For older versions, please use the Backwards-compatible export checkbox on the Export files dialog box.
Multi-page document support
Data Manager now supports multi-page documents. This is a major update impacting every aspect of a Machine Learning flow:
Import: you can upload documents up to 150 pages; to bypass this limit, at the risk of an unstable labeling experience, select the Enable large documents checkbox from the Import data dialog box.
Prelabeling: the document is prelabeled as a whole, producing the same results as running in an RPA workflow, but it takes longer for larger documents. See also Known Issues below.
Labeling: more convenient labeling due to natural scrolling through document pages.
Export: done by default at document level. Should you want to export the documents at page level, select the Backwards-compatible export checkbox from the Export files dialog box; this is also recommended if the model accuracy produced by the default export is below expectations.
Training: in most scenarios, models trained with the new document level exported datasets should perform the same as models trained with the page level Backwards-compatible export. However, if the models perform below expectations, we recommend that you retry the training using a Backwards-compatible export, in case it produces better results.
Evaluation: this is the main motivation for the multi-page document support feature, since Evaluation scores now reflect run time performance more accurately. Note that this assumes each multi-page document contains a single logical document. For instance, if you import 20-page file packets containing 10 invoices of 2 pages each, these should not be used as part of Evaluation sets. They can, however, be used as part of Training sets, but only if you export with the Backwards-compatible option enabled.
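As a rough illustration of the packet example above, the sketch below computes the page ranges you would need if you chose to split a 20-page packet into ten 2-page invoices, so that each file holds a single logical document. The function name `split_page_ranges` and the fixed pages-per-document assumption are hypothetical and not part of Data Manager; real packets with documents of varying length would need manual or classifier-based splitting.

```python
from typing import List, Tuple


def split_page_ranges(total_pages: int, pages_per_doc: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges, one per logical document.

    Hypothetical helper: assumes every logical document has the same page count,
    except possibly the last one, which takes whatever pages remain.
    """
    if total_pages <= 0 or pages_per_doc <= 0:
        raise ValueError("page counts must be positive")
    ranges = []
    for first in range(1, total_pages + 1, pages_per_doc):
        last = min(first + pages_per_doc - 1, total_pages)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # The packet from the release note: 20 pages, 10 invoices of 2 pages each.
    for first, last in split_page_ranges(20, 2):
        print(f"invoice pages {first}-{last}")
```

The resulting ranges can be passed to whichever PDF tool you already use to write one file per invoice before import.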
Improvements
Schema export is now supported through a radio button in the Export files dialog box.
Maximum import size increased to 2 GB or 2000 pages.
Test set renamed to Evaluation set for consistency with AI Center Evaluation Pipelines.
The Predict button appears by default in the management bar, but Prelabelling settings need to be configured for the button to be enabled.
All restrictions on number of samples per field removed from exports of Evaluation sets.
Added the Data Manager session name next to the file name in the management bar, making it easier to identify the session you are working in when multiple Data Manager tabs are open at the same time.
Chinese language documents supported.
Accessibility improvements.
Localization for Portuguese-Portugal, Russian, and Turkish.
Known Issues
- Invoices China model does not format Chinese style dates in the standard yyyy-mm-dd format. This will be improved in upcoming releases.
- Data Manager parsing of dates is inconsistent with the parsing performed by ML models at run time. If you notice that dates are being parsed incorrectly in Data Manager, it is likely they will be parsed correctly in the model prediction at run time. This is a known issue and it will be resolved in an upcoming patch.
- At the moment, using the Predict option with Public Endpoints prelabels only the first 10 pages of a document. This is a known issue and an enhancement will be included in an upcoming patch. Using the Predict option with ML Skills in AI Center, however, does not impose such a limitation.
Released in AI Center Cloud & Endpoints: 22 October 2021, package version: 21.10.9
What's New
The PurchaseOrders ML Package is now Generally Available and it is ready to be used in your production scenarios.
InvoicesChina, DeliveryNotes, RemittanceAdvices, W2, and W9 ML Packages are now in Public Preview. We recommend you check out these packages and start using them for the type of documents you need to process.
Improvements
Implemented document level evaluation. This is representative of the run time performance in your RPA workflow.
Evaluation can also be done on datasets with fewer fields than the ML Package being evaluated. This facilitates evaluation on out-of-the-box pre-trained ML Packages.
eval.redo_ocr needs to be set to true in the AI Center Evaluation Pipeline.
Training on CPU now uses a smaller model to obtain a 5x-7x speedup. However, you should expect accuracy to be 0-5% lower on CPU.
Evaluation.xlsx files produced by Evaluation Pipelines.
The UtilityBills ML Package has been substantially improved.
Address parsing improvement for addresses which skip 1-2 lines of text.
Improvement on extracting negative values, very large values (11 digits or more), or dates far into the future.
Added support for rotated boxes on receipts.
Concatenated spans enhancement.
Bug Fixes
- Fixed a bug that was not returning special characters in String type fields.
- Fixed a bug for the Passports ML Package where the date written as an ordinal number (1st, 2nd, 3rd, 4th, etc.) was not parsed correctly.
Known Issues
Retraining InvoicesJapan and InvoicesChina ML Packages using data from Validation Station is currently not supported. As a workaround, please use Google Cloud Vision OCR.
Upcoming deprecations
All public endpoints, except for UiPathDocumentOCR, FormExtractor, IntelligentFormExtractor, and IntelligentKeywordClassifier, are going to be deprecated for non-West Europe regions starting December 1, 2021.