Document Understanding Release Notes
Last updated May 7, 2024

October 2021

General Release Notes - Document Understanding

19 October 2021

Improvements

Fields with fewer than 10 labeled documents can be deleted without confirmation.

Bug Fixes

  • Fixed a bug that affected imported files with the same name.
  • Fixed a bug in Google OCR that caused an error on documents with empty pages.
  • Fixed a bug that displayed an incorrect file count in the Import data dialog box for Validation Station or Data Manager dataset imports.

Known Issues

  • Default export (document level) only works with ML Packages version 21.10 or later in AI Center. The version appears in the Change Log column of the ML Packages view in AI Center. For older versions, please use the Backwards-compatible export checkbox on the Export files dialog box.
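The version rule above can be sketched as a small check. This is a minimal illustration in Python, assuming the version read from the Change Log column is a dotted string such as 21.10.9; the names parse_version and supports_default_export are hypothetical and are not part of any UiPath API.

```python
from typing import Tuple

# Minimum ML Package version that works with the default (document level) export.
MIN_DOCUMENT_LEVEL_VERSION = (21, 10)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '21.10.9' into (21, 10, 9)."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_default_export(version: str) -> bool:
    """Return True if the package version is 21.10 or later."""
    return parse_version(version)[:2] >= MIN_DOCUMENT_LEVEL_VERSION


def export_mode(version: str) -> str:
    """Pick the export option described in the known issue above."""
    if supports_default_export(version):
        return "default (document level)"
    return "Backwards-compatible export"
```

Comparing integer tuples rather than raw strings avoids treating 21.9 as newer than 21.10.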

1 October 2021

Multi-page document support

Data Manager now supports multi-page documents. This is a major update impacting every aspect of a Machine Learning flow:

Import: you can upload documents of up to 150 pages; to bypass this limit, at the risk of an unstable labeling experience, select the Enable large documents checkbox in the Import data dialog box.

Prelabeling: the document is prelabeled as a whole, producing the same results as running it in an RPA workflow, but this takes longer for larger documents. See also Known Issues below.

Labeling: more convenient labeling due to natural scrolling through document pages.

Export: done by default at document level. Should you want to export the documents at page level, select the Backwards-compatible export checkbox from the Export files dialog box; this is also recommended if the model accuracy produced by the default export is below expectations.

Training: in most scenarios, models trained on the new document-level exported datasets should perform the same as models trained on page-level Backwards-compatible exports. However, if the models perform below expectations, we recommend that you retry the training using a Backwards-compatible export, in case it produces better results.

Evaluation: this is the main motivation for the multi-page document support feature, since Evaluation scores will more accurately reflect run time performance. Please note that this assumes that each multi-page file contains a single logical document. For instance, if you import 20-page file packets containing 10 invoices of 2 pages each, these packets should not be used as part of Evaluation sets. They can be used as part of Training sets, but only if you export with the Backwards-compatible export option enabled.
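The rule for multi-document packets can be summarized in a short sketch. This is only an illustration in Python of the guidance above; the ImportedFile class, its fields, and the allowed_uses function are hypothetical names, and Data Manager does not expose such an API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImportedFile:
    name: str
    pages: int
    logical_documents: int  # how many separate documents the file contains


def allowed_uses(file: ImportedFile) -> Dict[str, bool]:
    """Map each dataset use to whether the guidance above permits it."""
    single = file.logical_documents == 1
    return {
        # Evaluation assumes one logical document per file.
        "evaluation": single,
        # Document level (default) export is only meaningful for single documents.
        "training_default_export": single,
        # Page level export works for packets and single documents alike.
        "training_backwards_compatible_export": True,
    }


# The example from the text: a 20-page packet holding 10 two-page invoices.
packet = ImportedFile(name="packet.pdf", pages=20, logical_documents=10)
invoice = ImportedFile(name="invoice.pdf", pages=2, logical_documents=1)
```

Here allowed_uses(packet) permits only training with the Backwards-compatible export, while allowed_uses(invoice) permits all three uses.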

Improvements

Export schema support using radio button in Export files dialog box.

Maximum import size increased to 2 GB or 2000 pages.

Test set renamed to Evaluation set for consistency with AI Center Evaluation Pipelines.

The Predict button appears by default in the management bar, but Prelabelling settings need to be configured for the button to be enabled.

All restrictions on number of samples per field removed from exports of Evaluation sets.

Added the Data Manager session name next to the file name in the management bar, making it easier to identify the session you are working on when multiple Data Manager tabs are open at the same time.

Chinese language documents supported.

Accessibility improvements.

Localization for Portuguese-Portugal, Russian, and Turkish.

Known Issues

  • Invoices China model does not format Chinese style dates in the standard yyyy-mm-dd format. This will be improved in upcoming releases.
  • Data Manager parsing of dates is inconsistent with the parsing done by ML models at run time. If you notice that dates are being parsed incorrectly in Data Manager, it is likely they will be parsed correctly in the model prediction at run time. This is a known issue and it will be resolved in an upcoming patch.
  • At the moment, using the Predict option with Public Endpoints prelabels only the first 10 pages of a document. This is a known issue and an enhancement will be included in an upcoming patch. Using the Predict option with ML Skills in AI Center, however, does not impose such a limitation.
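For the Chinese-style date issue listed above, dates can be normalized in a post-processing step of your own until the model handles them. The following Python sketch is an assumption about what such a step might look like, not the product's implementation; normalize_chinese_date is a hypothetical helper, and it only handles dates written with digits and the 年/月/日 markers (not Chinese numerals).

```python
import re
from datetime import date
from typing import Optional

# Matches dates such as 2021年10月1日 (digits with year/month/day markers).
_CN_DATE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")


def normalize_chinese_date(text: str) -> Optional[str]:
    """Return the first Chinese-style date in text as yyyy-mm-dd, or None."""
    match = _CN_DATE.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() rejects impossible values such as month 13.
        return date(year, month, day).isoformat()
    except ValueError:
        return None


print(normalize_chinese_date("2021年10月1日"))  # prints 2021-10-01
```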

General Release Notes - ML Packages

22 October 2021 | V.21.10.9

Released in AI Center Cloud & Endpoints: 22 October 2021, package version: 21.10.9

What's New

The PurchaseOrders ML Package is now Generally Available and it is ready to be used in your production scenarios.

InvoicesChina, DeliveryNotes, RemittanceAdvices, W2, and W9 ML Packages are now in Public Preview. We recommend you check out these packages and start using them for the type of documents you need to process.

Improvements

Implemented document-level evaluation. This is representative of the runtime performance in your RPA workflow.

Evaluation can also be done on datasets with fewer fields than the ML Package being evaluated. This facilitates evaluation on out-of-the-box pre-trained ML Packages.

To assess the impact OCR has on extraction accuracy, you can now rerun OCR when running an Evaluation Pipeline. This requires OCR to be configured when creating the ML Package, and the Environment Variable eval.redo_ocr must be set to true in the AI Center Evaluation Pipeline.
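The two requirements above can be written down as a small precondition check. This Python sketch is only illustrative: evaluation_environment is a hypothetical helper, the environment variables are actually entered in the AI Center Evaluation Pipeline settings, and only the eval.redo_ocr name and its true value come from these notes.

```python
from typing import Dict


def evaluation_environment(ocr_configured: bool, redo_ocr: bool = True) -> Dict[str, str]:
    """Build the environment variables for an Evaluation Pipeline run.

    ocr_configured: whether OCR was configured when the ML Package was created.
    redo_ocr: whether OCR should be rerun during evaluation.
    """
    if not redo_ocr:
        # Leave the variable unset; nothing else is assumed about its values.
        return {}
    if not ocr_configured:
        raise ValueError("Rerunning OCR requires OCR to be configured on the ML Package.")
    return {"eval.redo_ocr": "true"}
```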

Training on CPU now uses a smaller model to obtain a 5x-7x speedup. However, you should expect accuracy to be 0-5% lower on CPU.

Added Minimum Confidence and Straight Through Processing Rate columns to the Evaluation.xlsx files produced by Evaluation Pipelines.

The UtilityBills ML Package has been substantially improved.

Improved address parsing for addresses that skip 1-2 lines of text.

Improved extraction of negative values, very large values (11 digits or more), and dates far into the future.

Added support for rotated boxes on receipts.

Concatenated spans enhancement.

Bug Fixes

  • Fixed a bug that was not returning special characters in String type fields.
  • Fixed a bug for the Passports ML Package where the date written as an ordinal number (1st, 2nd, 3rd, 4th, etc.) was not parsed correctly.
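To illustrate the kind of input involved in the ordinal date fix, the sketch below parses such dates in Python by stripping the ordinal suffix first. It is not the Passports ML Package implementation; parse_ordinal_date is a hypothetical helper, and the default day-month-year format with English month names is an assumption.

```python
import re
from datetime import date, datetime

# Ordinal suffix directly after a digit: 1st, 2nd, 3rd, 4th, ...
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)


def parse_ordinal_date(text: str, fmt: str = "%d %B %Y") -> date:
    """Parse a date such as '1st October 2021' by removing the ordinal suffix."""
    cleaned = _ORDINAL_SUFFIX.sub("", text.strip())
    return datetime.strptime(cleaned, fmt).date()


print(parse_ordinal_date("22nd October 2021"))  # prints 2021-10-22
```

The lookbehind on a digit keeps month names such as August intact, even though they contain the letters "st".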

Known Issues

Retraining InvoicesJapan and InvoicesChina ML Packages using data from Validation Station is currently not supported. As a workaround, please use Google Cloud Vision OCR.

Upcoming deprecations

All public endpoints, except for UiPathDocumentOCR, FormExtractor, IntelligentFormExtractor, and IntelligentKeywordClassifier, will be deprecated for non-West Europe regions starting December 1, 2021.
