Document Understanding User Guide
Data Extraction Validation overview
After automatic data extraction, an optional but highly recommended step is to validate the extracted data.
This is a human review step in which knowledge workers review the automatically extracted results and correct them where necessary.
Using Data Extraction Validation is how you ensure that the resulting structured data is fully correct.
It is strongly recommended to use the Data Extraction Validation components when:
- you need 100% accuracy on the data,
- you have no other source of truth against which to double-check the automatically extracted information,
  - e.g., you cannot check that a certain Name or Address equals a Name or Address already confirmed and existing in a database,
- you do not have sufficient synthetic checks you can run on data consistency,
  - e.g., checking that line items add up to a total, or that an ID number checksum is correct.
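The synthetic consistency checks mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not UiPath code: the function names and the rounding tolerance are hypothetical, and the Luhn algorithm is used only as an example checksum, since each type of ID number defines its own scheme.

```python
from decimal import Decimal


def line_items_match_total(line_amounts, total, tolerance=Decimal("0.01")):
    """Return True when the extracted line amounts add up to the extracted total."""
    # Decimal avoids binary floating-point rounding issues with currency strings.
    computed = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    return abs(computed - Decimal(total)) <= tolerance


def luhn_checksum_ok(number):
    """Return True when a digit string passes the Luhn check (example checksum only)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A document that fails either check is a natural candidate for human review, even when its confidence scores are high.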
Note: If you need 100% accuracy, we strongly recommend adding the Validation step wherever possible.
If this is not an option for all documents, then:
- try to double-check as much of the information as possible
- try to decide on specific confidence thresholds that the business use case can accept for certain fields
- always check both the Extraction Confidence and the OCR Confidence of a given value before making your decision.
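The threshold-based approach described in this note could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `ExtractedField` type, the field names, the threshold values, and the convention that both confidences are expressed between 0.0 and 1.0. It does not reflect the actual data model of the extraction results.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    extraction_confidence: float  # assumed to be in the range 0.0-1.0
    ocr_confidence: float         # assumed to be in the range 0.0-1.0


# Hypothetical per-field thresholds agreed with the business; values are illustrative.
THRESHOLDS = {"invoice-number": 0.95, "total": 0.90}
DEFAULT_THRESHOLD = 0.85


def needs_human_validation(fields):
    """Return the names of fields whose extraction or OCR confidence is below threshold."""
    flagged = []
    for item in fields:
        threshold = THRESHOLDS.get(item.name, DEFAULT_THRESHOLD)
        # Both confidences must clear the threshold, so the lower one decides.
        if min(item.extraction_confidence, item.ocr_confidence) < threshold:
            flagged.append(item.name)
    return flagged
```

If the returned list is empty, the document can skip human review under the chosen thresholds; otherwise it is routed to validation. Taking the minimum of the two scores reflects the advice to check both values rather than either one alone.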
The automatically extracted data can be validated by a human using Validation Station.
Validation Station is available in two ways:
- as an attended activity, through the use of the Present Validation Station activity, or
- as Action Center tasks, through the use of the Create Document Validation Action and Wait for Document Validation Action and Resume activities.