Intelligent OCR activities
Document Understanding User Guide
Last updated Dec 12, 2024
With Intelligent OCR activities you can process documents end to end. You can digitize, classify, extract, and validate documents, and you can also train your extractors and classifiers on your own data so they become faster and more accurate. Creating Document Understanding™ processes with Intelligent OCR activities involves the following steps:
- Create the taxonomy: Define document types and convert them into a taxonomy variable using the Load Taxonomy activity.
- Digitize documents: Run documents through an OCR engine so robots can process them, storing their text in a String variable and basic information about them in a Document Object Model.
- Classify documents: Run documents through one or more classifiers so robots can identify which type of document they are processing.
- Validate the classification: Review and confirm that the documents have been classified correctly.
- Train your classifiers: Feed the input received while validating the classification back into your classifiers.
- Extract data from documents: Identify and extract specific information from your documents using one or more extractors, then send it for validation.
- Validate the extraction: Review and confirm the data extracted from the documents you processed and classified, with input from your team members in Action Center.
- Train your extractors: Feed the input received while validating the extraction back into your extractors.
- Consume exported data: Once you validate the extracted data, you can use it as is or export it as a DataSet variable using the Export Extraction Results activity.
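To make the data flow between these steps concrete, here is a minimal sketch that models them as plain Python functions. It is not the UiPath API: every name in it (`Taxonomy`, `DocumentObjectModel`, `load_taxonomy`, `digitize`, `classify`, `extract`, `validate`, `export`) is hypothetical. The "OCR" step assumes the input is already text, and a keyword classifier and regex extractor stand in for real classifiers and extractors.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stand-ins for the objects passed between steps; not UiPath APIs.


@dataclass
class Taxonomy:
    document_types: dict[str, list[str]]  # document type -> field names


@dataclass
class DocumentObjectModel:
    page_count: int  # basic information about the digitized document


def load_taxonomy() -> Taxonomy:
    # Create the taxonomy (hard-coded here instead of loaded from a file).
    return Taxonomy({"invoice": ["invoice_number", "total"], "receipt": ["total"]})


def digitize(raw: str) -> tuple[str, DocumentObjectModel]:
    # Digitize: a real OCR engine would produce the text; this sketch assumes
    # the input is already text and treats form feeds as page breaks.
    return raw, DocumentObjectModel(page_count=len(raw.split("\f")))


def classify(text: str, taxonomy: Taxonomy) -> str:
    # Classify: naive keyword match on the document type name.
    lowered = text.lower()
    for doc_type in taxonomy.document_types:
        if doc_type in lowered:
            return doc_type
    return "unknown"


# Hypothetical regex "extractor" for each field name in the taxonomy.
PATTERNS = {
    "invoice_number": r"invoice\s*(?:no\.?|number)[:\s]*(\S+)",
    "total": r"total[:\s]*\$?([\d.,]+)",
}


def extract(text: str, doc_type: str, taxonomy: Taxonomy) -> dict[str, str | None]:
    # Extract: look up only the fields the taxonomy defines for this type.
    results: dict[str, str | None] = {}
    for name in taxonomy.document_types.get(doc_type, []):
        match = re.search(PATTERNS[name], text, re.IGNORECASE)
        results[name] = match.group(1) if match else None
    return results


def validate(results: dict, corrections: dict) -> dict:
    # Validate: a human reviewer is simulated by a dict of corrections; these
    # corrections are the input a training step would learn from.
    return {**results, **corrections}


def export(doc_type: str, results: dict) -> list[dict]:
    # Consume: flatten into table rows, loosely analogous to a DataSet.
    return [{"document_type": doc_type, "field": k, "value": v} for k, v in results.items()]


if __name__ == "__main__":
    taxonomy = load_taxonomy()
    text, dom = digitize("ACME Invoice\nInvoice No: INV-001\nTotal: $120.50")
    doc_type = classify(text, taxonomy)
    fields = extract(text, doc_type, taxonomy)
    validated = validate(fields, corrections={})  # no reviewer corrections
    for row in export(doc_type, validated):
        print(row)
```

Each function returns what the next one needs, which mirrors how the taxonomy, text, Document Object Model, and results travel between activities in a workflow. The training steps are omitted because they depend on the specific classifier or extractor.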
Before you begin using IntelligentOCR.Activities, consider the following characteristics:
- High configurability, which comes with a steep learning curve.
- Multiple objects and activities, designed for flexibility.
- Reduced reusability, due to the following complexities:
  - You need to set numerous configuration options inside the workflow.
  - You need to repeatedly pass explicit arguments from one activity to the next, such as:
    - Taxonomy
    - Document Object Model
    - Text
    - Classification results
    - Extraction results
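The argument-passing overhead can be illustrated with a small model of a workflow's wiring. In this hypothetical Python sketch, each step lists the arguments it is assumed to consume and produce. These are assumptions based on the steps described earlier, not the documented arguments of the IntelligentOCR activities, and the training steps are left out. `check_wiring` (also hypothetical) verifies that each input exists before it is used and counts how often each argument is passed.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical (step, inputs, outputs) wiring; the real activity arguments
# may differ in your version of IntelligentOCR.Activities.
DOM = "Document Object Model"
STEPS: list[tuple[str, set[str], set[str]]] = [
    ("Load Taxonomy", set(), {"Taxonomy"}),
    ("Digitize", set(), {"Text", DOM}),
    ("Classify", {"Taxonomy", "Text", DOM}, {"Classification results"}),
    ("Validate classification",
     {"Taxonomy", "Text", DOM, "Classification results"}, {"Classification results"}),
    ("Extract", {"Taxonomy", "Text", DOM, "Classification results"}, {"Extraction results"}),
    ("Validate extraction",
     {"Taxonomy", "Text", DOM, "Extraction results"}, {"Extraction results"}),
    ("Export", {"Extraction results"}, {"DataSet"}),
]


def check_wiring(steps: list[tuple[str, set[str], set[str]]]) -> Counter:
    """Raise if a step uses an argument no earlier step produced; count passes."""
    available: set[str] = set()
    passes: Counter = Counter()
    for name, inputs, outputs in steps:
        missing = inputs - available
        if missing:
            raise ValueError(f"{name} needs {sorted(missing)} before they exist")
        passes.update(inputs)  # each input is passed explicitly to this step
        available |= outputs
    return passes


if __name__ == "__main__":
    for argument, count in check_wiring(STEPS).most_common():
        print(f"{argument}: passed to {count} steps")
```

In this model, the taxonomy, text, and Document Object Model are each passed to four different steps, which is the kind of repetition that makes these workflows harder to reuse.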