Document Understanding User Guide
Intelligent Keyword Classifier
The Intelligent Keyword Classifier classifies documents using word vectors that it learns from sample files of each document type.
The algorithm relies on the observation that files of the same document type tend to repeat content: each document type has a set of words that usually occur in it, which makes a vector similarity computation possible.
When classifying a file into a document type, the Intelligent Keyword Classifier:
- finds the learned word vector that the file is most similar to,
- reports on the highest scoring document type, with the underlying matching main words.
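The two steps above can be illustrated with a minimal sketch. It assumes plain word counts as the word vector and cosine similarity as the similarity measure; the source does not state which representation or metric the classifier actually uses, and the function names `word_vector`, `cosine_similarity`, and `classify` are hypothetical, not part of any UiPath API.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Build a word-count vector (assumed representation) from raw text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, learned_vectors):
    """Return (best document type, score, matching words).

    learned_vectors maps a document type name to its learned word vector
    and is assumed to be non-empty.
    """
    file_vector = word_vector(text)
    scores = {
        doc_type: cosine_similarity(file_vector, vector)
        for doc_type, vector in learned_vectors.items()
    }
    best = max(scores, key=scores.get)
    # Words shared with the winning type, heaviest learned weight first.
    shared = file_vector.keys() & learned_vectors[best].keys()
    matched = sorted(shared, key=lambda w: (-learned_vectors[best][w], w))
    return best, scores[best], matched


# Hypothetical training data: one short sample per document type.
learned = {
    "invoice": word_vector("invoice total amount due invoice number"),
    "receipt": word_vector("receipt cash change thank you"),
}
best, score, matched = classify("Invoice number 42 total due", learned)
```

In this toy example the file shares "invoice", "number", "total", and "due" with the invoice vector and nothing with the receipt vector, so "invoice" is the highest scoring type.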
The Intelligent Keyword Classifier also has file splitting capabilities, meaning that it can report more than one class for a given file, for separate page ranges.
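The shape of such a split result can be sketched as follows. This is only an illustration of reporting one class per page range; it assumes a per-page label is already available and merges consecutive pages with the same label. The source does not describe how the classifier decides where to split, and `page_ranges` is a hypothetical helper.

```python
from itertools import groupby


def page_ranges(page_labels):
    """Merge consecutive identical page labels into (label, first, last) ranges.

    page_labels holds one document type per page, in page order.
    Page numbers in the result are 1-based and inclusive.
    """
    ranges = []
    first_page = 1
    for label, group in groupby(page_labels):
        count = len(list(group))
        ranges.append((label, first_page, first_page + count - 1))
        first_page += count
    return ranges


# Hypothetical four-page file with three separate documents.
result = page_ranges(["invoice", "invoice", "receipt", "invoice"])
```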
You should consider using this classifier if:
- your files contain one or more document types within a single file
- your document types are relatively easy to differentiate as far as content goes.
To use this classifier, you need either your Automation Cloud™ Document Understanding™ API key or your own instance of the Intelligent Keyword Classifier hosted in AI Center on-premises.
Place the Intelligent Keyword Classifier Trainer activity in a Train Classifiers Scope, and configure it accordingly.
Training file consistency cannot be enforced at the activity level when trainings run in parallel. The Document Understanding Process provides two possible solutions for this issue, both based on traffic control:
- lock files (implemented by default in the process): rename the file using the .lock extension, modify and save the file, then rename the file again, removing the .lock extension
- manual setup of a special queue: create an empty queue in Orchestrator and integrate your two activities from the project.
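The lock-file approach can be sketched outside of the process as follows. This is a minimal illustration, not the implementation shipped in the Document Understanding Process: it assumes the `.lock` extension is appended to the existing file name, that the training file is text, and that a rename within the same folder is atomic on the file system in use. The function name `update_with_lock` and its `retries` and `delay` parameters are hypothetical.

```python
import os
import time
from pathlib import Path


def update_with_lock(path, update, retries=50, delay=0.1):
    """Rename path to path + '.lock', rewrite it, then rename it back.

    update is a function that takes the current file text and returns
    the new text. While one worker holds the lock, the original name
    does not exist, so other workers wait and retry.
    """
    path = Path(path)
    locked = path.with_name(path.name + ".lock")
    for _ in range(retries):
        try:
            os.rename(path, locked)  # acquire: only one rename can succeed
            break
        except FileNotFoundError:
            time.sleep(delay)  # another worker holds the lock; wait
    else:
        raise TimeoutError(f"could not lock {path}")
    try:
        text = locked.read_text(encoding="utf-8")
        locked.write_text(update(text), encoding="utf-8")
    finally:
        os.rename(locked, path)  # release: restore the original name
```

A caller would pass the training file path and a function that merges new training data into the existing content, for example `update_with_lock("training.json", merge_new_samples)`, where both names are placeholders.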
For more information on how to train a classifier, see the page that describes how to use the Manage Learning wizard.