Document Understanding User Guide
Intelligent Keyword Classifier
The Intelligent Keyword Classifier learns a word vector from sample files of each document type and uses those vectors to classify documents.
The algorithm is built around the concept of repeating content for the same document type. It starts from the premise that each document type has a set of words that usually occur in documents of that type, which makes a vector similarity computation possible.
When classifying a file into a document type, the Intelligent Keyword Classifier:
- finds the learned word vector that the file is most similar to,
- reports on the highest scoring document type, with the underlying matching main words.
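The two steps above can be illustrated with a minimal sketch. It assumes plain word counts as the word vector and cosine similarity as the similarity measure; the classifier's actual internals are not described here, and the names `word_vector`, `cosine_similarity`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Count lowercase word occurrences (assumed representation, not the product's)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, learned_vectors):
    """Return (best document type, its score, matching words shared with that type)."""
    file_vector = word_vector(text)
    scores = {
        doc_type: cosine_similarity(file_vector, vector)
        for doc_type, vector in learned_vectors.items()
    }
    best_type = max(scores, key=scores.get)
    matched = sorted(file_vector.keys() & learned_vectors[best_type].keys())
    return best_type, scores[best_type], matched


# Hypothetical learned vectors for two document types.
learned = {
    "invoice": word_vector("invoice total amount due payment invoice number"),
    "receipt": word_vector("receipt cash change thank you store"),
}
result = classify("Invoice number 17: total amount due on payment", learned)
```

Here `result` holds the highest scoring document type together with its score and the words that produced the match, mirroring the kind of report described above.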
The Intelligent Keyword Classifier also has file splitting capabilities, meaning that it can report more than one class for a given file, for separate page ranges.
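One way to picture splitting is to label each page and then merge consecutive pages that share a label into page ranges. The sketch below assumes the per-page labels are already available and uses zero-based page numbers; both are assumptions for illustration, and `page_ranges` is a hypothetical helper, not part of the activity.

```python
from itertools import groupby


def page_ranges(page_labels):
    """Merge consecutive pages with the same label into (label, first_page, last_page)."""
    ranges = []
    page = 0
    for label, group in groupby(page_labels):
        count = len(list(group))
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# Hypothetical per-page labels for a four-page file.
labels = ["invoice", "invoice", "receipt", "invoice"]
splits = page_ranges(labels)
```

Each tuple in `splits` corresponds to one reported class with its own page range, so a single file can yield several classes.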
You should consider using this classifier if:
- your files contain one or more document types within a single file
- your document types are relatively easy to differentiate as far as content goes.
You need to use your Automation Cloud Document Understanding API Key, or host your own instance of the Intelligent Keyword Classifier in AI Center on-prem, to use this classifier.
Place the Intelligent Keyword Classifier Trainer activity in a Train Classifiers Scope, and configure it accordingly.
Training file consistency cannot be enforced at the activity level when trainings run in parallel. The Document Understanding Process provides two possible solutions for this issue, both of which rely on traffic control:
- lock files (implemented by default in the process): rename the file using the .lock extension, modify and save the file, then rename the file again, removing the .lock extension
- manual setup of a special queue: create an empty queue in Orchestrator and integrate your two activities from the project.
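The lock file approach can be sketched outside of a workflow as follows. This is only an illustration of the rename, modify, rename-back sequence, not the implementation used by the Document Understanding Process; it assumes the .lock extension is appended to the file name, and `update_with_lock` and the file name `classifier_learning.json` are hypothetical.

```python
import os
import tempfile


def update_with_lock(path, modify):
    """Rename the file to *.lock, rewrite it, then restore the original name."""
    lock_path = path + ".lock"
    # If another worker already renamed the file, this raises FileNotFoundError,
    # which tells the caller to wait and retry later.
    os.rename(path, lock_path)
    try:
        with open(lock_path, "r", encoding="utf-8") as f:
            content = f.read()
        with open(lock_path, "w", encoding="utf-8") as f:
            f.write(modify(content))
    finally:
        # Always release the lock by removing the .lock extension again.
        os.rename(lock_path, path)


with tempfile.TemporaryDirectory() as tmp:
    training_file = os.path.join(tmp, "classifier_learning.json")
    with open(training_file, "w", encoding="utf-8") as f:
        f.write("{}")
    update_with_lock(training_file, lambda text: text + "\n")
    with open(training_file, encoding="utf-8") as f:
        final_content = f.read()
    lock_left_behind = os.path.exists(training_file + ".lock")
```

While the file carries the .lock extension, a parallel training that looks for the original name does not find it, which is what keeps two writers from modifying the training file at the same time.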
For more information on how to train a Classifier, check this page that describes the process of using the Manage Learning wizard.
Learn more about the Intelligent Keyword Classifier by following this link.