Document Understanding User Guide

Last updated Apr 25, 2025

Intelligent Keyword Classifier

What is Intelligent Keyword Classifier

The Intelligent Keyword Classifier is a classifier that uses the word vector it learns from files of certain document types to perform document classification.

The algorithm relies on content that repeats across documents of the same type. It assumes that each document type has a set of words that usually occur in it, which makes a vector similarity computation possible.

When classifying a file into a document type, the Intelligent Keyword Classifier:

finds the learned word vector that the file is most similar to,
reports the highest-scoring document type, along with the main matching words.
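
The idea behind the steps above can be illustrated with a minimal sketch. It assumes plain word counts as the word vector and cosine similarity as the scoring function; the classifier's actual vectorization and scoring are not described here, and the function names (word_vector, train, classify) are hypothetical.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Count lowercase word occurrences (a simple bag-of-words vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples):
    """Build one aggregated word vector per document type.

    samples: iterable of (document_type, text) pairs.
    """
    vectors = {}
    for doc_type, text in samples:
        vectors.setdefault(doc_type, Counter()).update(word_vector(text))
    return vectors


def classify(text, type_vectors, top_words=3):
    """Return (best_type, score, matching_words) for the given text."""
    if not type_vectors:
        raise ValueError("no trained document types")
    file_vector = word_vector(text)
    best_type, best_score = None, -1.0
    for doc_type, type_vector in type_vectors.items():
        score = cosine_similarity(file_vector, type_vector)
        if score > best_score:
            best_type, best_score = doc_type, score
    learned = type_vectors[best_type]
    shared = file_vector.keys() & learned.keys()
    # Rank shared words by their weight in the learned vector (assumption).
    matching = sorted(shared, key=lambda w: (-learned[w], w))[:top_words]
    return best_type, best_score, matching
```

In this sketch, training on a few sample texts per document type and then calling classify on a new text returns the best-scoring type, its similarity score, and the shared words that carry the most weight for that type.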

The Intelligent Keyword Classifier also has file splitting capabilities, meaning that it can report more than one class for a given file, for separate page ranges.
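
As an illustration of what reporting one class per page range can look like, the following sketch merges per-page predictions into ranges. It assumes that each page has already been assigned a document type; how the classifier actually decides where to split is not described here, and page_ranges is a hypothetical helper.

```python
from itertools import groupby


def page_ranges(page_labels):
    """Merge consecutive pages with the same label into (label, first, last).

    page_labels: one predicted document type per page, in page order.
    Page numbers in the result are 1-based and inclusive.
    """
    ranges = []
    page = 1
    for label, group in groupby(page_labels):
        count = len(list(group))
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges
```

For a four-page file whose pages are labeled invoice, invoice, receipt, invoice, this yields three ranges: pages 1 to 2, page 3, and page 4.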

Note: Unlike the Keyword Based Classifier, the Intelligent Keyword Classifier does not require you to manually select references in the document when training. Any such references provided at training time are ignored.

When to use

You should consider using this classifier if:

a single file can contain one or more document types,
your document types are relatively easy to differentiate as far as content goes.

Note: Starting with version 6.9.0, the splitting performance for the Intelligent Keyword Classifier activity has been improved. For optimized splitting performance, use version 6.9.0 or higher.

Special requirements

To use this classifier, you need either your Automation Cloud™ Document Understanding™ API key or your own instance of the Intelligent Keyword Classifier hosted in AI Center on-premises.

How to train

Place the Intelligent Keyword Classifier Trainer activity in a Train Classifiers Scope, and configure it accordingly.

Training file consistency across parallel trainings cannot be enforced at the activity level. The Document Understanding Process provides two possible solutions for this issue, both based on traffic control:

lock files (implemented by default in the process): rename the file with the .lock extension, modify and save it, then rename it again to remove the .lock extension
manual setup of a special queue: create an empty queue in Orchestrator and integrate your two activities from the project.
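
The lock file approach can be sketched outside the process as follows. This is a minimal illustration, not the implementation used by the Document Understanding Process: it assumes the .lock extension is appended to the file name, that a trainer waits and retries when the file is already locked, and that update_with_lock and its parameters are hypothetical names.

```python
import os
import time


def update_with_lock(path, modify, retries=50, delay=0.1):
    """Apply modify(bytes) -> bytes to the file at path under a rename lock."""
    locked = path + ".lock"  # assumption: the extension is appended
    for _ in range(retries):
        try:
            # Renaming claims the file: only one process can succeed,
            # because the original name no longer exists afterwards.
            os.rename(path, locked)
            break
        except FileNotFoundError:
            # Another training run holds the lock; wait and retry.
            time.sleep(delay)
    else:
        raise TimeoutError(f"could not lock {path}")
    try:
        with open(locked, "rb") as f:
            data = f.read()
        with open(locked, "wb") as f:
            f.write(modify(data))
    finally:
        # Release the lock by restoring the original name.
        os.rename(locked, path)
```

The rename doubles as the lock because a second process that tries to rename the same file finds it missing and has to wait. A failure while writing would still leave a partially written file, so a production setup would need additional safeguards.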

For more information on how to train a classifier, see the page that describes the Manage Learning wizard.