Document Understanding User Guide

Last updated Dec 2, 2025

About pipelines

Document Understanding™ ML Packages can run all three types of pipelines:

Full pipeline
Training pipeline
Evaluation pipeline

Once completed, a pipeline run has associated outputs and logs. To see this information, select the Pipelines tab in the left sidebar and click a pipeline to open the Pipeline view, which consists of:

the Pipeline details such as type, ML Package name and version, dataset, GPU usage, parameters, and execution time
the Outputs pane; this always includes a _results.json file containing a summary of the Pipeline details
the Logs page; the logs can also be obtained in the ML Logs tab from the left sidebar

All Pipelines return scores in three different files:

evaluation_scores_<package name>.txt - This file contains Accuracy scores for all fields.
evaluation_<package name>.xlsx - This file contains detailed accuracy breakdown per field and per batch, as well as side-by-side comparison for each field, with color highlighting for missed (red) or partially matched (yellow) fields.
evaluation_F1_scores.txt - This file contains the F1 scores for all fields.

Accuracy is obtained by dividing the sum of match weights by the total number of predictions. An exact match gets a weight of 1, while a partial match gets a fractional weight between 0 and 1 derived from the Levenshtein distance between the prediction and the true value: the closer the two strings, the higher the weight.
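In symbols (the notation is ours, not the product's), with N the total number of predictions, M the number of exact matches, P the number of partial matches, and w_i the Levenshtein-based weight of the i-th partial match:

```latex
\text{Accuracy} = \frac{M + \sum_{i=1}^{P} w_i}{N}, \qquad 0 < w_i < 1
```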

Note:

Partial matches using Levenshtein distance are the default scoring method on fields with Content Type: String. All other Content Types (Dates, Numbers, ID Numbers, Phone Numbers) only use Exact Match scoring.

For String fields, you can change this setting in the Advanced tab of the Field Settings dialog, in the Document Type view of Document Understanding.
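The scoring rule in the note above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the helper names (levenshtein, levenshtein_similarity, score_field) and the content-type strings are hypothetical, and normalizing the edit distance as 1 - distance / (length of the longer string) is an assumption. The exact normalization, and any minimum-similarity threshold Document Understanding applies before counting a partial match, are not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """ASSUMPTION: map the distance to [0, 1] as 1 - distance / longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def score_field(prediction: str, truth: str, content_type: str) -> float:
    """Hypothetical scorer: partial credit for String fields, exact match for all others."""
    if content_type == "String":
        return levenshtein_similarity(prediction, truth)
    return 1.0 if prediction == truth else 0.0
```

For example, score_field("PO-12345", "PO-12346", "String") returns 0.875 (one substitution over eight characters), while the same one-character difference on a date field, such as score_field("2025-12-02", "2025-12-03", "Date"), scores 0.0 under exact matching.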

For example, suppose an evaluation dataset has 100 documents and a field, say Purchase Order Number, appears on half of them. If the model predicts the field correctly on 40 of those documents and partially correctly on the other 10, each with a Levenshtein-based weight of 0.8, the accuracy is (40 + 10 × 0.8 + 50)/100 = 98%.

Note:

The 50 documents where the field is missing and the model did not predict anything are also counted as successful predictions.
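The arithmetic of this example can be reproduced with a short sketch that averages one score per document: 1.0 for an exact match (including documents where the field is absent and nothing was predicted), the partial weight for a partial match, and 0.0 for a miss. The function name field_accuracy is hypothetical, and the per-document scores are taken directly from the example rather than computed from real predictions.

```python
from typing import Sequence


def field_accuracy(scores: Sequence[float]) -> float:
    """Hypothetical helper: mean of per-document scores for a single field."""
    return sum(scores) / len(scores)


# Per-document scores for the Purchase Order Number example (100 documents).
scores = (
    [1.0] * 40    # field present and predicted exactly
    + [0.8] * 10  # field present, partial match with weight 0.8
    + [1.0] * 50  # field absent and nothing predicted: counted as a success
)

print(f"{field_accuracy(scores):.0%}")  # formats the ratio as a whole-number percentage
```

Running it prints 98%, matching the hand calculation.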

For Training pipelines, the scores are calculated on the Validation dataset, which is a randomly selected 20% subset of the total dataset submitted to the Training pipeline.
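A random 80/20 hold-out of this kind might look like the following sketch. The helper split_train_validation, the fixed seed, and the rounding of the validation size are illustrative assumptions; how Document Understanding actually selects and sizes the Validation subset is not documented here beyond the 20% figure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    docs: Sequence[T], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy, then hold out a fraction for validation."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training, validation)


docs = [f"doc_{i}.pdf" for i in range(100)]
train, val = split_train_validation(docs)
print(len(train), len(val))
```

With 100 documents this yields 80 training and 20 validation documents, and no document appears in both sets.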

Training pipelines or Full pipelines can also be used to: