Evaluation Pipelines
Document Understanding User Guide
Last updated Dec 12, 2024
An Evaluation Pipeline is used to evaluate a trained ML model.
Configure the evaluation pipeline as follows:
- In the Pipeline type field, select Evaluation run.
- In the Choose package major version field, select a major version for your package.
- In the Choose package minor version field, select a minor version you want to evaluate.
- In the Choose evaluation dataset field, select a representative evaluation dataset. For more information on dataset structure, check the Dataset format section.
- In the Enter parameters section, one environment variable is relevant for Evaluation pipelines: eval.redo_ocr. If set to true, it reruns OCR when the pipeline runs, so you can assess the impact of OCR on extraction accuracy. This assumes an OCR engine was configured when the ML Package was created.
- The Enable GPU slider is disabled by default, in which case the pipeline runs on CPU. We strongly recommend that Evaluation pipelines run only on CPU.
- Select one of the options for when the pipeline should run: Run now, Time based, or Recurring.
- After you configure all the fields, click Create. The pipeline is created.
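The settings above can be pictured as a small data structure. The sketch below is for illustration only: the class, the function, and the placeholder values are hypothetical and are not part of any UiPath API. It mirrors the form fields and flags choices that go against the guidance in this section, assuming parameter values are entered as text.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this mirrors the form fields for
# illustration and is not a UiPath API.
SCHEDULE_OPTIONS = {"Run now", "Time based", "Recurring"}


@dataclass
class EvaluationRunSettings:
    major_version: int
    minor_version: int
    evaluation_dataset: str
    parameters: dict = field(default_factory=dict)  # environment variables
    enable_gpu: bool = False  # the slider is disabled by default
    schedule: str = "Run now"
    pipeline_type: str = "Evaluation run"


def review_settings(settings: EvaluationRunSettings) -> list:
    """Return notes about settings that deserve a second look."""
    notes = []
    if settings.pipeline_type != "Evaluation run":
        notes.append("Pipeline type should be 'Evaluation run'.")
    if settings.enable_gpu:
        notes.append("Evaluation pipelines are recommended to run on CPU only.")
    if settings.schedule not in SCHEDULE_OPTIONS:
        notes.append(f"Unknown schedule option: {settings.schedule!r}.")
    redo_ocr = settings.parameters.get("eval.redo_ocr")
    if redo_ocr is not None and str(redo_ocr).lower() not in {"true", "false"}:
        notes.append("eval.redo_ocr is expected to be true or false.")
    return notes


settings = EvaluationRunSettings(
    major_version=1,  # placeholder values, not taken from a real package
    minor_version=2,
    evaluation_dataset="my-eval-dataset",
    parameters={"eval.redo_ocr": "true"},
)
print(review_settings(settings))
```

An empty list means none of the checks in this sketch raised a concern. It does not validate anything against the actual service.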
For an Evaluation Pipeline, the Outputs pane also includes an artifacts / eval_metrics folder which contains two files:
- evaluation_default.xlsx is an Excel spreadsheet with three sheets:
  - The first sheet summarizes the overall scores and the scores per batch for each field, covering Regular, Column, and Classification fields. It also gives the percentage of perfectly extracted documents, both per batch and overall.
  - The second sheet presents a side-by-side, color-coded comparison of Regular Fields, sorted by increasing document accuracy. The least accurate documents appear at the top to facilitate diagnosis and troubleshooting.
  - The third sheet presents a side-by-side, color-coded comparison of the Column Fields.
  - All scores presented in the Excel file represent accuracy scores.
- evaluation_metrics_default.txt contains the F1 scores of the predicted fields.
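The two files report different kinds of scores, accuracy in the spreadsheet and F1 in the text file. The sketch below shows how such numbers can be computed in principle. It is not the product's implementation: the data is made up, document accuracy is assumed to be the fraction of correctly extracted fields, and F1 uses the standard definition based on precision and recall.

```python
# Hypothetical data: for each document, whether each field was extracted
# correctly. Real evaluation output is not structured like this.
documents = {
    "doc_a.pdf": {"invoice_no": True, "date": True, "total": True},
    "doc_b.pdf": {"invoice_no": True, "date": False, "total": False},
    "doc_c.pdf": {"invoice_no": False, "date": True, "total": True},
}


def document_accuracy(fields: dict) -> float:
    """Fraction of fields extracted correctly (assumed definition)."""
    return sum(fields.values()) / len(fields)


def rank_by_accuracy(docs: dict) -> list:
    """Least accurate documents first, as on the second sheet."""
    scored = [(name, document_accuracy(fields)) for name, fields in docs.items()]
    return sorted(scored, key=lambda item: item[1])


def perfect_share(docs: dict) -> float:
    """Fraction of documents in which every field is correct."""
    perfect = sum(1 for fields in docs.values() if all(fields.values()))
    return perfect / len(docs)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, accuracy in rank_by_accuracy(documents):
    print(f"{name}: {accuracy:.2f}")
print(f"perfectly extracted: {perfect_share(documents):.0%}")
print(f"F1 for a field with 8 TP, 2 FP, 2 FN: {f1_score(8, 2, 2):.2f}")
```

Accuracy counts how often a field is right, while F1 balances wrong predictions against missed ones, so the two files can rank the same field differently.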