Full Pipelines
Document Understanding User Guide
Last updated Apr 26, 2024
A Full Pipeline runs a Training Pipeline and an Evaluation Pipeline together.
Important:
Minimal dataset size
To run a Training pipeline successfully, we strongly recommend a minimum of 10 documents and at least 5 samples of each labeled field in your dataset. Otherwise, the pipeline throws the following error: Dataset Creation Failed.
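The recommended minimums above can be checked before a pipeline is started. The following is a minimal sketch, not part of the product: the function name `check_dataset` and the in-memory representation (one list of labeled field names per document) are assumptions made for illustration only.

```python
from collections import Counter

MIN_DOCUMENTS = 10         # recommended minimum number of documents
MIN_SAMPLES_PER_FIELD = 5  # recommended minimum samples per labeled field


def check_dataset(documents):
    """Return a list of problems; an empty list means both minimums are met.

    `documents` is a hypothetical structure: one list per document, holding
    the name of every labeled field sample found in that document.
    """
    problems = []
    if len(documents) < MIN_DOCUMENTS:
        problems.append(
            f"only {len(documents)} documents, at least {MIN_DOCUMENTS} recommended"
        )
    # Count labeled samples per field across the whole dataset.
    counts = Counter(field for doc in documents for field in doc)
    for field, n in sorted(counts.items()):
        if n < MIN_SAMPLES_PER_FIELD:
            problems.append(
                f"field '{field}' has {n} samples, at least {MIN_SAMPLES_PER_FIELD} recommended"
            )
    return problems
```

For example, `check_dataset([["total", "date"]] * 3)` reports three problems: too few documents, and too few samples for each of the two fields.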
Training on GPU vs CPU
- For larger datasets, you need to train using a GPU. Moreover, training on a GPU is at least 10 times faster than training on a CPU.
- Training on CPU is only supported for datasets up to 5000 pages in size for ML Packages v21.10.x and up to 1000 pages for other versions of ML Packages.
- CPU training was limited to 500 pages before 2021.10. The limit rose to 5000 pages in 2021.10 and dropped back to a maximum of 1000 pages in 2022.4.
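The version history in the last bullet can be expressed as a small lookup that tells you whether a dataset is too large for CPU training. This is a sketch under stated assumptions: the helper names `cpu_page_limit` and `needs_gpu` are hypothetical, releases are represented as `(year, month)` tuples, and the limits are taken from the bullet above rather than from any product API.

```python
def cpu_page_limit(release):
    """Return the maximum dataset size (in pages) for CPU training.

    `release` is a (year, month) tuple such as (2021, 10); this encoding
    is an assumption made for the example.
    """
    if release < (2021, 10):
        return 500    # limit before 2021.10
    if release < (2022, 4):
        return 5000   # limit for 2021.10
    return 1000       # limit from 2022.4 onward


def needs_gpu(pages, release):
    """True when the dataset exceeds the CPU page limit for that release."""
    return pages > cpu_page_limit(release)
```

With these assumptions, a 1200-page dataset fits on a CPU for release `(2021, 10)` but needs a GPU for release `(2022, 4)`.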