Deploying High Performing Models
As Machine Learning (ML) models improve in accuracy over time, their resource requirements change as well. For the best performance when deploying ML models via AI Center™, it is important that the skills are sized appropriately for the traffic they need to handle. For the most part, infrastructure is sized by the number of pages processed per unit of time (minute or hour). A document can have a single page or multiple pages.
To deploy infrastructure via AI Center, there are a few important aspects to keep in mind for optimal performance.
There is only one type of GPU infrastructure available, enabled via the GPU checkbox. Each skill runs on a single virtual machine (VM) or node that has a GPU. In this case, CPU and memory tiers are not relevant, since the skill can use all the CPU and memory resources available on that node. Besides offering higher throughput, the GPU is also much faster per page, so if latency is critical, using the GPU tier is recommended.
CPU and memory can be fractioned, which means multiple ML Skills can run on the same node. To avoid disturbance from a neighboring skill, each ML Skill is limited in the amount of memory and CPU it can consume, depending on the selected tier. Higher CPU leads to faster processing of a page, while higher memory allows a larger number of documents to be processed.
The number of replicas determines the number of containers that serve requests for the ML model. A higher number allows a larger number of documents to be processed in parallel, subject to the limits of that particular tier. The number of replicas is directly tied to the infrastructure type (number of CPUs per replica, or whether a GPU is used), in the sense that both the replica count and the infrastructure size directly affect throughput (pages/minute).
The number of robots also impacts throughput. For efficient throughput, size the number of robots so that they do not overload the ML Skill. This depends on the automation itself and should be tested. As a general guideline, start with one to three robots for each replica the ML Skill has. Depending on the overall process time (excluding the ML Extractor), the number of robots (or of replicas) can be higher or lower.
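The one-to-three robots-per-replica guideline above can be sketched as a small helper. This is only a starting-point calculation, not an official sizing formula; the function name and defaults are illustrative assumptions:

```python
def suggested_robot_range(replicas: int,
                          min_per_replica: int = 1,
                          max_per_replica: int = 3) -> tuple:
    """Return a (low, high) starting range for the robot count of an
    ML Skill, based on the 1-3 robots-per-replica rule of thumb."""
    return (replicas * min_per_replica, replicas * max_per_replica)

# A skill with 4 replicas: start testing with 4 to 12 robots,
# then adjust based on the overall process time of the automation.
low, high = suggested_robot_range(4)
```

The range is a test starting point: measure the skill under load and move toward the lower or higher end depending on how much non-extraction work the automation performs.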
If the infrastructure is not sized correctly, the models can be placed under a very high load. In some cases this can lead to a backlog of requests, long processing time, or even failures when processing documents.
Insufficient memory is most commonly encountered on the lower CPU tiers (0.5 CPU or 1 CPU). Processing a very large payload (one or several large documents) can lead to an out-of-memory exception. This is related to the document size in terms of pages and text density (how much text there is per page). Since the requirements are very specific to each use case, it is not possible to provide exact numbers. You can check the guidelines in the Sizing the infrastructure correctly section for more detailed information. If you encounter an insufficient memory situation, the general recommendation is to move to the next tier.
Insufficient compute can cause request timeouts (520 and 499 status codes), a backlog of requests, or even lead to the model crashing (503 and 500 status codes). If you encounter an insufficient compute situation, we recommend going to the next tier, or even to the GPU tier.
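While resizing the skill is the real fix, transient failures with the status codes above can also be absorbed on the consumer side with retries. A minimal sketch, assuming the caller can wrap the request in a function that returns an HTTP status code (the function names and retry parameters are illustrative, not part of any UiPath API):

```python
import time

# Status codes typically seen when the ML Skill is overloaded.
RETRYABLE_STATUSES = {499, 500, 503, 520}

def call_with_retry(send, max_attempts: int = 5, base_delay: float = 1.0) -> int:
    """Call `send()` (which performs the request and returns its HTTP
    status code) and retry with exponential backoff on overload errors."""
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE_STATUSES:
            return status  # success, or a non-retryable error
        time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    return status  # still failing after max_attempts
```

Backoff only smooths out short spikes; a sustained backlog still means the tier or replica count is undersized.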
This section provides general guidelines on how the models perform for each skill size.
Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
---|---|---|---|
0.5 CPU/2 GB memory | 25 | 300-600 | 1 |
1 CPU/4 GB memory | 50 | 400-800 | 2 |
2 CPU/8 GB memory | 100 | 600-1000 | 4 |
4 CPU/16 GB memory | 100 | 800-1200 | 8 |
6 CPU/24 GB memory | 100 | 900-1300 | 12 |
GPU | 200-250 | 1350-1600 | 20 |
Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
---|---|---|---|
0.5 CPU/2 GB memory | 25 | 40-100 | 1 |
1 CPU/4 GB memory | 50 | 70-140 | 2 |
2 CPU/8 GB memory | 75 | 120-220 | 4 |
4 CPU/16 GB memory | 100 | 200-300 | 8 |
6 CPU/24 GB memory | 100 | 250-400 | 12 |
GPU | 200-250 | 1400-2200 | 20 |
Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
---|---|---|---|
0.5 CPU/2 GB memory | 25 | 60-200 | 1 |
1 CPU/4 GB memory | 50 | 120-240 | 2 |
2 CPU/8 GB memory | 75 | 200-280 | 4 |
4 CPU/16 GB memory | 100 | 250-400 | 8 |
6 CPU/24 GB memory | 100 | 350-500 | 12 |
GPU | 200-250 | 1000-2000 | 20 |
The expected throughput is expressed per replica, in pages/hour, as a range between a minimum and a maximum that depends on the documents themselves. The ML Skill should be sized for the highest expected throughput (spike), not for the average throughput over a day, week, or month.
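Sizing for the spike reduces to a simple calculation: divide the peak pages/hour by a per-replica throughput taken from the tables above and round up. A sketch (the function name is an illustrative assumption; pick the conservative, lower end of the tier's throughput range):

```python
import math

def replicas_needed(spike_pages_per_hour: int,
                    per_replica_pages_per_hour: int) -> int:
    """Number of replicas required to absorb the peak load,
    sizing for the spike rather than the average."""
    return math.ceil(spike_pages_per_hour / per_replica_pages_per_hour)

# A 3000 pages/hour spike on a tier rated 200-300 pages/hour per replica:
# the conservative end (200) gives 15 replicas, the optimistic end (250) gives 12.
conservative = replicas_needed(3000, 200)
optimistic = replicas_needed(3000, 250)
```

The examples below apply this kind of calculation to concrete workloads.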
Example 1
- Documents containing maximum five pages.
- A maximum spike of 300 pages per hour.
Since the throughput is on the lower side and the documents are small, a GPU is not needed in this example. Two to four replicas of the 0.5 CPU or 1 CPU tier are sufficient.
Example 2
- Documents containing maximum 80 pages.
- A maximum spike of 900 pages per hour.
For this example, either three replicas of the 4 CPU tier, or a single GPU tier is sufficient.
Example 3
- Documents containing maximum 50 pages.
- A maximum spike of 3000 pages per hour.
For this example, either of the following options is sufficient:
- Use three GPU replicas.
- Use 12-15 replicas of the 4 CPU or 6 CPU tier.
Both options have high availability because there are more than two replicas for the ML Skill.
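The AI Units/hour column in the tables above also lets you compare the cost of the two options in Example 3. A sketch, assuming the listed AI Units figures apply per replica (actual consumption depends on your usage and licensing):

```python
GPU_UNITS_PER_HOUR = 20   # GPU tier, per replica (from the tables above)
CPU4_UNITS_PER_HOUR = 8   # 4 CPU/16 GB tier, per replica

gpu_option = 3 * GPU_UNITS_PER_HOUR    # 3 GPU replicas -> 60 AI Units/hour
cpu_option = 12 * CPU4_UNITS_PER_HOUR  # 12 CPU replicas -> 96 AI Units/hour
```

Under these assumptions, the GPU option handles the same spike at a lower AI Units rate, in addition to offering lower per-page latency.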