Document Understanding User Guide
Last updated Apr 26, 2024

Deploying High Performing Models

As Machine Learning (ML) models improve in accuracy over time, their resource requirements change as well. For the best performance, it is important that when deploying ML models via AI Center™, the skills are appropriately sized with respect to the traffic they need to handle. For the most part, infrastructure is sized with respect to the number of pages per unit of time (minute or hour). A document can have a single page or multiple pages.

Introduction to ML model performance

To deploy infrastructure via AI Center, there are a few important aspects to keep in mind for optimal performance.

GPU

There is only one type of GPU infrastructure available, selected via the checkbox to enable GPU. Each skill runs on a single virtual machine (VM) or node that has a GPU. In this case, CPU and memory are not relevant, since the skill can use all the available CPU and memory resources on that node. Besides higher throughput, GPU also offers much lower latency. Because of this, if latency is critical, it is recommended to use GPU.

CPU

CPU and memory can be fractioned, which means multiple ML Skills can run on the same node. To avoid any disturbance from a neighboring skill, each ML Skill is limited in the amount of memory and CPU it can consume, depending on the selected tier. Higher CPU leads to faster processing (for a page), while higher memory allows a larger number of documents to be processed.

Number of replicas

The number of replicas determines the number of containers that are used to serve requests to the ML model. A higher number allows a larger number of documents to be processed in parallel, subject to the limits of that particular tier. The number of replicas is directly tied to the infrastructure type (number of CPUs per replica, or whether a GPU is used), in the sense that both the number of replicas and the infrastructure size directly affect throughput (pages/minute).

Note: Multiple replicas multiply the throughput.
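As a quick illustration of this scaling (a hypothetical helper, not part of any UiPath API), total throughput is simply the per-replica figure multiplied by the replica count:

```python
def total_throughput(pages_per_hour_per_replica: float, replicas: int) -> float:
    """Throughput scales roughly linearly with the number of replicas."""
    return pages_per_hour_per_replica * replicas

# For example, a tier rated at 250-400 pages/hour running 3 replicas:
print(total_throughput(250, 3))  # -> 750 (lower bound of the range)
print(total_throughput(400, 3))  # -> 1200 (upper bound of the range)
```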

Number of robots

The number of robots impacts throughput. For efficient throughput, the number of robots needs to be sized so that it does not overload the ML Skills. This depends on the automation itself and should be tested. As a general guideline, you can use one to three robots as a starting point for each replica the ML Skill has. Depending on the overall process time (excluding the ML Extractor), you can increase or decrease the number of robots, or adjust the number of replicas.
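The one-to-three-robots-per-replica starting point can be expressed as a small sketch (the helper name is illustrative only, and the resulting range is a starting point to test, not a guarantee):

```python
def suggested_robot_range(replicas: int) -> tuple[int, int]:
    """General guideline from this guide: start with one to three
    robots per ML Skill replica, then tune based on testing."""
    return (1 * replicas, 3 * replicas)

# A skill with 4 replicas suggests starting somewhere between 4 and 12 robots:
print(suggested_robot_range(4))  # -> (4, 12)
```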

Potential issues related to infrastructure sizing

If the infrastructure is not sized correctly, the models can be placed under a very high load. In some cases this can lead to a backlog of requests, long processing time, or even failures when processing documents.

Insufficient memory

Insufficient memory is most commonly encountered on the lower CPU tiers (0.5 CPU or 1 CPU). Processing a very large payload (one or several large documents) can lead to an out-of-memory exception. Memory usage is related to the document size in terms of pages and text density (how much text there is per page). Since the requirements are very specific to each use case, it is not possible to provide exact numbers. Check the guidelines in the Sizing the infrastructure correctly section for more detailed information. If you encounter an insufficient memory situation, the general recommendation is to move to the next tier.

Insufficient compute

Insufficient compute refers to both CPU and GPU, although it is more commonly encountered on CPU. When the ML Skill receives too many pages relative to its available capacity, requests can time out (520 and 499 status codes), back up in a queue, or even cause the model to crash (503 and 500 status codes). If you encounter an insufficient compute situation, we recommend moving to the next tier, or even to the GPU tier.
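While the right long-term fix is resizing the infrastructure, callers can also tolerate brief overload spikes by retrying on these status codes. Below is a minimal, hypothetical sketch; `call_skill` stands in for whatever client actually submits pages to the ML Skill and is not a real UiPath API:

```python
import time

# Transient overload status codes mentioned above.
TRANSIENT_STATUSES = {499, 500, 503, 520}

def call_with_backoff(call_skill, max_attempts=5, base_delay=2.0):
    """Retry a skill invocation on transient overload responses.

    `call_skill` is a hypothetical callable returning (status_code, body).
    Waits exponentially longer between attempts; returns the last response.
    """
    for attempt in range(max_attempts):
        status, body = call_skill()
        if status not in TRANSIENT_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return status, body
```

Persistent 503/500 responses even after backoff are a strong signal that the tier itself is undersized.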

Sizing the infrastructure correctly

General guidelines

This section provides general guidelines on how the models perform at each skill size.

Note: Each model generation (2022.10, 2023.4, or 2023.10) behaves differently in terms of required resources and throughput. As models become better in terms of accuracy, this can also impact performance and demand more resources.
Table 1. 2022.10 Extractor

| Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
| --- | --- | --- | --- |
| 0.5 CPU / 2 GB memory | 25 | 300-600 | 1 |
| 1 CPU / 4 GB memory | 50 | 400-800 | 2 |
| 2 CPU / 8 GB memory | 100 | 600-1000 | 4 |
| 4 CPU / 16 GB memory | 100 | 800-1200 | 8 |
| 6 CPU / 24 GB memory | 100 | 900-1300 | 12 |
| GPU | 200-250 | 1350-1600 | 20 |
Table 2. 2023.4 Extractor

| Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
| --- | --- | --- | --- |
| 0.5 CPU / 2 GB memory | 25 | 40-100 | 1 |
| 1 CPU / 4 GB memory | 50 | 70-140 | 2 |
| 2 CPU / 8 GB memory | 75 | 120-220 | 4 |
| 4 CPU / 16 GB memory | 100 | 200-300 | 8 |
| 6 CPU / 24 GB memory | 100 | 250-400 | 12 |
| GPU | 200-250 | 1400-2200 | 20 |
Table 3. 2023.7 and 2023.10 Extractors

| Tier | Maximum pages/document | Expected throughput (pages/hour) | AI Units/hour |
| --- | --- | --- | --- |
| 0.5 CPU / 2 GB memory | 25 | 60-200 | 1 |
| 1 CPU / 4 GB memory | 50 | 120-240 | 2 |
| 2 CPU / 8 GB memory | 75 | 200-280 | 4 |
| 4 CPU / 16 GB memory | 100 | 250-400 | 8 |
| 6 CPU / 24 GB memory | 100 | 350-500 | 12 |
| GPU | 200-250 | 1000-2000 | 20 |

The expected throughput is expressed per replica, in pages/hour, as a minimum-maximum range that depends on the documents themselves. The ML Skill should be sized for the highest expected throughput (spike), not for the average throughput over a day, week, or month.

Note: When sizing the infrastructure, make sure to start from the largest document the skill needs to handle and the expected throughput.
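Under that guideline, sizing reduces to dividing the spike load by one replica's expected throughput and rounding up. The helper below is an illustrative sketch (not a UiPath tool); it assumes you size on the conservative end of the tier's throughput range and keeps a two-replica floor for availability:

```python
import math

def replicas_needed(spike_pages_per_hour: float,
                    per_replica_pages_per_hour: float,
                    min_replicas: int = 2) -> int:
    """Hypothetical sizing helper: divide the peak (spike) load by one
    replica's expected throughput and round up. Keeps a floor of two
    replicas so the skill retains high availability."""
    raw = math.ceil(spike_pages_per_hour / per_replica_pages_per_hour)
    return max(min_replicas, raw)

# A 1200 pages/hour spike on a tier rated at about 400 pages/hour per replica:
print(replicas_needed(1200, 400))  # -> 3
# A small 100 pages/hour spike still gets the two-replica availability floor:
print(replicas_needed(100, 400))   # -> 2
```

Whether you divide by the low or high end of a tier's range is a risk trade-off: the low end over-provisions, the high end assumes favorable documents.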

Examples

Example 1

The ML Skill needs to process the following using a 2023.10 Extractor:
  • Documents containing maximum five pages.
  • A maximum spike of 300 pages per hour.

Since the throughput is on the lower side and the document size is small, a GPU is not needed in this example. Two to four replicas of the 0.5 CPU or 1 CPU tier are sufficient.

Example 2

The ML Skill needs to process the following using a 2023.4 Extractor:
  • Documents containing maximum 80 pages.
  • A maximum spike of 900 pages per hour.

For this example, either three replicas of the 4 CPU tier or a single GPU replica is sufficient.

Note: A single replica does not have high availability, so it is always recommended to use at least two replicas for critical production workflows.

Example 3

The ML Skill needs to process the following using a 2023.10 Extractor:
  • Documents containing maximum 50 pages.
  • A maximum spike of 3000 pages per hour.

There are two ways to meet these requirements:
  • Use 3 GPU replicas.
  • Use 12-15 replicas of the 4 CPU or 6 CPU tier.

Both options have high availability because there are more than two replicas for the ML Skill.
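The arithmetic behind both options can be checked against the conservative (low) ends of the Table 3 ranges (variable names here are illustrative only):

```python
import math

spike = 3000             # pages/hour, from Example 3
gpu_per_replica = 1000   # low end of the GPU range in Table 3
cpu4_per_replica = 250   # low end of the 4 CPU tier in Table 3

print(math.ceil(spike / gpu_per_replica))   # -> 3 GPU replicas
print(math.ceil(spike / cpu4_per_replica))  # -> 12 CPU replicas
```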

© 2005-2024 UiPath. All rights reserved.