Document Understanding User Guide

Last updated Apr 11, 2025

Traffic limitations

The Extraction and Classification ML Packages require significant compute resources, which imposes limitations as document size or throughput (the number of documents processed per minute) grows.

Documents larger than 100 pages are expected to run into compute or latency limitations, causing ML Skills to become unstable or to return HTTP errors. An exact upper limit is hard to define because the text density and image resolution of documents vary widely, and text density (the number of words per page) affects the compute and RAM resources required, as well as the latency. Additionally, the capacity of an ML Skill depends on the size of the hardware used to deploy it, which is controlled by AI Center. For instance, ML Skills can be deployed on GPU or on CPU, and this choice has a large impact on the capacity and speed of the ML Skill.
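As a rough illustration of the guidance above, a pre-flight check can flag documents that are likely to hit these limits before they are sent to an ML Skill. The following is a minimal Python sketch and not part of any UiPath API. The names `assess_document` and `SizeAssessment` are hypothetical, and the default of 1000 words per page is an arbitrary placeholder; only the 100-page figure comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SizeAssessment:
    page_count: int
    avg_words_per_page: float
    too_many_pages: bool   # page count exceeds max_pages
    dense_text: bool       # average text density exceeds max_words_per_page


def assess_document(words_per_page: List[int],
                    max_pages: int = 100,
                    max_words_per_page: int = 1000) -> SizeAssessment:
    """Summarize document size from a list of per-page word counts.

    max_pages reflects the 100-page guidance in this guide.
    max_words_per_page is an assumed placeholder; tune it for your deployment.
    """
    page_count = len(words_per_page)
    # Guard against empty documents to avoid dividing by zero.
    avg = sum(words_per_page) / page_count if page_count else 0.0
    return SizeAssessment(
        page_count=page_count,
        avg_words_per_page=avg,
        too_many_pages=page_count > max_pages,
        dense_text=avg > max_words_per_page,
    )
```

A document flagged by either field is a candidate for reducing to a smaller subset of pages before extraction. The per-page word counts would typically come from an OCR or digitization step, which is outside the scope of this sketch.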

Regarding throughput, an ML Skill can process only one document at a time, so you need to wait for one document to finish before sending the next one. The larger the documents, the fewer you can process per unit of time.
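The one-document-at-a-time constraint can be enforced on the client side by serializing calls. The following is a minimal Python sketch under stated assumptions: `SerialSkillClient` and `documents_per_minute` are hypothetical names, and the `send` callable stands in for whatever code actually calls your ML Skill, which is not shown here.

```python
import threading
from typing import Any, Callable, Iterable, List


class SerialSkillClient:
    """Wraps a send function so that only one document is in flight at a time."""

    def __init__(self, send: Callable[[Any], Any]) -> None:
        self._send = send               # hypothetical: performs the actual ML Skill call
        self._lock = threading.Lock()   # held for the duration of each call

    def process(self, document: Any) -> Any:
        # Callers on other threads block here until the current document finishes.
        with self._lock:
            return self._send(document)

    def process_all(self, documents: Iterable[Any]) -> List[Any]:
        # Sequential loop: the next document is sent only after the previous returns.
        return [self.process(doc) for doc in documents]


def documents_per_minute(avg_seconds_per_document: float) -> float:
    """Upper bound on throughput for one ML Skill processing documents serially."""
    if avg_seconds_per_document <= 0:
        raise ValueError("avg_seconds_per_document must be positive")
    return 60.0 / avg_seconds_per_document
```

For example, if a document takes 20 seconds on average, `documents_per_minute(20)` gives 3.0, which shows how larger and slower documents reduce the number that can be processed per unit of time.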

To mitigate these issues when you need to process very large documents, keep in mind that in many cases the relevant data is found on a small subset of pages, and this subset can be split out using the Intelligent Keyword Classifier. This can be an effective strategy because it avoids ML Skill errors, failures, and timeouts; increases throughput and responsiveness; increases extraction accuracy by reducing false positives; and reduces costs by eliminating unnecessary consumption of AI units.
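The idea of narrowing a large document to its relevant pages can be sketched in a few lines of Python. This is not the Intelligent Keyword Classifier and does not reproduce its behavior; it is a plain substring match used as a stand-in, and the function name `select_relevant_pages` is hypothetical.

```python
from typing import Iterable, List


def select_relevant_pages(page_texts: List[str], keywords: Iterable[str]) -> List[int]:
    """Return zero-based indices of pages containing at least one keyword.

    Matching is a case-insensitive substring test, used here only as a
    simple stand-in for a real classifier.
    """
    lowered = [k.lower() for k in keywords if k]  # skip empty keywords
    return [
        index
        for index, text in enumerate(page_texts)
        if any(keyword in text.lower() for keyword in lowered)
    ]


pages = [
    "Cover letter",
    "Terms and conditions",
    "Invoice total: 120.00 EUR",
    "Appendix",
]
print(select_relevant_pages(pages, ["invoice total", "amount due"]))  # prints [2]
```

Only the selected pages would then be sent for extraction, which keeps the request small regardless of the size of the original document.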
