Document Understanding User Guide
Last updated Apr 26, 2024

Traffic limitations

The Extraction and Classification ML Packages require a significant amount of compute resources, which imposes some limitations as the size of the documents or the throughput (the number of documents per minute) grows.

Documents larger than 100 pages are expected to run into compute or latency limitations, which can make ML Skills unstable or cause them to return HTTP errors. An exact upper limit is hard to define because the text density and image resolution of documents vary widely, and text density (the number of words per page) affects the compute and RAM resources required, as well as the latency. Additionally, the capacity of an ML Skill depends on the size of the hardware used to deploy it, which is controlled by AI Center. For instance, ML Skills can be deployed on GPU or on CPU, and this choice has a large impact on the capacity and speed of the ML Skill.

Regarding throughput, ML Skills can only process one document at a time; this means you need to wait for one document to finish before sending the next one. The larger the documents, the fewer you can process per unit of time.
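
The one-document-at-a-time constraint can be illustrated with a minimal Python sketch. It is not part of any UiPath SDK: `process_sequentially`, `max_documents_per_minute`, and the `send` callable are hypothetical names, and `send` is assumed to block until the ML Skill has returned its result for a document.

```python
from typing import Callable, Iterable, List, TypeVar

D = TypeVar("D")
R = TypeVar("R")


def process_sequentially(documents: Iterable[D], send: Callable[[D], R]) -> List[R]:
    """Send documents one at a time, waiting for each result before the next.

    `send` is a hypothetical callable, assumed to block until the ML Skill
    has finished processing the document it was given.
    """
    results: List[R] = []
    for document in documents:
        # The next document is not sent until this call has returned.
        results.append(send(document))
    return results


def max_documents_per_minute(seconds_per_document: float) -> float:
    """Upper bound on throughput when documents are processed strictly in sequence."""
    if seconds_per_document <= 0:
        raise ValueError("seconds_per_document must be positive")
    return 60.0 / seconds_per_document
```

Under these assumptions, a document that takes 20 seconds to process caps throughput at 3 documents per minute, and doubling the processing time halves that figure.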

To mitigate these issues when you need to process very large documents, keep in mind that in many cases the relevant data is found on a small subset of pages, and this subset can be split out using the Intelligent Keyword Classifier. This can be an effective strategy because it helps avoid ML Skill errors, failures, and timeouts, increases throughput and responsiveness, improves extraction accuracy by reducing false positives, and reduces costs by eliminating unnecessary consumption of AI units.
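
The idea of keeping only the relevant pages can be sketched in Python as follows. This is a simplified stand-in, not the Intelligent Keyword Classifier or its algorithm: the function name `select_relevant_pages` is hypothetical, and the sketch assumes the text of each page is already available as a string (for example, from an earlier digitization step).

```python
from typing import List, Sequence


def select_relevant_pages(page_texts: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """Return zero-based indices of pages whose text contains any keyword.

    Matching is a plain case-insensitive substring test, which is an
    assumption made for this sketch only.
    """
    lowered_keywords = [keyword.lower() for keyword in keywords if keyword]
    selected: List[int] = []
    for index, text in enumerate(page_texts):
        lowered_text = text.lower()
        if any(keyword in lowered_text for keyword in lowered_keywords):
            selected.append(index)
    return selected


# Example: only the pages mentioning "invoice" would be sent for extraction.
pages = ["Cover letter", "Invoice Total: 100", "Terms and conditions", "invoice number 42"]
relevant = select_relevant_pages(pages, ["invoice"])
```

Sending only the selected pages to the extraction ML Skill keeps the request small, which is the effect the mitigation above relies on.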

© 2005-2024 UiPath. All rights reserved.