Document Understanding Modern Projects User Guide

Automation Cloud, Automation Cloud Public Sector, Automation Suite, Standalone
Last updated Dec 12, 2024

Fundamental capabilities

To automate document processing, four fundamental capabilities are required: digitization, classification, extraction, and validation.

Figure 1. Fundamental capabilities

Digitization

Digitization converts a physical document into machine-readable text, which can then be processed digitally. Optical Character Recognition (OCR) is a significant part of digitization, but digitization is a broader, multi-step process in which OCR is only one step.

For example, when dealing with PDF documents, the digitization algorithm can distinguish between scanned PDFs, native PDFs, and hybrid PDFs that contain both scanned images and native text. Most of the text in a native PDF can be extracted directly, while a few elements, such as logos, may still need to be read using OCR. The digitization process handles all of these cases, aiming for maximum text-detection accuracy while running quickly and efficiently.
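The per-page decision described above can be sketched as follows. This is a minimal illustration, not the Document Understanding implementation: the `Page` model, the `page_kind` heuristic, and the pluggable `ocr` callable are hypothetical, and the rule "a page with both a text layer and images is hybrid" is a simplification of how real PDFs are analyzed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical page model: an optional native text layer plus embedded images."""
    native_text: str = ""
    images: List[bytes] = field(default_factory=list)


def page_kind(page: Page) -> str:
    """Label a page as 'native', 'scanned', or 'hybrid' (simplified heuristic)."""
    has_text = bool(page.native_text.strip())
    if has_text and page.images:
        return "hybrid"
    if has_text:
        return "native"
    # No usable text layer: everything must come from OCR.
    return "scanned"


def digitize_page(page: Page, ocr: Callable[[bytes], str]) -> str:
    """Return machine-readable text, calling OCR only where it is needed."""
    kind = page_kind(page)
    if kind == "native":
        # Text is read directly from the PDF; OCR is skipped entirely.
        return page.native_text
    ocr_text = [ocr(image) for image in page.images]
    if kind == "scanned":
        return "\n".join(ocr_text)
    # Hybrid: keep the native text and OCR only the embedded images (e.g. logos).
    return "\n".join([page.native_text, *ocr_text])


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real OCR engine: pretends the image bytes are the text."""
    return image.decode()


page = Page(native_text="Total: 42", images=[b"ACME"])
print(page_kind(page))                # hybrid
print(digitize_page(page, fake_ocr))  # returns "Total: 42\nACME"
```

Passing OCR in as a callable keeps the routing logic independent of any particular OCR engine.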

Classification and splitting

In most use cases, documents need to be sorted into logical categories so different processing methods can be applied to them. The process of sorting documents involves two tasks:
  • Splitting
  • Classification
Depending on the complexity of the problem, you might need to split documents, classify them, or both.
Note: Document splitting is only available when used with IntelligentOCR.

The objective of splitting is to examine the consecutive pages of a file and divide them into logical subdocuments. A document splitter algorithm can be document type-agnostic, meaning it can split any document, regardless of whether it is an invoice, a contract, or an application form.

Figure 2. Document splitting: a four-page document is split into three documents, each of a different document type.
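Assuming some model has already predicted, for each page, whether it starts a new document, the splitting step itself reduces to grouping consecutive pages. The sketch below is hypothetical (`split_pages` and its boundary flags are not a UiPath API), but it shows why a splitter can be type-agnostic: it only looks at boundaries, never at document types.

```python
from typing import List, Sequence


def split_pages(pages: Sequence[str], starts_new_doc: Sequence[bool]) -> List[List[str]]:
    """Group consecutive pages into logical subdocuments.

    starts_new_doc[i] is True when page i begins a new document. These flags
    stand in for the output of a hypothetical boundary-detection model.
    """
    if len(pages) != len(starts_new_doc):
        raise ValueError("expected exactly one boundary flag per page")
    documents: List[List[str]] = []
    for index, (page, is_start) in enumerate(zip(pages, starts_new_doc)):
        # The first page always opens a document, whatever its flag says.
        if index == 0 or is_start:
            documents.append([])
        documents[-1].append(page)
    return documents


# One possible split of a four-page file into three subdocuments.
print(split_pages(["p1", "p2", "p3", "p4"], [True, False, True, True]))
# [['p1', 'p2'], ['p3'], ['p4']]
```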

The objective of classification is to examine a document and determine which document type it belongs to. Knowing the document type is important because different document types require different processing techniques. For example, an invoice needs to be processed by an invoice extraction model to ensure that all relevant fields are extracted.

Figure 3. Document classifier: a document of unknown type passes through the document classifier and is classified as an Invoice.
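The classify-then-route pattern can be sketched like this. A toy keyword scorer stands in for a real classifier, and the `KEYWORDS`, `EXTRACTORS`, and `process` names are hypothetical. The point is only that the predicted type decides which extractor runs, and that an unrecognized type falls back to a person.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword lists; a production classifier would typically be an ML model.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "invoice": ("invoice", "amount due", "bill to"),
    "contract": ("agreement", "party", "hereby"),
}

# Each document type is routed to its own (hypothetical) extractor.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "invoice": lambda text: {"extractor": "invoice model"},
    "contract": lambda text: {"extractor": "contract model"},
}


def classify(text: str) -> Tuple[str, float]:
    """Return (document_type, confidence); confidence is the share of matched keywords."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords) / len(keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.__getitem__)
    if scores[best] == 0.0:
        return "unknown", 0.0
    return best, scores[best]


def process(text: str) -> Dict[str, object]:
    """Classify a document, then hand it to the extractor for its type."""
    doc_type, confidence = classify(text)
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        # No matching type: leave the decision to a person.
        return {"doc_type": doc_type, "status": "needs manual classification"}
    return {"doc_type": doc_type, "confidence": confidence, **extractor(text)}


print(classify("INVOICE\nBill To: ACME\nAmount Due: $42"))  # ('invoice', 1.0)
```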

Extraction

Data extraction is the process of selecting and retrieving only the relevant information from a document. Extracting specific data from a lengthy document using string manipulation can be challenging, so Document Understanding™ provides various extraction methodologies for different document types and formats. For example, from an invoice you might want to extract only the Vendor Name, Billing Name, Due Date, and Total fields.

Figure 4. Data extraction
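To see why plain string manipulation is fragile, consider a naive rule-based extractor for the four invoice fields mentioned above. This is not how Document Understanding extractors work; the regular expressions and the "vendor name is the first line" rule are hypothetical and tied to a single layout, which is exactly the weakness.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns that only fit one specific invoice layout.
FIELD_PATTERNS: Dict[str, str] = {
    "Billing Name": r"^Bill To:\s*(.+)$",
    "Due Date": r"^Due Date:\s*(\S+)",
    "Total": r"^Total:\s*\$?([\d,]+\.\d{2})",
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Extract only the four fields of interest; None means 'not found'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: the vendor name is the first non-empty line.
    fields: Dict[str, Optional[str]] = {"Vendor Name": lines[0] if lines else None}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        fields[name] = match.group(1).strip() if match else None
    return fields


invoice = "\n".join([
    "ACME Supplies Ltd.",
    "Bill To: Globex Corp",
    "Due Date: 2024-12-31",
    "Total: $1,250.00",
])
print(extract_fields(invoice))
# {'Vendor Name': 'ACME Supplies Ltd.', 'Billing Name': 'Globex Corp', 'Due Date': '2024-12-31', 'Total': '1,250.00'}

# A different label ("Amount Due") breaks the rule: Total comes back as None.
print(extract_fields("ACME Supplies Ltd.\nAmount Due: $5.00")["Total"])  # None
```

Every new vendor layout would need new patterns, which is why model-based extraction scales better than hand-written rules.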

Validation

In classification and extraction, software robots use the concept of confidence, which measures how certain the robot is that it performed a particular task correctly. The task can be recognizing a document type, identifying a field, or reading the data in that field. When confidence is low, the Document Understanding framework allows you to engage a human user to review and validate the robot's output. Ideally, this human input is then used to retrain the underlying machine learning models and improve the robot's accuracy over time.
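Confidence-based validation can be sketched as a threshold check: fields at or above the threshold are accepted automatically, and the rest go to a human reviewer whose corrections can be kept as training data. The `ExtractedField` type, the helper functions, and the 0.8 threshold are illustrative assumptions, not Document Understanding settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # robot's certainty, from 0.0 to 1.0


def route_for_validation(
    fields: List[ExtractedField], threshold: float = 0.8
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into auto-accepted ones and ones a human must review."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


def apply_human_review(
    needs_review: List[ExtractedField], corrections: Dict[str, str]
) -> List[ExtractedField]:
    """Record the reviewer's values; human-confirmed fields get full confidence.

    The returned records could also be stored as labeled examples for retraining.
    """
    return [
        ExtractedField(f.name, corrections.get(f.name, f.value), 1.0)
        for f in needs_review
    ]


fields = [
    ExtractedField("Vendor Name", "ACME Supplies Ltd.", 0.97),
    ExtractedField("Due Date", "2024-12-3l", 0.55),  # hypothetical OCR misread of "1" as "l"
    ExtractedField("Total", "1,250.00", 0.91),
]
accepted, needs_review = route_for_validation(fields)
print([f.name for f in accepted])      # ['Vendor Name', 'Total']
print([f.name for f in needs_review])  # ['Due Date']
print(apply_human_review(needs_review, {"Due Date": "2024-12-31"}))
```

Raising the threshold sends more fields to people and lowers the error rate; lowering it trades accuracy for throughput.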

© 2005-2024 UiPath. All rights reserved.