# Introduction

> The **UiPath® Document Understanding<sup>TM</sup>** framework facilitates the processing of incoming files, from file digitization to extracted data validation, all in an open, extensible, and versatile environment.

**Document Understanding** is designed to help you combine different approaches to extract information from multiple document types. The main aim is to make data extraction as easy as possible by letting you create a single workflow that extracts data from a variety of documents.

  <iframe
    src="https://www.youtube-nocookie.com/embed/ZiqaKeXuIos?rel=0&modestbranding=1"
    title="UiPath Document Understanding - Get documents processed intelligently"
    style={{
      position: "absolute",
      top: 0,
      left: 0,
      width: "100%",
      height: "100%",
      border: 0,
    }}
    loading="lazy"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
    referrerPolicy="strict-origin-when-cross-origin"
    allowFullScreen
  ></iframe>

Before using the **Document Understanding** framework, make sure you understand the following **Document Understanding Framework Components**:

* [Taxonomy](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/taxonomy#taxonomy) **What documents need to be processed and what data is required from them?** Used to define the document types and, for each document type, the pieces of information (fields) targeted for data extraction, and to formalize this information into a dedicated Taxonomy structure. This metadata is managed through the [Taxonomy Manager](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/taxonomy-manager#taxonomy-manager).
* [Digitization](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/digitization#digitization) **What does this file contain?** Used to obtain the textual content and the structure of the incoming document, turning a file into machine-readable content so it can be further processed downstream.
* [Document Classification](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/document-classification#document-classification) **What types of documents from the taxonomy are found in this file?** Used to automatically determine what document types are found within a digitized file.
* [Document Classification Validation](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/document-classification-validation#document-classification-validation) **Is the predicted classification correct? This is how I can review and correct it.** Used for assisting in the human validation and correction of the automatic classification and document splitting results.
* [Classification Training](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/document-classification-training#document-classification-training) **Did the human review the data? This is how the robot can learn from it.** Used to pass the human-validated information back to the classifiers so that they can use it to improve their future predictions.
* [Data Extraction](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/data-extraction#data-extraction) **What data can be found in this particular document?** Used to capture the information required for the identified document type, within the given input document and classification page range.
* [Data Extraction Validation](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/data-extraction-validation#data-extraction-validation) **Is the extracted information correct? This is how I can review and correct it.** Used for assisting in the human validation and correction of the automatically extracted data results.
* [Data Extraction Training](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/data-extraction-training#data-extraction-training) **Did the human review the data? This is how the robot can learn from it.** Used to pass the human-validated extracted data back to the extractors so that they can use it to improve their extraction predictions.
* [Data Consumption](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/data-consumption#data-consumption) Used to export the validated data in order to consume it.
* [Metering & Charging Logic](https://docs.uipath.com/document-understanding/automation-suite/2024.10/classic-user-guide/metering-charging-logic#metering-and-charging-logic-(flex-plan)) Used to explain the consumption of units per page for each available service.
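
To make the Taxonomy component more concrete, the following is a minimal conceptual sketch in Python of document types and their fields. It is not the format used by the Taxonomy Manager or by any UiPath package: the class names (`TaxonomyField`, `DocumentType`, `Taxonomy`), the `data_type` labels, and the sample invoice fields are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonomyField:
    """One piece of information targeted for extraction (hypothetical model)."""
    name: str
    data_type: str  # illustrative labels such as "text", "date", "number"


@dataclass
class DocumentType:
    """A document type together with the fields required from it."""
    name: str
    fields: list[TaxonomyField] = field(default_factory=list)


@dataclass
class Taxonomy:
    """Answers: which documents are processed, and what data is needed from each?"""
    document_types: dict[str, DocumentType] = field(default_factory=dict)

    def add(self, document_type: DocumentType) -> None:
        self.document_types[document_type.name] = document_type

    def field_names(self, document_type_name: str) -> list[str]:
        """Return the field names defined for one document type."""
        return [f.name for f in self.document_types[document_type_name].fields]


# Sample content, assumed for illustration only.
taxonomy = Taxonomy()
taxonomy.add(
    DocumentType(
        name="Invoice",
        fields=[
            TaxonomyField("InvoiceNumber", "text"),
            TaxonomyField("TotalAmount", "number"),
        ],
    )
)
```

In this sketch, classification would choose among the keys of `document_types`, and extraction would target the names returned by `field_names` for the chosen type.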

The following diagram presents the Document Understanding Framework components and how they relate to one another:

![Diagram describing the Document Understanding Framework](https://dev-assets.cms.uipath.com/assets/images/document-understanding/document-understanding-diagram-describing-the-document-understanding-framework-114223-a81020d2-e5a2c1f3.webp)
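
The order in which the components hand data to one another can also be sketched in code. The Python example below is a toy model under stated assumptions, not a UiPath workflow and not the API of any UiPath package: every function name is hypothetical, classification is a plain keyword lookup, extraction reads `Label: value` lines, and the `review` callback stands in for the human validation steps. The training steps are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClassificationResult:
    document_type: str
    page_range: tuple[int, int]  # first and last page index, inclusive


def digitize(pages: list[str]) -> list[str]:
    """Stand-in for digitization: here it only trims whitespace on each page."""
    return [page.strip() for page in pages]


def classify(pages: list[str], keywords: dict[str, str]) -> Optional[ClassificationResult]:
    """Toy classifier: the first keyword found in the text decides the type."""
    text = " ".join(pages).lower()
    for keyword, document_type in keywords.items():
        if keyword in text:
            return ClassificationResult(document_type, (0, len(pages) - 1))
    return None


def extract(pages: list[str], field_labels: list[str]) -> dict[str, str]:
    """Toy extractor: collects 'Label: value' lines for the requested fields."""
    found: dict[str, str] = {}
    for page in pages:
        for line in page.splitlines():
            label, separator, value = line.partition(":")
            if separator and label.strip() in field_labels:
                found[label.strip()] = value.strip()
    return found


def process(
    pages: list[str],
    keywords: dict[str, str],
    taxonomy_fields: dict[str, list[str]],
    review: Callable = lambda result: result,  # human validation placeholder
) -> Optional[dict]:
    """Digitize, classify, validate, extract, validate, then hand off the data."""
    digitized = digitize(pages)
    classification = review(classify(digitized, keywords))
    if classification is None:
        return None
    data = review(extract(digitized, taxonomy_fields[classification.document_type]))
    return {"document_type": classification.document_type, "data": data}


result = process(
    pages=["Invoice\nTotal: 42.00\nNumber: INV-7"],
    keywords={"invoice": "Invoice"},
    taxonomy_fields={"Invoice": ["Total", "Number"]},
)
```

Passing a different `review` callable is where a human correction would enter this sketch. In the framework itself, that role belongs to the validation components described above.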

The **Document Understanding** framework is found in the **UiPath.IntelligentOCR.Activities** package. Once the package is installed, the **Taxonomy Manager** wizard appears in the top ribbon of UiPath Studio. The same package contains all the core Document Understanding framework activities.

The scope activities ([Classify Document Scope](https://docs.uipath.com/activities/other/latest/document-understanding/classify-document-scope), [Data Extraction Scope](https://docs.uipath.com/activities/other/latest/document-understanding/data-extraction-scope), [Train Classifiers Scope](https://docs.uipath.com/activities/other/latest/document-understanding/train-classifiers-scope), [Train Extractors Scope](https://docs.uipath.com/activities/other/latest/document-understanding/train-extractors-scope)) that are part of the **Document Understanding** framework allow you to use any document classification and data extraction algorithms that fit your use case and then train these algorithms.

The **Document Understanding** framework can be used not only with the out-of-the-box classifiers and extractors but also with any custom-built ones. These can be created using the abstract classes from the UiPath.DocumentProcessing.Contracts package and can be implemented as classification or data extraction activities. Custom-built OCR engines can also be created using the abstract classes from the UiPath.OCR.Contracts package.
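
The idea of plugging custom-built classifiers into a scope alongside the out-of-the-box ones can be illustrated with a small Python analogy. This is only a sketch of the design pattern: it does not reproduce the abstract classes in UiPath.DocumentProcessing.Contracts, and the names `Classifier`, `KeywordClassifier`, `FirstLineClassifier`, and `classify_with_scope` are hypothetical. The rule that the first classifier to return a result wins is a simplification chosen for this example, not a description of how Classify Document Scope resolves results.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Classifier(ABC):
    """Hypothetical contract that every pluggable classifier implements."""

    @abstractmethod
    def classify(self, text: str) -> Optional[str]:
        """Return a document type name, or None if the text is not recognized."""


class KeywordClassifier(Classifier):
    """Plays the part of a ready-made classifier in this sketch."""

    def __init__(self, keywords: dict[str, str]) -> None:
        self.keywords = keywords

    def classify(self, text: str) -> Optional[str]:
        lowered = text.lower()
        for keyword, document_type in self.keywords.items():
            if keyword.lower() in lowered:
                return document_type
        return None


class FirstLineClassifier(Classifier):
    """Plays the part of a custom-built classifier: matches the first line only."""

    def __init__(self, titles: dict[str, str]) -> None:
        self.titles = titles

    def classify(self, text: str) -> Optional[str]:
        lines = text.splitlines()
        first_line = lines[0].strip() if lines else ""
        return self.titles.get(first_line)


def classify_with_scope(text: str, classifiers: list[Classifier]) -> Optional[str]:
    """Try each classifier in order and return the first result that is not None."""
    for classifier in classifiers:
        document_type = classifier.classify(text)
        if document_type is not None:
            return document_type
    return None


document_type = classify_with_scope(
    "Purchase Order\nPO-1",
    [
        KeywordClassifier({"invoice": "Invoice"}),
        FirstLineClassifier({"Purchase Order": "PurchaseOrder"}),
    ],
)
```

Because the scope function depends only on the shared contract, a new classifier can be added without changing the code that calls it.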

## Resources

Dedicated **Document Understanding** courses can be found in the [UiPath RPA Academy](https://academy.uipath.com/learning-plans).

The [UiPath Community Forum](https://forum.uipath.com/t/document-understanding-data-manager-ga-in-automation-cloud/321155) is the place to get support from our ever-growing community of users.
