The aim of this page is to help first-time users get familiar with Document Understanding.
For scalable production deployments, we strongly recommend using the Document Understanding Process available in UiPath Studio under the Templates section.
This quickstart shows you how to extract data from receipts using the out-of-the-box Receipts ML model with its corresponding public endpoint.
Validation can be done either by presenting the Validation Station or by using the Validation Action in Action Center. Both options are described in the following sections.
Using Receipts ML model with Public Endpoint and Validation Station
In this section, we are going to validate the extraction results using Validation Station.
To create a basic workflow using the Receipts ML model, follow the steps below.
- Create a blank process
- Install the required activities packages
- Create a taxonomy
- Digitize the document
- Extract the data using the Receipts ML model
- Validate the results using Validation Station
- Export the extraction results
Now, let's look at each step in detail.
1. Create a blank process
Launch UiPath Studio.
In the HOME backstage view, click Process to create a new project.
The New Blank Process window is displayed. In this window, enter a name for the new project. If you want, you can also add a description to sort through your projects more easily.
Click Create. The new project is opened in Studio.
2. Install the required activities packages
Click the Manage Packages button in the ribbon. Besides the core activities packages (UiPath.Excel.Activities, UiPath.Mail.Activities, UiPath.System.Activities, UiPath.UIAutomation.Activities) that are added to the project by default, install the following activities packages:
3. Create a taxonomy
Once the activities packages are installed, list out the required fields. The Receipts ML model supports data extraction for the fields below:
- name
- vendor-addr
- total
- date
- phone
- currency
- expense-type
- items
  - description
  - line-amount
  - unit-price
  - quantity
Open Taxonomy Manager and create a group named "Semi Structured Documents", a category named "Finance", and a document type named "Receipts". Create the fields listed above with user-friendly names and the corresponding data types.
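For reference, the field hierarchy you build in Taxonomy Manager can be pictured as a plain data structure. This is only an illustrative sketch — Taxonomy Manager stores the taxonomy in the project itself, and the data types shown here are assumptions:

```python
# Illustrative sketch of the Receipts taxonomy (NOT the actual UiPath
# taxonomy file format; field names follow the Receipts ML model).
receipts_taxonomy = {
    "group": "Semi Structured Documents",
    "category": "Finance",
    "document_type": "Receipts",
    "fields": {
        "name": "Text",
        "vendor-addr": "Address",
        "total": "Number",
        "date": "Date",
        "phone": "Text",
        "currency": "Text",
        "expense-type": "Text",
        # "items" is a table field with one row per receipt line item
        "items": {
            "columns": {
                "description": "Text",
                "line-amount": "Number",
                "unit-price": "Number",
                "quantity": "Number",
            }
        },
    },
}

# Quick sanity check: eight top-level fields, four line-item columns
print(len(receipts_taxonomy["fields"]))                       # 8
print(len(receipts_taxonomy["fields"]["items"]["columns"]))   # 4
```

The nesting matters: simple fields hold one value per document, while the `items` table field repeats once per line item.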
4. Digitize the document
In the Main.xaml file, add a Load Taxonomy activity and create a variable for the taxonomy output.
Add a Digitize Document activity with UiPath Document OCR. Provide the input property Document Path and create output variables for Document Text and Document Object Model.
Remember to add the Document Understanding API Key in the UiPath Document OCR activity.
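To get a feel for what the two digitization outputs contain, here is a heavily simplified stand-in: the Document Text is the plain text of the document, while the Document Object Model records where each word sits on the page. This is NOT UiPath's actual DOM schema — every field name below is an assumption for illustration:

```python
# Simplified stand-in for a digitization result (NOT UiPath's real
# Document Object Model schema — structure and names are assumptions).
digitized = {
    "text": "ACME Store\nTotal 12.50",
    "pages": [
        {
            "number": 0,
            "words": [
                # each word carries a bounding box: (left, top, width, height)
                {"text": "ACME",  "box": (10, 10, 40, 12)},
                {"text": "Store", "box": (55, 10, 42, 12)},
                {"text": "Total", "box": (10, 30, 38, 12)},
                {"text": "12.50", "box": (52, 30, 40, 12)},
            ],
        }
    ],
}

# Extractors use both the raw text and the word positions, e.g. to
# find the value printed to the right of a label on the same line:
def value_right_of(page, label):
    label_word = next(w for w in page["words"] if w["text"] == label)
    same_line = [
        w for w in page["words"]
        if w["box"][1] == label_word["box"][1]
        and w["box"][0] > label_word["box"][0]
    ]
    return min(same_line, key=lambda w: w["box"][0])["text"] if same_line else None

print(value_right_of(digitized["pages"][0], "Total"))  # 12.50
```

The real Machine Learning Extractor does far more than this positional lookup, but both outputs of Digitize Document feed into it the same way.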
5. Extract the data using the Receipts ML model
Add a Data Extraction Scope activity and fill in the properties.
Drag and drop a Machine Learning Extractor activity. A pop-up with three input parameters, Endpoint, ML Skill, and ApiKey, is displayed on the screen.
Fill in the Endpoint parameter with the Receipts Public Endpoint, namely https://du.uipath.com/ie/receipts, and provide the Document Understanding API key.
Click on Get Capabilities.
The next step is to configure the extractor, which means mapping the fields that you created in Taxonomy Manager to the fields available in the ML model, as shown in the image below:
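Conceptually, this mapping re-keys the model's output under your taxonomy's field names. The sketch below mirrors that idea in plain Python; the actual mapping is done visually in the Machine Learning Extractor wizard, and the user-friendly names and sample values here are hypothetical:

```python
# Hypothetical mapping from user-friendly taxonomy field names to the
# Receipts ML model's field names (done visually in the real wizard).
field_mapping = {
    "Vendor Name": "name",
    "Vendor Address": "vendor-addr",
    "Total Amount": "total",
    "Receipt Date": "date",
    "Phone Number": "phone",
    "Currency": "currency",
    "Expense Type": "expense-type",
    "Line Items": "items",
}

# Hypothetical raw model output, keyed by the model's own field names
raw_result = {"name": "ACME Store", "total": "12.50", "date": "2023-04-01"}

# Re-key the extraction result under the taxonomy's field names
mapped = {
    taxonomy_field: raw_result[model_field]
    for taxonomy_field, model_field in field_mapping.items()
    if model_field in raw_result
}
print(mapped["Total Amount"])  # 12.50
```

Fields left unmapped in the wizard are simply not extracted, which is why every taxonomy field you care about should be paired with a model field here.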
To use the Machine Learning Extractor with an ML Skill, choose the ML Skill from the dropdown and configure the extractor.
Your robot must be connected to the same tenant as your ML Skill.
6. Validate the results using Validation Station
To check the results through Validation Station, drag and drop the Present Validation Station activity and provide the input details.
7. Export the extraction results
To export the extraction results, drag and drop an Export Extraction Results activity at the end of your workflow. This outputs the results into a DataSet that contains multiple tables, which can then be written to an Excel file or used directly in a downstream process.
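The shape of that DataSet can be approximated as one "sheet" of simple fields plus one sheet per table field. The sketch below serializes hypothetical results to CSV with Python's standard library — purely to illustrate the multi-table layout, not to replicate UiPath's export format:

```python
import csv
import io

# Hypothetical extraction results for one receipt (illustrative values)
simple_fields = {"name": "ACME Store", "total": "12.50", "date": "2023-04-01"}
line_items = [
    {"description": "Coffee", "unit-price": "2.50", "quantity": "2", "line-amount": "5.00"},
    {"description": "Bagel",  "unit-price": "3.75", "quantity": "2", "line-amount": "7.50"},
]

def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text (one 'table' of the DataSet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# One table for the simple fields, one per table field (here: items)
sheets = {
    "SimpleFields": to_csv([simple_fields], list(simple_fields)),
    "items": to_csv(line_items, ["description", "unit-price", "quantity", "line-amount"]),
}
print(sheets["items"].splitlines()[0])  # description,unit-price,quantity,line-amount
```

In a real workflow you would pass the DataSet's tables to a Write Range activity instead; the point here is simply that simple fields and line items land in separate tables.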
Download this sample project using this link.
The example contains two workflows:
- Main.xaml - in this workflow, the extraction results are validated using Validation Station; this is described in the above section
- Main - Unattended.xaml - in this workflow, the extraction results are validated using Validation Action; this is described in the following section
Using Receipts ML model with Public Endpoint and Validation Action
Now, let’s see how to use an Action Center Validation Action instead of presenting the Validation Station.
How do tasks in Action Center work?
When an automation includes decisions that a human should make, such as approvals, escalations, and exceptions, UiPath Action Center makes it easy and efficient to hand the process off from robot to human and back again.
Document Understanding Action Center activities come with the UiPath.IntelligentOCR.Activities and UiPath.Persistence.Activities packages. Don't forget to enable Persistence activities from the General Settings in UiPath Studio:
How does the Validation Action work?
Productivity can be increased by adding an orchestration process that creates document validation actions in Action Center, available in both on-premises Orchestrator and Automation Cloud. This approach removes the need to store documents locally, to install a robot on each human user's machine, or to have the robot wait for human users to finish validation.
How to use the Validation Action?
Repeat steps 1 to 5 described in the above section.
Then, instead of using the Present Validation Station activity, use the Create Document Validation Action and Wait for Document Validation Action and Resume activities.
The image below shows the Create Document Validation Action activity and its properties.
This creates a document validation action in Action Center. The output of the Create Document Validation Action activity can then be used with the Wait for Document Validation Action and Resume activity to suspend and resume orchestration workflows upon human action completion in Action Center.
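The create/wait/resume pattern can be pictured as a long-running job that records its pending action, suspends (releasing the robot), and resumes once a human completes the action. The simulation below is purely conceptual — it is NOT the UiPath Persistence API, and every function name in it is invented for illustration:

```python
# Conceptual simulation of the create/wait/resume control flow
# (NOT the UiPath Persistence API — all names here are hypothetical).
actions = {}  # stands in for Action Center's store of pending actions

def create_document_validation_action(doc_id, data):
    """Robot side: publish a validation action for a human reviewer."""
    actions[doc_id] = {"status": "Unassigned", "data": data}
    return doc_id

def human_completes_action(action_id, corrected_data):
    """Human side: the reviewer validates/corrects the extracted data."""
    actions[action_id].update(status="Completed", data=corrected_data)

def wait_for_action_and_resume(action_id):
    """Robot side: in a real workflow the job suspends here and
    Orchestrator resumes it when the action is completed."""
    if actions[action_id]["status"] != "Completed":
        raise RuntimeError("job suspended, waiting for human action")
    return actions[action_id]["data"]

action = create_document_validation_action("receipt-001", {"total": "12.5O"})  # OCR typo
human_completes_action(action, {"total": "12.50"})
print(wait_for_action_and_resume(action)["total"])  # 12.50
```

The key property the real activities provide, which this sketch only hints at, is that the suspended job consumes no robot while it waits.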