Document Understanding User Guide
Extracting data from Forms
This page helps first-time users get familiar with Document Understanding™.
For scalable production deployments, we strongly recommend using the Document Understanding Process available in UiPath® Studio under the Templates section.
This quickstart guides you through the steps required to extract information from W-9 forms using the Intelligent Form Extractor. The W-9 forms are used as an example, but the procedure is similar for other types of documents where the data is structured.
Starting from scratch, follow these steps:
- Create a blank process
- Install the required activities packages
- Create a taxonomy
- Digitize the document
- Extract the data using the Intelligent Form Extractor
- Validate the results using Validation Station
- Export Extraction Results
Now, let us see every step in detail.
Launch UiPath Studio.
In the HOME backstage view, click Process to create a new project.
The New Blank Process window is displayed. In this window, enter a name for the new project. If you want, you can also add a description to sort through your projects more easily.
Click Create. The new project is opened in Studio.
Click the Manage Packages button in the ribbon. In addition to the core activities packages (UiPath.Excel.Activities, UiPath.Mail.Activities, UiPath.System.Activities, UiPath.UIAutomation.Activities) that are added to the project by default, install the following activities packages:
Once the packages are installed, list the required fields. This quickstart extracts data for the following fields:
- 1_Name - Text
- 2_BusinessName - Text
- 3a_Individual - Boolean
- 3b_CCorp - Boolean
- 3c_SCorp - Boolean
- 3d_Partnership - Boolean
- 3e_TrustEstate - Boolean
- 3f_LLC - Boolean
- 3f_LLCTaxClassification - Boolean
- 3g_Other - Boolean
- 3g_OtherDetail - Boolean
- 5_Address - Text
- 6_CityStateZip - Text
- 7_AcctNumber - Text
- TIN_SSN - Text
- TIN_ETN - Text
- Certification_Signature - Boolean
- Certification_SignatureDate - Date
Open Taxonomy Manager and create a group named Structured Documents, a category named Lending Forms, and a document type named W-9. Create the fields listed above, giving each a user-friendly name and its respective data type.
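The hierarchy built in Taxonomy Manager (group, category, document type, fields) can be pictured as a small data structure. The sketch below is only an illustration in Python and is not the format in which Studio stores the taxonomy. The class names `TaxonomyField` and `DocumentType` are hypothetical. The group, category, document type, field names, and data types are the ones used in this quickstart.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaxonomyField:
    """One field of a document type (hypothetical helper class)."""
    name: str
    data_type: str  # "Text", "Boolean", or "Date"


@dataclass
class DocumentType:
    """A document type placed under a group and a category (hypothetical)."""
    group: str
    category: str
    name: str
    fields: list = field(default_factory=list)


# Field names and data types as listed in this quickstart.
W9_FIELDS = [
    ("1_Name", "Text"),
    ("2_BusinessName", "Text"),
    ("3a_Individual", "Boolean"),
    ("3b_CCorp", "Boolean"),
    ("3c_SCorp", "Boolean"),
    ("3d_Partnership", "Boolean"),
    ("3e_TrustEstate", "Boolean"),
    ("3f_LLC", "Boolean"),
    ("3f_LLCTaxClassification", "Boolean"),
    ("3g_Other", "Boolean"),
    ("3g_OtherDetail", "Boolean"),
    ("5_Address", "Text"),
    ("6_CityStateZip", "Text"),
    ("7_AcctNumber", "Text"),
    ("TIN_SSN", "Text"),
    ("TIN_ETN", "Text"),
    ("Certification_Signature", "Boolean"),
    ("Certification_SignatureDate", "Date"),
]


def build_w9_taxonomy() -> DocumentType:
    """Assemble the W-9 document type described in this guide."""
    return DocumentType(
        group="Structured Documents",
        category="Lending Forms",
        name="W-9",
        fields=[TaxonomyField(n, t) for n, t in W9_FIELDS],
    )


w9 = build_w9_taxonomy()
```

Writing the fields down in one place like this can make it easier to double-check names and data types before entering them in Taxonomy Manager.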
In the Main.xaml file, add a Load Taxonomy activity and create a variable for the taxonomy output.
Add a Digitize Document activity with UiPath Document OCR. Provide the input property Document Path and create output variables for Document Text and Document Object Model.
Remember to add the Document Understanding API Key in the UiPath Document OCR activity.
Add a Data Extraction Scope activity and fill in the properties.
Drag and drop the Intelligent Form Extractor within it. The endpoint should be auto-populated with the Intelligent Form Extractor endpoint, namely https://du.uipath.com/svc/intelligentforms. Provide the Document Understanding API key.
Once that is done, to create a new template, click Manage Templates > Create Template. A pop-up window opens.
Under Document Type, select the W-9 document type created earlier.
Under Document name, enter a name for your template.
Under Template document (native PDF if possible), attach a template document where you are going to map the field positions.
Under OCR Engine, select UiPath Document OCR again. As before, the endpoint should be auto-populated, namely https://du.uipath.com/ocr, and you only need to provide the API key.
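In Studio, the endpoints and the API key are entered as activity properties. Purely as an illustration of keeping those values together and checking them before use, here is a minimal Python sketch. The two URLs are the ones quoted in this guide. The `ServiceConfig` class, the helper functions, and the `DU_API_KEY` environment variable name are assumptions made for the example and are not part of any UiPath product.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

# Endpoints as quoted in this guide.
FORM_EXTRACTOR_ENDPOINT = "https://du.uipath.com/svc/intelligentforms"
OCR_ENDPOINT = "https://du.uipath.com/ocr"


@dataclass(frozen=True)
class ServiceConfig:
    """Endpoint plus API key for one service (hypothetical helper class)."""
    endpoint: str
    api_key: str


def make_config(endpoint: str, api_key: str) -> ServiceConfig:
    """Reject obviously wrong values before they are used anywhere."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"endpoint must be an https URL: {endpoint!r}")
    if not api_key.strip():
        raise ValueError("API key is empty")
    return ServiceConfig(endpoint, api_key.strip())


def load_configs(env=os.environ) -> dict:
    """Build both configurations from one key.

    DU_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get("DU_API_KEY", "")
    return {
        "form_extractor": make_config(FORM_EXTRACTOR_ENDPOINT, key),
        "ocr": make_config(OCR_ENDPOINT, key),
    }
```

Calling `load_configs({"DU_API_KEY": "my-key"})` returns one configuration per service. A missing or blank key raises `ValueError`, so the problem surfaces before any request is attempted.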
Click Configure to move to the next step. The Template Manager pop-up window opens.
Here, select the areas where the Intelligent Form Extractor should search for the fields. Configure them by following the steps detailed here. You can also use anchors for your fields. More information on anchors here.
You should end up with something like this:
Click Save. In this screen, you can define the handwritten or signature fields, where applicable. You can also define synonyms for Boolean fields. Close the window after you are done.
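Conceptually, a template ties each field to an area of the page, and the text found inside that area becomes the field value. The Python sketch below illustrates that idea only. It is not how the Intelligent Form Extractor is implemented, and the `Box` and `Word` classes, the `text_in_area` function, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical units)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


@dataclass(frozen=True)
class Word:
    """A digitized word and its position on the page."""
    text: str
    box: Box


def text_in_area(words, area: Box) -> str:
    """Join the words that fall inside `area`, in reading order."""
    inside = [w for w in words if area.contains(w.box)]
    # Sort top to bottom, then left to right.
    inside.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in inside)


# Made-up digitization output and a made-up area for the 1_Name field.
words = [
    Word("Doe", Box(95, 100, 125, 112)),
    Word("Jane", Box(50, 100, 90, 112)),
    Word("Acme", Box(50, 140, 95, 152)),
]
name_area = Box(40, 95, 300, 115)
name_value = text_in_area(words, name_area)  # "Jane Doe"
```

An area that is too tight can clip words, and one that is too generous can pull in neighboring text. Anchors address a related problem by locating areas relative to fixed text on the form.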
The next step is to configure the extractor, which means having the Intelligent Form Extractor process all documents of type W-9.
To check the results through Validation Station, drag and drop the Present Validation Station activity and provide the input details.
To export the results, add the Export Extraction Results activity. Its output is a DataSet that contains multiple tables, which can then be written to an Excel file or used directly in a downstream process.
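As a rough picture of what handling a multi-table DataSet looks like, the Python sketch below models it as a dictionary of named tables and serializes each table separately. CSV text stands in for Excel sheets here, because writing real Excel files requires a third-party library. The table name `Fields`, the column names, and the sample values are hypothetical and do not reflect the actual layout of the exported tables.

```python
import csv
import io


def table_to_csv(rows) -> str:
    """Serialize a list of row dicts to CSV text.

    The header is taken from the keys of the first row.
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_dataset(dataset) -> dict:
    """Return one CSV string per table, keyed by table name."""
    return {name: table_to_csv(rows) for name, rows in dataset.items()}


# Hypothetical dataset with a single table of extracted values.
dataset = {
    "Fields": [
        {"Field": "1_Name", "Value": "Jane Doe"},
        {"Field": "3a_Individual", "Value": "True"},
    ],
}
exported = export_dataset(dataset)
```

Each entry of `exported` could then be written to its own file, or the rows could be passed on unchanged to a downstream step.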
To run the W-9 with Intelligent Form Extractor workflow, download the sample project using this link.