Document Understanding Activities
Validation station
This page shows you how to create a workflow that includes activities such as Digitize Document, Data Extraction Scope, and Present Validation Station.
You can use these activities when you want to automate data extraction and validation for documents of the same type. Invoices and purchase orders are a good fit for this kind of task.
The following workflow runs the Digitize Document activity on an invoice and then validates the information with the Present Validation Station activity. The OCR engine chosen for this workflow is UiPath® Document OCR, but you can replace it with any of the other OCR engines. The example uses a simple taxonomy created from the chosen invoice document. See Taxonomy overview to learn how to create your own taxonomy.
- Open Studio and create a new Process. The default workflow is named Main.
- Drag a Sequence container into the Workflow Designer.
- Select the Sequence container and create the following variable:
  - Variable Name: taxonomy
  - Variable Type: DocumentTaxonomy
  - Default Value: None
- Add a Load Taxonomy activity inside the Sequence container. Add the variable taxonomy in the Taxonomy field.
- Add a For Each activity after the Load Taxonomy activity, inside the Sequence container.
  - Add the expression doc in the ForEach field.
  - Add the expression directory.GetFiles("TestData\InputDocs\") in the In field.
  - In the Properties panel, select the option String from the TypeArgument dropdown list.
- Select the Body container of the For Each activity and create the variables shown in the following table:

Table 1. The variables to be created

| Variable | Type | Default Value |
| --- | --- | --- |
| docName | GenericValue | N/A |
| dom | Document | N/A |
| text | String | N/A |
| extractionResults | ExtractionResult | N/A |
| validatedResults | ExtractionResult | N/A |

- Add an Assign activity inside the Body container.
  - Add the variable docName in the To field.
  - Add the expression System.IO.Path.GetFileNameWithoutExtension(doc) in the Value field.
- Add a Write Line activity after the Assign activity. Add the expression "Digitizing "+docName in the Text field.
- Add a Digitize Document activity after the Write Line activity.
  - Set the DocumentPath as doc.
  - Add the variable text in the DocumentText field.
  - Add the variable dom in the DocumentObjectModel field.
- Drag an OCR engine into the Digitize Document activity. UiPath Document OCR is used for this example.
- Add a Write Line activity after the Digitize Document activity. Add the expression docName+" was digitized." in the Text field.
- Add a second Write Line activity after the previous one. Add the expression "Opening the Validation Station" in the Text field.
- Add a Try Catch activity after the second Write Line activity.
- Add a Sequence container in the Try section.
- Add a Present Validation Station activity inside the Sequence container.
  - Add doc as value in the DocumentPath field.
  - Add the variable text in the DocumentText field.
  - Add the variable dom in the DocumentObjectModel field.
  - Add the variable taxonomy in the Taxonomy field.
  - Add the variable extractionResults in the AutomaticExtractionResults field.
  - Add the variable validatedResults in the ValidatedExtractionResults field.
- Add a Write Text File activity after the Present Validation Station activity.
- Run the process. The robot extracts data automatically, classifies the documents, extracts specific fields, prepares the data for validation, and displays the extracted documents.
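The control flow assembled by the steps above can be sketched outside Studio. The following is a minimal Python analogue, not UiPath code: `digitize` and `validate` are hypothetical callbacks standing in for the Digitize Document and Present Validation Station activities, and the error handling in the `except` branch is an assumption, because the page does not say what the Catches section of the Try Catch activity should contain.

```python
from pathlib import Path
from typing import Any, Callable, List, Tuple


def list_input_documents(input_dir: str) -> List[str]:
    """Rough analogue of directory.GetFiles(...): files directly inside input_dir.

    Sorting is added here only to make the order predictable.
    """
    return sorted(str(p) for p in Path(input_dir).iterdir() if p.is_file())


def document_name(doc_path: str) -> str:
    """Rough analogue of System.IO.Path.GetFileNameWithoutExtension(doc)."""
    return Path(doc_path).stem


def process_documents(
    input_dir: str,
    digitize: Callable[[str], Tuple[str, Any]],   # hypothetical: returns (text, dom)
    validate: Callable[[str, str, Any], None],    # hypothetical: opens a validation UI
    log: Callable[[str], None] = print,
) -> List[str]:
    """Loop over the input documents, mirroring the For Each body."""
    validated = []
    for doc in list_input_documents(input_dir):
        doc_name = document_name(doc)
        log("Digitizing " + doc_name)
        text, dom = digitize(doc)
        log(doc_name + " was digitized.")
        log("Opening the Validation Station")
        try:
            validate(doc, text, dom)
            validated.append(doc_name)
        except Exception as exc:
            # Assumption: log the failure and continue with the next document.
            log("Validation failed for " + doc_name + ": " + str(exc))
    return validated
```

Wrapping only the validation call in `try` matches the workflow, where the Try Catch activity surrounds the Present Validation Station activity, so a failure on one document does not stop the loop.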
ZIP archive of the example: Example.
Running the workflow opens the Validation Station wizard. Here you can verify the extracted information or extract it yourself by using the Tokens or Custom Area options. If you set a field in the taxonomy as multi-value, then multiple values can be extracted for that field. This can be useful for documents with multiple addresses, different currencies, etc.
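The difference between a single-value and a multi-value field can be illustrated with a small data model. This is a hypothetical Python sketch, not the UiPath object model: the `FieldResult` class and its `multi_value` flag are invented for illustration, and the rule that a new value replaces the old one in a single-value field is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldResult:
    """Hypothetical container for the values captured for one taxonomy field."""

    name: str
    multi_value: bool = False
    values: List[str] = field(default_factory=list)

    def add_value(self, value: str) -> None:
        if self.multi_value:
            # A multi-value field accumulates every captured value.
            self.values.append(value)
        else:
            # Assumption: a single-value field keeps only the latest value.
            self.values = [value]


# Usage: an invoice with two addresses but a single total.
address = FieldResult("address", multi_value=True)
address.add_value("12 Main Street")
address.add_value("PO Box 7")

total = FieldResult("total")
total.add_value("100.00")
total.add_value("120.00")
```

After these calls, `address.values` holds both addresses, while `total.values` holds only the last value entered.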