Document Understanding User Guide
Configure Extractors Wizard of Data Extraction Scope
The Configure Extractors Wizard accessed via the Data Extraction Scope allows you to choose which extractors are applied to each document type and field.
It can be opened from the body of the activity by clicking the Configure Extractors button. The button becomes available once at least one extractor activity has been dragged into the body of the Data Extraction Scope activity. The wizard displays all the document types defined in the taxonomy, together with their fields, and lets you choose which extractor to use for each.
Each document type can be expanded and its fields can be viewed in the wizard and selected for extraction.
The Framework Alias field can be used to map an extractor to one or more trainers. For instance, you can give a Machine Learning Extractor the alias R2D2 and then use the same alias for a Machine Learning Extractor Trainer. This links the extractor to the trainer for the purpose of training that extractor. Each extractor has a unique alias, while multiple trainers can share the same alias.
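The alias rule can be illustrated with a small Python sketch. This is a conceptual model only, not UiPath code: the function name, the dictionary layout, the trainer display names, and the second alias `C3PO` are all hypothetical.

```python
from collections import Counter


def link_trainers_to_extractors(extractor_aliases, trainer_aliases):
    """Group trainer names under the extractor that shares their alias.

    Both arguments map an activity display name to its Framework Alias.
    (Hypothetical data layout, used only for illustration.)
    """
    # Each extractor has a unique alias, so reject duplicates up front.
    counts = Counter(extractor_aliases.values())
    duplicates = sorted(alias for alias, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Extractor aliases must be unique: {duplicates}")

    extractor_by_alias = {alias: name for name, alias in extractor_aliases.items()}
    links = {name: [] for name in extractor_aliases}
    # Several trainers may share one alias, so each extractor gets a list.
    for trainer, alias in trainer_aliases.items():
        extractor = extractor_by_alias.get(alias)
        if extractor is not None:
            links[extractor].append(trainer)
    return links


extractors = {"Machine Learning Extractor": "R2D2", "Form Extractor": "C3PO"}
trainers = {"ML Trainer A": "R2D2", "ML Trainer B": "R2D2"}
links = link_trainers_to_extractors(extractors, trainers)
```

Here both trainers end up linked to the Machine Learning Extractor, while the Form Extractor has no trainer attached.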
The Minimum Confidence field can be configured with a value between 0 and 100 and represents the confidence threshold above which extracted data is taken into account. If the result for a selected field has a confidence level below this threshold, it is not reported in the final result.
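As a rough illustration of the threshold, the following Python sketch filters a list of results. It is not the activity's implementation: the `FieldResult` type and function name are hypothetical, confidence is assumed to be expressed on the same 0 to 100 scale as the wizard field, and a result exactly equal to the threshold is assumed to be kept.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    field: str
    value: str
    confidence: float  # assumed to be on the same 0-100 scale as the wizard


def apply_minimum_confidence(results, minimum_confidence):
    """Keep only the results whose confidence is not below the threshold."""
    if not 0 <= minimum_confidence <= 100:
        raise ValueError("Minimum Confidence must be between 0 and 100")
    # Assumption: a result exactly at the threshold is kept.
    return [r for r in results if r.confidence >= minimum_confidence]


results = [
    FieldResult("Invoice Number", "INV-001", 92.0),
    FieldResult("Total", "118.50", 41.0),
]
reported = apply_minimum_confidence(results, minimum_confidence=60)
```

With a threshold of 60, only the Invoice Number result is reported; the Total result is dropped.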
The Get or refresh extractor capabilities button, available for extractors that support this functionality, can be used to map your taxonomy fields to the available extractor fields, or to refresh them if the extractor fields have changed.
The checkboxes next to each field in any column, if selected, cause the extractor to be asked for a value for the specified field. If cleared, the field is ignored when extracting data.
The text fields next to each document field enable you to map the fields defined in your taxonomy to the fields defined in the extractor's internal taxonomy, if any.
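The combined effect of the checkboxes and the mapping text fields can be pictured with a short Python sketch. It models one wizard column as a dictionary; the function name, the data layout, and the sample field names are hypothetical, and falling back to the taxonomy field name when the mapping is left blank is an assumption made for this example only.

```python
def build_extractor_request(wizard_column):
    """Return {taxonomy_field: extractor_field} for the checked fields only.

    wizard_column maps each taxonomy field to a (checked, extractor_field)
    pair, mirroring one extractor column in the wizard (hypothetical layout).
    """
    request = {}
    for taxonomy_field, (checked, extractor_field) in wizard_column.items():
        if not checked:
            continue  # a cleared checkbox means the field is ignored
        # Assumption: a blank mapping falls back to the taxonomy field name.
        request[taxonomy_field] = extractor_field or taxonomy_field
    return request


column = {
    "Invoice Number": (True, "invoice-no"),
    "Total": (True, ""),
    "Vendor Address": (False, "vendor-addr"),
}
request = build_extractor_request(column)
```

Only Invoice Number and Total are requested from the extractor; Vendor Address is skipped because its checkbox is cleared.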
The number of columns in the wizard varies according to the number of extractors present in the scope activity. The name of each column is given by the display name of each extractor activity.
If multiple extractors are used in the activity, the order of the extractors in the scope defines their priority. For example, in a scope containing Extractor 1, Extractor 2, and Extractor 3 in that order, if Extractor 1 returns an acceptable value (one that is above the Minimum Confidence level) for a particular requested field, then that field is not requested when Extractor 2 and Extractor 3 are executed. If Extractor 1 and Extractor 2 return values below the Minimum Confidence level for that field, or return nothing at all, the results from Extractor 3 are taken into account, provided they satisfy the confidence acceptability conditions.
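The priority rule amounts to a first-acceptable-result search over the extractors in scope order. The Python sketch below models it under stated assumptions: the function name and tuple layout are hypothetical, each extractor carries its own Minimum Confidence value, and a confidence equal to the threshold is treated as acceptable. It is not the activity's actual implementation.

```python
def resolve_field(field, extractors):
    """Return (extractor_name, value) from the first acceptable result.

    extractors is an ordered list of (name, minimum_confidence, results),
    where results maps a field name to a (value, confidence) pair.
    Returns None when no extractor yields an acceptable value.
    """
    for name, minimum_confidence, results in extractors:
        candidate = results.get(field)
        if candidate is None:
            continue  # this extractor returned nothing for the field
        value, confidence = candidate
        # Assumption: a confidence equal to the threshold is acceptable.
        if confidence >= minimum_confidence:
            return name, value  # later extractors are not consulted
    return None


scope = [
    ("Extractor 1", 80, {"Total": ("118.50", 55)}),
    ("Extractor 2", 70, {}),
    ("Extractor 3", 60, {"Total": ("118.50", 91), "Date": ("2021-03-01", 75)}),
]
total = resolve_field("Total", scope)
date = resolve_field("Date", scope)
missing = resolve_field("Vendor", scope)
```

In this example, Extractor 1 returns Total below its threshold and Extractor 2 returns nothing, so the value from Extractor 3 is used.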