Data Extraction Scope
UiPath.IntelligentOCR.Activities.DataExtraction.DataExtractionScope
The extraction results are stored in an ExtractionResult variable, which contains all automatically extracted data and can be used as input for the Export Extraction Results activity.
This activity also features a Configure Extractors wizard, which lets you specify exactly which fields from the document types defined in the taxonomy you want to extract.
Designer panel
Input
- DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.
Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
- DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only strings and String variables.
- DocumentObjectModel - The Document Object Model you want to use to validate the document against. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only Document variables.
- Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This object can be obtained by using a Load Taxonomy activity. This field supports only DocumentTaxonomy variables.
- ClassificationResults - The results of running a classifier activity on the specified document, stored in a ClassificationResult object. This field is optional if you specify a DocumentTypeId instead. This field supports only ClassificationResult variables.
- DocumentTypeID - The Document Type ID, as found in the Taxonomy Manager. This field is optional if you specify a file in the ClassificationResults field. This field supports only strings and String variables.
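As a rough illustration of the file type note for DocumentPath, the following Python sketch checks a path against the listed extensions before it is handed to a workflow. The function name is hypothetical, and matching extensions case-insensitively is an assumption, not documented behavior of the activity.

```python
from pathlib import Path

# Extensions listed in the DocumentPath note.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".pdf",
}


def is_supported_document(path: str) -> bool:
    """Return True if the file extension is one of the listed types.

    Assumption: the comparison ignores case (".PDF" is treated as ".pdf").
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this could run before the workflow is started, so that unsupported files are rejected early instead of failing inside the scope.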
Output
- ExtractionResults - The extraction results of the data extraction process, stored in an ExtractionResult variable.
Note: If the page range for data extraction indicates that only a part of the original file is targeted, the Data Extraction Scope generates a file in the TEMP project folder that is then passed to the extractors. The temporary file contains only the page range that extractors should receive for document processing.
Properties panel
Authentication
The Authentication properties of this activity allow you to perform auto-validation via on-premises robots. Before configuring these properties, ensure you have fulfilled the prerequisites mentioned in the Configuring Authentication page. Once these steps are completed, you can fill in the Authentication properties of the activity.
- Runtime Credentials Asset - Use this field when you need to access Document Understanding auto-validation features while the robot is connected to a local Orchestrator, or from a different tenant. You can enter a Credential Asset, for authentication purposes, in one of the following ways:
  - From the dropdown list, select the desired Credential Asset from the Orchestrator to which the UiPath® Robot is connected.
  - Manually enter the path to the Orchestrator Credential Asset where you store the external application credentials for accessing the auto-validation features. The format of the path should be: <OrchestratorFolderName>/<AssetName>.
- Runtime Tenant Url - Use this field alongside the Runtime Credentials Asset field. Enter the URL of the tenant that the robot will connect to in order to execute the auto-validation. The URL should be in the following format: https://<baseURL>/<OrganizationName>/<TenantName>.
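The two formats above can be sanity-checked before they are pasted into the activity. The sketch below is only an illustration in Python: both function names are hypothetical, and the rules it applies (everything before the last slash is the folder name, and the URL path has exactly two segments) are assumptions drawn from the format strings, not from the product's own validation.

```python
from urllib.parse import urlparse


def is_asset_path(value: str) -> bool:
    """Check for the <OrchestratorFolderName>/<AssetName> shape.

    Assumption: the asset name is the part after the last slash.
    """
    folder, separator, asset = value.rpartition("/")
    return bool(separator) and bool(folder.strip()) and bool(asset.strip())


def is_tenant_url(value: str) -> bool:
    """Check for the https://<baseURL>/<OrganizationName>/<TenantName> shape.

    Assumption: <baseURL> is a host name, so the path has two segments.
    """
    parsed = urlparse(value)
    segments = [part for part in parsed.path.split("/") if part]
    return parsed.scheme == "https" and bool(parsed.netloc) and len(segments) == 2
```

For instance, a value such as Shared/MyAsset passes the first check, while a bare asset name without a folder does not.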
Common
- DisplayName - The display name of the activity.
Input
- ApplyAutoValidation - Adjusts confidence using generative extraction cross-checking. Confidences for reported values that are confirmed by generative AI are increased to 99%. Enabling this feature consumes additional AI Units.
Important: This feature is currently part of an audit process and is not to be considered part of the FedRAMP Authorization until the review is finalized. See the full list of features currently under review.
- ClassificationResults - The results of running a classifier activity on the specified document, stored in a ClassificationResult object. This field is optional if you specify a DocumentTypeId instead. This field supports only ClassificationResult variables.
- DocumentObjectModel - The Document Object Model you want to use to validate the document against. This model is stored in a Document variable and can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only Document variables.
- DocumentPath - The path to the document you want to validate. This field supports only strings and String variables.
Note: The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
- DocumentText - The text of the document itself, stored in a String variable. This value can be retrieved from the Digitize Document activity. Visit Digitize Document for more information on how to achieve this. This field supports only strings and String variables.
- DocumentTypeID - The Document Type ID, as found in the Taxonomy Manager. This field is optional if you specify a file in the ClassificationResults field. This field supports only strings and String variables.
- FormatValuesIfPossible - If selected, the Data Extraction Scope tries to compute derived parts for values that do not have them. Values that already have derived parts reported are not overridden. If the option is set to False, the values are not formatted.
- AutoValidationConfidenceThreshold - Confidence threshold for generative validation. Only field values with confidence below this threshold will be validated. If values are confirmed, the confidence of those values will be set to this threshold.
Important: This feature is currently part of an audit process and is not to be considered part of the FedRAMP Authorization until the review is finalized. See the full list of features currently under review.
- Taxonomy - The Taxonomy against which the document is to be processed, stored in a DocumentTaxonomy variable. This object can be obtained by using a Load Taxonomy activity. This field supports only DocumentTaxonomy variables.
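The AutoValidationConfidenceThreshold description in the list above can be read as a simple rule: values below the threshold are checked, and confirmed values are raised to the threshold. The following Python sketch models only that reading. The FieldValue type, the apply_auto_validation function, and the is_confirmed callback (standing in for the generative cross-check) are all hypothetical and are not part of the activity's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FieldValue:
    name: str
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def apply_auto_validation(
    fields: List[FieldValue],
    threshold: float,
    is_confirmed: Callable[[FieldValue], bool],
) -> List[FieldValue]:
    """Raise confirmed low-confidence values to the threshold.

    Values at or above the threshold are passed through unchecked.
    """
    result = []
    for field in fields:
        if field.confidence < threshold and is_confirmed(field):
            # Confirmed value: confidence is set to the threshold.
            result.append(FieldValue(field.name, field.value, threshold))
        else:
            # Not checked, or checked and not confirmed: left unchanged.
            result.append(field)
    return result
```

Note that the sketch follows the threshold description only. It does not model the 99% figure given for ApplyAutoValidation.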
Misc
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Output
- ExtractionResults - The extraction results of the data extraction process, stored in an ExtractionResult variable.
Note: If the page range for data extraction indicates that only a part of the original file is targeted, the Data Extraction Scope generates a file in the TEMP project folder that is then passed to the extractors. The temporary file contains only the page range that extractors should receive for document processing.
The Configure Extractors Wizard can be accessed via the Data Extraction Scope and allows you to choose which extractors are applied to each document type and field.
From the body of the activity, select Configure Extractors. The wizard button becomes available after dragging at least one extractor activity into the body of the Data Extraction Scope activity. This wizard displays all the document types defined in the taxonomy and their respective fields, and enables you to choose which extractor you want to use for each.
You can expand each document type in the wizard to view its fields and select them for extraction.
You can give an extractor an alias, for example R2D2, and then use the same alias for a Machine Learning Extractor Trainer. This creates a link between the extractor and the trainer, which is used when training the extractor. Each extractor has a unique alias, while multiple trainers can share the same alias.
Select Get or refresh extractor capabilities, for the extractors that support this functionality, to easily map your taxonomy fields to the available extractor fields, or to refresh them if the extractor fields have changed.
If the check box next to a field in any column is selected, the Data Extraction Scope requests that particular field from the extractor. If the check box is cleared, the Data Extraction Scope does not request a value for that field from the extractor.
The text inputs next to each field enable you to map fields defined in your Taxonomy to the fields defined in the extractor's internal taxonomy, if any. For regular fields, enter in the text input the identifier of the target field from the extractor's internal taxonomy. For table fields, the parent table field is mapped at the table level, and the corresponding columns are mapped individually.
The number of columns in the wizard varies according to the number of extractors present in the scope activity. The name of each column is given by the display name of each extractor activity.
If multiple extractors are used in the activity, the order of the extractors in the scope defines their priority. For example, consider three extractors. If Extractor 1 returns an acceptable value (one that is above the Minimum Confidence level) for a particular requested field, that field is not requested when Extractor 2 and Extractor 3 are executed. If Extractor 1 and Extractor 2 return values below the Minimum Confidence level for that particular field, or return nothing at all, the results from Extractor 3 are taken into account, provided they satisfy the confidence acceptability conditions.
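The priority rule described above can be sketched as a simple fallback loop. The Python below is a simplified model, not the activity's implementation: the Extractor signature and the run_in_priority_order function are hypothetical, pairing each extractor with its own minimum confidence is an assumption, and treating a confidence equal to the minimum as acceptable is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor: receives the field names still needed and returns
# {field_name: (value, confidence)} for the fields it could extract.
Extractor = Callable[[List[str]], Dict[str, Tuple[str, float]]]


def run_in_priority_order(
    extractors: List[Tuple[Extractor, float]],
    fields: List[str],
) -> Dict[str, Tuple[str, float]]:
    """Run extractors in order, requesting only fields not yet accepted."""
    accepted: Dict[str, Tuple[str, float]] = {}
    for extract, min_confidence in extractors:
        pending = [name for name in fields if name not in accepted]
        if not pending:
            break  # every field already has an acceptable value
        for name, (value, confidence) in extract(pending).items():
            # Assumption: a confidence equal to the minimum is acceptable.
            if name in pending and confidence >= min_confidence:
                accepted[name] = (value, confidence)
    return accepted
```

With three extractors in the list, a field accepted from the first one never appears in the request sent to the second or third, which mirrors the behavior in the example.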
The Data Extraction Scope activity is part of the Document Understanding solutions. Visit the Document Understanding Guide for more information.