The UiPath Activities Guide

Digitize Document

UiPath.IntelligentOCR.Activities.Digitization.DigitizeDocument

Digitizes a document, extracting its Document Object Model (DOM) and text and storing them in their corresponding variable types.

Important!

You must assign an OCR engine to this activity by dragging it into the body of the activity. The chosen OCR engine is used only if the incoming documents require OCR processing. The available OCR engines are Microsoft OCR, Google OCR, Abbyy OCR, OmniPage OCR, Microsoft Cloud OCR, Google Cloud OCR, and Abbyy Cloud OCR. The Digitize Document activity automatically sets the input and output parameters of the selected OCR engine.

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • DegreeOfParallelism - Specifies how many pages can be analyzed in parallel. A value of -1 uses the number of cores on the machine minus 1, meaning the activity attempts to process that many pages in parallel. A positive value uses that specific number of logical processors. By default, this property is set to 1.
  • DocumentPath - The file path of the document you want to digitize. This field supports only strings and String variables.
  • ForceApplyOCR - If selected, the OCR engine is applied to all pages of the document, even if they are native PDF files. The default value is False.
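
The degree-of-parallelism rule described in the list above can be illustrated with a short Python sketch. This is not UiPath code; the function name `resolve_worker_count` and its parameters are hypothetical and only restate the documented rule. Two behaviors are assumptions that the text above does not specify: the result is clamped to at least 1 on a single-core machine, and values other than -1 or a positive number are rejected.

```python
import os


def resolve_worker_count(degree_of_parallelism, logical_cores=None):
    """Map a degree-of-parallelism setting to a number of pages processed in parallel.

    Hypothetical helper that mirrors the documented rule; not part of UiPath.
    """
    if logical_cores is None:
        # os.cpu_count() may return None, so fall back to a single core.
        logical_cores = os.cpu_count() or 1

    if degree_of_parallelism == -1:
        # Documented rule: "number of cores on the machine - 1".
        # Assumption: never go below one worker on a single-core machine.
        return max(1, logical_cores - 1)

    if degree_of_parallelism > 0:
        # Documented rule: a positive value is used as-is.
        return degree_of_parallelism

    # Assumption: the source does not say how other values are handled.
    raise ValueError("expected -1 or a positive integer")
```

Under these assumptions, `resolve_worker_count(-1, logical_cores=8)` returns 7, and the default setting of 1 keeps page analysis sequential.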

Note:

The supported file types for the DocumentPath property are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf.
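
As a rough illustration of the file type list above, the following Python sketch checks a path's extension before it is handed to a digitization step. The helper name `is_supported_document` is hypothetical and is not part of UiPath. Treating extensions as case-insensitive is an assumption, since the note does not say how casing is handled.

```python
from pathlib import Path

# Extensions taken from the note above.
SUPPORTED_EXTENSIONS = {
    ".png", ".gif", ".jpe", ".jpg", ".jpeg",
    ".tiff", ".tif", ".bmp", ".pdf",
}


def is_supported_document(document_path: str) -> bool:
    """Return True if the path ends in one of the listed extensions.

    Assumption: matching ignores case, so "invoice.PDF" is accepted.
    """
    return Path(document_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only inspects the file name; it does not confirm that the file contents actually match the extension.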

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Output

  • DocumentObjectModel - The Document Object Model (DOM) of the file, stored in a Document variable. This field supports only Document variables.
  • DocumentText - The text extracted from the specified document. This variable can be subsequently used in the Present Validation Station activity. This field supports only String variables.

These two output variables depend on each other and must be used as a pair. Together they can be used in later stages of the Document Processing Framework (classification, data extraction, human validation, etc.).

Document Object Model

The Document Object Model is captured in a proprietary object documented here.

Example of using the Digitize Document activity

You can see how the Digitize Document activity is used in an example that incorporates multiple activities.
You can check and download the example from here.

