# Tesseract OCR

> `UiPath.Core.Activities.GoogleOCR`

Extracts a string and its information from an indicated UI element or image by using the Tesseract OCR engine. It can be used with
other OCR activities, such as [Click OCR Text](https://docs.uipath.com/activities/other/latest/ui-automation/click-ocr-text), [Hover OCR Text](https://docs.uipath.com/activities/other/latest/ui-automation/hover-ocr-text), [Double Click OCR Text](https://docs.uipath.com/activities/other/latest/ui-automation/click-ocr-text), [Get OCR Text](https://docs.uipath.com/activities/other/latest/ui-automation/get-ocr-text), and [Find OCR Text Position](https://docs.uipath.com/activities/other/latest/ui-automation/find-ocr-text).

:::note
* The **Tesseract OCR** engine used in UiPath is updated to version 4.0. For more information on the capabilities of this engine, see the [Tesseract 4.00 documentation](https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html).
* This activity does not support rotated documents; results on such documents are unpredictable.
:::

## Prerequisites

Before you run the Tesseract OCR engine, install the x86 (32-bit) Visual C++ Redistributable on the machine where the engine runs. You
need the x86 (32-bit) version even if your operating system is 64-bit (x64). Visit [Visual C++ Redistributable latest supported downloads](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170) to select and install the version of Visual C++ that meets your requirements.

## Limitations

The Tesseract OCR engine does not support Windows 10 N Edition, because this edition of Windows does not include the Media
Feature Pack. To ensure compatibility, we recommend that you upgrade to Windows 10 Pro, which contains all the required media components.

## Properties

### **Options**

* **AllowedCharacters** - The characters that the OCR engine is allowed to recognize. When this field is set, only the characters specified here are extracted.
* **DeniedCharacters** - The characters that the OCR engine must not recognize. The characters specified here are excluded from the extracted string.
* **Invert** - If this check box is selected, the colors of the UI element are inverted before scraping. This is useful when the background
  is darker than the text color.
* **Language** - The language used by the OCR engine to extract the string from the UI element. For the Tesseract OCR engine, this field must
  contain the language file prefix, such as `ron` for Romanian, `ita` for Italian, and `fra` for French. You can find the
  supported language prefixes on [this page](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages).
  :::note
  Download the language data file (`<prefix>.traineddata`) from the [tessdata repository](https://github.com/tesseract-ocr/tessdata) and add it to your local folder `%UserProfile%\.nuget\packages\uipath.vision\3.x.x\build\net461\tessdata`. After adding the file, restart **UiPath Studio**.
  :::
* **ExtractWords** - If this check box is selected, the on-screen position of each detected word is extracted.
* **Profile** - Choose a preprocessing profile for the specified image or UI element to achieve a better OCR read. The following options
  are available:
  * None - Does not apply a preprocessing profile.
  * Screen - Preprocessing suitable for remote desktop applications.
  * Scan - Preprocessing suitable for scanned files.
  * Legacy - Uses the engine's default settings for preprocessing images. This is the default option.
* **Scale** - The scaling factor applied to the selected UI element or image. Values greater than 1 enlarge the image, which
  can improve the OCR read and is recommended for small images. Values between 0 and 1 scale the image down. The default
  value is 2.

  :::important
  Large-size images may result in an error when the scaling factor is higher than 1.
  :::
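
The AllowedCharacters and DeniedCharacters properties can be pictured with a conceptual Python sketch. It assumes, for illustration only, that the engine weighs several candidate characters for each glyph; the `pick_character` helper and the confidence values are hypothetical and are not part of UiPath or Tesseract. Under that assumption, restricting the candidates (rather than deleting characters from the finished string) lets AllowedCharacters turn an ambiguous "O" into the digit "0" instead of dropping it.

```python
from typing import List, Optional, Tuple

# Hypothetical recognition candidates for one glyph: (character, confidence).
Candidate = Tuple[str, float]


def pick_character(
    candidates: List[Candidate],
    allowed: Optional[str] = None,
    denied: str = "",
) -> Optional[str]:
    """Return the most confident candidate that passes both character filters.

    allowed: if set, only these characters may be returned (like AllowedCharacters).
    denied: these characters are never returned (like DeniedCharacters).
    Returns None when no candidate survives the filters.
    """
    permitted = [
        (char, score)
        for char, score in candidates
        if (allowed is None or char in allowed) and char not in denied
    ]
    if not permitted:
        return None
    # max() returns the first candidate when scores are tied.
    return max(permitted, key=lambda pair: pair[1])[0]


# A glyph that the engine finds ambiguous between the letter "O" and the digit "0".
glyph = [("O", 0.6), ("0", 0.4)]
print(pick_character(glyph))                        # O
print(pick_character(glyph, allowed="0123456789"))  # 0
print(pick_character(glyph, denied="O"))            # 0
```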
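
To check where a downloaded language file is expected, the folder from the Language note can be built programmatically. This minimal Python sketch assumes that each language is stored as `<prefix>.traineddata`, which is Tesseract's naming convention; the helper names and the `user_profile` value are hypothetical, and `3.x.x` is kept as a placeholder that you replace with your installed UiPath.Vision version folder.

```python
from pathlib import PureWindowsPath


def tessdata_dir(user_profile: str, vision_version: str) -> PureWindowsPath:
    """tessdata folder from the Language note, under %UserProfile%.

    vision_version is the installed UiPath.Vision package folder; the
    documentation only gives it as 3.x.x, so the caller must supply it.
    """
    return PureWindowsPath(user_profile, ".nuget", "packages", "uipath.vision",
                           vision_version, "build", "net461", "tessdata")


def language_file(user_profile: str, vision_version: str, prefix: str) -> PureWindowsPath:
    """Expected location of the <prefix>.traineddata file for one language."""
    return tessdata_dir(user_profile, vision_version) / f"{prefix}.traineddata"


# Hypothetical user profile; replace "3.x.x" with your installed version folder.
print(language_file(r"C:\Users\jdoe", "3.x.x", "ron"))
```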
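
Invert and Scale both transform the image before it is read. The following Python sketch illustrates them on a tiny grayscale grid; it is a conceptual model, not UiPath's implementation, and the nearest-neighbour resampling and the helper names (`invert`, `scale`, `pixel_count`) are assumptions. Because the factor applies to both width and height, the number of pixels grows with the square of the factor: the default value of 2 produces four times as many pixels, which is one plausible reason why large images can fail when the factor is higher than 1.

```python
from typing import List

# 8-bit grayscale image as rows of pixels: 0 = black, 255 = white.
Grid = List[List[int]]


def invert(pixels: Grid) -> Grid:
    """Swap dark and light values, so light text on a dark background becomes dark on light."""
    return [[255 - value for value in row] for row in pixels]


def scale(pixels: Grid, factor: float) -> Grid:
    """Nearest-neighbour resize: factor > 1 enlarges, 0 < factor < 1 shrinks."""
    if factor <= 0:
        raise ValueError("scale factor must be greater than 0")
    height, width = len(pixels), len(pixels[0])
    new_height = max(1, round(height * factor))
    new_width = max(1, round(width * factor))
    return [
        # Map each output pixel back to the source pixel it falls in, clamped to the grid.
        [pixels[min(height - 1, int(y / factor))][min(width - 1, int(x / factor))]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def pixel_count(width: int, height: int, factor: float) -> int:
    """Number of pixels after scaling both dimensions by the same factor."""
    return round(width * factor) * round(height * factor)


dark_background = [[20, 230], [230, 20]]
print(invert(dark_background))       # [[235, 25], [25, 235]]
print(scale(dark_background, 2)[0])  # [20, 20, 230, 230]
print(pixel_count(1920, 1080, 1))    # 2073600
print(pixel_count(1920, 1080, 2))    # 8294400 (four times as many)
```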

### **Output**

* **Text** - The extracted string. This field supports only `string` variables.
* **Result** - The extracted words along with their on-screen positions. This field supports only `KeyValuePair<Rectangle, String>` variables.
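
Because each entry of Result pairs a word with its bounding rectangle, a word's position can be looked up and, for example, turned into a point on screen. The Python sketch below models that lookup with a stand-in `Rectangle` dataclass and a hypothetical `find_word` helper; it mirrors the shape of `KeyValuePair<Rectangle, String>` but is not UiPath's .NET type. It also assumes that ExtractWords is selected, so that word positions are populated, and the sample words and coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rectangle:
    """Stand-in for a screen rectangle: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


# Each entry pairs a word's bounding box with the word, like KeyValuePair<Rectangle, String>.
WordBox = Tuple[Rectangle, str]


def find_word(result: List[WordBox], word: str) -> Optional[Rectangle]:
    """Return the box of the first entry whose text equals `word`, or None."""
    for box, text in result:
        if text == word:
            return box
    return None


# Hypothetical Result for a line reading "Invoice total 42.00".
result = [
    (Rectangle(10, 50, 60, 14), "Invoice"),
    (Rectangle(76, 50, 36, 14), "total"),
    (Rectangle(118, 50, 40, 14), "42.00"),
]
box = find_word(result, "total")
print(box.center() if box is not None else "not found")  # (94, 57)
```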

### **Input**

* **Image** - The image that you want to process. This field supports only `Image` variables.

### **Common**

* **DisplayName** - The display name of the activity.

### **Misc**

* **Private** - If selected, the values of variables and arguments are no longer logged at Verbose level.
