Studio User Guide (version 2022.4)

Last updated Dec 19, 2024

OCR Activities

Some applications do not work with standard screen scraping or UI automation technologies. OCR-based activities in Studio handle these cases by scanning the screen of the machine and recognizing the characters displayed on it. This lets you build automations based on what is visible on the screen, which simplifies automation in virtual machine environments. Citrix and other remote desktop tools are the usual targets of OCR-based activities: they stream only an image of the remote desktop to the user, so the UI elements that normal selectors rely on cannot be found.

Note: A best practice is to create the project with the Recording Wizard, which generates selectors automatically, and then adjust the activities to fit your needs.

Click OCR Text and Hover OCR Text use OCR to scan the screen of the machine for a given text and perform an action (a click or a hover) relative to it. Because they rely on text rather than on graphic elements, automations built with them usually keep working when the graphics change but the text does not. This makes them very useful for automating basic actions in virtual machine environments. As input, these activities receive a Target, which can be a string variable, a Region variable, a UiElement variable, or a selector, and which indicates the coordinates where the action must be performed. The target can also be generated automatically with the Indicate on Screen functionality, which tries to identify UI elements in the indicated region and generates selectors for them. If automatic detection does not work, you may need to adjust the target manually.
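
To illustrate what "relative to the text" means, the following minimal Python sketch computes the screen point at which a click or hover would land once the text's bounding box is known. It is not UiPath code: Rect, ANCHORS and action_point are hypothetical names, and the anchor-plus-pixel-offset model is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen rectangle in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical anchor names, each mapped to a fraction of the box width/height.
ANCHORS = {
    "center": (0.5, 0.5),
    "top_left": (0.0, 0.0),
    "top_right": (1.0, 0.0),
    "bottom_left": (0.0, 1.0),
    "bottom_right": (1.0, 1.0),
}


def action_point(text_box: Rect, anchor: str = "center",
                 offset_x: int = 0, offset_y: int = 0) -> tuple[int, int]:
    """Return the point to click or hover, relative to the found text's box."""
    fx, fy = ANCHORS[anchor]
    px = text_box.x + round(text_box.width * fx) + offset_x
    py = text_box.y + round(text_box.height * fy) + offset_y
    return px, py


# The label "Name:" was found at (100, 200), 60x20 px; the input field is
# assumed to sit about 120 px to the right of the label's center.
label = Rect(100, 200, 60, 20)
print(action_point(label, "center", offset_x=120))  # (250, 210)
```

Anchoring to the text instead of to fixed screen coordinates is what lets such an action survive layout changes, as long as the label text itself stays the same.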

Get OCR Text extracts a string, along with information about it, from an indicated UI element by using the OCR screen scraping method. This activity can also be generated automatically, together with a container, when you perform screen scraping. The Google OCR engine is used by default, but you can replace it with the Abbyy or Microsoft engine. The engines differ from one another, which makes each of them better suited to different situations. As input, this activity receives a Target, which can be a Region variable, a UiElement variable, or a selector, and which identifies what you want to automate and where the action must be performed. The target can also be generated automatically with the Indicate on Screen functionality, which tries to identify UI elements in the indicated region and generates selectors for them. If automatic detection does not work, you may need to adjust the target manually. This activity returns a string variable containing the text found in the UI element, and a TextInfo variable containing the screen coordinates of all the found words.
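
The two outputs are related: the string can be rebuilt from the positioned words. The minimal sketch below shows one common way to do that, grouping words into lines by their top edge and then reading each line left to right. It does not reproduce how UiPath builds its output; Word, words_to_text and the 5-pixel line tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box, in pixels."""
    text: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def words_to_text(words: list[Word], line_tolerance: int = 5) -> str:
    """Join OCR words into text: lines top to bottom, words left to right.

    A word whose top edge is within `line_tolerance` pixels of the first
    word of the current line is treated as part of that line.
    """
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Re-sort each line by x, since small vertical jitter can reorder words.
    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for line in lines
    )


words = [
    Word("World", 60, 11, 50, 14),
    Word("Hello", 0, 10, 50, 14),
    Word("Total:", 0, 40, 50, 14),
    Word("42", 60, 41, 20, 14),
]
print(words_to_text(words))  # "Hello World" and "Total: 42" on two lines
```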

Find OCR Text Position searches a UI element for a given string and returns a UiElement variable that contains that string. This activity is useful for locating UI elements relative to text on the screen. As input, it receives a string containing the text to search for, and a Target, which can be a Region variable, a UiElement variable, or a selector, and which identifies what you want to automate and where the action must be performed. The target can also be generated automatically with the Indicate on Screen functionality, which tries to identify UI elements in the indicated region and generates selectors for them. If automatic detection does not work, you may need to adjust the target manually. This activity returns a UiElement variable that contains the position where the text was found.
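
The following minimal sketch shows the kind of lookup this involves. It assumes the OCR results are already available as words in reading order with bounding boxes, matches a multi-word search string against consecutive words (case-insensitively), and returns the combined box. Box and find_text_position are hypothetical names. The real activity returns a UiElement, not a box.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Text with its bounding box, in pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_text_position(words: list[Box], phrase: str) -> Optional[Box]:
    """Return the box around the first run of consecutive words that matches
    `phrase` (case-insensitive), or None if the phrase is not found."""
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        run = words[i:i + n]
        if [w.text.lower() for w in run] == target:
            # Union of the matched words' boxes.
            return Box(
                text=" ".join(w.text for w in run),
                left=min(w.left for w in run),
                top=min(w.top for w in run),
                right=max(w.right for w in run),
                bottom=max(w.bottom for w in run),
            )
    return None


words = [
    Box("Invoice", 0, 0, 70, 20),
    Box("Total", 80, 0, 130, 20),
    Box("Amount", 140, 2, 210, 22),
    Box("99.00", 220, 0, 270, 20),
]
print(find_text_position(words, "total amount"))
```

Returning the union of the word boxes lets a caller then act next to the phrase as a whole, for example on a value printed to its right.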

OCR Text Exists uses OCR to check whether a given text appears in a UI element. This activity is useful in all types of text-based automation: you can make decisions based on whether a given string is displayed, or use the activity as a Condition in the Retry Scope activity to repeat certain actions until the text appears. As input, this activity receives a string containing the text to search for, and a Target, which can be a Region variable, a UiElement variable, or a selector, and which identifies what you want to automate and where the action must be performed. The target can also be generated automatically with the Indicate on Screen functionality, which tries to identify UI elements in the indicated region and generates selectors for them. If automatic detection does not work, you may need to adjust the target manually. This activity returns a boolean variable that is true if the text was found and false otherwise.
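
The Retry Scope pattern can be sketched in plain Python as follows: run some actions, check a text-exists condition, and retry a limited number of times with a pause in between. This is only a model of the idea. text_exists, retry_until, the case-insensitive substring check and the simulated screens are all hypothetical, and the sketch ignores details of the real Retry Scope such as exception handling.

```python
import time
from typing import Callable


def text_exists(screen_text: str, text: str) -> bool:
    """True if `text` occurs in the OCR result (case-insensitive substring)."""
    return text.lower() in screen_text.lower()


def retry_until(action: Callable[[], None], condition: Callable[[], bool],
                attempts: int = 3, delay_s: float = 0.5) -> bool:
    """Run `action`, then check `condition`; repeat up to `attempts` times.

    Returns True as soon as the condition holds, False if every attempt fails.
    """
    for attempt in range(attempts):
        action()
        if condition():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before the next attempt
    return False


# Simulated OCR results for successive screen captures.
screens = iter(["Loading...", "Loading...", "Welcome, user"])
current = {"text": ""}


def refresh() -> None:
    current["text"] = next(screens)


found = retry_until(refresh, lambda: text_exists(current["text"], "Welcome"),
                    attempts=3, delay_s=0)
print(found)  # True
```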

OCR engines, such as Google OCR, Google Cloud OCR, Microsoft OCR, Microsoft Cloud OCR, and Abbyy Cloud OCR, are also available as separate activities. These activities extract strings and their positions from a provided image, each using a different OCR engine. They can be used together with the other OCR activities (Click OCR Text, Hover OCR Text, Get OCR Text, Find OCR Text Position). As input, these activities receive an Image variable that contains the image to be scanned. As output, they return an IEnumerable<KeyValuePair<Rectangle,String>> variable, which contains each piece of extracted text together with its on-screen coordinates, and a string variable, which contains the extracted text.
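
Because every engine produces the same two outputs, the engine can be treated as an interchangeable component. The minimal Python sketch below models this with a structural interface. The Python types are stand-ins: a list of (Rectangle, str) tuples plays the role of IEnumerable<KeyValuePair<Rectangle,String>>, and OcrEngine, FakeEngine and run_ocr are hypothetical names. It also assumes that the string output is the fragments joined with spaces, which may not match any real engine.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Rectangle:
    """Stand-in for an on-screen rectangle, in pixels."""
    x: int
    y: int
    width: int
    height: int


class OcrEngine(Protocol):
    """Anything that turns image bytes into (Rectangle, text) pairs."""
    def extract(self, image: bytes) -> list[tuple[Rectangle, str]]: ...


class FakeEngine:
    """Stand-in engine returning canned results, for illustration only."""
    def __init__(self, results: list[tuple[Rectangle, str]]) -> None:
        self._results = results

    def extract(self, image: bytes) -> list[tuple[Rectangle, str]]:
        return list(self._results)


def run_ocr(engine: OcrEngine,
            image: bytes) -> tuple[list[tuple[Rectangle, str]], str]:
    """Return both outputs: the positioned fragments and the plain text."""
    fragments = engine.extract(image)
    text = " ".join(fragment for _, fragment in fragments)
    return fragments, text


engine = FakeEngine([(Rectangle(10, 10, 40, 12), "Submit"),
                     (Rectangle(60, 10, 40, 12), "Order")])
fragments, text = run_ocr(engine, b"")
print(text)  # Submit Order
```

Code written against the OcrEngine interface works with any engine that satisfies it, so switching engines requires no change in the calling code.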

© 2005-2025 UiPath. All rights reserved.