AI Computer Vision User Guide
Last updated Mar 11, 2024


AI Computer Vision is a machine-learning-based method for visually identifying all the UI elements on a computer screen and interacting with them via UiPath Robots, simulating human interaction. It does not require or use the underlying properties of applications; it relies only on the appearance of the various screen elements and the relationships between them.

Rather than relying on selectors, AI Computer Vision uses AI (Object Detection, OCR, fuzzy text-matching, image-matching for icons) and an anchoring system to tie it all together. More precisely, to visually locate elements on the screen, AI Computer Vision performs element detection (on the machine-learning server) and text (OCR) detection, and combines the two into a full understanding of the UI. The relationships between the elements detected with these two methods are then encoded into a multi-anchor descriptor, which uniquely identifies the targeted element.
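To make the idea of a multi-anchor descriptor more concrete, here is a minimal Python sketch. It is not UiPath's implementation or API: the `Box`, `Anchor`, and `Descriptor` types, the `locate` function, and the 0.8 similarity threshold are all hypothetical names and values chosen for illustration. It assumes that detection and OCR have already produced a flat list of elements with center coordinates, and it uses Python's standard `difflib` as a stand-in for fuzzy text-matching.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Box:
    """One detected screen element (hypothetical structure)."""
    kind: str   # e.g. "input", "button", or "text" for OCR results
    text: str   # OCR text; empty for non-text elements
    x: float    # center x in pixels
    y: float    # center y in pixels


@dataclass(frozen=True)
class Anchor:
    """A nearby text label and the target's offset from it at design time."""
    text: str
    dx: float   # target.x - anchor.x
    dy: float   # target.y - anchor.y


@dataclass(frozen=True)
class Descriptor:
    target_kind: str
    anchors: tuple


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; tolerates small OCR mistakes."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def locate(descriptor, detections, min_similarity=0.8):
    """Return the detection of the target kind whose offsets to all anchors
    best match the descriptor, or None if any anchor cannot be found."""
    texts = [d for d in detections if d.kind == "text"]
    candidates = [d for d in detections if d.kind == descriptor.target_kind]
    best, best_cost = None, float("inf")
    for cand in candidates:
        cost = 0.0
        for anchor in descriptor.anchors:
            matches = [t for t in texts
                       if text_similarity(t.text, anchor.text) >= min_similarity]
            if not matches:
                cost = float("inf")  # anchor missing on screen
                break
            # Deviation between the observed offset and the recorded offset.
            cost += min(abs((cand.x - t.x) - anchor.dx)
                        + abs((cand.y - t.y) - anchor.dy)
                        for t in matches)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best


# Two identical-looking input fields; the anchors disambiguate them.
screen = [
    Box("text", "Login", 175, 50),
    Box("text", "Username", 100, 100),
    Box("input", "", 250, 100),
    Box("text", "Passw0rd", 100, 150),  # simulated OCR misread of "Password"
    Box("input", "", 250, 150),
]
password_field = Descriptor("input", (Anchor("Password", 150, 0),
                                      Anchor("Login", 75, 100)))
print(locate(password_field, screen))
```

In this sketch the two input fields are visually indistinguishable, so the target is chosen by how well its position relative to each anchor matches what was recorded at design time. Using more than one anchor is what lets the descriptor stay unambiguous when a single label would match several candidates.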

AI Computer Vision consists of a set of activities that are part of the UI Automation activity package, and a server (cloud, on-premises, or local) hosting the AI model needed to analyze the UI you're automating. By default, the UiPath cloud server is used, and it is also the recommended option for all Computer Vision and UI Automation activities. You can use Computer Vision cloud regardless of your deployment type: whether you have Orchestrator on-premises or Orchestrator cloud, you can run Computer Vision cloud with no special configuration.

Alternatively, you can host and manage your own on-premises AI Computer Vision server and use it to run the AI Computer Vision activities. When using this type of server, you need your own hardware infrastructure (GPUs) or cloud environment, and you must deploy, update, and maintain that environment yourself. Unlike with the UiPath cloud server, you might also run into backward-compatibility issues when upgrading the AI model. For details on how to avoid these kinds of issues, go to Model update resilience.

The local server is another option. It runs on the local CPU and is the most portable version. However, it is slower and has slightly lower detection accuracy.

Key benefits

Here are some features of AI Computer Vision you can benefit from:

  • Automation beyond selectors - Enable robots to recognize and interact with more on-screen fields and components - even Flash, Silverlight, PDFs, and images.
  • Reliable on VDIs and desktops - Relieves issues with failure-prone image automation techniques and with selector-based targeting on desktops. Start by creating automations within Citrix, VMware, or Microsoft's Remote Desktop.
  • Broad range of interface types - Includes VDI environments (Citrix, VMware, Microsoft RDP, VNC, and others) for desktop and web applications. Save time by having UI elements identified and added to the Object Repository for you.
  • Intelligent, intuitive capabilities - Provides details, validation, and notifications about on-screen selections via an on-screen wizard. Uses the recorder to easily generate full vision-based automations.
  • Run-time auto-scroll support - Easily automate scrollable content in webpages or apps using Computer Vision activities.
  • Cross-platform capabilities - Automate for Windows, Linux, Android and other operating systems through remote desktops.
  • Automation between VDI & non-VDI - Simplifies VDI-to-desktop automation by reducing necessary modifications.
  • Multiple deployment options - Deploys via SaaS; available on-premises for Linux and Windows, or right from your desktop.
  • Dynamic UI elements - Enables automations that include tables, drop-down lists, and checkbox elements. This increases the resilience of your automations, enabling them to adapt to small changes to the UI and interact with these dynamic elements.
  • Available in UI Automation as part of Unified Target - Reduces the complexity of building UI-based automations when you need both selectors and Computer Vision descriptors.

Deployment options

In the table below you can find a side-by-side comparison of our current Computer Vision deployment options.

| | UiPath cloud server | On-premises server | Local server | Comments |
| --- | --- | --- | --- | --- |
| Model regression testing | available | not available | not available | Every new model still detects all the design-time data its previous iteration was detecting, so that running automations do not break. |
| Mock design-time data storage | available | not available | not available | The model learns shapes and colors of UI elements, so using mock data with no sensitive information is recommended. |
| Runtime data storage | not available | not available | not available | Runtime production data (which could contain sensitive information) is never used or stored; it is only used as input for the AI model. |
| Hassle and cost free server | available | not available | available | N/A |
| Speed | High (GPU) | High (GPU) | Slightly lower (CPU) | The local server is a compressed version of the cloud model (fewer neurons), which might be a good fit for light scenarios with more generic-looking UI elements. |
| Vision accuracy | High | High | Slightly lower | The local server is a compressed version of the cloud model (fewer neurons), which might be a good fit for light scenarios with more generic-looking UI elements. |
| Free with an Enterprise license | available | available | available | N/A |
| Usage limit | Community: 30 MP/min; Enterprise: 240 MP/min | unlimited | unlimited | The UiPath cloud server usage limit is designed to allow for very large headroom. It is very difficult to reach this limit even in the most intense usage scenarios. |
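
As a rough way to read the Community and Enterprise limits above, the following Python sketch estimates how many full-screen captures per minute would fit under each limit. It rests on two assumptions not stated on this page: that MP stands for megapixels, and that each request is counted by the pixel area of the screenshot sent. The function name `screenshots_per_minute` is hypothetical.

```python
def screenshots_per_minute(limit_mp_per_min: float,
                           width_px: int,
                           height_px: int) -> int:
    """Whole screenshots of the given size that fit under a MP/min limit.

    Assumes the limit is measured in megapixels of screenshot area.
    """
    megapixels = width_px * height_px / 1_000_000
    return int(limit_mp_per_min // megapixels)


# A 1920x1080 screenshot is about 2.07 MP.
for plan, limit in (("Community", 30), ("Enterprise", 240)):
    print(plan, screenshots_per_minute(limit, 1920, 1080))
```

Under these assumptions, a 1920x1080 screen allows 14 captures per minute on the Community limit and 115 on the Enterprise limit. Smaller capture regions would raise both figures proportionally.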