Step 3: Post-deployment steps
Commands that contain \ may not work as expected. To ensure new lines are interpreted correctly, use the console's clipboard widget.
The installation is successful if the content of the installResult file (in the flags container) is success. The content is failed if the installation failed.
The installation process generates self-signed certificates on your behalf. These certificates are compliant with FIPS 140-2. The Azure deployment template also gives you the option to provide a CA-issued server certificate at installation time instead of using an auto-generated self-signed certificate.
Self-signed certificates will expire in 90 days, and you must replace them with certificates signed by a trusted CA as soon as installation completes. If you do not update the certificates, the installation will stop working after 90 days.
If you installed Automation Suite on a FIPS 140-2-enabled host and want to update the certificates, make sure they are FIPS 140-2-compatible.
For instructions, see Managing certificates.
After completing an Automation Suite installation using the Azure deployment template, you can enable FIPS 140-2 on your machines. For instructions, see Security and compliance.
If you need more information on the Automation Suite installation process or other operations, a good place to start is the storage account used to store various flags and logs during cluster deployment and maintenance.
To locate the storage account, take the following steps:
The flags container stores various flags or files needed for orchestration or just to report the status of various operations. On a new cluster, the flags container contents typically look as shown in the following example:
Files in the flags containers are used to orchestrate various operations, such as the Automation Suite installation process on the cluster, or specific cluster operations, such as Instance Refresh. For example:
- uipath-server-000000.success denotes that the infrastructure installation was completed successfully on that specific node of the cluster;
- installResult reads success if the overall installation is successful.
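As a minimal sketch of how these flags might be checked from a script, the following helper interprets a local copy of the installResult flag. It assumes you have already downloaded the blob to a local file and that the file contains only the word success or failed; the function name is hypothetical and not part of Automation Suite.

```shell
# Hypothetical helper: interpret a locally downloaded copy of the installResult flag.
# Assumption: the file contains the single word "success" or "failed".
check_install_result() {
  cir_file="$1"
  if [ ! -r "$cir_file" ]; then
    echo "unknown"
    return 2
  fi
  # Strip whitespace so that a trailing newline does not break the comparison.
  cir_content=$(tr -d '[:space:]' < "$cir_file")
  case "$cir_content" in
    success) echo "success"; return 0 ;;
    failed)  echo "failed";  return 1 ;;
    *)       echo "unknown"; return 2 ;;
  esac
}
```

The exit status mirrors the flag, so the helper can gate later steps, for example `check_install_result ./installResult && echo "cluster is ready"`.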
When performing an operation, it typically produces a log file in the logs container. On a fresh cluster, the logs container contents typically look as shown in the following example:
Every file in the logs container represents the logs for a specific step of the installation process. For example:
- infra-uipath-server-000000.log stores the infrastructure installation logs;
- fabric.log stores the logs for the fabric installation;
- services.log stores the logs for the application and services installation.
Once the installation is complete, you need to access the Deployment Outputs in the Outputs tab.
DateTime
) → Outputs.
Output | Description |
---|---|
Documentation | A link to the documentation. |
URL | The Load Balancer URL. Can be used for direct access. If custom domains were enabled this is the domain that you would use for the CNAME binding. |
KeyVaultURL | The Azure Portal URL for the Key Vault created by the deployment. It contains all the secrets (credentials) used in the deployment. |
ArgoCDURL | The URL for accessing ArgoCD. This is available within the VNet. External access to this URL must be set up as described in: Step 4: Configuring the DNS. |
ArgoCDPassword | The password used to log in to the ArgoCD portal. |
HostAdminUsername and HostAdminPassword | The credentials used for Host Administration. |
ClusterAdministrationURL | The URL for the Cluster Administration portal. |
LonghornMonitoringURL | The URL to Longhorn monitoring tools. |
GrafanaMonitoringURL | The URL to Grafana monitoring tools. |
PrometheusMonitoringURL | The URL to Prometheus monitoring tools. |
AlertmanagerMonitoringURL | The URL to Alertmanager monitoring tools. |
All credentials used in the deployment are stored as secrets inside a Key Vault provisioned during the deployment. To access the secrets, filter the resources inside the Resource Group, search for Vault, and then click Secrets.
If you get the The operation “List” is not enabled in the key vault’s access policy warning under the Secrets tab, take the following steps:
- Go to Access policies → Add access policy → Configure the template → Secret Management → Select Principal.
- Select your user, then click Save.
- Navigate back to Secrets. The warning should be gone, and the secrets should be visible.
The VMs are provisioned inside a private VNet. You can access them through Azure Bastion by following these steps:
As mentioned in Step 1: Preparing your Azure Deployment, the Automation Suite Azure deployment creates a Load Balancer with a public IP and a DNS label associated. This DNS label is Microsoft-owned.
The deployment also provisions a Private DNS zone inside the cluster VNet and adds several records that are used during the installation and configuration process.
If you choose to connect from an external machine, you will not be able to use the private DNS zone to resolve the DNS for various services, so you need to add these records to your host file.
See Step 4: Configuring the DNS for more details.
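The following sketch shows one way to generate such host file lines. The helper name, the example IP address, the example domain, and the subdomain list are all assumptions made for illustration; take the real records from the private DNS zone and from Step 4: Configuring the DNS.

```shell
# Hypothetical helper: print host file lines for a cluster.
# Usage: hosts_entries <ip_address> <fqdn> [subdomain ...]
# Assumption: every listed subdomain resolves to the same IP address as the FQDN.
hosts_entries() {
  he_ip="$1"
  he_fqdn="$2"
  shift 2
  printf '%s %s\n' "$he_ip" "$he_fqdn"
  for he_sub in "$@"; do
    printf '%s %s.%s\n' "$he_ip" "$he_sub" "$he_fqdn"
  done
}

# Example with placeholder values (203.0.113.10 is a documentation-only address).
hosts_entries 203.0.113.10 automationsuite.example.com alm
```

Review the printed lines before appending them to the host file of the external machine.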
You should now be able to connect to various services running on your cluster.
The Cluster Administration portal is a centralized location where you can find all the resources required to complete an Automation Suite installation and perform common post-installation operations. For details, see Getting started with the Cluster Administration portal.
To access the Cluster Administration portal, go to the following URL: https://${CONFIG_CLUSTER_FQDN}/uipath-management.
The general-use Automation Suite user interface serves as a portal for both organization administrators and organization users. It is a common organization-level resource from where everyone can access all Automation Suite areas: administration pages, platform-level pages, service-specific pages, and user-specific pages.
To access Automation Suite, take the following steps:
- Go to the following URL: https://${Loadbalancer_dns}, where <loadbalancer_dns> is the DNS label for the load balancer and is found under Outputs.
- Switch to the Default organization.
- The username is orgadmin.
- Retrieve the password by going to Key Vault, Secrets, and then Host Admin Password.
The host portal is where system administrators configure the Automation Suite instance. The settings configured from this portal are inherited by all your organizations, and some can be overwritten at the organization level.
To access host administration, take the following steps:
- Go to the following URL: https://${Loadbalancer_dns}, where <loadbalancer_dns> is the DNS label for the load balancer and is found under Outputs.
- Switch to the Host organization.
- Enter the username you previously specified as a value for the UiPath Admin Username parameter.
- Enter the password you previously specified as a value for the UiPath Admin Password parameter. Retrieve the password by going to Key Vault, Secrets, and then Host Admin Password.
You can use the ArgoCD console to manage installed products.
To access ArgoCD, take the following steps:
- Go to the following URL: https://alm.${Loadbalancer_dns}, where <loadbalancer_dns> is the DNS label for the load balancer and is found under Outputs. Note that you must configure external access to this URL as described in Step 4: Configuring the DNS.
- The username is admin.
- To access the password, go to the Outputs tab or the credentials Key Vault.
To access the monitoring tools for the first time, log in as an admin with the following default credentials:
- Username: admin
- Password: to retrieve the password, run the following command:
kubectl get secrets/dex-static-credential -n uipath-auth -o "jsonpath={.data['password']}" | base64 -d
To update the default password used for accessing the monitoring tools, take the following steps:
- Run the following commands, replacing newpassword with your new password:
password="newpassword"
password=$(echo -n "$password" | base64)
kubectl patch secret dex-static-credential -n uipath-auth --type='json' -p="[{'op': 'replace', 'path': '/data/password', 'value': '$password'}]"
- Run the following command, replacing <cluster_config.json> with the path to your configuration file:
/opt/UiPathAutomationSuite/UiPath_Installer/install-uipath.sh -i <cluster_config.json> -f -o output.json --accept-license-agreement
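The password update can also be scripted. The sketch below only builds the JSON patch with the base64-encoded password, so you can inspect it before applying it. The helper name build_password_patch is hypothetical; the secret name, namespace, and patch path are the ones used in the commands above.

```shell
# Hypothetical helper: build the JSON patch for the dex-static-credential secret.
# Usage: build_password_patch <new_password>
build_password_patch() {
  # printf avoids encoding a trailing newline together with the password.
  bpp_encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  printf '[{"op": "replace", "path": "/data/password", "value": "%s"}]\n' "$bpp_encoded"
}

# Applying the patch requires kubectl access to the cluster, so it is shown commented out:
# kubectl patch secret dex-static-credential -n uipath-auth --type='json' \
#   -p="$(build_password_patch 'newpassword')"
```

Keeping the encoding in one function makes it easier to confirm that the stored value decodes back to the intended password.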
Compute resources provisioned from the deployment consist of Azure Scale Sets, which allow for easy scaling.
You can manually add additional resources to a specific Scale Set, including adding server nodes, agent nodes, or specialized agent nodes (such as GPU nodes).
You can perform a manual scale by identifying the specific Scale Set and adding resources directly.
To do so, take the following steps:
After performing an Automation Suite cluster upgrade, Azure template deployments require some changes to ensure a new node joins the cluster correctly. To automate the changes, we recommend using the dedicated script. For instructions, see the Azure deployment template docs.
Azure allows a 15-minute window at most to prepare for shutdown, whereas the graceful termination of an Automation Suite node varies from 20 minutes (for agent and GPU agent nodes) to hours (in the case of server nodes).
To avoid data loss, the server's VMSS upgrade policy is set to manual, and the server VMs have the protection for the scale set actions enabled. As a result, we recommend managing the server lifecycle via the provided Runbooks.
The InstanceRefresh, RemoveNodes, RemoveServers, and CheckServerZoneResilience runbooks are supported only for multi-node HA-ready production deployments.
The number of servers after running any runbook must be odd and greater than three (e.g., you cannot execute an Instance Refresh if you have four servers; you cannot remove a server if you have a total of five).
Running
state.
Run only one runbook at a time.
All our storage accounts and SQL servers have private endpoints. A Hybrid Worker group runs the existing automated operations so that they work without issues.
A Hybrid Worker is a VM that sits inside the VNet and runs the various automations.
The VM is typically a Standard_D2s_v3 or a Standard_F2s_v2, depending on which you choose for your server VMs and whether quota permits. The VM is shut down when the deployment finishes to minimize costs.
Runbooks are split into two categories: regular runbooks and hybrid runbooks. You use the regular runbooks to start an operation and gather all the data. The regular runbook then starts the Hybrid Worker VM and the hybrid runbook, with the latter completing the operation.
When the operation is complete, you can turn off the Hybrid Worker VM to limit costs.
The following table describes the runbook breakdown:
Regular runbooks |
Hybrid runbooks |
---|---|
AddGpuNode | HybridAddGpuNode |
BackupCluster | HybridBackupCluster |
GetAllBackups | HybridGetAllBackups |
InstanceRefresh | HybridInstanceRefresh (+HybridCheckServerZoneRezilience) |
RegisterAiCenterExternalOrchestrator | HybridRegisterAiCenterExternalOrchestrator |
RemoveNodes | HybridRemoveNodes |
RemoveServers | HybridRemoveServers |
RestoreClusterInitialize | HybridRestoreClusterInitialize + HybridRestoreClusterSnapshot |
ValidateFullInstall | Runs at the end of the deployment to validate the full installation. |
Description
The InstanceRefresh runbook has the following use cases:
- Update VMSS OS SKU on the server, agent, and GPU scale sets.
- Perform a node rotation operation for one or more VMSSes.
- Other VMSS configuration changes that were applied to the VMSS beforehand.
Usage
Implementation details
The InstanceRefresh runbook is a wrapper for the RemoveNodes runbook. As a result, the status is tracked while running RemoveNodes. It updates all the VMSS OS versions (if needed) and extracts, based on the received parameters, the hostname for the node rotation operation and forwards them to the RemoveNodes. If the cluster has exactly three servers, the InstanceRefresh runbook creates three new servers; otherwise, RemoveNodes handles the scale-up to maintain at least one server in each Availability Zone at all times.
Description
The RemoveNodes runbook has the following use cases:
- Remove the specified nodes from the Automation Suite cluster.
- Perform a node rotation operation for one or two VMs.
Usage
- NODESTOBEREMOVEDCOMPUTERNAME is a comma-separated list of computer names of the VMs you want to delete (e.g., pxlqw-agent-000009,pxlqw-agent-00000A), and it is the only mandatory parameter. We recommend removing nodes from a single VMSS at a time.
- ISINSTANCEREFRESH and THREESERVERSSCENARIO are flags populated by the InstanceRefresh wrapper.
Click the OK button to start the runbook.
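As a hedged illustration of preparing the NODESTOBEREMOVEDCOMPUTERNAME value, the sketch below joins computer names with commas. It treats the single-VMSS recommendation as a hard check. It assumes that a computer name consists of a VMSS prefix followed by a dash and an instance suffix, as in pxlqw-agent-000009; that naming rule and the helper name are assumptions, not documented behavior.

```shell
# Hypothetical helper: build the comma-separated NODESTOBEREMOVEDCOMPUTERNAME value.
# Assumption: names look like <vmss-prefix>-<instance-suffix>, e.g. pxlqw-agent-000009,
# so names that share a prefix are taken to belong to the same VMSS.
build_node_list() {
  bnl_list=""
  bnl_prefix=""
  for bnl_name in "$@"; do
    bnl_current="${bnl_name%-*}"   # drop the trailing -<instance-suffix>
    if [ -n "$bnl_prefix" ] && [ "$bnl_current" != "$bnl_prefix" ]; then
      echo "nodes span more than one VMSS: $bnl_prefix, $bnl_current" >&2
      return 1
    fi
    bnl_prefix="$bnl_current"
    bnl_list="${bnl_list:+$bnl_list,}$bnl_name"
  done
  printf '%s\n' "$bnl_list"
}
```

For example, `build_node_list pxlqw-agent-000009 pxlqw-agent-00000A` prints the same value as the example given for the parameter.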
Implementation details
The RemoveNodes runbook has a recursive approach to overcome the 3-hour fair share timeout. It removes or repaves the first or the first two nodes (the number is chosen in order to fulfill the odd number of servers constraint) from the received list and reruns another instance of the runbook with the remaining list.
The node repaving operation for a node requires taking the following steps:
- Scale out the VMSS with one or two VMs based on the number of nodes that will be removed.
- Perform the node removal for the old instances.
The node removal operation for a node requires taking the following steps:
- Cordon and drain the instances. The operation times out after 20 minutes for an agent and number_of_instances * 60 minutes for servers.
- Stop the rke service on the instances. The operation times out after 5 minutes.
- Remove the nodes from the Automation Suite cluster and delete the VMs. The operation times out after 20 minutes for agents and number_of_instances * 60 minutes for servers.
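To make the timeout rule concrete, the sketch below computes the timeout for the cordon-and-drain and node-removal steps from the node role and the number of instances. The helper name and the role labels agent and server are illustrative assumptions; only the numbers come from the steps above.

```shell
# Hypothetical helper: timeout, in minutes, for the drain and removal steps.
# Usage: step_timeout_minutes <agent|server> <number_of_instances>
step_timeout_minutes() {
  case "$1" in
    agent)  echo 20 ;;                 # fixed timeout for agents
    server) echo $(( $2 * 60 )) ;;     # number_of_instances * 60 for servers
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

With two servers, `step_timeout_minutes server 2` gives 120 minutes, which helps explain why removing many servers in a single run risks hitting the 3-hour fair share timeout.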
Description
The RemoveServers runbook has the following use case:
- Remove servers from the Automation Suite cluster.
Usage
- Go to the Azure Portal and search for the resource called RemoveServers.
- Click the start button to open the parameter list. Complete the parameters considering the following:
- REMOVEDSERVERSCOUNT is the number of servers that will be removed. We recommend removing no more than 2 servers at a time in order not to hit the fair share timeout.
Implementation details
The RemoveServers runbook removes the number of servers received as a parameter from the Availability Zones with the most VMs.
Description
The CheckServerZoneResilience runbook scales out the server VMSS and uses the RemoveServers runbook to balance the servers across Availability Zones. This is part of the InstanceRefresh flow and should not be run manually.
Description
If the initial deployment was created without a GPU node, the deployment still creates the GPU VM Scale Set, but with a different SKU, to prevent zone/SKU availability issues. This runbook changes the SKU to a GPU SKU and adds a node.
Usage
To use this runbook, take the following steps:
- Navigate to the resource group where you deployed Automation Suite, then identify and click Automation Account.
- Click Runbooks and then the AddGPUNode runbook.
- Provide the name of the SKU you want to use and click Start.
Parameters:
- skuName – the SKU for the GPU nodes VMSS.
Allowed values:
- Standard_NC8as_T4_v3
- Standard_NC12s_v3
- Standard_NC24s_v3
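Before starting the runbook, you may want to validate the skuName value locally. The following sketch checks a value against the allowed values listed above; the helper name is hypothetical, and the check does not confirm that the SKU is available in your region or within your quota.

```shell
# Hypothetical helper: succeed only if the given SKU is one of the allowed GPU SKUs.
is_allowed_gpu_sku() {
  case "$1" in
    Standard_NC8as_T4_v3|Standard_NC12s_v3|Standard_NC24s_v3) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_allowed_gpu_sku Standard_NC12s_v3 && echo "ok"` prints ok, while any other value makes the function return a non-zero status.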
Description
The runbook registers AI Center to the external Orchestrator provided at deployment time.
Usage
The runbook takes the IdentityToken parameter, which is an installation access token generated by the external Identity service. Since the token has a short availability (approximately 1-2 hours), we recommend generating it just before running the runbook. For instructions, see Installation key.
Description
The GetAllBackups runbook helps you view a list of all available backups, both scheduled and manual.
Description
The RestoreClusterInitialize and RestoreSnapshot runbooks help you perform a restore of the cluster.
Usage
To perform a restore operation, take the following steps:
- In case a VM fails to join the Automation Suite cluster, a rollback will be tried. The newly created VMs will follow the same steps as a usual node removal (cordon, drain, stop the rke service, remove the node from the cluster, and delete the VMs). You can find the logs from the joining node procedure in the storage account, inside the logs container, in blobs like infra-<hostname>.log.
- In case of a failure while deleting nodes, any runbook will stop and display the logs for the step that failed. Fix the issue, then complete the process manually or using the RemoveNodes runbook. You can find all the logs in the storage account, inside the logs container, as follows:
  - Cordon and drain – <timestamp>-<runbook_abreviation>-drain_nodes.log
  - Stop the rke service – <timestamp>-<runbook_abreviation>-stop_rke.log
  - Remove the node from the cluster – <timestamp>-<runbook_abreviation>-remove_nodes.log
- In case of a timeout, you should wait for the step to finish its execution, check the logs, and complete the process manually or using the RemoveNodes runbook.
All runbooks use the Azure Run Command feature to execute code in the context of the VMs. One limitation of this method is that it does not return the status of the execution. Therefore, the steps for cordoning, draining, and stopping the rke service run asynchronously, and the status is kept with blobs in the following format: <timestamp>-<runbook_abreviation>-<step_name>.<success/fail>.
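Because the outcome of each asynchronous step is encoded in the blob name, a script can read it from the part after the last dot. The sketch below does only that; the helper name and the example blob names are hypothetical, since the exact timestamp and runbook abbreviation formats are not specified here.

```shell
# Hypothetical helper: extract the outcome from a status blob name of the form
# <timestamp>-<runbook_abreviation>-<step_name>.<success/fail>
blob_status() {
  case "${1##*.}" in     # keep only the text after the last dot
    success) echo "success" ;;
    fail)    echo "fail" ;;
    *)       echo "unknown"; return 1 ;;
  esac
}

# Example with a made-up blob name:
blob_status "20240101-rn-drain_nodes.success"
```

Blob names with any other ending, such as the .log files listed above, are reported as unknown.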