Automation Suite - Step 3: Post-deployment steps

Validating the installation

To check whether Automation Suite was installed successfully, go to the storage account and open the flags container. The installation is complete if the content of the auto-generated file called installResult (in the container) is successful. The content is failed if the installation failed.
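
The same check can be scripted. The following is a minimal sketch, assuming the Azure CLI is installed and signed in with read access to the storage account. The function names and the STORAGE_ACCOUNT placeholder are hypothetical. Because this page describes the file content as both successful and success, the sketch matches on the prefix rather than on an exact string.

```shell
# Interpret the content of the installResult blob.
# Prints "complete", "failed", or "unknown"; returns non-zero unless complete.
interpret_install_result() {
  # Strip whitespace and line endings that may surround the status word.
  result=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$result" in
    success*) echo "complete" ;;
    fail*)    echo "failed";  return 1 ;;
    *)        echo "unknown"; return 2 ;;
  esac
}

# Download installResult from the flags container and interpret it.
# $1: storage account name (the one whose name ends with "st").
check_install_result() {
  tmp=$(mktemp) || return 3
  if ! az storage blob download \
      --account-name "$1" \
      --container-name flags \
      --name installResult \
      --file "$tmp" \
      --auth-mode login \
      --output none; then
    rm -f "$tmp"
    return 3
  fi
  content=$(cat "$tmp")
  rm -f "$tmp"
  interpret_install_result "$content"
}

# Usage (hypothetical account name):
#   check_install_result "STORAGE_ACCOUNT"
```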

Updating certificates

The installation process generates self-signed certificates on your behalf. These certificates are compliant with FIPS 140-2. The Azure deployment template also gives you the option to provide a CA-issued server certificate at installation time instead of using an auto-generated self-signed certificate.

Self-signed certificates will expire in 90 days, and you must replace them with certificates signed by a trusted CA as soon as installation completes. If you do not update the certificates, the installation will stop working after 90 days.

If you installed Automation Suite on a FIPS 140-2-enabled host and want to update the certificates, make sure they are FIPS 140-2-compatible.

For instructions, see Managing certificates.

Enabling FIPS 140-2

After completing an Automation Suite installation using the Azure deployment template, you can enable FIPS 140-2 on your machines. For instructions, see Security and compliance.

Exploring flags and logs

If you need more information on the Automation Suite installation process or other operations, a good place to start is the storage account used to store various flags and logs during cluster deployment and maintenance.

To locate the storage account, take the following steps:

Navigate to the resource group where the deployment was performed.
Filter by resource type Storage Account.
Locate the storage account whose name ends with st.
Select the storage account, and then click Containers. Your options are flags and logs.

Flags container

The flags container stores various flags or files needed for orchestration or just to report the status of various operations.

Files in the flags container are used to orchestrate various operations, such as the Automation Suite installation process on the cluster, or specific cluster operations, such as Instance Refresh. For example:

uipath-server-000000.success denotes that the infrastructure installation was completed successfully on that specific node of the cluster;
installResult reads success if the overall installation is successful.

Logs container

Each operation typically produces a log file in the logs container.

Every file in the logs container represents the logs for a specific step of the installation process. For example:

infra-uipath-server-000000.log stores the infrastructure installation logs;
fabric.log stores the logs for the fabric installation;
services.log stores the logs for the application and services installation.
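
To pull a specific log without going through the portal, something like the following could work. This is a sketch, assuming the Azure CLI is installed and signed in with read access to the storage account. log_blob_for_step and fetch_step_log are hypothetical helper names, and the blob names follow the examples above.

```shell
# Map an installation step to its blob name in the logs container.
# $1: infra | fabric | services; $2: node hostname (infra only).
log_blob_for_step() {
  case "$1" in
    infra)
      if [ -z "${2:-}" ]; then
        echo "infra logs need a hostname" >&2
        return 1
      fi
      echo "infra-$2.log" ;;
    fabric)   echo "fabric.log" ;;
    services) echo "services.log" ;;
    *)        echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Download the log for a step into the current directory.
# $1: storage account name; $2: step; $3: hostname (infra only).
fetch_step_log() {
  blob=$(log_blob_for_step "$2" "${3:-}") || return 1
  az storage blob download \
    --account-name "$1" \
    --container-name logs \
    --name "$blob" \
    --file "$blob" \
    --auth-mode login \
    --output none
}

# Usage (hypothetical account name):
#   fetch_step_log "STORAGE_ACCOUNT" infra uipath-server-000000
```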

Accessing deployment outputs

Once the installation is complete, you need to access the Deployment Outputs in the Outputs tab.

To do that, go to your Resource Group, and then to Deployments → mainTemplate (or something like Microsoft.Template-DateTime) → Outputs.

Deployment outputs

Output	Description
Documentation	A link to the documentation.
URL	The Load Balancer URL. Can be used for direct access. If custom domains were enabled, this is the domain that you would use for the CNAME binding.
KeyVaultURL	The Azure Portal URL for the Key Vault created by the deployment. It contains all the secrets (credentials) used in the deployment.
ArgoCDURL	The URL for accessing ArgoCD. This is available within the VNet. External access to this URL must be set up as described in: Step 4: Configuring the DNS.
ArgoCDPassword	The password used to log in to the ArgoCD portal.
HostAdminUsername and HostAdminPassword	The credentials used for Host Administration.
ClusterAdministrationURL	The URL for the Cluster Administration portal.
LonghornMonitoringURL	The URL to Longhorn monitoring tools.
GrafanaMonitoringURL	The URL to Grafana monitoring tools.
PrometheusMonitoringURL	The URL to Prometheus monitoring tools.
AlertmanagerMonitoringURL	The URL to Alertmanager monitoring tools.
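
The outputs in the table can also be read from a terminal. The sketch below assumes the Azure CLI is installed and signed in. get_deployment_output is a hypothetical helper, and the key casing returned by the CLI may differ from what the portal shows, so it can help to print properties.outputs in full first.

```shell
# Print all outputs of a resource group deployment as JSON.
# $1: resource group; $2: deployment name (for example, mainTemplate).
list_deployment_outputs() {
  az deployment group show \
    --resource-group "$1" \
    --name "$2" \
    --query properties.outputs
}

# Print the value of a single output.
# $3: output name as returned by list_deployment_outputs.
get_deployment_output() {
  az deployment group show \
    --resource-group "$1" \
    --name "$2" \
    --query "properties.outputs.$3.value" \
    --output tsv
}

# Usage (hypothetical resource group name):
#   get_deployment_output "my-resource-group" mainTemplate URL
```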

All credentials used in the deployment are stored as secrets inside a Key Vault provisioned during the deployment. To access the secrets, filter the resources inside the Resource Group, search for Vault, and then click Secrets.

Note:

If you see the warning The operation “List” is not enabled in the key vault’s access policy under the Secrets tab, take the following steps:

Go to Access policies → Add access policy → Configure the template → Secret Management → Select Principal.
Select your user, then click Save.
Navigate back to Secrets. The warning should be gone, and the secrets should be visible.
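
A command-line equivalent is sketched below, assuming the Azure CLI is installed and signed in and that the vault uses access policies. The helper names are hypothetical. Note that this sketch grants only the get and list secret permissions, which is narrower than the Secret Management template selected in the portal steps.

```shell
# Allow a user to list and read secrets in the vault.
# $1: Key Vault name; $2: user principal name (for example, an email address).
grant_secret_read_access() {
  az keyvault set-policy \
    --name "$1" \
    --upn "$2" \
    --secret-permissions get list
}

# Print the names of all secrets in the vault.
list_secret_names() {
  az keyvault secret list --vault-name "$1" --query "[].name" --output tsv
}

# Print the value of one secret.
# $2: secret name as printed by list_secret_names.
get_secret_value() {
  az keyvault secret show --vault-name "$1" --name "$2" --query value --output tsv
}
```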

Accessing cluster VMs

The VMs are provisioned inside a private VNet. You can access them through Azure Bastion by following these steps:

Navigate to the resource group where you have deployed Automation Suite.
Because agents, GPU agents, and server VMs are inside Scale Sets, you have to go to the Scale Set that contains your desired instance.
Go to the Instances section in the Settings tab.
Click the name of the VM you want to connect to.
Click the Connect button, and then choose Bastion from the drop-down menu.
Enter the credentials provided in the deployment (Admin Username and Admin Password parameters, which you can find in the credentials Key Vault, under Secrets) and click Connect.

DNS requirements

As mentioned in Step 1: Preparing your Azure Deployment, the Automation Suite Azure deployment creates a Load Balancer with a public IP and a DNS label associated. This DNS label is Microsoft-owned.

The deployment also provisions a Private DNS zone inside the cluster VNet and adds several records that are used during the installation and configuration process.

If you choose to connect from an external machine, you cannot use the private DNS zone to resolve the DNS for various services, so you need to add these records to your hosts file.

See Step 4: Configuring the DNS for more details.
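
As an illustration of what such hosts file records look like, the sketch below prints one line per hostname. hosts_entries is a hypothetical helper, the IP address and domain are placeholders, and alm is the only subdomain taken from this page (it is used for ArgoCD). The full list of required records is in Step 4: Configuring the DNS.

```shell
# Print hosts-file lines that map one IP to the cluster FQDN and its subdomains.
# $1: load balancer IP; $2: cluster FQDN; remaining arguments: subdomain prefixes.
hosts_entries() {
  ip=$1
  fqdn=$2
  shift 2
  printf '%s %s\n' "$ip" "$fqdn"
  for prefix in "$@"; do
    printf '%s %s.%s\n' "$ip" "$prefix" "$fqdn"
  done
}

# Usage (placeholder values; review the output before appending it):
#   hosts_entries 203.0.113.10 automationsuite.example.com alm | sudo tee -a /etc/hosts
```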

You should now be able to connect to various services running on your cluster.

Accessing the cluster administration portal

The Cluster Administration portal is a centralized location where you can find all the resources required to complete an Automation Suite installation and perform common post-installation operations. For details, see Getting started with the Cluster Administration portal.

To access the Cluster Administration portal, take the following step:

Go to the following URL: https://${CONFIG_CLUSTER_FQDN}/uipath-management.

Note: You do not need any credentials to access the Cluster Administration portal.

Accessing Automation Suite general interface

The general-use Automation Suite user interface serves as a portal for both organization administrators and organization users. It is a common organization-level resource from where everyone can access all Automation Suite areas: administration pages, platform-level pages, service-specific pages, and user-specific pages.

To access Automation Suite, take the following steps:

Go to the following URL: https://${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs.
Switch to the Default organization.
The username is orgadmin.
Retrieve the password by going to Key Vault, Secrets, and then Host Admin Password.

Accessing host administration

The host portal is where system administrators configure the Automation Suite instance. The settings configured from this portal are inherited by all your organizations, and some can be overwritten at the organization level.

To access host administration, take the following steps:

Go to the following URL: https://${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs.
Switch to the Host organization.
Enter the username you previously specified as a value for the UiPath Admin Username parameter.
Enter the password you previously specified as a value for the UiPath Admin Password parameter. To retrieve the password, go to Key Vault, Secrets, and then Host Admin Password.

Accessing ArgoCD

You can use the ArgoCD console to manage installed products.

To access ArgoCD, take the following steps:

Go to the following URL: https://alm.${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs. Note that you must configure external access to this URL as described in Step 4: Configuring the DNS.
The username is admin.
To access the password, go to the Outputs tab or the credentials Key Vault.
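
The URL patterns from this and the preceding sections can be collected in one helper. This is only a sketch: portal_url and its portal keywords are hypothetical, and it assumes that the same FQDN is used for ${CONFIG_CLUSTER_FQDN} and ${Loadbalancer_dns}.

```shell
# Print the URL of a portal for a given cluster FQDN.
# $1: cluster-admin | suite | host | argocd; $2: cluster FQDN.
portal_url() {
  case "$1" in
    cluster-admin) echo "https://$2/uipath-management" ;;
    suite|host)    echo "https://$2" ;;
    argocd)        echo "https://alm.$2" ;;
    *)             echo "unknown portal: $1" >&2; return 1 ;;
  esac
}

# Usage (placeholder domain):
#   portal_url argocd automationsuite.example.com
```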

Accessing the monitoring tools

To access the monitoring tools for the first time, log in as an admin with the following default credentials:

Username: admin
Password: to retrieve the password, run the following command:
```
kubectl get secrets/dex-static-credential -n uipath-auth -o "jsonpath={.data['password']}" | base64 -d
```

To update the default password used for accessing the monitoring tools, take the following steps:

Run the following command by replacing newpassword with your new password:

password="newpassword"
# Base64-encode the new password, as required for the data field of a secret.
password=$(printf '%s' "$password" | base64)
kubectl patch secret dex-static-credential -n uipath-auth --type='json' -p="[{'op': 'replace', 'path': '/data/password', 'value': '$password'}]"

Run the following command by replacing <cluster_config.json> with the path to your configuration file:

/opt/UiPathAutomationSuite/UiPath_Installer/install-uipath.sh -i <cluster_config.json> -f -o output.json --accept-license-agreement

Scaling your cluster

Compute resources provisioned from the deployment consist of Azure Scale Sets, which allow for easy scaling.

You can manually add additional resources to a specific Scale Set, including adding server nodes, agent nodes, or specialized agent nodes (such as GPU nodes).

You can perform a manual scale by identifying the specific Scale Set and adding resources directly.

To do so, take the following steps:

Go to the Azure Portal and filter on the specific Scale Set.
Select the appropriate Scale Set and click Scaling.
Modify the Instance count field either by using the slider or the input field next to it, and then click Save.

Note: For Server Scale Sets, the instance count needs to be an odd number.
The scaling operation should start in the background, and new resources become available upon completion.
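
The same scale operation could be scripted with the Azure CLI. The sketch below is an assumption-laden example rather than a supported procedure: scale_vmss is a hypothetical helper, and the odd-number check simply encodes the note above for Server Scale Sets.

```shell
# Succeed only when the number is odd.
is_odd() {
  [ $(( $1 % 2 )) -eq 1 ]
}

# Set the instance count of a Scale Set.
# $1: resource group; $2: Scale Set name; $3: new instance count;
# $4: "server" to enforce the odd-count rule (defaults to "agent").
scale_vmss() {
  rg=$1
  vmss=$2
  count=$3
  kind=${4:-agent}
  if [ "$kind" = "server" ] && ! is_odd "$count"; then
    echo "server instance count must be odd, got $count" >&2
    return 1
  fi
  az vmss scale --resource-group "$rg" --name "$vmss" --new-capacity "$count"
}

# Usage (hypothetical names):
#   scale_vmss "my-resource-group" "my-agent-vmss" 4
#   scale_vmss "my-resource-group" "my-server-vmss" 5 server
```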

Completing an upgrade

After performing an Automation Suite cluster upgrade, Azure template deployments require some changes to ensure a new node joins the cluster correctly. To automate the changes, we recommend using the dedicated script. For instructions, see the Azure deployment template docs.

Azure VM lifecycle operations

Tip:

Azure allows a 15-minute window at most to prepare for shutdown, whereas the graceful termination of an Automation Suite node varies from 20 minutes (for agent and GPU agent nodes) to hours (in the case of server nodes).

To avoid data loss, the server VMSS upgrade policy is set to manual, and the server VMs have the protection for the scale set actions enabled. As a result, we recommend managing the server lifecycle via the provided runbooks.

The InstanceRefresh, RemoveNodes, RemoveServers, and CheckServerZoneResilience runbooks are supported only for multi-node HA-ready production deployments.

The number of servers after running any runbook must be odd and greater than three (e.g., you cannot execute an Instance Refresh if you have four servers; you cannot remove a server if you have a total of five).

All the VMs in VMSSes should be in Running state.

Run only one runbook at a time.

Important: The InstanceRefresh, RemoveNodes, and RemoveServers runbooks are affected by an issue causing node removal operations to fail if Process Mining (AirFlow) and/or Automation Suite Robots service pods are scheduled on the node.

Hybrid Workers

All our storage accounts and SQL servers have private endpoints. A Hybrid Worker group runs the existing automated operations so that they work without issues.

A Hybrid Worker is a VM that sits inside the VNET and on which the various automations will be run.

The VM is typically a Standard_D2s_v3 or a Standard_F2s_v2, depending on which you choose for your server VMs and whether quota permits. The VM is shut down when the deployment finishes to minimize costs.

Runbooks are split into two categories: regular runbooks and hybrid runbooks. You use the regular runbooks to start an operation and gather all the data. The regular runbook then starts the Hybrid Worker VM and the hybrid runbook, with the latter completing the operation.

When the operation is complete, you can turn off the Hybrid Worker VM to limit costs.
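
Checking and turning off the Hybrid Worker VM can also be done from a terminal. This is a sketch, assuming the Azure CLI is installed and signed in. The helper names are hypothetical, and the VM name must be looked up in your resource group. Deallocating (rather than only shutting down the guest OS) is what stops compute charges.

```shell
# Print the power state of a VM, for example "VM running" or "VM deallocated".
# $1: resource group; $2: VM name.
vm_power_state() {
  az vm show --resource-group "$1" --name "$2" --show-details \
    --query powerState --output tsv
}

# Deallocate the Hybrid Worker VM so that compute charges stop.
# Run this only when no hybrid runbook job is in progress.
stop_hybrid_worker() {
  az vm deallocate --resource-group "$1" --name "$2"
}

# Usage (hypothetical names):
#   vm_power_state "my-resource-group" "my-hybrid-worker-vm"
#   stop_hybrid_worker "my-resource-group" "my-hybrid-worker-vm"
```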

The following table describes the runbook breakdown:

Regular runbooks	Hybrid runbooks
AddGpuNode	HybridAddGpuNode
BackupCluster	HybridBackupCluster
GetAllBackups	HybridGetAllBackups
InstanceRefresh	HybridInstanceRefresh (+HybridCheckServerZoneRezilience)
RegisterAiCenterExternalOrchestrator	HybridRegisterAiCenterExternalOrchestrator
RemoveNodes	HybridRemoveNodes
RemoveServers	HybridRemoveServers
RestoreClusterInitialize	HybridRestoreClusterInitialize + HybridRestoreClusterSnapshot
ValidateFullInstall	Run at the end of the deployment to validate the full installation.

InstanceRefresh

Description

The InstanceRefresh runbook has the following use cases:

Update VMSS OS SKU on the server, agent, and GPU scale sets.
Perform a node rotation operation for one or more VMSSes.
Other VMSS configuration changes that were applied to the VMSS beforehand.

Usage

Go to the Azure Portal and search for the resource called InstanceRefresh.
Click the start button to open the parameter list. Complete the parameters considering:
- A node rotation operation is performed on a VMSS only if the parameter REFRESH<node_type> is set to True. If multiple REFRESH<node_type> parameters are set to True, the VMSS node rotation order will be Servers -> Agents -> GPU Agents.
- You must provide the NEWOSVERSION parameter to update the VMSS OS SKU. You can find the available Azure Marketplace VM image SKUs using az vm image list-skus --location <deployment_location> --offer RHEL --publisher RedHat --output table. The current VMs are not automatically updated to the latest model (a node rotation operation is needed for that).
Click the OK button to start the runbook.
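
The rotation order described above can be expressed as a small helper, which may be useful for previewing what a given combination of REFRESH parameters would do. rotation_order is a hypothetical function that takes the three flag values positionally (servers, agents, GPU agents). It illustrates the documented order only and does not call the runbook.

```shell
# Print the VMSS rotation order for the given refresh flags.
# $1: refresh servers; $2: refresh agents; $3: refresh GPU agents (True or False).
rotation_order() {
  order=""
  if [ "$1" = "True" ]; then
    order="Servers"
  fi
  if [ "$2" = "True" ]; then
    order="${order:+$order -> }Agents"
  fi
  if [ "$3" = "True" ]; then
    order="${order:+$order -> }GPU Agents"
  fi
  echo "${order:-none}"
}

# Usage:
#   rotation_order True False True
```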

Implementation details

The InstanceRefresh runbook is a wrapper for the RemoveNodes runbook. As a result, the status is tracked while running RemoveNodes. It updates all the VMSS OS versions (if needed), extracts the hostnames for the node rotation operation based on the received parameters, and forwards them to the RemoveNodes runbook. If the cluster has exactly three servers, the InstanceRefresh runbook creates three new servers; otherwise, RemoveNodes handles the scale-up to maintain at least one server in each Availability Zone at all times.

RemoveNodes

Description

The RemoveNodes runbook has the following use cases:

Remove the specified nodes from the Automation Suite cluster.
Perform a node rotation operation for one or two VMs.

Usage

Find the computer names of the nodes you want to remove. To do that, go to a VMSS, and click Instances in the Settings section.
Go to the Azure Portal and search for the resource called RemoveNodes.
Click the start button to open the parameter list. Complete the parameters considering the following:

NODESTOBEREMOVEDCOMPUTERNAME is a comma-separated list of computer names of the VMs you want to delete (e.g., pxlqw-agent-000009,pxlqw-agent-00000A), and it is the only mandatory parameter. We recommend removing nodes from a single VMSS at a time.
ISINSTANCEREFRESH and THREESERVERSSCENARIO are flags populated by the InstanceRefresh wrapper.

Click the OK button to start the runbook.
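
To prepare the value for NODESTOBEREMOVEDCOMPUTERNAME, a helper along these lines could join the names and enforce the single-VMSS recommendation. It is a sketch with hypothetical function names, and it assumes that computer names follow the pattern of the example above, that is, a VMSS prefix followed by a hyphen and an instance suffix.

```shell
# Derive the VMSS prefix from a computer name by dropping the final -<suffix>.
vmss_prefix() {
  echo "${1%-*}"
}

# Join computer names with commas; fail when they span more than one VMSS.
build_node_list() {
  first_prefix=$(vmss_prefix "$1")
  list=""
  for name in "$@"; do
    if [ "$(vmss_prefix "$name")" != "$first_prefix" ]; then
      echo "nodes span more than one VMSS: $name" >&2
      return 1
    fi
    list="${list:+$list,}$name"
  done
  echo "$list"
}

# Usage:
#   build_node_list pxlqw-agent-000009 pxlqw-agent-00000A
```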

Implementation details

The RemoveNodes runbook has a recursive approach to overcome the 3-hour fair share timeout. It removes or repaves the first or the first two nodes (the number is chosen in order to fulfill the odd number of servers constraint) from the received list and reruns another instance of the runbook with the remaining list.

The node repaving operation for a node requires taking the following steps:

Scale out the VMSS with one or two VMs based on the number of nodes that will be removed.
Perform the node removal for the old instances.

The node removal operation for a node requires taking the following steps:

Cordon and drain the instances. The operation times out after 20 minutes for an agent and number_of_instances * 60 minutes for servers.
Stop the rke service on the instances. The operation times out after 5 minutes.
Remove the nodes from the Automation Suite cluster and delete the VMs. The operation times out after 20 minutes for agents and number_of_instances * 60 minutes for servers.
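
For planning purposes, the per-step timeouts listed above can be written as a lookup. step_timeout_minutes is a hypothetical helper, and the step keywords are borrowed from the log file names used by the runbooks. It restates the documented limits and does not query the runbook itself.

```shell
# Print the timeout, in minutes, of a node removal step.
# $1: drain_nodes | stop_rke | remove_nodes; $2: agent | server;
# $3: number of instances handled in the step (defaults to 1).
step_timeout_minutes() {
  step=$1
  kind=$2
  count=${3:-1}
  case "$step" in
    stop_rke) echo 5 ;;
    drain_nodes|remove_nodes)
      case "$kind" in
        agent)  echo 20 ;;
        server) echo $(( count * 60 )) ;;
        *)      echo "unknown node kind: $kind" >&2; return 1 ;;
      esac ;;
    *) echo "unknown step: $step" >&2; return 1 ;;
  esac
}

# Usage:
#   step_timeout_minutes drain_nodes server 2
```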

RemoveServers

Description

The RemoveServers runbook has the following use case:

Remove servers from the Automation Suite cluster.

Usage

Go to the Azure Portal and search for the resource called RemoveServers.
Click the start button to open the parameter list. Complete the parameters considering the following:

REMOVEDSERVERSCOUNT is the number of servers that will be removed. We recommend removing no more than 2 servers at a time in order not to hit the fair share timeout.

Implementation details

The RemoveServers runbook removes the number of servers received as a parameter from the Availability Zones with the most VMs.
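
The selection rule can be illustrated as follows. This is a sketch of the idea, not the runbook's code: plan_removals is a hypothetical function, the input format (one "zone count" pair per line) is invented for the example, and the tie-break (first listed zone wins) is an assumption.

```shell
# Print, one per line, the zone to remove a server from for each removal.
# $1: number of servers to remove; stdin: lines of "<zone> <server count>".
plan_removals() {
  awk -v n="$1" '
    { count[$1] = $2; zones[NR] = $1 }
    END {
      for (i = 1; i <= n; i++) {
        best = ""
        # Pick the zone that currently holds the most servers.
        for (j = 1; j <= NR; j++) {
          z = zones[j]
          if (best == "" || count[z] > count[best]) best = z
        }
        if (best == "" || count[best] == 0) break
        count[best]--
        print best
      }
    }'
}

# Usage: zones 1 and 3 hold two servers each, zone 2 holds one.
#   printf '1 2\n2 1\n3 2\n' | plan_removals 2
```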

CheckServerZoneResilience

Description

The CheckServerZoneResilience runbook scales out the server VMSS and uses the RemoveServers runbook to balance the servers across Availability Zones. This is part of the InstanceRefresh flow and should not be run manually.

AddGpuNode

Description

In the scenario where the initial deployment was created without a GPU node, the VM Scale Set is still created, but with a different SKU to prevent zone/SKU availability issues. This runbook changes the SKU to a GPU SKU and adds a node.

Important: Do not scale the initial GPU VMSS created before running this runbook if the initial deployment was created without GPU nodes.

Usage

To use this runbook, take the following steps:

Navigate to the resource group where you deployed Automation Suite, then identify and click Automation Account.
Click Runbooks and then the AddGpuNode runbook.
Provide the name of the SKU you want to use and click Start.

Parameters:

skuName – the SKU for the GPU nodes VMSS.

Allowed values:

Standard_NC8as_T4_v3
Standard_NC12s_v3
Standard_NC24s_v3
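
Before starting the runbook, the skuName value can be checked against the allowed values listed above. is_allowed_gpu_sku is a hypothetical helper. The commented az vm list-skus call is an optional way to see whether the SKU is offered in your region, and it assumes the Azure CLI is installed and signed in.

```shell
# Succeed only when the SKU is one of the values accepted by the runbook.
is_allowed_gpu_sku() {
  case "$1" in
    Standard_NC8as_T4_v3|Standard_NC12s_v3|Standard_NC24s_v3) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_allowed_gpu_sku Standard_NC8as_T4_v3 && echo "ok"
# Optional regional availability check:
#   az vm list-skus --location <deployment_location> --size Standard_NC --output table
```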

RegisterAiCenterExternalOrchestrator

Description

The runbook registers AI Center to the external Orchestrator provided at deployment time.

Usage

The runbook exposes a single mandatory parameter, IdentityToken, which is an installation access token generated by the external Identity service. Since the token is valid for only a short time (approximately 1-2 hours), we recommend generating it just before running the runbook. For instructions, see Installation key.

BackupCluster

Description

The BackupCluster runbook helps you back up your cluster.

Usage

Navigate to the resource group where you deployed Automation Suite, then identify and click the Automation Account.
Click Runbooks and then the BackupCluster runbook.
Provide a name for the backup you want to create.
To start a backup operation for the Automation Suite cluster, click the Start button at the top of the page.
Once the status of the runbook job is Completed, the backup operation is complete. If the status of the runbook job is Failed, you can check the logs in the storage account for more information.

GetAllBackups

Description

The GetAllBackups runbook helps you view a list of all available backups, both scheduled and manual.

RestoreClusterInitialize, RestoreSnapshot

Description

These runbooks help you perform a restore of the cluster.

Usage

Note: When starting the restore process, we put the cluster in maintenance mode. Once the restore process is successful, we take the cluster out of maintenance mode.

To perform a restore operation, take the following steps:

Identify the restore files you want to use. To do this, navigate to your Automation Suite deployment Automation Account, and run the GetAllBackups runbook.
When the runbook job is complete, check the bottom of the Output tab for a list of available backups. Select the one you want to use in the restore operation and copy it.
Navigate back to the Automation Account and run the RestoreClusterInitialize runbook. For the parameter, paste the name of the previously copied backup file. At this point, the restore process is started.
The RestoreSnapshot job is started automatically. When the job is done, the restore process is complete.

Note: Logs are present in the storage account (ending with st), in the backups container, under the restores/<backup-name>/ folder, where backup-name is the name of the backup used to perform the restore.
After a restore, you should confirm that the cluster is in a good state (see Validating the installation or any ArgoCD troubleshooting link). After this, you have the option to enable the backup on the cluster by running the RestoreClusterFinalize runbook with the same parameter as in the previous step. This enables backups for the cluster.
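
To list the restore logs mentioned in the note above from a terminal, something like this could be used. It is a sketch, assuming the Azure CLI is installed and signed in with read access to the storage account. The helper names and the example values are hypothetical.

```shell
# Print the blob prefix under which the logs of a restore are stored.
# $1: name of the backup used for the restore.
restore_log_prefix() {
  printf 'restores/%s/\n' "$1"
}

# List the restore log blobs in the backups container.
# $1: storage account name (ending with "st"); $2: backup name.
list_restore_logs() {
  az storage blob list \
    --account-name "$1" \
    --container-name backups \
    --prefix "$(restore_log_prefix "$2")" \
    --query "[].name" \
    --auth-mode login \
    --output tsv
}

# Usage (hypothetical names):
#   list_restore_logs "STORAGE_ACCOUNT" "my-backup-name"
```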

Troubleshooting

If a VM fails to join the Automation Suite cluster, a rollback is attempted. The newly created VMs follow the same steps as a usual node removal (cordon, drain, stop the rke service, remove the node from the cluster, and delete the VMs). You can find the logs from the joining node procedure in the storage account, inside the logs container, in blobs like infra-<hostname>.log.
If a failure occurs while deleting nodes, any runbook stops and displays the logs for the step that failed. Fix the issue, and then complete the process manually or by using the RemoveNodes runbook. You can find all the logs in the storage account, inside the logs container, as follows:
- Cordon and drain – <timestamp>-<runbook_abreviation>-drain_nodes.log
- Stop the rke service – <timestamp>-<runbook_abreviation>-stop_rke.log
- Remove the node from the cluster – <timestamp>-<runbook_abreviation>-remove_nodes.log
In case of a timeout, wait for the step to finish its execution, check the logs, and then complete the process manually or by using the RemoveNodes runbook. All runbooks use the Azure Run Command feature to execute code in the context of the VMs. One limitation of this method is that it does not return the status of the execution. Therefore, the steps for cordoning, draining, and stopping the rke service run asynchronously, and the status is kept with blobs in the following format: <timestamp>-<runbook_abreviation>-<step_name>.<success/fail>.
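
When scanning the storage account for these status blobs, a name can be split into its step and outcome as sketched below. parse_status_blob is a hypothetical helper, and the timestamp and abbreviation in the usage line are made-up values. The name is parsed from the right because the exact timestamp format is not specified here.

```shell
# Print "<step_name> <outcome>" for a status blob name.
# Returns non-zero when the name does not end in .success or .fail.
parse_status_blob() {
  name=$1
  outcome=${name##*.}   # text after the last dot
  base=${name%.*}       # name without the outcome suffix
  step=${base##*-}      # text after the last hyphen
  case "$outcome" in
    success|fail) echo "$step $outcome" ;;
    *)            return 1 ;;
  esac
}

# Usage (made-up blob name):
#   parse_status_blob "20240101T000000-rn-drain_nodes.success"
```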
