Automation Suite - Step 3: Post-deployment steps

Validating the installation

To check whether Automation Suite was installed successfully, go to the storage account and open the flags container. The installation is complete if the auto-generated file called installResult (in the container) contains successful. The file contains failed if the installation failed.
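The check described above can be scripted. The following is a minimal sketch, assuming the Azure CLI is installed, you are logged in, and your user is allowed to read blobs in the storage account. The function names and the storage account argument are illustrative and are not part of Automation Suite. Because this page describes the success value both as successful and as success, the helper accepts either spelling.

```shell
# Map the text of the installResult blob to a short verdict.
# Assumption: the blob holds a single word; whitespace and case are ignored.
interpret_install_result() {
    result=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    case "$result" in
        success|successful) echo "installed" ;;
        failed)             echo "failed" ;;
        *)                  echo "unknown" ;;
    esac
}

# Download installResult from the flags container and interpret it.
# $1 = storage account name (hypothetical argument for this sketch).
check_install_result() {
    content=$(az storage blob download \
        --account-name "$1" \
        --container-name flags \
        --name installResult \
        --file /dev/stdout \
        --auth-mode login \
        --no-progress \
        --output none) || return 1
    interpret_install_result "$content"
}
```

To use it, call check_install_result with the name of the storage account that holds the flags container.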

Updating Certificates

Important:

The installation process generates self-signed certificates on your behalf. However, the Azure deployment template also gives you the option to provide a CA-issued server certificate at installation time instead of using an auto-generated self-signed certificate.

Self-signed certificates will expire in 90 days, and you must replace them with certificates signed by a trusted CA as soon as installation completes. If you do not update the certificates, the installation will stop working after 90 days.

For instructions, see Managing certificates.

Exploring flags and logs

If you need more information on the Automation Suite installation process or other operations, a good place to start is the storage account used to store various flags and logs during cluster deployment and maintenance.

To locate the storage account, take the following steps:

Navigate to the resource group where the deployment was performed.
Filter by resource type Storage Account.
Locate the storage account whose name ends with st.
Select the storage account, and then select Containers. Your options are flags and logs.

Flags container

The flags container stores the flags, or files, that orchestrate operations or report their status. These operations include the Automation Suite installation process on the cluster and specific cluster operations, such as Instance Refresh. For example:

uipath-server-000000.success denotes that the infrastructure installation was completed successfully on that specific node of the cluster;
installResult reads success if the overall installation is successful.

Logs container

Each operation typically produces a log file in the logs container.

Every file in the logs container represents the logs for a specific step of the installation process. For example:

infra-uipath-server-000000.log stores the infrastructure installation logs;
fabric.log stores the logs for the fabric installation;
services.log stores the logs for the application and services installation.
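When you list the logs container from a terminal, a small helper can label each blob with the installation step it belongs to. This is a sketch that relies only on the file names listed above; the function names are illustrative, and the listing command assumes the Azure CLI with read access to the storage account.

```shell
# Describe which installation step a blob in the logs container belongs to,
# based only on the file names mentioned on this page.
describe_install_log() {
    case "$1" in
        infra-*.log)  echo "infrastructure installation" ;;
        fabric.log)   echo "fabric installation" ;;
        services.log) echo "application and services installation" ;;
        *)            echo "other operation" ;;
    esac
}

# List every blob in the logs container with its description.
# $1 = storage account name (hypothetical argument for this sketch).
list_install_logs() {
    az storage blob list \
        --account-name "$1" \
        --container-name logs \
        --auth-mode login \
        --query "[].name" \
        --output tsv |
    while read -r name; do
        printf '%s: %s\n' "$name" "$(describe_install_log "$name")"
    done
}
```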

Accessing Deployment Outputs

Once the installation is complete, you need to access the Deployment Outputs in the Outputs tab.

To do that, go to your Resource Group, and then to Deployments → mainTemplate (or something like Microsoft.Template-DateTime) → Outputs.

Deployment Outputs

Output	Description
Documentation	A link to the documentation.
URL	The Load Balancer URL. Can be used for direct access. If custom domains were enabled, this is the domain to use for the CNAME binding.
KeyVaultURL	The Azure Portal URL for the Key Vault created by the deployment. It contains all the secrets (credentials) used in the deployment.
ArgoCDURL	The URL for accessing ArgoCD. This is available within the VNet. External access to this URL must be set up as described in: Step 4: Configuring the DNS.
ArgoCDPassword	The password used to log in to the ArgoCD portal.
HostAdminUsername and HostAdminPassword	The credentials used for Host Administration.

All credentials used in the deployment are stored as secrets inside a Key Vault provisioned during the deployment. To access the secrets, filter the resources inside the Resource Group, search for Vault, and then click Secrets.

Note:

If you see the warning The operation “List” is not enabled in the key vault’s access policy under the Secrets tab, take the following steps:

Go to Access policies → Add access policy → Configure the template → Secret Management → Select Principal.
Select your user, then click Save.
Navigate back to Secrets. The warning should be gone, and the secrets should be visible.

Accessing cluster VMs

The VMs are provisioned inside a private VNet. You can access them through Azure Bastion by following these steps:

Navigate to the resource group where you have deployed Automation Suite.
Because agents, GPU agents, and server VMs are inside Scale Sets, you have to go to the Scale Set that contains your desired instance.
Go to the Instances section in the Settings tab.
Select the name of the VM you want to connect to.
Select the Connect button, and then choose Bastion from the drop-down menu.
Enter the credentials provided in the deployment (the Admin Username and Admin Password parameters, which you can find in the credentials Key Vault, under Secrets), and then select Connect.

DNS requirements

As mentioned in Step 1: Preparing your Azure Deployment, the Automation Suite Azure deployment creates a Load Balancer with a public IP and a DNS label associated. This DNS label is Microsoft-owned.

The deployment also provisions a Private DNS zone inside the cluster VNet and adds several records that are used during the installation and configuration process.

If you choose to connect from an external machine, you cannot use the private DNS zone to resolve the DNS names of the various services, so you need to add these records to your hosts file.

See Step 4: Configuring the DNS for more details.
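As an illustration of what a hosts file record looks like, the sketch below prints one line that maps an IP address to the host names mentioned on this page (the base DNS name plus its alm. and monitoring. subdomains). This is an assumption-laden example: the function name is hypothetical, and the IP address to use and the complete list of records come from Step 4: Configuring the DNS, not from this snippet.

```shell
# Print one hosts-file line for the host names mentioned on this page.
# $1 = IP address to map (see Step 4 for which address to use)
# $2 = DNS name of the load balancer, taken from the deployment Outputs
hosts_entry() {
    ip=$1
    fqdn=$2
    printf '%s %s alm.%s monitoring.%s\n' "$ip" "$fqdn" "$fqdn" "$fqdn"
}
```

For example, hosts_entry 203.0.113.10 example.test prints a line that you can review and then append to the hosts file of the external machine.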

You should now be able to connect to various services running on your cluster.

Accessing Automation Suite general interface

The general-use Automation Suite user interface serves as a portal for both organization administrators and organization users. It is a common organization-level resource from where everyone can access all Automation Suite areas: administration pages, platform-level pages, service-specific pages, and user-specific pages.

To access Automation Suite, take the following steps:

Go to the following URL: https://${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs.
Switch to the Default organization.
The username is orgadmin.
Retrieve the password by going to Key Vault, Secrets, and then Host Admin Password.

Accessing Host Administration

The host portal is where system administrators configure the Automation Suite instance. The settings configured from this portal are inherited by all your organizations, and some can be overwritten at the organization level.

See Managing system administrators for more on host administrators.

See Interface tour for more on the host portal.

To access host administration, take the following steps:

Go to the following URL: https://${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs.
Switch to the Host organization.
Enter the username you previously specified as a value for the UiPath Admin Username parameter.
Enter the password you previously specified as a value for the UiPath Admin Password parameter. To retrieve the password, go to Key Vault, Secrets, and then Host Admin Password.

Accessing ArgoCD

You can use the ArgoCD console to manage installed products.

To access ArgoCD, take the following steps:

Go to the following URL: https://alm.${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label for the load balancer, found under Outputs. Note that you must configure external access to this URL as described in Step 4: Configuring the DNS.
The username is admin.
To access the password, go to the Outputs tab or the credentials Key Vault.

Accessing Rancher

Automation Suite uses Rancher to provide cluster management tools out of the box. This helps you manage the cluster and access monitoring and troubleshooting.

See Rancher documentation for more details.

See Using the monitoring stack for more on how to use Rancher monitoring in Automation Suite.

To access the Rancher console, take the following steps:

Go to the following URL: https://monitoring.${Loadbalancer_dns}, where ${Loadbalancer_dns} is the DNS label of the load balancer, found among the Outputs of your deployment.
The username is admin.

To access the password, run the following command:

kubectl get secrets/rancher-admin-password -n cattle-system \
-o "jsonpath={.data['password']}" | echo $(base64 -d)
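The console URLs in the sections above all derive from the load balancer DNS label. As a convenience, the sketch below builds each URL from that label. The function name and the short service keys (portal, argocd, rancher) are made up for this example; only the URL patterns come from this page.

```shell
# Build the URL of a console from the load balancer DNS label.
# $1 = service key (portal, argocd, or rancher; keys are illustrative)
# $2 = DNS label of the load balancer, taken from the deployment Outputs
service_url() {
    case "$1" in
        portal)  echo "https://$2" ;;
        argocd)  echo "https://alm.$2" ;;
        rancher) echo "https://monitoring.$2" ;;
        *)       echo "unknown service: $1" >&2; return 1 ;;
    esac
}
```

For example, service_url rancher "$LOADBALANCER_DNS" prints the Rancher console URL for the label stored in the (hypothetical) LOADBALANCER_DNS variable.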

Scaling your cluster

Compute resources provisioned from the deployment consist of Azure Scale Sets, which allow for easy scaling.

You can manually add additional resources to a specific Scale Set, including adding server nodes, agent nodes, or specialized agent nodes (such as GPU nodes).

You can perform a manual scale by identifying the specific Scale Set and adding resources directly.

To do so, take the following steps:

Go to the Azure Portal and filter on the specific Scale Set.
Select the appropriate Scale Set and select Scaling.
Modify the Instance count field either by using the slider or the input field next to it, and then select Save.

Note: For Server Scale Sets, the instance count needs to be an odd number.
The scaling operation should start in the background, and new resources become available upon completion.
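If you prefer the command line to the portal, the same change can be sketched with the Azure CLI. The example below is a hedged illustration, not an official procedure: the function names and argument order are assumptions, and the only rule it enforces is the odd instance count stated in the note above for Server Scale Sets.

```shell
# Succeed (exit status 0) only when the argument is an odd non-negative integer.
is_valid_server_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
        *[13579])    return 0 ;;   # last digit odd means the number is odd
        *)           return 1 ;;
    esac
}

# Scale a server Scale Set after checking the instance count.
# $1 = resource group, $2 = Scale Set name, $3 = new instance count
scale_server_vmss() {
    if ! is_valid_server_count "$3"; then
        echo "server instance count must be odd, got: $3" >&2
        return 1
    fi
    az vmss scale --resource-group "$1" --name "$2" --new-capacity "$3"
}
```

The check runs before any call to Azure, so an even count is rejected without changing the Scale Set.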

Azure VM Lifecycle Operations

Important:

Azure allows a 15-minute window at most to prepare for shutdown, whereas the graceful termination of an Automation Suite node varies from 20 minutes (for agent and GPU agent nodes) to hours (in the case of server nodes).

To avoid data loss, the server VMSS upgrade policy is set to manual, and the server VMs have the protection for the scale set actions enabled. As a result, we recommend managing the server lifecycle via the provided Runbooks.

The runbooks InstanceRefresh, RemoveNodes, RemoveServers, and CheckServerZoneResilience are supported only for multi-node HA-ready production deployments.

The number of servers after running any runbook must be odd and greater than three (for example, you cannot execute an Instance Refresh if you have four servers, and you cannot remove a server if you have a total of five).

All the VMs in VMSSes should be in Running state.

Run only one runbook at a time.

InstanceRefresh

Description

The InstanceRefresh runbook has the following use cases:

Update VMSS OS SKU on the server, agent, and GPU scale sets.
Perform a node rotation operation for one or more VMSSes.
Other VMSS configuration changes that were applied to the VMSS beforehand.

Usage

Go to the Azure Portal and search for the resource called InstanceRefresh.
Click the start button to open the parameter list. Complete the parameters considering:
- A node rotation operation is performed on a VMSS only if the parameter REFRESH<node_type> is set to True. If multiple REFRESH<node_type> parameters are set to True, the VMSS node rotation order will be Servers -> Agents -> GPU Agents.
- You must provide the NEWOSVERSION parameter to update VMSS OS SKU. You can find the available Azure Marketplace VMs image SKU using az vm image list-skus --location <deployment_location> --offer RHEL --publisher RedHat --output table. The current VMs are not automatically updated to the latest model (a node rotation operation is needed for that).
  
  Click the OK button to start the runbook.

Implementation details

The InstanceRefresh runbook is a wrapper for the RemoveNodes runbook. As a result, the status is tracked while running RemoveNodes. It updates all the VMSS OS versions (if needed), extracts the hostnames for the node rotation operation based on the received parameters, and forwards them to RemoveNodes. If the cluster has exactly three servers, the InstanceRefresh runbook creates three new servers; otherwise, RemoveNodes handles the scale-up to maintain at least one server in each Availability Zone at all times.

RemoveNodes

Description

The RemoveNodes runbook has the following use cases:

Remove the specified nodes from the Automation Suite cluster.
Perform a node rotation operation for one or two VMs.

Usage

Look up the computer names of the nodes you want to remove. To do that, go to a VMSS, and click Instances in the Settings section.
Go to the Azure Portal and search for the resource called RemoveNodes.
Click the start button to open the parameter list. Complete the parameters considering the following:
- NODESTOBEREMOVEDCOMPUTERNAME is a comma-separated list of computer names of the VMs you want to delete (e.g., pxlqw-agent-000009,pxlqw-agent-00000A), and it is the only mandatory parameter. We recommend removing nodes from a single VMSS at a time.
- ISINSTANCEREFRESH and THREESERVERSSCENARIO are flags populated by the InstanceRefresh wrapper.
  
  Click the OK button to start the runbook.
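The value for NODESTOBEREMOVEDCOMPUTERNAME can be assembled on the command line. The sketch below joins computer names into the comma-separated form shown in the example above. The function names are illustrative, and the listing helper assumes that the Azure CLI exposes each instance's computer name under osProfile.computerName.

```shell
# Join the given computer names with commas and no spaces.
join_computer_names() {
    joined=""
    for name in "$@"; do
        joined="${joined:+$joined,}$name"
    done
    printf '%s\n' "$joined"
}

# List the computer names of all instances in a Scale Set.
# $1 = resource group, $2 = Scale Set name (illustrative arguments)
list_computer_names() {
    az vmss list-instances \
        --resource-group "$1" \
        --name "$2" \
        --query "[].osProfile.computerName" \
        --output tsv
}
```

For example, join_computer_names pxlqw-agent-000009 pxlqw-agent-00000A prints pxlqw-agent-000009,pxlqw-agent-00000A, which you can paste into the parameter field.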

Implementation Details

The RemoveNodes runbook has a recursive approach to overcome the 3-hour fair share timeout. It removes or repaves the first or the first two nodes (the number is chosen in order to fulfill the odd number of servers constraint) from the received list and reruns another instance of the runbook with the remaining list.

The node repaving operation for a node requires taking the following steps:

Scale out the VMSS with one or two VMs based on the number of nodes that will be removed.
Perform the node removal for the old instances.

The node removal operation for a node requires taking the following steps:

Cordon and drain the instances. The operation times out after 20 minutes for an agent and number_of_instances * 60 minutes for servers.
Stop the rke service on the instances. The operation times out after 5 minutes.
Remove the nodes from the Automation Suite cluster and delete the VMs. The operation times out after 20 minutes for agents and number_of_instances * 60 minutes for servers.
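To estimate how long a removal can take before a step times out, the timeout rule for the cordon-and-drain and node removal steps can be written as a small helper. This is a sketch of the rule as stated above, not code from the runbooks; the function name and the node type keys are assumptions.

```shell
# Timeout in minutes for the drain step and for the removal step:
# 20 minutes for agents, number_of_instances * 60 minutes for servers.
# $1 = node type ("agent" or "server"), $2 = number of instances in the batch
removal_timeout_minutes() {
    case "$1" in
        agent)  echo 20 ;;
        server) echo $(( $2 * 60 )) ;;
        *)      echo "unknown node type: $1" >&2; return 1 ;;
    esac
}
```

For example, removal_timeout_minutes server 2 prints 120, and removal_timeout_minutes agent 2 prints 20.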

RemoveServers

Description

The RemoveServers runbook has the following use case:

Remove servers from the Automation Suite cluster.

Usage

Go to the Azure Portal and search for the resource called RemoveServers.
Click the start button to open the parameter list. Complete the parameters considering the following:

REMOVEDSERVERSCOUNT is the number of servers that will be removed. We recommend removing no more than 2 servers at a time in order not to hit the fair share timeout.

Implementation details

The RemoveServers runbook removes the number of servers received as a parameter from the Availability Zones with the most VMs.

CheckServerZoneResilience

Description

The CheckServerZoneResilience runbook scales out the server VMSS and uses the RemoveServers runbook to balance the servers across Availability Zones. This is part of the InstanceRefresh flow and should not be run manually.

Troubleshooting

If a VM fails to join the Automation Suite cluster, a rollback is attempted. The newly created VMs go through the same steps as a usual node removal (cordon, drain, stop the rke service, remove the node from the cluster, and delete the VMs). You can find the logs from the joining node procedure in the storage account, inside the logs container, in blobs like infra-<hostname>.log.
If a failure occurs while deleting nodes, the runbook stops and displays the logs for the step that failed. Fix the issue, and then complete the process manually or by using the RemoveNodes runbook. You can find all the logs in the storage account, inside the logs container, as follows:
- Cordon and drain – <timestamp>-<runbook_abreviation>-drain_nodes.log
- Stop the rke service – <timestamp>-<runbook_abreviation>-stop_rke.log
- Remove the node from the cluster – <timestamp>-<runbook_abreviation>-remove_nodes.log
In case of a timeout, wait for the step to finish its execution, check the logs, and then complete the process manually or by using the RemoveNodes runbook. All runbooks use the Azure Run Command feature to execute code in the context of the VMs. One limitation of this method is that it does not return the status of the execution. Therefore, the steps for cordoning, draining, and stopping the rke service run asynchronously, and the status is kept with blobs in the following format: <timestamp>-<runbook_abreviation>-<step_name>.<success/fail>.
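When you inspect these status blobs from a terminal, the step name and the outcome can be read from the blob name alone. The sketch below assumes that step names contain no hyphen, which holds for the three step names listed above (drain_nodes, stop_rke, remove_nodes). The function names and the sample blob name in the usage note are hypothetical; the real timestamp and abbreviation formats are not documented here.

```shell
# Outcome of a step, taken from the suffix after the last dot.
step_status() {
    case "${1##*.}" in
        success) echo "success" ;;
        fail)    echo "fail" ;;
        *)       echo "unknown" ;;
    esac
}

# Step name, taken from between the last hyphen and the last dot.
step_name() {
    base=${1%.*}          # drop the .success or .fail suffix
    echo "${base##*-}"    # keep what follows the last hyphen
}
```

For a blob named 20240101T000000-rn-drain_nodes.success (a made-up name), step_name prints drain_nodes and step_status prints success.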
