Backing up and Restoring the Cluster
Terminology
To use the backup and restore functionality, you need to set up an NFS Server, a Backup Cluster, and a Restore Cluster. All three are defined below.
The NFS Server is the server that stores the backup data and facilitates the restoration. You can set it up on any machine or use a PaaS service offered by cloud providers. Note that we do not support Windows-based NFS and Azure Blob-based NFS.
The Backup Cluster is where the Automation Suite is installed. This refers to the cluster you set up during installation.
The Restore Cluster is the cluster where you would like to restore all the data from the Backup Cluster. This cluster becomes the new cluster where you run the Automation Suite after the restoration is complete.
The following steps show how to set all three up.
Environment prerequisites
- This step does not enable the backup for any external data source (such as SQL Server). You need to enable the external data source backup separately.
- We do not support cross-zone backup and restore.
- The NFS Server must be reachable from all nodes of both the Backup Cluster and the Restore Cluster.
- The cluster you want to back up and the NFS Server must be in the same region.
- Before restoring the cluster, make sure to disable the backup as described in Disabling the cluster backup.
Make sure to enable the following ports:
| Port | Protocol | Source | Destination | Purpose | Requirements |
|---|---|---|---|---|---|
| 2049, 111 | TCP | NFS Server | All nodes in backup cluster | Data sync between backup cluster and NFS Server | This communication must be allowed from the NFS Server to the backup cluster nodes before running Step 2: Enabling the cluster backup. |
| 2049, 111 | TCP | All nodes in backup cluster | NFS Server | Data sync between backup cluster and NFS Server | This communication must be allowed from the backup cluster nodes to the NFS Server before running Step 2: Enabling the cluster backup. |
| 2049, 111 | TCP | NFS Server | All nodes in restore cluster | Data sync between NFS Server and restore cluster | This communication must be allowed from the NFS Server to the restore cluster nodes before running Step 3: Setting up the Restore Cluster. |
| 2049, 111 | TCP | All nodes in restore cluster | NFS Server | Data sync between restore cluster and NFS Server | This communication must be allowed from the restore cluster nodes to the NFS Server before running Step 3: Setting up the Restore Cluster. |
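If a backup or restore node keeps firewalld enabled, the ports above can be opened with standard firewalld commands. A minimal sketch for the default zone (adjust the zone to your environment):

firewall-cmd --permanent --add-port=2049/tcp --add-port=111/tcp
firewall-cmd --reload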
Step 1: Setting up the External NFS Server
Requirements
The NFS Server must meet the following requirements:
- You can set up the NFS Server on any machine and any OS of your choice, or use any PaaS service offered by cloud providers. Note that we do not support Windows-based NFS and Azure Blob-based NFS.
- The NFS Server version must be NFSv4 on Linux.
- The NFS Server must run outside the Backup Cluster and the Restore Cluster.
- The NFS Server disk size must be greater than the data disk size of the primary server node. See Hardware requirements for more details.
Step 1.1: Installing NFS Libraries
Install the nfs-utils library on the node you plan to use as the NFS Server:

dnf install nfs-utils -y
systemctl start nfs-server.service
systemctl enable nfs-server.service
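You can confirm that the NFS service is running before continuing (an optional sanity check, not part of the official procedure):

systemctl is-active nfs-server.service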
Step 1.2: Configuring the Mount Path
Configure the mount path that you want to expose from the NFS Server:

chown -R nobody: "/datadisk"
chmod -R 777 "/datadisk"
systemctl restart nfs-utils.service
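Note that the commands above assume the export directory already exists; if it does not, create it first (/datadisk is the example export path used throughout this page):

mkdir -p /datadisk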
Step 1.3: Disabling the Firewall
Firewalld is a firewall management service that controls networking and firewall rules. See the official Firewalld documentation for more details.
To disable Firewalld, run the following commands:

systemctl stop firewalld
systemctl disable firewalld
Step 1.4: Allowing Access of NFS Mount Path to All Backup and Restore Nodes
Edit the /etc/exports file and add an entry for the FQDN of each node (both server and agent) in both the Backup Cluster and the Restore Cluster.
The following example adds an entry that specifies the FQDN of a machine and the corresponding permissions on that machine:

echo "/datadisk sfdev1868610-d053997f-node.eastus.cloudapp.azure.com(rw,sync,no_all_squash,root_squash)" >> /etc/exports

Then run the following commands to export the mount path:

exportfs -arv
exportfs -s
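To verify that the export is visible from a backup or restore node, you can run an optional check with showmount, replacing the placeholder with your NFS Server endpoint:

showmount -e <nfs-server-endpoint>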
Step 2: Enabling the Cluster Backup
- Make sure you have followed the Environment prerequisites step.
- Make sure to back up the cluster_config.json file used for installation.
- This step does not enable the backup for any external data source (such as SQL Server). You need to enable the external data source backup separately.
- We do not recommend reducing the backup interval to less than 15 minutes.
- Automation Suite does not back up all Persistent Volumes, such as the volumes attached to the training pipelines in AI Center. A backup is created only for a few Persistent Volumes, such as Alert Manager, Prometheus, Docker Registry, MongoDB, RabbitMQ, Ceph Objectstore, and Insights.

Backup.json
Create a file and name it backup.json. Make sure to fill it out based on the field definitions below.
{
  "backup": {
    "etcdBackupPath": "PLACEHOLDER",
    "nfs": {
      "endpoint": "PLACEHOLDER",
      "mountpath": "PLACEHOLDER"
    }
  },
  "backup_interval": "15"
}
- backup.etcdBackupPath — Relative path where the backup data is stored on the NFS Server.
- backup.nfs.endpoint — Endpoint of the NFS Server (IP address or DNS name).
- backup.nfs.mountpath — Path on the NFS Server (endpoint).
- backup_interval — The backup time interval, in minutes.
For example, the following configuration stores the backup data at /datadisk/backup/cluster0 on the NFS Server:
{
  "backup": {
    "etcdBackupPath": "cluster0",
    "nfs": {
      "endpoint": "20.224.01.66",
      "mountpath": "/datadisk"
    }
  }
}
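If jq is available on the node, you can optionally verify that the file parses as valid JSON before running the installer:

jq empty backup.json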
Step 2.1: Enabling the Backup on the Primary Node of the Cluster
To enable the backup on the primary node of the cluster, run the following command:
./install-uipath.sh -i backup.json -o output.json -b --accept-license-agreement
Step 2.2: Enabling the Backup on Secondary Nodes of the Cluster
To enable the backup on secondary nodes of the cluster, run the following command on each secondary server node:
./install-uipath.sh -i backup.json -o output.json -b -j server --accept-license-agreement
Step 2.3: Enabling the Backup on Agent Nodes of the Cluster
To enable the backup on agent nodes of the cluster, run the following command on each agent node:
./install-uipath.sh -i backup.json -o output.json -b -j agent --accept-license-agreement
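After the backup is enabled on all nodes, snapshots should start appearing on the NFS Server under the configured path. The path below follows the earlier example; the exact directory layout is managed by the installer and may differ:

ls -lah /datadisk/backup/cluster0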
Step 3: Setting up the Restore Cluster
Restore Cluster Requirements
- Make sure the backup is disabled before restoring the cluster. See Disabling the cluster backup.
- Make sure the wget, unzip, and jq packages are available on all restore nodes.
- Make sure you have followed the Environment prerequisites step.
- All external data sources (such as SQL Server) must be the same.
- Restart the NFS Server before the cluster restoration by running the following command on the NFS Server node: systemctl restart nfs-server.
- The Restore Cluster must have the same fqdn as the Backup Cluster.
- The Restore Cluster must have the same number of server and agent nodes as the Backup Cluster.
- The Restore Cluster must have the same server and agent node resources as the Backup Cluster:
  - Hardware configuration for CPU
  - Hardware configuration for Memory
  - Hardware configuration for Disk Space
- The Restore Cluster must have the same node hostnames as the Backup Cluster.
Download the installer on the Restore Cluster nodes according to your installation type:

| Installation type | Installation instructions |
|---|---|
| Online single-node evaluation mode | Download only the sf-installer zip and provide chmod -R 755 <sf_installer_folder> to the extracted folder. |
| Offline single-node evaluation mode | Download only the sf-installer zip and sf-infra-bundle.tar.gz, then provide chmod -R 755 <sf_installer_folder> to the extracted folder. |
| Online multi-node HA-ready production mode | Download only the sf-installer zip and provide chmod -R 755 <sf_installer_folder> to the extracted folder. |
| Offline multi-node HA-ready production mode | Download only the sf-installer zip and sf-infra-bundle.tar.gz, then provide chmod -R 755 <sf_installer_folder> to the extracted folder. |

Restore.json
Create a file and name it restore.json. Make sure to fill it out based on the following field definitions.
{
  "fixed_rke_address": "PLACEHOLDER",
  "gpu_support": false,
  "fqdn": "PLACEHOLDER",
  "rke_token": "PLACEHOLDER",
  "restore": {
    "etcdRestorePath": "PLACEHOLDER",
    "nfs": {
      "endpoint": "PLACEHOLDER",
      "mountpath": "PLACEHOLDER"
    }
  },
  "infra": {
    "docker_registry": {
      "username": "PLACEHOLDER",
      "password": "PLACEHOLDER"
    }
  }
}
- fqdn — The load balancer FQDN for the multi-node HA-ready production mode, or the machine FQDN for the single-node evaluation mode.
- fixed_rke_address — The FQDN of the load balancer if one is configured; otherwise, the FQDN of the first restore server node. Used to load-balance node registration and kube API requests.
- gpu_support — Use true or false to enable or disable GPU support for the cluster (use true if you have agent nodes with GPUs).
- rke_token — A pre-shared, cluster-specific secret. It must be the same as in the Backup Cluster and can be found in the cluster_config.json file. It is needed for all the nodes joining the cluster.
- restore.etcdRestorePath — Path where the backup data is stored for the cluster on the NFS Server. Configured at backup time with etcdBackupPath.
- restore.nfs.endpoint — Endpoint of the NFS Server.
- restore.nfs.mountpath — Mount path of the NFS Server.
- infra.docker_registry.username — The username that you set in the Backup Cluster. It can be found in the cluster_config.json file and is needed for the Docker registry installation.
- infra.docker_registry.password — The password that you set in the Backup Cluster. It can be found in the cluster_config.json file and is needed for the Docker registry installation.
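As an illustration only, a filled-in restore.json matching the earlier backup example might look like the following. The fqdn, rke_token, and registry credentials below are hypothetical placeholders; use the values from your own cluster_config.json:

{
  "fixed_rke_address": "mycluster.example.com",
  "gpu_support": false,
  "fqdn": "mycluster.example.com",
  "rke_token": "<rke_token from cluster_config.json>",
  "restore": {
    "etcdRestorePath": "cluster0",
    "nfs": {
      "endpoint": "20.224.01.66",
      "mountpath": "/datadisk"
    }
  },
  "infra": {
    "docker_registry": {
      "username": "<username from cluster_config.json>",
      "password": "<password from cluster_config.json>"
    }
  }
}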
Online Installation
Step 3.1: Restoring etcd on the primary node of the cluster
To restore etcd on the primary node of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r --accept-license-agreement --install-type online
Step 3.2: Restoring etcd on secondary nodes of the cluster
To restore etcd on secondary nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j server --accept-license-agreement --install-type online
Step 3.3: Restoring etcd on agent nodes of the cluster
To restore etcd on agent nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j agent --accept-license-agreement --install-type online
Step 3.4: Disabling maintenance mode
Once the etcd restore is complete, make sure you disable the maintenance mode:
/path/to/old-installer/configureUiPathAS.sh disable-maintenance-mode
To verify the maintenance mode is disabled, run the following command:
/path/to/old-installer/configureUiPathAS.sh is-maintenance-enabled
Step 3.5: Running volume restore on primary node
Once the etcd restore is complete, run the volume restore on the primary node using the following command:
./install-uipath.sh -i restore.json -o output.json -r --volume-restore --accept-license-agreement --install-type online
Step 3.6: Installing the Automation Suite cluster certificate on the restore primary node
To install the cluster certificate on the restored primary node, run the following commands:
sudo ./configureUiPathAS.sh tls-cert get --outpath /opt/
cp /opt/ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust
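Optionally, you can verify that the cluster now serves the expected certificate with a generic TLS check. This is not an Automation Suite-specific command, and mycluster.example.com is a placeholder for your FQDN:

openssl s_client -connect mycluster.example.com:443 -servername mycluster.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer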
Enabling AI Center on the Restored Cluster
After restoring an Automation Suite cluster with AI Center™ enabled, follow the steps from the Enabling AI Center on the Restored Cluster procedure.
Offline Installation
Step 3.1: Restoring etcd on the primary node of the cluster
To restore etcd on the primary node of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.2: Restoring etcd on secondary nodes of the cluster
To restore etcd on secondary nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j server --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.3: Restoring etcd on agent nodes of the cluster
To restore etcd on agent nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j agent --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.4: Disabling maintenance mode
Once the etcd restore is complete, make sure you disable the maintenance mode:
/path/to/old-installer/configureUiPathAS.sh disable-maintenance-mode
To verify the maintenance mode is disabled, run the following command:
/path/to/old-installer/configureUiPathAS.sh is-maintenance-enabled
Step 3.5: Running volume restore on primary node
Once the etcd restore is complete, run the volume restore on the primary node using the following command:
./install-uipath.sh -i restore.json -o ./output.json -r --volume-restore --accept-license-agreement --install-type offline
Step 3.6: Installing the Automation Suite cluster certificate on the restore primary node
To install the cluster certificate on the restored primary node, run the following commands:
sudo ./configureUiPathAS.sh tls-cert get --outpath /opt/
cp /opt/ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust
Enabling AI Center on the Restored Cluster
After restoring an Automation Suite cluster with AI Center™ enabled, follow the steps from the Enabling AI Center on the Restored Cluster procedure.
Disabling the Cluster Backup
Disabling the cluster backup causes the loss of any data created between the last scheduled backup run, as defined by the backup_interval parameter, and the time you disable the backup.
To disable the backup, run the following commands in this order:
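A sketch of that sequence, assuming the installer supports a --disable-backup flag that mirrors the -b enable commands above; run on agent nodes first, then on secondary server nodes, and finally on the primary node:

# On each agent node (assumption: --disable-backup mirrors the enable flag)
./install-uipath.sh -i backup.json -o output.json -b --disable-backup -j agent --accept-license-agreement
# On each secondary server node
./install-uipath.sh -i backup.json -o output.json -b --disable-backup -j server --accept-license-agreement
# On the primary node
./install-uipath.sh -i backup.json -o output.json -b --disable-backup --accept-license-agreement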
Additional Configurations
Updating the NFS Server
To update the NFS Server, do the following:
- Re-run the steps in Step 1: Setting up the External NFS Server.
- Update the NFS Server information, and then include the new nfs.endpoint in both the backup.json and restore.json files.
Adding a New Node to Cluster
To add a new node to the cluster, re-run the Environment prerequisites and Step 1.4: Allowing Access of NFS Mount Path to All Backup and Restore Nodes for the new node.
Known Issues
Redis Restore
The Redis restore fails when the restore is run, so you need to run a few additional steps. Follow the steps in the Troubleshooting section.
Insights Looker Pod Fails to Start After Restore
After a cluster restore, the Insights Looker pod may fail to start. If this happens, restart the orchestrator pods.
- Terminology
- Environment Prerequisites
- Step 1: Setting up the External NFS Server
- Requirements
- Step 1.1: Installing NFS Libraries
- Step 1.2: Configuring the Mount Path
- Step 1.3: Disabling the Firewall
- Step 1.4: Allowing Access of NFS Mount Path to All Backup and Restore Nodes
- Step 2: Enabling the Cluster Backup
- Backup.json
- Step 2.1: Enabling the Backup on the Primary Node of the Cluster
- Step 2.2: Enabling the Backup on Secondary Nodes of the Cluster
- Step 2.3: Enabling the Backup on Agent Nodes of the Cluster
- Step 3: Setting up the Restore Cluster
- Restore Cluster Requirements
- Restore.json
- Online Installation
- Offline Installation
- Disabling the Cluster Backup
- Additional Configurations
- Updating the NFS Server
- Adding a New Node to Cluster
- Known Issues
- Redis Restore
- Insights Looker Pod Fails to Start After Restore