To use the backup and restore functionality, you need to set up an NFS Server, the Backup Cluster, and the Restore Cluster. All three are defined below.
Terminology
NFS Server is a server that stores the backup data and is used to facilitate the restoration. This can be any machine.
The Backup Cluster is where the Automation Suite is installed. This refers to the cluster you set up during installation.
The Restore Cluster is the cluster where you would like to restore all the data from the Backup Cluster. This cluster becomes the new cluster where you run the Automation Suite after the restoration is complete.
The following steps show how to set all three up.
Environment prerequisites
Important!
- This step does not enable backup for any external data source (such as SQL Server). You need to enable the external data source backup separately.
- We do not support cross-zone backup and restore.
- The NFS Server should be reachable from all cluster nodes (both backup and restore clusters).
- The cluster you want to back up and the NFS server must be in the same region.
- Before the cluster restore, make sure to disable the backup as described in Disabling the cluster backup.
- Make sure to enable the following ports:
| Port | Protocol | Source | Destination | Purpose | Requirements |
|---|---|---|---|---|---|
|  | TCP | NFS Server | All nodes in the backup cluster | Data sync between the backup cluster and the NFS Server | This communication must be allowed from the NFS Server to the backup cluster nodes before running Step 2: Enabling the cluster backup. |
|  | TCP | All nodes in the backup cluster | NFS Server | Data sync between the backup cluster and the NFS Server | This communication must be allowed from the backup cluster nodes to the NFS Server before running Step 2: Enabling the cluster backup. |
|  | TCP | NFS Server | All nodes in the restore cluster | Data sync between the NFS Server and the restore cluster | This communication must be allowed from the NFS Server to the restore cluster nodes before running Step 3: Setting up the Restore Cluster. |
|  | TCP | All nodes in the restore cluster | NFS Server | Data sync between the restore cluster and the NFS Server | This communication must be allowed from the restore cluster nodes to the NFS Server before running Step 3: Setting up the Restore Cluster. |
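Before enabling the backup, you can sanity-check NFS reachability from any cluster node. A minimal sketch, assuming the NFS Server address used in the examples below (`20.224.01.66`) and that `nfs-utils` is installed on the client:

```bash
# Queries the export list of the NFS server; a failure here indicates a
# connectivity or firewall problem between this node and the NFS Server.
showmount -e 20.224.01.66
```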
Step 1: Setting up the external NFS server
Requirements
The NFS Server should run outside the Backup Cluster and the Restore Cluster.
The NFS Server disk size should be greater than the data disk size of the primary server node.
See Hardware requirements for more details.
Step 1.1: Installing NFS libraries
Important!
Ignore Step 1.1 if you already have an NFS server.
Install the `nfs-utils` library on the node you plan to use as the NFS Server:
dnf install nfs-utils -y
systemctl start nfs-server.service
systemctl enable nfs-server.service
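To confirm that the NFS service is running and will start at boot, a quick check using standard systemd commands:

```bash
systemctl is-active nfs-server.service    # prints "active" when the server is running
systemctl is-enabled nfs-server.service   # prints "enabled" when it starts at boot
```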
Step 1.2: Configuring the mount path
Configure the mount path that you want to expose from the NFS Server.
chown -R nobody: "/datadisk"
chmod -R 777 "/datadisk"
systemctl restart nfs-utils.service
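The commands above assume that the `/datadisk` directory already exists and is backed by sufficient disk space. If you are preparing a fresh machine, a minimal sketch to create it first (`/datadisk` is an example path):

```bash
# Create the export directory before setting ownership and permissions.
mkdir -p /datadisk
```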
Step 1.3: Disabling the firewall
Firewalld is the default firewall management service on RHEL-based systems; it controls networking and firewall rules.
See the official Firewalld documentation for more details.
To disable Firewalld, run the following commands:
systemctl stop firewalld
systemctl disable firewalld
Step 1.4: Allowing all backup and restore nodes to access the NFS mount path
All nodes must be able to access the NFS mount path. On the NFS Server, edit the `/etc/exports` file and add an entry for the FQDN of each node (both server and agent) in both the Backup Cluster and the Restore Cluster.
The following example adds an entry that specifies the FQDN of a machine and the corresponding permissions for that machine:
echo "/datadisk sfdev1868610-d053997f-node.eastus.cloudapp.azure.com(rw,sync,no_all_squash,root_squash)" >> /etc/exports
Then run the following commands to export the mount path:
exportfs -arv
exportfs -s
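Each backup and restore node needs its own line in `/etc/exports`. A minimal sketch that appends one entry per node; the FQDNs are hypothetical placeholders for your own node names:

```bash
# Hypothetical FQDNs; replace with the server and agent node FQDNs
# of both the Backup Cluster and the Restore Cluster.
for node_fqdn in \
    backup-server-0.example.com \
    backup-agent-0.example.com \
    restore-server-0.example.com \
    restore-agent-0.example.com; do
  echo "/datadisk ${node_fqdn}(rw,sync,no_all_squash,root_squash)" >> /etc/exports
done
exportfs -arv   # re-export everything listed in /etc/exports
exportfs -s     # print the current export list for verification
```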
Step 2: Enabling the cluster backup
Important!
- Make sure you have followed the Environment prerequisites step.
- Make sure to back up the `cluster_config.json` file used for installation.
- This step does not enable backup for any external data source (such as SQL Server). You need to enable the external data source backup separately.
- We do not recommend reducing the backup interval to less than 15 minutes.
- Automation Suite does not back up all Persistent Volumes, such as the volumes attached to the training pipelines in AI Center. A backup is created only for a few Persistent Volumes, such as `Alert Manager`, `Prometheus`, `Docker Registry`, `MongoDB`, `RabbitMQ`, `Ceph Objectstore`, and `Insights`.
Create a file named `backup.json` and fill it out based on the field definitions below.
{
"backup": {
"etcdBackupPath": "PLACEHOLDER",
"nfs": {
"endpoint": "PLACEHOLDER",
"mountpath": "PLACEHOLDER"
}
},
"backup_interval": "15"
}
- `backup.etcdBackupPath` — the path where the backup data is stored on the NFS Server.
- `backup.nfs.endpoint` — the endpoint of the NFS Server.
- `backup.nfs.mountpath` — the mount path of the NFS Server.
- `backup_interval` — the backup time interval, in minutes.
In the following example, the backup data is stored under `/datadisk/backup/cluster0` on the NFS Server:
{
"backup": {
"etcdBackupPath": "cluster0",
"nfs": {
"endpoint": "20.224.01.66",
"mountpath": "/datadisk"
}
}
}
Step 2.1: Enabling the backup on the primary node of the cluster
To enable the backup on the primary node of the cluster, run the following command:
./install-uipath.sh -i backup.json -o output.json -b --accept-license-agreement
Step 2.2: Enabling the backup on secondary nodes of the cluster
To enable the backup on secondary nodes of the cluster, run the following command:
./install-uipath.sh -i backup.json -o output.json -b -j server --accept-license-agreement
Step 2.3: Enabling the backup on agent nodes of the cluster
To enable the backup on agent nodes of the cluster, run the following command:
./install-uipath.sh -i backup.json -o output.json -b -j agent --accept-license-agreement
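After enabling the backup on all nodes, you can spot-check that snapshot data is landing on the NFS Server. A minimal sketch based on the example configuration above, where `/datadisk/backup/cluster0` is the path stated earlier; adjust it to your own `mountpath` and `etcdBackupPath`:

```bash
# Run on the NFS Server; lists the most recent backup artifacts, if any.
ls -lt /datadisk/backup/cluster0 | head
```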
Step 3: Setting up the Restore Cluster
Important!
- Make sure the backup is disabled before restoring the cluster. See Disabling the cluster backup.
- Make sure the `wget`, `unzip`, and `jq` packages are available on all restore nodes.
- Make sure you have followed the Environment prerequisites step.
- All external data sources (such as SQL Server) must be the same as those used by the Backup Cluster.
- Restart the NFS Server before the cluster restoration by executing the following command on the NFS Server node: `systemctl restart nfs-server`.
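If any of the prerequisite packages are missing, a quick way to install them on RHEL-based restore nodes, using the same package manager as the NFS Server setup above:

```bash
dnf install -y wget unzip jq
```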
Restore Cluster requirements
- The Restore Cluster should have the same `fqdn` as the Backup Cluster.
- The Restore Cluster should have the same number of `server` and `agent` nodes as the Backup Cluster.
- The Restore Cluster `server` and `agent` nodes should have the same resources as those of the Backup Cluster, as shown below:
  - Hardware configuration for CPU
  - Hardware configuration for Memory
  - Hardware configuration for Disk Space
  - Node Hostname
| Installation type | Installation instructions | Requirements |
|---|---|---|
| Online single-node evaluation mode | Configuring the machines | Download only |
| Offline single-node evaluation mode | Configuring the machines | Download only |
| Online multi-node HA-ready production mode | Configuring the machines | Download only |
| Offline multi-node HA-ready production mode | Configuring the machines | Download only |
Create a file named `restore.json` and fill it out based on the following field definitions.
{
"fixed_rke_address": "PLACEHOLDER",
"gpu_support": false,
"fqdn": "PLACEHOLDER",
"rke_token": "PLACEHOLDER",
"restore": {
"etcdRestorePath": "PLACEHOLDER",
"nfs": {
"endpoint": "PLACEHOLDER",
"mountpath": "PLACEHOLDER"
}
},
"infra": {
"docker_registry": {
"username": "PLACEHOLDER",
"password": "PLACEHOLDER"
}
}
}
- `fqdn` — the load balancer FQDN for the multi-node HA-ready production mode, or the machine FQDN for the single-node evaluation mode.
- `fixed_rke_address` — the FQDN of the load balancer if one is configured; otherwise, the FQDN of the first restore server node. It is used to load balance node registration and Kube API requests. Refer to Fixed address and Configuring the load balancer for more information.
- `gpu_support` — use `true` or `false` to enable or disable GPU support for the cluster (set to `true` if you have agent nodes with GPUs).
- `rke_token` — a pre-shared, cluster-specific secret. It must be the same as that of the Backup Cluster and can be found in the `cluster_config.json` file. It is needed for all the nodes joining the cluster.
- `restore.etcdRestorePath` — the path where the backup data for the cluster is stored on the NFS Server. It is configured at backup time via `etcdBackupPath`.
- `restore.nfs.endpoint` — the endpoint of the NFS Server.
- `restore.nfs.mountpath` — the mount path of the NFS Server.
- `infra.docker_registry.username` — the username that you set in the Backup Cluster. It can be found in the `cluster_config.json` file and is needed for the Docker registry installation.
- `infra.docker_registry.password` — the password that you set in the Backup Cluster. It can be found in the `cluster_config.json` file and is needed for the Docker registry installation.
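For orientation, here is a filled-in sketch of `restore.json` that reuses the NFS values from the backup example above; the `fqdn`, `fixed_rke_address`, `rke_token`, and registry credentials are hypothetical placeholders that must come from your own environment and `cluster_config.json`:

```json
{
  "fixed_rke_address": "automationsuite.example.com",
  "gpu_support": false,
  "fqdn": "automationsuite.example.com",
  "rke_token": "<rke_token from cluster_config.json>",
  "restore": {
    "etcdRestorePath": "cluster0",
    "nfs": {
      "endpoint": "20.224.01.66",
      "mountpath": "/datadisk"
    }
  },
  "infra": {
    "docker_registry": {
      "username": "<username from cluster_config.json>",
      "password": "<password from cluster_config.json>"
    }
  }
}
```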
Online installation
Step 3.1: Restoring `etcd` on the primary node of the cluster
To restore `etcd` on the primary node of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r --accept-license-agreement --install-type online
Step 3.2: Restoring `etcd` on secondary nodes of the cluster
To restore `etcd` on secondary nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j server --accept-license-agreement --install-type online
Important!
The node role (the `-j server` argument in the command above) is mandatory for all secondary server nodes.
Step 3.3: Restoring `etcd` on agent nodes of the cluster
To restore `etcd` on agent nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j agent --accept-license-agreement --install-type online
Step 3.4: Running volume restore on primary node
Once the `etcd` restore is complete, run the volume restore on the primary node using the following command:
./install-uipath.sh -i restore.json -o output.json -r --volume-restore --accept-license-agreement --install-type online
Step 3.5: Installing the Automation Suite cluster certificate on the restore primary node
To fetch the cluster certificate and add it to the OS trust store, run the following commands:
sudo ./configureUiPathAS.sh tls-cert get --outpath /opt/
cp /opt/ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust
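To sanity-check that the restored cluster's certificate is now trusted from the node, a minimal sketch using standard OpenSSL tooling; the FQDN is a hypothetical placeholder for your own cluster FQDN:

```bash
# A successfully trusted chain ends with "Verify return code: 0 (ok)".
openssl s_client -connect automationsuite.example.com:443 </dev/null 2>/dev/null | grep "Verify return code"
```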
Offline installation
Step 3.1: Restoring `etcd` on the primary node of the cluster
To restore `etcd` on the primary node of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.2: Restoring `etcd` on secondary nodes of the cluster
To restore `etcd` on secondary nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j server --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.3: Restoring `etcd` on agent nodes of the cluster
To restore `etcd` on agent nodes of the cluster, run the following command:
./install-uipath.sh -i restore.json -o output.json -r -j agent --offline-bundle "/uipath/sf-infra-bundle.tar.gz" --offline-tmp-folder /uipath --install-offline-prereqs --accept-license-agreement --install-type offline
Step 3.4: Running volume restore on primary node
Once the `etcd` restore is complete, run the volume restore on the primary node using the following command:
./install-uipath.sh -i restore.json -o ./output.json -r --volume-restore --accept-license-agreement --install-type offline
Step 3.5: Installing the Automation Suite cluster certificate on the restore primary node
To fetch the cluster certificate and add it to the OS trust store, run the following commands:
sudo ./configureUiPathAS.sh tls-cert get --outpath /opt/
cp /opt/ca.crt /etc/pki/ca-trust/source/anchors/
update-ca-trust
Disabling the cluster backup
Important!
You can enable the cluster backup to save data at the interval specified by the `backup_interval` parameter. Disabling the cluster backup causes the loss of any data created between the last scheduled backup and the time you disable it.
To disable the backup, run the following commands in this order:
- Disable the backup on the primary node of the cluster.
./install-uipath.sh -i backup.json -o output.json -b --disable-backup --accept-license-agreement
- Disable the backup on secondary nodes of the cluster.
./install-uipath.sh -i backup.json -o output.json -b -j server --disable-backup --accept-license-agreement
- Disable the backup on agent nodes of the cluster.
./install-uipath.sh -i backup.json -o output.json -b -j agent --disable-backup --accept-license-agreement
Additional configurations
Updating the NFS server
Important!
- Make sure the backup is disabled before updating the NFS server. See Disabling the cluster backup for details.
To update the NFS server, do the following:
- Re-run the following steps:
  a. Step 1: Setting up the external NFS server
  b. Step 2: Enabling the cluster backup
  c. Step 3: Setting up the Restore Cluster
- Update the NFS Server information, and then include the new `nfs.endpoint` in both the `backup.json` and `restore.json` files.
Adding a new node to the cluster
To add a new node to the cluster, re-run the following steps:
Known Issues
Redis restore
The Redis restore fails when the restore is run, so you need to perform a few additional steps.
Follow the steps in the Troubleshooting section.
Important!
Once Redis is restored, make sure to restart the `orchestrator` pods.
Insights Looker pod fails to start after restore
You can fix this issue by deleting the Looker pod from the Insights application in the ArgoCD UI. The deployment will create a new pod that should start successfully.
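If you prefer the command line to the ArgoCD UI, a hedged equivalent using `kubectl`; the namespace and pod name pattern here are assumptions, so check your own deployment first:

```bash
# Assumption: Insights runs in the "uipath" namespace and the Looker pod
# name contains "looker". Deleting the pod lets the deployment recreate it.
kubectl -n uipath get pods | grep looker
kubectl -n uipath delete pod <looker-pod-name>
```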