Configuring the Machines
- To prevent data loss, ensure the infrastructure you use does not automatically delete cluster disks on cluster reboot or shutdown. If this capability is enabled, make sure to disable it.
- To ensure a smooth and uninterrupted SSH session, we strongly recommend that you follow the steps in Installation best practices before configuring the disks and installing Automation Suite.
You can partition and configure the disks by using the configureUiPathDisks.sh script. For details, see the following sections.
Before the installation, you must partition and configure the disk using LVM, so that its size can be altered easily and without any data migration or data loss.
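The configureUiPathDisks.sh script described later on this page performs the LVM setup for you. Purely as an illustration of what LVM-based provisioning involves, a minimal manual sketch might look like the following; the device name /dev/sdX and the volume group and logical volume names are placeholders, not names used by the installer.

```bash
# Illustrative only: the configureUiPathDisks.sh script performs the equivalent steps.
pvcreate /dev/sdX                          # register the raw disk as an LVM physical volume
vgcreate uipathvg /dev/sdX                 # create a volume group on top of it
lvcreate -n datalv -l 100%FREE uipathvg    # carve a logical volume out of the free space
mkfs.ext4 /dev/uipathvg/datalv             # format it with a supported filesystem (ext4 or xfs)
# Later, after attaching or growing a disk, the volume can be extended without data loss:
# lvextend -r -l +100%FREE /dev/uipathvg/datalv
```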
Disk partitioning
Note that, by default, the /var partition is allocated only 8 GiB of space.
- The supported disk formats are ext4 or xfs.
- All partitions must be created using LVM. This ensures that cluster data can reside on a different disk while still being viewed coherently, and makes it possible to extend a partition in the future without the risk of data migration or data loss.
- All pod and application logs are stored under the /var/log/pods directory. Make sure that the capacity of this directory is at least 8 GiB. We also recommend configuring logrotate to rotate the logs at an interval ranging from daily to weekly (a sample configuration follows this list).
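The documentation does not prescribe a specific logrotate policy, so the following is only a sketch of a daily rotation for pod logs; the drop-in file name /etc/logrotate.d/pod-logs is a hypothetical choice, and copytruncate is used because the container runtime keeps the log files open.

```bash
# Hypothetical logrotate drop-in for pod logs; tune frequency and retention to your needs.
cat <<'EOF' > /etc/logrotate.d/pod-logs
/var/log/pods/*/*/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
```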
For the RHEL OS, ensure that the machine has the following minimum mount point sizes.
Online
| Disk label | Partition | Size | Purpose |
|---|---|---|---|
| Cluster disk | /var/lib/rancher | 190 GiB | Rancher folder stores container images and layers |
| | /var/lib/kubelet | 56 GiB | Kubelet folder stores runtime Kubernetes configurations such as secrets, configmaps, and emptyDir |
| | | 20 GiB | Installer binary |
| etcd disk | /var/lib/rancher/rke2/server/db | 16 GiB | Distributed database for Kubernetes |
| Data disk | /datadisk | 512 GiB (Basic installation) | Block storage abstraction |
| | | 2 TiB (Complete installation) | |
We recommend that you do not use the OS disk for any of the above purposes, to ensure processes get their fair share of resources.
Offline
The requirements for offline installations are the same as those for online installations, except for the machine on which you first run the installation, which needs the following requirements.
The extra space is needed to unpack the offline bundle.
| Disk label | Partition | Size | Purpose |
|---|---|---|---|
| Cluster disk | /var/lib/rancher | 190 GiB | Rancher folder stores container images and layers |
| | /var/lib/kubelet | 56 GiB | Kubelet folder stores runtime Kubernetes configurations such as secrets, configmaps, and emptyDir |
| | | 20 GiB | Installer binary |
| etcd disk | /var/lib/rancher/rke2/server/db | 16 GiB | Distributed database for Kubernetes |
| Data disk | /datadisk | 512 GiB (Basic installation) | Block storage abstraction |
| | | 2 TiB (Complete installation) | |
| UiPath bundle disk | /uipath | 512 GiB | Air-gapped bundle |
We recommend that you do not use the OS disk for any of the above purposes, to ensure processes get their fair share of resources.
Data and etcd disks should be separate physical disks. This physically isolates the data and etcd disks from other cluster workloads and activity, while also enhancing the performance and stability of the cluster.
See the following section for details on how to use the sample script to partition and configure the disk before the installation.
Using the Script to Configure the Disk

Downloading the script
Use the configureUiPathDisks.sh script to configure and partition the disk. For download instructions, see configureUiPathDisks.sh.
Running the script
You can use the configureUiPathDisks.sh script for the following purposes:
- configure the disks and mount points for a new Automation Suite cluster installation;
- resize the data disk post-installation.
To make the script executable, run:
```bash
chmod +x ./configureUiPathDisks.sh
```
For more details on the script usage, run:
```bash
sudo ./configureUiPathDisks.sh --help
```
```
***************************************************************************************
Utility to configure the disk for UiPath Automation Suite Installation.
Run this script to configure the disks on new machine or to extend the size of datadisk
Arguments
  -n|--node-type            NodeType, Possible values: agent, server. Default to server
  -i|--install-type         Installation mode, Possible values: online, offline. Default to online
  -c|--cluster-disk-name    Device to host rancher and kubelet. Ex: /dev/sdb
  -e|--etcd-disk-name       Device to host etcd, Not required for agent node. Ex: /dev/sdb
  -l|--data-disk-name       Device to host datadisk, Not required for agent node. Ex: /dev/sdc
  -b|--bundle-disk-name     Device to host the uipath bundle.
                              Only required for offline installation on 1st server node
  -f|--complete-suite       Installing complete product suite or any of these products:
                              aicenter, apps, taskmining, documentunderstanding.
                              This will configure the datadisk volume to be 2TiB instead of 512Gi.
  -p|--primary-server       Is this machine is first server machine? Applicable only for airgap install.
                              This is the machine on which UiPath AutomationSuite bundle will be installed.
                              Default to false
  -x|--extend-data-disk     Extend the datadisk. Either attach new disk or resize the exiting datadisk
  -r|--resize               Used in conjunction of with --extend-data-disk to resize the exiting volume,
                              instead of adding new volume
  -d|--debug                Run in debug
  -h|--help                 Display help

ExampleUsage:

  configureUiPathDisks.sh --node-type server --install-type online \
    --cluster-disk-name /dev/sdb --etcd-disk-name /dev/sdc \
    --data-disk-name /dev/sdd

  configureUiPathDisks.sh --data-disk-name /dev/sdh --extend-data-disk
***************************************************************************************
```
Configuring the disk for a multi-node HA-ready production setup

Online
Server nodes
To configure the disk in an online multi-node HA-ready production setup, run the following command on all server machines:
```bash
./configureUiPathDisks.sh --cluster-disk-name name_of_cluster_disk \
  --etcd-disk-name name_of_etcd_disk \
  --data-disk-name name_of_data_disk
```
Agent nodes
To configure the disk in an online multi-node HA-ready production setup, run the following command on all agent machines:
```bash
./configureUiPathDisks.sh --cluster-disk-name name_of_cluster_disk \
  --node-type agent
```
Offline
First server node
In an offline installation, you need to load the product images into the Docker registry. For that, additional storage in the form of a separate disk is required. This disk is used to untar the product bundles and upload the images to the Docker registry. It is required only on one machine.
Make sure to pass the --primary-server flag on one of the server machines or on the machine on which the fabric and service installer will run.
To configure the disk in an offline multi-node HA-ready production setup, run the following command on one of the server machines:
```bash
./configureUiPathDisks.sh --cluster-disk-name name_of_cluster_disk \
  --etcd-disk-name name_of_etcd_disk \
  --data-disk-name name_of_data_disk \
  --bundle-disk-name name_of_uipath_bundle_disk \
  --primary-server \
  --install-type offline
```
Additional server nodes
You do not need to pass the --primary-server and --bundle-disk-name flags on the additional server nodes.
To configure the disk in an offline multi-node HA-ready production setup, run the following command on the other server machines:
```bash
./configureUiPathDisks.sh --cluster-disk-name name_of_cluster_disk \
  --etcd-disk-name name_of_etcd_disk \
  --data-disk-name name_of_data_disk \
  --install-type offline
```
Agent nodes
You do not need to pass the --primary-server and --bundle-disk-name flags on the agent machines.
To configure the disk in an offline multi-node HA-ready production setup, run the following command on the other agent machines.
```bash
./configureUiPathDisks.sh --cluster-disk-name name_of_cluster_disk \
  --node-type agent \
  --install-type offline
```
Configuring the Objectstore disk

To configure the disk for objectstore, run the following command:
```bash
./configureUiPathDisks.sh --ceph-raw-disk-name name_ceph_raw_disk
```
- An Azure known issue incorrectly marks the Azure disk as non-SSD. If Azure is your cloud provider and you want to configure the Objectstore disk, follow the instructions in Troubleshooting.
- Vertical scaling of the existing disks is not supported. To increase the size of your in-cluster storage post-installation, add new raw disks.
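To check how a disk is currently reported before configuring the objectstore, you can inspect its rotational flag with a generic Linux query; a ROTA value of 0 means the kernel treats the device as non-rotational (SSD-like), while 1 means it is reported as a rotational disk:

```bash
lsblk -d -o NAME,SIZE,ROTA
```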
Extending the data disk post-installation

To extend the data disk, you can attach a new physical disk or resize the existing disk.
Adding a new disk
To extend the data disk using the newly attached disk, run the following command on the server machines:
```bash
./configureUiPathDisks.sh --data-disk-name name_of_data_disk \
  --extend-data-disk
```
Resizing the existing disk
To extend the data disk by resizing the existing disk, run the following command on the server machines:
```bash
./configureUiPathDisks.sh --extend-data-disk --resize
```
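After extending the data disk, you can confirm that the additional space is visible to the filesystem. A minimal check, assuming the data disk is mounted at /datadisk as shown in the mount validation output later on this page:

```bash
df -h /datadisk   # reported size should reflect the extended volume
lsblk             # shows the underlying disks and LVM volumes
```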
Validating disk mounts

- Validate that /etc/fstab is correctly configured to handle system reboots.

  Note: Make sure that the etcd and data disk mount points are added to the fstab file. If you have separate disk partitions for /var/lib/rancher and /var/lib/kubelet, fstab should also contain these two folders. Also make sure to include the nofail option in those fstab entries, so that a disk failure does not affect the VM boot. Illustrative fstab entries are shown after these steps.

- Validate that the disks are mounted correctly by running the following command:

  ```bash
  mount -afv
  ```

  You should get the following response:

  ```
  /datadisk : already mounted
  /var/lib/rancher/rke2/server/db : already mounted
  /var/lib/rancher : already mounted
  /var/lib/kubelet : already mounted
  ```
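The exact device paths differ from machine to machine, so the following /etc/fstab entries are purely illustrative: the /dev/mapper/... names are placeholders for the LVM volumes created by the disk script, and the key point is that every entry uses a supported filesystem and carries the nofail option.

```
# Illustrative /etc/fstab entries; device paths are placeholders, not installer-defined names.
/dev/mapper/datavg-datalv         /datadisk                        ext4  defaults,nofail  0 0
/dev/mapper/ranchervg-rancherlv   /var/lib/rancher                 ext4  defaults,nofail  0 0
/dev/mapper/ranchervg-kubeletlv   /var/lib/kubelet                 ext4  defaults,nofail  0 0
/dev/mapper/etcdvg-etcdlv         /var/lib/rancher/rke2/server/db  ext4  defaults,nofail  0 0
```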
Configuring kernel and OS level settings

This section helps Linux administrators manage OS-level and kernel-level settings before performing an Automation Suite installation.

Usually, these settings are managed via a dedicated configuration management tool, such as Puppet. Make sure that the changes you make follow the change control process of your environment, for consistency and documentation purposes.

Make sure to complete the following steps before starting the installation, as misconfigurations at the OS and kernel level can lead to non-intuitive errors. Checking these specific settings can often avoid such errors.
Configuring sysctl settings

The following sysctl settings are required on the machine:
- enable IP forwarding
- disable reverse path filtering
You can do this by running the following command:
```bash
cat <<EOF >>"/etc/sysctl.d/99-sysctl.conf"
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.conf.all.rp_filter=0
EOF
```
nf-call-iptables is needed for most Kubernetes deployments. Kubernetes creates virtual networks internal to the cluster. This allows every pod to have its own IP address, which is used in conjunction with the internal name services to facilitate service-to-service communication. The cluster does not work without nf-call-iptables enabled. For details, see the official Kubernetes documentation.
To apply the settings, run the following command:
```bash
sysctl --system
```
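To confirm that the values were applied, you can query them back; ip_forward should report 1 and rp_filter should report 0:

```bash
sysctl net.ipv4.ip_forward net.ipv4.conf.all.rp_filter
```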
Configuring fapolicy settings

If you use fapolicy, an RKE2-specific policy is required. To generate it, use the following command:
```bash
cat <<-EOF >>"/etc/fapolicyd/rules.d/69-rke2.rules"
allow perm=any all : dir=/var/lib/rancher/
allow perm=any all : dir=/opt/cni/
allow perm=any all : dir=/run/k3s/
allow perm=any all : dir=/var/lib/kubelet/
allow perm=any all : dir=/root/.local/share/helm
EOF
```
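The new rules file does not take effect until fapolicyd reloads its compiled rule set. On RHEL this is typically done with the standard fagenrules utility followed by a daemon restart; this is a general RHEL step rather than an Automation Suite-specific command:

```bash
fagenrules --load            # merge /etc/fapolicyd/rules.d/ into the compiled rule set
systemctl restart fapolicyd  # restart the daemon so the new rules are enforced
```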
Ensure that the change is communicated to your Linux team and goes through the appropriate configuration management processes.
Configuring noexec and nosuid settings

The /var/lib/rancher mount must not have noexec or nosuid set. The disk tool automatically creates these mounts without these properties.

If a Linux administrator manually sets these properties, the instance becomes non-functional.
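You can verify the mount options with a standard Linux query; neither noexec nor nosuid should appear in the output:

```bash
findmnt -no TARGET,OPTIONS /var/lib/rancher
```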
For more details on disk configuration, see Disk Requirements.
Enabling Ports

Make sure that you have the following ports enabled on your firewall for each source.
| Port | Protocol | Source | Purpose | Requirements |
|---|---|---|---|---|
| 22 | TCP | Jump Server / client machine | For SSH (installation, cluster management debugging) | Do not open this port to the internet. Allow access to the client machine or jump server. |
| | TCP | | Offline installation only: required for sending system email notifications. | |
| 443 | TCP | All nodes in a cluster + load balancer | For HTTPS (accessing Automation Suite) | This port should have inbound and outbound connectivity from all the nodes in the cluster and the load balancer. |
| | TCP | | Offline installation only: required for sending system email notifications. | |
| 2379 | TCP | All nodes in a cluster | etcd client port | Must not expose to the internet. Access between nodes should be enough over a private IP address. |
| 2380 | TCP | All nodes in a cluster | etcd peer port | Must not expose to the internet. Access between nodes should be enough over a private IP address. |
| 6443 | TCP | All nodes in a cluster | For accessing Kube API using HTTPS, and required for node joining | This port should have inbound and outbound connectivity from all nodes in the cluster. |
| 8472 | UDP | All nodes in a cluster | Required for Cilium. | Must not expose to the internet. Access between nodes should be enough over a private IP address. |
| 9345 | TCP | All nodes in a cluster + load balancer | For accessing Kube API using HTTP, required for node joining | This port should have inbound and outbound connectivity from all nodes in the cluster and the load balancer. |
| 10250 | TCP | All nodes in a cluster | kubelet / metrics server | Must not expose to the internet. Access between nodes should be enough over a private IP address. |
| | TCP | All nodes in a cluster | NodePort port for internal communication between nodes in a cluster | Must not expose to the internet. Access between nodes should be enough over a private IP address. |
Exposing port 6443 outside the cluster boundary is mandatory if there is a direct connection to the Kubernetes API.

Port 9345 is used by nodes to discover existing nodes and to join the cluster in a multi-node deployment. To keep the high-availability discovery mechanism running, we recommend exposing it via the load balancer with a health check.
Also ensure you have connectivity from all nodes to the SQL server.
Do not expose the SQL server on one of the Istio reserved ports, as it may lead to connection failures.
If you have a firewall set up in the network, make sure that it has these ports open and allows traffic according to the requirements mentioned above.
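If the nodes run firewalld, opening the node-to-node ports can look like the sketch below. This is only an illustration using the ports confirmed above (443, 6443, and 9345); adapt the port list, zones, and source restrictions to your own firewall policy and to the table above.

```bash
# Illustrative firewalld commands; restrict sources per the table above rather than opening ports broadly.
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=9345/tcp
firewall-cmd --reload
```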
Optional: Configuring the proxy server

To configure a proxy, you need to perform additional configuration steps while setting up your environment with the prerequisites, and during the advanced configuration phase of the installation.
The following steps are required when setting up your environment.
Step 1: Enabling ports on the virtual network

Make sure that you have the following rules enabled on your network security group for the given Virtual Network.

| Source | Destination | Route via proxy | Port | Description |
|---|---|---|---|---|
| Virtual Network | SQL | No | SQL server port | Required for SQL Server. |
| Virtual Network | Load Balancer | No | 9345, 6443 | Required to add new nodes to the cluster. |
| Virtual Network | Cluster (subnet) | No | All ports | Required for communication over a private IP range. |
| Virtual Network | alm.<fqdn> | No | 443 | Required for login and using the ArgoCD client during deployment. |
| Virtual Network | Proxy Server | Yes | All ports | Required to route traffic to the proxy server. |
| Virtual Network | NameServer | No | All ports | Most of the cloud services such as Azure and AWS use this to resolve DNS queries. |
| Virtual Network | MetaDataServer | No | All ports | Most of the cloud services such as Azure and AWS use the IP address 169.254.169.254 to fetch machine metadata. |
Step 2: Adding proxy configuration to each node

When configuring the nodes, you need to add the proxy configuration to each node that is part of the cluster. This step is required to route outbound traffic from the node via the proxy server.
- Add the following configuration in /etc/environment:

  ```
  http_proxy=http://<PROXY-SERVER-IP>:<PROXY-PORT>
  https_proxy=http://<PROXY-SERVER-IP>:<PROXY-PORT>
  no_proxy=alm.<fqdn>,<fixed_rke2_address>,<named server address>,<metadata server address>,<private_subnet_ip>,localhost,<comma-separated list of IPs that should not go through the proxy server>
  ```

- Add the following configuration in /etc/wgetrc:

  ```
  http_proxy=http://<PROXY-SERVER-IP>:<PROXY-PORT>
  https_proxy=http://<PROXY-SERVER-IP>:<PROXY-PORT>
  no_proxy=alm.<fqdn>,<fixed_rke2_address>,<named server address>,<metadata server address>,<private_subnet_ip>,localhost,<comma-separated list of IPs that should not go through the proxy server>
  ```

| Mandatory parameters | Description |
|---|---|
| http_proxy | Used to route HTTP outbound requests from the node. This should be the proxy server FQDN and port. |
| https_proxy | Used to route HTTPS outbound requests from the node. This should be the proxy server FQDN and port. |
| no_proxy | Comma-separated list of hosts and IP addresses that you do not want to route via the proxy server. This should include the private subnet, the SQL server host, the named server address, and the metadata server address: alm.<fqdn>,<fixed_rke2_address>,<named server address>,<metadata server address>. |

- metadata server address – Most of the cloud services such as Azure and AWS use the IP address 169.254.169.254 to fetch machine metadata.
- named server address – Most of the cloud services such as Azure and AWS use this to resolve DNS queries.
- Verify that the proxy settings are properly configured by running the following commands:

  ```bash
  curl -v $HTTP_PROXY
  curl -v <fixed_rke_address>:9345
  ```

  Important: Once you meet the proxy server requirements, make sure to continue with the proxy configuration during installation. Follow the steps in Optional: Configuring the proxy server to ensure the proxy server is set up properly.
- Configuring the disk
- Disk Requirements
- Using the Script to Configure the Disk
- Configuring the disk for a multi-node HA-ready production setup
- Configuring the Objectstore disk
- Extending the data disk post-installation
- Validating disk mounts
- Configuring kernel and OS level settings
- Configuring sysctl settings
- Configuring fapolicy settings
- Configuring noexec and nosuid settings
- Enabling Ports
- Optional: Configuring the proxy server
- Step 1: Enabling ports on the virtual network
- Step 2: Adding proxy configuration to each node