Backing up and restoring the cluster
Automation Suite supports backup and restore functionality to prevent data loss in various scenarios. You can configure the backup at any time after installation.
To use the backup and restore functionality, you must set up an NFS server, a backup cluster, and a restore cluster. These concepts are defined in the following section.
- NFS server – The server that stores the backup data and facilitates the restoration. You can set up the NFS server on any machine or use a PaaS service offered by a cloud provider. Note that Windows-based NFS and Azure Blob-based NFS are not supported.
- Backup cluster – The cluster on which you installed Automation Suite. This is the cluster where you enable the backup.
- Restore cluster – The cluster where you restore all the data from the backup cluster. This cluster becomes the new cluster where you run Automation Suite once the restore process is complete.
Enabling the backup covers the cluster data, including the /datadisk disk attached to the server machines. However, it does not cover any external data sources, such as the SQL database. You must enable the external data source backup separately.
If you add a new server node to the cluster after enabling the backup, take the following steps:
- Configure the NFS server to allow access to the new node (see the NFS export sketch after this list). For details, see Allowing nodes to access NFS mount point.
- Enable the backup on the new server node.
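As a minimal sketch of the first step, assuming a Linux NFS server that exports the backup share at /asbackup and a hypothetical new node IP of 10.0.0.21, granting access could look like this:

```bash
# On the NFS server: allow the new node to mount the backup share.
# /asbackup and 10.0.0.21 are example values; use your actual mount path and node IP.
echo "/asbackup 10.0.0.21(rw,sync,no_all_squash,root_squash)" | sudo tee -a /etc/exports

# Re-export all shares so the new entry takes effect without restarting NFS.
sudo exportfs -ra

# List the active exports to confirm the node now has access.
sudo exportfs -v
```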
To set up the backup and restore functionality, you must meet the following requirements:
- You must use NFSv4 on Linux.
- You must set up the NFS server on a separate machine hosted outside of the backup and restore cluster.
- The Round Trip Time (RTT) latency between the NFS server and the backup and restore cluster must not exceed 10 milliseconds.
- The cluster you want to back up and the NFS server must be in the same region.
- The NFS server must meet the following hardware requirements:

| CPU | RAM | Disk |
| --- | --- | --- |
| 4 (v-)CPU | 16 GiB | 10 TiB SSD (1100 IOPS) |
- The NFS server must be reachable from all the cluster nodes.
- You must enable the following ports on the NFS server and on all the nodes in the backup cluster. When restoring the cluster, the same ports must be open on all the nodes in the restore cluster. See the firewall sketch after this table.

| Port | Protocol | Purpose |
| --- | --- | --- |
| 2049 | TCP | Bidirectional communication between the NFS server and the backup and restore cluster. This is the port on which the NFS server runs. |
| 111 | TCP | Bidirectional communication between the NFS server and the backup and restore cluster. This port is used for rpcbind between the NFS server and the backup and restore cluster. |
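As a minimal sketch, assuming the NFS server and the cluster nodes run a RHEL-based OS with firewalld active, you can open the two ports as follows; adapt the commands to whatever firewall you actually use:

```bash
# Run on the NFS server and on every backup/restore cluster node.
# Opens the NFS (2049/tcp) and rpcbind (111/tcp) ports permanently.
sudo firewall-cmd --permanent --add-port=2049/tcp
sudo firewall-cmd --permanent --add-port=111/tcp

# Reload so the permanent rules take effect on the running firewall.
sudo firewall-cmd --reload

# Verify the ports are now listed as open.
sudo firewall-cmd --list-ports
```

You can also sanity-check the 10 ms RTT requirement from a cluster node, for example with `ping -c 5 nfs.automationsuite.mycompany.com` (a hypothetical NFS hostname), and confirm the reported average round-trip time stays below 10 ms.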
To enable the backup on the cluster, you must configure a file named backup.json. To do that, take the following steps:

- Create a file named backup.json with the following content:

```json
{
  "backup": {
    "etcdBackupPath": "PLACEHOLDER",
    "nfs": {
      "endpoint": "PLACEHOLDER",
      "mountpath": "PLACEHOLDER"
    }
  },
  "backup_interval": "15"
}
```

- Fill out the file based on the following field definitions (a filled-in example appears after the verification note below):
| Parameter | Configuration |
| --- | --- |
| backup.etcdBackupPath | The relative path where the backup data is stored on the NFS server. You can give it the name of the cluster. Example: cluster0 |
| backup.nfs.endpoint | The endpoint of the NFS server, that is, the FQDN or IP address of the NFS machine. The endpoint must not include a protocol. Example: nfs.automationsuite.mycompany.com or 20.224.01.66 |
| backup.nfs.mountpath | The mount path on the NFS server (endpoint). This is the location where you attached the disk for storing the cluster backup. Example: /asbackup |
| backup_interval | The backup time interval in minutes, that is, the time between two consecutive backups. Because you can only restore the last successful backup, decide on this interval carefully. The minimum backup interval is 15 minutes. |

Important:
- If the backup interval is too short, for instance 30 minutes, backup operations run very frequently and you can only restore the data captured by the last successful backup. If the backup interval is too long, for instance one week, any data generated between the last backup and a disaster can be lost. Therefore, keep the backup interval in line with your Recovery Point Objective (RPO) requirements.
- When setting up the backup of the external SQL server, take the cluster backup interval into consideration. It is recommended to use the same interval for both the external SQL server and the Automation Suite cluster.
- When the backup is enabled on the cluster, Automation Suite triggers a backup immediately, regardless of the backup interval. Subsequent backups are scheduled based on the backup interval.
- You can verify the backup by logging into the NFS server and navigating to <backup.nfs.mountpath>/<backup.etcdBackupPath>, for example /asbackup/cluster0.
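For reference, this is what backup.json could look like with the example values from the table above filled in; nfs.automationsuite.mycompany.com, /asbackup, and cluster0 are illustrative values, not defaults:

```json
{
  "backup": {
    "etcdBackupPath": "cluster0",
    "nfs": {
      "endpoint": "nfs.automationsuite.mycompany.com",
      "mountpath": "/asbackup"
    }
  },
  "backup_interval": "15"
}
```

With this configuration, once a backup has run you would expect to find the backup data under /asbackup/cluster0 on the NFS server, for example by running `ls -l /asbackup/cluster0`.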
Alert Manager, Prometheus, Docker Registry, MongoDB, RabbitMQ, Ceph Objectstore, and Insights.
To restore the cluster, you must configure a file named restore.json. To do that, take the following steps:

- Create a file named restore.json with the following content:

```json
{
  "restore": {
    "etcdRestorePath": "PLACEHOLDER",
    "nfs": {
      "endpoint": "PLACEHOLDER",
      "mountpath": "PLACEHOLDER"
    }
  }
}
```

- Fill out the file based on the following field definitions (a filled-in example follows the table):
| Parameter | Configuration |
| --- | --- |
| restore.etcdRestorePath | The path on the NFS server from which the data is restored. It must match the value you provided for backup.etcdBackupPath in backup.json. Example: cluster0 |
| restore.nfs.endpoint | The endpoint of the NFS server, that is, the FQDN or IP address of the NFS machine. The endpoint must not include a protocol. Example: nfs.automationsuite.mycompany.com or 20.224.01.66 |
| restore.nfs.mountpath | The mount path of the NFS server. This is the location where you attached the disk for storing the cluster backup. Example: /asbackup |
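For reference, this is what restore.json could look like with the same illustrative values; the etcdRestorePath, endpoint, and mountpath must match the values you used in backup.json on the backup cluster:

```json
{
  "restore": {
    "etcdRestorePath": "cluster0",
    "nfs": {
      "endpoint": "nfs.automationsuite.mycompany.com",
      "mountpath": "/asbackup"
    }
  }
}
```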