Ceph cluster found in a degraded state after side-by-side upgrade
Description
Occasionally, after a side-by-side upgrade, the Rook-Ceph application goes into a "sync failed" state in the ArgoCD portal. This is due to an upstream Ceph issue.
To identify the reason for the degraded state, run the following command:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
If you receive output resembling the following example, the issue is related to Rook-Ceph health:
  cluster:
    id:     936b2e58-1014-4237-b2a5-6e95449a9ce8
    health: HEALTH_ERR
            Module 'devicehealth' has failed: disk I/O error

  services:
    mon: 3 daemons, quorum a,b,c (age 11h)
    mgr: b(active, since 37h), standbys: a
    osd: 3 osds: 3 up (since 37h), 3 in (since 37h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   8 pools, 225 pgs
    objects: 53.57k objects, 26 GiB
    usage:   80 GiB used, 688 GiB / 768 GiB avail
    pgs:     225 active+clean

  io:
    client: 561 KiB/s rd, 61 KiB/s wr, 316 op/s rd, 123 op/s wr
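Optionally, if you only need the health summary rather than the full status output, you can query it directly. This is an additional check, not part of the documented procedure, and it assumes the same rook-ceph-tools deployment is available:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail
This prints only the health status and its detail lines, such as the failed devicehealth module message shown above.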
Solution
To fix the issue, take the following steps:
- In the output snippet, identify the manager (mgr) service with an active state. In the example provided, mgr: b is marked as active.
- To identify the exact pod name, run the following command:
  kubectl -n rook-ceph get pods | grep "rook-ceph-mgr-<active-manager-name>"
  The command should return an output similar to the following example, where rook-ceph-mgr-b-6d7bdb4b54-zz47v is the manager pod name:
  rook-ceph-mgr-b-6d7bdb4b54-zz47v     0/1     Init:0/1     0     3h55m
- Delete the active manager pod by running the following command:
  kubectl -n rook-ceph delete pod <active-manager-pod-name>
  For example:
  kubectl -n rook-ceph delete pod rook-ceph-mgr-b-6d7bdb4b54-zz47v
Deleting the active manager forces it to restart, turning the Ceph cluster state to healthy.
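As an optional verification, not part of the documented procedure, you can confirm that a new manager pod started and that the cluster recovered by reusing the commands shown above:
kubectl -n rook-ceph get pods | grep rook-ceph-mgr
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
The health field should eventually report HEALTH_OK instead of HEALTH_ERR, and the Rook-Ceph application should sync successfully in the ArgoCD portal.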