Automation Suite on Linux Installation Guide
Last updated Sep 5, 2024
Pods stuck in Init:0/X
Description
Pods that use a Longhorn volume are stuck in Init:0/X (where X is an integer representing the number of containers), and running the kubectl describe command on the pod returns a MapVolume.MapPodDevice failed for volume error in the events.
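To confirm which pods are affected before running the remediation script below, you can list the FailedMapVolume events directly. This is a minimal check, assuming your kubeconfig points at the Automation Suite cluster; replace <namespace> and <pod-name> with your own values:
# List FailedMapVolume events across all namespaces to identify the affected pods
kubectl get events -A --field-selector reason=FailedMapVolume
# Inspect the events of a specific affected pod
kubectl -n <namespace> describe pod <pod-name>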
Solution
To fix this issue, run the following script:
# Find pods with FailedMapVolume events caused by Longhorn devices that could not be mounted.
for podPv in $(kubectl get events -A -o json | jq -r '.items[] | select(.reason == "FailedMapVolume" and .involvedObject.kind == "Pod" and (.message | contains("MapVolume.MapPodDevice failed for volume") and contains("Could not mount \"/dev/longhorn"))) | .involvedObject.namespace + "=" + .involvedObject.name + "=" + (.message | match("(pvc-[0-9a-z-]+)").captures[0].string )') ; do
  echo "Found 'FailedMapVolume' error: '${podPv}'"
  NS=$(echo "${podPv}" | cut -d'=' -f1)
  POD=$(echo "${podPv}" | cut -d'=' -f2)
  PV=$(echo "${podPv}" | cut -d'=' -f3)
  [[ -z "${NS}" ]] && echo "Could not extract namespace for error: '${podPv}'" && continue
  [[ -z "${POD}" ]] && echo "Could not extract pod name for error: '${podPv}'" && continue
  [[ -z "${PV}" ]] && echo "Could not extract Persistent Volume for error: '${podPv}'" && continue
  # Identify the controller that owns the pod.
  controller_data=$(kubectl -n "${NS}" get po "${POD}" -o json | jq -r '[.metadata.ownerReferences[] | select(.controller==true)][0] | .kind + "=" + .name')
  [[ -z "$controller_data" ]] && echo "Could not determine owner for pod: ${POD} in namespace: ${NS}" && continue
  CONTROLLER_KIND=$(echo "${controller_data}" | cut -d'=' -f1)
  [[ -z "${CONTROLLER_KIND}" ]] && echo "Could not extract controller kind for pod: '${POD}' in namespace: '${NS}'" && continue
  CONTROLLER_NAME=$(echo "${controller_data}" | cut -d'=' -f2)
  [[ -z "${CONTROLLER_NAME}" ]] && echo "Could not extract controller name for pod: '${POD}' in namespace: '${NS}'" && continue
  # If the owner is a ReplicaSet, resolve its parent controller (typically a Deployment).
  if [[ "${CONTROLLER_KIND}" == "ReplicaSet" ]]
  then
    controller_data=$(kubectl -n "${NS}" get "${CONTROLLER_KIND}" "${CONTROLLER_NAME}" -o json | jq -r '[.metadata.ownerReferences[] | select(.controller==true)][0] | .kind + "=" + .name')
    CONTROLLER_KIND=$(echo "${controller_data}" | cut -d'=' -f1)
    [[ -z "${CONTROLLER_KIND}" ]] && echo "Could not extract controller kind (from rs) for pod: '${POD}' in namespace: '${NS}'" && continue
    CONTROLLER_NAME=$(echo "${controller_data}" | cut -d'=' -f2)
    [[ -z "${CONTROLLER_NAME}" ]] && echo "Could not extract controller name (from rs) for pod: '${POD}' in namespace: '${NS}'" && continue
  fi
  # Record the current replica count, then scale the controller down to release the volume.
  org_replicas=$(kubectl -n "${NS}" get "${CONTROLLER_KIND}" "${CONTROLLER_NAME}" -o json | jq -r '.status.replicas')
  echo "Scaling down ${CONTROLLER_KIND}/${CONTROLLER_NAME}"
  kubectl -n "${NS}" patch "${CONTROLLER_KIND}" "${CONTROLLER_NAME}" -p "{\"spec\":{\"replicas\":0}}"
  # Wait for the affected pod to be deleted.
  if kubectl -n "${NS}" get pod "${POD}" ; then
    kubectl -n "${NS}" wait --for=delete pod "${POD}" --timeout=300s
  fi
  # Remove the stale VolumeAttachment that prevents the Longhorn volume from being remounted.
  if kubectl get volumeattachment | grep -q "${PV}"; then
    volumeattachment_id=$(kubectl get volumeattachment | grep "${PV}" | awk '{print $1}')
    kubectl delete volumeattachment "${volumeattachment_id}"
  fi
  # Restore the original replica count, defaulting to 1 if it could not be determined.
  [[ -z "${org_replicas}" || "${org_replicas}" -eq 0 ]] && org_replicas=1
  echo "Scaling up ${CONTROLLER_KIND}/${CONTROLLER_NAME}"
  kubectl -n "${NS}" patch "${CONTROLLER_KIND}" "${CONTROLLER_NAME}" -p "{\"spec\":{\"replicas\":${org_replicas}}}"
done
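After the script completes, the affected controllers are scaled back to their previous replica counts. As a quick follow-up check, assuming the same cluster context, you can confirm that no pods remain stuck in an Init phase and that no new FailedMapVolume events are generated:
# Confirm no pods are still stuck in an Init phase
kubectl get pods -A | grep "Init:"
# Confirm no new FailedMapVolume events are being reported
kubectl get events -A --field-selector reason=FailedMapVolume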