Manual: Migrating Ceph Data Pool From Replicated to Erasure-coded Type
Step 1: Selecting a path for Ceph objects

Select a path where the exported Ceph data pool can be staged temporarily, and make sure it has enough free disk space to hold all objects in the cluster. The examples in this section use the /ceph-data path on the server0 Kubernetes node; adjust both values to your environment (subsequent commands assume the node is named server0).

export ROOK_CEPH_EXPORT_PATH="/ceph-data"
To view the storage space used by objects in the Ceph cluster, run the following commands:

ceph_objects_bytes=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json | jq -r '.pgmap.data_bytes')
numfmt --to=iec-i $ceph_objects_bytes
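Before exporting, it can help to compare that figure with the free space available at the export path. The check below is an illustrative addition, not part of the original procedure:

# Illustrative sanity check: the export path should have at least as much
# free space as the Ceph objects currently occupy.
echo "Ceph objects use: $(numfmt --to=iec-i ${ceph_objects_bytes})"
df -h "${ROOK_CEPH_EXPORT_PATH}"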
Step 2: Preparing the Ceph tools to use the path

To prepare the Ceph tools to use the path selected in Step 1, take the following steps (a short verification sketch follows the list):

- Disable ArgoCD self-heal for the rook-ceph-operator, rook-ceph-object-store, and fabric-installer applications:

kubectl -n argocd patch application rook-ceph-operator --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'
kubectl -n argocd patch application rook-ceph-object-store --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'
kubectl -n argocd patch application fabric-installer --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'

- Edit the Ceph tools deployment to mount ${ROOK_CEPH_EXPORT_PATH} from the server0 Kubernetes node:

kubectl -n rook-ceph patch deploy rook-ceph-tools --type='json' -p='[{"op": "add", "path":"/spec/template/spec/nodeName", "value": "server0"},{"op": "add", "path":"/spec/template/spec/volumes/2", "value": {"name":"ceph-export", "hostPath": {"path": "'${ROOK_CEPH_EXPORT_PATH}'", "type":"Directory"} }}, {"op":"add", "path": "/spec/template/spec/containers/0/volumeMounts/2", "value": {"name": "ceph-export", "mountPath": "'${ROOK_CEPH_EXPORT_PATH}'"}},{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits"}]'
kubectl -n rook-ceph rollout status deploy rook-ceph-tools

- Allow the tools pod to write inside ${ROOK_CEPH_EXPORT_PATH}:

chmod 777 ${ROOK_CEPH_EXPORT_PATH}
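The following verification sketch uses assumed commands (a jsonpath query and an in-pod df call that are not part of the original guide) to confirm the preparation took effect:

# Self-heal should now be disabled on the patched applications ...
kubectl -n argocd get application rook-ceph-object-store -o jsonpath='{.spec.syncPolicy.automated.selfHeal}'   # expect: false
# ... and the export path should be mounted inside the tools pod.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- df -h "${ROOK_CEPH_EXPORT_PATH}"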
Step 3: Blocking access to the rook-ceph namespace from other namespaces

To block traffic from other namespaces to the rook-ceph namespace, take the following steps (an optional check follows):

- Block traffic coming towards the rook-ceph namespace from any namespace other than the rook-ceph namespace itself:

kubectl apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: rook-ceph
  name: block-rook-ceph-from-other-ns
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}
EOF

- Restart the RGW deployments to close connections already established from other namespaces:

for rgw_deploy in $(kubectl -n rook-ceph get deploy -l app=rook-ceph-rgw -o name);do
  kubectl -n rook-ceph rollout restart "${rgw_deploy}"
  kubectl -n rook-ceph rollout status "${rgw_deploy}"
done
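Optionally, confirm that the policy is in place and that the RGW pods came back up after the restart. These checks are an illustrative addition:

# Illustrative checks, not part of the original steps.
kubectl -n rook-ceph get networkpolicy block-rook-ceph-from-other-ns
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw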
Step 4: Viewing the cluster object count

To view the cluster object count, run the following commands:

BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados df --format json | jq -r --arg poolName "rook-ceph.rgw.buckets.data" '.pools[] | select(.name==$poolName).num_objects')
echo "BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT=${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT}"

Re-check the object count after the migration (Step 9) to confirm that no data has been lost.
Step 5: Exporting the Ceph data pool

To export the Ceph data pool, run the following commands:

nohup kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados -p 'rook-ceph.rgw.buckets.data' export --workers 5 ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool >> /tmp/ceph-data-pool-export.log 2>&1 &
wait $!
if [[ $? -eq 0 && -f ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool ]]; then
  echo "Export ran successfully"
else
  echo "Error while running export"
fi
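As an additional sanity check (not part of the original steps), you can inspect the export artifact and its log; the file size should be of the same order of magnitude as the usage reported in Step 1:

# Illustrative post-export checks.
ls -lh "${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool"
tail -n 20 /tmp/ceph-data-pool-export.log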
Step 6: Scaling down the rook-ceph operator

To scale down the rook-ceph operator, run the following command:

kubectl -n rook-ceph scale --replicas=0 deployment/rook-ceph-operator
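Before deleting the pool in the next step, it is worth confirming that no operator pod is still running. The checks below are assumed additions:

# Assumed checks: the operator deployment should report 0/0 ready replicas,
# and no operator pod should remain.
kubectl -n rook-ceph get deploy rook-ceph-operator
kubectl -n rook-ceph get pods -l app=rook-ceph-operator   # expect: No resources found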
Step 7: Recreating the erasure-coded pool

To remove the replicated data pool and recreate it as an erasure-coded pool, run the following commands:

kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool rm rook-ceph.rgw.buckets.data rook-ceph.rgw.buckets.data --yes-i-really-really-mean-it --yes-i-really-really-mean-it-not-faking
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule rm rook-ceph.rgw.buckets.data
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd erasure-code-profile set rook-ceph_ecprofile k=2 m=1 crush-failure-domain=host
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool create rook-ceph.rgw.buckets.data erasure rook-ceph_ecprofile
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool set rook-ceph.rgw.buckets.data compression_mode none
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool application enable rook-ceph.rgw.buckets.data rook-ceph-rgw --yes-i-really-mean-it
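To confirm that the recreated pool is actually erasure-coded, you can query its erasure-code profile. This check is an illustrative addition:

# Illustrative check: the pool should reference the profile created above.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rook-ceph.rgw.buckets.data erasure_code_profile   # expect: rook-ceph_ecprofile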
Step 8: Importing data into the data pool

To import the data into the new data pool, run the following commands:

nohup kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados -p 'rook-ceph.rgw.buckets.data' import --workers 5 ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool >> /tmp/ceph-data-pool-import.log 2>&1 &
wait $!
if [[ $? -eq 0 ]]; then
  echo "Import ran successfully"
else
  echo "Error while running import"
fi
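Before the formal object-count verification in the next step, a quick look at the import log (an illustrative addition) can surface errors early:

tail -n 20 /tmp/ceph-data-pool-import.log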
Step 9: Verifying the loaded data

To verify the loaded data, run the following commands:

try=120
return_code=1
for index in $(seq 0 "${try}"); do
  AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados df --format json | jq -r --arg poolName "rook-ceph.rgw.buckets.data" '.pools[] | select(.name==$poolName).num_objects')
  if [[ $AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT -eq $BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT ]]; then
    return_code=0
    break
  fi
  [[ $index -eq $try ]] || sleep 5
done
if [[ $return_code -eq 0 ]]; then
  echo "Found equal object count(${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT})"
else
  echo "Found difference in object count for pool before(${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT}) and after(${AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT})"
  echo "Please raise a support ticket with UiPath to complete the migration"
fi
Step 10: Reverting temporary changes

To revert the temporary changes, run the following commands:

kubectl -n argocd patch application rook-ceph-operator --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n argocd patch application rook-ceph-object-store --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n argocd patch application fabric-installer --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n rook-ceph scale --replicas=1 deployment/rook-ceph-operator
kubectl -n rook-ceph patch deploy rook-ceph-tools --type='json' -p='[{"op": "remove", "path":"/spec/template/spec/nodeName"},{"op": "remove", "path":"/spec/template/spec/volumes/2"}, {"op":"remove", "path": "/spec/template/spec/containers/0/volumeMounts/2"},{"op": "add", "path": "/spec/template/spec/containers/0/resources/limits", "value": {"memory": "256Mi"}}]'
kubectl -n rook-ceph delete NetworkPolicy block-rook-ceph-from-other-ns
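A short verification sketch, using assumed commands that are not part of the original guide, to confirm the revert took effect:

# The tools deployment should roll out again with its original settings ...
kubectl -n rook-ceph rollout status deploy rook-ceph-tools
# ... and the temporary NetworkPolicy should be gone.
kubectl -n rook-ceph get networkpolicy block-rook-ceph-from-other-ns 2>/dev/null || echo "NetworkPolicy removed"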
Step 11: Updating the ArgoCD configuration

You must now keep the ArgoCD configuration and the actual cluster state in sync. To do that, set the global.rook.dataPoolType parameter of the fabric-installer application to erasure-coded:

kubectl -n argocd get application fabric-installer -o json | jq 'if ([.spec.source.helm.parameters[].name] | index ("global.rook.dataPoolType")) == null then .spec.source.helm.parameters += [{"name": "global.rook.dataPoolType" , "value": "erasure-coded"}] else (.spec.source.helm.parameters[] | select(.name == "global.rook.dataPoolType").value) |= "erasure-coded" end' | kubectl apply -f -
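To confirm that the parameter was applied, you can read it back; the jq filter below is an illustrative addition:

kubectl -n argocd get application fabric-installer -o json | jq -r '.spec.source.helm.parameters[] | select(.name == "global.rook.dataPoolType").value'   # expect: erasure-coded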
- Step 1: Selecting a path for Ceph objects
- Step 2: Preparing the Ceph tools to use the path
- Step 3: Blocking access to the rook-ceph namespace from other namespaces
- Step 4: Viewing the cluster object count
- Step 5: Exporting the Ceph data pool
- Step 6: Scaling down the rook-ceph operator
- Step 7: Recreating the erasure-coded pool
- Step 8: Importing data into the data pool
- Step 9: Verifying the loaded data
- Step 10: Reverting temporary changes
- Step 11: Updating the ArgoCD configuration