Manual: Migrating the Ceph Data Pool from Replicated to Erasure-Coded Type
Automation Suite 2022.4 Installation Guide
Last updated: April 24, 2024
You must choose a file system path with enough free storage space to hold the Ceph objects. For example, suppose you decide to use the /ceph-data path on the server0 Kubernetes node.
Important: You must configure the Ceph tools to use this host path, and you must run all subsequent commands on the same machine (in this example, server0).
export ROOK_CEPH_EXPORT_PATH="/ceph-data"
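The hostPath volume added in a later step uses type Directory, so the path must already exist on server0 before the tools pod can mount it. A minimal sketch to create it, assuming the /ceph-data example above:
# Create the export directory on server0; hostPath volumes of type
# "Directory" require the path to exist before the pod is scheduled.
sudo mkdir -p "${ROOK_CEPH_EXPORT_PATH}"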
To check the storage space used by the objects in the Ceph cluster, run the following commands:
ceph_objects_bytes=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json | jq -r '.pgmap.data_bytes')
numfmt --to=iec-i $ceph_objects_bytes
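Before exporting, you may also want to confirm that the export path has more free space than the objects currently occupy. A minimal, optional sketch, assuming GNU coreutils on server0:
# Compare the free bytes at the export path against the Ceph object usage
# computed above in ceph_objects_bytes.
available_bytes=$(df --output=avail -B1 "${ROOK_CEPH_EXPORT_PATH}" | tail -n 1)
if [[ ${available_bytes} -gt ${ceph_objects_bytes} ]]; then
  echo "Enough free space at ${ROOK_CEPH_EXPORT_PATH}"
else
  echo "Not enough free space at ${ROOK_CEPH_EXPORT_PATH}"
fi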
To prepare the Ceph tools to use the path chosen in step 1, take the following steps; a spot-check sketch follows the list:
- Disable self-heal for the rook-ceph-operator, rook-ceph-object-store, and fabric-installer applications:
kubectl -n argocd patch application rook-ceph-operator --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'
kubectl -n argocd patch application rook-ceph-object-store --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'
kubectl -n argocd patch application fabric-installer --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":false}]'
- Edit the Ceph tools deployment to mount ${ROOK_CEPH_EXPORT_PATH} on the server0 Kubernetes node:
kubectl -n rook-ceph patch deploy rook-ceph-tools --type='json' -p='[{"op": "add", "path":"/spec/template/spec/nodeName", "value": "server0"},{"op": "add", "path":"/spec/template/spec/volumes/2", "value": {"name":"ceph-export", "hostPath": {"path": "'${ROOK_CEPH_EXPORT_PATH}'", "type":"Directory"} }}, {"op":"add", "path": "/spec/template/spec/containers/0/volumeMounts/2", "value": {"name": "ceph-export", "mountPath": "'${ROOK_CEPH_EXPORT_PATH}'"}},{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits"}]'
kubectl -n rook-ceph rollout status deploy rook-ceph-tools
- Allow the tools pod to write inside ${ROOK_CEPH_EXPORT_PATH}:
chmod 777 ${ROOK_CEPH_EXPORT_PATH}
- Block traffic to the rook-ceph namespace from any other namespace:
kubectl apply -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  namespace: rook-ceph
  name: block-rook-ceph-from-other-ns
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}
EOF
- Restart the RGW deployments to close any connections already established from other namespaces:
for rgw_deploy in $(kubectl -n rook-ceph get deploy -l app=rook-ceph-rgw -o name);do
  kubectl -n rook-ceph rollout restart "${rgw_deploy}"
  kubectl -n rook-ceph rollout status "${rgw_deploy}"
done
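Before continuing, you can spot-check that the preparation steps took effect. A minimal, optional sketch, assuming the default resource names used above:
# Self-heal should now print "false" for each patched application.
for app in rook-ceph-operator rook-ceph-object-store fabric-installer; do
  kubectl -n argocd get application "${app}" -o jsonpath='{.spec.syncPolicy.automated.selfHeal}{"\n"}'
done
# The tools pod should be scheduled on server0 and able to write to the export path.
kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o wide
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- touch "${ROOK_CEPH_EXPORT_PATH}/.write-test"
# The NetworkPolicy should exist.
kubectl -n rook-ceph get networkpolicy block-rook-ceph-from-other-ns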
To check the cluster object count, run the following commands:
BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados df --format json | jq -r --arg poolName "rook-ceph.rgw.buckets.data" '.pools[] | select(.name==$poolName).num_objects')
echo "BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT=${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT}"
We recommend rechecking the object count after the migration to make sure that no data loss has occurred.
To export the Ceph data pool, run the following commands:
nohup kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados -p 'rook-ceph.rgw.buckets.data' export --workers 5 ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool >> /tmp/ceph-data-pool-export.log 2>&1 &
wait $!
if [[ $? -eq 0 && -f ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool ]]; then
  echo "Export ran successfully"
else
  echo "Error while running export"
fi
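The export runs in the background and can take a long time on large pools. A minimal sketch for monitoring its progress, using the log and export files created above:
# Follow the export log for progress and errors:
tail -f /tmp/ceph-data-pool-export.log
# In a separate shell, check the size of the export file as it grows:
ls -lh "${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool"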
To scale down the rook-ceph operator, run the following command:
kubectl -n rook-ceph scale --replicas=0 deployment/rook-ceph-operator
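Scaling the operator to zero prevents it from reconciling and recreating the pool while you replace it. You can optionally wait until the operator pod is actually gone; a minimal sketch:
# Block until the operator pod has terminated (times out after 2 minutes).
kubectl -n rook-ceph wait --for=delete pod -l app=rook-ceph-operator --timeout=120s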
To recreate the data pool as an erasure-coded pool, run the following commands:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool rm rook-ceph.rgw.buckets.data rook-ceph.rgw.buckets.data --yes-i-really-really-mean-it --yes-i-really-really-mean-it-not-faking
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd crush rule rm rook-ceph.rgw.buckets.data
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd erasure-code-profile set rook-ceph_ecprofile k=2 m=1 crush-failure-domain=host
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool create rook-ceph.rgw.buckets.data erasure rook-ceph_ecprofile
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool set rook-ceph.rgw.buckets.data compression_mode none
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool application enable rook-ceph.rgw.buckets.data rook-ceph-rgw --yes-i-really-mean-it
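With k=2 data chunks and m=1 coding chunk, the erasure-coded pool tolerates the loss of one host while storing data at roughly 1.5x raw overhead, compared to the 3x of a three-way replicated pool. A minimal, optional sketch to confirm the new pool uses the profile:
# The pool should report the erasure-code profile created above.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool get rook-ceph.rgw.buckets.data erasure_code_profile
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd erasure-code-profile get rook-ceph_ecprofile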
To import the data into the data pool, run the following commands:
nohup kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados -p 'rook-ceph.rgw.buckets.data' import --workers 5 ${ROOK_CEPH_EXPORT_PATH}/ceph-data-pool >> /tmp/ceph-data-pool-import.log 2>&1 &
wait $!
if [[ $? -eq 0 ]]; then
  echo "Import ran successfully"
else
  echo "Error while running import"
fi
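After the import completes, you can review the import log and the overall cluster health before validating the object counts. A minimal, optional sketch:
# Review the tail of the import log for errors.
tail -n 20 /tmp/ceph-data-pool-import.log
# Cluster health should return to HEALTH_OK once recovery settles.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status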
To validate the imported data, run the following commands:
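# Poll for up to ~10 minutes (121 tries, 5 seconds apart); the reported
# object count can take a short while to settle after the import.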
try=120
return_code=1
for index in $(seq 0 "${try}"); do
  AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rados df --format json | jq -r --arg poolName "rook-ceph.rgw.buckets.data" '.pools[] | select(.name==$poolName).num_objects')
  if [[ $AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT -eq $BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT ]]; then
    return_code=0
    break
  fi
  [[ $index -eq $try ]] || sleep 5
done
if [[ $return_code -eq 0 ]]; then
  echo "Found equal object count (${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT})"
else
  echo "Found difference in object count for pool before (${BEFORE_MIGRATION_DATA_POOL_OBJECT_COUNT}) and after (${AFTER_MIGRATION_DATA_POOL_OBJECT_COUNT})"
  echo "Please raise a support ticket with UiPath to complete the migration"
fi
To revert the temporary changes, run the following commands:
kubectl -n argocd patch application rook-ceph-operator --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n argocd patch application rook-ceph-object-store --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n argocd patch application fabric-installer --type=json -p '[{"op":"replace","path":"/spec/syncPolicy/automated/selfHeal","value":true}]'
kubectl -n rook-ceph scale --replicas=1 deployment/rook-ceph-operator
kubectl -n rook-ceph patch deploy rook-ceph-tools --type='json' -p='[{"op": "remove", "path":"/spec/template/spec/nodeName"},{"op": "remove", "path":"/spec/template/spec/volumes/2"}, {"op":"remove", "path": "/spec/template/spec/containers/0/volumeMounts/2"},{"op": "add", "path": "/spec/template/spec/containers/0/resources/limits", "value": {"memory": "256Mi"}}]'
kubectl -n rook-ceph delete NetworkPolicy block-rook-ceph-from-other-ns
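You can optionally confirm that the revert completed; a minimal sketch:
# The operator should be running again and the tools deployment rolled back.
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools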
You must now make sure that the configuration and the actual state stay in sync. To do that, update the ArgoCD configuration by running the following command:
kubectl -n argocd get application fabric-installer -o json | jq 'if ([.spec.source.helm.parameters[].name] | index ("global.rook.dataPoolType")) == null then .spec.source.helm.parameters += [{"name": "global.rook.dataPoolType" , "value": "erasure-coded"}] else (.spec.source.helm.parameters[] | select(.name == "global.rook.dataPoolType").value) |= "erasure-coded" end' | kubectl apply -f -
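A minimal, optional sketch to confirm the Helm parameter is now present on the application:
# Should print "erasure-coded".
kubectl -n argocd get application fabric-installer -o jsonpath='{.spec.source.helm.parameters[?(@.name=="global.rook.dataPoolType")].value}{"\n"}'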