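Unable to upload or download data in the object store

Description

This issue can occur when the object store is in a degraded state because of placement group (PG) inconsistencies.

To verify that the issue is actually related to rook-ceph PG inconsistencies, run the following commands: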

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin
# Keep only the pod name (first column) so that it can be passed to kubectl exec
ROOK_CEPH_TOOLS=$(kubectl -n rook-ceph get pods | grep rook-ceph-tools | awk '{print $1}')
kubectl -n rook-ceph exec -it $ROOK_CEPH_TOOLS -- ceph status

If the issue is related to rook-ceph PG inconsistencies, the output contains the following messages:

....
....
Possible data damage: X pgs inconsistent
....
....
X active+clean+inconsistent
....
....
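
The check described above can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name `has_inconsistent_pgs` is hypothetical, and the sample text only imitates the messages shown above (the count 2 is made up).

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph status` output on stdin and succeeds
# (exit code 0) only if it reports one or more inconsistent PGs.
has_inconsistent_pgs() {
  grep -Eq '[0-9]+ pgs? inconsistent'
}

# Sample text shaped like the messages above; the count is made up.
sample='Possible data damage: 2 pgs inconsistent
2 active+clean+inconsistent'

if printf '%s\n' "$sample" | has_inconsistent_pgs; then
  echo "inconsistent PGs reported"
else
  echo "no inconsistent PGs reported"
fi
```

Against a live cluster, the same helper could presumably be fed from `kubectl -n rook-ceph exec "$ROOK_CEPH_TOOLS" -- ceph status`, dropping `-it` so that the output can be piped.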

Solution

To repair the inconsistent PGs, perform the following steps:

Exec into the rook-ceph tools pod:

kubectl -n rook-ceph exec -it $ROOK_CEPH_TOOLS -- sh

Trigger the rook-ceph garbage collector process, and wait for it to complete.
```
radosgw-admin gc process
```

List the PGs that are in the active+clean+inconsistent state:

ceph health detail
# The output of this command looks similar to the following:
# ....
# pg <pg-id> is active+clean+inconsistent, acting ..
# pg <pg-id> is active+clean+inconsistent, acting ..
# ....
#
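
Collecting the PG IDs from that output could be sketched as follows. The function name `list_inconsistent_pgs` is hypothetical, and the PG IDs and acting sets in the sample are made up; the sketch assumes only the `pg <pg-id> is <state>, acting ..` line shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph health detail` output on stdin and prints
# the ID of every PG whose state includes "inconsistent", one per line.
list_inconsistent_pgs() {
  awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print $2 }'
}

# Made-up sample lines in the shape shown above.
sample='pg 2.5 is active+clean+inconsistent, acting [0,1,2]
pg 12.5 is active+clean+inconsistent, acting [2,0,1]
pg 3.1 is active+clean, acting [1,2,0]'

# Prints 2.5 and 12.5, each on its own line.
printf '%s\n' "$sample" | list_inconsistent_pgs
```

Each printed ID can then be used in place of `<pg-id>` in the steps that follow.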

Trigger a deep scrub on the PGs, one PG at a time. This command takes a few minutes to run, depending on the PG size.
```
ceph pg deep-scrub <pg-id>
```

Watch the scrub status:

ceph -w | grep <pg-id>

Check the PG scrub status. If the PG scrub completed successfully, the PG status should again read active+clean+inconsistent.
```
ceph health detail | grep <pg-id>
```

Repair the PG:

ceph pg repair <pg-id>

Check the PG repair status. If the PG was repaired successfully, its PG ID should no longer appear in the list of active+clean+inconsistent PGs.
```
ceph health detail | grep <pg-id>
```
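Repeat steps 3 through 8 (from listing the inconsistent PGs through checking the repair status) for the remaining inconsistent PGs.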
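
When several PGs are inconsistent, a plain `grep <pg-id>` can match more than one of them, because an ID such as 2.5 is also a substring of 12.5. The status checks in this procedure could be sketched with an exact match instead. The function name `pg_is_inconsistent` is hypothetical and the sample line is made up.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph health detail` output on stdin and succeeds
# only if the given PG ID is still listed as inconsistent. The ID is compared
# as a whole string, so "2.5" matches neither "12.5" nor "2.50".
pg_is_inconsistent() {
  awk -v id="$1" '
    $1 == "pg" && ($2 "") == (id "") && $4 ~ /inconsistent/ { found = 1 }
    END { exit !found }
  '
}

# Made-up sample: only pg 12.5 is listed, so the check for 2.5 fails.
sample='pg 12.5 is active+clean+inconsistent, acting [2,0,1]'

if printf '%s\n' "$sample" | pg_is_inconsistent 2.5; then
  echo "pg 2.5 is still inconsistent"
else
  echo "pg 2.5 is no longer listed as inconsistent"
fi
```

Appending an empty string to both sides forces awk to compare the IDs as text rather than as numbers, which matters for IDs that differ only by a trailing zero.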