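All Longhorn replicas are faulted

Description

If the Longhorn replicas of a volume are in a faulted state and require manual salvage, the volume may fail to attach and remain in the detached state.

To check whether a volume requires manual salvage, run the following command: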

kubectl logs -l app=longhorn-manager -n longhorn-system -c longhorn-manager --prefix=true --tail=-1 | grep "set engine salvageRequested to true" | grep <PV NAME>

Example output:

2023-11-20T18:22:16.667609096+11:00 time="2023-11-20T07:22:16Z" level=info msg="All replicas are failed, set engine salvageRequested to true" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=rpa-suite-dev-01.it.csiro.au owner=rpa-suite-dev-01.it.csiro.au state=detaching volume=pvc-031fd6bc-9cfe-420a-9213-da38509d733a
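When several volumes are affected, it can help to list every volume name that appears in such log lines instead of grepping for one PV at a time. The following is a minimal sketch, assuming the log format matches the example output (an unquoted `volume=` field on each line); the function name `salvage_volumes` is hypothetical and not part of Longhorn or kubectl.

```shell
# Hypothetical helper: read longhorn-manager log lines on stdin and print the
# unique volume names for which a salvage was requested.
# Assumes each matching line carries an unquoted "volume=<name>" field.
salvage_volumes() {
  grep 'set engine salvageRequested to true' \
    | sed -n 's/.*volume=\([^ ]*\).*/\1/p' \
    | sort -u
}
```

To use it against a cluster, pipe the `kubectl logs` command from the check (without the two `grep` filters) into `salvage_volumes`.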

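Solution

To resolve this issue, perform the following steps:

Scale down the workload pods that use the volume.

Find the replicas of the affected volume by running the following command: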

kubectl get replicas.longhorn.io -n longhorn-system | grep <PV_NAME>

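Edit each replicas.longhorn.io object of the affected PV by running the following command, and set the spec.failedAt field to an empty string (""). Replace <REPLICA_NAME> with the name of a replica of the affected volume, and repeat for each of its replicas.
```
kubectl edit replicas.longhorn.io <REPLICA_NAME> -n longhorn-system
```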
Scale up the workload pods.
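The manual edit of `spec.failedAt` could also be done non-interactively with `kubectl patch`. The following is a sketch under stated assumptions, not a verified procedure: it assumes a JSON merge patch on the replica object has the same effect as the manual edit, and that the replica names appear in the first column of the `kubectl get` output. The function name `clear_failed_at` and the `KUBECTL` override variable are hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: read replica names on stdin (one per line) and clear
# spec.failedAt on each of them with a JSON merge patch.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the
# commands without touching the cluster.
clear_failed_at() {
  while read -r replica; do
    "${KUBECTL:-kubectl}" patch replicas.longhorn.io "$replica" \
      -n longhorn-system --type merge -p '{"spec":{"failedAt":""}}'
  done
}
```

A possible invocation, run while the workload is still scaled down, is `kubectl get replicas.longhorn.io -n longhorn-system | grep <PV_NAME> | awk '{print $1}' | clear_failed_at`. Running it first with `KUBECTL=echo` prints the patch commands so they can be reviewed before anything is changed.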