Pod fails to come up after node reboot due to filesystem corruption

Description

Occasionally, after the host is rebooted, the insights-insightslooker pod fails to come up because of a volume attachment issue. When this happens, the insights app gets stuck in the Progressing state, as shown in the following image:

If you check the insights-insightslooker pod in the ArgoCD UI, you should see the following error message:

Solution

To fix the issue, take the following steps:

Identify the faulty volume. In the previous message, it is pvc-5abe3c8f-7422-44da-9132-92be5641150a.
Scale down the workload that uses the affected volume. Ensure that the volume is detached from the node. To check if the volume is detached, run the following command:
```
kubectl get volumes.longhorn.io -n longhorn-system | grep <PV>
```
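Detaching can take a moment after the workload is scaled down, so it may help to poll rather than run the check once. The following is a minimal sketch, not part of the official procedure: the helper names `volume_state` and `wait_for_detached` are hypothetical, and it assumes the Longhorn Volume resource reports its state in the `.status.state` field with the value `detached`.

```shell
# Print the current state of a Longhorn volume.
# Assumption: the state is exposed at .status.state (for example "attached" or "detached").
volume_state() {
  kubectl get volumes.longhorn.io "$1" -n longhorn-system \
    -o jsonpath='{.status.state}'
}

# Poll until the volume reports "detached", or give up after a number of attempts.
# Usage: wait_for_detached <PV> [attempts] [delay_seconds]
wait_for_detached() {
  wfd_pv="$1"
  wfd_attempts="${2:-30}"
  wfd_delay="${3:-5}"
  wfd_i=1
  while [ "$wfd_i" -le "$wfd_attempts" ]; do
    wfd_state="$(volume_state "$wfd_pv")"
    if [ "$wfd_state" = "detached" ]; then
      echo "Volume $wfd_pv is detached."
      return 0
    fi
    echo "Volume $wfd_pv is in state '${wfd_state:-unknown}' (attempt $wfd_i of $wfd_attempts)."
    sleep "$wfd_delay"
    wfd_i=$((wfd_i + 1))
  done
  echo "Volume $wfd_pv did not detach in time." >&2
  return 1
}
```

After scaling down the workload, calling `wait_for_detached <PV>` returns once the volume is detached, or fails after roughly 30 attempts at 5-second intervals with the defaults shown.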
Manually attach the faulty volume to any node from the Longhorn UI.
Log in to the node and fix the device corresponding to that volume by running the following command:
```
fsck.ext4 /dev/longhorn/<ERRORED_VOLUME>
```
For details, see the following example:
After repairing the faulty volume, detach it from the node. You can do this from the Longhorn UI.
Scale up the workload.
The pod should come up automatically and, after some time, become healthy.
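For the repair step above, the exit status of `fsck.ext4` indicates whether the file system was actually fixed, which is worth confirming before you detach the volume. The following sketch decodes that status using the bit-mask values documented in the e2fsck manual page; the helper name `describe_fsck_status` is hypothetical and only illustrates the idea.

```shell
# Translate an fsck exit status (a bit mask) into readable messages.
# Usage: describe_fsck_status <status>
describe_fsck_status() {
  dfs_status="$1"
  if [ "$dfs_status" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  # Several bits can be set at once, so each one is tested separately.
  [ $((dfs_status & 1)) -ne 0 ] && echo "file system errors corrected"
  [ $((dfs_status & 2)) -ne 0 ] && echo "errors corrected, system should be rebooted"
  [ $((dfs_status & 4)) -ne 0 ] && echo "file system errors left uncorrected"
  [ $((dfs_status & 8)) -ne 0 ] && echo "operational error"
  [ $((dfs_status & 16)) -ne 0 ] && echo "usage or syntax error"
  [ $((dfs_status & 32)) -ne 0 ] && echo "canceled by user request"
  [ $((dfs_status & 128)) -ne 0 ] && echo "shared library error"
  return 0
}

# Example (not executed here), run on the node right after the repair:
#   fsck.ext4 /dev/longhorn/<ERRORED_VOLUME>
#   describe_fsck_status "$?"
```

If the output includes "file system errors left uncorrected", the volume likely still needs attention before the workload is scaled back up.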
