Cluster unhealthy after automated upgrade from 2021.10
During the automated upgrade from Automation Suite 2021.10, the CNI provider is migrated from Canal to Cilium. This operation requires that all nodes are restarted. On rare occasions, one or more nodes might not be successfully rebooted, causing pods running on those nodes to remain unhealthy.
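To see which nodes and pods are affected before you start, you can query the cluster with kubectl. This is a minimal sketch and not part of the official procedure; it assumes kubectl is configured for the cluster (on an Automation Suite server node, the RKE2 kubeconfig is at /etc/rancher/rke2/rke2.yaml):

```bash
# List nodes; any node stuck in NotReady likely failed to reboot cleanly.
kubectl get nodes

# List pods that are neither Running nor Succeeded, cluster-wide.
kubectl get pods -A --field-selector 'status.phase!=Running,status.phase!=Succeeded'

# List the pods scheduled on a suspect node (replace <node-name>).
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o wide
```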
1. Identify failed restarts.

   During the Ansible execution, you might see output similar to the following snippet:

   ```
   TASK [Reboot the servers] ***************************************************************************************************************************
   fatal: [10.0.1.6]: FAILED! => msg: 'Failed to connect to the host via ssh: ssh: connect to host 10.0.1.6 port 22: Connection timed out'
   ```

   Alternatively, browse the logs on the Ansible host machine, located at /var/tmp/uipathctl_<version>/_install-uipath.log.

   If any failed restarts were identified, execute steps 2 through 4 on all nodes.
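   To find failed reboot tasks in that log without scrolling through it, you can grep for Ansible failure markers. This is a minimal sketch; `fatal` matches the output shown above, and `unreachable` (Ansible's status for hosts it cannot reach) is an assumption worth including:

   ```bash
   # Search the installer log for failed or unreachable reboot tasks.
   # Replace <version> with the actual installer version directory.
   grep -iE 'fatal|unreachable' /var/tmp/uipathctl_<version>/_install-uipath.log
   ```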
2. Confirm a reboot is needed on each node.

   Connect to each node and run the following command:

   ```bash
   ssh <username>@<ip-address>
   iptables-save 2>/dev/null | grep -i cali -c
   ```

   If the result is not zero, a reboot is needed.
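   The count reflects iptables rules left behind by the old Canal CNI (Calico chains use the `cali` prefix); a reboot clears them. To check several nodes in one pass from the Ansible host, a loop along these lines can help. This is a minimal sketch; the node IPs and `<username>` are placeholders you must adapt:

   ```bash
   # Count leftover Calico ("cali") iptables rules on each node.
   # A non-zero count means that node still needs a reboot.
   for node in 10.0.1.6 10.0.1.7 10.0.1.8; do
     count=$(ssh <username>@"$node" "iptables-save 2>/dev/null | grep -i cali -c")
     echo "$node: $count leftover rule(s)"
   done
   ```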
3. Reboot the node:

   ```bash
   sudo reboot
   ```

4. Wait for the node to become responsive (you should be able to SSH to it), then repeat steps 2 through 4 on every other node.
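If you prefer not to poll by hand, you can wait for SSH to come back with a small loop. This is a minimal sketch, not part of the official procedure; it reuses the `<username>` and `<ip-address>` placeholders from step 2:

```bash
# Poll until the rebooted node accepts SSH connections again.
until ssh -o ConnectTimeout=5 <username>@<ip-address> exit 2>/dev/null; do
  echo "Waiting for the node to come back..."
  sleep 10
done
echo "Node is reachable again."
```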