Automation Suite
2021.10
Automation Suite Installation Guide
Last updated Apr 19, 2024

Performing Node Maintenance

There are scenarios where you may want to perform a node maintenance activity, such as the following:

  • When applying security patches;
  • When performing an operating system upgrade;
  • When changing any network configuration;
  • When performing any other activity that your organization mandates.

While performing node maintenance operations, you may accidentally break the cluster. To avoid any adverse situations, follow the guidance provided here.

Note:
  • UiPath does not provide guidance on how to perform node maintenance activities. You must contact your IT team for this.
  • The following guidelines only provide instructions on the steps you must take before and after the node maintenance operation, to ensure the cluster is healthy.
  • It is good practice to perform node maintenance activities on one node at a time, verifying that the cluster is healthy before moving on to the next node (see the check below).
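
Before draining any node, confirm that the other nodes are healthy so that the drained workloads have somewhere to run. A minimal check, run from a server node and assuming kubectl is configured there (for example, via the same RKE2 kubeconfig paths that the drain-node.sh script below uses):

  # Every node should report STATUS "Ready" before you take
  # any one of them down for maintenance.
  export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
  kubectl get nodes -o wide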

Pre-node Maintenance

  1. To ensure that the cluster stays healthy while you perform the node maintenance activity, you must drain the workloads running on that node to other nodes. To drain the node, save the drain-node.sh script on the target node and run it using the following command (a verification check follows the script):
    sudo bash drain-node.sh
    

    drain-node.sh script

    #!/bin/bash
    
    # =================
    #
    #
    #
    #
    # Copyright UiPath 2021
    #
    # =================
    # LICENSE AGREEMENT
    # -----------------
    #   Use of paid UiPath products and services is subject to the licensing agreement
    #   executed between you and UiPath. Unless otherwise indicated by UiPath, use of free
    #   UiPath products is subject to the associated licensing agreement available here:
    #   https://www.uipath.com/legal/trust-and-security/legal-terms (or successor website).
    #   You must not use this file separately from the product it is a part of or is associated with.
    #
    #
    #
    # =================
    
    fetch_hostname(){
    
        HOST_NAME_NODE=$(kubectl get nodes -o name | cut -d'/' -f2 | grep "$(hostname)")
    
        if ! [[ -n ${HOST_NAME_NODE} && "$(hostname)" == "$HOST_NAME_NODE" ]]; then
            for private_ip in $(hostname --all-ip-addresses); do
                output=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' | grep "$private_ip")
                ip_address=$(echo "$output" | cut -f2 -d$'\t')
    
                if [[ -n ${ip_address} && "$private_ip" == "$ip_address" ]]; then
                    HOST_NAME_NODE=$(echo "$output" | cut -f1 -d$'\t')
                    break
                fi
            done
        fi
    }
    
    set_kubeconfig(){
        export PATH=$PATH:/var/lib/rancher/rke2/bin:/usr/local/bin
        [[ -f "/var/lib/rancher/rke2/agent/kubelet.kubeconfig" ]] && export KUBECONFIG="/var/lib/rancher/rke2/agent/kubelet.kubeconfig"
        [[ -f "/etc/rancher/rke2/rke2.yaml" ]] && export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
    }
    
    is_kubectl_enabled(){
      local try=0
      local maxtry=60
      local status="notready"
      echo "Checking if node $HOST_NAME_NODE is ready to run kubectl command."
      while [[ ${status} == "notready" ]] && (( try != maxtry )) ; do
              try=$((try+1))
              kubectl cluster-info >/dev/null 2>&1  && status="ready"
              sleep 5;
      done
    
      if [[ ${status} == "notready" ]]; then
        echo "Node is not ready to accept kubectl command"
      else
        echo "Node is ready to accept kubectl command"
      fi
    }
    
    enable_ipforwarding() {
      local file_name="/etc/sysctl.conf"
      echo "Enable IP Forwarding..."
    
      if [[ ! -f "${file_name}" || ! -w "${file_name}" ]]; then
        # either the file is not available or the user doesn't have write permission
        echo "Either file ${file_name} not present or file is not writable. Enabling ip forward using /proc/sys/net/ipv4/ip_forward..."
        echo 1 > /proc/sys/net/ipv4/ip_forward
      else
        echo "File ${file_name} is available and is writable. Checking and enabling ip forward..."
        is_ipforwarding_available=$(grep "net.ipv4.ip_forward" "${file_name}") || true
        if [[ -z ${is_ipforwarding_available} ]]; then
          echo "Adding net.ipv4.ip_forward = 1 in ${file_name}..."
          echo "net.ipv4.ip_forward = 1" >> ${file_name}
        else
          echo "Updating net.ipv4.ip_forward value with 1 in ${file_name}..."
          # shellcheck disable=SC2016
          sed -i -n -e '/^net.ipv4.ip_forward/!p' -e '$anet.ipv4.ip_forward = 1' ${file_name}
        fi
        sysctl -p
      fi
    }
    
    set_kubeconfig
    is_kubectl_enabled
    fetch_hostname
    
    if [[ -n "$HOST_NAME_NODE" ]]; then
        # Pass an argument to uncordon the node. This is to cover reboot scenarios.
        if [ "$1" ]; then
            # enable ip forward
            enable_ipforwarding
            # uncordon node
            echo "Uncordon $HOST_NAME_NODE ..."
            kubectl uncordon "$HOST_NAME_NODE"
        else
            #If a PDB is enabled and there are zero available replicas on other nodes, the drain would fail for those pods, but that is not the behavior we want
            #That is when the second command comes to the rescue: it ignores the PDB and continues with the eviction of the pods whose eviction failed earlier https://github.com/kubernetes/kubernetes/issues/83307
            kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --timeout=90s --skip-wait-for-delete-timeout=10 --force --ignore-errors || kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --force  --disable-eviction=true --timeout=30s --ignore-errors --skip-wait-for-delete-timeout=10 --pod-selector 'app!=csi-attacher,longhorn.io/component!=instance-manager,k8s-app!=kube-dns'
            node_mounted_pv=$(kubectl get volumeattachment -o json | jq --arg node "${HOST_NAME_NODE}" -r '.items[] | select(.spec.nodeName==$node) | .metadata.name + ":" + .spec.source.persistentVolumeName')
            if [[ -n "${node_mounted_pv}" ]] ; then
              while IFS=$'\n' read -r VOL_ATTACHMENT_PV_ID
              do
                PV_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f2)
                VOL_ATTACHMENT_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f1)
                if [[ -n "${PV_ID}" ]] ; then
                  mounts=$(grep "${PV_ID}" /proc/mounts  | awk '{print $2}')
                  if [[ -n $mounts ]] ; then
                    echo "Removing dangling mounts for pvc: ${PV_ID}"
                    {
                      timeout 20s xargs umount -l <<< "${mounts}"
                      exitCode="$?"
                      if [[ $exitCode -eq 0 ]] ; then
                        echo "Command to remove dangling mounts for pvc ${PV_ID} executed successfully"
                        echo "Waiting to remove dangling mounts for pvc ${PV_ID}"
                        if timeout 1m bash -c "while grep -q '${PV_ID}' /proc/mounts ; do sleep 1 ; done"  ; then
                          kubectl delete volumeattachment "${VOL_ATTACHMENT_ID}"
                          if timeout 2m bash -c "while kubectl get node '${HOST_NAME_NODE}' -o yaml | grep -q '${PV_ID}' ; do sleep 1 ; done" ; then
                            #shellcheck disable=SC1012
                            find /var/lib/kubelet -name "${PV_ID}" -print0 | xargs -0 \rm -rf
                            echo "Removed dangling mounts for pvc: ${PV_ID} successfully"
                          else
                            echo "Timeout while waiting to remove node dangling mounts for pvc: ${PV_ID}"
                          fi
                        else
                          echo "Timeout while waiting to remove dangling mounts for pvc: ${PV_ID}"
                        fi
                      elif [[ $exitCode -eq 124 ]] ; then
                        echo "Timeout while executing remove dangling mounts for pvc: ${PV_ID}"
                      else
                        echo "Error while executing remove dangling mounts for pvc: ${PV_ID}"
                      fi
                    } &
                  fi
                fi
              done <<< "${node_mounted_pv}"
              wait
            fi
        fi
    else
      echo "Not able to fetch hostname"
    fi
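
    After the script completes, you can verify that the node was cordoned and drained before starting maintenance. A quick check, assuming kubectl access from a server node (replace <node-name> with the name reported by kubectl get nodes):

    # The drained node should report STATUS "Ready,SchedulingDisabled".
    kubectl get node <node-name>
    # Apart from DaemonSet-managed pods, no workloads should remain on it.
    kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>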
  2. Stop the Kubernetes process running on the node by running the command that corresponds to your node type (a verification check follows the commands):
    • Server node:

      systemctl stop rke2-server
    • Agent node:

      systemctl stop rke2-agent
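
    To verify that the service has actually stopped before you begin maintenance, you can query systemd (rke2-server shown; use rke2-agent on agent nodes):

      # Prints "inactive" once the service has fully stopped.
      systemctl is-active rke2-server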
  3. If your maintenance activity includes upgrading the RPM packages on the machine, you must skip upgrading the rke2 package to avoid any compatibility issues.
    • It is recommended to add the rke2 package to the exclusion list of the RPM upgrade. To do so, add rke2-* to the exclude list in the /etc/yum.conf file. For details, see these instructions.
    • Alternatively, you can temporarily exclude rke2 during yum upgrade using the following command:
      yum upgrade --exclude "rke2-*"
      Important:
      If not excluded, the rke2-* packages might get upgraded to the latest version, causing issues in the Automation Suite cluster. The rke2-* package upgrade is handled via the Automation Suite upgrade.
      Updating yum overwrites the /etc/yum.conf file and removes rke2-* from the exclusion list. To prevent that, update the yum tool using the following command: yum update --exclude yum-utils.
      To check whether rke2 is excluded, review the /etc/yum.conf file.
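
      A minimal sketch of what the permanent exclusion can look like; if your /etc/yum.conf already contains an exclude line, edit that line to add rke2-* instead of appending a second one:

        # Add rke2 packages to yum's permanent exclusion list...
        echo 'exclude=rke2-*' | sudo tee -a /etc/yum.conf
        # ...then confirm the resulting exclude line.
        grep '^exclude' /etc/yum.conf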
  4. Proceed with your node maintenance activity. Once the maintenance is complete, continue with the post-node maintenance steps.

Post-node Maintenance

  1. Reboot the node either by running sudo reboot or using any other safe reboot mechanism you may prefer.
  2. Once the node is rebooted, ensure the rke2 service is started by running the command that corresponds to your node type (a verification check follows the commands):
    • Server node:

      systemctl start rke2-server
    • Agent node:

      systemctl start rke2-agent
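
    To verify that the service came up before continuing, you can query systemd and, if needed, follow its logs (rke2-server shown; use rke2-agent on agent nodes):

      # Prints "active" once the service is running.
      systemctl is-active rke2-server
      # If it is not, inspect the service logs.
      journalctl -u rke2-server -f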
  3. Once the rke2 service is started, you must uncordon the node so that workloads can be scheduled on it again. To do so, run the drain-node.sh script with an argument, which also re-enables IP forwarding:
    sudo bash drain-node.sh nodestart
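
    To confirm the node is schedulable again (replace <node-name> with the name reported by kubectl get nodes):

    # The node should report STATUS "Ready", without "SchedulingDisabled".
    kubectl get node <node-name>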
