Automation Suite
2021.10
Automation Suite Installation Guide
Last updated: April 19, 2024

Performing node maintenance

In certain situations, you may need to perform node maintenance activities, such as:

  • applying security patches;
  • performing operating system upgrades;
  • changing any network configuration;
  • performing any other activities mandated by your organization.

While carrying out node maintenance operations, you may accidentally disrupt the cluster. To avoid any adverse situation, follow the guidance provided here.

Note:
  • UiPath does not provide guidance on how to perform the node maintenance activities themselves. You must contact your IT team for that.
  • The following guidelines only describe the steps you must take before and after the node maintenance operations to ensure the cluster stays healthy.
  • It is good practice to perform node maintenance activities on one node at a time.

Pre-node maintenance

  1. To ensure the cluster stays healthy while you perform node maintenance activities, you must drain the workloads running on that node to other nodes. To drain the node, save the drain-node.sh script on the target node and run it with the following command:
    sudo bash drain-node.sh
    

    drain-node.sh script

    #!/bin/bash
    
    # =================
    #
    #
    #
    #
    # Copyright UiPath 2021
    #
    # =================
    # LICENSE AGREEMENT
    # -----------------
    #   Use of paid UiPath products and services is subject to the licensing agreement
    #   executed between you and UiPath. Unless otherwise indicated by UiPath, use of free
    #   UiPath products is subject to the associated licensing agreement available here:
    #   https://www.uipath.com/legal/trust-and-security/legal-terms (or successor website).
    #   You must not use this file separately from the product it is a part of or is associated with.
    #
    #
    #
    # =================
    
    fetch_hostname(){
    
        HOST_NAME_NODE=$(kubectl get nodes -o name | cut -d'/' -f2 | grep "$(hostname)")
    
        if ! [[ -n ${HOST_NAME_NODE} && "$(hostname)" == "$HOST_NAME_NODE" ]]; then
            for private_ip in $(hostname --all-ip-addresses); do
                output=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' | grep "$private_ip")
                ip_address=$(echo "$output" | cut -f2 -d$'\t')
    
                if [[ -n ${ip_address} && "$private_ip" == "$ip_address" ]]; then
                    HOST_NAME_NODE=$(echo "$output" | cut -f1 -d$'\t')
                    break
                fi
            done
        fi
    }
    
    set_kubeconfig(){
        export PATH=$PATH:/var/lib/rancher/rke2/bin:/usr/local/bin
        [[ -f "/var/lib/rancher/rke2/agent/kubelet.kubeconfig" ]] && export KUBECONFIG="/var/lib/rancher/rke2/agent/kubelet.kubeconfig"
        [[ -f "/etc/rancher/rke2/rke2.yaml" ]] && export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
    }
    
    is_kubectl_enabled(){
      local try=0
      local maxtry=60
      local status="notready"
      echo "Checking if node $HOST_NAME_NODE is ready to run kubectl command."
      while [[ ${status} == "notready" ]] && (( try != maxtry )) ; do
              try=$((try+1))
              kubectl cluster-info >/dev/null 2>&1  && status="ready"
              sleep 5;
      done
    
      if [[ ${status} == "notready" ]]; then
        echo "Node is not ready to accept kubectl command"
      else
        echo "Node is ready to accept kubectl command"
      fi
    }
    
    enable_ipforwarding() {
      local file_name="/etc/sysctl.conf"
      echo "Enable IP Forwarding..."
    
      if [[ ! -f "${file_name}" || ! -w "${file_name}" ]]; then
        # either the file is not available or the user doesn't have edit permission
        echo "Either file ${file_name} not present or file is not writable. Enabling ip forward using /proc/sys/net/ipv4/ip_forward..."
        echo 1 > /proc/sys/net/ipv4/ip_forward
      else
        echo "File ${file_name} is available and is writable. Checking and enabling ip forward..."
        is_ipforwarding_available=$(grep "net.ipv4.ip_forward" "${file_name}") || true
        if [[ -z ${is_ipforwarding_available} ]]; then
          echo "Adding net.ipv4.ip_forward = 1 in ${file_name}..."
          echo "net.ipv4.ip_forward = 1" >> ${file_name}
        else
          echo "Updating net.ipv4.ip_forward value with 1 in ${file_name}..."
          # shellcheck disable=SC2016
          sed -i -n -e '/^net.ipv4.ip_forward/!p' -e '$anet.ipv4.ip_forward = 1' ${file_name}
        fi
        sysctl -p
      fi
    }
    
    set_kubeconfig
    is_kubectl_enabled
    fetch_hostname
    
    if [[ -n "$HOST_NAME_NODE" ]]; then
        # Pass an argument to uncordon the node. This is to cover reboot scenarios.
        if [ "$1" ]; then
            # enable ip forward
            enable_ipforwarding
            # uncordon node
            echo "Uncordon $HOST_NAME_NODE ..."
            kubectl uncordon "$HOST_NAME_NODE"
        else
            #If PDB is enabled and there are zero available replicas on other nodes, the drain would fail for those pods, but that is not the behaviour we want.
            #That is when the second command comes to the rescue: it ignores the PDB and continues with the eviction of the pods whose eviction failed earlier. https://github.com/kubernetes/kubernetes/issues/83307
            kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --timeout=90s --skip-wait-for-delete-timeout=10 --force --ignore-errors || kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --force  --disable-eviction=true --timeout=30s --ignore-errors --skip-wait-for-delete-timeout=10 --pod-selector 'app!=csi-attacher,longhorn.io/component!=instance-manager,k8s-app!=kube-dns'
            node_mounted_pv=$(kubectl get volumeattachment -o json | jq --arg node "${HOST_NAME_NODE}" -r '.items[] | select(.spec.nodeName==$node) | .metadata.name + ":" + .spec.source.persistentVolumeName')
            if [[ -n "${node_mounted_pv}" ]] ; then
              while IFS=$'\n' read -r VOL_ATTACHMENT_PV_ID
              do
                PV_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f2)
                VOL_ATTACHMENT_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f1)
                if [[ -n "${PV_ID}" ]] ; then
                  mounts=$(grep "${PV_ID}" /proc/mounts  | awk '{print $2}')
                  if [[ -n $mounts ]] ; then
                    echo "Removing dangling mounts for pvc: ${PV_ID}"
                    {
                      timeout 20s xargs umount -l <<< "${mounts}"
                      exitCode="$?"
                      if [[ $exitCode -eq 0 ]] ; then
                        echo "Command to remove dangling mounts for pvc ${PV_ID} executed successfully"
                        echo "Waiting to remove dangling mounts for pvc ${PV_ID}"
                        if timeout 1m bash -c "while grep -q '${PV_ID}' /proc/mounts ; do sleep 1 ; done"  ; then
                          kubectl delete volumeattachment "${VOL_ATTACHMENT_ID}"
                          if timeout 2m bash -c "while kubectl get node '${HOST_NAME_NODE}' -o yaml | grep -q '${PV_ID}' ; do sleep 1 ; done" ; then
                            #shellcheck disable=SC1012
                            find /var/lib/kubelet -name "${PV_ID}" -print0 | xargs -0 \rm -rf
                            echo "Removed dangling mounts for pvc: ${PV_ID} successfully"
                          else
                            echo "Timeout while waiting to remove node dangling mounts for pvc: ${PV_ID}"
                          fi
                        else
                          echo "Timeout while waiting to remove dangling mounts for pvc: ${PV_ID}"
                        fi
                      elif [[ $exitCode -eq 124 ]] ; then
                        echo "Timeout while executing remove dangling mounts for pvc: ${PV_ID}"
                      else
                        echo "Error while executing remove dangling mounts for pvc: ${PV_ID}"
                      fi
                    } &
                  fi
                fi
              done <<< "${node_mounted_pv}"
              wait
            fi
        fi
    else
      echo "Not able to fetch hostname"
    fi
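
    After the script completes, you can optionally confirm that the node was cordoned and drained before proceeding. This is a minimal check, assuming kubectl access from the machine you run it on; replace <node-name> with the node name reported by kubectl:
      # The drained node should report a SchedulingDisabled status
      kubectl get nodes
      # Only DaemonSet-managed pods should remain scheduled on the node
      kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>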
  2. Stop the Kubernetes processes running on the node. Run one of the following commands:
    • Server nodes

      systemctl stop rke2-server
    • Agent nodes

      systemctl stop rke2-agent
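    You can optionally confirm that the service is no longer running before you continue. For example, on a server node:
      # Expected output once the service has stopped: inactive
      systemctl is-active rke2-server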
  3. If your maintenance activity includes upgrading the RPM packages on the machine, you must skip upgrading the rke2 packages to avoid any compatibility issues.
    • We recommend adding the rke2 packages to the exclusion list for RPM upgrades. To do this, modify the /etc/yum.conf file by adding rke2 to exclude; a sketch is shown below. For details, see these instructions.
    • Alternatively, you can temporarily exclude rke2 during yum upgrade by using the following command:
      yum upgrade --exclude "rke2-*"
      Important:
      If not excluded, the rke2-* packages may be upgraded to the latest version, causing issues in the Automation Suite cluster. The rke2-* package upgrade is handled as part of the Automation Suite upgrade.
      Updating yum overwrites the /etc/yum.conf file and removes rke2-* from the exclusion list. To prevent this, update the yum tool using the following command: yum update --exclude yum-utils
      To check whether rke2-* is excluded, review the /etc/yum.conf file.
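      For reference, a /etc/yum.conf file with the exclusion in place looks roughly like the following minimal sketch; a real file typically contains additional directives:
      [main]
      exclude=rke2-*
      You can confirm the entry at any time with grep exclude /etc/yum.conf.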
  4. Proceed with your node maintenance activity. Once the maintenance is complete, continue with the post-node maintenance activities.

Post-node maintenance

  1. Reboot the node by running sudo reboot or by using any other safe restart mechanism you prefer.
  2. Once the node restarts, make sure the rke2 service is up. Run one of the following commands:
    • Server nodes

      systemctl start rke2-server
    • Agent nodes

      systemctl start rke2-agent
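    Before uncordoning the node, you can optionally wait for it to rejoin the cluster. A minimal check, assuming kubectl access:
      # The node should report Ready (still marked SchedulingDisabled until it is uncordoned)
      kubectl get nodes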
  3. Once the rke2 service starts, you must uncordon the node by running the following command:
    sudo bash drain-node.sh nodestart
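    To confirm that the node accepts workloads again, you can optionally check that the SchedulingDisabled status no longer appears next to it:
      kubectl get nodes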
