Automation Suite
2021.10
Automation Suite Installation Guide
Last updated April 19, 2024

Performing node maintenance

There are scenarios in which you may want to perform a node maintenance activity, such as the following:

  • When applying security patches;
  • When performing an operating system upgrade;
  • When changing any network configuration;
  • When performing any other activity that your organization requires.

When performing node maintenance operations, you may accidentally break the cluster. To avoid any adverse situation, follow the guidance provided here.

Note:
  • UiPath does not provide guidance on how to perform node maintenance activities. You must contact your IT team for that purpose.
  • The following guidelines only describe the steps you must take before and after the node maintenance operation to ensure that the cluster remains healthy.
  • It is good practice to perform maintenance activities on one node at a time.

Pre-node maintenance

  1. To ensure that the cluster remains healthy while you perform the node maintenance activity, you must drain the workloads running on that node to other nodes. To drain the node, save the drain-node.sh script on the target node and run it using the following command:
    sudo bash drain-node.sh
    

    drain-node.sh script

    #!/bin/bash
    
    # =================
    #
    #
    #
    #
    # Copyright UiPath 2021
    #
    # =================
    # LICENSE AGREEMENT
    # -----------------
    #   Use of paid UiPath products and services is subject to the licensing agreement
    #   executed between you and UiPath. Unless otherwise indicated by UiPath, use of free
    #   UiPath products is subject to the associated licensing agreement available here:
    #   https://www.uipath.com/legal/trust-and-security/legal-terms (or successor website).
    #   You must not use this file separately from the product it is a part of or is associated with.
    #
    #
    #
    # =================
    
    # Determine this machine's Kubernetes node name, matching by hostname first and then by InternalIP.
    fetch_hostname(){
    
        HOST_NAME_NODE=$(kubectl get nodes -o name | cut -d'/' -f2 | grep "$(hostname)")
    
        if ! [[ -n ${HOST_NAME_NODE} && "$(hostname)" == "$HOST_NAME_NODE" ]]; then
            for private_ip in $(hostname --all-ip-addresses); do
                output=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' | grep "$private_ip")
                ip_address=$(echo "$output" | cut -f2 -d$'\t')
    
                if [[ -n ${ip_address} && "$private_ip" == "$ip_address" ]]; then
                    HOST_NAME_NODE=$(echo "$output" | cut -f1 -d$'\t')
                    break
                fi
            done
        fi
    }
    
    # Point kubectl at the RKE2 binaries and kubeconfig (agent kubelet config or server rke2.yaml, whichever is present).
    set_kubeconfig(){
        export PATH=$PATH:/var/lib/rancher/rke2/bin:/usr/local/bin
        [[ -f "/var/lib/rancher/rke2/agent/kubelet.kubeconfig" ]] && export KUBECONFIG="/var/lib/rancher/rke2/agent/kubelet.kubeconfig"
        [[ -f "/etc/rancher/rke2/rke2.yaml" ]] && export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
    }
    
    # Wait up to 60 tries of 5 seconds each for the Kubernetes API to start answering kubectl commands.
    is_kubectl_enabled(){
      local try=0
      local maxtry=60
      local status="notready"
      echo "Checking if node $HOST_NAME_NODE is ready to run kubectl command."
      while [[ ${status} == "notready" ]] && (( try != maxtry )) ; do
              try=$((try+1))
              kubectl cluster-info >/dev/null 2>&1  && status="ready"
              sleep 5;
      done
    
      if [[ ${status} == "notready" ]]; then
        echo "Node is not ready to accept kubectl command"
      else
        echo "Node is ready to accept kubectl command"
      fi
    }
    
    # Ensure IPv4 forwarding is enabled, preferably via /etc/sysctl.conf, otherwise via /proc/sys/net/ipv4/ip_forward.
    enable_ipforwarding() {
      local file_name="/etc/sysctl.conf"
      echo "Enable IP Forwarding..."
    
      if [[ ! -f "${file_name}" || -w "${file_name}" ]]; then
        # either file is not available or user doesn't have edit permission
        echo "Either file ${file_name} not present or file is not writable. Enabling ip forward using /proc/sys/net/ipv4/ip_forward..."
        echo 1 > /proc/sys/net/ipv4/ip_forward
      else
        echo "File ${file_name} is available and is writable. Checking and enabling ip forward..."
        is_ipforwarding_available=$(grep "net.ipv4.ip_forward" "${file_name}") || true
        if [[ -z ${is_ipforwarding_available} ]]; then
          echo "Adding net.ipv4.ip_forward = 1 in ${file_name}..."
          echo "net.ipv4.ip_forward = 1" >> ${file_name}
        else
          echo "Updating net.ipv4.ip_forward value with 1 in ${file_name}..."
          # shellcheck disable=SC2016
          sed -i -n -e '/^net.ipv4.ip_forward/!p' -e '$anet.ipv4.ip_forward = 1' ${file_name}
        fi
        sysctl -p
      fi
    }
    
    set_kubeconfig
    is_kubectl_enabled
    fetch_hostname
    
    if [[ -n "$HOST_NAME_NODE" ]]; then
        # Pass an argument to uncordon the node. This is to cover reboot scenarios.
        if [ "$1" ]; then
            # enable ip forward
            enable_ipforwarding
            # uncordon the node
            echo "Uncordon $HOST_NAME_NODE ..."
            kubectl uncordon "$HOST_NAME_NODE"
        else
            # If a PDB is enabled and there are zero available replicas on other nodes, the drain would fail for those pods, which is not the behavior we want.
            # In that case, the second command comes to the rescue: it ignores the PDB and continues with the eviction of the pods whose eviction failed earlier. See https://github.com/kubernetes/kubernetes/issues/83307
            kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --timeout=90s --skip-wait-for-delete-timeout=10 --force --ignore-errors || kubectl drain "$HOST_NAME_NODE" --delete-emptydir-data --ignore-daemonsets  --force  --disable-eviction=true --timeout=30s --ignore-errors --skip-wait-for-delete-timeout=10 --pod-selector 'app!=csi-attacher,longhorn.io/component!=instance-manager,k8s-app!=kube-dns'
            node_mounted_pv=$(kubectl get volumeattachment -o json | jq --arg node "${HOST_NAME_NODE}" -r '.items[] | select(.spec.nodeName==$node) | .metadata.name + ":" + .spec.source.persistentVolumeName')
            if [[ -n "${node_mounted_pv}" ]] ; then
              while IFS=$'\n' read -r VOL_ATTACHMENT_PV_ID
              do
                PV_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f2)
                VOL_ATTACHMENT_ID=$(echo "${VOL_ATTACHMENT_PV_ID}" | cut -d':' -f1)
                if [[ -n "${PV_ID}" ]] ; then
                  mounts=$(grep "${PV_ID}" /proc/mounts  | awk '{print $2}')
                  if [[ -n $mounts ]] ; then
                    echo "Removing dangling mounts for pvc: ${PV_ID}"
                    {
                      timeout 20s xargs umount -l <<< "${mounts}"
                      exitCode="$?"
                      if [[ $exitCode -eq 0 ]] ; then
                        echo "Command to remove dangling mounts for pvc ${PV_ID} executed successfully"
                        echo "Waiting to remove dangling mounts for pvc ${PV_ID}"
                        if timeout 1m bash -c "while grep -q '${PV_ID}' /proc/mounts ; do sleep 1 ; done"  ; then
                          kubectl delete volumeattachment "${VOL_ATTACHMENT_ID}"
                          if timeout 2m bash -c "while kubectl get node '${HOST_NAME_NODE}' -o yaml | grep -q '${PV_ID}' ; do sleep 1 ; done" ; then
                          #shellcheck disable=SC1012
                            find /var/lib/kubelet -name "${PV_ID}" -print0 | xargs -0 \rm -rf
                            echo "Removed dangling mounts for pvc: ${PV_ID} successfully"
                          else
                           echo "Timeout while waiting to remove node dangling mounts for pvc: ${PV_ID}"
                         fi
                        else
                          echo "Timeout while waiting to remove dangling mounts for pvc: ${PV_ID}"
                        fi
                      elif [[ $exitCode -eq 124 ]] ; then
                        echo "Timeout while executing remove dangling mounts for pvc: ${PV_ID}"
                      else
                        echo "Error while executing remove dangling mounts for pvc: ${PV_ID}"
                      fi
                    } &
                  fi
                fi
              done <<< "${node_mounted_pv}"
              wait
            fi
        fi
    else
      echo "Not able to fetch hostname"
    fi
  2. Stop the Kubernetes process running on the node. Run one of the following commands:
    • Server node:

      systemctl stop rke2-server
    • Agent node:

      systemctl stop rke2-agent
  3. If your maintenance activity includes updating the RPM packages on the machine, you must skip upgrading the rke2 package to avoid compatibility issues.
    • We recommend adding the rke2 package to the RPM update exclusion list. To do so, modify the /etc/yum.conf file and add rke2 to the exclusion list (a sketch of this entry is shown after this list). For more details, see these instructions.
    • Alternatively, you can exclude rke2 temporarily during yum upgrade by using the following command:
      yum upgrade --exclude "rke2-*"
      Important:
      If they are not excluded, the rke2-* packages may be upgraded to the latest version, causing issues in the Automation Suite cluster. The rke2-* packages are upgraded as part of the Automation Suite upgrade.
      The yum update overrides the /etc/yum.conf file and removes rke2-* from the exclusion list. To prevent this, update the yum tooling using the following command: yum update --exclude yum-utils.
      To verify that rke2 has been excluded, review the /etc/yum.conf file.
  4. Proceed with your node maintenance activity. When the update is complete, continue with the post-node maintenance activity.
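
The guide does not show the exclusion entry itself; the following is a minimal sketch of the /etc/yum.conf change described in step 3, assuming the file contains only the standard [main] section and using yum's exclude= directive. Run it as root on the node under maintenance.

    # Add an exclusion for rke2 packages to /etc/yum.conf, unless one is already present,
    # so that routine RPM updates skip them.
    # Assumption: /etc/yum.conf contains only the [main] section, as is typical.
    if ! grep -q '^exclude=.*rke2' /etc/yum.conf; then
        echo 'exclude=rke2-*' >> /etc/yum.conf
    fi

    # Review the file to confirm the exclusion is in place.
    grep '^exclude=' /etc/yum.conf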

Post-node maintenance

  1. Reboot the node by running sudo reboot or by using any other safe reboot mechanism you prefer.
  2. After the node is rebooted, make sure the rke2 service has started. Run one of the following commands:
    • Server node:

      systemctl start rke2-server
    • Agent node:

      systemctl start rke2-agent
  3. Once the rke2 service has started, you must uncordon the node (bring it back into scheduling) by running the following command; a verification sketch follows these steps:
    sudo bash drain-node.sh nodestart
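
The post-maintenance steps above do not include an explicit health check. The following is a minimal sketch for confirming that the node has rejoined the cluster; it reuses the kubeconfig path and PATH set up by drain-node.sh and assumes the node name matches the machine hostname (the script itself handles the case where it does not).

    # Use the same kubectl environment that drain-node.sh sets up (server node paths).
    export PATH=$PATH:/var/lib/rancher/rke2/bin:/usr/local/bin
    export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

    # Confirm the rke2 service is active (use rke2-agent on an agent node).
    systemctl is-active rke2-server

    # The node should report Ready and no longer show SchedulingDisabled after the uncordon.
    kubectl get node "$(hostname)"

    # Optionally, list the pods scheduled back onto this node.
    kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$(hostname)"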
