Automation Suite 2023.10
Automation Suite on Linux – Installation Guide
Last updated Oct 4, 2024

Upgrade fails due to unhealthy Ceph

Description

When you try to upgrade to a new Automation Suite version, you might see the following error message: Ceph objectstore is not completely healthy at the moment. Inner exception - Timeout waiting for all PGs to become active+clean.

Solution

To fix this issue, first check whether the OSD pods are running and healthy by running the following command:

kubectl -n rook-ceph get pod -l app=rook-ceph-osd --no-headers | grep -P '([0-9])/\1' -v
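The filter in this command relies on a PCRE backreference: `([0-9])/\1` matches a digit, a slash, and the same digit again, which is what the READY column looks like when all containers of a pod are ready (for example `2/2`). `-v` then keeps only the lines that do not match, that is, pods with at least one container not ready. The sketch below wraps the same `grep` in a hypothetical helper, `filter_unready_pods`, and feeds it made-up sample rows instead of live `kubectl` output, so it can be tried without a cluster. It assumes GNU grep with `-P` support. Because the pattern compares single digits, it only behaves as intended for pods with fewer than 10 containers, which is typically the case for OSD pods.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod --no-headers` style lines on
# stdin and prints only pods whose READY column is not N/N.
filter_unready_pods() {
  # ([0-9])/\1 matches a digit, a slash and the same digit again (1/1, 2/2).
  # -v inverts the match, so only lines WITHOUT such a pair are printed.
  grep -vP '([0-9])/\1'
}

# Made-up sample rows (not real cluster output) to illustrate the filter.
sample='rook-ceph-osd-0-6f7c9d8b5-abcde   2/2   Running            0    3d
rook-ceph-osd-1-7d8e9f0a1-fghij   1/2   CrashLoopBackOff   12   3d'

# Prints only the rook-ceph-osd-1 row, because 1/2 is not fully ready.
printf '%s\n' "$sample" | filter_unready_pods
```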
  • If the command does not output any pods, check whether the Ceph placement groups (PGs) are recovering by running the following command:

    function is_ceph_pg_active_clean() {
      local return_code=1
      # Compare the total number of PGs with the number of PGs in state active+clean.
      if kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json | jq '. as $root | ($root | .pgmap.num_pgs) as $total_pgs | try ( ($root | .pgmap.pgs_by_state[] | select(.state_name == "active+clean").count) // 0) as $active_pgs | if $total_pgs == $active_pgs then true else false end' | grep -q 'true'; then
        return_code=0
      fi
      if [[ $return_code -eq 0 ]]; then
        echo "All Ceph placement groups (PGs) are active+clean"
      else
        echo "Not all Ceph placement groups (PGs) are active+clean. Please wait for the PGs to become active+clean"
        # List every PG that is not yet active+clean, together with its current state.
        kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg dump --format json | jq -r '.pg_map.pg_stats[] | select(.state!="active+clean") | [.pgid, .state] | @tsv'
      fi
      return "${return_code}"
    }
    # Execute the function multiple times to get updated ceph PG status
    is_ceph_pg_active_clean
    Note: If the affected Ceph PGs still have not recovered after waiting more than 30 minutes, raise a ticket with UiPath® Support.
  • If the command outputs pods, you must first fix the issue affecting them:

    • If a pod is stuck in Init:0/4, there might be an issue with the PV provider (Longhorn). Raise a ticket with UiPath® Support to report this issue.
    • If a pod is in CrashLoopBackOff, fix the issue by running the following command:
      function cleanup_crashing_osd() {
          local restart_operator="false"
          local min_required_healthy_osd=1
          local in_osd
          local up_osd
          local healthy_osd_pod_count
          local crashed_osd_deploy
          local crashed_pvc_name

          # Require at least 2 healthy OSDs if the RGW data pool is not replicated.
          if ! kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool ls detail | grep 'rook-ceph.rgw.buckets.data' | grep -q 'replicated'; then
              min_required_healthy_osd=2
          fi
          # Count OSDs that are "in" and "up", and OSD pods whose containers are all ready.
          in_osd=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status -f json | jq -r '.osdmap.num_in_osds')
          up_osd=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status -f json | jq -r '.osdmap.num_up_osds')
          healthy_osd_pod_count=$(kubectl -n rook-ceph get pod -l app=rook-ceph-osd | grep 'Running' | grep -c -P '([0-9])/\1')
          # Do not delete anything unless enough healthy OSDs remain.
          if ! [[ $in_osd -ge $min_required_healthy_osd && $up_osd -ge $min_required_healthy_osd && $healthy_osd_pod_count -ge $min_required_healthy_osd ]]; then
              echo "Not enough healthy OSDs to safely remove a crashing OSD. Skipping cleanup."
              return 1
          fi
          # Derive each crashing OSD's deployment name (rook-ceph-osd-<id>) from its pod name.
          for crashed_osd_deploy in $(kubectl -n rook-ceph get pod -l app=rook-ceph-osd | grep 'CrashLoopBackOff' | cut -d'-' -f'1-4'); do
              # Only act on OSDs whose logs contain a '/crash/' path.
              if kubectl -n rook-ceph logs "deployment/${crashed_osd_deploy}" | grep -q '/crash/'; then
                  echo "Found crashing OSD deployment: '${crashed_osd_deploy}'"
                  crashed_pvc_name=$(kubectl -n rook-ceph get deployment "${crashed_osd_deploy}" -o json | jq -r '.metadata.labels["ceph.rook.io/pvc"]')
                  echo "Removing crashing OSD deployment: '${crashed_osd_deploy}' and PVC: '${crashed_pvc_name}'"
                  # Try a normal delete first; force-delete if it does not finish in time.
                  timeout 60 kubectl -n rook-ceph delete deployment "${crashed_osd_deploy}" || kubectl -n rook-ceph delete deployment "${crashed_osd_deploy}" --force --grace-period=0
                  timeout 100 kubectl -n rook-ceph delete pvc "${crashed_pvc_name}" || kubectl -n rook-ceph delete pvc "${crashed_pvc_name}" --force --grace-period=0
                  restart_operator="true"
              fi
          done
          # Restart the Rook operator so that it reconciles the cluster after the removal.
          if [[ $restart_operator == "true" ]]; then
              kubectl -n rook-ceph rollout restart deployment/rook-ceph-operator
          fi
          return 0
      }
      # Execute the cleanup function
      cleanup_crashing_osd
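Before deleting anything, `cleanup_crashing_osd` relies on two pieces of pure logic: a guard that requires the number of `in` OSDs, `up` OSDs and fully ready OSD pods to all reach a minimum (1, or 2 when `ceph osd pool ls detail` does not report the `rook-ceph.rgw.buckets.data` pool as `replicated`), and a `cut -d'-' -f'1-4'` that turns a pod name such as `rook-ceph-osd-1-<replicaset-hash>-<pod-hash>` into its Deployment name `rook-ceph-osd-1`. The following sketch isolates both in hypothetical helpers (`enough_healthy_osds`, `osd_deployment_from_pod`) so their behavior can be checked without a cluster; the pod name and counts used are invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the guard in cleanup_crashing_osd: succeeds only
# if the "in" OSD count, the "up" OSD count and the number of fully ready OSD
# pods all reach the required minimum.
enough_healthy_osds() {
  local in_osd="$1" up_osd="$2" healthy_pods="$3" min_required="$4"
  [[ $in_osd -ge $min_required && $up_osd -ge $min_required && $healthy_pods -ge $min_required ]]
}

# Hypothetical helper mirroring `cut -d'-' -f'1-4'`: keeps the first four
# dash-separated fields of an OSD pod name, i.e. rook-ceph-osd-<id>.
osd_deployment_from_pod() {
  printf '%s\n' "$1" | cut -d'-' -f'1-4'
}

osd_deployment_from_pod "rook-ceph-osd-1-7d8e9f0a1-fghij"

# 2 OSDs in, 2 up, but only 1 healthy pod while 2 are required.
if enough_healthy_osds 2 2 1 2; then
  echo "cleanup allowed"
else
  echo "cleanup skipped"
fi
```

With these inputs the sketch prints `rook-ceph-osd-1` and then `cleanup skipped`, because only one OSD pod is healthy while two are required.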

After fixing the crashing OSD, check whether the PGs are recovering by running the following command (the is_ceph_pg_active_clean function defined earlier must be available in your current shell session):

is_ceph_pg_active_clean
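Because PG recovery can take a while, the check usually has to be repeated. A minimal sketch of doing that automatically, assuming a hypothetical helper named `retry_until_success` that reruns any command until it exits with status 0 or an attempt budget runs out. The commented-out call at the end assumes `is_ceph_pg_active_clean` is defined in the current shell; 60 attempts 30 seconds apart roughly matches the 30-minute wait mentioned in the note above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a check command repeatedly until it succeeds or
# the attempt budget is used up. Returns 0 on success, 1 if all attempts fail.
retry_until_success() {
  local max_attempts="$1"
  local interval_seconds="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed" >&2
    # Do not sleep after the final attempt.
    if (( attempt < max_attempts )); then
      sleep "$interval_seconds"
    fi
  done
  return 1
}

# Example against a real cluster (assumes is_ceph_pg_active_clean is defined):
# retry_until_success 60 30 is_ceph_pg_active_clean
```

Taking the command as trailing arguments keeps the helper generic, so the same loop can also rerun the OSD pod check.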
© 2005–2024 UiPath. All rights reserved.