Upgrade fails due to unhealthy Ceph

Description

When you try to upgrade to a new Automation Suite version, you might see the following error message:

Ceph objectstore is not completely healthy at the moment. Inner exception - Timeout waiting for all PGs to become active+clean

Solution

To fix this issue, first check whether the OSD pods are running and healthy by running the following command:

```
kubectl -n rook-ceph get pod -l app=rook-ceph-osd --no-headers | grep -P '([0-9])/\1' -v
```
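The filter relies on a PCRE back-reference: ([0-9])/\1 matches a READY value such as 1/1, where the number of ready containers equals the total, and -v inverts the match so that only pods that are not fully ready remain. The pattern compares a single digit on each side of the slash. The following minimal sketch applies the same filter to hypothetical sample output; the helper name filter_not_ready_pods and the pod names are made up for illustration, and, like the original command, it assumes GNU grep with -P support.

```shell
#!/usr/bin/env bash
# Same filter as the OSD check: print only lines whose READY value is not N/N.
# Hypothetical helper name; reads `kubectl get pod --no-headers` output on stdin.
filter_not_ready_pods() {
  grep -P '([0-9])/\1' -v
}

# Hypothetical pod listing (names, ages, and restart counts are invented).
sample_pods='rook-ceph-osd-0-5d8f7c9b4-abcde   1/1   Running            0    3d
rook-ceph-osd-1-7f6b8d9c5-fghij   0/1   CrashLoopBackOff   12   3d
rook-ceph-osd-2-6c9d7e8f1-klmno   0/1   Init:0/4           0    3d'

# osd-0 (1/1) is filtered out; osd-1 and osd-2 (0/1) are printed.
printf '%s\n' "${sample_pods}" | filter_not_ready_pods
```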

If the OSD pod check returns no pods, check whether the Ceph placement groups (PGs) are recovering by running the following command:
```
function is_ceph_pg_active_clean() {
  local return_code=1
  if kubectl -n rook-ceph exec  deploy/rook-ceph-tools -- ceph status --format json | jq '. as $root | ($root | .pgmap.num_pgs) as $total_pgs | try ( ($root | .pgmap.pgs_by_state[] | select(.state_name == "active+clean").count)  // 0) as $active_pgs | if $total_pgs == $active_pgs then true else false end' | grep -q 'true';then
    return_code=0
  fi
  [[ $return_code -eq 0 ]] && echo "All Ceph placement groups (PGs) are active+clean"
  if [[ $return_code -ne 0 ]]; then
    echo "Not all Ceph placement groups (PGs) are active+clean. Please wait for PGs to become active+clean"
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg dump --format json | jq -r '.pg_map.pg_stats[] | select(.state!="active+clean") | [.pgid, .state] | @tsv'
  fi
  return "${return_code}"
}
# Execute the function multiple times to get updated ceph PG status
is_ceph_pg_active_clean
```
Note: If none of the affected Ceph PGs recover even after waiting for more than 30 minutes, raise a ticket with UiPath® Support.
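The health check compares .pgmap.num_pgs with the count reported for the active+clean state in .pgmap.pgs_by_state. To see how that comparison behaves without a cluster, the sketch below applies a simplified version of it to hypothetical, trimmed ceph status --format json documents; the numbers are invented and the helper name pgs_all_active_clean is made up. Like the function above, it assumes jq is installed.

```shell
#!/usr/bin/env bash
# Simplified form of the comparison in is_ceph_pg_active_clean (hypothetical
# helper): prints true only when every PG is reported as "active+clean".
pgs_all_active_clean() {
  jq '(.pgmap.num_pgs) as $total
      | ([.pgmap.pgs_by_state[]? | select(.state_name == "active+clean") | .count] | add // 0) as $clean
      | $total == $clean'
}

# Hypothetical, trimmed `ceph status --format json` payloads.
degraded='{"pgmap":{"num_pgs":3,"pgs_by_state":[{"state_name":"active+clean","count":2},{"state_name":"active+undersized+degraded","count":1}]}}'
healthy='{"pgmap":{"num_pgs":3,"pgs_by_state":[{"state_name":"active+clean","count":3}]}}'

echo "${degraded}" | pgs_all_active_clean   # prints false
echo "${healthy}" | pgs_all_active_clean    # prints true
```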

If the OSD pod check returns pods, you must first fix the issue affecting them:

If a pod is stuck in Init:0/4, the cause could be a PV provider (Longhorn) issue. To debug this issue, raise a ticket with UiPath® Support.

If a pod is in CrashLoopBackOff, fix the issue by running the following command:

```
function cleanup_crashing_osd() {
    local restart_operator="false"
    local min_required_healthy_osd=1
    local in_osd
    local up_osd
    local healthy_osd_pod_count
    local crashed_osd_deploy
    local crashed_pvc_name

    # If the bucket data pool is not replicated, require at least two healthy OSDs
    if ! kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd pool ls detail  | grep 'rook-ceph.rgw.buckets.data' | grep -q 'replicated'; then
        min_required_healthy_osd=2
    fi
    in_osd=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status   -f json  | jq -r '.osdmap.num_in_osds')
    up_osd=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status   -f json  | jq -r '.osdmap.num_up_osds')
    healthy_osd_pod_count=$(kubectl -n rook-ceph get pod -l app=rook-ceph-osd | grep 'Running' | grep -c -P '([0-9])/\1')
    # Do nothing unless enough OSDs are in, up, and running
    if ! [[ $in_osd -ge $min_required_healthy_osd && $up_osd -ge $min_required_healthy_osd && $healthy_osd_pod_count -ge $min_required_healthy_osd ]]; then
        return
    fi
    # Derive the deployment name (rook-ceph-osd-<id>) from each crash-looping pod name
    for crashed_osd_deploy in $(kubectl -n rook-ceph get pod -l app=rook-ceph-osd  | grep 'CrashLoopBackOff' | cut -d'-' -f'1-4') ; do
        # Only clean up OSDs whose logs reference a crash dump
        if kubectl -n rook-ceph logs "deployment/${crashed_osd_deploy}" | grep -q '/crash/'; then
            echo "Found crashing OSD deployment: '${crashed_osd_deploy}'"
            crashed_pvc_name=$(kubectl -n rook-ceph get deployment "${crashed_osd_deploy}" -o json | jq -r '.metadata.labels["ceph.rook.io/pvc"]')
            echo "Removing crashing OSD deployment: '${crashed_osd_deploy}' and PVC: '${crashed_pvc_name}'"
            timeout 60  kubectl -n rook-ceph delete deployment "${crashed_osd_deploy}" || kubectl -n rook-ceph delete deployment "${crashed_osd_deploy}" --force --grace-period=0
            timeout 100 kubectl -n rook-ceph delete pvc "${crashed_pvc_name}" || kubectl -n rook-ceph delete pvc "${crashed_pvc_name}" --force --grace-period=0
            restart_operator="true"
        fi
    done
    # Restart the Rook operator so it recreates the removed OSDs
    if [[ $restart_operator == "true" ]]; then
        kubectl -n rook-ceph rollout restart deployment/rook-ceph-operator
    fi
    return 0
}
# Execute the cleanup function
cleanup_crashing_osd
```
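Because the cleanup deletes deployments and PVCs, it can help to preview which OSD deployments the loop would target before running it. The function derives each deployment name by keeping the first four dash-separated fields of the pod name, so a pod such as rook-ceph-osd-1-7f6b8d9c5-fghij maps to rook-ceph-osd-1. The following is a minimal dry-run sketch of that derivation; the helper name preview_crashing_osd_deployments and the sample pod names are hypothetical, and unlike the real function it does not check the logs for /crash/.

```shell
#!/usr/bin/env bash
# Dry run (hypothetical helper): print the deployment names the cleanup loop
# would consider, without deleting anything. Reads `kubectl get pod` output on stdin.
preview_crashing_osd_deployments() {
  grep 'CrashLoopBackOff' | cut -d'-' -f'1-4'
}

# Hypothetical pod listing; names are invented for illustration.
sample_pods='rook-ceph-osd-0-5d8f7c9b4-abcde   1/1   Running            0    3d
rook-ceph-osd-1-7f6b8d9c5-fghij   0/1   CrashLoopBackOff   12   3d'

printf '%s\n' "${sample_pods}" | preview_crashing_osd_deployments
```

On a live cluster, you would pipe kubectl -n rook-ceph get pod -l app=rook-ceph-osd into the helper instead of the sample text.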

After you fix the crashing OSD, check whether the PGs are recovering by running the following command:

```
is_ceph_pg_active_clean
```
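Because is_ceph_pg_active_clean reports a single snapshot, you may need to run it several times while the PGs recover. The following is a minimal polling sketch, assuming the function is already defined in the current shell. The helper name wait_for_check, the 30-second interval, and the 1800-second (30-minute) limit are assumptions, the limit chosen to match the note about waiting more than 30 minutes before contacting support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a check command until it succeeds or the
# deadline passes. Returns 0 on success, 1 on timeout.
wait_for_check() {
  local timeout_seconds="$1"
  local interval_seconds="$2"
  shift 2
  local deadline=$(( SECONDS + timeout_seconds ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "Check '$*' did not succeed within ${timeout_seconds}s" >&2
      return 1
    fi
    sleep "${interval_seconds}"
  done
  return 0
}

# Example: poll every 30 seconds for up to 30 minutes.
# wait_for_check 1800 30 is_ceph_pg_active_clean
```

If the call returns 1, the PGs did not become active+clean within the limit, which is the point at which the note suggests raising a ticket.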
