Ceph-rook metrics missing from monitoring dashboards
Automation Suite Installation Guide
Last updated November 4, 2024
If Ceph-rook metrics are missing from the monitoring dashboards, the CephMgrIsAbsent alert can usually be cleared by failing the active Ceph manager over to the standby manager "a". To do so, run the following script:
#!/bin/bash
set -euo pipefail

# Make kubectl available
export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
export PATH="$PATH:/var/lib/rancher/rke2/bin"

function clearCephMgrAlert() {
  local ceph_status
  local active_ceph_mgr

  # Check the current rook-ceph cluster status
  ceph_status=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json-pretty | jq -r '.health.status')
  if [[ "${ceph_status}" != "HEALTH_OK" ]]; then
    echo "Error: Your rook-ceph cluster is not healthy. Please review your environment."
    return 1
  fi

  # Find the currently active mgr; fail over only if it is not "a".
  # Note: with "set -e" in effect, a failed kubectl command must be tested
  # directly in the "if" condition; checking $? afterwards would never run.
  active_ceph_mgr=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mgr dump | jq -r '.active_name')
  if [[ "${active_ceph_mgr}" != "a" ]]; then
    echo "Currently your active rook-ceph-mgr is ${active_ceph_mgr}. Failing over to the standby mgr a to fix the CephMgrIsAbsent alert."
    if ! kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mgr fail "${active_ceph_mgr}"; then
      echo "Error: Failed to fail over to the standby mgr a. Please manually check your ceph status."
      return 1
    fi
  fi
  echo "Your active ceph-mgr should now be failed over to a. Please wait several minutes and ensure the CephMgrIsAbsent alert is cleared."
  return 0
}

clearCephMgrAlert
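The script's safety gate can be illustrated in isolation. The sketch below runs the same `jq` extraction the script performs, but against a hardcoded sample of the JSON shape that `ceph status --format json-pretty` returns (the sample value is illustrative, not captured from a real cluster):

```shell
# Illustrative sample of ceph status output; on a live cluster this comes from:
#   kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json-pretty
sample='{"health":{"status":"HEALTH_OK"}}'

# Extract the overall health flag, as the script does with jq
status=$(echo "$sample" | jq -r '.health.status')

# The mgr failover is only attempted when the cluster reports HEALTH_OK
if [[ "$status" == "HEALTH_OK" ]]; then
  echo "cluster healthy: safe to fail over the mgr"
else
  echo "cluster not healthy: investigate before failing over"
fi
```

Failing over a Ceph manager on a degraded cluster could mask a deeper problem, which is why the script refuses to proceed on anything other than `HEALTH_OK`.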