Running the diagnostic tool
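The Automation Suite diagnostic tool runs a series of checks and produces a report on the health of the cluster. You can analyze this report to identify issues and their potential root causes. The tool helps you find common problems, such as a lost database connection or invalid or expired credentials.
The diagnostic tool is available in both uipathctl and uipathtools, which you can download on your management machine.
uipathtools is a CLI tool that contains the subset of uipathctl functionality specific to health commands. It is backward-compatible and works with any supported Automation Suite version. If you encounter any issues, we recommend trying uipathtools first.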
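The check and test commands give you a quick overview of the cluster's state without running a deep analysis:
- check relies on the ArgoCD health and sync status and does not modify any state in the cluster.
- test investigates applications, deployments, or pods, and temporarily changes the cluster state to provide these insights.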
To run health checks, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run: ./uipathctl health check
- If you use uipathtools, run: ./uipathtools health check
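If you script around the health check commands, you may want a one-line pass/fail summary and an exit code you can act on. The following bash function is a minimal sketch and is not part of uipathctl or uipathtools; the name summarize_report is hypothetical. It assumes the report is plain text in the format shown in this section's sample output: each check line starts with ✔ or ❌, possibly indented, and there are no ANSI color codes.

```shell
# Hypothetical helper, not part of uipathctl/uipathtools.
# Reads a health report on stdin, counts lines that start with a pass (✔)
# or fail (❌) mark, prints the totals, and returns 1 if any line failed.
summarize_report() {
  awk '
    { sub(/^[ \t]+/, "") }            # tolerate indented sub-checks
    index($0, "✔") == 1 { pass++ }
    index($0, "❌") == 1 { fail++ }
    END {
      printf "passed=%d failed=%d\n", pass, fail
      exit (fail ? 1 : 0)             # non-zero status if anything failed
    }'
}
```

For example: ./uipathctl health check | tee health-check.txt | summarize_report. This keeps a copy of the full report and returns a non-zero status if any check failed.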
Sample output of the generated report:
Checks run on cluster/
✔ [NOTIFICATIONSERVICE]
✔ [NOTIFICATIONSERVICE_HEALTH] Application is healthy and in sync
✔ [ACTION_CENTER]
✔ [ACTIONCENTER_HEALTH] Application is healthy and in sync
❌ [SYNC]
❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
✔ [RELOADER]
✔ [RELOADER_HEALTH] Application is healthy and in sync
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
✔ [ISTIO]
✔ [LIST_PODS] Found 2 pods for Istio
✔ [ISTIOD_EXISTS] The Istio pods are present and running version -
✔ [ISTIOD_READY] Istio pods are healthy
✔ [AIEVENTS]
✔ [AIEVENTS_HEALTH] Application is healthy and in sync
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
✔ [PLATFORM]
✔ [PLATFORM_HEALTH] Application is healthy and in sync
✔ [TASK_MINING]
✔ [TASKMINING_HEALTH] Application is healthy and in sync
✔ [LOGGING]
✔ [LOGGING_HEALTH] Application is healthy and in sync
✔ [WEBHOOK]
✔ [WEBHOOK_HEALTH] Application is healthy and in sync
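The uipathctl health check command checks the health of all components. However, you can also restrict the check to only the components you are interested in:
- To exclude components from the run, use the --excluded flag. For example, if you do not want to check the health of SQL, run uipathctl health check --excluded SQL. The command checks the health of all components except SQL.
- To include only certain components in the run, use the --included flag. For example, if you only want to check the health of DNS and object storage, run uipathctl health check --included DNS,OBJECTSTORAGE.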
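To confirm that --excluded or --included limited the run as intended, you can list only the top-level component headers in a saved report. The following is a minimal sketch with a hypothetical function name. It assumes, as in the sample output, that a component header is a status mark followed only by a bracketed name (for example, ❌ [DATASERVICE]), while sub-checks carry a trailing message.

```shell
# Hypothetical helper: prints top-level component headers such as
# "✔ [PLATFORM]" or "❌ [DATASERVICE]" from a report read on stdin.
# Sub-check lines are skipped because they carry text after the bracket.
list_components() {
  sed -E 's/^[[:space:]]+//' | grep -E '^(✔|❌) \[[^]]+\]$'
}
```

For example, after ./uipathctl health check --excluded SQL > report.txt, running list_components < report.txt shows which components were actually checked.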
Analyzing the logs
- After you run the health check, the logs show that the health check for the Data Service application failed:
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
- On further investigation, it becomes clear that the Data Service application failed because its pods, for example dataservice-runtime-8f5bb7d56-v5krg and dataservice-taskrunner-787df76c74-98h5l, are in a failed state. Looking further, you can see that the dataservice-external-storage-secret secret is missing:
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
- To fix the issue, make sure that you provided the correct object storage credentials in cluster_config.json.
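When a report contains many CANNOT_MOUNT_VOLUME lines, it can help to reduce them to the distinct missing secrets before you investigate. The following is a minimal sketch with a hypothetical function name. It assumes the secret "NAME" not found wording shown in the logs above.

```shell
# Hypothetical helper: extracts the unique names of secrets reported as
# missing, e.g. from: ... secret "dataservice-external-storage-secret" not found
missing_secrets() {
  grep -oE 'secret "[^"]+" not found' \
    | sed -E 's/^secret "(.*)" not found$/\1/' \
    | sort -u
}
```

You can then confirm that a reported secret is absent with kubectl, for example kubectl get secret dataservice-external-storage-secret -n uipath. The uipath namespace is taken from the pod names in the logs.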
To run health tests, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run: ./uipathctl health test
- If you use uipathtools, run: ./uipathtools health test
Sample output of the generated report:
Checks run on cluster/
✔ [GATEKEEPER]
✔ [CREATE_CONSTRAINT] Created test constraint
✔ [VERIFY] Constraint verified
✔ [CLEANUP] Cleaned up the test constraint
✔ [ACTION_CENTER]
✔ [CREATE_NAMESPACE] Created namespace prereqk6b72
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqk6b72
✔ [CREATE_NAMESPACE] Created namespace prereqbxjx8
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqbxjx8
✔ [CREATE_NAMESPACE] Created namespace prereq8zvw4
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8zvw4
✔ [DATASERVICE]
✔ [CREATE_NAMESPACE] Created namespace prereqxwlsb
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxwlsb
✔ [CREATE_NAMESPACE] Created namespace prereq5szsn
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq5szsn
✔ [APPS]
✔ [CREATE_NAMESPACE] Created namespace prereq9z6nb
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq9z6nb
✔ [CREATE_NAMESPACE] Created namespace prereq6v7lm
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6v7lm
✔ [CREATE_NAMESPACE] Created namespace prereqxxn5v
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxxn5v
✔ [AUTOMATION_HUB]
✔ [CREATE_NAMESPACE] Created namespace prereq4jkbt
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4jkbt
✔ [TEST_MANAGER]
✔ [CREATE_NAMESPACE] Created namespace prereqnvvpc
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqnvvpc
✔ [ORCHESTRATOR]
✔ [CREATE_NAMESPACE] Created namespace prereq8pf2f
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8pf2f
✔ [CREATE_NAMESPACE] Created namespace prereq4w4v4
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4w4v4
✔ [CREATE_NAMESPACE] Created namespace prereqkzwqg
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqkzwqg
✔ [INSIGHTS]
✔ [CREATE_NAMESPACE] Created namespace prereqqmgjc
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqqmgjc
✔ [CREATE_NAMESPACE] Created namespace prereq4vnjx
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4vnjx
✔ [CREATE_NAMESPACE] Created namespace prereqgtg9g
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgtg9g
✔ [AUTOMATION_OPS]
✔ [CREATE_NAMESPACE] Created namespace prereqgkkrz
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgkkrz
✔ [AICENTER]
✔ [CREATE_NAMESPACE] Created namespace prereqdls88
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqdls88
✔ [CREATE_NAMESPACE] Created namespace prereq6m7x9
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6m7x9
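The uipathctl health test command runs health tests on all components. However, you can also restrict the tests to only the components you are interested in:
- To exclude components from the run, use the --excluded flag. For example, if you do not want to test the health of SQL, run uipathctl health test --excluded SQL. The command tests the health of all components except SQL.
- To include only certain components in the run, use the --included flag. For example, if you only want to test the health of DNS and object storage, run uipathctl health test --included DNS,OBJECTSTORAGE.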
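If you compare the output of the check and test commands, you can see that check validates the health of the applications, while test checks routing.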
Known issues
You might receive error messages similar to the following example. You can safely ignore them; no action is required on your part.
E0621 23:32:56.426321 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.426392 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.444420 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.446150 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.513357 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
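If these ignorable messages clutter the report, you can filter them out. The following is a minimal sketch with a hypothetical function name. It drops only lines that end with the exact message shown above and passes every other line through unchanged.

```shell
# Hypothetical helper: removes the known, ignorable client-go watch errors
# ("Failed to watch *v1.Pod: context deadline exceeded") from stdin.
# grep -v exits with 1 when every line was filtered out; treat that as
# success and still propagate real grep errors (exit status 2).
filter_watch_noise() {
  grep -vE 'Failed to watch \*v1\.Pod: context deadline exceeded$' || [ $? -eq 1 ]
}
```

These messages may be written to standard error rather than standard output; this is not confirmed for uipathctl. Merging both streams makes the filter see them either way, for example ./uipathctl health check 2>&1 | filter_watch_noise.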
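The diagnose command provides in-depth insights into the state of the cluster. It helps you identify issues at various levels, such as SQL, object storage, nodes, credentials, Istio, networking, and more.
- It covers the check and test commands.
- It runs the prerequisite checks that are performed before installing Automation Suite. This validates any changes made to the environment configuration after installation that might cause issues.
- It runs on all nodes to collect node-specific issues, such as unavailable resources or network interference.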
To run diagnostic checks, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run: ./uipathctl health diagnose cluster_config.json --versions version.json
- If you use uipathtools, run: ./uipathtools health diagnose cluster_config.json --versions version.json
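Because diagnose needs both cluster_config.json and version.json, a small wrapper can fail early with a clear message and keep a timestamped copy of the report. The following is a minimal bash sketch. The function name run_diagnose and the report file naming are hypothetical; only the health diagnose invocation comes from the commands above.

```shell
# Hypothetical wrapper around "health diagnose".
# $1: path to uipathctl or uipathtools
# $2: cluster config (default: cluster_config.json)
# $3: versions file (default: version.json)
run_diagnose() {
  local tool=$1
  local config=${2:-cluster_config.json}
  local versions=${3:-version.json}
  local f report

  # Fail early with a clear message instead of a tool error.
  if [ ! -x "$tool" ]; then
    echo "not executable: $tool" >&2
    return 1
  fi
  for f in "$config" "$versions"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  report="diagnose-$(date +%Y%m%d-%H%M%S).txt"
  # Show the report and keep a copy. Without "set -o pipefail", the
  # function's exit status is that of tee, not of the tool.
  "$tool" health diagnose "$config" --versions "$versions" | tee "$report"
}
```

For example: run_diagnose ./uipathctl, or run_diagnose ./uipathtools my_config.json my_versions.json for non-default file names.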
Sample output of the generated report:
Checks run on nodes/aks-pool0-27031798-vmss000001
✔ [REDIS(PORT=6380)]
✔ [CONNECTIVITY] Successfully made Redis connection on ci-asaks4011056.redis.cache.windows.net:6380
✔ [OBJECTSTORAGE(PRODUCT=ORCHESTRATOR)]
✔ [CHECK_API] Object storage test passed for orchestrator
✔ [SQL(PRODUCT=PROCESSMINING, TYPE=ADO)]
✔ [EXECUTE_NATIVE] Successfully executed command
✔ [BUILD_CLIENT] Successfully built ADO client
✔ [CONNECT] Successfully connected ADO client to DB
✔ [DB_ROLES] SQL user has the required roles to DB
✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
✔ [DNS(FQDN=ALM.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_SUBDOMAIN] Resolved alm.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
Checks run on cluster/
✔ [NODE]
✔ [NODE_EXISTS] 12 Nodes present in the cluster
✔ [NODE_READY] All the nodes are in ready state
✔ [GATEKEEPER]
✔ [GATEKEEPER_HEALTH] Application is healthy and in sync
✔ [CREATE_CONSTRAINT] Created test constraint
✔ [VERIFY] Constraint verified
✔ [CLEANUP] Cleaned up the test constraint
✔ [LOGGING]
✔ [LOGGING_HEALTH] Application is healthy and in sync
✔ [DATASERVICE]
✔ [CREATE_NAMESPACE] Created namespace prereqctzhp
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqctzhp
✔ [ROBOTUBE]
✔ [ROBOTUBE_HEALTH] Application is healthy and in sync
✔ [AIRFLOW]
✔ [AIRFLOW_HEALTH] Application is healthy and in sync
✔ [ARGOCD]
✔ [ARGOCD_SERVER_PODS] Component argocd-server has ready Pods
✔ [ARGOCD_REPO_SERVER_PODS] Component argocd-repo-server has ready Pods
✔ [ARGOCD_APP_CONTROLLER_PODS] Component argocd-application-controller has ready Pods
✔ [ARGOCD_REDIS_PODS] Component redis-ha has ready Pods
✔ [ISTIO]
✔ [LIST_PODS] Found 2 pods for Istio
✔ [ISTIOD_EXISTS] The Istio pods are present and running version -
✔ [ISTIOD_READY] Istio pods are healthy
✔ [AICENTER]
✔ [AICENTER_HEALTH] Application is healthy and in sync
✔ [CREATE_NAMESPACE] Created namespace prereqn6sqn
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqn6sqn
Checks run on local/
✔ [CONNECTIVITY]
✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-4rffj's IP 10.240.1.86 on aks-pool0-27031798-vmss000002
✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-8c6t5's IP 10.240.3.57 on aks-pool3-27031798-vmss000000
✔ [POD_TO_A] Scenario: http check between two random pods completed successfully
✔ [POD_TO_B_MULTI_NODE_CLUSTERIP] Scenario: http check between from pod to a multinode ClusterIP completed successfully
✔ [POD_TO_B_MULTI_NODE_HEADLESS] Scenario: http check between from pod to a multinode ClusterIP without a clusterIP set completed successfully
✔ [POD_TO_B_INTRA_NODE_CLUSTERIP] Scenario: http check between from two pods colocated on the same node via ClusterIP completed successfully
✔ [INGRESS]
✔ [INGRESS_GATEWAY_FOUND] Found service istio-ingressgateway in the cluster
✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on http://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com
✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on https://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com:443
✔ [OSS(COMPONENT=MONITORING)]
✔ [OSS(component=monitoring)] Check for component monitoring passed
✔ [OSS(COMPONENT=GATEKEEPER)]
✔ [OSS(component=gatekeeper)] Check for component gatekeeper passed
✔ [STORAGECLASS(NAME=STORAGE_CLASS_SINGLE_REPLICA)]
✔ [STORAGE_CLASS_EXISTS] Storage class azurefile-csi exists
✔ [LIST_NODES] Listed 12 nodes
✔ [CREATE_NAMESPACE] Created namespace prereqhcpkc
✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-5n272
✔ [LIST_PODS] Listed 1 pods on node aks-pool3-27031798-vmss000001
✔ [POD_RUNNING] Found one pod running on node aks-pool3-27031798-vmss000001
✔ [REGISTRY]
✔ [CONNECTIVITY] Successfully made Registry connection on sfbrdevhelmweacr.azurecr.io
✔ [NETWORK-POLICIES]
✔ [CREATE_NAMESPACE] Namespace prereqw4t9b created
✔ [CREATE_EGRESS_NETWORK_POLICY] Created the egress network policies allow-coredns-egress and block-external-traffic
✔ [CREATE_INGRESS_NETWORK_POLICY] Created the ingress network policy: block-echo-server-ingress
✔ [CREATE_SERVICE] Service echo-server-svc created
✔ [STORAGECLASS(NAME=STORAGE_CLASS)]
✔ [STORAGE_CLASS_EXISTS] Storage class managed-premium exists
✔ [LIST_NODES] Listed 12 nodes
✔ [CREATE_NAMESPACE] Created namespace prereqgjhcb
✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-nm9th
✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000003
✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000003
✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000001
✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000001
✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_TOP_DOMAIN] Resolved ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
✔ [NODE(CPU >= 8, RAM >= 16GI)]
✔ [LIST_NODES] Listed 12 nodes
✔ [AT_LEAST_ONE_NODE] At least one node found
✔ [CPU_USAGE] Node aks-pool0-27031798-vmss000000 has 12.50% CPU usage
✔ [MEMORY_USAGE] Node aks-pool0-27031798-vmss000000 has 38.27% memory usage
✔ [POD_USAGE] Node aks-pool0-27031798-vmss000000 has 40.00% of pods in use. Number of pods: 40.00 max allowed: 100.00
✔ [OSS(COMPONENT=CERT-MANAGER)]
✔ [OSS(component=cert-manager)] Check for component cert-manager passed
✔ [RESOURCE]
✔ [Capacity] Automation suite already installed on cluster
✔ [OSS(COMPONENT=LOGGING)]
✔ [OSS(component=logging)] Check for component logging passed
✔ [GPU(PRODUCT=DOCUMENTUNDERSTANDING)]
✔ [BASIC_GPU_SUCCESS] Was able to start a CUDA job on a GPU node
Checks run on cluster/
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
❌ [ISTIO]
✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
✔ [ISTIO_SERVICEMESH_VALIDATION_GET_REGISTRY_FQDN] Successfully retrieved registry url
✔ [ISTIO_SERVICEMESH_VALIDATION_GET_CLUSTER_FQDN] Successfully retrieved cluster fqdn
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_DEPLOYMENT] Successfully created the test deployment istio-validation-deployment
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_SERVICE] Successfully created the test service istio-validation-service
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_GATEWAY] Successfully created the test gateway istio-validation-gateway
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_VIRTUALSERVICE] Successfully created the test virtual service istio-validation-vs
✔ [ISTIO_SERVICEMESH_VALIDATION_URL_ACCESS] Success exposing the service via servicemesh
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/ah-tenant-service-sync-insights-data-job-28122960-p6rzg cannot mount volume: MountVolume.SetUp failed for volume "ah-insights-secrets" : failed to sync secret cache: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [POD_UNHEALTHY] Latest event for pod uipath/du-documentmanager-dm-maintenance-cron-28122960-4sm5z: Error: failed to sync configmap cache: timed out waiting for the condition
❌ [SYNC]
❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
The `diagnose` command runs at multiple levels, such as infrastructure, network, storage, pods, and DNS.
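Analyzing the logs
In the preceding logs, you may notice two potential issues:
- An Istio misconfiguration, which can cause problems when accessing the Document Understanding platform: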
❌ [ISTIO]
✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
- Data Service is unavailable. See the Ceph entries in the following sample:
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
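To pull root-cause hints like these out of a long report, you could save the output to a file and scan it. The following Python sketch assumes the message wording shown in the samples above (`secret "..." not found` and `Referenced host:port not found: "..."`); the function name `root_cause_hints` is hypothetical, and other tool versions may phrase these messages differently.

```python
import re
from typing import Dict, Set

# Message patterns as they appear in the sample report; other tool
# versions may word these messages differently (assumption).
MISSING_SECRET = re.compile(r'secret "([^"]+)" not found')
MISSING_HOST = re.compile(r'Referenced host:port not found: "([^"]+)"')


def root_cause_hints(report: str) -> Dict[str, Set[str]]:
    """Collect missing secrets and unresolved host:port references from failed checks."""
    hints: Dict[str, Set[str]] = {"missing_secrets": set(), "missing_hosts": set()}
    for line in report.splitlines():
        # Only failed checks (marked with ❌) are of interest here.
        if "❌" not in line:
            continue
        hints["missing_secrets"].update(MISSING_SECRET.findall(line))
        hints["missing_hosts"].update(MISSING_HOST.findall(line))
    return hints


if __name__ == "__main__":
    sample = "\n".join([
        '❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l '
        'cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : '
        'secret "dataservice-external-storage-secret" not found',
        '❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] '
        '(VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"',
    ])
    print(root_cause_hints(sample))
```

Applied to the full sample, this surfaces `dataservice-external-storage-secret` as a missing secret and `aistorage:5000` as an unresolved host, which are natural starting points for the Data Service and Istio investigations.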
Known issues
You may receive an error message similar to the following example. You can ignore it, as no action is required on your part.
I0622 01:31:28.917107 28815 request.go:601] Waited for 1.017599292s due to client-side throttling, not priority and fairness, request: GET:https://ci-asaks4011056-fwwpyxm7.hcp.westeurope.azmk8s.io:443/apis/networking.istio.io/v1alpha3
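If you capture the tool output for later analysis, you may want to strip this known, harmless message first. A minimal Python sketch, assuming the message keeps the klog layout shown above (an `I` severity letter followed by the month and day); `drop_throttling_noise` is a hypothetical helper name:

```python
import re

# klog lines start with a severity letter and MMDD, e.g. "I0622".
# Only informational (I) lines about client-side throttling are dropped.
THROTTLING = re.compile(r"^I\d{4} .*client-side throttling")


def drop_throttling_noise(output: str) -> str:
    """Return the captured output without the known throttling messages."""
    kept = [line for line in output.splitlines() if not THROTTLING.match(line)]
    return "\n".join(kept)
```

Warnings and errors (lines starting with `W` or `E`) are deliberately left untouched, so only the message described above is filtered out.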
All commands (`check`, `test`, and `diagnose`) support additional filtering and output formats.
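Filtering
| Filter | Description | Usage |
|---|---|---|
| | Comma-separated list of services to include in the validation | This command runs diagnostics only for Istio and Insights. |
| | Comma-separated list of services to exclude from the validation | This command runs the tests across the entire cluster, except for Istio and Insights. |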
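Output format
The supported output formats are `json`, `yaml`, `text`, and `junit`. You can pass any of these values to any command through the `--output` flag. These formats are handy when you want to use the tools as the foundation for your own troubleshooting framework.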
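As an illustration of building on top of these reports, the following Python sketch turns a saved `text` report into a list of failed checks, for example to fail a CI job. It assumes the layout of the sample reports in this section (a status mark, a bracketed ID, then an optional message, with message-less lines acting as category headers); the `Failure` type and `failed_checks` function are hypothetical. The `json` or `junit` formats are likely more robust for machine processing, but their schema is not described here, so the sketch sticks to the documented text layout.

```python
import re
from typing import List, NamedTuple

# Status mark, bracketed ID, optional message (layout of the sample reports).
LINE = re.compile(r"^\s*(✔|❌)\s*\[([^\]]+)\]\s*(.*)$")


class Failure(NamedTuple):
    category: str
    check: str
    message: str


def failed_checks(report: str) -> List[Failure]:
    """Return every failed check, tagged with the category header it appears under."""
    failures: List[Failure] = []
    category = ""
    for raw in report.splitlines():
        match = LINE.match(raw)
        if not match:
            continue  # free text such as "Checks run on cluster/"
        mark, check_id, message = match.groups()
        if not message:
            category = check_id  # header line such as "[ISTIO]"
            continue
        if mark == "❌":
            failures.append(Failure(category, check_id, message))
    return failures
```

A wrapper script could then exit with a non-zero status whenever `failed_checks(...)` returns a non-empty list.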
Usage examples
| Usage | Sample output |
|---|---|
If these services are down, the node is down. Try restarting the service with the `systemctl restart <service-name>` command, which should resolve the issue.
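A minimal sketch of that check-then-restart flow in Python, assuming a systemd host and root privileges. The service name is a placeholder, just like `<service-name>` above, and `ensure_service_running` is a hypothetical helper; the `run` parameter only exists so that the command runner can be swapped out in tests.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_service_running(service: str, run: Runner = _run) -> bool:
    """Restart a systemd service if it is not active; return True if it ends up active."""
    # "systemctl is-active --quiet" exits with 0 only when the unit is active.
    if run(["systemctl", "is-active", "--quiet", service]) == 0:
        return True
    run(["systemctl", "restart", service])
    return run(["systemctl", "is-active", "--quiet", service]) == 0
```

If the function still returns `False` after the restart, inspect the service logs before retrying.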
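We check the size of the directory mounted at /var/lib, because Kubernetes uses it to store its data. If the directory fills up, various issues can occur. To prevent them, increase the size of the directory when it runs low on space.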
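To see how full the filesystem holding /var/lib is, you could use a check like the following Python sketch. The 80% threshold is an illustrative assumption, not a documented Automation Suite limit.

```python
import shutil


def disk_usage_percent(path: str = "/var/lib") -> float:
    """Return how full the filesystem that holds `path` is, in percent."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return usage.used / usage.total * 100


if __name__ == "__main__":
    # 80% is an illustrative threshold, not a documented Automation Suite limit.
    percent = disk_usage_percent()
    advice = "consider increasing its size" if percent >= 80 else "OK"
    print(f"/var/lib is {percent:.1f}% full: {advice}")
```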
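For every node, we indicate whether it is under disk pressure or memory pressure. If it is, workloads on that node may start to experience problems. Check whether any other processes are consuming resources on that node and, if so, remove them.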
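Disk and memory pressure are reported as standard Kubernetes node conditions (`DiskPressure` and `MemoryPressure`). The following Python sketch lists the nodes where either condition is `True`, assuming `kubectl` is installed and configured for the cluster; the helper names `nodes_under_pressure` and `load_nodes` are hypothetical.

```python
import json
import subprocess
from typing import Dict, List

PRESSURE_TYPES = {"DiskPressure", "MemoryPressure"}


def nodes_under_pressure(node_list: dict) -> Dict[str, List[str]]:
    """Map each node name to the pressure conditions currently reported as "True"."""
    result: Dict[str, List[str]] = {}
    for node in node_list.get("items", []):
        active = [
            cond["type"]
            for cond in node.get("status", {}).get("conditions", [])
            if cond.get("type") in PRESSURE_TYPES and cond.get("status") == "True"
        ]
        if active:
            result[node["metadata"]["name"]] = active
    return result


def load_nodes() -> dict:
    """Fetch the NodeList as JSON; requires kubectl access to the cluster."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)
```

On a machine with cluster access, `print(nodes_under_pressure(load_nodes()))` prints an empty dictionary when no node is under pressure.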
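We use Ceph as the S3 object store for logs and files from various applications. You can check the status of its services; if they are down, you may have to restart them. Also check whether Ceph's disk usage has reached capacity.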
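One way to check whether Ceph is running out of space is to read the cluster-wide stats from `ceph df --format json`. The Python sketch below assumes the `stats.total_bytes` and `stats.total_used_raw_bytes` fields; verify them against the output of your Ceph version. How you reach the `ceph` CLI inside your cluster is not covered here, and `ceph_raw_usage_percent` is a hypothetical helper.

```python
import json


def ceph_raw_usage_percent(ceph_df_json: str) -> float:
    """Return the cluster-wide raw usage reported by `ceph df --format json`, in percent."""
    stats = json.loads(ceph_df_json)["stats"]
    # Assumed field names; check them against the JSON your Ceph version emits.
    return stats["total_used_raw_bytes"] / stats["total_bytes"] * 100


if __name__ == "__main__":
    example = '{"stats": {"total_bytes": 1000, "total_used_raw_bytes": 250}}'  # made-up numbers
    print(f"Ceph raw usage: {ceph_raw_usage_percent(example):.1f}%")
```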
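Because some services require GPUs on certain nodes in the cluster, the Automation Suite diagnostic tool checks for GPU nodes and prints the number of such nodes. If you expect GPU nodes but they do not appear here, something went wrong during the GPU setup.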
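To cross-check the reported count yourself, you could look for nodes that advertise a GPU resource. This Python sketch assumes GPUs are exposed through the NVIDIA device plugin under the `nvidia.com/gpu` resource name and takes the parsed output of `kubectl get nodes -o json`; `gpu_node_names` is a hypothetical helper.

```python
from typing import List


def gpu_node_names(node_list: dict, resource: str = "nvidia.com/gpu") -> List[str]:
    """Return the names of nodes that advertise at least one allocatable GPU."""
    names: List[str] = []
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources such as GPUs are reported as integer strings, e.g. "1".
        if int(allocatable.get(resource, "0")) > 0:
            names.append(node["metadata"]["name"])
    return names
```

`len(gpu_node_names(...))` then gives a count you can compare with the number printed by the diagnostic tool.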