Automation Suite on Linux Installation Guide
Version 2023.10
Last updated October 4, 2024

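Running the diagnostics tool

The Automation Suite Diagnostics Tool runs a series of checks to generate a report on the health of your cluster, which you can analyze to identify issues and their potential root causes. The tool helps you find common problems, such as a lost database connection or invalid or expired credentials.

The Automation Suite Diagnostics Tool is available in both uipathctl and uipathtools, which you can download on your management machine.
uipathtools is a CLI tool that contains a subset of the uipathctl functionality, specific to the health commands. The tool is backward compatible and works with any supported Automation Suite version. If you run into any issue, we recommend using uipathtools first.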

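Quick validations

The check and test commands give you a quick overview of the state of the cluster without running a deep analysis.
  • check relies on the ArgoCD health and sync status and does not modify any state in the cluster.
  • test investigates the applications, deployments, or pods, and temporarily alters the state of the cluster to provide these insights.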

Health checks

To run the health checks, use one of the following commands, depending on the CLI tool you use:

  • If you use uipathctl, run:
    ./uipathctl health check
  • If you use uipathtools, run:
    ./uipathtools health check

Example output of the generated report:

Checks run on cluster/
 ✔ [NOTIFICATIONSERVICE]
    ✔ [NOTIFICATIONSERVICE_HEALTH] Application is healthy and in sync
 ✔ [ACTION_CENTER]
    ✔ [ACTIONCENTER_HEALTH] Application is healthy and in sync
 ❌ [SYNC]
    ❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
 ✔ [RELOADER]
    ✔ [RELOADER_HEALTH] Application is healthy and in sync
 ❌ [POD]
    ✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
 ✔ [ISTIO]
    ✔ [LIST_PODS] Found 2 pods for Istio
    ✔ [ISTIOD_EXISTS] The Istio pods are present and running version - 
    ✔ [ISTIOD_READY] Istio pods are healthy
 ✔ [AIEVENTS]
    ✔ [AIEVENTS_HEALTH] Application is healthy and in sync
 ❌ [DATASERVICE]
    ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
 ✔ [PLATFORM]
    ✔ [PLATFORM_HEALTH] Application is healthy and in sync
 ✔ [TASK_MINING]
    ✔ [TASKMINING_HEALTH] Application is healthy and in sync
 ✔ [LOGGING]
    ✔ [LOGGING_HEALTH] Application is healthy and in sync
 ✔ [WEBHOOK]
    ✔ [WEBHOOK_HEALTH] Application is healthy and in sync
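
When a report is long, it can help to list only the failed sub-checks. The following is a minimal sketch, assuming the report has been saved as text in the layout shown above (a component line followed by indented sub-check lines). The function name failed_checks and the sample string are illustrative and are not part of uipathctl or uipathtools.

```python
import re

# Matches report lines such as " ❌ [POD]" or "    ✔ [LIST_NAMESPACES] Retrieved ...".
# Group 1 is the status mark, group 2 the bracketed tag, group 3 the message (may be empty).
LINE_RE = re.compile(r"^\s*([✔❌])\s+(\[.*?\])\s*(.*)$")


def failed_checks(report):
    """Return a list of (component, check, message) for every failed sub-check."""
    failures = []
    component = None
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # for example the "Checks run on cluster/" header
        mark, tag, message = match.groups()
        if not message:
            # Assumption: a line with a tag and no message is a component header.
            component = tag
            continue
        if mark == "❌":
            failures.append((component, tag, message))
    return failures


# Hypothetical excerpt copied from the example report.
SAMPLE = """Checks run on cluster/
 ✔ [RELOADER]
    ✔ [RELOADER_HEALTH] Application is healthy and in sync
 ❌ [DATASERVICE]
    ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
"""

for component, check, message in failed_checks(SAMPLE):
    print(component, check, message)
```

The sketch only reads text. It does not call the CLI, so it can be run on a report copied from any machine.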

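By default, the uipathctl health check command checks the health of all components. However, you can also restrict the check to the components you are interested in:
  • To exclude components from the execution, use the --excluded flag. For example, if you do not want to check the health of SQL, run uipathctl health check --excluded SQL. The command checks the health of all components except SQL.
  • To include only certain components in the execution, use the --included flag. For example, if you only want to check the health of DNS and object storage, run uipathctl health check --included DNS,OBJECTSTORAGE.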
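
If you script these runs, the flags can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not execute anything. The helper name health_command is hypothetical. Passing several excluded components as one comma-separated value is an assumption based on the --included example above.

```python
def health_command(action, included=None, excluded=None, tool="./uipathctl"):
    """Build the argument list for a filtered health check or health test run."""
    if action not in ("check", "test"):
        raise ValueError(f"unsupported action: {action!r}")
    command = [tool, "health", action]
    if included:
        # Several components are joined into one comma-separated value,
        # as in the documented example "--included DNS,OBJECTSTORAGE".
        command += ["--included", ",".join(included)]
    if excluded:
        # Assumption: --excluded accepts the same comma-separated form.
        command += ["--excluded", ",".join(excluded)]
    return command


print(" ".join(health_command("check", excluded=["SQL"])))
print(" ".join(health_command("check", included=["DNS", "OBJECTSTORAGE"])))
```

The returned list can be handed to subprocess.run on the management machine. Whether a given component name is accepted depends on the CLI version you have installed.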

Analyzing the logs

  1. After running the health check, the logs show that the health check failed for the Data Service application.
    ❌ [DATASERVICE]
        ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
  2. Further investigation shows that the Data Service application is failing because the dataservice-runtime-8f5bb7d56-v5krg and dataservice-taskrunner-787df76c74-98h5l pods are in a failed state. The pod messages show the cause: the dataservice-external-storage-secret secret is missing.
    ❌ [POD]
        ✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
        ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
        ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
        ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
  3. To fix the issue, make sure you have provided the correct credentials for the object storage in cluster_config.json.
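
The investigation in step 2 can be partly automated by collecting the secret names from the CANNOT_MOUNT_VOLUME messages. The following is a minimal sketch, assuming the message wording matches the sample output above. The function name missing_secrets is illustrative.

```python
import re

# Captures the pod name and the secret name from messages such as:
#   Pod <namespace>/<pod> cannot mount volume: ... secret "<name>" not found
MISSING_SECRET_RE = re.compile(
    r'Pod (\S+) cannot mount volume: .*secret "([^"]+)" not found'
)


def missing_secrets(report):
    """Map each missing secret name to the list of pods that reference it."""
    result = {}
    for line in report.splitlines():
        match = MISSING_SECRET_RE.search(line)
        if match:
            pod, secret = match.groups()
            result.setdefault(secret, []).append(pod)
    return result


# Two lines copied from the example report.
SAMPLE = '''
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
'''

for secret, pods in missing_secrets(SAMPLE).items():
    print(secret, "is missing for", ", ".join(pods))
```

Grouping by secret rather than by pod makes it easier to see that several failing pods share one root cause.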

Health tests

To run the health tests, use one of the following commands, depending on the CLI tool you use:

  • If you use uipathctl, run:
    ./uipathctl health test
  • If you use uipathtools, run:
    ./uipathtools health test

Example output of the generated report:

Checks run on cluster/
 ✔ [GATEKEEPER]
    ✔ [CREATE_CONSTRAINT] Created test constraint
    ✔ [VERIFY] Constraint verified
    ✔ [CLEANUP] Cleaned up the test constraint
 ✔ [ACTION_CENTER]
    ✔ [CREATE_NAMESPACE] Created namespace prereqk6b72
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqk6b72
    ✔ [CREATE_NAMESPACE] Created namespace prereqbxjx8
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqbxjx8
    ✔ [CREATE_NAMESPACE] Created namespace prereq8zvw4
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8zvw4
 ✔ [DATASERVICE]
    ✔ [CREATE_NAMESPACE] Created namespace prereqxwlsb
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxwlsb
    ✔ [CREATE_NAMESPACE] Created namespace prereq5szsn
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq5szsn
 ✔ [APPS]
    ✔ [CREATE_NAMESPACE] Created namespace prereq9z6nb
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq9z6nb
    ✔ [CREATE_NAMESPACE] Created namespace prereq6v7lm
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6v7lm
    ✔ [CREATE_NAMESPACE] Created namespace prereqxxn5v
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxxn5v
 ✔ [AUTOMATION_HUB]
    ✔ [CREATE_NAMESPACE] Created namespace prereq4jkbt
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4jkbt
 ✔ [TEST_MANAGER]
    ✔ [CREATE_NAMESPACE] Created namespace prereqnvvpc
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqnvvpc
 ✔ [ORCHESTRATOR]
    ✔ [CREATE_NAMESPACE] Created namespace prereq8pf2f
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8pf2f
    ✔ [CREATE_NAMESPACE] Created namespace prereq4w4v4
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4w4v4
    ✔ [CREATE_NAMESPACE] Created namespace prereqkzwqg
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqkzwqg
 ✔ [INSIGHTS]
    ✔ [CREATE_NAMESPACE] Created namespace prereqqmgjc
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqqmgjc
    ✔ [CREATE_NAMESPACE] Created namespace prereq4vnjx
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4vnjx
    ✔ [CREATE_NAMESPACE] Created namespace prereqgtg9g
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgtg9g
 ✔ [AUTOMATION_OPS]
    ✔ [CREATE_NAMESPACE] Created namespace prereqgkkrz
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgkkrz
 ✔ [AICENTER]
    ✔ [CREATE_NAMESPACE] Created namespace prereqdls88
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqdls88
    ✔ [CREATE_NAMESPACE] Created namespace prereq6m7x9
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6m7x9
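By default, the uipathctl health test command runs the health tests on all components. However, you can also restrict the tests to the components you are interested in:
  • To exclude components from the execution, use the --excluded flag. For example, if you do not want to test the health of SQL, run uipathctl health test --excluded SQL. The command tests all components except SQL.
  • To include only certain components in the execution, use the --included flag. For example, if you only want to test the health of DNS and object storage, run uipathctl health test --included DNS,OBJECTSTORAGE.
Note:
If you compare the output of the check and test commands for the Data Service application, you can see that the former validates the health of the application, while the latter checks the routing.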
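
To make the difference between the two commands visible, you can group the sub-check names by component and compare them. The following is a minimal sketch, assuming both reports use the layout shown in the example outputs. The function name checks_by_component and the two sample strings are illustrative.

```python
import re
from collections import defaultdict

# Group 1 is the bracketed tag, group 2 the message (empty on component lines).
LINE_RE = re.compile(r"^\s*[✔❌]\s+(\[.*?\])\s*(.*)$")


def checks_by_component(report):
    """Return {component tag: set of sub-check tags} for one report."""
    grouped = defaultdict(set)
    component = None
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        tag, message = match.groups()
        if not message:
            component = tag  # component header such as "[DATASERVICE]"
        elif component is not None:
            grouped[component].add(tag)
    return dict(grouped)


# Excerpts copied from the check and test example outputs.
CHECK_OUTPUT = """ ❌ [DATASERVICE]
    ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
"""
TEST_OUTPUT = """ ✔ [DATASERVICE]
    ✔ [CREATE_NAMESPACE] Created namespace prereqxwlsb
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxwlsb
"""

print("check:", sorted(checks_by_component(CHECK_OUTPUT)["[DATASERVICE]"]))
print("test: ", sorted(checks_by_component(TEST_OUTPUT)["[DATASERVICE]"]))
```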

Known issue

You might receive an error message similar to the following example. You can ignore it; no action is required on your part.

E0621 23:32:56.426321   24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.426392   24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.444420   24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.446150   24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.513357   24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
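
If these lines clutter a saved report, you can filter them out before reading it. The following is a minimal sketch, assuming the wording matches the example above. The function name drop_known_noise is illustrative, and the pattern only targets this specific message.

```python
import re

# Matches only the ignorable "Failed to watch *v1.Pod" reflector message.
IGNORABLE_RE = re.compile(
    r"reflector\.go:\d+\] .*Failed to watch \*v1\.Pod: context deadline exceeded"
)


def drop_known_noise(output):
    """Return the output without the ignorable reflector error lines."""
    kept = [line for line in output.splitlines() if not IGNORABLE_RE.search(line)]
    return "\n".join(kept)


# One noise line from the example above, followed by two report lines.
SAMPLE = (
    "E0621 23:32:56.426321   24470 reflector.go:138] "
    "external/io_k8s_client_go/tools/cache/reflector.go:167: "
    "Failed to watch *v1.Pod: context deadline exceeded\n"
    " ✔ [LOGGING]\n"
    "    ✔ [LOGGING_HEALTH] Application is healthy and in sync"
)

print(drop_known_noise(SAMPLE))
```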

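Deep validations

The diagnose command provides deep insights into the state of the cluster. It helps you identify issues at various levels, such as SQL, object storage, nodes, secrets, Istio, networking, and so on.
  • It covers both the check and test commands.
  • It runs the prerequisite checks that are executed before installing Automation Suite, to validate any changes made to the environment configuration after installation that might be causing issues.
  • It runs on all the nodes to catch any node-specific issues, such as unavailable resources or network interference.

To run the diagnostic checks, use one of the following commands, depending on the CLI tool you use:

  • If you use uipathctl, run:
    ./uipathctl health diagnose cluster_config.json --versions version.json
  • If you use uipathtools, run:
    ./uipathtools health diagnose cluster_config.json --versions version.json

Example output of the generated report: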

Checks run on nodes/aks-pool0-27031798-vmss000001
 ✔ [REDIS(PORT=6380)]
    ✔ [CONNECTIVITY] Successfully made Redis connection on ci-asaks4011056.redis.cache.windows.net:6380
 ✔ [OBJECTSTORAGE(PRODUCT=ORCHESTRATOR)]
    ✔ [CHECK_API] Object storage test passed for orchestrator
 ✔ [SQL(PRODUCT=PROCESSMINING, TYPE=ADO)]
    ✔ [EXECUTE_NATIVE] Successfully executed command
    ✔ [BUILD_CLIENT] Successfully built ADO client
    ✔ [CONNECT] Successfully connected ADO client to DB
    ✔ [DB_ROLES] SQL user has the required roles to DB
 ✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
    ✔ [VALIDATE_FQDN] FQDN is valid
    ✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
    ✔ [IPS_MATCH] Subdomain resolves to top domain
 ✔ [DNS(FQDN=ALM.<FQDN>)]
    ✔ [VALIDATE_FQDN] FQDN is valid
    ✔ [RESOLVE_SUBDOMAIN] Resolved alm.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
    ✔ [IPS_MATCH] Subdomain resolves to top domain
 Checks run on cluster/
 ✔ [NODE]
    ✔ [NODE_EXISTS] 12 Nodes present in the cluster
    ✔ [NODE_READY] All the nodes are in ready state
 ✔ [GATEKEEPER]
    ✔ [GATEKEEPER_HEALTH] Application is healthy and in sync
    ✔ [CREATE_CONSTRAINT] Created test constraint
    ✔ [VERIFY] Constraint verified
    ✔ [CLEANUP] Cleaned up the test constraint
 ✔ [LOGGING]
    ✔ [LOGGING_HEALTH] Application is healthy and in sync
 ✔ [DATASERVICE]
    ✔ [CREATE_NAMESPACE] Created namespace prereqctzhp
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqctzhp
 ✔ [ROBOTUBE]
    ✔ [ROBOTUBE_HEALTH] Application is healthy and in sync
 ✔ [AIRFLOW]
    ✔ [AIRFLOW_HEALTH] Application is healthy and in sync
 ✔ [ARGOCD]
    ✔ [ARGOCD_SERVER_PODS] Component argocd-server has ready Pods
    ✔ [ARGOCD_REPO_SERVER_PODS] Component argocd-repo-server has ready Pods
    ✔ [ARGOCD_APP_CONTROLLER_PODS] Component argocd-application-controller has ready Pods
    ✔ [ARGOCD_REDIS_PODS] Component redis-ha has ready Pods
 ✔ [ISTIO]
    ✔ [LIST_PODS] Found 2 pods for Istio
    ✔ [ISTIOD_EXISTS] The Istio pods are present and running version - 
    ✔ [ISTIOD_READY] Istio pods are healthy
 ✔ [AICENTER]
    ✔ [AICENTER_HEALTH] Application is healthy and in sync
    ✔ [CREATE_NAMESPACE] Created namespace prereqn6sqn
    ✔ [CREATE_POD] Created test pod curl-pod in namespace prereqn6sqn
Checks run on local/
 ✔ [CONNECTIVITY]
    ✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-4rffj's IP 10.240.1.86 on aks-pool0-27031798-vmss000002
    ✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-8c6t5's IP 10.240.3.57 on aks-pool3-27031798-vmss000000
    ✔ [POD_TO_A] Scenario: http check between two random pods completed successfully
    ✔ [POD_TO_B_MULTI_NODE_CLUSTERIP] Scenario: http check between from pod to a multinode ClusterIP completed successfully
    ✔ [POD_TO_B_MULTI_NODE_HEADLESS] Scenario: http check between from pod to a multinode ClusterIP without a clusterIP set completed successfully
    ✔ [POD_TO_B_INTRA_NODE_CLUSTERIP] Scenario: http check between from two pods colocated on the same node via ClusterIP completed successfully
 ✔ [INGRESS]
    ✔ [INGRESS_GATEWAY_FOUND] Found service istio-ingressgateway in the cluster
    ✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on http://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com
    ✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on https://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com:443
 ✔ [OSS(COMPONENT=MONITORING)]
    ✔ [OSS(component=monitoring)] Check for component monitoring passed
 ✔ [OSS(COMPONENT=GATEKEEPER)]
    ✔ [OSS(component=gatekeeper)] Check for component gatekeeper passed
 ✔ [STORAGECLASS(NAME=STORAGE_CLASS_SINGLE_REPLICA)]
    ✔ [STORAGE_CLASS_EXISTS] Storage class azurefile-csi exists
    ✔ [LIST_NODES] Listed 12 nodes
    ✔ [CREATE_NAMESPACE] Created namespace prereqhcpkc
    ✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-5n272
    ✔ [LIST_PODS] Listed 1 pods on node aks-pool3-27031798-vmss000001
    ✔ [POD_RUNNING] Found one pod running on node aks-pool3-27031798-vmss000001
 ✔ [REGISTRY]
    ✔ [CONNECTIVITY] Successfully made Registry connection on sfbrdevhelmweacr.azurecr.io
 ✔ [NETWORK-POLICIES]
    ✔ [CREATE_NAMESPACE] Namespace prereqw4t9b created
    ✔ [CREATE_EGRESS_NETWORK_POLICY] Created the egress network policies allow-coredns-egress and block-external-traffic
    ✔ [CREATE_INGRESS_NETWORK_POLICY] Created the ingress network policy: block-echo-server-ingress
    ✔ [CREATE_SERVICE] Service echo-server-svc created
 ✔ [STORAGECLASS(NAME=STORAGE_CLASS)]
    ✔ [STORAGE_CLASS_EXISTS] Storage class managed-premium exists
    ✔ [LIST_NODES] Listed 12 nodes
    ✔ [CREATE_NAMESPACE] Created namespace prereqgjhcb
    ✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-nm9th
    ✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000003
    ✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000003
    ✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000001
    ✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000001
 ✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
    ✔ [VALIDATE_FQDN] FQDN is valid
    ✔ [RESOLVE_TOP_DOMAIN] Resolved ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
    ✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
    ✔ [IPS_MATCH] Subdomain resolves to top domain
 ✔ [NODE(CPU >= 8, RAM >= 16GI)]
    ✔ [LIST_NODES] Listed 12 nodes
    ✔ [AT_LEAST_ONE_NODE] At least one node found
    ✔ [CPU_USAGE] Node aks-pool0-27031798-vmss000000 has 12.50% CPU usage
    ✔ [MEMORY_USAGE] Node aks-pool0-27031798-vmss000000 has 38.27% memory usage
    ✔ [POD_USAGE] Node aks-pool0-27031798-vmss000000 has 40.00% of pods in use. Number of pods: 40.00 max allowed: 100.00
 ✔ [OSS(COMPONENT=CERT-MANAGER)]
    ✔ [OSS(component=cert-manager)] Check for component cert-manager passed
 ✔ [RESOURCE]
    ✔ [Capacity] Automation suite already installed on cluster
 ✔ [OSS(COMPONENT=LOGGING)]
    ✔ [OSS(component=logging)] Check for component logging passed
 ✔ [GPU(PRODUCT=DOCUMENTUNDERSTANDING)]
    ✔ [BASIC_GPU_SUCCESS] Was able to start a CUDA job on a GPU node
Checks run on cluster/
 ❌ [DATASERVICE]
    ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
 ❌ [ISTIO]
    ✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
    ❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
    ✔ [ISTIO_SERVICEMESH_VALIDATION_GET_REGISTRY_FQDN] Successfully retrieved registry url
    ✔ [ISTIO_SERVICEMESH_VALIDATION_GET_CLUSTER_FQDN] Successfully retrieved cluster fqdn
    ✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_DEPLOYMENT] Successfully created the test deployment istio-validation-deployment
    ✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_SERVICE] Successfully created the test service istio-validation-service
    ✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_GATEWAY] Successfully created the test gateway istio-validation-gateway
    ✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_VIRTUALSERVICE] Successfully created the test virtual service istio-validation-vs
    ✔ [ISTIO_SERVICEMESH_VALIDATION_URL_ACCESS] Success exposing the service via servicemesh
 ❌ [POD]
    ✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/ah-tenant-service-sync-insights-data-job-28122960-p6rzg cannot mount volume: MountVolume.SetUp failed for volume "ah-insights-secrets" : failed to sync secret cache: timed out waiting for the condition
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
    ❌ [POD_UNHEALTHY] Latest event for pod uipath/du-documentmanager-dm-maintenance-cron-28122960-4sm5z: Error: failed to sync configmap cache: timed out waiting for the condition
 ❌ [SYNC]
    ❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
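Note:
The example above is abridged. The actual log contains more information. Notice that the diagnose command runs at multiple levels, such as infrastructure, networking, storage, pods, and DNS.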

Analyzing the logs

In the preceding log, you may notice two potential issues:

  • An Istio configuration error, which may cause problems when accessing the Document Understanding platform:
    ❌ [ISTIO]
        ✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
        ❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
  • Data Service is unavailable. See the Ceph references in the code sample:
    ❌ [DATASERVICE]
        ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
    ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
        ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
        ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
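
When a report is long, it can help to pull out only the failed checks and group them by component. The following Python sketch does that for the text format shown above. It assumes, based only on the sample log, that component headers are indented by at most one space and that individual checks are indented further; the function name failed_checks is hypothetical and not part of any UiPath tool.

```python
import re
from collections import defaultdict

# Matches lines such as "    ❌ [POD_UNHEALTHY] Latest event ..." and captures
# the indentation, the status mark, the bracketed name, and the message.
LINE_RE = re.compile(r"^(\s*)(✔|❌)\s+\[([^\]]+)\]\s*(.*)$")


def failed_checks(report: str) -> dict:
    """Group failed check lines under their top-level component.

    Assumption: a line indented by at most one space is a component header
    (for example " ❌ [ISTIO]"); deeper lines are checks of that component.
    """
    failures = defaultdict(list)
    component = None
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip banner lines such as "Checks run on cluster/"
        indent, mark, name, message = match.groups()
        if len(indent) <= 1:
            component = name
        elif mark == "❌" and component is not None:
            failures[component].append(f"{name}: {message}")
    return dict(failures)
```

Calling failed_checks on the saved output of a diagnose run returns a dictionary keyed by component name, which makes it easier to see that, for instance, all volume mount failures belong to the POD component.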

Known issues

You may receive an error message similar to the following example. You can ignore it; no action is required on your part.

I0622 01:31:28.917107   28815 request.go:601] Waited for 1.017599292s due to client-side throttling, not priority and fairness, request: GET:https://ci-asaks4011056-fwwpyxm7.hcp.westeurope.azmk8s.io:443/apis/networking.istio.io/v1alpha3

Additional utilities

All Automation Suite diagnostics tool commands (check, test, and diagnose) support additional filtering and output formats.

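Filtering

  • --included
    Description: A comma-separated list of services to include in the validation.
    Usage: ./uipathctl health diagnose cluster_config.json --versions.json --included ISTIO,INSIGHTS
    This command runs diagnostics only for Istio and Insights.
  • --excluded
    Description: A comma-separated list of services to exclude from the validation.
    Usage: ./uipathctl health test --excluded ISTIO,INSIGHTS
    This command runs the tests across the entire cluster, except for Istio and Insights.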

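Output format

The Automation Suite diagnostics tool can generate reports in several formats: json, yaml, text, and junit. You can pass any of these values to any command through the --output flag. These output formats are handy when you want to build your own troubleshooting framework on top of these tools.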

Usage examples

  • json
    Usage:
    ./uipathctl health check --included DATASERVICE --output json
    ./uipathtools health check --included DATASERVICE --output json
    Sample output:
    {
      "cluster/": {
        "DATASERVICE": [
          {
            "name": "DATASERVICE_HEALTH",
            "description": "Application health check failed: health status is Progressing and sync status is Synced",
            "status": "failed"
          }
        ]
      }
    }
  • yaml
    Usage:
    ./uipathctl health check --included DATASERVICE --output yaml
    ./uipathtools health check --included DATASERVICE --output yaml
    Sample output:
    ? locationType: cluster
    : DATASERVICE:
      - name: DATASERVICE_HEALTH
        description: 'Application health check failed: health status is Progressing and sync status is Synced'
        status: failed
  • text
    Usage:
    ./uipathctl health check --included DATASERVICE --output text
    ./uipathtools health check --included DATASERVICE --output text
    Sample output:
    Checks run on cluster/
     ❌ [DATASERVICE]
        ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
  • junit
    Usage:
    ./uipathctl health check --included DATASERVICE --output junit
    ./uipathtools health check --included DATASERVICE --output junit
    Sample output:
    <testsuite name="Health" tests="1" errors="0" failures="1" time="0" timestamp="2023-06-22T01:59:08.313362+05:30" hostname="">
      <testcase name="DATASERVICE_HEALTH" classname="" time="0">
        <failure message="Application health check failed: health status is Progressing and sync status is Synced" type="">
        </failure>
      </testcase>
    </testsuite>
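
As an illustration of building on these formats, the following Python sketch reads a json report and lists the failed checks. It assumes the report has the shape of the json sample in this section (location, then component, then a list of checks). The only status value that sample shows is "failed", so every other value is treated here as not failed. The function name failed_from_json is hypothetical.

```python
import json


def failed_from_json(report_json: str) -> list:
    """Return (component, check name, description) for every failed check.

    Assumption: the report maps a location such as "cluster/" to components,
    and each component to a list of checks with name, description, status.
    """
    report = json.loads(report_json)
    failed = []
    for _location, components in report.items():
        for component, checks in components.items():
            for check in checks:
                if check.get("status") == "failed":
                    failed.append(
                        (component, check["name"], check["description"])
                    )
    return failed
```

One way to use it is to redirect the output of ./uipathctl health check --output json to a file and pass the file contents to failed_from_json.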

Reading the diagnostics report

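Info logs

INFO logs, shown in green, indicate that the required checks passed. However, you should still review disk and memory usage to avoid hidden errors.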

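Warning messages

Although these messages do not indicate high risk, you may still need to correct them, because in some cases they can affect certain services.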

Error messages

You must fix the issues these messages describe, because they affect some services in the cluster.

Rke2-server or Rke2-agent service is down

If these services are down, the node is down. Try restarting the service with the systemctl restart <service-name> command, which should resolve the issue.
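
Before restarting anything, you may want to confirm which service is actually down. The following Python sketch wraps systemctl is-active for that purpose. It assumes the systemd unit names are rke2-server and rke2-agent, and that a given node runs only one of the two, so one inactive unit per node is expected. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, Optional

# Assumed systemd unit names; a node normally runs only one of them.
RKE2_UNITS = ("rke2-server", "rke2-agent")


def is_active(unit: str) -> bool:
    """Return True when `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def running_rke2_unit(probe: Callable[[str], bool] = is_active) -> Optional[str]:
    """Return the first active rke2 unit, or None when neither is running."""
    for unit in RKE2_UNITS:
        if probe(unit):
            return unit
    return None
```

If running_rke2_unit() returns None on a node, neither service is running there, and restarting the unit that belongs to that node role is the next step described above.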

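Size of the directory mounted at /var/lib

The report shows the size of the directory mounted at /var/lib, because Kubernetes uses it to store its data. If the directory fills up, various problems can occur. To prevent them, make sure to increase its size.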
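
To keep an eye on this between diagnostic runs, a small script can report how full the file system is. The Python sketch below uses shutil.disk_usage, which reports usage for the file system that contains the given path. The 80 percent threshold is an arbitrary assumption for illustration, not a UiPath recommendation, and the function names are hypothetical.

```python
import shutil


def usage_percent(path: str = "/var/lib") -> float:
    """Return the used share, in percent, of the file system holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_attention(percent: float, threshold: float = 80.0) -> bool:
    """Return True when usage is at or above the (assumed) threshold."""
    return percent >= threshold
```

For example, needs_attention(usage_percent("/var/lib")) returns True once the file system that holds /var/lib is at least 80 percent full.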

Rke2 version

The report shows the rke2 version for reference.

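Disk pressure or memory pressure

For every node, the report indicates whether it is under disk pressure or memory pressure. If it is, workloads on that node may start to have problems. Check whether any other resource-consuming processes are running on these nodes and, if so, remove them.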
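
The same information is available from the Kubernetes node conditions DiskPressure and MemoryPressure. The Python sketch below assumes you have saved the output of kubectl get nodes -o json and want a list of nodes for which either condition is "True". The function name nodes_under_pressure is hypothetical.

```python
import json

PRESSURE_TYPES = {"DiskPressure", "MemoryPressure"}


def nodes_under_pressure(nodes_json: str) -> dict:
    """Map each node name to the pressure conditions currently set to True.

    Input is expected to be the JSON printed by `kubectl get nodes -o json`.
    Nodes without active pressure conditions are omitted from the result.
    """
    result = {}
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        active = sorted(
            c["type"]
            for c in conditions
            if c["type"] in PRESSURE_TYPES and c["status"] == "True"
        )
        if active:
            result[node["metadata"]["name"]] = active
    return result
```

An empty dictionary means no node reports disk or memory pressure at the time the JSON was captured.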

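Ceph service status

We use Ceph as S3 object storage for logs and files from different applications. You can review the status of its services. If they are down, you may have to restart them. Be sure to also check whether Ceph's disk usage is full.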

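Ports 443 and 31443

We expect ports 443 and 31443 to be open on the provided hostname. The report indicates whether they are not accessible. If it does, make sure to open the corresponding ports.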
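
You can reproduce a basic version of this check from any machine that should reach the cluster. The Python sketch below attempts a TCP connection to each port. It only tests reachability from the machine where it runs, and the function names are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def unreachable_ports(host: str, ports=(443, 31443)) -> list:
    """Return the subset of ports that could not be reached on host."""
    return [port for port in ports if not port_is_open(host, port)]
```

Calling unreachable_ports with your cluster hostname returns an empty list when both ports accept connections.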

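Certificate validity

The tool checks whether the uploaded certificate is valid for the given hostname and has not expired. If the certificate does not meet these conditions, an error occurs. To prevent this, be sure to check the certificate you uploaded and change it if needed.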
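
To catch an upcoming expiration before the tool reports an error, you can read the certificate's notAfter date yourself. The Python sketch below uses the standard ssl module. The default context verifies the chain and the hostname, so the handshake fails for a certificate that is already expired, issued for another hostname, or signed by a CA the machine does not trust. The function names are hypothetical.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left until the notAfter timestamp.

    not_after uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2018 GMT". A negative result means it has passed.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with TLS and return the server certificate's notAfter field.

    Raises ssl.SSLCertVerificationError when verification fails.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(fetch_not_after(hostname)) gives the remaining validity in days for the certificate served on port 443 of that hostname.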

GPU

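Because some services require GPUs on some nodes in the cluster, the Automation Suite diagnostics tool checks whether GPU nodes are present and prints the number of such nodes. If you expect GPU nodes to be present but they are not shown here, something went wrong in the GPU setup.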

RabbitMQ and DockerRegistry

RabbitMQ and DockerRegistry are two important components used by some services. If either one fails, you need to investigate the problem and restart it.

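ArgoCD services are down

ArgoCD is our application lifecycle management (ALM) tool. If any of its services is down, other applications may be out of date or have other problems. Restoring these services is important and may require further debugging.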

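Missing or degraded ArgoCD applications

The Automation Suite diagnostics tool shows whether ArgoCD applications are missing or degraded.

  • If an application is missing, go to the ArgoCD UI and sync it.
  • If an application is degraded, further debugging is needed to investigate the error raised by ArgoCD.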
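
To see the health and sync status of every application at once, you can inspect the ArgoCD Application resources directly. The Python sketch below assumes you have saved the output of kubectl get applications.argoproj.io -n argocd -o json, and that each item carries status.health.status and status.sync.status as ArgoCD Application resources normally do. The function name unhealthy_applications is hypothetical.

```python
import json


def unhealthy_applications(apps_json: str) -> dict:
    """Map application name to (health, sync) when either one is off.

    An application is listed unless it is both Healthy and Synced.
    Missing fields are reported as "Unknown".
    """
    result = {}
    for app in json.loads(apps_json)["items"]:
        status = app.get("status", {})
        health = status.get("health", {}).get("status", "Unknown")
        sync = status.get("sync", {}).get("status", "Unknown")
        if health != "Healthy" or sync != "Synced":
            result[app["metadata"]["name"]] = (health, sync)
    return result
```

With the sample report on this page, the dataservice application would appear in the result as ("Progressing", "Synced").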
