Running the diagnostics tool
The Automation Suite diagnostics tool runs a set of checks and generates a report on the health of the cluster, which you can analyze to identify issues and their possible causes. The tool helps you find common problems, such as lost database connectivity or invalid or expired credentials.
The diagnostics tool is included in both uipathctl and uipathtools, and you can download it to your management machine.
uipathtools is a CLI tool that contains a subset of the uipathctl capabilities, specific to maintenance commands. The tool is backward-compatible and works with any supported Automation Suite version. We recommend using uipathtools as a first step if you run into any issue.
The check and test commands provide quick information about the cluster health without running a deep analysis.
- check relies on the ArgoCD health and sync status and does not modify any cluster state.
- test examines the applications, deployments, or pods, and temporarily modifies the cluster state to provide that information.
To run a health check, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./uipathctl health check
- If you use uipathtools, run:
  ./uipathtools health check
Sample output of the generated report:
Checks run on cluster/
✔ [NOTIFICATIONSERVICE]
✔ [NOTIFICATIONSERVICE_HEALTH] Application is healthy and in sync
✔ [ACTION_CENTER]
✔ [ACTIONCENTER_HEALTH] Application is healthy and in sync
❌ [SYNC]
❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
✔ [RELOADER]
✔ [RELOADER_HEALTH] Application is healthy and in sync
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
✔ [ISTIO]
✔ [LIST_PODS] Found 2 pods for Istio
✔ [ISTIOD_EXISTS] The Istio pods are present and running version -
✔ [ISTIOD_READY] Istio pods are healthy
✔ [AIEVENTS]
✔ [AIEVENTS_HEALTH] Application is healthy and in sync
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
✔ [PLATFORM]
✔ [PLATFORM_HEALTH] Application is healthy and in sync
✔ [TASK_MINING]
✔ [TASKMINING_HEALTH] Application is healthy and in sync
✔ [LOGGING]
✔ [LOGGING_HEALTH] Application is healthy and in sync
✔ [WEBHOOK]
✔ [WEBHOOK_HEALTH] Application is healthy and in sync
uipathctl health check checks the health of all components. However, it also lets you check only the components you are interested in:
- To exclude components from the run, use the --excluded flag. For example, if you do not want to check the health of SQL, run uipathctl health check --excluded SQL. The command checks the health of all components except SQL.
- To include only certain components in the run, use the --included flag. For example, if you only want to check the health of DNS and the objectstore, run uipathctl health check --included DNS,OBJECTSTORAGE.
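If you script these runs, the flag rules above can be captured in a small helper. The following Python sketch only assembles the argument list and does not call the tool. The function name build_health_command is hypothetical, and joining several component names with commas follows the --included example above; for --excluded that is an assumption. Rejecting both flags in one call is a choice of this sketch, since the text above does not say whether the tool accepts them together.

```python
from typing import List, Optional, Sequence


def build_health_command(
    subcommand: str,
    included: Optional[Sequence[str]] = None,
    excluded: Optional[Sequence[str]] = None,
    binary: str = "./uipathctl",
) -> List[str]:
    """Assemble the argument list for a `health check` or `health test` run."""
    if subcommand not in ("check", "test"):
        raise ValueError("subcommand must be 'check' or 'test'")
    if included and excluded:
        # Choice of this sketch, not documented tool behaviour.
        raise ValueError("use either included or excluded, not both")

    command = [binary, "health", subcommand]
    if included:
        # Comma-separated list, as in `--included DNS,OBJECTSTORAGE`.
        command += ["--included", ",".join(included)]
    if excluded:
        # Assumption: several excluded components are also comma-separated.
        command += ["--excluded", ",".join(excluded)]
    return command


if __name__ == "__main__":
    print(" ".join(build_health_command("check", excluded=["SQL"])))
    print(" ".join(build_health_command("check", included=["DNS", "OBJECTSTORAGE"])))
```

The returned list can be passed as-is to subprocess.run on the management machine where the binary is located.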
Analyzing the logs
- After running a health check, the logs show that the health check of the Data Service application failed.
  ❌ [DATASERVICE]
  ❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
- On further investigation, it becomes clear that the Data Service application failed because the pods dataservice-runtime-8f5bb7d56-v5krg and dataservice-taskrunner-787df76c74-98h5l are in a failed state. Looking closer, you can see that dataservice-external-storage-secret is missing.
  ❌ [POD]
  ✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
  ❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
- To fix this issue, make sure you provide the correct credentials for the objectstore in input.json.
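The manual analysis above can be approximated with a short script when the report is long. This Python sketch groups failed steps under their component. It assumes the layout seen in the sample output: a component header is a marked line with nothing after the bracket, and a step is a marked line followed by a message. That layout is inferred from the sample and is not a documented contract, and the name failed_steps is hypothetical.

```python
import re
from typing import Dict, List, Optional

# A marked report line: status symbol, [NAME], optional message.
LINE = re.compile(r"^\s*(✔|❌)\ufe0f?\s*\[([^\]]+)\]\s*(.*)$")


def failed_steps(report: str) -> Dict[str, List[str]]:
    """Map each failed component to the messages of its failed steps."""
    failures: Dict[str, List[str]] = {}
    component: Optional[str] = None
    for raw in report.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        status, name, message = match.groups()
        if not message:
            # Assumption: a line without a message is a component header.
            component = name
            if status == "❌":
                failures.setdefault(name, [])
        elif status == "❌" and component is not None:
            failures.setdefault(component, []).append(f"{name}: {message}")
    return failures


SAMPLE = """\
✔ [RELOADER]
✔ [RELOADER_HEALTH] Application is healthy and in sync
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
"""

if __name__ == "__main__":
    for component_name, messages in failed_steps(SAMPLE).items():
        print(component_name)
        for text in messages:
            print("  " + text)
```

Reading the failed POD steps next to the failed DATASERVICE entry is what points to the missing secret in the walkthrough above.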
To run a health test, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./uipathctl health test
- If you use uipathtools, run:
  ./uipathtools health test
Sample output of the generated report:
Checks run on cluster/
✔ [GATEKEEPER]
✔ [CREATE_CONSTRAINT] Created test constraint
✔ [VERIFY] Constraint verified
✔ [CLEANUP] Cleaned up the test constraint
✔ [ACTION_CENTER]
✔ [CREATE_NAMESPACE] Created namespace prereqk6b72
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqk6b72
✔ [CREATE_NAMESPACE] Created namespace prereqbxjx8
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqbxjx8
✔ [CREATE_NAMESPACE] Created namespace prereq8zvw4
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8zvw4
✔ [DATASERVICE]
✔ [CREATE_NAMESPACE] Created namespace prereqxwlsb
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxwlsb
✔ [CREATE_NAMESPACE] Created namespace prereq5szsn
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq5szsn
✔ [APPS]
✔ [CREATE_NAMESPACE] Created namespace prereq9z6nb
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq9z6nb
✔ [CREATE_NAMESPACE] Created namespace prereq6v7lm
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6v7lm
✔ [CREATE_NAMESPACE] Created namespace prereqxxn5v
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqxxn5v
✔ [AUTOMATION_HUB]
✔ [CREATE_NAMESPACE] Created namespace prereq4jkbt
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4jkbt
✔ [TEST_MANAGER]
✔ [CREATE_NAMESPACE] Created namespace prereqnvvpc
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqnvvpc
✔ [ORCHESTRATOR]
✔ [CREATE_NAMESPACE] Created namespace prereq8pf2f
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq8pf2f
✔ [CREATE_NAMESPACE] Created namespace prereq4w4v4
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4w4v4
✔ [CREATE_NAMESPACE] Created namespace prereqkzwqg
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqkzwqg
✔ [INSIGHTS]
✔ [CREATE_NAMESPACE] Created namespace prereqqmgjc
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqqmgjc
✔ [CREATE_NAMESPACE] Created namespace prereq4vnjx
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq4vnjx
✔ [CREATE_NAMESPACE] Created namespace prereqgtg9g
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgtg9g
✔ [AUTOMATION_OPS]
✔ [CREATE_NAMESPACE] Created namespace prereqgkkrz
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqgkkrz
✔ [AICENTER]
✔ [CREATE_NAMESPACE] Created namespace prereqdls88
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqdls88
✔ [CREATE_NAMESPACE] Created namespace prereq6m7x9
✔ [CREATE_POD] Created test pod curl-pod in namespace prereq6m7x9
uipathctl health test runs health tests on all components. However, it also lets you test only the components you are interested in:
- To exclude components from the run, use the --excluded flag. For example, if you do not want to test SQL, run uipathctl health test --excluded SQL. The command tests all components except SQL.
- To include only certain components in the run, use the --included flag. For example, if you only want to test DNS and the objectstore, run uipathctl health test --included DNS,OBJECTSTORAGE.
Comparing the output of check and test for the Data Service application, you can see that the former validates the health of the application, whereas the latter checks the routing.
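Because test temporarily modifies the cluster state, it can be useful to know which temporary namespaces a run reported creating, for example to confirm later that none of them is left behind. The following Python sketch extracts those names from a saved report. It assumes the two CREATE_NAMESPACE message wordings that appear in the sample outputs on this page; the name created_namespaces is hypothetical.

```python
import re
from typing import List

# Matches both wordings seen in the sample reports:
#   [CREATE_NAMESPACE] Created namespace <name>
#   [CREATE_NAMESPACE] Namespace <name> created
CREATED = re.compile(
    r"\[CREATE_NAMESPACE\]\s+(?:Created namespace (\S+)|Namespace (\S+) created)"
)


def created_namespaces(report: str) -> List[str]:
    """Return the namespaces a report says were created, in order, without repeats."""
    names: List[str] = []
    for line in report.splitlines():
        match = CREATED.search(line)
        if match:
            name = match.group(1) or match.group(2)
            if name not in names:
                names.append(name)
    return names


SAMPLE = """\
✔ [ACTION_CENTER]
✔ [CREATE_NAMESPACE] Created namespace prereqk6b72
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqk6b72
✔ [NETWORK-POLICIES]
✔ [CREATE_NAMESPACE] Namespace prereqw4t9b created
"""

if __name__ == "__main__":
    print(created_namespaces(SAMPLE))
```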
Known issue
You might get an error message similar to the one in the following example. You can ignore it, because no action is required on your part.
E0621 23:32:56.426321 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.426392 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.444420 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.446150 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
E0621 23:32:56.513357 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded
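If you capture the tool output in a script, you may want to set these lines aside so that they do not hide real findings. The following Python sketch separates lines that match the message shown above from everything else. The pattern is derived only from this example, so other wordings of the message would not match; the name split_known_noise is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Error lines emitted by reflector.go that report a watch timeout.
KNOWN_NOISE = re.compile(
    r"^E\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ reflector\.go:\d+\] "
    r".*Failed to watch .*: context deadline exceeded\s*$"
)


def split_known_noise(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return the lines to keep and the number of ignorable lines dropped."""
    kept: List[str] = []
    ignored = 0
    for line in lines:
        if KNOWN_NOISE.match(line.strip()):
            ignored += 1
        else:
            kept.append(line)
    return kept, ignored


SAMPLE = [
    "E0621 23:32:56.426321 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded",
    "✔ [LOGGING]",
    "E0621 23:32:56.513357 24470 reflector.go:138] external/io_k8s_client_go/tools/cache/reflector.go:167: Failed to watch *v1.Pod: context deadline exceeded",
]

if __name__ == "__main__":
    kept_lines, dropped = split_known_noise(SAMPLE)
    print(f"kept {len(kept_lines)} line(s), ignored {dropped}")
```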
The diagnose command provides detailed information about the cluster health. It helps you identify issues at all levels, such as SQL, objectstore, node, secret, Istio, networking, and so on.
- It covers the check and test commands.
- It runs the prerequisite checks performed before the Automation Suite installation, to validate any environment configuration changes that were made after the installation and that might be the cause of the issue.
- It runs on all nodes to collect any node-specific issues, such as unavailable resources or network interference.
To run a diagnose check, use one of the following commands, depending on the CLI tool you use:
- If you use uipathctl, run:
  ./uipathctl health diagnose input.json --versions version.json
- If you use uipathtools, run:
  ./uipathtools health diagnose input.json --versions version.json
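The diagnose report is long and split into sections that start with a "Checks run on ..." line (per node, cluster, and local), and the same section title can appear more than once. The following Python sketch lists the failed components for each section, merging repeated titles. It relies on the line layout seen in the sample output on this page, which is an assumption rather than a documented format; the name failed_components_by_scope is hypothetical.

```python
import re
from typing import Dict, List, Optional

SCOPE = re.compile(r"^\s*Checks run on (\S+)\s*$")
# A component header: status symbol and [NAME] with nothing after the bracket.
HEADER = re.compile(r"^\s*(✔|❌)\ufe0f?\s*\[([^\]]+)\]\s*$")


def failed_components_by_scope(report: str) -> Dict[str, List[str]]:
    """Map each report section to the components marked as failed in it."""
    result: Dict[str, List[str]] = {}
    scope: Optional[str] = None
    for line in report.splitlines():
        scope_match = SCOPE.match(line)
        if scope_match:
            scope = scope_match.group(1)
            result.setdefault(scope, [])  # repeated titles share one entry
            continue
        header = HEADER.match(line)
        if header and scope is not None and header.group(1) == "❌":
            result[scope].append(header.group(2))
    return result


SAMPLE = """\
Checks run on nodes/aks-pool0-27031798-vmss000001
✔ [REDIS(PORT=6380)]
✔ [CONNECTIVITY] Successfully made Redis connection on ci-asaks4011056.redis.cache.windows.net:6380
Checks run on cluster/
✔ [NODE]
✔ [NODE_READY] All the nodes are in ready state
Checks run on cluster/
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
❌ [ISTIO]
✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
"""

if __name__ == "__main__":
    for section, components in failed_components_by_scope(SAMPLE).items():
        print(section, components)
```

Sections with an empty list had no failed component, which helps narrow the investigation to a specific node or to the cluster-level checks.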
Sample output of the generated report:
Checks run on nodes/aks-pool0-27031798-vmss000001
✔ [REDIS(PORT=6380)]
✔ [CONNECTIVITY] Successfully made Redis connection on ci-asaks4011056.redis.cache.windows.net:6380
✔ [OBJECTSTORAGE(PRODUCT=ORCHESTRATOR)]
✔ [CHECK_API] Object storage test passed for orchestrator
✔ [SQL(PRODUCT=PROCESSMINING, TYPE=ADO)]
✔ [EXECUTE_NATIVE] Successfully executed command
✔ [BUILD_CLIENT] Successfully built ADO client
✔ [CONNECT] Successfully connected ADO client to DB
✔ [DB_ROLES] SQL user has the required roles to DB
✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
✔ [DNS(FQDN=ALM.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_SUBDOMAIN] Resolved alm.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
Checks run on cluster/
✔ [NODE]
✔ [NODE_EXISTS] 12 Nodes present in the cluster
✔ [NODE_READY] All the nodes are in ready state
✔ [GATEKEEPER]
✔ [GATEKEEPER_HEALTH] Application is healthy and in sync
✔ [CREATE_CONSTRAINT] Created test constraint
✔ [VERIFY] Constraint verified
✔ [CLEANUP] Cleaned up the test constraint
✔ [LOGGING]
✔ [LOGGING_HEALTH] Application is healthy and in sync
✔ [DATASERVICE]
✔ [CREATE_NAMESPACE] Created namespace prereqctzhp
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqctzhp
✔ [ROBOTUBE]
✔ [ROBOTUBE_HEALTH] Application is healthy and in sync
✔ [AIRFLOW]
✔ [AIRFLOW_HEALTH] Application is healthy and in sync
✔ [ARGOCD]
✔ [ARGOCD_SERVER_PODS] Component argocd-server has ready Pods
✔ [ARGOCD_REPO_SERVER_PODS] Component argocd-repo-server has ready Pods
✔ [ARGOCD_APP_CONTROLLER_PODS] Component argocd-application-controller has ready Pods
✔ [ARGOCD_REDIS_PODS] Component redis-ha has ready Pods
✔ [ISTIO]
✔ [LIST_PODS] Found 2 pods for Istio
✔ [ISTIOD_EXISTS] The Istio pods are present and running version -
✔ [ISTIOD_READY] Istio pods are healthy
✔ [AICENTER]
✔ [AICENTER_HEALTH] Application is healthy and in sync
✔ [CREATE_NAMESPACE] Created namespace prereqn6sqn
✔ [CREATE_POD] Created test pod curl-pod in namespace prereqn6sqn
Checks run on local/
✔ [CONNECTIVITY]
✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-4rffj's IP 10.240.1.86 on aks-pool0-27031798-vmss000002
✔ [OVERLAY_CONNECTIVITY_TEST] echo-a-4rffj on aks-pool0-27031798-vmss000002 can reach echo-a-8c6t5's IP 10.240.3.57 on aks-pool3-27031798-vmss000000
✔ [POD_TO_A] Scenario: http check between two random pods completed successfully
✔ [POD_TO_B_MULTI_NODE_CLUSTERIP] Scenario: http check between from pod to a multinode ClusterIP completed successfully
✔ [POD_TO_B_MULTI_NODE_HEADLESS] Scenario: http check between from pod to a multinode ClusterIP without a clusterIP set completed successfully
✔ [POD_TO_B_INTRA_NODE_CLUSTERIP] Scenario: http check between from two pods colocated on the same node via ClusterIP completed successfully
✔ [INGRESS]
✔ [INGRESS_GATEWAY_FOUND] Found service istio-ingressgateway in the cluster
✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on http://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com
✔ [INGRESS_GATEWAY_PORT_CHECK] Service istio-ingressgateway is configured to allow traffic on https://ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com:443
✔ [OSS(COMPONENT=MONITORING)]
✔ [OSS(component=monitoring)] Check for component monitoring passed
✔ [OSS(COMPONENT=GATEKEEPER)]
✔ [OSS(component=gatekeeper)] Check for component gatekeeper passed
✔ [STORAGECLASS(NAME=STORAGE_CLASS_SINGLE_REPLICA)]
✔ [STORAGE_CLASS_EXISTS] Storage class azurefile-csi exists
✔ [LIST_NODES] Listed 12 nodes
✔ [CREATE_NAMESPACE] Created namespace prereqhcpkc
✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-5n272
✔ [LIST_PODS] Listed 1 pods on node aks-pool3-27031798-vmss000001
✔ [POD_RUNNING] Found one pod running on node aks-pool3-27031798-vmss000001
✔ [REGISTRY]
✔ [CONNECTIVITY] Successfully made Registry connection on sfbrdevhelmweacr.azurecr.io
✔ [NETWORK-POLICIES]
✔ [CREATE_NAMESPACE] Namespace prereqw4t9b created
✔ [CREATE_EGRESS_NETWORK_POLICY] Created the egress network policies allow-coredns-egress and block-external-traffic
✔ [CREATE_INGRESS_NETWORK_POLICY] Created the ingress network policy: block-echo-server-ingress
✔ [CREATE_SERVICE] Service echo-server-svc created
✔ [STORAGECLASS(NAME=STORAGE_CLASS)]
✔ [STORAGE_CLASS_EXISTS] Storage class managed-premium exists
✔ [LIST_NODES] Listed 12 nodes
✔ [CREATE_NAMESPACE] Created namespace prereqgjhcb
✔ [CREATE_STATEFULSET] Created statefulset storage-class-check-nm9th
✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000003
✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000003
✔ [LIST_PODS] Listed 1 pods on node aks-pool0-27031798-vmss000001
✔ [POD_RUNNING] Found one pod running on node aks-pool0-27031798-vmss000001
✔ [DNS(FQDN=INSIGHTS.<FQDN>)]
✔ [VALIDATE_FQDN] FQDN is valid
✔ [RESOLVE_TOP_DOMAIN] Resolved ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [RESOLVE_SUBDOMAIN] Resolved insights.ci-asaks4011056.infra-sf-ea.infra.uipath-dev.com to [{20.71.155.129 }]
✔ [IPS_MATCH] Subdomain resolves to top domain
✔ [NODE(CPU >= 8, RAM >= 16GI)]
✔ [LIST_NODES] Listed 12 nodes
✔ [AT_LEAST_ONE_NODE] At least one node found
✔ [CPU_USAGE] Node aks-pool0-27031798-vmss000000 has 12.50% CPU usage
✔ [MEMORY_USAGE] Node aks-pool0-27031798-vmss000000 has 38.27% memory usage
✔ [POD_USAGE] Node aks-pool0-27031798-vmss000000 has 40.00% of pods in use. Number of pods: 40.00 max allowed: 100.00
✔ [OSS(COMPONENT=CERT-MANAGER)]
✔ [OSS(component=cert-manager)] Check for component cert-manager passed
✔ [RESOURCE]
✔ [Capacity] Automation suite already installed on cluster
✔ [OSS(COMPONENT=LOGGING)]
✔ [OSS(component=logging)] Check for component logging passed
✔ [GPU(PRODUCT=DOCUMENTUNDERSTANDING)]
✔ [BASIC_GPU_SUCCESS] Was able to start a CUDA job on a GPU node
Checks run on cluster/
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
❌ [ISTIO]
✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
✔ [ISTIO_SERVICEMESH_VALIDATION_GET_REGISTRY_FQDN] Successfully retrieved registry url
✔ [ISTIO_SERVICEMESH_VALIDATION_GET_CLUSTER_FQDN] Successfully retrieved cluster fqdn
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_DEPLOYMENT] Successfully created the test deployment istio-validation-deployment
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_SERVICE] Successfully created the test service istio-validation-service
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_GATEWAY] Successfully created the test gateway istio-validation-gateway
✔ [ISTIO_SERVICEMESH_VALIDATION_CREATE_TEST_VIRTUALSERVICE] Successfully created the test virtual service istio-validation-vs
✔ [ISTIO_SERVICEMESH_VALIDATION_URL_ACCESS] Success exposing the service via servicemesh
❌ [POD]
✔ [LIST_NAMESPACES] Retrieved 25 namespaces to check pod health
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/ah-tenant-service-sync-insights-data-job-28122960-p6rzg cannot mount volume: MountVolume.SetUp failed for volume "ah-insights-secrets" : failed to sync secret cache: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [POD_UNHEALTHY] Latest event for pod uipath/du-documentmanager-dm-maintenance-cron-28122960-4sm5z: Error: failed to sync configmap cache: timed out waiting for the condition
❌ [SYNC]
❌ [namespace:"argocd" | kind:"Application" | name:"dataservice"] Application health check failed: health status is Progressing and sync status is Synced
The diagnose command runs at multiple levels, such as infrastructure, networking, storage, pods, and DNS.
Analyzing the logs
There are two potential issues you can observe in the preceding logs:
- Istio has an incorrect configuration, which can cause problems accessing the Document Understanding platform:
❌ [ISTIO]
✔ [ISTIO_SYNC_STATUS] Istio sync is up-to-date
❌ [ISTIO_ENVOY_CONFIG_STATUS] Istio Envoy configs are not healthy: Error [IST0101] (VirtualService uipath/du-platform-vs) Referenced host:port not found: "aistorage:5000"
- Data Service is unavailable. Look for Ceph in the code sample:
❌ [DATASERVICE]
❌ [DATASERVICE_HEALTH] Application health check failed: health status is Progressing and sync status is Synced
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-v5krg cannot mount volume: (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[external-storage-creds], unattached volumes=[workload-socket is-secrets openssl istio-podinfo temp-location cert-location istio-data external-storage-creds workload-certs istio-envoy java domain-cert-config edk2 credential-socket tmp additional-ca-cert-config pem istiod-ca-cert istio-token app-secrets ceph-storage-creds]: timed out waiting for the condition
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-runtime-8f5bb7d56-xs9t5 cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
❌ [CANNOT_MOUNT_VOLUME] Pod uipath/dataservice-taskrunner-787df76c74-98h5l cannot mount volume: MountVolume.SetUp failed for volume "external-storage-creds" : secret "dataservice-external-storage-secret" not found
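When a report is long, it can help to extract only the failed checks. The following is a minimal sketch, not part of the tooling, that assumes the text report uses the ✔ and ❌ markers and the bracketed check names shown in the logs above. The function name failed_checks and the UNGROUPED key are hypothetical.

```python
import re
from typing import Dict, List, Optional

# One report line: a ✔ or ❌ marker, a bracketed check name, and an optional message.
LINE_RE = re.compile(r"^\s*(?P<mark>[✔❌])\ufe0f?\s*\[(?P<name>[^\]]+)\]\s*(?P<msg>.*)$")


def failed_checks(report: str) -> Dict[str, List[str]]:
    """Group the messages of failed (❌) checks under their failed group header."""
    failures: Dict[str, List[str]] = {}
    group: Optional[str] = None
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # Lines such as "Checks run on cluster/" end the current group.
            group = None
            continue
        mark, name, msg = match["mark"], match["name"], match["msg"].strip()
        if not msg:
            # A bare "[NAME]" line is a group header; track it only if it failed.
            group = name if mark == "❌" else None
            if group is not None:
                failures.setdefault(group, [])
        elif mark == "❌":
            # A failed check outside any failed group goes under a fallback key.
            failures.setdefault(group or "UNGROUPED", []).append(f"{name}: {msg}")
    return failures
```

Passing the saved text report to failed_checks returns a dictionary keyed by failed group, such as ISTIO or DATASERVICE, with one entry per failed check.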
Known issues
You may get an error message similar to the one in the following example. You can ignore it, as no action is required on your part.
I0622 01:31:28.917107 28815 request.go:601] Waited for 1.017599292s due to client-side throttling, not priority and fairness, request: GET:https://ci-asaks4011056-fwwpyxm7.hcp.westeurope.azmk8s.io:443/apis/networking.istio.io/v1alpha3
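If you post-process the captured output, these throttling notices can be filtered out before further analysis. This is a small sketch that assumes the notices always follow the format of the line above (an info-level line starting with I and four digits, containing the text "client-side throttling"). The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Info-level log line, e.g. "I0622 01:31:28.917107 ... due to client-side throttling ..."
THROTTLE_RE = re.compile(r"^I\d{4} .*client-side throttling")


def drop_throttling_notices(lines: Iterable[str]) -> List[str]:
    """Return the lines that are not client-side throttling notices."""
    return [line for line in lines if not THROTTLE_RE.search(line)]
```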
The commands (check, test, and diagnose) support filtering and additional output formats.
Filtering
| Filters | Description | Usage |
|---|---|---|
|  | Comma-separated list of services to include in the validation | This command runs the diagnosis only against Istio and Insights. |
|  | Comma-separated list of services to exclude from the validation | This command runs the test on the entire cluster, except for Istio and Insights. |
Output format
The supported output formats are json, yaml, text, and junit. You can pass these values to any of the commands through the --output flag. These output formats are useful when you want to build your own troubleshooting framework on top of these tools.
Example usage
| Usage | Sample output |
|---|---|
The INFO logs, shown in green, indicate that the required checks passed. However, you should still check disk and memory usage to avoid hidden errors.
Although WARN messages do not indicate a high risk, you may need to address them, as they could affect some services in certain scenarios.
You must fix the issues reported in ERROR messages, as they affect some service in the cluster.
If the Rke2 server or Rke2 agent service is not running, the node is down. Try restarting the service using systemctl restart <service-name>, which should fix the issue.
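Before restarting anything, you can confirm which service is actually inactive. The sketch below wraps systemctl is-active. The unit names rke2-server and rke2-agent are assumptions and may differ on your nodes, and the run parameter exists only so the function can be exercised without systemd.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed systemd unit names; adjust them to match your nodes.
RKE2_SERVICES = ("rke2-server", "rke2-agent")


def is_active(service: str, run: Callable = subprocess.run) -> bool:
    """Return True when `systemctl is-active` exits with status 0 for the unit."""
    result = run(["systemctl", "is-active", "--quiet", service], check=False)
    return result.returncode == 0


def inactive_services(
    services: Iterable[str] = RKE2_SERVICES, run: Callable = subprocess.run
) -> List[str]:
    """List the units that are not currently active."""
    return [service for service in services if not is_active(service, run)]
```

A node normally runs either the server or the agent service, so expect one of the two to be reported as inactive on a healthy node.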
Check the size of the directory located at /var/lib, as Kubernetes uses it to store its data. If the directory is full, several issues can arise. To prevent them, make sure to increase its size.
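As an illustration of checking how full the file system behind /var/lib is, here is a short sketch using the Python standard library. The 85% threshold is an arbitrary assumption, not a documented limit, and the function names are hypothetical.

```python
import shutil


def usage_percent(path: str = "/var/lib") -> float:
    """Return the used share, in percent, of the file system that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str = "/var/lib", threshold: float = 85.0) -> bool:
    """Return True when usage reaches the (assumed) warning threshold."""
    return usage_percent(path) >= threshold
```

Note that this measures the whole file system on which the directory resides, not the size of the directory contents alone.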
For all nodes, the report indicates whether they are under disk or memory pressure. If they are, the workloads on those nodes may start experiencing problems. Check whether other processes running on those nodes are consuming resources, and remove them if so.
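The same information can be read directly from the Kubernetes node conditions. The sketch below assumes you have already saved the output of kubectl get nodes -o json and parsed it into a dictionary. It relies on the standard DiskPressure and MemoryPressure condition types, and the function name is hypothetical.

```python
from typing import Dict, List

PRESSURE_TYPES = {"DiskPressure", "MemoryPressure"}


def nodes_under_pressure(node_list: dict) -> Dict[str, List[str]]:
    """Map each node name to the pressure conditions whose status is "True"."""
    result: Dict[str, List[str]] = {}
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        active = [
            cond["type"]
            for cond in conditions
            if cond.get("type") in PRESSURE_TYPES and cond.get("status") == "True"
        ]
        if active:
            result[node["metadata"]["name"]] = active
    return result
```

An empty dictionary means no node currently reports disk or memory pressure.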
Ceph is used as S3 object storage to store logs and files from various applications. The report shows the status of its services. If they are not running, you may need to restart them. Also check whether the disk that Ceph uses is full.
Ports 443 and 31443 must be open on the provided hostname. The report indicates whether they are not accessible. Make sure to open the appropriate ports if instructed to do so.
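To verify port reachability yourself from a client machine, a plain TCP connection attempt is usually enough. This is a minimal sketch using the Python standard library. The function names and the 3-second timeout are assumptions, and a successful connection only shows that the port accepts TCP connections, not that the service behind it is healthy.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True when a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, ports: Iterable[int] = (443, 31443)) -> Dict[int, bool]:
    """Check each port on the host and report whether it is reachable."""
    return {port: port_is_open(host, port) for port in ports}
```

Call check_ports with the hostname you provided during installation to get one reachability flag per port.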
The tool checks whether the uploaded certificate is valid for the provided hostname and whether it has expired. If the certificate does not meet these criteria, errors are reported. To avoid this, check the uploaded certificate and replace it if necessary.
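A comparable check can be sketched with the Python standard library. The example assumes the certificate chains to a CA trusted by the machine running the script; with a private CA, the default context would reject the connection unless that CA is loaded. The function names are hypothetical.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Return the server certificate after the default TLS verification.

    The default context verifies the chain and the hostname, so an expired or
    mismatched certificate raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert: dict, now: Optional[float] = None) -> float:
    """Return the number of days left before the certificate's notAfter date."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A negative result from days_until_expiry means the certificate has already expired.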
Because some services require a GPU to be present on some nodes in the cluster, the Automation Suite diagnostics tool checks for GPU nodes and prints the number of such nodes. If you believe GPU nodes are present but they are not listed here, something went wrong in the GPU configuration.
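To cross-check the GPU node count, you can inspect the capacity each node advertises. The sketch below works on the parsed output of kubectl get nodes -o json and assumes GPUs are exposed under the nvidia.com/gpu resource name, which is how the NVIDIA device plugin reports them; your cluster may use a different resource name. The function name is hypothetical.

```python
from typing import Dict


def gpu_nodes(node_list: dict, resource: str = "nvidia.com/gpu") -> Dict[str, int]:
    """Map each node name to its advertised GPU count, skipping nodes without GPUs."""
    found: Dict[str, int] = {}
    for node in node_list.get("items", []):
        capacity = node.get("status", {}).get("capacity", {})
        count = int(capacity.get(resource, "0"))
        if count > 0:
            found[node["metadata"]["name"]] = count
    return found
```

If a node you expect to have a GPU is missing from the result, the device plugin or driver setup on that node is a reasonable place to start investigating.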
RabbitMQ and DockerRegistry are two important components used by some services. If either of them is not running, you need to investigate the problem and restart the component.
ArgoCD is our application lifecycle management (ALM) tool. If any of its services is not running, other applications may become out of date or experience other problems. Recovering these services is important and may require further debugging.
The Automation Suite diagnostics tool shows whether ArgoCD applications are missing or degraded.
- If applications are missing, go to the ArgoCD UI and sync them.
- If applications are degraded, further debugging is needed to investigate the errors shown by ArgoCD.
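To list the affected ArgoCD applications from a saved text report, you can extract the application name together with its health and sync status. This sketch assumes the line format used by the [SYNC] entries in the sample logs earlier on this page (name:"..." followed by "health status is ... and sync status is ..."). The type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class AppStatus(NamedTuple):
    name: str
    health: str
    sync: str


# Matches e.g.: [namespace:"argocd" | kind:"Application" | name:"dataservice"]
# Application health check failed: health status is Progressing and sync status is Synced
APP_RE = re.compile(
    r'name:"(?P<name>[^"]+)"\].*'
    r"health status is (?P<health>\w+) and sync status is (?P<sync>\w+)"
)


def unhealthy_apps(report: str) -> List[AppStatus]:
    """Return one AppStatus per report line that describes a failed application."""
    statuses = []
    for line in report.splitlines():
        match = APP_RE.search(line)
        if match:
            statuses.append(AppStatus(match["name"], match["health"], match["sync"]))
    return statuses
```

The returned names tell you which applications to open in the ArgoCD UI for syncing or further debugging.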