Automation Suite

2023.10

Automation Suite on Linux Installation Guide

Last updated Jul 24, 2024

Kubernetes resources alerts

k8s.rules, kube-apiserver-availability.rules, kube-apiserver-slos

KubeAPIErrorBudgetBurn

The Kubernetes API server is burning too much error budget.

kube-state-metrics

KubeStateMetricsListErrors, KubeStateMetricsWatchErrors

The Kube State Metrics collector cannot collect metrics from the cluster without errors. As a result, important alerts may not fire. Contact UiPath® Support.

KubernetesMemoryPressure

This alert indicates that memory usage is very high on the Kubernetes node.

If this alert fires, identify which pods are consuming the most memory on the affected node.
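One way to find the heaviest memory consumers is sketched below. It assumes the metrics API is available in the cluster so that kubectl top works; the function names are illustrative and not part of Automation Suite.

```shell
# List the pods using the most memory across all namespaces.
# Assumption: the metrics API (for example metrics-server) is installed.
top_memory_pods() {
  # $1: number of pods to show (defaults to 10); one extra line for the header
  kubectl top pods --all-namespaces --sort-by=memory | head -n "$(( ${1:-10} + 1 ))"
}

# Print each condition reported for a node, including MemoryPressure.
node_conditions() {
  # $1: node name
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

For example, top_memory_pods 5 shows the five largest consumers, and node_conditions <node-name> shows whether MemoryPressure is currently True on that node.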

kubernetes-apps

KubePodCrashLooping

A pod keeps restarting unexpectedly. This can happen due to an out-of-memory (OOM) error, in which case the pod's memory limits can be adjusted. Check the pod events with kubectl describe and the logs with kubectl logs for details on possible crashes. If the issue persists, contact UiPath® Support.
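The following sketch shows one way to confirm whether the restarts are caused by OOM kills. The function names are hypothetical helpers; the namespace and pod name are supplied by you.

```shell
# Print why each container in a pod last terminated (for example OOMKilled).
last_termination_reasons() {
  # $1: namespace, $2: pod name
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
}

# Print the tail of the logs from the previous (crashed) container instance.
previous_logs() {
  # $1: namespace, $2: pod name
  kubectl -n "$1" logs "$2" --previous --tail=200
}
```

If the pod has more than one container, kubectl logs may require the container to be named with the -c flag.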

KubePodNotReady

A pod has started, but it is not responding to the health probe with success. This may mean that it is stuck and is not able to serve traffic. You can check pod logs with kubectl logs to see if there is any indication of progress. If the issue persists, contact UiPath® Support.
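To locate pods in this state, the sketch below filters the output of kubectl get pods for pods that are Running but have fewer ready containers than expected. It relies on the default column layout of kubectl get pods with all namespaces (NAMESPACE, NAME, READY, STATUS), which is an assumption about your kubectl version; the function name is illustrative.

```shell
# List running pods whose READY column shows fewer ready containers than total.
unready_running_pods() {
  kubectl get pods --all-namespaces --no-headers |
    awk '{ split($3, r, "/"); if ($4 == "Running" && r[1] != r[2]) print $1, $2, $3 }'
}
```

Completed job pods also show 0/1 in the READY column, which is why the filter only keeps pods whose status is Running.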

KubeDeploymentGenerationMismatch, KubeStatefulSetGenerationMismatch

There has been an attempted update to a deployment or statefulset, but it has failed, and a rollback has not yet occurred. Contact UiPath® Support.

KubeDeploymentReplicasMismatch, KubeStatefulSetReplicasMismatch

In high availability clusters with multiple replicas, this alert fires when the number of replicas is not optimal. This may occur when there are not enough resources in the cluster to schedule all pods. Check resource utilization, and add capacity as necessary. Otherwise, contact UiPath® Support.
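A quick way to check whether scheduling is the bottleneck is to look for pending pods and FailedScheduling events. This is a minimal sketch; the function names are illustrative.

```shell
# List pods that have not been scheduled or started yet.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# List events emitted by the scheduler when it could not place a pod.
failed_scheduling_events() {
  kubectl get events --all-namespaces --field-selector=reason=FailedScheduling
}
```

FailedScheduling event messages usually state which resource (CPU, memory, or a node constraint) prevented the pod from being placed.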

KubeStatefulSetUpdateNotRolledOut

An update to a statefulset has failed. Contact UiPath® Support.

KubeDaemonSetRolloutStuck

Daemonset rollout has failed. Contact UiPath® Support.

KubeContainerWaiting

A container is stuck in the waiting state. It has been scheduled to a worker node, but it cannot run on that machine. Check kubectl describe of the pod for more information. The most common cause of waiting containers is a failure to pull the image. For air-gapped clusters, this could mean that the local registry is not available. If the issue persists, contact UiPath® Support.
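The waiting reason and the related events can be read directly, as in the sketch below. Function names are hypothetical; reasons such as ImagePullBackOff or ErrImagePull point to an image pull failure.

```shell
# Print the waiting reason of each container in a pod.
waiting_reasons() {
  # $1: namespace, $2: pod name
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'
}

# List the events recorded for a pod, oldest first.
pod_events() {
  # $1: namespace, $2: pod name
  kubectl -n "$1" get events --field-selector "involvedObject.name=$2" --sort-by=.lastTimestamp
}
```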

KubeDaemonSetNotScheduled, KubeDaemonSetMisScheduled

This may indicate an issue with one of the nodes. Check the health of each node, and remediate any known issues. Otherwise, contact UiPath® Support.
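As a starting point for the node health check, the sketch below lists every node whose status is anything other than plain Ready. It assumes the default kubectl get nodes column layout (NAME, STATUS, ...); the function name is illustrative.

```shell
# List nodes that are not reported as plain "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are included on purpose,
# because daemonset pods may not be scheduled onto them as expected.
nodes_not_ready() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}
```

Running kubectl describe node on any node returned here shows its conditions and recent events.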

KubeJobCompletion

A job takes more than 12 hours to complete. This is not expected. Contact UiPath® Support.

KubeJobFailed

A job has failed; however, most jobs are retried automatically. If the issue persists, contact UiPath® Support.

KubeHpaReplicasMismatch

The autoscaler cannot scale the targeted resource as configured. If the desired replica count is higher than the actual count, there may be a lack of resources. If the desired count is lower than the actual count, pods may be stuck while shutting down. If the issue persists, contact UiPath® Support.
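To compare the desired and actual replica counts, the values can be read from the autoscaler object, as in this sketch. The function name is illustrative; the namespace and autoscaler name come from kubectl get hpa -A.

```shell
# Print the current, desired, and maximum replica counts of an autoscaler.
hpa_replicas() {
  # $1: namespace, $2: horizontal pod autoscaler name
  kubectl -n "$1" get hpa "$2" \
    -o jsonpath='{"current="}{.status.currentReplicas}{" desired="}{.status.desiredReplicas}{" max="}{.spec.maxReplicas}{"\n"}'
}
```

kubectl describe hpa on the same object additionally lists the conditions and events that explain why scaling is blocked.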

KubeHpaMaxedOut

The number of replicas for a given service has reached its maximum. This happens when the number of requests being made to the cluster is very high. If high traffic is expected and temporary, you may silence this alert. However, this alert is a sign that the cluster is at capacity and cannot handle much more traffic. If more resource capacity is available on the cluster, you can increase the maximum number of replicas for the service by following these instructions:

# Find the horizontal autoscaler that controls the replicas of the desired resource
kubectl get hpa -A
# Increase the number of max replicas of the desired resource, replacing <namespace> <resource> and <maxReplicas>
kubectl -n <namespace> patch hpa <resource> --patch '{"spec":{"maxReplicas":<maxReplicas>}}'

kubernetes-resources

KubeCPUOvercommit, KubeMemoryOvercommit

These warnings indicate that the cluster cannot tolerate node failure. For single-node evaluation clusters, this is known, and these alerts may be silenced. For multi-node HA-ready production setups, these alerts fire when too many nodes become unhealthy to support high availability, and they indicate that the nodes should be brought back to health or replaced.

KubeCPUQuotaOvercommit, KubeMemoryQuotaOvercommit, KubeQuotaAlmostFull, KubeQuotaFullyUsed, KubeQuotaExceeded

These alerts pertain to namespace resource quotas that only exist in the cluster if added through customization. Namespace resource quotas are not added as part of Automation Suite installation.

AggregatedAPIErrors, AggregatedAPIDown, KubeAPIDown, KubeAPITerminatedRequests

These alerts indicate problems with the Kubernetes control plane. Check the health of the master nodes, resolve any outstanding issues, and contact UiPath® Support if the issues persist.
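Two generic checks that can help narrow down control plane problems are sketched below: the API server readiness endpoint and the availability of aggregated APIs. The function names are illustrative, and the second function assumes the default kubectl get apiservices column layout (NAME, SERVICE, AVAILABLE, AGE).

```shell
# Query the API server readiness endpoint with per-check detail.
api_readyz() {
  kubectl get --raw='/readyz?verbose'
}

# List aggregated API services that are not reported as available.
unavailable_apiservices() {
  kubectl get apiservices --no-headers | awk '$3 != "True" { print $1, $3 }'
}
```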

kubernetes-system-kubelet

KubeNodeNotReady, KubeNodeUnreachable, KubeNodeReadinessFlapping, KubeletPlegDurationHigh, KubeletPodStartUpLatencyHigh, KubeletDown

These alerts indicate a problem with a node. In multi-node HA-ready production clusters, pods would likely be rescheduled onto other nodes. If the issue persists, you should drain and remove the node to maintain the health of the cluster. In clusters without extra capacity, another node should be joined to the cluster first.
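The drain step can be sketched with the generic kubectl command below. This only covers evicting workloads from the node; removing the node from the cluster afterwards should follow the node removal procedure for your installation. The function name is illustrative.

```shell
# Cordon a node and evict its pods so they are rescheduled elsewhere.
# --ignore-daemonsets: daemonset pods cannot be evicted, so skip them.
# --delete-emptydir-data: allow eviction of pods that use emptyDir volumes;
#   data in those volumes is lost.
drain_node() {
  # $1: node name
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}
```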

KubeletTooManyPods

There are too many pods running on the specified node.

Join another node to the cluster.
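Before adding a node, it can be useful to compare the number of pods on the affected node with its pod capacity. The sketch below shows one way to read both values; the function names are illustrative.

```shell
# Count the pods currently assigned to a node.
pods_on_node() {
  # $1: node name
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" --no-headers |
    wc -l | tr -d ' '
}

# Print the maximum number of pods the node reports it can run.
node_pod_capacity() {
  # $1: node name
  kubectl get node "$1" -o jsonpath='{.status.capacity.pods}'
}
```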

kubernetes-system

KubeVersionMismatch

There are different semantic versions of Kubernetes components running. This can happen as a result of an unsuccessful Kubernetes upgrade.
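One way to spot a partially upgraded cluster is to list the distinct kubelet versions reported by the nodes. This is a sketch that only covers kubelets, not every Kubernetes component the alert may consider; the function name is illustrative.

```shell
# Print each distinct kubelet version in the cluster, one per line.
# More than one line of output means the nodes are not on the same version.
kubelet_versions() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    sort -u
}
```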

KubeClientErrors

A Kubernetes API server client is experiencing an error rate greater than 1%. There may be an issue with the node this client is running on, or with the Kubernetes API server itself.

etcd alerts

EtcdInsufficientMembers

This alert indicates that the etcd cluster has an insufficient number of members. Note that the cluster must have an odd number of members. The severity of this alert is critical.

Make sure that there is an odd number of server nodes in the cluster, and all of them are up and healthy.
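The member count check can be sketched as follows. The node label used to select server nodes is an assumption (node-role.kubernetes.io/etcd=true); verify the labels in your cluster with kubectl get nodes --show-labels and pass a different selector if needed. The function names are illustrative.

```shell
# Count the nodes matching a label selector.
# Assumption: server (etcd) nodes carry the label below by default.
server_node_count() {
  # $1: optional label selector
  kubectl get nodes -l "${1:-node-role.kubernetes.io/etcd=true}" --no-headers |
    wc -l | tr -d ' '
}

# Succeed when the given number is odd.
is_odd() {
  [ $(( $1 % 2 )) -eq 1 ]
}
```

For example, is_odd "$(server_node_count)" || echo "even number of server nodes" prints a warning when the count is even. The STATUS column of kubectl get nodes shows whether each of those nodes is Ready.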

EtcdNoLeader

This alert shows that the etcd cluster has no leader. The severity of this alert is critical.

EtcdHighNumberOfLeaderChanges

This alert indicates that the etcd leader changes more than twice in 10 minutes. This is a warning.

EtcdHighNumberOfFailedGrpcRequests

This alert indicates that a certain percentage of gRPC request failures was detected in etcd.

EtcdGrpcRequestsSlow

This alert indicates that etcd gRPC requests are slow. This is a warning.

EtcdHighNumberOfFailedHttpRequests

This alert indicates that a certain percentage of HTTP failures was detected in etcd.

EtcdHttpRequestsSlow

This alert indicates that etcd HTTP requests are slowing down. This is a warning.

EtcdMemberCommunicationSlow

This alert indicates that etcd member communication is slowing down. This is a warning.

EtcdHighNumberOfFailedProposals

This alert indicates that the etcd server received more than 5 failed proposals in the last hour. This is a warning.

EtcdHighFsyncDurations

This alert indicates that etcd WAL fsync duration is increasing. This is a warning.

EtcdHighCommitDurations

This alert indicates that etcd commit duration is increasing. This is a warning.

kube-api

KubernetesApiServerErrors

This alert indicates that the Kubernetes API server is experiencing a high error rate. This issue could lead to other failures, so it is recommended that you investigate the problem proactively.

To find the root cause of the issue, check the logs of the API server pod using the kubectl logs <pod-name> -n kube-system command.
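The pod name for that command can be looked up first, as in the sketch below. It assumes the API server pods in kube-system have names starting with kube-apiserver, which may differ in your cluster; the function names and the simple "error" text filter are illustrative.

```shell
# List API server pod names in the kube-system namespace.
# Assumption: pod names start with "kube-apiserver".
apiserver_pods() {
  kubectl -n kube-system get pods --no-headers | awk '$1 ~ /^kube-apiserver/ { print $1 }'
}

# Show recent log lines of an API server pod that mention errors.
apiserver_error_lines() {
  # $1: pod name
  kubectl -n kube-system logs "$1" --tail=500 | grep -i 'error'
}
```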
