Kubernetes resources alerts - Automation Suite 2023.10
Automation Suite on Linux Installation Guide
Last updated Feb 13, 2024

Kubernetes resources alerts

k8s.rules, kube-apiserver-availability.rules, kube-apiserver-slos

KubeAPIErrorBudgetBurn

The Kubernetes API server is burning too much error budget.

kube-state-metrics

KubeStateMetricsListErrors, KubeStateMetricsWatchErrors

The Kube State Metrics collector is not able to collect metrics from the cluster without errors. This means important alerts may not fire. Contact UiPath Support.

KubernetesMemoryPressure

This alert indicates that memory usage is very high on the Kubernetes node.

If this alert fires, identify which pods are consuming the most memory.
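
One way to find the heaviest memory consumers is sketched below. It assumes that resource metrics are available to kubectl top in the cluster; top_memory_pods is a hypothetical helper name, not part of Automation Suite.

```shell
# Sketch: list the pods that use the most memory, across all namespaces.
# Assumption: the cluster serves resource metrics, so `kubectl top` works.
# top_memory_pods is a hypothetical helper; $1 is how many pods to show (default 10).
top_memory_pods() {
  limit="${1:-10}"
  # --sort-by=memory orders pods from highest to lowest memory usage.
  kubectl top pod --all-namespaces --sort-by=memory --no-headers | head -n "$limit"
}

# Usage: top_memory_pods 5
```

Running kubectl top node first can help confirm which node is under pressure before looking at individual pods.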

kubernetes-apps

KubePodCrashLooping

A pod keeps restarting unexpectedly. This can happen due to an out-of-memory (OOM) error, in which case the memory limits can be adjusted. Check the pod events with kubectl describe, and the logs with kubectl logs, to see details on possible crashes. If the issue persists, contact UiPath Support.
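
The two checks above can be combined as in the following sketch. The helper name inspect_crashing_pod and the 100-line tail are illustrative assumptions; the namespace and pod name are supplied by you.

```shell
# Sketch: gather events and crash logs for a restarting pod.
# inspect_crashing_pod is a hypothetical helper; $1 = namespace, $2 = pod name.
inspect_crashing_pod() {
  namespace="$1"
  pod="$2"
  # Events and the last container state (for example, a reason of OOMKilled).
  kubectl -n "$namespace" describe pod "$pod"
  # Logs from the previous container instance, that is, the one that crashed.
  kubectl -n "$namespace" logs "$pod" --previous --tail=100
}

# Usage: inspect_crashing_pod <namespace> <pod-name>
```

The --previous flag matters here: without it, kubectl logs shows only the freshly restarted container, which may not yet contain the failure.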

KubePodNotReady

A pod has started, but it is not responding to the health probe with success. This may mean that it is stuck and is not able to serve traffic. You can check pod logs with kubectl logs to see if there is any indication of progress. If the issue persists, contact UiPath Support.

KubeDeploymentGenerationMismatch, KubeStatefulSetGenerationMismatch

There has been an attempted update to a deployment or statefulset, but it has failed, and a rollback has not yet occurred. Contact UiPath Support.

KubeDeploymentReplicasMismatch, KubeStatefulSetReplicasMismatch

In high availability clusters with multiple replicas, this alert fires when the number of running replicas does not match the expected number. This may occur when there are not enough resources in the cluster to schedule all replicas. Check resource utilization, and add capacity as necessary. Otherwise, contact UiPath Support.
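
As a starting point, the sketch below lists deployments whose ready replica count differs from the desired count. It assumes the default column layout of kubectl get deployments with all namespaces (NAMESPACE, NAME, READY, ...); list_unready_deployments is a hypothetical helper name.

```shell
# Sketch: show deployments where ready replicas differ from desired replicas.
# Assumption: default `kubectl get deployments` columns, READY formatted as ready/desired.
# list_unready_deployments is a hypothetical helper name.
list_unready_deployments() {
  kubectl get deployments --all-namespaces --no-headers |
    awk '{ split($3, r, "/"); if (r[1] != r[2]) print $1 "/" $2 " ready=" $3 }'
}

# Usage: list_unready_deployments
```

The same filter works for StatefulSets by replacing deployments with statefulsets, since the READY column is in the same position.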

KubeStatefulSetUpdateNotRolledOut

An update to a statefulset has failed. Contact UiPath Support.

See also: StatefulSets.

KubeDaemonSetRolloutStuck

A DaemonSet rollout has failed. Contact UiPath Support.

See also: DaemonSet.

KubeContainerWaiting

A container is stuck in the waiting state. It has been scheduled to a worker node, but it cannot run on that machine. Check kubectl describe of the pod for more information. The most common cause of waiting containers is a failure to pull the image. For air-gapped clusters, this could mean that the local registry is not available. If the issue persists, contact UiPath Support.
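
Besides kubectl describe, the events recorded for the pod often name the exact cause, such as an image pull failure. The sketch below is one way to list them; show_pod_events is a hypothetical helper name.

```shell
# Sketch: list events for a single pod, oldest first.
# show_pod_events is a hypothetical helper; $1 = namespace, $2 = pod name.
show_pod_events() {
  kubectl -n "$1" get events \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

# Usage: show_pod_events <namespace> <pod-name>
```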

KubeDaemonSetNotScheduled, KubeDaemonSetMisScheduled

This may indicate an issue with one of the nodes. Check the health of each node, and remediate any known issues. Otherwise, contact UiPath Support.
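
A quick node health check might look like the following sketch. It assumes the default column layout of kubectl get nodes, where the second column is the status; list_unhealthy_nodes is a hypothetical helper name.

```shell
# Sketch: print nodes whose status is anything other than plain "Ready".
# This also surfaces cordoned nodes (status "Ready,SchedulingDisabled").
# list_unhealthy_nodes is a hypothetical helper name.
list_unhealthy_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 " " $2 }'
}

# Usage: list_unhealthy_nodes
```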

KubeJobCompletion

A job is taking more than 12 hours to complete. This is not expected. Contact UiPath Support.

KubeJobFailed

A job has failed; however, most jobs are retried automatically. If the issue persists, contact UiPath Support.

KubeHpaReplicasMismatch

The autoscaler cannot scale the targeted resource as configured. If desired is higher than actual, then there may be a lack of resources. If desired is lower than actual, pods may be stuck while shutting down. If the issue persists, contact UiPath Support.
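
To tell which of the two cases applies, the desired and current replica counts can be read from the autoscaler status, as in this sketch. hpa_replica_gap is a hypothetical helper name, and the messages are illustrative only.

```shell
# Sketch: compare desired and current replicas reported by an HPA.
# hpa_replica_gap is a hypothetical helper; $1 = namespace, $2 = HPA name.
hpa_replica_gap() {
  namespace="$1"
  hpa="$2"
  desired=$(kubectl -n "$namespace" get hpa "$hpa" -o jsonpath='{.status.desiredReplicas}')
  current=$(kubectl -n "$namespace" get hpa "$hpa" -o jsonpath='{.status.currentReplicas}')
  if [ "$desired" -gt "$current" ]; then
    echo "desired=$desired current=$current: possible lack of resources"
  elif [ "$desired" -lt "$current" ]; then
    echo "desired=$desired current=$current: pods may be stuck shutting down"
  else
    echo "desired=$desired current=$current: no mismatch"
  fi
}

# Usage: hpa_replica_gap <namespace> <hpa-name>
```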

KubeHpaMaxedOut

The number of replicas for a given service has reached its maximum. This happens when the amount of requests being made to the cluster is very high. If high traffic is expected and temporary, you may silence this alert. However, this alert is a sign that the cluster is at capacity and cannot handle much more traffic. If more resource capacity is available on the cluster, you can increase the number of maximum replicas for the service by following these instructions:

# Find the horizontal autoscaler that controls the replicas of the desired resource
kubectl get hpa -A
# Increase the number of max replicas of the desired resource, replacing <namespace> <resource> and <maxReplicas>
kubectl -n <namespace> patch hpa <resource> --patch '{"spec":{"maxReplicas":<maxReplicas>}}'

kubernetes-resources

KubeCPUOvercommit, KubeMemoryOvercommit

These warnings indicate that the cluster cannot tolerate node failure. For single-node evaluation clusters, this is known, and these alerts may be silenced. For multi-node HA-ready production setups, these alerts fire when too many nodes become unhealthy to support high availability, and they indicate that the nodes should be brought back to health or replaced.

KubeCPUQuotaOvercommit, KubeMemoryQuotaOvercommit, KubeQuotaAlmostFull, KubeQuotaFullyUsed, KubeQuotaExceeded

These alerts pertain to namespace resource quotas that only exist in the cluster if added through customization. Namespace resource quotas are not added as part of Automation Suite installation.

See also: Resource Quotas.

AggregatedAPIErrors, AggregatedAPIDown, KubeAPIDown, KubeAPITerminatedRequests

These alerts indicate problems with the Kubernetes control plane. Check the health of master nodes, resolve any outstanding issues, and contact UiPath Support if the issues persist.

kubernetes-system-kubelet

KubeNodeNotReady, KubeNodeUnreachable, KubeNodeReadinessFlapping, KubeletPlegDurationHigh, KubeletPodStartUpLatencyHigh, KubeletDown

These alerts indicate a problem with a node. In multi-node HA-ready production clusters, pods would likely be rescheduled onto other nodes. If the issue persists, you should remove and drain the node to maintain the health of the cluster. In clusters without extra capacity, another node should be joined to the cluster first.
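
The sketch below shows generic kubectl commands for draining a node. It is not the Automation Suite node-removal procedure, which may involve additional steps; drain_node is a hypothetical helper name.

```shell
# Sketch: evict workloads from a node using generic kubectl commands.
# Assumption: this is plain Kubernetes usage, not a product-specific procedure.
# drain_node is a hypothetical helper; $1 = node name.
drain_node() {
  node="$1"
  # drain marks the node unschedulable first, then evicts its pods.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  # List what is still bound to the node (DaemonSet pods are expected to remain).
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" -o wide
}

# Usage: drain_node <node-name>
```

Note that --delete-emptydir-data discards data held in emptyDir volumes on that node, so confirm that nothing there needs to be preserved before draining.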

KubeletTooManyPods

There are too many pods running on the specified node. Join another node to the cluster.

kubernetes-system

KubeVersionMismatch

There are different semantic versions of Kubernetes components running. This can happen as a result of an unsuccessful Kubernetes upgrade.
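
One way to spot a partially upgraded cluster is to count nodes per kubelet version, as sketched below. This covers only the kubelet on each node, not every Kubernetes component; kubelet_versions is a hypothetical helper name.

```shell
# Sketch: count nodes per kubelet version; more than one line suggests a mismatch.
# Assumption: this inspects kubelet versions only, not other components.
# kubelet_versions is a hypothetical helper name.
kubelet_versions() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
    sort | uniq -c | awk '{ print $2 " " $1 }'
}

# Usage: kubelet_versions
```

Each output line is a version followed by the number of nodes that report it.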

KubeClientErrors

A Kubernetes API server client is experiencing an error rate greater than 1%. There may be an issue with the node this client is running on, or with the Kubernetes API server itself.

etcd Alerts

EtcdInsufficientMembers

This alert indicates that the etcd cluster has an insufficient number of members. Note that the cluster must have an odd number of members. The severity of this alert is critical.

Make sure that there is an odd number of server nodes in the cluster, and all of them are up and healthy.
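
The sketch below counts the server nodes that are Ready and reports whether the count is odd. It assumes server nodes carry the label node-role.kubernetes.io/etcd, which may differ in your cluster, so the label can be overridden; check_server_node_count is a hypothetical helper name.

```shell
# Sketch: count Ready server nodes and check that the count is odd.
# Assumption: server nodes are labeled node-role.kubernetes.io/etcd; pass
# another label key as $1 if your cluster labels them differently.
# check_server_node_count is a hypothetical helper name.
check_server_node_count() {
  label="${1:-node-role.kubernetes.io/etcd}"
  count=$(kubectl get nodes -l "$label" --no-headers | awk '$2 == "Ready"' | wc -l | tr -d ' ')
  if [ $((count % 2)) -eq 1 ]; then
    echo "$count Ready server nodes: odd"
  else
    echo "$count Ready server nodes: even, check cluster membership"
  fi
}

# Usage: check_server_node_count
```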

EtcdNoLeader

This alert shows that the etcd cluster has no leader. The severity of this alert is critical.

EtcdHighNumberOfLeaderChanges

This alert indicates that the etcd leader changes more than twice in 10 minutes. This is a warning.

EtcdHighNumberOfFailedGrpcRequests

This alert indicates that a certain percentage of gRPC request failures was detected in etcd.

EtcdGrpcRequestsSlow

This alert indicates that etcd gRPC requests are slow. This is a warning.

EtcdHighNumberOfFailedHttpRequests

This alert indicates that a certain percentage of HTTP failures was detected in etcd.

EtcdHttpRequestsSlow

This alert indicates that etcd HTTP requests are slowing down. This is a warning.

EtcdMemberCommunicationSlow

This alert indicates that etcd member communication is slowing down. This is a warning.

EtcdHighNumberOfFailedProposals

This alert indicates that the etcd server received more than 5 failed proposals in the last hour. This is a warning.

EtcdHighFsyncDurations

This alert indicates that etcd WAL fsync duration is increasing. This is a warning.

EtcdHighCommitDurations

This alert indicates that etcd commit duration is increasing. This is a warning.

kube-api

KubernetesApiServerErrors

This alert indicates that the Kubernetes API server is experiencing a high error rate. This issue could lead to other failures, so it is recommended that you investigate the problem proactively.

To find the root cause of the issue, check the logs of the API server pod using the kubectl logs <pod-name> -n kube-system command.
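
If the pod name is not known, it can be looked up first, as in the sketch below. It assumes the API server pods in kube-system have names starting with kube-apiserver; apiserver_logs is a hypothetical helper name.

```shell
# Sketch: print recent logs from every API server pod in kube-system.
# Assumption: API server pod names start with "kube-apiserver".
# apiserver_logs is a hypothetical helper; $1 = number of log lines (default 200).
apiserver_logs() {
  lines="${1:-200}"
  for pod in $(kubectl -n kube-system get pods --no-headers |
      awk '$1 ~ /^kube-apiserver/ { print $1 }'); do
    echo "== $pod"
    kubectl -n kube-system logs "$pod" --tail="$lines"
  done
}

# Usage: apiserver_logs 500
```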
© 2005-2024 UiPath. All rights reserved.