Automation Suite

2021.10

false

Automation Suite Installation Guide

Last updated Apr 19, 2024

Using the monitoring stack

The monitoring stack for Automation Suite clusters includes Prometheus, Grafana, and Alertmanager, which are integrated within the Rancher Cluster Explorer UI.

Note:

Node failures might lead to a Kubernetes shutdown, which would disrupt Prometheus alerts. To prevent this, we recommend setting up a separate alert on the RKE2 server.

This page describes a series of monitoring scenarios. For more details, see the official Rancher documentation on using Rancher Monitoring.

Important:

When using collectors to export metrics to third-party tools, enabling application monitoring may disrupt the functionality of Automation Suite.

Accessing the Rancher Monitoring Dashboard

Access the Rancher Cluster Explorer via https://monitoring.{cluster_host}. The screen you are prompted with depends on whether this is the first time connecting to Rancher Server.
Enter the password. The password is the same in either case. It can be retrieved with the following command:
```
kubectl get secret -n cattle-system rancher-admin-password -o jsonpath='{.data.password}' | base64 -d && echokubectl get secret -n cattle-system rancher-admin-password -o jsonpath='{.data.password}' | base64 -d && echo
```
Open the Cluster Explorer by clicking the local link on the Rancher Server main page.
Click the Monitoring menu on the sidebar.

Checking Currently Firing Alerts

In the Monitoring dashboard, check the bottom pane for currently firing alerts. The following screenshots shows several currently firing alerts.

Silencing alerts

If alerts are too noisy, you can silence them. To do that, take the following steps:

Click the Alertmanager tile in the upper left corner of the Monitoring dashboard. The following screen is displayed.
Find the alert in question, and select Silence.
Fill in the Creator and Comment details, and click Create. The alert should no longer show on the Monitoring Dashboard or be reported to any of the configured receivers.

Sending Alerts to an External Receiver

It is highly recommended to set up an external receiver for alerts. This way, alerts will be pushed as they happen, instead of requiring a refresh of the Monitoring dashboard to see the latest alerts.

For details on how to send alerts to an external receiver, see the Rancher documentation on Alertmanager Receiver Configuration.

In addition to a receiver, you must configure at least one route that uses that receiver. A route defines how alerts are grouped, and which alerts are sent to the receiver. See the Rancher documentation on Alertmanager Route Configuration.

See below for an example of how the alerts will be displayed when using the Slack receiver. Clicking the link to AlertManager will take you to the AlertManager console where alerts can be silenced and there are further links to the Prometheus expression that triggered the alert. Clicking the Runbook URL will take you to this page with specific remediation instructions. These links are also present when alerts are sent to other external receivers.

Accessing the Grafana dashboard

On the Monitoring dashboard, click the Grafana tile. The Grafana dashboard is now displayed.

Monitoring the Service Mesh

You can monitor the Istio Service Mesh via the following Grafana dashboards: Istio Mesh and Istio Workload.

Istio Mesh dashboard

This dashboard shows the overall request volume, as well as 400 and 500 error rates across the entire service mesh, for the selected time period. The data is displayed in the upper-right corner of the window. See the 4 charts across the top for this information.

It also shows the immediate Success Rate over the past minute for each individual service. Note that a Success Rate of NaN indicates the service is not currently serving traffic.

Istio Workload dashboard

This dashboard shows the traffic metrics over the time range selected in the upper-right corner of the window.

Use the selectors at the top of the dashboard to drill into specific workloads. Of particular interest is the uipath namespace.

The top section shows overall metrics, the Inbound Workloads section separates out traffic based on origin, and the Outbound Services section separates out traffic based on destination.

Monitoring Persistent Volumes

You can monitor persistent volumes via the Kubernetes / Persistent Volumes dashboard. You can keep track of the free and used space for each volume.

You can also check the status of each volume by clicking the PersistentVolumes item within the Storage menu of the Cluster Explorer.

Monitoring hardware utilization

To check the hardware utilization per node, you can use the Nodes dashboard. Data on the CPU, Memory, Disk, and Network is available.

You can monitor the hardware utilization for specific workloads using the Kubernetes / Compute Resources / Namespace (Workloads) dashboard. Select the uipath namespace to get the needed data.

Creating shareable visual snapshot of a Grafana chart

Click the downwards pointing arrow next to the chart title, and then select Share.
Click the Snapshot tab, and set the Snapshot name,Expire, and Timeout.
Click Publish to snapshot.raintank.io.

For more details, see the Grafana documentation on sharing dashboards.

Note: This snapshot is viewable on the public Internet by anyone with the link.

Creating custom persistent Grafana dashboards

For details on how to create custom persisten Grafana dashboards, see Rancher documentation.

Admin access to Grafana

Admin access to Grafana is not typically needed in Automation Suite clusters as dashboards are available for read access by default to anonymous users, and creating custom persistent dashboards must be created using the Kubernetes-native instructions linked above in this document.

Nonetheless, admin access to Grafana is possible with the instructions below.

The default username and password for Grafana admin access can be retrieved as follows:

kubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-user}' | base64 -d && echo
kubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echokubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-user}' | base64 -d && echo
kubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echo

Note that in High Availability Automation Suite clusters, there are multiple Grafana pods in order to enable uninterrupted read access in case of node failure, as well as a higher volume of read queries. This is incompatible with admin access because the pods do not share session state and logging in requires it. In order to work around this, the number of Grafana replicas must be temporarily scaled to 1 while admin access is desired. See below for instructions on how to scale the number of Grafana replicas:

# scale down
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=1
# scale up
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=2# scale down
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=1
# scale up
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=2

Querying Prometheus

On the Monitoring Dashboard, click Prometheus Graph. A new window is displayed.

Documentation on the available metrics is here:

Creating custom alerts

You can create custom alerts using a Prometheus query with a Boolean expression.

To do so, click Prometheus Rules in the Advanced menu of the Monitoring Dashboard.
Click Create in the upper-right corner of the window to create a new alert, and follow the Rancher documentation: PrometheusRules
When the alert fires, it should show on the Monitoring Dashboard. Additionally, it will be routed to any of the configured receivers.

Monitoring Kubernetes resource status

To see the status of pods, deployments, statefulsets, etc., you can use the Cluster Explorer UI. This is the same landing page as accessed after logging into the rancher-server endpoint. The homepage shows a summary, with drill downs into specific details for each resource type on the left. Note the namespace selector at the top of the page. This dashboard may also be replaced with the Lens tool.

Exporting Prometheus Metrics to an External System

Prometheus uses the Prometheus remote write feature to collect and export Prometheus metrics to an external system.

Note: UiPath does not support or maintain the remote write endpoint integrations. However, the endpoints are compatible with the Prometheus instance delivered within Automation Suite.

To configure remote_write on an Automation Suite cluster:

Connect to ArgoCD.
Click Applications.
Navigate to fabric-installer.
Open the APP DETAILS panel and disable self-heal.
Navigate to the rancher-monitoring application.
Open the APP DETAILS panel > MANIFEST tab.
Click EDIT and navigate to the values > prometheus > prometheusSpec section.
Add the desired remoteWrite configurations.

Discover the available configurations for the remote write feature.
SAVE the new configuration. The rancher-monitoring application displays OutOfSync until the new configuration is applied.

Note: Prometheus does not need to restart to apply the new remote write configurations.
Test the desired remote write integration. Return to step 8 to add a new configuration.