Using the monitoring stack
The monitoring stack for Automation Suite clusters includes Prometheus, Grafana, and Alertmanager, which are integrated within the Rancher Cluster Explorer UI.
Node failures might lead to a Kubernetes shutdown, which would disrupt Prometheus alerts. To prevent this, we recommend setting up a separate alert on the RKE2 server.
This page describes a series of monitoring scenarios. For more details, see the official Rancher documentation on using Rancher Monitoring.
When using collectors to export metrics to third-party tools, enabling application monitoring may disrupt the functionality of Automation Suite.
You can access the Automation Suite monitoring tools individually using the following URLs:
| Application | Tool | URL | Example |
|---|---|---|---|
| Metrics | Prometheus | https://monitoring.fqdn/metrics | |
| Dashboard | Grafana | | |
| Alert Management | Alert Manager | | |
| Persistent Block Storage | Longhorn dashboard | | |
To access the monitoring tools for the first time, log in as an admin with the following default credentials:
- Username: admin
- Password: to retrieve the password, run the following command:

```
kubectl get secrets/dex-static-credential -n uipath-auth -o "jsonpath={.data['password']}" | base64 -d
```
To update the default password used for Dex authentication while accessing the monitoring tools, take the following steps:

1. Run the following command, replacing `newpassword` with your new password:

   ```
   password="newpassword"
   password=$(echo -n $password | base64)
   kubectl patch secret dex-static-credential -n uipath-auth --type='json' -p="[{'op': 'replace', 'path': '/data/password', 'value': '$password'}]"
   ```

2. Run the following command, replacing `<cluster_config.json>` with the path to your configuration file:

   ```
   /opt/UiPathAutomationSuite/UiPath_Installer/install-uipath.sh -i <cluster_config.json> -f -o output.json --accept-license-agreement
   ```
To check the currently firing alerts, go to https://monitoring.fqdn/metrics and click the Alerts tab. Here you can see all the alerts configured in Automation Suite.
To view the active alerts, filter the alert status by selecting the Firing checkbox and the Show annotations checkbox at the top. Here you can see all the alerts that are currently firing and their corresponding messages.
If alerts are too noisy, you can silence them.
You can configure the alerts using uipathctl, located in the Automation Suite installation folder: .../UiPathAutomationSuite/UiPath_Installer/bin.
Before configuring the alerts, make sure kubectl is enabled.
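If kubectl is not yet enabled on the node, a minimal sketch of how to enable it is shown below; the kubeconfig and binary paths assume the default RKE2 layout used by Automation Suite and might differ in your environment.

```
# Assumed default RKE2 paths; adjust if your installation uses different locations.
sudo su -                                        # the kubeconfig is typically readable only by root
export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"  # RKE2 cluster kubeconfig
export PATH="$PATH:/var/lib/rancher/rke2/bin"    # location of the kubectl binary shipped with RKE2
kubectl get nodes                                # quick check that kubectl can reach the cluster
```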
To add a new email configuration after an installation, run the following command:
```
./uipathctl config alerts add-email \
  --name test \
  --to "admin@example.com" \
  --from "admin@example.com" \
  --smtp server.mycompany.com \
  --username admin \
  --password somesecret \
  --require-tls \
  --ca-file <path_to_ca_file> \
  --cert-file <path_to_cert_file> \
  --key-file <path_to_key_file> \
  --send-resolved
```
| Flag | Description | Example |
|---|---|---|
| `--name` | The name of the email configuration | `test` |
| `--to` | The email address of the receiver | `admin@example.com` |
| `--from` | The email address of the sender | `admin@example.com` |
| `--smtp` | SMTP server URL or IP address and port number | `server.mycompany.com` |
| `--username` | Authentication username | `admin` |
| `--password` | Authentication password | `somesecret` |
| `--require-tls` | Boolean flag to denote that TLS is enabled at the SMTP server. | N/A |
| `--ca-file` | File path containing the CA certificate of the SMTP server. This is optional if the CA is private. | `<path_to_ca_file>` |
| `--cert-file` | File path containing the certificate of the SMTP server. This is optional if the certificate is private. | `<path_to_cert_file>` |
| `--key-file` | File path containing the private key of the certificate of the SMTP server. This is required if the certificate is private. | `<path_to_key_file>` |
| `--send-resolved` | Boolean flag to send an email once the alert is resolved. | N/A |
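For illustration only, a configuration for an SMTP server that does not require TLS or authentication might omit the optional flags; the server name and addresses below are placeholders, and whether your SMTP server accepts unauthenticated relay is an assumption to verify.

```
# Hypothetical minimal example: plain SMTP relay, so the TLS and certificate flags are omitted.
./uipathctl config alerts add-email \
  --name ops-email \
  --to "ops@example.com" \
  --from "alerts@example.com" \
  --smtp smtp.example.com:25 \
  --send-resolved
```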
To remove an email configuration, you must run the following command. Make sure to pass the name of the email configuration you want to remove.
```
./uipathctl config alerts remove-email --name test
```
To update an email configuration, you must run the following command. Make sure to pass the name of the email configuration you want to update and the additional optional parameters you want to edit. These parameters are the same as the ones for adding a new email configuration. You can pass one or more flags at the same time.
```
./uipathctl config alerts update-email --name test [additional_flags]
```
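For example, to change only the recipient and the SMTP server of the existing configuration named test, the additional flags could look like the following; the values are illustrative.

```
# Hypothetical example: only --to and --smtp are updated; all other settings are kept.
./uipathctl config alerts update-email \
  --name test \
  --to "new-admin@example.com" \
  --smtp newserver.mycompany.com
```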
To access Grafana dashboards, you must retrieve your credentials and use them to log in:

- Username:

  ```
  kubectl -n cattle-monitoring-system get secrets/rancher-monitoring-grafana -o "jsonpath={.data.admin-user}" | base64 -d; echo
  ```

- Password:

  ```
  kubectl -n cattle-monitoring-system get secrets/rancher-monitoring-grafana -o "jsonpath={.data.admin-password}" | base64 -d; echo
  ```
You can monitor the Istio Service Mesh via the following Grafana dashboards: Istio Mesh and Istio Workload.
The Istio Mesh dashboard shows the overall request volume, as well as 400 and 500 error rates, across the entire service mesh, for the time period selected in the upper-right corner of the window. This information is displayed in the four charts across the top.
It also shows the immediate Success Rate over the past minute for each individual service. Note that a Success Rate of NaN indicates the service is not currently serving traffic.
The Istio Workload dashboard shows the traffic metrics over the time range selected in the upper-right corner of the window.
Use the selectors at the top of the dashboard to drill into specific workloads. Of particular interest is the uipath namespace.
The top section shows overall metrics, the Inbound Workloads section separates out traffic based on origin, and the Outbound Services section separates out traffic based on destination.
You can monitor persistent volumes via the Kubernetes / Persistent Volumes dashboard. You can keep track of the free and used space for each volume.
You can also check the status of each volume by clicking the PersistentVolumes item within the Storage menu of the Cluster Explorer.
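If you prefer the command line, a similar status overview is available with kubectl; this is a generic Kubernetes check rather than an Automation Suite-specific command.

```
# List persistent volumes with their capacity, status, and claims
kubectl get pv
# List persistent volume claims in the uipath namespace
kubectl get pvc -n uipath
```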
To check the hardware utilization per node, you can use the Nodes dashboard. Data on the CPU, Memory, Disk, and Network is available.
You can monitor the hardware utilization for specific workloads using the Kubernetes / Compute Resources / Namespace (Workloads) dashboard. Select the uipath namespace to get the needed data.
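As a quick command-line complement to the Grafana dashboards, resource usage can also be checked with kubectl top, assuming the metrics API is available in the cluster (RKE2 ships a metrics server by default).

```
# Per-node CPU and memory usage
kubectl top nodes
# Per-pod CPU and memory usage for UiPath workloads
kubectl top pods -n uipath
```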
To create a shareable visual snapshot of a Grafana chart, take the following steps:
- Click the downwards-pointing arrow next to the chart title, and then select Share.
- Click the Snapshot tab, and set the Snapshot name, Expire, and Timeout.
- Click Publish to snapshot.raintank.io.
For more details, see the Grafana documentation on sharing dashboards.
For details on how to create custom persistent Grafana dashboards, see the Rancher documentation.
Admin access to Grafana is not typically needed in Automation Suite clusters, as dashboards are available by default for read access to anonymous users, and custom persistent dashboards must be created using the Kubernetes-native instructions linked above in this document.
Nonetheless, admin access to Grafana is possible with the instructions below.
The default username and password for Grafana admin access can be retrieved as follows:
```
kubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-user}' | base64 -d && echo
kubectl get secret -n cattle-monitoring-system rancher-monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echo
```
Note that in High Availability Automation Suite clusters, there are multiple Grafana pods to enable uninterrupted read access in case of node failure, as well as to handle a higher volume of read queries. This is incompatible with admin access because the pods do not share session state, which logging in requires. To work around this, the number of Grafana replicas must be temporarily scaled to 1 while admin access is needed. See below for instructions on how to scale the number of Grafana replicas:
```
# scale down
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=1
# scale up
kubectl scale -n cattle-monitoring-system deployment/rancher-monitoring-grafana --replicas=2
```
Documentation on the available metrics is here:
You can create custom alerts using a Prometheus query with a Boolean expression.
- To do so, click Prometheus Rules in the Advanced menu of the Monitoring Dashboard.
- Click Create in the upper-right corner of the window to create a new alert, and follow the Rancher documentation on PrometheusRules.
- When the alert fires, it should show on the Monitoring Dashboard. Additionally, it will be routed to any of the configured receivers.
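As a sketch of what such a rule can look like, the following PrometheusRule fires when a node's root filesystem has less than 10% free space for 15 minutes; the rule name, namespace, expression, and labels are illustrative and may need to be adapted to your cluster's rule selectors and alert routing.

```
# Hypothetical example of a custom alert rule applied with kubectl.
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-disk-alert
  namespace: cattle-monitoring-system
spec:
  groups:
    - name: custom.rules
      rules:
        - alert: NodeRootDiskAlmostFull
          expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Root filesystem on {{ $labels.instance }} is almost full"
EOF
```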
To see the status of pods, deployments, StatefulSets, and so on, you can use the Cluster Explorer UI. This is the same landing page you reach after logging into the rancher-server endpoint. The homepage shows a summary, with drill-downs into specific details for each resource type on the left. Note the namespace selector at the top of the page. This dashboard may also be replaced with the Lens tool.
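If you prefer the command line over the Cluster Explorer UI, a quick overview of the same resources is available with kubectl:

```
# Status of deployments, StatefulSets, and pods in the uipath namespace
kubectl get deployments,statefulsets,pods -n uipath
```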
Prometheus uses the remote write feature (remote_write) to collect and export Prometheus metrics to an external system on an Automation Suite cluster.
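A minimal sketch of one way to set this up is shown below. It assumes the Prometheus custom resource deployed by rancher-monitoring is named rancher-monitoring-prometheus in the cattle-monitoring-system namespace, and that your external system exposes a remote write endpoint at the URL shown; both are assumptions to verify in your environment. Also note that edits made directly to the custom resource can be overwritten when the monitoring stack is upgraded.

```
# Hypothetical example: add a remote_write target to the Prometheus custom resource.
kubectl -n cattle-monitoring-system patch prometheus rancher-monitoring-prometheus \
  --type merge \
  -p '{"spec":{"remoteWrite":[{"url":"https://metrics.example.com/api/v1/write"}]}}'

# Confirm the setting was applied
kubectl -n cattle-monitoring-system get prometheus rancher-monitoring-prometheus \
  -o jsonpath='{.spec.remoteWrite}'; echo
```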