The Diagnostics Tool is the first thing to use when facing any issues with Automation Suite. It checks the health of different required components and gives a consolidated report.
Before you begin
Download the supportability-tools zip archive and extract its contents using the following commands:
curl "https://download.uipath.com/automation-suite/2021.10.3/supportability-tools-2021.10.3.zip" -o supportability-tools-2021.10.3.zip
unzip supportability-tools-2021.10.3.zip -d support-tools
Then, you can run the diagnostics tool from the support-tools/diagnostics-tool/ folder using the
The following table lists the checks the Diagnostics Tool performs. Note that you can run the script on any of the nodes in the cluster as well as from an external machine.
Checks if required services are running;
Checks if required services are running on the node
Runs a Kubernetes job to collect the health of the services.
Note: To run the script from an external machine, first set the proper
INFO logs in green show that the required checks passed. However, you should still properly check the disk/memory usage to avoid hidden errors.
Even though these messages do not signal a high risk, you might have to rectify them, as they might be affecting some services in certain scenarios.
You must fix the issues described by these messages as they impact some service in the cluster.
If these services are down, it means the node is down. Try restarting the service using systemctl restart as this should fix the issue.
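As a minimal sketch of the restart step, the helper below restarts a systemd unit only when it is not active. The function name restart_if_inactive and the unit name in the usage line are placeholders, not names taken from the report; substitute the service the report flags, and run it with root privileges.

```shell
# Hypothetical helper: restart a systemd unit only if it is not currently active.
restart_if_inactive() {
    svc=$1
    # "is-active --quiet" prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet "$svc"; then
        echo "$svc is active"
    else
        echo "$svc is not active, restarting"
        systemctl restart "$svc"
    fi
}

# Usage (placeholder unit name):
#   restart_if_inactive <service-name>
```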
The report displays the size of the directory mounted at /var/lib, as Kubernetes uses it to store its data. If the directory is full, various issues might arise. To prevent these problems, make sure to increase its size.
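To check the same directory by hand, a sketch along these lines reads the usage figure from df. The function names and the default threshold of 85 percent are illustrative assumptions, not values used by the Diagnostics Tool.

```shell
# Print the used-space percentage (digits only) of the filesystem holding a path.
disk_used_percent() {
    # -P forces the portable output format: one line per filesystem,
    # with the capacity percentage in the fifth column.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a limit (default 85, an illustrative value only).
check_disk() {
    path=$1
    limit=${2:-85}
    used=$(disk_used_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Usage:
#   check_disk /var/lib
```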
The report displays the rke2 version for reference.
For all the nodes, we specify if they are under Disk Pressure or Memory Pressure. If that happens, workloads on these nodes might start showing issues. Check if there are any other processes running on these nodes that are consuming resources and remove them if that is the case.
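The same node conditions can be read directly with kubectl. The sketch below assumes kubectl is already configured for the cluster; the function names are hypothetical, and this is not necessarily how the Diagnostics Tool gathers the data.

```shell
# Print one row per node with its DiskPressure and MemoryPressure condition status.
node_pressure() {
    kubectl get nodes -o custom-columns='NODE:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status,MEMORY_PRESSURE:.status.conditions[?(@.type=="MemoryPressure")].status'
}

# Print only the names of nodes where either condition is "True".
nodes_under_pressure() {
    # Skip the header row, then match on the two status columns.
    node_pressure | awk 'NR > 1 && ($2 == "True" || $3 == "True") { print $1 }'
}

# Usage:
#   nodes_under_pressure
```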
We use Ceph as S3 object storage for storing logs and files from different applications. You can view the status of its services. If they are down, you might have to restart them. Also check whether the disk space allocated to Ceph is full.
We expect ports
31443 to be open on the hostname that was provided. The report indicates if they are not accessible. Make sure to open any ports that the report flags.
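To confirm reachability from another machine, a rough sketch such as the following attempts a TCP connection to each port. The function names and the 3-second timeout are assumptions for illustration, the hostname in the usage line is a placeholder, and the helper relies on bash and the coreutils timeout command being installed.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    # /dev/tcp is a bash feature, so bash is invoked explicitly; host and port
    # are passed as positional arguments rather than interpolated into the script.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report the state of each listed port on a host.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "open: $host:$port"
        else
            echo "NOT reachable: $host:$port"
        fi
    done
}

# Usage (placeholder hostname):
#   check_ports <hostname> 31443
```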
The tool checks if the uploaded certificate is valid for the given hostname and if it has not expired. If the certificate does not meet these criteria, errors occur. To prevent this, make sure to check your uploaded certificate and change it if required.
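Both criteria can also be checked by hand with openssl against a PEM copy of the certificate. This is a sketch under stated assumptions: the function names and the file name in the usage lines are hypothetical, and the hostname check relies on the message text printed by openssl x509 -checkhost.

```shell
# Return 0 if the certificate will still be valid N seconds from now (default 0).
cert_not_expiring() {
    # -checkend exits 0 when the certificate does not expire within the window.
    openssl x509 -in "$1" -noout -checkend "${2:-0}" >/dev/null
}

# Return 0 if the certificate matches the given hostname.
cert_matches_host() {
    # -checkhost prints "does match" or "does NOT match"; only the former is accepted.
    openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match certificate'
}

# Usage (placeholder file and hostname):
#   cert_not_expiring cert.pem 604800 || echo "expires within 7 days"
#   cert_matches_host cert.pem <hostname> || echo "hostname mismatch"
```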
Since some services require a GPU to be present on some of the nodes in the cluster, the Diagnostics Tool checks whether there are GPU nodes and prints the number of such nodes. If you expect GPU nodes to be present and they do not show up here, something went wrong in the GPU setup.
MongoDB is an important component that the UiPath Apps service uses. If either MongoDB or its primary instance is down, you need to investigate the issue using the support bundle.
RabbitMQ and DockerRegistry are two important components that some services use. If either of them is down, you need to investigate the issue and try a restart.
ArgoCD is our application lifecycle management (ALM) tool. If any of its services are down, then other applications may become outdated or have other issues. Recovering these services is important, and might need further debugging.
The Diagnostics Tool shows whether ArgoCD applications are missing or degraded.
- If applications are missing, go to the ArgoCD UI and sync them.
- If applications are degraded, additional debugging is needed to investigate the errors thrown by ArgoCD.