
Adding a dedicated agent node with GPU support

📘

Important

A GPU can be installed only on an agent node, not a server node. Do not use or modify the gpu_support flag from the cluster_config.json. Instead, follow the instructions below to add a dedicated agent node with GPU support to the cluster.

Currently, Automation Suite only supports Nvidia GPU Drivers. See the list of GPU-supported operating systems.

You can find cloud-specific instance types for the nodes here:

Before adding a dedicated agent node with GPU support, make sure to check Hardware requirements.

Installing a GPU driver


  1. Run the following commands to install the GPU driver on the agent node:
sudo yum install kernel kernel-tools kernel-headers kernel-devel 
sudo reboot
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel.repo
sudo sed 's/$releasever/8/g' -i /etc/yum.repos.d/epel-modular.repo
sudo yum config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo yum install cuda
  2. Run the following commands to install the container toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
          && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo dnf clean expire-cache && sudo dnf install -y nvidia-container-toolkit
sudo yum install -y nvidia-container-runtime.x86_64
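
Before moving on, you can optionally confirm that the NVIDIA container runtime binary is on the PATH. This is the same check performed by the containerd script later on this page and is shown here only as an illustrative sanity check:

# An empty result means the container toolkit installation above did not complete successfully
which nvidia-container-runtime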
Verifying that the drivers are installed properly

Run the sudo nvidia-smi command on the node to verify that the drivers were installed properly.
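
For a more compact check, nvidia-smi can also report just the detected GPU model and driver version. This is illustrative usage, not a required step:

# Prints one CSV line per detected GPU; a non-empty result confirms the driver is loaded
sudo nvidia-smi --query-gpu=name,driver_version --format=csv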

📘

Note:

After the cluster has been provisioned, additional steps are required to configure the provisioned GPUs.

At this point, the GPU drivers have been installed and the GPU nodes have been added to the cluster.

 

Adding the GPU to the agent node


Run the following two commands to update the containerd configuration of the agent node.

cat <<EOF > gpu_containerd.sh
# Abort if the GPU driver or the NVIDIA container runtime is missing
if ! nvidia-smi &>/dev/null;
then
  echo "GPU drivers are not installed on the VM. Please refer to the documentation."
  exit 0
fi
if ! which nvidia-container-runtime &>/dev/null;
then
  echo "Nvidia container runtime is not installed on the VM. Please refer to the documentation."
  exit 0
fi
# Skip if the containerd configuration already references the NVIDIA runtime
grep "nvidia-container-runtime" /var/lib/rancher/rke2/agent/etc/containerd/config.toml &>/dev/null && echo "GPU containerd changes already applied" && exit 0
# Write a containerd config template that sets nvidia-container-runtime as the default runtime
awk '1;/plugins.cri.containerd]/{print "  default_runtime_name = \"nvidia-container-runtime\""}' /var/lib/rancher/rke2/agent/etc/containerd/config.toml > /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
echo -e '\n[plugins.linux]\n  runtime = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
echo -e '\n[plugins.cri.containerd.runtimes.nvidia-container-runtime]\n  runtime_type = "io.containerd.runc.v2"\n  [plugins.cri.containerd.runtimes.nvidia-container-runtime.options]\n    BinaryName = "nvidia-container-runtime"' >> /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
EOF
sudo bash gpu_containerd.sh

Now run the following commands to restart the rke2 service on the node:

[[ "$(sudo systemctl is-enabled rke2-server 2>/dev/null)" == "enabled" ]] && systemctl restart rke2-server
[[ "$(sudo systemctl is-enabled rke2-agent 2>/dev/null)" == "enabled" ]] && systemctl restart rke2-agent

 

Enabling the GPU driver post-installation


Run the following commands from any of the primary server nodes.

Navigate to the UiPathAutomationSuite folder:

cd /opt/UiPathAutomationSuite

Enable in Online Install


DOCKER_REGISTRY_URL=$(cat defaults.json | jq -er ".registries.docker.url")
sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset

Enable in Offline Install


DOCKER_REGISTRY_URL=localhost:30071
sed -i "s/REGISTRY_PLACEHOLDER/${DOCKER_REGISTRY_URL}/g" ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl apply -f ./Infra_Installer/gpu_plugin/nvidia-device-plugin.yaml
kubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset
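
In either case, you can optionally wait for the device plugin DaemonSet to finish rolling out before continuing. This is a sanity check that reuses the DaemonSet name from the commands above:

# Blocks until the restarted nvidia-device-plugin-daemonset pods are ready on the GPU nodes
kubectl -n kube-system rollout status daemonset nvidia-device-plugin-daemonset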

 

GPU Taints


GPU workloads are scheduled on GPU nodes automatically when a workload requests GPU resources. However, normal CPU workloads may also be scheduled on these nodes, reserving their capacity. If you want only GPU workloads to be scheduled on these nodes, add taints to them using one of the following options, by running the command shown after this list from the first node.

  • nvidia.com/gpu=present:NoSchedule - non-GPU workloads are not scheduled on this node unless they explicitly tolerate the taint
  • nvidia.com/gpu=present:PreferNoSchedule - a soft version of the first option: the scheduler tries to avoid placing non-GPU workloads on this node, but does not guarantee it

Replace <node-name> with the corresponding GPU node name in your cluster and <taint-name> with one of the two options above in the following command:

kubectl taint node <node-name> <taint-name>
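
To illustrate how a GPU workload still lands on a tainted node, the hypothetical pod below requests one GPU and tolerates the nvidia.com/gpu=present:NoSchedule taint. The pod name and container image are placeholders for illustration only and are not part of Automation Suite:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                      # hypothetical pod name
spec:
  restartPolicy: Never
  tolerations:
    - key: nvidia.com/gpu                   # tolerate the taint added above
      operator: Equal
      value: present
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubi8   # placeholder image for illustration
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1                 # the GPU request pins the pod to a GPU node
EOF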

Validating GPU node provisioning


To ensure that you have added the GPU nodes successfully, run the following command in the terminal. The output should include nvidia.com/gpu in the node's capacity, along with the CPU and RAM resources.

kubectl describe node <node-name>
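
To narrow the output down to the GPU resource, you can filter the same command; this is illustrative only:

# A non-zero nvidia.com/gpu count under Capacity and Allocatable confirms the node is provisioned correctly
kubectl describe node <node-name> | grep -i "nvidia.com/gpu"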
