Manage node scheduling
Node scheduling is especially helpful for making better use of your hardware and for controlling where ML replicas (pods), skills, or pipelines run. You should mainly use it in two situations:
- When you have a GPU node and want to make sure that only workloads requiring a GPU are scheduled on that node
- To better isolate your ML workloads so they do not interfere with other applications.
By default, all nodes that have the required resources are considered equal by the Kubernetes scheduler.
There are two methods in which you can direct the scheduling of ML replicas (pods), skills, or pipelines to specific nodes, and both are supported in UiPath AI Center:
- Assign Pods to Nodes using Node Affinity: this is useful when collocating ML pods with other pods on a node. Multiple labels can be applied to a node.
- Taints and Tolerations: intended for dedicating a node, either repelling all pods (NoSchedule) or imposing a low scheduling preference (PreferNoSchedule) for pods that don't match the scheduling criteria. Although multiple taints can be applied to a node, node taints only support "AND" Boolean logic.
Node affinity is a property of Pods that attracts them to a set of nodes, either as a preference or a hard requirement. Taints, on the other hand, allow a node to repel a set of pods.
The first method creates an affinity between replicas and nodes using node labels, while the second method applies anti-affinity by tainting the nodes.
PodSpec templates are designed to support both methods and are customized based on GPU or non-GPU selection at deployment.
If an agent node has been added to expand the resource pool and you want to influence the scheduling of ML pods onto it, you can apply node affinity. Do this by adding a label to the node using one of the following commands:
- For CPU:
kubectl label node <node_name> node.type=aic.ml.cpu
- For GPU:
kubectl label node <node_name> node.type=aic.ml.gpu
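As a hedged illustration of how such a label is consumed by the scheduler, the fragment below sketches a pod spec with a node affinity rule matching the node.type=aic.ml.gpu label applied above. The pod name and image are placeholders for illustration, not part of AI Center's actual PodSpec templates.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-replica-example          # hypothetical name for illustration
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only schedule on nodes labeled node.type=aic.ml.gpu
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.type
            operator: In
            values:
            - aic.ml.gpu
  containers:
  - name: ml-model
    image: example/ml-model:latest  # placeholder image
```

Using preferredDuringSchedulingIgnoredDuringExecution instead of the required variant would express the affinity as a preference rather than a hard constraint.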
Node affinity does not ensure that the node is dedicated to serving ML workloads and does not prevent other workload pods from being scheduled to the same node where the labels are applied.
To dedicate a node, you need to use taints or a combination of node affinity and taints. To dedicate an agent node to serving either ML GPU or CPU pods, you can apply the following taints to the nodes:
- For CPU:
kubectl taint node <node_name> aic.ml/cpu=present:NoSchedule
- For GPU:
kubectl taint node <node_name> nvidia.com/gpu=present:NoSchedule
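A taint only repels pods; for the scheduler to place an ML pod on the tainted node, the pod must carry a matching toleration. A minimal sketch of such a fragment for the GPU taint above (the surrounding pod fields are elided):

```yaml
# Sketch: pod fragment tolerating the NoSchedule GPU taint applied above
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
```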
To dedicate an agent node for serving ML GPU pods and influence scheduling for ML CPU pods to the same nodes, you can use a combination of node affinity and taints:
kubectl taint node <node_name> nvidia.com/gpu=present:PreferNoSchedule
kubectl label node <node_name> node.type=aic.ml.cpu
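Under this combined setup, a GPU workload would both tolerate the PreferNoSchedule taint and request GPU resources, while ML CPU pods are attracted to the node by the node.type=aic.ml.cpu label. A hedged sketch of the GPU pod fragment (the container name and image are placeholders):

```yaml
# Sketch: GPU pod fragment for the combined taint + label setup above
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: PreferNoSchedule
  containers:
  - name: ml-model
    image: example/ml-model:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1           # standard Kubernetes GPU resource request
```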