1. Provision an Azure AKS Cluster
This section details the steps for provisioning an AKS cluster to run AI Fabric in a highly available, multi-node configuration.
- AI Fabric installation is currently supported only on a fresh, dedicated AKS cluster. A shared cluster with unknown policies or restrictions may need additional workarounds and is not officially supported.
- You have very high AI Fabric usage and need horizontal scaling to support a wide variety of AI/ML use cases in your organization
- You need a highly available solution deployed across multiple zones
- You require multiple replicas of both Core and ML Services for disaster recovery as well as scaling
- Highly Available Kubernetes Cluster Managed by Azure
- Highly Available Container Registry Managed by Azure
- Highly Available Cloud Storage managed by Azure
- Horizontal Pod Autoscaler, i.e. the number of pods for core services scales with the workload
- Kubernetes Cluster Autoscaler, i.e. the number of nodes in the AKS cluster increases automatically with demand, for both CPU and GPU node pools
- Taints for GPU nodes to ensure GPU resources are used only for their intended purpose
- Certificates are provisioned & managed through Cert Manager
- AKS cluster with at least two node pools. At least one node pool must have GPU-type nodes and the proper taints.
- Azure Container Registry in the same resource group as the cluster
- One storage account in the same resource group as the cluster
- One Application Insights instance in the same resource group as the cluster
- SQL Server with databases created
This section provides a high-level overview of the steps; the next two sections break the process down in detail.
- The user creates an AKS cluster, an Azure Container Registry, one storage account, and one Application Insights instance.
- The one-installer takes the following as input:
  - AKS Cluster name
  - AKS Cluster Resource Group
  - AKS Worker Resource Group
  - ACR Endpoint
  - ACR Name (Username)
  - ACR Password
  - Storage Account Name
  - Storage Account Key
  - Application Insights Key
  - Flag to indicate whether to expose the kotsadm service or not
  - DNS Prefix Name for aifabric
- The customer then logs in to shell.azure.com and sets the correct subscription ID using "az account set --subscription".
- The shell should already have kubectl installed.
- Then copy the one-installer bundle to the shell, extract it, and run bash setup.sh. The values above can be passed on the command line; the installer prompts for any that are not passed.
- Enter "azure" as the platform input. Everything else is provisioned after this.
- Kotsadmin and kots application in aifabric namespace
- Istio installation in istio-system namespace
- Create containers in the storage account provided by them (equivalent of buckets)
- Apply CORS policy on the storage account
- Assign the provided dns prefix to the Public IP of the istio gateway
- Install Cert Manager in cert-manager namespace
- Install Velero (in velero namespace), create a backup and schedule regular backups.
Skip this step if you already have an AKS cluster. The Kubernetes version should be 1.16, 1.17 or 1.18. Other versions are not currently supported.
- Create a resource group in Azure. You should have the Owner role at the resource group level.
- Search for Kubernetes Services in the portal and create a new AKS instance.
- Select the resource group you created in step #1a and give the cluster a suitable name.
- Make sure your Kubernetes version is 1.16, 1.17 or 1.18.
- Select the node size for worker nodes (minimum recommended size: Standard_D2_v3 with 3 nodes).
- Under the Authentication tab:
- For Authentication Methods, select “System-assigned managed identity”.
- Under the Networking tab:
- Set “Network Configuration” to “Azure CNI”.
- Set “Network Policy” to “Azure”.
- Keep the other fields unchanged.
- Under the Integrations tab:
- If you have a container registry in the same resource group, select it. It will be used to push Docker images built at run time.
- Otherwise, select Create New, give the registry a suitable name, and set “Admin User” to Enable.
- Under the Tags tab, add tags as required.
- Click “Review + Create” and create the cluster.
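Once the cluster exists, the version constraint above can be checked from the shell. The following is a minimal sketch, not part of the installer: the function name `is_supported_k8s_version` is hypothetical, and `RESOURCEGROUP` and `AKSCLUSTERNAME` are placeholders for your own values.

```shell
# Returns 0 if the given Kubernetes version (for example "1.17.13")
# belongs to one of the supported minor series: 1.16, 1.17 or 1.18.
is_supported_k8s_version() {
  case "$1" in
    1.16|1.16.*|1.17|1.17.*|1.18|1.18.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (placeholders: RESOURCEGROUP, AKSCLUSTERNAME):
#   version=$(az aks show --resource-group "$RESOURCEGROUP" \
#               --name "$AKSCLUSTERNAME" \
#               --query kubernetesVersion --output tsv)
#   is_supported_k8s_version "$version" \
#     || echo "Unsupported Kubernetes version: $version" >&2
```

The pattern match is deliberately strict about the dot after the minor version, so a string such as 1.160.1 is not mistaken for the 1.16 series.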
AI Center needs at least two node pools in your cluster. One is created together with the cluster, so you must create a second one. One of these node pools is expected to consist of nodes with GPUs attached. We detect whether such a node pool exists by checking the taints on the node pool.
We check for this taint: nvidia.com/gpu=present:NoSchedule
To create a node pool with this taint, you can use the command below (sample):
az aks nodepool add --name gpunodepool \
  --enable-cluster-autoscaler \
  --resource-group ${RESOURCEGROUP} \
  --cluster-name ${AKSCLUSTERNAME} \
  --node-vm-size Standard_NC6 \
  --node-taints nvidia.com/gpu=present:NoSchedule \
  --labels accelerator=nvidia \
  --node-count 0 \
  --min-count 0 \
  --max-count 3
You can change --node-vm-size based on the type of node you want to use for GPU workloads. Check here for supported GPU VM sizes in Azure.
Create a storage account in the same resource group where your cluster is deployed. This storage account is used to create containers that store the AIFabric-related files.
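To confirm that a node pool actually carries the taint the installer looks for, a check along these lines may help. This is only a sketch: `has_gpu_taint` is a hypothetical helper, and it assumes the taints arrive one per line in `key=value:effect` form, as `az aks nodepool list` prints them with tsv output.

```shell
# Reads taint strings from stdin, one "key=value:effect" per line.
# Returns 0 if the exact GPU taint expected by the installer is present.
has_gpu_taint() {
  grep -Fxq 'nvidia.com/gpu=present:NoSchedule'
}

# Usage (placeholders: RESOURCEGROUP, AKSCLUSTERNAME):
#   az aks nodepool list --resource-group "$RESOURCEGROUP" \
#       --cluster-name "$AKSCLUSTERNAME" \
#       --query "[].nodeTaints[]" --output tsv \
#     | has_gpu_taint \
#     || echo "No node pool carries the GPU taint" >&2
```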
- Search storage accounts in the portal
- Create new storage account.
- Select the above resource group
- Give a suitable name to the storage account
- Keep rest of the settings default. Add tags and create the account.
Create one Application Insights instance in the same resource group. AIFabric will publish its logs to this instance.
- Search “Application Insights“ in the portal
- Create New Instance
- Select above Resource group
- Give a suitable name to the instance
- In Resource Mode, select Classic
- Add tags and Create.
If your AKS cluster is private, there are additional steps, detailed here.
Once the above infrastructure is set up, open shell.azure.com and set the current subscription to the one that contains your resource group, using the following commands:
az account set --subscription <subscription-id>
az extension add --name application-insights
The first step is to download the installer archive here and move it to the AI Center server. Alternatively, you can download it directly on the machine using the following command:
wget https://download.uipath.com/aifabric/online-installer/v2020.10.2/aifabric-installer-v20.10.2.tar.gz
Then extract the archive and change into the main folder using the following commands:
tar -xvf aifabric-installer-v20.10.2.tar.gz
cd ./aifabric-installer-v20.10.2
Then run setup.sh using the following command:
bash setup.sh --resourceGroup <YOUR RESOURCE GROUP> --clusterName <YOUR CLUSTER NAME> --exposeKots n --acrHost <YOUR CONTAINER REGISTRY> --acrUser <YOUR CR USERNAME> --acrKey <YOUR CR KEY> --workerResourcegroup <YOUR WORKER RG> --dnsPrefix aicapp --storageAccountName <YOUR STORAGE ACCOUNT> --storageAccountAccessKey <YOUR STORAGE ACCOUNT ACCESS KEY> --appInsightsKey <YOUR ACCESS KEY> --platform azure --email <YOUR EMAIL>
Replace the placeholder values according to your setup, as explained below.
The parameters in the above command are as follows (the installer prompts for any that are not passed in the command):
platform → azure
resourceGroup → Resource group where the cluster and other resources were created
workerResourcegroup → AKS creates a second resource group alongside your cluster to hold the cluster workload; provide that name here. Search for your original resource group name in the portal to find it. The name usually looks like MC_<rg-name>_<cluster-name>_<region>
clusterName → AKS cluster name
exposeKots → Whether to expose Kotsadmin over the internet or not (y/n)
acrHost → Go to the container registry that was selected in AKS and copy its login server name
acrUser → The name of the registry that was created
acrKey → Go to Access Keys on the ACR page and use either of the passwords listed there
dnsPrefix → DNS prefix to be used for the AIFabric ingress (for example aifabric or aim-app; any name you want)
storageAccountName → Name of the storage account that was created in the same resource group in step #3
storageAccountAccessKey → Go to the storage account page → Access Keys. Click “Show Keys” and copy either key1 or key2.
appInsightsKey → Key of the Application Insights instance that was created (its Instrumentation Key)
email → Email address to be notified about certificate expiration
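Several of these values can also be looked up from the shell instead of the portal. The sketch below is illustrative only: `default_worker_rg` is a hypothetical helper that reproduces the default MC_ naming pattern, which does not apply if the cluster was created with a custom node resource group name. The commented `az` queries use placeholder names for your resources.

```shell
# Builds the default worker (node) resource group name that AKS
# generates: MC_<resource-group>_<cluster-name>_<region>.
default_worker_rg() {
  printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

# The actual values can be read back from Azure
# (placeholders: RESOURCEGROUP, AKSCLUSTERNAME, ACRNAME, STORAGEACCOUNT, APPINSIGHTS):
#   az aks show --resource-group "$RESOURCEGROUP" --name "$AKSCLUSTERNAME" \
#       --query nodeResourceGroup --output tsv            # workerResourcegroup
#   az acr show --name "$ACRNAME" \
#       --query loginServer --output tsv                  # acrHost
#   az acr credential show --name "$ACRNAME" \
#       --query "passwords[0].value" --output tsv         # acrKey
#   az storage account keys list --resource-group "$RESOURCEGROUP" \
#       --account-name "$STORAGEACCOUNT" \
#       --query "[0].value" --output tsv                  # storageAccountAccessKey
#   az monitor app-insights component show --app "$APPINSIGHTS" \
#       --resource-group "$RESOURCEGROUP" \
#       --query instrumentationKey --output tsv           # appInsightsKey
```

Reading `nodeResourceGroup` from the cluster is the safer option, since it returns the real name rather than a guess based on the naming pattern.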
The installation might fail the first time with an "application insights not found" error. Re-running it should succeed. Alternatively, install the az extension using this command and then re-run the installation:
az extension add --name application-insights
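Since a second run is expected to pass after this first-time failure, the re-run can be wrapped in a small retry helper. This is a generic sketch, not something the installer provides; `retry` is a hypothetical function, and whether an automatic re-run suits your environment is an assumption to verify first.

```shell
# Runs a command up to N times, stopping at the first success.
# Returns 0 on success, 1 if every attempt failed.
retry() {
  local attempts=$1
  shift
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i of $attempts failed" >&2
    i=$((i + 1))
  done
  return 1
}

# Usage: install the extension up front, then allow one re-run of setup.sh.
#   az extension add --name application-insights
#   retry 2 bash setup.sh --platform azure --resourceGroup "<YOUR RESOURCE GROUP>" ...
```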
If your AKS cluster is private, the installation process differs from the one above in several ways.
- Create one Ubuntu VM in the same network as the AKS cluster.
- Install the kubectl, az, helm and jq command line tools on it.
- There is an Azure client with the same name as your AKS cluster. Go to the AKS cluster's VNet and, under access policies, add this client as Contributor (this is required for creating the private load balancer).
- Log in to Azure and set the subscription ID to that of the AKS cluster:
az login
az account set --subscription <subscription-id>
- In the bash setup.sh command, pass one additional parameter: --isPrivate y
- Istio and Kotsadmin are exposed over private load balancer IP addresses.
- The exposeKots flag is ignored in this case, and the kotsadmin service is assigned an internal load balancer.
- Because private IP addresses cannot have public DNS names, we create a self-signed certificate for Istio.
- On the Kotsadmin page, the Ingress Host is the Istio private IP from the one-installer output. You can check it at any time by running the following command and reading the External IP address:
kubectl -n istio-system get svc istio-ingressgateway
- You can use AIFabric at https://<istio-private-ip>/ai-app from any private network where the AKS VNet is accessible.
- If you want to expose this to the internet, you must redirect traffic from a public-facing gateway to this internal IP on port 443.
- Create a public-facing gateway.
- Configure the gateway to send traffic from port 443 to <istio-private-ip>:443.
- Use the gateway's public IP, or a DNS entry pointing to this gateway, as the Ingress Host in Kotsadmin.
- The gateway configuration requires TLS certificate details to connect to the backend server. If you want to use your own certificates for AIFabric, you can upload them from kotsadmin.
- If your gateway requires a health check, you can use the /ai-deployer/actuator/health URL.
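The application URL and the health-check URL mentioned above can both be derived from the Istio private IP. A minimal sketch follows; the helper names `aifabric_url` and `health_url` are hypothetical, and the `curl` call uses `--insecure` only because the installer creates a self-signed certificate for Istio in the private setup.

```shell
# Builds the AIFabric application URL for a given Istio ingress IP.
aifabric_url() {
  printf 'https://%s/ai-app\n' "$1"
}

# Builds the health-check URL that a gateway probe can target.
health_url() {
  printf 'https://%s/ai-deployer/actuator/health\n' "$1"
}

# Usage, from a machine that can reach the AKS VNet:
#   ip=$(kubectl -n istio-system get svc istio-ingressgateway \
#          -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   curl --insecure --silent --output /dev/null \
#        --write-out '%{http_code}\n' "$(health_url "$ip")"
```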
- Create an Azure Application Gateway instance linked to the same VNet as AKS, or to a VNet that is peered with the AKS VNet.
- Create a backend pool entry with the private IP of the Istio load balancer.
- Create an HTTPS settings entry with port 443. If you are going to use self-signed certificates, follow the documentation at https://docs.microsoft.com/en-us/azure/application-gateway/self-signed-certificates to create a self-signed CA and server certificates (when generating the server certificate, enter the Istio LB in the common name field). Also, under the host name setting, select “Override with specific domain name” and enter the IP of the Istio LB.
- Upload the CA certificate in the Trusted Root Certificate field.
- Upload the server certificates created above in kotsadmin, use the public IP of the gateway, and trigger a deployment from kotsadmin.
- Generate a pfx file from the server certificate and key created above, for example:
openssl pkcs12 -export -out contoso.pfx -inkey fabrikam.key -in fabrikam.crt
- Create a listener entry in the gateway on port 443 and upload this pfx file there.
- Create a rule in the gateway, selecting the above listener, HTTP setting and backend target.
- Create a health probe entry with HTTPS, setting the host to the IP address of the Istio load balancer. Enter /ai-deployer/actuator/health as the path and select the HTTP setting created in step 3.
Requirement | Note |
---|---|
Azure Resource Group with Owner Access | |
Ensure you have sufficient vCPUs available in the region where you have created your Resource Group | Minimum number of vCPUs available should be 16. Run this command to check what is available to you in the region that hosts your Resource Group: az vm list-skus --location westeurope --output table \| grep virtualMachines |
Ensure you have the appropriate machine types available in the region where you have created your Resource Group | By default, these machines are Standard_D8s_v3 and Standard_NC6. Run this command to check what is available to you in the region that hosts your Resource Group: az vm list-skus --location westeurope --output table \| grep virtualMachines |
Ensure you add Mandatory Tags | Confirm whether your company's Azure subscription requires mandatory tags on resources at the time of provisioning. |
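The 16 vCPU minimum from the table can be compared against the regional quota. The sketch below is an assumption-laden illustration: `has_enough_vcpus` is a hypothetical helper, and it relies on `az vm list-usage` reporting the regional total under the usage name `cores`, with westeurope standing in for your own region.

```shell
# Returns 0 if (limit - current usage) leaves at least the required
# number of vCPUs free. The minimum defaults to 16.
has_enough_vcpus() {
  local current=$1 limit=$2 minimum=${3:-16}
  [ $((limit - current)) -ge "$minimum" ]
}

# Usage (bash; replace westeurope with your region):
#   read -r current limit < <(az vm list-usage --location westeurope \
#       --query "[?name.value=='cores'].[currentValue, limit]" --output tsv)
#   has_enough_vcpus "$current" "$limit" \
#     || echo "Fewer than 16 vCPUs available in this region" >&2
```

Per-family quotas (for example the NC series used by the GPU node pool) are reported as separate entries in the same output and may need the same check.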
Download Infrastructure Provisioning Script
Download the aks-arm.zip archive from the UiPath ai-customer-scripts repository at https://github.com/UiPath/ai-customer-scripts/blob/master/platform/aks/aks-arm.zip
Requirement | Note |
---|---|
Working Orchestrator 20.10 Installation | Reference here |
SQL Server 2008 R2 or above | Reference here |
SQL Server Authentication must be enabled | Contact your SQL Server admin. |
SQL Credentials that allow for database and role creation. | Contact your SQL Server admin. |
Have a compatible browser | Reference |
Boot Disk needs to be at least 200GB | Reference here |
On worker nodes, secondary Disk(s) of at least 500GB in aggregate need to be unformatted | Reference here |
Connect to Orchestrator from AI Fabric Master Node. Must be able to connect via domain name. | telnet <Orchestrator-Domain-Name> <port> from the AI Fabric UCP node must work. |
Connect to Database from AI Fabric Master node. | telnet <SQL-Server-IP> <port> from the AI Fabric UCP node must work. |
Connect from Robot/Studio machines to AI Fabric Master node. | telnet <UCP-Host-IP> 33443 and telnet <UCP-Host-IP> 33390 from robot/studio machines must work. That is, robot/studio machines will be clients to the server on ports 33443 and 33390. |
Connect to Endpoints needed by installer from AI Fabric machine. | AIFabric machine must not have blocked outbound access to the endpoints |
Domain Certificate (for AI Fabric machine) from a trusted CA authority. | Reference here |
AI Fabric license file. | Reference here |
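Where telnet is not installed, the connectivity requirements in the table can be approximated with bash's built-in /dev/tcp redirection. This is a sketch under assumptions: `port_open` is a hypothetical helper, it needs bash and the coreutils `timeout` command, and the host names in the usage lines are placeholders.

```shell
# Returns 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no telnet binary is required.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (placeholders: ORCHESTRATOR_HOST, SQL_SERVER_IP, UCP_HOST_IP, ports):
#   port_open "$ORCHESTRATOR_HOST" 443  || echo "Orchestrator unreachable" >&2
#   port_open "$SQL_SERVER_IP" 1433     || echo "SQL Server unreachable" >&2
#   port_open "$UCP_HOST_IP" 33443      || echo "Port 33443 unreachable" >&2
#   port_open "$UCP_HOST_IP" 33390      || echo "Port 33390 unreachable" >&2
```

A successful connection only shows that the port accepts TCP connections; it does not confirm that the expected service is the one answering.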