AI Center
2020.10
Last updated Mar 11, 2024

1. Provision an Azure AKS Cluster

This section details the steps for provisioning an AKS cluster to run AI Fabric in a highly available, multi-node configuration.

Note:
  • AI Fabric installation is currently supported on a fresh, dedicated AKS cluster. A shared cluster with unknown policies or restrictions may need additional workarounds and is not officially supported.

This configuration is intended for cases where:
  • Your AI Fabric usage is very high and horizontal scaling is required to support a wide variety of AI/ML use cases in your organization.
  • You need a highly available solution deployed across multiple zones.
  • You require multiple replicas of both core and ML services, for disaster recovery as well as for scaling.

The deployment provides:
  • A highly available Kubernetes cluster managed by Azure
  • A highly available container registry managed by Azure
  • Highly available cloud storage managed by Azure
  • Horizontal Pod Autoscaler, which scales the number of pods for core services based on workload
  • Kubernetes Cluster Autoscaler, which automatically increases the number of nodes in the AKS cluster based on demand, for both CPU and GPU node pools
  • Taints on GPU nodes to ensure GPU resources are used only for their intended purpose
  • Certificates provisioned and managed through Cert Manager

Prerequisites

  1. An AKS cluster with at least two node pools. At least one node pool must have GPU nodes and the proper taints.
  2. An Azure Container Registry in the same resource group (RG) as the cluster.
  3. One storage account in the same RG as the cluster.
  4. One Application Insights instance in the same RG as the cluster.
  5. A SQL Server with the databases created.

Installation steps overview

This section provides a high-level overview of the steps. The next two sections break the process down in detail.

  1. The user creates an AKS cluster, an Azure Container Registry, one storage account, and one Application Insights instance.
  2. The one-installer takes the following inputs:

    AKS Cluster name
    AKS Cluster Resource Group
    AKS Worker Resource Group
    ACR Endpoint
    ACR Name (Username)
    ACR Password
    Storage Account Name
    Storage Account Key
    Application Insights Key
    Flag to indicate whether to expose kotsadm service or not
    DNS Prefix Name for aifabric
  3. The customer then logs in to shell.azure.com and sets the correct subscription ID using "az account set --subscription".
  4. The shell should already have kubectl installed.
  5. Copy the one-installer bundle to the shell, extract it, and run bash setup.sh. The values above can be passed on the command line; the installer prompts for any that are not.
  6. Enter "azure" at the platform prompt. Everything else is provisioned after this.
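Because the installer prompts for any input that is missing, it can help to confirm up front that every value has been collected. The following is a minimal sketch, not part of the installer: the variable names are hypothetical and simply mirror the inputs listed in step 2.

```shell
# Hypothetical variable names that mirror the installer inputs listed in step 2.
# The installer itself does not read these variables.
REQUIRED_INPUTS="CLUSTER_NAME RESOURCE_GROUP WORKER_RESOURCE_GROUP ACR_HOST ACR_USER ACR_KEY STORAGE_ACCOUNT_NAME STORAGE_ACCOUNT_KEY APP_INSIGHTS_KEY EXPOSE_KOTS DNS_PREFIX"

# Print the name of every required variable that is unset or empty, one per line.
# The body runs in a subshell so the loop variables do not leak into the caller.
missing_inputs() (
  for name in $REQUIRED_INPUTS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
)
```

If missing_inputs prints nothing, all values are set and can be passed to setup.sh; otherwise the printed names show what still has to be gathered.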

Components to be installed by One-Installer

  • Kotsadmin and the kots application in the aifabric namespace
  • Istio in the istio-system namespace
  • Containers (the equivalent of buckets) created in the storage account you provide
  • A CORS policy applied to the storage account
  • The provided DNS prefix assigned to the public IP of the Istio gateway
  • Cert Manager in the cert-manager namespace
  • Velero in the velero namespace, with an initial backup and a schedule of regular backups

Detailed infra setup

Creating an AKS Cluster

Skip this step if you already have an AKS cluster. The Kubernetes version must be 1.16, 1.17 or 1.18; other versions are not currently supported.

  1. Create a resource group in Azure. You need the Owner role at the resource group level.
  2. Search for Kubernetes Services in the portal and create a new AKS instance.
  3. Select the resource group you created in step 1 and give the cluster a suitable name.
  4. Make sure your Kubernetes version is 1.16, 1.17 or 1.18.
  5. Select the node size for worker nodes (minimum recommended size: Standard_D2_v3 with 3 nodes).
  6. Under the Authentication tab:
    • For Authentication Method, select “System-assigned managed identity”.
  7. Under the Networking tab:
    • Set “Network Configuration” to “Azure CNI”.
    • Set “Network Policy” to Azure.
    • Keep the other fields unchanged.
  8. Under the Integrations tab:
    • If you have a container registry in the same resource group, select it. It will be used to push Docker images built at run time.
    • Otherwise, select Create New, give the registry a suitable name, and set “Admin User” to Enable.
  9. Under the Tags tab, add tags as required.
  10. Click “Review + Create” and create the cluster.
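The version requirement in this procedure can be checked from a shell once the cluster exists. The sketch below assumes the version string comes from az aks show with --query kubernetesVersion; the function name is illustrative only.

```shell
# Succeed only when the given Kubernetes version is in the 1.16, 1.17 or 1.18 series.
is_supported_k8s_version() {
  case "$1" in
    1.16|1.16.*|1.17|1.17.*|1.18|1.18.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (requires a logged-in Azure CLI; the variables are placeholders):
#   version="$(az aks show --resource-group "$RESOURCEGROUP" --name "$AKSCLUSTERNAME" \
#     --query kubernetesVersion --output tsv)"
#   is_supported_k8s_version "$version" || echo "Unsupported Kubernetes version: $version" >&2
```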

Create Node Pools

AI Center needs at least two node pools in your cluster. One is created along with the cluster, so you have to create another one. One of these node pools is expected to consist of nodes with GPUs attached. We identify whether such a node pool exists by the taints on the node pool.

We check for this taint: nvidia.com/gpu=present:NoSchedule

To create a node pool with this taint, you can use the command below (sample):

az aks nodepool add --name gpunodepool \
      --enable-cluster-autoscaler \
      --resource-group ${RESOURCEGROUP} \
      --cluster-name ${AKSCLUSTERNAME} \
      --node-vm-size Standard_NC6 \
      --node-taints nvidia.com/gpu=present:NoSchedule \
      --labels accelerator=nvidia \
      --node-count 0 \
      --min-count 0 \
      --max-count 3

You can change --node-vm-size based on the type of GPU node you want to use. Check here for supported GPU VM sizes in Azure.
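After the node pool is created, it can be useful to confirm that the expected taint is actually present. This is a minimal sketch: the helper reads taints from standard input, one per line, and the az aks nodepool list query in the comment is an assumption about how the taints might be fetched, not something the installer runs.

```shell
# The taint that the installer checks for, as stated above.
GPU_TAINT="nvidia.com/gpu=present:NoSchedule"

# Read taints from standard input, one per line, and succeed
# only if one of the lines is exactly the GPU taint.
has_gpu_taint() {
  grep -qxF "$GPU_TAINT"
}

# Example use (requires a logged-in Azure CLI; the query is an assumption):
#   az aks nodepool list --resource-group "${RESOURCEGROUP}" --cluster-name "${AKSCLUSTERNAME}" \
#     --query "[].nodeTaints[]" --output tsv | has_gpu_taint || echo "GPU taint not found" >&2
```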

Create a Storage Account

Create a storage account in the same resource group where your cluster is deployed. This storage account is used to create the containers that store the AI Fabric related files.

  1. Search for storage accounts in the portal.
  2. Create a new storage account.
  3. Select the resource group mentioned above.
  4. Give the storage account a suitable name.
  5. Keep the rest of the settings at their defaults. Add tags and create the account.
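Azure restricts storage account names to 3 to 24 characters consisting of lowercase letters and digits only, so a candidate name can be checked locally before using it in the portal. The helper below is an illustrative sketch; it checks the format only, not whether the name is still free.

```shell
# Succeed when the name is 3 to 24 characters long and contains
# only lowercase letters and digits (the Azure storage account naming rule).
is_valid_storage_account_name() {
  case "$1" in
    # Reject any character outside the explicitly listed set (locale-independent).
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}
```

For example, is_valid_storage_account_name aifabricstore01 succeeds, while AIFabricStore and my-store fail because of the uppercase letters and the hyphen.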

Create Application Insights Instance

Create one Application Insights instance in the same RG. AI Fabric will publish its logs to this instance.

  1. Search for “Application Insights” in the portal.
  2. Create a new instance.
  3. Select the resource group mentioned above.
  4. Give the instance a suitable name.
  5. In Resource Mode, select Classic.
  6. Add tags and create the instance.

AI Fabric installation steps

If your AKS cluster is private, additional steps apply; they are detailed in the AKS private cluster section.

Open Shell

Once the above infrastructure is set up, open shell.azure.com and set the current subscription to the one that contains your RG, using the following commands:

az account set --subscription <subscription-id>
az extension add --name application-insights

Download the installer

First, download the installer archive here and move it to the AI Center server. Alternatively, you can download it directly on the machine using the following command:

wget https://download.uipath.com/aifabric/online-installer/v2020.10.2/aifabric-installer-v20.10.2.tar.gz

Then untar the file and go into the main folder using the following commands:

tar -xvf aifabric-installer-v20.10.2.tar.gz
cd ./aifabric-installer-v20.10.2

Run Setup

Then run setup.sh using the following command:

bash setup.sh --resourceGroup <YOUR RESOURCE GROUP> \
      --clusterName <YOUR CLUSTER NAME> \
      --exposeKots n \
      --acrHost <YOUR CONTAINER REGISTRY> \
      --acrUser <YOUR CR USERNAME> \
      --acrKey <YOUR CR KEY> \
      --workerResourcegroup <YOUR WORKER RG> \
      --dnsPrefix aicapp \
      --storageAccountName <YOUR STORAGE ACCOUNT> \
      --storageAccountAccessKey <YOUR STORAGE ACCOUNT ACCESS KEY> \
      --appInsightsKey <YOUR ACCESS KEY> \
      --platform azure \
      --email <YOUR EMAIL>

Replace the placeholder values according to your setup, as explained below.

The parameters in the above command are as follows (the installer prompts for any that are not passed in the command):

platform → azure

resourceGroup → Resource group where the cluster and the other resources were created.

workerResourcegroup → AKS creates another resource group along with your cluster to maintain the cluster workload; pass the name of that resource group. Search for your original resource group name in the portal and the other RG name will show up as well. The name usually follows the pattern MC_<rg-name>_<cluster-name>_<region>.

clusterName → AKS cluster name.

exposeKots → Whether to expose Kotsadmin over the internet (y/n).

acrHost → Go to the container registry that was selected in AKS and copy its server name.

acrUser → The name of the registry that was created.

acrKey → Go to Access Keys on the ACR page and use either of the passwords listed there.

dnsPrefix → DNS prefix to be used for the AI Fabric ingress (for example aifabric or aim-app; any value you want).

storageAccountName → Name of the storage account that was created in the same resource group in step #3.

storageAccountAccessKey → Go to the storage account page → Access Keys. Click “Show Keys” and copy either key1 or key2.

appInsightsKey → Key of the Application Insights instance that was created (the Instrumentation Key).

email → Email address to be notified about certificate expiration.
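For the workerResourcegroup parameter, the default name that AKS generates can be derived from the other values. The sketch below assumes the default AKS naming scheme, MC_<rg-name>_<cluster-name>_<region>; the function name is illustrative, and a custom node resource group name would not follow this pattern.

```shell
# Build the default node (worker) resource group name that AKS generates.
# Arguments: resource group, cluster name, region.
default_worker_rg() {
  printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

# The actual value can be read from the cluster instead of derived
# (requires a logged-in Azure CLI; the variables are placeholders):
#   az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
#     --query nodeResourceGroup --output tsv
```

For example, default_worker_rg my-rg my-aks westeurope prints MC_my-rg_my-aks_westeurope.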

Important:

The installation might fail the first time with an Application Insights not found error. Re-running it should pass. Alternatively, install the az extension using this command and then re-run the installation:

az extension add --name application-insights
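To avoid hitting this error in the first place, the extension can be installed conditionally before setup.sh is run. This is a minimal sketch that relies on az extension show returning a non-zero exit code when the extension is absent; the function name is illustrative.

```shell
# Install an Azure CLI extension only when it is not already present.
ensure_az_extension() {
  if az extension show --name "$1" >/dev/null 2>&1; then
    echo "extension $1 already installed"
  else
    az extension add --name "$1"
  fi
}

# Example use before running the installer:
#   ensure_az_extension application-insights
```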

AKS private cluster

If your AKS cluster is private, the installation process differs from the one above in the following ways.

  1. Create one Ubuntu VM in the same network as the AKS cluster.
  2. Install the kubectl, az, helm and jq command-line tools on it.
  3. There will be an Azure client with the same name as your AKS cluster. Go to the AKS cluster’s VNet and, under access policies, add this client as Contributor (this is required for private load balancer creation).
  4. Log in to Azure and set the subscription ID to that of the AKS cluster:
    az login
    az account set --subscription <subscription-id>
  5. In the bash setup.sh command, pass an additional parameter: --isPrivate y
  6. Istio and Kotsadmin will be exposed over private load balancer IP addresses.
  7. The exposeKots flag is ignored in this case, and the kotsadmin service is assigned an internal load balancer.
  8. As private IP addresses cannot have public DNS names, we create a self-signed certificate for Istio.
  9. On the Kotsadmin page, Ingress Host will be the Istio private IP from the one-installer output. You can check it at any time by running the following command and reading the External IP address:
    kubectl -n istio-system get svc istio-ingressgateway
  10. You will be able to use AI Fabric at https://<istio-private-ip>/ai-app from any private network where the AKS VNet is accessible.
  11. If you want to expose this to the internet, you have to redirect traffic from a public-facing gateway to this internal IP on port 443:
    • Create a public-facing gateway.
    • Configure the gateway to send traffic from port 443 to <istio-private-ip>:443.
    • Use the gateway’s public IP, or a DNS entry pointing to this gateway, as Ingress Host in Kotsadmin.
    • The gateway configuration requires TLS certificate details to connect to the backend server. If you want to use your own certificates for AI Fabric, you can upload them from Kotsadmin.
    • If your gateway requires a health check, you can use the /ai-deployer/actuator/health URL.
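Before configuring a gateway health check, the endpoint can be probed manually from a machine inside the private network. The sketch below is an assumption-laden illustration: the function names are hypothetical, the IP is read from the service's load balancer status via a kubectl jsonpath expression, and curl is run with -k because the Istio certificate is self-signed.

```shell
# Health check path named in the steps above.
HEALTH_PATH="/ai-deployer/actuator/health"

# Print the private IP assigned to the Istio ingress gateway service.
istio_private_ip() {
  kubectl -n istio-system get svc istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Print the HTTP status code returned by the health endpoint of the given host.
# -k skips certificate verification, which a self-signed certificate requires.
health_status() {
  curl -k -s -o /dev/null -w '%{http_code}' "https://$1$HEALTH_PATH"
}

# Example use:
#   health_status "$(istio_private_ip)"
```

The printed status code shows whether the endpoint is reachable; the same path and host can then be used for the gateway's probe.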

Detailed explanation on setting up Azure Gateway

  1. Create an Azure Application Gateway instance linked to the same VNet as AKS, or to a VNet that is peered with the AKS VNet.
  2. Create a backend pool entry with the private IP of the Istio load balancer.
  3. Create an HTTPS settings entry with port 443. If you are going to use self-signed certificates, follow the documentation at https://docs.microsoft.com/en-us/azure/application-gateway/self-signed-certificates to create a self-signed CA and server certificates (while generating the server certificate, enter the Istio LB IP in the common name field). Under the host name setting, select “Override with specific domain name” and enter the IP of the Istio LB.
  4. Upload the CA certificate in the Trusted Root Certificate field.
  5. Upload the server certificates created above in Kotsadmin, use the public IP of the gateway, and trigger a deployment from Kotsadmin.
  6. Generate a pfx file from the server certificate and key created above, for example: openssl pkcs12 -export -out contoso.pfx -inkey fabrikam.key -in fabrikam.crt
  7. Create a listener entry in the gateway on port 443 and upload this pfx file there.
  8. Create a rule in the gateway, selecting the above listener, HTTP setting and backend target.
  9. Create a health probe entry with HTTPS and set the host to the IP address of the Istio load balancer. Set the path to /ai-deployer/actuator/health and select the HTTP setting created in step 3.


Infrastructure provisioning

  • Requirement: Azure Resource Group with Owner access.
  • Requirement: Ensure you have sufficient vCPUs available in the region where you have created your Resource Group.
    Note: The minimum number of vCPUs available should be 16. Run this command to check what is available to you in the region that hosts your Resource Group: az vm list-skus --location westeurope --output table | grep virtualMachines
  • Requirement: Ensure you have the appropriate machine types available in the region where you have created your Resource Group.
    Note: By default, these machines are Standard_D8s_v3 and Standard_NC6. Run this command to check what is available to you in the region that hosts your Resource Group: az vm list-skus --location westeurope --output table | grep virtualMachines
  • Requirement: Ensure you add mandatory tags.
    Note: Confirm whether mandatory tags are required for tagging resources at provisioning time under your company's Azure subscription.
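The machine-type check above can also be scripted instead of read from the table output. This is a hedged sketch: the function name is illustrative, and it assumes that az vm list-skus, called with --resource-type virtualMachines and a --query on the name field, returns one VM size per line for the sizes offered to the current subscription in that region. It does not check vCPU quota.

```shell
# Succeed only if every listed VM size appears in the region's SKU list.
# Arguments: region, followed by one or more VM sizes.
# The body runs in a subshell, so "exit 1" only ends the function.
vm_sizes_available() (
  location="$1"; shift
  skus="$(az vm list-skus --location "$location" --resource-type virtualMachines \
    --query '[].name' --output tsv)" || exit 1
  for size in "$@"; do
    printf '%s\n' "$skus" | grep -qxF "$size" || {
      echo "not available in $location: $size" >&2
      exit 1
    }
  done
)

# Example use with the default machine types named above:
#   vm_sizes_available westeurope Standard_D8s_v3 Standard_NC6
```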

Download Infrastructure Provisioning Script

Download the aks-arm.zip archive from the UiPath ai-customer-scripts repo, at https://github.com/UiPath/ai-customer-scripts/blob/master/platform/aks/aks-arm.zip



Requirements before proceeding

  • Requirement: Working Orchestrator 20.10 installation.
    Note: Reference here
  • Requirement: SQL Server 2008 R2 or above.
    Note: Reference here
  • Requirement: SQL Server Authentication must be enabled.
    Note: Contact your SQL Server admin.
  • Requirement: SQL credentials that allow for database and role creation.
    Note: Contact your SQL Server admin.
  • Requirement: Have a compatible browser.
    Note: Reference
  • Requirement: Boot disk needs to be at least 200GB.
    Note: Reference here
  • Requirement: On worker nodes, secondary disk(s) of at least 500GB in aggregate need to be unformatted.
    Note: Reference here
  • Requirement: Connect to Orchestrator from the AI Fabric master node. Must be able to connect via domain name.
    Note: telnet <Orchestrator-Domain-Name> <port> from the AI Fabric UCP node must work.
  • Requirement: Connect to the database from the AI Fabric master node.
    Note: telnet <SQL-Server-IP> <port> from the AI Fabric UCP node must work.
  • Requirement: Connect from Robot/Studio machines to the AI Fabric master node.
    Note: telnet <UCP-Host-IP> 33443 and telnet <UCP-Host-IP> 33390 from Robot/Studio machines must work. That is, the Robot/Studio machines will be clients to the server on ports 33443 and 33390.
  • Requirement: Connect to the endpoints needed by the installer from the AI Fabric machine.
    Note: The AI Fabric machine must not have outbound access to the endpoints blocked.
  • Requirement: Domain certificate (for the AI Fabric machine) from a trusted certificate authority.
    Note: Reference here
  • Requirement: AI Fabric license file.
    Note: Reference here

© 2005-2024 UiPath. All rights reserved.