GCP Deployment Architecture
OUT OF SUPPORT
Automation Suite Installation Guide
Last updated Nov 21, 2024
Important: You can currently use the GCP deployment template only with Automation Suite 2023.10. We therefore recommend referring to the Automation Suite 2023.10 documentation.
This page offers insight into the deployment architecture on GCP, the required components, and all known limitations.
- Virtual network
  - A subnet where all nodes reside.
  - A NAT Gateway for outbound connectivity (a Cloud NAT resource attached to a Cloud Router).
  - Firewall Rules to secure subnet traffic.
- A DNS private zone needed for installation. For more details, check the Known limitations section.
- 3 Managed Instance Groups. You can choose the instance type for server, agent, and GPU agent nodes. Make sure to check Multi-node HA-ready production machine requirements and Single-node evaluation machine requirements to meet the hardware requirements. Each VM has a 128 GiB OS disk and a 256 GiB disk for cluster binaries and state. Server nodes have an additional 512 GiB or 2048 GiB data disk, depending on whether the AI products are installed.
  - Server nodes (cluster control plane). Server nodes also run workloads.
  - Agent nodes. Designed to only run workloads (they have no control plane services). If the number of desired agent nodes is 0, an empty Managed Instance Group is created.
  - GPU nodes. Nodes with video cards, used specifically for ML models. If the number of desired GPU nodes is 0, no Managed Instance Group is created.
- Public load balancer used to balance HTTPS traffic from port 443 to nodes.
- 2 Internal load balancers and a Managed Instance Group needed for forwarding node registration requests. The VMs have the smallest instance size possible.
- Task Mining node deployed as a separate VM. Its instance type is n2-standard-32.
- Bastion instance used to access the other nodes. It has a public IP and SSH enabled.
- SQL Database Instance:
  - 8 cores and 32 GiB RAM
  - 1000 GiB disk size that can be manually extended
  - The databases are created by the installer
- Secret Manager used to store auto-generated credentials for SQL server, Automation Suite Platform, and ArgoCD console.
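
As a rough illustration of the disk figures listed above, the following Python sketch totals the storage provisioned for the cluster nodes. It assumes that the 512 GiB data disk applies when the AI products are not installed and the 2048 GiB disk when they are (the list gives both sizes without spelling out the mapping). It leaves out the Task Mining node, the bastion, and the SQL instance. The function names are illustrative and are not part of the deployment template.

```python
# Disk sizes in GiB, taken from the component list above.
OS_DISK_GIB = 128
CLUSTER_DISK_GIB = 256      # cluster binaries and state
DATA_DISK_GIB = 512         # server data disk without AI products (assumed mapping)
DATA_DISK_AI_GIB = 2048     # server data disk with AI products (assumed mapping)


def node_disk_gib(role: str, ai_products: bool = False) -> int:
    """Return the total disk provisioned for one node of the given role."""
    if role not in ("server", "agent", "gpu"):
        raise ValueError(f"unknown role: {role}")
    total = OS_DISK_GIB + CLUSTER_DISK_GIB
    if role == "server":
        # Only server nodes carry the additional data disk.
        total += DATA_DISK_AI_GIB if ai_products else DATA_DISK_GIB
    return total


def cluster_disk_gib(servers: int, agents: int = 0, gpus: int = 0,
                     ai_products: bool = False) -> int:
    """Return the total disk provisioned across all cluster nodes."""
    return (servers * node_disk_gib("server", ai_products)
            + agents * node_disk_gib("agent")
            + gpus * node_disk_gib("gpu"))


print(cluster_disk_gib(servers=3, agents=2, gpus=1))  # 3840
```

With the same node counts and the AI products installed, the total under these assumptions rises to 8448 GiB, because each server node switches to the larger data disk.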
DNS
- Because DNS cannot be automatically attached to the load balancer (LB):
- The steps for configuring DNS can be completed only after the installation, although DNS is needed during installation. The private DNS zone solves this issue and can be safely deleted after the installation is completed. Alternatively, for testing purposes, see Step 4: Configuring the DNS.
- Core DNS upstream servers must be forced to match the node's nameservers. This could corrupt the RKE2 config file (/etc/rancher/rke2/config.yaml) at a VM restart, which can affect post-installation upgrades.
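
To make the nameserver requirement concrete, here is a minimal Python sketch that extracts the `nameserver` entries from the text of a node's `/etc/resolv.conf` and compares them with a list of Core DNS upstream servers. How the upstream list is obtained is not covered on this page, so it is passed in as a plain list. The function names and sample addresses are hypothetical.

```python
from typing import Iterable, List


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the nameserver addresses found in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' or ';' at the start of a line).
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def upstreams_match(node_nameservers: Iterable[str],
                    coredns_upstreams: Iterable[str]) -> bool:
    """Return True when both lists name the same servers, ignoring order."""
    return set(node_nameservers) == set(coredns_upstreams)


# Sample input; the addresses are placeholders.
sample = """\
# managed by the cloud provider
search example.internal
nameserver 10.0.0.2
nameserver 10.0.0.3
"""
print(upstreams_match(parse_nameservers(sample), ["10.0.0.3", "10.0.0.2"]))  # True
```

On a real node, the text would be read from `/etc/resolv.conf`. The comparison ignores ordering, which may or may not be appropriate depending on how the resolvers are prioritized.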
RHEL
- Google may update the RHEL version without notice, which can leave Automation Suite deployments outside of support. Currently, manual deployments using custom RHEL images are the only way to stay in support.
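
One way to notice an unannounced RHEL update is to compare the version reported by the node against the versions you intend to run. The Python sketch below parses `os-release`-style text and checks `ID` and `VERSION_ID` against a caller-supplied set. The set of supported versions is a placeholder (this page does not list them), and the function names are hypothetical.

```python
from typing import Dict, Set


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines from os-release-style text into a dictionary."""
    info = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key.strip()] = value.strip().strip("\"'")
    return info


def is_supported_rhel(os_release_text: str, supported_versions: Set[str]) -> bool:
    """Return True if the text describes RHEL at one of the given versions."""
    info = parse_os_release(os_release_text)
    return info.get("ID") == "rhel" and info.get("VERSION_ID") in supported_versions


# Sample input; the version numbers are placeholders, not a support statement.
sample_os_release = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\nVERSION_ID="8.6"\n'
print(is_supported_rhel(sample_os_release, {"8.6"}))  # True
```

On a real node, the text would be read from `/etc/os-release`, and the set would come from the support matrix for the Automation Suite version in use.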