Basic architecture considerations

As with any multi-site deployment, the primary architecture considerations for Automation Suite include infrastructure, latency, data source, management, Recovery Time Objective (RTO), Recovery Point Objective (RPO), etc.

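Infrastructure

We recommend using the same hardware for both clusters. However, Automation Suite clusters can also work with similar hardware configurations that differ only slightly. Heterogeneous hardware may add complexity and slow down troubleshooting.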

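Latency

Latency is critical when designing an active/active model. It denotes the round-trip time (RTT) between the two Automation Suite clusters. The lower the latency between the two sites, the better, because low latency greatly reduces the risk of data loss during an outage. The RTT must be below the 10 ms threshold.

Before going to production, you should rigorously test the RTT, because it directly affects performance metrics. If the latency between the two sites exceeds the 10 ms baseline, we recommend considering an active/passive setup instead of an active/active configuration.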

Note:

The RTT of any component that requires synchronization must not exceed 10 ms. This includes SQL Server, HAA, objectstore, etc.
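The RTT check described above can be sketched as follows. This is a minimal illustration, not an official tool: it approximates the RTT by timing TCP handshakes, and the function names are hypothetical. Judging by the worst sample rather than the average is a deliberately conservative assumption.

```python
import socket
import time

RTT_THRESHOLD_MS = 10.0  # threshold stated in this section


def measure_tcp_rtt_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> list[float]:
    """Time TCP handshakes to host:port; each handshake approximates one round trip."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def recommend_mode(rtt_samples_ms: list[float], threshold_ms: float = RTT_THRESHOLD_MS) -> str:
    """Return 'active/active' only if the worst observed RTT stays below the threshold."""
    if not rtt_samples_ms:
        raise ValueError("at least one RTT sample is required")
    if max(rtt_samples_ms) < threshold_ms:
        return "active/active"
    return "active/passive"
```

In practice you would run `measure_tcp_rtt_ms` from one site against each synchronized component at the other site (for example, the SQL listener) and pass the samples to `recommend_mode`.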

Management

The two Automation Suite clusters are independent and do not share any configuration. Therefore, any management or maintenance activity must be done individually on these clusters. For instance, you must update the SQL connection strings on both clusters, configure certificates separately, etc. In addition, you must monitor the two clusters independently, upgrade them individually, etc.

Data source

The objectstore, together with the SQL databases, forms the state of the products installed on Automation Suite.

SQL Server configuration plays a vital role in a multi-site deployment. Though SQL Server is a component external to Automation Suite, a few additional steps are required to ensure true HA when working with Automation Suite.

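SQL Server must be configured in an Always On availability group or a failover group. It must be spread across both sites to ensure true high availability when one site is down. Both clusters must use the same SQL listener endpoint in their connection strings. In addition, when the SQL Server/database is spread across multiple subnets, we recommend setting the MultiSubnetFailover=True property in the connection string.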

For more details, see Always On availability groups and Prerequisites, restrictions, and recommendations for Always On availability groups.
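As an illustration of the connection string guidance, the sketch below builds one string that both clusters would share. The listener host name, database name, and helper function are hypothetical. Credentials and other properties are omitted, and 1433 is simply the default SQL Server port.

```python
def build_connection_string(listener: str, database: str, multi_subnet: bool = True) -> str:
    """Assemble a SQL Server connection string that targets the availability group listener."""
    parts = [
        f"Server=tcp:{listener},1433",  # listener endpoint, not an individual replica
        f"Database={database}",
    ]
    if multi_subnet:
        # Recommended when the replicas are spread across multiple subnets.
        parts.append("MultiSubnetFailover=True")
    return ";".join(parts) + ";"


# Both clusters point at the same (hypothetical) listener endpoint.
primary_cluster = build_connection_string("sql-listener.example.com", "AutomationSuite_Platform")
secondary_cluster = build_connection_string("sql-listener.example.com", "AutomationSuite_Platform")
```

Because both clusters use the listener rather than a specific replica, a SQL failover between sites does not require any connection string change.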

The external objectstore is immune to possible corruption due to node failure. Data replication and disaster recovery can be carried out independently of Automation Suite. Like SQL Server, the external objectstore must be configured in a highly available Disaster Recovery setup. The primary objectstore instance is physically located in the primary data center, and at least one secondary instance is located in the secondary data center with data sync enabled. You can configure a load balancer on the objectstore to ensure both Automation Suite clusters refer to the same endpoints. This makes the deployment independent of how the objectstore is configured internally.

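Important:

For AWS S3, Multi-Region Access Points do not support all of the S3 APIs required by the products running in Automation Suite. For the list of supported APIs, see Using Multi-Region Access Points with supported API operations.

You can create two buckets for each product/suite, one in each of the two regions, and enable sync between them. An Automation Suite cluster running in a given region then refers to the buckets in that same region.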

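Recovery Time Objective

Your organization's RTO policy is critical when designing a multi-site Automation Suite cluster. To achieve the desired RTO, consider the following aspects:

Design of the traffic manager;
Availability of nodes in the secondary/passive cluster;
Dynamic workload availability on the secondary cluster, for example, ML skills;
Configuration management.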

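Traffic manager

To unlock the full potential of both clusters, it is essential to configure the traffic manager appropriately. Ideally, the setup should distribute traffic across both clusters. This strategy not only ensures a balanced load distribution but also safeguards business continuity, reducing potential disruption if either site goes down completely.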
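The traffic distribution idea can be sketched as weighted routing with health checks. This is an assumption about how a traffic manager might behave, not a description of any specific product. The site names and the function are hypothetical.

```python
def effective_weights(weights: dict[str, float], healthy: dict[str, bool]) -> dict[str, float]:
    """Normalize routing weights over the clusters that currently pass health checks."""
    live = {name: w for name, w in weights.items() if healthy.get(name, False) and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy cluster available")
    # Each healthy cluster receives its share of the remaining weight.
    return {name: w / total for name, w in live.items()}
```

With both sites healthy and equal weights, traffic splits evenly. When one site fails its health check, all traffic shifts to the surviving site without manual intervention.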

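Node availability

If a disaster renders one site completely inoperable, the other site must have sufficient capacity to ensure that business automation is not affected. Insufficient capacity at the surviving site can disrupt business operations and cause significant operational problems.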
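A simple way to reason about this requirement is to check whether each site could absorb the combined load of both sites. The sketch below assumes capacity and load are expressed in the same unit (for example, vCPUs). The function name and figures are illustrative only.

```python
def survives_site_loss(site_capacity: dict[str, float], current_load: dict[str, float]) -> dict[str, bool]:
    """For each site, report whether it could carry the total load if the other site failed."""
    total_load = sum(current_load.values())
    return {site: capacity >= total_load for site, capacity in site_capacity.items()}


# Illustrative numbers: the secondary site is too small to take over on its own.
result = survives_site_loss(
    site_capacity={"primary": 96, "secondary": 64},
    current_load={"primary": 40, "secondary": 30},
)
```

A `False` entry marks a site that would be overloaded after a failover and therefore puts the RTO at risk.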

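Dynamic workload availability

Some products, such as AI Center, deploy ML skills dynamically at runtime. Skill deployment on the other cluster is always asynchronous, so the availability of those skills there is not guaranteed. To ensure your automation solution is back online within the required time, you can periodically sync the skills on the other cluster.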

Configuration management

Since multi-site Automation Suite deployments consist of two distinct clusters, any operation performed on one cluster must also be performed promptly on the other cluster to reduce configuration drift. This ensures that both clusters have similar configurations and that no additional effort is required during recovery.
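Drift between the two clusters can be detected by comparing their settings key by key. The sketch below assumes each cluster's configuration has already been exported into a flat dictionary. The keys shown are hypothetical and do not correspond to actual Automation Suite configuration names.

```python
def config_drift(primary: dict, secondary: dict) -> dict:
    """Return the keys whose values differ, mapped to (primary_value, secondary_value)."""
    drift = {}
    for key in sorted(primary.keys() | secondary.keys()):
        a = primary.get(key)    # None when the key is missing on the primary
        b = secondary.get(key)  # None when the key is missing on the secondary
        if a != b:
            drift[key] = (a, b)
    return drift


# Hypothetical exported settings from each cluster.
drift = config_drift(
    {"sql_listener": "sql-listener.example.com", "cert_thumbprint": "AA", "log_level": "info"},
    {"sql_listener": "sql-listener.example.com", "cert_thumbprint": "BB"},
)
```

An empty result means the compared settings match. Any entry points to an operation that was applied on one cluster but not the other.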

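Recovery Point Objective

Your organization's Recovery Point Objective (RPO) policy is critical when designing a multi-site Automation Suite cluster. To achieve the desired RPO, you must consider the following aspects:

Data sync;
Scheduled backups.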

Data sync

Data written to the primary data source must also be synced to the secondary site. However, there is a risk of data loss if the primary data center goes down before the data has been synced. A well-provisioned network between the two data centers, with high bandwidth and low latency, can speed up synchronization.
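The data-loss risk can be expressed as the window of writes that have not yet reached the secondary site. The following sketch compares that window against an RPO target. The timestamps, the 5-minute target, and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def unsynced_window(last_synced_at: datetime, now: datetime) -> timedelta:
    """Writes made during this window would be lost if the primary failed right now."""
    return now - last_synced_at


def meets_rpo(last_synced_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the unsynced window does not exceed the RPO target."""
    return unsynced_window(last_synced_at, now) <= rpo


# Illustrative values: last successful sync 4 minutes ago, RPO target of 5 minutes.
last_sync = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ok = meets_rpo(last_sync, datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc), timedelta(minutes=5))
```

Monitoring this window continuously shows whether synchronization keeps pace with the write rate or whether the network between the data centers needs attention.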