I’m aware that AWX can be made redundant within its hosting
cluster, in that the cluster can contain nodes in different
availability zones. However, a Kubernetes cluster, at least judging by
how they are deployed on a cloud service, can only be in a single
region. This means that when a regional outage affects all
availability zones, an AWX deployed in a cluster in that region is
down.
Is there a mechanism for AWX controllers, deployed in separate
clusters in different regions, to synchronise in such a way that one
can take over from the other? Or would this mean deploying each AWX
controller independently and deploying the DB with HA, or, if on a
public cloud, using the provider's managed SQL database service?
As a follow-up question, is it possible to set up two AWX controllers
that each use the same DB? Would one controller need to be kept in cold
standby?
I think you would first need a highly available external PostgreSQL cluster that is available to, and present in, every region. Then you could deploy AWX to each region with identical PostgreSQL configs and secret keys, all pointing at that external PostgreSQL cluster. Each region would have its own AWX Operator and would scale independently of the others, so you could have two control planes, one per region, and they would probably be aware of each other thanks to the shared database.
That being said, I’m pretty sure there’s way more to it than that, and it probably wouldn’t work well, if at all.
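
For what it's worth, the "identical PostgreSQL configs and secret keys" part could be sketched roughly like this. It is only an illustration of the idea above, not a tested multi-region setup. The region names, namespace, hostname, secret names and credentials are all made up, and the field names (`postgres_configuration_secret`, `secret_key_secret`, `type: unmanaged`) are what I recall from the AWX Operator docs, so check them against your operator version. The script just builds the same two Secrets plus one AWX resource per region and prints them as JSON.

```python
import json

# Hypothetical region names; replace with your own.
REGIONS = ["region-a", "region-b"]


def shared_secrets(namespace="awx"):
    """Secrets that must be identical in every region (all values are placeholders)."""
    postgres = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "awx-postgres-configuration", "namespace": namespace},
        "type": "Opaque",
        "stringData": {
            "host": "pg.example.internal",  # assumed endpoint of the external HA cluster
            "port": "5432",
            "database": "awx",
            "username": "awx",
            "password": "CHANGE_ME",
            "sslmode": "require",
            "type": "unmanaged",  # the operator should not manage this database
        },
    }
    secret_key = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "awx-secret-key", "namespace": namespace},
        "type": "Opaque",
        "stringData": {"secret_key": "CHANGE_ME_TOO"},
    }
    return [postgres, secret_key]


def awx_resource(region, namespace="awx"):
    """One AWX custom resource per region, referencing the shared secrets by name."""
    return {
        "apiVersion": "awx.ansible.com/v1beta1",
        "kind": "AWX",
        "metadata": {"name": f"awx-{region}", "namespace": namespace},
        "spec": {
            "postgres_configuration_secret": "awx-postgres-configuration",
            "secret_key_secret": "awx-secret-key",
        },
    }


def manifests_for(region):
    """Everything to apply in one region's cluster: shared secrets, then the AWX resource."""
    return shared_secrets() + [awx_resource(region)]


if __name__ == "__main__":
    for region in REGIONS:
        # Wrap the objects in a v1 List so each region gets a single document.
        bundle = {"apiVersion": "v1", "kind": "List", "items": manifests_for(region)}
        print(f"# --- for the cluster in {region} ---")
        print(json.dumps(bundle, indent=2))
```

The only thing that differs between regions here is the name of the AWX resource. Whether the two control planes then behave well against one database is exactly the part I'm unsure about.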