multiple instance for AWX installation (HA)

Oguz_Yarimtepe · January 3, 2018, 8:39am

How can I install a highly available version of AWX? I want to test a failover scenario where the current installation server is down but I can still access another instance.

Any tips?

matburt · January 8, 2018, 4:21pm

I have been working on an OpenShift/Kubernetes-based scalable system under a branch called scalable_clusters on the AWX GitHub repo.

For traditional redundancy, I’m afraid that’s not a focus for this at the moment.

Bruno_Casano · January 26, 2018, 5:45pm

Is it possible to install two versions and point them to the same DB instance?

Bill_Nottingham1 · January 26, 2018, 5:49pm

Bruno Casano (brunito@gmail.com) said:

Is it possible to install two versions and point them to the same DB
instance?

Same database *server*? Yes. Same AWX database *on* the server? Absolutely not.

Bill

Bruno_Casano · January 26, 2018, 8:09pm

Thanks Bill!

Oguz_Yarimtepe · January 29, 2018, 6:29am

Is it stable and can be tested now?

Philipp_Wiesner · January 30, 2018, 9:54am

We are running AWX in a clustered HA environment, but this required some manual adjustments to the installation roles. You also need to create a RabbitMQ cluster. For this we disabled the RabbitMQ containers in the installation and set RabbitMQ up beforehand on all the nodes. Once the RabbitMQ cluster was running, we changed the RabbitMQ connection details in the roles.

The following files were changed:

image_build/files/launch_awx_task.sh

awx-manage provision_instance --hostname=$CLUSTER_NODE

awx-manage register_queue --queuename=tower --hostnames=$CLUSTER_NODE

image_build/files/settings.py

CLUSTER_HOST_ID = os.getenv("CLUSTER_NODE", "awx")

image_build/files/supervisor_task.conf
command = /var/lib/awx/venv/awx/bin/celery worker -A awx -l ERROR --autoscale=50,4 -Ofair -Q tower_scheduler,tower_broadcast_all,tower,%(ENV_CLUSTER_NODE)s -n celery@%(ENV_CLUSTER_NODE)s

local_docker/tasks/main.yml
comment out every rabbitmq container reference

- name: Activate AWX Web Container
  ...
  env:
    CLUSTER_NODE: "{{ cluster_node | default('localhost') }}"
    ...
    RABBITMQ_USER: "awx"
    RABBITMQ_PASSWORD: "<password>"
    RABBITMQ_HOST: "{{ cluster_node | default('localhost')}}"
    RABBITMQ_PORT: "5672"
    RABBITMQ_VHOST: "awx"

The same was changed for the AWX Task container.

In front of the nodes, an HAProxy is running with round-robin load balancing.
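
To illustrate how the single CLUSTER_NODE variable ties the changes above together, here is a minimal Python sketch. It is not part of AWX: the helper names cluster_host_id and celery_worker_command and the hostname awx1.example.com are made up for illustration, and it only assumes the environment variable, fallback value and queue names shown in the snippets above.

```python
import os

# Queues every worker listens on, taken from the supervisor_task.conf line above.
SHARED_QUEUES = ["tower_scheduler", "tower_broadcast_all", "tower"]


def cluster_host_id(env=None):
    """Mirror the settings.py change: use CLUSTER_NODE if set, otherwise "awx"."""
    env = os.environ if env is None else env
    return env.get("CLUSTER_NODE", "awx")


def celery_worker_command(node):
    """Build the celery worker command line for one node (hypothetical helper)."""
    # Each node also listens on a queue named after itself.
    queues = ",".join(SHARED_QUEUES + [node])
    return (
        "/var/lib/awx/venv/awx/bin/celery worker -A awx -l ERROR "
        f"--autoscale=50,4 -Ofair -Q {queues} -n celery@{node}"
    )


if __name__ == "__main__":
    # awx1.example.com is an assumed example hostname.
    node = cluster_host_id({"CLUSTER_NODE": "awx1.example.com"})
    print(celery_worker_command(node))
```

If CLUSTER_NODE is left unset, every node would end up with the same host ID and worker name, which is presumably why it has to be set per node.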

Dong_Yang · February 8, 2018, 1:17am

Hi Philipp, I’m currently looking into your solution of separating the RabbitMQ container out of the installer. How do you handle the Postgres DB? Do you have it installed separately as well, or within a container?

Dong_Yang · February 9, 2018, 6:24pm

Philipp, can you comment on where you specify the CLUSTER_NODE name? I’m getting errors connecting to localhost: amqp://awx:**@127.0.0.1:5672/awx

Jay_Kumar1 · February 13, 2018, 8:54am

My HA solution for AWX could be extreme, but here is what I am doing.

HAProxy load balancing RabbitMQ, Memcached and PostgreSQL
RabbitMQ cluster with 3 nodes
Memcached on 3 nodes, active/passive configured on HAProxy
PostgreSQL HA (master/slave) using Patroni, active/passive with automatic failover configured on HAProxy
AWX task/web Docker instances

I am using only a single instance of the awx-task/awx-web containers, and everything seems to be working fine.

I would like to test multiple load-balanced awx-task/awx-web containers.

Philipp_Wiesner · February 13, 2018, 11:25am

dnc92301:

Postgres is running on a separate node. We have set the Postgres DB connection details in the installation inventory for the application nodes. On these, only awx-task, awx-web and memcached are running in containers.
We added the variable CLUSTER_NODE to the installation inventory ourselves; it is set to the hostname of the machine on which the playbook is executed. We copy the modified AWX installation directory to a new node, set the correct CLUSTER_NODE variable and run the installation playbook, which then sets up a new cluster member.

Dong_Yang · February 25, 2018, 5:26am

Hi Philipp,

I still couldn’t get it to work. It would be great if we could work on resolving this separately. I have RabbitMQ installed separately, as well as Postgres. I’m getting a bunch of errors:

INFO success : awx-celeryd-beat entered RUNNING state , process has stayed up for > 1 than 1 seconds.
INFO exited : channels-worker ( exit status 1 ; not expected )

I think the problem has to do with celeryd.

ps -ef | grep celery shows celery@localhost, where it should be mapped to celery@hostname (as on a working Tower installation).

Thanks again

Dong_Yang · February 25, 2018, 11:29pm

The earlier issue was due to a misconfiguration on the external database server, which has been fixed.

I’ve got 2 AWX instances running but am getting connection refused: Consumer: cannot connect to amqp://awx:**@127.0.0.1:5672/awx [Errno 111] Connection refused.

Philipp_Wiesner · March 6, 2018, 8:52am

Hi dnc92301,

Sorry for my late response, somehow the notification is not working properly. The issue is probably that the RABBITMQ_HOST environment variable inside your container is not set properly. At the moment it tries to connect to your local container network. As RabbitMQ has been moved out of the container context, you need to set the RABBITMQ_HOST environment variable to the host where RabbitMQ is running. We have set it to the FQDN of the RabbitMQ host.

In the local_docker role local_docker/tasks/main.yml you can set those environment variables like this:

env:
  CLUSTER_NODE: "{{ cluster_node | default('localhost') }}"
  ...
  RABBITMQ_USER: "awx"
  RABBITMQ_PASSWORD: "<password>"
  RABBITMQ_HOST: "{{ cluster_node | default('localhost')}}"
  RABBITMQ_PORT: "5672"
  RABBITMQ_VHOST: "awx"

You have to set them for both the awx_web and awx_task containers. If you have any further issues, let me know.
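
A quick way to confirm what the container actually sees is to test the TCP connection from inside it. The following is only a minimal sketch, assuming Python 3 is available in the container and that the RABBITMQ_HOST and RABBITMQ_PORT variables are set as above; the function name can_reach is made up for illustration. It checks that the port accepts connections, not that the credentials or vhost are valid.

```python
import os
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Fall back to the same defaults the role uses when nothing is set.
    host = os.getenv("RABBITMQ_HOST", "localhost")
    port = int(os.getenv("RABBITMQ_PORT", "5672"))
    print(f"{host}:{port} reachable: {can_reach(host, port)}")
```

If this reports localhost or 127.0.0.1 as the target, the environment variable did not make it into the container.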

Dong_Yang · March 7, 2018, 2:19am

Thanks Philipp,

I see 2 references to cluster_node within env. Do both need to be set to “cluster_node = hostname.fqdn”, and does this need to be set within the INVENTORY file?

env:
      CLUSTER_NODE: "{{ cluster_node | default('localhost') }}"
      ...
      RABBITMQ_USER: "awx"
      RABBITMQ_PASSWORD: "<password>"
      RABBITMQ_HOST: "{{ cluster_node | default('localhost')}}"
      RABBITMQ_PORT: "5672"
      RABBITMQ_VHOST: "awx"

Philipp_Wiesner · March 7, 2018, 9:47am

Yes, we have set this in the inventory file.

Dong_Yang · March 8, 2018, 2:27am

Hi Philipp, can you provide the specific tag where you had HA working? Are you using the latest 1.0.4.*, or a previous release?

Thanks again.

Dong_Yang · March 8, 2018, 3:50am

Here’s the error message I’ve been getting:

[2018-03-08 03:17:34,613: ERROR/MainProcess] Unrecoverable error: AccessRefused(403, u"ACCESS_REFUSED - access to exchange 'celeryev' in vhost 'awx' refused for user 'awx'", (40, 10), 'Exchange.declare')

Philipp_Wiesner · March 8, 2018, 9:14am

Hi dnc92301,

We are currently using release 1.0.2, but I think it should also work with later releases on a fresh installation. The error you got looks like a connection issue against RabbitMQ. Have you set up the user awx in your RabbitMQ cluster?

We have set up the RabbitMQ Cluster with the following commands:

[root@host rabbitmq]# rabbitmqctl delete_user guest
[root@host rabbitmq]# rabbitmqctl add_vhost awx
[root@host rabbitmq]# rabbitmqctl add_user awx
[root@host rabbitmq]# rabbitmqctl set_permissions -p awx awx ".*" ".*" ".*"
[root@host rabbitmq]# rabbitmqctl set_policy -p awx ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
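
Because forum software tends to turn plain quotes into typographic ones, the policy JSON in the last command is easy to break when copying. As a minimal sketch (the helper name ha_policy_command is made up for illustration, and nothing here calls rabbitmqctl), the argument list can be built in Python so the JSON is generated rather than typed:

```python
import json
import shlex


def ha_policy_command(vhost="awx", name="ha-all", pattern=".*"):
    """Return the argv list for the set_policy call quoted above."""
    # json.dumps guarantees plain ASCII double quotes in the policy definition.
    definition = json.dumps(
        {"ha-mode": "all", "ha-sync-mode": "automatic"}, separators=(",", ":")
    )
    return ["rabbitmqctl", "set_policy", "-p", vhost, name, pattern, definition]


if __name__ == "__main__":
    # Print a shell line with safe quoting; running it is left to the reader.
    print(" ".join(shlex.quote(arg) for arg in ha_policy_command()))
```

The list form can also be handed to subprocess.run unchanged on a host where rabbitmqctl is installed, which avoids shell quoting altogether.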

Dong_Yang · March 8, 2018, 6:15pm

Hi Philipp,

Yes, it looks like it’s somewhat working now after setting the proper permissions for the awx user. However, how do you ensure the cluster is configured correctly? Under instance groups, do you see both nodes within the cluster? Right now I only see 1 node. I’ve tried kicking off a job and then rebooting the server in the middle of the run. Does the job fail over automatically to the other node within the cluster?

Thanks
