Hi there, I have completed the installation following the guide and am using the Docker standalone setup. All works well. I would like to add a second awx_task container so I can distribute the load across multiple containers. What I have done so far is modify the generated docker-compose file to add a second awx_task container called awx_task2 with a hostname of awx2. Everything seems to work: the task container starts up fine and registers itself, and I can see two instances registered, both belonging to the tower group.
The problems seem to be that:
-all jobs always report as being run on the “awx” instance when viewing “total jobs” in the web UI
-heartbeats are not being sent properly, resulting in the status showing as unavailable in the web UI
Can someone clarify the easiest way to configure multiple awx_task containers? Is this a supported configuration? What might I be doing wrong?
My understanding is that the only supported way of running multiple worker nodes is to run on Kubernetes/OpenShift. Once you’re on Kubernetes, you simply change the scale on the stateful set (as of the last release it looks like they moved from deployments/replica sets to stateful sets) to however many worker nodes you need. This spins up not only new awx_task containers but also another memcached container and another rabbitmq container. I have this setup running in my lab, pointing to a Postgres server hosted outside the Kubernetes cluster, and it seems to work alright. I do see worker nodes go offline if I have more than 3 replicas, which is odd, but this is my lab instance so I’m not terribly worried about it or investing much time in it: I can toggle the nodes off/on in the instance group settings, which brings them back into the cluster and makes them available for running jobs.
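For what it's worth, the off/on toggle can probably be scripted rather than clicked. Here is a minimal sketch, assuming the AWX REST API exposes instances at /api/v2/instances/<id>/ with a writable `enabled` field and accepts an OAuth2 bearer token (check the API browser on your version, since older releases may differ). The base URL, token, and instance id are placeholders, and the helper names are hypothetical:

```python
import json
import urllib.request


def build_toggle_request(base_url, instance_id, enabled, token):
    """Build a PATCH request that sets `enabled` on one AWX instance.

    Assumes the /api/v2/instances/<id>/ endpoint and bearer-token auth;
    verify both against your AWX version.
    """
    url = f"{base_url.rstrip('/')}/api/v2/instances/{instance_id}/"
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def bounce_instance(base_url, instance_id, token):
    """Disable and then re-enable an instance, mirroring the manual toggle."""
    for enabled in (False, True):
        request = build_toggle_request(base_url, instance_id, enabled, token)
        with urllib.request.urlopen(request) as response:
            response.read()
```

Calling `bounce_instance("https://awx.example.com", 3, "<token>")` would send the two requests in order. Building the request is kept separate from sending it so the payload can be inspected without touching a live server.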
Does anyone know what the limitation is? In my current environment we don’t have K8s yet; it’s coming, but it isn’t here yet. It would be nice if I could somehow get multiple awx_task containers working with standalone Docker or even Swarm.
Can anyone provide any insight into what would be necessary to make this happen? Does it truly require multiple memcached and rabbitmq instances? Maybe only some minor changes are needed to make this work?
What Michael mentioned below is true. I deployed AWX on AWS EKS using RDS PostgreSQL. As Michael mentioned, it is deployed as a StatefulSet rather than a Deployment, which is what most stateless apps use. When we scale the StatefulSet, it creates the number of pods we specify. Each pod has memcached, rabbitmq, awx_task and awx_web containers. All these pods run behind 3 services: awx-rmq-mgmt, awx-web-svc and rabbitmq.
We need to ensure that the instance group in AWX is updated accordingly with those pod names. I found one issue when I scaled in: AWX was still looking for the pods that had already been deleted. I had to delete the last pod I had; once it was recreated, the issue was fixed. Things I am still working to figure out:
-How do I manage instance groups dynamically? If there is a way to do this, or if awx-cli can help, please let me know.
-How do I add persistent storage to containers to store project files or other data?
-How do I validate that rabbitmq is working as a cluster? It is exposed as a service, so I expect it is working as a single cluster across pods, but I wasn’t able to validate it.
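On the scale-in problem and the instance group question: StatefulSet pods are named <statefulset-name>-<ordinal>, so the set of pods that should exist is predictable from the replica count. Below is a rough sketch of finding registered instances that no longer have a pod. It assumes the instance hostnames in AWX match the pod names (as described above) and that the list of registered hostnames has already been fetched somehow, for example from the instance list in the API. The StatefulSet name `awx` and the hostnames are made-up examples, and the function names are hypothetical:

```python
def expected_pod_names(statefulset_name, replicas):
    """Return the pod names a StatefulSet creates for a given replica count."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


def find_stale_instances(registered_hostnames, statefulset_name, replicas):
    """Return registered hostnames that have no matching pod after scaling."""
    expected = set(expected_pod_names(statefulset_name, replicas))
    return sorted(h for h in registered_hostnames if h not in expected)


if __name__ == "__main__":
    # Example: four instances registered, StatefulSet scaled in to 2 replicas.
    registered = ["awx-0", "awx-1", "awx-2", "awx-3"]
    for hostname in find_stale_instances(registered, "awx", replicas=2):
        # Assumed cleanup command; confirm it exists on your AWX version.
        print(f"awx-manage deprovision_instance --hostname={hostname}")
```

Each stale hostname could then be removed from inside a task container. I believe `awx-manage deprovision_instance` is the command for that on recent releases, but treat it as an assumption and verify before relying on it.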