I don’t see any documentation yet on how to dynamically add an instance to a k8s cluster, even though it’s supported in 1.0.4.
Adding a replica seems to break it, so that’s obviously not the right solution. Adding a host to the inventory file also seems incorrect since the host parameters (pod name or IP) are not known until it’s spun up. Not seeing anything in the PR, either.
With 1 replica it works perfectly, and I see the instance in instance groups.
$ kubectl get pods -n awx
NAME                    READY   STATUS    RESTARTS   AGE
awx-547f49f598-qcfmf    4/4     Running   0          7d
etcd-7547d8c67c-fbb72   1/1     Running   0          7d
After adding a 2nd replica in deployment.yml and re-running the install playbook, awx-web immediately starts returning 500s when I try to navigate to the UI.
$ kubectl get pods -n awx
NAME                    READY   STATUS    RESTARTS   AGE
awx-547f49f598-d9crs    4/4     Running   0          4m
awx-547f49f598-qcfmf    4/4     Running   0          7d
etcd-7547d8c67c-fbb72   1/1     Running   0          7d
kubectl logs does not show any output, so it’s a little difficult to tell why that’s happening, or what the status of awx-celery is. Any tricks for that, by the way?
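One thing that may help here: the awx pods report 4/4, so each pod runs four containers, and kubectl logs can be pointed at a single one with -c. The following is a minimal sketch, not a confirmed fix. The function name awx_container_logs and the KUBECTL and NAMESPACE variables are hypothetical conveniences, and it is an assumption that the container names match the component names used in this thread (awx-web, awx-celery).

```shell
# Hypothetical helper: show recent logs for ONE container of a pod.
# $1 = pod name, $2 = container name (assumed, e.g. awx-web or awx-celery).
# KUBECTL and NAMESPACE can be overridden; they default to kubectl and awx.
awx_container_logs() {
  "${KUBECTL:-kubectl}" logs -n "${NAMESPACE:-awx}" "$1" -c "$2" --tail=200
}
```

For example, `awx_container_logs awx-547f49f598-d9crs awx-celery` would ask for the last 200 log lines of the celery container in the new replica, assuming that container name exists in the pod.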
This is with an external database over SSL, so the configmap is adjusted. I don’t expect this is relevant but let me know if that could impact this.
We would definitely need to see exceptions to know what’s going on for sure.
Having said that, we’ve been doing a lot of cleanup on the clustering side of things over the last week or so … so it might be worth it to check out the more recent releases.
Logs should go to the Docker console when the 500 is generated. If not, you might look in /var/log/tower in the container.
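On Kubernetes, looking inside the container can be done with kubectl exec. A minimal sketch, assuming the web container is named awx-web (the real name may differ in your deployment); the function name awx_list_tower_logs and the KUBECTL and NAMESPACE variables are hypothetical:

```shell
# Hypothetical helper: list the files under /var/log/tower in one container.
# $1 = pod name, $2 = container name (assumed, e.g. awx-web).
# KUBECTL and NAMESPACE can be overridden; they default to kubectl and awx.
awx_list_tower_logs() {
  "${KUBECTL:-kubectl}" exec -n "${NAMESPACE:-awx}" "$1" -c "$2" -- ls -l /var/log/tower
}
```

For example, `awx_list_tower_logs awx-547f49f598-d9crs awx-web` lists the log files in the new replica. Replacing `ls -l /var/log/tower` with a `cat` or `tail` of one of the listed files would then show its contents.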
Any updates on this? I have installed AWX 2.1 in OpenShift, and when I add a second replica some jobs start failing (all jobs in the second instance), throwing this: "Task was marked as running in Tower but was not present in the job queue, so it has been marked as failed."
Are you running the Postgres database as a container in the pod? It needs to be external for clustering to work. Otherwise each pod has its own database and jobs would only exist in one of them.
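A quick way to check this is to list the container names in one of the pods and see whether a Postgres container is among them. This is a rough sketch: the function name awx_container_names and the KUBECTL and NAMESPACE variables are hypothetical, and what the database container would be called depends on your deployment.

```shell
# Hypothetical helper: print the names of all containers in a pod.
# $1 = pod name.
# KUBECTL and NAMESPACE can be overridden; they default to kubectl and awx.
awx_container_names() {
  "${KUBECTL:-kubectl}" get pod -n "${NAMESPACE:-awx}" "$1" \
    -o jsonpath='{.spec.containers[*].name}'
}
```

If a database container shows up in the output for each replica, every pod has its own database, which would match the failure described above.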