Unable to deploy the AWX tower on Kubernetes (EKS)

Hello,

We have been running AWX 1.0.0 on EKS with an external PostgreSQL database (AWS RDS for PostgreSQL), and it is up and running on our existing clusters. I tried to deploy the same setup on a different cluster, but it does not come up and produces no error logs. The pods could communicate with our external PostgreSQL database for a while, then kept getting disconnected. The deployment created an internal load balancer, but it did not work as expected. I also tried the latest version, AWX 2.2.1, with the same result. The logs of the awx-task container are below. Please let me know what has changed recently, or whether I am missing something.

Hi,

What actual issues are you seeing with your AWX deployment? The problem statement seems to point to database connectivity but then starts talking about load balancing. Can you give a bit more information about the problem?

Thanks,

AWX Team

Sorry for the confusion. I’m unable to get the deployment working in the new EKS cluster. We have been using AWX 1.0.0 in all our environments and are trying to expand it to multiple environments. When I deploy it into the new EKS cluster, I see all the pods running, but the awx-task container gets killed and restarted after ~10 minutes. The AWX deployment creates the service, but the UI does not load. I have even tried it with the latest version of AWX.

Does the task container restart even if the system is idle for 10 minutes, or does this only happen while jobs are running during that window of time?

I see these logs

2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)

2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)

Do you see those errors repeatedly in your task container?

Also, is the web pod staying up and healthy during that time?

Can you share the output of
kubectl get pods -n $namespace

AWX Team
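
To make the restart check above concrete, here is a minimal sketch, assuming bash and a working kubectl context. The function names `count_sigkills` and `show_restart_reasons` are hypothetical helpers, not part of AWX or kubectl. The first tallies supervisord "terminated by SIGKILL" lines per program from log text on stdin. The second prints each container's restart count and last termination reason (for example OOMKilled), which is one common cause of SIGKILL that may be worth ruling out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read log text on stdin and print "<program>: <count>"
# for every supervisord program that was terminated by SIGKILL.
count_sigkills() {
  grep 'terminated by SIGKILL' \
    | sed -E 's/.*exited: ([^ ]+) .*/\1/' \
    | sort | uniq -c | awk '{print $2 ": " $1}'
}

# Hypothetical helper: print name, restart count, and last termination reason
# for each container in a pod. Arguments: <namespace> <pod>.
show_restart_reasons() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}
```

A possible invocation, with the pod and container names as placeholders, is `kubectl logs -n "$namespace" <task-pod> -c <task-container> --previous | count_sigkills`, where `--previous` reads the logs of the container instance that was killed.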

I even tried the latest version, which was released yesterday. This time the pods stayed running and didn’t restart. All the pod logs look fine too, but I could not access the UI. The web page does not load at all.

We probably need more details in order to help further. Do you still see a crash loop of services restarting? For example, are lines like these showing up?

2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)

2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)

Have you explored whether this could be a networking issue? What error are you getting in the browser when you attempt to load the UI page? Also can you exec into the pod and access the web service that way?

AWX Team
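
The in-pod check suggested above could look roughly like the sketch below. Everything specific in it is an assumption rather than something confirmed in this thread: that the web container serves HTTP on port 8052, that `/api/v2/ping/` is reachable without authentication, and that `curl` is available in the image. `awx_web_ping` is a hypothetical helper name, and the `KUBECTL` variable exists only so the command can be swapped out (for example for a dry run).

```shell
#!/usr/bin/env bash

# Command used to talk to the cluster; override to dry-run (assumption: kubectl is on PATH).
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: request the AWX ping endpoint from inside the web container
# and print the HTTP status code.
# Arguments: <namespace> <pod> <container> [port, assumed default 8052].
awx_web_ping() {
  local ns=$1 pod=$2 container=$3 port=${4:-8052}
  "$KUBECTL" exec -n "$ns" "$pod" -c "$container" -- \
    curl -sS -o /dev/null -w '%{http_code}' "http://localhost:${port}/api/v2/ping/"
  echo
}
```

If this returns 200 from inside the pod while the browser still cannot load the page, the problem is more likely in the service, ingress, or load balancer path than in AWX itself.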

The pod status and the service are stable after installing the latest version. It shouldn’t be a network issue, as there are other AWX instances running in other EKS clusters with the same configuration. In the browser I see nothing; it says the page is not working. I did try port-forward, but that doesn’t work either.

Here are the pod logs:

kubectl get pods -n awx:
NAME                                              READY   STATUS    RESTARTS   AGE
awx-dev-task-55c485db77-rnvw9                     4/4     Running   0          6m35s
awx-dev-web-859b9f59cc-g7vpb                      3/3     Running   0          4m55s
awx-operator-controller-manager-bf9cd85f4-p9zn7   2/2     Running   0          19h

kubectl logs -n awx awx-dev-task-55c485db77-rnvw9:
Defaulted container "redis" out of: redis, awx-dev-task, awx-dev-ee, awx-dev-rsyslog, init (init)
1:C 19 Jul 2023 14:36:44.435 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 19 Jul 2023 14:36:44.435 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 19 Jul 2023 14:36:44.435 # Configuration loaded
1:M 19 Jul 2023 14:36:44.435 * monotonic clock: POSIX clock_gettime
1:M 19 Jul 2023 14:36:44.435 * Running mode=standalone, port=0.
1:M 19 Jul 2023 14:36:44.435 # Server initialized
1:M 19 Jul 2023 14:36:44.436 * The server is now ready to accept connections at /var/run/redis/redis.sock

kubectl logs -n awx awx-dev-web-859b9f59cc-g7vpb:
Defaulted container "redis" out of: redis, awx-dev-web, awx-dev-rsyslog
1:C 19 Jul 2023 14:37:00.864 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 19 Jul 2023 14:37:00.864 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 19 Jul 2023 14:37:00.864 # Configuration loaded
1:M 19 Jul 2023 14:37:00.864 * monotonic clock: POSIX clock_gettime
1:M 19 Jul 2023 14:37:00.865 * Running mode=standalone, port=0.
1:M 19 Jul 2023 14:37:00.865 # Server initialized
1:M 19 Jul 2023 14:37:00.865 * The server is now ready to accept connections at /var/run/redis/redis.sock

Hi Ravi,

Your pods have not come up. The logs that you are showing are from the default container in those pods, which is redis in both cases. You should run “kubectl describe pod awx-dev-task-55c485db77-rnvw9 -n awx” and “kubectl describe pod awx-dev-web-859b9f59cc-g7vpb -n awx”. There will probably be an event that indicates what is going wrong.

Good luck!

Rod
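
Since `kubectl logs` defaulted to the redis container in both pods, the AWX logs themselves have to be requested per container with `-c`. A minimal sketch, assuming bash: `awx_container_logs` is a hypothetical helper, the container names are the ones listed in the "Defaulted container" lines above, and the `--tail=100` limit is an arbitrary choice.

```shell
#!/usr/bin/env bash

# Command used to talk to the cluster; override to dry-run (assumption: kubectl is on PATH).
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: print the last 100 log lines of each named container in a pod.
# Arguments: <namespace> <pod> <container> [<container> ...].
awx_container_logs() {
  local ns=$1 pod=$2
  shift 2
  local c
  for c in "$@"; do
    echo "=== $pod / $c ==="
    "$KUBECTL" logs -n "$ns" "$pod" -c "$c" --tail=100
  done
}
```

For the pods in this thread that would be, for example, `awx_container_logs awx awx-dev-task-55c485db77-rnvw9 awx-dev-task awx-dev-ee awx-dev-rsyslog` and `awx_container_logs awx awx-dev-web-859b9f59cc-g7vpb awx-dev-web awx-dev-rsyslog`.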

Hello,
Thank you for this additional information. It does appear that your containers are up. We would recommend following the steps to get the web and task logs as rodooliver mentioned in the previous reply. For best results, we would also recommend using the latest version of AWX Operator.

Are you trying to connect to the same PostgreSQL instance as the other AWX instance?

Can you also provide us with the redacted spec? This will better enable us to identify what is occurring here.

-AWX Team