We have been using AWX 1.0.0 on EKS with an external Postgres database (AWS RDS for PostgreSQL), and it is up and running on our existing clusters. I tried to deploy the same setup on a different cluster, but it does not come up and gives no error logs. The pods could communicate with our external Postgres database at first but kept getting disconnected after some time. The deployment created the internal load balancer, but it did not work as expected. I also tried the latest version, AWX 2.2.1, and got the same result. The awx-task container logs are below. Please let me know what has changed recently, or whether I am missing something.
What actual issues are you seeing with your AWX deployment? The problem statement seems to indicate a database connectivity problem but then starts talking about load balancing. Can you give a bit more information about the problem?
Sorry for the confusion. I’m unable to get the deployment working in the new EKS cluster. We have been using AWX 1.0.0 in all our environments and are trying to expand it to more environments. When I deploy it into the new EKS cluster, I see all the pods running, but the awx-task container is killed and restarted after about 10 minutes. The AWX deployment creates the service, but the UI does not load. I have also tried it with the latest version of AWX.
I tried the latest version, which was released yesterday. This time the pods stayed running and didn’t restart. All the pod logs look fine too, but I still cannot access the UI. The web page does not load at all.
We probably need more details in order to help further. Do you still see a crash loop of services restarting? For example, are lines like these showing up in the logs?
2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)
2023-06-19 15:50:52,797 WARN exited: callback-receiver (terminated by SIGKILL; not expected)
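A quick way to check for lines like these is to count them in a specific container's logs. The following is only a minimal sketch: the function name `count_sigkills` and the `KUBECTL` override variable are hypothetical helpers introduced here for illustration, not part of AWX or kubectl.

```shell
# Sketch: count supervisor "terminated by SIGKILL" lines in one container's logs.
# KUBECTL is a hypothetical override so the function can be dry-run or stubbed.
KUBECTL="${KUBECTL:-kubectl}"

count_sigkills() {
  # $1 = namespace, $2 = pod name, $3 = container name
  "$KUBECTL" logs -n "$1" "$2" -c "$3" | grep -c 'terminated by SIGKILL'
}
```

For example, `count_sigkills awx <task-pod-name> <task-container-name>` prints the number of matching lines. If the container has already restarted, adding `--previous` to `kubectl logs` shows the logs of the previous container instance.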
Have you explored whether this could be a networking issue? What error are you getting in the browser when you attempt to load the UI page? Also can you exec into the pod and access the web service that way?
The pod status and service are stable after installing the latest version. It shouldn’t be a network issue, as another AWX instance is running in our other EKS clusters with the same configuration. In the browser I see nothing; it just says the page is not working. I also tried port-forward, but that doesn’t work either.
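To narrow down whether the service itself is the problem, it can help to look at the service and its endpoints before port-forwarding. This is a rough sketch only: `check_service` and `KUBECTL` are hypothetical helpers, and the service name `awx-dev-service` used in the note below is an assumption based on the pod names in this thread.

```shell
# Sketch: show a service and the endpoints behind it.
# KUBECTL is a hypothetical override so the function can be dry-run.
KUBECTL="${KUBECTL:-kubectl}"

check_service() {
  # $1 = namespace, $2 = service name
  "$KUBECTL" get svc "$2" -n "$1" -o wide
  # An empty ENDPOINTS column means no ready pod is backing the service.
  "$KUBECTL" get endpoints "$2" -n "$1"
}
```

For example, `check_service awx awx-dev-service`. If endpoints are listed, something like `kubectl port-forward -n awx svc/awx-dev-service 8080:80` followed by `curl -v http://localhost:8080/api/v2/ping/` should show whether the web container answers at all; the service port 80 is also an assumption and should be taken from the `kubectl get svc` output.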
Here are the logs of pods:
kubectl get pods -n awx:
NAME READY STATUS RESTARTS AGE
awx-dev-task-55c485db77-rnvw9 4/4 Running 0 6m35s
awx-dev-web-859b9f59cc-g7vpb 3/3 Running 0 4m55s
awx-operator-controller-manager-bf9cd85f4-p9zn7 2/2 Running 0 19h
kubectl logs -n awx awx-dev-task-55c485db77-rnvw9:
Defaulted container "redis" out of: redis, awx-dev-task, awx-dev-ee, awx-dev-rsyslog, init (init)
1:C 19 Jul 2023 14:36:44.435 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 19 Jul 2023 14:36:44.435 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 19 Jul 2023 14:36:44.435 # Configuration loaded
1:M 19 Jul 2023 14:36:44.435 * monotonic clock: POSIX clock_gettime
1:M 19 Jul 2023 14:36:44.435 * Running mode=standalone, port=0.
1:M 19 Jul 2023 14:36:44.435 # Server initialized
1:M 19 Jul 2023 14:36:44.436 * The server is now ready to accept connections at /var/run/redis/redis.sock
kubectl logs -n awx awx-dev-web-859b9f59cc-g7vpb:
Defaulted container "redis" out of: redis, awx-dev-web, awx-dev-rsyslog
1:C 19 Jul 2023 14:37:00.864 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 19 Jul 2023 14:37:00.864 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 19 Jul 2023 14:37:00.864 # Configuration loaded
1:M 19 Jul 2023 14:37:00.864 * monotonic clock: POSIX clock_gettime
1:M 19 Jul 2023 14:37:00.865 * Running mode=standalone, port=0.
1:M 19 Jul 2023 14:37:00.865 # Server initialized
1:M 19 Jul 2023 14:37:00.865 * The server is now ready to accept connections at /var/run/redis/redis.sock
Your pods have not come up. The logs you are showing are from the default container in those pods, which is redis in both cases. You should run "kubectl describe pod awx-dev-task-55c485db77-rnvw9 -n awx" and "kubectl describe pod awx-dev-web-859b9f59cc-g7vpb -n awx". There will probably be an event that indicates what is going wrong.
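Since `kubectl logs` defaults to the redis container here, each of the other containers has to be named with `-c`. A minimal sketch of doing the describe plus per-container logs in one go follows; `inspect_pod` and the `KUBECTL` override are hypothetical helpers, while the pod and container names come from the output posted above.

```shell
# Sketch: describe a pod, then print recent logs for each named container.
# KUBECTL is a hypothetical override so the function can be dry-run.
KUBECTL="${KUBECTL:-kubectl}"

inspect_pod() {
  # $1 = namespace, $2 = pod name, remaining args = container names
  ns="$1"; pod="$2"; shift 2
  "$KUBECTL" describe pod "$pod" -n "$ns"
  for c in "$@"; do
    "$KUBECTL" logs -n "$ns" "$pod" -c "$c" --tail=100
  done
}
```

For the task pod this would be `inspect_pod awx awx-dev-task-55c485db77-rnvw9 awx-dev-task awx-dev-ee awx-dev-rsyslog`, and for the web pod `inspect_pod awx awx-dev-web-859b9f59cc-g7vpb awx-dev-web awx-dev-rsyslog`.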
Hello,
Thank you for this additional information. It does appear that your containers are up. We would recommend following the steps to get the web and task logs as rodooliver mentioned in the previous reply. For best results, we would also recommend using the latest version of AWX Operator.
Are you trying to connect to the same Postgres instance as the other AWX instance?
Can you also provide the redacted AWX spec? That will help us identify what is occurring here.
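One possible way to produce a redacted spec is to dump the AWX custom resource as YAML and mask sensitive-looking values before pasting it. This is a sketch under assumptions: `redact`, `dump_spec`, and `KUBECTL` are hypothetical helpers, the resource name `awx-dev` is inferred from the pod names above, and the key-name pattern is only a rough filter, so the output should still be reviewed by hand (hostnames, for instance, are not masked).

```shell
# Sketch: dump an AWX custom resource and mask sensitive-looking values.
# KUBECTL is a hypothetical override so the function can be dry-run.
KUBECTL="${KUBECTL:-kubectl}"

# Mask the value of any YAML key whose name contains password, secret, or token.
redact() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_-]*(password|secret|token)[A-Za-z0-9_-]*:).*/\1 <REDACTED>/'
}

dump_spec() {
  # $1 = namespace, $2 = AWX resource name (assumed here, e.g. awx-dev)
  "$KUBECTL" get awx "$2" -n "$1" -o yaml | redact
}
```

For example, `dump_spec awx awx-dev > awx-spec-redacted.yaml`. The Kubernetes Secret that holds the external database credentials should not be posted at all.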