AWX not working on Kubernetes 1.23.6

awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'

SUMMARY
After upgrading the Kubernetes cluster to v1.23.6, the pods are unable to communicate with each other. The awx-web pod logs show the error below:

2022-05-10 15:37:04,088 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:04,090 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 2.
2022-05-10 15:37:12,102 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:12,104 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 3.
2022-05-10 15:37:16,111 INFO success: nginx entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: nginx entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: daphne entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: daphne entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:16,111 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:20,123 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:20,126 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 4.
2022-05-10 15:37:28,142 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:28,143 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 5.
2022-05-10 15:37:34,151 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:34,151 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-05-10 15:37:36,152 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:36,154 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 6.
2022-05-10 15:37:44,169 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:44,170 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 7.
2022-05-10 15:37:52,180 WARNING [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 failed: 'Cannot connect to host 10.36.0.10:8052 ssl:False [Connect call failed ('10.36.0.10', 8052)]'.
2022-05-10 15:37:52,182 DEBUG [-] awx.main.wsbroadcast Connection from awx-767cb986bb-fv9tr to 10.36.0.10 attempt number 8.

ENVIRONMENT
AWX version: 0.21.0
Operator version: 0.21.0
Kubernetes version: 1.23.6
AWX install method: K8S via operator

STEPS TO REPRODUCE
Remove the deployment and redeploy on Kubernetes 1.23.6.

EXPECTED RESULTS
The pods communicate and AWX comes online.

ACTUAL RESULTS
Nothing happens; the service does not appear to come online.

ADDITIONAL INFORMATION
Seems related to SSL.

AWX-OPERATOR LOGS
--------------------------- Ansible Task StdOut -------------------------------

TASK [Remove ownerReferences reference] ********************************
ok: [localhost] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

{"level":"info","ts":1652197108.1342492,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/awx/secrets/awx-broadcast-websocket","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"awx","Resource":"secrets","Subresource":"","Name":"awx-broadcast-websocket","Parts":["secrets","awx-broadcast-websocket"]}}

--------------------------- Ansible Task StdOut -------------------------------

TASK [Remove ownerReferences reference] ********************************
ok: [localhost] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

{"level":"info","ts":1652197108.564614,"logger":"runner","msg":"Ansible-runner exited successfully","job":"471168312615460271","name":"awx","namespace":"awx"}

----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----

PLAY RECAP *********************************************************************
localhost : ok=63 changed=0 unreachable=0 failed=0 skipped=46 rescued=0 ignored=0

There is a known bug around this: those wsbroadcast warnings from the awx-web pod are benign and can be ignored. Are you experiencing any other issue with the app?

Seth

Hi Seth,

We are having the same issue, and our AWX pods keep restarting after trying to connect for a few minutes.

Thanks,

Nari

Hi Nari,

I wouldn't expect this issue to cause restarts. Do the Kubernetes logs show any helpful information about why they are restarting?

AWX Team

Hi,

This is the only error we are getting in the logs.

Thanks

I misspoke: wsbroadcast is supposed to run in the awx-web container, so your original logs are worth attending to.
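The wsbroadcast warning is a plain TCP connect failure (note "ssl:False" and "Connect call failed") from one awx-web pod to a peer on port 8052. A minimal sketch for checking that reachability independently, assuming Python 3 can be run somewhere on the cluster's pod network (for example in another pod). The function name can_connect is hypothetical, and it only tests the TCP handshake, not the websocket endpoint itself:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical helper: return True if a plain TCP connection succeeds.

    Only the TCP handshake is checked (no TLS, HTTP or websocket upgrade),
    which corresponds to the 'ssl:False ... Connect call failed' part of
    the wsbroadcast warning.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass) and unreachable-network errors.
        return False
```

For example, if can_connect("10.36.0.10", 8052) returns False when run from inside the cluster, that would suggest a pod networking problem or a peer that is not listening on 8052, rather than anything TLS-related.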

@dgowran

Were you able to fix the issue? What command did you run to scale up the replica set?

Thanks for any info you can provide

AWX Team

I had to redeploy and set up the whole environment again to fix this issue. It has also just resurfaced after 3 months, so it seems I will have to redeploy again.

2022-09-08 12:11:35,290 INFO success: nginx entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: nginx entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: daphne entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: daphne entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:35,290 INFO success: wsbroadcast entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:36,292 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:11:36,294 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 4.
2022-09-08 12:11:44,306 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:11:44,307 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 5.
2022-09-08 12:11:52,316 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:11:52,317 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 6.
2022-09-08 12:11:55,322 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:11:55,322 INFO success: awx-rsyslogd entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)
2022-09-08 12:12:00,328 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:12:00,329 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 7.
2022-09-08 12:12:08,340 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:12:08,341 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 8.
2022-09-08 12:12:16,356 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:12:16,360 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 9.
2022-09-08 12:12:24,372 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 failed: 'Cannot connect to host 10.44.0.45:8052 ssl:False [Connect call failed ('10.44.0.45', 8052)]'.
2022-09-08 12:12:24,373 DEBUG [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 attempt number 10.
2022-09-08 12:12:29,354 WARNING [-] awx.main.wsbroadcast Removing {'awx-86b895fbbd-lspr7'} from websocket broadcast list
2022-09-08 12:12:29,355 WARNING [-] awx.main.wsbroadcast Connection from awx-86b895fbbd-xld8r to 10.44.0.45 cancelled
2022-09-08 12:14:21,301 DEBUG [d2afa78ba41945ccbdf405db0231b4ba] awx.analytics.performance request: <WSGIRequest: OPTIONS ‘/api/v2/schedules/’>, response_time: 1.050s

Interestingly, before this, modules just stopped working, and we initiated a pod restart to clear it. On restart, this issue surfaced, and it affects Git sync and tokens, with only the entries above in the logs.
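To see at a glance which peers a pod keeps failing to reach, how far the retries went, and which pods wsbroadcast eventually drops, the wsbroadcast lines quoted in this thread can be tallied. This is a hypothetical helper (summarize and the regular expressions are illustrative names), and the line format is inferred only from the samples posted here, so it may not match other AWX versions:

```python
import re
from collections import defaultdict

# Assumption: line formats inferred only from the log samples in this thread.
FAILED_RE = re.compile(
    r"awx\.main\.wsbroadcast Connection from (?P<src>\S+) to (?P<dst>\S+) failed"
)
ATTEMPT_RE = re.compile(
    r"awx\.main\.wsbroadcast Connection from (?P<src>\S+) to (?P<dst>\S+) "
    r"attempt number (?P<n>\d+)"
)
REMOVED_RE = re.compile(r"Removing \{(?P<names>[^}]*)\} from websocket broadcast list")


def summarize(lines):
    """Hypothetical helper: tally wsbroadcast failures per (source pod, peer IP)."""
    failures = defaultdict(int)  # (src, dst) -> number of 'failed' warnings
    last_attempt = {}            # (src, dst) -> highest attempt number seen
    removed = []                 # pod names dropped from the broadcast list
    for line in lines:
        if m := FAILED_RE.search(line):
            failures[(m["src"], m["dst"])] += 1
        elif m := ATTEMPT_RE.search(line):
            key = (m["src"], m["dst"])
            last_attempt[key] = max(last_attempt.get(key, 0), int(m["n"]))
        elif m := REMOVED_RE.search(line):
            # Strip spaces plus straight or curly quotes around each pod name.
            removed.extend(n.strip(" '\"\u2018\u2019") for n in m["names"].split(","))
    return dict(failures), last_attempt, removed
```

Feeding it the output of the awx-web container's log (for example, saved to a file and read line by line) would give one entry per unreachable peer instead of a long run of near-identical warnings.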