Hi,
I’m having an issue where websockets stop working roughly 20-60 minutes after the application has been deployed. This breaks the task container's ability to post job stdout, and it also makes it impossible to view job details in the UI.
I saw that an issue was opened here: https://github.com/ansible/awx/issues/1861 and have been adding comments there with my findings as I go.
I have enabled verbose logging on Daphne but I’m a little perplexed.
Daphne is showing that the websocket is opened:
2018-06-27 03:19:21,372 DEBUG Upgraded connection daphne.response.XbupPxYRcS!aPmLgJGDZd to WebSocket daphne.response.XbupPxYRcS!hTzJudfDoM
Then, a few seconds later, nginx reports that the client closed the connection:
10.255.0.2 - - [27/Jun/2018:03:19:24 +0000] "GET /websocket/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17682"
And then Daphne reports that the websocket has closed:
2018-06-27 03:19:25,571 DEBUG WebSocket closed for daphne.response.XbupPxYRcS!hTzJudfDoM
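For reference, the gap between those two Daphne entries can be worked out with a short script. This is only a minimal sketch: it assumes the timestamp format shown in the log lines above, and `seconds_between` is just an illustrative helper name, not part of Daphne or AWX.

```python
from datetime import datetime

# Timestamp format used in the Daphne log lines quoted above
# (date, time, then milliseconds after a comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(opened: str, closed: str) -> float:
    """Return the seconds elapsed between two log timestamps."""
    start = datetime.strptime(opened, LOG_TIME_FORMAT)
    end = datetime.strptime(closed, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the "Upgraded connection" and
# "WebSocket closed" entries.
elapsed = seconds_between("2018-06-27 03:19:21,372", "2018-06-27 03:19:25,571")
print(f"WebSocket stayed open for {elapsed:.3f} s")
```

For the two entries above this gives about 4.2 seconds between the upgrade and the close.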
The browser itself reports:
WebSocket connection to 'wss://…/websocket/' failed: WebSocket is closed before the connection is established.
And the task container reports the following when running a job:
[2018-07-02 19:03:47,717: DEBUG/Worker-4] using channel_id: 2
2018-07-02 19:03:47,718 ERROR awx.main.models.unified_jobs job 15 (running) failed to emit channel msg about status change
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/awx/main/models/unified_jobs.py", line 1169, in _websocket_emit_status
    emit_channel_notification('jobs-status_changed', status_data)
  File "/usr/lib/python2.7/site-packages/awx/main/consumers.py", line 70, in emit_channel_notification
    Group(group).send({"text": json.dumps(payload, cls=DjangoJSONEncoder)})
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/channels/channel.py", line 88, in send
    self.channel_layer.send_group(self.name, content)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 190, in send_group
    self.send(channel, message)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 95, in send
    self.recover()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/asgi_amqp/core.py", line 77, in recover
    self.tdata.consumer.revive(self.tdata.connection.channel())
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/connection.py", line 255, in channel
    chan = self.transport.create_channel(self.connection)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 92, in create_channel
    return connection.channel()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/connection.py", line 282, in channel
    return self.Channel(self, channel_id)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 101, in __init__
    self._x_open()
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/channel.py", line 427, in _x_open
    self._send_method((20, 10), args)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
    self.channel_id, method_sig, args, content,
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
    write_frame(1, channel, payload)
  File "/var/lib/awx/venv/awx/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
    frame_type, channel, size, payload, 0xce,
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
Can anyone suggest what the next troubleshooting steps might be, or what additional logging could be enabled?