AWX deployment on IPv6 k8s cluster?

Hi folks,

Does anyone have experience deploying AWX on an IPv6-only Kubernetes cluster?
Is AWX ready for that, including components such as nginx and the websocket service?

The idea is to deploy AWX on an IPv6 cluster, so the internal communication between the services, as well as the ingress, must be IPv6-ready. Is this even possible?

Regards
CWollinger

Yes, you can deploy AWX in an IPv6-only cluster. If you run into technical difficulties, please feel free to post and ask for help. There have been a handful of users in recent months who had issues with external PostgreSQL databases, where some network-related component was trying to use IPv4 despite only IPv6 being available, but they were able to resolve those issues.

Hi, the deployment was successful and AWX is running on the IPv6 cluster. The only problem is with the websocket that refreshes the UI. Our client (the browser accessing the UI) is also on IPv6. Are there any known issues, or does the websocket have to be reconfigured? I didn’t find a matching AWX GitHub issue.

Some examples from the logs (I can provide the full logs):

ValueError: Port could not be cast to integer value as '111:xxx:11:8052'
ValueError: Invalid URL: port can't be converted to integer
aiohttp.client_exceptions.InvalidURL: http://xxxx:111:xxx:11:8052/websocket/relay/

There are known and recurring issues with websockets, yes. I have issues with them myself.

That being said, this looks like an IPv6 bug: the address is put into the URL without square brackets, so the URL parser treats everything after the first colon of the IPv6 address as the port number.
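
To illustrate, here is a minimal sketch using only the Python standard library. The address `2001:db8::33b` is a documentation-range placeholder, and `relay_url` is a hypothetical helper written for this post, not a function from AWX. It shows that an unbracketed IPv6 literal makes the port unparseable, while the bracketed form parses cleanly:

```python
import ipaddress
from urllib.parse import urlsplit


def relay_url(host: str, port: int = 8052) -> str:
    """Build a relay URL, wrapping IPv6 literals in square brackets.

    Hypothetical helper for illustration only; not taken from AWX.
    """
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # not an IP literal (e.g. a DNS name), leave it unchanged
    return f"http://{host}:{port}/websocket/relay/"


def port_parses(url: str) -> bool:
    """Return True if urllib can extract an integer port from the URL."""
    try:
        urlsplit(url).port
        return True
    except ValueError:
        return False


# Unbracketed: everything after the first ':' is taken as the port.
bad = "http://2001:db8::33b:8052/websocket/relay/"
# Bracketed: host and port are unambiguous.
good = relay_url("2001:db8::33b")

print(bad, port_parses(bad))
print(good, port_parses(good))
```

The bracketed form is what the URL syntax requires for IPv6 literals, so anything that builds a URL from a raw pod IP needs a step like this before handing it to a parser.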

@TheRealHaoLiu

@CWollinger What version of AWX have you deployed?

Edit: The full log might be helpful, especially if there’s a traceback.

AWX 23.1.0

2024-04-14 03:04:08,376 INFO spawned: 'wsrelay' with pid 202
2024-04-14 03:04:08,376 INFO spawned: 'wsrelay' with pid 202
2024-04-14 03:04:10,808 INFO     [-] awx.main.wsrelay Active instance with hostname awx-task-9bc7f86-lxbmg is registered.
2024-04-14 03:04:20,824 INFO     [-] awx.main.wsrelay Adding {'awx-web-58d7c867f5-wwls7'} to websocket broadcast list
2024-04-14 03:04:20,835 WARNING  [-] awx.main.wsrelay Connection from awx-task-9bc7f86-lxbmg to xxxx:111:xxx:11::33b failed for unknown reason: 'http://xxxx:111:xxx:11::33b:8052/websocket/relay/'.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/yarl/_url.py", line 166, in __new__
    port = val.port
  File "/usr/lib64/python3.9/urllib/parse.py", line 186, in port
    raise ValueError(message) from None
ValueError: Port could not be cast to integer value as '111:xxx:11::33b:8052'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/aiohttp/client.py", line 423, in _request
    url = self._build_url(str_or_url)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/aiohttp/client.py", line 357, in _build_url
    url = URL(str_or_url)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/yarl/_url.py", line 168, in __new__
    raise ValueError(
ValueError: Invalid URL: port can't be converted to integer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 80, in connect
    async with session.ws_connect(uri, ssl=self.verify_ssl, heartbeat=20) as websocket:
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/aiohttp/client.py", line 779, in _ws_connect
    resp = await self.request(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/aiohttp/client.py", line 425, in _request
    raise InvalidURL(str_or_url) from e
aiohttp.client_exceptions.InvalidURL: http://xxxx:111:xxx:11::33b:8052/websocket/relay/
2024-04-14 03:04:30,829 INFO     [-] awx.main.wsrelay Removing {'awx-web-58d7c867f5-wwls7'} from websocket broadcast list
/usr/lib64/python3.9/asyncio/events.py:80: RuntimeWarning: coroutine 'WebSocketRelayManager.cleanup_offline_host' was never awaited
  self._context.run(self._callback, *self._args)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/usr/bin/awx-manage", line 8, in <module>
    sys.exit(manage())
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 200, in manage
    execute_from_command_line(sys.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/management/commands/run_wsrelay.py", line 168, in handle
    asyncio.run(websocket_relay_manager.run())
  File "/usr/lib64/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in run
    await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in <genexpr>
    await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
RuntimeError: Task got bad yield: <coroutine object WebSocketRelayManager.cleanup_offline_host at 0x7f3bf8df54c0>
2024-04-14 03:04:31,220 WARN exited: wsrelay (exit status 1; not expected)
2024-04-14 03:04:31,220 WARN exited: wsrelay (exit status 1; not expected)
2024-04-14 03:04:33,225 INFO spawned: 'wsrelay' with pid 209

There was a change in 23.4.0 that fixed the wsrelay for IPv6 in OpenShift clusters, and it might also fix your particular issue. There have also been multiple wsrelay fixes applied in subsequent releases, so it would not hurt to upgrade to something even newer.
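
As a side note, the second traceback in your log (`RuntimeError: Task got bad yield`) looks like a separate problem from the URL parsing: the line shown passes a generator expression to `asyncio.gather` as a single argument, whereas `gather` expects each awaitable as its own positional argument. I have not checked how later releases changed that line, so treat the following as a sketch of the general pattern only. The function bodies and host names are made up for illustration:

```python
import asyncio


async def cleanup_offline_host(hostname: str) -> str:
    # Stand-in for the real cleanup work; just yields control once.
    await asyncio.sleep(0)
    return f"cleaned {hostname}"


async def cleanup_all(hostnames):
    # gather() takes awaitables as separate positional arguments,
    # so the generator expression is unpacked with `*`.
    # Passing the generator itself (without `*`) is the pattern
    # visible in the traceback and fails at runtime.
    return await asyncio.gather(
        *(cleanup_offline_host(h) for h in hostnames)
    )


results = asyncio.run(cleanup_all(["awx-web-a", "awx-web-b"]))
print(results)
```

`gather` returns the results in the same order as the awaitables it was given, regardless of which one finishes first.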

Thanks for the hint. I will check this out. :)