[AWX 20+] Cannot extend base image anymore, as it seems network is unreachable since the migration to CentOS 9

Hi,

I recently opened a bug (which is not really a bug, more a question) and I was advised to post it here as it seems more relevant.

Here is the summary:

I’ve run into some odd behavior while migrating from AWX 17 to 21.11.0.

We do not use the AWX base image directly in the Operator; we first extend it with some packages (to suit my client’s needs).
In 17, we defined our own YUM repo and installed those packages from it. Everything went smoothly and we could build our own image without any problem.

We decided to migrate to the latest version (21.11.0 at the time of writing), and our build pipeline then failed. At first, we thought it was because the new internal mirrors of the CentOS 9 repos were not up, but that’s not it: no external URL is reachable.

I tried to diagnose the problem by logging into the image and performing some network investigations, but as the image does not have any network tools (ping, ip, host, dig, tracepath …) it’s very hard to tell what’s wrong. I checked resolv.conf, the hosts file, the SELinux config, access.conf, etc., and nothing obvious came out of it.
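When the image ships without ping, dig, or ip, the Python interpreter that AWX itself depends on can stand in for them. The following is only a minimal sketch: the function names `check_dns` and `check_tcp` are invented for illustration, and the point is simply to test name resolution and TCP connectivity separately so that a DNS failure is not mistaken for a routing failure.

```python
import socket


def check_dns(hostname):
    """Resolve hostname; return (True, sorted addresses) or (False, error text)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    # Each entry's sockaddr starts with the address string; deduplicate them.
    return True, sorted({info[4][0] for info in infos})


def check_tcp(host, port, timeout=3.0):
    """Open a TCP connection; return (True, "connected") or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers refused, unreachable and timed-out connections
        return False, str(exc)
```

Inside the container, calling `check_dns()` with the repository hostname and then `check_tcp()` with one of the returned addresses and port 443 would show which of the two steps fails.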

I checked all the versions, and the problem seems to appear in version 20 (with the switch to CentOS 9).
Again, I may be missing something obvious, but I carefully read the docs (AWX + CentOS), browsed the current issues, and couldn’t find the slightest clue.

For now, as I’m at an early stage, I have just dropped the installation of the additional packages, but as they are security related, I won’t be able to go to production without them.

Everything is detailed here:
https://github.com/ansible/awx/issues/13543

Thanks in advance for any piece of information, advice or experience on that.

Hi,

getaddrinfo() thread failed to start

What version of Docker are you using?

Sometimes an old Docker version causes a similar issue, since the newer glibc shipped in CentOS 9 does not work on old Docker.
I’d recommend trying again with a newer Docker.

If your issue still exists with the latest Docker, you should start your investigation with a plain CentOS Stream 9 image instead of AWX.

Hi,

Nice suggestion, I will try that right away and give you an updated status.

Thanks !

Alas, I have reached the same conclusion.

I was initially working with a RHEL 7 machine with Docker 18.03.
My latest test was on a RHEL 8, with Podman 2.0.5.

Ok, I will investigate directly with a raw CentOS 9 image and see what I can do with that.

Thanks for the reply anyway.

Hi,

Both Docker 18.03 and Podman 2.0.5 are too old, unfortunately.
I don’t think such an old Docker or Podman can handle the security-hardened syscall wrappers implemented in glibc 2.34.
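The curl message `getaddrinfo() thread failed to start` suggests that thread creation itself fails inside the container. One way to check that in isolation, without curl or DNS involved, is to start a thread from Python. This is a minimal sketch, and `can_start_thread` is a name made up for illustration. A commonly reported cause with glibc 2.34 is that it creates threads through the `clone3` syscall, which older default seccomp profiles reject; that explanation is an assumption here and has not been confirmed in this thread.

```python
import threading


def can_start_thread():
    """Return True if a new OS thread can be created and run, else False."""
    ran = threading.Event()
    worker = threading.Thread(target=ran.set)
    try:
        worker.start()  # raises RuntimeError ("can't start new thread") on failure
    except RuntimeError:
        return False
    worker.join(timeout=5)
    return ran.is_set()
```

If this returns False inside the CentOS Stream 9 based image but True on the host, the container runtime is the likely culprit. Running the container once with `--security-opt seccomp=unconfined` (accepted by both Docker and Podman) is a way to test the seccomp theory; it is meant as a diagnostic only, not as a setting to keep in a build pipeline.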

Hey,

Thanks for your input. I will see what my options are here, as my client has a fixed path regarding package upgrades.
But you are right, it has nothing to do with AWX; it’s more a matter of CentOS 9 and the container runtime.

I guess we can park this thread for now, but I will post the results of my future tests here and in the “bug” I opened.

Thanks again for your valuable input @kurokobo.

Hi again,

OK, I found an alternate repo with a more recent podman package (4.2.0), but unfortunately the result remains unchanged:

[root@max-rhel8 ~]# podman --version
podman version 4.2.0
[root@max-rhel8 ~]# podman run --rm -it --entrypoint=bash d7456a00e6af
bash-5.1$ curl -kv https://artifactory.internal.com/artifactory

But you’re right, I’ll dig deeper with a base CentOS 9 image.

Anyway, thanks again for your suggestions.

Hi,

the result remains unchanged

I see that your error has changed.

  • On old Docker / Podman:

curl: (6) getaddrinfo() thread failed to start

  • On newer Podman:

curl: (6) Could not resolve host: artifactory.internal.com

So I think your initial issue has been solved by the newer Podman, but there is now a different issue.
I guess it’s a DNS-related issue. Try double-checking the DNS settings inside the container and around Podman.
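As a starting point for that check, the resolver configuration the container actually sees can be compared with the host's. The sketch below is illustrative only: `parse_resolv_conf` and `read_resolv_conf` are hypothetical helper names, and the parser handles just the `nameserver`, `search`, and `options` keywords.

```python
def parse_resolv_conf(text):
    """Extract nameserver, search and options entries from resolv.conf content."""
    result = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        # Drop comments: anything after '#' or ';' is ignored here.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        keyword, values = parts[0], parts[1:]
        if keyword == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif keyword == "search":
            result["search"] = values  # only the last search line is used
        elif keyword == "options":
            result["options"].extend(values)
    return result


def read_resolv_conf(path="/etc/resolv.conf"):
    """Parse the resolver configuration file at the given path."""
    with open(path, encoding="utf-8") as handle:
        return parse_resolv_conf(handle.read())
```

Running `read_resolv_conf()` on the host and again inside the container shows whether the nameserver that knows the internal domain is visible in both places. If it is missing in the container, `podman run` accepts a `--dns` option to set one explicitly, which may be enough to confirm the diagnosis.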