Ansible Galaxy (https://galaxy.ansible.com/) taking a very long time to respond

Hi there. This morning it looks like retrieving collections from https://galaxy.ansible.com/ is taking a very long time. It does not time out, but something seems to be going awry in the backend. The UI at https://galaxy.ansible.com/ also takes longer than normal to populate, but the real problem is running ansible-galaxy collection install.

See for example:

$ time ansible-galaxy  collection install --force community.windows
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/community-windows-3.0.0.tar.gz to /home/someone/.ansible/tmp/ansible-local-2631j9c8x4yq/tmpjs6s_wro/community-windows-3.0.0-v9_bzouf
Installing 'community.windows:3.0.0' to '/home/someone/.ansible/collections/ansible_collections/community/windows'
community.windows:3.0.0 was installed successfully
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/ansible-windows-3.1.0.tar.gz to /home/someone/.ansible/tmp/ansible-local-2631j9c8x4yq/tmpjs6s_wro/ansible-windows-3.1.0-hpcn658p
Installing 'ansible.windows:3.1.0' to '/home/someone/.ansible/collections/ansible_collections/ansible/windows'
ansible.windows:3.1.0 was installed successfully

real    6m3.967s
user    0m1.187s
sys     0m0.267s

This normally takes less than 10 seconds, but it is currently taking over 6 minutes. I've tested this on my desktop as well as via VPN on a different network and the results are the same, so I don't believe it's a network problem on my side.

2 Likes

Login at https://galaxy.ansible.com is also down for me.

Since today, our CI runs into timeouts (10m) when installing Ansible collections. :frowning:
Opening https://galaxy.ansible.com/api/v3/collections/community/general/ also takes a very, very long time.
The https://galaxy.ansible.com webpage takes about 30s to load and then shows only a white page with a navbar.
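
As a stopgap in our CI, wrapping the install in a retry loop helps ride out the slow responses instead of burning the whole job timeout on a single attempt. A minimal sketch (the per-attempt cap and the requirements file name are just examples; "timeout" here is the GNU coreutils command):

# Retry the install a few times with increasing backoff,
# capping each attempt at 5 minutes.
for attempt in 1 2 3; do
  timeout 300 ansible-galaxy collection install -r requirements.yml && break
  echo "Attempt ${attempt} failed or timed out; retrying..."
  sleep $((attempt * 30))
done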

In the meantime things seem to have degraded and I mostly see 500 errors like:

ERROR! Error when getting collection version metadata for community.crypto:2.26.3 from default (https://galaxy.ansible.com/api/) (HTTP Code: 500, Message: Internal Server Error Code: Unknown)

I’m also running into this today. I’m using a requirement file to pin a number of collections:

collections:
  - name: ansible.netcommon
    source: https://galaxy.ansible.com
    version: 5.3.0
  - name: ansible.posix
    source: https://galaxy.ansible.com
    version: 1.5.4
  - name: ansible.utils
    source: https://galaxy.ansible.com
    version: 2.12.0
  - name: community.crypto
    source: https://galaxy.ansible.com
    version: 2.18.0
  - name: community.general
    source: https://galaxy.ansible.com
    version: 8.5.0
  - name: community.postgresql
    source: https://galaxy.ansible.com
    version: 3.4.0

When I use that (ansible-galaxy collection install -r), it will always fail at some point, and even though the process fails late, after the failure there are no collections installed at all :expressionless:

Reading requirement file at '/Users/visser/galaxy_requirements.yml'
Starting galaxy collection install process
Process install dependency map
Initial connection to galaxy_server: https://galaxy.ansible.com
Found API version 'v3, pulp-v3, v1' with Galaxy server default (https://galaxy.ansible.com/api/)
Opened /Users/visser/.ansible/galaxy_token
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/netcommon/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/netcommon/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/index/ansible/netcommon/versions/?limit=100&offset=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/posix/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/posix/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/utils/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/utils/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/crypto/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/crypto/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/general/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/general/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/index/community/general/versions/?limit=100&offset=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/postgresql/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/community/postgresql/versions/?limit=100
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/netcommon/versions/5.3.0/
Calling Galaxy at https://galaxy.ansible.com/api/v3/collections/ansible/utils/
ERROR! Error when getting the collection info for ansible.utils from default (https://galaxy.ansible.com/api/) (HTTP Code: 503, Message: Service Unavailable Code: Unknown)

I found that manually installing one collection at a time works better in this situation. It is just as slow, but at least once an install succeeds, that collection is actually there (see the scripted version after the transcript below):

visser@GA1 ~$ ansible-galaxy collection install ansible.netcommon==5.3.0
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/ansible-netcommon-5.3.0.tar.gz to /Users/visser/.ansible/tmp/ansible-local-12471exilhp0q/tmpuek15fpj/ansible-netcommon-5.3.0-n1cu13sn
Installing 'ansible.netcommon:5.3.0' to '/Users/visser/venv/lib/python3.13/site-packages/ansible_collections/ansible/netcommon'
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/ansible-utils-6.0.0.tar.gz to /Users/visser/.ansible/tmp/ansible-local-12471exilhp0q/tmpuek15fpj/ansible-utils-6.0.0-xsbegy7r
ansible.netcommon:5.3.0 was installed successfully
Installing 'ansible.utils:6.0.0' to '/Users/visser/venv/lib/python3.13/site-packages/ansible_collections/ansible/utils'
ansible.utils:6.0.0 was installed successfully
(venv) visser@GA1 ~$ ansible-galaxy collection list

# /Users/visser/venv/lib/python3.13/site-packages/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 5.3.0
ansible.utils     6.0.0
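
To script the one-at-a-time approach, a small per-collection retry loop works too. A rough sketch; the name/version pairs simply mirror my requirements file above:

# Install pinned collections one by one, retrying each until it succeeds.
while read -r name version; do
  until ansible-galaxy collection install "${name}:${version}"; do
    echo "Retrying ${name}:${version}..."
    sleep 30
  done
done <<'EOF'
ansible.netcommon 5.3.0
ansible.posix 1.5.4
ansible.utils 2.12.0
community.crypto 2.18.0
community.general 8.5.0
community.postgresql 3.4.0
EOF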

This issue seems to have knocked a few of our AWX installs offline (504 gateway errors).

At first it was just slow for me too; now it errors out:

[WARNING]: - ansistrano.deploy was NOT installed successfully: None (HTTP Code:
500, Message: Internal Server Error)

BTW: as a workaround for the collection-install timeouts, you can point your requirements.yml at the GitHub repos instead. Example:

---

collections:
  - 'git+https://github.com/ansible-collections/community.general.git,10.7.0'
  - 'git+https://github.com/ansible-collections/ansible.posix.git,2.0.0'

1 Like

For those who may be unaware, it is possible to host your own on-prem instance of Galaxy via the Ansible Galaxy operator (https://github.com/ansible/galaxy-operator), and it works well. I have an Ansible playbook that takes a set of collections, downloads them, and then uploads them to the private mirrors. (You can do some native content mirroring, but that uses Pulp workers, which are problematic for my implementation.)
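
For reference, here is a stripped-down sketch of what such a playbook can look like. The server name mymirror1, the collection list, and the paths are all placeholders rather than my exact implementation, and it assumes the mirror is already configured as a Galaxy server in ansible.cfg with a publish token:

---
# Sketch: download collections (with dependencies) from upstream,
# then publish the tarballs to a private Galaxy mirror.
- hosts: localhost
  gather_facts: false
  vars:
    mirror_collections:
      - community.general
      - ansible.posix
    download_dir: /tmp/collection_mirror
  tasks:
    - name: Download collection tarballs from upstream
      ansible.builtin.command: >-
        ansible-galaxy collection download
        {{ mirror_collections | join(' ') }} -p {{ download_dir }}

    - name: Find the downloaded artifacts
      ansible.builtin.find:
        paths: "{{ download_dir }}"
        patterns: "*.tar.gz"
      register: artifacts

    - name: Publish each artifact to the private mirror
      ansible.builtin.command: >-
        ansible-galaxy collection publish {{ item.path }} --server mymirror1
      loop: "{{ artifacts.files }}"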

Once you have a mirror established, if you’re using AWX, you can go to Settings > Job Settings, then at the bottom under “Environment variables for galaxy commands” you’d add some additional environment variables like this:

{
  "ANSIBLE_GALAXY_IGNORE": "true",
  "ANSIBLE_GALAXY_SERVER_MYMIRROR1_IGNORE_CERTS": "true",
  "ANSIBLE_GALAXY_SERVER_MYMIRROR1_URL": "http://mygalaxy1.company.com/api/galaxy/content/mirrorname/",
  "ANSIBLE_GALAXY_SERVER_MYMIRROR2_IGNORE_CERTS": "true",
  "ANSIBLE_GALAXY_SERVER_MYMIRROR2_URL": "http://mygalaxy2.company.com/api/galaxy/content/mirrorname/",
  "ANSIBLE_GALAXY_SERVER_LIST": "mymirror1,mymirror2,main",
  "ANSIBLE_GALAXY_SERVER_MAIN_URL": https://galaxy.ansible.com
}

This will ensure AWX projects look at those mirrors in the order given by ANSIBLE_GALAXY_SERVER_LIST. Be warned: it will also check the "main" server as configured here, so you'll still be subject to outages like this one. You can exclude it and it should still work; my plan is to eventually drop it from the list, but that will take some time.
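
Outside of AWX, the same fallback order can be expressed directly in ansible.cfg. A sketch using the same hypothetical mirror URLs:

[galaxy]
server_list = mymirror1, mymirror2, main

[galaxy_server.mymirror1]
url = http://mygalaxy1.company.com/api/galaxy/content/mirrorname/

[galaxy_server.mymirror2]
url = http://mygalaxy2.company.com/api/galaxy/content/mirrorname/

[galaxy_server.main]
url = https://galaxy.ansible.com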

FYI as well, as I encountered this recently myself: AWX keeps a Galaxy cache in the awx-ee container at ~/.ansible/galaxy_cache/api.json, including a record of the sha256 sums of collections. This can sometimes cause issues if an older version is cached there (ERROR! Mismatch artifact hash with downloaded file), so you can either restart the awx-ee container or clear that file. (I learned this via "Various easy ways to try Ansible Galaxy NG with Docker and Kubernetes" on kurokobo.com.)
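
On a Kubernetes-based install, clearing that cache can be a one-liner. The namespace, deployment, and container names below are assumptions; adjust them for your cluster:

# Remove the cached API metadata inside the awx-ee container
# (deployment/container names vary between AWX operator versions).
kubectl exec -n awx deploy/awx-task -c awx-ee -- sh -c 'rm -f ~/.ansible/galaxy_cache/api.json'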

Note that there is some inconsistency in how the git tags are set up: some collections use the v prefix (at least ansible.netcommon and ansible.utils) while others use the bare version. In my case, this works:

collections:
  - git+https://github.com/ansible-collections/ansible.netcommon.git,v5.3.0
  - git+https://github.com/ansible-collections/ansible.posix.git,1.5.4
  - git+https://github.com/ansible-collections/ansible.utils.git,v2.12.0
  - git+https://github.com/ansible-collections/community.crypto.git,2.18.0
  - git+https://github.com/ansible-collections/community.general.git,8.5.0
  - git+https://github.com/ansible-collections/community.postgresql.git,3.4.0
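
If you're not sure which tag style a given repo uses, you can list its tags before pinning:

# List the repo's tags to check whether they carry a "v" prefix
git ls-remote --tags https://github.com/ansible-collections/ansible.netcommon.git | tail -n 5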
1 Like

Just a heads up, and I get that users are attempting workarounds, but keep in mind that git repos for collection deps should only be used as a temporary workaround. They are inefficient to use, and they lack many of the capabilities that an actual Galaxy API provides.

So once this is resolved, I strongly recommend switching back, or at a minimum ensuring that they are restricted to CI and do not make it into your production workflows.

4 Likes

Also, I cannot speak authoritatively about this as it's not managed by my team, but I know the Galaxy team is aware of this issue and is working on, and may have already implemented, some workarounds to lessen the impact of what is going on.

So the short of it is that those who need to be aware are, and they are working toward figuring out the root cause of the issue, but it may be temporarily remediated by some recent stopgap changes.

2 Likes

I can confirm that the galaxy team is aware and is looking into the cause and possible solutions. +1 to sivel above - we’ve implemented a possible temporary remediation as we continue investigations, so do let us know here if you’re still seeing the issue. Logs and error statements, as always, are much appreciated :slight_smile:

We will reply to this post with any updates as we go.

5 Likes

Thank you, @sivel and @ebock. For now, I have marked @ebock's post as the solution to make it more visible to people happening upon this thread.