Mirroring / proxy caching galaxy

This seems to come up from time to time when there are problems with galaxy (see here and here for examples). So it might be worth opening a topic on mirroring / proxy caching galaxy.

I don’t have a solution myself because we’re quite happy using the Ansible Community Package (ACP). But that doesn’t help people who prefer to install collections directly instead of using ACP, or who have to because a collection they use isn’t part of it.

So I thought it might be worth having a dedicated discussion on this. Personally, I think having your own solution to mirror or cache galaxy has some advantages:

  • makes your environment more resilient because it can keep working even if galaxy is down
  • can improve security because not all systems need internet access, only the mirror / proxy cache
  • reduces load on galaxy

Maybe we can collect possible solutions here. If nothing else, we have a topic now that we can link the next time galaxy is down :laughing:

edit: BTW I don’t think that installing from GitHub is a good idea.

I didn’t have a closer look yet, but stumbled on this recently: Manage Ansible Collections with JFrog Artifactory.

If you’re using JFrog Artifactory anyway for other stuff, this might be a possible solution.

I’ve written a tool basically for this purpose:

You just have to use ansible-galaxy collection download into the artifacts dir of your choice, and away you go.
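As a rough sketch of that download step: the directory name ./artifacts and the collection community.general are placeholders I picked for illustration, not something from the post above.

```shell
# Hypothetical artifacts directory that the mirror tool is pointed at.
ARTIFACTS_DIR="./artifacts"
mkdir -p "$ARTIFACTS_DIR"

# Download the collection and its dependencies as tarballs without installing them.
# community.general is only an example collection name.
ansible-galaxy collection download --download-path "$ARTIFACTS_DIR" community.general
```

Re-running the same command later with other collection names should add more tarballs to the same directory.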

Also, for CI purposes, people should do something similar to the following, which caches between runs on GitHub Actions:

Also, fwiw, I’ve always said that installing a collection every time before an Ansible run is a terrible idea. Users should either create something like an EE (execution environment), which effectively vendors/bundles the collections, or install them into a collections/ dir adjacent to their playbook content and check them into their own repo, or wherever they store their content.
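The second option (a collections/ dir next to the playbooks) could look roughly like this. It is only a sketch: it assumes you run it from the directory that holds your playbooks and that a requirements.yml listing your collections already exists there.

```shell
# Directory adjacent to the playbooks; ansible-playbook looks here for collections.
COLLECTIONS_DIR="./collections"
mkdir -p "$COLLECTIONS_DIR"

# Install everything listed in requirements.yml (assumed to exist) into that directory.
# The collections end up under ./collections/ansible_collections/.
ansible-galaxy collection install -r requirements.yml -p "$COLLECTIONS_DIR"
```

After that, the collections/ directory can be committed alongside the playbooks, so a normal run no longer has to contact galaxy at all.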

It would be neat if you could give ansible-galaxy a list of servers to try, rather like $PATH acts for a shell looking for commands. “Try my amanda instance first, my fallback amanda instance second, and galaxy.ansible.com as a last resort” sort of thing. Could be a CLI option, in a requirements.yml, a [galaxy] section of an ansible.cfg, or my favorite: environment variable. Hmm.

This is already possible via ansible.cfg or env vars. CLI flags are only for explicitly providing a single server.

[galaxy]
server_list=community,amanda

[galaxy_server.community]
url=https://galaxy.ansible.com/api/

[galaxy_server.amanda]
url=http://galaxy.internal.corp:5000/api/

That will automatically try the community server, and then fall back to amanda. This already exists due to the ability to install collections from multiple servers where they don’t exist on all of them. If a failure happens on the first, the next is tried. My example is reversed from yours, but same idea, it’s the server_list that identifies order.

Env vars would look like:

ANSIBLE_GALAXY_SERVER_LIST=community,amanda
ANSIBLE_GALAXY_SERVER_COMMUNITY_URL=https://galaxy.ansible.com/api/
ANSIBLE_GALAXY_SERVER_AMANDA_URL=http://galaxy.internal.corp:5000/api/
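To get the order asked for earlier in the thread (internal mirror first, public galaxy as the last resort), the names in the server list just need to be swapped. A sketch for a shell session, reusing the same example URLs as above (the internal URL is a placeholder):

```shell
# Internal mirror first, public galaxy as the fallback.
export ANSIBLE_GALAXY_SERVER_LIST=amanda,community

# URLs are the same examples as in the ansible.cfg snippet; adjust to your setup.
export ANSIBLE_GALAXY_SERVER_AMANDA_URL=http://galaxy.internal.corp:5000/api/
export ANSIBLE_GALAXY_SERVER_COMMUNITY_URL=https://galaxy.ansible.com/api/
```

Any ansible-galaxy collection install run from that shell afterwards should then try the servers in that order.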

Well this is good to know. I wrote galactory because it seemed like this would never happen!

Sounds like we should consider this in the collection template. Thanks!

I’ve never got the hang of EEs. But I’ve never had a closer look at them, either. As I’ve said: We’re quite happy with using the ACP.

We’re using a (shared) Python venv where ACP is installed, and that works fine for us. We don’t re-create this venv for every run, but update it from time to time. Even if the “process” is more or less: Oh, there’s a new ACP release… let’s update the venv and see if everything still works :laughing:

Sounds similar to using an EE. At least, if the goal is to not install all collections and their dependencies every time you run a playbook.

Of course, this only works if you don’t need any collections that aren’t part of ACP.

I think that another widespread product to provide artifacts is Sonatype Nexus. But it looks like they don’t have a solution for galaxy / ansible collections. The only thing I’ve found is this. But it doesn’t look like this resulted in anything.

If anyone knows more, please let us know!

fwiw, I think the name EE (execution environment) slightly overcomplicates the situation.

In reality it’s just a docker container, with ansible-core and collections installed in it. So your venv solution is really pretty similar in the end.

There is some other stuff that comes along with EEs, primarily in the way of configuration and such, to work within the AAP platform. If not for the requirements to handle some of the AAP or even AWX needs, it would just be a simple docker image. ansible-builder abstracts away most if not all of the environment specific stuff, but ultimately, it’s just making a Containerfile and building an image, just with some added stuff.

So just like any container image, mount your playbook dir in, and run ansible-playbook from within the docker container.
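Something along these lines, as a sketch only: the image name, the /work mount point and the function name are all made up for illustration, and your image may need extra mounts (SSH keys, inventory, etc.).

```shell
# Hypothetical image name; replace with the image you built, e.g. via ansible-builder.
EE_IMAGE="registry.example.com/my-ee:latest"

# Small wrapper: mount the current playbook dir into the container and run
# ansible-playbook there, passing all arguments through unchanged.
run_playbook_in_container() {
  docker run --rm \
    -v "$PWD":/work \
    -w /work \
    "$EE_IMAGE" \
    ansible-playbook "$@"
}
```

Called from the playbook directory, for example as run_playbook_in_container -i inventory site.yml (both file names are placeholders).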

Somewhat OT, but maybe you might also be interested in Ansible Community Status page & Notifications.

Oops just noticed the forum thread here but I also posted a comment in GitHub:

I’ve just learned (via Self Hosted Galaxy Server Deployment) about the galaxy-operator.

I didn’t have a closer look yet, but thought this might be worth mentioning here.

Found an old project which is very similar to Amanda, referenced earlier in this thread: GitHub - jctanner/galaxy-mirror: caching mirror for galaxy.ansible.com api