Packaging Ansible Collections in PyPI

I co-maintain EnOSlib, a Python library that is a downstream user of Ansible.

We want to stop depending on the big ansible package because of long installation time and high disk usage, but we need a few collections that are not in ansible-core. However, we cannot force our users to use ansible-galaxy. Also, we found out that ansible-galaxy cannot install collections in a virtualenv, which is annoying in our case: we don’t want to mess with ~/.ansible/collections/ on our users’ systems.

As a result, we have experimented with PyPI packaging of Ansible collections, and it’s working well so far. Here is the build script with a more elaborate rationale, and we published a few collections on TestPyPI.

Since we expect possibly heated debates on the subject, we want to open up the discussion:

  • is this a terrible idea? :)
  • is there interest in generalizing PyPI packaging to all collections?
  • if the community disagrees with the idea, is it fine to still publish the few collections we need for our own usage? (making it clear that they are unofficial, etc.)
  • is there a simpler solution that we may have missed?

Note that we have not uploaded any package to the main PyPI repository; we believe this should be discussed in the community first.

I know that people have already tried this; in fact, the early antsibull releases contained code for building ansible as a meta-package that depends on ansible-core plus packages for all included collections. You can find that code here: antsibull/antsibull/build_ansible_commands.py at f6bb31d77bf7e67ace12485f5084626b1c0e3403 · ansible-community/antsibull · GitHub (whether it still worked at that commit is unclear to me; it probably hasn’t been used since the very early versions of antsibull, when it was unclear in which direction the ansible package would go).

I think I’ve even seen someone package their collection on PyPI, but I forgot which one (and whether someone just talked about it or actually did it; I’m not 100% sure anymore).

So the general idea is OK. (For implementing it, I would use the built version of the collection, either downloaded from Galaxy, or built locally with ansible-galaxy collection build. That will also simplify your life, since you don’t have to implement support for build_ignore.)
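To make the "start from the built artifact" suggestion concrete, here is a minimal sketch of unpacking a built collection tarball into the ansible_collections/<namespace>/<name> layout that a Python package would then ship. It assumes the tarball has a MANIFEST.json at its root with a collection_info entry holding namespace and name; the function name unpack_collection and the staging directory are hypothetical, and this is not the code from the build script mentioned above.

```python
import json
import tarfile
from pathlib import Path


def unpack_collection(tarball: Path, staging_root: Path) -> Path:
    """Unpack a built collection under staging_root/ansible_collections/<ns>/<name>."""
    with tarfile.open(tarball) as archive:
        # Assumption: the built artifact carries MANIFEST.json at the archive root.
        manifest = json.load(archive.extractfile("MANIFEST.json"))
        info = manifest["collection_info"]
        target = staging_root / "ansible_collections" / info["namespace"] / info["name"]
        target.mkdir(parents=True, exist_ok=True)
        resolved_target = target.resolve()

        for member in archive.getmembers():
            destination = (target / member.name).resolve()
            # Refuse entries that would land outside the target directory.
            if destination != resolved_target and resolved_target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.isdir():
                destination.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                destination.parent.mkdir(parents=True, exist_ok=True)
                destination.write_bytes(archive.extractfile(member).read())
            # Links and special files are skipped, and file modes are not
            # preserved; a real tool would need to decide how to handle both.
    return target
```

Because the artifact was produced by the collection build, ignored files are already left out, so the packaging step only has to copy what is in the tarball.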

For your other questions:

  • It would probably be best if there’s one official tool for packaging collections on PyPI, to make sure that the same conventions are used everywhere. Whether it’s part of antsibull, it’s a new antsibull-xyz tool, or something else, that’s up for discussion.

  • I would avoid publishing collections to PyPI without assent from the collection maintainers, and I appreciate that you’re starting this discussion first before just doing it :)

  • For a simpler solution, you can always vendor an ansible_collections subdirectory with the collections in it as data in your distribution, and run ansible/ansible-playbook with ANSIBLE_COLLECTIONS_PATH limited to that directory. The main advantage of this approach is that you can be sure which versions of the collections are actually used. Generally, ansible’s collection loader gives collections installed as Python packages the lowest priority, unless you tell it to ignore everything else by forcing ANSIBLE_COLLECTIONS_PATH to a specific value. You likely don’t want ansible to prefer something a user of your program installed over the version of the collection you deliver yourself (if only for compatibility reasons: the version explicitly installed by your user might not be compatible with your roles/playbooks). Also, this does not install the collections in the place where ansible-core finds them by default, so you don’t disturb what users install elsewhere.
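As a rough illustration of the vendoring approach, the sketch below runs ansible-playbook with ANSIBLE_COLLECTIONS_PATH forced to a vendored directory. The helper names isolated_env and run_playbook are hypothetical, and vendored_root is assumed to be the directory that contains the ansible_collections subdirectory shipped with your distribution.

```python
import os
import subprocess
from pathlib import Path


def isolated_env(vendored_root: Path, base_env=None) -> dict:
    """Return a copy of the environment with the collections path overridden."""
    env = dict(os.environ if base_env is None else base_env)
    # Replace any user-provided value so only the vendored collections are configured.
    env["ANSIBLE_COLLECTIONS_PATH"] = str(vendored_root)
    return env


def run_playbook(playbook: Path, vendored_root: Path) -> int:
    """Run ansible-playbook against the vendored collections; returns the exit code."""
    result = subprocess.run(
        ["ansible-playbook", str(playbook)],
        env=isolated_env(vendored_root),
        check=False,
    )
    return result.returncode
```

Building the environment in a separate function keeps the override easy to check without having Ansible installed, and leaves the caller's own os.environ untouched.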

Then a thought about dependencies (you mentioned that in discovery / ansible_collections_packaging · GitLab): I would suggest not installing any of the collection’s Python dependencies by default, but letting the user install them as extras. Users of the package can then decide for themselves whether to install them, depending on whether they actually want to run the content on localhost or on a remote. In general it’s probably best to have at least two extras: one that installs everything, and one that installs only the requirements needed for things that always run on the controller (i.e. everything that’s not a module).
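A possible shape for the two extras, as a sketch only: the extra names "controller" and "all", the helper build_extras, and the requirement names are all placeholders, not an agreed convention. The resulting dictionary is the kind of value one would pass to setuptools as extras_require, with install_requires left empty.

```python
def build_extras(controller_requirements, module_requirements):
    """Build an extras mapping: controller-only requirements, and everything."""
    controller = sorted(set(controller_requirements))
    everything = sorted(set(controller_requirements) | set(module_requirements))
    return {"controller": controller, "all": everything}


# Placeholder requirement names, for illustration only.
EXTRAS = build_extras(
    controller_requirements=["example-controller-lib"],
    module_requirements=["example-module-lib"],
)
```

With something like this, a plain install pulls in no collection dependencies, while a user who runs modules on localhost could ask for the "all" extra explicitly.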

Thanks for the feedback and for the pointers!

We can explore the vendoring option for collections that we directly require. But I think our users would still need to be able to install additional collections they use in their own code. In any case, overriding ANSIBLE_COLLECTIONS_PATH is a very neat idea to properly isolate collections.

Since you mention Python dependencies, I realize there’s an ambiguity I didn’t see before: is there a way to tell which dependencies are needed on the controller host running Ansible, and which dependencies are needed on the target hosts?
As far as I understand, the “EE requirements” such as community.docker/meta/ee-requirements.txt at main · ansible-collections/community.docker · GitHub only apply to the controller host.