Place for common (shared) code for collections

Right now all shared code (and docs fragments, which I’ll silently include here) for collections lives in ansible-core itself: https://github.com/ansible/ansible/tree/devel/lib/ansible/module_utils contains code that can be used by both modules and plugins, and all other parts of ansible-core (that aren’t marked as private) can be used by plugins.

The problem with adding new shared code to ansible-core is that it takes some time (potentially years) until all collections can use it, since they usually support multiple ansible-core versions (some even still support ansible-base 2.10 and Ansible 2.9). Depending on the kind of shared code (for example new module utils coming with docs fragments), collections have to wait until they have dropped support for the last ansible-core version that did not yet have this shared code, which can take many years.

This would be a lot easier if there were a collection that contains such new code. Other collections can depend on it; bumping a minor version of such a collection (and getting end-users to upgrade to the new version) is a lot easier than getting end-users to upgrade their ansible-core version (which always comes with breaking changes as well when you get new features).

To avoid this being too theoretical, let’s consider an example: some plugin_utils and a docs fragment which allow inventory plugins to explicitly include/exclude specific hosts by Jinja2 conditions. I’ve started writing code for that in community.docker (see the PR), but that’s something that could be used in many other inventory plugins in many other collections as well. Since making a random collection depend on community.docker just for that piece of code sounds like a bad idea, the best place right now for adding such a feature would be ansible-core. But then this could land in ansible-core 2.17 at the earliest (2.16 already had its feature freeze), and community.docker supports multiple older ansible-core versions (2.11+ right now), so it would take many years until community.docker would be able to use such functionality from ansible-core itself. So right now the only “good” choice is adding that code to community.docker, and the only “good” choice for other collections that want to use that code is to copy (!) it so they don’t have to depend on community.docker.

There are some choices for such a generic collection that contains shared code. For example, there’s already ansible.utils. But ansible.utils already contains multiple end-user facing tools (plugins and modules). When quickly looking at this idea on IRC (#ansible-community) / Matrix (#community:ansible.com), @briantist and I agreed that such a shared-code collection should not contain anything for end-users (i.e. no plugins, modules, and roles). The basic idea is to avoid complications due to the needs of end-user things, like removals of features that turned out to be not so good. These require a major version bump, which quickly causes problems.

The major problem with having a common collection is versioning. If there are only ever new 1.x.y releases and the collection sticks to semantic versioning, all collections depending on it can depend on >=1.x.y,<2.0.0 (for some specific minimum 1.x.y depending on the collection). So for any set of collections that depends on the shared code collection, you can always pick a version of the shared code collection that makes all of them happy. But as soon as we decide to bump the major version, dependency hell can easily break loose. foo.bar needs 1.x.y of our common collection, baz.bam needs 2.x.y, and foo.bam is happy with both 1.x.y and 2.x.y. With such dependencies there will be no way to install all three of foo.bar, baz.bam, and foo.bam at the same time… So avoiding new major releases is the best strategy, combined with forcing users of this collection to be as nice as possible to all other users. For collections included in Ansible we already have some rules which cover this pretty well: Ansible community package collections requirements — Ansible Documentation (Basically they say that if you screw up and do not support the latest major release when a new major Ansible release comes up, your collection will be thrown out. This is currently mainly aimed at ansible.netcommon, a shared requirement of most network collections, that actually caused such problems in the past.)
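To make the conflict concrete, here is a minimal sketch in Python. It is not how ansible-galaxy resolves dependencies; the collection names are the placeholders from the paragraph above, while the list of releases and the helper names (`parse`, `satisfies`, `compatible_versions`) are made up for illustration.

```python
# Hypothetical sketch: which releases of a shared collection satisfy
# every dependent collection at once?

def parse(version):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, minimum, below):
    """True if minimum <= version < below (a '>=minimum,<below' range)."""
    return parse(minimum) <= parse(version) < parse(below)


def compatible_versions(available, requirements):
    """Return the releases accepted by every requirement."""
    return [
        version
        for version in available
        if all(satisfies(version, low, high) for low, high in requirements.values())
    ]


# Made-up releases of the shared collection.
available = ["1.0.0", "1.3.0", "1.4.2", "2.0.0", "2.1.0"]

# Everyone depends on 1.x.y: a common version is easy to find.
only_v1 = {
    "foo.bar": ("1.3.0", "2.0.0"),
    "foo.bam": ("1.0.0", "2.0.0"),
}
print(compatible_versions(available, only_v1))

# After a major bump: baz.bam needs 2.x.y, foo.bar still needs 1.x.y,
# and foo.bam accepts both.
after_bump = {
    "foo.bar": ("1.3.0", "2.0.0"),
    "baz.bam": ("2.0.0", "3.0.0"),
    "foo.bam": ("1.0.0", "3.0.0"),
}
print(compatible_versions(available, after_bump))
```

The first call finds releases that work for everyone; the second returns an empty list, which is the situation where foo.bar, baz.bam, and foo.bam cannot all be installed together.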

One other topic is: what to add to this collection? If every random feature someone wants to have is added, we end up with a huge mess that grows too large and sooner or later causes problems (even if it “just” means that parts of the collection are basically unmaintained). So IMO some strict rules for adding new content (or in other words: features) are needed.

I don’t want this post to end up as a large discussion of all possible things, so I’m going to stop here. Let’s discuss what everyone thinks of:

  1. having such a collection (or multiple such collections?),
  2. which rules should apply (for adding features, who should maintain it, …),
  3. … anything else you can come up with?

  1. I like the idea of this collection. It fills a need for very central functionality that takes a long time to be accepted in core (if ever) and then a long time before all core versions support it. It can act as a bridge: a first stop for those new shared features, or a destination for new things in core that aren’t backported.
  2. This is a tough one; everyone who doesn’t work for Red Hat is a volunteer, but this collection clearly needs strong guidance. I’m interested in participating in maintenance but like so many of us, time is an ever-shrinking resource. It might be nice if we got the backing of some core team members (both in and out of the steering committee) on the idea, especially when it comes to (loose) coordination, but that may not be possible. Like any collection, it will be up to the maintainers to decide what’s a good fit or not. If we wanted to make this really special, we could declare that it’s “owned” by the steering committee and decide that additions and changes go to a vote. I’m not overly concerned with slowing down the introduction of new things because of how carefully we would need to make any breaking changes or removals.
  3. One thing we might consider is doing more prereleases than we would normally see in collections. Depending collections could do more extensive testing of new and changed features that way (either locally or in their own prereleases).

Not to muddy the topic too much, but I’ve been tossing around the idea of a collection for experimental features for a long time, I know Felix and I have chatted about it a bit.

I wonder if it might be a good counterpart to this proposal though: one thing we’re worried about is that the proposed collection needs to be very stable.

Maybe instead of prereleases which can be a bit of a pain to use, new potential things could live in an explicit experimental collection first.

I had thought of that as more for end-user facing features for probably advanced users, and it probably fits better for that. Not sure how useful it would end up being for internal things a collection intends to depend on.


This is a good topic, and I’ve been using things like requests and other pieces from ansible.utils more and more. I think a good option for including in this is a central way to auth to the different products like awx/galaxy_ng console.redhat/galaxy.ansible for modules, but that may be specific to my use case,

It’s a tricky situation when moving things; I could see it being similar to the six import for translating?
from ansible.module_utils.six import PY2, PY3


This is a good topic, and I’ve been using things like requests and other pieces from ansible.utils more and more.

Do you mean ansible.utils, or ansible-core’s module_utils? If you mean ansible.utils, which parts of it exactly do you mean? I couldn’t find anything like requests in the reusable code of ansible.utils (plugins/module_utils/ and plugins/plugin_utils/).

I think a good option for including in this is a central way to auth to the different products like awx/galaxy_ng console.redhat/galaxy.ansible for modules, but that may be specific to my use case,

This sounds like a very specific thing, since I guess 99.9% of all modules never need to talk to awx/galaxy_ng/… Here a more specialized collection is probably more useful if that collection can be used by multiple other collections (I guess that “multiple” will be at most a handful of collections in this case, as this seems to be very specialized).

It’s a tricky situation when moving things; I could see it being similar to the six import for translating?
from ansible.module_utils.six import PY2, PY3

What kind of moving things do you mean? For Python 2/3 compatibility there’s already six coming with ansible-core. Providing such redirection facilities in a universal utilities collection only makes sense for libraries which are used by a large number of collections.

Here are some more thoughts about point 2. from my initial post.

Rules for adding new features

IMO new features should only be added if there is a consensus (among which folks is TBD, see below) that the features will be useful for a large set of collections. My initial example of providing filtering to inventory plugins would be such a case, since there are many collections (for very different use-cases) out there that provide inventory plugins (just scroll down the Index of all Inventory Plugins in the Ansible documentation, for example). Having a common functionality for filtering would allow many of these inventory plugins to provide a standardized interface for filtering.

This improves life for maintainers of these plugins / collections, since they don’t each have to implement their own filtering (they can of course offer other filters that use features of the APIs, like amazon.aws.aws_ec2’s exclude_filters). It also improves life for end-users, who, once they have figured out how this filtering works with one inventory plugin, can use that knowledge to configure filtering for another inventory plugin. This is similar to the constructed docs fragment / inventory plugin mixin provided by ansible-core (without the downside of having to require a very new ansible-core for your collection).
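As a rough illustration of what such a shared filtering helper could look like, here is a minimal sketch. It assumes an ordered list of include/exclude rules where the first matching rule decides. Plain Python callables stand in for the Jinja2 conditions to keep the example self-contained, and all names (`keep_host`, the rule layout, the host variables) are hypothetical and not the API of any existing collection.

```python
# Hypothetical sketch of a shared host filter for inventory plugins.
# Real conditions would be Jinja2 expressions; plain callables stand in here.

def keep_host(host_vars, rules, default_include=True):
    """Return True if the host should be kept in the inventory.

    Rules are checked in order; the first rule whose condition matches
    decides. If no rule matches, default_include applies.
    """
    for action, condition in rules:
        if action not in ("include", "exclude"):
            raise ValueError("unknown action: %s" % action)
        if condition(host_vars):
            return action == "include"
    return default_include


rules = [
    ("exclude", lambda v: v.get("state") != "running"),
    ("include", lambda v: v.get("env") == "prod"),
    ("exclude", lambda v: True),  # drop everything not matched above
]

# Made-up hosts as an inventory plugin might have collected them.
hosts = {
    "web1": {"state": "running", "env": "prod"},
    "web2": {"state": "stopped", "env": "prod"},
    "dev1": {"state": "running", "env": "dev"},
}

kept = [name for name, host_vars in hosts.items() if keep_host(host_vars, rules)]
print(kept)
```

Because every inventory plugin would evaluate the same rule list in the same way, a user who has learned the rules for one plugin could reuse them for another.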

Technical guidance / roadmap / …

IMO what is needed is some guidance body, similar to (or even identical to?) the Ansible Community Engineering Steering Committee (Steering Committee for short, or even shorter: SC), that consists of community members and Red Hat employees.

The process of approving new features could be similar to the collection inclusion process and/or the community/SC voting process, and similar to the ansible/proposals repository. That is, someone proposes some new functionality, potentially with a WIP PR, then there is a (long or short) discussion, until eventually it looks like there is some consensus. At that point a vote will be started (with both a community component and a steering committee (or however that group will be called) component); if the vote passes, the feature can be added. Feature PRs to be merged require at least, say, 2 approvals by SC members.

Such a complicated workflow will make it hard to get new features added, but at the same time ensures that these features will have a certain quality (or at least that’s my hope :laughing:) and backing by a larger number of folks.

Who should maintain it?

My guess is: the community + (the collection’s) steering committee.

What about certified collections?

Since this collection is a support for all collections, it should also be used by certified collections. That implies (as far as I understand this) that this collection also needs to be certified. I’m not sure what exactly is required for this, but having some Red Hat folks involved in the collection will definitely make this easier.

(This is also why I explicitly mentioned Red Hat employees as members for the steering committee, next to community members, since a) Red Hat tends to employ quite a few suitable persons, many of them originally being from the community, and b) if this is going to be certified, some Red Hat influence is probably mandatory anyway. This does not mean that Red Hat should be able to force decisions on the collection that have no community support.)

Licensing rules, CLA/DCO

This collection should be open source and as open as possible, so it can be used by all collections that can use features from it.

For module_utils (that can be used by modules; these have to be in plugins/module_utils/) I would stick to the same requirement as for module_utils in ansible-core: these should always be BSD 3-clause licensed. (Ref: Using and developing module utilities — Ansible Documentation) The same requirement makes sense for docs fragments (plugins/docs_fragments/), even though these aren’t linked in.

Generic plugin utilities (in plugins/plugin_utils/) or other plugin-specific files should be GPL v3+ licensed. These files are supposed to be imported by Ansible plugins, and also tend to import code from ansible-core.

About CLA/DCO: we have established that collections shouldn’t require CLAs. So this one shouldn’t require one, either. Requiring a DCO on the other hand could be useful; I guess we should start requiring a DCO for the new collection at the same time as a DCO is required for ansible-core contributions. (I’m not 100% sure whether that’s actually planned, or if there’s a timeline for that.)


I was specifically referring to things like this in the ansible module utils itself,

Specifically Python functions that other collections’ modules can call upon. Technically you can call upon other collections in modules through things like

from ansible_collections.awx.awx.plugins.module_utils.awxkit import ControllerAWXKitModule

So was more contemplating a meta collection that included functions various collections could pull upon.
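A collection that pulls functions from another collection like this usually has to cope with that collection not being installed. Here is a minimal sketch of one way to handle that, using only the standard library. `optional_import` is a hypothetical helper, and the dotted path in the second call is a made-up placeholder, not a real module.

```python
import importlib


def optional_import(dotted_name):
    """Return the module, or None if it (or a parent package) is not installed."""
    try:
        return importlib.import_module(dotted_name)
    except ImportError:
        return None


# A standard-library module is always importable.
json_module = optional_import("json")

# Placeholder path for a shared-code collection that is not installed here.
shared = optional_import(
    "ansible_collections.example_ns.example_lib.plugins.module_utils.helpers"
)

if shared is None:
    print("shared code collection missing; report a clear error to the user")
```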

In regard to authenticating to the different products being a very specific thing: agreed, that is more a product-side than a community-side need.

Agreed on it needing to be certified if it’s picked up by certified collections. It could just be something built into Ansible itself, like the URIs, instead of a separate collection; more just functions and other shared parts like I mentioned?

Not really:

I don’t want to wait 2+ years to be able to use inventory plugin filtering code in a collection, because the earliest ansible-core that the filtering code can be added to is ansible-core 2.17.0 (i.e. next May), and then it takes a couple of years until the collections drop support for ansible-core 2.16 and everything before. (For community.general, which regularly gets rid of no longer supported old ansible-core versions, the first version that would only support ansible-core 2.17+ will be released in Fall 2025: 2.16’s EOL is May 2025, and the community.general rules on dropping support have been approved in this vote.)

I don’t want to wait two years to add a feature that I can also use now by bumping the dependency on a collection (which only requires a new minor collection release).


I think trying to create a single place for code like this to live causes more headaches than it solves. If you want to release a piece of reusable code, a special-purpose collection for that code makes more sense to me than a single collection with other unrelated code.

Among other things, this means that backwards-incompatible changes can be accommodated by bumping the collection name (e.g. community.inventory_filtering → community.inventory_filtering_v2).

@flowerysong that’s a very good idea.

One possible downside is that we end up with a large zoo of such tiny helper collections, similar to stuff like leftpad and isarray on npm :wink: But I guess (hope?) there won’t be that many such collections. I can still imagine we’ll have 20-30 such collections in a couple of years, or even more. (I don’t know what these will be about yet, but there will always be functionality that is used by at least two collections that aren’t directly related and that has no obvious other place to live.)

Another point is naming / namespace. Just putting all these collections into the community namespace (or another ‘regularly used’ namespace) can make it pretty confusing for users to decide which collections in community are useful for them, and which are pure helpers for other collections. If there’s just one such collection this won’t be too bad, but if we end up having 20-30 of them it can get confusing. Users might find these collections when searching Galaxy for terms like “filter xxx inventory” and waste some of their time figuring out that this collection by itself doesn’t do any good.

Some possible solutions:

  1. Use another namespace, say community_lib or ansible_lib instead of community resp. ansible (ansible_lib.inventory_filtering);
  2. Prefix collection name with library_ or lib_ (ansible.lib_inventory_filtering);
  3. Have a flag in galaxy.yml / MANIFEST.json that marks them as “library only” so you can find them only when checking an additional box when searching on Galaxy;
  4. …?

Here 3. requires additional changes to both ansible-core and galaxy-importer and ansible-lint (at least) to support such a flag (without these programs screaming at the user), and more work on galaxy_ng to store and use that flag. 1. and 2. can be “just used” without any code update.

What do you all think? Do you think this is actually a problem (that should better be solved), or is using ‘generic’ names like ansible.inventory_filtering or community.inventory_filtering fine, even if we end up with 20-30 or even more such collections?

One last point is making these collections discoverable. I guess we can add an overview of such collections to the docsite’s dev_guide.

@flowerysong what is the advantage of creating a new collection name with _v2 appended vs. releasing major version 2 of the original collection?

@felixfontein naming wise, I guess I’d go for the namespace idea, maybe something with lib or library, or I was thinking something along the lines of internals maybe?


Putting these things into a single collection does have its issues as @flowerysong pointed out, but it has several key advantages as well.

With many small collections:

  • It multiplies the overhead of releases, maintenance, etc.
  • Higher barrier to entry for each new thing (creating the collection around it)
  • Less chance for adoption as each consuming collection needs to make the “do we take a dependency” decision for each library collection, thus defeating the purpose of a stable place for useful code
  • Inclusion process for each individual collection
  • For features that are backported core features, we would expect that the collection containing it should be removed/unmaintained at some point, which is another process for community package removal, repo archiving, etc. but not sure how arduous that will be

With a single collection:

  • We still want a high barrier to entry for new things in this collection, but now it’s review by committee about whether it meets the bar
  • The expectation is that many collections will opt to depend on this collection because it’s useful, allows for potentially consuming core features early, and is heavily curated. Once the initial decision to take the dependency is made, the benefits are reaped for new entries.
  • Inclusion process happens once
  • Features are removed in major versions without additional procedures

It might be possible to combine some features of both approaches:

  • designated collection namespace as @felixfontein suggested, and all collections in that namespace (I guess both within Galaxy and ansible-collections in GitHub?) are owned/maintained by the aforementioned committee
  • inclusion process (well, the review portion at least) is done once for the namespace and review is not needed for the individual collections, except maybe the first. This could end up problematic… the periodic review for removal probably still needs to be done on an individual collection basis

Both the single collection and the compromise option still allow for anyone to release their own collection of course, and apply for inclusion, so they don’t preclude someone doing that on their own.

It’s a way of addressing the issue raised in the initial post:

I don’t think never breaking backwards compatibility is a realistic goal (especially in a multipurpose collection), so releasing it as a new collection allows both versions to be installed side by side. It’s not a perfect solution, but it avoids the temptation to “solve” the dependency problem by making breaking changes but not incrementing the major version.


It’s not a perfect solution

I guess one main downside is that you cannot support multiple versions of that collection at the same time. But then, you also don’t need to, because the only reason why you usually support multiple major versions of a library is that both cannot be installed in parallel for the same environment.

In any case, this solution can be used both for a single shared code collection as well as for a set of shared code collections. I would probably even encode this in 1.x.y releases by using the suffix _v1. The repository name should not use that suffix though, so eventually the stable-1 branch contains the foo.bar_v1 collection, the stable-2 branch the foo.bar_v2 collection, etc.


Semantic versioning provides for pre-release and build metadata, but doesn’t seem to offer the equivalent of RPM’s epoch tag.

Other than indicating they are different, what meaning can the _v# suffix convey? If it’s specific releases of Ansible that a stable-x branch targets, would it not make sense to include that somehow? Or the Python version, which is rather like the architecture part of an RPM’s name?

There are too many variables to account for with semantic versions, and bundling into collections doesn’t simplify that for the user, or for maintainers.

I think before going much farther down this road, we need a very clear definition of exactly what problem this addresses as well as what’s out of scope.

Ah, so it sounds like this would be a single repository, of the type that supports multiple major versions (like c.g), but with a non-standard release process?

Rather than, or perhaps in addition to, standard semver releases, each major release branch would release a completely separate collection with the _vX suffix to its name?

That could help with a lot of maintainability concerns in a multi-collection model.

Basically the major version part is the epoch. (The epoch as for rpm isn’t needed since you don’t change the versioning schema if you stick to semantic versioning.)

It’s basically redundant to switching to a new major version.

The main problem is that you can only install one version of a collection at one time. But if you want to install foo.bar which requires my.collection 1.x.y, and at the same time you want to install baz.bam which requires my.collection 2.x.y, then you have a problem: there’s no way to install my.collection twice, so it’s impossible to install both foo.bar and baz.bam.

But if my.collection 1.x.y is actually called my.collection_v1 1.x.y, and my.collection 2.x.y is actually called my.collection_v2, you can install my.collection_v1 and my.collection_v2 at the same time, and thus you can also install foo.bar and baz.bam at the same time.

This becomes important if both foo.bar and baz.bam are part of the Ansible package, for example. We already had the problem that ansible.netcommon released a new major release and some network collections started depending on that one, while some others explicitly required the previous ansible.netcommon major release. This basically would have forced us to keep the old ansible.netcommon version for half a year, and also keep older versions of the network collections whose newer versions required the latest ansible.netcommon.

There are two ways to resolve this:

  1. Allow Ansible to use multiple different versions of the same collection at the same time. This would be a very invasive (and likely breaking) change, and thus won’t happen (and if it happens, not anytime soon).
  2. Encode the major version in the collection’s name (since you can install two collections with different names at the same time - assuming there is no other dependency conflict).

@flowerysong’s suggestion is to use 2.
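The difference between the two naming schemes can be sketched in a few lines of Python. This is only a toy model that assumes installed collections are tracked by name, one version per name; `versioned_name` and `install` are hypothetical helpers, and the version numbers are made up.

```python
# Hypothetical sketch: installed collections are keyed by name, so only one
# version per name can exist. Encoding the major version in the name avoids
# the clash.

def versioned_name(base_name, version):
    """'my.collection', '2.1.0' -> 'my.collection_v2'."""
    major = version.split(".")[0]
    return "%s_v%s" % (base_name, major)


def install(installed, name, version):
    """Record a collection; refuse a second, different version of the same name."""
    if name in installed and installed[name] != version:
        raise RuntimeError(
            "%s %s is already installed, cannot add %s"
            % (name, installed[name], version)
        )
    installed[name] = version


# Plain naming: the second major version cannot be installed next to the first.
plain = {}
install(plain, "my.collection", "1.5.0")
try:
    install(plain, "my.collection", "2.1.0")
    conflict = False
except RuntimeError:
    conflict = True

# Major version in the name: both fit side by side.
suffixed = {}
install(suffixed, versioned_name("my.collection", "1.5.0"), "1.5.0")
install(suffixed, versioned_name("my.collection", "2.1.0"), "2.1.0")

print(conflict)
print(sorted(suffixed))
```

With the suffixed names, foo.bar can depend on my.collection_v1 and baz.bam on my.collection_v2 without either blocking the other.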


Does anyone object to using such a versioning scheme, where the major version is encoded in the collection’s name?

And what do you think about community.library_inventory_filtering_vX as the name (with X=1 for the first version)? That way getting a first version set up with Zuul should be relatively easy, as that namespace is already available. (How well publishing a collection with a slightly different name from the collection repo’s name works is a different question, but that’s something to find out…)

I’d like to try this out, and if it looks good, apply that collection for inclusion in Ansible.


I’ve now also added the community-wg tag to this topic.


I think the argument for refactoring code out of ansible-core into something else is compelling. Regardless of one big collection of unrelated stuff, or multiple tiny collections or anything in between, I think we should be decreasing the overhead of maintaining a collection by largely automating the processes related to that, so that they cover the happy path for the majority of the collections. Whoever wants to customise their process can spend their own time doing it. Or another way of putting it: we need to decrease the need of manual actions, which seems to be the major constraint for the community, IMHO.

So, addressing the granularity of those collections is one thing that resembles componentisation in software engineering. It brings OO to mind as well. In general, it is a Good Idea™ to pursue loose coupling between components, which taken to extremes would lead to … npm universe. I don’t have any objective solution to that and I suspect that there isn’t one, here or in general software development at large, but the impact of that granularity choice is decreased by automating those processes. I digress.

Trying to circle back to @felixfontein’s original questions, though:

  1. Yes, we should definitely have something like that
  2. That’s the tricky thing. In the absence of any objective rules for it, I would suggest we start something without much restriction, and reassess constantly (until better rules are settled, then we can just apply those rules and decrease the need of manual reassessment)
  3. Not much more other than what I wrote at the beginning.

Ok, I created the GitHub repository ansible-collections/community.library_inventory_filtering (a library for inventory plugins in other collections that allows filtering hosts in a generic way), whose stable-1 branch (the default branch) contains a collection called community.library_inventory_filtering_v1. Pull request #537 in ansible/zuul-config (“Add community.library_inventory_filtering”) adds the repo to Zuul so I can test publishing, and I’ve adjusted pull request #698 in ansible-collections/community.docker (“Add inventory filter capability”) to use the code from that collection. (Currently CI fails for the community.docker PR, as the new collection hasn’t been published.)