Future of the Ansible Community package

Personally, I don’t think the one-package-to-rule-them-all idea is sustainable. We are discussing how to manually (or not, if Mr. AI joins the party) keep the code quality of a heck of a lot of third-party packages inside the main package of the language.

I have said this before, but every time I mention it, I get a rebuke that this was already discussed many times before, that there are good reasons for this (that were discussed then), and the topic usually dies. It feels like this is turning into one of those things that we do because we have always done them, and it is a bit of a taboo to question it. As @felixfontein mentioned above, we always close that can before the worms crawl out, and this topic is not (apparently no topic ever is) the right place to do it again.

Trying to think of this like a project in the company, I think the first thing we need to understand here is the WHY. Why is that something we would want? The arguments I have seen so far are:

  • easy for beginners: the “all batteries included” argument, which is, in many cases, not true, since many collections depend on extra packages that must be installed anyway. Installing collections is quite simple these days, and it is something those users will need to do very early in their Ansible journey, so I see little advantage in forcing them to download and install one bloated package instead of teaching them how to install the core package and add collections on top of it. In fact, one could argue that we are teaching them the wrong way first, only to tell them later how to do it right.
  • documentation: on more than one occasion, I have seen maintainers of incoming collections mention that they want their collection’s documentation to show up on the Ansible site. Compare that with how developers maintain the docs of Python libs, or Java libraries, or those of any other relatively modern programming language: I see no sense in trying to maintain third-party libs inside the main package, either code-wise or docs-wise.

But hey, don’t mind me ranting, let’s can this quickly. :slight_smile:

6 Likes

If you want to open it… let’s do it now, and ask someone who has enough powers to split these posts off into a new topic. It’s good to have a new discussion on this, but it shouldn’t sidetrack the current discussion :slight_smile:

I agree that the current package isn’t great. I’ve been thinking from time to time about how we could trim it down to something better, but it’s really hard. In general I think the formula (usefulness for its users * number of folks who use it) / size should be maximized for the content included in the package.
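To make that formula concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Candidate` fields, the 0 to 1 usefulness scale, the use of megabytes for size, and the two example entries, which are placeholders rather than real collections or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A piece of content considered for inclusion (hypothetical model)."""
    name: str
    usefulness: float  # subjective rating on an assumed 0..1 scale
    users: int         # estimated number of people using it
    size_mb: float     # size it adds to the package, assumed to be in MB


def inclusion_score(c: Candidate) -> float:
    """Return usefulness * users / size, the quantity to maximize."""
    if c.size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return c.usefulness * c.users / c.size_mb


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=inclusion_score, reverse=True)


# Placeholder values only, not measurements of any real collection.
candidates = [
    Candidate("example.large_niche", usefulness=0.5, users=100, size_mb=10.0),
    Candidate("example.small_popular", usefulness=0.9, users=1000, size_mb=2.0),
]

for c in rank(candidates):
    print(f"{c.name}: {inclusion_score(c):.1f}")
```

The hard part is of course not the arithmetic but agreeing on the inputs, in particular how to rate usefulness and how to estimate the number of users.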

Thoughts on some specific collections (most of which randomly came up to my mind):

  • ansible.utils: has useful filters for working with IP addresses, should definitely be there
  • ansible.posix, ansible.windows: some commonly used things are in there (like the synchronize module), should be included
  • community.windows: maybe? I know too little about what’s in there, compared to ansible.windows
  • community.general: parts of it should be in there - some OS package managers are part of this collection, for example. Maybe these (and some more common modules) should be moved to another collection, say community.core, and that one should get included? Things like timezone and ufw also look general enough. But what about programming language package managers, for example? Too specific already?
  • community.dns: at least the DNS lookups and filters are generally useful. The provider-specific modules are … well, provider-specific. Most won’t need them.
  • community.crypto: most of it is probably common enough
  • community.docker, containers.podman: is this already too specific? or not? :thinking:
  • community.routeros, community.hrobot: good examples of something that probably shouldn’t be included: way too small an audience
  • amazon.aws / community.aws: many folks use AWS, but should it really be included? :thinking:
  • community.mysql, community.postgresql: similar, used a lot, but should they really be there? :thinking:
3 Likes

A bit of history:

  • The community package was created with the expectation that it would eventually fade away as people got used to collections. This expectation was from the Ansible BU/RH. At one point control was ceded to the community itself and so was the decision about this. We also initially named it ‘ansible community package’ (acp) to avoid confusion … but it ended up taking the simpler but confusing ‘ansible’ name at the last minute.

  • ansible-core was supposed to be beginner friendly, but it ended up being stripped down to the essentials to allow a system to be bootstrapped, install collections and their deps.

  • community.general was a ‘catchall’ for modules/plugins that were being moved from core after 2.9 and did not have enough community or vendor support. The hope was that it would eventually dwindle away as those communities or vendors grew to handle their own collections. While there have been cases of this, there has also been a lot of growth, as a very permissive policy was adopted for adding new modules/plugins to it.

  • naming conventions: initially naming was based more on technology, but the community, ansible and vendor namespaces ended up being added to indicate ‘support/ownership’. For example db.mysql and db.psql vs community.psql/ansible.mysql. This has been both good for setting expectations and bad when maintainer teams change (yay redirects!).

  • A way to easily bootstrap collection requirements was never implemented. The best we have is a convention of adding ‘bootstrapping’ plays to the collection that actually install the requirements. This has its own issues with sourcing (some places don’t want to use language-specific package managers) and with OS/distro/package manager support (the package action does not really fix that).

  • The only way to add your docs to the ‘docs.ansible.com’ website was to be part of the ansible package; this was the main driver for many to add their modules/plugins/collection to it.

5 Likes

Not having time today to dive into any discussions, I just want to say +1 to this for now.

2 Likes

I’m definitely someone who has given this response before, but I don’t agree with this framing.

The discussion does not “die”, the discussion happens, and so far, consensus has been to maintain the package.

It’s not that it’s a taboo question, after all we are here discussing it again, but when the discussion comes up again, it is by and large with the same arguments as previously from everyone participating.

I think that’s to be expected when the conditions around this have just not changed all that much since all of those previous discussions and their outcomes.

A question like this is also why it gets mentioned that we have previously discussed it. The reasons have been brought up many times and have not changed much. Bringing this up again asks for them to be repeated, but it’s not clear why they need to be.

What has changed materially since the last discussion that warrants this?

(to be clear, it’s fine to just talk about it again because it hasn’t gone the way you would like in the past, but I also think it’s important to set your expectations on what that discussion will be like if that’s the case)

Many questions ‘recur’. One thing I’ve seen used to avoid ‘question fatigue’ is to have a period during which the question is considered settled and after which it can be revisited. I would suggest 6 months to a year before reopening discussions on these types of questions.

2 Likes

Maybe this is a symptom of the predominant reasons not being consolidated into a place where they can be easily reviewed. I’ve participated in the discussion a few times in the past, and don’t know that I’d be able to find the prior reasons.

2 Likes

I can’t say why other people don’t use collections directly, but I think I’ve already mentioned this several times in other places: For “security” (compliance) reasons, our control nodes can’t download collections directly from the internet / galaxy. We need something in between like Sonatype Nexus or similar that works as a caching proxy or so where we can run security and (license) compliance checks centrally. We have a solution that’s accepted by our IT security for PyPI, but not for galaxy.

As @felixfontein said in another context:

Different problem, but this would help us to move away from the community package.

1 Like

That is why RH provides Automation Hub (AH) and even a private version (PAH), which is based on galaxy_ng. Even if you don’t want to set up/maintain a galaxy instance, you can just create a simple file/git repo with the curated collections you want and point your systems at that.

Another even simpler option is to create your own ‘meta collection’, see my example of building my own c.g GitHub - bcoca/acd: acd collection

2 Likes

I didn’t say there are no solutions, I said there are not enough :wink:

1 Like

Can you elaborate? I do see a lack of documentation on these solutions, especially outside AH/PAH, but I am unsure what they are lacking for most scenarios. Though this might be off-topic here and worth its own thread.

Does ansible-collections/ansible-inclusion relate to what ends up in ansible-community/ansible-build-data?

Just want to make sure I’m on the same page on this discussion (and the one it split from) before I throw my 2 cents in.

Personally, I use ansible-build-data as a metric for the trustworthiness of collections. I have a high level of trust for collections that are in the certified/validated AH repositories, and a moderate level of trust for collections in ansible-build-data that are not also in AH. Collections that aren’t in any of the above sources are highly scrutinized, especially if they are only hosted on GitHub but not on galaxy.ansible.com. Having an entry in docs.ansible.com is also a plus.

This doesn’t necessarily mean I use the batteries-included ansible package itself, I just review its requirements.yml for collections I might be interested in that I know have been pre-screened by other experts.

If you were ever to decide that the batteries-included ansible package is “bad practice” and stop building it, I for one would still like some kind of process in place to promote high-quality collections, whether that means shoehorning them into AH, making another repository/distribution in galaxy.ansible.com, or something else that helps distinguish these collections from the sea of public collections.

Yes, the ansible-inclusion repo’s process is the (main) way to get into the Ansible community package, which is defined by ansible-build-data. (The only other way is to split up an existing collection in it; the parts are then also included without much checking. The most common cases of this second way are renaming collections and moving things from community.general or community.network to their own collection.)

The collections on docs.ansible.com are exactly the ones in the Ansible community package, and those are exactly the ones in ansible-build-data.

1 Like

Thank you for confirming. Already threw my 2 cents in, so whatever comes out of this discussion topic, I just hope that some vetting process will still remain to help promote reputable collections.

1 Like

Like a ‘community certified’ badge on galaxy?

2 Likes

Exactly what I was thinking.

Since a few collections were mentioned, I pulled some stats off the docsite. This isn’t guaranteed to be accurate as there could be highly popular collections that my search didn’t find, but here’s the list:

All collection visitors - 527.5K in the past 90 days.

community.general - 120K visits
ansible.windows - 45.5K
ansible.posix - 45K
amazon.aws - 25.7K
community.docker - 27.7K
community.vmware - 24.1K
kubernetes.core - 16K
ansible.utils - 15K
cisco.ios - 12.8K
fortinet.fortios - 12.5K
community.crypto - 12K
ansible.netcommon - 10.5K
azure.azcollection - 9.6K
community.mysql - 9K
community.postgresql - 8.9K
awx.awx - 8.8K
community.hashi_vault - 7K
containers.podman - 6K
chocolatey.chocolatey - 3.9K
community.dns - 1.7K
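
To put those numbers in proportion, a small Python sketch can turn them into a share of all collection docs traffic. The figures are copied from the list above for a handful of collections; treating the 527.5K total and the per-collection counts as the same unit (visits over the same 90 days) is an assumption, and the function name is made up for this example.

```python
# Total across all collections in the past 90 days, in thousands (from the post).
TOTAL_K = 527.5

# A subset of the per-collection figures above, in thousands.
visits_k = {
    "community.general": 120.0,
    "ansible.windows": 45.5,
    "ansible.posix": 45.0,
    "community.docker": 27.7,
    "amazon.aws": 25.7,
    "community.dns": 1.7,
}


def share_percent(collection: str) -> float:
    """Share of all collection docs traffic for one collection, in percent."""
    return visits_k[collection] / TOTAL_K * 100


# Print the subset from most to least visited.
for name in sorted(visits_k, key=visits_k.get, reverse=True):
    print(f"{name:20s} {share_percent(name):5.1f}%")
```

Docs traffic is only a rough proxy for usage, so a share computed this way is a hint rather than a measure of how many people depend on a collection.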

5 Likes

As for prior discussions in this area, this is what I found:

There may be more places so making this post a wiki. Feel free to edit and add other places where this was discussed.

4 Likes

(Disclaimer: I am aware the way I write sometimes seems a bit aggressive - trying to improve on that but still a long way to go - please do not read it as an aggression, I promise it is not, it is just the way eloquence comes out when writing in what is my second language)

Well, I certainly agree we are here discussing it again this time around. On previous occasions there was not really a discussion, and as @sivel mentions in one of the responses here, the earlier reasons are hard to find. Referring to something that is hard to find is not what I would call discussing.

Well, I cannot compare to what was before (when?) because I wasn’t there and I don’t have the time to search the entire history of this community to find the actual conditions.

That being said, though, I would be extremely surprised if conditions had not changed between then (whenever it was) and now. Ansible has been on a steady adoption curve all over the world; it is becoming, or already is, the de facto standard in automation. The package has grown a lot in size, collections were invented, galaxy_ng was published, a Steering Committee was created, and a process was established (a manual one, ironically). All of those things and more have happened between then and now.

Again, I am making this claim blind: I do not have the previous state to compare to, so I am inferring that all these developments changed conditions by a lot. However, you claim the conditions did not change. If I may suggest, I think we should try and go through the exercise of actually listing the conditions/reasons thought of before, and analysing them again. We do have a lot of new people (myself included) in the community, and it would be interesting to see what all of us have to say about those conditions.

Well, I think I already addressed that, but basically: I don’t think we had actual discussions before, and the fact that past conditions are always referred to but no one seems to be able to list them (now and before) is unsettling to me. If I am to agree to those reasons/conditions, I need to know what they are.

While I have the same situation in my workplace, I would argue that this is not an Ansible-specific problem and that it does not justify bloating the package for the entire community.

And as @bcoca writes, there are ways to work around that problem.