ansible installation thoughts

Hi

I’m setting out to upgrade ansible in some places here.
We exclusively use ansible in dedicated venvs, using python3.9.
Most of the installs are 3.4.0.
Looking back over the last few years, I had the impression that doing ‘pip install ansible’ was getting increasingly slow.
This makes sense, as there is increasingly more to it. I just did a quick test, and indeed it’s been growing a lot:

153M 2.9.27
368M 2.10.7
402M 3.4.0
488M 4.10.0
508M 5.3.0

Half a gig seems like a lot for what we do - we use the same modules as when we ran 2.9.
Since we’ll move to 5.x, I looked at how much is in there, and apparently the bulk of the space is taken up by collections. For example, in 5.3.0 the bundled collections take up 457M in total (in lib/python3.9/site-packages/ansible_collections):

142M community
78M fortinet
65M cisco
19M dellemc
14M netapp
14M f5networks
14M ansible
10M google
9.5M junipernetworks
9.4M azure
9.4M arista
7.7M vyos
7.5M amazon
6.0M netbox
5.9M purestorage
4.6M inspur
4.0M netapp_eseries
3.7M ngine_io
3.4M ovirt
2.8M check_point
2.6M openstack
2.6M kubernetes
2.5M sensu
2.5M awx
2.3M containers
2.0M mellanox
1.6M hetzner
1.4M theforeman
1.2M infoblox
1.1M t_systems_mms
972K wti
944K cyberark
804K hpe
652K cloudscale_ch
576K frr
548K openvswitch
436K splunk
428K infinidat
416K ibm
336K chocolatey
260K cloud
256K servicenow
152K gluster
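
A breakdown like the one above can also be produced with a short script instead of `du`. This is a minimal sketch, assuming the usual venv layout `<site-packages>/ansible_collections/<namespace>/<name>`; `namespace_sizes` and `human` are hypothetical helper names. Because it sums apparent file sizes rather than allocated disk blocks, its numbers will come out somewhat lower than what `du` reports.

```python
"""Per-namespace size of the collections bundled in a venv (similar to `du -sh *`)."""
from __future__ import annotations

import os
import site
import sys
from pathlib import Path


def tree_size(path: Path) -> int:
    """Sum the apparent size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root, name)
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def namespace_sizes(collections_root: Path) -> list[tuple[int, str]]:
    """Return (bytes, namespace) pairs for each namespace directory, largest first."""
    return sorted(
        ((tree_size(entry), entry.name) for entry in collections_root.iterdir() if entry.is_dir()),
        reverse=True,
    )


def human(num_bytes: float) -> str:
    """Format a byte count with a B/K/M/G suffix (powers of 1024), roughly like `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if num_bytes < 1024 or unit == "G":
            return f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Default to <site-packages>/ansible_collections of the running (venv) interpreter.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(site.getsitepackages()[0], "ansible_collections")
    for size, namespace in namespace_sizes(root):
        print(f"{human(size):>8} {namespace}")
```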

Since we only use a fraction of these, I would like to have only those installed.
I couldn’t find any clean way of uninstalling collections from lib/python3.9/site-packages/ansible_collections.
But rather than fetching content and then removing almost all of it again, I thought it’d be cleaner to start with ansible-core and then add collections to that.
Ideally just the collections that we need, with the same versions that are installed using a ‘full’ ansible install.
I couldn’t find any information on where that is defined. The next best thing was to do a temporary full install, and then check what versions are installed there.
To automate this a little bit I came up with (please don’t laugh):

ansible-galaxy collection list --format yaml |
  yq -y -r '
    .[] | with_entries( select( .key | test(
      "^(amazon.aws|ansible.(posix|utils)|community.(aws|crypto|docker|general|postgresql))"
    ))) |
    to_entries |
    {
      collections: map(
        { name: .key,
          version: .value.version
        }
      )
    }' > requirements.yml

FYI this file reads:

collections:
  - name: amazon.aws
    version: 2.1.0
  - name: ansible.posix
    version: 1.3.0
  - name: ansible.utils
    version: 2.4.3
  - name: community.aws
    version: 2.2.0
  - name: community.crypto
    version: 2.2.0
  - name: community.docker
    version: 2.1.1
  - name: community.general
    version: 4.4.0
  - name: community.postgresql
    version: 1.6.1
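
The yq step could also be sketched in plain Python, which drops the yq/jq dependency. This assumes that `ansible-galaxy collection list --format json` prints the same structure as the yaml format: an object keyed by collections path, each mapping a collection name to `{"version": ...}`. The helper names are hypothetical, the pattern is anchored at both ends (unlike the yq regex), and the YAML is written by hand so only the standard library is needed.

```python
"""Build requirements.yml from `ansible-galaxy collection list` without yq."""
from __future__ import annotations

import json
import re
import subprocess

# Same selection as the yq filter, but anchored at both ends and with escaped dots.
WANTED = re.compile(
    r"^(amazon\.aws|ansible\.(posix|utils)"
    r"|community\.(aws|crypto|docker|general|postgresql))$"
)


def filter_collections(listing: dict, pattern: re.Pattern) -> dict[str, str]:
    """Map name -> version for matching collections.

    `listing` is {collections_path: {name: {"version": ...}}}; if a collection
    appears under several paths, the first one seen wins.
    """
    selected: dict[str, str] = {}
    for collections in listing.values():
        for name, info in collections.items():
            if pattern.match(name) and name not in selected:
                selected[name] = info["version"]
    return selected


def to_requirements_yaml(selected: dict[str, str]) -> str:
    """Render a requirements.yml body pinning exact versions, sorted by name."""
    lines = ["collections:"]
    for name in sorted(selected):
        lines.append(f"  - name: {name}")
        lines.append(f"    version: {selected[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Run this inside the temporary venv that has the full ansible package installed.
    raw = subprocess.run(
        ["ansible-galaxy", "collection", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    with open("requirements.yml", "w", encoding="utf-8") as fh:
        fh.write(to_requirements_yaml(filter_collections(json.loads(raw), WANTED)))
```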

This file is then used in a venv that has just ansible-core==2.12.2 (and yq) installed.
I force installation into the same location where the collections end up when the ansible package is installed:

ANSIBLE_COLLECTIONS_PATH=$(python -c "import site; print(site.getsitepackages()[0])") ansible-galaxy collection install -r requirements.yml

It appears I now have a fully functioning ansible install, with just the collections that I need:

$ ansible-galaxy collection list

# /Users/dick.visser/tmp/ansible/5.3.0-slim/lib/python3.9/site-packages/ansible_collections
Collection           Version
-------------------- -------
amazon.aws           2.1.0
ansible.netcommon    2.5.0
ansible.posix        1.3.0
ansible.utils        2.4.3
community.aws        2.2.0
community.crypto     2.2.0
community.docker     2.1.1
community.general    4.4.0
community.network    3.0.0
community.postgresql 1.6.1
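
The listing above also contains ansible.netcommon and community.network, which are not in requirements.yml; `ansible-galaxy collection install` installs a collection's declared dependencies as well. A small comparison can make such differences explicit. This sketch assumes `ansible-galaxy collection list --format json` prints an object keyed by collections path, each mapping a collection name to `{"version": ...}`; it uses PyYAML (which ansible-core depends on, so it is present in the venv) only to read requirements.yml, and `compare` and `installed_collections` are hypothetical names.

```python
"""Compare the collections in a slim venv with what requirements.yml asked for."""
from __future__ import annotations

import json
import subprocess


def compare(required: dict[str, str], installed: dict[str, str]) -> dict[str, list[str]]:
    """Sort collections into missing, unrequested extras, and version mismatches."""
    return {
        "missing": sorted(set(required) - set(installed)),
        "extra": sorted(set(installed) - set(required)),
        "mismatched": sorted(
            name for name in set(required) & set(installed)
            if required[name] != installed[name]
        ),
    }


def installed_collections() -> dict[str, str]:
    """name -> version from `ansible-galaxy collection list --format json`, all paths merged."""
    raw = subprocess.run(
        ["ansible-galaxy", "collection", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    merged: dict[str, str] = {}
    for collections in json.loads(raw).values():
        for name, info in collections.items():
            merged.setdefault(name, info["version"])  # keep the first version seen
    return merged


if __name__ == "__main__":
    import yaml  # PyYAML; ansible-core depends on it, so it is present in the venv

    with open("requirements.yml", encoding="utf-8") as fh:
        required = {c["name"]: str(c["version"]) for c in yaml.safe_load(fh)["collections"]}
    print(json.dumps(compare(required, installed_collections()), indent=2))
```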

I’ve repeated this for ansible 3.4.0 (which uses ansible-base 2.10.7) and 4.10.0 (which uses ansible-core 2.11.8).
Those seem to work fine as well. The resulting installed code base is indeed much smaller.

I’ve labeled the venvs that have a limited collection set as ‘slim’:

403M 3.4.0
109M 3.4.0-slim
489M 4.10.0
96M 4.10.0-slim
509M 5.3.0
114M 5.3.0-slim

Now, while all of this seems to work - I’m not sure if it is supposed to work…
Are there any gotchas with this approach?

thx :slight_smile:

This repo contains the versions of collections corresponding to the “ansible” package versions:

https://github.com/ansible-community/ansible-build-data

Hi Dick,

> I'm setting out to upgrade ansible in some places here.
> We exclusively use ansible in dedicated venvs, using python3.9.
> Most of the installs are 3.4.0.
> Looking back over the last years I had the impression that doing 'pip
> install ansible' was increasingly slower.

Note that installing Ansible is also pretty slow since there are no
wheels. This is something that will change for Ansible 6 (see
https://docs.ansible.com/ansible/devel/roadmap/COLLECTIONS_6.html#planned-work
and https://github.com/ansible-community/antsibull/pull/395).

> This makes sense as there is increasingly more to it. I just did a quick
> test and indeed it's been growing a lot:
>
> 153M 2.9.27
> 368M 2.10.7
> 402M 3.4.0
> 488M 4.10.0
> 508M 5.3.0

> Half a gig seems like a lot for what we do - we use the same modules
> as when we ran 2.9.

It's not just a lot for what you do, it's also a lot for everyone else
:slight_smile: Actually, with Ansible 6 the size should drop considerably
(roughly 50%), since we plan to leave out the tests/ and docs/ folders of
collections (see the roadmap and
https://github.com/ansible-community/community-topics/issues/65).

Obviously that does not solve the problem that Ansible as a package
contains a lot more than most folks will ever need. Your point here:

> Since we only use a fraction of these, I would like to have only those
> installed.

is totally valid! There have been discussions in the past on whether
it's better to teach folks to install ansible-core + only the
collections they need (which also fits nicely with the concept of
Execution Environments), or whether it's better to just let them
install the Ansible package "with batteries included" (including a lot
of exotic batteries you will never need).

> I couldn't find any clean way of uninstalling collections from
> lib/python3.9/site-packages/ansible_collections.

I don't think it is really possible to do this in a clean way. (Simply
`rm -rf`-ing them works fine though; pip does not complain if files
have vanished.)
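
For illustration, the `rm -rf` route could be scripted as below. `prune` is a hypothetical helper; it assumes the `<site-packages>/ansible_collections/<namespace>/<name>` layout and only does a dry run unless `--apply` is given. The keep set has to include the collections that the kept ones depend on, or those may stop working.

```python
"""Remove bundled collections that are not on a keep list (dry run by default)."""
from __future__ import annotations

import shutil
import site
import sys
from pathlib import Path


def prune(collections_root: Path, keep: set[str], dry_run: bool = True) -> list[str]:
    """Return the FQCNs not in keep; delete their directories unless dry_run."""
    removed: list[str] = []
    for coll_dir in sorted(collections_root.glob("*/*")):
        if not coll_dir.is_dir():
            continue
        fqcn = f"{coll_dir.parent.name}.{coll_dir.name}"
        if fqcn in keep:
            continue
        removed.append(fqcn)
        if not dry_run:
            shutil.rmtree(coll_dir)
    if not dry_run:
        # Drop namespace directories that no longer contain anything.
        for ns_dir in list(collections_root.iterdir()):
            if ns_dir.is_dir() and not any(ns_dir.iterdir()):
                ns_dir.rmdir()
    return removed


if __name__ == "__main__":
    # Usage: python prune_collections.py [--apply] amazon.aws community.general ...
    args = sys.argv[1:]
    apply = "--apply" in args
    keep = {arg for arg in args if arg != "--apply"}
    root = Path(site.getsitepackages()[0], "ansible_collections")
    for fqcn in prune(root, keep, dry_run=not apply):
        print(("removed " if apply else "would remove ") + fqcn)
```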

> But rather than fetching content and then removing almost all of it
> again, I thought it'd be cleaner to start with ansible-core and then
> add collections to that.
> [...]

> It appears I now have a fully functioning ansible install, with just
> the collections that I need:
>
> [...]

Sivel already pointed you to the
https://github.com/ansible-community/ansible-build-data/ repository,
which contains (among other things) the collection versions that are
included in Ansible.

You can either use the data from there to compile your own collection
requirements file with the same versions, but just the collections you
are interested in, or you can use the same machinery we use to build
the Ansible package (https://github.com/ansible-community/antsibull/) to
create your own ansible Python package which contains exactly the
collections you want. (That's definitely more work, but in the end you
get a single tarball you can also install on air-gapped machines.
Depending on your use-case and the number of machines you want to
install this on, this can be useful.)

> Now, while all of this seems to work - I'm not sure if it is supposed
> to work....
> Are there any gotchas with this approach?

I would change the part of your approach which extracts the collection
versions to use the ansible-build-data repo instead, but besides that,
it's a totally valid approach!
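
A sketch of building the requirements file from ansible-build-data instead of a temporary full install. It assumes the `.deps` files (such as ansible-3.4.0.deps) contain one `name: version` pair per line, with keys that start with an underscore (for example `_ansible_version`) holding metadata rather than collections; `parse_deps` and `requirements_for` are hypothetical names.

```python
"""Build a trimmed requirements.yml from an ansible-build-data .deps file."""
from __future__ import annotations

import sys
from pathlib import Path


def parse_deps(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a .deps file into (metadata, collections).

    Assumes one `name: version` pair per line; keys starting with an
    underscore are treated as metadata, everything else as a collection.
    """
    meta: dict[str, str] = {}
    collections: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        (meta if key.startswith("_") else collections)[key] = value
    return meta, collections


def requirements_for(collections: dict[str, str], wanted: list[str]) -> str:
    """Pin each wanted collection to the version shipped in that ansible release."""
    missing = [name for name in wanted if name not in collections]
    if missing:
        raise KeyError(f"not part of this ansible release: {', '.join(missing)}")
    lines = ["collections:"]
    for name in sorted(wanted):
        lines += [f"  - name: {name}", f"    version: {collections[name]}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Usage: python deps_to_requirements.py ansible-5.3.0.deps amazon.aws community.general ...
    meta, collections = parse_deps(Path(sys.argv[1]).read_text(encoding="utf-8"))
    print(f"# collection versions from ansible {meta.get('_ansible_version', 'unknown')}")
    sys.stdout.write(requirements_for(collections, sys.argv[2:]))
```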

Cheers,
Felix


> Hi
>
> I'm setting out to upgrade ansible in some places here.
> We exclusively use ansible in dedicated venvs, using python3.9.
> Most of the installs are 3.4.0.
> Looking back over the last years I had the impression that doing 'pip install ansible' was increasingly slower.
> This makes sense as there is increasingly more to it. I just did a quick test and indeed it's been growing a lot:
>
> 153M 2.9.27
> 368M 2.10.7
> 402M 3.4.0
> 488M 4.10.0
> 508M 5.3.0

"pip install ansible" is now outrageously large. Most ansible servers
use only a select few, if any at all, out of the list of nearly 100
ansible collection modules, aka "ansible galaxy" modules, and those
are safer and more reliably installed individually on an as-needed
basis with the "ansible-galaxy" command. At this point, I suggest that
it be called "ansible_collections", and if anyone still wants
"pip install ansible" to work the same way, that should be set up as a
tiny dependency wrapper for ansible-core and for the more consistently
named ansible_collections.

For you, right now, rip out the "ansible" package, install only
"ansible-core" which actually contains all the ansible scripts and the
"ansible.*" python modules. "ansible" does not contain "ansible-core"
at all, it only lists a dependency on it and pulls in an installation
of ansible-core.

> Since we only use a fraction of these, I would like to have only those installed.

I doubt you use more than 3. Maybe the posix module?

> I couldn't find any clean way of uninstalling collections from lib/python3.9/site-packages/ansible_collections.

"pip3 uninstall ansible" is a good start.

Use "ansible-galaxy" only as needed. It's vastly simpler for
most of us to keep a small set of third-party modules tested and
coherently deployed than an agglomeration of 100 third-party modules.
I'm aware of several projects which have previously agglomerated such
large software suites into single tarballs: I used to help Akamai do
that back in the day, and rolling back the suite when one little bit
broke became a nightmare. So did testing the complete suite of
software, rather than individual components.

If you need working builds of the very large "ansible" bundle of
collections on a RHEL 7 or RHEL 8 system, take a look at my RPM
building tools at
https://github.com/nkadel/ansiblerepo/

Right.

Combining all this information, I do see some sort of bootstrapping problem - at least in our workflow.
Namely: where to start when selecting which version of which package to use.

Currently we start with ansible==3.4.0. This is because of some issues with mitogen but that’s not really relevant here.
This is part of our requirements.txt which also contains the other packages that we need. Currently the file looks like:

ansible==3.4.0
awscli
boto
boto3
cryptojwt
dnspython
gitpython
natsort
netaddr
packaging
pip-tools
pylint
pysaml2
python-ldap
ruamel.yaml
sshpubkeys
wheel
yamllint
yq

After pip installing this, the list of collections is exactly the list in https://github.com/ansible-community/ansible-build-data/blob/main/3/ansible-3.4.0.deps, which is expected.
But the venv also contains ansible-base 2.10.17, while https://github.com/ansible-community/ansible-build-data/blob/main/3/ansible-3.4.0.deps lists 2.10.9. This must be because ansible-3.4.0’s setup.py contains:

install_requires=[
    'ansible-base>=2.10.9,<2.11',
],

OK. I understand that.
Suppose I decide to forget about which ansible version to use, and instead use ansible-base.
What version should I pick? And more importantly, what versions of collections go with that?
I don’t see a way to find out which collections go with a specific version of ansible-base/core.
The repo at https://github.com/ansible-community/ansible-build-data is built with the ansible version as the starting point, so I can’t really use that.

It looks like I still have to use the ansible version as a starting point:

  1. Pick ansible x.y.z
  2. Look up which ansible-base/core version it needs (from install_requires in setup.py), and install that. Looking at that, I think I can take a shortcut there and just do:

     ansible 3 => ansible-base==2.10.*
     ansible 4 => ansible-core==2.11.*
     ansible 5 => ansible-core==2.12.*

  3. Based on the ansible version I picked, look up which collections go with that at https://github.com/ansible-community/ansible-build-data, and install those.
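
The second and third steps above could be sketched as follows. `CORE_SERIES` just encodes the shortcut table from the list (it is not an official mapping and would need a new entry for every major release), and `deps_url` assumes the repository keeps the `main/<major>/ansible-<version>.deps` layout seen in the ansible-3.4.0.deps link, fetched through GitHub's raw.githubusercontent.com host. Both function names are hypothetical.

```python
"""Core pin from the ansible major version, plus the matching .deps file."""
from __future__ import annotations

import sys
import urllib.request

# The shortcut table from the list above; not an official mapping.
CORE_SERIES = {
    3: "ansible-base==2.10.*",
    4: "ansible-core==2.11.*",
    5: "ansible-core==2.12.*",
}

BUILD_DATA_RAW = "https://raw.githubusercontent.com/ansible-community/ansible-build-data/main"


def core_requirement(ansible_version: str) -> str:
    """pip requirement for ansible-base/core that goes with an ansible x.y.z release."""
    major = int(ansible_version.split(".", 1)[0])
    try:
        return CORE_SERIES[major]
    except KeyError:
        raise ValueError(f"no core series recorded for ansible {ansible_version}") from None


def deps_url(ansible_version: str) -> str:
    """Raw URL of the .deps file listing the collection versions in that release."""
    major = ansible_version.split(".", 1)[0]
    return f"{BUILD_DATA_RAW}/{major}/ansible-{ansible_version}.deps"


if __name__ == "__main__":
    # Usage: python ansible_pins.py 5.3.0   (fetching the .deps file needs network access)
    version = sys.argv[1]
    print("pip requirement:", core_requirement(version))
    with urllib.request.urlopen(deps_url(version)) as response:
        print(response.read().decode("utf-8"))
```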

So starting with just an ansible-base/core version (and no ansible version), there is no way to look up which collection versions go with it.
I know that individual collections have specific requirements, so blindly installing the latest collections on, say, ansible-base 2.10.17 will result in breakage.

Is there maybe a repo similar to https://github.com/ansible-community/ansible-build-data that is keyed off ansible-base/core versions, and that lists the collection versions along with their own reported requirements?
(From https://docs.ansible.com/ansible/latest/user_guide/collections_using.html#installing-an-older-version-of-a-collection I see that collection versions can also be given with specifiers such as ==, >=, etc.)
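
On the question of reported requirements: a collection can declare the ansible-core versions it supports in the `requires_ansible` key of its `meta/runtime.yml`. The sketch below walks an `ansible_collections` directory and lists installed collections whose declared range excludes a given core version. It is deliberately simple and the helper names are hypothetical: it finds the key with a line match instead of a YAML parser, and it only understands comma-separated `==`, `!=`, `>=`, `<=`, `>`, `<` clauses over plain numeric versions (no pre-release suffixes); `packaging.specifiers.SpecifierSet` would be more robust. It only checks collections you already have; it does not search for the newest compatible release.

```python
"""Flag installed collections whose requires_ansible excludes a given ansible-core."""
from __future__ import annotations

import operator
import re
import sys
from pathlib import Path

OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}
CLAUSE = re.compile(r"^\s*(==|!=|>=|<=|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*$")
REQUIRES = re.compile(r"""^requires_ansible:\s*['"]?([^'"#]+?)['"]?\s*(?:#.*)?$""")


def as_tuple(version: str) -> tuple[int, ...]:
    """'2.12' -> (2, 12, 0), so bounds with fewer parts compare sensibly."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    for clause in spec.split(","):
        match = CLAUSE.match(clause)
        if not match:
            raise ValueError(f"unsupported specifier clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](as_tuple(version), as_tuple(bound)):
            return False
    return True


def requires_ansible(collection_dir: Path) -> str | None:
    """Read the requires_ansible value from meta/runtime.yml, if there is one."""
    runtime = collection_dir / "meta" / "runtime.yml"
    if runtime.is_file():
        for line in runtime.read_text(encoding="utf-8").splitlines():
            match = REQUIRES.match(line)
            if match:
                return match.group(1).strip()
    return None


def incompatible(collections_root: Path, core_version: str) -> list[str]:
    """Describe each collection whose declared range excludes core_version."""
    problems = []
    for coll_dir in sorted(collections_root.glob("*/*")):
        spec = requires_ansible(coll_dir) if coll_dir.is_dir() else None
        if spec and not satisfies(core_version, spec):
            problems.append(f"{coll_dir.parent.name}.{coll_dir.name} (requires_ansible {spec})")
    return problems


if __name__ == "__main__":
    # Usage: python check_requires.py <site-packages>/ansible_collections 2.10.17
    for problem in incompatible(Path(sys.argv[1]), sys.argv[2]):
        print(problem)
```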

thx

Whoah. There have been quite a few changes in the ansible package and
its related ansible-core requirement since then. If you're having
issues with one of the ansible collection modules, why not just stop
installing the ansible bundle? Install only the modules you need, and
select the specific version of components you may need.

I'm going to restrain my commentary on all the difficulties this
confusing split and renaming has generated, and simply say "don't
install the ansible package". Select and install only the necessary
ansible collection modules as needed for a much more stable, testable,
and maintainable server configuration. The current list of nearly 100
third-party modules is much, much larger than almost anyone needs and
introduces a great deal of testing difficulty.

> Right.
>
> Combining all this information, I do see some sort of bootstrapping problem - at least in our workflow.
> Namely where to start with selecting what version of which package to use.
>
> Currently we start with ansible==3.4.0. This is because of some issues with mitogen but that’s not really relevant here.
> This is part of our requirements.txt which also contains the other packages that we need. Currently the file looks like:
>
> ansible==3.4.0

> Whoah. There have been quite a few changes in the ansible package and
> its related ansible-core requirement since then. If you’re having
> issues with one of the ansible collection modules, why not just stop
> installing the ansible bundle?

As said earlier, we use 3.4.0 because that works with mitogen. We don’t have problems with collections.

> Install only the modules you need, and
> select the specific version of components you may need.

Yes, I know that, and this is what I pointed out.
The final question remains unanswered though: what versions of modules go with a specific ansible-base/core version?

If there is no such mapping, then I’ll have to manually go through the collections and decide what’s best.
After all, it is now only about 8 collections (in our case), instead of ~100.

Still interested in a better solution if that exists.