Intelligent loop for yum package installation

Hi all,

I am running into a performance issue when installing a bunch of packages through yum.

```yaml
- name: install additional requirements
  yum: name=$item state=present
  with_items:
    - vim-enhanced
    - readline
    - readline-devel
    - ncurses-devel
    - gdbm-devel
    - glibc-devel
    - tcl-devel
    - openssl-devel
    - curl-devel
    - expat-devel
    - db4-devel
    - byacc
    - sqlite-devel
    - gcc-c++
    - libyaml
```

It appears that doing this installs the packages one by one.

In my case, I have fewer than 30 packages to install, and I sat there for a good 10 to 20 minutes waiting for this to finish. I saw /usr/bin/repoquery run twice for each package, and using `rpm -qa --last` I can confirm that the packages were installed one by one.

Are there better ways to do this? There must be ways to install these packages in one fell swoop in Ansible.

Please share your thoughts on this. Or is this something that could be improved in a new or existing module?

Thank you very much,
Steven.

Hi Steven,

The yum and apt modules are smart enough to collapse with_items lists
into single transactions. But I've seen yum behave very slowly anyway:
a common problem is the "fastestmirror" plugin which ironically tends
to make things slower. Try disabling that plugin and see if that
speeds things up.
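A minimal sketch of disabling the plugin across hosts with Ansible itself, assuming the plugin's configuration sits at the usual EL location `/etc/yum/pluginconf.d/fastestmirror.conf` with an `enabled` option in its `[main]` section. For a quick one-off test, yum's `--disableplugin=fastestmirror` command-line option does the same for a single run.

```yaml
# Sketch: switch off yum's fastestmirror plugin on the managed hosts.
# Assumes the default EL plugin config path and its [main] section.
- name: disable the yum fastestmirror plugin
  ini_file: dest=/etc/yum/pluginconf.d/fastestmirror.conf section=main option=enabled value=0
```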

-Tim

I don’t think fastestmirror should take 10-20 minutes in any case.

It seems most likely that you were waiting for a yum lock to clear. Perhaps you had PackageKit installed? (I always remove PackageKit.)
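Removing PackageKit can be done with the yum module itself. A short sketch, assuming the package is named `PackageKit` as it is on EL; note that yum will also remove any packages that depend on it.

```yaml
# Sketch: remove PackageKit so it cannot hold the yum lock
# while Ansible's yum tasks are waiting for it.
- name: remove PackageKit
  yum: name=PackageKit state=absent
```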

This is unrelated, but I should point out that you are still using old-style variables.

Write it as follows instead:

```yaml
yum: name={{ item }} state=present
```

(Old-style variables will be removed in a future Ansible release; the date is not set yet.)

I should also point out that if you are still on Ansible 1.2.x or earlier, 1.3 includes some significant performance speedups for yum operations. Even so, 10-20 minutes would be unexpected unless you hit a very slow mirror.

For people doing datacenter updates, I always recommend creating a local mirror with reposync (which also keeps you from being surprised by upstream content changes and helps your machines stay consistent), or at least running a cache with something like apt-cacher-ng.
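On the client side, using such a mirror mostly comes down to a .repo file. A sketch, assuming the mirror has been populated with reposync (plus createrepo to generate the repository metadata) and is served over HTTP; the host name `mirror.example.internal`, the repo id `local-base`, and the CentOS 6 paths are hypothetical placeholders.

```yaml
# Sketch: point managed hosts at a local mirror.
# Host name, repo id and paths below are hypothetical placeholders.
- name: configure local yum mirror
  copy:
    dest: /etc/yum.repos.d/local-mirror.repo
    content: |
      [local-base]
      name=Local mirror of CentOS 6 base
      baseurl=http://mirror.example.internal/centos/6/os/x86_64/
      enabled=1
      gpgcheck=1
      gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
```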

I’m seeing the same behaviour. Yum tasks take ages to complete, especially with a list of packages.

I haven’t looked at the code, so I’m not sure why this is happening, but sampling `ps auwwx` every second shows the following when installing `expect`, for example:

root 32466 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --disablerepo=* --pkgnarrow=installed --qf %{name}-%{version}-%{release}.%{arch} expect
root 32471 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root 32471 11:08 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root 32471 11:08 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} --whatprovides expect
root 32481 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root 32481 11:08 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root 32481 11:08 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --qf %{name}-%{version}-%{release}.%{arch} expect
root 32491 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root 32491 11:08 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root 32491 11:08 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.i386
root 32501 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root 32501 11:08 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root 32501 11:08 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root 32501 11:08 0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-5.1.x86_64
root 32514 11:08 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root 32514 11:08 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root 32514 11:08 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root 32514 11:08 0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.i386
root 32527 11:09 0:00 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root 32527 11:09 0:01 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root 32527 11:09 0:02 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64
root 32527 11:09 0:03 /usr/bin/python -tt /usr/bin/repoquery --show-duplicates --plugins --quiet -q --pkgnarrow=updates --qf %{name}-%{version}-%{release}.%{arch} expect-5.43.0-8.el5.x86_64

So that’s 22 seconds for one package. If you have 20 or so packages, and assuming there are only a few versions of each in your repos, you are looking at several minutes of package checking.
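The 22-second figure can be reproduced from the listing above, assuming each line is one `ps` sample taken once per second: count the samples for each of the seven repoquery processes (by PID) and add them up. Scaling to 20 packages is only a rough estimate, since packages with more candidate versions trigger more `--pkgnarrow=updates` queries.

```latex
\begin{aligned}
t_{\text{pkg}} &\approx \underbrace{1}_{32466} + \underbrace{3}_{32471} + \underbrace{3}_{32481}
  + \underbrace{3}_{32491} + \underbrace{4}_{32501} + \underbrace{4}_{32514}
  + \underbrace{4}_{32527} = 22\ \text{s} \\
t_{20} &\approx 20 \times 22\ \text{s} = 440\ \text{s} \approx 7.3\ \text{min}
\end{aligned}
```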

While the initial repo queries make sense, I’m not sure I understand why it iterates through each version.

Please upgrade to Ansible 1.3 if you haven’t already; it invokes repoquery a lot less.

Michael,

That was with 1.3. In our common case, yum with a local cache would be fine, but this bug:

https://github.com/ansible/ansible/issues/2002

seems to be forcing our hand into using repoquery (which would probably be the better long-term solution, barring this performance issue).

Ok, can you just install yum-utils?
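repoquery is shipped in the yum-utils package on EL, so a task like this one, placed before the other yum tasks, makes sure it is present. A minimal sketch:

```yaml
# Make sure repoquery (from yum-utils) exists before other yum tasks run.
- name: install yum-utils for repoquery
  yum: name=yum-utils state=present
```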

Is there any news about this issue?

I just checked with Ansible 1.3.1, and it seems it is still installing every package individually, one by one, when doing:

```yaml
- name: Install basic packages
  yum: pkg={{ item }} state=latest
  with_items:
    - vim-enhanced
    - curl
    - git
    - java-1.7.0-openjdk
    - make
    - diffutils
    - man
    - policycoreutils
    - htmldoc
```

When I run this and check the processes in `top`, I can see yum installing every package individually.
If I instead use `shell` as follows:

```yaml
- name: Install basic packages
  shell: yum install -y vim-enhanced curl git java-1.7.0-openjdk make diffutils man policycoreutils htmldoc
```

I can see in `top` that a single yum command installs all the packages together.

Anyway, I timed how long both approaches take, on a really small virtual machine with Vagrant.
The yum module approach took 10 min 17 s and the shell approach 8 min 18 s. Depending on how many packages need to be installed, this can be a big deal, for example in an autoscaling environment.

It sounds like maybe you don’t have yum-utils installed?

Yes, I have it on both the guest and the host. The host runs Scientific Linux and the guest runs Debian, but yum-utils is installed on both.

Reading the 1.3.1 code, it appears that state=latest calls into the latest() function.

This function loops /per package/ to determine whether the package is installed and whether it needs to be updated or installed, and then executes that action (install or update). Again, the code does this /per package/.

The intelligence Ansible uses to collapse lists of packages into single actions operates at the level of a single /yum module/ invocation, not necessarily a single yum command within that module. If Ansible didn't collapse things, you'd get one round of SSH to the host, yum module execution, and results sent back, per package. Now you get one SSH connection, one yum module execution (with perhaps many yum executions within it), and one set of results.

Some work could be done in latest(), install(), and similar functions within the module to build up the set of packages to act on and run the command once with the full set, rather than once per package.
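Until something like that lands in the module, the same idea can be approximated at the playbook level. A rough sketch, where `packages` is a hypothetical list variable: `rpm -q` prints `package NAME is not installed` for each missing name, so the first task collects those names, and the second installs all of them in one yum transaction, running (and reporting a change) only when something was actually missing. It checks real package names only, so virtual provides and `state=latest` semantics are not handled.

```yaml
# Sketch of "build an install set, then run yum once".
# `packages` is a hypothetical list variable defined elsewhere in the play.
- name: work out which packages are not installed yet
  shell: rpm -q {{ packages | join(' ') }} | awk '/is not installed$/ {print $2}'
  register: missing
  changed_when: false

- name: install every missing package in a single yum transaction
  shell: yum -y install {{ missing.stdout_lines | join(' ') }}
  when: missing.stdout_lines | length > 0
```

For example, with `packages: [vim-enhanced, curl, git]` in the play's `vars`, a host that already has all three skips the second task entirely.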

It also appears that, the way things are now, a yum module execution with a list of packages can have a status of /both/ failed and changed. I'm not sure whether that is noteworthy, or whether it happens in other modules too.

-jlk

Right, I’m talking about the individual SSH steps being batched. Sorry for the confusion.

Attempts to improve this would be welcome.

Yeah, I think what I observed matches exactly what you described.
As a suggestion for solving this particular problem: maybe it isn’t really necessary to check whether a package is installed, since yum itself does nothing if it already is. I suppose it is a matter of time; I don’t know which takes longer, checking whether a package is already installed, or just trying to install it and letting yum do nothing if it is. As far as I know, when yum fails to find a package it exits with “1”, but trying to install an already installed package exits with “0”. The problem is that with a list of packages, if some already exist (or get newly installed) and some fail (don’t exist), yum exits with “0”.
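If one does go the single-transaction route, the exit-code problem can be worked around by inspecting yum's output instead of relying on the return code alone. A sketch, with `packages` as a hypothetical list variable; the matched strings `No package` and `Nothing to do` are assumptions based on the messages yum usually prints and may differ between yum versions.

```yaml
# Sketch: one yum run for the whole list, still failing on unknown names.
# `packages` is hypothetical; the matched message texts are assumptions.
- name: install packages in one yum transaction
  shell: yum -y install {{ packages | join(' ') }}
  register: yum_result
  failed_when: "yum_result.rc != 0 or 'No package' in (yum_result.stdout + yum_result.stderr)"
  changed_when: "'Nothing to do' not in yum_result.stdout"
```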

Anyway, thanks guys!

I think it’s important for resources to be idempotent and be able to report change detection appropriately, and not attempt operations unless they need to be attempted.

I understand your point, and change detection is useful. Still, I can think of some scenarios where I would prefer installation speed over the change report. In the end, from my sysadmin point of view, all I really care about is that certain packages are installed; knowing whether Ansible just installed them or they were already on the system is secondary for me.
I haven’t actually read the code, so maybe I am wrong, but I don’t think yum itself tries to install a package that already exists. I suppose yum checks first and attempts the install afterwards. If that is the case (again, I don’t really know), wouldn’t it be redundant for Ansible to check first?

I hope to find time to read the code and help in some way if I can. Package installation is so common that speed can be crucial.

While it is of course true that yum won’t reinstall things, as a general principle of idempotency we are pretty firmly set on only attempting the underlying system operations for changes that actually need to occur, and this also lets us get finer-grained data out of things.

I’m all for considering improvements, but the yum module was arrived at through a LOT of work from folks like Seth over a long period, and since yum_rhn_plugin has frequently been a total I feel it’s probably best not to tempt fate.

That being said, I’m more than happy to entertain patches that come with very extensive testing on EL 5 and 6.

Given what has been said in this topic, I believe this will only encourage people to avoid the yum module for operations on lists of RPM packages. From now on, I might just do this instead of using the yum module:

```yaml
- name: install a bunch of rpms
  shell: yum -y install abc.rpm abdddd.rpm …
```

That beats sitting around waiting longer than necessary for the task.

Happy Friday,
Steven.

I don’t believe it discourages anyone; there are lots of users of the yum module.

That all being said, we’re quite open to improvements and pull requests.

I just ran into this issue, and found this topic :frowning:

I have decided to use state=installed instead of state=latest, and then run yum update separately through shell.

It took me almost 15 minutes to do state=latest for 20 packages. With state=installed, it took 1 to 2 minutes.
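The split described above looks roughly like this (a sketch; the package names are taken from earlier in the thread). The `shell` task will report `changed` on every run, since Ansible cannot tell whether `yum update` actually updated anything.

```yaml
# Sketch: presence-only check per package, then one system-wide update.
- name: make sure the base packages are present
  yum: name={{ item }} state=installed
  with_items:
    - vim-enhanced
    - curl
    - git

- name: update everything in a single yum run
  shell: yum -y update
```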

This gets even worse if you include EPEL or RPMforge in the repo check.

Very curious that it would take a minute per package though.

Sounds like something is wrong upstream.