YUM: Much slower in ansible than on the cli

The yum module looks heaps slower than the actual yum command.

For instance, when I check whether a set of three packages is installed in ansible (timestamps are mine):

[19:44:05] TASK: [common | Install presto, fastdownloader and yum-fast-downloader] *******
[19:44:56] ok: [someserver] => (item=yum-presto,yum-fastestmirror,yum-fast-downloader)

But if I run:

time yum install yum-presto yum-fastestmirror yum-fast-downloader --enablerepos=personalrepo,rpmforge

It runs in:

First time:
real 0m7.956s
user 0m0.829s
sys 0m0.190s

Second time:
real 0m5.031s
user 0m1.136s
sys 0m0.269s

If I run the previous command from ansible:

[20:27:21] TASK: [common | Install presto, fastdownloader and yum-fast-downloader] *******
[20:27:28] changed: [someserver]

Any reason why ansible’s yum module runs that much slower? I have tested on 1.4.5.

I notice that --enablerepos should be --enablerepo - no worries, I tested with the right flags.

Also, just to make sure it’s not ssh-related, I also tried:

time ssh 123.1322.0.453 "sudo yum install yum-presto yum-fastestmirror yum-fast-downloader --enablerepo=personalrepo,rpmforge"

Which gave me times comparable to running it straight on the server.

It runs some extra ops to ensure it doesn’t need to run change-inducing commands up front.

However I would disagree that 20% is “much slower”.

Do make sure you have “fastest mirror” disabled, BTW. The module usually isn’t faster.

Local mirroring is also always a fantastic idea! Check out “yum reposync”.

I see now that you said 50 seconds above and I misread. In your case this is definitely slower than the actual command by a very decent margin. I’m still not seeing this myself.

If you can benchmark where it is spending its time that would be appreciated.

I noticed you were installing fastest-mirror though, which you probably don’t want to do :slight_smile:

I’ll remove fastest-mirror, it indeed looks like it made things slower (this is in fact what I was adding to my stack as an experiment to make YUM faster - at first I thought it was a purely YUM-related issue).

I will try to find some information as to how to benchmark, but would you have any recommendation as to how I should proceed?

./hacking/test-module in the checkout is pretty useful for things like this.

Do a checkout on a machine with yum; even inserting some basic print statements or logging could be a useful start to finding out which functions or commands are taking the most time.
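For what it's worth, here is a minimal sketch of the kind of logging meant above: a decorator that logs how long each call takes. This is not code from the yum module. `timed`, `timings` and `slow_lookup` are made-up names for illustration, and `time.perf_counter` assumes Python 3.3 or newer (on an older interpreter `time.time()` would be the substitute).

```python
import functools
import logging
import time

# logging.basicConfig writes to stderr by default.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("yum-timing")

# Collected (function name, seconds) pairs, so totals can be inspected afterwards.
timings = []


def timed(func):
    """Wrap func so that every call logs its own duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions, so the exit is always recorded.
            elapsed = time.perf_counter() - start
            timings.append((func.__name__, elapsed))
            log.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper


# Hypothetical stand-in for whichever function in the checkout you want to measure.
@timed
def slow_lookup(name):
    time.sleep(0.05)
    return name.upper()


if __name__ == "__main__":
    slow_lookup("yum-utils")
```

Decorating the suspect functions in a local checkout and then running it through ./hacking/test-module should show both entry and exit for each call, which is the part that plain "entered function X" traces miss.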

I think I found the issue - seems to be related to repoquery

Following tests were done as suggested with the test-module on the host

With repoquery:
real 0m21.014s
user 0m4.094s
sys 0m1.337s

Without repoquery:
real 0m8.130s
user 0m1.914s
sys 0m0.449s

I guess it is then no longer an ansible issue (never really was), but has anyone experienced this in the past?

Quick note. My playbooks break if I do not have repoquery… the code seems to suggest this is optional, but I just found a case, for instance, where checking for an already installed package gave me a recursion error, while another fresh install failed on “failed to parse: SUDO-SUCCESS-whatever”

I am away from my Ansible machine and can’t test, however in my playbook the first thing I do is update yum and yum-utils to the latest version, as I had similar issues with older releases.

+1

Also, what (remote) OS is this?

We’ve had this discussion before, where we were pretty sure yum-utils was only excluded in @core installs. That might not be true though; need to check.

I have no problem making the yum module self-add yum-utils if not already there if it resolves problems in those environments as it should be there anyway.

I’m manually adding yum-utils in my RedHat installs as I am performing a minimal install. I figured that this was my fault for trying to install as little as possible. It might make some sense to document that dependency in the yum module page though.

Adam

I just happened to add some crude log traces to my yum module last night to see if I could figure out what it’s doing.

On RPMs that are already installed, it will use up all the CPU/IO for a while; on a small instance it can take a long time. The instance I was testing with was an m1.small, so it’s slow anyway, but for just testing whether an RPM is already installed, it’s pretty intense. The what_provides() call appeared to be the worst; however, I didn’t log the exit time of the function to get a good measurement. I’m also not sure why it would need to call that if I just gave it an RPM name instead of a path to look up. This RPM is from an onsite repo cache, and we do run “yum clean all” beforehand…

2014-02-19T07:21:07.245573+00:00 myserver-01 ansible-yum: Invoked with name=MyRPM list=None disable_gpg_check=False conf_file=None state=latest disablerepo=None enablerepo=None
2014-02-19T07:21:07.245761+00:00 myserver-01 ansible-yum: ensure() MyRPM
2014-02-19T07:21:07.381131+00:00 myserver-01 ansible-yum: latest() ['MyRPM']
2014-02-19T07:21:07.381283+00:00 myserver-01 ansible-yum: is_installed() MyRPM
2014-02-19T07:21:07.393524+00:00 myserver-01 ansible-yum: what_provides() MyRPM
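To read a trace like the one above, a small sketch that turns the timestamps into gaps between consecutive entries can help. It assumes each line has the form "timestamp host tag: message" as in the trace; the first line's message is shortened here, and `parse_line` and `step_gaps` are just illustrative names.

```python
from datetime import datetime

LOG_LINES = """\
2014-02-19T07:21:07.245573+00:00 myserver-01 ansible-yum: Invoked with name=MyRPM
2014-02-19T07:21:07.245761+00:00 myserver-01 ansible-yum: ensure() MyRPM
2014-02-19T07:21:07.381131+00:00 myserver-01 ansible-yum: latest() ['MyRPM']
2014-02-19T07:21:07.381283+00:00 myserver-01 ansible-yum: is_installed() MyRPM
2014-02-19T07:21:07.393524+00:00 myserver-01 ansible-yum: what_provides() MyRPM
""".splitlines()


def parse_line(line):
    """Split one trace line into (timestamp, message)."""
    stamp, _host, _tag, message = line.split(" ", 3)
    # Keep only the first 26 characters, dropping the "+00:00" offset;
    # every line in this trace carries the same offset.
    when = datetime.strptime(stamp[:26], "%Y-%m-%dT%H:%M:%S.%f")
    return when, message


def step_gaps(lines):
    """Return (message, seconds until the next trace line) pairs."""
    events = [parse_line(line) for line in lines]
    return [
        (message, (next_when - when).total_seconds())
        for (when, message), (next_when, _next_message) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for message, seconds in step_gaps(LOG_LINES):
        print("%-32s %.6fs" % (message, seconds))
```

Note that the last entry, what_provides(), gets no duration at all because nothing is logged after it. That is exactly the missing exit time mentioned above, so the expensive step stays invisible until a trace is added on the way out of the function.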

@cove_s nice :slight_smile: I didn’t get to go down that much, but that reflects pretty well what I am experiencing.

@Adam @Michael at least for updates, NOT using repoquery made things faster for me. What I did is change the code for the yum module to undefine the repoquery path.

Some feedback

I tried a few more things to make it perform better, including mirror repositories, but the fact that repoquery is forced on the user is perhaps limiting… any way to make that optional instead of using it whenever it is present?

We’ve been through this discussion a bit before, and we believe the repoquery needs to be there.

I’m a bit more curious about why you are spending so much time in the operation and most people are not.

When using yum in any sort of important setup, I almost always create a yum mirror with reposync, etc, and even in our testing, we’re not seeing any major timing issues with the yum options at all.

yum_rhn_plugin can sometimes be a very very different story (hence even more reason to mirror content).

For yum, I disable fastestmirror, set hard-coded repo sites, then configure an http_proxy.
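As a rough sketch of the first step only (disabling the plugin), assuming the plugin config lives in /etc/yum/pluginconf.d/fastestmirror.conf with an `enabled` option under `[main]`, something like the following would flip the flag. The sample text is made up, `disable_plugin` is a hypothetical helper, and configparser does not preserve comments when writing the file back out.

```python
import configparser
import io

# Made-up sample of a plugin config file; a real file may contain more options.
SAMPLE = """\
[main]
enabled=1
verbose=0
"""


def disable_plugin(conf_text):
    """Return conf_text with enabled=0 under [main]; comments are not preserved."""
    # interpolation=None so that any '%' in existing values is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "enabled", "0")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(disable_plugin(SAMPLE))
```

In a playbook the same change is probably better done with a config-editing task than with a script; the point here is only which setting is being changed.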

For apt, I set hard-coded repo sites, then configure an http_proxy.

This seems much lighter weight than cloning an entire OS distribution, when most packages aren't going to be installed anyway.

ps: if you leave fastestmirror enabled, then the download site will change randomly, so a proxy is worthless. Also, the centralized site that fastestmirror talks to seems to be highly unstable, and returns spurious errors, which cause the ansible yum module to abort, but only sometimes. This isn't a bug in ansible, but in the yum python module that ansible uses.

Yep, that can be a valid approach.

“This seems much lighter weight than cloning an entire OS distribution,”

It’s much smaller than the apt repo, however.

The other bonus is being able to control the package versions on all of your hosts and update when you choose while still coding only state=latest in the ansible content.

Debian has snapshot support, so I could always add that line to sources.list, and then configure version pinning, if I really wanted that.

Was talking about yum.

Sure, but it's easier in Debian, as snapshot/backports are also mirrored.