> Hi everyone,
>
> I did some testing with this (255 systems in my case), default
> paramiko type, default number of forks.
My first "tests" (more like 'loose observations') were jumping between
<100 hosts and 300 hosts with 30 forks, and I noticed non-linear
scaling. The total time divided by hosts would jump from ~9 seconds to
~45 seconds. FWIW, this was before 0.4's release, and it feels (again,
with non-rigorous tests) that Ansible handles >100 hosts better now.
Unsure. The non-linear jump could be because you were finally hitting swap in the larger case.
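For what it's worth, the per-host figure being compared here is just total run time divided by host count. A rough sketch in Python, where the function name and the two totals are made up and picked only to reproduce the ~9 and ~45 second numbers mentioned above:

```python
def seconds_per_host(total_seconds, host_count):
    """Average wall-clock seconds spent per host in one run."""
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    return total_seconds / host_count

# Hypothetical totals, chosen to match the ~9 s and ~45 s per-host figures.
small = seconds_per_host(900.0, 100)
large = seconds_per_host(13500.0, 300)

# With linear scaling this ratio would stay close to 1.
slowdown = large / small
print(f"{small:.0f} s/host vs {large:.0f} s/host ({slowdown:.0f}x per host)")
```

A ratio well above 1 is what would point at something like swapping rather than plain per-host cost.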
> Easiest for this is to remove ohai and facter -- and maybe make it
> configurable in the playbook what facts you want. (Don't think you
> answered whether ohai and/or facter were installed on the nodes...
> would love to know!)
Sorry about that. I don't have facter or ohai installed.
Removing them would have cut RAM by 2/3 had they been installed, but you still have a large number of hosts, so the RAM consumption you saw still seems to roughly track. And yes, things are slightly rearranged now, so that could also have been a factor.
> The way to control this consumption, I'm fairly certain, is to
> generate less facts in this case, and maybe make it controllable, so
> maybe you could choose by telling the setup module to generate facts
> only in certain categories. (Good idea? I'm going to open a
> ticket...)
I like that. FWIW I have a few playbooks that aren't using facts, but
that could have been because I was still learning Ansible, and not
because it was the best way to solve a particular problem. This might
be less important for mature Ansible users :3.
Yep.
I very seldom have any use for facts, because I usually know top down what something should be, and don't need much info off the server.
I tend to think the central server, in most cases, should be as close to totally authoritative as possible.
They are useful where one server needs to know the IP addresses of all the other servers, or something like that, where you only have the hostnames, but a lot of the data we return is not useful most of the time.
In those cases, I just want the network info.
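To make the "just the network info" idea concrete, here is a minimal sketch of filtering a fact dictionary by key prefix. This is not how the setup module works today. `select_facts` is a hypothetical helper, and the fact names are illustrative only:

```python
def select_facts(facts, wanted_prefixes):
    """Return only the facts whose key starts with one of the wanted prefixes."""
    prefixes = tuple(wanted_prefixes)
    return {key: value for key, value in facts.items() if key.startswith(prefixes)}

# Illustrative fact names only; real setup output differs per host.
sample_facts = {
    "ansible_hostname": "web01",
    "ansible_eth0_ipv4_address": "10.0.0.5",
    "ansible_eth0_macaddress": "00:11:22:33:44:55",
    "ansible_kernel": "2.6.32",
    "ansible_memtotal_mb": 2048,
}

# Keep only the (assumed) network-related keys.
network_only = select_facts(sample_facts, ["ansible_eth"])
print(sorted(network_only))
```

Filtering by prefix is just one possible way to define a "category"; an explicit list of fact names per category would work as well.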
Or maybe it's style choice. If I'm writing a playbook for 800 hosts,
it behooves me to not use setup/fact generation.
I don't think they'll hurt you if you just gather a few of them, but yeah. If you don't need them, you don't need them.
It might be interesting for you to fork the setup module and pare it down, and see how that affects your environment. I'll do the same just for a quick test, and if that turns out to help, it can be made configurable.
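One cheap way to compare a full and a pared-down fact set before touching the module is to look at the serialized size per host and multiply by the host count. A rough sketch follows. The helper names and the stand-in data are hypothetical, and serialized JSON size is only a proxy for what the Python process actually holds in RAM:

```python
import json

def payload_bytes(facts):
    """Size in bytes of one host's facts when serialized as JSON."""
    return len(json.dumps(facts, sort_keys=True).encode("utf-8"))

def estimate_total_bytes(facts, host_count):
    """Rough total if every host returns a similarly sized fact set."""
    return payload_bytes(facts) * host_count

# Stand-in data: 300 made-up facts versus a pared-down 20 of them.
full = {f"fact_{i}": "x" * 40 for i in range(300)}
pared = {key: full[key] for key in list(full)[:20]}

hosts = 255  # host count from the test run mentioned earlier in the thread
print("full :", estimate_total_bytes(full, hosts), "bytes")
print("pared:", estimate_total_bytes(pared, hosts), "bytes")
```

Real numbers would come from dumping the actual setup output for one host and feeding it through the same functions.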
I really can't think of anything else building up data that would cause any problems.