Ansible and memory usage

I've been testing Ansible 1.4 and devel on my larger environments, and I've run myself out of memory a few times (8 GB on the VM).

So I started watching memory consumption in various scenarios and this is what I observed:

Inventory of 1309 hosts
Forks of 500

6.5G w/ ssh_alt + CP on devel (2m33s)
5.1G w/ ssh + CP on devel (5m40s)
4.5G w/ ssh + CP on 1.2.3

4.7G w/ ssh_alt and no CP on devel (2m53s)
3.7G w/ ssh and no CP on devel (6m27s)
3.3G w/ ssh and no CP on 1.2.3 (6m32s)

Forks of 1400

7.4G w/ ssh_alt + CP on devel (3m4s)
6.2G w/ ssh_alt and no CP on devel (3m16s)

Killing off all the open ssh control sockets (I have mine set to persist for 300s) frees up roughly 1 GB of RAM.
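For anyone wanting to do the same cleanup, here's a rough sketch. It assumes Ansible's default ControlPath directory (~/.ansible/cp) and uses a placeholder hostname; adjust both to match your own ssh_args / control_path settings:

```shell
# Sketch only: assumes the default ControlPath directory (~/.ansible/cp).
CP_DIR="$HOME/.ansible/cp"
count=0
for sock in "$CP_DIR"/*; do
    [ -S "$sock" ] || continue          # skip if no sockets exist
    # -O exit asks the persistent ssh master behind this socket to quit;
    # the trailing hostname is required by ssh but otherwise unused here.
    ssh -o ControlPath="$sock" -O exit placeholder-host 2>/dev/null || true
    count=$((count + 1))
done
echo "asked $count control masters to exit"
```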

When I switched from a ~1,300-host inventory to a 5K+ host inventory, I quickly ran myself out of memory when making use of ControlPersist. With ControlPersist, memory consumption was a function of the number of hosts in the play, rather than the number of forks in use at any one time. Each ControlPersist socket consumed roughly a megabyte of memory, hence killing ~1,300 sockets freed up about 1 GB of RAM.

When not using ControlPersist, memory usage scales right along with the number of forks and is much more predictable.

Since there does not appear to be a way to limit the number of control sockets created, and the time difference between ssh_alt with and without CP is not egregious, I would recommend that those of you with larger environments avoid CP when dealing with a large inventory set.
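One way to do that, as a sketch: override ssh_args in ansible.cfg's [ssh_connection] section so no persistent control sockets get created (confirm the exact defaults for your version before copying this):

```ini
[ssh_connection]
; Replace the default ssh_args (which enable ControlMaster/ControlPersist)
; with options that create no persistent control sockets.
ssh_args = -o ControlMaster=no -o ControlPersist=no
```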

-jlk

Hi Jesse,

It’s pretty awesome to see that ControlPersist is becoming less important (and, I’d venture, that accelerate mode is less important in many cases, given ssh_alt).

So it sounds like ControlPersist accounts for a share of the usage, but to avoid too much cause for concern, everyone should note that most of this comes from a very high --forks value, not from ControlPersist.

For those that aren’t quite aware (this is a slight oversimplification), --forks 500 means 500 copies of the program, and of course in many cases people are going to be using rolling updates, so this won’t be the case.

In any event, memory usage is coming more from the --forks than the inventory size, I’d reason.

Fact caching, which is being implemented for this release, will also allow facts to live outside of RAM, which will further decrease this a great deal. I haven’t done experiments, but if the cache is not holding a further copy in memory, I would hope to see this be a quarter of the figures above.

So, yes, the combination of large --forks and a reasonable inventory size does push RAM requirements, but the brunt of it should be from --forks and largely alleviated by fact caching, once properly implemented.

Other strategies for people with large environments include ansible-pull (pull mode), using intermediate hosts, or simply splitting different portions of the environment into multiple runs, for instance with --limit.
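As a sketch of the multiple-runs approach (the playbook name and group names here are hypothetical, just to show the shape of it):

```shell
# Run each slice of the environment separately so only part of the
# inventory is connected to at any one time.
ansible-playbook site.yml --limit 'dc1'
ansible-playbook site.yml --limit 'dc2'
```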

Anyway, it’s good data – just want folks to understand the underlying parameters of the test and that this is generally not characteristic of most deployment architectures.

–Michael

I agree with the above, but I did want to clarify something.

If your task set is fast enough, you can run into memory issues using ControlPersist even with a small number of forks. The default lifetime of a control socket is 60s, but if the tasks are fast enough you can open connections to a LOT of machines in 60 seconds, creating a lot of sockets, each taking up RAM.

In my case I had overridden the socket lifespan, setting it to 300s to ease repeat testing and long playbooks. This added even more memory pressure, since the sockets were not expiring quickly.
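For reference, an override like mine might look something like this in ansible.cfg (a sketch; the full default ssh_args vary by version, so merge with your own):

```ini
[ssh_connection]
; Persist master connections for 300s instead of the 60s default,
; which keeps sockets (and their memory) around much longer.
ssh_args = -o ControlMaster=auto -o ControlPersist=300s
```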

My case is absolutely not characteristic of most deployment architectures, but I did want to get the data out there for anybody who is working with an environment similar to mine.

-jlk

Yep, that’s fair.

It’s probably also a bit of a question as to what SSH versions and setups are being used in some cases.

If it turns out that we find ssh_alt is really, really good, we might make this configurable once again, as opposed to something that is auto-added for you.

Though it does seem it’s environment specific as to when it provides more or less advantages.