I have a pretty simple question. What is the expected memory usage for a simple playbook that performs a yum or rpm install and pushes a config file?
Ansible Playbook:
It could really be anything! We have tried just a ping!
We see that if we run any playbook, we eventually consume all of the system's memory; oom-kill kicks in and kills the running ansible-playbook process.
Is this normal behavior? Is there something simple in tuning that I am missing?
The Tower guide suggests 4 GB per 100 forks.
I read through the scaling guide, and it didn’t mention anything for memory tuning.
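For reference, I assume the knob that guide is talking about is the fork count in ansible.cfg; a minimal example (10 is just an illustrative value, the default is 5):

[defaults]
forks = 10

Each fork is a separate worker process with its own copy of the task data, so fewer forks should at least bound how much memory is in use at once, though it would not explain a leak on its own.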
Any thoughts or criticisms would be GREATLY appreciated. I love me some ansible and I want to start handing off large jobs to our operations team soon!
Thanks in advance!
I am sadly not a Tower user, yet. It feels like a memory leak, and I have installed ansible on both RHEL 6.6 (via rpm) and RHEL 7.1 (via pip) with the same results. This is interesting only because I am using the default Python versions, 2.6 and 2.7 respectively.
When I get back to work tomorrow I will report my Python and ansible versions more specifically.
I have played with the ansible Python module and seen the same results, so I started presenting the inventory in chunks of 10 servers at a time. Knowing that I can reproduce the breakage from Python code is good news. I just have to learn how to trace memory allocation in Python. Learning anything in Python is always fun work for me.
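Something like this is probably where I will start, just watching peak RSS with the stdlib resource module (nothing ansible-specific, and ru_maxrss is reported in kilobytes on Linux):

import resource

def report_peak_rss(label):
    # Peak resident set size of this process so far (KB on Linux)
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('%s: peak RSS %.1f MB' % (label, peak_kb / 1024.0))

report_peak_rss('before inventory load')
# ... load the inventory / run the play from the Python API here ...
report_peak_rss('after inventory load')

(Python 3.4+ has tracemalloc for per-allocation detail, but that is not an option on the stock 2.6/2.7 interpreters.)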
In my experience, OOMs happen when the inventory is large, especially when lots of variables in the inventory are combined with lots (several hundred) of hosts.
Not very scientific, but to get a rough idea of the size of your inventory, could you show the output of
$ time (ansible all -m debug -a var=hostvars |wc -l ; ansible all --list-hosts |wc -l)
19788298
1046
real 2m3.222s
user 6m54.500s
sys 0m7.772s
Mine yields 19,788,298 hostvars lines for 1046 hosts.
That used to be a lot more, before I made some optimisations on a dynamic inventory script that calculated a bunch of variables (a rough sketch of the change follows the output below):
$ ansible all -m debug -a var=hostvars |wc -l ; ansible all --list-hosts |wc -l
95924259
1037
(that last run took 17 minutes on an i7 quad-core laptop)
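The shape of the optimisation was mostly about not emitting a full set of computed variables per host. A stripped-down sketch of a dynamic inventory script (the group and variable names are made up for illustration):

#!/usr/bin/env python
# Sketch of a dynamic inventory: shared values live in group vars once,
# instead of being duplicated into every host's hostvars.
import json

inventory = {
    'webservers': {
        'hosts': ['web01', 'web02'],
        # One copy for the whole group...
        'vars': {'ntp_server': 'ntp.example.com'},
    },
    # _meta.hostvars lets ansible skip calling this script with
    # --host for every single host.
    '_meta': {
        'hostvars': {
            'web01': {'rack': 'a1'},  # only truly per-host data here
            'web02': {'rack': 'b2'},
        },
    },
}

print(json.dumps(inventory))

Every key you keep out of per-host hostvars is one less value that gets copied around for every host in every play.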
Yes, this appears to be similar to the bug SVG pointed out, which I've tracked down to the way Python queues use pickle to serialize dictionary data (the serialized data can be 200x larger than the original dictionary). I'm currently working on a solution to this, and hope to include it in the next beta round.
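To illustrate the mechanism (a toy sketch, not the actual Ansible code path): data that is shared by reference in memory gets pickled independently for every per-host queue message, so the total serialized size scales with the host count:

import pickle

shared = {'big': ['x' * 100] * 1000}                         # one object in memory
per_host = dict(('host%d' % i, shared) for i in range(200))  # 200 references to it

# Pickling the shared dict once:
one_copy = len(pickle.dumps(shared))
# Pickling each host's payload separately, as separate queue messages would:
separate = sum(len(pickle.dumps(v)) for v in per_host.values())
print(one_copy, separate, separate // one_copy)  # roughly a 200x blowup here

In this toy example the blowup factor is just the number of hosts referencing the same data.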