Getting an error...

I’m getting:

e/meggleston/.ansible/tmp/ansible-tmp-1670263852.31-8085-175291763336523/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql602.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql602.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/f012ac57b9 pa2udtlhsql602.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263852.91-8148-50804275661258/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql1023.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql604.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql1023.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/754f3010c5 pa2udtlhsql1023.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263852.31-8085-175291763336523/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql604.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/09c53a2792 pa2udtlhsql604.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263853.52-8164-79599240649234/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql1020.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql1020.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/3301cea578 pa2udtlhsql1020.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263852.15-8057-21113899783559/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql602.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql602.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/f012ac57b9 pa2udtlhsql602.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263852.91-8148-50804275661258/ > /dev/null 2>&1 && sleep 0'"'"''
<pa2udtlhsql1022.prod.harmony.aws2> ESTABLISH SSH CONNECTION FOR USER: None
<pa2udtlhsql1022.prod.harmony.aws2> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/a501b68168 pa2udtlhsql1022.prod.harmony.aws2 '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670263852.07-8072-136961495388876/ > /dev/null 2>&1 && sleep 0'"'"''
ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
the full traceback was:

Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 123, in <module>
    exit_code = cli.run()
  File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 128, in run
    results = pbex.run()
  File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 169, in run
    result = self._tqm.run(play=play)
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 282, in run
    play_return = strategy.run(iterator, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 311, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 390, in _queue_task
    worker_prc.start()
  File "/usr/lib/python2.7/site-packages/ansible/executor/process/worker.py", line 100, in start
    return super(WorkerProcess, self).start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

When I run a stupid playbook with the command: ansible-playbook -vvv 1.yml

for the playbook:

How many hosts are in your inventory?

5709

Have you changed any defaults for “strategy” or “forks”?

Also, I see your ssh is configured for “-o ControlMaster=auto -o ControlPersist=60s”. I’m not sure how many hosts you’re caching connections for during any given 60-second window, or how much memory that would eat, but it may be a significant factor.
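
If the connection caching does turn out to matter, it can be disabled from ansible.cfg rather than per-host ssh config; a minimal sketch (the option values here are illustrative, not a recommendation):

[ssh_connection]
# No ControlMaster/ControlPersist: every task pays for a fresh ssh
# handshake, but no multiplexed connections linger between tasks.
ssh_args = -C -o ControlMaster=no -o ConnectTimeout=10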

I’ve not changed “strategy”, but I did change “forks” from 5 to 50.
I have copied /etc/ansible.cfg to ~/.ansible.cfg and changed forks = 50, inventory = $HOME/src/ansible/inventory, and log_path = /tmp/${USER}_ansible.log.
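
So the [defaults] stanza of my ~/.ansible.cfg is essentially:

[defaults]
forks = 50
inventory = $HOME/src/ansible/inventory
log_path = /tmp/${USER}_ansible.log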

I’m not particularly fluent in instrumenting resource consumption, but I’m going out on a limb and guessing that 50 or so ssh connections are a lot more lightweight than 50 forks of ansible-playbook. So, ignoring ssh as a possible resource limit for the moment, try changing forks back to 5 and running your playbook. At the same time, in another window, monitor (in a way to be determined by you) resource consumption. I’d expect it to work with 5 forks, just not as fast as with more forks.
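
You can also override forks per run instead of editing the config; ansible-playbook’s -f/--forks flag does that:

ansible-playbook -f 5 -vvv 1.yml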

If it does work, then try it again with, say, 10 forks and compare resources during that run to the 5-fork run. I expect this to also work, barely, and that you’ll be almost out of … something. But you’ll also have a much better picture of where the walls are in this black box.
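
For the monitoring window, something as blunt as this would show where the memory is going (just one sketch; use whatever tooling you prefer):

# top memory consumers plus overall usage, refreshed every 5 seconds
watch -n 5 'ps -eo rss,pid,comm --sort=-rss | head -15; echo; free -m'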

I changed forks back to 5 (commented out my change) and I still get the out-of-memory error. I removed all hosts that are in AWS, so I’m not using the ssh(1) proxy. My inventory is down to 4400 hosts. I wonder what’s eating the memory? Any ideas?

Current output (the tail of a -vvv run; it ends in the same traceback):
82"]}, "sectors": "7812935680", "start": "2048", "holders": , "size": "3.64 TB"}}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": "PERC H730P Mini", "wwn": "0x61866da06192eb0024e6a07712d7ee30", "holders": , "size": "3.64 TB"}, "dm-4": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "104857600", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-optvol", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52yneJbOfXRJ6QddDY581MzfUj6Ai4MOtle8"], "uuids": ["51166620-cd67-4954-8b5f-cf91926b036d"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "50.00 GB"}, "dm-5": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "62914560", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-tmpvol", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52ynDjjkCPQW51kaWpzKqwJkcPy2qbRW0Fxm"], "uuids": ["38f0cd51-d7e1-4bca-a062-1b39ede2fed2"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "30.00 GB"}, "dm-2": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "125829120", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-varvol", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52ynRn8k3kCCl3ICeXjPbpYKBa1d9B7s2bhs"], "uuids": ["5cb9ffc3-fd77-4906-98d7-edb27aa63f40"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "60.00 GB"}, "dm-3": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "8388608", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-homevol", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52ynVWpzyMe28x7F4igthHHVgvTM2K8ZI08R"], "uuids": ["8627acb7-4c2b-4394-95d0-6a084066a23a"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "4.00 GB"}, "dm-0": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "8388608", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-lv_swap", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52ynqqkyDshARxIJhfyP1hRtk5SrMN3BK79c"], "uuids": ["799e361d-7fee-4f6b-ae45-75d75f518985"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "4.00 GB"}, "dm-1": {"scheduler_mode": "", "rotational": "1", "vendor": null, "sectors": "41943040", "links": {"masters": , "labels": , "ids": ["dm-name-rootvg-rootvol", "dm-uuid-LVM-h8Zoe5OZiBjf9awu8HyY4OuQIIZK52ynPxpNeSPUUNGqorDA6GDRwK4jFcd9IzuW"], "uuids": ["798d9b72-8fc1-475f-92f0-0fad71bd3e5a"]}, "sas_device_handle": null, "sas_address": null, "virtual": 1, "host": "", "sectorsize": "512", "removable": "0", "support_discard": "0", "model": null, "partitions": {}, "holders": , "size": "20.00 GB"}}, "ansible_user_uid": 2101067335, "ansible_ssh_host_key_dsa_public": 
"AAAAB3NzaC1kc3MAAACBAPPCrb44cIbwiG15T60D7doNgsOgwLOW4N76U3gvkeiJUafrLqGexH0XMMEwRhFnGGxckQGhgQE3O2ZKmlgTAYFG+qaCDBjHPGBIxKE9PcMO+enFTUYKHd4KY+xid9f3J4QnpauJZXoB4Et2GGwE0Q8fBJB7bLevybjAgAbMfM51AAAAFQCFf6SYNVwXyG0c1RYjCzeaLMB22wAAAIBm8je+yytTJ7DigfHYoleH4LrWKD0g0PeSBFVKG0snNlorhBtCGa5QIKwgR9OE+BNXQddwcqHf1jwmn54wcROWicNbdJFdIrDHHSnbzBm2tOkiNqovTLx92676L45uOZlBzNHi/bqOSzbSem9Piukn6pDu2XsfLmXfd4wz1Z3XagAAAIEA4B7lnz4xWwgjZCnX2oXiOPOOkVH2Xo7MG3YibLr8DnuK1L8n3m/pkX3WhAqrfw87OHECkCE3Kg4EPnXwW9FfNLR4YQnJBXWCU5IJ5M+HSOE5IDSTyNlj3HEs3SaGC0EU8APei7SvRc4k+TlonHu3m1XeKsB6yCNYZdtGhm5q4Ps=", "ansible_bios_date": "11/26/2019", "ansible_system_capabilities": [""]}}\r\n', 'Connection to pc1udtlhhad561.prodc1.harmony.global closed.\r\n')
<pc1udtlhhad561.prodc1.harmony.global> ESTABLISH SSH CONNECTION FOR USER: None
<pc1udtlhhad561.prodc1.harmony.global> SSH: EXEC ssh -C -o ControlMaster=no -o ControlPersist=30s -o ConnectTimeout=15s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/meggleston/.ansible/cp/e963c4e7ae pc1udtlhhad561.prodc1.harmony.global '/bin/sh -c '"'"'rm -f -r /home/meggleston/.ansible/tmp/ansible-tmp-1670598305.51-22544-115269321697104/ > /dev/null 2>&1 && sleep 0'"'"''
<pc1udtlhhad561.prodc1.harmony.global> (0, '', '')
ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
the full traceback was:

Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 123, in <module>
    exit_code = cli.run()
  File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 128, in run
    results = pbex.run()
  File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 169, in run
    result = self._tqm.run(play=play)
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 282, in run
    play_return = strategy.run(iterator, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 311, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 390, in _queue_task
    worker_prc.start()
  File "/usr/lib/python2.7/site-packages/ansible/executor/process/worker.py", line 100, in start
    return super(WorkerProcess, self).start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Does Ansible have a memory leak (that only shows up with a high number of hosts)?

Mike

[meggleston@pc1uepsiadm01 ~]$ rpm -qa | grep -i ansible
ansible-2.9.27-1.el7.noarch

I’ve set “forks = 1” and rerun my test, which runs ansible-playbook against a playbook that just executes uptime.
Could it be the register rather than a memory leak?
Here free(1) shows free memory steadily going down:

[meggleston@pc1uepsiadm01 ~]$ while true
> do
> date
> free -m
> sleep 60
> done
Fri Dec 9 16:30:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1139 6190 18 490 6408
Swap: 4095 2897 1198
Fri Dec 9 16:31:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1149 6127 19 544 6398
Swap: 4095 2896 1199
Fri Dec 9 16:32:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1163 6112 19 544 6383
Swap: 4095 2896 1199
Fri Dec 9 16:33:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1173 6100 19 546 6373
Swap: 4095 2896 1199
Fri Dec 9 16:34:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1184 6081 19 554 6362
Swap: 4095 2895 1200
Fri Dec 9 16:35:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1215 6042 20 562 6329
Swap: 4095 2885 1210
Fri Dec 9 16:36:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1225 6032 20 563 6319
Swap: 4095 2884 1211
Fri Dec 9 16:37:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1229 6028 20 563 6315
Swap: 4095 2880 1215
Fri Dec 9 16:38:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1257 5999 20 563 6287
Swap: 4095 2861 1234
Fri Dec 9 16:39:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1286 5971 21 563 6259
Swap: 4095 2860 1235
Fri Dec 9 16:40:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1295 5961 21 563 6249
Swap: 4095 2860 1235

The playbook is really stupid:

[meggleston@pc1uepsiadm01 playbooks]$ cat y.yml
# $Id$
# $Log$

# set cyberark to a known point

# :!ansible-playbook --syntax-check %
# :!ansible-playbook --check --limit pc1uepsiadm01.res.prod.global %
# :!ansible-playbook --limit pc1uepsiadm01.res.prod.global %
# :!ansible-playbook %
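
The tasks got trimmed from the paste; the gist of the file (paraphrasing from memory, names approximate, not the literal y.yml) is just:

- hosts: all
  gather_facts: true
  tasks:
    - name: check uptime
      command: uptime
      register: uptime_result   # the "register" I asked about above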

I’d try two things if I were in your shoes:

  • Increase memory on the Ansible controller (after the quick check sketched below)
  • Update Python to a more up-to-date version
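
Before either, it’s worth checking whether fork() is hitting the kernel’s overcommit limit rather than truly exhausted RAM; on a Linux controller:

# 0 = heuristic overcommit (the default), 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
# CommitLimit vs. Committed_AS: what the kernel will allow vs. what is already promised
grep -i commit /proc/meminfo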
Regards,

I don’t think it’s the “register”. I removed the “register” (and reverted “forks” to the default by commenting out my change, so the run doesn’t take as long) and the free memory still goes down:

while true; do date; free -m; sleep 60; done
Fri Dec 9 16:48:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1381 5864 21 575 6163
Swap: 4095 2858 1237
Fri Dec 9 16:49:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1380 5864 21 575 6164
Swap: 4095 2858 1237
Fri Dec 9 16:50:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 658 6586 21 576 6886
Swap: 4095 2858 1237
Fri Dec 9 16:51:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 874 6369 21 577 6670
Swap: 4095 2857 1238
Fri Dec 9 16:52:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1246 5997 21 576 6298
Swap: 4095 2857 1238
Fri Dec 9 16:53:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1247 5996 21 577 6297
Swap: 4095 2856 1239
Fri Dec 9 16:54:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1304 5939 21 577 6240
Swap: 4095 2856 1239
Fri Dec 9 16:55:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1353 5889 21 577 6190
Swap: 4095 2855 1240
Fri Dec 9 16:56:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1414 5828 21 577 6129
Swap: 4095 2855 1240
Fri Dec 9 16:57:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 1499 5743 21 578 6044
Swap: 4095 2855 1240
Fri Dec 9 16:58:31 EST 2022
              total used free shared buff/cache available
Mem: 7821 2005 5237 21 578 5538
Swap: 4095 2855 1240

My playbook is now:

[meggleston@pc1uepsiadm01 playbooks]$ cat y2.yml
# $Id$
# $Log$

# set cyberark to a known point

# :!ansible-playbook --syntax-check %
# :!ansible-playbook --check --limit pc1uepsiadm01.res.prod.global %
# :!ansible-playbook --limit pc1uepsiadm01.res.prod.global %
# :!ansible-playbook %

I would if I could. The Ansible controller should easily get by with 6 GB free, and I can only update to whatever is in the company’s update pipeline.

Mike

My current hypothesis is that “gather_facts: true” is causing the issue with 4400 (4403) hosts.
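
If that’s right, the quick test is turning gathering off for the play, something like:

- hosts: all
  gather_facts: false
  tasks:
    - name: check uptime
      command: uptime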

AFAIK, Ansible does not have memory leaks, but it does have a 'ballooning'
issue with inventory and/or facts. This can be mitigated in a few ways:
  - targeting smaller inventories
  - using fact caching (see the sketch below)
  - not gathering facts / not registering results when not needed
  - turning fact injection off
    (https://docs.ansible.com/ansible/latest/reference_appendices/config.html#inject-facts-as-vars)
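
Fact caching and fact injection are both controlled from ansible.cfg; a sketch of the relevant settings (the path and timeout here are illustrative; all of these options exist in 2.9):

[defaults]
# Re-use cached facts instead of re-gathering on every run
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts   # any writable directory
fact_caching_timeout = 86400                   # seconds
# Stop injecting every fact as a top-level ansible_* variable on each host
inject_facts_as_vars = False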