Anyone seen a "Resource temporarily unavailable" error?

Hi Ansible folks,

I get an “IOError: [Errno 11] Resource temporarily unavailable” error whenever I run a playbook with forks > 1. Just wondering if anyone has seen this problem before or has troubleshooting advice.

Python 2.7.3 and Python 2.7.5
Ansible 1.1
Ubuntu 12.04 LTS

Console output below…

Cheers,
Kyle

PLAY [all] *********************

TASK: [Setup passwordless ssh from master to workers] *********************
11
Traceback (most recent call last):
  File "./cli.py", line 60, in <module>
    main(sys.argv)
  File "./cli.py", line 30, in main
    cluster.Resize(num_instances)
  File "/home/ubuntu/git/iwct/build/snap/cirrus/cluster/mapr.py", line 90, in Resize
    self.__AddWorkers(num_to_add)
  File "/home/ubuntu/git/iwct/build/snap/cirrus/cluster/mapr.py", line 505, in __AddWorkers
    self.__ConfigureWorkers(new_worker_instances)
  File "/home/ubuntu/git/iwct/build/snap/cirrus/cluster/mapr.py", line 714, in __ConfigureWorkers
    CHECK(util.RunPlaybookOnHosts(self.playbooks_path + '/worker.yml', hostnames, self.ssh_key, extra_vars))
  File "/home/ubuntu/git/iwct/build/snap/cirrus/util.py", line 84, in RunPlaybookOnHosts
    results = pb.run()
  File "/home/ubuntu/git/iwct/build/snap/ansible/playbook/__init__.py", line 222, in run
    if not self._run_play(play):
  File "/home/ubuntu/git/iwct/build/snap/ansible/playbook/__init__.py", line 438, in _run_play
    if not self._run_task(play, task, False):
  File "/home/ubuntu/git/iwct/build/snap/ansible/playbook/__init__.py", line 303, in _run_task
    results = self._run_task_internal(task)
  File "/home/ubuntu/git/iwct/build/snap/ansible/playbook/__init__.py", line 277, in _run_task_internal
    results = runner.run()
  File "/home/ubuntu/git/iwct/build/snap/ansible/runner/__init__.py", line 660, in run
    results = self._parallel_exec(hosts)
  File "/home/ubuntu/git/iwct/build/snap/ansible/runner/__init__.py", line 573, in _parallel_exec
    job_queue = manager.Queue()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 667, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 565, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 413, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
IOError: [Errno 11] Resource temporarily unavailable
*** Aborted at 1371088581 (unix time) try “date -d @1371088581” if you are using GNU date ***
PC: @ 0x7f3a63db6313 (unknown)
*** SIGTERM (@0x3e800005710) received by PID 22554 (TID 0x7f3a65406700) from PID 22288; stack trace: ***
@ 0x7f3a64fefcb0 (unknown)
@ 0x7f3a63db6313 (unknown)
@ 0x5560a1 (unknown)
@ 0x49890a (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x49f1c0 (unknown)
@ 0x4a8a92 (unknown)
@ 0x4e9f36 (unknown)
@ 0x499bc0 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x49f1c0 (unknown)
@ 0x4a8960 (unknown)
@ 0x4e9f36 (unknown)
@ 0x4ec11a (unknown)
@ 0x4e9f36 (unknown)
@ 0x4eb39e (unknown)
@ 0x4db6a6 (unknown)
@ 0x4e9f36 (unknown)
@ 0x49846a (unknown)
@ 0x498602 (unknown)
@ 0x49f1c0 (unknown)
@ 0x4983b8 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)
@ 0x498602 (unknown)

Check dmesg; that error is normal when you run out of OS resources (number of open files, buffers, RAM, etc.).

Hi Brian,

Thanks for the suggestion… I saw nothing posted to dmesg after running the script that crashes. The script uses very little RAM, and the file and process/thread limits are set high enough (ulimit -Hn → 64000, ulimit -Hu → 55457). Not sure what resource it ran out of…
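
One thing that might be worth ruling out: `ulimit -Hn` and `ulimit -Hu` report the hard limits, while the soft limits are the ones actually enforced on the process. A minimal sketch, assuming a Linux host, that prints both values from inside the Python process that calls the Ansible API (the helper names `current_limits` and `fmt` are made up for illustration):

```python
import resource


def current_limits():
    """Return {label: (soft, hard)} for open files and processes."""
    names = {
        "open files (ulimit -n)": resource.RLIMIT_NOFILE,
        "processes (ulimit -u)": resource.RLIMIT_NPROC,
    }
    return {label: resource.getrlimit(res) for label, res in names.items()}


def fmt(value):
    # RLIM_INFINITY means "no limit" for that resource.
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print("%s: soft=%s hard=%s" % (label, fmt(soft), fmt(hard)))
```

If the soft values turn out to be much lower than the hard ones, that would at least be a lead; if they are also generous, limits are probably not the cause.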

Taking the error message at its word that it is “temporary”, I wrapped the call to the multiprocessing manager Queue constructor*** in a retry loop… The exception gets raised only on the first attempt, and the call always succeeds on the second attempt.

***File "/home/ubuntu/git/iwct/build/snap/ansible/runner/__init__.py", line 573, in _parallel_exec
    job_queue = manager.Queue()

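before hack:

    job_queue = manager.Queue()

after hack:

    # needs "import time" at the top of the module
    job_queue = None
    while job_queue is None:
        try:
            job_queue = manager.Queue()
        except IOError:
            # Errno 11 surfaces as IOError here; wait and try again.
            print('error... will retry...')
            time.sleep(2)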
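
A slightly more defensive variant of the same idea, as a sketch only: it gives up after a fixed number of attempts and retries only when the errno is `EAGAIN`, so unrelated failures are not swallowed. `retry_on_eagain` and its parameters are hypothetical names, not part of Ansible or multiprocessing.

```python
import errno
import time


def retry_on_eagain(factory, attempts=5, delay=2.0):
    """Call factory() until it succeeds, retrying only on EAGAIN."""
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except (IOError, OSError) as exc:
            # Re-raise anything that is not "Resource temporarily unavailable",
            # and give up once the attempt budget is spent.
            if exc.errno != errno.EAGAIN or attempt == attempts:
                raise
            time.sleep(delay)
```

In the runner this would be called as `job_queue = retry_on_eagain(manager.Queue)`. It still only papers over the symptom; it says nothing about why the first attempt fails.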

My script doesn’t crash now, but I don’t understand the root cause of the problem. I’ve seen a few reports of this error outside of Ansible when using multiprocessing and sockets… Has anyone else had this problem before?

-Kyle

An exceedingly large number of folks are running LTS and I’ve never seen this reported.

Seems like you are using the API instead of /usr/bin/ansible-* though, so not sure what may be going on.

Michael,

Thanks for the reply… I bet you are right… I’ve probably done something in my code before the Ansible API gets called that causes the crash. I use the multiprocessing and paramiko modules in other parts of my code before calling the Ansible API, so perhaps there is an interaction.

PS: I’m converting a bunch of my custom Python scripts for launching MapR Hadoop clusters on EC2 to Ansible playbooks… I’m very pleased with the results (much shorter and easier to maintain). Thanks for making such a well-designed tool!

Cheers,
Kyle

I think that is a network error, not a system resource error, i.e. the resource in question is a network host or service.
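
For what it is worth, errno 11 on Linux is `EAGAIN`, which a socket read raises when the socket is in non-blocking mode and no data has arrived yet. A small sketch that reproduces the same errno without Ansible (it assumes a Unix-like system where `socket.socketpair` is available; the function name is made up for illustration):

```python
import errno
import socket


def recv_errno_on_empty_nonblocking_socket():
    """Read from a non-blocking socket that has no data; return the errno."""
    left, right = socket.socketpair()
    try:
        left.setblocking(False)
        try:
            left.recv(256)  # nothing was sent, so this cannot complete
        except (IOError, OSError) as exc:
            return exc.errno
        return None
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    code = recv_errno_on_empty_nonblocking_socket()
    print(code, errno.errorcode.get(code))
```

That would fit the traceback, where the failure is in `connection.recv_bytes(256)` on the connection to the multiprocessing manager, though it does not explain why that connection would be non-blocking in the first place.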

Did anybody find a solution to this error? I am having the same issue now.

I am using Ansible 1.7.2 with Eucalyptus cloud.

msg: Instance creation failed => InternalFailure: Not enough resources: no cluster controller is currently available to run instances.

Thanks,
Sp

I am also getting this error and could not find a way to fix it…
I am using Ubuntu 14.04 and calling playbook.run() through the Python API.

This used to work when I invoked my code from Apache, but now I am calling it with a plain Python command like “python consumer.py”.
I don’t know if I need to specify anything else.

2015-01-31 00:26:04,933 - root - ERROR - Error in executing playbook[Errno 11] Resource temporarily unavailable
Traceback (most recent call last):
  File "/opt/stack/venv/local/lib/python2.7/site-packages/attis-1.0.0a1-py2.7.egg/attis/engine/contentprocessor.py", line 100, in runPlaybook
    pb.run()
  File "/opt/stack/venv/local/lib/python2.7/site-packages/ansible-1.8.2-py2.7.egg/ansible/playbook/__init__.py", line 347, in run
    if not self._run_play(play):
  File "/opt/stack/venv/local/lib/python2.7/site-packages/ansible-1.8.2-py2.7.egg/ansible/playbook/__init__.py", line 674, in _run_play
    self._do_setup_step(play)
  File "/opt/stack/venv/local/lib/python2.7/site-packages/ansible-1.8.2-py2.7.egg/ansible/playbook/__init__.py", line 619, in _do_setup_step
    accelerate_port=play.accelerate_port,
  File "/opt/stack/venv/local/lib/python2.7/site-packages/ansible-1.8.2-py2.7.egg/ansible/runner/__init__.py", line 1458, in run
    results = self._parallel_exec(hosts)
  File "/opt/stack/venv/local/lib/python2.7/site-packages/ansible-1.8.2-py2.7.egg/ansible/runner/__init__.py", line 1349, in _parallel_exec
    job_queue = manager.Queue()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 667, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 565, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
IOError: [Errno 11] Resource temporarily unavailable

OK, so I did a couple of tests. I ran the Ansible API call from a regular Python file like this:

python rough.py

All rough.py does is call the Python API, and this works perfectly fine.
The reason it was not working before is that I had an oslo-messaging listener running and then passed execution to the Python API. Somehow these two don’t work together.

I don’t know if this is related or not:
http://stackoverflow.com/questions/14736766/why-does-gevent-socket-break-multiprocessing-connections-auth
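
If the linked question applies, a library such as gevent or eventlet may have replaced the standard `socket` module before the Ansible API runs. Whether the oslo-messaging listener does that here is an assumption, not something confirmed in this thread. A rough heuristic check (the function name is hypothetical) is to look at where `socket.socket` is defined:

```python
import socket


def socket_looks_patched():
    """Heuristic: True if socket.socket no longer comes from the stdlib module."""
    # In an unpatched interpreter the class is defined in the "socket" module;
    # monkey-patching libraries typically swap in a class from their own package.
    return socket.socket.__module__ != "socket"


if __name__ == "__main__":
    print("socket.socket defined in:", socket.socket.__module__)
    print("looks patched: %s" % socket_looks_patched())
```

Running this right before the `pb.run()` call, once under the listener and once from rough.py, would show whether the two environments differ in this respect.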

The question is: is there any workaround?
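
One possible workaround, sketched under the assumption that the playbook call works when run on its own (as `python rough.py` does above): launch it in a fresh interpreter from the listener instead of calling the API in-process, so it does not share the listener's in-process state. `run_isolated` is a hypothetical helper, not part of Ansible.

```python
import subprocess
import sys


def run_isolated(script_path, *args):
    """Run a Python script in a new interpreter; return (exit_code, output)."""
    proc = subprocess.Popen(
        [sys.executable, script_path] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
    )
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace")
```

From the listener this would be something like `code, output = run_isolated("rough.py")`, checking `code` for a non-zero exit. The child still inherits the environment variables of the parent, so this is untested against the setup described here and may not be sufficient.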