Ansible hanging for long shell commands on OSX

Hi,

I’m experiencing Ansible hangs on long-running shell tasks with Ansible 1.3 (connecting over ssh with sudo). The tasks appear to have completed when I ssh in and run ps aux, but the playbook never moves on to the next task. Strangely, this only happens when running the playbook from OSX against Ubuntu 12.04; it appears to work fine from Ubuntu 12.04 to Ubuntu 12.04. The kind of task I’m running is:

  - name: install llvm
    shell: cd /sense;tar -xvf llvm-3.2.src.tar.gz; cd llvm-3.2.src;./configure --enable-optimized;REQUIRES_RTTI=1 make;make install creates=/usr/local/lib/libllvm.a

This is obviously one of those annoying-to-debug issues, since it’s not entirely reproducible. But perhaps this has cropped up for others?

Thanks,
Tristan

I haven’t seen any signs of any OSX issues.

Are you sure your command is not going interactive, or that something in your Makefile is not failing to daemonize properly?

Suggest possibly feeding it < /dev/null BTW
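
To illustrate what the < /dev/null suggestion does, here is a minimal Python sketch. It is only an illustration, not part of Ansible: the child command is a hypothetical stand-in for a build step that unexpectedly reads from stdin. With stdin pointed at the null device, the read returns end-of-file immediately instead of blocking.

```python
import subprocess
import sys

# Hypothetical stand-in for a build step that unexpectedly reads from stdin.
child_code = "import sys; data = sys.stdin.read(); print('read %d bytes' % len(data))"

# stdin=subprocess.DEVNULL is the Python equivalent of `cmd < /dev/null`:
# any read from stdin hits end-of-file right away rather than waiting for input.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdin=subprocess.DEVNULL,
    stdout=subprocess.PIPE,
    text=True,
    timeout=30,  # safety net so the sketch itself can never hang
)

output = result.stdout.strip()
print(output)
```

If the real command hangs only when stdin is left attached, that points at an interactive prompt somewhere in the build.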

A couple of things to look into.

First, check whether your ssh client config is keeping the connection alive. Look into ServerAliveInterval and ServerAliveCountMax.

I’d also highly suggest using Ansible’s async option. http://www.ansibleworks.com/docs/playbooks2.html#asynchronous-actions-and-polling

Oops, just re-read it. Looks like it’s hanging, not disconnecting. My suggestions may not be helpful. Sorry.

Thanks, I will try the < /dev/null trick and investigate further. Builds are slow, so I can’t try it quickly. I actually upgraded to the latest git version, in case my Linux and OSX boxes were out of sync on 1.3. I now get an unrelated error about too many open files. I assume this is a bug that was recently introduced?

TASK: [generate locale] *******************************************************
fatal: [ec2-107-22-11-109.compute-1.amazonaws.com] => Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 368, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 455, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 622, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/action_plugins/normal.py", line 54, in run
    return self.runner._execute_module(conn, tmp, module_name, module_args, inject=inject, complex_args=complex_args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 286, in _execute_module
    (remote_module_path, module_style, shebang) = self._copy_module(conn, tmp, module_name, args, inject, complex_args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 813, in _copy_module
    self._transfer_str(conn, tmp, module_name, module_data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/__init__.py", line 252, in _transfer_str
    conn.put_file(afile, remote)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ansible/runner/connection_plugins/ssh.py", line 244, in put_file
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 672, in __init__
    errread, errwrite)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1102, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

“Too many open files” probably just means you need to increase your ulimit.

I don’t think there’s a bug.

Not sure of your environmental specifics when you are running this.

This cropped up immediately after upgrading Ansible, with no other changes, so it feels as if some close isn’t happening as it did before. Perhaps the new code path simply opens more files, though. I will increase my limits (although ideally this wouldn’t be necessary for basic Ansible usage). If it’s helpful, I’m on OSX 10.7, Ansible 1.3 HEAD, ssh connection, using an executable inventory file that returns JSON on --list and --host.

Tristan

bisect!

Let me know if you manage to track that down with bisect.

As suggested when https://github.com/ansible/ansible/issues/3877 was closed: I’m seeing a similar symptom on my OSX 10.7 install. My playbooks run for about 20 minutes before failing with the “too many open files” error, and repeated runs give me the same error.

Running “ulimit -n 2048” on my Mac works around the problem. I’m going to see if I can run a bisect in the next week or so to find out where this first started happening.
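
For anyone who wants to see what that limit looks like from inside a Python process, here is a rough sketch using the standard resource module. It is an illustration only, not something Ansible does: the helper name raise_soft_limit is made up for this example, and raising the limit this way affects only the current process and its children, much like ulimit -n affects only the current shell.

```python
import resource

def raise_soft_limit(target):
    """Try to raise the soft open-files limit to `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Already unlimited: nothing to do.
    if soft == resource.RLIM_INFINITY:
        return soft
    # The soft limit may never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Already high enough: leave it alone.
    if soft >= target:
        return soft
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some systems reject the request (for example a kernel-level cap);
        # in that case the old limit stays in effect.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

soft_before, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
soft_after = raise_soft_limit(2048)  # 2048 mirrors the ulimit value above
print("soft limit before:", soft_before, "after:", soft_after, "hard:", hard_limit)
```

A low default soft limit would be consistent with the error appearing on the Mac while the same playbook runs fine elsewhere, though that alone does not rule out a descriptor leak.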

Ok, yeah, so we need a lot more info…

How many hosts do you have?
What Ansible version?

What tasks?

etc

Help us with how to reproduce this.

Thanks!

Just some info on the current system FYI…

  • it fails with 4 hosts (it also fails on my Vagrant test system, which consists of a single host)
  • it’s a recent version of Ansible, as of commit 5847720746de4109f802a8337b4e1a581719ea9b
  • there are about 110 steps in total, and it seems to fail at different points on my two Macs

Ansible seems to open a lot of file handles or processes. I’ve noticed that while the playbooks that cause the crash are running, I am unable to log into the host that launched them, and I get a resource not available error. Again, I’ll try to run git bisect when I get a chance this week or next.
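
In case it helps with narrowing this down, here is a small Python sketch of how a descriptor leak shows up. It only measures the current Python process and says nothing about Ansible's internals; count_open_fds is a name invented for this example, and it assumes /dev/fd is available (it is on OSX and Linux). The pipes stand in for the os.pipe() call at the bottom of the traceback earlier in the thread.

```python
import os

def count_open_fds():
    """Count this process's open descriptors by listing /dev/fd.

    The listing itself briefly uses a descriptor, so treat the number as a
    baseline for comparisons rather than an exact figure.
    """
    return len(os.listdir("/dev/fd"))

before = count_open_fds()

# Simulate a leak: open pipes and "forget" to close them.
pipes = [os.pipe() for _ in range(5)]
during = count_open_fds()

# Proper cleanup: close both ends of every pipe.
for read_end, write_end in pipes:
    os.close(read_end)
    os.close(write_end)
after = count_open_fds()

leaked = during - before
print("descriptors held while pipes were open:", leaked)
print("back to baseline after closing:", after == before)
```

Watching the same kind of count for the running ansible-playbook process (for example with lsof -p and its pid) at a few points during the run should show whether the number keeps climbing from task to task.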

Thanks,
Jimmy