I ran into an issue where, at the end of GATHERING FACTS, ansible-playbook just sits there. I kept adding -v flags until I was at -vvvv, and that didn’t tell me anything until I got distracted and left it running for several minutes. Suddenly it sprang back to life. Scrolling back, I saw it had been hung on:
fatal: [nightfury] => SSH Error: Shared connection to nightfury closed.
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.
I’ll have to look at that host and figure out what the “fatal” problem is. But why is ansible just sitting and waiting and waiting and waiting, rather than just giving up, like it does when it realizes a given host is down?
This is odd. I can ssh into the host in question, no problem. In the logs, I get:
May 7 13:08:39 nightfury sshd[565]: Accepted publickey for joliver.sa from 2001:480:10:92::60 port 60164 ssh2
May 7 13:08:39 nightfury sshd: joliver.sa [priv][565]: USER_PROCESS: 567 ttys001
May 7 13:08:39 nightfury sshd: joliver.sa [priv][565]: DEAD_PROCESS: 567 ttys001
May 7 13:08:40 nightfury sshd[567]: subsystem request for sftp by user joliver.sa
May 7 13:08:40 nightfury sshd: joliver.sa [priv][565]: USER_PROCESS: 567 ttys001
May 7 13:08:40 nightfury sudo[574]: joliver.sa : TTY=ttys001 ; PWD=/Users/joliver.sa ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-xlqldeyuinxfaszjnflgmawhevvidhwo; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /Users/joliver.sa/.ansible/tmp/ansible-tmp-1431029307.51-246763134788920/setup; rm -rf /Users/joliver.sa/.ansible/tmp/ansible-tmp-1431029307.51-246763134788920/ >/dev/null 2>&1
May 7 13:08:40 nightfury ansible-setup[576]: Invoked with filter=* fact_path=/opt/local/etc/ansible/facts.d
That’s exactly what I see on another machine where ansible is working. Except this one, after ten minutes, will report:
May 7 13:18:44 nightfury sshd[486]: Timeout, client not responding.
I’m looking at .ansible/tmp/ansible-tmp-1431029307.51-246763134788920/setup, but I don’t know Python, and there’s a lot of it in there.
This is probably going to turn out to be some weird problem on this computer. But what is ansible trying to do here other than a straight SSH connection, and why is it willing to wait for ten minutes? This seems like bad event handling, probably for some odd edge case.
It sounds like the ansible process is dying but the sockets are not
being closed. Can you check dmesg to see if there are any resource
issues?
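
For what it's worth, here is a small Python sketch of the general mechanism I have in mind. It is not Ansible's code and I am not claiming this is what happens on nightfury; it only shows how a caller can keep waiting after the process it started has already exited, because something else still holds the pipe open. The function name and the numbers are made up for illustration, and it assumes a POSIX system with sh and sleep available.

```python
import subprocess
import time


def parent_exits_but_pipe_stays_open(linger_seconds=3, timeout=0.5):
    """Hypothetical demo: return (timed_out, elapsed_seconds).

    The shell prints one line and exits immediately, but the
    backgrounded sleep inherits the shell's stdout pipe and keeps it
    open, so the reader never sees EOF until sleep finishes.
    """
    proc = subprocess.Popen(
        ["sh", "-c", "sleep %d & echo done" % int(linger_seconds)],
        stdout=subprocess.PIPE,
    )
    start = time.monotonic()
    try:
        # communicate() waits for EOF on stdout, not just for sh to exit.
        proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    finally:
        # Stop reading and reap the shell, which has long since exited.
        proc.stdout.close()
        proc.wait(timeout=5)
    elapsed = time.monotonic() - start
    return timed_out, elapsed


if __name__ == "__main__":
    timed_out, elapsed = parent_exits_but_pipe_stays_open()
    print("caller gave up waiting:", timed_out)
    print("seconds spent waiting: %.2f" % elapsed)
```

Without the timeout argument, the caller would sit there for as long as the leftover process keeps the pipe open, which looks a lot like a hang from the outside even though the original process is gone.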