Strange problem running Ansible against a CentOS 7 box

Hi all,

I just installed CentOS 7 on a new machine and ran ‘yum update’ to pick up the latest packages. Here’s the output of ‘uname -a’ and the contents of ‘/etc/redhat-release’:

[root@problem-svr ~]# uname -a
Linux problem-svr.mycompany.com 3.10.0-123.6.3.el7.x86_64 #1 SMP Wed Aug 6 21:12:36 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@problem-svr ~]# cat /etc/redhat-release
CentOS Linux release 7.0.1406 (Core)

When I try to do anything with Ansible (v1.7.1 running on Ubuntu 12.04.5) against this box, it just hangs (even with ‘-m ping’). When I add ‘-vvvv’ to the run, here’s what I see:

will@wdennis-p390:~/ansible-stuff$ ansible -vvvv problem-svr -u root -k -i test -m setup
SSH password:
ESTABLISH CONNECTION FOR USER: root
REMOTE_MODULE setup
EXEC ['sshpass', '-d6', 'ssh', '-C', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/will/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'GSSAPIAuthentication=no', '-o', 'PubkeyAuthentication=no', '-o', 'User=root', '-o', 'ConnectTimeout=10', 'problem-svr-new', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1409846776.31-88290040276656 && echo $HOME/.ansible/tmp/ansible-tmp-1409846776.31-88290040276656'"]

I do see an SSH session initiated on the host:

[root@problem-svr ~]# ss -4 -t
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 208 192.168.180.22:ssh 192.168.180.50:63172
[root@problem-svr ~]# ss -4 -t
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.180.22:ssh 192.168.180.53:42717 <-- Ansible session
ESTAB 0 0 192.168.180.22:ssh 192.168.180.50:63172

But then, the session just times out and finally drops:

State Recv-Q Send-Q Local Address:Port Peer Address:Port
FIN-WAIT-2 0 0 192.168.180.22:ssh 192.168.180.53:42717
ESTAB 0 208 192.168.180.22:ssh 192.168.180.50:63172
[root@problem-svr ~]# ss -4 -t
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 208 192.168.180.22:ssh 192.168.180.50:63172

Meanwhile, the Ansible process on the control machine keeps trying (i.e., does not die when the session ends) and eventually, I kill it with a Ctrl-C.

I already tried setting SELinux to “disabled” on the CentOS 7 box and turning off the ‘firewalld’ service; neither seems to make a difference.

I do have another CentOS 7 box that I can successfully run Ansible against, so I think it’s just something strange on the target CentOS 7 box… How can I further debug this?

Thanks,
Will

Huh, weird - I've started porting some of our CentOS 6 playbooks over to CentOS 7 and didn't have any trouble (OS X client, pure SSH transport), but that was using SSH pubkey auth.

Maybe there's something up with the way centos7 does password auth?

I'm guessing you can ssh straight in as the ansible user with the same pass etc?
(If not, fix that first :) )

If so, I'd check /var/log/secure and see if there are any differences in how sshd is seeing the sessions of the Ansible connection vs. your vanilla ssh client.

I can indeed SSH straight in (using ‘root’ with password.)

I made sure “PermitRootLogin” was explicitly set to ‘yes’ in sshd_config, restarted sshd, and tried again. The Ansible command still hangs, and there are no messages in /var/log/secure except that, when I kill the Ansible process, sshd reports “Connection closed”:

Sep 4 14:06:34 problem-svr sshd[1457]: Received signal 15; terminating.
Sep 4 14:06:34 problem-svr sshd[17358]: Server listening on 0.0.0.0 port 22.
Sep 4 14:06:34 problem-svr sshd[17358]: Server listening on :: port 22.
Sep 4 14:07:16 problem-svr sshd[17360]: Connection closed by 192.168.180.53 [preauth]

Very strange & frustrating…

Are there any differences between the CentOS 7 box that works and the new one? I would look at recent package updates.

Just a wild guess - can you try running ansible-playbook with ANSIBLE_HOST_KEY_CHECKING=False?

Wild guess was CORRECT - the runs work now.

So, what could have changed on this box that “export ANSIBLE_HOST_KEY_CHECKING=False” would have fixed? (Not an SSH guru here… please educate me.)

Thanks,
Will

Here’s what I found: http://docs.ansible.com/intro_getting_started.html#host-key-checking

The CentOS 7 box is probably fine, and this is a local issue with stored SSH host keys. When you connect via ssh directly, does it ask you anything (e.g. about mismatched keys)?
If yes, it should report the offending line number in .ssh/known_hosts. Try removing that line, then ssh directly to the host to reacquire the host key, and then try running ansible-playbook again.
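
For the manual route, here is a minimal sketch of deleting one line from known_hosts by its number while keeping a backup. The function name remove_known_hosts_line and the .bak suffix are made up for illustration; they are not part of ssh or Ansible.

```shell
# Hypothetical helper: delete line LINENO from FILE, keeping FILE.bak as a backup.
# Usage: remove_known_hosts_line FILE LINENO
remove_known_hosts_line() {
    file=$1
    lineno=$2
    # Accept only positive integers, so a bad argument cannot wipe the file.
    case $lineno in
        ''|*[!0-9]*) echo "line number must be a positive integer" >&2; return 1 ;;
    esac
    [ "$lineno" -gt 0 ] || { echo "line number must be >= 1" >&2; return 1; }
    # Back up first, then rewrite the original from the backup minus that line.
    cp -p "$file" "$file.bak" || return 1
    sed "${lineno}d" "$file.bak" > "$file"
}
```

For example, if ssh complains about line 12, something like `remove_known_hosts_line ~/.ssh/known_hosts 12` would drop that entry and leave the old file in ~/.ssh/known_hosts.bak.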

You can also run these commands instead of editing the known_hosts file manually (run them as the same user you run Ansible as):
ssh-keygen -f ~/.ssh/known_hosts -R <centos7_box_ip>
ssh-keygen -f ~/.ssh/known_hosts -R <centos7_box_hostname>
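
Before removing anything, it can help to see which entries exist for both the host name and the IP address, since a stale entry for either one can cause trouble. The following is a rough sketch only: find_known_hosts_entries is a made-up helper name, and it assumes plain (unhashed) known_hosts entries that match the given name exactly.

```shell
# Hypothetical helper: list known_hosts lines whose host field names any of
# the given host names or IP addresses.
# Usage: find_known_hosts_entries FILE NAME [NAME...]
# Only plain (unhashed) entries can be matched this way; for hashed entries
# (host field starting with "|1|"), `ssh-keygen -F NAME` is the tool to use.
find_known_hosts_entries() {
    file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            /^#/ || NF < 3 { next }        # skip comments and malformed lines
            {
                n = split($1, hosts, ",")   # host field may list "name,ip"
                for (i = 1; i <= n; i++)
                    if (hosts[i] == want)
                        printf "%s: line %d (%s)\n", want, NR, $2
            }
        ' "$file"
    done
}
```

For the host in this thread, that would be something like `find_known_hosts_entries ~/.ssh/known_hosts problem-svr 192.168.180.22`; each match is printed with its line number and key type, so both the name and IP entries can be checked.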

Yes, I had to reinstall this machine, and did fix the hostname entry in known_hosts, but did not fix the IP address entry. Good to know the cause and the fix, thanks!

Will

CentOS 7 is, FWIW, in our QA matrix and we haven’t seen problems here.

It seems John is suggesting above that disabling host key checking fixes something for you, but I’m a bit unclear on that, since Ansible will prompt you (ask you questions) when host key checking is turned on.

It seems like you may not have been seeing the prompts?

Can you clarify a bit perhaps?

Thanks!

Hi Michael,

Yes, when I initially reinstalled CentOS on this host and tried SSH-ing into it from my Ansible workstation, I did get the typical error saying that the host key had changed, and SSH refused to connect. When I edited the ~/.ssh/known_hosts file, I only took out the line for the hostname key, and not the one for that host’s IP address. I could then SSH to the host successfully, but there was a warning about the IP key, which I ignored because a) I knew what the cause was, and b) it didn’t terminate the SSH session. I didn’t know that Ansible would refuse to complete the SSH connection if one of the two known_hosts keys was wrong (it looked like a hang to me; no error was thrown during the Ansible run, even with -vvvv). Of course, turning Ansible’s host key checking off “fixed” the problem, and running Ansible against this specific host then worked.

So in the end, a win for me - learned something about SSH and Ansible today :)

Thanks,

Will