local_action / delegate_to: 127.0.0.1 fail when they try to use ssh.

Hi,

I ran into a rare behavior that has been reported before, and I want to share it in case someone else encounters it. It appeared mysteriously in previously working code, and it causes Ansible to ssh into the local machine running the playbook. The error I got was “Authentication failed!”

I fixed it, and it behaves the same as the issue reported here: https://github.com/ansible/ansible/issues/3174 . That issue was closed because it could not be reproduced. I do not have any repro steps either, but I have more detail.

Here is the task:

  - name: Generate public key from private key
    sudo: no
    delegate_to: 127.0.0.1
    command: 'ssh-keygen -y -f …/config/ssh/{{sshkey}}'
    register: pubkeyresult
    changed_when: false

Here is the -vvvv from that task:

TASK: [boxprep | Generate public key from private key] ************************
<127.0.0.1> ESTABLISH CONNECTION FOR USER: azureuser
<127.0.0.1> EXEC ['sshpass', '-d12', 'ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/xxx/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'GSSAPIAuthentication=no', '-o', 'PubkeyAuthentication=no', '-o', 'User=prepuser', '-o', 'ConnectTimeout=10', '127.0.0.1', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-1385089465.95-27966096848090 && chmod a+rx $HOME/.ansible/tmp/ansible-1385089465.95-27966096848090 && echo $HOME/.ansible/tmp/ansible-1385089465.95-27966096848090'"]
fatal: [vgrid01] => Authentication failure.

It looks like it's trying to ssh into 127.0.0.1 with a user account that exists only on the target machine.

My inventory looks like this:

[boxprep]
vgrid01 ansible_ssh_host=myhost ansible_ssh_user=preconfiguser ansible_ssh_pass=myhostpassword

[hdpsolos]
vgrid01 ansible_ssh_host=myhost ansible_ssh_user=postconfiguser ansible_ssh_private_key=…/path/to/preconfiguration_private_key

After I found the original report, I grepped for “ansible_connection” and found it in some group_vars for [hdpsolos]. I commented out the line and delegate_to started working again:

TASK: [boxprep | Generate public key from private key] ************************
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-1385090296.49-167490580878593 && chmod a+rx $HOME/.ansible/tmp/ansible-1385090296.49-167490580878593 && echo $HOME/.ansible/tmp/ansible-1385090296.49-167490580878593']
<127.0.0.1> REMOTE_MODULE command ssh-keygen -y -f …/config/ssh/preconfig.key
<127.0.0.1> PUT /tmp/tmpE2RcBw TO /home/xxx/.ansible/tmp/ansible-1385090296.49-167490580878593/command
<127.0.0.1> EXEC ['/bin/sh', '-c', '/usr/bin/python /home/xxx/.ansible/tmp/ansible-1385090296.49-167490580878593/command; rm -rf /home/xxx/.ansible/tmp/ansible-1385090296.49-167490580878593/ >/dev/null 2>&1']
ok: [vgrid01] => {"changed": false, "cmd": ["ssh-keygen", "-y", "-f", "…/config/ssh/preconfig.key"], "delta": "0:00:00.029258", "end": "2013-11-21 19:18:16.650485", "item": "", "rc": 0, "start": "2013-11-21 19:18:16.621227", "stderr": "", "stdout": "ssh-rsa ideletedthepubkeyfromthisoutput", "stdout_lines": ["ssh-rsa ideletedthepubkeyfromthisoutput"]}

If I uncomment the “ansible_connection: ssh” line, the delegated task starts failing again.
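
For reference, here is a rough sketch of what the group variable file might look like after the fix. The file name group_vars/hdpsolos is an assumption; the post only says the setting lived in some group_vars for [hdpsolos]. Since vgrid01 is listed under both [boxprep] and [hdpsolos] in the inventory above, variables from either group apply to that host, which likely explains why a play against [boxprep] picked the setting up.

```yaml
# group_vars/hdpsolos  (hypothetical file name, not given in the post)
# With this line active, the delegated task was run over ssh to 127.0.0.1.
# Commenting it out let the task run locally again.
# ansible_connection: ssh
```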

This code had worked for weeks and weeks before it started failing. Further, the playbook running the task operated against [boxprep] but was affected by the setting in the group_vars for [hdpsolos].

I don’t really need the ansible_connection: ssh value; it was a leftover from my inexperience with Ansible. So I’m good to go, but maybe this will help someone in the future. I recommend grepping your entire working directory to find the setting. I had forgotten ever using it.

-g

Delegate_to was told to use SSH. Use local_action like we suggested and you will be fine.

– Michael

local_action didn’t work either, but maybe for other reasons. I have to do more research on my local_action failure, as I’d like to follow best practice. I had been using the pull request "2 improvements to delegate_to" (https://github.com/ansible/ansible/pull/1093) as my guide.

My confusion came from three things:

  • the ansible_connection setting came from a section of code I thought was out of scope for the affected playbook, so I didn’t look there right away.
  • the ansible_connection setting existed long before the affected playbook was authored, so I didn’t look at older code.
  • the affected playbook worked without issue for weeks, until something unknown caused a variable from a different group to enter its scope.

However, since my real goal was to share the negative case to save the next developer a few hours of debugging, I’ll state the negative case more precisely so that the goog will pick it up for them.

If you use either of:

  • delegate_to: 127.0.0.1
  • delegate_to: localhost

the task can fail with the error:

  • Authentication failed!

when either of these appears in your group variables or inventory:

  • ansible_connection: ssh
  • ansible_connection=ssh

The recommended fix is to either:

  • remove the ansible_connection setting

or:

  • use local_action
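
A minimal sketch of the second option, rewriting the task from the first post in local_action form. It is untested here, and as noted earlier in this reply, local_action also failed in this setup for reasons not yet investigated, so treat it as an illustration of the syntax rather than a confirmed fix. The elided path is kept exactly as posted.

```yaml
- name: Generate public key from private key
  sudo: no
  # local_action runs the named module (here: command) on the machine
  # running the playbook instead of on the inventory host.
  local_action: command ssh-keygen -y -f …/config/ssh/{{sshkey}}
  register: pubkeyresult
  changed_when: false
```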

Hopefully this negative learning on my part will save someone some time.

-g