Reliably wait_for SSH connection to go up

I also tried this one - doesn’t work.

  - name: waiting for machine to come back
    local_action: wait_for host={{ ansible_ssh_host }} port={{ ansible_ssh_port }} delay=30 timeout=180

  #- pause: seconds=5

  - command: echo "X"
    register: status
    ignore_errors: True
    until: status != "X"
    retries: 5
    delay: 10

Same error: {'msg': 'FAILED: Error reading SSH protocol banner', 'failed': True}

Sounds like the port was open but the SSH daemon wasn’t ready to accept connections yet.

I’d suggest inserting a short “pause: seconds=5” after the wait_for to see if that resolves the issue.

It usually does.

Right, the port is open, but the SSH daemon is not ready to accept connections.

The pause: seconds=5 task was already there and doesn't help.
Increasing it to 30 sometimes helps and sometimes doesn't, depending
on the load on the virtual machine. I was hoping for a way to set a
flexible wait time, anywhere from 30 seconds up to 15 minutes. Setting
the pause to 15 minutes will annoy humans, and setting it to 30 seconds
is likely to break automation at some point.
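
One way to get a flexible wait of this kind, sketched here under assumptions rather than tested against this setup: wait_for returns as soon as its condition is met, so a large timeout is only an upper bound, and its search_regex parameter can wait for the SSH banner text instead of just an open port. This assumes an Ansible version whose wait_for supports search_regex and a server banner that contains "OpenSSH"; the 900-second timeout is an illustrative value.

```yaml
# Sketch only: assumes wait_for supports search_regex and that the
# server's SSH banner contains the string "OpenSSH".
- name: wait for the SSH banner, not just an open port
  # delay=30 waits 30 seconds before the first check; timeout=900 gives up
  # after 15 minutes, but the task returns as soon as the banner is seen.
  local_action: wait_for host={{ ansible_ssh_host }} port={{ ansible_ssh_port }} search_regex=OpenSSH delay=30 timeout=900
```

Seeing the banner shows that sshd is answering, which is the step that fails in the error above, but it does not prove that a login will succeed.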

I’m having exactly this problem today. The script already takes 2 minutes to run, so I am loath to add more time to a wait, and at some point that will be insufficient again.

It would be nice to have a wait that specifically checks to make sure a ssh connection works instead of depending on a port check only.

-scott

Since wait_for isn’t working for you, might I suggest a do-until loop as a hacky workaround?

docs: http://docs.ansible.com/playbooks_loops.html#do-until-loops

Basically you would define a local action that attempts to ssh to the inventory_hostname and run echo foo. If it returns successfully, that should mean ssh is fully ready. It will keep trying until it gets a return code that indicates success, or until it has failed 20 times at 10-second intervals.

Here’s how it would look:

  - hosts: 10.42.0.6
    gather_facts: no
    tasks:
      - local_action: command ssh vagrant@{{ inventory_hostname }} echo foo
        register: result
        until: result.rc == 0
        retries: 20
        delay: 10

I really don’t like the idea of executing SSH inside of ansible directly because it wouldn’t work with any of the --ask-pass options. (But should be fine with ssh-agent).
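
A possible variant of the task above, as a sketch only: BatchMode and ConnectTimeout are standard OpenSSH client options, and passing them makes ssh fail quickly instead of stopping at a password prompt or hanging on a half-started daemon. The vagrant user is carried over from the example; the retry count and timeout are illustrative assumptions chosen to cap the wait at roughly 15 minutes.

```yaml
# Sketch only: option values are illustrative, not tuned for any setup.
# BatchMode=yes disables interactive prompts, so key or ssh-agent
# authentication must already work for this to succeed.
- local_action: command ssh -o BatchMode=yes -o ConnectTimeout=5 vagrant@{{ inventory_hostname }} echo foo
  register: result
  until: result.rc == 0
  # 90 attempts, 10 seconds apart: roughly 15 minutes at most.
  retries: 90
  delay: 10
```

This still does not work with password-based logins; it only makes that failure immediate instead of leaving the play stuck at a prompt.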

Anyway, haven’t seen this myself. Having SSH not functional after 30 seconds of the port being open seems quite odd to me to be honest.