Intermittent connection refused errors using rax module

Hello,

Searched around before posting this but I was unable to find anything useful.

Anyway, I am using the rax module (with pyrax etc.) to provision a server in the Rackspace Cloud. The problem is that SOMETIMES the play will fail with a connection refused error:

PLAY [Wait for port 22 to be ready] *******************************************

TASK: [wait_for port=22 delay=20] *********************************************
fatal: [testing6] => {'msg': 'FAILED: [Errno 111] Connection refused', 'failed': True}

FATAL: all hosts have already failed -- aborting

I tried adding the wait_for with a delay of 20 but maybe I am doing it wrong. Here is what my play looks like up to the point of failure:

I believe this email ended up being delayed a bit. Jimmy and I worked through this yesterday and changed the wait_for task to look like:

  - local_action: wait_for port=22 delay=20 hostname="{{ ansible_ssh_host }}"

It hasn’t failed since I made the change, so thank you! One slight correction, in case anyone else runs into this problem: hostname="{{ ansible_ssh_host }}" should actually be host="{{ ansible_ssh_host }}".
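
For anyone who finds this later, here is a minimal sketch of the corrected task. The task name is just a label, and the explicit timeout is only there to show that the parameter exists (300 seconds is the module default as far as I know); adjust both as needed.

```yaml
# Runs on the control machine (local_action), so the check itself does not
# need a working SSH connection to the new server.
- name: Wait for port 22 to be ready
  local_action: wait_for port=22 delay=20 timeout=300 host="{{ ansible_ssh_host }}"
```

Since the check runs locally, host= has to point at the remote address. Otherwise wait_for would test port 22 on the machine running Ansible.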

I see a similar issue probably 1/10 times I run a playbook I’m developing.

I am using wait_for after changing the sshd listening port:

  - hosts: staging
    tasks:
      - name: Wait for ssh access on new port
        wait_for: delay=20 connect_timeout=30 host="{{ ansible_ssh_host }}" port="{{ ansible_ssh_port }}"

And here is the error I get (I’ve removed the actual port and my username, so those fields appear blank):

GATHERING FACTS ***************************************************************
<162.209.100.204> ESTABLISH CONNECTION FOR USER:
<162.209.100.204> REMOTE_MODULE setup
<162.209.100.204> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/brian/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o Port= -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User= -o ConnectTimeout=300 162.209.100.204 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1432149978.13-279657059089785 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1432149978.13-279657059089785 && echo $HOME/.ansible/tmp/ansible-tmp-1432149978.13-279657059089785'
fatal: [162.209.100.204] => SSH Error: ssh: connect to host 162.209.100.204 port : Connection refused
while connecting to 162.209.100.204:

This probably doesn’t have anything to do with the rax module. My playbook does the following:

  1. Provision rackspace servers with the rax module
  2. Apply a common role that as its last step reconfigures sshd to listen on a nonstandard port and only allow login by
  3. Add hosts to a new in-memory group (staging) with updated ansible_ssh_user and ansible_ssh_port
  4. Run the wait_for play above against these hosts
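
Step 3 can be sketched roughly as follows. This is an illustration, not the actual playbook: it assumes the rax task's result was registered as `rax` (as in the rax module's documented examples), and `new_ssh_port` and `deploy_user` are hypothetical variable names standing in for the real port and user.

```yaml
# Sketch only: `rax` is assumed to be the registered result of the rax task;
# new_ssh_port and deploy_user are hypothetical variables.
- name: Add new servers to the in-memory staging group
  local_action:
    module: add_host
    hostname: "{{ item.name }}"
    ansible_ssh_host: "{{ item.rax_accessipv4 }}"
    ansible_ssh_port: "{{ new_ssh_port }}"
    ansible_ssh_user: "{{ deploy_user }}"
    groups: staging
  with_items: "{{ rax.success }}"
```

Any play that targets staging afterwards will connect with the updated port and user. That is why its fact-gathering step already goes to the new sshd port.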

The error happens when it gathers facts for the wait_for play, i.e. before the wait_for task itself gets a chance to run.

Any help would be much appreciated!
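
One thing that may be worth trying, assuming the failure really comes from the implicit fact gathering connecting before sshd is back up on the new port: disable fact gathering for that play, run wait_for from the control machine, and gather facts explicitly afterwards. A minimal sketch under that assumption (the last task name is made up):

```yaml
- hosts: staging
  gather_facts: no   # skip the implicit setup run, which needs SSH to be up already
  tasks:
    - name: Wait for ssh access on new port
      # local_action runs the check from the control machine instead of over SSH
      local_action: wait_for delay=20 connect_timeout=30 host="{{ ansible_ssh_host }}" port="{{ ansible_ssh_port }}"

    - name: Gather facts now that sshd is reachable
      action: setup
```

With this layout the first SSH connection to the host happens only after wait_for has seen the port accept a connection, so the intermittent "Connection refused" during GATHERING FACTS should no longer have a window in which to occur.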