Ansible fails to connect to newly provisioned EC2 instance on 3rd task after successfully running the first 2 tasks

I have a playbook that creates EC2 instances and adds them to an in-memory group using the add_host module. I can then connect to the hosts in that group and run two tasks successfully before a third fails.
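
For context, the provisioning play looks roughly like this (trimmed down; the AMI, instance type, and result variable name are placeholders for my real values):

- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Launch the instances
      ec2:
        key_name: AnsibleTest      # matches AnsibleTest.pem below
        instance_type: t2.micro    # placeholder
        image: ami-xxxxxxxx        # placeholder AMI
        wait: yes                  # block until the instances are running
      register: ec2_result

    - name: Add the new instances to the in-memory group
      add_host:
        name: "{{ item.public_ip }}"
        groups: ec2hosts
      with_items: "{{ ec2_result.instances }}"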

I am seeing this problem just running the same file module to create directories. I have something like this in my main playbook (ec2hosts is the in-memory group created after provisioning):

- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  name: try the setup
  tasks:
    - name: Get EC2 facts
      ec2_metadata_facts:
      register: ec2_facts

    - name: import configure role
      import_role:
        name: configure
      vars:
        efs_ids: "{{ efs_id }}"

The configure role is very simple:

- name: Make the aws credentials directory
  file:
    state: directory
    path: ~/.aws

- name: Make the hi directory
  file:
    state: directory
    path: ~/.hi

- name: Make a temp directory
  file:
    state: directory
    path: ~/.temp

- name: Make a bar directory
  file:
    state: directory
    path: ~/.bar

And this fails at the "Make a temp directory" task. The failed output with -vvv looks like this:

<35.160.185.188> (0, '', "Warning: Permanently added '35.160.185.188' (ECDSA) to the list of known hosts.\r\n")
<35.160.185.188> ESTABLISH SSH CONNECTION FOR USER: ubuntu
<35.160.185.188> SSH: EXEC ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i AnsibleTest.pem -o 'IdentityFile="[omitted_full_path]/AnsibleTest.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ubuntu -o ConnectTimeout=10 -tt 35.160.185.188 '/bin/sh -c '"'"'/usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/file.py; rm -rf "/home/ubuntu/.ansible/tmp/ansible-tmp-1510596698.75-58373657425242/" > /dev/null 2>&1 && sleep 0'"'"''
<35.160.185.188> (255, '', 'ssh_exchange_identification: read: Connection reset by peer\r\n')
fatal: [35.160.185.188]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh_exchange_identification: read: Connection reset by peer\r\n",
    "unreachable": true
}

I am using the following ssh_args in my ansible.cfg for the playbook:

ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i "AnsibleTest.pem"

Does anyone know what’s happening here? This seems pretty weird and I’m stuck.
Thanks!

I eventually fixed this by adding a retries parameter to the [ssh_connection] section of my ansible.cfg, so it looks like this:

[ssh_connection]
ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
retries = 10

Pretty lame! If anyone finds a better solution please let me know…

Perhaps the instances were not actually up and accessible yet. Did you use wait_for or wait_for_connection after creating them, to ensure they are up and accessible before moving on?
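
For example, something like this as the first task against the new group, before any real work (the delay and timeout values are just illustrative):

- hosts: ec2hosts
  user: ubuntu
  gather_facts: false
  tasks:
    - name: Wait until the host is reachable over SSH
      wait_for_connection:
        delay: 10      # give sshd a moment before the first attempt
        timeout: 300   # give up after five minutes

wait_for against port 22 (delegated to localhost) also works, but wait_for_connection goes through the actual connection plugin, which is closer to what your later tasks will do.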

Yes, I use the wait parameter in the ec2 module. And the weirdest thing is that two of the tasks work before the third fails, so the connection comes up, works, and then just stops working. I've seen this when uploading files as well, with messages like "the sftp file transfer mechanism failed", but the retries come to the rescue.

From where I sit, I pretty much don't see a way to do any of this EC2 work without the retries, and I'm surprised no one else has run into this.