Reliable way to detect when an EC2 instance is ready for login

When I provision an EC2 instance, I add a user_data script which drops an SSH pubkey so I can log in as root. The problem is that it’s difficult to tell exactly when cloud-init has finished. Even if port 22 is accepting connections, the pubkey may not be in place yet, so SSH logins will fail. I use the following task to try to determine when the instance will accept my connection:

```
- name: ensure instance is ready
  become: no
  raw: printf "success"
  register: result
  until: '"success" in result.stdout_lines'
  retries: 300
  delay: 1
  failed_when: false
```

This task works maybe 75% of the time. A small fraction of the time, I get a FATAL UNREACHABLE error, but if I rerun the playbook immediately after, it works fine. Since this is a FATAL error, it doesn’t appear that there is any way to retry it.

Prior to using this technique, I used the ‘command’ module to call out to SSH directly, which was more reliable because I could do retries. However, I also need to be able to change the user I’m connecting as, using set_fact or ‘-u’, and this didn’t seem to work with the command module.

Are there any other good patterns to detect when an EC2 instance has completed the entire boot process?

I use the command module locally to validate that I can actually log in over SSH.
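
A minimal sketch of that approach, not a definitive implementation. The `target_user` variable and the particular SSH options are assumptions for illustration, not something from the original setup. The idea is that a failed login is an ordinary non-zero exit code from a local command rather than an UNREACHABLE host, so `until`/`retries` can handle it:

```
# Hypothetical example: target_user is an assumed variable (set it via
# set_fact or inventory); the SSH options are illustrative choices.
- name: wait until an SSH login actually succeeds
  become: no
  delegate_to: localhost
  command: >
    ssh -o BatchMode=yes
    -o ConnectTimeout=5
    -o StrictHostKeyChecking=no
    {{ target_user | default('root') }}@{{ ansible_host | default(inventory_hostname) }}
    echo success
  register: ssh_check
  # Retry until ssh exits 0, i.e. the key was accepted and the command ran.
  until: ssh_check.rc == 0
  retries: 60
  delay: 5
  changed_when: false
```

Because the task is delegated to localhost, Ansible never opens its own connection to the new instance here. This only helps if nothing earlier in the play touches the instance, so the play would likely need `gather_facts: false` as well.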