We’re trying to implement a system where we can power environments in AWS on and off when they’re not in use. However, the ec2 inventory module excludes instances that are not in a running state. It seems like adding an option to the ec2 module to include stopped instances would work, but then I guess Ansible would need a corresponding option to call the module with to include the stopped instances, which seems a bit hacky…
Maybe ansible needs a notion of host state? Any thoughts?
Full disclosure: Michael believes all inventory should be done via inventory scripts; I respectfully disagree. I find ec2.py to be very slow (20 seconds to refresh the cache with a small number of instances, for example) and prefer querying inventory directly in the script itself for many use cases.
That’s interesting: your module is the same as ec2_facts, just with filtering. And the ec2_facts module says in its notes that it may add filtering. I think I’d agree with Michael’s point of view, but it looks like we’ve already gone down the path of facts being outside the inventory module, so maybe a pull request against ec2_facts with the filters would get accepted. Long run it does seem like hosts and modules need to have some idea of state …
Actually, it’s not the same as ec2_facts, other than that it returns facts about an instance.
ec2_facts only works when run on an actual AWS instance (it queries the Amazon EC2 metadata service), and it only retrieves the facts for that one instance.
ec2_instance_facts, on the other hand, can retrieve facts for multiple instances at once from anywhere (I use it in a local action). It’s more like running ec2.py for specific instances from within a playbook.
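To make the difference concrete, a call looks roughly like this (a simplified sketch of the idea only; the filter parameter name below is illustrative, not the module’s exact interface):

    # Runs locally, queries EC2 over the API, and can match many instances at once.
    # filter_tag_env is an illustrative parameter name, not necessarily the real option.
    - name: Gather facts for all instances tagged env=dev
      local_action: ec2_instance_facts region=us-east-1 filter_tag_env=dev
      register: dev_instances

Everything after that can loop over the registered result without ever touching ec2.py.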
"so maybe a pull request against ec2_facts with the filters would get accepted. Long run it does seem like hosts and modules need to have some idea of state … "
Anything applying to more than one host definitely shouldn’t be done by the facts module.
So, I’m curious, for the case where you want to start “stopped” EC2 instances, what’s the current recommended approach?
I’ve kind of ignored this task for now and have been managing it by hand (it’s just our dev env, but it’s still a couple of dozen instances at least). I’m almost ready to pull Scott’s branch in locally, since it looks so much better than manual management.
Here’s an example in case you do use ec2_instance_facts. This example creates maintenance instances for updating AMIs.
Notes:
This is part of a set of scripts that will create an entire load balanced application environment (including DNS, VPC, centralized logging, and RDS) in a bare AWS account in about 20-30 minutes.
app_environment is dev, test, stage, or prod. The scripts will create the same setup in each environment with some differences such as RDS size, domain name, and so forth.
I use a naming convention for AWS resources of ‘<application>-<environment>-<type>-<name>’, e.g. foo-stage-ec2-logging or foo-prod-ami-web.
The base image is created from a standard Ubuntu LTS instance. Then, packages common to all of the images (e.g. security, ansible, boto, etc.) are installed and configured.
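Trimmed way down, the maintenance-instance launch step is roughly along these lines (a sketch only, not the full play; base_ami_id, keypair, region, and app_name are placeholder variables here):

    # base_ami_id, keypair, region, and app_name are placeholders, not the real variable names.
    - name: Launch a maintenance instance from the base image
      local_action: ec2 region={{region}} image={{base_ami_id}} instance_type=t1.micro keypair={{keypair}} group={{app_name}}-{{app_environment}}-sg-maintenance wait=yes
      register: maintenance_instance

The security group name above just follows the same ‘<application>-<environment>-<type>-<name>’ convention.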
There’s a separate pull request (also rejected, hi Michael…) for the ec2_ami_facts module.
I keep all of my new/modified modules in a library directory under where my playbooks are. Ansible will find the modules there and use them in preference to the ones in the Ansible install.
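In other words, the layout is just something like this (a minimal sketch; the playbook name is a placeholder):

    site.yml
    library/
        ec2_instance_facts
        ec2_ami_facts
    roles/
        ...

Ansible checks the library directory next to the playbook before falling back to the modules shipped with the install.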
Using local ./library content is fine, but please don’t run a fork with extra packages added if you are going to ask questions about them – or at least identify that you are when you do.
It can make Q&A very confusing when people ask about things that aren’t merged.
Just as a side note, I was able to get the wait_for module to work for SSH with a bit of fiddling (so you don’t have to wait with two tasks):
- hosts: 127.0.0.1
  connection: local
  gather_facts: false
  vars_files:
    - env.yaml
  tasks:
    - name: Wait for SSH to come up after the reboot
      wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
      with_items: groups.tag_env_{{pod}}_
      ignore_errors: yes
      register: result
      until: result.failed is not defined
      retries: 5
This seems to work for me all the time, but maybe I just got lucky. I create groups based on tags (“class_database”: “”, “class_monitoring”: “”, “env_qa1”: “”, and so on), which I register using add_host.
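For reference, the add_host step is just a loop over the registered ec2 results, something like this (a simplified sketch; ec2_result stands for whatever name the ec2 task was registered under):

    # ec2_result is a placeholder for the variable registered from the ec2 task.
    - name: Register new instances into the tag-based groups
      local_action: add_host hostname={{item.public_ip}} groupname=tag_env_{{pod}}
      with_items: ec2_result.instances

After that, the tag-based group is available for the wait_for loop above.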
I’m always a bit wary when so many keywords come together. It’s usually the sign something can be simplified and is not “Ansible-like” enough.
- name: Wait for SSH to come up after the reboot
  wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
  with_items: groups.tag_env_{{pod}}_
  ignore_errors: yes
  register: result
  until: result.failed is not defined
  retries: 5
Then I can consider this a bug report. Without retries, wait_for fails for every EC2 AMI I’ve tried (admittedly, they’re all variations of CentOS).
Things I’ve seen:
it reports the port as open, but then the connection is refused
it times out even though I was able to log in manually before the timeout
it fails with SSH errors while checking the port (this one is a bit rare)
This combination is less than ideal, but it seemed to work in all my cases. Also, a minor thing: you have an ec2 task and then you start using the groups.tag_xxx groups. Is it implied that you have an add_host in between? Because my EC2 instances won’t appear in those groups unless I add that.