Powering EC2 instances on/off

Hi folks,

We’re trying to implement a system where we can power environments in AWS on and off when they’re not in use. However, the ec2 inventory module excludes instances that are not in a running state. It seems like adding an option to the ec2 module to include stopped instances would work, but then I guess Ansible would need a corresponding option to call the module with, which seems a bit hacky…

Maybe ansible needs a notion of host state? Any thoughts?

Thx!

-cs

I use this module: https://github.com/ansible/ansible/pull/6349

Full disclosure: Michael believes all inventory should be done via inventory scripts; I respectfully disagree. :slight_smile: I find ec2.py to be very slow (20 seconds to refresh the cache with a small number of instances, for example) and prefer querying inventory directly in the script itself for many use cases.

Regards,
-scott

Thanks!

That’s interesting: your module is the same as ec2_facts, just with filtering. And the ec2_facts module says in its notes that it may add filtering. I think I’d agree with Michael’s point of view, but it looks like we’ve already gone down the path of facts living outside the inventory module, so maybe a pull request against ec2_facts with the filters would get accepted. Long run, it does seem like hosts and modules need to have some idea of state…

Actually, it’s not the same as ec2_facts other than it returns facts about an instance.

ec2_facts only works when run on an actual AWS instance (it calls the Amazon ec2 metadata servers) and it only retrieves the facts for that instance alone.

ec2_instance_facts, on the other hand, can retrieve multiple instance facts at once from anywhere (I use it in a local action). It’s more like ec2.py run for specific instances from within a playbook.
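
For contrast, a bare-bones ec2_facts call looks something like this; it has to run on the EC2 host itself, since the module only talks to that instance’s own metadata service and exposes what it finds as ansible_ec2_* facts:

    - name: Gather EC2 metadata facts for the current host only
      ec2_facts:

    - debug: var=ansible_ec2_instance_id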

Regards,
-scott

Thanks for the clarification, right, the use case and implementation are a bit different. Seems like they could be combined however.

"so maybe a pull request against ec2_facts with the filters would get accepted. Long run it does seem like hosts and modules need to have some idea of state … "

Anything applying to more than one host definitely shouldn’t be done by the facts module.

So, I’m curious, for the case where you want to start “stopped” EC2 instances, what’s the current recommended approach?

I’ve kind of ignored this task for now and have been managing it by hand (it’s just our dev env, but it’s still a couple of dozen instances at least). I’m just about to pull Scott’s branch in locally, since it looks so much better than manual management.
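
In the meantime, one thing I’ve been meaning to try is whether the plain ec2 module can at least handle the start-by-tag part, something along these lines (untested; the tag and region are just placeholders for our setup):

    - name: Start any stopped dev instances identified by tag
      local_action:
        module: ec2
        state: running
        instance_tags:
          environment: dev
        region: us-east-1
        wait: yes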

Here’s an example in case you do use ec2_instance_facts. This example creates maintenance instances for updating AMIs.

Notes:

  • This is part of a set of scripts that will create an entire load balanced application environment (including DNS, VPC, centralized logging, and RDS) in a bare AWS account in about 20-30 minutes.
  • app_environment is dev, test, stage, or prod. The scripts will create the same setup in each environment with some differences such as RDS size, domain name, and so forth.
  • I use a naming convention for AWS resources of ‘app-environment-type-name’, e.g. foo-stage-ec2-logging or foo-prod-ami-web.

The base image is created from a standard Ubuntu LTS instance. Then, packages common to all of the images (e.g. security, ansible, boto, etc.) are installed and configured.

There’s a separate pull request (also rejected, hi Michael… :wink:) for the ec2_ami_facts module.

    - name: Obtain list of existing AMIs
      local_action:
        module: ec2_ami_facts
        description: "{{ ami_image_name }}"
        tags:
          environment: "{{ app_environment }}"
        region: "{{ vpc_region }}"
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
      register: ami_facts
      ignore_errors: yes

If a version of the AMI exists, record this. Otherwise use the base Ubuntu image.

    - set_fact:
        environment_base_image_id: "{{ ami_facts.images[0].id }}"
      when: ami_facts.images|count > 0

    - set_fact:
        environment_base_image_id: "{{ ami_base_image_id }}"
      when: ami_facts.images|count == 0

See if the maintenance image for this image type for this environment is running.

    - name: Obtain list of existing instances
      local_action:
        module: ec2_instance_facts
        name: "{{ ami_maint_instance_name }}"
        # Everything but terminated
        states:
          - pending
          - running
          - shutting-down
          - stopped
          - stopping
        tags:
          environment: "{{ app_environment }}"
        region: "{{ vpc_region }}"
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
      register: instance_facts
      ignore_errors: yes

    - set_fact:
        environment_maint_instance: "{{ instance_facts.instances_by_name.get(ami_maint_instance_name) }}"
      when: instance_facts.instances|count > 0

If there is no such instance, create one.

    - name: Create an instance for managing the AMI creation
      local_action:
        module: ec2
        state: present
        image: "{{ environment_base_image_id }}"
        instance_type: t1.micro
        group: "{{ environment_public_ssh_security_group }}"
        instance_tags:
          Name: "{{ ami_maint_instance_name }}"
          environment: "{{ app_environment }}"
        key_name: "{{ environment_public_ssh_key_name }}"
        vpc_subnet_id: "{{ environment_vpc_public_subnet_az1_id }}"
        assign_public_ip: yes
        wait: yes
        wait_timeout: 600
        region: "{{ vpc_region }}"
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
      register: maint_instance
      when: environment_maint_instance is not defined

    - set_fact:
        environment_maint_instance: "{{ maint_instance.instances[0] }}"
      when: maint_instance is defined and maint_instance.instances|count > 0

    - name: Ensure instance is running
      local_action:
        module: ec2
        state: running
        instance_ids: "{{ environment_maint_instance.id }}"
        wait: yes
        wait_timeout: 600
        region: "{{ vpc_region }}"
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
      register: maint_instance
      when: environment_maint_instance is defined

If we had to start the instance, then the public IP will not have been defined when we gathered facts above, so get it again.

    - name: Obtain public IP of newly running instance
      local_action:
        module: ec2_instance_facts
        name: "{{ ami_maint_instance_name }}"
        states:
          - running
        tags:
          environment: "{{ app_environment }}"
        region: "{{ vpc_region }}"
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
      register: instance_facts
      when: maint_instance|changed

    - set_fact:
        environment_maint_instance: "{{ instance_facts.instances_by_name.get(ami_maint_instance_name) }}"
      when: maint_instance|changed

Pass the collected facts on to the new maintenance image host for configuration by role.

    - name: Add new maintenance instance to host group
      local_action:
        module: add_host
        hostname: "{{ environment_maint_instance.public_ip }}"
        groupname: maint_instance
        app_environment: "{{ app_environment }}"
        # This passes the new/existing private key file to Ansible for use in
        # contacting the hosts. Better way to do this?
        ansible_ssh_private_key_file: "{{ environment_public_ssh_private_key_file }}"
        environment_maint_instance: "{{ environment_maint_instance }}"

    - name: Wait for SSH on maintenance host
      local_action:
        module: wait_for
        host: "{{ environment_maint_instance.public_ip }}"
        port: 22
        # This is annoying as Hades. Sometimes the delay works, sometimes it's
        # not enough. The check fails if the port is open but the ssh daemon
        # isn't yet ready to accept actual traffic, right after the maintenance
        # instance is started.
        #delay: 10
        timeout: 320
        state: started

    # TODO: fix the hardcoded user too
    - name: Really wait for SSH on maintenance host
      local_action: command ssh -o StrictHostKeyChecking=no -i {{ environment_public_ssh_private_key_file }} ubuntu@{{ environment_maint_instance.public_ip }} echo Rhubarb
      register: result
      until: result.rc == 0
      retries: 20
      delay: 10

Regards,
-scott

I’m fairly new to Ansible. How do I get your code into my Ansible install so I can use it? I run from source.

Thanks!

James

I keep all of my new/modified modules in a library directory under where my playbooks are. Ansible will find the modules there and use them over the ones in the Ansible install.
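
For example, my layout is roughly like this (names are just illustrative):

    playbooks/
      site.yml
      library/
        ec2_instance_facts    # custom or modified modules go here and shadow the installed ones
        ec2_ami_facts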

Regards,
-scott

Using local ./library content is fine, but please don’t run a fork with extra packages added if you are going to ask questions about them – or at least identify that you are when you do.

It can make Q&A very confusing when people ask about things that aren’t merged.

Just as a side note, I was able to get the wait_for module to work for ssh with a bit of fiddling (so you don’t have to wait with 2 tasks):

    - hosts: 127.0.0.1
      connection: local
      gather_facts: false
      vars_files:
        - env.yaml
      tasks:
        - name: Wait for SSH to come up after the reboot
          wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
          with_items: groups.tag_env_{{pod}}_
          ignore_errors: yes
          register: result
          until: result.failed is not defined
          retries: 5

This seems to work for me all the time, but maybe I just got lucky. I create groups based on tags (“class_database”, “class_monitoring”, “env_qa1”, and so on), which I register using add_host.
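
For reference, the registration step looks roughly like this (ec2_result is just whatever name I registered from the provisioning task):

    - name: Register new instances into a tag-based group
      add_host:
        hostname: "{{ item.public_ip }}"
        groupname: "tag_env_{{ pod }}_"
      with_items: ec2_result.instances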

I’m always a bit wary when so many keywords come together. It’s usually the sign something can be simplified and is not “Ansible-like” enough.

    - name: Wait for SSH to come up after the reboot
      wait_for: host={{item}} port=22 delay=60 timeout=90 state=started
      with_items: groups.tag_env_{{pod}}_
      ignore_errors: yes
      register: result
      until: result.failed is not defined
      retries: 5

Can likely be simplified to:

    - hosts: localhost
      tasks:
        - ec2: # provisioning step here with add_host…

    - hosts: groups.tag_env_{{ pod }}_
      tasks:
        - name: Wait for SSH to come up after the reboot
          local_action: wait_for host={{ inventory_hostname }} port=22 delay=60 timeout=90

A few key concepts:

(A) Using the host loop is clearer than doing a “with_items” across the group

(B) You should only need to do one wait_for. Consider increasing the timeout rather than looping over a retry

(C) You should not need to register the result of the retry since there is no loop

(D) You won’t need to ignore errors because we’re running wait_for off localhost, which we know we can connect to.

Then I can consider this a bug report: without retries, wait_for fails for every EC2 AMI I tried (admittedly, they’re all variations of CentOS).

Things I’ve seen:

  • it reports port open, then refuses to connect
  • it reports a timeout even though I was able to manually log in prior to the timeout
  • it fails with ssh errors while checking the port (this one is a bit rare)

This combination is less than ideal, but it seemed to work for all my cases. Also, a minor thing: you have an ec2 task and then you start using the groups.tag_xxx; is it implied you have an add_host there? Because my ec2 instances won’t appear unless I add that.

Nvm, saw the add_host in the comment.