extra scaling events with ec2_asg module

For Ansible 1.9-develop, pull request 601 had the fix for Issue 383, which affects our production ASG about every two weeks or so. We use the ec2_asg module to refresh our ASG instances three times a day.

I was eager to test. In doing so, I noticed that the replace_all_instances and replace_instances options cause an extra set of scaling events. Has anyone else who uses either replace_ option seen this happen? See below for the screenshot that demonstrates the behavior.

We have one instance in each of two Availability Zones, so we use a batch size of two (actually a formula based on the length of the ASG's availability_zones list; see replace_batch_size in the playbook below).

Interesting… I just tested with a batch size of 1. The extra set of scaling events was also 1, i.e. one new instance launched and one new instance terminated.
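(For reference, that test was just the same replacement task as in the playbook below, with the availability-zone formula swapped out for a hard-coded batch size; a minimal sketch:)

```yaml
- name: Replace current instances one at a time (batch size test)
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    state: present
    replace_all_instances: yes
    lc_check: no
    replace_batch_size: 1
```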

The batch_size logic is broken. I am going to open an Issue in ansible-modules-core, but I welcome others to note their experience here. I'll update this topic with a link to the Issue, too.

```yaml
- name: Retrieve Auto Scaling Group properties
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    state: present
    health_check_type: ELB
  register: result_asg

- name: Auto Scaling Group properties
  debug: var=result_asg

- name: Replace current instances with fresh instances
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    state: present
    min_size: "{{ result_asg.min_size }}"
    max_size: "{{ result_asg.max_size }}"
    desired_capacity: "{{ result_asg.desired_capacity }}"
    health_check_type: "{{ result_asg.health_check_type }}"
    lc_check: no
    replace_all_instances: yes
    replace_batch_size: "{{ result_asg.availability_zones | length }}"
```

In the screenshot, scaling events 1 and 2 are expected; events a through d are extra.

Looking forward to the GitHub issue. Make sure you take a look at the Auto Scaling group and the ELB in the AWS console and see if they give a description of why the instances were terminated. I've seen cases where instances did not come online fast enough, so the ELB marked them as unhealthy and the ASG terminated them.

Thanks,

James

Thanks, James.

All of the instances that were terminated had been marked Unhealthy by terminate_batch().
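(For anyone else digging into this, the termination reasons can also be pulled without the console; a minimal sketch using the command module, assuming the AWS CLI is installed and configured:)

```yaml
# Each activity's Cause field describes why an instance was launched or terminated.
- name: Show recent scaling activities for the ASG
  local_action: command aws autoscaling describe-scaling-activities --auto-scaling-group-name {{ asg_name }} --max-items 10
  register: scaling_activities

- name: Scaling activity causes
  debug: var=scaling_activities.stdout
```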

I am using the changes from this PR: https://github.com/ansible/ansible-modules-core/pull/589, combined with the fixes in PR 601. Rationale: I need lc_check=no to cause all instances to get replaced. With the way the module is currently written, lc_check only replaces an instance if it has a different Launch Config than the one assigned to the ASG. Upon further consideration, I should add a new option instead of overloading the meaning of lc_check.
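(To illustrate the distinction, a minimal sketch of the two behaviors as I read the module; the exact defaults may differ with the PRs applied:)

```yaml
# With lc_check: yes, only instances whose Launch Config differs from the one
# currently assigned to the ASG are considered stale and get replaced.
- name: Replace only instances running an outdated Launch Config
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    state: present
    replace_all_instances: yes
    lc_check: yes

# With lc_check: no, the Launch Config comparison is skipped and every instance
# in the group is cycled, which is what our periodic refresh needs.
- name: Replace every instance regardless of Launch Config
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    state: present
    replace_all_instances: yes
    lc_check: no
```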

I spent some time reworking the algorithm that does the rolling replacement. It is much smarter now, and it shouldn't cause unnecessary scaling events. I've also merged in the functionality of #589. Would you mind giving it a whirl?

https://github.com/ansible/ansible-modules-core/pull/1030