Behavior of ec2_elb module

I have a bit of a bone to pick with the way the ec2_elb module currently operates, and I’m wondering whether other folks have had issues with this or whether they like the way it currently works.

A bit of background first, especially for those who aren’t EC2 & ELB experts. Amazon makes their ELB (elastic load balancer) pretty trivial to use. If you have a server instance registered to an ELB then it can basically be in one of two states, either InService or OutOfService. A server that’s InService is able to respond to requests. A server that’s OutOfService could be out of service for many reasons: the server is powered off, a service like Apache isn’t running, a health check is failing, or the server is in a transition period while it’s being added to or removed from the load balancer.

If you use the ec2_elb module to remove an instance from an ELB then the very first thing it does is to check to see if the instance is InService. If the instance is InService then it removes the instance from the ELB. However, if the instance is OutOfService then the module immediately returns with changed:false. The code in the module explicitly checks for this:

    if initial_state and initial_state.state == 'InService':
        lb.deregister_instances([self.instance_id])
    else:
        return

We recently ran into a problem because of this behavior. We have an Ansible playbook that we use to switch our production site to/from maintenance pages if we ever have problems with the site. We recently had a site problem, so I ran the playbook to switch to our maintenance pages. It does this by adding a few instances that serve static web pages to the load balancer, then removing all the webapp servers from the load balancer. The problem was that all our webapp servers were failing their health checks, so Amazon was reporting them as OutOfService and Ansible didn’t actually remove them from the ELB. We had to log into AWS and manually remove them from the ELB before we could safely restart them and verify that they were working properly, and only then add them back into the ELB.

Given that the ec2_elb module documentation states that specifying “state=absent” will deregister an instance from the load balancer, I think it should be doing it no matter what state the instance is in, not just if the instance is InService. I’m more than willing to submit a pull request with the necessary changes, but I’m not entirely certain what the best approach to take on this matter is. I know Michael prefers to maintain the default behavior of modules as much as possible, but in this case I’d argue that it’s a bug that instances are removed only if they’re in a healthy state. I suppose we could keep the current behavior and add a new optional parameter along the lines of “ignore_current_state=true”, but I would think most people would rather have the default be to always remove an instance and instead perhaps have an optional parameter that would cause an instance to be removed only if it’s healthy.
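To make the proposal concrete, here is a rough sketch of the behavior I have in mind. It is not the module's actual code: FakeLoadBalancer is a made-up stand-in for the real load balancer object so the logic can be run on its own, and only_if_in_service is just a placeholder name for the optional parameter.

```python
class FakeLoadBalancer:
    """Hypothetical stand-in for the real ELB object, for illustration only."""

    def __init__(self, instance_states):
        # Maps instance id -> 'InService' or 'OutOfService'.
        self.instance_states = dict(instance_states)

    def deregister_instances(self, instance_ids):
        for instance_id in instance_ids:
            self.instance_states.pop(instance_id, None)


def deregister(lb, instance_id, only_if_in_service=False):
    """Remove an instance from the load balancer.

    Returns True if the instance was removed (changed), False otherwise.
    only_if_in_service is an assumed, optional flag that restores the
    current behavior of skipping OutOfService instances.
    """
    state = lb.instance_states.get(instance_id)
    if state is None:
        # Not registered with this load balancer: nothing to do.
        return False
    if only_if_in_service and state != 'InService':
        # Opt-in equivalent of today's behavior.
        return False
    # Proposed default: deregister regardless of health state.
    lb.deregister_instances([instance_id])
    return True
```

With this default, an instance that is failing its health checks still gets removed, and anyone who depends on the current behavior could opt back in by passing the flag.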

-Bruce

I’d be fine with the change here. It’s going to cause a little bit more activity, but it doesn’t seem to change the end result.