Based on discussions in the IRC channel today with hazmat and jarv, I’ve got some comments and suggestions about how the ec2 module currently handles idempotency.
By default, the ec2 module is not idempotent. If I write a playbook in which count=N ec2 instances are provisioned, then every time I run that playbook N new instances are provisioned.
However, in version 1.2 it became possible to make this provisioning operation idempotent. This pull request (which I believe made it into 1.2) allows one to specify an “id” parameter, which boto uses as the EC2 client-token. In a way, this is a sensible way to add idempotency, because according to the AWS documentation, the client-token is intended to ensure idempotent provisioning of ec2 instances. However, that same documentation points out an annoying limitation:
The client token is valid for at least 24 hours after the termination of the instance. You should not reuse a client token for another call later on.
Thus, imagine I want to provision a cluster of ec2 instances using the client-token “webservers”. The first time I provision these (using id=“webservers”), all is well and the idempotency functions as expected. Now let’s say I spin these machines down, and then two weeks later I want to spin them back up. The AWS documentation says I should not do this, because I should not reuse the client-token. I’m not sure what kind of problems this actually creates, but if the AWS documentation says not to do it, then ansible should probably avoid it.
An alternative approach would be to allow one to provision ec2 instances idempotently using ec2 tags. We could add an option called something like tag_idempotency, which would be set to false by default to preserve backward compatibility. If set to true, then before provisioning new instances, the ec2 module would first check how many instances with the specified tag(s) already exist, and only provision extra instances if necessary. This could be implemented exactly as the current client-token based idempotency is implemented, except line 305 would change from:
filter_dict = {'client-token': id, 'instance-state-name': 'running'}
to something like:
filter_dict = dict(("tag:" + tn, tv) for tn, tv in module.from_json(instance_tags).items())
(I haven’t tested this code yet; it’s just a sketch.)
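To make the "only provision extra instances if necessary" part a bit more concrete, here is a rough, standalone sketch of the two pieces of logic involved. The function names build_tag_filters and instances_to_launch are hypothetical and do not exist in the ec2 module. I'm also assuming that instance_tags arrives as a JSON string (as the module.from_json call above suggests) and that the 'running' state filter from the current line 305 should be kept, so that terminated instances don't count toward the total.

```python
import json


def build_tag_filters(instance_tags_json):
    """Build an EC2 filter dict from a JSON object of tags.

    Hypothetical helper: each tag becomes a "tag:<name>" filter key.
    """
    tags = json.loads(instance_tags_json)
    filters = dict(("tag:" + name, value) for name, value in tags.items())
    # Assumption: keep the state filter from the existing client-token
    # code so that only running instances are counted.
    filters["instance-state-name"] = "running"
    return filters


def instances_to_launch(desired_count, matching_instance_ids):
    """Return how many extra instances are needed to reach desired_count.

    Hypothetical helper: never returns a negative number, so an
    over-provisioned cluster results in no new instances.
    """
    return max(0, desired_count - len(matching_instance_ids))
```

The idea would be to pass the filter dict to the same lookup the module already does for client-tokens, count the matches, and launch only the shortfall. Whether already-running extras should ever be terminated is a separate question that this sketch deliberately leaves alone.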
According to this thread, we should also note in the documentation that tag names should not contain underscores.
Any thoughts on this proposal?