hi there,
so we had something odd happen to us, and I figured I would reach out to the community for help … the background:
- ansible 1.9.1
- we have a YAML data set that contains the "normal" info for our security groups (SGs), like ingress and egress rules
- this data rarely changes
- we run the ec2_group module on a regular basis since it is part of our "normal" ansible runs (4 times a day)
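for context, the task that consumes the data set looks roughly like this (a sketch only -- names like sg_definitions and aws_region are made up for this post, and region/VPC handling is simplified):

```yaml
# Sketch of our SG task, ansible 1.9 style -- variable names are illustrative.
# Each item in the data set carries name, description, rules, and rules_egress.
- name: ensure security groups match the YAML data set
  ec2_group:
    name: "{{ item.name }}"
    description: "{{ item.description }}"
    region: "{{ aws_region }}"
    rules: "{{ item.rules }}"
    rules_egress: "{{ item.rules_egress }}"
  with_items: sg_definitions
```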
so this is where it starts to get "weird" … our SGs show "changed" on every run, because we do NOT allow an ALL to cidr_ip 0.0.0.0/0 egress rule. In the module code, I THINK you will see that this allow-all egress rule is part of the default behavior: https://github.com/ansible/ansible-modules-core/blob/devel/cloud/amazon/ec2_group.py (lines ~305 ?? possibly 409ish ??). Since we don't allow this egress rule, ansible thinks the SG has "changed" when in fact it has not, and reports a change on every run … for example:
changed: [localhost] => (item={'rules': [{'to_port': 5666, 'from_port': 5666, 'group_name': '1-admin', 'proto': 'tcp'}, …], 'rules_egress': [{'to_port': 'all', 'from_port': 'all', 'cidr_ip': '10.137.0.0/16', 'proto': 'all'}, …], 'name': '1-base-zero', 'description': 'Default global SG to be attached to all EC2 instances'})
this is kind of not cool, since it reports "changed" when nothing has changed … and because no real change happens, "AWS Config" does not record it as a change either … I think there is a feature request out there for this: https://github.com/ansible/ansible/issues/11249
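for what it's worth, my reading of the module source linked above is that the implicit egress rule it reconciles against would look something like this in our data-set format (my paraphrase of the module's default, not something taken from our data):

```yaml
# The allow-all egress rule ec2_group appears to treat as the default.
# We deliberately do NOT have this rule in our data set, which (I believe)
# is why every single run reports "changed".
- proto: all
  from_port: all
  to_port: all
  cidr_ip: "0.0.0.0/0"
```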
so then … here is what happened … we were minding our own business; ansible ran at 10AM and did its normal "SG business" with no issues … it ran again at 12PM, and BAM !! egress rules from an important SG (probably our MOST important SG) were removed … what is even more odd is that the output of the successful 10AM run was identical to the output of the 12PM run … the same "changed" output I alluded to earlier …
but "AWS Config" revealed all kinds of nastiness … it showed that we did this:
I will say the one "odd" thing we do that stands out in my mind is that our data set includes this egress rule:
- proto: icmp
  from_port: -1
  to_port: -1
  cidr_ip: "0.0.0.0/0"
other than a few comments we add into the data, it is all pretty normal …
sooooo … any ideas about WTF happened ?? we are reaching out to AWS support as well, but nothing to share from them yet …
thanks for any help you can offer …
NOTE: this has only happened once in 50+ executions … we re-ran the exact same ansible play to fix what was broken, and it worked … which only adds to the weirdness