I’m currently using the ec2_metric_alarm module to create a CloudWatch alarm. Because of AWS API throttling, the task usually fails on its first attempt and has to be retried, so I set the “retries” parameter on it. My code looks something like this:
```
- name: create alarm
  ec2_metric_alarm:
    state: present
    region: us-east-1
    name: "cpu-low"
    metric: "CPUUtilization"
    namespace: "AWS/EC2"
    statistic: Average
    comparison: "<="
    threshold: 5.0
    period: 300
    evaluation_periods: 3
    unit: "Percent"
    description: "This will alarm when a bamboo slave's cpu usage average is lower than 5% for 15 minutes "
    dimensions: {'InstanceId':'i-XXX'}
    alarm_actions: ["action1","action2"]
  retries: 5
  delay: 30
```
When I actually run the playbook, I get this error:
```
14:00:30 TASK [Create CloudWatch alarm - CPU] ******************************
14:00:30 task path: /path/to/task.yml:91
14:00:31 Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/cloud/amazon/ec2_metric_alarm.py
14:00:31 ESTABLISH LOCAL CONNECTION FOR USER: tools
14:00:31 EXEC /bin/sh -c '/usr/bin/python && sleep 0'
14:00:31 fatal: [localhost]: FAILED! => {
14:00:31     "changed": false,
14:00:31     "failed": true,
14:00:31     "invocation": {
14:00:31         "module_args": {
14:00:31             "alarm_actions": [
14:00:31                 "path-to-policy-arn"
14:00:31             ],
14:00:31             "aws_access_key": null,
14:00:31             "aws_secret_key": null,
14:00:31             "comparison": ">=",
14:00:31             "description": "This will alarm when a bamboo slave's cpu usage average is lower than 5% for 15 minutes ",
14:00:31             "dimensions": {
14:00:31                 "InstanceId": "i-XXX"
14:00:31             },
14:00:31             "ec2_url": null,
14:00:31             "evaluation_periods": 3,
14:00:31             "insufficient_data_actions": null,
14:00:31             "metric": "CPUUtilization",
14:00:31             "name": "cpu-low",
14:00:31             "namespace": "AWS/EC2",
14:00:31             "ok_actions": null,
14:00:31             "period": 300,
14:00:31             "profile": null,
14:00:31             "region": "us-east-1",
14:00:31             "security_token": null,
14:00:31             "state": "present",
14:00:31             "statistic": "Average",
14:00:31             "threshold": 5.0,
14:00:31             "unit": "Percent",
14:00:31             "validate_certs": true
14:00:31         },
14:00:31         "module_name": "ec2_metric_alarm"
14:00:31     },
14:00:31     "msg": "BotoServerError: 400 Bad Request\n<ErrorResponse xmlns=\"http://monitoring.amazonaws.com/doc/2010-08-01/\">\n \n Sender\n Throttling\n Rate exceeded\n \n bf38e4d8-2397-11e7-b197-c31f0855aa5d\n\n"
14:00:31 }
```
So it looks like Ansible isn’t honoring the retries/delay parameters at all: the task fails fatally on its first attempt instead of being retried. I’ve already spoken with AWS support, and they won’t raise or change the throttling limit. I looked through the Ansible GitHub issues and found this related one, but couldn’t get the suggested workaround to work for me: https://github.com/ansible/ansible-modules-core/issues/143
Any ideas as to why the playbook isn’t recognizing the retries/delay parameters here? Or any suggestions for a workaround?
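
For comparison, as far as I can tell from the Ansible documentation on do-until loops, older Ansible releases only honor retries and delay when the task also defines an until condition. Without until, the task appears to run once, which would match the single failed attempt in the log above. Below is a minimal, untested sketch of that pattern, assuming the task should simply be retried until the module stops reporting a failure. The registered variable name alarm_result is a hypothetical placeholder, not something from my playbook.

```yaml
- name: create alarm
  ec2_metric_alarm:
    state: present
    region: us-east-1
    name: "cpu-low"
    metric: "CPUUtilization"
    namespace: "AWS/EC2"
    statistic: Average
    comparison: "<="
    threshold: 5.0
    period: 300
    evaluation_periods: 3
    unit: "Percent"
    description: "This will alarm when a bamboo slave's cpu usage average is lower than 5% for 15 minutes "
    dimensions: {'InstanceId': 'i-XXX'}
    alarm_actions: ["action1", "action2"]
  # Hypothetical variable name; it holds the module result of each attempt.
  register: alarm_result
  # Re-run the task until the result is no longer a failure.
  # "is succeeded" is the test syntax from Ansible 2.5 onward; older
  # releases spelled the same check as "alarm_result | succeeded".
  until: alarm_result is succeeded
  retries: 5    # maximum number of attempts before the task fails for good
  delay: 30     # seconds to wait between attempts
```

With these settings the task would wait 30 seconds between attempts and only fail the play once all 5 retries are used up. Whether that spacing is enough to get past the CloudWatch rate limit is an assumption that would need to be checked against the actual throttling behavior.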