The push for notification modules yielded IRC, hipchat, Flowdock, and Campfire modules, plus mqtt.
On the monitoring side, we already have modules in place for nagios, airbrake, and newrelic.
I have it on very good authority (cough) that modules for AppDynamics, Pingdom, and PagerDuty are coming soon.
Thinking ahead, do you have a monitoring system with either outage window or deployment notification support that we should include? (For those that aren’t compatible with the nagios module, anyway.)
I’d definitely like to expand this list.
Does it make sense to have modules for any of the newer systems? Graphite etc. tend to be more about trending, so a playbook to set them up may be sufficient; I’m not sure modules would be needed.
Sounds good. Should someone want to add this, we’ll take it!
Looks like Zabbix outage windows, like those in Nagios, would be workable.
If that happens though, let’s make sure it uses a library that is available in pip, and try to approximately parallel the nagios module parameter conventions, if possible.
I also give a vote to Zabbix. Currently using that, Cacti and Icinga, but Icinga works via the Nagios module, because it’s just another clone. But Zabbix gets my vote.
We also run Zabbix at $work. I’ve been using ansible’s ‘uri’ module to interact with zabbix via its json-rpc api, though a dedicated module would certainly be cleaner, since using the uri module to do the work involves a multi-step process of storing the login authid with register, etc…
While there are several unofficial python zabbix APIs around, if I were writing it I’d probably just use a generic json-rpc module and go from there, since last I looked the unofficial APIs weren’t maintained very well… (this may have changed).
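For anyone picking this up, here is a rough sketch of the login-then-call flow described above, using only the Python standard library and plain JSON-RPC 2.0. The helper names (`build_request`, `http_post`, `call`, `login`) are made up for illustration, and the `user.login` parameter names and the `auth` field are as I recall them from older Zabbix releases, so check them against the API docs for your server version before relying on this.

```python
import json
import urllib.request


def build_request(method, params, request_id, auth=None):
    """Build a JSON-RPC 2.0 request body (hypothetical helper)."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    # Assumption: the session token travels in an "auth" field, as in
    # older Zabbix releases. Newer releases may expect a header instead.
    if auth is not None:
        payload["auth"] = auth
    return payload


def http_post(url, payload):
    """POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def call(url, method, params, auth=None, post=http_post):
    """Send one API call and return its result, raising on an API error."""
    reply = post(url, build_request(method, params, 1, auth))
    if "error" in reply:
        raise RuntimeError("zabbix api error: %s" % (reply["error"],))
    return reply["result"]


def login(url, user, password, post=http_post):
    """Step one of the multi-step process: fetch the auth token."""
    # Assumption: parameter names "user"/"password" (older releases).
    return call(url, "user.login", {"user": user, "password": password}, post=post)
```

Usage would look something like `token = login(url, "apiuser", "secret")` followed by `call(url, "maintenance.get", {"output": "extend"}, auth=token)`, where `url` points at the server's `api_jsonrpc.php`. The `post` argument is only there so the transport can be swapped out for testing.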
We have about a week before code freeze on 1.2 if folks want to add some.
I already have pingdom and pagerduty in queue to merge! If not, 1.3 will be exciting (it will be anyway) and we can always pull in more later, and it’s likely those who want to sit on 1.2 a while can just copy those modules over. (and of course, many people run from source!)
Can I still vote for OMD/Check_MK? I know it’s a Nagios clone, but its config files seem to be different from stock Nagios; it compiles Nagios config files dynamically from its own groups and tags.
I love the idea of having my ansible role match my check_mk roles.
We use Librato metrics to aggregate and visualize all our metrics. It supports thresholds and notifies via OpsGenie and PagerDuty (which may expose an API that ansible can automate).
For metrics collection we use collectd and diamond. collectd supports thresholding, but I can’t see how to automate these thresholds any better with a special module than with the current ones inside a playbook. Sensu and StatsD are also popular nagios alternatives.
Librato features integration with StatusCake, a status check service similar to pingdom and monit. We use them because of the librato integration – and it’s actually a very nice service!
ElasticSearch and Datomic modules would also be nice. ElasticSearch provides a REST API that makes it relatively easy to poll status and create/remove indexes (databases). I basically use the uri module with ignore_errors set to true to ensure an index exists.
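As a rough idea of what a dedicated module could do instead of ignoring errors, here is a minimal sketch of an idempotent "ensure index exists" check in Python. It assumes the usual REST behaviour where `HEAD /<index>` answers 200 or 404 and `PUT /<index>` creates the index; the function names (`http_status`, `ensure_index`) are invented for this example and are not from any existing module.

```python
import urllib.error
import urllib.request


def http_status(method, url):
    """Return the HTTP status code for a bodyless request."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is all we need here.
        return err.code


def ensure_index(base_url, index, status_of=http_status):
    """Create the index only if it is missing.

    Returns True when the index was created and False when it already
    existed, which maps onto a module's changed/unchanged result.
    """
    url = base_url.rstrip("/") + "/" + index
    if status_of("HEAD", url) == 200:
        return False
    code = status_of("PUT", url)
    if code not in (200, 201):
        raise RuntimeError("could not create index %s: HTTP %s" % (index, code))
    return True
```

Calling `ensure_index("http://localhost:9200", "logs")` twice should report a change only the first time, so a real failure is no longer hidden the way it is with ignore_errors.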