Thinking about more modules -- what monitoring systems do you use?

The push for notification modules yielded IRC, HipChat, Flowdock, and Campfire notification modules, plus mqtt.

On the monitoring side, we already have modules in place for nagios, airbrake, and newrelic.

I have it on very good authority (cough) that modules for AppDynamics, Pingdom, and Pagerduty are coming soon :slight_smile:

Thinking ahead, do you have a monitoring system that has either outage window or deployment notification support that we should include? (For those that aren’t compatible with the nagios module, anyway.)

I’d definitely like to expand this list.

Does it make sense to have modules for any of the newer systems? Graphite and the like tend to be more about trending, so a playbook to set them up may be sufficient; I’m not sure modules would be needed.

Very excited for these; they’re exactly what we use :slight_smile:

Mark

I'm not working on monitoring directly myself at $work, but we have a Zabbix (http://www.zabbix.org) setup. I think this could definitely use a module. Zabbix has an API, and there are a couple of Python libraries (https://www.zabbix.org/wiki/Docs/api/libraries#Python).

  Serge

Op5: https://www.op5.com/manuals/index.html#page/op5_Monitor_Administrator_Manual/API.html

Sounds good. Should someone want to add this, we’ll take it!

Looks like Zabbix outage windows, like the Nagios ones, would be workable.

If that happens though, let’s make sure it uses a library that is available in pip, and try to approximately parallel the nagios module parameter conventions, if possible.
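To make the parameter-convention suggestion concrete, here is a minimal Python sketch of how a Zabbix maintenance window might be described with options that loosely echo the nagios module’s `host` and `minutes`. The helper name `zabbix_maintenance_params` is hypothetical, and the field names (`active_since`, `active_till`, `hostids`, `timeperiods`) assume the older Zabbix `maintenance.create` API; newer releases change some of them, so check the API reference for your version.

```python
import time


def zabbix_maintenance_params(name, hostids, minutes=30, start=None):
    """Build params for a one-time Zabbix maintenance window (hypothetical helper).

    `hostids` and `minutes` loosely mirror the nagios module's `host` and
    `minutes` options. Field names assume the older maintenance.create API.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    start = int(time.time()) if start is None else int(start)
    period = minutes * 60  # Zabbix expresses the period in seconds
    return {
        "name": name,
        "active_since": start,
        "active_till": start + period,
        "hostids": list(hostids),
        "timeperiods": [
            {
                "timeperiod_type": 0,  # 0 = one-time-only period
                "start_date": start,
                "period": period,
            }
        ],
    }
```

The resulting dict would be sent as the `params` of a `maintenance.create` JSON-RPC call, e.g. `zabbix_maintenance_params("deploy", ["10084"], minutes=15)` for a 15-minute window on one host ID.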

+1 Zabbix

I also give a vote to Zabbix. Currently using that, Cacti and Icinga, but Icinga works via the Nagios module, because it’s just another clone. But Zabbix gets my vote.

We also run Zabbix at $work. I’ve been using Ansible’s ‘uri’ module to interact with Zabbix via its JSON-RPC API, though a dedicated module would certainly be cleaner. Using the uri module for this involves a multi-step process: logging in, storing the auth ID with register, and so on.

While there are several unofficial Python Zabbix API libraries around, if I were writing it I’d probably just use a generic JSON-RPC module and go from there, since last I looked the unofficial libraries weren’t maintained very well (this may have changed).

Romeo

Even just doing it with the standard json and http libs would be possible.

Yeah, let’s do that…
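A minimal sketch of that approach, using only Python’s standard `json` and `urllib` modules. It assumes the usual `api_jsonrpc.php` endpoint and the older `user.login` parameters (`user`/`password`) with the token passed in an `auth` field; recent Zabbix versions rename or replace some of these, so treat the field names as assumptions. The helper names `build_request`, `call`, and `login` are hypothetical. The flow is the same login-then-call sequence described above for the uri module.

```python
import itertools
import json
import urllib.request

_ids = itertools.count(1)  # JSON-RPC request ids


def build_request(method, params, auth=None):
    """Return a JSON-RPC 2.0 request dict in the shape the Zabbix API expects."""
    request = {"jsonrpc": "2.0", "method": method, "params": params, "id": next(_ids)}
    if auth is not None:
        request["auth"] = auth  # session token returned by user.login
    return request


def call(url, method, params, auth=None, timeout=10):
    """POST one request and return its "result", raising on a JSON-RPC error."""
    body = json.dumps(build_request(method, params, auth)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json-rpc"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if "error" in reply:
        raise RuntimeError("Zabbix API error: %s" % reply["error"])
    return reply["result"]


def login(url, user, password):
    """Log in and return the auth token to pass to later calls."""
    return call(url, "user.login", {"user": user, "password": password})


# Example usage (hypothetical host and credentials):
# url = "http://zabbix.example.com/zabbix/api_jsonrpc.php"
# token = login(url, "Admin", "secret")
# hosts = call(url, "host.get", {"output": ["hostid", "host"]}, auth=token)
```

Keeping the request builder separate from the HTTP call makes the payload easy to inspect, and a module could reuse `call` for any API method.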

+1 for Zabbix

+1 for Zabbix here as well.

Zenoss, Argus, and the HP and Dell management suites are the only ones I’ve seen missing; most others not mentioned are Nagios derivatives.

Anybody using monit? It has an unmonitor command that would be applicable for service-specific outages. In general, a monit module may be useful.
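For reference, such a module would mostly wrap the `monit monitor` and `monit unmonitor` commands. A minimal Python sketch, assuming `monit` is on the PATH; `monit_command` and `set_monitoring` are hypothetical names, and treating exit status 0 as success is a simplification.

```python
import shutil
import subprocess


def monit_command(action, service):
    """Build a monit argv for the two actions relevant to an outage window."""
    if action not in ("monitor", "unmonitor"):
        raise ValueError("unsupported action: %r" % action)
    return ["monit", action, service]


def set_monitoring(service, enabled, monit_path=None):
    """Run `monit monitor|unmonitor <service>`; return True on exit status 0."""
    binary = monit_path or shutil.which("monit")
    if binary is None:
        raise FileNotFoundError("monit binary not found on PATH")
    argv = monit_command("monitor" if enabled else "unmonitor", service)
    argv[0] = binary  # use the resolved path to the monit executable
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0
```

During a deployment one would call `set_monitoring("nginx", False)` before the change and `set_monitoring("nginx", True)` afterwards.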

A Sensu module may also be useful.

All great ideas!

We have about a week before code freeze on 1.2 if folks want to add some.

I already have pingdom and pagerduty in the queue to merge! If yours doesn’t make it in time, 1.3 will be exciting (it will be anyway), and we can always pull in more later. Those who want to sit on 1.2 for a while can likely just copy those modules over (and of course, many people run from source!).

Can I still vote for OMD/Check_MK? I know it’s a Nagios clone, but its config files seem to be different from stock Nagios: it compiles Nagios config files dynamically from its own groups and tags.

I love the idea of having my ansible role match my check_mk roles.

Haven’t had a chance to look at it yet.

It wasn’t so much a voting thread as a brainstorming thread :slight_smile:

So far this thread has produced modules for pingdom, pagerduty, airbrake, newrelic, and monit in addition to the already existing Nagios!

More for 1.3 are welcome if anyone would like to add some for major apps!

We use Librato metrics to aggregate and visualize all our metrics. It supports thresholds and notifies via OpsGenie and PagerDuty (which may expose an API that ansible can automate).

For metrics collection we use collectd and diamond. collectd supports thresholding, but I can’t see how a dedicated module would automate it any better than the existing modules inside a playbook. Sensu and StatsD are also popular Nagios alternatives.

Librato features integration with StatusCake, a status check service similar to pingdom and monit. We use them because of the librato integration – and it’s actually a very nice service!

ElasticSearch and Datomic modules would also be nice. ElasticSearch provides a REST API that makes it relatively easy to poll status and create/remove indexes (databases). I basically use the uri module with ignore_errors set to true to ensure an index exists.
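The check-then-create pattern for an index can be sketched with Python’s standard library: a `HEAD` request on the index returns 200 if it exists and 404 if not, and a `PUT` creates it with default settings. The function names are hypothetical, and the sketch assumes an unauthenticated cluster at the given base URL. The `send` parameter is there so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request


def http_status(method, url, timeout=10):
    """Send a bodyless request and return the HTTP status code."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; we only need the code


def ensure_index(base_url, index, send=http_status):
    """Create `index` unless it already exists; return True if it was created."""
    url = "%s/%s" % (base_url.rstrip("/"), index)
    status = send("HEAD", url)
    if status == 200:
        return False  # index already present, nothing to do
    if status != 404:
        raise RuntimeError("unexpected status %d checking %s" % (status, url))
    created = send("PUT", url)
    if created not in (200, 201):
        raise RuntimeError("failed to create %s (status %d)" % (url, created))
    return True
```

Unlike `ignore_errors`, this distinges "already exists" from real failures, which is roughly what a dedicated module would report as changed/unchanged.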

Thanks,

~ Brice

Wanted to point out there’s already a pagerduty module in 1.2 core, BTW :slight_smile: