Thinking about more modules -- what monitoring systems do you use?

Michael_DeHaan2 · May 25, 2013, 5:21am

The push for notification modules yielded IRC, HipChat, Flowdock, and Campfire notification modules, plus mqtt.

On the monitoring side, we already have modules in place for nagios, airbrake, and newrelic.

I have it on very good authority (cough) that modules for AppDynamics, Pingdom, and PagerDuty are coming soon.

Thinking ahead, do you have a monitoring system with either outage window or deployment notification support that we should include? (For those that aren’t compatible with the nagios module, anyway.)

I’d definitely like to expand this list.

Does it make sense to have modules for any of the newer systems? Graphite and the like tend to be more about trending, so a playbook to set them up may be sufficient. I’m not sure modules would be needed.

Mark_Mandel · May 25, 2013, 5:29am

Very excited for these. They are exactly what we use.

Mark

Serge_van_Ginderacht · May 25, 2013, 9:16am

I'm not working on monitoring directly myself at $work, but we have a Zabbix (http://www.zabbix.org) setup. I think this could definitely use a module. Zabbix has an API, and there are a couple of Python libraries (https://www.zabbix.org/wiki/Docs/api/libraries#Python).

Serge

Lester_Wade · May 25, 2013, 9:25am

Op5: https://www.op5.com/manuals/index.html#page/op5_Monitor_Administrator_Manual/API.html

Michael_DeHaan · May 25, 2013, 12:47pm

Sounds good. Should someone want to add this, we’ll take it!

It looks like Zabbix outage windows, like the Nagios ones, would be workable.

If that happens though, let’s make sure it uses a library that is installable via pip, and try to roughly parallel the nagios module’s parameter conventions, if possible.

Edgars · May 25, 2013, 2:12pm

+1 Zabbix

TextEditor · May 25, 2013, 3:40pm

I also give a vote to Zabbix. We’re currently using that, Cacti, and Icinga. Icinga already works via the Nagios module because it’s just another Nagios clone, so Zabbix gets my vote.

Romeo_Theriault · May 25, 2013, 8:00pm

We also run Zabbix at $work. I’ve been using ansible’s ‘uri’ module to interact with Zabbix via its JSON-RPC API, though a dedicated module would certainly be cleaner, since doing the work with the uri module involves a multi-step process of storing the login auth ID in a register, etc.

While there are several unofficial Python libraries for the Zabbix API around, if I were writing it I’d probably just use a generic JSON-RPC module and go from there, since last I looked the unofficial libraries weren’t maintained very well (this may have changed).

Romeo

Romeo_Theriault · May 25, 2013, 8:04pm

Even just doing it with the standard json and http libs would be possible.
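As a rough illustration of the standard-library approach, here is a minimal sketch of a JSON-RPC helper built on `json` and `urllib` only. The helper names (`build_rpc_request`, `call_rpc`) and the `opener` parameter are made up for this example. Passing the login token in an `auth` field and using the `application/json-rpc` content type follow the Zabbix API conventions as I understand them for older releases; newer versions may differ, so treat both as assumptions.

```python
import json
import urllib.request


def build_rpc_request(method, params, request_id=1, auth=None):
    """Serialize a JSON-RPC 2.0 request body to UTF-8 bytes."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    # Assumption: the API expects the login token in an "auth" field.
    if auth is not None:
        payload["auth"] = auth
    return json.dumps(payload).encode("utf-8")


def call_rpc(url, method, params, auth=None, opener=urllib.request.urlopen):
    """POST one JSON-RPC call and return its "result" member.

    `opener` is injectable so the helper can be exercised without a server.
    """
    request = urllib.request.Request(
        url,
        data=build_rpc_request(method, params, auth=auth),
        headers={"Content-Type": "application/json-rpc"},
    )
    with opener(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    if "error" in reply:
        raise RuntimeError("JSON-RPC error: %r" % (reply["error"],))
    return reply["result"]
```

A module built on this would first call something like `call_rpc(url, "user.login", {...})` to obtain a token, then pass that token as `auth` on later calls. That is the same two-step flow described above for the uri module, just kept inside one piece of code. The exact parameter names for the login call depend on the Zabbix version and are not shown here.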

Michael_DeHaan2 · May 26, 2013, 12:37am

Yeah, let’s do that…

Nicolas_G · May 26, 2013, 6:46pm

+1 for Zabbix

aamaas · May 27, 2013, 12:24pm

+1 for Zabbix here as well.

Brian_Coca1 · May 28, 2013, 8:08pm

Zenoss, Argus, and the HP and Dell management suites are the only ones I’ve seen missing. Most other ones not mentioned are Nagios derivatives.

Darryl_Stoflet · May 29, 2013, 2:38am

Anybody using monit? It has an unmonitor command that would be applicable for service-specific outages. In general a monit module may be useful…

Ali_Asad_Lotia · May 29, 2013, 9:59am

May be useful to have a sensu module.

Michael_DeHaan2 · May 29, 2013, 11:40am

All great ideas!

We have about a week before code freeze on 1.2 if folks want to add some.

I already have pingdom and pagerduty in queue to merge! If not, 1.3 will be exciting (it will be anyway) and we can always pull in more later, and it’s likely those who want to sit on 1.2 a while can just copy those modules over. (and of course, many people run from source!)

Steve_Irvine · June 10, 2013, 7:47pm

Can I still vote for OMD/Check_MK? I know it’s a Nagios clone, but its config files seem to be different from stock Nagios: it compiles Nagios config files dynamically from its own groups and tags.

I love the idea of having my ansible role match my check_mk roles.

Haven’t had a chance to look at it yet.

Michael_DeHaan2 · June 10, 2013, 7:58pm

It wasn’t so much a voting thread as a brainstorming thread.

So far this thread has produced modules for pingdom, pagerduty, airbrake, newrelic, and monit in addition to the already existing Nagios!

More for 1.3 are welcome if anyone would like to add some for major apps!

Brice_Burgess · June 11, 2013, 2:37pm

We use Librato metrics to aggregate and visualize all our metrics. It supports thresholds and notifies via OpsGenie and PagerDuty (which may expose an API that ansible can automate).

For metrics collection we use collectd and diamond. collectd supports thresholding, but I can’t see how a special module would automate these any better than the current modules inside a playbook. Sensu and StatsD are also popular Nagios alternatives.

Librato features integration with StatusCake, a status check service similar to pingdom and monit. We use them because of the librato integration – and it’s actually a very nice service!

Elasticsearch and Datomic modules would also be nice. Elasticsearch provides a REST API that makes it relatively easy to poll status and create/remove indexes (databases). I basically use the uri module with ignore_errors set to true to ensure an index exists.
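For comparison, a dedicated module could do the "ensure an index exists" check explicitly rather than ignoring errors. The following is only a sketch: `index_exists`, `ensure_index`, and the `opener` parameter are hypothetical names, and it assumes the REST behaviour where `HEAD /<index>` answers 200 or 404 and `PUT /<index>` creates the index.

```python
import urllib.error
import urllib.request


def index_exists(base_url, index, opener=urllib.request.urlopen):
    """Return True if HEAD /<index> answers 200, False if it answers 404."""
    url = "%s/%s" % (base_url.rstrip("/"), index)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # anything else is a real failure, not "missing"


def ensure_index(base_url, index, opener=urllib.request.urlopen):
    """Create the index if it is missing.

    Returns True when an index was created (a "changed" result in
    Ansible terms) and False when it already existed.
    """
    if index_exists(base_url, index, opener=opener):
        return False
    url = "%s/%s" % (base_url.rstrip("/"), index)
    request = urllib.request.Request(url, method="PUT")
    with opener(request):
        pass
    return True
```

The point of the design is that only a 404 is treated as "index missing". Other HTTP errors still surface, which a blanket ignore_errors would hide.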

Thanks,

~ Brice

Michael_DeHaan2 · June 12, 2013, 10:46am

Wanted to point out there’s already a pagerduty module in 1.2 core, BTW.
