I’m in the middle of a complete overhaul / rewrite of our deployment and provisioning systems, using Ansible, for a
distributed, microservices-based, highly available architecture running on AWS.
In parallel, I’ve been looking into how our company could improve its monitoring. Currently, our DevOps team manages an
Icinga deployment by manually updating configuration files whenever hosts are added or removed, new services are added, etc.
This has become fairly unwieldy: the new (and not yet complete) Ansible project is up to 23 roles and 48 playbooks.
Multiply that by horizontal scaling, staging and production environments, autoscaling, etc., and it has become nearly
impossible to stay proactive. It is not uncommon for services to remain unmonitored until they have gone down at least
once in production.
Since our Ansible project already knows how to configure every piece of software in our stack from the ground up, I see
potential in using Ansible to configure monitoring and alerting automatically as part of each deployment.
For a few days I’ve been poking around at integrating automated monitoring and alerting with Sensu (it is compatible
with our current Icinga/Nagios checks and is designed to be easy to automate with configuration management), and I
haven’t found an obvious, clean way to do it. Whether I end up using Sensu, Icinga, Nagios, etc. shouldn’t make much
of a difference.
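One possible shape for this, sketched below under stated assumptions: map each Ansible role to a Sensu subscription, so every host subscribes to the roles it runs and each check targets the subscribers (roles) that need it. Adding or removing hosts then requires no check changes, only the host's own client config. The JSON layout follows Sensu Core 1.x client/check definitions as I understand them (verify against the Sensu docs for your version); the role names, host inventory, and check commands are hypothetical placeholders, not anything from an existing project. In practice the same data would presumably come from Ansible group/host vars and be rendered by a template task rather than a standalone script.

```python
import json

# Hypothetical mapping: Ansible role -> {check name: Nagios-compatible command}.
ROLE_CHECKS = {
    "nginx": {"check_http": "check_http -H localhost -p 80"},
    "postgres": {"check_pgsql": "check_pgsql -H localhost"},
}

# Hypothetical inventory: host name -> (address, roles applied to that host).
INVENTORY = {
    "web-01": ("10.0.1.10", ["nginx"]),
    "db-01": ("10.0.2.10", ["postgres"]),
}


def client_config(name, address, roles):
    """Client definition: the host subscribes to one subscription per role it runs."""
    return {"client": {"name": name, "address": address,
                       "subscriptions": sorted(roles)}}


def check_config(role_checks, interval=60):
    """Check definitions: each check targets hosts subscribed to its role.

    Check names are prefixed with the role so two roles can reuse the same
    plugin without their definitions colliding.
    """
    checks = {}
    for role, commands in role_checks.items():
        for check_name, command in commands.items():
            checks[f"{role}_{check_name}"] = {
                "command": command,
                "subscribers": [role],
                "interval": interval,  # seconds between check runs
            }
    return {"checks": checks}


if __name__ == "__main__":
    # One client config per host, plus one shared checks file.
    for host, (address, roles) in INVENTORY.items():
        print(json.dumps(client_config(host, address, roles), indent=2))
    print(json.dumps(check_config(ROLE_CHECKS), indent=2))
```

The design choice here is that checks are keyed to roles rather than hosts, which is what lets autoscaled or newly added hosts pick up monitoring without anyone editing check files.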
Good idea or bad idea? It seems logical to me, but maybe Ansible isn’t the best tool for this? Thoughts? Has anyone tried
something like this at a similar scale (~75-100+ services spread across hundreds to a thousand hosts)?