Using statsd for playbooks

Hi all,

I was wondering whether anyone has considered integrating statsd with Ansible and, if so, what the best way to go about it would be.

For example, it'd be cool to use statsd to record how many times a playbook is run or how long a particular task takes.
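Here is a minimal sketch of what those two measurements could look like on the wire. It assumes a stock statsd daemon listening on its default UDP port, 8125. statsd accepts plain-text lines of the form name:value|type, where "c" is a counter and "ms" is a timer. The metric names (ansible.playbook.runs, ansible.task.gather_facts) and both helper functions are hypothetical.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Build a statsd line such as 'ansible.playbook.runs:1|c'."""
    # Only the two types needed here: 'c' (counter) and 'ms' (timer).
    if metric_type not in ("c", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    # Render whole numbers without a trailing '.0' so counters read '1', not '1.0'.
    if float(value).is_integer():
        rendered = str(int(value))
    else:
        rendered = f"{value:.3f}"
    return f"{name}:{rendered}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; statsd does not reply to metric packets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# One playbook run, plus a task that took 3.745884 s (3745.884 ms).
run_line = format_metric("ansible.playbook.runs", 1, "c")
task_line = format_metric("ansible.task.gather_facts", 3745.884, "ms")
```

Calling send_metric(run_line) would then push the counter. Because the transport is UDP, the sender does not wait for an acknowledgement, which keeps the cost to the playbook run small.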

If anyone has any thoughts or ideas, I'd love to hear them.

I remember someone mentioning writing a callback plugin to do this, but I can't find the email or IRC mention, so I don't know if they ever did.

I wrote a callback plugin that prints the start time and duration of each task. It was quite clunky to implement, since there are no post-* events: it monkey-patches the original _run_play and _run_task_internal so that it can produce useful output. But it is quite a hack. :( As far as I know, there is no other way to detect the kind of events you are looking for.

```
TASK: [example | gather facts] ***************************************
Task started at: 2015-05-07 14:03:44.877420
ok: [api-01.ssl.stg.example.net]
ok: [api-02.ssl.stg.example.net]
ok: [api-03.ssl.stg.example.net]
Task completed in: 0:00:03.745884
```

I think that was me, Brian. We were discussing metrics collection at the day job, and I was starting to look at it in a number of different ways. The first thing I investigated was task timings using callbacks. I only got as far as figuring out which callbacks get called when, and with what arguments.

Like Danny, I found the lack of a post event a bit disconcerting until I was shown an example of stopping the last timer when the next one starts. I wouldn't call that a hack, though I'm still mildly concerned that something could get in between the real post-event point and the next pre-event. Perhaps I'm a bit too paranoid, though.
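The "stop the last timer when the next one starts" approach can be sketched roughly as follows. The method names mirror the Ansible 1.x callback plugin hooks playbook_on_task_start and playbook_on_stats; treat that mapping as an assumption. The class name and the injectable clock are illustrative, added so the example runs without Ansible.

```python
import time
from typing import Callable, List, Optional, Tuple


class TaskTimer:
    """Times tasks by closing the previous task's timer when the next one starts.

    Method names mirror Ansible 1.x callback plugin hooks (an assumption);
    nothing here imports Ansible.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.current: Optional[Tuple[str, float]] = None  # (task name, start time)
        self.durations: List[Tuple[str, float]] = []

    def _close_current(self) -> None:
        # Record the running task's duration, if a task is running.
        if self.current is not None:
            name, started = self.current
            self.durations.append((name, self.clock() - started))
            self.current = None

    def playbook_on_task_start(self, name: str, is_conditional: bool) -> None:
        # There is no post-task hook, so the start of task N marks the end of N-1.
        self._close_current()
        self.current = (name, self.clock())

    def playbook_on_stats(self, stats: object) -> None:
        # End-of-run summary: nothing else will start, so close the last timer.
        self._close_current()


# Fake clock so the example is deterministic.
ticks = iter([0.0, 3.0, 3.0, 5.5])
timer = TaskTimer(clock=lambda: next(ticks))
timer.playbook_on_task_start("gather facts", False)
timer.playbook_on_task_start("deploy", False)
timer.playbook_on_stats(None)
print(timer.durations)  # [('gather facts', 3.0), ('deploy', 2.5)]
```

The concern about something getting in between shows up directly here: each recorded duration also covers whatever runs between a task's real end and the start of the next task.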

One outstanding question I still haven't answered for myself is how to separate regular tasks from notification tasks. (This was more of a general curiosity than something that could affect the timers.) The answer may be there, but it wasn't immediately apparent to me.

Danny: on playbook run counts, are you just envisioning a single increment per run that you look at in aggregate over an hour or a day? Or were you thinking about an event? We were thinking the latter would be more appropriate than simple counts in the grand scheme of metrics collection (graph the performance of X and show me when a playbook run started and ended). Thing is, statsd doesn't support events from what I've seen. There's no standard way of publishing events in any of the metric protocols, and you wouldn't want events grouped and summarized the way statsd is designed to do.
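To make the counts-versus-events distinction concrete, here is a toy model (not statsd itself) of how statsd treats counters. Everything that arrives within one flush window, 10 seconds by default, is summed into a single number per metric. The function and metric names are hypothetical, and only counters are modelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flush_counters(packets: List[Tuple[float, str]], interval: float) -> Dict[int, Dict[str, int]]:
    """Sum counter packets per metric per flush window.

    packets are (arrival_time_seconds, 'name:value|c') pairs. Real statsd
    flushes on a timer rather than bucketing timestamps, but the effect on
    counters is the same: only the per-window total survives.
    """
    windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for arrival, line in packets:
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type != "c":
            continue  # this toy only models counters
        windows[int(arrival // interval)][name] += int(value)
    return {window: dict(counts) for window, counts in windows.items()}


# Three playbook runs: two start in the first 10 s window, one in the second.
packets = [
    (1.0, "ansible.playbook.runs:1|c"),
    (4.0, "ansible.playbook.runs:1|c"),
    (12.0, "ansible.playbook.runs:1|c"),
]
totals = flush_counters(packets, interval=10.0)
print(totals)  # {0: {'ansible.playbook.runs': 2}, 1: {'ansible.playbook.runs': 1}}
```

The two runs in the first window collapse into a total of 2, and their individual start times are lost. That is why a "show me when the run started and ended" marker fits poorly into counter aggregation.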

Just thinking out loud. I think there's tremendous value in this area. I need to find the time to work on it, but I won't have any for a few more weeks.

<tim/>

> Like Danny, I found the lack of a post event a bit disconcerting until I was shown an example of stopping the last timer when the next one starts. I wouldn’t call that a hack, though I’m still mildly concerned that something could get in between the real post-event point and the next pre-event. Perhaps I’m a bit too paranoid, though.

The problem I had with tracking it like that was that I was emitting output inline. This made the tasks/plays look like the following, which was horrible to read:

Task 1 started
Started at: X

Task 2 started
Task 1 took: Y
Started at: X

That’s absurd. Heh. So monkey-patching was the solution.
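For reference, monkey-patching a method to get a genuine post-event looks roughly like this. PlayRunner and run_task are hypothetical stand-ins, not Ansible's _run_play or _run_task_internal. Patching private internals like those is what makes the approach fragile, because nothing guarantees they keep the same name or signature between releases.

```python
import functools
import time


class PlayRunner:
    """Hypothetical stand-in for the class whose method gets patched."""

    def run_task(self, name: str) -> str:
        return f"ran {name}"


def patch_with_timing(cls, method_name, report):
    """Replace cls.<method_name> with a wrapper that reports each call's duration."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def timed(self, *args, **kwargs):
        started = time.monotonic()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Runs after the real method returns or raises: a true post-event.
            report(method_name, args, time.monotonic() - started)

    setattr(cls, method_name, timed)
    return original  # keep the original so the patch can be undone


reports = []
patch_with_timing(
    PlayRunner, "run_task",
    lambda method, args, seconds: reports.append((method, args, seconds)),
)
result = PlayRunner().run_task("deploy")
```

Because the start and the end are both seen by the same wrapper, the duration can be printed on one line after the task finishes instead of being interleaved with the next task's output.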

> One outstanding question I still haven’t answered for myself is how to separate regular tasks from notification tasks. (This was more of a general curiosity than something that could affect the timers.) The answer may be there, but it wasn’t immediately apparent to me.

Do you mean a handler? There is a flag on those:
self.callbacks.on_task_start(name, is_handler)
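Assuming that flag reaches a callback plugin as the second argument of its task-start hook, handlers and regular tasks can be separated simply by metric name. In the sketch below, the ansible.task and ansible.handler prefixes are hypothetical. Task names are sanitized because ":" and "|" are delimiters in the statsd line format, and "." is usually treated as a hierarchy separator by graphite-style backends.

```python
import re


def metric_name(task_name: str, is_handler: bool, prefix: str = "ansible") -> str:
    """Map a task name to a statsd-safe metric name, split by the handler flag."""
    kind = "handler" if is_handler else "task"
    # Collapse every run of characters outside a conservative safe set to '_'.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", task_name.strip()).strip("_").lower()
    return f"{prefix}.{kind}.{slug or 'unnamed'}"


print(metric_name("example | gather facts", False))  # ansible.task.example_gather_facts
print(metric_name("restart nginx", True))            # ansible.handler.restart_nginx
```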

> Danny: on playbook run counts, are you just envisioning a single increment per run that you look at in aggregate over an hour or a day? Or were you thinking about an event? We were thinking the latter would be more appropriate than simple counts in the grand scheme of metrics collection (graph the performance of X and show me when a playbook run started and ended). Thing is, statsd doesn’t support events from what I’ve seen. There’s no standard way of publishing events in any of the metric protocols, and you wouldn’t want events grouped and summarized the way statsd is designed to do.

Honestly, I don’t know. I haven’t used statsd, but I believe it’s made for tick-based tracking of events, so I’m not sure how the two would be combined. We’d like to get performance information out of Ansible, but we haven’t discussed how to track it. Right now, just emitting the start time and duration is enough. We have some big tasks, so being able to see when a task started answers the “when did I start this?” question, and the duration makes it easy to see how long things took. That’s met our immediate needs.

I would look at existing plugins that already display task times and
durations (to stdout); they should be easy to repurpose to send that
output to statsd. Also look at the included syslog-json callback
plugin.
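Repurposing a stdout timing plugin in that way might look like the following sketch. It uses the same trick of closing the previous timer when the next task starts, but sends each duration to statsd as a timer over UDP instead of printing it. The hook names follow Ansible 1.x callback plugins (an assumption); the class name, metric prefix, and defaults are hypothetical.

```python
import socket
import time


class StatsdTaskTimer:
    """Sends each task's duration to statsd as an 'ms' timer over UDP."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="ansible.task"):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.current = None  # (metric-safe task name, start time)

    def _emit_current(self):
        # Close the running task's timer, if any, and send it as a timer metric.
        if self.current is None:
            return
        name, started = self.current
        elapsed_ms = (time.monotonic() - started) * 1000.0
        line = f"{self.prefix}.{name}:{elapsed_ms:.3f}|ms"
        self.sock.sendto(line.encode("ascii"), self.addr)
        self.current = None

    def playbook_on_task_start(self, name, is_conditional):
        # The start of this task is the end of the previous one.
        self._emit_current()
        # Keep only ASCII letters and digits so the statsd line stays parseable.
        safe = "".join(c if c.isascii() and c.isalnum() else "_" for c in name)
        self.current = (safe, time.monotonic())

    def playbook_on_stats(self, stats):
        # End of the run: flush the last task and release the socket.
        self._emit_current()
        self.sock.close()
```

Pointed at a real statsd, the daemon then computes per-flush aggregates such as mean and upper bound for each task's timer. That aggregate view is what statsd is good at, as opposed to per-run event markers.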

How difficult would it be to create some post-* events?

You mean like runner_on_ok?
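runner_on_ok and runner_on_failed are called once per host with that host's result, so they can serve as per-host post-events. The sketch below measures the time from task start to each host's result. It assumes the Ansible 1.x signatures playbook_on_task_start(name, is_conditional), runner_on_ok(host, res) and runner_on_failed(host, res, ignore_errors=False). Each measurement goes straight to an emit function (hypothetical; it could wrap a statsd sender) instead of being stored, so the sketch does not rely on later hooks seeing state written by earlier ones.

```python
import time
from typing import Callable, Optional


class PerHostTimer:
    """Per-host task timing, treating runner_on_ok/runner_on_failed as post-events."""

    def __init__(self, emit: Callable[[str, str, str, float], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.emit = emit  # called as emit(task, host, status, seconds)
        self.clock = clock
        self.task: Optional[str] = None
        self.task_started: Optional[float] = None

    def playbook_on_task_start(self, name, is_conditional):
        self.task = name
        self.task_started = self.clock()

    def _report(self, host, status):
        # Elapsed time from the start of the task to this host's result.
        self.emit(self.task, host, status, self.clock() - self.task_started)

    def runner_on_ok(self, host, res):
        self._report(host, "ok")

    def runner_on_failed(self, host, res, ignore_errors=False):
        self._report(host, "failed")


seen = []
ticks = iter([0.0, 1.5, 3.75])  # fake clock for a deterministic example
timer = PerHostTimer(emit=lambda *row: seen.append(row), clock=lambda: next(ticks))
timer.playbook_on_task_start("gather facts", False)
timer.runner_on_ok("api-01.ssl.stg.example.net", {})
timer.runner_on_failed("api-02.ssl.stg.example.net", {})
print(seen)
```

This gives per-host completion times, but there is still no single "task finished on all hosts" hook. The last per-host result for a task only approximates one.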