timing tasks and plays.

There’s been a bit of a metrics jihad going on at the day job that got me thinking about how I’d implement something in Ansible to time a playbook or each task. In fact we just had an incident where recording some playbook metrics would have been pretty helpful in spotting we had a problem. (It was what Ansible was running, not Ansible BTW.)

So my first thought to achieve that was to use callbacks. Thing is there is only start hooks (i.e. playbook_on_task_start, playbook_on_play_start), but not an end. Is that correct or is there some Ansible magic I’m missing? Admittedly I’ve not been able to keep up with the code base especially the v2 changes so I could be asking a question that’s already been answered.

Thoughts?

there already are a few useful callbacks to do what you want, for
general profiling i use
profile_tasks.py, there are others that log to syslog, it should be
pretty easy to compose exactly what you need.

As for the methods available to callbacks, check in
lib/ansible/callbacks.py for the base class.

Thanks Brian. I’m assuming you mean this plugin: https://github.com/jlafon/ansible-profile/blob/master/callback_plugins/profile_tasks.py.

That little bit of code is a clever way of dealing the lack of an explicit “end” callback point. So I can safely assume that between the end of one task and the start of another virtually nothing will run that may skew those timings? Perhaps this is an edge case, but couldn’t notification handler skew the last task? Something like a graceful restart of a server?

For the record I wasn’t being entirely clear to what meant. I wasn’t thinking about profiling but more about feeding a time series database like Graphite or InfluxDB where we could compare historical trends and monitor what’s been happening with Grafana et al. For instance, in the case I eluded to the playbook was being run via autosys and had been completing successful. What got missed was the steep climb in runtime some tasks were taking on a certain hosts over the course of multiple runs and days. Having metrics on the brain, it occurred to me that we would have caught that before it broke down if we had been collecting those sorts of numbers.

Currently only one task executes at a time, handlers are a task and
won't execute until the previous task is done. This is why starting a
new task indicates the end of the previous one.