Async module draft pushed … code review/upgrades please?

Hi all,

The async module is intended to work with any module other than 'copy' or 'template'. I think I've got it mostly working now.

Usage from the CLI for a long-running op, using "-B" for background:

ansible all -B <time_limit> -a "yum update -y"

This returns JSON pretty much immediately:

{ "started" : 1, "ansible_job_id" : 123456 }

The job ID is shared by all of the hosts the job was started on, so it's possible to query job status on all nodes simultaneously. There is no clever auto-polling in the CLI yet, so here's the low-level way to do it:

ansible all -m async_status jid=123456

That will give the job results for all hosts. If the job is finished, the results will contain the actual results of the command; if not, you'll just get "started". There is no real standard for reporting incremental progress.
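For scripting around this, a polling loop is straightforward. Here's a rough Python sketch of the idea; `get_status` and `poll_until_done` are hypothetical names standing in for whatever fetches the per-host async_status result, and I'm assuming a reply still carrying "started" means "not done yet":

```python
import time

def poll_until_done(hosts, get_status, timeout=300, interval=5):
    """Poll each host's job status until all finish or the timeout expires.

    get_status(host) stands in for the real per-host async_status call;
    a reply containing "started" is assumed to mean "not done yet".
    """
    results = {}
    deadline = time.time() + timeout
    while time.time() < deadline and len(results) < len(hosts):
        for host in hosts:
            if host in results:
                continue  # this host already finished
            status = get_status(host)
            if "started" not in status:
                results[host] = status  # actual command results
        if len(results) < len(hosts):
            time.sleep(interval)
    return results
```

A real version would want per-host error handling, but the shape of the loop is the point.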

Currently there is a small limitation: the directory used to log the job info isn't cleaned up unless you do:

ansible all -m async_status jid=123456 mode=cleanup

I suspect the CLI will be working the levers and buttons behind the scenes, so users will not need to know about async_status and async_wrapper at all. Of course, if you are using the API, you would have to know a bit more, but it's about that simple.

How is it implemented?

When requesting to run something asynchronously, ansible pushes down both an 'async_wrapper' module and the actual module you want to run.

It is invoked like this via SSH, which also makes this one of the best ways to test just the modules without the rest of ansible:

(cd library)
async_wrapper <job_id> <time_limit_in_seconds> <path_to_module_script> <arguments_to_module_script>

When debugging or working on modules, I highly recommend running them like this; it's much easier to debug, especially if something is going to spew a traceback.
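Conceptually, the wrapper amounts to the classic double-fork: detach a child to run the module, write its result to a status file named after the job ID, and return "started" right away. Here's a hypothetical Python sketch of that idea; this is not the real async_wrapper code, and `run_async` and the status-file layout are made up for illustration:

```python
import json
import os
import tempfile

def run_async(job_id, func, log_dir=None):
    """Run func() in a detached child; return a "started" status right away.

    The child writes func()'s result to a status file named after the job
    ID, which a later status query can read back.
    """
    log_dir = log_dir or tempfile.gettempdir()
    status_file = os.path.join(log_dir, "job_%s" % job_id)
    pid = os.fork()
    if pid:
        os.waitpid(pid, 0)  # reap the intermediate child -- no zombie
        return {"started": 1, "ansible_job_id": job_id}
    # first child: start a new session, then fork again so the worker is
    # re-parented to init and can't become a zombie of the caller
    os.setsid()
    if os.fork():
        os._exit(0)
    result = func()
    # write-then-rename so a poller never sees a half-written file
    with open(status_file + ".tmp", "w") as f:
        json.dump(result, f)
    os.rename(status_file + ".tmp", status_file)
    os._exit(0)
```

The waitpid on the intermediate child plus the second fork is what keeps the process tree clean after the caller disconnects.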

So when I test this from a checkout, this is what I do:

./library/async_wrapper 123456 300 ./library/command yum update "-y"

Here 123456 is just a made-up job ID (you can use anything), and 300 means let it run for 300 seconds. I haven't tested the time killer so much yet, but it is there.
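For what it's worth, the time killer can be thought of as a forked watchdog. This is just an illustration of the concept, not the actual async_wrapper code; it naively kills only the one pid (not a whole process group) and assumes the pid hasn't exited and been recycled:

```python
import os
import signal
import time

def watchdog(job_pid, time_limit):
    """Fork a helper that kills job_pid after time_limit seconds.

    Naive sketch: it signals only the pid itself, not its process group,
    and assumes the pid hasn't been recycled in the meantime.
    """
    if os.fork():
        return  # parent goes back to whatever it was doing
    time.sleep(time_limit)
    try:
        os.kill(job_pid, signal.SIGKILL)
    except OSError:
        pass  # job already finished on its own
    os._exit(0)
```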

Once better tested, I can teach the command line to have a --poll option that works with the timeout or whatever, but I’m not so worried about that right now.

While this seems to work, can someone with good OS/fork/zombie/orphan process knowledge review async_wrapper to see if I’m doing anything incredibly stupid here? Patches quite welcome. I’m a little rusty in this regard and occasionally commit travesties with fork().

I tested this by kicking off a long-running yum update, manually kill -9'ing the actual yum command, and it looked like no orphans were left over in "ps aux"; things returned immediately and looked OK to me. The way I'm checking for process status, though (reading /proc), is lame and won't work for non-root users. I'd really like to know if there is a better way of doing that. We only need it for the watchdog process.
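On the process-status question: one common alternative to reading /proc is sending signal 0 with os.kill, which does the existence and permission checks without delivering anything; an EPERM error means the process exists but belongs to someone else, so the check works for non-root callers too. A sketch (the `process_alive` name is mine):

```python
import errno
import os

def process_alive(pid):
    """Check whether a pid exists without reading /proc.

    Signal 0 performs the kernel's existence/permission checks but sends
    nothing; EPERM means the process exists but is owned by another user.
    """
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process
        if e.errno == errno.EPERM:
            return True   # exists, but not ours to signal
        raise
    return True
```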

Anyhow, help would be appreciated, though I think this is good for a start!

Next steps:

  • code review from folks here
  • add a --poll option to the /bin/ansible command line that checks status every 5 seconds or so up until the timeout for all the nodes targeted
  • teach playbooks to do something similar with polling, and allow an "async: timeout_in_seconds" flag to be specified on each playbook task.
  • test command line and playbook integration

Thoughts?

--Michael