Background task that runs 'forever'

Is there a way to start a script that runs forever - without passing a phenomenally large number to async?

The problem with async is that after the timeout, it calls os.killpg on the process group with SIGKILL, so no matter how much I nohup, detach, subshell the child task, even if it has a PPID of 1, it’s still in the same process group.
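
To make the process-group point concrete, here is a minimal Python sketch (not taken from Ansible; the helper name `child_pgid` is made up for illustration). It assumes a POSIX system and shows that a plainly spawned child shares its parent's process group, so a group-wide kill reaches it, whereas a child started in its own session (setsid) ends up in a group of its own.

```python
import os
import subprocess
import sys

# A stand-in for the long-running task: a child that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def child_pgid(**popen_kwargs):
    """Start a sleeper, return (child pid, child process group id), then clean up."""
    child = subprocess.Popen(SLEEPER, **popen_kwargs)
    try:
        return child.pid, os.getpgid(child.pid)
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("parent pgid:", os.getpgid(0))
    # Plain child: inherits the parent's process group.
    print("plain child (pid, pgid):", child_pgid())
    # start_new_session=True calls setsid() in the child before exec,
    # which puts it in a new session and a new process group.
    print("setsid child (pid, pgid):", child_pgid(start_new_session=True))
```

Whether starting the task in a new session is acceptable in a given playbook is a separate question; the sketch only illustrates why nohup alone does not take a child out of the group that os.killpg targets.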

I don’t want to have async running forever either, as then there are three long running ansible processes just hanging around making the process table look untidy.

I have tried making the script start the task under nohup and then calling the command module without async, but then for some reason it hangs in run_command.

A fully working example (ok, it only sleeps for 10000 seconds but that will do) is at:
https://gist.github.com/willthames/7260782

It’s worth noting that even after the KeyboardInterrupt, the sleep process is running.

Is there a better way of doing this? Are there arguments against sending a different signal to os.killpg? SIGHUP seems like an obvious one, since I could protect against it with nohup or trap.

Thanks,
Will

I’ve tested that changing the signal to SIGHUP works in the way I’d expect for the async module, and would be willing to submit a pull request if there’s any likelihood it would be accepted. I have no idea why the popen.communicate blocks on the script that calls the nohup background task when not using async.

Will

If you set poll to 0 on an async call, it should carry on…

http://www.ansibleworks.com/docs/playbooks_async.html

Or do you need to do stuff after it completes?

I agree that that is what the docs say, and what would be desirable to happen.

What I’m saying though is that the task gets killed by an os.killpg call when the timeout expires. I’m happy with the killpg cleaning up the async_wrapper module and associated ansible-playbook related processes, but it kills the processes that the async command creates too. Because it sends SIGKILL, I can’t trap it.

I’d be happy to see a working example of any approach where a process continues after the playbook ends and any timeouts expire. My gist is, I think, a very simple example (adding async: 5 and poll: 0 just fails in a different way) that hopefully someone has a simple fix for.

Will

It’s fine to insert an insanely high value to async.

Heat death of the universe is fine if you want to.

Nothing to worry about.

But then the three ansible processes hang around in the background making the process list untidy, not to mention making unnecessary checks every five seconds as to whether the process has timed out yet.

Any reason why the killpg sends SIGKILL rather than SIGHUP?

Will

Those processes will die when the operation dies.

You can also change the poll to any interval you want.

If you don’t wish to poll, use async with poll set to 0 and fire & forget.

Not sure what you mean about the killpg question.

This is for an operation that is supposed to live forever.

The 5 second poll is hardcoded and happens even when poll is set to 0:
https://github.com/ansible/ansible/blob/devel/library/internal/async_wrapper#L179-L189
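
For readers who don't want to open the link, the behaviour being described is roughly a watchdog loop: sleep for a fixed step, count down the time limit, and kill the whole process group when it runs out. The following is a simplified Python sketch of that pattern under my own assumptions, not the actual async_wrapper code; the function name `run_with_watchdog` and its parameters are hypothetical.

```python
import os
import signal
import subprocess
import sys
import time


def run_with_watchdog(argv, time_limit, step=5):
    """Run argv in its own process group; SIGKILL the group after time_limit seconds.

    Returns ("finished", returncode) or ("timed out", returncode).
    """
    # New session => the child leads its own process group (pgid == child.pid).
    child = subprocess.Popen(argv, start_new_session=True)
    remaining = time_limit
    while child.poll() is None:
        if remaining <= 0:
            try:
                # SIGKILL cannot be trapped or ignored, so everything that is
                # still in the child's group dies here.
                os.killpg(child.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group vanished between poll() and killpg()
            child.wait()
            return "timed out", child.returncode
        # The watcher wakes up every `step` seconds even if nobody polls it.
        time.sleep(step)
        remaining -= step
    return "finished", child.returncode


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_watchdog(sleeper, time_limit=2, step=1))
```

The point relevant to this thread is that the loop keeps running for the full time limit regardless of the poll setting, which is why a very large async value leaves the watcher processes around.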

Also hardcoded is the kill signal of the process group which is set to SIGKILL. If it were set to SIGHUP the behaviour would pretty much be identical except that tasks could ignore the SIGHUP signal using nohup or trap (but the rest of the process group would die which is as desired).
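
The difference between the two signals can be shown with a small Python experiment. This is only a sketch assuming a POSIX system; the function name `sighup_group_demo` is invented for the example. Two sleepers share one process group, and the second ignores SIGHUP the way a task started under nohup would. Sending SIGHUP to the group kills the first and spares the second, while SIGKILL takes out whatever is left.

```python
import os
import signal
import subprocess
import sys
import time

SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def sighup_group_demo():
    """Return (leader returncode, survivor alive after SIGHUP?, survivor returncode)."""
    # First child: becomes leader of a fresh process group, default SIGHUP handling.
    leader = subprocess.Popen(SLEEPER, preexec_fn=os.setpgrp)
    pgid = leader.pid

    def join_group_ignoring_sighup():
        # Runs in the child before exec: join the leader's group and ignore
        # SIGHUP. An ignored signal stays ignored across exec, as with nohup.
        os.setpgid(0, pgid)
        signal.signal(signal.SIGHUP, signal.SIG_IGN)

    survivor = subprocess.Popen(SLEEPER, preexec_fn=join_group_ignoring_sighup)

    os.killpg(pgid, signal.SIGHUP)      # polite group-wide kill
    leader_rc = leader.wait()           # dies from SIGHUP
    time.sleep(0.5)                     # give the signal time to be delivered
    survived = survivor.poll() is None  # still running: it ignored SIGHUP

    os.killpg(pgid, signal.SIGKILL)     # cannot be ignored; cleans up the rest
    survivor_rc = survivor.wait()
    return leader_rc, survived, survivor_rc


if __name__ == "__main__":
    print(sighup_group_demo())
```

A negative return code from `Popen.wait()` means the child was terminated by that signal number, so the first and last values show which signal ended each process.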

Will

I think it’s reasonable to just do a straight exec in the fire and forget case but we’ll have to see about implications – there’s no need for the status watcher in that case, but things elsewhere in Runner might need to change.

I don’t believe SIGHUP is the proper fix, the kill is there to kill the beast when it expires.

Please file a ticket and reference this thread.

I’ve tested execution with SIGHUP, and that is sufficient to kill the other processes when it expires. The only processes that would survive are ones that trap it or run under nohup.

I’ll raise the issue though.

Others may wish to see discussion here:

https://github.com/ansible/ansible/issues/4778