Is there a way to start a script that runs forever without passing a phenomenally large number to async?
The problem with async is that when the timeout expires, it calls os.killpg on the process group with SIGKILL. So no matter how much I nohup, detach, or subshell the child task, and even if it has a PPID of 1, it is still in the same process group and gets killed.
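A small Python sketch of the process-group point. nohup only changes how SIGHUP is handled, and being reparented to PID 1 does not change the process group either, so a child started the usual way stays in its parent's group and is reached by os.killpg. Starting the child in a new session (setsid, exposed in Python as start_new_session=True) is what actually moves it into a different group. setsid is not discussed in the thread; it is included only to show what changing the group takes, and the helper names are illustrative.

```python
import os
import subprocess

def spawn(argv, new_session=False):
    """Start argv with stdio detached; optionally in a new session (setsid)."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=new_session,
    )

def same_group_as_us(proc):
    """True if proc is in the same process group as this Python process."""
    return os.getpgid(proc.pid) == os.getpgid(0)

if __name__ == "__main__":
    plain = spawn(["nohup", "sleep", "30"])            # nohup: group unchanged
    escaped = spawn(["sleep", "30"], new_session=True)  # setsid: new group
    try:
        print("nohup child shares our group:", same_group_as_us(plain))
        print("setsid child shares our group:", same_group_as_us(escaped))
    finally:
        for p in (plain, escaped):
            p.kill()
            p.wait()
```

A killpg aimed at this script's group would therefore reach the nohup child but not the one started with start_new_session=True.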
I don’t want to have async running forever either, because then there are three long-running ansible processes just hanging around, making the process table look untidy.
I have tried having the script start the task with nohup and running that script through the command module without async, but then, for some reason, it hangs in run_command.
It’s worth noting that even after the KeyboardInterrupt, the sleep process is running.
Is there a better way of doing this? Are there arguments against sending a different signal to os.killpg (SIGHUP seems an obvious choice, one I could protect against with nohup or trap)?
I’ve tested that changing the signal to SIGHUP makes the async module behave the way I’d expect, and I’d be willing to submit a pull request if there’s any likelihood it would be accepted. I have no idea why popen.communicate blocks on the script that launches the nohup background task when async is not used.
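One common cause of this kind of hang, offered here as an assumption since the thread does not confirm it: communicate() reads the child's stdout and stderr until EOF, and a backgrounded grandchild inherits those pipes, so EOF only arrives when the grandchild exits, even though the script itself has already finished. This sketch assumes run_command captures output through pipes in a similar way; the helper name is made up for illustration.

```python
import subprocess
import time

def seconds_until_communicate_returns(script):
    """Run `script` under sh with stdout/stderr piped (as a wrapper that
    captures output would) and time how long communicate() takes."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", script],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()  # reads until EOF on both pipes, then reaps sh
    return time.monotonic() - start

if __name__ == "__main__":
    # The background sleep inherits the pipes, so EOF waits for it to exit.
    print(seconds_until_communicate_returns("nohup sleep 2 & echo started"))
    # With its output redirected, only sh holds the pipes; EOF comes at once.
    print(seconds_until_communicate_returns(
        "nohup sleep 2 >/dev/null 2>&1 & echo started"))
```

The first call should take roughly as long as the background sleep; the second should return almost immediately, which is why redirecting the background task's output is the usual workaround.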
I agree that that is what the docs say, and what would be desirable to happen.
What I’m saying, though, is that the task gets killed by an os.killpg call when the timeout expires. I’m happy for the killpg to clean up the async_wrapper module and the associated ansible-playbook processes, but it also kills the processes that the async command creates. Because it sends SIGKILL, I can’t trap it.
I’d be happy to see a working example of any approach where a process keeps running after the playbook ends and all timeouts have expired. My gist is, I think, a very simple example (adding async: 5 and poll: 0 just makes it fail in a different way), and hopefully someone has a simple fix for it.
But then the three ansible processes hang around in the background, making the process list untidy, not to mention performing unnecessary checks every five seconds to see whether the process has timed out yet.
Any reason why the killpg sends SIGKILL rather than SIGHUP?
The kill signal sent to the process group is also hardcoded, as SIGKILL. If it were SIGHUP, the behaviour would be essentially identical, except that tasks could ignore the signal using nohup or trap (while the rest of the process group would still die, as desired).
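To make the proposal concrete, here is a minimal Python sketch of a timeout watcher in the style described in this thread: it polls the child every `step` seconds (five by default, matching the checks mentioned above) and signals the whole process group once the timeout passes, with the signal as a parameter instead of a hardcoded SIGKILL. The function watch and its parameters are hypothetical; this is not async_wrapper's actual code, and unlike the real wrapper, the watcher here stays outside the child's process group so it can report the result.

```python
import os
import signal
import subprocess
import time

def watch(argv, timeout, step=5, kill_signal=signal.SIGKILL):
    """Hypothetical watcher: run argv in its own process group, poll it every
    `step` seconds, and send `kill_signal` to the whole group once `timeout`
    seconds have elapsed. Returns the child's return code (negative if it
    was killed by a signal)."""
    child = subprocess.Popen(argv, start_new_session=True)  # child leads a new group
    deadline = time.monotonic() + timeout
    while child.poll() is None:
        if time.monotonic() >= deadline:
            os.killpg(child.pid, kill_signal)  # reaches every process in the group
            return child.wait()
        time.sleep(step)
    return child.returncode
```

With kill_signal=signal.SIGHUP, anything in the group that ignores SIGHUP (via nohup or `trap '' HUP`) would outlive the timeout, while everything else would still be terminated by the default action.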
I think it’s reasonable to just do a straight exec in the fire-and-forget case, but we’ll have to look at the implications. There’s no need for the status watcher in that case, but things elsewhere in Runner might need to change.
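A rough sketch of what a fire-and-forget launch with a straight exec could look like, assuming a classic double fork: the launcher returns as soon as the intermediate child exits, and the grandchild starts a new session (so a killpg aimed at the launcher's group does not reach it) before replacing itself with the command via os.execvp, leaving nothing behind to poll. fire_and_forget is a hypothetical name, this is not how Runner or async_wrapper are implemented, and the redirection of stdio to /dev/null is an added assumption so the command does not hold any caller's pipes open.

```python
import os

def fire_and_forget(argv):
    """Hypothetical launcher: start argv detached and return immediately,
    leaving no watcher process behind."""
    pid = os.fork()
    if pid == 0:                                  # intermediate child
        try:
            os.setsid()                           # new session and process group
            if os.fork() == 0:                    # grandchild: becomes the command
                try:
                    devnull = os.open(os.devnull, os.O_RDWR)
                    for fd in (0, 1, 2):          # don't hold the caller's pipes open
                        os.dup2(devnull, fd)
                    os.execvp(argv[0], argv)      # straight exec, no status watcher
                finally:
                    os._exit(127)                 # only reached if exec failed
        finally:
            os._exit(0)                           # intermediate child exits at once
    os.waitpid(pid, 0)                            # reap it; grandchild is reparented
```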
I don’t believe SIGHUP is the proper fix; the kill is there to kill the beast when it expires.
I’ve tested execution with SIGHUP, and it is sufficient to kill the other processes when the timeout expires. The only processes that would survive are ones that trap it or run under nohup.
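A small, self-contained Python reproduction of that test, under stated assumptions: a sleep process stands in for the async_wrapper group leader, and the "nohup-style" member ignores SIGHUP by setting SIG_IGN before exec, which is what nohup does and which survives exec. os.killpg with SIGHUP then terminates the leader and the ordinary member, while the SIGHUP-ignoring member keeps running; a subsequent SIGKILL to the same group takes it down too, because SIGKILL cannot be caught or ignored. Function names are illustrative.

```python
import os
import signal
import subprocess

def start_leader(argv):
    """Start argv as leader of a new process group in our session
    (a stand-in for the async_wrapper group in this sketch)."""
    def setup():
        os.setpgid(0, 0)                              # pgid == own pid
        signal.signal(signal.SIGHUP, signal.SIG_DFL)  # default action: terminate
    return subprocess.Popen(argv, preexec_fn=setup)

def start_member(argv, pgid, ignore_hup):
    """Start argv inside an existing process group, optionally ignoring
    SIGHUP the way nohup does (SIG_IGN is preserved across exec)."""
    def setup():
        os.setpgid(0, pgid)
        signal.signal(signal.SIGHUP,
                      signal.SIG_IGN if ignore_hup else signal.SIG_DFL)
    return subprocess.Popen(argv, preexec_fn=setup)

if __name__ == "__main__":
    leader = start_leader(["sleep", "30"])
    plain = start_member(["sleep", "30"], leader.pid, ignore_hup=False)
    hupsafe = start_member(["sleep", "30"], leader.pid, ignore_hup=True)

    os.killpg(leader.pid, signal.SIGHUP)   # the proposed signal
    print("leader:", leader.wait(), "plain:", plain.wait())
    print("nohup-style member still running:", hupsafe.poll() is None)

    os.killpg(leader.pid, signal.SIGKILL)  # the current signal; cannot be ignored
    print("nohup-style member after SIGKILL:", hupsafe.wait())
```

Return codes are negative signal numbers (-SIGHUP for the first two processes, -SIGKILL for the last), which is how subprocess reports a child killed by a signal.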