I tried running a simple playbook with the yum module run
asynchronously. Some of the time, I get this error:
failed: [hardy1] => {"ansible_job_id": "269083845912", "failed": 1,
"finished": 1, "msg": "Argument file not found"}
I think it's a race condition, where the temporary directory is
removed before the async wrapper is finished. If, in runner.py, I
sleep for 1 second right before "self._delete_remote_files(conn, tmp)"
I do not see the problem. But there must be a better fix - any ideas?
My playbook:
To be clearer: while async_wrapper starts a background process, the async_status module is not actually asynchronous at all. It is just
run repeatedly in a loop.
(Thus it shouldn’t be subject to deletion… UNLESS the tempdir name is calculated and reused between async_wrapper and async_runs,
in which case we should make an effort to avoid that.)
Just wanted to share that I can in fact reproduce this with the attached playbook.
It doesn’t look like the tempdir files are reused (which was my theory). Instead, the async poll runs in the window between async_wrapper writing the initial "JID started" file and the yum command (which finishes very quickly) trying to rewrite that file.
The solution seems to be incorporating some try/wait logic in async_wrapper but I’ll have to dig a little. Should be pretty simple to fix.
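To illustrate what "try/wait logic" could look like, here is a minimal sketch in Python. It is not the actual async_wrapper code: retry_io, write_job_result, read_job_result and the JSON job-file layout are all hypothetical names and assumptions made up for this example. The idea is only that a file operation that hits a missing or half-written file waits briefly and tries again instead of failing on the first attempt.

```python
import json
import time


def retry_io(operation, attempts=5, delay=0.1):
    """Call operation(); on a file or parse error, wait and try again.

    Hypothetical helper, not part of Ansible.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except (IOError, OSError, ValueError) as exc:
            # Missing file or partially written JSON: back off briefly.
            last_error = exc
            time.sleep(delay)
    raise RuntimeError("gave up after %d attempts: %s" % (attempts, last_error))


def write_job_result(path, result):
    """Rewrite the (assumed) JSON job status file with the final result."""
    with open(path, "w") as handle:
        json.dump(result, handle)


def read_job_result(path):
    """Read the (assumed) JSON job status file."""
    with open(path) as handle:
        return json.load(handle)
```

A caller would wrap whichever side loses the race, for example retry_io(lambda: read_job_result(path)). The attempt count and delay are arbitrary here and would need tuning against how quickly the wrapped command can finish.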
(The workaround, for now, is to not use async unless you really need it. This particular task doesn’t benefit from async polling; a command that takes 20 minutes would.)