I use Ansible to coordinate a CI build that runs across several servers simultaneously. Our CI tool is easiest to feed via stdout, so I was using the "human_log.py" callback plugin that's floating around online to log stdout/stderr for OK/changed tasks (a sketch of it follows the excerpt below). However, I noticed very strange interleaving of task results in the output:
150812-06:42:34,549 workflow ERROR:
['Node trim_fslroi failed to run on host YYY.']
150812-06:42:34,559 workflow INFO:
Saving crash info to XXX
150812-06:42:34,559 workflow INFO:
Traceback (most recent call last):
File "XXX/pipeline/plugins/linear.py", line 38, in run
node.run(updatehash=updatehash)
File "XXX/pipeline/engine.py", line 1386, in run
self._run_interface()
File "XXX/pipeline/engine.py", line 2252, in _run_interface
updatehash=updatehash))
File "Xfailed: [YYY] => (item={"rc": 1, "start": "2015-08-12 06:01:52.770879", "warnings": , ...}

[later in stdout]

Running: ZZZ/a_different_command
150812-06:54:16,889 workflow INFO:
Executing node a_different_command in dir: ZZZ/a_different_coXX/pipeline/engine.py", line 2153, in _collate_results
if node.result.outputs:
AttributeError: 'NoneType' object has no attribute 'outputs'
...ignoring
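For reference, the printing in that plugin boils down to something like this (a minimal sketch against the pre-2.0 callback plugin API; the field list and formatting are illustrative, not the exact human_log.py code):

# callback_plugins/human_log.py (condensed sketch)
class CallbackModule(object):
    """Dump interesting fields of each task result to stdout."""

    def _dump(self, host, res):
        for field in ('cmd', 'stdout', 'stderr', 'msg'):
            if res.get(field):
                # A bare print -- nothing stops output from concurrently
                # running forked workers interleaving mid-line.
                print('%s | %s:\n%s' % (host, field, res[field]))

    def runner_on_ok(self, host, res):
        self._dump(host, res)

    def runner_on_failed(self, host, res, ignore_errors=False):
        self._dump(host, res)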
I discovered that Ansible internally uses a display() helper function that takes a lock before printing, whereas the code sample I was using just calls print() naively. OK, makes sense: we're doing an 8+ hour build that contains several 1+ hour tasks producing a LOT of output, so it stands to reason that we'd hit this race condition more often than everyone else using the plugin.
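From my reading of the 1.x code base, that locking amounts to a file lock tied to the runner, roughly like the following (a paraphrase of my understanding, not Ansible's actual code; the output_lockfile attribute name is my recollection, so treat it as an assumption):

import fcntl

def log_flock(runner):
    # display() appears to take an exclusive fcntl lock on a file handle
    # shared across the forked worker processes before writing anything.
    if runner is not None:
        fcntl.lockf(runner.output_lockfile, fcntl.LOCK_EX)

def log_unflock(runner):
    if runner is not None:
        fcntl.lockf(runner.output_lockfile, fcntl.LOCK_UN)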
However, even taking that lock via the current runner (obtained through stack inspection, since the callback plugin API doesn't pass it in) doesn't seem to prevent the interleaving. I haven't had time to debug this further, but AFAICT from reading the code base, there's no support for any more advanced control over printing than that.
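For concreteness, my attempt looked roughly like this (a reconstruction building on the sketch above; the frame-walking heuristic is entirely my own invention and obviously fragile):

import inspect

def _find_runner():
    # The callback API doesn't hand us the runner, so walk the caller
    # frames looking for an object carrying the lock file.
    for frame_info in inspect.stack():
        candidate = frame_info[0].f_locals.get('self')
        if hasattr(candidate, 'output_lockfile'):
            return candidate
    return None

def locked_print(msg):
    # Serialize prints against whatever lock the runner exposes.
    runner = _find_runner()
    log_flock(runner)
    try:
        print(msg)
    finally:
        log_unflock(runner)

Even with every print in the plugin routed through locked_print(), the interleaving shown above still turns up.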
Any thoughts on how to print sanely in this situation, preferably without having to patch Ansible or log each task to a different file?