When running the win_shell module in our playbooks with async or become, we are sometimes left with an Ansible exec_wrapper process and a child conhost process that never return and keep consuming resources on the node. Since we run Ansible through Zuul for continuous integration, the workers quickly fill up with hanging wrappers, leaving them at 100% CPU utilization.
Has anyone seen this, and do you have any hints on what to do?
Ansible 2.9
The decoded commandline of the hanging process:
I think the more important question here is what your win_shell commands are actually running. If a command spawns another process, that process could be taking control of the conhost used by WinRM, which keeps the Ansible wrapper process running in the background. Otherwise, if you are just running long-running win_shell tasks as async fire-and-forget jobs, they will eventually build up and take over all resources.
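To make the async distinction concrete, here is a minimal sketch of a fire-and-forget task followed by an explicit check on it. The `windows` host group, the script path, and the timing values are all hypothetical; this is not taken from the original playbooks.

```yaml
- hosts: windows                 # hypothetical inventory group
  gather_facts: no
  tasks:
    # poll: 0 starts the job and moves on immediately (fire and forget).
    # async: 600 is the maximum runtime, in seconds, allowed for the job.
    - name: Start a long-running task without waiting for it
      win_shell: C:\ci\long_task.ps1     # hypothetical script path
      async: 600
      poll: 0
      register: long_task

    # Without a follow-up like this, nothing ever confirms that the job ended.
    - name: Wait for the background job to finish
      async_status:
        jid: "{{ long_task.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 10
```

If tasks are started with `poll: 0` and never checked, it is worth confirming on the node that each one actually exits.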
We have seen no pattern in the content of the win_shell tasks, and when we ran async, the tasks all had a timeout. Since we suspected a problem with async, we have worked around the need for async tasks with a timeout, but this still happens for tasks with become. Exploring the process tree shows just two powershell processes, both with a command line containing the wrapper (one with a powershell call before it), and the conhost, as descendants of each other. All other processes that win_shell started have finished: the scripts have deployed logs etc. as they were supposed to, and no cmd, powershell or python processes remain (which is what win_shell starts).
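One way to capture that process tree from a playbook is sketched below. It assumes PowerShell 3.0 or later on the node (for `Get-CimInstance`) and a hypothetical `windows` host group. The output will also include the processes running this task itself, so compare creation times and parent IDs.

```yaml
- hosts: windows                 # hypothetical inventory group
  gather_facts: no
  tasks:
    - name: List powershell and conhost processes with their parent IDs
      win_shell: |
        Get-CimInstance -ClassName Win32_Process -Filter "Name = 'powershell.exe' OR Name = 'conhost.exe'" |
          Select-Object ProcessId, ParentProcessId, CreationDate, CommandLine |
          Format-List
      register: proc_list

    - name: Show the captured process list
      debug:
        var: proc_list.stdout_lines
```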
It’s really hard to tell, unfortunately. This is the first time I’ve seen this problem, or at least the first time someone has reported it. I would recommend narrowing it down to the simplest reproducer and removing as many moving parts as possible. This usually means:
Try a simple win_shell example such as '- win_shell: echo "hi"' and see if the problem persists.
Use win_command to fully control what is being run; you can try both powershell and cmd and see if the problem is still there.
Determine whether it happens every time or only intermittently.
When using become, try different user accounts. Does this happen in all cases or only for certain accounts?
Gather any common information about the affected Windows hosts: OS version, PS version, etc.
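The steps above could be combined into a small reproducer playbook along these lines. This is only a sketch: the `windows` group, the `ci_user` account, and the `vault_ci_password` variable are hypothetical placeholders, and the become settings assume the runas method.

```yaml
- hosts: windows                         # hypothetical inventory group
  gather_facts: yes
  vars:
    ansible_become_method: runas
    ansible_become_user: ci_user         # hypothetical account; swap to test others
    ansible_become_password: "{{ vault_ci_password }}"   # hypothetical vaulted variable
  tasks:
    - name: Record OS and PowerShell version from gathered facts
      debug:
        msg: "{{ ansible_distribution }} {{ ansible_distribution_version }}, PowerShell {{ ansible_powershell_version }}"

    - name: Simplest win_shell case without become
      win_shell: echo "hi"

    - name: Same command with become
      win_shell: echo "hi"
      become: yes

    - name: win_command through PowerShell
      win_command: powershell.exe -NoProfile -Command "Write-Output hi"
      become: yes

    - name: win_command through cmd
      win_command: cmd.exe /c echo hi
      become: yes
```

Running this repeatedly and checking the node for leftover wrapper processes after each run should show whether the hang depends on the module, the shell, or the become account.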
You could also try the psrp connection plugin, which doesn’t spawn a conhost with the connection itself, only when one is needed. One will still be spawned when you use win_shell, but it should be tied to that task rather than to the underlying exec wrapper.
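Switching to psrp is mostly an inventory change. The fragment below is a sketch with a hypothetical host name, account, and vaulted password; the authentication, protocol, and port values are assumptions to adapt to your environment. The plugin needs the pypsrp Python library installed on the Ansible controller.

```yaml
all:
  children:
    windows:
      hosts:
        win-worker-01.example.com:           # hypothetical host
      vars:
        ansible_connection: psrp
        ansible_port: 5986                   # assumes an HTTPS WinRM listener
        ansible_psrp_protocol: https
        ansible_psrp_auth: ntlm              # assumption; use whatever auth you use with winrm
        ansible_psrp_cert_validation: ignore # test environments only
        ansible_user: ci_user                # hypothetical account
        ansible_password: "{{ vault_ci_password }}"   # hypothetical vaulted variable
```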