Has there been any discussion around session management in Ansible’s WinRM implementation? I don’t know much about the protocol-level workings of Ansible, but as far as I can see:
Each task in a play is treated fairly independently (open session, do WinRM things, close session).
This leads to multiple processes getting spun up and torn down with each task.
Each process spin-up essentially creates a new PowerShell runspace, which I assume is a significant factor in Ansible’s slowness against Windows-based nodes. It also contributes to CPU usage, as setting up a PowerShell runspace is a CPU-intensive task.
As far as I can see, using Ansible loops such as “with_items” will reuse the same “Shell ID”, which leads me to believe that there is at least some session management in place.
I’m researching whether it’s possible to have an “always-running” PS Remoting configuration server-side that Ansible would connect to, but this would take a bunch of configuration, so imho it would be better to make Ansible “smarter” in terms of sessions/session reuse.
Have there been any discussions around how to improve this?
Reusing the same shell_id across tasks/plays is one way to eliminate the need for the open_shell and close_shell requests per task. There would still be a new HTTP connection with authentication for each task, but not the overhead of a new process. That leaves the questions of where to save the shell_id between tasks, how to invalidate the shell_id if the user/host/port/authentication changes, and (most importantly) how to clean up the shell at the end of the play/playbook.
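To make those three questions concrete, here is a rough sketch of a single-slot shell cache. It is not how Ansible does or would do it. `ShellCache`, `make_protocol` and `FakeProtocol` are made-up names for illustration; the only assumption about the real client is that it offers `open_shell()` and `close_shell(shell_id)`, the two requests mentioned above.

```python
class ShellCache:
    """Hypothetical helper: keep one open shell per connection identity."""

    def __init__(self, make_protocol):
        # make_protocol(user, host, port, auth) is an assumed factory that
        # returns an object with open_shell() and close_shell(shell_id).
        self._make_protocol = make_protocol
        self._key = None
        self._protocol = None
        self._shell_id = None

    def get(self, user, host, port, auth):
        """Return (protocol, shell_id), opening a shell only when needed."""
        key = (user, host, port, auth)
        # Invalidate the cached shell if any part of the identity changed.
        if self._shell_id is not None and key != self._key:
            self.close()
        if self._shell_id is None:
            self._protocol = self._make_protocol(*key)
            self._shell_id = self._protocol.open_shell()
            self._key = key
        return self._protocol, self._shell_id

    def close(self):
        """Close the cached shell, if any; meant to run at end of play."""
        if self._shell_id is not None:
            try:
                self._protocol.close_shell(self._shell_id)
            finally:
                self._key = self._protocol = self._shell_id = None


class FakeProtocol:
    """Stand-in for a WinRM client so the sketch runs without a Windows host."""

    opened = 0
    closed = []

    def open_shell(self):
        FakeProtocol.opened += 1
        return "shell-%d" % FakeProtocol.opened

    def close_shell(self, shell_id):
        FakeProtocol.closed.append(shell_id)
```

Each task would call `cache.get(...)` instead of opening its own shell, and something at the end of the play would have to call `cache.close()`. The sketch does not answer where the cache itself lives between tasks, which is the hard part if every task runs in a separate worker process.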
The persistent connection framework used for networking (with ansible-connection) may be another option to explore, since it enables something akin to SSH’s ControlPersist for other connection types.
I guess it’s also worth exploring where the slowness comes from. I’m pretty sure the process spin-up/spin-down is a factor, but Ansible is also pushing files across the wire (XML-based, so that’s probably slow too). Not sure if Matt has done any tracing to try and determine the “call stack”, but I guess that could be done by timing the various pywinrm calls.
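A minimal way to do that kind of measurement, assuming you can wrap the calls yourself, is a small timing context manager. `timed`, `timings` and `summarize` are made-up names, and the labels in the trailing comment are only examples of calls one might wrap.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# label -> list of elapsed wall-clock durations in seconds
timings = defaultdict(list)


@contextmanager
def timed(label):
    """Record how long the wrapped block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


def summarize(data):
    """Return {label: (call_count, total_seconds)} for the recorded data."""
    return {label: (len(values), sum(values)) for label, values in data.items()}


# Example of wrapping a call (protocol is whatever client object you use):
#
#     with timed("open_shell"):
#         shell_id = protocol.open_shell()
```

Comparing the totals for the shell open/close step against the command and file-transfer steps would show whether process spin-up or the wire traffic dominates.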
Persisted connections/shells are definitely something we want to look at. Because 2.6 is going to be a stabilization release from the core team, 2.7 is where we will start looking into it. WinRM has 2 components that make it really slow compared to SSH:
The network latency and number of packets that are required
The time it takes to start up PowerShell.exe on each task
Having persisted connections/shells will definitely help with the first one, but it requires some internal work inside Ansible to do properly. The second is a lot harder, but swapping over to PSRP over WinRM with persisted connections will definitely improve that story. For file transfers there is not much else we can really do. Matt’s pretty much reached the limit of what you can do with that protocol, and to get good file transfer speeds we either have to use SMB or the SSH/SFTP implementation from Microsoft.
You should be able to use the persistent connection capabilities we built for networking here. They allow the connection plugin to persist from task to task and only shut down the session once the play has completed. When you get ready to look at implementing this, we can go over the implementation for connection network_cli and use that as a starting point for how you might achieve the same with WinRM. I know in the networking use case our performance increase was pretty significant, but it does come at a cost: an increase in both running processes and managed TCP connections.