Windows put_file performance (please test!)

Chris Church and I have been kicking around a major Windows performance fix for awhile now- I finally got around to merging our efforts together and working some of the bugs out. The result is available here: https://github.com/nitzmahone/ansible/tree/winrm_put_file . This should increase Windows module push performance by an order of magnitude, and win_copy performance by several orders of magnitude (as well as removing the restriction on the size of files that can be pushed).

Some quick n’ dirty benchmarks on this code (win_copy):

Mac->Local VM (vagrant/virtualbox), 2k/100k/1M/420M file sizes (times in seconds):
1.9.2: 4.3/15.6/74.5/(fail)
2.0.0 (this branch): 1.5/1.9/2.8/53.3

Mac->AWS-hosted (t2.micro), 2k/100k/1M/420M file sizes (times in seconds):
1.9.2: 11.8/62/553/(fail)
2.0.0 (this branch): 8.3/8.5/13.7/1244

The numbers are even better if you use the pywinrm 0.1.0 RC at https://github.com/diyan/pywinrm/tree/v0.1.x - I made a fix there that significantly improves Basic auth perf. These numbers will also improve even more once we’re using requests in pywinrm, since it supports connection keepalive. I think a big chunk of our slow WAN performance is due to renegotiating the connection on each WinRM operation.

This code still isn’t perfect (error handling and reporting, logging could be improved), but I think it’s a lot stronger than what we’re shipping now. Could I trouble some of the Windows community to try it out and give me some feedback before I shoot for a PR?

Thanks,

-Matt

I tried it out (As I’ve suffered from the windows performance issues), and it bombed out on me. Here’s the stack trace:

PLAY: ***************************************************************************

TASK [setup] ********************************************************************

An exception occurred during task execution. The full traceback is:

Traceback (most recent call last):

File “/opt/ansible-improved-winrm/lib/ansible/executor/process/worker.py”, line 118, in run

executor_result = TaskExecutor(host, task, job_vars, new_connection_info, self._new_stdin, self._loader, shared_loader_obj).run()

File “/opt/ansible-improved-winrm/lib/ansible/executor/task_executor.py”, line 111, in run

res = self._execute()

File “/opt/ansible-improved-winrm/lib/ansible/executor/task_executor.py”, line 218, in _execute

self._handler = self._get_action_handler(connection=self._connection, templar=templar)

File “/opt/ansible-improved-winrm/lib/ansible/executor/task_executor.py”, line 425, in _get_action_handler

shared_loader_obj=self._shared_loader_obj,

File “/opt/ansible-improved-winrm/lib/ansible/plugins/init.py”, line 284, in get

obj = getattr(self._module_cache[path], self.class_name)(*args, **kwargs)

File “/opt/ansible-improved-winrm/lib/ansible/plugins/action/init.py”, line 64, in init

self._shell = shell_loader.get(shell_type)

File “/opt/ansible-improved-winrm/lib/ansible/plugins/init.py”, line 274, in get

path = self.find_plugin(name)

File “/opt/ansible-improved-winrm/lib/ansible/plugins/init.py”, line 246, in find_plugin

if not name.startswith(‘_’):

AttributeError: ‘ShellModule’ object has no attribute ‘startswith’

fatal: [b-svd1-3.25.svd1.maxpointinteractive.com]: FAILED! => {“failed”: true, “stdout”: “”}

NO MORE HOSTS LEFT **************************************************************

PLAY RECAP **********************************************************************

b-svd1-3.25.svd1.maxpointinteractive.com : ok=0 changed=0 unreachable=0 failed=1

I also am having some odd errors when trying to run your fork – but I’m not convinced that isn’t something weird with my environment. Notably my group_vars don’t appear to get loaded.

Super excited for this!

Hmm, that’s pretty far away from where this code change is- looks like you just have a broken 2.0 setup (the connection plugin hasn’t even loaded yet).

I also have been suffering from poor windows performance. Could you provide a little more detail on what is needed to test this? Is it as simple as checking out your fork, running source hacking/env-setup and going to town?

Thanks

Michael

As long as you already have a working 2.0 environment, yes (ie, with the new Python dependencies for 2.0). If not, you’ll need to ensure that you have the latest of all the 2.0 requirements first (eg, sudo pip --upgrade install paramiko PyYAML Jinja2 httplib2 six), then do the usual “install from source” from my branch/repo.

Okay I checked out your fork then ran the playbook

2.0 is a little tighter- the tasks have to be a list, not a dict (ie, put a dash before win_file:)

2.0 is a little tighter- the tasks have to be a list, not a dict (ie, put a dash before win_file:)

Sorry that was a bad copy and paste job. I’ve tried formatting the playbook a few different ways. With the dash results in “ERROR! no action detected in task.” Out of exasperation I copied the example playbook from http://docs.ansible.com/ansible/playbooks_intro.html and tried to run it. It also returns “ERROR! no action detected in task.” Are we sure the parser isn’t broken in this fork? I ran a git pull and there wasn’t anything pull down.

Michael

I hate the answer, “works on my machine”, but, well, it does… :wink: If you’re able to run 2.0 devel without issues but this fork is busted, there’s something to talk about, but otherwise, I’m guessing you’re missing 2.0 dependencies somewhere, you’ve got some old cruft on your path, or you’ve got some other environmental issue…

Here’s my development env pip list output:

(ansiblev2)mdavis-mac:ansiblev2 mdavis$ pip list
alabaster (0.7.6)
ansible (2.0.0)
Babel (2.0)
boto (2.38.0)
cffi (1.1.2)
cryptography (0.9.1)
docutils (0.12)
ecdsa (0.13)
enum34 (1.0.4)
funcsigs (0.4)
httplib2 (0.9.1)
idna (2.0)
ipaddress (1.0.7)
isodate (0.5.1)
Jinja2 (2.8)
kerberos (1.2.2)
magicmock (0.3)
MarkupSafe (0.23)
mock (1.3.0)
MySQL-python (1.2.5)
ndg-httpsclient (0.4.0)
paramiko (1.15.2)
patch (1.14.2)
pbr (1.6.0)
pip (7.1.2)
pyasn1 (0.1.8)
pycparser (2.14)
pycrypto (2.6.1)
Pygments (2.0.2)
pyOpenSSL (0.15.1)
pytz (2015.4)
PyYAML (3.11)
requests (2.7.0)
setuptools (17.0)
six (1.9.0)
snowballstemmer (1.2.0)
Sphinx (1.3.1)
sphinx-rtd-theme (0.1.8)
ssl-opt-out (0.9.1)
units (0.6)
wheel (0.24.0)
xmltodict (0.9.2)

Fair enough, the issue was local to my environment. The modules weren’t available. A few git submodule update… and I’m up and running. The error message wasn’t particularly useful but alas I can run playbooks now! I’m running RHEL 6.5 with python 2.6.6

TASK [Create directory] *********************************************************
An exception occurred during task execution. The full traceback is:
Traceback (most recent call last):
File “/home/mperzel/ansible-test/ansible/lib/ansible/executor/process/worker.py”, line 118, in run
executor_result = TaskExecutor(host, task, job_vars, new_connection_info, self._new_stdin, self._loader, shared_loader_obj).run()
File “/home/mperzel/ansible-test/ansible/lib/ansible/executor/task_executor.py”, line 111, in run
res = self._execute()
File “/home/mperzel/ansible-test/ansible/lib/ansible/executor/task_executor.py”, line 218, in _execute
self._handler = self._get_action_handler(connection=self._connection, templar=templar)
File “/home/mperzel/ansible-test/ansible/lib/ansible/executor/task_executor.py”, line 425, in _get_action_handler
shared_loader_obj=self._shared_loader_obj,
File “/home/mperzel/ansible-test/ansible/lib/ansible/plugins/init.py”, line 284, in get
obj = getattr(self._module_cache[path], self.class_name)(*args, **kwargs)
File “/home/mperzel/ansible-test/ansible/lib/ansible/plugins/action/init.py”, line 64, in init
self.shell = shell_loader.get(shell_type)
File “/home/mperzel/ansible-test/ansible/lib/ansible/plugins/init.py”, line 274, in get
path = self.find_plugin(name)
File “/home/mperzel/ansible-test/ansible/lib/ansible/plugins/init.py”, line 246, in find_plugin
if not name.startswith('
'):
AttributeError: ‘ShellModule’ object has no attribute ‘startswith’

fatal: [mdm-wsrv98.surescripts-dev.qa]: FAILED! => {“failed”: true, “stdout”: “”}

When I get a little more time I’ll try to dig in and see what the issue is.

Trying to test this too, but hitting the same "startswith" error at the moment.

I have cloned, run git submodule --init and updated pip dependencies.

Matt, what Python version are you running?

My test machine is running lubuntu 14.04 with not much in the way of customization.

For info I am just testing by running the make test-winrm integration tests.

Jon

Ok, so I worked out what was going on…

In my hurry to get testing I cloned the correct repo, but forgot to switch to the branch.

For anyone else trying to test this out, here’s what I think I wound up with

mkdir putfile
git clone https://github.com/nitzmahone/ansible.git
cd ansible
git submodule --init --recursive
git checkout winrm_put_file
git submodule update
ansible --version
cd test/integration
export TEST_FLAGS=‘-i /etc/ansible/TEST_INVENTORY -t test_win_copy’
make test_winrm

I experienced a hang when the tests reached the following task:

test_win_copy : copy an empty file

For what its worth, the traceback when I hit ctrl-C was as follows

Process WorkerProcess-5:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
Process WorkerProcess-1:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
Process WorkerProcess-4:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
self.run()
File “/home/jon/putfile/ansible/lib/ansible/executor/process/worker.py”, line 100, in run
self.run()
File “/home/jon/putfile/ansible/lib/ansible/executor/process/worker.py”, line 100, in run
(host, task, basedir, job_vars, play_context, shared_loader_obj) = self._main_q.get()
File “/usr/lib/python2.7/multiprocessing/queues.py”, line 117, in get
(host, task, basedir, job_vars, play_context, shared_loader_obj) = self._main_q.get()

Hope this edge case can be fixed as I re-ran the tests I did when the issue about win_copy size limitation first appear and its really, really quick…

1 mb real 0m3.640s
2 mb real 0m5.864s
3 mb real 0m7.809s
4 mb real 0m6.791s
8 mb real 0m9.341s
16 mb real 0m12.819s

This is really encouraging as, thinking about it, I made the test harder this time as its hitting 2 vms, not 1 as previously. Not retested with the old code but 16 mb wouldn’t even run before.

Happy to test further, just let me know…

Jon

For info spotted another error in the integration tests

for me, it fails during win_file tests at this point

  • name: win copy directory structure over
    win_copy: src=foobar dest={{win_output_dir}}

ran with -vvvvv but it hung. When hit Ctrl-C I got this traceback:

Process WorkerProcess-2:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
self.run()
File “/home/jon/putfile/ansible/lib/ansible/executor/process/worker.py”, line 100, in run
(host, task, basedir, job_vars, play_context, shared_loader_obj) = self._main_q.get()
File “/usr/lib/python2.7/multiprocessing/queues.py”, line 117, in get
Process WorkerProcess-4:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
self.run()
File “/home/jon/putfile/ansible/lib/ansible/executor/process/worker.py”, line 100, in run
(host, task, basedir, job_vars, play_context, shared_loader_obj) = self._main_q.get()
File “/usr/lib/python2.7/multiprocessing/queues.py”, line 117, in get
Process WorkerProcess-3:
Traceback (most recent call last):
File “/usr/lib/python2.7/multiprocessing/process.py”, line 258, in _bootstrap
self.run()
File “/home/jon/putfile/ansible/lib/ansible/executor/process/worker.py”, line 100, in run
(host, task, basedir, job_vars, play_context, shared_loader_obj) = self._main_q.get()
File “/usr/lib/python2.7/multiprocessing/queues.py”, line 117, in get

Also retried with 1 host but got the same result.

Hope this helps.

Jon

I also had missed the step to switch branches. For completeness I also ran source hacking/env-setup after step “git submodule update” in Jon’s directions.

The use case I was most concerned with of creating one directory on one windows hosts has gone from 45 seconds to 22 seconds!

/opt/ansible/ansible-test/win_file> time ansible-playbook main.yml

TASK [Create directory] ********************************************************
changed: [host]

real 0m22.714s

I also ran the winrm tests and task [test_win_copy : copy an empty file] hangs for me as well.

/me hangs head in shame for not running integration tests before uploading…

I had a TODO in the source to deal with the 0-length file case, but hadn’t gotten to it yet. I’ve committed an update that does that- makes all the winrm integration tests pass.

Man, 22s is still really slow for a win_file dir create- are you running this against a Cuisinart over a 300baud modem or something? :wink:

Make sure you’re running with the new pywinrm 0.1.1 that just released last week- that has my perf fix that halves the number of winrm round-trips when you’re using Basic auth.

Just for reference, this branch with pywinrm 0.1.1 does

time ansible win2008r2-vagrant -i hosts -m win_file -a “path=c:/makeme state=directory”

in ~1.7s against a local VM, and ~6s against a t2.micro 2012R2 in AWS.

-Matt

Thanks Matt

I’ve picked up your latest changes and ran through all the winrm integration tests successfully.

I also installed pywinrm 0.1.1

sudo pip install pywinrm --upgrade

Its clearly even faster with new pywinrm.

Total time to run the tests was just over 19 minutes on my test environment - I’ve seen it take over 30 mins on the same kit.

I re-ran my copy tests again and the 16Mb file transferred in 7.668s (to two hosts).

Please create a PR for this when you can.

In fact, if I can only have 1 PR for Christmas, this would be it. :slight_smile:

Many thanks for all your (and Chris Church’s) work on this.

Jon

Lol, my test rig is 2 local RHEL servers. It boggles my mind it takes so long but 22 seconds is a lot better than 45. raw mkdir c:\temp runs in a couple seconds. I am using an old version of pywinrm. I’ll try out your patched version.

Thanks,

Once you’ve done the pywinrm upgrade, if it’s still taking more than about 2s between local servers, can I get you to post output from a run where ANSIBLE_DEBUG=1 is set in the runner’s environment? I’m just curious what the heck it’s doing that’s taking so long (or if we need to add some more instrumentation to figure it out).