writing a an ansible module - seems that Popen is spending a long time polling

Hello all,

Perhaps this is a general Python question, if so, I apologize in advance for having had the temerity to ask it here.

One of our team was complaining that the Riak stuff was slow. Now this is always a subjective thing, and I think Ansible is fast, so I hacked the Riak module and added in some cProfile stuff so I could profile the main() block.

( All the links for running this are at https://gist.github.com/binarytemple/5ccbe8f3325b4113c432 )

I then ran it against a 3 node Vagrant cluster running on my laptop (vagrant provision), I wrote the change in such a way that every time the riak module was run, it dumped a file with profiling information to the /tmp directory, then I pulled them all down, concatenated them and took a look a the results.

To cut a long story short, there wasn’t a lot of exciting stuff in there, everything seemed to be fairly quick, apart from what seemed a long time spent polling.

1 module params = {‘target_node’: ‘riak@127.0.0.1’, ‘config_dir’: ‘/etc/riak’, ‘wait_for_service’: ‘kv’, ‘http_conn’: ‘127.0.0.1:8098’, ‘wait_for_ring’: False, ‘wait_for_han doffs’: False, ‘command’: None, ‘validate_certs’: True}
2 4232 function calls (4095 primitive calls) in 10.723 seconds
3
4 Ordered by: cumulative time
5
6 ncalls tottime percall cumtime percall filename:lineno(function)
7 1 0.000 0.000 10.721 10.721 /home/vagrant/.ansible/tmp/ansible-tmp-1413495802.29-66831348941533/riak:115(maind)
8 3 0.000 0.000 10.626 3.542 /home/vagrant/.ansible/tmp/ansible-tmp-1413495802.29-66831348941533/riak:1404(run_command)
9 3 0.000 0.000 10.613 3.538 /usr/lib/python2.7/subprocess.py:723(communicate)
10 3 0.000 0.000 10.613 3.538 /usr/lib/python2.7/subprocess.py:1303(_communicate)
11 3 0.000 0.000 10.612 3.537 /usr/lib/python2.7/subprocess.py:1336(_communicate_with_poll)
12 9 10.612 1.179 10.612 1.179 {built-in method poll}
13 1 0.000 0.000 0.397 0.397 /home/vagrant/.ansible/tmp/ansible-tmp-1413495802.29-66831348941533/riak:107(ring_check)
14 1 0.000 0.000 0.094 0.094 /home/vagrant/.ansible/tmp/ansible-tmp-1413495802.29-66831348941533/riak:1836(fetch_url)

The ring_check command, which runs under a second on the command line, or in a sample python script which uses Popen to execute, seems to be under this ‘built-in method poll’ which takes 10 seconds.

Has Python spending 10 seconds polling a command which takes less than a second to run?

This box is old, Ubuntu 12.04, Python 2.7.

Bryan Hunt

I don’t know the answer to this one.

Possibly there’s a process that was slow in closing a file descriptor (or something similar) in the riak stuff?