ssh_alt - new version

Hello,

The new version of the ssh_alt branch can be found here:

diff: https://github.com/jeromew/ansible/commit/23da88f45318fcdb8350bd7671c3d3f88bcab7c2

branch: https://github.com/jeromew/ansible/tree/ansible_ssh_alt

Test with "ansible-playbook -c ssh_alt".

this version :

  • has the pipelining feature on ssh_alt
  • works with ansible-playbook -k and -K options
  • has a mechanism in the runner to play nice with action_plugins and cases where the remote tmp is necessary

In case there are OS-level differences: my current setup is Fedora 19 (control machine) working with CentOS 6.4 remotes.

I hope your tests will run ok and that you will get speed boosts like me :)

On my dev setup (Fedora 19 Vagrant against CentOS 6.4 VMware), 100 x command: echo "ping"
ssh: 26 sec
ssh_alt: 9 sec !!

Jerome
ps: I squashed the branch & merged with the HEAD of ansible/devel. I hope it is clearer for reviewers

Just did a quick test running ansible -m setup on 28 hosts:

Regular ssh:

real 0m18.164s
user 0m36.436s
sys 0m1.836s

With ssh_alt:

real 0m8.274s
user 0m7.784s
sys 0m1.864s

SO, yeah!

One error I encountered was with hosts that are currently down.
Normally, I get the typical error "FAILED => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue".
With ssh_alt, this produces a traceback:

FAILED => Traceback (most recent call last):
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 413, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 504, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 704, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/action_plugins/normal.py", line 54, in run
    return self.runner._execute_module(conn, tmp, module_name, module_args, inject=inject, complex_args=complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 379, in _execute_module
    res = self._low_level_exec_command(conn, cmd, tmp, sudoable=sudoable, in_data=in_data)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 815, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable, in_data=in_data)
  File "/home/serge/src/ansible/lib/ansible/runner/connection_plugins/ssh_alt.py", line 233, in exec_command
    stdin.write(in_data)
IOError: [Errno 32] Broken pipe

HTH,

Serge

Awesome!!!

– Michael

Hello,

Glad it makes such a difference on 28 hosts !

The error “SSH encountered an unknown error…” is only thrown in _make_tmp_path. In the current implementation _make_tmp_path serves as an implicit guard to test connectivity since it always is the first ssh connection made.

can you give me the exact command you are launching ? with a server down I get :

ansible all -m command -a "/bin/echo hello" -c ssh

FAILED => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

ansible all -m command -a "/bin/echo hello" -c ssh_alt

ssh_alt:

FAILED >> {
    "failed": true,
    "msg": "",
    "parsed": false
}

Thanks

$ ansible dba-vvo-pr-1 -m setup -c ssh

dba-vvo-pr-1 | FAILED => SSH encountered an unknown error during the
connection. We recommend you re-run the command using -vvvv, which will
enable SSH debugging output to help diagnose the issue

$ ansible dba-vvo-pr-1 -m setup -c ssh_alt

dba-vvo-pr-1 | FAILED => Traceback (most recent call last):
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 413, in _executor
    exec_rc = self._executor_internal(host, new_stdin)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 504, in _executor_internal
    return self._executor_internal_inner(host, self.module_name, self.module_args, inject, port, complex_args=complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 704, in _executor_internal_inner
    result = handler.run(conn, tmp, module_name, module_args, inject, complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/action_plugins/normal.py", line 54, in run
    return self.runner._execute_module(conn, tmp, module_name, module_args, inject=inject, complex_args=complex_args)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 379, in _execute_module
    res = self._low_level_exec_command(conn, cmd, tmp, sudoable=sudoable, in_data=in_data)
  File "/home/serge/src/ansible/lib/ansible/runner/__init__.py", line 815, in _low_level_exec_command
    rc, stdin, stdout, stderr = conn.exec_command(cmd, tmp, sudo_user, sudoable=sudoable, executable=executable, in_data=in_data)
  File "/home/serge/src/ansible/lib/ansible/runner/connection_plugins/ssh_alt.py", line 233, in exec_command
    stdin.write(in_data)
IOError: [Errno 32] Broken pipe

Serge,

The bug is fixed. You can pull, or re-clone the branch if you prefer:
https://github.com/jeromew/ansible/tree/ansible_ssh_alt

Can you confirm it now gives you a meaningful message?

Do not hesitate to tell me if you hit another issue.

Thanks
Jerome

Yes!

$ ansible dba-vvo-pr-1 -m setup -c ssh

dba-vvo-pr-1 | FAILED => SSH encountered an unknown error during the
connection. We recommend you re-run the command using -vvvv, which will
enable SSH debugging output to help diagnose the issue

$ ansible dba-vvo-pr-1 -m setup -c ssh_alt

dba-vvo-pr-1 | FAILED => SSH Error: data could not be sent to the remote
host. Make sure this host can be reached over ssh

Merci,

Serge
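
For readers following along, the change in behaviour above (a readable message instead of an IOError traceback) can be sketched roughly as follows. This is only an illustration, not the actual code in ssh_alt.py: the RemoteWriteError class and the send_in_data and demo_broken_pipe helpers are hypothetical names, and the unreachable host is simulated with a local pipe whose reading end is already closed.

```python
import os


class RemoteWriteError(Exception):
    """Hypothetical error type, used only in this sketch."""


def send_in_data(stdin, in_data):
    """Write in_data to a child's stdin, turning a broken pipe into a readable error."""
    try:
        stdin.write(in_data)
        stdin.flush()
    except IOError:
        # On Python 3, IOError is an alias of OSError, which covers BrokenPipeError.
        raise RemoteWriteError(
            "SSH Error: data could not be sent to the remote host. "
            "Make sure this host can be reached over ssh"
        )


def demo_broken_pipe():
    """Simulate a dead peer: write into a pipe whose reading end is closed."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    stdin = os.fdopen(write_fd, "wb", 0)  # unbuffered, so write() fails immediately
    try:
        send_in_data(stdin, b"module payload")
    except RemoteWriteError as exc:
        return str(exc)
    finally:
        stdin.close()
    return None
```

Calling demo_broken_pipe() returns the friendly message rather than letting the broken-pipe error escape as a traceback.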

FYI, the only reason host count matters here is that you need to set --forks, as the default is crazy low (just 5), which I should increase for people who are not aware.

You should test with 1 host and a playbook that runs ping 20 times because that is how you gauge trip time.
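
A minimal sketch of such a timing comparison, for anyone who wants to script it. The time_command and compare helpers are hypothetical, and the inventory and playbook names in the comment ("hosts", "ping20.yml") are placeholders, not files from this thread; only the -c and --forks options come from the discussion.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return (exit_code, elapsed_wall_clock_seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode, time.perf_counter() - start


def compare(commands):
    """Time each labelled command and return {label: elapsed_seconds}."""
    return {label: time_command(argv)[1] for label, argv in commands.items()}


# Example use (placeholder inventory and playbook names):
#
#   base = ["ansible-playbook", "-i", "hosts", "ping20.yml", "--forks", "1"]
#   print(compare({
#       "ssh": base + ["-c", "ssh"],
#       "ssh_alt": base + ["-c", "ssh_alt"],
#   }))
```

Running each variant several times and comparing the wall-clock numbers should smooth out noise from a single run.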

– Michael

Ok, noted. Could --forks=1 maybe be used as well?
Otherwise yes, my initial tests were running a module 100 times on 1 host.

Did you see any speed-up at all in your tests? (If you had time to test, because I believe you are hard at work on Galaxy.)

Yep, haven't :)

Have you got it working for copy/template type operations (that need to move files) and to know when to not optimize for old style modules?

If so, I’d really like a pull request so we don’t lose track of it. I believe github lets you keep updating the pull request by updating the branch.

If not, and you want to tackle those things, that will greatly improve the likelihood of me testing it and getting it into core's development tree, where exponentially more people can test it (and we can also loop in ansible-project) :)

Thanks!!

–Michael

Just sharing some other quick test results.

Really awesome results!! Thanks for that!

The control machine is an F20 VM (with ansible-1.5-0.git201312062002).

The target host is a RHEL 6.5 VM.

One playbook doing numerous config tuning on the remote host (with sudo: yes, so -K is needed on the command line).
All runs lead to the exact same sequence being executed.
I assume the time to enter the sudo password is similar across the 4 runs.

Test sequence 1: Ansible control node behind a slow VPN connection and target VM in a remote datacenter

run 1.1 with -c ssh_alt gives:

real 8m34.037s
user 0m10.172s
sys 0m10.365s

run 1.2 with nothing (smart mode) gives:

real 26m40.383s
user 0m10.151s
sys 0m10.687s

So a 3X improvement!!

Test sequence 2: Ansible control node on the same internal network as the target VM (but on a different subnet); the target VM is the same as in sequence 1

run 2.1 with -c ssh_alt gives:

real 2m35.765s
user 0m5.695s
sys 0m3.166s

run 2.2 with nothing (smart mode) gives:

real 6m38.139s
user 0m10.738s
sys 0m10.970s

So a 2.5X improvement!!
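
The ratios quoted above can be rechecked from the "real" lines with a small sketch like this one. The real_seconds and speedup helpers are hypothetical names, and the parser only assumes the "XmY.YYYs" format that the time command printed in these runs.

```python
import re


def real_seconds(text):
    """Convert a 'real' value such as '26m40.383s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError("unexpected time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(baseline, candidate):
    """Return how many times faster the candidate run was than the baseline."""
    return real_seconds(baseline) / real_seconds(candidate)


print("sequence 1: %.1fx" % speedup("26m40.383s", "8m34.037s"))
print("sequence 2: %.1fx" % speedup("6m38.139s", "2m35.765s"))
```

This gives roughly 3.1x for sequence 1 and 2.6x for sequence 2, in line with the rounded figures above.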

That is a very slow connection you have there!

Yes, it is definitely intended that your control node lives inside your datacenter.

How many tasks are in that play?

That’s always significant in understanding how much it is spending on each item.

Thanks!

Hello Michael,

yes this version handles specific operations that need to move files: ‘assemble’, ‘copy’, ‘script’, ‘template’, ‘unarchive’
it also handles the case of new versus old style plugins.
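
As a rough illustration of that decision (not the actual runner code), the check could look like the sketch below. The function name needs_remote_tmp and the is_new_style flag are hypothetical, and treating every old-style module as needing the remote tmp path is an assumption drawn from the question above about when not to optimize.

```python
# Modules named in this thread as needing to move files to the remote host.
FILE_TRANSFER_MODULES = frozenset(
    ["assemble", "copy", "script", "template", "unarchive"]
)


def needs_remote_tmp(module_name, is_new_style):
    """Return True when the remote tmp directory is still required.

    Assumption: only new-style modules that do not transfer files can skip it.
    """
    return module_name in FILE_TRANSFER_MODULES or not is_new_style
```

With this sketch, needs_remote_tmp("setup", True) is False, while "copy" or any old-style module still goes through the remote tmp path.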

I use “ssh_alt” everyday and haven’t run into a problem yet so hopefully it should be robust. We’ll see what happens on a wider scale !

Can I ask you to commit a copy of ssh.py as ssh_alt.py in devel?
It seems to me that the pull request will be more valuable this way because ssh_alt will be compared to something; otherwise you will have just one big green file for ssh_alt.py.

But then maybe I am wrong. Tell me what you prefer and I'll send the PR.

Jerome

From grepping the ansible-playbook output:
85 TASKS (some being loops), and in more detail:
177 ok
3 skipping
11 changed

Yes, running the control node in the data center is what we do.

In this case, I was just curious to give ssh_alt a try and share the results.
The tasks are mostly about setting some values in some config files and restarting some daemons.

I should have mentioned an important point:
the improvement observed is mostly linked to overhead outside the real work done by the tasks,
as the playbook had already been run once before the measurements.

All of the four runs got the same final output: ok=87 changed=7 unreachable=0 failed=0

Thanks for the tests.
On > 100 tasks, I think there is a visible effect as soon as the ssh round trip time is greater than 20 ms.

ssh ~ 100 * 20ms * 5
ssh_alt ~ 100 * 20ms * 1 or 2 (depending on how many calls to copy / template there are)
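
Worked out numerically, the estimate above looks like this. The function name is hypothetical, and the figures (100 tasks, 20 ms round trip, 5 round trips per task for ssh versus 1 or 2 for ssh_alt) are the rough ones from the two lines above, not measurements.

```python
def estimated_ssh_overhead_s(tasks, rtt_ms, round_trips_per_task):
    """Estimate the total time spent on ssh round trips, in seconds."""
    return tasks * rtt_ms * round_trips_per_task / 1000.0


ssh = estimated_ssh_overhead_s(100, 20, 5)
ssh_alt_best = estimated_ssh_overhead_s(100, 20, 1)
ssh_alt_worst = estimated_ssh_overhead_s(100, 20, 2)

print("ssh: %.1f s, ssh_alt: %.1f to %.1f s" % (ssh, ssh_alt_best, ssh_alt_worst))
# prints "ssh: 10.0 s, ssh_alt: 2.0 to 4.0 s"
```

So at a 20 ms round trip, 100 tasks would spend about 10 seconds on connection overhead with ssh versus 2 to 4 seconds with ssh_alt.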

Maybe there would be some room for a shared “speed test playbook” measuring different aspects of ansible.

Jerome

Yes, definitely want to get ssh_alt in there for manual testing by a dozen or two people before we decide to replace the main one.

Please send a PR and I’ll put it at the top of my queue to review and get merged in.

Thanks!

–Michael

Thanks, with that being 177 tasks I don't think the numbers were that bad; I would have been worried if there were 1/10th of that though :)

The new improvements will definitely help things!

PR sent under
https://github.com/ansible/ansible/pull/5226

Ok, I had merged this but had to revert as async wasn't working. Can you take a look, so we can merge once async is operational?

Thanks!!!