I’ve just merged the latest feature for 1.3: Accelerated Mode.
Accelerated mode is essentially fireball mode, with a few improvements:
- No bootstrapping required.
- Support for running commands via sudo.
- Fewer requirements! 0mq is no longer required; the only dependency is python-keyczar.
To use accelerated mode, all you have to do is add “accelerate: true” to your play:
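For example, a minimal play could look like this (the host group and task are illustrative, not from the announcement):

- hosts: webservers
  accelerate: true
  tasks:
    - name: "Check connectivity over the accelerated connection"
      ping: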
Shouldn’t be a warning, should be an error.
Yes, local mode should ignore the accelerate keyword, in my opinion. Can you open a github issue for this? Thanks!
No local mode involved. I just mean that both ansible-playbook (running
locally) and the remote node running the accelerate module need
python-keyczar, but the warning is only raised by ansible-playbook locally,
whilst the remote module doesn't report it.
Ahh gotcha. I saw the issue was opened, thanks!
Very interesting. Raises three questions for me:
- What is the scope of such a temporary daemon? Is it limited to a play within a playbook? To a playbook across all included plays? To a session of ansible-playbook? Or does it live on the workstation for the 30 minutes mentioned in the documentation, even across several instances of ansible-playbook?
- What about the security implications? Anything we should be careful with? E.g. the temporary daemon has to store the keys somewhere, and I wonder if this opens any holes. I assume the authentication for the initial daemon on each host still goes through the original SSH authentication process using either username/password or certificates, right?
- Is there a fallback, so that if the requirements for accelerated mode are not met, hosts always fall back to the ssh connection type?
Just pushed up a fix for this, if you’d like to give it a try.
Thanks a lot for the detailed answers, this is extremely helpful.
I’ve just tried my common role with gather_facts on 3 hosts and 12 tasks (all without changes) and compared “connection: ssh” versus “accelerate: true”: a time saving of only 5 seconds, down from 33s to 28s. This is less than expected, but still 15%. Is there anything I could do to debug this further?
The time savings generally grow as the number of tasks and the number of hosts grow. Your example is a small playbook on 3 hosts, so the fact that you’re seeing a 15% improvement is actually pretty good. My usual test playbook (run on a couple of different public clouds) was a single task with 10 with_items, followed by 40 individual tasks (all essentially command: echo “something”). All together, that means 50 tasks were run on each target, and I was typically testing with 5-10 target nodes with an equal number of forks.
Another user ran a simpler playbook (create 5 directories, sleep 10 seconds, delete the 5 directories) and was seeing a 3x increase in speed (18 seconds vs 46 when using SSH). That in particular was a very nice result, as it was 10 operations in 8 seconds (after factoring out the sleep time).
These are all compared to SSH using ControlPersist, which is itself 2-3x faster than paramiko or SSH without CP, thus the note that it can be up to 6x faster than straight SSH management connections.
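For anyone wanting to reproduce the baseline: ControlPersist is standard OpenSSH client configuration, nothing Ansible-specific. A typical ~/.ssh/config snippet looks like this (the socket path and timeout are just examples):

Host *
    ControlMaster auto
    ControlPath ~/.ssh/cp-%r@%h:%p
    ControlPersist 60s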
OK, I’ll do another benchmark when I’ve finished rewriting my playbook and roles.
However, one more question:
Can you see a way to pass a parameter to a playbook to control whether acceleration should be used or not? My playbook has this:
- name: "Gather Facts"
  hosts: all
  connection: ssh
  accelerate: "{{ accelerate }}"
  gather_facts: true
  sudo: yes
and I’m calling this with: ansible-playbook -i HOSTS PLAYBOOK --extra-vars "accelerate=false" --ask-sudo-pass
It still tries to use acceleration mode. Any idea?
That I did not try. I just pushed a commit up to fix this (the code was simply checking for the presence of the accelerate: keyword in the YAML and not checking the actual value assigned to the key).
If you re-pull via git, this should be working as you expect.
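For anyone curious, the shape of that bug is easy to sketch in plain Python (the boolean helper below is illustrative, not Ansible’s actual code): extra-vars arrive as strings, so after templating the play carries the string "false", and a presence check alone treats that as enabled.

```python
def boolean(value):
    # Hypothetical truthiness helper: treats common "yes" spellings as
    # True and everything else (including the string "false") as False.
    return str(value).strip().lower() in ("yes", "on", "1", "true")

# What the play dict looks like after --extra-vars "accelerate=false"
# has been templated into it: the value is a *string*, not a bool.
play = {"accelerate": "false"}

buggy = "accelerate" in play             # key exists, so accelerate ran anyway
fixed = boolean(play.get("accelerate"))  # the actual value is consulted
```

With the fix, only a value that parses as true turns accelerated mode on.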
Found another problem with the accelerated mode: timeouts.
The task where this happens:
- name: "MySQL | Install required packages"
  apt: pkg={{ item }} state=installed
  with_items:
    - mysql-server
    - mysql-client
When I run this in accelerated mode and my host is really (!!!) slow, meaning that this takes significantly longer than 10 seconds, then the playbook aborts with the following error message:
fatal: [vm1] => timed out while waiting to receive data
Using -vvv to get more detailed debug data, I get this output:
TASK: [MySQL | Install required packages] *************************************
EXEC COMMAND /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=kkyyizrfwfsovplsimpsfxvpmqojahdj] password: " -u root /bin/sh -c '"'"'mkdir -p $HOME/.ansible/tmp/ansible-1378283391.67-257013052215365 && chmod a+rx $HOME/.ansible/tmp/ansible-1378283391.67-257013052215365 && echo $HOME/.ansible/tmp/ansible-1378283391.67-257013052215365'"'"''
REMOTE_MODULE apt pkg=mysql-server,mysql-client state=installed
PUT /tmp/tmpHWCen0 TO /root/.ansible/tmp/ansible-1378283391.67-257013052215365/apt
PUT file is 48995 bytes
EXEC COMMAND /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hicqkdxwgfyrntmdqhvjrgzhiopvfgwe] password: " -u root /bin/sh -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-1378283391.67-257013052215365/apt; rm -rf /root/.ansible/tmp/ansible-1378283391.67-257013052215365/ >/dev/null 2>&1'"'"''
fatal: [vm1] => timed out while waiting to receive data
When I run the same playbook without acceleration then it takes really long but it doesn’t fail.
This is a known issue I’m working on correcting. Right now, I think we’re just going to up the socket receive timeout to 5 minutes rather than the 30 seconds it is now. I’ll be working on adding a keepalive mechanism that will still timeout connection failures quickly while allowing for more long-running connections, though we may not be able to get that in before Friday’s release so it would be in 1.4.
A timeout of 5 minutes for an unreachable host is definitely not going to be acceptable, as it will hold up tasks on other hosts; let’s see what we can get in.
Ok, I misread this, this is about long running tasks… this still may be something of a problem, but seems ok. I’m not sure 5 minutes is the magic number, but use of async for long running tasks is usually a good idea anyway.
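For reference, the apt task from earlier could be marked async along these lines (the timeout and poll values are illustrative): ansible-playbook fires the task and then polls for completion instead of holding a blocking connection open for the whole install.

- name: "MySQL | Install required packages"
  apt: pkg={{ item }} state=installed
  with_items:
    - mysql-server
    - mysql-client
  async: 600
  poll: 10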
We should make extra sure we’re not going to break any other kinds of scenarios though.
When I run ansible-playbook in accelerate mode and for some reason an Ansible task fails, the next run will not connect, with the following error:
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
fatal: [xxx.xxx.xxx.xxx] => Failed to connect to xxx.xxx.xxx.xxx:5099
TASK: [role | task] ***********************************************
FATAL: no hosts matched or all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @//task.retry
xxx.xxx.xxx.xxx : ok=0 changed=0 unreachable=1 failed=0
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
fatal: [xxx.xxx.xxx.xxx] => Failed to connect to xxx.xxx.xxx.xxx:5099
TASK: [role | task] *********************************************
FATAL: no hosts matched or all hosts have already failed -- aborting
and so on…
Then on the target host there is an Ansible process still running. I run
kill -9 $(ps -aux | grep accelerate | awk '{print $2}')
to kill this process, and the next run succeeds.