Hi.
I investigated the locking in v2, and thought I would write down my
findings here for later discussion.
First, the motivation: if you run ansible against multiple hosts, and
some of them need host key verification (i.e. ssh issues a prompt and
waits for you to enter "yes"), then:
1. The prompts for affected hosts should be displayed one by one
2. Output from hosts that don't need prompting shouldn't be shown
while any prompt is active.
#1 is a matter of inter-process locking in ssh.py, and is the subject of
this mail.
#2 is a matter of locking in display.py. @bcoca said on IRC that this is
complicated, because the locking in display doesn't interact well with
serialisation. So it's a necessary piece, but I haven't looked into it
yet. (Note that display.py currently only locks debug output.)
There's a commented-out lock_host_keys() method in ssh.py with the
following comment:
# lock around the initial SSH connectivity so the user prompt about
# whether to add the host to known hosts is not intermingled with
# multiprocess output.
The way it is supposed to work is as follows (note that I'm describing
the intention here; the actual implementation has various problems[1]):
1. We acquire the lock before exec()ing ssh, iff host key checking
is enabled and the host key is not already known.
2. We release the lock after the connection is done.
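That intent could be sketched roughly as follows. This is a minimal illustration using an fcntl advisory lock, not the actual (commented-out) lock_host_keys() code; host_key_lock and its parameters are names I made up for the sketch:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def host_key_lock(lockfile_fd, checking_enabled, key_is_known):
    """Hold an exclusive advisory lock around the ssh connection,
    iff host key checking is on and the key isn't already known."""
    need_lock = checking_enabled and not key_is_known
    if need_lock:
        fcntl.lockf(lockfile_fd, fcntl.LOCK_EX)  # blocks until free
    try:
        yield
    finally:
        if need_lock:
            fcntl.lockf(lockfile_fd, fcntl.LOCK_UN)


# Example: workers sharing one lockfile fd serialise their prompts.
fd, path = tempfile.mkstemp(prefix="hostkey-lock.")
with host_key_lock(fd, checking_enabled=True, key_is_known=False):
    pass  # exec ssh here; any prompt is shown without interleaving
```

The weakness, as described above, is the "key is not already known" check: it means scanning known_hosts before every connection, and getting the unlock condition wrong if ssh itself modifies the file meanwhile.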
Here's what I propose instead, after a brief discussion on IRC with
@jimi-c:
1. If host key checking is enabled, acquire the lock. (Without
checking if the key is known.)
2. Add an "echo BECOME-SUCCESS"-style command (CONNECT-SUCCESS?).
3. As soon as we detect this string in the output, we can unlock the
connection, because we know that we're past host key verification
already.
This means we would (a) hold the lock for a shorter period, not the
entire connection, and (b) not have to scan/rescan known_hosts at all.
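The early-unlock scan could look something like this. It's a sketch of the idea only: the marker name and function are illustrative, not code from the tree, and the real version would operate on the ssh process's output pipe rather than a list of chunks:

```python
# Run ssh with a trailing "echo CONNECT-SUCCESS"; as soon as the
# marker shows up in the output, host key verification is behind us
# and the connection lock can be dropped.
MARKER = b"CONNECT-SUCCESS"


def scan_and_unlock(chunks, unlock):
    """Pass output chunks through unchanged; call unlock() exactly
    once, as soon as the marker is seen in the accumulated output."""
    seen = False
    buf = b""
    out = []
    for chunk in chunks:
        buf += chunk
        if not seen and MARKER in buf:
            unlock()  # past host key verification; prompts are done
            seen = True
        out.append(chunk)
    return b"".join(out)
```

Note the accumulation into buf: the marker can be split across two reads, so checking each chunk in isolation would miss it.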
(I would have liked to be able to detect that we've connected without
having to add another magic string, but since we don't run ssh with -v
all the time, I couldn't think of a way to do it. Suggestions welcome.)
https://github.com/amenonsen/ansible/tree/connection-locking has a
prototype patch to reintroduce the connection lock. It's a temporary
file opened pre-fork in the TaskQueueManager, whose fd is passed in to
the PlayContext, serialised, and made available to workers. The basic
bits of the patch are simple, the only real question is about where to
call self.lock_connection()/self.unlock_connection() in ssh.py[2]. I
haven't implemented the magic-string detection yet.
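The fd-passing arrangement in the prototype can be sketched as below. The function names are illustrative, not the patch's actual API; the point is that only an integer fd needs to survive serialisation, because forked workers inherit the parent's descriptor table, and fcntl locks taken on that fd are per-process, so the workers genuinely contend with each other:

```python
import fcntl
import os
import tempfile


def make_connection_lockfd():
    """Opened pre-fork (in the TaskQueueManager, in the prototype).
    Unlinking the path makes the lockfile anonymous: it vanishes
    when the last fd is closed, so there's nothing to clean up."""
    fd, path = tempfile.mkstemp(prefix="ansible-connection-lock.")
    os.unlink(path)
    return fd


def lock_connection(fd):
    fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until no worker holds it


def unlock_connection(fd):
    fcntl.lockf(fd, fcntl.LOCK_UN)
```

Passing the fd as a plain int through the serialised PlayContext works precisely because fork() preserves open descriptors; it would not survive a spawn-style worker model.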
-- Abhijit
1. The biggest problem in lock_host_keys() as written is that it calls
self.not_in_host_file() (a) twice, and (b) in such a way that, if ssh
adds the host key to the known_hosts file during the connection, as
it's expected to, the lockfile is never unlocked. But let's ignore that.
2. ssh.py also locks the creation of the ControlPath directory. This is
disabled in devel for want of a prepare_writable_dir function. If the
function were added, the connection-locking code would just work.
Also, paramiko_ssh.py is simpler, because host key verification is a
callback. The connection-locking tree already works there.