On the 0.5 roadmap, playbooks, md5sums, performance, tempdirs, large file copying, strings, sealing wax, and other fancy stuff

Hi all.

Roadmap update for you, since I’ve been getting some questions.

First off – playbooks in 0.4 (“Unchained”) will not change much, so don’t look for major new features there. My desire is to close all bugs in the 0.4 milestone and be done with 0.4 when that happens. The goal, for people who are new to the list, is a mid-May release.

Depending on the size of the incoming features, I may accommodate a few small module tweaks before then. Depending on how fast bugs are squashed, the release may happen earlier, but I’d encourage people interested in doing anything major to wait. This is encouragement to go squash bugs, as I will not be squashing all of them (hint, hint).

In 0.5 (“Amsterdam”), one of the first major blocking things I will be doing is making playbooks more object-oriented, much like I recently did with inventory. This means there will be a class/file named ‘play’, another called ‘task’, one called ‘handler’, and so on. This will also allow asking those objects meaningful questions – like what modules they use, what files they would use (given template input), etc. You will be able to ask a playbook for all of its plays and intelligently do things with them.
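
Roughly, the kind of shape this implies might look like the following sketch – class and method names here are purely illustrative, not the actual 0.5 code:

    class Task(object):
        # a single action in a play; 'module' might be 'copy', 'template', etc.
        def __init__(self, name, module, args):
            self.name = name
            self.module = module
            self.args = args

    class Handler(Task):
        # a task that only runs when notified
        pass

    class Play(object):
        def __init__(self, hosts, tasks, handlers):
            self.hosts = hosts
            self.tasks = tasks
            self.handlers = handlers

        def modules_used(self):
            # "what modules does this play use?"
            return sorted(set(t.module for t in self.tasks + self.handlers))

    class PlayBook(object):
        def __init__(self, plays):
            self.plays = plays   # a playbook can be asked for all of its plays

        def modules_used(self):
            return sorted(set(m for play in self.plays for m in play.modules_used()))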

At that point, it should be possible to request a tempdir from runner and NOT destroy it immediately, but leave it open. It should be possible to gather up all the modules that need to be transferred, tar them (maybe), transfer them, and untar them in one step. It should be possible to delete files only at the very end of the play. All things people have been asking for, and none of it will require any additional configuration.
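
As a rough sketch of the tar-and-transfer idea (assumed helper names only; nothing here is actual Ansible code):

    import os
    import tarfile
    import tempfile

    def bundle_modules(module_paths):
        # tar up all modules that need to go out, so a single transfer
        # (and a single untar on the remote end) covers everything
        tmpdir = tempfile.mkdtemp(prefix='ansible-bundle-')   # kept until end of play
        bundle = os.path.join(tmpdir, 'modules.tar')
        with tarfile.open(bundle, 'w') as tar:
            for path in module_paths:
                tar.add(path, arcname=os.path.basename(path))
        return bundle

    # on the remote side, one command unpacks everything:
    #   tar -xf modules.tar -C <remote tempdir>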

At that point, YAML becomes just the input format for the playbook parser, but it would also be theoretically possible to drive the playbook objects from something else – though I doubt we’ll be doing that. playbooks.py will remain the home of the parser, but it will grow much shorter, and runner.py may acquire some methods only used by playbooks. The code will then be much easier to work on.

I say this because I’ve received a lot of “wouldn’t it be nice if” questions with regard to playbooks, optimizing operations, and to some extent variable consistency (though some of that is in 0.4). Yes, those things would be nice… but hacking them into playbooks/runner now means they will be harder to reimplement cleanly later.

Thus, at this point in 0.4, I wish to stabilize the current release, knock out bugs, and get it out the door. So, the next two weeks or so are about squashing bugs, and that’s pretty much it.

0.5 will focus extensively on streamlining the playbook code – WITHOUT changing the playbook language – and opening the door to better performance and possibly some improved reporting (TBD). I think you’ll like it. This means pretty sweeping code changes will happen early in that release, and then we’ll let people riff on them and see what we can build on top of what they enable.

I would ask that folks withhold requests for “moar faster stuff!!!” and language features until this happens. You’ll like it better if it happens in 0.5. It won’t be long.

–Michael

The only suggestion I would have is in the post I made a few minutes ago. I would like (in 0.6 or whenever) for Ansible to offer a quick-and-dirty HTTPS server on some arbitrary port on the overlord instance, and then use a randomized secret to download files to the target from the overlord, say from the ~packages directory. While the current ‘copy’ action is nice and all, HTTPS is faster than SFTP, and besides, there are many command-line utilities that want/expect/can handle an http(s) URL. Because we can pass secrets via ssh between the overlord and target instances, security would not be significantly affected. We could use Tornado to serve a default directory we could call, say, $ANSIBLE_HOME/packages or (more generally) $ANSIBLE_HOME/downloads.
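
For illustration only, a stripped-down version of that idea might look like the sketch below – the port, the directory, and embedding the secret in the URL path are all assumptions, not anything Ansible does today:

    import os
    import uuid

    import tornado.ioloop
    import tornado.web

    DOWNLOAD_ROOT = os.path.expanduser('~/ansible/downloads')   # assumed location
    SECRET = uuid.uuid4().hex   # randomized per-run secret, handed to targets over ssh

    # only URLs of the form /<secret>/<path> are served from the downloads directory
    application = tornado.web.Application([
        (r'/%s/(.*)' % SECRET, tornado.web.StaticFileHandler, {'path': DOWNLOAD_ROOT}),
    ])

    if __name__ == '__main__':
        application.listen(8443)   # arbitrary port; TLS termination left out of this sketch
        print('serving %s under /%s/' % (DOWNLOAD_ROOT, SECRET))
        tornado.ioloop.IOLoop.current().start()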

I can kind of see something like a ‘wget’ module, maybe. I don’t think it’s likely that Ansible would support any kind of file server; being daemonless (and stripped down) was one of the original design goals.

I agree that http://, Samba, or NFS is faster for large file transfers – though I think we’ll probably leave that up to the reader. Perhaps there’s a local-copy equivalent; I don’t know. We’d have to discuss more what that might entail.

–Michael

By ‘daemon’ you mean ‘a process that forks off and runs in the background’, right? Because that’s not what I had in mind – just a small one-line non-blocking HTTP server that begins listening when ansible begins executing a playbook, stops when ansible stops, and only accepts encrypted connections from ansible clients. That seems very useful to me, and it’s not a solution you could arrive at just by mixing and matching pre-existing transfer tools (which, as you pointed out, typically are daemons and thus create an additional attack surface). You wouldn’t even need to fork at all; green threads are good enough, by GIL’s beard!

That being said: you’re right, it duplicates functionality available elsewhere.

Yeah, I had considered that you might have meant that – start up the file server for a particular file on a unique URL (http[s?]://overlord/<random_uuid>) until all nodes are serviced, and perhaps the copy module gets as brutally implemented (read: optimized) as using the ‘raw’ module to invoke wget. Then it immediately goes down and stops serving after the play.
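
A tiny sketch of that flow, with hypothetical names (the host name, port, and helper functions are assumptions for illustration):

    import uuid

    registry = {}   # uuid -> local path served by the temporary file server

    def register(path):
        # publish one file under a random, unguessable URL for the life of the play
        token = uuid.uuid4().hex
        registry[token] = path
        return 'https://overlord:8443/%s' % token

    def fetch_command(url, dest):
        # the command a raw-style transport would run on the target
        return 'wget -q -O %s %s' % (dest, url)

    print(fetch_command(register('/srv/packages/foo.rpm'), '/tmp/foo.rpm'))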

This would be reasonably clever and somewhat fun to implement. Probably not huge.

I wouldn’t see a problem with it forking – I prefer that, really. As long as all modules were not executed asynchronously for transfer (which we don’t support anyway), it wouldn’t be that bad (just use a high --forks count). I suppose it could be executed async, but we’d need a file reference-count mechanism, a timeout for when to kill the server, and so forth.

The problem is that it’s yet another port that has to be accessible, so not everyone could use it; it would require extra work with tunnels and such, so it probably shouldn’t be the default either.

In that event, I question whether you’d even offer it encrypted at all. In a race between https:// and scp, are they REALLY that different? Probably not. The issue is probably the pure-Python implementation we’re using now. So, for people comfortable with their network, it could be a speedup, and those who are not should still use scp.

This probably reads like an option to the copy module.

I’ll put it down for consideration in 0.5, towards the end of the roadmap for that release, depending on how things go with playbook upgrades. If it looks like it complicates code too much or would be too hard to maintain, we won’t do it.

–Michael

I’ve been looking at this wrong.

One benefit to doing this everywhere, BTW, would be that by not switching paramiko into file-transfer mode under sudo (which requires closing the command connection, but only in sudo mode), connections could be left open for the life of the play and would not have to be restarted.

This could actually be pretty huge, because it’s usable for module transfer.

It seems that, rather than being something specific to the copy module, it is more likely a setting, whether specified in playbooks, in /etc, or somewhere else.

It would have to be optional, though. But I am liking this more now.

–Michael

I could be wrong about where connections are being closed, but I'm
pretty sure I took out the connection closing stuff from the sudo
code, and it still works. :wink: It looks like a new SSH connection is
created for each task, but it seems this is more because each task
makes a new Runner which makes a new Connection, rather than anybody
closing the old one. Perhaps we could re-use Connections.

Another optimization opportunity would be to have Connection just open
an SFTP channel once and then re-use that for multiple files.

Would a file server thread really buy us much? If you have static
files to copy to many nodes, you would get better performance setting
up a dedicated web server. If you need to template files, I'd guess
you're CPU limited by the overlord anyway.

-John

I could be wrong about where connections are being closed, but I’m
pretty sure I took out the connection closing stuff from the sudo
code, and it still works. :wink: It looks like a new SSH connection is
created for each task, but it seems this is more because each task
makes a new Runner which makes a new Connection, rather than anybody
closing the old one. Perhaps we could re-use Connections.

Interesting. As I was saying on IRC, some kind of LRU on the connections (keep the last N
open) may be reasonable, though I’m hard-pressed to find a good value for N.
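
A rough sketch of what that LRU could look like (a hypothetical class, not Ansible code; a good value for N is still an open question):

    from collections import OrderedDict

    import paramiko

    class ConnectionCache(object):
        def __init__(self, max_open=10):            # N = max_open
            self.max_open = max_open
            self.cache = OrderedDict()               # host -> paramiko.SSHClient

        def get(self, host, user):
            if host in self.cache:
                client = self.cache.pop(host)        # re-insert to mark most recently used
                self.cache[host] = client
                return client
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            client.connect(host, username=user)
            self.cache[host] = client
            if len(self.cache) > self.max_open:
                _, oldest = self.cache.popitem(last=False)   # evict least recently used
                oldest.close()
            return client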

Another optimization opportunity would be to have Connection just open
an SFTP channel once and then re-use that for multiple files.

Perhaps. I’d like to wait until 0.5 to play with that, when it will be easier to ask
questions about what files playbooks need, and it may be possible
to copy them all to the staging area first and then let the file module act on them.
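
For reference, the single-channel idea is roughly this with paramiko (the helper name and arguments are made up for the sketch):

    import paramiko

    def put_files(host, user, file_pairs):
        # file_pairs is a list of (local_path, remote_path) tuples
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=user)
        sftp = client.open_sftp()                    # open the SFTP channel once...
        try:
            for local_path, remote_path in file_pairs:
                sftp.put(local_path, remote_path)    # ...and reuse it for every file
        finally:
            sftp.close()
            client.close()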

Would a file server thread really buy us much? If you have static
files to copy to many nodes, you would get better performance setting
up a dedicated web server. If you need to template files, I’d guess
you’re CPU limited by the overlord anyway.

A dedicated server might do better, sure, but this would probably still be faster than paramiko’s native Python
SFTP. But probably only if unsecured.

That really seems like a good candidate for a proof of concept and some benchmarks.
Still, it’s much easier to do after the playbook implementation reorg.

I love this idea.

Also, recursive copying could be set up with this too (wget recurses up to 5 levels by default, and the -l option can change that).

Would be slick.