Ziploader merged to devel. New features in progress

I merged the first ziploader PR (
https://github.com/ansible/ansible/pull/15246 ) into devel this
morning. So far, everything seems to be working, so that's one
step forward.

There are a few other features that I wanted to get working for
ziploader before our feature freeze on April 18th. Notably, other
valid ways of specifying python imports and "recursive imports" --
allowing module_utils code to import other module_utils code. Today I
pushed a new ziploader branch in my repository that adds both those
features. However, there are some caveats and performance issues that I
have to iron out before they'd be ready to merge.

Here's the code, ready for any testing and suggestions that you'd like
to make: https://github.com/ansible/ansible/compare/devel...abadger:ziploader
So far it has passed all my testing for correctness.

Here are the problems I know about:

* This branch can be slow.
   - I did a highly artificial test that called the ping module on a
     thousand hosts (all aliases of localhost). This was about 40% slower
     than the pre-ziploader baseline I took a few days ago.
   - I did a second highly artificial test that called the ping module
     on a thousand hosts (all aliases of a local vm I have here). This was
     just a tad slower than the pre-ziploader code... close enough that it
     could disappear if I do further testing.
   - I ran a subset of the integration tests that I'd previously
     baselined. These tests all use the local connection plugin (like the
     first ping test). It was 115% slower than the baseline I took a few
     days ago.

I may be able to use controller-side caching to address this: for any
given ansible run, we would only process the module once to assemble all of
the dependencies into the zip file. ansible will still need to add the
module's parameters to the wrapper every time but that's just string
formatting so it shouldn't be as time consuming as scanning the module
and its module_utils deps. The cache would be cleared between every
ansible invocation.
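
To make the caching idea concrete, here is a minimal sketch. It is not the code from the branch: the names `_ZIP_CACHE`, `get_zip`, `render_wrapper`, the `load_sources` callback, and the wrapper template are all hypothetical and chosen for illustration. The assumption is that building the zip is the expensive step and that adding parameters is plain string formatting.

```python
import base64
import io
import json
import zipfile

# Hypothetical in-process cache: module name -> zip bytes. It lives only as
# long as the controller process, so it is empty again on the next invocation.
_ZIP_CACHE = {}

# Hypothetical wrapper template; the real wrapper is not shown in this thread.
WRAPPER = "ZIPDATA = %(zipdata)r\nMODULE_ARGS = %(args)r\n"


def build_zip(sources):
    """Expensive step: pack the module and its module_utils deps into a zip.

    `sources` maps archive names to source text.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname in sorted(sources):
            zf.writestr(arcname, sources[arcname])
    return buf.getvalue()


def get_zip(module_name, load_sources):
    """Return the zip for a module, building it only on the first request."""
    if module_name not in _ZIP_CACHE:
        _ZIP_CACHE[module_name] = build_zip(load_sources(module_name))
    return _ZIP_CACHE[module_name]


def render_wrapper(module_name, load_sources, args):
    """Cheap step: format the cached zip and this task's args into the wrapper."""
    zipdata = base64.b64encode(get_zip(module_name, load_sources)).decode("ascii")
    return WRAPPER % {"zipdata": zipdata, "args": json.dumps(args)}
```

With this shape, a thousand ping tasks would scan and zip the module once and then only repeat the `%` formatting per host.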

Another option could be to not rely so heavily on ast.parse to figure
out what's an import and what isn't. I need to do a bit of testing to
determine whether it's the use of ast.parse or the fact that we're
scanning multiple files that's leading to most of the slowdown before
embarking on this. ast.parse isn't cheap but it makes it trivial to
know whether we have ansible.module_utils imports to deal with. Doing
our own parsing will be much more complex.
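
For readers who haven't used ast.parse this way, the sketch below shows roughly what the scan involves. It is an illustration under my own assumptions, not the branch's implementation: the function names and the `read_source` callback are hypothetical, and it only tracks top-level names under ansible.module_utils.

```python
import ast

PREFIX = "ansible.module_utils"


def find_module_utils_imports(source):
    """Return the set of module_utils names that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import ansible.module_utils.basic"
            for alias in node.names:
                if alias.name.startswith(PREFIX + "."):
                    found.add(alias.name[len(PREFIX) + 1:].split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import; this sketch skips those.
            if node.level != 0 or not node.module:
                continue
            if node.module == PREFIX:
                # e.g. "from ansible.module_utils import urls"
                for alias in node.names:
                    if alias.name != "*":
                        found.add(alias.name)
            elif node.module.startswith(PREFIX + "."):
                # e.g. "from ansible.module_utils.basic import AnsibleModule"
                found.add(node.module[len(PREFIX) + 1:].split(".")[0])
    return found


def collect_recursive(source, read_source):
    """Follow imports through module_utils files until nothing new turns up.

    `read_source(name)` is a hypothetical callback returning the source text
    of module_utils/<name>.py.
    """
    seen = set()
    todo = list(find_module_utils_imports(source))
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        # Every newly found file costs one more ast.parse call.
        todo.extend(find_module_utils_imports(read_source(name)) - seen)
    return seen
```

The recursive version calls ast.parse once per file it reaches, so the sketch alone cannot say whether the parse or the number of files dominates the cost.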

Caveats:

* Currently doesn't handle relative imports ("from .urls import
fetch_url"). ansible.module_utils has to appear in the import
statement. This is something I don't think should be too hard to add
but I'm not deeply bothered about not being able to do it. Relative
imports in python2.4 are bad because the syntax there is ambiguous.
We can always use "from ansible.module_utils[...]"; there's no need to
use the relative import. I want to address performance before I
clutter up the code with handling for this.

* Currently this code only handles python modules, not python
packages. What that means is that it handles *.py files directly
inside of module_utils/. It does not handle a directory in
module_utils that contains an __init__.py and other *.py files. This
deficiency bothers me much more than the lack of relative imports, but
handling it will be even more costly than what we currently have. So
I'm not anxious to add directories that we have to scan until after
I've had a chance to optimize the code to see if recursive python
module files can be made fast enough.
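
To illustrate the module-versus-package distinction in the second caveat, here is a small sketch of the lookup that package support would need. The function name and return shape are hypothetical, and checking the package directory before the plain file is an arbitrary choice made for this sketch.

```python
import os


def resolve_module_utils(module_utils_dir, name):
    """Classify a module_utils name as a package, a plain module, or missing.

    Returns ("package", path_to___init__.py), ("module", path_to_file),
    or None when neither exists.
    """
    init_path = os.path.join(module_utils_dir, name, "__init__.py")
    if os.path.isfile(init_path):
        # A package: every other *.py file in this directory may also need
        # scanning, which is the extra cost described above.
        return ("package", init_path)
    file_path = os.path.join(module_utils_dir, name + ".py")
    if os.path.isfile(file_path):
        # A plain module: the only case the branch handles today.
        return ("module", file_path)
    return None
```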

-Toshio

I realized that my integration performance test failed to account for the fact that one of the tests is installing an OS package. When the OS package manager’s metadata is out of date the test takes much longer to run than when the cache is fresh. Reran my performance tests with the following results:

(pipelining=True and module_compression=ZIP_DEFLATED for these tests)

Branch                             Ping (local)  Ping (net)   Integration
=================================  ============  ===========  ===========
pre-ziploader devel                real 29.95    real 53.65   real 129.82
ziploader devel (current HEAD)     real 30.83    real 54.16   real 131.99
ziploader, nonrecursive ast.parse  real 30.74    real 53.35   real 135.70
ziploader, recursive ast.parse     real 49.29    real 58.21   real 155.48

So the integration tests are only about 15% slower. Ping across ssh to a local vm (1000 host aliases) is under 10% slower. Ping to localhost is the big loser at ~65% slower.

-Toshio

I added controller-side caching of modules and this branch is now as quick
as baseline, so I've made a PR for review and merge:
https://github.com/ansible/ansible/pull/15344

Feature freeze for 2.1.0 is on April 18th (for us, feature freeze coincides
with rc1, so there will likely be an rc2, rc3, etc. with bugfixes), so this
may be the last round of feature additions to ziploader in time for 2.1.0.
The caveats about not understanding python packages (subdirectories of
module_utils) and relative imports still hold true, but imports between the
modules in module_utils are a significant step forward even without those.
We'll likely add more of these features during 2.2 development, at the same
time as we start using ziploader's functionality to make things like python3
porting easier.

-Toshio