Performance improvements

Hi all, I’ve spent this week looking into some of the performance issues people have reported since 2.0 was released, and I have a working branch I’d like some help testing:

https://github.com/ansible/ansible/compare/performance_improvements

Notably, there are improvements related to problems reported in #12239 and #16749. The latter is very interesting, as it highlighted an exponential slowdown when dynamic includes are nested recursively (it became very noticeable once you got more than 7 levels deep). As noted in the issue, a run 30 levels deep took about 1.5 hours to finish - this branch does the same in under 5 seconds.
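If you want to compare timings on your own playbooks, a rough sketch like the following may help. It is not part of the branch: the helper names `time_command` and `speedup` are made up for illustration, and the example assumes `ansible-playbook` is on your PATH and that `site.yml` stands in for whatever playbook you want to measure.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=False: a failing playbook run should still be timed, not raise.
    subprocess.run(argv, check=False)
    return time.perf_counter() - start


def speedup(before_seconds, after_seconds):
    """Return how many times faster the 'after' run was than the 'before' run."""
    return before_seconds / after_seconds


# Example usage (hypothetical playbook name, run once per checkout):
#   elapsed = time_command(["ansible-playbook", "site.yml"])
#   print("run took %.1f seconds" % elapsed)
```

Run it once against your current checkout and once against the branch, then pass the two timings to `speedup`. For the figures quoted from #16749, taking 1.5 hours as 5400 seconds and 5 seconds as the upper bound for the branch, `speedup(5400, 5)` gives 1080, so the improvement there is at least three orders of magnitude.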

There are a few known issues with this branch:

  • The algorithm for queueing tasks can starve and deadlock if the number of workers is less than the number of inventory hosts. I’m still looking into that (I’m somewhat surprised that this didn’t happen with the old code, as the changes there were relatively minor).
  • Per-item callbacks are broken.

Thanks, and let me know if you run into bugs while testing this!