Slow ansible-playbook startup with vaults (and a possible solution)

Hello,

We recently started using a vault and discovered that it slowed down ansible startup significantly. A bit of digging made it obvious why: perhaps because of how we were using it, the vault was opened and decrypted once for every one of the ~110 YAML files ansible read on launch, adding about 0.5 s each time, for a total startup time of over a minute. As a kind of band-aid/sledgehammer solution, we modified ansible’s utils slightly to cache parsed YAML data and return a deepcopy of the cached data every time ansible wanted to access the same file again (see code below). This reduced the startup time to under 10 seconds and seems to work for us. Just thought I’d share in case someone else has run into the same problem.
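
The original patch isn’t reproduced in this thread. A minimal sketch of the same idea follows, assuming a loader that takes a path and returns parsed data: parse each absolute path at most once and hand out deep copies, so a caller mutating its copy (for example while merging variables) cannot corrupt the cached original. `cache_parsed_files` and `parse_file` are hypothetical names, JSON stands in for YAML only to keep the example stdlib-only, and this is not ansible’s actual utils code. Like the approach described above, it assumes files don’t change during a run.

```python
import copy
import functools
import json
import os
from typing import Any, Callable, Dict


def cache_parsed_files(loader: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a file loader so each path is parsed at most once per process."""
    cache: Dict[str, Any] = {}

    @functools.wraps(loader)
    def wrapper(path: str) -> Any:
        # Normalise the path so "vars.yml" and "./vars.yml" share one entry.
        key = os.path.abspath(path)
        if key not in cache:
            cache[key] = loader(path)
        # Return a deep copy so callers can mutate their data safely.
        return copy.deepcopy(cache[key])

    return wrapper


# Hypothetical stand-in for ansible's YAML loader (JSON keeps this stdlib-only).
@cache_parsed_files
def parse_file(path: str) -> Any:
    with open(path) as f:
        return json.load(f)
```

The deep copy costs something on every call, but it is far cheaper than decrypting and re-parsing, and it keeps the cache safe without auditing every caller for in-place mutation.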

Regards
J Kling

Are all of the files you’re referencing vault-encrypted, or does the slowness come from the initialization of the VaultLib? It might be better to cache that object based on the encryption method and/or hashed password rather than the unencrypted contents of the YAML files themselves.

This is definitely something better discussed on ansible-devel, so I’m copying that list on this.
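
A minimal sketch of that suggestion, assuming construction of the vault object is the expensive part: keep one vault object per (encryption method, password digest) pair and reuse it. `ExpensiveVault` and `get_vault` are hypothetical stand-ins, not ansible’s VaultLib API.

```python
import hashlib
from typing import Dict, Tuple


class ExpensiveVault:
    """Hypothetical stand-in for VaultLib; pretend construction is costly."""

    instances_created = 0

    def __init__(self, method: str, password: bytes) -> None:
        ExpensiveVault.instances_created += 1
        self.method = method
        self._password = password


_vault_cache: Dict[Tuple[str, str], ExpensiveVault] = {}


def get_vault(method: str, password: bytes) -> ExpensiveVault:
    # Key on the method plus a digest of the password, as suggested above,
    # rather than on the decrypted contents of any file.
    key = (method, hashlib.sha256(password).hexdigest())
    vault = _vault_cache.get(key)
    if vault is None:
        vault = ExpensiveVault(method, password)
        _vault_cache[key] = vault
    return vault
```

This only helps if object construction dominates; if the cost is in each decrypt call, the cache leaves it untouched.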

> Are all of the files you’re referencing vault-encrypted, or does the slowness come from the initialization of the VaultLib?

Only one file is vault-encrypted, but (almost) every role references it as a vars file. I don’t know which part in particular is slow; I noticed the delay after an open() on the vault file when stracing ansible, and then just confirmed that it decrypted the file each time, without digging any further.

> It might be better to cache that object based on the encryption method and/or hashed password rather than the unencrypted contents of the YAML files themselves.

The choice to cache the parsed YAML rather than something lower down the chain was made because that avoided the most work for ansible and, since I know our YAML files don’t change during execution, seemed safe. An initial attempt cached only the decryption result, which already improved speed considerably but was still about three times as slow as caching the lot.
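
For comparison, a sketch of the decryption-level cache from that initial attempt. Only the decrypt step is skipped on a hit, so YAML parsing still repeats each time, which is presumably where the remaining gap came from. If the “files don’t change during a run” assumption ever needs relaxing, one option is to key each entry on the file’s size and modification time as well as its path. `DecryptCache` is hypothetical, and the `decrypt` callable is a placeholder for the real vault decryption.

```python
import os
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, int, int]


class DecryptCache:
    """Cache decrypted file contents, invalidated when the file changes."""

    def __init__(self, decrypt: Callable[[bytes], bytes]) -> None:
        self._decrypt = decrypt
        self._entries: Dict[str, Tuple[CacheKey, bytes]] = {}

    def read(self, path: str) -> bytes:
        path = os.path.abspath(path)
        st = os.stat(path)
        key = (path, st.st_mtime_ns, st.st_size)
        hit = self._entries.get(path)
        if hit is not None and hit[0] == key:
            return hit[1]  # bytes are immutable, so no copy is needed
        with open(path, "rb") as f:
            plaintext = self._decrypt(f.read())
        self._entries[path] = (key, plaintext)
        return plaintext
```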

I just tried caching only the VaultAES256 object instead of the above approach, but that made no difference to the startup time.

–J

I have a couple of patches in the queue that might help with this (though not with the core issue, if the extra time is due to vault):

https://github.com/ansible/ansible/pulls/sergevanginderachter

especially this one:

https://github.com/ansible/ansible/pull/6734

Can you test this patch set? You could also test with my ‘integration’ branch, which has all those patches merged into 1.7 devel (last updated some weeks ago).

https://github.com/sergevanginderachter/ansible/tree/INTEGRATION

Given that you load the encrypted file as a vars file, it might not address your use case, but it might help with loading 110 YAML files (assuming those are primarily group/host_vars files).

Serge

I’m open to the idea of parse_yaml_from_file caching small files in memory if they were vault-decoded.
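
A sketch of what that could look like, assuming vault files are recognised by the `$ANSIBLE_VAULT` header at the start of the file: only small, vault-encrypted files are kept in memory, so plain files keep their existing behaviour. `VaultAwareLoader`, the `parse` callback and the 64 KiB threshold are hypothetical; this is not parse_yaml_from_file’s real signature.

```python
import copy
from typing import Any, Callable, Dict

VAULT_HEADER = b"$ANSIBLE_VAULT"
MAX_CACHED_BYTES = 64 * 1024  # hypothetical threshold for a "small" file


class VaultAwareLoader:
    """Parse files, keeping only small vault-encrypted ones in memory."""

    def __init__(self, parse: Callable[[bytes], Any]) -> None:
        self._parse = parse  # decrypts (if needed) and parses raw file bytes
        self._cache: Dict[str, Any] = {}

    def load(self, path: str) -> Any:
        if path in self._cache:
            return copy.deepcopy(self._cache[path])
        with open(path, "rb") as f:
            raw = f.read()
        data = self._parse(raw)
        if raw.startswith(VAULT_HEADER) and len(raw) <= MAX_CACHED_BYTES:
            self._cache[path] = data
            return copy.deepcopy(data)  # keep the cached copy pristine
        return data  # plain or large files are never cached
```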

+1, I would love to see this patch prioritized, as this severely limits the utility of vault files. As an example, a playbook running against one group of ~500 hosts, which in turn references a single vault-encrypted file via vars_files, takes 6 1/2 minutes to run vs ~30 seconds when the file is decrypted.
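
A rough back-of-the-envelope reading of those numbers, assuming the whole difference comes from the vault file being decrypted again for every host:

```latex
\frac{6.5 \times 60\,\mathrm{s} - 30\,\mathrm{s}}{500\ \text{hosts}}
  = \frac{360\,\mathrm{s}}{500\ \text{hosts}}
  \approx 0.72\,\mathrm{s}\ \text{per host}
```

That is the same order of magnitude as the ~0.5 s per decryption reported at the start of the thread.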

The patches I referred to earlier have been merged in the meantime. Do you still see this behaviour using the latest devel branch?

(This patch should make sure that the encrypted file is only parsed once, where before it would get parsed again for each of those 500 hosts.)

Serge

I can confirm the problem is still present in devel. I’ll add details to the open issue but I definitely don’t see any improvement in my use case.

$ ansible-playbook --version
ansible-playbook 1.7 (devel d51e10a3f4) last updated 2014/07/26 13:13:39 (GMT -700)

Further testing on the dev build reveals that the issue does not surface when using vault-encrypted group_vars, but it still affects vars_files.

vars_files paths that depend on an inventory-scoped variable name are loaded differently from those that do not. Most are loaded at global scope, which happens only once; the inventory-dependent ones are loaded once per host, which could be a lot of math for large host counts, but even so should occur only once per host.
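
To illustrate the distinction, a minimal sketch (not ansible’s actual code path) in which a per-host path template is resolved for every host, but each distinct resulting file is parsed only once. `load_vars_files_for_hosts` is hypothetical, and `str.format` with `{env}` stands in for Jinja2’s `{{ env }}`.

```python
from typing import Any, Callable, Dict


def load_vars_files_for_hosts(
    hosts: Dict[str, Dict[str, str]],
    path_template: str,
    load: Callable[[str], Any],
) -> Dict[str, Any]:
    """Resolve a per-host vars_files path, parsing each distinct file once."""
    parsed: Dict[str, Any] = {}
    result: Dict[str, Any] = {}
    for host, host_vars in hosts.items():
        path = path_template.format(**host_vars)  # resolved once per host
        if path not in parsed:  # but loaded once per distinct file
            parsed[path] = load(path)
        # Hosts resolving to the same file share one parsed object.
        result[host] = parsed[path]
    return result
```

With 500 hosts split across two environments, this parses two files rather than 500.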

In any case, can you construct a minimal playbook that reproduces this that you’d feel comfortable sharing?

Seems I crossed streams a bit with Serge’s patch, apologies.

I’ve re-submitted my test case, which reproduces the problem, as a new issue:

https://github.com/ansible/ansible/issues/8340

replied to the ticket, thanks