We recently started using a vault and discovered that it slowed down ansible startup significantly. A bit of digging made it obvious why: perhaps because of how we were using it, the vault was opened and decrypted once for every one of the ~110 YAML files ansible reads through on launch, adding about 0.5s each time, for a total startup time of over 1 minute. As a kind of band-aid/sledgehammer solution, we modified ansible utils slightly to cache parsed YAML data and return a deepcopy of the cached data every time ansible wants to access the same file again (see code below). This reduced the startup time to under 10 seconds and seems to work for us. Just thought I’d share in case someone else has run into the same problem.
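The caching idea described above can be sketched roughly as follows. This is not the actual patch to ansible utils: `cached_parse`, `_parsed_cache`, and the `parse_func` argument are hypothetical names, and `parse_func` stands in for whatever routine reads, decrypts, and parses a YAML file.

```python
import copy
import os

# Hypothetical module-level cache: absolute file path -> parsed data.
_parsed_cache = {}


def cached_parse(path, parse_func):
    """Return parsed data for `path`, calling `parse_func(path)` only once.

    `parse_func` is an assumed stand-in for the expensive step
    (read + decrypt + YAML parse).
    """
    key = os.path.abspath(path)
    if key not in _parsed_cache:
        _parsed_cache[key] = parse_func(path)
    # Hand out a deep copy so that callers who mutate the result
    # cannot corrupt the cached original.
    return copy.deepcopy(_parsed_cache[key])
```

Keying on the absolute path assumes the files do not change during a run. The deepcopy costs something on every access, but far less than decrypting and parsing the file again.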
Are all of the files you’re referencing vault-encrypted, or does the slowness come from the initialization of the VaultLib? It might be better to cache that object based on the encryption method and/or hashed password rather than the unencrypted contents of the YAML files themselves.
This is definitely something better discussed on ansible-devel, so copying that list on this.
> Are all of the files you’re referencing vault-encrypted, or does the slowness come from the initialization of the VaultLib?
Only one file is vault encrypted, but (almost) every role references it as a vars file. I don’t know which part in particular is slow; I noticed the delay after an open() on the vault file when stracing ansible and then just confirmed it decrypted the file each time without digging any further.
> It might be better to cache that object based on the encryption method and/or hashed password rather than the unencrypted contents of the YAML files themselves.
The choice to cache the parsed YAML rather than something lower down the chain was made because that avoided the most work for ansible and, since I know YAML files don’t change during execution, seemed safe. An initial attempt cached only the decryption result, which already improved speeds considerably but was still about three times as slow as caching the lot.
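For comparison, the lower-level variant (caching only the decryption result) might look something like the sketch below. The names `cached_decrypt` and `_decrypt_cache` are hypothetical, and `decrypt_func` is an assumed callable standing in for the real vault decryption; none of this is taken from ansible itself.

```python
import hashlib

# Hypothetical cache: digest of (password, ciphertext) -> decrypted bytes.
_decrypt_cache = {}


def cached_decrypt(ciphertext, password, decrypt_func):
    """Decrypt `ciphertext` (bytes) once per distinct (password, ciphertext) pair.

    `decrypt_func(ciphertext, password)` is an assumed stand-in for the
    expensive vault decryption step.
    """
    # Key on a digest so neither the password nor the ciphertext
    # is stored as a dictionary key.
    digest = hashlib.sha256(password.encode("utf-8") + b"\0" + ciphertext)
    key = digest.hexdigest()
    if key not in _decrypt_cache:
        _decrypt_cache[key] = decrypt_func(ciphertext, password)
    return _decrypt_cache[key]
```

With this approach the YAML is still re-parsed on every access, which would be consistent with it being faster than no caching but slower than caching the parsed data.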
I just tried caching only the VaultAES256 object instead of the above approach, but that made no difference to the start up time.
Can you test this patch set? You could also test with my ‘integration’ branch that has all those patches merged in with 1.7 devel (currently last updated some weeks ago).
Given that you load the encrypted file as a vars file, it might not address your use case, but it might help with loading 110 YAML files (assuming those are primarily group/host_vars files).
+1, would love to see this patch get prioritized, as this severely limits the utility of vault files. As an example, a playbook running against one group with ~500 hosts, which in turn references a single vault-encrypted file via vars_files, takes 6 1/2 minutes to run vs ~30 seconds when the file is decrypted.
Further testing on the dev build reveals that the issue does not surface when using a vault-encrypted group_vars file but still affects the usage of vars_files.
vars_files paths that depend on an inventory-scoped variable name are loaded differently than those that do not, though most are loaded at global scope, and that would happen only once. The inventory ones would be loaded once per host, which could be a lot of math for large host counts, but even so it should only occur once per host.
In any case, can you construct a minimal playbook that reproduces this that you’d feel comfortable sharing?