Hi,
I have been diving into the inventory code for the last couple of days, and as part of that I wrote a script that dumps my (mostly directory/INI-based) inventory into a JSON set that follows the dynamic inventory API. That made me notice something: dumping the inventory with the dir/INI parsing, adding my own parsing of the group_vars/host_vars files, and then handing that JSON result back to Ansible through an inventory script that just cats the JSON, was far more efficient.
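For reference, an inventory script that only replays a pre-built dump can be very small. Below is a minimal sketch, assuming the dump already follows the dynamic inventory JSON format (groups at the top level, per-host variables under "_meta" → "hostvars"). The file name inventory_dump.json and the helper names are hypothetical; only the --list / --host calling convention comes from the inventory script API.

```python
#!/usr/bin/env python
"""Replay a pre-generated inventory dump as a dynamic inventory script.

Expected dump shape (example):
    {
      "webservers": {"hosts": ["web1"], "vars": {"http_port": 80}},
      "_meta": {"hostvars": {"web1": {"ntp_server": "ntp.example.com"}}}
    }
"""
import json
import sys

DUMP_FILE = "inventory_dump.json"  # hypothetical path of the pre-built dump


def load_dump(path=DUMP_FILE):
    with open(path) as fh:
        return json.load(fh)


def host_vars(inventory, hostname):
    # With "_meta" present Ansible should not need to call --host per host,
    # but the script API still expects it, so answer from the same data.
    return inventory.get("_meta", {}).get("hostvars", {}).get(hostname, {})


def respond(args, inventory):
    """Return the object to print for the given CLI args, or None if invalid."""
    if args == ["--list"]:
        return inventory
    if len(args) == 2 and args[0] == "--host":
        return host_vars(inventory, args[1])
    return None


def main(argv):
    result = respond(argv[1:], load_dump())
    if result is None:
        sys.stderr.write("usage: %s --list | --host <hostname>\n" % argv[0])
        return 1
    json.dump(result, sys.stdout)
    return 0


# Only act when invoked with arguments (as Ansible does), so the module can
# also be imported or exercised without a dump file present.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because all parsing happened once when the dump was written, every run of the script is just a JSON load and print.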
Running a simple ansible -m debug on an inventory subset containing around 600 hosts and 600 groups, standard ansible took around 50s to parse the inventory, whilst my script took around 5s.
The issue at hand is how parsing works in vars_plugins. (In practice, vars_plugins means parsing group_vars and host_vars, as I'm not aware of any other implementations.)
The group_vars.py plugin probably does the most work parsing group_vars files, yet as it is currently implemented, the plugin gets called for every host in the inventory and parses all the files again and again. (That makes me understand why the “_meta: hostvars” key was added to the script plugin API.)
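A toy model of that cost, counting how often each group_vars file gets parsed when vars are computed per host versus once per group. Everything here is hypothetical: parse_group_vars_file stands in for reading and parsing a real file, and the 600-host layout is made up. It is not Ansible's code and says nothing about real timings.

```python
from collections import Counter

parse_calls = Counter()  # how many times each group_vars file was "parsed"


def parse_group_vars_file(group):
    """Hypothetical stand-in for reading and parsing group_vars/<group>."""
    parse_calls[group] += 1
    return {"from_group": group}


def vars_per_host(host_groups):
    """Per-host model: every host re-parses the files of all its groups."""
    result = {}
    for host, groups in host_groups.items():
        merged = {}
        for group in groups:
            merged.update(parse_group_vars_file(group))
        result[host] = merged
    return result


def vars_per_group(host_groups):
    """Per-group model: parse each group's file once, reuse it for all hosts."""
    all_groups = {g for groups in host_groups.values() for g in groups}
    parsed = {g: parse_group_vars_file(g) for g in all_groups}
    result = {}
    for host, groups in host_groups.items():
        merged = {}
        for group in groups:  # same merge order as vars_per_host
            merged.update(parsed[group])
        result[host] = merged
    return result


# 600 hosts, each in "all" plus one of three groups.
hosts = {"host%d" % i: ["all", "group%d" % (i % 3)] for i in range(600)}

parse_calls.clear()
per_host = vars_per_host(hosts)
per_host_parses = sum(parse_calls.values())    # 1200

parse_calls.clear()
per_group = vars_per_group(hosts)
per_group_parses = sum(parse_calls.values())   # 4

assert per_host == per_group  # same result, far fewer parses
```

The per-host cost grows with the number of host/group memberships, while the per-group cost grows only with the number of distinct groups.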
I previously tried caching the parsed group data in that plugin, but since the plugin is called through inventory.get_variables(host) from Runner, which forks into several processes, the cache doesn't do any real good.
What actually seems a bit odd, now that I look at it, is that the group_vars plugin implements its own variable-precedence logic and yields vars per host only.
One would expect, for example, the data from group_vars/all.yml to end up in the vars of Group('all'). Instead, it currently gets merged directly into the vars of every host, whereas other inventory sources (the vars sections in INI files and in the JSON from scripts) remain attached to their respective groups, and their precedence is only resolved later in Host.get_variables().
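To make the expected behaviour concrete, here is a heavily simplified model in which group_vars data stays attached to group objects and the host resolves precedence when its variables are requested. The Group and Host classes are illustrative stand-ins, not Ansible's actual classes; ordering groups by depth (so deeper groups override shallower ones, and host vars override both) is an assumption of this sketch.

```python
class Group:
    def __init__(self, name, depth=0, variables=None):
        self.name = name
        self.depth = depth  # 0 for 'all', larger for nested child groups
        self.variables = dict(variables or {})


class Host:
    def __init__(self, name, groups, variables=None):
        self.name = name
        self.groups = list(groups)
        self.variables = dict(variables or {})

    def get_variables(self):
        # Shallower groups first so deeper (more specific) groups override
        # them; host-level vars are applied last and win over everything.
        result = {}
        for group in sorted(self.groups, key=lambda g: g.depth):
            result.update(group.variables)
        result.update(self.variables)
        return result


all_group = Group("all", depth=0)
web = Group("webservers", depth=1)
# Data from group_vars/all.yml and group_vars/webservers.yml is attached once,
# to the group objects, instead of being copied into every host.
all_group.variables.update({"ntp_server": "ntp.example.com", "http_port": 80})
web.variables.update({"http_port": 8080})

host = Host("web1", [all_group, web], {"http_port": 8081})
print(host.get_variables())
# {'ntp_server': 'ntp.example.com', 'http_port': 8081}
```

The group data is stored once per group, and the merge only happens when a host's variables are actually requested.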
So there is definitely plenty of room for improvement in this code, but not without some changes to core.
Off the top of my head, I think a patch would need to:
- at least modify the vars_plugins to return the plain group and host vars as they are set in the respective files, and let the precedence handling elsewhere do its job, as it already does for other inventory sources;
  probably something like moving part of that logic from Inventory._get_variables(host) to Inventory._get_group_variables(group), but also moving it completely out of the Runner code and out of those specific methods, and putting it entirely into the Inventory code, to avoid late parsing in Runner.
- possibly, if I dare to ask: I wonder to what degree host/group_vars parsing should remain in a plugin infrastructure, given its importance, its specifics, and the fact that it is also where vault is implemented.
Does anyone know of other vars_plugin implementations that would justify keeping vars_plugins? Not that I would recommend abolishing them, of course, but I would seriously consider implementing host/group vars in core rather than as a plugin.
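The first item of that list could look roughly like the following: a plugin that only returns the vars as written in group_vars/&lt;group&gt; and host_vars/&lt;host&gt;, with no precedence logic, and an inventory-side step that asks it once per group and once per host. This is a sketch of the proposal, not an existing Ansible API: DirVarsPlugin, get_group_vars, get_host_vars, load_plugin_vars and the in-memory files mapping (standing in for the filesystem) are all hypothetical.

```python
class DirVarsPlugin:
    """Hypothetical plugin: returns file contents as-is, no precedence logic."""

    def __init__(self, files):
        # Stand-in for the filesystem: {"group_vars/all": {...}, ...}
        self.files = files
        self.reads = 0  # number of lookups, to show calls are not per host×group

    def get_group_vars(self, group_name):
        self.reads += 1
        return dict(self.files.get("group_vars/" + group_name, {}))

    def get_host_vars(self, host_name):
        self.reads += 1
        return dict(self.files.get("host_vars/" + host_name, {}))


def load_plugin_vars(group_names, host_names, plugins):
    """Hypothetical Inventory-side step, run once before any forking.

    Results are attached to groups and hosts separately; precedence is left
    to the normal host-variable merge, as for other inventory sources.
    """
    group_vars = {g: {} for g in group_names}
    host_vars = {h: {} for h in host_names}
    for plugin in plugins:
        for g in group_names:
            group_vars[g].update(plugin.get_group_vars(g))
        for h in host_names:
            host_vars[h].update(plugin.get_host_vars(h))
    return group_vars, host_vars


plugin = DirVarsPlugin({
    "group_vars/all": {"ntp_server": "ntp.example.com"},
    "host_vars/web1": {"http_port": 8081},
})
gv, hv = load_plugin_vars(["all", "webservers"], ["web1", "web2"], [plugin])
# One lookup per group plus one per host: 2 + 2 = 4
print(plugin.reads)
```

With this split, the number of plugin calls grows with groups plus hosts rather than with every host's group memberships, and the results are computed in the parent before Runner forks.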
Thoughts?
Serge