Recommendation for external inventory script w/large number of hosts?

I use an external inventory script that I’ve built to gather the hosts list and also use it to provide hostvars - various items collected from my centralized provisioning DB collected up into a per-host dict.

However, for this segment of my hosts there are about 600 systems. Since the external inventory script is called N+1 times where N is the number of hosts matching a pattern (–list and a --host HOST for each match), starting up a playbook takes a fair amount of time.

I thought that some of my slowness was that each call of the script with --host HOST was opening up a DB cursor, processing it and closing it. However, after writing up code to have --list cache things locally, I’m still running into a lot of time. Invoking python itself (I suspect) is making each --host call take between 250-750ms, give or take.

I’m wondering whether anyone has developed any approaches to handling this more elegantly.

Michael, I would love to have my --list call just statically populate YAML host_vars files, but as far as I can tell, the external inventory script is a totally separate code pathway that does not do any hostvars importing from files.

If that’s incorrect, let me know and I will go back to beating on it.

If that IS correct (that external inventory scripts bypass hostvars files), would you consider allowing both external inventory script and hostvars files as a possible solution?

Thanks and best regards,
Eugene Archibald

Oh. I see that host_vars files will do just fine if I don’t collide the variable namespaces. Very well then. Color me embarrassed and carry on.

For now, I can dump to files in the host_vars directory, and can build a lightweight “echo ‘{ }’” launcher for /etc/ansible/hosts to avoid running python and tearing it down that many times.

All of this is still suboptimal, however - even at this scale, that’s a nontrivial number of files in the host_vars directory. (I am contradicting what I said in the first post after thinking about the problem some more.)

I am still thinking about a good approach to the problem, and am very open to suggestions as to how to do this most efficiently.

Regards,
Eugene Archibald

This got brought up very recently by someone else, and I agree the idea that hosts HAVE to be returned via --hosts is suboptimal for external inventory scripts.

What I want to do is update the external inventory system such that if we have something like “_hostvars” in the original return, it won’t call the program again.

Let’s make sure we file a github ticket for this. (If nobody else wants to tackle this, this will be one of the first things I work on for 1.3)

We would then want to update the github inventory script examples we have (ec2, etc) to make use of this .

I am thinking like a return like this at top level now:

“_hostvars” : {
asdf.example.com” : {
“x” : 12,
“y” : “fjord”,
} ,
}

And we just treat the parser to handle _hostvars seperately.

Michael-

That seems most-right to me as well. Perfect.

I’ll file a ticket sometime today with a story.

Best regards,
Eugene

Michael-

I’m having some issues with this. When I run it against the old (pre _meta[“hostvars”]) script, everything works fine. However, when I run it against the new script that’s implemented _meta[“hostvars”] it claims that the relevant hostvars are undefined.

excerpt from playbook

  • name: retrieving storage pool information and storing into nsfstool DB
    command: /home/nsfstool/automation/bin/ansible-get_and_insert_storage_pools_into_database.py -r {{ datacenter }} -i {{ inventory_hostname }} -u {{ user }} -p {{ pass }}
    delegate_to: 172.28.32.53

excerpt from playbook ends

excerpt from output of db-hosts-meta.py --list. Note that I can get you the whole thing but it’s about 5 MB

“_meta”: {
“hostvars”: {
“131charlie-bos01-cdp01”: {
“datacenter”: “da1”,

excerpt from output of db-hosts-meta.py --list

The error thrown is:

fatal: [131charlie-bos01-cdp01] => One or more undefined variables: ‘datacenter’ is undefined

When I run it with the old script (the one without _meta that gets called with --hosts repeatedly) it works without complaint.

Any ideas what I might be doing wrong, or whether there is something going wrong with the underlying mechanism? Let me know what additional information might be useful to you.

Best regards,
Eugene

P.S. for things not referencing hostvars (and including them once this is fixed) the speedup is everything I hoped it would be. Startup of seconds rather than n-minutes-and-growing. Thanks!

Hmm, that’s strange.

Can you share a inventory script (gist perhaps) that reproduces it? Just a basic example that returns it via “print” would probably be enough.

It is hard to tell as some things are snipped if it’s exactly right, but seems to be at first glance.

Here you go - I’m assuming you mean “the results of your inventory script run with --list” rather than the script itself :slight_smile:

https://gist.github.com/earchibald/e0ee1905923ac9cc9dbc

Update: I wrapped it in bash, so you can run it directly.