Using Spacewalk or Red Hat Satellite - new inventory script

I just merged a pull request for an Ansible inventory source.

Testing (and patches) quite welcome.

Thanks to Jon Miller for the inventory script!

As a reminder, inventory scripts are ways to pull Ansible inventory (host names, group memberships, and variables) from external software systems, as opposed to the default INI files. Just point at whatever inventory source you are using with -i. Other examples include Cobbler, AWS or Eucalyptus, and OpenStack Nova. If a file is marked executable, returns JSON, and accepts certain parameters, it is treated as a script rather than a static inventory INI file. If you have a directory full of inventory files and scripts, just pass the directory name to “-i” to use multiple data sources at once. (This works best in 1.2; otherwise Ansible doesn’t like seeing config files for the inventory scripts in the same directory.) You can read more about them here:

http://ansible.cc/docs/api.html#external-inventory-scripts
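
Roughly, the contract is small: when called with --list the script prints a JSON mapping of group names to hosts, and when called with --host <name> it prints a JSON dict of variables for that host. A minimal sketch (the group and host data below are just placeholders):

    #!/usr/bin/env python
    # Minimal external inventory script sketch: Ansible invokes it with
    # --list (all groups and hosts) or --host <name> (vars for one host).
    import json
    import sys

    # Placeholder data; a real script would pull this from Spacewalk/RHN.
    GROUPS = {"webservers": ["web1.example.com", "web2.example.com"],
              "dbservers": ["db1.example.com"]}
    HOSTVARS = {"web1.example.com": {"ansible_ssh_host": "192.0.2.10"}}

    if __name__ == "__main__":
        if len(sys.argv) == 2 and sys.argv[1] == "--list":
            print(json.dumps(GROUPS))
        elif len(sys.argv) == 3 and sys.argv[1] == "--host":
            print(json.dumps(HOSTVARS.get(sys.argv[2], {})))
        else:
            sys.stderr.write("Usage: %s --list | --host <hostname>\n" % sys.argv[0])
            sys.exit(1)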

Hi,

This sounds great and would be really useful in conjunction with the
rhn_channel module to fill in the sysname variable.

I had a look at this, but from what I see, this can only be used on a
Satellite server?

Regards,
Vincent

I have an inventory script that connects through the RHN API, and it is dead slow. Just querying each system to get even the most basic information is slow, and if you have 2000 servers...

And while you can cache the information, the moment the cache needs to be renewed you're waiting for the inventory to finish.

(The fact that I add each system's IP address from RHN into the inventory doesn't help speed-wise, but it's the only way to know how to reach them.)
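
For the curious, the shape of the problem looks roughly like this (server URL and credentials are placeholders, and I'm assuming the usual auth.login / system.listSystems / system.getNetwork XML-RPC calls): one cheap call lists all systems, but every useful detail such as the IP address costs one extra round trip per system.

    # Rough sketch: system.listSystems is a single call, but fetching each
    # IP via system.getNetwork means one XML-RPC round trip per system.
    import xmlrpclib  # xmlrpc.client on Python 3

    SATELLITE_URL = "https://satellite.example.com/rpc/api"  # placeholder
    client = xmlrpclib.Server(SATELLITE_URL)
    key = client.auth.login("apiuser", "apipassword")        # placeholder creds

    systems = client.system.listSystems(key)   # fast: one round trip
    inventory = {}
    for system in systems:                      # slow: N round trips
        net = client.system.getNetwork(key, system["id"])
        inventory[system["name"]] = {"ansible_ssh_host": net["ip"]}

    client.auth.logout(key)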

Looking at the implemented solution for spacewalk.py, it will fail to work on any current RHN Satellite since spacewalk-report does not support the "system-groups-systems" report.

The "inventory" report on our RHN Satellite with about 2000 active systems takes more than 10 minutes to complete (which probably matches with what we do directly using the RHN API).

Not sure if there's anything we can do to speed up the RHN Satellite.

Dag, do you have a link to your script?
I’m working on something myself…

Nope, it's unfinished and too slow to be useful.

I wanted to add the various resources found in RHN to the inventory. We have groups for patch management and additional info per server that I thought could be useful, but since it's already unbearable just to get a list of hosts and (only!) query the IP address for each one to use as ansible_ssh_host, I didn't bother going the whole nine yards...

I'll probably turn to the internal Linux CMDB (MySQL based) instead, if I find the time :slight_smile:

As someone who used to work 3 cube rows over from the Satellite team, I’m definitely interested in seeing it work with Satellite proper as well.

I suspect they will bring the groups support into a future release.

Meanwhile, if the module can detect what version of software it is talking to and not use the groups API unless it’s recent, that makes sense.

The EC2 module caches on its end, so I think a similar approach of caching is the logical way to deal with the API speed here.

Are you saying that talking to RHN is slow, or Satellite? I’d think you mean RHN, and yeah, I would assume that to be slow just because it’s SaaS, but I would also hope it could take advantage of similar caching.

–Michael

Here's my current script.

The things I wanted to add were support for custom info, automatic groups (e.g. based on base channel), and more host-specific information.

Since it's based on a previous CMDB script, it does include support for DNS-based inventory as well as a network inventory (based on network ranges).

Have fun with it :slight_smile:

(attachments)

inventory-rhn.py (11.6 KB)

Hi,

I just had a look at retrieving hostnames and IPs from RHN. This is
indeed dead slow. Getting a list of dicts for all hosts, on the other
hand, is quite fast.

Retrieving details per host comes with a penalty, it seems...

I'm going to have a look at the caching options in Python. I've never
used those before. If someone has pointers (maybe someone from the
Satellite/Spacewalk dev team?), I'd appreciate it.

@Dag, thanks for the script, it might come in handy.

Vincent

Great!

The EC2 module just caches to disk, but I’d probably just use the Python shelve module:

http://docs.python.org/2/library/shelve.html

Having it take the invalidation option like the ec2 module seems smart.
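
A minimal sketch of what a shelve-backed cache with a TTL could look like (the cache path, TTL, and the expensive fetch_inventory() call are all placeholders):

    # Sketch of shelve-based caching with a simple TTL, similar in spirit
    # to the on-disk cache the ec2 inventory script keeps.
    import shelve
    import time

    CACHE_PATH = "/tmp/rhn_inventory.cache"  # placeholder location
    CACHE_TTL = 4 * 3600                      # placeholder: four hours

    def fetch_inventory():
        # Placeholder for the expensive RHN/Spacewalk API calls.
        return {"all": {"hosts": []}}

    def get_inventory(refresh=False):
        cache = shelve.open(CACHE_PATH)
        try:
            stale = time.time() - cache.get("timestamp", 0) > CACHE_TTL
            if refresh or stale or "inventory" not in cache:
                cache["inventory"] = fetch_inventory()
                cache["timestamp"] = time.time()
            return cache["inventory"]
        finally:
            cache.close()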

My script does caching already; caching is not the problem. The problem is that when the cache expires (and we don't want to cache for too long), Ansible is delayed. (Check out '-r' for renewing the cache.)

If your cache expires every day, it impacts your workflow every day (more than once, depending on the set of systems that need to be renewed).

In my case I _only_ get group information and IP addresses, but my intention was to get much more from RHN, which unfortunately would make it quite painful: 10 minutes for 2000 systems with only the IP/group information and a DNS query.

A solution could be to renew the cache in the background (in an atomic way) so that it does not impact any concurrent Ansible runs. However, currently '-r' only renews a --list or --host lookup (mostly for testing purposes).

Yeah there’s a standard way the ec2 module replaces the cache for all hosts.

It does seem like running it on cron is reasonable, though it might need a way to lock.

Send me a pull request if you would like to submit any upgrades or mergers of your two script ideas.

I think caching will only be helpful if you need something from your
inventory. IMHO, building/updating your inventory will always take
time.

What I plan to use for now is an inventory with just the group names
and the hosts in them. Since we register our machines with their
hostname in RHN anyway, we can use those to connect.

One thing I would like to add as a variable is each host's ID in RHN
so I can use it with the rhn_channel module.

I'm no expert in development or Python, so it may take some time.

Regards,
Vincent

The system ID is already part of that script I sent you. Since it's needed for any communication with the RHN API, I figured it's best to include it as a host fact.

BTW, if you have multiple entries for a given hostname (duplicate RHN entries), the inventory script will use the one that last connected (so it's best to avoid this situation altogether).
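
A sketch of that deduplication, assuming each system dict carries a last_checkin value (as system.listSystems returns) whose values sort chronologically:

    # Keep only the most recently checked-in entry per hostname, so
    # duplicate RHN registrations don't produce conflicting entries.
    def dedupe_by_last_checkin(systems):
        latest = {}
        for system in systems:
            name = system["name"]
            if name not in latest or system["last_checkin"] > latest[name]["last_checkin"]:
                latest[name] = system
        return list(latest.values())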

The system ID is already part of that script I sent you. Since it's needed for any communication with the RHN API, I figured it's best to include it as a host fact.

I added a function to the rhn_channel module to retrieve the sysid from
the host itself, but it's not used, since I think the one true
source is RHN itself (I've seen cases where the local sysid is
different from the one in RHN).
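
For what it's worth, the local ID can usually be pulled out of /etc/sysconfig/rhn/systemid on a registered client; a rough sketch, treating that path and the "ID-<digits>" convention as assumptions (RHN stays the authoritative source):

    # Rough sketch: extract the local system ID from the RHN client's
    # systemid file. The path and the "ID-<digits>" format are assumptions.
    import re

    def local_rhn_sysid(path="/etc/sysconfig/rhn/systemid"):
        with open(path) as f:
            match = re.search(r"ID-(\d+)", f.read())
        return int(match.group(1)) if match else None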

BTW, if you have multiple entries for a given hostname (duplicate RHN entries), the inventory script will use the one that last connected (so it's best to avoid this situation altogether).

Nice feature :slight_smile:

I'm currently working on a small script that suits our needs. I'll
mail it when it's more or less finished...

Vincent

I guess it can't hurt to put my script in the ansible-provisioning tree, so we can send pull-requests.

Yeah there's a standard way the ec2 module replaces the cache for all hosts.

Well, I use the same option in my script; however, it will only replace the cache for the query that is executed, and that is the problem. Even the ec2 module will not refresh the per-host caches in this case.

If you want to do the same for each individual host query, you basically have to script it yourself. It would be useful to standardize something like:

   -h, --host HOST
   -l, --list
   -r, --refresh-cache
   -R, --refresh-all-caches

where --refresh-all-caches would iterate over each host to update its cache as well.

Another possibility is to make --refresh-cache act differently depending on whether it is executed together with --host, --list or no query. In the latter case it could refresh everything, rather than just the query.
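
A sketch of that option handling (the refresh behaviour itself is left out as a placeholder; note add_help is disabled so "-h" can mean --host as proposed above):

    # Sketch of the proposed options; the actual cache refresh logic is
    # left out.
    import argparse

    def parse_args():
        parser = argparse.ArgumentParser(add_help=False,
                                         description="RHN/Spacewalk inventory sketch")
        parser.add_argument("-h", "--host", metavar="HOST",
                            help="emit variables for one host")
        parser.add_argument("-l", "--list", action="store_true",
                            help="emit the full inventory")
        parser.add_argument("-r", "--refresh-cache", action="store_true",
                            help="refresh only the cache for the query being run")
        parser.add_argument("-R", "--refresh-all-caches", action="store_true",
                            help="refresh the list cache and every per-host cache")
        return parser.parse_args()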

It does seem like running it on cron is reasonable, though it might need a
way to lock.

Just an atomic move of the cache should be sufficient. I don't think we necessarily want to add locking complexity to all inventory scripts.
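
Concretely, that could mean building the new cache in a temporary file in the same directory and renaming it over the old one, so a concurrent Ansible run only ever sees either the old cache or a complete new one (a rough sketch; paths are placeholders):

    # Write the new cache next to the old one, then rename it into place;
    # rename() within one filesystem is atomic on POSIX.
    import json
    import os
    import tempfile

    def replace_cache_atomically(cache_path, inventory):
        directory = os.path.dirname(cache_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".inventory-cache.")
        with os.fdopen(fd, "w") as tmp:
            json.dump(inventory, tmp)
        os.rename(tmp_path, cache_path)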

Something else that needs standardizing is the way to configure inventory scripts, so that one does not need to modify the script itself to set the server name, credentials, or cache TTL.

I usually read a specific file (e.g. hpam.ini, rhn.ini); however, if we have multiple sources, we may want to read everything inventory-related from a single file with different sections, and standardize things like the cache TTL, etc.
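
For example, a single inventory config file with one section per source, read with the standard library parser (the file name, section names, and keys here are made up for illustration):

    # Sketch: shared inventory config, e.g. ~/.ansible/inventory.ini with
    # one section per source. Section and key names are illustrative only.
    #
    #   [rhn]
    #   server = satellite.example.com
    #   username = apiuser
    #   password = apipassword
    #   cache_ttl = 14400
    import ConfigParser  # configparser on Python 3
    import os

    def read_source_config(section, path="~/.ansible/inventory.ini"):
        parser = ConfigParser.SafeConfigParser({"cache_ttl": "14400"})
        parser.read(os.path.expanduser(path))
        return dict(parser.items(section))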

Someone was nice enough to send a contribution to the project, so pull requests should be sent to Ansible’s main project.

There is no reason to have a separate repo for this when we already have lots of great inventory modules in core.

You should submit upgrades to this one:

https://github.com/ansible/ansible/blob/devel/plugins/inventory/spacewalk.py

The difference is that the original inventory script does not require any configuration (as it uses spacewalk-report), whereas if I added my stuff to it, it would require configuration directives to work.

So it would break existing users.