Refining hash variables

Hi all,

It seems that it is not possible to refine (change, add) variables in a
hash without redefining everything.

For instance, let's say we're managing sshd config file, and we have
this in 'group_vars/all' :

  ssh:
    allow_users: "user1"
    x11_forwarding: "no"

Now, on a specific machine, we want to allow another user in. So we add
this in 'group_vars/somegroup' :

  ssh:
    allow_users: "user1 user2"

In the end, the resulting hash will be :

  ssh:
    allow_users: "user1 user2"

Since the 'x11_forwarding' key is lost, it seems hash keys are not
cumulative and the hash is completely redefined, ending in a "last one
to speak wins" scenario.
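
To make the behaviour described above concrete, here is a minimal Python sketch. It is only a model of "later definition replaces earlier one", not Ansible's actual code, and the function name `resolve_by_replacement` is made up for illustration.

```python
# Hypothetical model: each vars file is a plain dict, listed from lowest
# to highest precedence. This is NOT Ansible's implementation.
group_vars_all = {"ssh": {"allow_users": "user1", "x11_forwarding": "no"}}
group_vars_somegroup = {"ssh": {"allow_users": "user1 user2"}}


def resolve_by_replacement(*sources):
    """Combine sources in order; a later top-level key replaces an earlier one."""
    resolved = {}
    for source in sources:
        # dict.update is shallow: the whole 'ssh' value is swapped out,
        # so keys that only exist in the earlier hash disappear.
        resolved.update(source)
    return resolved


result = resolve_by_replacement(group_vars_all, group_vars_somegroup)
print(result)  # {'ssh': {'allow_users': 'user1 user2'}}
```

The 'x11_forwarding' key is gone because the top-level 'ssh' value is replaced as a unit rather than combined key by key.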

While this is what's expected reading the documentation
(http://ansible.cc/docs/playbooks2.html#understanding-variable-precedence),
it could be very interesting to allow key refining, instead of
redefining everything.

There are many use cases for this. The above is one example, and you can
think of many more. In fact, the way I use Ansible (I use hashes to store
configuration-specific variables so I don't end up with variables
scattered everywhere), every service could benefit from this.

What do you think about this? How do you guys handle configuration
refinements?

Thanks,

M

Ansible set out explicitly to not be a programming language, and I'm
holding it to that.

As I've suggested before, if you want this kind of behavior, this is a
great reason to write an external inventory source, as you can merge
variables any way you like.
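
For readers wondering what such an external inventory source might look like, here is a rough Python sketch. It assumes the usual convention that the script is called with `--list` or `--host <name>` and prints JSON. The data, the group ordering, and the one-level merge policy are all invented for illustration; a real script would load its data from wherever you keep it.

```python
#!/usr/bin/env python
import json
import sys

# Hypothetical data, hard-coded only to keep the sketch self-contained.
GROUPS = {"all": ["host1", "host2"], "somegroup": ["host2"]}
GROUP_VARS = {
    "all": {"ssh": {"allow_users": "user1", "x11_forwarding": "no"}},
    "somegroup": {"ssh": {"allow_users": "user1 user2"}},
}
# Assumed precedence: least specific group first.
GROUP_ORDER = ["all", "somegroup"]


def host_vars(hostname):
    """Combine group vars for a host, merging hashes one level deep."""
    merged = {}
    for group in GROUP_ORDER:
        if hostname not in GROUPS[group]:
            continue
        for name, value in GROUP_VARS[group].items():
            if isinstance(value, dict) and isinstance(merged.get(name), dict):
                # Build a new dict so the source data is never mutated.
                merged[name] = {**merged[name], **value}
            else:
                merged[name] = value
    return merged


def main(argv):
    if argv[:1] == ["--list"]:
        print(json.dumps(GROUPS))
    elif len(argv) == 2 and argv[0] == "--host":
        print(json.dumps(host_vars(argv[1])))
    else:
        print(json.dumps({}))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The point of the sketch is that the merge policy lives in the script, so the playbooks themselves keep the plain override semantics.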

This kind of hardline attitude is really off-putting to me, Michael. I'm sympathetic to trying
to avoid the hazard of becoming a programming language, but there are already variables, and the
limitation that we have only very few tools to -define- register and munge them is a really
harsh experience for many people: there's something in your system, and it suggests itself
well, and it's crippled, hideously crippled, and people keep wanting to use it for more, and
some of them make it here to post about it.

Certainly diving lower and making inventory providers is a viable option for many use cases,
but it's an awful lot to ask of a newcomer to the environment if they want anything more
than the most trivial variables.

I really hope Ansible can find a less hardline approach that is more welcoming to use and
modification of variables, both w/r/t this specific issue, & more generally so.

-rektide de la faye

> This kind of hardline attitude is really off-putting to me, Michael. I'm sympathetic to trying
> to avoid the hazard of becoming a programming language, but there are already variables, and the
> limitation that we have only very few tools to -define- register and munge them is a really
> harsh experience for many people: there's something in your system, and it suggests itself
> well, and it's crippled, hideously crippled, and people keep wanting to use it for more, and
> some of them make it here to post about it.

Ansible meets my aesthetic; it always has.

It's a direct reaction to people making configuration management
content too hard to read, edit, and work on.

I see this every day.

> Certainly diving lower and making inventory providers is a viable option for many use cases,
> but it's an awful lot to ask of a newcomer to the environment if they want anything more
> than the most trivial variables.

Newcomers should not be doing things that complex. What is the
actual real-world use case, as opposed to the way you think you would
implement it?
There are probably plenty of ways to solve it now.

Templates are clearly one of them.

When I look at what I want to deploy in a Linux setup, the thought "I
clearly need hashtable merging!" is not one that immediately comes up.

> It's a direct reaction to people making configuration management
> content too hard to read, edit, and work on.
> I see this every day.

Michael,

I really don't understand why this is such a big deal.
It's just a matter of being able to reorganize stuff like:

php_var1: value
php_var2: othervalue
...dozen of other variables

into this :

php:
  var1: value
  var2: othervalue
  ...dozen of other keys

I don't see how this is harder to "read, edit, work on" than the first
form (promoted in "selected playbooks" on Ansible website).

It seems to me that a hash is much cleaner than the first form, more
readable, and much easier to work with. You know your php playbook
variables are in the php hash. Period. No need to chase many variables
scattered all over the place.
You can even loop over your hash if you need to (for debugging for
instance).

I suppose that since Ansible gathered facts are organized in a hash,
there are good reasons. I don't see why those good reasons would not
apply to the vars section.

> Certainly diving lower and making inventory providers is a viable option for many use cases,
> but it's an awful lot to ask of a newcomer to the environment if they want anything more
> than the most trivial variables.

Agreed. There is quite a lot to do already without diving into some
kind of inventory generation that will involve quite some
programming, and will involve cross-tool (inventory generator, Ansible,
...) bug chasing.

> Newcomers should not be doing things that complex.

Again I don't see why this is more complex than the flattened version.

> What is the
> actual real world use case, not the actual way you think you would
> implement it?

I see many. But I might be using bad solutions. I don't know. It just
makes sense to me to do it like that.

Just a quick example. Let's say I want to deploy iptables rules on
machines. The baseline setup just filters out everything. I usually
group stuff in specific chains in the filter table (TCP_IN, UDP_OUT,
ICMP_IN, etc..).
I want to have the mainstream setup on all hosts, but need to handle a few
exceptions for some hosts: rules file locations, default policies on
chains, etc.

Yes, I could have several templates for each situation, but it would
repeat a lot of stuff, and changing the baseline configuration would
involve changing many files. So to stay in line with the "DRY"
principle, I'd rather do it differently.

What I would do if hash refining worked is this:

group_vars/all:

>> It's a direct reaction to people making configuration management
>> content too hard to read, edit, and work on.
>> I see this every day.
>
> Michael,
>
> I really don't understand why this is such a big deal.
> It's just a matter of being able to reorganize stuff like:
>
> php_var1: value
> php_var2: othervalue
> ...dozen of other variables

Well, for one, right here, you've just invented a syntax where "_" is
meaningful beyond what most people would expect.

It goes against the principle of 'least surprise'.

> into this :
>
> php:
>   var1: value
>   var2: othervalue
>   ...dozen of other keys
>
> I don't see how this is harder to "read, edit, work on" than the first
> form (promoted in "selected playbooks" on Ansible website).
>
> It seems to me that a hash is much cleaner than the first form, more
> readable, and much easier to work with. You know your php playbook
> variables are in the php hash. Period. No need to chase many variables
> scattered all over the place.
> You can even loop over your hash if you need to (for debugging for
> instance).

You can already loop over $hostvars[$ansible_hostname] to get all of that.

You can also already put variables in hashes.

What you can't do is take two hashes (variables of different
precedences) and arbitrarily merge the keys together.

Most folks would expect those variables would override each other
completely, and for those that wouldn't, it can't do both :)

> Again I don't see why this is more complex than the flattened version.

The need to merge two hashes is not something that comes up for
newcomers at all.

Being able to store things in hashes? They can already do that.

> Just a quick example. Let's say I want to deploy iptables rules on
> machines. The baseline setup just filters out everything. I usually
> group stuff in specific chains in the filter table (TCP_IN, UDP_OUT,
> ICMP_IN, etc..).
> I want to have the mainstream setup on all hosts, but need to handle few
> exceptions for some hosts: rules files location, default policies on
> chains, etc...
>
> Yes, I could have several templates for each situation, but it would
> repeat a lot of stuff, and changing the baseline configuration would
> involve changing many files. So to stay in line with the "DRY"
> principle, I'd rather do it differently.

It's quite simple really, you can have a couple of booleans in your
template about which services to enable.

This raises an interesting point, I think, in the difference in how Michael
views the project and how other people do. Is ansible primarily for
newcomers or for experienced users? The fact that newcomers don't need to
merge hashes is to me irrelevant as hypothetical newcomers don't remain
newcomers. The entire field of configuration management is, realistically,
for people with prior programming experience or at least the ability to
gain that experience relatively quickly.

Michael, is it your intent that ansible primarily targets newcomers over
experienced people?

I only ask because I have a specific use case in which I define a base hash
and then merge in additional (sometimes deep) layers into the hash as my
environments and other metadata get more and more specific. (I did this in
Puppet and each module can inject additional bits to the hash and then a
template renders it to json at the end for django to consume). This was a
real world use case where I wouldn't want to have to include the entire
hash when I want to change a single key 4 levels deep (which I have to do).

>>> Again I don't see why this is more complex than the flattened version.
>>
>> The need to merge two hashes is not something that comes up for
>> newcomers at all.
>>
>> Being able to store things in hashes? They can already do that.
>
> This raises an interesting point, I think, in the difference in how Michael
> views the project and how other people do. Is ansible primarily for
> newcomers or for experienced users? The fact that newcomers don't need to
> merge hashes is to me irrelevant as hypothetical newcomers don't remain
> newcomers. The entire field of configuration management is, realistically,
> for people with prior programming experience or at least the ability to gain
> that experience relatively quickly.

Let's not over analyze the argument -- most advanced users don't need
to merge hashes either.

> Michael, is it your intent that ansible primarily targets newcomers over
> experienced people?

No, absolutely not. I believe most crazy complex CM setups in other
tools are the result of either users who have become enamored with
using *all* of a tool, or a tool that has allowed itself to grow
too large by accepting all user requests without paying attention to
the learning curve it is creating. Such tools grow without paying
attention to the aesthetic of the language and ultimately feel
non-cohesive in the end.

So, in that regard, yes, it's a balancing act -- but there is nothing
you can't really manage in Ansible, but how you do it will be affected
by how Ansible wants you to do it.

As users fully ansible-ize their entire configurations, though, their
ansible content should remain simple and accessible to everyone, not
trend towards becoming its own monster that requires tremendous hours
of input to learn, debug, and maintain.

More importantly, it must be readable and nearly obvious to everyone,
so that when people less familiar with Ansible read playbook content,
they can understand what it is going to do.

There will never be a spaceship operator or triple equals or anything like that.

In other words, there is no simple or advanced user -- as Ansible
doesn't want to be 'hard' at any level.

> I only ask because I have a specific use case in which I define a base hash
> and then merge in additional (sometimes deep) layers into the hash as my
> environments and other metadata get more and more specific. (I did this in
> Puppet and each module can inject additional bits to the hash and then a
> template renders it to json at the end for django to consume). This was a
> real world use case where I wouldn't want to have to include the entire hash
> when I want to change a single key 4 levels deep (which I have to do).

Sure.

I object to changing current behavior or introducing syntax that is
not immediately intuitive.

I also object to absolute arguments that lack of this feature is the
end of the world :)

So far I've heard "change the way it works now so hashes don't
override", and "let's make underscores mean something different than
they mean now", both of which are non starters.

Propose a syntax where this doesn't look horrible or change existing
behavior and we can possibly consider things, assuming you're also
willing to write the patch and tests for it, and there is no
regression on existing behavior.

Though really, if you look at the use case in different ways, and at the
end result you want to achieve on the system, what you want to do can
probably be modelled differently so that it doesn't need this either.

This is caused by the way we reassign vars rather than merging them. Deep-merging nested data structures is not something Python does out of the box (many have written their own).

I force rewriting the whole hash on overrides, but it doesn't work well for large ones. We could add deep merge to vars processing, but it is a non-trivial change.
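
For reference, one of the many home-grown deep merges might look roughly like this in Python. It is a sketch only: `deep_merge` is a made-up name, it is not what Ansible's vars processing does, and it assumes that only dicts are merged while lists and scalars are replaced.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where nested dicts are merged recursively.

    Assumption for this sketch: anything that is not a dict on both
    sides (scalars, lists) is simply replaced by the override value.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical data: change one key several levels deep.
base = {"app": {"db": {"primary": {"host": "db1", "port": 5432}}, "debug": False}}
override = {"app": {"db": {"primary": {"host": "db2"}}}}

merged = deep_merge(base, override)
print(merged)
```

Only `host` changes in the result; `port` and `debug` survive, and `base` itself is left untouched because the function works on copies.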

It's somewhat trivial. Cobbler did it. The issue is I like the
current behavior, and we don't break backwards compat.

Making it a global option switchable only in the config file *could*
be done, and then I wouldn't be too worried about compatibility or
syntax.

Just to throw something out there: YAML has a built-in syntax for merging hashes: http://yaml.org/type/merge.html

(I’m new to Ansible and I don’t know how playbooks get loaded yet; it might not be possible to apply this usefully.)

Brendan,

I actually use that feature (mostly for repetitive configs like DB
connection settings, etc) but it only works within the same document.

Ansible is doing the merge across several documents and in python.

I'm a big fan of YAML 1.0, and knew 1.1 contained some evil, but I had
no idea this horror existed.

I recommend people not use that, and this is really about merging
variables at different depths anyway.

Michael,

It currently works with ansible (within a single doc), I used it in my
own config system and was happy to see I didn't need to change
anything for ansible.

I'm not clear what you mean here. It seems to me that the example above is
merely trying to illustrate how one has to name Ansible's variables with the
*existing* syntax and semantics if one wants variables from different sources to
compose without clobbering each other: specifically, in a flat hierarchy, and
any grouping then has to be done with a common prefix.

I don't claim to know what most folks expect, but I know I agree with Michael B:
I intuitively expected nested variables in yaml configs to get merged, and was
surprised and disappointed to discover that in practice they don't, for much the
same reasons. So for me, it feels more like a limitation I need to work around
than a usability and transparency win.

N

ps

A feature of variables I worry about more than allowing merging is unintentional
and silent variable clobbering when variable definitions are combined,
potentially resulting in important things getting the wrong name (usually along
the lines of "${some.no-longer.defined.variable}"). Disallowing nesting doesn't
help prevent that.
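
As a rough illustration of how such silent clobbering could at least be reported, here is a small Python sketch. The helper names `leaf_paths` and `lost_paths` are invented, and nothing like this exists in Ansible as far as this thread shows. It lists the dotted paths that were defined in an earlier source but vanish once a later source redefines the same top-level variable.

```python
def leaf_paths(value, prefix):
    """Return the set of dotted paths to every leaf under 'value'."""
    if isinstance(value, dict) and value:
        paths = set()
        for key, child in value.items():
            paths |= leaf_paths(child, f"{prefix}.{key}")
        return paths
    return {prefix}


def lost_paths(earlier, later):
    """Paths defined in 'earlier' that disappear when 'later' replaces them."""
    lost = set()
    for name in earlier.keys() & later.keys():
        lost |= leaf_paths(earlier[name], name) - leaf_paths(later[name], name)
    return sorted(lost)


# Same data as the ssh example at the top of the thread.
earlier = {"ssh": {"allow_users": "user1", "x11_forwarding": "no"}}
later = {"ssh": {"allow_users": "user1 user2"}}

print(lost_paths(earlier, later))  # ['ssh.x11_forwarding']
```

A check like this could run as a warning step before a play, whichever merge policy is in use.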

Brian,

You mean it works in a single YAML file?
Do you have an example?

Thanks!

M

This is an example from my app servers group vars file, for setting
default db configs that need to be explicit in the app config
templates.

db: &db_defaults
    blocking: True
    init_command: SET NAMES utf8
    autocommit: True
    locale: en_US
    user: appuser
    passwd: apppassword
    maxcached: 25
    maxconnections: 100
    maxrequests: 1000
    timeout: 120

app1db:
   - name: db
     <<: *db_defaults
     db: app1schema
     host: readwrite.tld
     maxrequests: 200
   - name: sessions
     <<: *db_defaults
     db: sessions
     host: sessions.tld
   - name: reports
     <<: *db_defaults
     db: app1schema
     host: reports.tld
     user: readonly
     passwd: readonlypassword
app2db:
   - name: db
     <<: *db_defaults
     db: app2
     user: app2_user
     passwd: app2_passwd
     host: readonly.tld
   - name: analitics
     <<: *db_defaults
     db: ana_schema
     host: reports.tld
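
For anyone unfamiliar with the '<<' merge key used above, its effect is roughly what the following Python sketch does with plain dicts. This is an emulation for illustration, not how a YAML parser is implemented, and `with_defaults` is a made-up helper. The defaults are copied in, and keys written explicitly in the entry win over the defaults.

```python
# A subset of the db_defaults mapping from the example above.
db_defaults = {
    "blocking": True,
    "user": "appuser",
    "passwd": "apppassword",
    "maxrequests": 1000,
    "timeout": 120,
}


def with_defaults(defaults, **explicit):
    """Emulate '<<: *defaults': explicit keys take precedence over defaults."""
    return {**defaults, **explicit}


app1_db = with_defaults(
    db_defaults,
    name="db",
    db="app1schema",
    host="readwrite.tld",
    maxrequests=200,
)

print(app1_db["maxrequests"], app1_db["timeout"])  # 200 120
```

Note that this is a shallow, one-level merge: a nested hash given explicitly would replace the default one wholesale, which is why it does not address the cross-file case discussed in this thread.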

Thanks Brian. I didn't know you could do that with yaml. It can be quite
interesting in some use cases.

For my needs, I think I'll just flatten everything and use plain vars. Not
very clean or handy, but I see no other way.

M