Does anyone use hash_behaviour=merge?

I’d like to say hello to all of you Ansible guys.

I’ve seen a lot of discussion about hash_behaviour=merge being deprecated, and then about holding back on the deprecation…

I’d like to know: is anybody out there still using hash_behaviour=merge?

Is it really bad?

Is it considered bad practice that should be avoided? Is it going to be deprecated in the future? (That would mean significantly rewriting all of our Ansible code.)

My goal is to have a variable (a dictionary), ideally with the same name, partially defined in role defaults and partially overridden from group_vars. For the tasks in the role to work properly, I need to merge the dictionaries from defaults and group_vars. I believe hash_behaviour=merge would do this and would probably be the easiest way to achieve it, but… am I wrong?

Is my approach wrong? Is it something that should never be done? Should the variables in defaults and group_vars always have distinct names and be combined into a “merged” dictionary only inside the role?

Thank you in advance for your ideas and opinions. I’m really looking for a best-practice recommendation: something that will last, is simple and maintainable, and will not require changes in the near future.

Have a great day everyone…

If I understand your question correctly, you basically want to have something like this:

group_vars/all.yml:

---
global_stuff:
  - 'my_first_list'
  - 'bananas'

group_vars/database_servers.yml:

group_stuff:
  - 'mangos'
  - 'bbq'

group_vars/webservers.yml:

other_group_stuff:
  - 'cow'
  - 'chicken'

And then turn it into a single list for each host, like so:

db1:

stuff:
  - 'my_first_list'
  - 'bananas'
  - 'mangos'
  - 'bbq'

web1:

stuff:
  - 'my_first_list'
  - 'bananas'
  - 'cow'
  - 'chicken'

Right?

If that’s the case, you’ll want to take a look at community.general.merge_variables. It works like this:

  1. Make the lists in your configuration, as you did above
  2. Add the following to group_vars/all.yml
stuff: "{{ lookup('community.general.merge_variables', '_stuff', pattern_type='suffix', initial_value=[]) }}"

This will create the stuff list for each host based on the suffix _stuff on all variables for that specific host/group.

The docs for this lookup plugin are here: “community.general.merge_variables lookup – merge variables with a certain suffix” (Ansible Community Documentation).

I personally have never relied on hash_behaviour=merge. It was not really portable, because not many people are aware of it, nor do they adjust ansible.cfg to make use of it.

The merge_variables lookup plugin also did not exist at the time when I needed such functionality. I implemented my own version of merge_variables back then, but opted out of using it to keep my code more portable.

Instead, I went down the path of combining the varnames and vars lookup plugins, which are built in. This gave me a lot of power and flexibility. It looks a little ugly in the code, but I can do all kinds of things beyond simple merging: variable priority, partial sorting of lists, merging both lists and dicts… Also, by explicitly merging vars, you are less likely to accidentally merge vars you did not intend to merge.
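
A minimal sketch of that varnames-plus-vars approach, assuming hypothetical variables that share the prefix myapp_settings_ (the names are made up for illustration and do not come from this thread). The varnames lookup finds the matching names, sorting makes the merge order explicit, the vars lookup resolves them, and combine merges the resulting dicts, with later names winning on conflicting keys. It has not been checked against every ansible-core release:

```yaml
- name: Show the explicitly merged dictionary
  ansible.builtin.debug:
    var: myapp_settings
  vars:
    # Hypothetical naming scheme: myapp_settings_00_defaults, myapp_settings_50_group, ...
    # myapp_settings itself is not matched, because the regex requires the trailing underscore.
    myapp_settings: >-
      {{ query('ansible.builtin.vars',
               *(query('ansible.builtin.varnames', '^myapp_settings_') | sort))
         | ansible.builtin.combine(recursive=true) }}
```

Because the selection regex and the merge order are spelled out in one place, only variables that follow the naming scheme take part in the merge.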

Yes, people use it.

No, you shouldn’t use it. It makes the behaviour of variables even harder to understand, and you can accomplish everything it does (and more) with explicit merging of variables.

We often need the default hash behaviour, so changing the default would cause serious breakage for us.

Like apparently a lot of other folks, we (I) implemented a lookup - ours is called mergevars - that allows us to merge dicts or lists by name and/or regex. This was before community.general.merge_variables was a thing. In fact, it was before collections were a thing. It’s got some, er, “interesting differences” from c.g.merge_variables, but with so much code using it, and as simple as it is, there’s very little incentive to replace it.

In contrast to c.g.merge_variables, mergevars can take a list of variable names and/or in-line data, and it will merge those first. It also accepts a regex, and any existing variable names that match which haven’t already been merged will be merged in lexicographical order. It takes a flag indicating whether to de-duplicate resulting lists, and whether to recursively merge dicts. Anyone is welcome to use it, etc. If you break it, you get to keep the pieces. :slight_smile:

I just noticed a curious difference between how we use mergevars and the way community.general.merge_variables is documented. The latter seems to be oriented toward a common suffix for selecting variables to merge. In practice, we use a common prefix.

For example, in our Linux systems project ./roles/sssd/tasks/main.yml file, we have this variable definition:

    sssd_allow_groups: "{{ lookup('mergevars',
                  'sssd_allow_groups_default',
                  regex='^sssd_allow_groups_') | sort | unique }}"

In ./roles/sssd/defaults/main.yml we define the variable sssd_allow_groups_default which gets merged first. In various ./group_vars/... files we define sssd_allow_groups_hostgroupfoo, sssd_allow_groups_hostgroupbar, sssd_allow_groups_hostgroupbaz, etc., and similarly in some ./host_vars/... files.

This is just convention of course; one could match on either suffix or prefix with either mergevars or c.g.merge_variables. It just struck me as odd that we had chosen to use prefix way back in the day. I’m almost certain we never even considered matching on a common suffix. :thinking: Especially since variables specifically relevant to local roles all start with the roles’ names.

Thanks to all of you for taking the time to reply to my post. I really appreciate the opinions and the alternative solutions.

My takeaway: people do use it, but it should not be used.

In the meantime I also came across this comment in ansible.cfg (as generated by ansible-config init --disabled --type all). It is quite clear and fully in line with what you have written here.

# (string) This setting controls how duplicate definitions of dictionary variables (aka hash, map, associative array) are handled in Ansible.
# This does not affect variables whose values are scalars (integers, strings) or arrays.
# **WARNING**, changing this setting is not recommended as this is fragile and makes your content (plays, roles, collections) nonportable, leading to continual confusion and misuse. Don't change this setting unless you think you have an absolute need for it.
# We recommend avoiding reusing variable names and relying on the ``combine`` filter and ``vars`` and ``varnames`` lookups to create merged versions of the individual variables. In our experience, this is rarely needed and is a sign that too much complexity has been introduced into the data structures and plays.
# For some uses you can also look into custom vars_plugins to merge on input, even substituting the default ``host_group_vars`` that is in charge of parsing the ``host_vars/`` and ``group_vars/`` directories. Most users of this setting are only interested in inventory scope, but the setting itself affects all sources and makes debugging even harder.
# All playbooks and roles in the official examples repos assume the default for this setting.
# Changing the setting to ``merge`` applies across variable sources, but many sources will internally still overwrite the variables. For example ``include_vars`` will dedupe variables internally before updating Ansible, with 'last defined' overwriting previous definitions in same file.
# The Ansible project recommends you **avoid ``merge`` for new projects.**
# It is the intention of the Ansible developers to eventually deprecate and remove this setting, but it is being kept as some users do heavily rely on it. New projects should **avoid 'merge'**.

So I’m reworking my approach: I’ll try NOT to reuse variable names in various places and NOT to rely on merging mechanisms, as they probably bring unnecessary complexity.

Have a nice day everyone.

Hi, and thanks for the reply.

Just to clarify, I mainly care about merging hashes (dictionaries); I’m not so interested in lists.

The main interest was to have common dictionaries with some key: value pairs defined in role defaults and some defined in group_vars (or host_vars), so that the latter can either extend or override what the role gets from its defaults, giving the role everything it needs.

The problem arises when some of the necessary key: value pairs are not present in group_vars or host_vars. The whole dictionary is replaced by the one with higher priority, and in the end some of the key: value pairs that were present in the role defaults are missing.

That’s exactly the scenario fitting most of our usage: combining hashes for that reason. Sometimes we have a default list of “things” that applies to all hosts, but a few specific hosts or hosts in a host group need a few more, so we use the same technique for lists.

I did some grepping to generate the following list of regexes we use in our Linux system project, just to give a sense of how much we use this technique. Again, this is just one project:

^cups_printerlist_, ^id_group_defs_, ^id_groups_, ^id_local_overrides_, ^id_users_, ^iptable_chains_, ^iptable_rules_, ^limits_conf_, ^limits_files_, ^limits_remove_files_, ^logrotate_scripts_, ^mw_linux_yum_yumconf_, ^nfsmount_mount_defs, ^nfsmount_mounts_, ^nfsmount_server_defs, ^postfix_conf_, ^rsyslog_files_, ^selinux_allow_, ^selinux_boolean_, ^selinux_fcontext_, ^selinux_login_, ^selinux_permissive_, ^selinux_port_, ^selinux_restorecond_, ^selinux_restore_dir_, ^ssh_authorized_keys_ids_, ^sssd_allow_groups_, ^sssd_allow_users_, ^sudoers_configs_, ^sudoers_full_groups_, ^sudoers_full_users_, ^sysctl_settings_, ^uncfile_files_, ^uncfirewalld_, ^yumrepo_setrepo_, ^yumrepo_skipunavilable_

Looks very similar to what we do :slight_smile: but as we used to use leapfrogonline/ansible-merge-vars (GitHub: “An Ansible action plugin to explicitly merge inventory variables”), we’re ‘stuck’ with the __to_merge suffix. OTOH, it’s very clear those variables are shared and cobbled together in some way.

And on the plus side, the lookup plugin made it easier for us to apply it (even for roles that don’t have support for this ‘layered’ approach to setting/overriding/adding variables).

EDIT: Oh, and before I forget, mergevars (the link) also supports merging parts of dicts into one big dict. I haven’t needed it so far and I don’t know if c.g.merge_variables supports it. But looking at the replies in the topic, I think there are enough options around :slight_smile:

Thanks. The problem I see with your approach is that it’s probably mainly useful for lists, whereas I’m looking to merge dictionaries. For example, this is the output I get from community.general.merge_variables:

TASK [ansible : Debug merge] *********************************************************************************************
ok: [ansible-testing-1] => {
    "ansible_common": {
        "home": "/var/lib/ansible",
        "ssh_keys": [
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJA6YzPY5j2lrHD/dbCnnu4mmD5p8lr79omYSBtbWsBh igielskv@icloud.com",
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICPUx75bIDUQquaYStRxMwid4itArEKG8ANWe4P7ATY9 vladislav.igielski@logicworks.cz",
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJA6YzPY5j2lrHD/dbCnnu4mmD5p8lr79omYSBtbWsBh igielskv@icloud.com"
        ],
        "ssh_keys_removed": [],
        "ssh_max_startups": "10:30:100"
    }
}

The problem here is that when a value is defined more than once, it gets merged in multiple times from the different places. Obviously, that’s why you use | sort | unique, but whenever I try this, it seems the merged result is converted into a list and the key: value pairs are stripped off. An outcome like this is not what I’m looking for:

TASK [ansible : Debug merge] *********************************************************************************************
ok: [ansible-testing-1] => {
    "ansible_common": [
        "home",
        "ssh_keys",
        "ssh_keys_removed",
        "ssh_max_startups"
    ]
}
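
Piping a whole dictionary through sort or unique iterates over its keys, which would explain the list of key names above. One possible workaround, sketched here under the assumption that the merged result always contains an ssh_keys list, is to de-duplicate only that list and write it back with combine. The helper name _merged_ansible_common is hypothetical and deliberately does not start with ansible_common_, so the prefix match does not pick it up:

```yaml
- name: Debug merge with de-duplicated ssh_keys
  ansible.builtin.debug:
    var: ansible_common
  vars:
    # Hypothetical helper holding the raw (dict) result of the lookup.
    _merged_ansible_common: "{{ lookup('community.general.merge_variables', 'ansible_common_', pattern_type='prefix') }}"
    # Replace only the ssh_keys entry with its de-duplicated version; other keys are kept as-is.
    ansible_common: >-
      {{ _merged_ansible_common
         | ansible.builtin.combine({'ssh_keys': _merged_ansible_common.ssh_keys | ansible.builtin.unique}) }}
```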

To be more specific about what I’m trying to achieve:

For example, the initial idea was to have:

group_vars/all/main.yml

---
ansible_common:
  ssh_keys:
    - "{{ lookup('file', 'users/personal_ed25519.pub') }}"
    - "{{ lookup('file', 'users/lw_ed25519.pub') }}"

roles/ansible/defaults/main.yml

---
# defaults file for roles/ansible

ansible_common:
  home: /var/lib/ansible
  ssh_keys_removed: []
  ssh_max_startups: 10:30:100

roles/ansible/tasks/main.yml

---
# tasks file for roles/ansible

- name: Debug merge
  ansible.builtin.debug:
    var: ansible_common

with the following output:

TASK [ansible : Debug merge] *********************************************************************************************
ok: [ansible-testing-1] => {
    "ansible_common": {
        "home": "/var/lib/ansible",
        "ssh_keys": [
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJA6YzPY5j2lrHD/dbCnnu4mmD5p8lr79omYSBtbWsBh igielskv@icloud.com",
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICPUx75bIDUQquaYStRxMwid4itArEKG8ANWe4P7ATY9 vladislav.igielski@logicworks.cz"
        ],
        "ssh_keys_removed": [],
        "ssh_max_startups": "10:30:100"
    }
}

Super easy with hash_behaviour=merge, but very difficult with hash_behaviour=replace (the default).

Just use the combine filter like this:

group_vars/all/main.yml

---
ansible_common_specific:
  ssh_keys:
    - "{{ lookup('file', 'users/personal_ed25519.pub') }}"
    - "{{ lookup('file', 'users/lw_ed25519.pub') }}"

roles/ansible/defaults/main.yml

---
# defaults file for roles/ansible

ansible_common_common:
  home: /var/lib/ansible
  ssh_keys_removed: []
  ssh_max_startups: 10:30:100

roles/ansible/tasks/main.yml

---
# tasks file for roles/ansible

- name: Debug merge
  ansible.builtin.debug:
    var: ansible_common
  vars:
    ansible_common: "{{ query('ansible.builtin.vars', ['ansible_common_common', 'ansible_common_specific']) | ansible.builtin.combine(recursive=true) }}"

I see. But if you don’t set hash_behaviour=merge (and, yeah, don’t) then you can’t have multiple identically named variables from different sources contributing values to the final result. One of them - and only one - will “win”, and the others will disappear.

Instead, you’d need to name all the contributing variables uniquely. The convention apparently everyone adopts is either a common suffix or prefix, plus some “uniquifying” bit to prevent them clobbering each other. (Ah, like in @bvitnik’s example just above this post.)

I picked a poor example; that one happened to use a list while we were talking about dicts. But both c.g.merge_variables and mergevars will happily do either. The latter will also de-duplicate lists.

I am curious, though, about how you ended up with the results you showed just above your “To be more specific…” heading. Did you get that from my mergevars plugin, or the community.general.merge_variables plugin, or something else? (You can direct message me if we’re straying too far off-topic, but that does look like a bug that I’d like to fix.)

Yes, I got that weird output from community.general.merge_variables when combined with | sort | unique, like this:

"{{ lookup('community.general.merge_variables', 'ansible_common_', pattern_type='prefix') | sort | unique }}"

But maybe that plugin is not quite ready for that, and what you were talking about before was your mergevars plugin, which might behave slightly differently.

Anyway, when I remove | sort | unique, the output is more or less what I want (I know it might not be unique and might contain dupes if the same thing is defined in more than one place, which should not happen).

But what are you trying to sort here? The order of ansible_common_* variables before they are merged?

sort is maybe not so important but unique would be useful… to make merge a bit more “fail proof”.

Can you elaborate? Variables are always unique because they are overridden. Multiple variables with the same name are collapsed into a single variable according to Ansible’s variable precedence rules. Dict keys are also unique.

But I’m also wondering in your example:

sssd_allow_groups: "{{ lookup('mergevars', 'sssd_allow_groups_default', regex='^sssd_allow_groups_') | sort | unique }}"

Why do you initialize with sssd_allow_groups_default and then merge with regex='^sssd_allow_groups_'? Wouldn’t sssd_allow_groups_default also be matched by the regex? But maybe that’s not a problem in your workflow, or you get rid of the duplicates using unique.

Yes… among the key: value pairs I also have keys whose values are lists (those ssh_keys), and if I accidentally define the same ssh key in group_vars and then again in host_vars, I end up with output like this:

TASK [ansible : Debug merge] *********************************************************************************************
ok: [ansible-testing-1] => {
    "ansible_common": {
        "home": "/var/lib/ansible",
        "ssh_keys": [
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJA6YzPY5j2lrHD/dbCnnu4mmD5p8lr79omYSBtbWsBh igielskv@icloud.com",
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICPUx75bIDUQquaYStRxMwid4itArEKG8ANWe4P7ATY9 vladislav.igielski@logicworks.cz",
            "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJA6YzPY5j2lrHD/dbCnnu4mmD5p8lr79omYSBtbWsBh igielskv@icloud.com"
        ],
        "ssh_keys_removed": [],
        "ssh_max_startups": "10:30:100"
    }
}

You can see that I have the same ssh key twice, coming from two different places, which is not really the desired outcome.
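
If the contributing variables are merged explicitly, the combine filter’s list_merge option is one way to avoid such duplicates. The sketch below assumes three hypothetical, uniquely named variables (ansible_common_defaults, ansible_common_group, ansible_common_host) and ansible-core 2.10 or later, where list_merge is available. With list_merge='append_rp', lists are appended and entries that also occur in the later list are dropped from the earlier one, so a key defined in two layers appears only once:

```yaml
- name: Debug merge without duplicate list entries
  ansible.builtin.debug:
    var: ansible_common
  vars:
    # default({}) lets a layer be absent for a given host.
    # The three variable names are hypothetical, chosen for illustration only.
    ansible_common: >-
      {{ [ansible_common_defaults | default({}),
          ansible_common_group | default({}),
          ansible_common_host | default({})]
         | ansible.builtin.combine(recursive=true, list_merge='append_rp') }}
```

Note that append_rp can change the order of the surviving entries, and it does not remove duplicates that already exist within a single list.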

I see now. Does this mean you have a third variable you have not mentioned? You have one in role defaults and one in group_vars, so the extra third ssh key must be coming from a third variable.