How to extract a value of a field within a list of deeply nested dictionaries when some keys match other values

Let’s assume the following fictional dictionary with nested lists, dictionaries, and strings, of arbitrary depth.:
domain_definition.yml

---

domain_definition:
- name: server01
cluster: cluster0
port: '8070'
- name: server11
cluster: cluster1
port: '8080'
workstations:
- name: debian1
revision: '2016-06-21'
url: http://debian1.example.com:9911/2016-06-21
- name: ubuntu1
revision: '2017-08-16'
url: http://ubuntu1.example.com:9912/2017-08-16
- name: server12
cluster: cluster2
port: '8090'
workstations:
- name: debian2
revision: '2018-04-15'
url: http://debian2.example.com:9921/2018-04-15
- name: ubuntu2
revision: '2020-01-12'
url: http://ubuntu2.example.com:9922/2020-01-12

The goal is to extract as efficiently as possible (the original file can be quite large) a workstation matching:

  • a workstation os with its : for instance ubuntu1 ↔ 2017-08-16
  • only a workstation os : for instance debian2
    As you can see from example above, the key is not always set.

A major constraint is to avoid “json_query” as it cannot deal properly with keys containing a colon (as it may happen in my real context), last time I checked

I unsuccessfully tried different tasks, including the following:


- name: Extracting a value of a field deep within a list of complex dictionaries when some keys match other values
hosts:
- localhost
strategy: debug
tasks:
- name: Including domain_definition
include_vars: "domain_definition.yml"

- name: Extracting the OS URL when <os_revision> is known
vars:
- os_name: ubuntu1
- os_revision: 2017-08-16
debug:
msg:
"os_url: {{ domain_definition |
selectattr('workstations') |
list |
selectattr('name', 'eq', os_name) |
list |
selectattr('revision', 'eq', os_revision) |
map(attribute= 'url') |
first }}"
ignore_errors: yes

- name: Extracting the OS URL when <os_revision> is known
vars:
- os_name: ubuntu1
- os_revision: 2017-08-16
debug:
msg:
"os_url: {{ domain_definition |
selectattr('workstations.name', 'eq', os_name) |
list |
selectattr('workstations.revision', 'eq', os_revision) |
map(attribute= 'workstations.url') |
first }}"
ignore_errors: yes

- name: Extracting the OS URL when <os_revision> is unknown
vars:
- os_name: debian2
debug:
msg:
"os_url: {{ domain_definition |
selectattr('workstations') |
list |
selectattr('name', 'eq', os_name) |
map(attribute= 'url') |
first }}"
ignore_errors: yes

- name: Extracting the OS URL when <os_revision> is unknown
vars:
- os_name: debian2
debug:
msg:
"os_url: {{ domain_definition |
selectattr('workstations.name', 'eq', os_name) |
map(attribute= 'workstations.url') |
first }}"
ignore_errors: yes

I always get the error msg: ‘dict object’ has no attribute ‘workstations’.

Any suggestion?

Add an empty list if the attribute is missing. For example

    - include_vars:
        file: domain_definition.yml
        name: domain_definition_raw
    - set_fact:
        domain_definition: "{{ domain_definition|default() +
                               [my_item] }}"
      loop: "{{ domain_definition_raw.domain_definition }}"
      vars:
        empty_workstations:
          workstations:
        my_item: "{{ (item.workstations is defined)|
                     ternary(item, item|
                     combine(empty_workstations)) }}"
    - debug:
        var: domain_definition

give

  domain_definition:
  - cluster: cluster0
    name: server01
    port: '8070'
    workstations:
  - cluster: cluster1
    name: server11
    port: '8080'
    workstations:
    - name: debian1
      revision: '2016-06-21'
      url: http://debian1.example.com:9911/2016-06-21
    - name: ubuntu1
      revision: '2017-08-16'
      url: http://ubuntu1.example.com:9912/2017-08-16
  - cluster: cluster2
    name: server12
    port: '8090'
    workstations:
    - name: debian2
      revision: '2018-04-15'
      url: http://debian2.example.com:9921/2018-04-15
    - name: ubuntu2
      revision: '2020-01-12'
      url: http://ubuntu2.example.com:9922/2020-01-12

This should solve the problems.

Thanks for your answer Vladimir.
Now, there is a ‘workstations’ attribute all the time.
However, I still cannot get ‘os_url’.
The following includes your few lines:


- name: Extracting a value of a field deep within a list of complex dictionaries when some keys match other values
hosts:
- localhost
strategy: debug
tasks:
- include_vars:
file: domain_definition.yml
name: domain_definition_raw

- set_fact:
domain_definition: "{{ domain_definition|default([]) + [my_item] }}"
loop: "{{ domain_definition_raw.domain_definition }}"
vars:
empty_workstations:
workstations: []
my_item: "{{ (item.workstations is defined)|
ternary(item, item|combine(empty_workstations)) }}"

- name: (1) Extracting the OS URL when <os_revision> is known

vars:
- os_name: ubuntu1
- os_revision: 2017-08-16
debug:
msg:
"os_url: {{ domain_definition |
selectattr('workstations') |
list |
selectattr('name', 'eq', os_name) |
list |
selectattr('revision', 'eq', os_revision) |
map(attribute= 'url') |
first }}"
ignore_errors: yes

leads to:
“The task includes an option with an undefined variable. The error was: No first item, sequence was empty…”

If I try to list only the workstations dictionaries with:


- name: (0) Listing all workstations
debug:
msg:
"workstations: {{ domain_definition |
selectattr('workstations') |
list }}"
ignore_errors: yes

I get:

ok: [localhost] => {
“msg”: “workstations: [{‘name’: ‘server11’, ‘cluster’: ‘cluster1’, ‘port’: ‘8080’, ‘workstations’: [{‘name’: ‘debian1’, ‘revision’: ‘2016-06-21’, ‘url’: ‘http://debian1.example.com:9911/2016-06-21’}, {‘name’: ‘ubuntu1’, ‘revision’: ‘2017-08-16’, ‘url’: ‘http://ubuntu1.example.com:9912/2017-08-16’}]}, {‘name’: ‘server12’, ‘cluster’: ‘cluster2’, ‘port’: ‘8090’, ‘workstations’: [{‘name’: ‘debian2’, ‘revision’: ‘2018-04-15’, ‘url’: ‘http://debian2.example.com:9921/2018-04-15’}, {‘name’: ‘ubuntu2’, ‘revision’: ‘2020-01-12’, ‘url’: ‘http://ubuntu2.example.com:9922/2020-01-12’}]}]”
}

On top of that, even if I were able to extract only the ‘workstations’ dictionaries instead of the whole structure, there would also probably be an issue with the empty one(s) since I would not be able to access their non-existent nested attributes (name, version and url).

Regarding my last remark, I probably should rather define the empty workstations as:


empty_workstations:
workstations:
- name:
revision:
url:

What are your thoughts?

It's up to you. If you need default values put them into the
empty_workstation, or better in this case, into the
default_workstations. For example

                 default_workstations:
                   workstations:
                     - name: name_default
                       revision: revision_default
                       url: url_default

Otherwise, you might want to simply skip the empty lists. For example

    - debug:
        msg: "{{ item.name }}:
              {{ item.workstations|
                 selectattr('name', 'eq', os_name)|
                 selectattr('revision', 'eq', os_revision)|
                 map(attribute= 'url')|
                 list }}"
      loop: "{{ domain_definition }}"
      when: item.workstations|length > 0
      vars:
        os_name: ubuntu1
        os_revision: '2017-08-16'

There are 2 issues:.

  1. The loop can become costly as the number of entries rise: I have files with hundreds (potentially thousands) of servers/workstations.
  2. it also results in as many answers as the number of entries, although only one is non empty. I don’t know how to set a variable with the single right answer.

- debug:
msg: "{{ item.workstations|
selectattr('name', 'eq', os_name)|
selectattr('revision', 'eq', os_revision)|
map(attribute= 'url')|
list }}"
loop: "{{ domain_definition }}"
when: item.workstations|length > 0
vars:
os_name: ubuntu1
os_revision: '2017-08-16'

leads to:


ok: [localhost] => (item={'name': 'server01', 'cluster': 'cluster0', 'port': '8070', 'workstations': [{'name': 'name_default', 'revision': 'revision_default', 'url': 'url_default'}]}) => {
"msg": []
}
ok: [localhost] => (item={'name': 'server11', 'cluster': 'cluster1', 'port': '8080', 'workstations': [{'name': 'debian1', 'revision': '2016-06-21', 'url': 'http://debian1.example.com:9911/2016-06-21'}, {'name': 'ubuntu1', 'revision': '2017-08-16', 'url': 'http://ubuntu1.example.com:9912/2017-08-16'}]}) => {
"msg": [
"http://ubuntu1.example.com:9912/2017-08-16"
]
}
ok: [localhost] => (item={'name': 'server12', 'cluster': 'cluster2', 'port': '8090', 'workstations': [{'name': 'debian2', 'revision': '2018-04-15', 'url': 'http://debian2.example.com:9921/2018-04-15'}, {'name': 'ubuntu2', 'revision': '2020-01-12', 'url': 'http://ubuntu2.example.com:9922/2020-01-12'}]}) => {
"msg": []
}

Isn’t there a loopless solution?

        - debug:
                msg: "{{ item.workstations|
                           selectattr('name', 'eq', os_name)|
                           selectattr('revision', 'eq', os_revision)|
                           map(attribute= 'url')|
                           list }}"
          loop: "{{ domain_definition }}"
          when: item.workstations|length > 0
          vars:
                os_name: ubuntu1
                os_revision: '2017-08-16'

Isn't there a loopless solution?

Next option is a custom filter. For example,

cat filter_plugins/custom_select_url.py

def custom_select_url(l, os_name, os_revision):
    l1 =
    for i in l:
         if 'workstations' in i.keys():
             for j in i['workstations']:
                 if (j['name'] == os_name) and \
                    (j['revision'] == os_revision):
                     l1.append(j['url'])
    return l1

class FilterModule(object):

        def filters(self):
            return {
                'custom_select_url': custom_select_url
                }

and the playbook

cat pb.yml

- hosts: localhost
  tasks:
    - include_vars: domain_definition.yml
    - debug:
        msg: "{{ domain_definition|
                 custom_select_url('ubuntu1', '2017-08-16') }}"

give

  msg:
  - http://ubuntu1.example.com:9912/2017-08-16