Filtering a complex list of dictionaries into another list of dictionaries in an efficient way

ansible 2.9.4

Let’s assume the following list of dictionaries:

`
list:
  - key1:
      - 'abc'
      - 'def'
    key2: 'ghi'
    key3: 'jkl'
  - key1:
      - 'mno'
      - 'pqr'
    key3: 'stu'
    key4: 'dfg'
  - key1:
      - 'vwx'
      - 'yza'
    key3: 'okl'
    key4: 'azel'
`

The goal is threefold:

  • to extract only the records which match a regex criterion on one of the keys, for instance if “key3” contains a ‘k’.
  • to keep only a selection of keys, for instance only “key1”, “key3” and “key4”, which may not always be present
  • to be as efficient as possible with thousands of records

In this case, we expect the resulting list:

`
result_list:
  - key1:
      - 'abc'
      - 'def'
    key3: 'jkl'
  - key1:
      - 'vwx'
      - 'yza'
    key3: 'okl'
    key4: 'azel'
`

The difficulty I’m experiencing concerns “key1” as a list.

My solution works without considering “key1”:

`

- name: Filtering a complex list of dictionaries into another list of dictionaries
  vars:
    "list": [
        {
          "key1": [
            "abc",
            "def"
          ],
          "key2": "ghi",
          "key3": "jkl"
        },
        {
          "key1": [
            "mno",
            "pqr"
          ],
          "key3": "stu",
          "key4": "dfg"
        },
        {
          "key1": [
            "vwx",
            "yza"
          ],
          "key3": "okl",
          "key4": "azel"
        }
      ]
  set_fact:
    result_list: "{{ result_list|default([]) + [ {'key3': item.key3, 'key4': item.key4|default('')} ] }}"
  loop: "{{ list }}"
  when: item.key3 | regex_search('(^.*k.*$)')
`

It leads to:

`
TASK [yang : Filtering a complex list of dictionaries into another list of dictionaries] *********************************************************************
task path: test.yml:133
<172.16.136.116> attempting to start connection
<172.16.136.116> using connection plugin network_cli
<172.16.136.116> found existing local domain socket, using it!
<172.16.136.116> updating play_context for connection
<172.16.136.116>
<172.16.136.116> local domain socket path is .ansible/pc/521e859c25
ok: [TEST] => (item={'key1': ['abc', 'def'], 'key2': 'ghi', 'key3': 'jkl'}) => {
    "ansible_facts": {
        "result_list": [
            {
                "key3": "jkl",
                "key4": ""
            }
        ]
    },
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "abc",
            "def"
        ],
        "key2": "ghi",
        "key3": "jkl"
    }
}
skipping: [TEST] => (item={'key1': ['mno', 'pqr'], 'key3': 'stu', 'key4': 'dfg'}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "mno",
            "pqr"
        ],
        "key3": "stu",
        "key4": "dfg"
    },
    "skip_reason": "Conditional result was False"
}
ok: [TEST] => (item={'key1': ['vwx', 'yza'], 'key3': 'okl', 'key4': 'azel'}) => {
    "ansible_facts": {
        "result_list": [
            {
                "key3": "jkl",
                "key4": ""
            },
            {
                "key3": "okl",
                "key4": "azel"
            }
        ]
    },
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "key1": [
            "vwx",
            "yza"
        ],
        "key3": "okl",
        "key4": "azel"
    }
}

`

How can we insert “key1” in the picture?

Also, when the list contains thousands of records, it may be less compute intensive to use **json_query**, but I don’t know how to use it in this context.

The task below does the job:

    - set_fact:
        result_list: "{{ result_list|
                         default([]) + [
                         dict(keys|
                              zip(keys|
                                  map('extract', item)|
                                  list))] }}"
      vars:
        keys: "{{ ['key1', 'key3', 'key4']|
                  intersect(item.keys()|list) }}"
      loop: "{{ list|
                selectattr('key3', 'regex', '^.*k.*$')|
                list }}"
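
For readers less familiar with the Jinja2 filters used above, here is a rough plain-Python sketch of what the pipeline computes for each record. It is only an illustration of the logic, not how Ansible evaluates the template internally; the names `records` and `wanted` are made up for the example.

```python
import re

# Sample data from the question (the name "records" is hypothetical)
records = [
    {"key1": ["abc", "def"], "key2": "ghi", "key3": "jkl"},
    {"key1": ["mno", "pqr"], "key3": "stu", "key4": "dfg"},
    {"key1": ["vwx", "yza"], "key3": "okl", "key4": "azel"},
]
wanted = ["key1", "key3", "key4"]

result_list = []
for item in records:
    # selectattr('key3', 'regex', '^.*k.*$'): keep records whose key3 matches
    if not re.search(r"^.*k.*$", item["key3"]):
        continue
    # intersect(item.keys()): the wanted keys that this record actually has
    keys = [k for k in wanted if k in item]
    # map('extract', item): look up each of those keys in the record
    values = [item[k] for k in keys]
    # dict(keys | zip(values)): rebuild a smaller dictionary
    result_list.append(dict(zip(keys, values)))

print(result_list)
```

The intersect step is what makes an absent "key4" harmless: a key that is missing from a record is simply never looked up.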

I don't think json_query is of much use here. If you have any benchmarks
on large data sets, I'd be interested to see them. Thank you.

HTH,

  -vlado

A custom filter should improve the efficiency. For example, a filter that
selects a list of keys from a dictionary:

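  $ cat filter_plugins/dict_utils.py
  def dict_select_list(d, l):
      """Return a new dictionary holding only the keys of d listed in l."""
      d2 = {}
      for k in l:
          # Every key in l must exist in d, otherwise KeyError is raised
          d2[k] = d[k]
      return d2

  class FilterModule(object):
      """Make dict_select_list available as a Jinja2 filter."""

      def filters(self):
          return {
              'dict_select_list': dict_select_list
          }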

The task below gives the same result

    - set_fact:
        result_list: "{{ result_list|
                         default([]) + [
                         item|dict_select_list(keys)] }}"
      vars:
        keys: "{{ ['key1', 'key3', 'key4']|
                  intersect(item.keys()|list) }}"
      loop: "{{ list|
                selectattr('key3', 'regex', '^.*k.*$')|
                list }}"
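
Going one step further, the regex selection and the key selection could probably be folded into a single filter, so that the whole list is processed in one template expression instead of one set_fact iteration per record. The following is only a sketch and has not been benchmarked; the filter name `select_records` and its parameters are hypothetical.

```python
import re


def select_records(records, attr, pattern, keys):
    """Keep records whose `attr` matches `pattern`, reduced to `keys`."""
    rx = re.compile(pattern)  # compile once, reuse for every record
    return [
        # copy only the requested keys that this record actually has
        {k: rec[k] for k in keys if k in rec}
        for rec in records
        # records without `attr` are dropped rather than raising an error
        if attr in rec and rx.search(str(rec[attr]))
    ]


class FilterModule(object):
    """Expose select_records as a Jinja2 filter (hypothetical name)."""

    def filters(self):
        return {"select_records": select_records}
```

Assuming the file sits in filter_plugins next to the playbook, a single expression such as `{{ list|select_records('key3', 'k', ['key1', 'key3', 'key4']) }}` should then produce the expected list without a loop.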

HTH,

  -vlado

Fantastic!

My real use case is a little more complex:

  1. there are other attributes, either lists like “key1” or lists of dictionaries like “list” itself (nested within the top-level “list”), which must be taken into account
  2. the regex filter on “key3” is a little more complex (several patterns combined with a logical OR)

I tried to implement these 2 points by expanding your solution, and it works beautifully, despite the fact that some of the new keys are themselves lists of dictionaries instead of simple lists or strings.
I used something like:

`

vars:
  keys: "{{ ['key1', 'key3', 'key4', 'key5', 'key6']|
            intersect(item.keys()|list) }}"

loop: "{{ list|
          selectattr('key3', 'regex', 'regex1|regex2|regex3')|
          list }}"

`

Finally, is there any online documentation that you would recommend for learning the tools needed to write such filters (the Ansible documentation is very sparse on that subject)?

For instance, if I need to add another constraint, such as another attribute that must match another regex alongside “key3” (as a logical AND), I have no clue how to do it.

It's possible to extend the pipe. Each additional selectattr narrows the previous result, which gives a logical AND. For example:

       loop: "{{ list|
                 selectattr('key3', 'regex', 'regex1|regex2|regex3')|
                 selectattr('key4', 'defined')|
                 list }}"
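
To make the OR/AND distinction concrete, here is a small plain-Python sketch of the same idea: alternatives inside one pattern behave as OR, while each further condition in the chain behaves as AND. The patterns and the sample data are made up for the illustration.

```python
import re

# Hypothetical sample records; key4 is not always present
records = [
    {"key3": "jkl"},
    {"key3": "stu", "key4": "dfg"},
    {"key3": "okl", "key4": "azel"},
]

# OR: one pattern with alternatives, like 'regex1|regex2|regex3'
key3_rx = re.compile(r"k|^s")
# AND: a second condition on another attribute (made-up pattern)
key4_rx = re.compile(r"^a")

selected = [
    r for r in records
    if key3_rx.search(r["key3"])      # like the first selectattr
    and "key4" in r                   # like selectattr('key4', 'defined')
    and key4_rx.search(r["key4"])     # like one more selectattr on key4
]

print(selected)
```

Checking that "key4" is defined before matching against it mirrors the order of the pipe above, so records without that attribute are dropped before the second regex is applied.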

HTH,

  -vlado