first_found lookup not searching the expected paths

The roles in this long-existing repository often use a concept like…

- name: COPY | copy 20auto-upgrades
  ansible.builtin.copy:
    src: "{{ _item }}" 
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    backup: no
    owner: root
    group: root
    mode: "0644"
  vars:
    _files:
    - "{{ autoupdate_20auto_upgrades | default([]) }}"
    - "{{ topdir }}/files/autoupdate/{{ inventory_hostname_short }}/20auto-upgrades"
    - "{{ topdir }}/files/autoupdate/{{ ansible_hostname }}/20auto-upgrades"
    - "{{ topdir }}/files/autoupdate/{{ host_group | default(False) }}/20auto-upgrades"
    - "20auto-upgrades.{{ ansible_distribution_release }}"
    _item: "{{ lookup('ansible.builtin.first_found', _files) }}"
  when:
    - ansible_distribution == 'Debian'
    - ansible_distribution_major_version is version_compare('9', '>=')
    - _item is ansible.builtin.truthy
  tags: autoupdate

to express the idea of having a configuration file default to roles/ROLENAME/files/FILENAME but overridable by host group, host, or configuration variable.

(They actually use lookup('ansible.builtin.first_found', _files, skip=True, errors='ignore'), but I removed the kwargs for debugging this.)

I was just adding a .trixie file and wanted to test this, and colour my surprise at the task just being skipped. Adding -vvvvv I saw that:

TASK [autoupdate : COPY | copy 20auto-upgrades] *****************************************************************
task path: /BASEDIR/roles/autoupdate/tasks/main.yml:154
looking for "20auto-upgrades.trixie" at "/BASEDIR/roles/autoupdate/None/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/roles/autoupdate/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/roles/autoupdate/tasks/None/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/roles/autoupdate/tasks/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/playbooks/hosts/None/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/playbooks/hosts/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/playbooks/hosts/None/20auto-upgrades.trixie"
looking for "20auto-upgrades.trixie" at "/BASEDIR/playbooks/hosts/20auto-upgrades.trixie"
[ERROR]: Task failed: The lookup plugin 'ansible.builtin.first_found' failed: No file was found when using first_found.

However, to my reading, this code is equivalent enough to the examples on ansible.builtin.first_found lookup – return first file found from list — Ansible Community Documentation which also refers to Search paths in Ansible — Ansible Community Documentation which indicates that it should indeed search roles/NAME/files/NAME.

But, apparently, something more than that is wrong. Not only failed it to find…

-rw-r--r-- 1 1015 1016 118 Nov 25 23:33 roles/autoupdate/files/20auto-upgrades.trixie

… but the other names and paths aren’t searched either.

What am I missing from the documentation that explains why this does not look up as intended?

While the first link to the examples of ansible.builtin.first_found seems to indicate that the plugin should search in files/, the second link (Search paths in Ansible — Ansible Community Documentation) does not say that find_first will look in files/. It generally explains the file search order, and mentions files/ only next to “in its appropriate subdirectory”. (This applies to the action plugin ansible.builtin.copy for example, which looks in files/.)

Which version of ansible-core are you using? Maybe the behavior for find_first changed over time (and the find_first docs were not updated), or maybe the find_first docs were always wrong…

Hm, this is… a bit annoying. Other things do find their files under roles/ROLE/files/NAME, now I’m wondering which don’t.

This is Ansibe from Debian trixie (2.19.4), used to be bookworm (2.14.18).

I’m not sure that this code was always written like that. I’m also seeing uses of this pattern:

- name: CONFIGURE | ntp config
  template: src="{{ item }}" dest=/etc/ntp.conf backup=yes
  with_first_found:
    - "{{ common_ntp_conf }}"
    - "{{ topdir }}/files/common/{{ inventory_hostname_short }}/ntp.conf.j2"
    - "{{ topdir }}/files/common/{{ ansible_hostname }}/ntp.conf.j2"
    - "{{ topdir }}/files/common/{{ host_group | default(False) }}/ntp.conf.j2"
    - "ntp.conf.j2.{{ ansible_os_family_release }}"
    - ntp.conf.j2
  notify:
    - restart ntp
  when:
    - ansible_distribution == 'Debian'
    - ansible_distribution_major_version is version_compare('10', '<=')
  tags:
  - ntp
  - common

I’m not sure which is older (I can probably dig it out of git if needed) or which is better. The one from the original post has the added benefit of having the selected filename in a variable, so that extra actions can be done based on that, including skipping if no file was found (whether that is desired or not is a per-task decision). Perhaps the two patterns did emerge because they needed this for some tasks.

Can you recommend way(s) to fix this? Such as, adding the pathe to the lookup somehow (can I put them into a more global variable, ideally one that even selects the current role automatically, but one in roles/ROLE/defaults/main.yml would also work), or refactoring it to use a better pattern for this kind of search?

This should ideally still work with Ansible 2.14 for a while, until the last target nodes that have older Python3 versions are gone.

Hm, yes. Grepping shows all uses of this pattern have skip=True, errors='ignore' so perhaps this was used exactly where it should skip the task if no match.

Some more debugging. I compared the use of the above with the use of with_first_found, and found that the paths are roughly equivalent, except the lookup plugin has a literal None instead of files in four of the eight places.

The difference is that plugins/lookup/first_found.py:run() calls self.find_file_in_search_path with subdir=None vs. subdir="files", but the find_file_in_search_path method seems to just stringify it anyway.

Ha! And there we do have a change @felixfontein you were right in that it must have once worked.

$ git show 35750ed3218e7bce68b21f473cecb0a3b9d60321 -- lib/ansible/plugins/lookup/first_found.py

Notice how the new code introduced:

+        if _jinja_plugins._LookupContext.current().invoked_as_with:
+            for subdir in ['template', 'var', 'file']:
+                if subdir in te_action:
+                    break
+
+            subdir += "s"  # convert to the matching directory name
+        else:
+            subdir = None

And a few lines below, it deleted the old code:

-        subdir = getattr(self, '_subdir', 'files')

AIUI, the old code would have defaulted subdir to 'files', so perhaps the else branch should be changed:

-            subdir = None
+            subdir = 'files'

And this seems to be a 2.19 regression against earlier:

$ git tag --contains 35750ed3218e7bce68b21f473cecb0a3b9d60321
1 Like

I’m proposing the fix as Fix regression in `ansible.builtin.first_found` lookup by mirabilos · Pull Request #86270 · ansible/ansible · GitHub (it already has been tested locally by me).

The change was almost certainly not deliberate, and using None there is very much not useful.

1 Like

It looks like this line was attempting to fix that:

But I’m not sure how subdir in te_action is being populated.

Thanks for digging into this before I got there. :slight_smile:

Meanwhile, if the extra _item variable is not needed elsewise, as a workaround until the fix is rolled out everywhere, the original code can (almost certainly⁺) be rewritten as:

- name: COPY | copy 20auto-upgrades
  ansible.builtin.copy:
    src: "{{ item }}" 
    dest: /etc/apt/apt.conf.d/20auto-upgrades
    backup: no
    owner: root
    group: root
    mode: "0644"
  with_first_found:
  - files:
    - "{{ autoupdate_20auto_upgrades | default([]) }}"
    - "{{ topdir }}/files/autoupdate/{{ inventory_hostname_short }}/20auto-upgrades"
    - "{{ topdir }}/files/autoupdate/{{ ansible_hostname }}/20auto-upgrades"
    - "{{ topdir }}/files/autoupdate/{{ host_group | default(False) }}/20auto-upgrades"
    - "20auto-upgrades.{{ ansible_distribution_release }}"
    skip: true
  when:
    - ansible_distribution == 'Debian'
    - ansible_distribution_major_version is version_compare('9', '>=')
  tags: autoupdate

⁺) that is, it worked with the (slightly more simple) task I used to test this, but I believe this transformation to be complete

No, that is in the then branch, not the else branch.

Basically, the old code did a getattr with "files" as the fallback value, and the new code instead…

  • if-branches depending on whether it’s a with block or not
    • in a with block, uses the most likely between "template", "var", "file" and then appends the "s" so we get one of "templates", "vars", "files"
    • outside of a with block, always sets it to None

I think the latter is just wrong.

You’re welcome! I put up a PR with the fix, and a workaround to restructure tasks in the meanwhile.

(This work was sponsored by B1 Systems GmbH.)

Ah, that helps.

You know, they’re going to want tests for the test suite… (Tests which would have caught this case before merging, had they existed. But I can’t throw that stone. My own tests range from pathetically inadequate to non-existent.)

If they want tests, I can maybe write some…

Look under .../test/integration/targets/lookup_first_found/.

1 Like

Huh, there’s apparently a PR (stale since August) that also tries to fix this, perhaps slightly better (moving the var/template/file logic out of the if) and with a little test already. Could be merged instead or atop, however the devs want, but it needs attention first. (But it’s that PR’s author who pointed it out, so maybe he’ll have another go at it now.)