Locating source of an ansible-lint warning

I’m getting a warning from ansible-lint and having trouble tracking down the cause.

WARNING /opt/venv-python39-ansible-core-214/lib64/python3.9/site-packages/ansible/parsing/yaml/constructor.py:76 AnsibleWarning While constructing a mapping from <unicode string>, line 106, column 3, found a duplicate dict key (name). Using last defined value only.

It’s not just that one, though. I’m getting hundreds, and they aren’t all about “name”; a few are about other keys. ansible-lint is providing a line number, but not the file! I’ve tried looking in various places around the mentioned line numbers, but I don’t see any smoking guns. Or any smoke at all.

Does anyone have suggestions? I’ve tried lots of combinations of flags mentioned on the --help page, and running with -vvvv but without increased clue. --version shows

ansible-lint 6.21.1 using ansible-core:2.14.7 ansible-compat:4.1.10 ruamel-yaml:None ruamel-yaml-clib:None

Does yamllint find any duplicate dict key?

Nope.

$ find . -name \*.y\*ml -print0 | xargs -0 yamllint | grep -C12 duplic

The above command got three hits: one on the file name ./collections/ansible_collections/community/vmware/tests/integration/targets/vmware_tag/tasks/tag_manager_duplicate_tag_cat.yml, and one on each of the two .yml and .yaml files I created with duplicate keys to make sure my find … pattern was working.

$ find . -name \*.y\*ml -print0 | xargs -0 yamllint | wc -l
11128
$ find . -name \*.y\*ml -print0 | xargs -0 yamllint | grep ^./ | wc -l
1264

11,128 issues found in 1,264 files. The only key-duplicates errors reported were the two I put in as a test.

Edit: No duplicate keys found in my COLLECTIONS_PATHS trees either.

Write a Bash script to find all .yml / .yaml files and then run ansible-lint on each individual file to see if that uncovers a problem? :man_shrugging:
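
A minimal sketch of that idea, written as portable shell so it also runs under bash. The function name lint_each_yaml and the LINT_CMD override are made up for illustration (LINT_CMD only exists so the loop can be tried with a stand-in command); they are not part of ansible-lint. It can only point at a file if the warning is attributable to a single YAML file when that file is linted alone, which may not hold here.

```shell
# Lint every *.yml / *.yaml file under a directory one at a time and print
# the files whose lint output contains the given text.
# Hypothetical helper: LINT_CMD defaults to ansible-lint and is an assumption
# of this sketch, not an ansible-lint feature.
lint_each_yaml() (
  root="${1:-.}"
  pattern="${2:-duplicate dict key}"
  # Note: file names containing newlines are not handled by this loop.
  find "$root" -type f \( -name '*.yml' -o -name '*.yaml' \) | sort |
  while IFS= read -r file; do
    # </dev/null keeps the linter from consuming the file list on stdin.
    if "${LINT_CMD:-ansible-lint}" "$file" 2>&1 </dev/null | grep -q -- "$pattern"; then
      printf '%s\n' "$file"
    fi
  done
)
```

Calling lint_each_yaml . from the project root prints one path per offending file. With a large tree this will be slow, since the linter starts up once per file.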

Do you have plugins/modules? IIRC ansible-lint lets you lint the examples in these, and the warnings might stem from them.

If the warnings come from those, and they are in collections, you should also be able to get a similar warning (but with a path) by running the ansible-test sanity tests on the collections in question (ansible-test sanity --docker -v --test yamllint).

I yamllinted all 44k YAML files I have read rights to on the system. The only duplicate keys I found were the ones I put there as a test. That’s a dead end.

The way I/we do projects’ collections is supposed to mimic the way AWX does it. In fact that part of our update-me script was lifted from AWX source. Among other things, it does this:

  if [ -f roles/requirements.yml ] ; then
    ansible-galaxy install -r roles/requirements.yml -p ./roles/ --force
  fi

  if [ -f collections/requirements.yml ] ; then
    export ANSIBLE_COLLECTIONS_PATH=./collections
    ansible-galaxy collection install -r collections/requirements.yml -p ./collections/  --force-with-deps
  fi

So I removed everything under collections/ansible_collections and ran ansible-lint. It was clean.

Then I ran ansible-galaxy collection install -r collections/requirements.yml -p ./collections/ --force-with-deps , ran ansible-lint again, and the “found a duplicate dict key” messages — all 114 of them — are back. I believe this tells me the problem is somewhere in these collections.

When I remove collections/ansible_collections/community/vmware (4.0.0) , I’m down to 28 “found a duplicate dict key” messages.

After putting it back and removing collections/ansible_collections/community/general (8.0.0), ansible-lint reports “found a duplicate dict key” 86 times. Together with the 28 that remained when community.vmware was removed, that accounts for all 114 messages.
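
The remove-one-collection-and-recount procedure can be sketched as a loop. The function name count_by_collection and the LINT_CMD override are hypothetical, added only so the loop can be exercised with a stand-in command. It temporarily moves each collection out of the tree, so if it is interrupted mid-run the collection being tested stays in the temporary directory until you move it back (or reinstall with ansible-galaxy).

```shell
# For each <namespace>/<name> directory under the collections root, move it
# aside, run the linter, and print how many matching messages REMAIN without it.
# Hypothetical helper: LINT_CMD defaults to ansible-lint and is an assumption
# of this sketch.
count_by_collection() (
  coll_root="${1:-collections/ansible_collections}"
  pattern="${2:-found a duplicate dict key}"
  stash=$(mktemp -d) || exit 1
  for dir in "$coll_root"/*/*/; do
    [ -d "$dir" ] || continue
    dir=${dir%/}
    mv "$dir" "$stash/held"
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    remaining=$("${LINT_CMD:-ansible-lint}" 2>&1 </dev/null | grep -c -- "$pattern" || true)
    mv "$stash/held" "$dir"
    printf '%s\t%s\n' "$remaining" "$dir"
  done
  rmdir "$stash"
)
```

A collection whose line shows a much lower count than the full run is contributing the difference.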

My original post shows “…constructing a mapping from , line…”, but actually the messages say “…constructing a mapping from <unicode string>, line…”. (I failed to backslash-escape the opening left-angle-bracket. Oops.) This indicates that at least at the level the message is generated, ansible-lint is unaware of the provenance of the “<unicode string>” it’s reporting on.

I thought ansible-lint was clever enough to ignore stuff that git ignores. And yet:

INFO     Loading ignores from .gitignore
INFO     Excluded: .git
INFO     Loading ignores from collections/ansible_collections/ansible/utils/.gitignore

That makes no sense to me, because according to git, my .gitignore ignores collections/ansible_collections/ansible. So why would ansible-lint consider loading ignores from collections/ansible_collections/ansible/utils/.gitignore?

That’s the crux of the issue: ansible-lint, at least

ansible-lint 6.21.1 using ansible-core:2.14.7 ansible-compat:4.1.10 ruamel-yaml:None ruamel-yaml-clib:None

doesn’t interpret .gitignore the same way git does.

Furthermore — and I’m not sure if this is a feature or a bug — it appears to use the .gitignore in HEAD rather than the version saved in your working tree.

Yeah, not quite. It uses Python’s pathspec.GitIgnoreSpec to read the .gitignore file in the working tree’s root. However, it then proceeds to read and process the .gitignores (and all other files!) from the ansible-galaxy-installed collections that should be ignored according to the top project-level .gitignore.

Here’s the relevant part of a project’s top-level .gitignore that triggers this errant behavior from ansible-lint.

# Installed Collections
collections/ansible_collections/**

# But don't ignore the contents of a project-specific collection named "mw.tablinx".
#   These overrides let you keep a "mw.tablinx" collection in git while
#   the rest of the collections brought in through requirements.yml are
#   ignored by git.
!collections/ansible_collections/mw/
!collections/ansible_collections/mw/tablinx/
!collections/ansible_collections/mw/tablinx/**

Using git check-ignore to determine that git does ignore these collections’ .gitignore files:

$ find collections/ -name .gitignore
collections/ansible_collections/mw/cc/.gitignore
collections/ansible_collections/ansible/utils/.gitignore
collections/ansible_collections/community/vmware/tests/.gitignore
collections/ansible_collections/community/vmware/.gitignore
collections/ansible_collections/community/general/changelogs/.gitignore
collections/ansible_collections/community/general/tests/integration/targets/terraform/.gitignore
collections/ansible_collections/community/general/tests/.gitignore
collections/ansible_collections/community/general/.gitignore

$ git check-ignore -v $(find collections/ -name .gitignore ) 
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/mw/cc/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/ansible/utils/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/vmware/tests/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/vmware/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/general/changelogs/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/general/tests/integration/targets/terraform/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/general/tests/.gitignore
.gitignore:14:collections/ansible_collections/**	collections/ansible_collections/community/general/.gitignore

So git really is ignoring them. But ansible-lint is processing them anyway, along with everything else in those collections.

$ ansible-lint -vv 2>&1 | grep '\.gitignore'
INFO     Loading ignores from .gitignore
INFO     Loading ignores from collections/ansible_collections/ansible/utils/.gitignore
INFO     Excluded: collections/ansible_collections/ansible/utils/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/changelogs/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/tests/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/tests/integration/targets/terraform/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/vmware/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/vmware/tests/.gitignore
INFO     Loading ignores from collections/ansible_collections/mw/cc/.gitignore
INFO     Loading ignores from .gitignore
INFO     Loading ignores from collections/ansible_collections/ansible/utils/.gitignore
INFO     Excluded: collections/ansible_collections/ansible/utils/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/changelogs/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/tests/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/general/tests/integration/targets/terraform/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/vmware/.gitignore
INFO     Loading ignores from collections/ansible_collections/community/vmware/tests/.gitignore
INFO     Loading ignores from collections/ansible_collections/mw/cc/.gitignore
DEBUG    data set to None for collections/ansible_collections/community/general/tests/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/community/general/tests/integration/targets/terraform/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/mw/cc/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/community/vmware/tests/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/community/vmware/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/community/general/changelogs/.gitignore due to being '' (unknown) kind.
DEBUG    data set to None for .gitignore due to being '' (unknown) kind.
DEBUG    data set to None for collections/ansible_collections/community/general/.gitignore due to being '' (unknown) kind.

I spent a rather depressing yesterday doing a deep dive on ansible-lint’s use of Python’s pathspec.GitIgnoreSpec to deal with .gitignore files. The good news — the only good news — is that the most common, straightforward .gitignore patterns appear to work, mostly. For anything more “interesting”, pathspec.GitIgnoreSpec’s approach is flawed. Thus, so is anything that relies on it, like ansible-lint. I’m not sure that library can be fixed without a near-total rewrite.

Other .gitignore implementations exist, but I haven’t found any without known deviations from native git behavior. Most are not in Python.

Several paths forward:

  • An option to use git if it’s available. That requires the project to be the working tree of a git repo, so, plenty of circumstances where that doesn’t help. (It solves my problem, but so what!?) Probably not worth the effort if it doesn’t provide a real fix.

  • An option to ignore .gitignore files! That seems a strange choice, but think of it this way: currently the problem is generally including files/directories which shouldn’t be linted because of misinterpretation of .gitignore files. If we don’t start digging that hole, then we don’t have to worry about how to back-fill select parts. Such an option, when combined with the next bullet, should enable a reasonable work-around until such time as pathspec.GitIgnoreSpec gets fixed or a better implementation comes along that we can switch to.

  • If you craft enough --exclude expressions, I’m fairly sure you can get the correct behavior out of ansible-lint. Since the failures I’ve seen mostly involve linting files/directories that should have been excluded, this rather ugly solution may also be our most reasonable work-around. A clever shell function using some git magic (git ls-files -i -o --directory --no-empty-directory --exclude-standard for example) could handle the heavy lifting. This would work best when coupled with an option to ignore .gitignore files that are going to be misinterpreted anyway, but that may not strictly be necessary.

  • Finally, implement a transliteration of git’s current .gitignore handling. Whether that ends up as part of ansible-lint, or as a fix for or a replacement of pathspec.GitIgnoreSpec, this is undoubtedly the correct long-term solution in my opinion. A brief look at git’s source yesterday left me more comfortable with this task than I expected to be.
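
For the --exclude work-around, here is a rough sketch of the kind of shell function I have in mind. The name lint_excluding_git_ignored and the LINT_CMD override are hypothetical. It asks git which untracked paths are ignored and turns each one into an --exclude argument. Assumptions: it runs from the root of a git working tree, no ignored path contains a newline, and git does not quote any of the paths in its output. I have not verified that this covers every case where ansible-lint and git disagree.

```shell
# Run the linter with one "--exclude PATH" per path that git reports as
# ignored. Extra arguments are passed through ahead of the excludes.
# Hypothetical helper: LINT_CMD defaults to ansible-lint and is an assumption
# of this sketch.
lint_excluding_git_ignored() (
  ignored=$(git ls-files -i -o --directory --no-empty-directory --exclude-standard) || exit 1
  # Split git's output on newlines only, with globbing off, so paths that
  # contain spaces or wildcard characters survive intact.
  IFS='
'
  set -f
  for path in $ignored; do
    set -- "$@" --exclude "${path%/}"
  done
  "${LINT_CMD:-ansible-lint}" "$@"
)
```

The body runs in a subshell, so the IFS and globbing changes do not leak into the calling shell. Something like lint_excluding_git_ignored -v would then replace a plain ansible-lint -v.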

This is not the hole where I’d expected to be digging this week. Here’s hoping I’m digging out, not deeper.

IMO the best idea would be to use .gitignore only if it is part of a git repository, and then use the git binary to list all files.
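
As a rough illustration of that approach (a sketch, not how ansible-lint works today): the hypothetical function list_lintable_yaml below asks git for tracked plus untracked-but-not-ignored YAML files when run inside a working tree, and falls back to a plain find otherwise.

```shell
# Print candidate YAML files, one per line, relative to the current directory.
# Inside a git working tree, let git decide what is ignored; elsewhere, fall
# back to find with no ignore handling at all.
# Hypothetical helper name; nothing here is an ansible-lint API.
list_lintable_yaml() {
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    git ls-files --cached --others --exclude-standard -- '*.yml' '*.yaml'
  else
    find . -type f \( -name '*.yml' -o -name '*.yaml' \) | sed 's|^\./||' | sort
  fi
}
```

The resulting list could then be handed to the linter as explicit file arguments, keeping in mind that file names with whitespace need more careful handling than a simple pipe to xargs.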