Serious collection requirement violation of cisco.mso: undeclared dependency on cisco.nd (which isn't part of the Ansible community package)

While working on an ansible-test extension (validate-modules: reject option/alias names equal up to casing belonging to different options by felixfontein · Pull Request #83530 · ansible/ansible · GitHub), I noticed that all modules in cisco.mso depend on the collection cisco.nd, which is neither declared as a dependency of cisco.mso (ansible-mso/galaxy.yml at master · CiscoDevNet/ansible-mso · GitHub) nor part of the Ansible community package (although cisco.mso itself is part of it).

This basically makes it impossible to use any module of this collection after installing the Ansible community package, unless cisco.nd is also installed manually.

This is a clear and serious violation of Ansible community package collections requirements — Ansible Community Documentation.

I’ve created an issue in the collection’s repo: Collection does not declare its dependency cisco.nd, which also is not part of the Ansible community distribution · Issue #479 · CiscoDevNet/ansible-mso · GitHub

@SteeringCommittee how should we proceed here?

There has been a first reaction in the GH issue:

Thank you for raising the awareness, we will discuss this with the team.

Thanks for raising this.

I’m glad that Akini Ross from Cisco has responded already.

Does this highlight a gap in our CI testing?

Right now we have no CI testing, except some very, very basic things that are done when trying to build a new release (for the current version and the next version). We did want to run more tests, see for example Testing collections within the ansible package, but we never got far enough to actually run that in CI.

Akini has created a PR to resolve this issue. It is currently being reviewed by our team.

Once it is merged, we will release v2.8.0, which will include this fix and other improvements. I hope the Ansible package will be able to pick that change up in the next build.

I have also created a request for cisco.nd to be added to the Ansible package.

Do you know why the “import” checks in Sanity on both Galaxy and Automation Hub did not find this issue?

Do you mean the sanity tests run in CI? My guess is that the cisco.nd collection was installed before running the sanity tests. The sanity tests kind of assume that only the collections that the collection depends on (and transitive dependencies) are installed, so if you install other collections that are not dependencies, the import sanity test will not flag this.

I’m not sure whether Galaxy/AH actually run the import sanity tests: the galaxy importer which could run them does not install any collection dependencies (Galaxy importer: collection dependency handling), so the results of these tests would be mostly unusable (except for collections that have zero collection dependencies).

Based on my experience with private automation hub/galaxy_ng, the importer may take locally distributed collections into account. I.e. galaxy.ansible.com has cisco.nd imported already, so the dependency in cisco.mso’s plugins/modules would be met whether it performed any sanity checks or not.

Running ansible-test sanity --docker default locally on my machine without cisco.nd, against cisco.mso==2.6.0 leads to 9 failed tests.

FATAL: The 9 sanity test(s) listed below (out of 46) failed. See error output above for details.
import --python 2.7
import --python 3.6
import --python 3.7
import --python 3.8
import --python 3.9
import --python 3.10
import --python 3.11
import --python 3.12
validate-modules
FATAL: Command "podman exec ansible-test-controller-0N1IB4zI /usr/bin/env ANSIBLE_TEST_CONTENT_ROOT=/root/ansible_collections/cisco/mso LC_ALL=en_US.UTF-8 /usr/bin/python3.12 /root/ansible/bin/ansible-test sanity --containers '{}' --truncate 206 --color yes --host-path tests/output/.tmp/host-cjrdxb1v --metadata tests/output/.tmp/metadata-no77cshb.json" returned exit status 1.

The error output has lots of lines like plugins/module_utils/mso.py:33:0: traceback: ModuleNotFoundError: No module named 'ansible_collections.cisco.nd' (followed by similar entries for plugins/modules/mso_backup.py and so on), plus derivatives that are all about cisco.nd being missing.
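
To get a quick summary out of output like this, something along these lines could work. It is only a sketch: the regular expression is modelled on the single line format quoted above, the function name missing_collections is made up for illustration, and the second sample line is invented to have the same shape as the first.

```python
import re
from collections import Counter

# Modelled on the import-test line quoted above, e.g.
# plugins/module_utils/mso.py:33:0: traceback: ModuleNotFoundError: No module named 'ansible_collections.cisco.nd'
MISSING_COLLECTION = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): .*"
    r"ModuleNotFoundError: No module named "
    r"'ansible_collections\.(?P<namespace>\w+)\.(?P<name>\w+)"
)


def missing_collections(sanity_output: str) -> Counter:
    """Count how often each missing collection ('namespace.name') is reported."""
    counts = Counter()
    for raw_line in sanity_output.splitlines():
        match = MISSING_COLLECTION.match(raw_line.strip())
        if match:
            counts[f"{match['namespace']}.{match['name']}"] += 1
    return counts


# First line: copied from the post above.
# Second line: hypothetical, same shape, for illustration only.
sample = """\
plugins/module_utils/mso.py:33:0: traceback: ModuleNotFoundError: No module named 'ansible_collections.cisco.nd'
plugins/modules/mso_backup.py:1:0: traceback: ModuleNotFoundError: No module named 'ansible_collections.cisco.nd'
"""

for collection, hits in missing_collections(sample).items():
    print(f"{collection}: reported {hits} time(s)")
```

Feeding it the saved output of a local ansible-test sanity run should list every collection the import test could not find, which here would presumably be only cisco.nd.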

Yep. ansible-mso/.github/workflows/ansible-test.yml at 984dd49146f73b4111b4e015b1347cd3ef91d762 · CiscoDevNet/ansible-mso · GitHub

      - name: Install the collection tarball
        run: ansible-galaxy collection install .cache/collection-tarballs/*.tar.gz

      - name: Install the ND collection (NDO dependency)
        run: ansible-galaxy collection install cisco.nd

So CI caught the issue before the PR adding this feature was merged, but it was silenced by installing the collection manually.
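
A cheap way to spot this pattern is to compare what a workflow installs by hand with what galaxy.yml declares. The sketch below is an assumption-heavy illustration, not an existing tool: it scans the workflow as plain text instead of parsing YAML, the function name manually_installed is hypothetical, and the declared dependencies are passed in as a plain mapping (the shape of the dependencies key in galaxy.yml) rather than read from the file.

```python
import re

INSTALL_CMD = re.compile(r"ansible-galaxy\s+collection\s+install\s+(?P<args>.+)")
# Exactly 'namespace.name'; paths, tarballs and flags do not match.
FQCN = re.compile(r"^[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*$")


def manually_installed(workflow_text: str) -> set:
    """Collections installed by name via 'ansible-galaxy collection install'."""
    names = set()
    for line in workflow_text.splitlines():
        match = INSTALL_CMD.search(line)
        if not match:
            continue
        for arg in match["args"].split():
            # Drop an optional version pin such as 'ns.name:1.2.3'.
            candidate = arg.split(":", 1)[0]
            if FQCN.match(candidate):
                names.add(candidate)
    return names


# The two workflow steps quoted above.
workflow = """\
      - name: Install the collection tarball
        run: ansible-galaxy collection install .cache/collection-tarballs/*.tar.gz

      - name: Install the ND collection (NDO dependency)
        run: ansible-galaxy collection install cisco.nd
"""

# Assumption: the 'dependencies' mapping from galaxy.yml declares nothing.
declared = {}

suspicious = manually_installed(workflow) - set(declared)
print(sorted(suspicious))
```

Anything left in suspicious is installed in CI without being declared, so it deserves a second look. A manual install can still be legitimate for integration tests.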

The question was about the Galaxy and Automation Hub import checks that run when a new version of a collection is uploaded, since that is the real gate rather than our own CI. As I understand it, @Denney-tech suggests they may take locally distributed collections into account, or (as @felixfontein mentioned) they may not run the import sanity test at all. Either way, this could be improved.

@allhart - sorry for the direct ping, but I thought you might be able to help us understand if there’s something galaxy-importer could do to help detect undeclared dependencies when a collection is updated in Galaxy/AH…?
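
To make the idea concrete, here is a rough sketch of a static check for undeclared collection imports. It says nothing about how galaxy-importer works or would implement this. The function names and the example imports are made up for illustration, and the declared dependencies are passed in as a mapping shaped like the dependencies key of galaxy.yml, since reading the file itself needs a YAML parser outside the standard library.

```python
import ast


def referenced_collections(source: str) -> set:
    """Return 'namespace.name' for every ansible_collections import in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if parts[0] == "ansible_collections" and len(parts) >= 3:
                found.add(f"{parts[1]}.{parts[2]}")
    return found


def undeclared_dependencies(sources, own_name, declared) -> set:
    """Imported collections that are neither the collection itself nor declared."""
    used = set()
    for source in sources:
        used |= referenced_collections(source)
    return used - {own_name} - set(declared)


# Hypothetical file content; the imported names are for illustration only.
example_module_utils = """
from ansible_collections.cisco.mso.plugins.module_utils.constants import SOMETHING
from ansible_collections.cisco.nd.plugins.module_utils.nd import NDModule
"""

print(undeclared_dependencies([example_module_utils], "cisco.mso", {}))
print(undeclared_dependencies([example_module_utils], "cisco.mso", {"cisco.nd": "*"}))
```

Because the source is only parsed and never imported, this works without any collection being installed. The trade-off is that it only sees Python imports: a plugin that is referenced by name, such as an httpapi plugin selected through configuration, would go unnoticed.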

@samccann This may be something the Galaxy team can tackle while discussing the issue of how galaxy-importer handles dependencies in general. They’ll be able to best advise on the status of that, since some other items recently took priority. (CC @galaxy)

We have merged the PR and released v2.8.0 of the cisco.mso collection, which includes the change to remove the dependency on cisco.nd.

@lhercot Thank you for fixing this.

Is there anything we can do to help you improve CI to prevent this (or similar) from happening again?

@gundalow I think this was a combination of a series of things that will probably not happen again.

  • Addition of ND as a dependency to our CI Sanity/Galaxy Importer step (when it should only have been added to the Integration test step).
  • Addition of the dependency in the code.
  • Galaxy and Automation Hub sanity checks not catching this issue.
  • Ansible package CI not catching this issue.

From our side, because recent versions (v3.2+) of the Cisco MSO product need the HTTPAPI plugin provided by the cisco.nd collection for authentication and API access, we need to start the process of declaring it as a proper dependency.

Because cisco.mso is part of the Ansible package, the first step is to get cisco.nd into the Ansible package. Then we will be able to officially declare it as a dependency in galaxy.yml and remove the manual install of it in the pipeline.

We have started the first step with our request for inclusion.

I think it would be useful to make sure that the Galaxy and Automation Hub import tests can catch these issues.
