Feedback Wanted: New Workflow to Validate Collections for Automation Hub

Problem statement

On the Ansible community and partner engineering team, we would like to ask the community for advice on the following issue.

Currently, Red Hat partners who join the program to get their collections certified and available on Automation Hub often see the collection tarballs they upload rejected based on errors in the Galaxy-importer log. This causes a lot of friction in the process. We’d like to minimize it by giving partners, and the community in general, a solution that helps them catch and fix those errors on their end before uploading collections to Automation Hub or Galaxy.

On Automation Hub, Galaxy-importer performs the following checks (a rough reproduction sketch follows the list):

  1. Collection building and basic checks such as metadata validation.
  2. Running Ansible Lint with the production profile.
  3. Running Ansible sanity tests.
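
For illustration, here is a rough sketch of how partners could reproduce checks 1 and 2 as GitHub Actions steps before uploading (a sanity-test step appears in a later sketch). The exact commands, versions, and importer configuration that Automation Hub uses are assumptions here, not confirmed details:

```yaml
# Sketch only: approximates the first two Automation Hub checks above.
steps:
  - uses: actions/checkout@v4

  - name: Build the collection tarball (validates galaxy.yml metadata)
    run: ansible-galaxy collection build .

  - name: Run galaxy-importer against the built tarball
    run: |
      pip install galaxy-importer
      python -m galaxy_importer.main ./*.tar.gz

  - name: Run Ansible Lint with the production profile
    run: |
      pip install ansible-lint
      ansible-lint --profile production
```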

Our initial vision on how to solve this

We’ve created the ansible-collections/partner-certification-checker repository for collection certification onboarding which, among a few other things (such as a README template), will contain a GitHub workflow we want to encourage partners to use in their repositories.

This solution has the following properties:

  1. The repository is public.
  2. It is a separate repository.
  3. It is kept minimalist: it contains only the items necessary for content certification (such as the jobs in the workflow).

Let’s now discuss the properties

  1. The repo is public under the ansible-collections org because the larger community can also benefit from it to improve the quality of their content before uploading it to Galaxy. Community contributions are welcome there. Many partners are also active community members who have their collections on Galaxy and included in the Ansible community package.

  2. It is a separate repository. However, we have a feeling that there might be some overlap with what we have in the collection_template repo, which is used as a template for initializing new collection repos on GitHub. We also refer to it from our collection inclusion requirements as a source of templates for testing (it contains the ansible-test workflow), execution-environment.yml, README, LICENSE, etc. We have considered merging them, but I personally think that keeping the content from ansible-collections/certification/ in its dedicated repo will be less confusing.

  3. The content is intentionally kept as minimal as possible. There are a lot of good and useful GitHub workflows and actions we could recommend (e.g., for releasing, for running integration and unit tests), but we intentionally decided to have only one certification.yml workflow that contains only the checks from Automation Hub. However, from there, we could refer to other community resources such as the collection_template repo if maintainers want to get a workflow for unit and integration tests or consult the community package inclusion requirements to learn community best practices.

  4. On the implementation, see the certification workflow:

  • We decided to minimize potential points of failure and not to refer to any other reusable workflows/actions except ansible-community/ansible-test-gh-action@release/v1 (see the sketch after this list). The Galaxy-importer and Lint checks are pretty straightforward, so we don’t want to depend on, say, any of the ansible/ansible-content-actions workflows, some of which, in turn, use other tooling such as tox-ansible under the hood. This approach could be reconsidered, though, if the responsible teams make a strong commitment to ensure their stability.
  • We decided not to include unit, integration, or any other checks unnecessary for certification, to keep things simple.
  • There’s also a test module we run the workflow against on a scheduled basis to make sure everything works.
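
To make the bullets above concrete, here is a minimal sketch of the overall shape such a workflow could take. The workflow name, triggers, and core version are illustrative assumptions; the actual certification.yml in the repository is authoritative:

```yaml
# Hypothetical workflow shape; only the ansible-test-gh-action reference
# matches what the post describes, the rest is illustrative.
name: Certification checks
on:
  push:
  pull_request:
  schedule:
    - cron: "0 6 * * 1"  # assumed weekly run against the test module

jobs:
  sanity:
    runs-on: ubuntu-latest
    steps:
      # The single external action the workflow deliberately depends on:
      - name: Run ansible-test sanity
        uses: ansible-community/ansible-test-gh-action@release/v1
        with:
          ansible-core-version: stable-2.16  # assumed; match the AH core version
          testing-type: sanity
```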

What do you think about this effort and the implementation?
We’d love to hear from you in the comments!


Thanks for this discussion, and the thoughts so far.

I think this needs to state early on why it’s different from collection_template. I’m still unsure which specific things could (or MUST) be different. I know you are running tests in a different way, though wouldn’t calling ansible-lint, etc., be useful for all collections?

I think it could be useful to include links to how this can be performed in the GHA.

Link the repo to this discussion

Would it be worth updating the ansible-collections/certification README.md to point to this forum post?

Template

If ansible-collections/certification is meant to be used as a template, then maybe we need separate files: one for the collection README template and one documenting how to use this repo.


@gundalow thanks for the feedback!

First, the goal of the workflow is to make it as easy as possible for partners to apply, i.e. they can just copy it with no modifications.
Second, Lint runs with a very AH-specific “production” profile that was developed for AH certification purposes. In general, how we run it here is quite AH-specific. It wouldn’t hurt other collections to run it that way, but it’s quite restrictive compared to the default.
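
As an aside, anyone who wants local runs to match that behavior can pin the profile in the collection’s ansible-lint config. A minimal sketch, assuming a .ansible-lint file at the repo root:

```yaml
# .ansible-lint: apply the AH certification profile locally by default
profile: production
```

With that in place, a plain ansible-lint invocation applies the production rule set without extra flags.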

This is the plan, but it’s out of scope for this forum post :)

Sounds good to me

That was exactly the plan.

@gundalow @oranod @samccann and others, I’m done with the ansible-collections/certification repo, please take a look.
When implementing this resource, I followed these principles:

  • Keep it minimalist:
    • I put only what’s truly needed in the repo
    • Described only the most problematic things (the most common reasons for rejection) in the onboarding process (though there’s a reference to the partner-facing docs)
    • Added an “Optional” section listing resources that are generally great for collection development but unnecessary for certification
  • The workflow contains only the checks that run on AH; they run separately (not via galaxy-importer, because of its limitations). This should be enough to catch most of the problems on the partners’ side
    • It’s ready to use without any modification: just copy-paste it into a repo
    • There’s a test collection that the checks run against in GitHub Actions, both on a schedule and in every PR against the repo
  • Avoid relying on external tooling maintained by other teams as much as possible:
    • We use only one external action, for sanity checks; the rest is simple enough to maintain on our own, so we don’t depend on others and we reduce potential points of failure.

It’s ready for review, please take a look

The EOL information in certification/README.md is incorrect. The EOL dates for core have to come from the product, not the upstream docs. Certified collections have to continue supporting and testing against the default core release included in the relevant AAP version. This table is probably the better one to link to.

It’s probably even worth mentioning in the README that certified collection owners should not track the EOL dates in the core matrix on docs.ansible.com, as that is a common point of confusion.

I’d also suggest removing all the checks listed in that section of the README. We would end up having to update that information in multiple places. It should all be in the certification workflow guide as the single source of truth.


Sorry to come in with dribs and drabs in the comments, but I think we should remove README_TEMPLATE.md as well because, once again, we would have to update it in multiple places and will invariably forget. We want a single source of truth.

I’d also suggest adding a note to the README that the repo contains a test collection etc., so that people aren’t cloning it to create their first collection.


More dribs and drabs:

@samccann thanks for the feedback! Your suggestions from the first two comments were implemented by @oranod and I’ve just merged them, thanks!

I’m not sure galaxy-importer uses -x sanity on Automation Hub.

We run sanity tests as a separate job, so it’d be redundant to run them via galaxy-importer too.

needs to include core 2.15 and 2.16.

I think galaxy-importer on Automation Hub is using core 2.16. Dunno if that changes things here, as that means the ansible-test is from core 2.16.

As we’ve discussed earlier, the goal of the workflow is not to run things exactly the same way as on AH, because of some known limitations and the complications that would bring. The way the checks run in the proposed workflow is good enough to catch most of the errors on the partners’ side.


Also FYI, the repo has been renamed to ansible-collections/partner-certification-checker.

At the moment it is completely unclear which versions of the tools are used.
Just as an example: if a new ansible-lint is released, do partners have to follow all its recommendations immediately? The workflow run will install the new version and report all the failures.

Please specify, for each and every tool, the exact version used in the AAH check workflow.


@kks hello, thanks for your feedback!
We could do that, but the problem is that the versions on AH can also be updated to the latest:

  • if we hardcode them in the workflow, the workflows copied by partners will get stale over time.
  • if we introduce another workflow for partners that calls the current one, and we update the versions in the latter ourselves, the new checks will still arrive in partners’ GHAs later anyway, probably in much bigger numbers.

In community collections, I see maintainers test against just the latest, or often even against devel versions of the tools, to ensure their content adheres to the latest standards. I think it makes sense. The failures will arrive sooner or later.

We could create some kind of matrix in the README and add comments to the workflow saying that partners can pin the versions currently run on AH if they want, but with the caveat I explained above; see the sketch below. The versions will stay unspecified, i.e. latest, by default.
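For example, a commented pin in the workflow could look like this sketch. The version numbers are purely illustrative, not the versions actually running on AH at any given moment:

```yaml
# Illustrative only: replace with the versions from the README matrix
# if you want to match what AH currently runs; omit the pins for latest.
- name: Install certification tooling (pinned)
  run: pip install "ansible-lint==25.9.2" "galaxy-importer==0.4.30"
```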
Once we update the tool versions on AH, we’ll send emails to partners about it, asking them to update the workflows in their repos.
What do you all think about it?

Then recommend reusing the workflow, not copying it.

I am not sure I got that. There is an option to version the workflow, so it is the partner’s decision: either use the latest version of the workflow and rely on updates made upstream, or use a specific version and risk not fulfilling all the requirements.

I am not using, and will not be using, any GH workflow; they are impossible to execute locally. To be honest, what I would like to see is something like the following:
"
Here is what is required to publish an Ansible collection on AAP.

  1. Pass ansible-test sanity --docker default for all ansible-core versions that the collection supports; see the minimum list of ansible-core versions that should be supported.
  2. Pass ansible-lint for all published content; the ansible-lint version is ansible-lint==25.9.2.
  3. Pass galaxy-importer, version galaxy-importer==0.4.30.
  4. Pass collection integration and unit tests for all Python versions and ansible-core versions the collection supports. Here is the minimum Python/ansible-core support matrix:

| ansible-core | Python versions |
| --- | --- |
| 2.15 | 3.10, 3.11, 3.12, 3.13 |
| 2.16 | 3.10, 3.11, 3.12, 3.13 |
| 2.17 | 3.10, 3.11, 3.12, 3.13 |
| 2.18 | 3.11, 3.12, 3.13 |
"

The above is just an example and something like what I run for collections I maintain. It does not include antsibull-docs lint-collection-docs, REUSE checks (as far as I know they are utilized by some community collections and they are great), antsibull-changelog lint, and I might have missed something.
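
For comparison, if a minimum support matrix like the one above were expressed in the GitHub workflow being discussed, it might look roughly like the following sketch (using ansible-community/ansible-test-gh-action; the version pairings mirror the table, everything else is assumed):

```yaml
# Sketch: the minimum python/ansible-core matrix above as a GitHub Actions job.
jobs:
  units:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        ansible-core: [stable-2.15, stable-2.16, stable-2.17, stable-2.18]
        python: ["3.10", "3.11", "3.12", "3.13"]
        exclude:
          # the table above does not pair core 2.18 with Python 3.10
          - ansible-core: stable-2.18
            python: "3.10"
    steps:
      - uses: ansible-community/ansible-test-gh-action@release/v1
        with:
          ansible-core-version: ${{ matrix.ansible-core }}
          target-python-version: ${{ matrix.python }}
          testing-type: units
```

Running a full matrix like this is exactly the kind of scope the minimal certification workflow intentionally leaves out.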

Yeah I also think copying will lead to drift and fragmentation.

@kks Thanks for all the feedback and for driving the conversation. To my mind, you’re describing part of the end goal.

The workflow is intended as a minimal entry point to the certification process. It offers a convenient way to check against common issues that fail certification and is not a substitute for a robust test strategy. We’ve updated the project README recently to make things clearer.

I think where we’d like to go with this is helping collection maintainers with test coverage. I haven’t talked with @felixfontein directly about this, but antsibull-nox provides a common interface for tooling that abstracts away a lot of that complexity. It’s much easier to integrate into an existing project and solves a few of the same problems that I think the partner engineering team also wants to fix. Curious if you’ve had a look at antsibull-nox?


I have not looked at antsibull-nox. I remember I tried to use tox-ansible unsuccessfully; at the end of the day I just manually recreated in tox all the environment combinations I was interested in. tox-ansible generated too many environments and was a bit obscure. I suppose antsibull-nox is similar to tox-ansible but using nox.