Standardizing Ansible Code

If this should be posted elsewhere, let me know.

I’m looking for real-world stories from organizations that have standardized Ansible automation beyond “a bunch of playbooks in a repo.”

In my current environment, we’re pushing hard toward reusable content (roles, collections, and shared libraries) as the default instead of one-off playbooks. We’re trying to balance a few goals at the same time: keeping the developer experience lightweight, avoiding a “central automation team bottleneck,” and still having enough governance that people trust what’s in the catalog.

I’m particularly interested in how you approached:
• Getting teams to think in reusable components instead of quick scripts (for example: code review practices, templates, linting/policy, training, or “golden role” examples).
• Deciding where content lives and how it’s owned (single monorepo, domain-based repos, or collection-per-team, and how you made ownership and SLAs clear).
• Handling versioning, deprecation, and breaking changes for shared content so that consumers aren’t afraid to depend on the common roles/collections.
• Integrating with Ansible Automation Platform or other tooling to expose an “automation catalog” that people actually use, not just a theoretical library.

If you’ve gone through this journey, what worked, what didn’t, and what you’d absolutely do differently next time?

3 Likes

Here is one experience if relevant :slight_smile:

  • Getting teams to think in reusable components instead of quick scripts
    A bit of a cheat here. We are a small company, with only two or three teams involved. One team is dedicated to implementing reusable Ansible components; the other teams use these components as they are. The teams that use the components generally don’t have enough expertise to develop Ansible code. They have just enough expertise to use it according to strict rules and the given examples. If anyone capable of developing Ansible code appears in one of those teams, they are “promoted” to the development team :slight_smile: . By developing “well thought out” reusable Ansible components, we have greatly lowered the expertise required to use Ansible. A couple of two-hour workshops proved to be enough for people to start using our Ansible, even for those who previously had no experience with it. The biggest challenge became teaching people how to read and write proper YAML, including the data types it supports. This is especially the case for people with no developer background whatsoever (strict system engineers, for example).

  • Deciding where content lives and how it’s owned
    All the code is centrally hosted on a single Git server, and the majority of it is owned by the single development team mentioned above. Each component has its own repo and maintainer. In our case a component is an Ansible collection, an importable profile of group_vars, an Ansible project template/sample, or a helper tool. Code ownership is essentially managed on the Git server (think of something like GitHub). Everything that is provisioned, configured or managed by Ansible starts as a dedicated Ansible project based on one of the predefined Ansible project templates/samples. Projects are divided per client, environment, purpose, etc. One “user team” also has team-specific components which they manage independently. Still, those components have to be in line with the shared code.

  • Handling versioning, deprecation, and breaking changes for shared content so that consumers aren’t afraid to depend on the common roles/collections
    We have strict rules here. Everything is semantically versioned. For us this means that minor versions bring only bug fixes and are fully compatible, without regressions, so updating to any new minor version is safe. We test, verify and assure that. Major versions bring new features and may break compatibility, which is well documented in the changelog and in porting guides. Upgrading to a new major version thus involves some porting. For deprecations, we use tactics similar to those in Ansible itself and in community collections: a deprecation warning is issued at some point, and the feature is removed two major versions later. We also have a concept of a global version of the whole “solution”. This means that we put a version tag on a certain combination of versions of all components and say that as long as each component stays within the same major version, all components will be compatible with each other. When some component gets a major release, it usually requires some updates to other components too. They also get a major release, and the version of the solution is increased as well. Finally, each component has stable branches, and we backport bug fixes to two major versions below. A careful eye will easily spot that these are the same tactics used in the community Ansible ecosystem in general.

  • Integrating with Ansible Automation Platform or other tooling to expose an “automation catalog” that people actually use, not just a theoretical library
    Well… a bit of a cheat again. We don’t use AAP. We have some internal, CLI-based tooling. Our onboarding process requires users to use this internal tooling. That way we have a lot of control over how Ansible is used.
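
The versioning rules from the third point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the internal tooling described in this post: the function names, the component names and the plain "MAJOR.MINOR" version format are all hypothetical, and "two major versions below" is read here as the latest major plus the two before it.

```python
# Illustrative sketch only. All names and the "MAJOR.MINOR" format are
# assumptions made for this example, not part of any real tool.

def parse(version: str) -> tuple[int, int]:
    """Split a 'MAJOR.MINOR' string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_safe_update(current: str, candidate: str) -> bool:
    """Same major and not older: bug fixes only, so the update is safe."""
    cur, new = parse(current), parse(candidate)
    return new[0] == cur[0] and new[1] >= cur[1]


def removal_major(deprecated_in: str) -> int:
    """A feature deprecated in major N is removed in major N + 2."""
    return parse(deprecated_in)[0] + 2


def gets_backports(latest: str, branch: str) -> bool:
    """Stable branches up to two majors below the latest get bug fixes."""
    diff = parse(latest)[0] - parse(branch)[0]
    return 0 <= diff <= 2


def solution_compatible(tagged: dict[str, str], installed: dict[str, str]) -> bool:
    """Each component must stay on the major version pinned by the solution tag."""
    return all(
        name in installed and parse(installed[name])[0] == parse(version)[0]
        for name, version in tagged.items()
    )


if __name__ == "__main__":
    # Hypothetical solution tag and two hypothetical installations.
    tag = {"base_collection": "2.0", "project_template": "4.1"}
    print(solution_compatible(tag, {"base_collection": "2.5", "project_template": "4.1"}))
    print(solution_compatible(tag, {"base_collection": "3.0", "project_template": "4.1"}))
```

A check like `solution_compatible` could run in CI before a project is allowed to deploy, which is one way to turn “every component within the same major version is compatible” from a convention into something enforced.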

What worked and what didn’t? Ansible usage and maintenance workflows that were implemented too simplistically failed. More freedom brought more chaos. We had to implement strict control of everything, including development and versioning. Extensive testing is also a must. Simplistic or half-implemented components going into production (because of a tight schedule) also made a big mess. That’s why we had to make a certain level of implementation quality and completeness a hard requirement for all components.

Hope it helps :slightly_smiling_face:

4 Likes

You could get so many different not-wrong answers. The complexity of an organization’s Ansible implementation can vary a lot, and how it’s organized will probably reflect the organization’s management structure as much as the nature of the work.

I’m in a higher-ed centralized IT organization, and my group manages about [frantically greps through inventory] 21 service lines (Wow! I was going to guess a dozen), where “service line” means one or more hosts that do roughly the same thing and are managed by dedicated Ansible projects each with its own git repo.

Lots of those service lines have components in common. For example, quite a few run Apache httpd services, and a subset of those have our single sign-on solution directly on them, while others have SSO offloaded upstream on proxy servers. So we have a common role for httpd, another for SSO, likewise for log aggregation, monitoring, etc. Those common roles each have their own git repos. Most of that was implemented before collections were A Thing™, but they still work just fine. We have moved a little bit into project-specific collections (using the local namespace), and a couple of shared collections.

Our testing methodology is… embarrassingly simple. Whoever makes a change knows what was changed, so that person runs ad hoc tests to see where the surprises are; iterate. Create a Merge Request (we’re on GitLab on-prem, so we have Merge Requests rather than Pull Requests), get somebody else to approve it - presumably after looking it over - and merge it. Never push to production after lunch on Friday unless you’re on-call.

This is not robust enough for some orgs; it’s overkill for others. For our 8-person group with very little daily config churn, it’s about right, and we have very little to coordinate with other groups…

…except for AWX, which we also run for ourselves, the DBAs, the systems group, security, etc. As has been pointed out repeatedly in other threads, however, AWX has been “frustratingly stable” for several years now.

We didn’t get here on purpose. There was no architecture spec or design mandate from above. These are all minor scars from scratching itches over nearly a decade. We’re in a comfortably productive place with enough work to keep us busy, so we have insufficient motivation to make sweeping changes. Not all weeks along this journey could be described that way, and that’s okay, too.

I hope this helps.

2 Likes