Advice on best practices for roles and projects

Hi
As Ansible/AWX has become one of our most important tools over the past few years, it’s time to think about how to better organize our projects and our way of coding.

At the moment we host all of our Ansible/AWX projects in a sub-group on a GitLab instance. I’ve noticed that many of the roles we developed are shared from project to project, as are some vars files (e.g. HashiCorp Vault access, URLs…).

Here are some open questions:

  • What are the best practices for streamlining common stuff like roles, vars and other shared Ansible resources?
  • Also, how do you automate role/task testing from tools like GitLab every time something gets updated (Ansible update, module update, execution environment update…)?
  • More generally, how do you set up projects so that teammates can collaborate on them easily?

Thanks a lot for reading and for your help!
Gael

We’re an IT organization in an academic environment. I’m not sure how much that changes these answers; I mention it to give you some context.

We developed a suite of common roles before collections were mature. If we had it to do over again, we’d probably use collections. For things other than task files - think plugins of various sorts - collections are the way to go.

Since Ansible’s implementation of collections matured, we’ve kind of dumped new work in our catch-all “common collection”. This would make more sense as multiple collections if we had more roles or plugins, but it’s only a small handful at this point. There has been no incentive to move our old common roles into collections because they work just fine as they are.

Our pushes into our gitlab instance and merge requests trigger Jenkins jobs that run ansible-lint. When appropriate, Jenkins also scans our AWX instance looking for projects that need an SCM update but aren’t configured to force one before a job. Again, this was all set up before gitlab runners were fashionable, and we’ve done little with runners because, again, the old Jenkins jobs work just fine.
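For anyone wanting to reproduce the AWX check in a GitLab job instead, here is a minimal sketch of the comparison logic only. It assumes project records shaped like the results of AWX’s `GET /api/v2/projects/` (`name`, `scm_url`, `scm_branch`, `scm_revision`, `scm_update_on_launch`) and a separately built, hypothetical `branch_heads` mapping of (repo URL, branch) to the latest commit, e.g. fetched from GitLab. The HTTP calls and the "main" default branch are assumptions, not how the Jenkins job above actually works.

```python
from typing import Dict, List, Tuple

# Assumption: a project with an empty scm_branch tracks the repo's default
# branch, which this sketch takes to be "main".
DEFAULT_BRANCH = "main"


def stale_projects(projects: List[dict],
                   branch_heads: Dict[Tuple[str, str], str]) -> List[str]:
    """Names of projects whose synced revision differs from the branch head
    and that AWX will NOT refresh automatically before a job runs."""
    stale = []
    for proj in projects:
        if proj.get("scm_update_on_launch"):
            continue  # AWX syncs this project before every job anyway
        key = (proj["scm_url"], proj.get("scm_branch") or DEFAULT_BRANCH)
        head = branch_heads.get(key)
        if head is not None and head != proj.get("scm_revision"):
            stale.append(proj["name"])
    return stale


# Example data shaped like the "results" list from GET /api/v2/projects/.
projects = [
    {"name": "mw-ansible-webster",
     "scm_url": "https://gitlab.example.edu/mw/mw-ansible-webster.git",
     "scm_branch": "main", "scm_revision": "1111aaa", "scm_update_on_launch": False},
    {"name": "mw-ansible-lms",
     "scm_url": "https://gitlab.example.edu/mw/mw-ansible-lms.git",
     "scm_branch": "", "scm_revision": "2222bbb", "scm_update_on_launch": False},
    {"name": "mw-ansible-jenkins",
     "scm_url": "https://gitlab.example.edu/mw/mw-ansible-jenkins.git",
     "scm_branch": "main", "scm_revision": "3333ccc", "scm_update_on_launch": True},
]
# Hypothetical branch heads, e.g. collected from GitLab's branches API.
branch_heads = {
    ("https://gitlab.example.edu/mw/mw-ansible-webster.git", "main"): "1111aaa",
    ("https://gitlab.example.edu/mw/mw-ansible-lms.git", "main"): "9999fff",
    ("https://gitlab.example.edu/mw/mw-ansible-jenkins.git", "main"): "4444ddd",
}
print(stale_projects(projects, branch_heads))
```

Here only `mw-ansible-lms` gets flagged: webster is already at its branch head, and jenkins is skipped because AWX will update it on launch anyway.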

We don’t have CI/CD test suites. Instead we test manually whatever we’ve been working on before merging. That works for us because we have few changes, and each of our service lines is so niche. I really don’t see how to implement rigorous testing in our existing projects. It sounds like a good idea, though!

We migrated our entire puppet config to Ansible many years ago, and there was a lot of learning and making mistakes along the way. And Ansible itself was undergoing significant change. All of that has settled down considerably. We have a dozen or so common roles - each of which is its own project - plus basically one project per “service line”. We rarely have multiple feature branches in play in any given project, and we have a small, collaborative team (7 people). And we’ve been totally work-from-home since the Spring of 2020, so we practically live in Slack. (Those Jenkins jobs I mentioned before post notices in our Slack channels when appropriate.)

These days, on the rare occasion we need a new project, we run a Jenkins job that creates a gitlab project, clones our “skeleton” project in our gitlab instance, sets up permissions, web hooks, initial tokens, etc. We run it every couple of months just to make sure it all still works, then delete the resulting test project.
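A job like that boils down to a handful of GitLab REST API v4 calls. The sketch below only builds the list of calls so the plan is easy to review and test; the skeleton URL, group and namespace ids, webhook URL and token are hypothetical placeholders, and token creation is left out. A real job would send each step with an HTTP client authenticated by an API-scoped token.

```python
from urllib.parse import quote

# GitLab access level for the "Developer" role.
DEVELOPER = 30


def bootstrap_plan(name: str, namespace_id: int, namespace_path: str,
                   skeleton_url: str, team_group_id: int,
                   hook_url: str, hook_token: str) -> list:
    """Return (method, path, payload) tuples for GitLab REST API v4 calls."""
    # GitLab accepts a URL-encoded "namespace/project" path wherever :id is expected.
    project_ref = quote(f"{namespace_path}/{name}", safe="")
    return [
        # 1. Create the project by importing (cloning) the skeleton repository.
        ("POST", "/projects", {
            "name": name,
            "path": name,
            "namespace_id": namespace_id,
            "import_url": skeleton_url,
            "visibility": "private",
        }),
        # 2. Give the team's group Developer access to the new project.
        ("POST", f"/projects/{project_ref}/share", {
            "group_id": team_group_id,
            "group_access": DEVELOPER,
        }),
        # 3. Register the CI webhook for pushes and merge requests.
        ("POST", f"/projects/{project_ref}/hooks", {
            "url": hook_url,
            "push_events": True,
            "merge_requests_events": True,
            "token": hook_token,
        }),
    ]


plan = bootstrap_plan("mw-ansible-newsvc", 42, "mw",
                      "https://gitlab.example.edu/mw/mw-ansible-skeleton.git",
                      7, "https://jenkins.example.edu/project/hook", "s3cret")
for method, path, _payload in plan:
    print(method, path)
```

Keeping the plan as data makes the "run it every couple of months, then delete the test project" check cheap. Note that GitLab runs repository imports asynchronously, so a real job would probably wait for the import to finish before relying on the cloned content.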

If we were starting from scratch and hadn’t already been using Jenkins, then we’d be using gitlab runners more, and I’d like to think we’d build test suites. But a lot of what we do can’t really be tested without standing up instances of our service lines, so I’m not sure what that would look like.

So maybe we aren’t the best example of best practices, but I hope this gives you some idea of how somebody runs Ansible. Feel free to ask any follow-up questions you might have.

Good luck!

Sorry for my late answer @utoddl
Thanks a lot for your feedback! 🙂

  • The ansible-lint stuff is smart, I’ll have to check that out! So this is how you check code formatting, right? And then you test your changes manually?

  • The Jenkins part is also very interesting; I’ll look into similar settings in GitLab.

  • About collections, why do you say you’re sticking with roles instead of collections? Does that mean you’ll create one big collection with all your custom modules inside? And then each of your projects loads the entire collection?

That’s right. If it doesn’t cleanly pass ansible-lint with our agreed-upon lint config, then it won’t merge into one of our protected branches by this process. We can always override that process and merge manually for hair-on-fire situations, but that rarely happens.
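The gate itself can be very thin, because ansible-lint already exits non-zero when it reports violations; the CI job just has to propagate that status so the merge request is blocked. A minimal sketch, assuming ansible-lint is on the runner’s PATH and the agreed-upon config lives in a `.ansible-lint` file at the repo root (the file location is an assumption; `-c` is ansible-lint’s config-file option):

```python
import subprocess
import sys


def lint_command(config: str = ".ansible-lint", paths=None) -> list:
    """Build the ansible-lint invocation; -c points at the team's shared config."""
    return ["ansible-lint", "-c", config, *(paths or [])]


def run_gate(cmd: list) -> int:
    """Run the linter and return its exit status (non-zero blocks the merge)."""
    try:
        return subprocess.run(cmd).returncode
    except FileNotFoundError:
        print(f"{cmd[0]}: command not found", file=sys.stderr)
        return 127  # shell convention for "command not found"


def main() -> None:
    # Entry point a CI job (Jenkins or a GitLab runner) would call.
    sys.exit(run_gate(lint_command(paths=sys.argv[1:])))
```

Splitting command construction from execution keeps the config path in one place, so every project’s job lints against the same rules.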

Our testing situation may not compare well to others, in that our hosts are very much firewalled off into their own VLANs, and the parts we’re configuring are highly bespoke: web servers, tomcats, syslog aggregators, etc. We’re a “middleware” group, where “middleware” means something different to everyone who hears it. In our case, it means we get each of our Linux boxes from our systems group, and we add on the software layers that make it specific to a particular role. We also manage service VIPs on load balancers and reverse proxies. Looking down the stack, we don’t touch networking, firewalls, or other “system” level stuff; up the stack is user data and that’s also hands-off for us. Our “customers” are other internal groups who offer public- and campus-facing services above our layers. So it’s hardly ever practical for us to spin up a box to test, say, changes to our roles that manage downtimes because there’s so much other plumbing involved with VIPs, VLANs, firewalls, reverse proxies… It’s almost always easier to pull a non-prod node out of a VIP pool and test our changes on that. I’m not advocating this approach; it’s just how our organization and infrastructure evolved into the creature it is. So what if that creature has three thumbs on each of its wings if it can still swim?

Generally, our development and testing happens on an ssh bastion. We can run ansible with --become on all our hosts from there (except the bastion itself), or through an AWX instance (which we also administer). So we do piecemeal testing through the CLI and schedule daily jobs in AWX, pushing good bits into our gitlab instance from the bastion. Again, not advocating anything; that’s just How We’ve Always Done It™.
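For the scheduled side, AWX schedules are created against a job template with an iCalendar-style `rrule` string. The sketch below only builds that payload for a daily run; the endpoint in the docstring follows AWX API v2, while the schedule name and start time are made up for illustration.

```python
from datetime import datetime, timezone


def daily_schedule(name: str, start: datetime) -> dict:
    """Payload for POST /api/v2/job_templates/<id>/schedules/ (AWX API v2)."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    dtstart = start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # iCalendar-style recurrence: fire once a day, every day.
    return {"name": name,
            "rrule": f"DTSTART:{dtstart} RRULE:FREQ=DAILY;INTERVAL=1"}


print(daily_schedule("webster nightly",
                     datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)))
```

Requiring a timezone-aware start avoids the classic surprise of a "2 a.m." job silently running at 2 a.m. in whatever timezone the script happened to run in.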

Again, we’re only using Jenkins because we were already using Jenkins. We should be pushing that work into GitLab runners. In fact, we have GitLab runners installed on the same hosts our Jenkins instances live on. It’s a good spot for it in our networking LANdscape. But there’s very little urgency to reimplement something that’s already working fine.

Rather than talk around it, let me give a few relevant concrete examples. We have lots of “service-line” specific projects. I’ll list three of them: mw-ansible-jenkins, mw-ansible-webster, and mw-ansible-lms. You can guess the first one. webster is a little utility host just for our group’s stuff: an Apache httpd, some cron jobs that generate reports, a little certificate management, etc. lms configures a Learning Management System that sits on a cluster of nodes each of which runs a pair of tomcats.

Then we have some “common roles”: mw-ansible-common_logrotate, mw-ansible-common_httpd, and mw-ansible-common_tomcat to name three. Each of the “service-line” projects in the previous paragraph uses at least two of these three common roles. (I currently count 11 common roles in our portfolio.) Moving any of these common roles into one or more collections would be (1) very easy, and (2) completely pointless; they hardly ever change, and they’re all working fine as is.
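The post doesn’t say how the service-line projects pull in the common roles. One common way (an assumption here, not necessarily this team’s setup) is a `roles/requirements.yml` in each project: AWX installs the roles listed there when it syncs the project, and on the CLI `ansible-galaxy install -r roles/requirements.yml` does the same. This sketch renders such a file from a map of role repositories to pinned Git refs; the base URL and version tags are hypothetical.

```python
def render_role_requirements(roles: dict, base_url: str) -> str:
    """Render roles/requirements.yml entries for ansible-galaxy, one per role repo.

    roles maps repository name -> Git ref (tag, branch, or commit) to pin.
    """
    lines = []
    for repo, ref in sorted(roles.items()):
        lines += [
            f"- name: {repo}",
            f"  src: {base_url.rstrip('/')}/{repo}.git",
            "  scm: git",
            f"  version: {ref}",
        ]
    return "\n".join(lines) + "\n" if lines else ""


text = render_role_requirements(
    {"mw-ansible-common_tomcat": "v2.0.1", "mw-ansible-common_httpd": "v1.4.0"},
    "https://gitlab.example.edu/mw",
)
print(text, end="")
```

Pinning each role to a tag means a change to a common role only reaches a service line when that project deliberately bumps the version.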

Now, we did create a single local common collection a few years ago. It contains four filters, three lookups, and two modules. It also contains four small roles, one of which is no longer in use. New work will continue to land there unless a reason arises to justify the work to split it into multiple collections. It could happen, but I can’t see anything on our 6-month horizon to suggest it.
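For readers who haven’t written one, a filter plugin in a collection is just a Python file under `plugins/filter/` exposing a `FilterModule` class whose `filters()` method maps filter names to callables. The filter below and its VIP naming scheme are hypothetical, as is the `local.common` collection name.

```python
# plugins/filter/vip.py in a hypothetical local.common collection.


def vip_fqdn(service: str, env: str = "prod", domain: str = "example.edu") -> str:
    """Build a service VIP hostname under a made-up naming scheme:
    prod -> 'lms.example.edu', other envs -> 'lms-test.example.edu'."""
    if env == "prod":
        return f"{service}.{domain}"
    return f"{service}-{env}.{domain}"


class FilterModule(object):
    """Ansible discovers filters through this class's filters() method."""

    def filters(self):
        return {"vip_fqdn": vip_fqdn}
```

In a template or task, the piped value becomes the first argument, so `{{ 'lms' | local.common.vip_fqdn('test') }}` would render `lms-test.example.edu`.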

Having said that, a couple of years ago one of our projects did sprout an internal collection. Our tableau project has a local.tableau collection (collections/ansible_collections/local/tableau) for the sole reason that two related plugins needed to share some python code, and there’s no good way for non-collection plugins to share code. @felixfontein was a huge help to me when getting that working, and I remain forever grateful for his insightful assistance.
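The mechanism that makes this work is the collection’s `plugins/module_utils/` directory: code placed there can be imported by the collection’s modules and controller-side plugins alike through the fully qualified `ansible_collections...` path. A minimal sketch of such a shared file follows; the file name and both helpers are hypothetical, not the actual tableau code.

```python
# collections/ansible_collections/local/tableau/plugins/module_utils/tableau_common.py
# (hypothetical file name). Both plugins would import from it with:
#   from ansible_collections.local.tableau.plugins.module_utils.tableau_common import (
#       TableauError, build_url)


class TableauError(Exception):
    """Single error type shared by every plugin in the collection."""


def build_url(server: str, *parts: str) -> str:
    """Join a server base URL and path segments without doubled slashes."""
    if not server.startswith(("http://", "https://")):
        raise TableauError(f"server must be an http(s) URL, got {server!r}")
    path = "/".join(p.strip("/") for p in parts if p.strip("/"))
    base = server.rstrip("/")
    return f"{base}/{path}" if path else base
```

Because both plugins use the same helper and exception type, a fix to URL handling or error reporting lands in one place instead of being copy-pasted into each plugin.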
