Ansible on 'brownfield' sites

I've had great success using Ansible to build up multiple environments
(devs, staging, prod) from kickstart to production, and I imagine
that's how most folks here would use it.

My current mission is to add some automation onto a lot of 'handbuilt'
environments that have grown up from scripts. As you'd expect, each
new environment has learned from the lessons of the previous one and
the scripts have evolved.

Unfortunately, that means each environment is different, sometimes
radically - not just in scale of a given service but often in the
presence or absence of it. There's a lot of commercial software
deployed too, which results in the inevitable "we don't have enough
licenses for $TECHNOLOGY to run it in all the dev. environments" and
more drift.

Basically "a maze of twisty environments, none alike" ...

I'm making some headway in getting inventories written up so I can at
least start using ad-hoc commands for some of the routine tasks. The
next phase is "roads and sewers" - fairly straightforward services
that are simple to set up but essential (e.g. NTP, SNMP, yum repos).
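
To make that concrete, here's a minimal sketch of that first step -
host and group names are purely illustrative - a per-environment
inventory plus a couple of ad-hoc runs against it:

    # inventories/prod/hosts
    [web]
    web01.example.com
    web02.example.com

    [db]
    db01.example.com

    # ad-hoc checks against just that environment
    ansible all -i inventories/prod/hosts -m ping
    ansible web -i inventories/prod/hosts -m setup -a "filter=ansible_distribution*"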

I'm hoping to tackle some of the inconsistencies with liberal use of
$inventory/group_vars folders to enable/disable roles based on where
I'm running.
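
Roughly what I have in mind (variable and role names are just
placeholders): a flag in each environment's group_vars, and the play
only applies the role where the flag is set.

    # inventories/prod/group_vars/all.yml
    manage_snmp: true

    # inventories/dev/group_vars/all.yml
    manage_snmp: false    # e.g. no licence / service not present here

    # site.yml
    - hosts: all
      roles:
        - role: snmp
          when: manage_snmp | bool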

I had a look around but haven't seen much discussion about
retrofitting Ansible to try to tame this kind of sprawl. Would be
interested in how others have tackled the challenge - "nuke the site
from orbit" and "run away screaming" excluded.

Mostly the way you seem to be doing it already, I guess… I have various customers where I'm implementing Ansible on existing infrastructure.
I start with the basics: getting information on the various systems and using group_vars and host_vars to enable/disable bits of configuration from Ansible. Lots of running Ansible with --check and --diff on small sets of hosts, trying to minimize (functional) diffs.
Then slowly getting the systems all in line with each other.
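
In concrete terms that dry-run loop looks something like the following (the playbook name and host patterns are just examples):

    # report what would change, and show the diffs, without touching anything
    ansible-playbook site.yml --check --diff --limit 'web01.example.com'

    # widen the net once the diff looks clean
    ansible-playbook site.yml --check --diff --limit 'prod_web'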

Greenfield would be nice… but most customers aren't up for it… it's fine either way.

Git (or the VCS of your choice) is your friend! Now that that's out of the way:

We keep different inventory files for some things, and sometimes a Vagrantfile with its own inventory as "dev/test" for smaller or personal projects. One folder holds roles downloaded from Galaxy; the folders next to it are each for their own section, instead of one big Ansible tree for everything. For example, all the routers and firewalls live in their own tree, so if you want to look at the changelog for just those, it's easy. So your tree could look something like the sketch below. If you want to keep them all in one big repository, you'll probably want git submodules.
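
Something along these lines, rooted at a single ansible directory (the subdirectory names are only illustrative):

    ansible/
    ├── galaxy-roles/       # roles downloaded from Ansible Galaxy
    ├── base/               # "roads and sewers": NTP, SNMP, yum repos, ...
    │   ├── inventories/
    │   ├── roles/
    │   └── site.yml
    ├── network/            # routers and firewalls in their own tree
    │   ├── inventories/
    │   ├── roles/
    │   └── site.yml
    └── Vagrantfile         # local dev/test with its own inventory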

If by "nuke it from orbit" you mean build a new, streamlined environment and wipe out the old one, that's my favorite approach.

I am currently concentrating on building all the new systems the same way. Even if nothing else happens, as the old systems are lifecycled the new systems will take over.

If I get to the point where I have spare time I will start slowly fixing the old systems, one (or at least one batch) at a time. I don't intend to modify everything, but if I can take over the management of the basic infrastructure then at least that will be uniform across all systems: DNS, NTP, monitoring, backups, SA accounts, ...
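
A rough sketch of that kind of "basic infrastructure" playbook (the role names are illustrative, not an existing toolkit):

    # base.yml - the pieces that should be uniform everywhere
    - hosts: all
      become: true
      roles:
        - ntp
        - dns_client
        - monitoring_agent
        - backups
        - sa_accounts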

I distrust 'manual systems'; the first thing I normally do is reverse engineer each type of server and rebuild it in an automated fashion (I did this long before Ansible, with bash/sh/ksh/perl/python scripting and other CM tools).

The easy part is looking at the installed packages and the things configured in /etc; a bit harder is weeding out the stuff that is obsolete or no longer used.
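
Ansible itself can do a fair bit of that stock-taking - for example (the host name is illustrative):

    # dump the package list and whatever facts Ansible can gather
    ansible web01.example.com -m package_facts
    ansible web01.example.com -m setup

    # snapshot /etc so you can diff it against what your roles would produce
    ansible web01.example.com -m shell -a "tar czf /tmp/etc-snapshot.tgz /etc" --become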

Then the 'fun' begins: finding the /home/whatever/app.rb that has been running from a detached session since a dev logged in 2yrs ago, and working out how that app SHOULD run, along with its deps and requirements. Sometimes you need to change code, as stuff is hardcoded and expectations are implied in subtle ways. Normally you can justify those changes in the process of adding high availability, scalability and/or security to an app.

In the end I have a way to spin up any type of machine (app, web, db, etc.) by running a script with a few options (for the last 3 jobs it was playbooks!). It is hard work, takes a long time, and sometimes you have to overcome resistance from your coworkers rather than technical hurdles, but the end result is worth it.

Resistance from co-workers is a big problem…

Legacy systems are another one, but they can be dealt with…

In some cases there are other issues though…

But equally, it is not always possible to replace manual installations. I am not an Oracle DBA, and in the organisation I work for the DBAs are a separate team. While it might be possible for me to automate Oracle installation and configuration, I don't know what needs doing, and I don't think that they would let me. (Heck, they don't listen when I tell them that asmlib is not needed, is a hack, and that we should use the OS equivalent, which is better supported.)

Some of our installs could be automated, but when you only install the app once on two machines and the vendor-supplied instructions run to several hundred pages, I am not going there. If the vendor wants to automate their install process then great, but otherwise I am not going to. (The application, database, message bus, DNS entries, file systems, network settings, etc. are all detailed. The application alone consists of several hundred daemons and has its own registry…)

Finally I also have to deal with some systems which have been approved by a regulatory agency. Those ones cannot be changed without a lot more approval work. In those cases we are mostly at the mercy of the application vendor.

Not all environments are the same.

I've been a DBA at times; other times I had reasonable DBAs. I've been QA at times; other times I had reasonable QA… etc., etc.

When it comes to govt/regulation restrictions, then you really are SOL and just have to ‘live with it’.

What I described above applies mostly to medium/small shops or (rare case) a big company that for some reason has been able to allow for a full revision of the systems, not just the machines, but standards and workflow.

Indeed. I am talking about a large organisation (several hundred Unix/Linux servers alone, across five states, six major data centres) which has a lot of cruft, some very large, very complex applications, some federal regulation, and so on.

Even in this situation though we can introduce standardised server builds for all new servers, roll the infrastructure pieces under configuration management on (most of) the existing servers, and so on. Even if the applications look different on every machine (and some of those will be rolled out with the same configuration management tool, increasing as the apps get lifecycled) the operating systems themselves will become more and more uniform.

While it would be nice to be able to just rebuild everything, it isn't always possible. Starting with one or two changes and rolling those out, then moving on to the next, might seem like small victories, but over time they all add up. When you know that you can log on to a server and find standard tools and configurations, it feels much better than not knowing who hand-crafted the artisanal server you now have to deal with and what their preferences were.

Of course, management likes the fact that I can roll out a new Linux server in 15 minutes compared to the old "two weeks" - and back then they were all different.

A slight detour from the original question, but: I'm an Oracle DBA and I use nothing but Ansible to manage my installations/configurations, and I couldn't be happier. It doesn't matter if it's a cluster (RAC) or a standalone system, the toolkit I wrote handles it. So, it can definitely be done.

I recently changed jobs and I’m in the process of retro-fitting our (fairly large) Oracle environments to be managed by Ansible. Fortunately I haven’t had much resistance yet.

Good luck!
/M

Thanks all, glad I'm not totally on the wrong track :slight_smile:

There's plenty of buy-in from the new team; it's just deciding which
sequence of bites we use to eat the elephant. I agree with the earlier
points that the key to this is knowing when to stop (the Oracle team
have actually got their act together, but they're happy using
Oracle-specific management and I have no intention of changing that
for change's sake).

Another interesting point is that there are at least 3 other CM
systems in play here, each doing a bit of the management (and yes,
that's as interesting as it sounds). I'm still finding servers I
'miss' because they aren't in the official inventory (I have a nasty
feeling there are several official inventories >_< ).
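
One stopgap while those inventories get reconciled: point -i at a
directory and Ansible will merge every inventory file it finds there
(recent versions also accept repeated -i options). The path here is
illustrative:

    # inventories/all/ holds a copy of each "official" inventory file
    ansible all -i inventories/all/ --list-hosts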

I'm also expecting some talks with the architects when it comes time
to start decommissioning e.g. the Puppet layer; the on-call teams are
always the ones who see the benefit, but when you start to revisit the
Agreed Solution you need to be careful :slight_smile:
I'm wary of stepping on toes and coming across as a zealot, so I think
it's time to meet up with a few teams and see who's keen to migrate
first. With a bit of luck that'll get us some momentum.

It's a Red Hat/Satellite shop, so the upcoming Ansible integration
with Satellite is hopefully going to make this an easier sell (plus
the change windows I've been putting in are way smaller and management
have started to notice that).

Wish me luck, should be interesting :slight_smile: