Ansible philosophy for those new to the list == keep it simple

Michael_DeHaan · June 20, 2013, 9:05pm

This is not meant to be anything as formal as the “Zen of Python”, but I figured I’d type it up since we’ve multiplied userland by 100x since we last talked about the design goals of the system.

I’ve been noticing a lot of questions about “how do I do super-complicated-thing” lately, and I want to avoid giving the impression that I’m trying to squash that discussion.

However, it’s true… Ansible as a project dislikes complexity! This I do try to squash, and I am going to fight to make sure we maintain the current aesthetic and the simple way that we can explain everything to everyone.

Ansible playbooks are meant to be minimalistic and simple, super easy to read, very little to guess about. A small number of concepts, not a lot.

Playbooks are about plain English, and I like to talk in plain English too. That means everything we do needs to be simple to explain, and that means there should be (as Python says) one logical way to do something (rather than like 5).

In that vein, if you find yourself trying to be clever, to over abstract, etc, in writing content, I’d say step back a bit and trust the system for a while, and see if you REALLY need those levels of academic abstraction.

Ansible wants you to follow an easier path. I won’t say things can’t evolve, but we do and will actively resist expanding the syntax until there’s a vast visible need for new syntax. (Roles came in for 1.2 because of one of those needs, and I’m pretty happy with how they fit in with the aesthetic.) But I’m also cautious about seeing things evolve too quickly.

If you come from a development background, you’ll quickly note Ansible is not meant to be a programming language. Over the last year we’ve adopted some ‘pluggable’ things that can make playbooks complicated; I’d recommend not racing into needing to use them all at once. Lookup plugins? You probably won’t need them initially. Etc.
Think of it more as a modelling system (like Legos?) than a programming language.

It’s not meant to be a tool you should have to obsess over, and it believes “perfect is the enemy of good” in many cases. Ansible wants you to get something done and move on and go back to do something else that is not Ansible.

The idea with Ansible is you should be able to write some content, get it working, and not touch it again… You shouldn’t have to edit Ansible content every single day; it wants to help you and be unobtrusive.

We care a lot about backwards compatibility, and try to keep things working. For this reason, we also move slowly on language changes and make sure there’s a very well validated need for them among a large percentage of userland.

However, we move fast on modules… we’ll include modules for all kinds of popular services.

We try to minimize syntax and new things, and things with lots of buzzwords that make it hard to explain, or make the technology less suitable for newcomers. This means both computer-sciencey terminology as well as meme-related things. We could have chosen to associate everything with names composed with some components of the sheep shearing industry (arbitrarily), but did not, on purpose. Thus, also in conversations, we value directness and simple real world examples, rather than theory.

Finally, what we are building here is not just a configuration management tool – but a tool that can be used for configuration, app deployment, and general IT orchestration purposes. As a result, some things are going to be more flexible and open ended. Also as a result, attempting to apply strictly configuration management based ideas to things may also not quite always fit 100%. We’re not just automating the one-node-standing alone, etc.

We try to be different by not porting every single idea from every other tool out there – but modelling what works, and in ways that make Ansible efficient to help you work on content.

So anyway, I hope that helps make it a bit clearer where I am coming from, especially given the influx of new folks coming from other toolchains.

Don’t try to port all of your previous systems over directly, but see how they fit naturally, and I think you’ll find it more enjoyable!

Anyway, I hope that explains – just a bit – my views on complexity, language evolution, and why I’m not super anxious to throw in a lot of new concepts.

We have a SUPER large community that could easily take us in 5000 different directions at once, and I think if Ansible is to stay true to itself, it ponders all of them really carefully, lest it fall down the traps of previous projects – where it becomes really hard to explain, full of inconsistencies and secret hard-to-acquire knowledge and so on.

There you have it

kahlil.hodgson · June 20, 2013, 11:57pm

Ansible playbooks are meant to be minimalistic and simple, super easy to
read, very little to guess about. A small number of concepts, not a lot.

This was one of the biggest attractions for me when I first came across
ansible, so I hope we can keep it this way.

In that vein, if you find yourself trying to be clever, to over abstract,
etc, in writing content, I'd say step back a bit and trust the system for a
while, and see if you REALLY need those levels of academic abstraction.

The challenge for me, coming from a programming background, is "seeing"
obvious "programmatic" approaches to modelling our systems and having to
accept that this will introduce complexity that I'll regret later, and that
there is probably a better "data-centric" model that I just can't "see"
yet.

In the process of migrating across from bcfg2, we have had to revise our
model of our systems a number of times to get ansible to work for us. Each
time this has exposed aspects of our systems that we did not understand as
well as we thought, including data and relationships that had been obscured
by earlier "programmatic" approaches, and unnecessary complexity that we
are now glad to be rid of.

The idea with Ansible is you should be able to write some content, get it
working, and not touch it again... You shouldn't have to edit Ansible
content every single day; it wants to help you and be unobtrusive.

This is great for users like myself who don't have sys-admin or dev-ops as
their primary role. I don't use ansible every day. I don't change or even
read our playbooks very often, but if I do, it's great to know I can get
myself up to speed quickly, make whatever changes are required easily, and
get back to doing other things.

Finally, what we are building here is not just a configuration management
tool -- but a tool that can be used for configuration, app deployment, and
general IT orchestration purposes. As a result, some things are going to
be more flexible and open ended. Also as a result, attempting to apply
strictly configuration management based ideas to things may also not quite
always fit 100%. We're not just automating the one-node-standing alone,
etc.

It's good to see this reiterated. I've always thought of ansible as an
orchestration tool first, with configuration and deployments as special
cases of orchestration tasks.

Thanks for all your great work Michael, and thanks to everyone else who has
contributed to the project. Deployments over the last year would have been
almost impossible without such a great tool.

Cheers,

Kal

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925

Bit_Divine · December 9, 2016, 8:02pm

On the matter of philosophy, as someone who has taken care of infrastructure for a while but who is new to Ansible, here is my general take. What do you think? How does it compare with your vision?

A deployment script is primarily a declaration of intended state. (E.g. logical volume X should exist and should have size at least 10GB.)
If you tell a physicist the above, he or she is likely to counter that abstract truths don’t exist in the real world. The only truths are empirical. When we say that a drive has size 10G what we really mean is that we can write 10G of data to it and read it back. Actually doing that every time we want to check a partition size is a bit slow and tedious so we may use lvdisplay instead (knowing full well that lvdisplay can give a different answer) but this is still an actual verifiable test, not an abstraction.
Corollary: A deploy script’s declaration of state is a sequence of tests of the form “is the world in state X?”.
If a test of the form “is the world in state X?” returns negative, a deploy script should have an action of the form “make it so”.
Lesson from life: just because an installation script returns true doesn’t mean that it succeeded. The world is full of broken code and it is pointless trying to rail against it. Any deploy script that does "check else install" is vulnerable to broken installers. Site reliability engineers cannot afford to write installers like that. Their installers must always be of the form check or ( install and check again ).
Corollary: Every aspect of an install script needs two functions: check and install, and they should be run as check or ( install and check again ).
feature X:
  check: function that checks whether X is so
  install: make X so

feature Y:
  check: function that checks whether Y is so
  install: make Y so

…
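
The "check or ( install and check again )" shape can be sketched in a few lines of Python. This is only an illustration of the pattern described above, not anything Ansible provides: the names `Feature`, `ensure` and `verify`, and the toy `world` set standing in for a real machine, are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feature:
    """One aspect of a deployment: a read-only check plus an action."""
    name: str
    check: Callable[[], bool]    # "is the world in state X?" (must not change anything)
    install: Callable[[], None]  # "make it so"


def ensure(feature: Feature) -> bool:
    """Run `check or (install and check again)`.

    Returns True only if the feature is verifiably in place afterwards.
    """
    if feature.check():
        return True          # already so; nothing is changed
    feature.install()        # the installer's own result is not trusted
    return feature.check()   # only a passing re-check counts as success


def verify(features: list[Feature]) -> dict[str, bool]:
    """Check-only mode: report state without installing anything."""
    return {f.name: f.check() for f in features}


# Toy "world" standing in for a real machine (hypothetical).
world: set[str] = set()

good = Feature("X", check=lambda: "X" in world, install=lambda: world.add("X"))
# A broken installer: it runs without error but changes nothing.
broken = Feature("Y", check=lambda: "Y" in world, install=lambda: None)

print(ensure(good))            # True
print(ensure(broken))          # False
print(verify([good, broken]))  # {'X': True, 'Y': False}
```

Because `ensure` does nothing when the check already passes, running it a second time leaves the world unchanged, and `verify` falls out of the same definitions as a read-only mode.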

One nice thing about this philosophy is that it is not infrequent to be faced with a situation where you think: Great, I have installed a program called foowidgeebar. Now, how do I know whether it works? If the installer is well written it will have a definition of ‘it works’ that is empirical. If foowidgeebar is a web server, the ultimate test is curl http://localhost:80. If the program is a postgres database and the definition of installation is that user moog can run sql queries then a test is sudo -u moog psql -c "select 99;". It becomes really easy to check whether a machine, system, server or cluster is in a good state. If an installer doesn’t have a read-only “check that the installation is good” mode then that is a big red flag.

Now, why use a structured format such as YAML to encode the desired state? A: Because that lets you reason about your desired state. It makes it easy to say: “My desired state has 8 hosted servers so my projected monthly bill is 16 gold ingots”, or “My desired state has 100 users but actually there are 110. Who are these additional users with access to my systems?”
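
As a rough sketch of that kind of reasoning, here is what it might look like once the desired state has been loaded into plain Python data. Everything here is hypothetical: the host and user names, the shape of `desired_state`, and the price of 2 ingots per server (chosen only so that 8 servers come to 16 ingots). The user lists are kept tiny rather than 100 versus 110.

```python
# Hypothetical desired state, as it might look after loading a YAML file.
desired_state = {
    "servers": [f"host{i}" for i in range(1, 9)],  # 8 hosted servers
    "users": ["alice", "bob", "moog"],
}

INGOTS_PER_SERVER = 2  # assumed price, so that 8 servers cost 16 ingots


def projected_monthly_bill(state: dict) -> int:
    """Cost implied by the desired state alone; no machine is contacted."""
    return len(state["servers"]) * INGOTS_PER_SERVER


def unexpected_users(state: dict, actual_users: list[str]) -> set[str]:
    """Users that exist on the system but are absent from the desired state."""
    return set(actual_users) - set(state["users"])


# Pretend this list came from the real system (hypothetical data).
actual = ["alice", "bob", "moog", "mallory"]

print(projected_monthly_bill(desired_state))            # 16
print(sorted(unexpected_users(desired_state, actual)))  # ['mallory']
```

Neither function changes anything: the structured description is only read, which is what makes this sort of question cheap to ask.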

How closely does this match Ansible’s vision?

Best wishes, Max

Jon_Forrest · December 10, 2016, 7:36pm

[...]

How closely does this match Ansible's vision?

My gut feeling as someone who's not an Ansible expert
is that you're putting way too much into this. I think
things are simpler than what you describe.

An Ansible deployment script is not "a declaration
of intended state". I might describe a Puppet manifest that
way, but an Ansible script is like a program in
Python (or many other languages) where you start at the
top and move down.

I have no idea what the rest of your post is saying.

Jon Forrest

Dick_Davies · December 10, 2016, 7:47pm

You're right to point out the map is not the territory.

The disk space argument sounds like just semantics; generally I think most
engineers hear '10Gb disk' as 'storage with 10Gb capacity' with various caveats.

I think idempotent playbooks cover the 'check installation is good'
rather well in practice.
We frequently run our playbooks and verify they're all green. With
some thought, it's straightforward for a playbook to not affect state
if no work needs to be performed.

Ansible works best (for me) when it delegates to the underlying
operations primitives - a service task _can_ hack around a badly
written initscript but fixing the underlying initscript pays off in
more situations.

'who are the additional users' is not a question ansible should be
asking, in my opinion. I think of a playbook as a job description - if
you can do the job I don't care about your other attributes. That way
lies the madness of CMDBs.

Spike_Robinson · December 10, 2016, 9:45pm

I'm kind of with BitDivine. I've been using Ansible "in anger" (at times literally!) on a real world environment for a few months now. I also try to make my playbooks a statement of desired state. Most of the time that's easy to achieve. It gets frustrating when it's not so easy to achieve, but each of those challenges has pushed me up the learning curve of the language, so it's all good. As Michael says in the OP, it's easy (and good) to write scripts in plain English, and once written they stay written and useful, you can forget about them and still rely on them.

I also try to write idempotent scripts. Again, this is easy often enough that in the cases when it isn't so easy, it gets frustrating. You can see from the varying functioning and usage of the various modules that idempotence is probably one of the "computer-sciencey" "memes" that the OP disdains. It's a general principle for modules, maybe, but it's far from rigidly enforced or universal or uniform. Or in some cases, it's so taken for granted that you can't see from the documentation how idempotence is going to work, but then magically it "just does". I'm learning that with Ansible sometimes you just have to close your eyes and *believe*.

"Install then (check or install)" is a good pattern. Running the script until it goes green is a much easier meta-pattern (a behavior pattern rather than a programming pattern). It's a great habit to get into and on our site we are doing it all the time. (Along with, while testing the playbook, actually checking the system *hasn't* changed when the playbook responds green!) Again, the fact that this "just works" most of the time makes it doubly frustrating when, for example, tests that perform no actual system change report Changed by default, or, worse, tests like (the touch module) actually do change the system when executing what should be a read only test.

So I can see the merits of all the approaches mentioned here.

Bit_Divine · December 12, 2016, 9:11am

Do you feel safe running the playbook automatically every five minutes and having it alert you when things were not as expected? Personally that’s the point I’d feel queasy about, and given that a playbook (in my philosophy) logically must have a check only mode, not as a feature but as something you get for free, I’d be happier using that. Automatic correction would be cool, but I’d only want it at night once it has run during office hours for long enough for me to trust it. Or at least long enough for me to make a call on whether to take that risk. Where I can take the risk I’d be very happy for the infrastructure to be self healing! That is not just “make it so” but “make it stay so”.

I agree completely (from my limited understanding) that Ansible is not primarily a monitoring tool. It is primarily a “make it so” tool. Whether one gets an “is it so” out for free is a litmus test for whether it’s a high reliability “make it so”. What to do about e.g. extra users of a system goes beyond even the free extra. Funnily enough I was flipping through a Google Site Reliability Engineering book on the train last night and they brought up exactly the case of incorrect user lists. In their manual they recommend having the system notice and alert but not try to autocorrect. I wasn’t even going to go as far as alert. Finding all users in Ansible takes one jsonpath selector. Getting all users in AWS also takes one line of code. Writing a check that does the diff automatically is super-straightforward! It uses Ansible as a data store but it’s not Ansible that does the check.

I am interested that you describe Ansible as a job description rather than the description of desired state that I had been imagining. That makes it a means to an end, an imperative language in declarative guise. How do you represent the desired end state? I am looking for something that is good at describing that end state and if Ansible is on a divergent course it’s good for me to know early!

Regards, Max

Bit_Divine · December 13, 2016, 10:13am

I too love things to be idenpotent. It is no accident that check or (install and check again) is automatically idempotent.

Bit_Divine · December 13, 2016, 10:15am

iden/idem - I am one of those mathematicians who can’t figure out why ‘identically’ ‘potent/powerful’ should turn into ideMpotent. Apologies for my bad spelling!
