Ansible Survey 2024 - Analysis & results

It’s been 2 months since we closed the Ansible survey, so I’ve had a little time to sit down and start going through it. We’ll dive into the raw data, and a few more detailed plots and thoughts.

This is going to be a long post, so strap in, and let’s go! :arrow_forward:

A huge thank you

Let’s start with thanks - we had 1,117 responses to our survey, which is amazing. Our previous biggest survey was the docs survey in 2020/1, which got ~900 replies, so I’m very happy to have exceeded that. This survey was also much larger and more time-consuming, so my heartfelt gratitude goes out to everyone who took the time to give us their answers. THANK YOU! :tada: :heart:

What this survey isn’t

There are a few things we can’t do with this data, and it’s worth touching on them:

Not linked to the wider community

It’s usual in surveys to ask some demographic questions that you already know the answer to, because doing so allows you to weight your replies such that you can start to draw conclusions about the wider population, including those who didn’t fill out the survey.

Unfortunately we don’t have that demographic data (no, we don’t even know the distribution of Ansible versions in the “wild”, because we don’t do telemetry!) so we can only draw conclusions about the people who answered the survey - bear that in mind. Anecdotally though, we know these people tend to be more engaged and more likely to run later versions, so you would expect the views of those on slower-moving distributions or more glacial enterprises to be under-represented.

Not trend data

This is the first really big survey we’ve run for Ansible in, well, forever as far as I know. As such, we only have one data point for many of these questions. That’s valuable in itself, as we’ll see, but it means we can’t answer questions about whether something is getting better/worse/bigger/smaller.

Not highly modelled

This one we can fix, eventually, but right now I’ve only had time to do the raw data (that’s still ~60 plots!) and a small amount of modelling. So don’t expect to see “these 5 factors are the biggest influence on DevTools usage” or similar.

These things are possible though - the problem is that there are millions of ways to combine these questions, and I clearly can’t do them all. So rather than try to guess, I’ve decided to publish the raw data, and ask you what more detailed analysis you want to see. So, if you want to know how Ansible version varies with experience, or if EDA, network automation, and AWX are correlated, or anything else, let me know!

Survey Goals

So what was the point? Well, this was a wide-ranging survey, with input from many groups on things they wished to know about. We can’t build that trend data without a first data point, so we have to start here. We also wished to get some baseline demographic data on the project, for dealing with that weighting as well.

Ultimately, there’s just a lot we don’t know, and this is a first step to addressing that - but I expect this will throw up more questions than answers. What’s important is that we keep digging into it.

Results

Let’s get into it. The format is as follows - each question gets its own plot, with raw counts and percentages. For free-text replies, I’ve “tokenised” the replies (split into individual words) and then made a table of the most frequent words in the responses.
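For the curious, that tokenising step looks roughly like this - a Python sketch of the idea, not the actual analysis code (the function name `top_words` and the example replies are mine):

```python
import re
from collections import Counter

def top_words(replies, n=5):
    """Tokenise free-text replies into lowercase words and
    return the n most frequent, as (word, count) pairs."""
    words = []
    for reply in replies:
        # Split each reply into runs of letters (a crude tokeniser)
        words.extend(re.findall(r"[a-z']+", reply.lower()))
    return Counter(words).most_common(n)
```

A real analysis would usually also drop stopwords (“the”, “and”, …) before counting, otherwise they dominate the table.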

Many of the plots are self-explanatory, so to keep the length manageable, I’ve only commented on plots I think are interesting.

At the end, I’ve put two examples of more detailed analysis, looking at ratings compared to free text, and stacking the ratings together. This is a mere taster of some of the more detailed work we can do if you come to me with ideas :slight_smile:

A note on sample sizes

We had 1,117 responses, spread across various sources as follows:

I find it really interesting that the docs banner was so much more impactful than even the forum - a reminder that despite the forum’s success, it doesn’t have the reach to the average user that the docs does. Also noteworthy is how little things like social media contributed, though I expect that is because the people who saw it there had already seen it on the docs or forum.

The plots

These are (mostly) in the order they came in the survey. In most cases I’ve stripped out “NA” (empty) responses, so that the proportions reflect only the portion of the community who answered that question. However, for straight Yes/No questions I’ve replaced NA with No, so we get a proper sense of how many people use that feature.
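In code, that NA handling amounts to something like this (a hypothetical helper, not the real pipeline):

```python
def clean_responses(values, yes_no=False):
    """Apply the NA policy described above: for most questions drop NA
    (None) so percentages reflect only respondents; for Yes/No
    questions, treat a missing answer as "No" instead."""
    if yes_no:
        return ["No" if v is None else v for v in values]
    return [v for v in values if v is not None]
```

So a Yes/No question keeps its full denominator, while a rating question only counts the people who actually rated it.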

Very slight uptick in newer users, but probably within the margin of error. Bear in mind this is unweighted, so we’re likely skewed towards the more experienced user anyway.

“Other” here needs some work, I suspect that if we went through and hand-classified it, the other bars would increase a bit. Job titles can be very special, after all :slight_smile:

Now this is interesting - words like “homelab” and “development” are as common as “services”, “government”, “finance” and so on. It just serves to remind us of the breadth of use cases that Ansible serves.

I’ll be honest, this caught me off guard, because the tech world likes to
hype the newest use case. So it shouldn’t have surprised me that Linux and
application management come top, but it did. It’s good to be grounded
once in a while :wink:

I knew we had a strong European base for our community, but I had
expected it to be closer to North America. Maybe evidence that
AnsibleFest needs to return to Europe? (Don’t hold me to that!)

I did hear from Sandra that this matches the geo-ip data we have from
the analytics on docs.ansible.com, so that’s reassuring for the validity
of other survey results.

This is one area where I’ve done a fair bit of modelling, as “score” is
a nice outcome variable to work with, and see what influences it.
Notably, people tend to give higher ratings as experience (time)
increases, but even the newest users are more likely to give 6+ than
anything lower, so I think we’re doing OK.

For an interesting use of this variable, see the “chatterplots” at the
end :slight_smile:

This one really made me think! The fact that (marginally) more people prefer to use the OS-provided package over pip is a real surprise. There are probably
several consequences to this, if we really take it to heart - it
certainly helps to explain the long tail of older versions we see in the
previous plot.

We also probably need to make sure we’re maintaining good communication
channels with our various downstream packagers, as a good chunk of our
community is going to base their experience (and thus that score above)
on the package created for their OS. We can’t just wash our hands of it.

Nothing too surprising, but I’ll note (a) that cloud deployments seem to
trend smaller than physical ones, and (b) that (predictably) use of AWX
etc trends towards larger deployments. Good to see some things we
intuit come out right!

OK, this needs some more massaging :slight_smile: - but the takeaway is pretty
clear. Linux still dominates, with RHEL, Ubuntu, and Debian taking the
largest share. Combined with the pip vs packaging result above, I think
this helps us to know what packages to facilitate.

It would perhaps be interesting to do this table once per install type,
and see which OSes are common among those not using pip… let me know if
you want to see that!

Genuinely surprised to see VSCode so far in front here. Maybe I’m just
old :wink:

Processing note - this was a “top-5” question, so I totalled it by
giving 5 points for 1st place, 4 points for 2nd, and so on. Then I
summed the scores and plotted the above result.
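That scoring scheme looks something like this in Python (a sketch of the method described above; `score_top5` is a hypothetical name):

```python
from collections import defaultdict

def score_top5(rankings):
    """Total points for a "top 5" ranking question:
    5 points for a 1st-place vote, 4 for 2nd, ... 1 for 5th."""
    totals = defaultdict(int)
    for ranking in rankings:               # one respondent's ordered list
        for place, item in enumerate(ranking):
            totals[item] += 5 - place      # place 0 (1st) earns 5 points
    return dict(totals)
```

(This is essentially a Borda count over each respondent’s top five.)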

We know community.general is a bit of a grab-bag, but this shows just
how many key things are held there. We might want to think about what
that could imply for future work, whether it means more effort on community.general
itself, or splitting more things into new collections … I don’t know,
but it seems a disproportionate score in some way.

Still a lot of role-activity!



Somewhat less than the main score, but only a little. That makes sense;
writing is harder than just consuming other collections, but it’s good
to see what’s coming up as issues.




Examples coming up a lot here - not a big surprise, as keeping that kind
of content up-to-date as the collections and the language evolve would
seem to be a big task, to me.

I’ll be honest, I don’t have a lot of insight into Cloud & networking,
so the next few plots are straight-up, no-commentary. See you further
down the page :slight_smile:














OK, I’m back, phew! :grinning:

32% of ~1,100 is a decent chunk of the community, and higher than I would
have guessed (my take was ~20%).





Coming out at very close to the overall project score, for a complex
part of the ecosystem, is a pretty big win, I think.

EEs are required for certain ways of using Ansible, but the uptake here
is higher than that, so people are clearly finding value in EEs more
generally. It would be good to hear back from some of our EE users about
how we can make that better, or educate other users on your use cases.



EDA use remains really low, but I wonder if that’s a usecase problem or
a deployment problem. Ideas for what to subset this with to see if we
can identify patterns in EDA use are very welcome!




I don’t find this especially surprising, we’ve known that people like
using Ansible with Terraform for years. Python / scripts makes sense
too. Nice to see ARA, Semaphore, etc in the top list too.

It’s a busy world, I know. We won’t judge you, but a big thanks to those
who do find the time to keep up to date. I’d also point out that this
relates to the install-type question at the top - if you use OS packages,
you also have a lower burden of things to update manually…

I can’t fix the lack of time (although I’m first on the waiting list if
someone else does, please). But I would love to challenge the idea that
people don’t have enough experience to contribute! Speaking as someone
who got started in FOSS 20 years ago by hanging out in chat rooms and
answering questions (frequently badly!), I know that your perspective and
usage bring something, even if you think you’re doing nothing special.
Hang out in the forums, and sooner or later there’ll be a question you
can answer, and then you’re off :slight_smile:


The Steering Committee asked for these two questions, as the maintenance
of the current package has raised some concerns. However, it seems most
users feel the package is roughly on-point!


The amount of positivity in those top words for final comments is always
going to bring a smile to my face :slight_smile:

Extras

Chatterplots

That’s all the questions, but I promised you a couple of extras! Let’s
start by looking at open-ended replies, and the ratings questions - and
then see what words are associated with higher or lower scores.

Take a moment with this - we’re saying that words higher up (bigger Y
value) are more frequent, and words to the right (bigger X value) are
associated with higher ratings (the 1-10 score question). The black
vertical line is the average score, 7.99.
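The two axes come from a fairly simple aggregation - something like this sketch, where `word_score_stats` is a hypothetical name and the real analysis may differ:

```python
from collections import defaultdict

def word_score_stats(responses):
    """For each word, count how many replies contain it and the mean
    rating of those replies - the frequency (Y) and mean-score (X)
    behind a "chatterplot"."""
    freq = defaultdict(int)
    score_sum = defaultdict(float)
    for text, score in responses:
        # set() so a word repeated in one reply is counted once
        for word in set(text.lower().split()):
            freq[word] += 1
            score_sum[word] += score
    return {w: (freq[w], score_sum[w] / freq[w]) for w in freq}
```

Plot mean score on X and frequency on Y, and you have the chart above.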

On the right we see a lot of love for the docs, the community, and even
the survey itself! But I think it’s interesting that words like simple,
language, programming, and so on are more negatively rated - perhaps we
are no longer as simple & easy to use as we once were?

What other “text vs score” plots would you like to see? Let me know!

Ratings stacked

Another thing we can do is take all the 1-10 rating questions, and stack
them together:

This gives us the chance to see all those score histograms stacked
together, along with the number of responses to each score question.

I don’t think this is too surprising - but it’s good to see that ratings
are all largely positive.

Lenses / faceting

It’s often useful to ask how one question varies with another. As an
example, let’s look at how the overall score of the project varies
with experience:

So we can see that while all the scores are pretty happy, there’s a
noticeable trend towards higher values once users get to 3+ years.

This is actually a very simple model - we’re using “Experience” as a
predictor for Score (the outcome). This can work for most pairs of
variables in plot form (e.g. perhaps Geography vs DevTools?) but we can
actually go further and build bigger models. In that case we’d say
something like “we think X, Y, and Z are relevant to the outcome, but let’s
see which is most important…”. These models take more time, so I’ve not
prepared any to show you here, but if you want to suggest one, we can
discuss it!
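In its simplest form, that “predictor vs outcome” model is just a group mean - sketched here with hypothetical data (the real analysis was more involved):

```python
from collections import defaultdict

def mean_score_by_group(rows):
    """Mean of Score within each level of a predictor (e.g. Experience).
    rows is a list of (group, score) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for group, score in rows:
        sums[group] += score
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}
```

A fuller model (several predictors at once) would use something like regression instead, but the idea is the same: how does the outcome shift as the predictor changes?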

Next steps

Again, this is mostly just raw data and some interpretation. In many
cases, we’ll have to wait for a second survey to see how these answers
move from one year to the next, but in the meantime we can try to look
for patterns in the relationships between the questions. I’ve said it
before, but do come talk to me if you have thoughts on what you’d like
to see for such facets / subsets / models - I’d love to do a follow-up
blog in a month or two with a collection of these ideas.

In the meantime, another huge thank you to the people who filled out
the survey, and also to you for reading to the end! Until next time!
