I really like Ansible and have built a large infrastructure around it, but I'm finding it untrustworthy to the point of being unusable.
In the last 9 months, I've reported 4 variable precedence bugs:
https://github.com/ansible/ansible/issues?utf8=✓&q=is%3Aissue+author%3Adstillman+
The first two were marked as P1 and fixed, and the third was confirmed as P2 in December but remains open. The last one, which I reported today, occurs in 1.9.3 but is fixed on devel for 2.0. Yet devel appears to break one of the P1 bugs (#9498) again, despite my including a test case with the original report (as I've done for all of them). The other P1 bug also disappeared and reappeared a couple of times during 1.8 development as other variable bugs were fixed, which seems to be the general pattern for these bugs.
In case it isn't clear, these are incredibly dangerous bugs in production environments, because they can cause services to be silently rolled out in the wrong location or with the wrong configuration. (I noticed this because a service had been deployed to a directory named after another service, resulting in two copies of the service trying to run; fortunately, this was on a dev machine.) The safest solution I've found is to configure different roles on the systems separately using tags, but that somewhat defeats the purpose of a central configuration management tool (and doesn't even avoid the P1 bug that's broken again on devel, so I guess I should say the safest solution is not to use variables at all).
It's possible I'm using variables somewhat differently than most people using Ansible: the bugs I've reported all depend on include_vars within a role, which I use extensively. But there seem to be quite a few reports of variable bugs, and none of the issues I've reported have been marked as invalid.
I don't want to abandon Ansible, but I can't keep using it if I can't trust it to deploy services correctly. I also shouldn't have to maintain my own set of tests and run them whenever I try a new version, just to make sure that dangerous bugs I've reported previously (with those same tests) haven't regressed.
If the current variable precedence system is salvageable (and I'm not convinced it is or should be), it seems like many more integration test cases are needed, all run in separate processes and, needless to say, with new ones added whenever variable bugs are found.
(I think a contributing factor here may actually be the layout of the integration test suite. Most of the test cases I've submitted require multiple roles, but adding those to the current suite would get messy quickly, since there's just a single root directory and single roles directory for all integration tests. I think it'd be much cleaner to use a subdirectory for each integration test, with a top-level playbook in each, to keep all test files grouped together and avoid accidental interactions with other files. That would also make it much simpler to add people's test contributions.)
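To make the proposed layout a bit more concrete, here is a minimal sketch in Python of a runner for it. It assumes, hypothetically, that each test lives in its own subdirectory of a root directory, with its own roles/ directory and a top-level playbook named test.yml; the names discover_tests, build_command, run_tests, and test.yml are illustrative only and not part of Ansible's actual test suite. Each test runs as a separate ansible-playbook process against localhost, with the working directory set to that test's subdirectory, so variables and roles from one test can't leak into another.

```python
"""Hypothetical runner for a one-subdirectory-per-test integration layout.

Assumed layout (illustrative only):
    integration/
        include_vars_precedence/
            test.yml    # top-level playbook for this test
            roles/      # roles used only by this test
        another_test/
            test.yml
            roles/
"""
import subprocess
from pathlib import Path

PLAYBOOK_NAME = "test.yml"  # assumed name of each test's top-level playbook


def discover_tests(root):
    """Return the sorted subdirectories of root that contain a top-level playbook."""
    root = Path(root)
    return sorted(
        d for d in root.iterdir()
        if d.is_dir() and (d / PLAYBOOK_NAME).is_file()
    )


def build_command():
    """Build an ansible-playbook command that targets localhost only.

    The playbook path is relative because the command is run with its
    working directory set to the test's own subdirectory.
    """
    return [
        "ansible-playbook",
        "-i", "localhost,",  # inline inventory; the trailing comma marks a host list
        "-c", "local",       # run tasks locally instead of over SSH
        PLAYBOOK_NAME,
    ]


def run_tests(root, runner=subprocess.run):
    """Run each test in its own process and return the names of failed tests.

    runner is injectable so the discovery and bookkeeping can be checked
    without Ansible installed; by default it is subprocess.run.
    """
    failures = []
    for test_dir in discover_tests(root):
        result = runner(build_command(), cwd=str(test_dir))
        if result.returncode != 0:
            failures.append(test_dir.name)
    return failures
```

Running the whole suite would then be something like run_tests("integration"), which returns the names of the failing tests. Because each test is self-contained in its directory, adding a contributed test case would just mean dropping that directory in place, without touching any shared roles or playbooks.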
Anyway, I hope something can be done. As it stands now, I'm nervous every time Ansible runs.