Hi Ansible Developers!
My team is getting its feet wet with Ansible and we’re having contentious discussions about the use of “hosts: all” in playbooks.
I’m opposed to using an “extra-variable” in the ansible-playbook command to choose the set of hosts to apply the playbook to, since I believe it could instill the wrong habits. I believe the idiomatic ansible way for handling this is to use the --limit option. So if people got accustomed to having some extra-variable for choosing hosts in some playbooks, then it would be dangerous if they encountered other playbooks that don’t have such a variable. Hence I believe we should expect users to just learn “the ansible way” of leveraging --limit. However, others on my team are uncomfortable and want some way to force people to either specify the hosts on the cmd line or to force use of the --limit option.
Can anyone tell me the best practices around these issues? I’ve read the Ansible Best Practices and haven’t found anything about this, other than the info about keeping separate inventory files for staging and production.
Thanks!
something like this?
- fail:
  when: (groups['all']|difference(play_hosts))|len == 0
Thanks for the response Brian.
Note that with ansible-1.9.2 I had to change “len” to “length” in that conditional.
Interesting proposal, but there are a few issues with it as compared to what we want:
(1) We have “localhost” in our inventory files to perform some actions
on the orchestration/ansible-source host – but localhost often isn’t
SSH-able (e.g., on a laptop) so it doesn’t show up in the play_hosts
variable. We can work around it by adjusting the check to allow for
localhost failing:
- fail: msg="must choose a subset of hosts using the --limit option"
  when: (groups['all']|difference(play_hosts))|length == 0 or groups['all']|difference(play_hosts) == ['localhost']
(2) The proposed “fail-when” check would need to be embedded into all of our roles and playbooks. It doesn’t guarantee that future playbooks/roles ported into our system will include this check. So people could come to rely on having the check, and then accidentally leave off the --limit option.
(3) Raw “ansible” invocations won’t be subject to the check, so there’s no way to force use of --limit.
This really would be great as a config option in a .dotfile. i.e.,
something like:
% cat .ansible
require_limit_option: true
But it doesn’t seem like there is a .ansible file yet.
Does that sound like something the Ansible community would be amenable to?
The scope you have in mind was not that apparent from the first post. My
proposal works well for a single play, but not for everything that could
possibly be run on your system.
Ad hoc ansible does not need --limit, as you already need to specify the
target, which makes --limit redundant.
Even if you had that setting in ansible.cfg, it would not stop a user
from creating their own ~/.ansible.cfg or ./ansible.cfg that would
override it.
If you are that concerned about which hosts get targeted, split the
inventory (dev/qa/prod) into different files, none of them in the default
inventory paths, so that users need to specify the inventory with -i
/path/to/dev, which already forces a set of limits. You can even use group
membership to limit access to the inventories (qa can read the qa
inventory, only ops can read prod).
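As a rough illustration of the per-environment split described above, here is a minimal sketch of one environment's inventory. It assumes a newer Ansible than the 1.9.2 mentioned in this thread, since the YAML inventory format arrived in a later release; on 1.9.x the same groups would be written in INI format. The file name, group names, and host names are all hypothetical.

```yaml
# inventories/staging.yml  (hypothetical path, kept outside the default inventory locations)
# One file per environment; sub-services are groups inside it, so a single
# -i still covers the whole environment and --limit can narrow to one group.
all:
  children:
    dbs:                      # hypothetical group name
      hosts:
        db1.staging.example.com:
        db2.staging.example.com:
    zookeepers:               # hypothetical group name
      hosts:
        zk1.staging.example.com:
        zk2.staging.example.com:
```

With a layout like this, something along the lines of `ansible-playbook -i inventories/staging.yml site.yml --limit dbs` (where `site.yml` is a placeholder) would target only the staging databases, and no inventory is picked up unless -i is given.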
Thanks for the response Brian.
A few more points:
(1) We’re not concerned with protection against willful circumvention. We’re concerned with accidents. The proposed check allows for a localized (within select playbooks) enforcement of --limit, but it doesn’t allow us to just do the enforcement at the level of our “ansible-stuff” git repo. If we went with the scheme I proposed and someone intentionally modified the .ansible.cfg, then they would be more likely to know what they’re doing. We’re interested in protecting ourselves from less knowledgeable people who are just following a “playbook” (no pun intended) and running commands manually, where they might tweak a param without fully understanding the ramifications.
(2) We do already have per-environment inventory files. But within each environment we have multiple sub-services that form the aggregate service (e.g., DBs, ZooKeepers, etc.). I don’t wanna split everything up into separate inventory files since then I lose the ability to do some commands over everything in an environment.
(3) The “playbook-fail-when” check you proposed has another downside in its current form: it prevents use of “all” as the --limit option. i.e., intentional selection of all. Which is very different than accidental selection of all.
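Regarding point (3), here is a minimal sketch of a check that tells "no --limit given" apart from "--limit all". It relies on the `ansible_limit` magic variable, which to my knowledge exists only in Ansible releases newer than the 1.9.2 discussed here, so treat that as an assumption. It still has to be added to each playbook, so it does not address points (1) and (2).

```yaml
# Sketch only: assumes an Ansible version that provides the ansible_limit
# magic variable (not available in 1.9.x).
- hosts: all
  gather_facts: false
  pre_tasks:
    - name: Refuse to run unless --limit was passed on the command line
      fail:
        msg: "must choose a subset of hosts using the --limit option"
      # ansible_limit is only defined when --limit was given, so an explicit
      # "--limit all" passes this check while a forgotten --limit does not.
      when: ansible_limit is not defined
```

Because the condition looks at whether the option was supplied rather than at which hosts ended up in the play, it also avoids the localhost special case from the earlier version of the check.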
So I think the best way to force a selection is to make hosts a
variable, so that they always need to specify the targets:
- hosts: "{{target}}"
^ would require that you always add -e 'target=hostsiwantotarget' to
the command line
Hey, yep, we already came up with that idea too, but I think it’s worse than just using --limit because of this risk: people will start to assume that all playbooks have this variable, and then one fine day there is a new playbook that is missing it, their -e 'target=foo' parameter does nothing, and they accidentally do some maintenance on every host. To me, --limit is the same as the -e target idea, but more “idiomatic ansible”.
Any solution that requires a change to individual playbooks is a no-go for me, I’ll just tell people they need to use --inventory and --limit and be careful.
In summary, you are concerned about playbook users accidentally running the plays on the wrong hosts. Forcing your users to always use ‘--limit’ will at least ensure that they “think” about the target hosts first.
Another approach is to actually include the potential accidents in the way you model your infrastructure.
In my infrastructure I have a set of “production” hosts and a set of “testing” hosts. I deploy code to those hosts in exactly the same way. However, I don’t want to accidentally deploy untested code to “production”. So I model safe deployments with one role and two small playbooks:
file: deploy_to_testing.yml
- hosts: testing
  roles:
    - deploy
file: deploy_to_production.yml
- hosts: production
  roles:
    - deploy
Plays are your primary mechanism for mapping hosts to a sequence of tasks, and for modelling that association. The --limit option is more about making ad hoc refinements for situations that are unusual and not worth modelling explicitly.
Hope this helps,
Kal
Thanks Kal. More food for thought. We have a very rich set of hosts and environments and I think using this kind of technique might lead to a combinatorial explosion of playbooks. But it’s very helpful to have the alternative perspective and way of thinking about “the ansible way”.
Thanks!