Can only add one host to a dynamic group

Hello!

I’m deploying a cluster of app-servers behind a load balancer. When an appserver has been installed - and has passed some basic tests - I would like to add it to a dynamic group. Later, in the nginx (the load balancer) config file template, I’m iterating over this dynamic group in order to add only functioning app-servers to the nginx load balancing configuration.

So, in the playbook, which sets up the app-server, I have this here at the end:

    - name: add new instance to functioning applayer group
      local_action: add_host name={{ inventory_hostname }} groupname=functioning-applayer-hosts

However, only one of my app servers is added to this group. In fact, when I watch Ansible process my playbooks, it just looks like this:

TASK: [appserver | running unit tests] ****************************************
skipping: [54.206.225.114]
skipping: [54.206.165.147]

TASK: [appserver | add new instance to functioning applayer group] ************
ok: [54.206.165.147] <====== ONLY DONE FOR A SINGLE HOST, FOR SOME REASON…

TASK: [appserver | stopping django server] ************************************
changed: [54.206.165.147]
changed: [54.206.225.114]

TASK: [appserver | starting django server] ************************************
<job 82131229693> finished on 54.206.165.147
<job 82131229693> finished on 54.206.225.114

You can see that my unit tests are run on both app-servers, the Django process is restarted on both app-servers, but right in the middle only one of them is added to the “functioning-applayer-hosts” group.

Any idea why this is the case and how I could fix this?

Thank you very much…

Juergen

The “add host” module is currently coded as something called a “bypass host group” module, which means it only runs once.

This means it’s meant to be used in plays that look like this:

    - hosts: localhost
      tasks:
        - ec2: ...
        - add_host: ...
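Fleshed out, such a provisioning play might look something like the sketch below (the region, AMI id, and the "launched" group name are purely illustrative, not taken from this thread):

    - hosts: localhost
      connection: local
      gather_facts: false
      tasks:
        # launch the instances and capture the result
        - name: provision ec2 instances
          ec2:
            region: us-west-2              # illustrative region
            image: ami-12345678            # illustrative AMI id
            instance_type: t2.micro
            count: 2
            wait: yes
          register: ec2_result

        # add_host only runs once per play, but since this play targets
        # only localhost that is exactly what we want; the loop covers
        # every instance returned by the ec2 task
        - name: add the new instances to a temporary in-memory group
          add_host:
            name: "{{ item.public_ip }}"
            groups: launched
          with_items: "{{ ec2_result.instances }}"

A later play can then target "hosts: launched" to configure the new machines over SSH.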

It has to run outside the host loop, because each fork/host works on its own copy of the in-memory data, so group additions made in one fork would not be visible to the others.

While this may seem to be somewhat non-intuitive, it also means you don’t have to do “local_action:” all throughout the provisioning play.

I’ll agree that this is non-obvious, but it’s somewhat of an artifact of Ansible not being intended to be used as a programming language.

A better way to talk to all servers that function is thus:

    - hosts: group1
      tasks:
          - ...

Because what happens here is as soon as a host fails, it will be pulled OUT of the group.

Thus “add_host” really only exists to support the ec2 provisioning case, where you must dynamically (and temporarily) add a host to inventory because it has only just been created and won’t show up until the inventory script is run again.

Long story short – don’t worry about this too much – the key concept is that Ansible removes failed hosts from the rotation, so if you simply talk to the original group in a future play, or even further down the task list, Ansible won’t talk to failed hosts again.
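For example, in a sketch along these lines (the group, command, and service names are hypothetical), the restart in the second play only happens on hosts whose tests succeeded:

    - hosts: applayer-hosts
      tasks:
        # any host that fails here is dropped from the remainder of the run
        - name: run basic tests
          command: /usr/local/bin/run-tests

    - hosts: applayer-hosts
      tasks:
        # only hosts that survived the previous play are contacted here
        - name: restart the app server
          service: name=django-app state=restarted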

Most signs of confusion in using Ansible stem from trying to use it too programmatically, where it’s intended to be a lot more direct.

So, basically take out the “add_host” magic (which is there to only support the provisioning cases) and it will be a lot more straightforward.

Hello!

The "add host" module is currently coded as something called a "bypass
host group" module, which means it only runs once.

Thank you for your reply. You're right: this is a little confusing when
you come across it for the first time. I would have 'debugged' this
endlessly without coming to a solution.

> A better way to talk to all servers that function is thus:
>
>     - hosts: group1
>       tasks:
>           - ....
>
> Because what happens here is as soon as a host fails, it will be
> pulled *OUT* of the group.

Unfortunately, that doesn't always seem to apply. I just now observed a
playbook being processed in which one host failed due to some
intermittent SSH error, nothing to worry about. And indeed, I saw that
as the playbook processing continued, this host was skipped in whatever
tasks came up next. However, finally a template was being processed (to
create the load balancer config file) in which I iterated over the
members of that group. Suddenly, even the failed host was back in.

Not sure if this is specific to templates, or maybe it has to do with
the fact that the failure and the template processing took place in
different playbooks?

In site.yml:

    - include: appservers.yml
    - include: frontend.yml

In appservers.yml:

    - hosts: applayer-hosts
      tasks:
         ...
         ... <==== some failure here for one of the hosts
         ...

In frontend.yml:

    - hosts: frontend-hosts
      tasks:
          ...
          ... <==== template processing
          ...

In the template:

    {% for host in groups['applayer-hosts'] %}
        ....
    {% endfor %}

Does this explain why the failed host was included in the template
processing? How could I avoid that?
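For reference, a minimal sketch (a hypothetical debug task, not part of my
playbooks) that shows what the template will actually iterate over is:

    - hosts: frontend-hosts
      tasks:
        # groups[] is built from inventory, so it lists every member of
        # applayer-hosts regardless of failures in earlier plays
        - name: show what the template will iterate over
          debug:
            var: groups['applayer-hosts']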

Thank you very much...

Juergen


Michael, I believe I understand what you’re saying. However, if I may, I should like to make an observation and then give you a scenario where I am finding the current add_host behaviour to be exceedingly unhelpful. Perhaps you’ll have a solution for the latter!

First, the observation. People keep being bitten by this add_host behaviour, both on this group (as in this thread) and in the ansible issue tracker on github (e.g. issues #5145, #2963, #6912, etc.). There have been repeated requests to note this unintuitive behaviour in the documentation for the add_host command, and even issues raised specifically to address this (e.g. #104, #532 in ansible-modules-core). Nevertheless, the documentation still doesn’t mention this, and so additional people continue to waste hours over it, as I did today.

As someone coming fresh to ansible in the last week or so, this sort of unexpected gotcha, especially combined with inadequate documentation, has been a source of considerable frustration. Another example of this class of problem would be the apparent inability to specify binary (non-text) parameter values – see my open question on that in relation to ec2 user_data, here: https://groups.google.com/d/msg/ansible-project/HYa3ipze_aY/ebkguL57hkAJ

Now, all that said, I’m very grateful that Ansible exists and is supplied under a generous licence – thank you! I really want to like it, and it seems (nearly) to do so many things right, but these unexpected and counter-intuitive stumbling blocks are obviously causing a lot of friction for a number of people.

Secondly, my scenario. I want to provision multiple ec2 regions simultaneously, because that will greatly speed up the provisioning process. e.g. if I have three regions with a similar configuration, provisioning all three at the same time will be approximately 3x faster than doing them serially. When we’re talking about several minutes of elapsed time to provision each region, the savings soon add up, especially during the development and testing phase. Now, to achieve simultaneous provisioning, and to do so in a clear, flexible, and self-documenting way, I defined a static inventory file along these lines:

    [aws_webserver_regions]
    eu-west-1 ansible_ssh_host=localhost ansible_connection=local freebsd_ami=ami-3013a747 # Ireland
    us-west-2 ansible_ssh_host=localhost ansible_connection=local freebsd_ami=ami-53fcb763 # Oregon

In my playbooks, I can then do something like this:

    - hosts: aws_webserver_regions
      gather_facts: false
      vars_files:
        - vars/araxis_users.yml
      roles:
        - role: ec2-webserver-provisioning
          aws_region: "{{ inventory_hostname }}"
          aws_ami: "{{ freebsd_ami }}"
          ec2_user: "{{ araxis_users['ec2-user'] }}"

    - hosts: ec2_provisioned_instances
      connection: ssh
      remote_user: ec2-user
      su: yes
      su_user: root
      roles:
        - role: freebsd-common

And, you know what, this works great. All, that is, except for the add_host invocation in my ec2-webserver-provisioning role (which is intended to add the newly provisioned instances to the group ec2_provisioned_instances). That doesn’t work as I intended at all, because the add_host invocation only works for one of my regions.

Having wasted a good many hours on this today, I eventually discovered that add_host isn’t expected to work in this situation. OK, that’s annoying, but there’s an easy workaround, I think: I’ll simply make use of the ec2 dynamic host inventory (since my provisioning role waits for ssh to be up-and-running on all the instances) in my playbook and go from there. That way, I won’t need add_host at all. Except that I’ve just read about yet another gotcha, which is that dynamic inventory is evaluated only once, at the beginning of the playbook run. So now I’m stuck, and beginning to think that, for me at least, ansible is proving to be anything but the radically simple solution that it is pitched as being.

Now, I apologise for the moaning – though it has been a frustrating couple of days! I’m sure that simultaneous provisioning across multiple ec2 regions is not an atypical thing to want to do, so how ought I to go about that?

(I am aware that I can put serial: 1 in the top part of my playbook. And, if I do, everything works perfectly. Except, of course, that I am then unable to provision against multiple regions simultaneously.)
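For context, that serial workaround is just the first play from above with serial added to the play header (repeated here as a sketch):

    - hosts: aws_webserver_regions
      serial: 1                          # one region at a time; add_host then
                                         # behaves, but the speed-up is lost
      gather_facts: false
      vars_files:
        - vars/araxis_users.yml
      roles:
        - role: ec2-webserver-provisioning
          aws_region: "{{ inventory_hostname }}"
          aws_ami: "{{ freebsd_ami }}"
          ec2_user: "{{ araxis_users['ec2-user'] }}"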

Many thanks in advance for your help! And again, I really, really am trying to like ansible :-)