distributing files via several repos to balance load

Dear All,

I was going through Will Thames's Ansible examples for the Brisbane DevOps day and noticed the configuration file copy, or file transfer, pattern below:


- name: download java
  action: get_url url={{repo_url}}/{{java_archive}} dest={{tmpdir}}/{{java_archive}}

where repo_url is defined as a group variable

repo_url: 'http://repo.dev.example:8000'

My question: how can repo_url point to several URLs instead of one, so that when several nodes execute this task, Ansible load balances the requests and knows which URL to hand to which client?

kind regards

Walid

You could set the URL in a host or group variable, so that it would be more evenly distributed. Beyond that, you could set up a reverse proxy like Squid for your systems to fetch through, so that as soon as one system fetched the file you'd have a locally cached copy.
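As a rough sketch of the caching idea, assuming a Squid (or similar) cache listening at a hypothetical proxycache.example:3128, the download task can be pointed through it via the standard proxy environment variables (if Squid is instead run as a reverse proxy in front of the repo, repo_url would simply point at the proxy host):

- name: download java via the local caching proxy
  action: get_url url={{repo_url}}/{{java_archive}} dest={{tmpdir}}/{{java_archive}}
  environment:
    # hypothetical cache address; get_url honours the proxy environment variables
    http_proxy: http://proxycache.example:3128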

Hi James,

the URL was just one example; what if the transfer went through a copy/fetch module against a file server rather than an HTTP server? Also, could you elaborate on how one can extend the use of group variables to load balance the source in an automated way?

TIA

Walid

Yes, it would require splitting up your tasks and having your hosts in a second child group. For example, given an inventory file like this:

host1
host2
host3

[target1]
host1

[target2]
host2

[target3]
host3

And group_vars/target[1…3]:

target_url: http://…   # a different URL in each group's file

You could have a playbook that does the download like this:

- get_url: url={{target_url}}/{{java_archive}} dest={{tmpdir}}/{{java_archive}}
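To make the per-group wiring concrete, the three group_vars files might look something like this (the mirror hostnames are placeholders, not taken from the thread):

# group_vars/target1
target_url: http://repo1.dev.example:8000

# group_vars/target2
target_url: http://repo2.dev.example:8000

# group_vars/target3
target_url: http://repo3.dev.example:8000

Each host then resolves target_url from its own group, so the downloads are spread across the three repos without any extra logic in the playbook.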

Thanks James, will take this into consideration when scaling Ansible.

Or you can just point to a load balancer and have it take care of it based on load.

Dear Brian,

I was looking for a less intrusive, Ansible-internal solution that makes no assumptions about the underlying physical infrastructure. However, I do hear you and James: a load balancer, caching, and probably DNS round robin are all possible solutions. If nothing more automated is available inside Ansible, via a data directive or otherwise, I may file it as a feature request later. Another configuration management solution does have this to answer problems of scale: it is done with two to four directives, something like a select_attribute directive that selects from a list and automatically handles the division and allocation of sources to destinations, using automatically generated random weights based on a probability distribution, if that makes any sense.
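For the record, here is a rough sketch of what such automatic allocation could look like with plain Ansible variables: each host picks a mirror based on its position in the inventory, so the group is divided across the mirrors without a load balancer. The mirror list and URLs are invented for illustration, and this is a deterministic split rather than the weighted random allocation described above, but the spreading effect is similar:

# group_vars/all (hypothetical mirror list)
repo_mirrors:
  - http://repo1.dev.example:8000
  - http://repo2.dev.example:8000
  - http://repo3.dev.example:8000

# tasks: assign each host a mirror by its index in the group, then download from it
- name: pick a mirror for this host
  set_fact:
    target_url: "{{ repo_mirrors[groups['all'].index(inventory_hostname) % repo_mirrors|length] }}"

- name: download java from the assigned mirror
  get_url: url={{target_url}}/{{java_archive}} dest={{tmpdir}}/{{java_archive}}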

kind regards

Walid

One of the things I like about Ansible is that it doesn't try to do everything itself; it relies on existing, well-known and widely used solutions (ssh, sudo, cron, etc.).

That is why some of us will push outside solutions (TCP load balancer, DNS, etc.) versus seeing it built into Ansible. But it is perfectly understandable that other people have different preferences and would like to see more stuff 'built in'. It is very hard, if not impossible, to please everyone.

Very much agree with Brian here, this is a great case for using a load balancer.

BTW, if not shared already, “with_random_choice” can be useful if you want something basic. I temporarily forgot about this :)

- debug: msg={{ item }}
  with_random_choice:
    - boston
    - paris
    - tokyo
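Applied to the original question, that could look roughly like this (the mirror URLs are again placeholders): each host picks one of the repos at random at run time, which spreads the load statistically rather than deterministically.

- name: download java from a randomly chosen repo
  get_url: url={{ item }}/{{java_archive}} dest={{tmpdir}}/{{java_archive}}
  with_random_choice:
    - http://repo1.dev.example:8000
    - http://repo2.dev.example:8000
    - http://repo3.dev.example:8000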

Thanks Michael and Brian. Eventually we need to keep an open mind and adapt to tools and DevOps processes accordingly.