Concurrent playbooks and synchronization

cevich · May 20, 2017, 1:35pm

Hi,

I have a use case where playbooks are sometimes (unpredictably) run concurrently by a simple job-scheduling system. Most of them touch a common set of cloud systems on their way to affecting other cloud systems. This means I need a way to synchronize actions WRT the shared systems, such that they apply roles in an orderly and predictable fashion. Using Ansible Tower is out of scope in this case, and using system-level file locking around each ansible-playbook... command would be too coarse.

My thinking was to implement a strategy plugin (based on linear or free) that could accept a lock filepath parameter (and maybe a timeout), then use Python's standard fcntl.flock() call to synchronize selected plays within the playbooks.
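As a rough illustration of the locking primitive such a plugin might wrap, here is a minimal sketch. It does not show any strategy-plugin wiring; the helper name `flock_with_timeout` and its parameters are hypothetical. Because `fcntl.flock()` itself has no timeout argument, the sketch polls a non-blocking request until a deadline passes. POSIX only.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def flock_with_timeout(path, timeout=30.0, poll=0.1, shared=False):
    """Hold an flock on `path` for the duration of the with-block.

    Hypothetical helper (not an Ansible API): fcntl.flock() has no timeout,
    so this retries a non-blocking request until `timeout` seconds pass.
    shared=True takes a read (shared) lock; otherwise an exclusive one.
    """
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    # The lock belongs to this open file description, so the file must
    # stay open for as long as the lock is needed.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                fcntl.flock(fd, mode | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                # Someone else holds a conflicting lock; give up at the deadline.
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"no lock on {path} after {timeout}s")
                time.sleep(poll)
        yield fd
    finally:
        # Closing the descriptor releases the lock, even if the body raised.
        os.close(fd)
```

Inside a plugin, the idea would be to run the selected play inside such a with-block; how that hooks into Ansible's strategy classes is left open here.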

I’ve searched through open PRs and on this forum, but haven’t seen anything similar. Before I even prototype this, I thought I’d ask for feedback on any alternatives and/or my idea above.

Thanks in advance.

cevich · May 30, 2017, 7:23pm

In other words:

The job-scheduling system can use a playbook like this:

j.r.hawkesworth · May 31, 2017, 11:38am

Not seen anything like this, but wondering if you can make do by using wait_for on a file.

http://docs.ansible.com/ansible/wait_for_module.html

cevich · May 31, 2017, 1:30pm

Thanks for the suggestion, and taking the time to understand the problem.

File-existence/absence locks have a giant problem: the state is only very loosely tied to the creating or waiting process, i.e. the creator can die but leave the file there. That blocks everyone else indefinitely, because waiters can't check the creator process's state and the file's existence atomically and with certainty.

With a flock, not only can you have read and write locks (a.k.a. shared and exclusive), but having the file open is a requirement for holding the lock. So if the lock-owner ever dies, or somehow never unlocks, the lock is always released (guaranteed) when that process exits.
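The difference between the two kinds of lock can be checked directly. A minimal POSIX-only sketch, assuming `os.fork` is available: one child takes an flock and exits abruptly without unlocking, another creates a marker file (standing in for the file a `wait_for` task would poll) and exits the same way. The file and function names are illustrative.

```python
import fcntl
import os
import tempfile


def child_dies_holding(path):
    """Fork a child that takes an exclusive flock on `path` and exits
    without ever calling LOCK_UN (simulating a crash)."""
    pid = os.fork()
    if pid == 0:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(fd, fcntl.LOCK_EX)
        os._exit(1)  # abrupt exit: no unlock, no close, no cleanup
    os.waitpid(pid, 0)


def can_lock_now(path):
    """Return True if an exclusive lock on `path` is immediately available."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "shared.lock")
    child_dies_holding(lock_path)
    # The kernel dropped the lock when the child exited.
    print("flock free after crash:", can_lock_now(lock_path))

    # Existence-based "lock": the file outlives its creator, so a waiter
    # polling for its absence would wait forever.
    marker = os.path.join(tmp, "busy.marker")
    pid = os.fork()
    if pid == 0:
        open(marker, "w").close()
        os._exit(1)
    os.waitpid(pid, 0)
    print("marker still present after crash:", os.path.exists(marker))
```

Both lines print True: the flock is gone with its owner, while the marker file is left behind.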

That's why I was thinking of doing this at the strategy level. The play author need not ever worry about releasing the lock properly: when the specific play ends, the lock is released. When Ansible exits (for any reason), the lock is released. All sections of the play are protected (fact gathering, variable importing, pre-tasks, roles, tasks, post-tasks, and handlers).

The other choice would be to use the "block" construct; however, this is much more complicated, since blocks all get serialized together in the code and then handed off to (surprise) the strategy handlers. At least that's my reading of the code.

Anyway, I do appreciate your reply. At the same time, I think the (otherwise) lack of replies means:

Many don’t understand the problem
Many don’t have better solutions
Not many have encountered the problem
There's another (existing) / better solution
I'm doing something terribly wrong (design-wise)
I'm being way too perfectionistic about the solution (above)

Brian_Coca · May 31, 2017, 11:31pm

Ansible itself places no restriction on running concurrently. There are
several ways to solve this issue; most depend on your context:

- use ansible-pull with cron or another scheduler on each machine, using
setlock/lockf/etc. to guarantee a single execution at a time on that machine

- if using a single 'management machine'/jumphost, have a wrapper take a
lock on a file, using the target hostname (or a combination of connection
info, as needed) as the unique identifier (possibly host/play)

- wrap the ssh sessions of your 'ansible users' so they create a common
lock on the target host that expires with the session (there are several
ways to do this ... I would avoid PAM unless you have no other option)

- use a job scheduler app to ensure that each target only gets one job
applied at a time, 'the job' being an Ansible invocation

- cowboy hat (I've actually witnessed this) .. only person wearing hat
can be root!

- set up a queuing system that handles the actual execution; people can
submit jobs to the queue, but the queue will only execute one job at a time
(I've done this with incron and tcpserver with directory-based
queues).
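The directory-based queue in the last item can be approximated in Python. This is a swapped-in sketch, not the incron/tcpserver setup described above: the spool layout (one JSON argv list per job file, processed in file-name order, then moved to `done/`) and the function name `run_queue_once` are hypothetical.

```python
import json
import os
import subprocess


def run_queue_once(spool_dir):
    """Run every queued job in `spool_dir`, strictly one at a time.

    Hypothetical layout: each job is a file whose name sorts in submission
    order (e.g. a timestamp prefix) and contains a JSON argv list such as
    ["ansible-playbook", "site.yml"]. Finished jobs move to spool_dir/done/.
    Returns a list of (job_name, returncode) pairs.
    """
    done_dir = os.path.join(spool_dir, "done")
    os.makedirs(done_dir, exist_ok=True)
    results = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        # Skip done/ and dot-files (jobs still being written by a submitter).
        if not os.path.isfile(path) or name.startswith("."):
            continue
        with open(path) as fh:
            argv = json.load(fh)
        # subprocess.run() blocks, so the next job cannot start until this ends.
        rc = subprocess.run(argv).returncode
        os.rename(path, os.path.join(done_dir, name))
        results.append((name, rc))
    return results
```

Submitters would write a dot-file and then rename it into place, so the runner never reads a half-written job. The one-at-a-time guarantee only holds if a single runner drains a given spool directory; wrapping the runner itself in an flock is one way to enforce that.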

Not all locking methods apply to all workflows: some people will want
an absolute lock (only one Ansible run in the network), while others only
care about specific tasks touching the same file on the same host (or
multiple hosts sharing an NFS mount) ... so it is hard to come up with a
scheme that fits all needs.
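With a wrapper on a single management machine, those different granularities mostly come down to what the lock-file name is derived from. A sketch under assumptions: the lock directory, the file-name scheme, and the function `lock_path` are hypothetical; the resulting path would then be held with flock around the ansible-playbook call.

```python
import hashlib
import os


def lock_path(lock_dir, host=None, play=None):
    """Map a locking scope to a lock-file path (hypothetical naming scheme).

    host=None, play=None    -> one global lock (only one Ansible run at all)
    host="web1"             -> one lock per target host
    host="web1", play="db"  -> one lock per host/play pair
    host=None, play="db"    -> one lock per play, across all hosts
    """
    # repr() of the tuple keeps every scope distinct,
    # e.g. ('web1', None) vs ('web1', 'db') vs (None, 'db').
    key = repr((host, play))
    # Hashing keeps the file name safe whatever characters the
    # hostname or play name contains.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(lock_dir, f"ansible-{digest}.lock")
```

Widening or narrowing the lock is then a matter of which arguments the wrapper passes, not of changing the locking mechanism.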

One of Ansible's strengths is that it is a command-line tool, so it is
easy to mix and match with other tools to get the result you need. I would
not build this into Ansible itself, as there are so many tools out
there that already do this and it is extremely context dependent.

cevich · June 1, 2017, 4:19pm

I agree it depends on the context. My specific situation is one where I WANT playbooks/jobs to execute concurrently, except for one little critical section that absolutely must never run concurrently.

It seems like the easiest option from your list would be to split the playbook execution into three parts to isolate the critical section, then use something like /usr/bin/flock around the ansible-playbook command for that section. This doesn't add any dependencies on particular job-scheduling tooling, which is desirable in my case.
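The three-way split can be sketched as a small Python driver that does what wrapping only the middle command with util-linux flock would do. The function name `run_phases` and the example playbook names in the docstring are placeholders for however the original playbook gets divided.

```python
import fcntl
import subprocess


def run_phases(pre, critical, post, lock_file):
    """Run three commands in order; only `critical` runs under the lock.

    Each argument is an argv list, e.g. ["ansible-playbook", "critical.yml"]
    (placeholder names). Stops at the first non-zero exit code and returns it.
    """
    rc = subprocess.run(pre).returncode
    if rc != 0:
        return rc
    with open(lock_file, "a") as fh:
        # Blocks until no other job holds the lock; closing fh releases it.
        fcntl.flock(fh, fcntl.LOCK_EX)
        rc = subprocess.run(critical).returncode
    if rc != 0:
        return rc
    return subprocess.run(post).returncode
```

The pre and post phases of different jobs still overlap freely; only the middle phase is serialized, and a crashed driver releases the lock automatically because the file is closed when the process exits.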

Thanks for the help.

Brian_Coca · June 1, 2017, 4:34pm

Good luck,

I'm partial to the cowboy hat .. or the 'sombrero' variant (also seen this).
