Hi,
I have a use case where playbooks are sometimes (unpredictably) run concurrently by a simple job-scheduling system. Most of them touch a common set of cloud systems on their way to affecting other cloud systems. This means I need a way to synchronize actions with respect to the shared systems, so that roles are applied to them in an orderly and predictable fashion. Using Ansible Tower is out of scope in this case, and system-level file locking around each ansible-playbook command would be too coarse.
My thinking was to implement a strategy plugin (a variant of linear or free) that could accept a lock file path parameter (and maybe a timeout), and then use Python's standard fcntl.flock() call to synchronize selected plays within the playbooks.
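A minimal sketch of the locking primitive such a plugin could wrap, assuming a POSIX host (the fcntl module does not exist on Windows). The helper name `flock_with_timeout` and its `timeout`, `shared` and `poll_interval` parameters are hypothetical; they stand in for the lock file path and timeout the plugin would accept. Because fcntl.flock() has no built-in timeout, the sketch polls with LOCK_NB until a deadline passes.

```python
import errno
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def flock_with_timeout(path, timeout=None, shared=False, poll_interval=0.1):
    """Hold an flock() on `path` for the duration of the with-block.

    Hypothetical helper: `path` and `timeout` play the role of the lock
    file path and timeout a play-level lock could accept.
    """
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    # The file only needs to exist; its contents are never read.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if timeout is None:
            fcntl.flock(fd, mode)  # block until the lock is granted
        else:
            deadline = time.monotonic() + timeout
            while True:
                try:
                    # LOCK_NB fails immediately instead of blocking, which
                    # lets us enforce our own deadline.
                    fcntl.flock(fd, mode | fcntl.LOCK_NB)
                    break
                except OSError as exc:
                    if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                        raise
                    if time.monotonic() >= deadline:
                        raise TimeoutError(f"no lock on {path} after {timeout}s")
                    time.sleep(poll_interval)
        yield fd
    finally:
        # Closing the descriptor releases the lock; so does process exit.
        os.close(fd)
```

The plays touching the shared systems would then run inside something like `with flock_with_timeout("/var/lock/shared-cloud.lock", timeout=300):` (the path is made up). Passing `shared=True` gives the read/shared mode, so several readers can proceed together while a writer waits.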
I’ve searched through open PRs and on this forum, but haven’t seen anything similar. Before I even prototype this, I thought I’d ask for feedback on any alternatives and/or my idea above.
Thanks in advance.
In other words:
The job-scheduling system could use a playbook like this:
I haven't seen anything like this, but I wonder whether you could make do by using wait_for on a file.
http://docs.ansible.com/ansible/wait_for_module.html
Thanks for the suggestion, and taking the time to understand the problem.
File existence/absence locks have a giant problem: the state is only very loosely tied to the creating or waiting process. For example, the creator can die but leave the file behind. That blocks everyone else indefinitely, because the waiters can't atomically and reliably check both whether the creator process is still alive and whether the file exists.
With flock, not only can you have read and write locks (a.k.a. shared and exclusive), but holding the lock requires having the file open. So if the lock owner ever dies, or somehow never unlocks, the kernel releases the lock when that process exits.
That’s why I was thinking of doing this at the strategy level. The play author never needs to worry about releasing the lock properly: when the specific play ends, the lock is released, and when Ansible exits (for any reason), the lock is released. All sections of the play are protected (fact gathering, variable importing, pre-tasks, roles, tasks, post-tasks, and handlers).
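The release-on-exit property of flock can be checked directly. This sketch (POSIX only; the lock file name is arbitrary) starts a child process that takes an exclusive flock and then hangs without ever unlocking, confirms the parent cannot get the lock, kills the child with SIGKILL, and tries again. Strictly speaking, flock(2) ties the lock to the open file description, so it is released once every descriptor referring to it is closed; a forked child that inherited the descriptor would keep the lock alive after its parent exits.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child process: take an exclusive lock, announce it, then hang "forever"
# without ever calling LOCK_UN (simulating a stuck or crashing lock owner).
CHILD = (
    "import fcntl, os, sys, time\n"
    "fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o644)\n"
    "fcntl.flock(fd, fcntl.LOCK_EX)\n"
    "print('locked', flush=True)\n"
    "time.sleep(60)\n"
)


def try_lock(path):
    """Return True if a non-blocking exclusive lock on `path` succeeds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)  # also drops the lock if we got it


def demo(path):
    child = subprocess.Popen([sys.executable, "-c", CHILD, path],
                             stdout=subprocess.PIPE, text=True)
    assert child.stdout.readline().strip() == "locked"
    while_owner_alive = try_lock(path)
    child.kill()  # SIGKILL: the child gets no chance to clean up
    child.wait()
    child.stdout.close()
    after_owner_died = try_lock(path)
    return while_owner_alive, after_owner_died


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(os.path.join(d, "critical.lock")))  # (False, True)
```

The lock file itself is still on disk after the child dies. Unlike an existence-based lock, that is harmless, because only the kernel-held lock matters.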
The other choice would be to use the “block” construct. However, this is much more complicated, because blocks all get serialized together in the code and then handed off to (surprise) the strategy handlers. At least, that’s my reading of the code.
Anyway, I do appreciate your reply. At the same time, I think the (otherwise) lack of replies could mean that:
- Many don’t understand the problem
- Many don’t have better solutions
- Not many have encountered the problem
- There’s another (existing) or better solution
- I’m doing something terribly wrong (design-wise)
- I’m being way too perfectionistic about the solution (above)
Ansible itself places no restriction on running concurrently. There are
several ways to solve this issue, and most depend on your context:
- use ansible-pull with cron/a scheduler on each machine, using
setlock/lockf/etc. to guarantee a single execution on that machine
- if you use a single 'management machine'/jumphost, have a wrapper take
a lock on a file named after the target hostname (or a combination of
connection info, as needed) as the unique identifier (possibly host/play)
- wrap the ssh sessions of your 'ansible users' to create a common lock on
the target host that expires with the session (there are several ways to do
this ... I would avoid PAM unless you have no other option)
- use the job scheduler app to ensure that each target only gets one job
applied at a time, 'the job' being an Ansible invocation
- cowboy hat (I've actually witnessed this): only the person wearing the hat
can be root!
- set up a queuing system that handles the actual execution; people can
submit jobs to the queue, but the queue will only execute one job at a time
(I've done this with incron and tcpserver with directory-based
queues).
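A sketch of the management-machine wrapper option, assuming a POSIX host. `run_with_host_locks`, the `/tmp/ansible-host-locks` directory and the one-file-per-hostname naming are illustrative assumptions, not an existing tool. Taking the locks in sorted order keeps two wrappers that share some hosts from deadlocking.

```python
import fcntl
import os
import subprocess
from contextlib import ExitStack

# Hypothetical lock directory on the management machine/jumphost.
LOCK_DIR = "/tmp/ansible-host-locks"


def run_with_host_locks(hosts, cmd, lock_dir=LOCK_DIR):
    """Run `cmd` while holding an exclusive flock() for every target host.

    One lock file per hostname; locks are taken in sorted order so that two
    wrappers sharing some hosts cannot deadlock by locking in opposite orders.
    Returns the command's exit code.
    """
    os.makedirs(lock_dir, exist_ok=True)
    with ExitStack() as stack:
        for host in sorted(set(hosts)):
            path = os.path.join(lock_dir, f"{host}.lock")
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            stack.callback(os.close, fd)  # closing the fd drops the lock
            fcntl.flock(fd, fcntl.LOCK_EX)  # wait until this host is free
        # The locks stay held until the command finishes and the stack unwinds.
        return subprocess.run(cmd).returncode
```

A call might look like `run_with_host_locks(["web1", "db1"], ["ansible-playbook", "-l", "web1,db1", "site.yml"])`, where the hostnames and playbook are placeholders. Python 3 opens these descriptors as non-inheritable, so the ansible-playbook child does not hold the locks itself; they are released when the wrapper closes them or dies.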
Not all locking methods apply to all workflows: some people will want
an absolute lock (only one Ansible run in the whole network), while others
only care about specific tasks on the same file on the same host (or on
multiple hosts sharing an NFS mount) ... so it is hard to come up with a
scheme that fits all needs.
One of Ansible's strengths is that it is a command-line tool, so it is
easy to mix and match with other tools to get the result you need. I would
not build this into Ansible itself, as there are so many tools out
there that already do this, and it is extremely context-dependent.
I agree that it depends on the context. In my specific situation, I WANT playbooks/jobs to execute concurrently, except for one small critical section that absolutely must never run concurrently.
It seems the easiest option from your list would be to split the playbook execution into three parts to isolate the critical section, and then use something like /usr/bin/flock around the ansible-playbook command for that section. This doesn’t add any dependency on particular job-scheduling tooling, which is desirable in my case.
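For a single lock file, the shell form is simply `flock LOCKFILE ansible-playbook critical.yml`. The same three-phase split can be sketched in Python as below, assuming a POSIX host; the function name `run_phases`, the playbook names and the lock path are hypothetical.

```python
import fcntl
import os
import subprocess


def run_phases(before, critical, after, lock_path):
    """Run three commands in order; only `critical` runs under an exclusive flock.

    Roughly what `flock LOCKFILE <critical command>` does for the middle
    phase. Stops at the first phase that fails and returns its exit code.
    """
    rc = subprocess.run(before).returncode  # concurrent-safe head
    if rc != 0:
        return rc
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # wait for any other critical section
        rc = subprocess.run(critical).returncode
    finally:
        os.close(fd)  # release before the concurrent-safe tail runs
    if rc != 0:
        return rc
    return subprocess.run(after).returncode
```

A job would then call something like `run_phases(["ansible-playbook", "pre.yml"], ["ansible-playbook", "critical.yml"], ["ansible-playbook", "post.yml"], "/var/lock/critical-section.lock")`. Releasing the lock before the last phase keeps the serialized window as short as possible, which matches the goal of letting everything except the critical section run concurrently.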
Thanks for the help.
Good luck,
I'm partial to the cowboy hat .. or the 'sombrero' variant (I've also seen this).