I have a strange, intermittent issue: I can connect to any one of
these hosts via Ansible and run a playbook, but running it on more
than one host fails, sometimes with "unreachable", sometimes with
module errors. Re-running the playbook on just the failing host works.
Any hints on how to solve this, or how to look for the error? I thought
about checking SSH multiplexing, pipelining and similar stuff, but
without an idea of where to look I'm kind of in the dark here...
One thing that once hit me: on Mac OS X the open-file ulimit was only 256. I have about 140 hosts, and when our company decided to use a jump host I suddenly ran into problems because the pipelined connections now hit this limit.
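
A quick way to see whether the open-file limit mentioned above could be a factor is to compare it against a rough estimate of what a run needs. The sketch below is only that, a sketch: the function name `fd_headroom` and the figure of 4 descriptors per parallel connection are my own assumptions for illustration, not numbers taken from Ansible or OpenSSH, so adjust them to what you actually observe.

```python
import resource


def fd_headroom(forks, fds_per_fork=4):
    """Compare the soft open-file limit with a rough estimate of need.

    fds_per_fork is a hypothetical guess (sockets plus pipes per
    parallel connection), not a value documented by Ansible.
    """
    # Soft limit is what the current process is actually held to.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = forks * fds_per_fork
    unlimited = soft == resource.RLIM_INFINITY
    return soft, needed, unlimited or needed < soft


soft, needed, ok = fd_headroom(forks=140)
print(f"soft limit: {soft}, estimated need: {needed}, enough: {ok}")
```

If the estimate comes out above the soft limit, raising it in the shell that starts the playbook (for example `ulimit -n 1024`) before the run is a cheap thing to try.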
As the hosts are LXC containers running on the jump host (and are
only reachable via the jump host), I guess it might be due to memory
usage when the commands are run on all hosts simultaneously.
I'm in the middle of trying out some things, but no, a real solution
has not presented itself yet.
Try toggling (and maybe disabling) SSH multiplexing and pipelining,
and running the playbooks with a lower forks and/or serial setting...
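
To make that concrete, here is a small sketch that prints a conservative `ansible.cfg` for testing: fewer forks, pipelining off, and SSH multiplexing off via `ControlMaster=no`. The helper name `conservative_ansible_cfg` and the value of 5 forks are placeholders I picked for illustration. Note also that setting `ssh_args` replaces whatever is configured there already, so if your jump host is set up through `ssh_args` rather than `~/.ssh/config`, those options would need to be carried over.

```python
import configparser
import io


def conservative_ansible_cfg(forks=5):
    """Render an ansible.cfg that reduces parallelism for debugging.

    The default of 5 forks is an arbitrary starting point for
    illustration, not a recommended value.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["defaults"] = {"forks": str(forks)}
    cfg["ssh_connection"] = {
        # Turn pipelining off while narrowing down the cause.
        "pipelining": "False",
        # Disable SSH connection sharing (multiplexing).
        "ssh_args": "-o ControlMaster=no",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(conservative_ansible_cfg())
```

If the failures disappear with these settings, re-enable one option at a time to find out which one triggers them. The `serial` part is set per play in the playbook itself rather than in `ansible.cfg`.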