Hi,
We are using Ansible to deploy many different services on a large number of servers.
We developed a backend which starts an Ansible playbook when the user sends a REST request.
In our project, we deploy entire platforms, and we have a web GUI to monitor the deployment.
As we need “per service” or “per host” granularity to get some information during the deployment of each service (success, failure, etc.), we decided to run one ansible-playbook process per inventory host, so that our manager can get the return code from each process.
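For illustration, the one-process-per-host approach could be sketched roughly as follows. This is a simplified sketch and not the actual backend: the playbook name `site.yml`, the inventory name `inventory.ini`, and the helper names `build_command` and `deploy` are hypothetical. Only the `-i` and `--limit` options of ansible-playbook are relied on.

```python
import subprocess


def build_command(host, playbook="site.yml", inventory="inventory.ini"):
    # Hypothetical file names; --limit restricts the run to a single host.
    return ["ansible-playbook", "-i", inventory, playbook, "--limit", host]


def deploy(hosts, runner=subprocess.Popen):
    # Start one ansible-playbook process per host, all at the same time.
    procs = {host: runner(build_command(host)) for host in hosts}
    # Collect the return code of each process once it has finished.
    return {host: proc.wait() for host, proc in procs.items()}


if __name__ == "__main__":
    # Example host names (assumptions); a non-zero code marks a failed run.
    for host, code in deploy(["web01", "web02", "db01"]).items():
        print(host, "OK" if code == 0 else f"FAILED (rc={code})")
```

Each entry in `procs` is a full ansible-playbook parent process, which is why the cost grows linearly with the number of hosts.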
The problem is that when we deploy to more than 20 servers, there are 20+ ansible-playbook parent processes, and they are VERY resource-consuming (load = 50), so some processes end up being killed because of OOM issues.
So we decided to use the “free” strategy (“strategy: free”) to run only one playbook for all hosts, but then we lost the “per host” return code granularity, and we really need it.
We could add more CPU/RAM, but that doesn’t seem to be a scalable solution.
Our goal is to deploy to 100+ hosts simultaneously, as fast as possible.
We don’t want to wait for the end of the playbook to detect errors on some hosts; we would prefer to detect errors as soon as possible so that we can re-run only the failed hosts.
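To make the requirement concrete, here is a minimal sketch of the kind of early per-host detection we are after, assuming a single ansible-playbook process whose standard output is read line by line. It assumes the default stdout callback, which reports failures on lines such as `fatal: [web01]: FAILED! => {...}` or `fatal: [web02]: UNREACHABLE! => {...}`. The names `watch` and `on_failure` are hypothetical, and the sketch does not distinguish failures that are later ignored (`ignore_errors`) or rescued.

```python
import re

# Assumed format of failure lines printed by the default stdout callback.
# A delegated task is assumed to appear as "[host -> delegate]".
FAILURE_RE = re.compile(
    r"^fatal: \[([^\]]+?)(?: -> [^\]]+)?\]: (FAILED|UNREACHABLE)!"
)


def watch(lines, on_failure):
    """Call on_failure(host, kind) the first time a host reports a failure."""
    failed = {}
    for line in lines:
        match = FAILURE_RE.match(line)
        if match and match.group(1) not in failed:
            host, kind = match.group(1), match.group(2)
            failed[host] = kind
            on_failure(host, kind)
    return failed


if __name__ == "__main__":
    import subprocess

    # Hypothetical playbook and inventory names.
    proc = subprocess.Popen(
        ["ansible-playbook", "-i", "inventory.ini", "site.yml"],
        stdout=subprocess.PIPE,
        text=True,
    )
    watch(proc.stdout, lambda host, kind: print("early failure:", host, kind))
    proc.wait()
```

Parsing human-readable output is fragile; it is shown here only to describe the behaviour we want, namely a notification per host as soon as it fails, without waiting for the play recap.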
Does Ansible Tower solve this issue? If not, how could we solve it?
Thanks.