We have about 500 hosts, and before running the actual playbooks we were testing how many SSH connections run in parallel at a given point, since we were hitting bottlenecks.
Ansible version being used: 2.10.5
It's a simple playbook which runs the "ls" command on all the hosts.
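For reference, the test is roughly the following; file names, the fork count on the command line, and the disabled fact gathering are placeholders rather than our exact setup:

```yaml
# ssh_test.yml -- minimal playbook that just runs "ls" over every SSH connection.
# gather_facts is disabled here to keep each host to a single trivial task.
- hosts: all
  gather_facts: false
  tasks:
    - name: Run a trivial command on every host
      command: ls
```

Run with something like:

```
ansible-playbook -i inventory -f 200 ssh_test.yml
```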
200 forks will likely not be achievable. Although Ansible will attempt to spawn that many forks based on your setting, you will create so much CPU contention on the single core that manages all of the forks that you are unlikely ever to see that many running at once.
Somewhere between 50 and 75 is often the most that people will ever see.
The problem isn't really with forking, or with how many cores you have. It's that a single process, bound to a single core, is responsible for spawning the forks, monitoring them, and handling the responses coming back from all of them.
As a result, that single process hits CPU contention easily at higher fork counts.
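If you want to experiment with lower counts, the fork limit can be set in ansible.cfg or overridden per run with -f; a sketch of the config, with 50 as an arbitrary starting point to tune from rather than a recommendation for your hardware:

```ini
# ansible.cfg
[defaults]
forks = 50
```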
As for why the callback reports inconsistent results: it was most likely never tested with the free strategy, and it was not written in a way that allows it to properly track the timing of the tasks under that strategy.
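The callback isn't named here, so assume for the sake of illustration that it is a timing callback such as ansible.posix.profile_tasks. A quick way to see the interaction is to enable it and run the same play once with the default linear strategy and once with the free strategy, then compare the per-task timings:

```ini
# ansible.cfg -- 2.10 still uses callback_whitelist to enable extra callbacks;
# ansible.posix.profile_tasks is an assumption about which callback is meant.
[defaults]
callback_whitelist = ansible.posix.profile_tasks
```

```yaml
# Run once as-is (free) and once with the strategy line removed (linear),
# and compare what the callback reports for each run.
- hosts: all
  gather_facts: false
  strategy: free
  tasks:
    - name: Run a trivial command
      command: ls
```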
Those numbers are completely unrealistic, and memory is hardly the limiting factor anyway. A single fork can often consume 1 GB of memory on its own. 100 forks per 4 GB of memory only covers the allocation needed to load Python itself (about 40 MB per fork) without doing any actual work.
The limit is generally the computing power of a single core of your CPU, and your 30 effective forks is what I would consider typical performance.
I could talk about this topic for days. But you are getting roughly what I’d expect.
Maybe watch this presentation from AnsibleFest 2019: