I ran into an interesting issue today with the service module, which led me to a chunk of code I don’t quite understand. I am posting here before filing a GitHub issue, simply to see if this is actually behavior someone desires.
You can see the relevant code here:
https://github.com/ansible/ansible/blob/devel/library/system/service#L724
For one, if the ‘stop’ operation fails, the module continues with the ‘start’ routine. This strikes me as problematic: if a service failed to stop, running ‘start’ again will typically end with a successful return code, since the service is already running. However, you will have failed to actually restart the service. Worse, the failure is hidden by the following code:
    # merge return information
    if rc1 != 0 and rc2 == 0:
        rc_state = rc2
        stdout = stdout2
        stderr = stderr2
    else:
        rc_state = rc1 + rc2
        stdout = stdout1 + stdout2
        stderr = stderr1 + stderr2
Here rc1 is the return code for the ‘stop’ command, and rc2 is the return code for the ‘start’ command. If ‘stop’ fails but ‘start’ returns 0, the first branch discards rc1, stdout1, and stderr1 entirely, so the error is never surfaced.
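To make the masking concrete, here is a minimal standalone sketch of that merge logic. The wrapper function name merge_restart_results and the sample output strings are my own, purely for illustration; they are not part of the module.

```python
def merge_restart_results(rc1, stdout1, stderr1, rc2, stdout2, stderr2):
    """Hypothetical wrapper around the merge logic quoted from the module."""
    if rc1 != 0 and rc2 == 0:
        # stop failed, start succeeded: only the start results are kept
        rc_state = rc2
        stdout = stdout2
        stderr = stderr2
    else:
        rc_state = rc1 + rc2
        stdout = stdout1 + stdout2
        stderr = stderr1 + stderr2
    return rc_state, stdout, stderr


# 'stop' fails (rc1 = 1); 'start' returns 0 because the service never stopped.
# The messages below are made up for the example.
result = merge_restart_results(1, "", "stop failed\n", 0, "already running\n", "")
print(result)
```

The merged return code comes back as 0 and the ‘stop’ error text is gone, so the caller has no way to tell that the restart did not happen.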
As an example of why this is potentially dangerous, in my case our service failed to stop due to a bug. The subsequent ‘start’ command returned 0, because the service was still running. Despite the fact that no restart had happened, the Ansible run reported that the restart was a success. Had I run this against a production cluster, I would have left the machines in a bad state and never known better.
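For comparison, here is a rough sketch of the behavior I would have expected, where a failed ‘stop’ is reported even if ‘start’ returns 0. This is only an illustration under my own assumptions (the function name strict_merge and the sample messages are hypothetical), not a proposed patch.

```python
def strict_merge(rc1, stdout1, stderr1, rc2, stdout2, stderr2):
    """Hypothetical alternative: never discard the 'stop' results."""
    # Keep both outputs so nothing from either command is lost.
    stdout = stdout1 + stdout2
    stderr = stderr1 + stderr2
    # Report the 'stop' failure if there was one, otherwise the 'start' result.
    rc_state = rc1 if rc1 != 0 else rc2
    return rc_state, stdout, stderr


# Same scenario as my case: 'stop' fails, 'start' returns 0.
strict_result = strict_merge(1, "", "stop failed\n", 0, "already running\n", "")
print(strict_result)
```

One caveat I can think of: if some init scripts return non-zero when asked to stop a service that is not running, a strict check like this would turn a restart of a stopped service into a failure. I am only guessing, but that may be what the current code is trying to accommodate.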
Am I missing a reason this is the way you would want this module to behave? Thanks for the help!