Service Module: Service state recognition

    running = False
    if status_stdout.find("stopped") != -1 or rc == 3:
        running = False
    elif status_stdout.find("running") != -1 or rc == 0:
        running = True
    elif name == 'iptables' and status_stdout.find("ACCEPT") != -1:
        # iptables status command output is lame
        # TODO: lookup if we can use a return code for this instead?
        running = True

I’ve tested on Ubuntu 10.04 because we mainly use this release by now.

Let me give you a few examples of the output of the service status command.

MySQL:

  • Output in stopped state: mysql stop/waiting

Ok so it would return “running = False” in this case, which is correct.

  • Return Code in stopped state: 0

  • Output in started state: mysql start/running, process 20846

  • Return Code in started state: 0

So the math above would find running first in the above logic and return running = True, which is ALSO correct.

Apache2:

  • Output in stopped state: Apache is NOT running.

  • Return Code in stopped state: 1

  • Output in started state: Apache is running (pid 21454).

  • Return Code in started state: 0

The problem here is that the current service module always sees my Apache as running, even when it is not.
This happens because Ubuntu uses two kinds of init scripts: the old init style and the new upstart.

This means the code could also check that the string does not contain “not”. Patches accepted, sounds like a trivial fix.

Maybe it would be a better solution to check first if the service has an upstart script. (e.g. with initctl list)
With upstart the service status output is standardized and you can check for the keywords.
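A minimal sketch of what a keyword check on upstart output could look like. It assumes the `job goal/state[, process PID]` shape seen in the two MySQL lines above; the regular expression and the names `parse_upstart_status` and `upstart_running` are hypothetical and not taken from the service module.

```python
import re

# Assumed line shape, based on the samples in this thread:
#   "mysql stop/waiting"
#   "mysql start/running, process 20846"
UPSTART_LINE = re.compile(
    r"^(?P<job>\S+) (?P<goal>\w+)/(?P<state>[\w-]+)(?:, process (?P<pid>\d+))?$"
)


def parse_upstart_status(line):
    """Return a dict with job, goal, state and pid, or None if the line
    does not look like upstart output (e.g. free text from an old init script)."""
    match = UPSTART_LINE.match(line.strip())
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "job": match.group("job"),
        "goal": match.group("goal"),
        "state": match.group("state"),
        "pid": int(pid) if pid else None,
    }


def upstart_running(line):
    """True/False for upstart-style lines, None when the format is not recognised."""
    parsed = parse_upstart_status(line)
    if parsed is None:
        return None
    return parsed["state"] == "running"
```

Returning None for unrecognised lines keeps the old init scripts out of this code path, so their free text never gets mistaken for upstart output.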

Not sure this is necessary per the above.

The output of the old init scripts is just “free human text” and therefore not reliable for a keyword based check.

Yeah though I think the “not” is the only case we have had of this so far.

P.S.: Sorry for the long post, but this topic is really important to me and I think it’s a core function of Ansible that’s broken.

FWIW, you’re the first one out of hundreds to mention it.

Patch is pretty simple though. Send me a pull request.

MySQL:

  • Output in stopped state: mysql stop/waiting

Ok so it would return “running = False” in this case, which is correct.

No, it would not return “running = False”. It would return “running = True”, because the output contains “stop” (not “stopped”, which is what the find is looking for) and because of the return code of the service binary. The binary successfully reported that the service is not running, so its return code is 0, which leads to “running = True” in this if clause.
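To make this easy to check, here is the quoted logic wrapped in a function and run against the sample outputs and return codes from this thread. The function name `original_running` and the `samples` list are only for illustration; the branches are the ones quoted at the top of the thread.

```python
def original_running(name, status_stdout, rc):
    """Reproduce the state check quoted at the top of this thread."""
    running = False
    if status_stdout.find("stopped") != -1 or rc == 3:
        running = False
    elif status_stdout.find("running") != -1 or rc == 0:
        running = True
    elif name == "iptables" and status_stdout.find("ACCEPT") != -1:
        running = True
    return running


# (service name, status output, return code) as reported on Ubuntu 10.04
samples = [
    ("mysql", "mysql stop/waiting", 0),
    ("mysql", "mysql start/running, process 20846", 0),
    ("apache2", "Apache is NOT running.", 1),
    ("apache2", "Apache is running (pid 21454).", 0),
]

for name, out, rc in samples:
    print(name, repr(out), rc, "->", original_running(name, out, rc))
```

All four samples come out as running: the stopped MySQL because of rc 0, the stopped Apache because “NOT running” still contains “running”.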


This means the code could also check that the string does not contain “not”. Patches accepted, sounds like a trivial fix.

This would fix the problem for apache.
But this would be only for this one case. (another example at the bottom of this text)

Try atop on ubuntu 10.04 for example.
The atop init script is not yet an upstart job (on 10.04) and has no status keyword.
So there are no words like “running” or “stopped” and the return code is 1.
We could not determine the running state of this process, but we still state that it is not running and we would run the start script, which will fail when it’s already running.

I know, it’s not the fault of ansible that there are crappy init scripts around, but they’re out there and I’m trying to figure out a better way of handling them.

Maybe I should just try it in a branch of my fork on github and you could review it when I’m done.
Would that be ok for you?

That sounds good

I am ok with some upstart specific code, though we should also add the “not” code above and also change the search to look for “stop” and not stopped – since not everyone is using upstart.

–Michael

Hi,

I’ve created a patch now.

Basically what I’ve done is:

I created a method for the state recognition because it got bigger.
There’s now an initial state “None”, which prevents scripts from being run by mistake.

Then I’ve ordered the state recognition methods by how reliable their output is.
Upstart is first (when it’s there), because the output is always consistent.
If the state is not found via upstart, it is checked via the init script’s return code.

If it is not found that way either, it is checked via the output of the init script.
Additionally, the service name is stripped from the init script output and the output is converted to lower case.
This should prevent false positives in case a daemon is called “notify-daemon” or something like that.
Otherwise the search for the word “not” would lead to a false positive…
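The ordering described above could be sketched roughly like this. This is not the code from the commit; all function names, the exact keywords, and the choice to map only return codes 0 and 3 to a definite state are assumptions made for illustration.

```python
def state_from_upstart(upstart_stdout):
    """Most reliable source: upstart prints 'job goal/state'."""
    if not upstart_stdout:
        return None
    if "start/running" in upstart_stdout:
        return True
    if "stop/waiting" in upstart_stdout:
        return False
    return None


def state_from_rc(rc):
    """Assumption: only 0 (running) and 3 (not running) are treated as definite."""
    if rc == 0:
        return True
    if rc == 3:
        return False
    return None


def state_from_output(name, status_stdout):
    """Least reliable source: free text from the init script.

    The service name is removed and the text lower-cased first, so that a
    daemon called 'notify-daemon' does not trigger the 'not' check.
    """
    cleaned = status_stdout.lower().replace(name.lower(), "")
    if "running" in cleaned:
        return "not" not in cleaned
    if "stop" in cleaned:
        return False
    return None


def detect_state(name, status_stdout, rc, upstart_stdout=None):
    """Return True, False, or None (unknown), trying the safest check first."""
    checks = (
        lambda: state_from_upstart(upstart_stdout),
        lambda: state_from_rc(rc),
        lambda: state_from_output(name, status_stdout),
    )
    for check in checks:
        state = check()
        if state is not None:
            return state
    return None  # unknown: the caller should not blindly start or stop anything
```

With these assumptions, a stopped upstart MySQL and a stopped Apache both come out as not running, while an init script with no usable output and return code 1 (like atop) stays unknown.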

I did leave the special section for iptables in there, but I think these special cases might be better covered by writing init scripts that do have a status method.
Atop on Ubuntu 10.04 is just the same. It has no status method and therefore always sends the response code 1.
Since atop creates a pid file when running it should be quite simple to fix that init script instead of tweaking this in ansible.

So here’s the commit. Works very well for me.
https://github.com/gottwald/ansible/commit/d17dbc801b77fc96a5df3edcf9285bb69c32b366

Could you please test it on your systems?

Tell me if you want to change something or if I should send you the pull request.

Best Regards

Ingo

Yeah, send the pull request. It’s easier for folks to test when it’s already in the main tree.

Thanks!

done.

Hi,

Hmm. I also have nginx on Ubuntu 12.04, which gives the following
for its status when it's not running:

(ansible)ubuntu@ubuntu:~/ansible$ sudo /etc/init.d/nginx status
* could not access PID file for nginx
(ansible)ubuntu@ubuntu:~/ansible$ echo $?
4

there are no words like "running" or "stopped" and the return code is 1

nginx here returns 4 instead of 1.

My simple solution is to fix the init script (hmm. but there might be
more init scripts out there that might be doing the same?)

Hi,

you did the right thing.
This is a bug in the init file and it should get fixed in the init file.

According to the Linux Standard Base Core Specification, the exit code “4” stands for “program or service status is unknown”
So when the init file itself says that the status is unknown, we just have no other choice.
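For reference, the exit codes that the LSB Core Specification defines for the `status` action of an init script can be written down as a small lookup table. The dictionary and function names are hypothetical; the meanings of codes 0 to 4 follow the LSB text.

```python
# Exit codes for the "status" action, per the LSB Core Specification
LSB_STATUS = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def describe_status_rc(rc):
    """Translate a status return code into its LSB meaning."""
    return LSB_STATUS.get(rc, "reserved or distribution/application specific")


print(describe_status_rc(4))
```

This is why the nginx result of 4 cannot be read as “stopped”: the init script itself is reporting that it does not know.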

Seems like ubuntu has a few crappy init files.
Especially the ones not ported to upstart.

Regards,

Ingo

Hi,

I just ran into this with nginx. Has anyone added bug reports to Ubuntu/Debian, perhaps with their fixed init scripts?

Hmm.. I'm not sure if there is a bug report for this.
My idea is to create an init script for nginx and include it in
the playbook setup for now.

Got another problem with service status for uwsgi.
As Ingo pointed out, there are more crappy init scripts out there. :)

I’m not sure what the issue is with the nginx startup script. I reported an issue to Ubuntu (since they package it), but it was marked invalid: https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1023389

(They mentioned checking the return value, not sure what Ansible does with this right now).

Today I ran into an issue where I tried to restart the (stopped) nginx service from ansible, and it didn’t start. Haven’t tried to reproduce it, though.

Take care,

Lorin

(They mentioned checking the return value, not sure what Ansible does with this right now).

With ansible, when /etc/init.d/nginx status is run, it returns 4 (which
means unknown) when nginx is not running.
Could the service module be coded to treat a non-zero exit status as
not running?

Nope, that would be a very bad idea!
An unknown state is an unknown state, not a stopped state:

Imagine you have a running process and some guy accidentally removes your pid file.
If we treat this as a stopped state you would soon have 2 running processes, which in some cases can cause big trouble…
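A small sketch of that rule, assuming the detected state is True, False, or None for unknown. The function name `action_for` and the returned strings are hypothetical and only illustrate the decision, not what the service module actually does.

```python
def action_for(desired, running):
    """Decide what to do for a desired state of 'started' or 'stopped'.

    running is True, False, or None when the state could not be determined.
    """
    if running is None:
        # Unknown is not the same as stopped: refuse to act rather than
        # risk starting a second copy of a process that is already up.
        return "fail"
    if desired == "started" and not running:
        return "start"
    if desired == "stopped" and running:
        return "stop"
    return "nothing"


print(action_for("started", None))
```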

I registered at launchpad and left a comment on this bug.
Please read it, there’s the information included how you can fix this properly.
It’s just 1 line you have to change, since one LSB function is used a little bit incorrectly.

Regards

Ingo

Ingo, nice comments. I agree with them. :)