Hello everyone,
Two days ago, I created a new issue (#84607) on the ansible/ansible repository regarding the ansible.builtin.service_facts module.
I also posted about it on r/ansible.
While working with hosts that use systemd, I noticed that the module can only return two possible state values: running and stopped.
In the module's code, within the SystemctlScanService class, the state_val value is set to stopped by default and only changes to running if the SUB parameter from systemctl is explicitly running.
At line 276:
state_val = "stopped"
At lines 292-293:
if fields[3] == "running":
    state_val = "running"
fields[3] is the actual SUB state from this systemctl command:
systemctl list-units --no-pager --type service --all --plain
This approach means the module will never reflect the actual state of the service. Instead, any service that isn't running is simply reported as stopped.
Currently, the SUB parameter from systemctl can represent various states, such as:
- dead
- exited
- reloaded
- failed
This makes it impossible to use the module for performing precise checks on the state of services managed by systemd and, consequently, to implement solutions based on their actual state.
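To make the collapse concrete, here is a minimal sketch of the current mapping (the sample unit lines below are made up for illustration, not taken from a real host):

# Minimal sketch of the current mapping in SystemctlScanService._list_from_units.
# Columns in the systemctl output are: UNIT LOAD ACTIVE SUB DESCRIPTION.
sample_output = """\
sshd.service     loaded active   running Secure Shell server
nginx.service    loaded failed   failed  nginx web server
backup.service   loaded inactive dead    Nightly backup
oneshot.service  loaded active   exited  One-shot setup task
"""

for line in sample_output.splitlines():
    fields = line.split()
    state_val = "stopped"
    if fields[3] == "running":
        state_val = "running"
    # failed, dead, and exited all end up reported as "stopped" here
    print(fields[0], fields[3], "->", state_val)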
Am I the only one who thinks this approach is overly simplistic?
It really depends on what you want to do with the result. If you are just interested in whether the service is running or not, the current values are perfect. (Especially the value reloaded is often not helpful; you usually want running instead.) If you want more details on why a service is not running, they are not.
It's probably best to add a new return value that gives the exact state. Changing the existing state potentially breaks a lot of playbooks/roles that assume there are only running or stopped.
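For example, consumers could stay compatible with such an additional field by falling back to the coarse value when the new one is absent; a minimal sketch, assuming the new field were called sub_state (just a placeholder name at this point):

# Hypothetical sketch: how fact-consuming code could stay compatible
# if a new field (here called "sub_state") were added alongside "state".
def effective_state(service_fact):
    # Old behaviour: only "running" / "stopped".
    coarse = service_fact["state"]
    # New detail, if the module provides it; fall back to the coarse value.
    return service_fact.get("sub_state", coarse)

old_fact = {"name": "nginx.service", "state": "stopped"}
new_fact = {"name": "nginx.service", "state": "stopped", "sub_state": "failed"}

print(effective_state(old_fact))  # -> stopped (unchanged for existing facts)
print(effective_state(new_fact))  # -> failed (extra detail when available)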
(There are more states currently returned BTW: on my system I also see active, inactive, and unknown.)
What I expect from a module like this is that the state value reflects the exact state, not a "placeholder" value.
To give an example, the main issue is that a service in a failed or exited state is not the same as being stopped.
A service is considered stopped only if the shutdown process completed without errors.
Failed or exited are not equivalent to stopped and indicate an issue with the service.
In the issue I opened, I shared my thoughts and proposed a change to replace the state_val value with the SUB state.
As you mentioned, changing the existing state field might cause problems for all currently implemented playbooks/roles.
Perhaps the best solution would be to add a new value, sub_state, that reflects the exact state from systemctl, but only when the source is systemd.
For example, like this:
class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        return is_systemd_managed(self.module)

    def _list_from_units(self, systemctl_path, services):

        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:

                state_val = "stopped"
                status_val = "unknown"
                fields = line.split()

                # systemd sometimes gives misleading status
                # check all fields for bad states
                for bad in self.BAD_STATES:
                    # except description
                    if bad in fields[:-1]:
                        status_val = bad
                        break
                else:
                    # active/inactive
                    status_val = fields[2]

                service_name = fields[0]
                sub_state_val = fields[3]
                if sub_state_val == "running":
                    state_val = "running"

                services[service_name] = {"name": service_name, "state": state_val, "sub_state": sub_state_val, "status": status_val, "source": "systemd"}
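With a change along these lines, a failed unit would no longer look identical to a cleanly stopped one. Purely as an illustration (hypothetical values, not real module output), one entry of the returned facts might look like:

# Illustrative example of one entry the modified code above would produce
# (hypothetical values for a failed unit, not captured from a real host):
example_entry = {
    "name": "nginx.service",
    "state": "stopped",      # coarse value kept for backward compatibility
    "sub_state": "failed",   # new field: the raw SUB value from systemctl
    "status": "failed",
    "source": "systemd",
}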
I created PR #84618, adding the new sub_state return value for hosts using systemd.
- Added the sub_state value to the module.
- Added documentation in the module.
- Added the changelogs/fragments file.
- Included comments on the test playbook and result in ansible-core 2.18.2.
All checks have passed
I added a few review comments. You should also include some basic integration tests for this (since it's a new feature), i.e. make sure that the field is present for systemd services and make sure that the values are as expected at least in a few cases where it can be reliably guessed.
Note that the core team might ignore the PR (or at least not merge it) until data tagging gets merged, since they want to avoid creating additional conflicts with the data tagging PR (which touches almost all files).
@felixfontein thank you very much for the review comments (my mistake).
I'll definitely commit all the suggested changes.
Regarding the integration test, should I modify the current test by adding a check for the sub_state value, or should I completely replace the existing test with one focused only on the sub_state value?
I have added changes to run integration tests for the sub_state value:
- Modified the original file test/integration/targets/service_facts/tasks/main.yml
- Added a new block to avoid interfering with the existing disabled test
- Added a new file test/integration/targets/service_facts/tasks/subtests.yml

In this new test, the sub_state value is extracted in several key scenarios to observe how it maps onto state and which values it can take.
The test uses the ansible_test service already present in the integration tests.
Scenarios:

state   | enabled
--------|--------
started | yes
stopped | yes
stopped | no
started | no
- In the penultimate test, the main process is intentionally killed to force the service into a failed state.
- In the final test, the .service file is modified, then reloaded and restarted to force the service into an exited state immediately after starting.
In this last test, you can see how the module reports the following values:
"ansible_test.service": {
"name": "ansible_test.service",
"source": "systemd",
"state": "stopped",
"status": "enabled",
"sub_state": "exited"
}
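As a side note, the expected values in such tests can also be cross-checked independently of the module by asking systemctl directly; a minimal sketch, assuming a systemd host and a known unit name:

# Sketch: cross-check a unit's SUB state directly with systemctl,
# independently of service_facts (assumes a systemd host).
import subprocess

def systemctl_sub_state(unit):
    # "systemctl show -p SubState <unit>" prints a line like "SubState=exited"
    out = subprocess.run(
        ["systemctl", "show", "-p", "SubState", unit],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out.split("=", 1)[1]

print(systemctl_sub_state("ansible_test.service"))  # e.g. "exited"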
FYI, I have rewritten the integration test to make it more efficient and easier to maintain.
- Separated the standard operation tests into standard_operation.yml, using include_tasks in a loop.
- Isolated the kill test into kill_operation.yml.
- Moved the test for the exited service into exited_operation.yml.
- This test replaces the ansible_test.systemd file with exited_ansible.systemd.
New tree:
test/integration/targets/service_facts/
├── aliases
├── files
│   ├── ansible.systemd
│   ├── ansible_test_service.py
│   └── exited_ansible.systemd
└── tasks
    ├── exited_operation.yml
    ├── kill_operation.yml
    ├── main.yml
    ├── standard_operation.yml
    ├── subtests.yml
    ├── systemd_cleanup.yml
    ├── systemd_setup.yml
    └── tests.yml
All checks have passed