Hello everyone,
Two days ago, I created a new issue (#84607) on the ansible/ansible repository regarding the ansible.builtin.service_facts module.
I also posted about it on r/ansible.
While working with hosts that use systemd, I noticed that the module can only return two possible state values: running and stopped.
In the module's code, within the SystemctlScanService class, the state_val value is set to stopped by default and only changes to running if the SUB parameter from systemctl is explicitly running.
At line 276:
    state_val = "stopped"
At lines 292-293:
    if fields[3] == "running":
        state_val = "running"
fields[3] holds the actual SUB state parsed from the output of this systemctl command (the columns are UNIT LOAD ACTIVE SUB DESCRIPTION, so index 3 is the SUB column):
systemctl list-units --no-pager --type service --all --plain
This approach means the module will never reflect the actual state of the service. Instead, any service that isn't running is simply reported as stopped.
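To see what gets lost, here is a small sketch (my own illustration, not part of the module or the issue) that runs the same command and prints the UNIT and SUB columns split the same way the module does:

    - name: List units the same way service_facts does
      ansible.builtin.command: systemctl list-units --no-pager --type service --all --plain
      register: unit_list
      changed_when: false

    - name: Print UNIT and SUB (fields[0] and fields[3]) for every .service line
      ansible.builtin.debug:
        msg: "{{ item.split()[0] }} -> {{ item.split()[3] }}"
      loop: "{{ unit_list.stdout_lines | select('search', '.service') | list }}"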
Currently, the SUB parameter from systemctl can represent various states, such as:
dead
exited
reloaded
failed
This makes it impossible to use the module for performing precise checks on the state of services managed by systemd and, consequently, to implement solutions based on their actual state.
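As a concrete example (a sketch of my own; myapp.service is a placeholder name), a unit whose main process crashed and a unit that was shut down cleanly are reported identically today:

    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Both a crashed and a cleanly stopped unit print 'stopped' here
      ansible.builtin.debug:
        msg: "{{ ansible_facts.services['myapp.service'].state }}"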
Am I the only one who thinks this approach is overly simplistic?
1 Like
It really depends on what you want to do with the result. If you are just interested in whether the service is running or not, the current values are perfect. (Especially the value reloaded is often not helpful; you usually want running instead.) If you want more details on why a service is not running, they are not.
It's probably best to add a new return value that gives the exact state. Changing the existing state value would potentially break a lot of playbooks/roles that assume the only possible values are running and stopped.
(There are more states currently returned BTW: on my system I also see active, inactive, and unknown.)
2 Likes
What I expect from a module like this is that the state value reflects the exact state, not a "placeholder" value.
To give an example, the main issue is that a service in a failed or exited state is not the same as being stopped.
A service is considered stopped only if the shutdown process completed without errors.
Failed or exited are not equivalent to stopped and indicate an issue with the service.
In the issue I opened, I shared my thoughts and proposed a change to replace the state_val value with the SUB state.
As you mentioned, changing the existing state field might cause problems for all currently implemented playbooks/roles.
Perhaps the best solution would be to add a new value, sub_state, that reflects the exact state from systemctl, but only when the source is systemd.
For example, like this:
class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        return is_systemd_managed(self.module)

    def _list_from_units(self, systemctl_path, services):

        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:

                state_val = "stopped"
                status_val = "unknown"
                fields = line.split()

                # systemd sometimes gives misleading status
                # check all fields for bad states
                for bad in self.BAD_STATES:
                    # except description
                    if bad in fields[:-1]:
                        status_val = bad
                        break
                else:
                    # active/inactive
                    status_val = fields[2]

                service_name = fields[0]
                sub_state_val = fields[3]
                if sub_state_val == "running":
                    state_val = "running"

                services[service_name] = {"name": service_name, "state": state_val, "sub_state": sub_state_val, "status": status_val, "source": "systemd"}
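For what it's worth, this is the kind of check the new field would enable in a playbook (a sketch assuming the change above; myapp.service is a placeholder name):

    - name: Gather service facts (with the proposed sub_state field)
      ansible.builtin.service_facts:

    - name: Fail unless the service was shut down cleanly
      ansible.builtin.assert:
        that:
          - ansible_facts.services['myapp.service'].state == 'stopped'
          - ansible_facts.services['myapp.service'].sub_state == 'dead'
        fail_msg: "myapp.service is {{ ansible_facts.services['myapp.service'].sub_state }}, not cleanly stopped"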
2 Likes
I created PR #84618, adding the new sub_state return value for hosts using systemd.
- Added the sub_state value to the module.
- Added documentation in the module.
- Added the changelogs/fragments file.
- Included comments on the test playbook and the result in ansible-core 2.18.2.
All checks have passed.
2 Likes
I added a few review comments. Since this is a new feature, you should also include some basic integration tests: make sure the field is present for systemd services, and check that the values are as expected in at least a few cases where they can be reliably guessed.
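A minimal sketch of the kind of presence check meant here (an illustration, not part of the PR):

    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Every systemd-sourced service should expose sub_state
      ansible.builtin.assert:
        that:
          - item.value.sub_state is defined
      loop: "{{ ansible_facts.services | dict2items | selectattr('value.source', 'equalto', 'systemd') | list }}"
      loop_control:
        label: "{{ item.key }}"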
Note that the core team might ignore the PR (or at least not merge it) until data tagging gets merged, since they want to avoid creating additional conflicts to the data tagging PR (which touches almost all files).
1 Like
@felixfontein thank you very much for the review comments (my mistake).
I'll definitely commit all the suggested changes.
Regarding the integration test, should I modify the current test by adding a check for the sub_state value, or should I completely replace the existing test with one focused only on the sub_state value?
I have added changes to run integration tests for the sub_state value:
- Modified the original file test/integration/targets/service_facts/tasks/main.yml
- Added a new block to avoid interfering with the existing disabled test
- Added a new file test/integration/targets/service_facts/tasks/subtests.yml
In this new test, the sub_state value is checked in several key scenarios to observe how it relates to the reported state and which values it takes (see the sketch after the table below).
The test uses the ansible_test service that is already present in the integration target.
Scenarios:

| state   | enabled |
| ------- | ------- |
| started | yes     |
| stopped | yes     |
| stopped | no      |
| started | no      |
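For example, the stopped/enabled scenario boils down to something like this (a sketch of the idea, not the literal content of subtests.yml):

    - name: Stop the test service
      ansible.builtin.systemd:
        name: ansible_test
        state: stopped
        enabled: true

    - name: Re-gather service facts
      ansible.builtin.service_facts:

    - name: A cleanly stopped service should report sub_state 'dead'
      ansible.builtin.assert:
        that:
          - ansible_facts.services['ansible_test.service'].state == 'stopped'
          - ansible_facts.services['ansible_test.service'].sub_state == 'dead'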
- In the penultimate test, the main process is intentionally killed to force the service into a failed state.
- In the final test, the .service file is modified, the daemon is reloaded, and the service is restarted to force it into an exited state immediately after starting.
In this last test, you can see how the module reports the following values:
"ansible_test.service": {
"name": "ansible_test.service",
"source": "systemd",
"state": "stopped",
"status": "enabled",
"sub_state": "exited"
}
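The corresponding check in the test then only needs to pin down exactly that combination (sketched here; the actual task wording may differ):

    - name: The exited unit is reported as stopped with sub_state 'exited'
      ansible.builtin.assert:
        that:
          - ansible_facts.services['ansible_test.service'].state == 'stopped'
          - ansible_facts.services['ansible_test.service'].sub_state == 'exited'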
1 Like
FYI
I have rewritten the integration test to make it more efficient and easier to maintain.
- Separated the standard operation tests into standard_operation.yml using include_tasks in a loop (see the sketch after this list).
- Isolated the kill test into kill_operation.yml.
- Moved the test for the exited service into exited_operation.yml.
- This test replaces the ansible_test.systemd file with exited_ansible.systemd.
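A rough idea of what the loop in main.yml could look like (an assumption about the structure, not the exact file contents):

    - name: Run the standard start/stop scenarios
      ansible.builtin.include_tasks: standard_operation.yml
      loop:
        - { state: 'started', enabled: true }
        - { state: 'stopped', enabled: true }
        - { state: 'stopped', enabled: false }
        - { state: 'started', enabled: false }
      loop_control:
        loop_var: scenario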
New tree:

test/integration/targets/service_facts/
├── aliases
├── files
│   ├── ansible.systemd
│   ├── ansible_test_service.py
│   └── exited_ansible.systemd
└── tasks
    ├── exited_operation.yml
    ├── kill_operation.yml
    ├── main.yml
    ├── standard_operation.yml
    ├── subtests.yml
    ├── systemd_cleanup.yml
    ├── systemd_setup.yml
    └── tests.yml
All checks have passed.
1 Like