ansible.builtin.service_facts: incorrect and oversimplified state

Hello everyone,

Two days ago, I created a new issue (#84607) on the ansible/ansible repository regarding the ansible.builtin.service_facts module.

I also posted about it on r/ansible.

While working with hosts that use systemd, I noticed that the module only ever returns two possible state values:

  • stopped
  • running

In the module’s code, within the SystemctlScanService class, state_val is set to stopped by default and only changes to running if the SUB value reported by systemctl is exactly running.

At line 276:

state_val = "stopped"

At lines 292-293:

if fields[3] == "running":
    state_val = "running"

fields[3] is the actual SUB state from this systemctl command:

systemctl list-units --no-pager --type service --all --plain

This approach means the module will never reflect the actual state of the service. Instead, any service that isn’t running is simply reported as stopped.
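
To see what gets discarded, here is a minimal ad-hoc sketch (not part of the module; the play and task names are illustrative) that prints the raw SUB column the module parses. The column order is UNIT LOAD ACTIVE SUB DESCRIPTION, which is why the code reads fields[3]:

# Ad-hoc sketch: print the SUB column that service_facts parses.
# Columns are UNIT LOAD ACTIVE SUB DESCRIPTION, so SUB is fields[3].
- name: Inspect the raw SUB column that service_facts parses
  hosts: all
  gather_facts: false
  tasks:
    - name: Run the same systemctl command as the module
      ansible.builtin.command: systemctl list-units --no-pager --type service --all --plain
      register: unit_list
      changed_when: false

    - name: Show unit name and SUB state for every service line
      ansible.builtin.debug:
        msg: "{{ item.split()[0] }} SUB={{ item.split()[3] }}"
      loop: "{{ unit_list.stdout_lines | select('search', '\\.service') | list }}"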

Currently, the SUB parameter from systemctl can represent various states, such as:

  • dead
  • exited
  • reloaded
  • failed

This makes it impossible to use the module to perform precise checks on the state of systemd-managed services and, consequently, to implement logic based on their actual state.
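
For example, a check like the following can never trigger today, because state is only ever running or stopped (the service name is purely illustrative):

# Sketch with an illustrative service name: this condition can never be true
# today, because service_facts only reports state as running or stopped.
- name: Gather service facts
  ansible.builtin.service_facts:

- name: React to a service that did not shut down cleanly
  ansible.builtin.debug:
    msg: "nginx.service ended up in a failed state"
  when: ansible_facts.services['nginx.service'].state == 'failed'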

Am I the only one who thinks this approach is overly simplistic?

1 Like

It really depends on what you want to do with the result. If you are just interested in whether the service is running or not, the current values are perfect. (The value reloaded in particular is often not helpful; you usually want running instead.) If you want more details on why a service is not running, they are not.

It’s probably best to add a new return value that gives the exact state. Changing the existing state field would potentially break a lot of playbooks/roles that assume the only values are running and stopped.

(There are more states currently returned BTW: on my system I also see active, inactive, and unknown.)

2 Likes

What I expect from a module like this is that the state value reflects the exact state, not a "placeholder" value.

To give an example, the main issue is that a service in a failed or exited state is not the same as being stopped.

A service is considered stopped only if the shutdown process completed without errors; failed and exited indicate a problem with the service and are not equivalent to stopped.

In the issue I opened, I shared my thoughts and proposed a change to replace the state_val value with the SUB state.

As you mentioned, changing the existing state field might cause problems for all currently implemented playbooks/roles.

Perhaps the best solution would be to add a new value, sub_state, that reflects the exact state from systemctl, but only when the source is systemd.

For example, like this:

class SystemctlScanService(BaseService):

    BAD_STATES = frozenset(['not-found', 'masked', 'failed'])

    def systemd_enabled(self):
        return is_systemd_managed(self.module)

    def _list_from_units(self, systemctl_path, services):

        # list units as systemd sees them
        rc, stdout, stderr = self.module.run_command("%s list-units --no-pager --type service --all --plain" % systemctl_path, use_unsafe_shell=True)
        if rc != 0:
            self.module.warn("Could not list units from systemd: %s" % stderr)
        else:
            for line in [svc_line for svc_line in stdout.split('\n') if '.service' in svc_line]:

                state_val = "stopped"
                status_val = "unknown"
                fields = line.split()

                # systemd sometimes gives misleading status
                # check all fields for bad states
                for bad in self.BAD_STATES:
                    # except description
                    if bad in fields[:-1]:
                        status_val = bad
                        break
                else:
                    # active/inactive
                    status_val = fields[2]

                service_name = fields[0]
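                # SUB column from systemctl (e.g. running, exited, failed, dead)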
                sub_state_val = fields[3]
                if sub_state_val == "running":
                    state_val = "running"

                services[service_name] = {"name": service_name, "state": state_val, "sub_state": sub_state_val, "status": status_val, "source": "systemd"}
2 Likes

I created PR #84618, adding a new sub_state return value for hosts using systemd.

  • Added the sub_state value to the module.
  • Added documentation in the module.
  • Added the changelogs/fragments file.
  • Included comments showing a test playbook and its result on ansible-core 2.18.2.

All checks have passed :white_check_mark:
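
For anyone who wants to try it out, here is a minimal usage sketch assuming the new sub_state field from the PR (the service name and the restart logic are purely illustrative):

# Sketch assuming the new sub_state field from the PR; the service name
# and the restart policy are illustrative, and sub_state is only present
# when the source is systemd, hence the default('').
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Restart nginx if systemd reports its main process as exited or failed
  ansible.builtin.service:
    name: nginx
    state: restarted
  when: ansible_facts.services['nginx.service'].sub_state | default('') in ['exited', 'failed']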

2 Likes

I added a few review comments. You should also include some basic integration tests for this (since it’s a new feature), i.e. check that the field is present for systemd services and that the values are as expected in at least a few cases where they can be reliably guessed.
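
Something along these lines would do as a sketch (the unit name would have to be one that is reliably present and running in the test environment):

# Sketch of the kind of assertion meant here; the unit name must be one
# that is reliably present and running on the test host.
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Check that sub_state is present and matches a known-running service
  ansible.builtin.assert:
    that:
      - "'sub_state' in ansible_facts.services['sshd.service']"
      - ansible_facts.services['sshd.service'].sub_state == 'running'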

Note that the core team might ignore the PR (or at least not merge it) until data tagging gets merged, since they want to avoid creating additional conflicts to the data tagging PR (which touches almost all files).

1 Like

@felixfontein thank you very much for the review comments (my mistake :sweat_smile:).
I’ll definitely commit all the suggested changes.

Regarding the integration test, should I modify the current test by adding a check for the sub_state value, or should I completely replace the existing test with one focused only on the sub_state value?

I have added changes to run integration tests for the sub_state value:

  • Modified the original file test/integration/targets/service_facts/tasks/main.yml
    • Added a new block to avoid interfering with the existing disabled test
  • Added a new file test/integration/targets/service_facts/tasks/subtests.yml

In this new test, the sub_state value is extracted in several key scenarios to observe how it relates to state and which values it takes.

The test uses the ansible_test service that is already present in the integration target.

Scenarios (state / enabled):

  state     enabled
  started   yes
  stopped   yes
  stopped   no
  started   no

  • In the penultimate test, the main process is intentionally killed to force the service into a failed state.

  • In the final test, the .service file is modified, then the daemon is reloaded and the service restarted to force it into an exited state immediately after starting.

In this last test, you can see how the module reports the following values:

"ansible_test.service": {
    "name": "ansible_test.service", 
    "source": "systemd", 
    "state": "stopped", 
    "status": "enabled", 
    "sub_state": "exited"
}
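
The matching check can be a simple assert, roughly like this (a sketch; the actual task wording in the PR may differ):

# Rough sketch of the corresponding assertion; actual task wording may differ.
- name: Assert the exited service is reported as stopped with sub_state=exited
  ansible.builtin.assert:
    that:
      - ansible_facts.services['ansible_test.service'].state == 'stopped'
      - ansible_facts.services['ansible_test.service'].sub_state == 'exited'
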
1 Like

FYI

I have rewritten the integration test to make it more efficient and easier to maintain.

  • Separated the standard operation tests into standard_operation.yml, driven by include_tasks in a loop (sketched after this list).
  • Isolated the kill test into kill_operation.yml.
  • Moved the test for the exited service into exited_operation.yml.
    • This test replaces the ansible_test.systemd file with exited_ansible.systemd.
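
A rough sketch of that loop (the variable names are illustrative, not necessarily those used in the PR):

# Rough sketch of the loop over standard_operation.yml; variable names illustrative.
- name: Run the standard start/stop scenarios
  ansible.builtin.include_tasks: standard_operation.yml
  loop:
    - { srv_state: started, srv_enabled: true }
    - { srv_state: stopped, srv_enabled: true }
    - { srv_state: stopped, srv_enabled: false }
    - { srv_state: started, srv_enabled: false }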

new tree:

test/integration/targets/service_facts/
β”œβ”€β”€ aliases
β”œβ”€β”€ files
β”‚   β”œβ”€β”€ ansible.systemd
β”‚   β”œβ”€β”€ ansible_test_service.py
β”‚   └── exited_ansible.systemd
└── tasks
    β”œβ”€β”€ exited_operation.yml
    β”œβ”€β”€ kill_operation.yml
    β”œβ”€β”€ main.yml
    β”œβ”€β”€ standard_operation.yml
    β”œβ”€β”€ subtests.yml
    β”œβ”€β”€ systemd_cleanup.yml
    β”œβ”€β”€ systemd_setup.yml
    └── tests.yml

All checks have passed :white_check_mark:

1 Like