Hello everyone,
Some friends have pointed out to me that using Ansible to manage systemd services can sometimes lead to unexpected results.
More specifically, a service that sets StartLimitBurst may refuse to start because the unit has already hit its start rate limit.
What does this mean?
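The systemd_service module itself works as intended. However, once a unit has been started more than StartLimitBurst times within StartLimitIntervalSec, systemd refuses further start requests until the interval has elapsed or the counter is cleared with systemctl reset-failed.
As a result, the systemctl call issued by the module fails, and the module reports that failure.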
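To make the mechanism concrete, here is a simplified Python model of a per-unit start rate limit, loosely following systemd's fixed-window counter. The first accepted start opens a window of StartLimitIntervalSec seconds, and at most StartLimitBurst starts are accepted inside it. The class and method names are hypothetical and this is not systemd's actual code.

```python
from typing import Optional


class StartRateLimit:
    """Simplified, hypothetical model of a per-unit start rate limit.

    The first accepted start opens a window of `interval` seconds; at most
    `burst` starts are accepted inside that window. Later requests are
    refused until the window expires or reset_failed() clears the counter.
    """

    def __init__(self, burst: int, interval: float) -> None:
        self.burst = burst
        self.interval = interval
        self._window_start: Optional[float] = None
        self._count = 0

    def try_start(self, now: float) -> bool:
        # Open a new window if none exists or the current one has expired.
        if self._window_start is None or now - self._window_start > self.interval:
            self._window_start = now
            self._count = 0
        if self._count < self.burst:
            self._count += 1
            return True
        # Refused: roughly what systemd logs as "Start request repeated too quickly".
        return False

    def reset_failed(self) -> None:
        # Models `systemctl reset-failed`, which also flushes the start counter.
        self._window_start = None
        self._count = 0


# StartLimitBurst=1, StartLimitIntervalSec=60, as in the test unit below.
limit = StartRateLimit(burst=1, interval=60)
print(limit.try_start(0))    # True: initial start accepted
print(limit.try_start(5))    # False: automatic restart after the crash refused
print(limit.try_start(10))   # False: start request from Ansible refused
limit.reset_failed()
print(limit.try_start(15))   # True: accepted again after reset-failed
```

In this model, with a burst of 1 the initial start already consumes the whole budget, so both the automatic restart triggered by Restart=on-failure and any later start request inside the same window are refused until the counter is reset.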
For testing, I used a modified version of the ansible_test service.
[Unit]
Description=Ansible Test Service
StartLimitBurst=1
StartLimitIntervalSec=60
[Service]
ExecStart=/usr/sbin/ansible_test_service "Test\nthat newlines in scripts\nwork"
ExecReload=/bin/true
Restart=on-failure
Type=forking
PIDFile=/var/run/ansible_test_service.pid
[Install]
WantedBy=multi-user.target
# systemctl status ansible_test.service
× ansible_test.service - Ansible Test Service
Loaded: loaded (/etc/systemd/system/ansible_test.service; enabled; preset: disab
Active: failed (Result: signal) since Fri 2025-02-21 18:53:10 CET; 17min ago
Duration: 4.504s
Process: 1887 ExecStart=/usr/sbin/ansible_test_service Test
that newlines in scripts
work (code=exited, status=0/SUCCESS)
Main PID: 1889 (code=killed, signal=KILL)
CPU: 892ms
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Scheduled restart job, re
Feb 21 18:53:10 ansiblecn systemd[1]: Stopped Ansible Test Service.
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Start request repeated to
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Failed with result 'signa
Feb 21 18:53:10 ansiblecn systemd[1]: Failed to start Ansible Test Service.
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Start request repeated to
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Failed with result 'signa
Feb 21 18:53:10 ansiblecn systemd[1]: Failed to start Ansible Test Service.
TASK [try start after fail] *************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unable to start service ansible_test:
Job for ansible_test.satus ansible_test.service\" and \"journalctl -xeu ansible_test.service\"
for details.\n"}
Would it be useful to handle reset-failed directly within the systemd_service module?
For testing, I created a local copy of the module and introduced a new boolean option, reset_failed, along with a new code block that performs a reset-failed on the service before executing state operations.
TASK [try start after fail] **************************************************************
changed: [localhost] => {"changed": true, "name": "ansible_test", "state": "started",...}}
The rest of the module remains unchanged.
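For reference, the change described above might look roughly like the following Python sketch. It is not the actual systemd_service code: the function name ensure_started and the injectable runner are hypothetical, and only the reset_failed option comes from the proposal. The runner returns (rc, stdout, stderr), the same shape AnsibleModule.run_command() returns. The systemctl is-failed check is an added assumption that keeps reset-failed from running when the unit is not actually in the failed state.

```python
import subprocess
from typing import Callable, List, Tuple

# (rc, stdout, stderr): the same shape AnsibleModule.run_command() returns.
RunResult = Tuple[int, str, str]
Runner = Callable[[List[str]], RunResult]


def run(cmd: List[str]) -> RunResult:
    """Default runner: execute the command and capture its output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def ensure_started(unit: str, reset_failed: bool = False,
                   runner: Runner = run,
                   systemctl: str = "systemctl") -> RunResult:
    """Hypothetical sketch: optionally clear a failed unit, then start it."""
    if reset_failed:
        # `systemctl is-failed --quiet` exits 0 only if the unit is failed.
        rc, _, _ = runner([systemctl, "is-failed", "--quiet", unit])
        if rc == 0:
            # Clears the failed state and flushes the start rate-limit counter.
            rc, out, err = runner([systemctl, "reset-failed", unit])
            if rc != 0:
                # Stop here rather than issuing a start that will be refused.
                return rc, out, err
    return runner([systemctl, "start", unit])
```

On a real host, ensure_started("ansible_test", reset_failed=True) would run the three systemctl commands in turn. Passing the runner in as a parameter keeps the logic testable without touching systemd; in an actual module, a successful reset would presumably also need to be reported as changed.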