Hello everyone,
Some friends have pointed out to me that using Ansible to manage systemd services
can sometimes lead to unexpected results. More specifically, services configured with
StartLimitBurst may fail to start because the unit has exceeded its start limit.
What does this mean?
The systemd_service module itself does its job well, but unfortunately systemctl
commands fail when a service has exceeded the limits set by StartLimitBurst and
StartLimitIntervalSec, and that failure is then reflected as a failure of the module.
For testing, I used a modified version of the ansible_test service.
[Unit]
Description=Ansible Test Service
StartLimitBurst=1
StartLimitIntervalSec=60
[Service]
ExecStart=/usr/sbin/ansible_test_service "Test\nthat newlines in scripts\nwork"
ExecReload=/bin/true
Restart=on-failure
Type=forking
PIDFile=/var/run/ansible_test_service.pid
[Install]
WantedBy=multi-user.target
# systemctl status ansible_test.service
× ansible_test.service - Ansible Test Service
Loaded: loaded (/etc/systemd/system/ansible_test.service; enabled; preset: disab
Active: failed (Result: signal) since Fri 2025-02-21 18:53:10 CET; 17min ago
Duration: 4.504s
Process: 1887 ExecStart=/usr/sbin/ansible_test_service Test
that newlines in scripts
work (code=exited, status=0/SUCCESS)
Main PID: 1889 (code=killed, signal=KILL)
CPU: 892ms
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Scheduled restart job, re
Feb 21 18:53:10 ansiblecn systemd[1]: Stopped Ansible Test Service.
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Start request repeated to
Feb 21 18:53:10 ansiblecn systemd[1]: ansible_test.service: Failed with result 'signa
Feb 21 18:53:10 ansiblecn systemd[1]: Failed to start Ansible Test Service.
TASK [try start after fail] *************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unable to start service ansible_test:
Job for ansible_test.service failed. See \"systemctl status ansible_test.service\" and
\"journalctl -xeu ansible_test.service\" for details.\n"}
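For reference, the failing task above is just an ordinary start through the module, roughly like the sketch below (the exact play is not shown in this post, so task names and layout are assumptions). One workaround today is to clear the failed state outside the module before calling it again, for example with an extra command task:

# Sketch of the failing scenario; only the module call itself matters here.
- name: try start after fail
  ansible.builtin.systemd_service:
    name: ansible_test
    state: started

# Possible workaround: reset-failed also resets the unit's start-limit counter,
# so a subsequent start through the module succeeds again.
- name: clear failed state of ansible_test
  ansible.builtin.command: systemctl reset-failed ansible_test.service

- name: start after reset-failed
  ansible.builtin.systemd_service:
    name: ansible_test
    state: started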
Would it be useful to handle reset-failed directly within the systemd_service module?
For testing, I created a local copy of the module and introduced a new boolean option,
reset_failed, along with a new code block that performs a reset-failed on the service
before executing state operations.
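With that local copy in place, a task like the one below succeeds even when the unit is already in the failed state (a sketch: reset_failed is the option from my prototype, not a parameter of the upstream module):

- name: try start after fail
  ansible.builtin.systemd_service:
    name: ansible_test
    state: started
    reset_failed: true  # prototype option: run reset-failed before the state operation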
TASK [try start after fail] **************************************************************
changed: [localhost] => {"changed": true, "name": "ansible_test", "state": "started",...}}
The rest of the module remains unchanged.