Getting the following error when using community.general.shutdown in a playbook on an Ubuntu host:
fatal: [obico01.makerland.xyz]: FAILED! => {"changed": false, "msg": "Shutdown command failed. Error was Failed to set wall message, ignoring: Interactive authentication required.\r\nFailed to call ScheduleShutdown in logind, no action will be taken: Interactive authentication required., Shared connection to obico01.makerland.xyz closed.", "shutdown": false}
Obviously I'd have the same issue if I used the shell module with sudo shutdown. What should I do?
There's your issue. Are you using become on your play/task by any chance? If so (and you'd need to if you log in to your remote node as a user not authorized to reboot it through logind), make sure you're also asking to be prompted for the become password with -K.
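For example, with become: true set on the play or task, you'd run something like this so Ansible prompts you for the sudo password:

ansible-playbook playbook.yml -i inventory -K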
If not, please post your playbook, the command you're using, and the full output so we can have a look.
Here is the playbook.
I'm not using become right now. I want Ansible to do its own thing entirely automatically, and I don't want to put the password into the script file. I'd rather configure the server to allow this user to shut down the computer without password authentication. (Seems safer.)
- name: Shutdown Windows Servers
  hosts: windows
  tasks:
    - name: Shutdown
      ansible.windows.win_shell: shutdown -s -t 0
      args:
        executable: cmd

- name: Shutdown Linux Servers
  hosts: linux
  tasks:
    - name: Shutdown
      community.general.shutdown:
Here is the command output:
charlespick@ansible:~$ ansible-playbook playbook.yml -i inventory
PLAY [Shutdown Windows Servers] ****************************************************************************************
TASK [Gathering Facts] *************************************************************************************************
ok: [federation02.makerad.makerland.xyz]
TASK [Shutdown] ********************************************************************************************************
changed: [federation02.makerad.makerland.xyz]
PLAY [Shutdown Linux Servers] ******************************************************************************************
TASK [Gathering Facts] *************************************************************************************************
ok: [obico01.makerland.xyz]
TASK [Shutdown] ********************************************************************************************************
fatal: [obico01.makerland.xyz]: FAILED! => {"changed": false, "msg": "Shutdown command failed. Error was Failed to set wall message, ignoring: Interactive authentication required.\r\nFailed to call ScheduleShutdown in logind, no action will be taken: Interactive authentication required., Shared connection to obico01.makerland.xyz closed.", "shutdown": false}
PLAY RECAP *************************************************************************************************************
federation02.makerad.makerland.xyz : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
obico01.makerland.xyz : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Yeah, that's the key. It's not Ansible-specific. The way you do that is to create a file in /etc/sudoers.d, say /etc/sudoers.d/15-thisuser, that looks like this:

thisuser ALL=(ALL) NOPASSWD: /usr/bin/systemctl poweroff
That will allow thisuser to run sudo /usr/bin/systemctl poweroff without a password.
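To check it took effect, something like this on the node should list /usr/bin/systemctl poweroff among the allowed commands:

sudo -l -U thisuser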
More generally, a site designates an id (say, ansible-unchained) for running Ansible on the managed nodes, sets up public/private key pairs for that id, and enables that id with full password-less sudo capability via /etc/sudoers.d/<somenumber>-ansible-unchained that contains:
ansible-unchained ALL=(ALL) NOPASSWD:ALL
If you do that, understand and track that private key to the max: it is literally the key to your kingdom.
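With that in place, your original Linux play should only need become: true added, and no -K prompt:

- name: Shutdown Linux Servers
  hosts: linux
  become: true
  tasks:
    - name: Shutdown
      community.general.shutdown: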
I don’t want to put the password into the script file
That’s fair, and you shouldn’t. There are ways to pass secrets encrypted to your playbook, but that’s not the point.
to shut down the computer without password authentication. (Seems safer.)
Moot point, though I agree: having to send a password (even encrypted) over your ssh connection would be somewhat less safe than letting your user run a single, well-defined command passwordless. Though there is now one more potential config (sudoers) you have to track. (To be clear, I'm not debating that.)
I was just remembering that commands run by Ansible on managed nodes are locally executed modules, not the actual commands the module runs under the hood. Which means allowing your user to run said commands via sudoers won't work; you'd have to find the actual command Ansible is running and add that one to sudoers, or the whole modules path.
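If you'd rather keep the single-command whitelist, one way around that is to call sudo yourself instead of going through become, so the command sudoers sees matches exactly. A sketch, untested:

- name: Shutdown via the whitelisted command
  ansible.builtin.command: sudo /usr/bin/systemctl poweroff   # runs as the login user; sudo itself is passwordless per the sudoers entry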
I assume I can do the same with shutdown? I want to schedule the shutdown for one second to a few minutes into the future, to avoid the situation @utoddl mentions and so that infrastructure dependencies are shut down in order (VMs, then hosts, then storage, then network).
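Something like this, if I'm reading the community.general.shutdown docs right (delay should be in seconds):

- name: Shutdown Linux Servers
  hosts: linux
  become: true
  tasks:
    - name: Shutdown with a delay
      community.general.shutdown:
        delay: 60   # schedule the shutdown 60 seconds out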
I have been invoked!
Two things leap to mind, the relevance of which may be slight, but here goes.
Be wary when you schedule, for example, at jobs for the near future with relative times like so:
echo "systemctl poweroff" | at now + 1min
That job may start running the very next second. For example, if the current time is 2:05:59, then "now + 1min" means "any time at or after 2:06", not "2:06:59 or later". And it's not just "at"; many timing/scheduling mechanisms have similar granularity. Anyway, don't assume you've got 60 seconds to do a vital thing. (I learned this the hard way so you don't have to.)
The corollary [I'm still on the first thing, btw] is this: just because a job has become eligible to begin doesn't mean it will begin. Any number of factors may delay a scheduled event in lots of systems: load, resource contention of various flavors, limited availability of "slots" — to keep this intentionally vague and ambiguous. It'll vary between scheduling mechanisms, but unless you're running on an RTOS (Real Time Operating System) such things are not guaranteed.
The second thing follows from the first: if you are trying to do a sequence of steps in a particular order (like you mentioned in the parent post), do not depend on pulled-from-the-air timing guesses that seem like they "ought to be long enough" for the various steps to complete. If you do, you'll end up starting step C before step B (or even A) completes, or you'll waste time between steps.
Alternatives to “probably long enough” scheduling:
Have a higher level process manage the steps. A well-designed systemd .target for example.
Have each step set some testable artifact that subsequent steps can check. This should work well for Ansible-managed task sets (see the sketch after this list).
Have subsequent steps be initiated by the prior step.
??
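For the "testable artifact" flavor, a minimal Ansible sketch (the marker path is my own invention):

- name: Mark this tier as done
  ansible.builtin.file:
    path: /var/run/tier-a.done
    state: touch

- name: Next tier waits for the marker before starting
  ansible.builtin.wait_for:
    path: /var/run/tier-a.done
    timeout: 600   # give up after 10 minutes rather than waiting forever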
The caveat is that it all depends on the recovery process for when steps fail. If the “fail” branch step is to simply move on to the next step anyway, then your failure detection and recovery can be much simpler. In any case, consider what constitutes recovery from failure at each stage before investing too much effort. Too often such pipelines are built for success with the intent to add recovery later, and later never comes.
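In Ansible terms, the "just move on anyway" fail branch can be as simple as a block/rescue pair, something like:

- block:
    - name: Shut down this tier's guests
      community.general.shutdown:
  rescue:
    - name: Note the failure and carry on to the next tier
      ansible.builtin.debug:
        msg: "Shutdown failed on {{ inventory_hostname }}; moving on anyway"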
Where would this run exactly? I will have to shut down everything from Debian Linux, to Linux appliances that barely allow you to ssh in, to a Synology NAS, to ESXi. And I want this to be intentionally lightweight on the side of the system being managed, since I am often adding and removing services and hardware from my inventory.
I appreciate the note about timing. In practice, how delayed can a scheduled event get? For now I plan to have three tiers as I loosely described earlier: VMs, then hosts (with a forced shutdown of any remaining VMs that I may not have added for one reason or another), then the storage array. I planned to schedule the VM shutdown one second into the future just so the command has time to return to Ansible. From there I have about 15 minutes of UPS runtime total, so I planned to use 3-ish to cover quick power outages, then 10 for OS shutdown. I've tested most of my services, and the longest to shut down is Veeam on Windows Server, which sometimes takes about 5 minutes. Which, yes, leaves 5 minutes or more "wasted" in a way, but I think it's worth it for the reliability? After the guests are gone, ESXi shuts down in less than 15 seconds and Synology in about a minute.
What I'm trying to say is I'm less concerned with having everything chained perfectly back to back than I am with having the best possible likelihood of it working in the end. For example, I plan to test the system every month or two, but if one OS gets hung and won't shut down for some reason, I don't want that to hold up the hosts and, more importantly, the storage array, risking data loss / hardware damage for more than just that one VM. I would rather one misbehaving VM get abruptly shut down when it's time for the hosts to disconnect from the storage array and shut down, than have the entire storage array lose power and possibly everything get messed up (unlikely but always possible).
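In playbook terms I'm picturing something like this between tiers: best effort, with a hard cap (the port and timeout are placeholders):

- name: Wait for the VM to drop off the network, but never more than 10 minutes
  ansible.builtin.wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    state: stopped   # wait until the port stops answering
    timeout: 600
  delegate_to: localhost
  ignore_errors: true   # a hung VM shouldn't hold up the storage tier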
To sum up, my approach has been to get the array shut down properly, and put in best effort to get the OSes shut down before that happens. I'm curious what you think…
It wouldn’t apply in your case. I should have been more clear about that being only a typifying example. That whole post was more generally about approach and mindset than anything specifically applicable in your situation. I did say right at the top, “…the relevance of which may be slight”.
Overall it sounds like you’re doing it right. You know your power budget, and you’ve set reasonable time limits after which you absolutely must move on because UPS. I hope you’ll find a way to get that Veeam shutdown noticed so your process can move on rather than heating the data center with diminishing UPS power until an arbitrary timer expires. But you’ll be improving this process for years. If a solution to that problem shows up, you’ll probably implement it as an addition to rather than a replacement of your 5 minute watchdog. For a graceful shutdown-on-UPS-power that’s probably as it should be.
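If I were attacking that Veeam wait, I'd probably start by polling the service state instead of burning a fixed timer; a sketch (the service name is a guess, check yours):

- name: Move on as soon as the Veeam service has actually stopped
  ansible.windows.win_service_info:
    name: VeeamBackupSvc   # assumed service name
  register: veeam
  until: veeam.services | length == 0 or veeam.services[0].state == 'stopped'
  retries: 30   # 30 tries x 10 seconds keeps the 5-minute ceiling
  delay: 10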