Problems with Async commands

Hi,

I am trying to get my remote machine to reboot using the following code:

```
  - name: restart machine
    shell: sleep 2 && shutdown -r now "Ansible updates triggered"
    async: 1
    poll: 0
    sudo: true
    ignore_errors: true

  - name: waiting for server to come back
    local_action: wait_for host={{ static_ip }} state=started delay=30 timeout=300
    sudo: false
```

No matter what I try I keep getting the following error:

```
fatal: [192.168.0.11]: FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "Shared connection to 192.168.0.11 closed.\r\n",
    "module_stdout": "\r\n/bin/sh: 1: /home/pi/.ansible/tmp/ansible-tmp-1481969769.79-144484795431651/async_wrapper.py: not found\r\n",
    "msg": "MODULE FAILURE"
}
```

I cannot find any information about this and I have no idea where to start.
I have tried:

  • changing the shell command
  • increasing the async value

If you set async to 0 then it doesn’t generate the error, but you get a different error because you can’t reboot on a synchronous command.

Ideas?

The async wrapper is notoriously slow about daemonizing the module process and capturing its output, so you’ll need to increase the sleep delay to probably at least 5s to reliably get this working. You’ll also definitely want async: 0 (which makes it “fire and forget” instead of polling for a result, which will never succeed). Then you’ll need to use wait_for on the next task and watch for the SSH port to come back up before continuing your playbook. Even this is not 100% reliable: there are several different shutdown/startup races involved that can make it flaky, depending on what your target OS is.

I wrote the Windows reboot action (win_reboot), and I just finished (re)writing a *nix-friendly version that will likely ship in Ansible 2.3. I’ve tested it against several popular distros with success. Unfortunately, to work properly it needs a change to the base connection layer, so you won’t be able to just drop the action plugin into Ansible 2.2.x and have it work.

-Matt

Sorry, that should’ve been: you’ll need to set async to a high value (at least 30s) to prevent the module process from being killed prematurely by the watchdog, then set poll to 0 for fire-and-forget.
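
Putting that correction together with the earlier advice, a minimal sketch of the two tasks might look like the following. The specific numbers (5s sleep, async of 30) and `port=22` for SSH are illustrative assumptions, not tested values; `static_ip` is the variable from the original post, and `become` is used here as the newer spelling of `sudo`.

```yaml
- name: restart machine
  # Short delay so the async wrapper can detach before the shutdown starts
  shell: sleep 5 && shutdown -r now "Ansible updates triggered"
  async: 30          # assumed value: max allowed runtime, well above the sleep
  poll: 0            # fire and forget, do not wait for a result
  become: true
  ignore_errors: true

- name: waiting for server to come back
  # Assumes SSH listens on the default port 22 on the target
  local_action: wait_for host={{ static_ip }} port=22 state=started delay=30 timeout=300
  become: false
```

As noted above, this can still be flaky on some targets because of the shutdown/startup races.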

Thanks for the suggestions, but try as I might I just cannot get it to work. I have tried values all the way up to 250 for async and sleep, and it just keeps giving me the same error every time.

What is frustrating is that this used to work. Then my Raspberry Pis were upgraded to Jessie and Ansible to 2.2, and none of my reboots work any more.

You don’t want to match the sleep/async values: the Ansible watchdog wrapper will always kill it during the sleep if you do that. The async value is the maximum allowed exec time in seconds for the task, and is enforced on both the control side and the managed side. The sleep beforehand can be very short (and oftentimes isn’t needed at all), but you want an async value that’s at least several seconds longer than the max time you think the command will take to return (it doesn’t really matter how high you set it, as the watchdog will get nuked on the reboot anyway in the “happy path”).
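
To make the relationship between the two values concrete, here is a rough side-by-side. The numbers are only examples (250 is taken from the attempt described above; 2 and 30 are assumed values), and the task names are made up for illustration.

```yaml
# Problematic: sleep and async are the same length, so the watchdog's
# time limit expires while the command is still sleeping.
- name: reboot (matched values)
  shell: sleep 250 && shutdown -r now
  async: 250
  poll: 0

# Preferred shape: a short sleep, and an async limit comfortably longer
# than the time the command needs to return.
- name: reboot (short sleep, longer async)
  shell: sleep 2 && shutdown -r now
  async: 30
  poll: 0
```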

I haven’t run into a distro where I couldn’t get this working fairly reliably, but the only guaranteed way is via a control-side action where you can handle/ignore the race where the shutdown occurs before the command output has returned to the controller (this is exactly how both win_reboot and the forthcoming reboot actions work, though the new one works at a little higher level).

-Matt