Intermittent _delete_remote_files failure

I'm running ansible-playbook across five hosts on EC2, and on just about every run, one or more hosts fail with this error:

fatal: [REDACTED.ec2.internal] => Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 427, in _executor
    exec_rc = self._executor_internal(host)
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 478, in _executor_internal
    return self._executor_internal_inner(host, inject, port)
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 595, in _executor_internal_inner
    self._delete_remote_files(conn, tmp)
  File "/usr/lib/pymodules/python2.7/ansible/runner/__init__.py", line 152, in _delete_remote_files
    raise Exception("not going to happen")
Exception: not going to happen

The play, action, and host where the failure occurs seem to be completely random.

$ lsb_release -d
Description: Ubuntu 12.04.1 LTS
$ uname -a
Linux 3.2.0-30-virtual #48-Ubuntu SMP Fri Aug 24 17:12:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ ansible --version
ansible 0.7

Any suggestions would be appreciated.

Thanks,

Jonathan

The 'not going to happen' safeguard is to keep 'delete remote files'
from deleting any path that does not have the substring "/tmp/" in it.
It's a safeguard that should never have to be triggered.
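
Roughly, the check behaves like the sketch below. This is a standalone illustration, not the actual ansible source: the function name check_deletable and the example paths are made up here, and only the "/tmp/" substring test and the exception message come from the behaviour described above.

```python
def check_deletable(path):
    """Return the path if it contains "/tmp/", otherwise raise.

    Hypothetical helper: it mirrors the substring test described above
    and deletes nothing itself.
    """
    if "/tmp/" not in path:
        # Same message as the one in the traceback.
        raise Exception("not going to happen")
    return path


# A path under a tmp directory passes the check.
check_deletable("/home/ubuntu/.ansible/tmp/example-dir/")

# A path without "/tmp/" in it is refused.
try:
    check_deletable("/etc/")
except Exception as exc:
    print("refused: %s" % exc)
```

So anything that makes the computed temp path come out wrong should end up here instead of in a delete.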

Did you perhaps set ANSIBLE_REMOTE_TEMP (an internal way of
overriding ansible's choice of where to put files, which rarely
needs to be set anymore) to some directory whose path does not
contain "/tmp/"? Or change the associated setting in the config
file? That's the only cause I can immediately think of.

No, ANSIBLE_REMOTE_TEMP is not set, and I haven't configured anything related to file storage locations.

I've isolated the problem to the 'ssh' transport, which I was mistakenly using. It appears that the default 'paramiko' transport doesn't have this problem.

Jonathan

So you've isolated it... But why is it happening? Let's find out why
and fix it.

You are the first person I've heard report this, and I'm curious as to what's up.

Sure, I'm happy to help find the exact issue. I don't really know much about the codebase, so I'm not sure what to try next. Let me know if you have suggestions.

For starters, inside of the _delete_remote_files function
(lib/ansible/runner/__init__.py) I would temporarily modify the code
to print out the filename it is trying to delete prior to raising the
exception, or to include the path it is trying to delete inside the
exception message.

I can't see anything offhand that would make it connection type specific.
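
Something along these lines is what I have in mind. It is only a sketch under assumptions: delete_remote_files_debug and the delete callback are hypothetical stand-ins so the snippet runs on its own, and the real method in the runner has a different signature. The one change that matters is that the offending path ends up in the exception message.

```python
def delete_remote_files_debug(files, delete):
    """Apply the "/tmp/" safeguard, reporting the offending path.

    `files` is one path or a list of paths; `delete` is a hypothetical
    callback standing in for whatever actually removes the remote file.
    """
    if isinstance(files, str):
        files = [files]
    for filename in files:
        if "/tmp/" not in filename:
            # %r shows the exact value, including empty or odd strings.
            raise Exception(
                "not going to happen: refusing to delete %r" % (filename,)
            )
        delete(filename)


# Usage with a harmless callback that only records what would be deleted.
would_delete = []
delete_remote_files_debug("/home/ubuntu/.ansible/tmp/example-dir/", would_delete.append)
print(would_delete)
```

With that in place, the traceback on the failing host should show exactly which path tripped the check.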

The file it's trying to delete is '/', while the rest look like '/home/ubuntu/.ansible/tmp/ansible-1347652098.09-222583577850753/'.

Ok, find out why it thinks it wants to delete / :)

Thankfully it is smart enough to not do it :)

Hi there,

sorry for not dropping in earlier.

I had the same / similar issue ages ago: https://github.com/ansible/ansible/issues/848

I'm not sure whether the current issue is related to this, but maybe it helps to track it down.

Dietmar.