File exists: '/home/travis/.fireball.keys'

Hi,

I’m trying to get Ansible’s accelerated mode to work on Travis CI, but I’m getting

OSError: [Errno 17] File exists: '/home/travis/.fireball.keys'

See here: https://travis-ci.org/analytically/hadoop-ansible/builds/15598716

From the log we can see that bootstrapping the servers on DigitalOcean (DO) with Ansible works fine, and both the source and target servers have python-keyczar installed. Any idea?

Thanks,

Mathias

Looking at the last build, it seems like this issue went away: https://travis-ci.org/analytically/hadoop-ansible/builds/15680563

Did you find a workaround?

No workaround; I’ve just disabled accelerated mode. The issue itself is still there.

This is the code where the exception occurs:

…
key_path = os.path.expanduser("~/.fireball.keys")
if not os.path.exists(key_path):
    os.makedirs(key_path)
key_path = os.path.expanduser("~/.fireball.keys/%s" % hostname)

It does check before creating the directory, but perhaps you are running into some weird race condition? Do any of your inventory hosts share the same IP or ansible_ssh_host?

Strange. I’m creating new droplets on DigitalOcean, so they should have separate IPs.

We recently ran into this at work, and I thought I’d point out the problem in case someone else stumbles across the issue.

There is a race. The initial task (setup, to gather facts) is executed on multiple hosts at once. If the directory ~/.fireball.keys does not exist, there will be multiple attempts to create it, depending on where the GIL is interrupted to hand over control around https://github.com/ansible/ansible/blob/62979efa140ce9659beac6442b51bd8efe35d4ba/lib/ansible/utils/encrypt.py#L98

When running multiple forks, each one can test that the directory does not exist, and attempt to create it. The first one that creates it will succeed without an issue, and the remainder will get an OSError that the file already exists.
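To make the failure mode concrete, here is a minimal sketch that replays the interleaving by hand instead of relying on timing. It is not Ansible code: the helper name makedirs_after_stale_check and the scratch directory standing in for ~/.fireball.keys are made up for illustration.

```python
import errno
import os
import tempfile


def makedirs_after_stale_check(path):
    """Create path the way a worker would after its exists() check said "missing"."""
    try:
        os.makedirs(path)
        return "created"
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return "File exists"
        raise


# Stand-in for ~/.fireball.keys inside a scratch directory (assumption for the demo).
key_dir = os.path.join(tempfile.mkdtemp(), ".fireball.keys")

# Both workers run the existence check before either one creates the directory.
worker_a_saw_missing = not os.path.exists(key_dir)
worker_b_saw_missing = not os.path.exists(key_dir)

first = makedirs_after_stale_check(key_dir)   # wins the race
second = makedirs_after_stale_check(key_dir)  # loses it and hits EEXIST
```

Both checks report the directory as missing, yet only the first makedirs call can succeed. In the real code the second call is not wrapped in a try/except, so the OSError propagates and the task fails.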

Since Travis CI gives you a pristine environment on each run, you’ll see this occur frequently. A simple workaround is to first run a single fact-gathering play with accelerate against only one of the target hosts, then follow it with the normal plays.

A fix in the ansible code base would be to wrap the mkdir either with a threading lock or with something like the following:

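import contextlib
import errno
import os

@contextlib.contextmanager
def ignore_oserror_exists():
    # Swallow only "File exists" (EEXIST); re-raise any other OSError.
    try:
        yield
    except OSError as ose:
        if ose.errno != errno.EEXIST:
            raise

# key_path is the ~/.fireball.keys directory from the existing code
with ignore_oserror_exists():
    os.makedirs(key_path, mode=0o700)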
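For comparison, the threading-lock option mentioned above could look roughly like the sketch below. The names _key_dir_lock, ensure_dir_locked and run_workers are hypothetical and not part of Ansible, and a scratch directory again stands in for ~/.fireball.keys.

```python
import os
import tempfile
import threading

# Hypothetical module-level lock; only threads in the same process share it.
_key_dir_lock = threading.Lock()


def ensure_dir_locked(path, mode=0o700):
    """Run the check and the mkdir as one step while holding the lock."""
    with _key_dir_lock:
        if not os.path.exists(path):
            os.makedirs(path, mode)
    return path


def run_workers(path, count=8):
    """Start several threads that all try to create path; collect any OSError."""
    errors = []

    def worker():
        try:
            ensure_dir_locked(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


demo_dir = os.path.join(tempfile.mkdtemp(), ".fireball.keys")
demo_errors = run_workers(demo_dir)
```

One caveat: a threading.Lock only serializes threads inside a single process. If the forks are separate processes, each one gets its own copy of the lock and the race remains, so tolerating EEXIST as in the context manager above is probably the safer of the two options.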

Will file an issue with ansible shortly.