Can you check if you have TMPDIR, TEMP or TMP env variables set?
The code assumes it can create a file in the temp dir (but this can fail);
it uses Python's tempfile.gettempdir to find that directory.
Hi Brian,
setting any of these variables did not help.
I’m new to Python, so I’m not used to digging into the Python documentation.
However, according to what docs.python.org says about tempfile.gettempdir, those variables are steps 1 to 3 in a search hierarchy for a temp directory, which on “other systems” (which probably includes Solaris) is followed by a search of /tmp, /var/tmp, and /usr/tmp.
I checked if a normal user can create a file in /tmp: YES. So I expect there is no problem with tempfile.gettempdir.
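The check described above can also be done from Python itself. What follows is only a minimal sketch: the helper name describe_tempdir is made up for illustration and is not part of Ansible. It reports the three environment variables, the directory tempfile.gettempdir settles on, and whether a file can actually be created there.

```python
import os
import tempfile


def describe_tempdir():
    """Return (env settings, chosen temp dir, whether a file can be created there)."""
    # The three variables tempfile.gettempdir consults first, in this order.
    env = dict((name, os.environ.get(name)) for name in ("TMPDIR", "TEMP", "TMP"))
    chosen = tempfile.gettempdir()
    try:
        # Creating (and automatically deleting) a file proves the dir is writable.
        with tempfile.NamedTemporaryFile(dir=chosen):
            writable = True
    except (IOError, OSError):
        writable = False
    return env, chosen, writable


if __name__ == "__main__":
    env, chosen, writable = describe_tempdir()
    for name in ("TMPDIR", "TEMP", "TMP"):
        print("%s=%r" % (name, env[name]))
    print("gettempdir() -> %s (writable: %s)" % (chosen, writable))
```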
Having found docs.python.org I also looked into fcntl.flock which says: “See the Unix manual flock(2) for details. (On some systems, this function is emulated using fcntl().)”
Solaris man page for flock(3ucb) (There is no flock(2).) can be found at http://docs.oracle.com/cd/E19683-01/816-0213/6m6ne37v5/index.html
fcntl(2) is described here: http://docs.oracle.com/cd/E19683-01/816-0212/6m6nd4n9c/index.html
Both man pages describe EBADF as a possible error code.
Further I searched google with “solaris flock python” and found a couple of locking issue descriptions. But as far as I saw none describing the error “Bad file number”.
None of this gives me a clue whether ansible is using fcntl.flock in a Solaris-incompatible way or whether the Python implementation of fcntl.flock is itself incompatible with Solaris.
Any idea?
Regards,
Karl
So the initial error made me think the file wasn't accessible or being
created, but looking closer at the code, you would have had a traceback
earlier with 'IOError' or similar.
Just to make sure this isn't a Python flock incompatibility error, can you
run the script below (adjust the shebang and the file path as needed)?
It should run without output on success.
--- locktest.py ---
#!/usr/bin/python
import fcntl
lfile = "/var/tmp/mylock"
lfile_c = open(lfile,"w")
lfile_c.close()
LOCKFILE = open(lfile,"r")
fcntl.flock(LOCKFILE, fcntl.LOCK_EX)
It sounds like it’s just unable to get a lock on your tempdir, and we need to make it tolerate not being able to get that lock, which is simple enough.
Please make sure there is a GitHub ticket filed and I’ll take care of it.
The locking here is only to make sure that output does not get interleaved, which can happen occasionally.
Hi Brian,
running locktest.py yields following:
~ 64) python locktest.py
Traceback (most recent call last):
  File "locktest.py", line 10, in <module>
    fcntl.flock(LOCKFILE, fcntl.LOCK_EX)
IOError: [Errno 9] Bad file number
~ 65) ls -al /var/tmp/mylock
-rw-r--r-- 1 kcb e2dv 0 15. Mai 21:20 /var/tmp/mylock
After reading the error descriptions in the Solaris man pages again, I think it is likely that the fcntl descriptions apply to our problem (if the fcntl.flock function on Solaris is actually emulated using fcntl).
For fcntl, EBADF may be returned for a number of argument combinations and does not simply mean that the file descriptor is invalid.
In particular, the man page says EBADF is set when … “the type of lock, l_type, is a shared lock (F_RDLCK), and fildes is not a valid file descriptor open for reading; or the type of lock l_type is an exclusive lock (F_WRLCK) and fildes is not a valid file descriptor open for writing.”
In your example you are opening the file for write, closing the file, opening the file for read and requesting an exclusive lock on the file.
In a second test I changed the type of lock to shared (LOCK_SH). Now locktest.py runs through without any message.
~ 71) rm /var/tmp/mylock
~ 72) ll /var/tmp/mylock
/usr/gnu/bin/ls: cannot access /var/tmp/mylock: No such file or directory
~ 73) python locktest.py
~ 74) ll /var/tmp/mylock
-rw-r--r-- 1 kcb e2dv 0 15. Mai 21:51 /var/tmp/mylock
So changing the requested locktype to shared for reading a file could be a solution.
Double checking the other way round (requesting a shared lock for a file open for writing) yields the error again:
~ 76) python locktest.py
Traceback (most recent call last):
  File "locktest.py", line 10, in <module>
    fcntl.flock(LOCKFILE, fcntl.LOCK_SH)
IOError: [Errno 9] Bad file number
And checking the fourth combination (requesting an exclusive lock for a file open for writing) yields no error again.
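The four combinations tested by hand above could also be probed in one run. This is a rough sketch, not the original locktest.py: the function names try_lock and probe_lock_modes are invented for illustration, a throwaway temp file stands in for /var/tmp/mylock, and LOCK_NB is added so that a failing case cannot hang. On platforms with a native flock all four combinations are expected to succeed; where flock is emulated with fcntl, the mismatched ones should report EBADF as described above.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path, mode, lock_type):
    """Open path in the given mode and try to take lock_type on it.

    Returns None on success, or the symbolic errno name on failure.
    """
    with open(path, mode) as handle:
        try:
            fcntl.flock(handle, lock_type | fcntl.LOCK_NB)
        except (IOError, OSError) as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.flock(handle, fcntl.LOCK_UN)
    return None


def probe_lock_modes():
    """Try shared and exclusive locks on a file opened for reading and for writing."""
    fd, path = tempfile.mkstemp(prefix="locktest-")
    os.close(fd)
    results = {}
    try:
        for mode in ("r", "w"):
            for name in ("LOCK_SH", "LOCK_EX"):
                results[(mode, name)] = try_lock(path, mode, getattr(fcntl, name))
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    for (mode, name), error in sorted(probe_lock_modes().items()):
        print("open mode %r, %s: %s" % (mode, name, error or "ok"))
```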
Regards,
Karl
A shared lock seems to defeat the purpose in this case.
I think we just want exception handling so it doesn’t break on Solaris.
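A minimal sketch of what such exception handling might look like follows. It is not the actual Ansible code: the function name locked_write and its parameters are hypothetical, and it assumes that writing without the lock (with a small risk of interleaved output) is preferable to aborting.

```python
import fcntl
import sys
import tempfile


def locked_write(lock_handle, stream, text):
    """Write text to stream, holding an exclusive lock if one can be taken.

    Returns True if the lock was held during the write, False otherwise.
    """
    have_lock = True
    try:
        fcntl.flock(lock_handle, fcntl.LOCK_EX)
    except (IOError, OSError):
        # Assumption: unlocked (possibly interleaved) output beats a crash.
        have_lock = False
    try:
        stream.write(text)
        stream.flush()
    finally:
        if have_lock:
            fcntl.flock(lock_handle, fcntl.LOCK_UN)
    return have_lock


if __name__ == "__main__":
    # An anonymous temp file stands in for the real lock file here.
    with tempfile.TemporaryFile("w") as lock_handle:
        locked_write(lock_handle, sys.stdout, "hello\n")
```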
Hi Michael,
I just filed an issue in ansible github: https://github.com/ansible/ansible/issues/2925
It does seem possible to get a lock under Solaris, but the type of lock apparently has to match the r/w mode the file was opened with.
It’s described above and in the issue.
Hope this solves the problem. I’ll stand by to pull any new code for a trial.
Regards,
Karl
Will take a look at this shortly.
FYI, this lock is only acquired on the management machine, so if you were managing from say, Scientific Linux or Fedora or Ubuntu, etc, you won’t run into this, even if managing Solaris guests.
Should be an easy fix.
At the moment we are a more or less pure Solaris shop. And this will not
change soon.
For this to work on all platforms, we could just open the lockfile in write
mode.
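As a rough illustration of that idea (not the actual patch), the sketch below opens the lock file for writing before requesting the exclusive lock, which is the combination reported above to work on Solaris. The lock path and the helper names acquire_exclusive and release are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location, used only for this example.
lock_path = os.path.join(tempfile.gettempdir(), "example-%d.lock" % os.getpid())


def acquire_exclusive(path):
    """Open path for writing and block until an exclusive lock is held on it."""
    handle = open(path, "w")  # write mode matches the exclusive lock type
    fcntl.flock(handle, fcntl.LOCK_EX)
    return handle


def release(handle):
    """Drop the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()


if __name__ == "__main__":
    handle = acquire_exclusive(lock_path)
    try:
        print("holding exclusive lock on %s" % lock_path)
    finally:
        release(handle)
        os.remove(lock_path)
```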
Yep, I believe that will do!
Yes, opening the LOG_LOCK file in write mode makes it work on Solaris.
Thanks,
Karl
Yep, this change is now merged into the 1.2 branch.
Should be good to go!