This evening, I re-used a very simple playbook that was working a while back (0.8-devel). It didn’t work this time (errors shown below) unless I commented out the sudo keyword.
First, the playbook (simplified somewhat, but I ran it as shown here during my test):
TASK: [Install client ntp on a host running Red Hat alike Linux]
*********************
<fedora17-ci> REMOTE_MODULE yum pkg=ntp state=installed
failed: [fedora17-ci] => {"failed": true, "parsed": false}
invalid output was: sudo: unknown user: None
I'll have to check on this to see if I can reproduce it. I suspect
in replacing $SHELL with an explicit /bin/sh last night some of the
quoting that was removed was actually needed.
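To show why the quoting matters once a command is wrapped in an explicit `/bin/sh -c`, here is a minimal sketch. `run_via_sh` is a hypothetical helper for illustration, not Ansible's runner code. It assumes a POSIX `/bin/sh` is available and uses Python 3's `shlex.quote`.

```python
import shlex
import subprocess


def run_via_sh(argv, quote=True):
    """Run argv through an explicit /bin/sh -c and return its stdout.

    Hypothetical helper for illustration; not Ansible's runner code.
    """
    if quote:
        # shlex.quote() single-quotes any word containing spaces, $, |, etc.,
        # so /bin/sh hands it to the program literally.
        cmd = " ".join(shlex.quote(word) for word in argv)
    else:
        # Without quoting, /bin/sh re-splits the words and expands $VARS.
        cmd = " ".join(argv)
    result = subprocess.run(["/bin/sh", "-c", cmd],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    args = ["printf", "%s,", "two words", "$HOME"]
    print(run_via_sh(args))               # two words,$HOME,
    print(run_via_sh(args, quote=False))  # two,words, then the expanded $HOME
```

Dropping the quoting does not raise an error. It silently changes the arguments the remote program receives, which is why this kind of regression is easy to miss.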
Exception in thread Thread-4 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 505, in run
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 298, in _handle_workers
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
I can get around the last part by specifying, say, -f 10, but that error is also new.
Welcome to the development branch. We're tweaking the
multiprocessing usage some.
What was your other --forks value? It doesn't seem to explain anything
unless you had configured it to be 1.
> I'll have to check on this to see if I can reproduce it. I suspect
> in replacing $SHELL with an explicit /bin/sh last night some of the
> quoting that was removed was actually needed.
That would be great. Just as a double-check: below, I ran the same playbook on an Ubuntu 11.10 host, which still has the old ansible 0.8 devel. The playbook, as is, worked just fine:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric
$ ansible-playbook --version
ansible-playbook 0.8
* zperry <zack.perry at sbcglobal.net> [2012/10/26 09:48]:
> [...]
>
> I'll have to check on this to see if I can reproduce it. I suspect
> in replacing $SHELL with an explicit /bin/sh last night some of the
> quoting that was removed was actually needed.
>
> That would be great. Just as a double-check: below, I ran the same
> playbook on an Ubuntu 11.10 host, which still has the old ansible 0.8 devel.
Just a word of caution here -- On most linuxen, /bin/sh is really
bash, and even in POSIX mode it accepts some bashisms; but on other
(non-linux) systems, /bin/sh is a real Bourne shell, on which
bashisms are unsupported. If ansible is going to explicitly call
/bin/sh, there should be no unintentional bashisms in there. We've
been bitten by unintentional bashisms many, many times and they're
very difficult to track down, since they only happen on "obscure"
(read: non-linux) systems.
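As a rough illustration of catching bashisms before a command string reaches /bin/sh, here is a deliberately tiny pattern check. `find_bashisms` and its pattern list are hypothetical. A real linter such as Debian's `checkbashisms` covers far more cases, and no regex list can prove a command is portable.

```python
import re

# Hypothetical, deliberately small list of common bashisms, each paired
# with a POSIX sh alternative. Real linters (e.g. Debian's checkbashisms)
# cover far more cases.
BASHISMS = [
    (re.compile(r"\[\["), "[[ ... ]] test: use [ ... ]"),
    (re.compile(r"\bfunction\s+\w+"), "'function' keyword: use name() { ... }"),
    (re.compile(r"\bsource\s+\S"), "'source': use '.'"),
    (re.compile(r"\$\{\w+//"), "${var//pat/rep}: use sed or case"),
    (re.compile(r"&>"), "'&>' redirect: use '>file 2>&1'"),
]


def find_bashisms(cmd):
    """Return a hint for every bashism pattern that appears in cmd."""
    return [hint for pattern, hint in BASHISMS if pattern.search(cmd)]


if __name__ == "__main__":
    print(find_bashisms("[ -f /etc/redhat-release ] && echo rh"))  # []
    print(find_bashisms("if [[ -f x ]]; then source y; fi"))
```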
Even systems that tout a "POSIX Bourne shell" have differences: POSIX leaves many things vague, and they get implemented differently. I’ve found annoying differences across the BSDs, and they share a big part of their codebase.
We do almost nothing with bash. Search for "/bin/sh" in the code to
see what we call.
Shouldn't be any problems.
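For anyone following the "search for /bin/sh" suggestion without grep handy, here is a small sketch that walks a source checkout and reports matching lines. `grep_tree` is a hypothetical helper, and the checkout path you pass it is up to you.

```python
import os


def grep_tree(root, needle="/bin/sh", exts=None):
    """Yield (path, line_number, line) for every line containing needle.

    exts=None searches every file, including scripts without a .py
    suffix; pass e.g. (".py",) to narrow the search.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if exts and not name.endswith(tuple(exts)):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps binary or oddly encoded files from crashing the scan
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        yield path, lineno, line.rstrip("\n")
```

Usage: `for hit in grep_tree("path/to/ansible"): print(hit)`.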
Tonight I'm going to work on moving the multiprocessing code back *closer*
to (but better than) the way it originally was. This will fix
0.9 Control-C handling and fix what I'm assuming is a race condition
behind the errors reported above.
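A common pattern for both problems (Control-C handling, and pool threads still running while the interpreter shuts down, as in the `_handle_workers` traceback) looks like the sketch below. It is not necessarily what the pushed fix does. `run_parallel` is a hypothetical name, `forks` mirrors the `-f` option, and the "fork" start method assumes a POSIX control machine.

```python
import multiprocessing
import signal


def _ignore_sigint():
    # Pool initializer: runs once in every worker. Workers ignore Control-C,
    # so only the parent sees KeyboardInterrupt and decides what to do.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run_parallel(func, items, forks=5):
    """Apply func to each item with `forks` worker processes.

    func must be picklable (a module-level function or a builtin).
    """
    # "fork" is POSIX-only, matching the Linux control hosts in this thread.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(processes=forks, initializer=_ignore_sigint)
    try:
        # get() with a timeout keeps the wait interruptible (untimed waits
        # ignored Control-C on Python 2); 3600 seconds is arbitrary.
        results = pool.map_async(func, items).get(timeout=3600)
        pool.close()      # normal path: no more tasks will be submitted
    except BaseException:
        pool.terminate()  # Control-C or a failed task: stop workers now
        raise
    finally:
        # Reap the workers here, before interpreter shutdown, so the pool's
        # helper threads are not still running while modules are torn down.
        pool.join()
    return results


if __name__ == "__main__":
    print(run_parallel(abs, [-1, -2, 3], forks=2))  # [1, 2, 3]
```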
Fixes to multiprocessing pushed. Sudo seems to work for me, let me
know if you see problems.
I decided to post my test results here. More people may chip in to ensure it’s not an issue only in our environment.
This sudo issue still exists. I used the following setups for testing:
From SL 6.3 to SL 6.3: Python 2.6.6
From SL 6.3 to Fedora 17: Python 2.6.6; 2.7.3
From Ubuntu 11.04 to Fedora 17: Python 2.7.1+; 2.7.3
All are running the latest 0.9 devel. The multiprocessing error no longer appears, even with the default -f. But with the simple test playbook shown below, all three pairs listed above produce identical errors:
I am not so sure why the ‘root’ user suddenly became None in this case. BTW, are there any hints for debugging ansible somewhere? The official doc “Module Development” is very light in this regard… I would love to learn more and do more…
I hope the above trace is of some help in nailing down the real culprit.
So, AFAIK, it's *never* been possible to pass in the sudo boolean via
--extra-vars. (Not that it shouldn't be possible).
sudo: True/False explicitly requires a boolean.
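This requirement is easy to trip over when the value arrives as text. The sketch assumes the key=value form of extra vars hands every value over as a string, which is the behaviour later Ansible documentation describes for key=value extra vars. `parse_extra_vars` and `to_bool` are hypothetical helpers. Note that the string "False" is truthy in Python.

```python
import shlex


def parse_extra_vars(arg):
    """Hypothetical parser for a 'k1=v1 k2=v2' string; values stay strings."""
    return dict(pair.split("=", 1) for pair in shlex.split(arg))


def to_bool(value):
    """Map common boolean spellings to True/False; reject anything else."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "yes", "on", "1"):
        return True
    if text in ("false", "no", "off", "0"):
        return False
    raise ValueError("not a boolean: %r" % (value,))


if __name__ == "__main__":
    extra = parse_extra_vars("sudo=False hosts=web")
    print(repr(extra["sudo"]))     # 'False' (a string, not a bool)
    print(bool(extra["sudo"]))     # True: any non-empty string is truthy
    print(to_bool(extra["sudo"]))  # False
```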
It seems you need to set sudo_user to something, though if sudo_user
is not set, it should be root.
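The "sudo: unknown user: None" error is consistent with an unset sudo_user being formatted straight into the command line. The sketch below shows the root default Michael describes. `build_sudo_cmd` is a hypothetical, simplified wrapper, and real Ansible adds more sudo flags than this.

```python
import shlex


def build_sudo_cmd(cmd, sudo_user=None):
    """Hypothetical, simplified sudo wrapper; real Ansible adds more flags."""
    # Buggy variant for comparison: "sudo -u %s" % None gives "sudo -u None",
    # and sudo then rejects "None" as an unknown user.
    user = sudo_user or "root"  # an unset sudo_user should mean root
    return "sudo -u %s /bin/sh -c %s" % (shlex.quote(user), shlex.quote(cmd))


if __name__ == "__main__":
    print(build_sudo_cmd("yum -y install ntp"))
    # sudo -u root /bin/sh -c 'yum -y install ntp'
    print(build_sudo_cmd("id", sudo_user="postgres"))
    # sudo -u postgres /bin/sh -c id
```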
You've given me enough to look into what's going on (will look today),
but your usage of passing in the sudo bit via --extra-vars in that way
is undoubtedly why no one else is seeing it -- it's unlikely anyone
else is doing that. Not saying it's wrong, it's probably just
unique.
> So, AFAIK, it's never been possible to pass in the sudo boolean via
> --extra-vars. (Not that it shouldn't be possible).
On our end, we have been doing so without any problems since 0.5, until the late stage of 0.8.
> sudo: True/False explicitly requires a boolean.
> It seems you need to set sudo_user to something, though if sudo_user
> is not set, it should be root.
> You've given me enough to look into what's going on (will look today),
> but your usage of passing in the sudo bit via --extra-vars in that way
> is undoubtedly why no one else is seeing it -- it's unlikely anyone
> else is doing that. Not saying it's wrong, it's probably just
> unique.
On our various test systems (CentOS, Fedora, SL, Ubuntu), we use sudo a lot and consistently. Developers also get to do things that require root privilege, but only two of us sysadmins have the actual root password to each system. Now you can see why we write many of our small playbooks the way we do.
* Michael DeHaan <michael.dehaan at gmail.com> [2012/10/26 17:42]:
> We do almost nothing with bash. Search for "/bin/sh" in the code
> to see what we call.
I didn't mean to imply there *was* a problem, just that it's a thing
to keep in mind. Apologies if I gave the impression that there were
existing problems.