Bob_Tanner
(Bob Tanner)
January 13, 2015, 7:13pm
1
I pulled the latest code from git:
$ git rev-parse HEAD
f1fdddb640e1dfcd8fdc930031db267947d8cb70
and now the following play hangs
- name: package upgrade (apt)
  apt: upgrade=dist
  sudo: yes
All computers have minimal load. Logs on the remote computer show:
ansible-apt: Invoked with dpkg_options=force-confdef,force-confold upgrade=None force=False package=None purge=False state=present update_cache=True default_release=None cache_valid_time=3600 deb=None install_recommends=True
The computer running ansible shows:
<www.X.com > ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO www.X.com
<www.X.com > REMOTE_MODULE apt upgrade=dist
<www.X.com > EXEC /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && echo $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644'
<www.X.com > PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpgwRxMi TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt
<www.Xl.com > EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=XXXXXX] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-XXXXXXX; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt; rm -rf /home/rte/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/ >/dev/null 2>&1'"'"''
What is odd is that the remote computer shows neither python running nor any ssh connections, so I think the command has completed on the remote host and it is the ansible control computer that is hung, or slow, waiting to process things.
Here is the process list on the ansible host:
501 82141 31032 0 12:05PM ttys004 0:00.80 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82151 82141 0 12:06PM ttys004 0:00.04 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82154 82141 0 12:06PM ttys004 0:09.39 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82155 82141 0 12:06PM ttys004 0:00.00 (python)
501 82156 82141 0 12:06PM ttys004 0:00.00 (python)
PID 82154 continues to accumulate CPU time, so I do not think it’s hung.
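Whether a process is still accumulating CPU time can be checked by comparing two snapshots of the process list. The following is only a minimal sketch, assuming the column layout of the listing above (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function names `cpu_seconds` and `busy_pids` are made up for illustration.

```python
def cpu_seconds(ps_line):
    """Return the TIME column of one ps line, converted to seconds."""
    # Assumed layout: UID PID PPID C STIME TTY TIME CMD...
    fields = ps_line.split()
    total = 0.0
    # TIME looks like "0:09.39"; fold each colon-separated part into seconds.
    for part in fields[6].split(":"):
        total = total * 60 + float(part)
    return total


def busy_pids(before, after):
    """Return the PIDs whose CPU time grew between two lists of ps lines."""
    earlier = {line.split()[1]: cpu_seconds(line) for line in before}
    busy = []
    for line in after:
        pid = line.split()[1]
        # Only compare processes that appear in both snapshots.
        if pid in earlier and cpu_seconds(line) > earlier[pid]:
            busy.append(pid)
    return busy
```

Feeding it two listings taken a few seconds apart returns only the PIDs that are actually doing work, which separates a busy loop from a process that is blocked waiting.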
I’ve rebooted the www.X.com host and still the ansible control computer hangs on this play.
I’m looking for tips on how to debug this problem.
Thanks.
Bob_Tanner
(Bob Tanner)
January 13, 2015, 7:19pm
2
If I kill 82154 the ansible control computer moves past the “hung” www.X.com host and continues with the rest of the plays in the playbook.
Bob_Tanner
(Bob Tanner)
January 14, 2015, 1:22am
3
This play now hangs too:
- name: package update (apt)
  apt: update_cache=yes cache_valid_time=3600
  sudo: yes
  always_run: yes
Verbose run shows this on multiple hosts:
<host.domain.com > ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO host.domain.com
<host.domain.com > REMOTE_MODULE apt update_cache=yes cache_valid_time=3600
<host.domain.com > EXEC /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && echo $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320'
<host.domain.com > PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpGH6ekd TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt
<host.domain.com > EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=X] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-X; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt; rm -rf /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/ >/dev/null 2>&1'"'"''
Then it just hangs.
Brian_Coca
(Brian Coca)
January 14, 2015, 12:36pm
4
What happens when you manually run 'apt-get update' on the same machine?
Bob_Tanner
(Bob Tanner)
January 14, 2015, 5:41pm
5
I should have posted that info in the first place. Thanks for asking. Things work fine when I run apt-get update from the command line.
Brian_Coca
(Brian Coca)
January 14, 2015, 6:03pm
6
OK, so what happens when you run "apt-get dist-upgrade"? These two commands are what ansible runs for you when the module is given those arguments.
Bob_Tanner
(Bob Tanner)
January 14, 2015, 11:07pm
7
Works as expected.
Oddly, if I use --limit "www.X.com" it also works.
When I’m -not- using --limit, I do see this in the logs of the host that hangs when the "REMOTE_MODULE apt update_cache=yes cache_valid_time=3600" is run:
Jan 14 17:02:05 www sudo: pam_unix(sudo:auth): authentication failure; logname=ansible-user uid=1001 euid=0 tty=/dev/pts/3 ruser=ansible-user rhost= user=ansible-user
But earlier in the logs I see
Jan 14 16:57:43 www ansible-setup: Invoked with filter=* fact_path=/etc/ansible/facts.d
So I know the sudo stuff is working for the “REMOTE_MODULE setup” stuff.
I’m really at a loss on why this is happening!
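When comparing several hosts, the pam_unix failure line can be split into its key=value fields mechanically. This is a minimal sketch that assumes the log format shown above; `parse_pam_failure` and `sudo_failures` are hypothetical helper names, not part of any tool.

```python
import re


def parse_pam_failure(line):
    """Return the key=value fields of one pam_unix log line as a dict."""
    # Values end at the next whitespace; an empty value such as "rhost=" is kept.
    return dict(re.findall(r"(\w+)=(\S*)", line))


def sudo_failures(lines):
    """Return the parsed fields of every sudo authentication failure."""
    return [
        parse_pam_failure(line)
        for line in lines
        if "pam_unix(sudo:auth): authentication failure" in line
    ]
```

Running `sudo_failures` over the auth log of each host shows which user and tty the rejected sudo attempts came from, and which hosts have no failures at all.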
Bob_Tanner
(Bob Tanner)
January 15, 2015, 2:27am
8
I think I found the problem and I think it’s a bug.
I have an inventory file that looks like this:
[web-servers]
www.domain1.com
www.domain2.com
www.one-off.com
www.domain3.com
www.domain4.com
Once ansible hits www.one-off.com, every www.domainX.com host after it gets a sudo authentication failure. If I comment out www.one-off.com everything works as expected (no hangs).
What’s different with www.one-off.com ? It has a different (one-off) sudo password.
host_vars/www.one-off.com.yml
Brian_Coca
(Brian Coca)
January 15, 2015, 1:28pm
9
Bob_Tanner
(Bob Tanner)
January 15, 2015, 2:35pm
10
Yes, that’s the issue.
The follow ups are not clear to me.
Is ansible behaving as expected? That is, once ‘ansible_sudo_pass’ is set in a host_vars file, does it stay set for every host from that point forward?
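The two behaviours behind this question can be contrasted with a toy model. This is a hypothetical illustration only, not Ansible's actual code: the class names, the `passwords` helper and the placeholder passwords are all invented, and only the host names and the `ansible_sudo_pass` variable come from this thread.

```python
class LeakyRunner:
    """Toy model: one shared password attribute that host vars overwrite."""

    def __init__(self, default_pass):
        self.sudo_pass = default_pass

    def password_for(self, host_vars):
        # The override is stored on the runner, so it carries over to later hosts.
        self.sudo_pass = host_vars.get("ansible_sudo_pass", self.sudo_pass)
        return self.sudo_pass


class ScopedRunner:
    """Toy model: a host-level override applies to that host only."""

    def __init__(self, default_pass):
        self.default_pass = default_pass

    def password_for(self, host_vars):
        # The default is never modified; each host is resolved independently.
        return host_vars.get("ansible_sudo_pass", self.default_pass)


def passwords(runner, hosts, all_host_vars):
    """Resolve the sudo password for each host, in inventory order."""
    return [runner.password_for(all_host_vars.get(host, {})) for host in hosts]


# Inventory order from the post above; the passwords are placeholders.
HOSTS = ["www.domain1.com", "www.domain2.com", "www.one-off.com",
         "www.domain3.com", "www.domain4.com"]
HOST_VARS = {"www.one-off.com": {"ansible_sudo_pass": "one-off-secret"}}
```

With `LeakyRunner("shared-secret")`, `passwords` returns the one-off password for www.one-off.com and for both hosts after it, which would match the authentication failures described above. With `ScopedRunner("shared-secret")`, only www.one-off.com gets the one-off password.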