apt: upgrade=dist play hangs

I pulled the latest code with git:

$ git rev-parse HEAD
f1fdddb640e1dfcd8fdc930031db267947d8cb70

and now the following play hangs

  - name: package upgrade (apt)
    apt: upgrade=dist
    sudo: yes

All computers have minimal load. Logs on the remote computer show:

ansible-apt: Invoked with dpkg_options=force-confdef,force-confold upgrade=None force=False package=None purge=False state=present update_cache=True default_release=None cache_valid_time=3600 deb=None install_recommends=True

The computer running ansible shows:

<www.X.com> ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO www.X.com
<www.X.com> REMOTE_MODULE apt upgrade=dist
<www.X.com> EXEC /bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644 && echo $HOME/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644’
<www.X.com> PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpgwRxMi TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt
<www.Xl.com> EXEC /bin/sh -c ‘sudo -k && sudo -H -S -p “[sudo via ansible, key=XXXXXX] password: " -u root /bin/sh -c '”’“‘echo SUDO-SUCCESS-XXXXXXX; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/apt; rm -rf /home/rte/.ansible/tmp/ansible-tmp-1421171817.66-121308835316644/ >/dev/null 2>&1’”‘"’’

What is odd is that the remote computer shows neither python running nor any ssh connections, so I think the command has completed on the remote host and it’s the ansible computer that is hung or slow to process the result.

Here is the process list I see on the ansible host:

501 82141 31032 0 12:05PM ttys004 0:00.80 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82151 82141 0 12:06PM ttys004 0:00.04 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82154 82141 0 12:06PM ttys004 0:09.39 python ansible-playbook --ask-pass --ask-sudo -i customers/XXXX.com linux-servers.yml --tags apt
501 82155 82141 0 12:06PM ttys004 0:00.00 (python)
501 82156 82141 0 12:06PM ttys004 0:00.00 (python)

PID 82154 continues to tick CPU time so I do not think it’s hung.

I’ve rebooted the www.X.com host and still the ansible control computer hangs on this play.

I’m looking for tips on how to debug this problem.

Thanks.

If I kill 82154 the ansible control computer moves past the “hung” www.X.com host and continues with the rest of the plays in the playbook.

This play now hangs too:

  - name: package update (apt)
    apt: update_cache=yes cache_valid_time=3600
    sudo: yes
    always_run: yes

Verbose run shows this on multiple hosts:

<host.domain.com> ESTABLISH CONNECTION FOR USER: ansible-user on PORT 22 TO host.domain.com

<host.domain.com> REMOTE_MODULE apt update_cache=yes cache_valid_time=3600

<host.domain.com> EXEC /bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320 && echo $HOME/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320’

<host.domain.com> PUT /var/folders/d_/bm7rvz154jb_2djkqybb503h0000gn/T/tmpGH6ekd TO /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt

<host.domain.com> EXEC /bin/sh -c ‘sudo -k && sudo -H -S -p “[sudo via ansible, key=X] password: " -u root /bin/sh -c '”’“‘echo SUDO-SUCCESS-X; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/apt; rm -rf /home/ansible-user/.ansible/tmp/ansible-tmp-1421180023.91-7480599726320/ >/dev/null 2>&1’”‘"’’

Then it just hangs.

what happens when you manually run 'apt-get update' on the same machine?

I should have posted that info in the first place. Thanks for asking. Things work fine when I run apt-get update from the command line.

OK, so what happens when you run "apt-get dist-upgrade"? Both of these
commands are what ansible runs for you with those arguments to
the module.

Works as expected.

Oddly if I --limit “www.X.com” it also works.

When I’m -not- using --limit I do see this in the logs of the host that hangs when the “REMOTE_MODULE apt update_cache=yes cache_valid_time=3600” is run:

Jan 14 17:02:05 www sudo: pam_unix(sudo:auth): authentication failure; logname=ansible-user uid=1001 euid=0 tty=/dev/pts/3 ruser=ansible-user rhost= user=ansible-user

But earlier in the logs I see

Jan 14 16:57:43 www ansible-setup: Invoked with filter=* fact_path=/etc/ansible/facts.d

So I know the sudo stuff is working for the “REMOTE_MODULE setup” stuff.
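
For anyone debugging something similar: one way to see which user is hitting these sudo failures is to scan the auth log for the pam_unix line quoted above. This is only a rough sketch in Python. The regex is assumed from that single sample line, and the function name count_sudo_failures is made up for illustration.

```python
import re
from collections import Counter

# Matches the pam_unix sudo failure line quoted in this thread.
# The field layout is assumed from that one sample line only.
SUDO_FAILURE = re.compile(
    r"sudo: pam_unix\(sudo:auth\): authentication failure;"
    r".*?\bruser=(?P<ruser>\S*)"
)


def count_sudo_failures(lines):
    """Return a Counter of sudo authentication failures keyed by ruser."""
    failures = Counter()
    for line in lines:
        match = SUDO_FAILURE.search(line)
        if match:
            failures[match.group("ruser")] += 1
    return failures


# The two log lines quoted in this thread, used as sample input.
sample = [
    "Jan 14 17:02:05 www sudo: pam_unix(sudo:auth): authentication failure; "
    "logname=ansible-user uid=1001 euid=0 tty=/dev/pts/3 "
    "ruser=ansible-user rhost= user=ansible-user",
    "Jan 14 16:57:43 www ansible-setup: Invoked with filter=* "
    "fact_path=/etc/ansible/facts.d",
]

print(dict(count_sudo_failures(sample)))
```

On a real host you would pass in the lines of the auth log instead of the sample list. Where that log lives depends on the distribution.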

I’m really at a loss on why this is happening!

I think I found the problem and I think it’s a bug.

I have an inventory file that looks like this:

[web-servers]
www.domain1.com
www.domain2.com
www.one-off.com
www.domain3.com
www.domain4.com

Once ansible hits www.one-off.com, every www.domainX.com after it gets a sudo authentication failure. If I comment out www.one-off.com everything works as expected (no hangs).

What’s different with www.one-off.com? It has a different (one-off) sudo password.

host_vars/www.one-off.com.yml

Seems like this issue:
https://github.com/ansible/ansible/issues/9823

Yes, that’s the issue.

The follow ups are not clear to me.

Is ansible behaving as expected? Meaning, if you set ‘ansible_sudo_pass’ in a host_vars file, does it stay set from that point forward?

No, it’s a bug.
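
To make the symptom concrete, here is a toy model in Python of how a per-host sudo password can leak into later hosts when it is written back to shared state. This is not Ansible's code, and whether the linked issue has exactly this cause is an assumption. The class, the method names, and the passwords are all hypothetical.

```python
# Toy model only: NOT Ansible's code. It illustrates how a per-host
# override can leak into later hosts when shared state is mutated.


class SharedRunner:
    def __init__(self, prompted_sudo_pass):
        # Password typed at the sudo prompt, meant as the default.
        self.sudo_pass = prompted_sudo_pass

    def password_for_buggy(self, host_vars):
        # Bug pattern: the override is stored on the shared object,
        # so every later host inherits it.
        if "ansible_sudo_pass" in host_vars:
            self.sudo_pass = host_vars["ansible_sudo_pass"]
        return self.sudo_pass

    def password_for_fixed(self, host_vars):
        # Expected behaviour: the override applies to this host only.
        return host_vars.get("ansible_sudo_pass", self.sudo_pass)


# Same host order as the inventory above; passwords are placeholders.
inventory = [
    ("www.domain1.com", {}),
    ("www.domain2.com", {}),
    ("www.one-off.com", {"ansible_sudo_pass": "one-off-pass"}),
    ("www.domain3.com", {}),
    ("www.domain4.com", {}),
]


def hosts_with_wrong_password(picker_name):
    """List hosts that would be sent a password other than their own."""
    runner = SharedRunner("default-pass")
    picker = getattr(runner, picker_name)
    wrong = []
    for host, host_vars in inventory:
        expected = host_vars.get("ansible_sudo_pass", "default-pass")
        if picker(host_vars) != expected:
            wrong.append(host)
    return wrong


print(hosts_with_wrong_password("password_for_buggy"))
print(hosts_with_wrong_password("password_for_fixed"))
```

In the buggy variant only the hosts listed after www.one-off.com get the wrong password, which matches what was reported here: the failures start right after the one-off host, and commenting it out makes them go away.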