Hi folks,
I am trying to use Ansible to deploy OpenShift on 2 hosts (okd01a and okd01b, CentOS 7.6). Problem: apparently some remote task gets stuck waiting at a password prompt:
root 7406 1 0 Mar26 ? 00:00:00 /usr/sbin/sshd -D
root 7945 7406 0 Mar26 ? 00:00:00 _ sshd: root@pts/0
root 7948 7945 0 Mar26 pts/0 00:00:00 | _ -bash
root 896 7948 0 11:32 pts/0 00:00:00 | _ ps -ef --forest
root 897 7948 0 11:32 pts/0 00:00:00 | _ cat
root 48897 7406 0 09:59 ? 00:00:00 _ sshd: root@pts/1
root 49097 48897 0 09:59 pts/1 00:00:00 _ /bin/sh -c /usr/bin/python /root/.ansible/tmp/ansible-tmp-1553677155.03-134576205842945/AnsiballZ_systemd.py && sleep 0
root 49109 49097 0 09:59 pts/1 00:00:00 _ /usr/bin/python /root/.ansible/tmp/ansible-tmp-1553677155.03-134576205842945/AnsiballZ_systemd.py
root 49117 49109 0 09:59 pts/1 00:00:00 _ /usr/bin/systemctl restart docker
root 49118 49117 0 09:59 pts/1 00:00:00 _ /usr/bin/systemd-tty-ask-password-agent --watch
root 49119 49117 0 09:59 pts/1 00:00:00 _ /usr/bin/pkttyagent --notify-fd 5 --fallback
Isn’t Ansible supposed to run remote tasks without a controlling terminal to avoid this kind of problem?
Regards
Harri
Can you provide more information? It’s hard to tell what exactly is going on without looking at the code and logs.
Apparently systemd got confused. After I ran “systemctl daemon-reload” in another terminal, Ansible continued with its playbook. Success.
I just wonder: shouldn’t Ansible run “systemctl daemon-reload” before starting, stopping, or restarting services on the remote host?
It works differently: you need to add a handler that does a systemctl daemon-reload. That’s the proper way to do it. If the task in question doesn’t have one, I suggest you submit a PR.
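Roughly something like this (just a sketch; the host group, file names, and the task that touches the unit file are made up for illustration):

# Sketch only: group name, file names, and the unit-file change are hypothetical.
- hosts: okd_nodes
  tasks:
    - name: Drop in a docker unit override (whatever changes the unit file)
      copy:
        src: docker-override.conf
        dest: /etc/systemd/system/docker.service.d/override.conf
      notify: reload systemd units

    # Run pending handlers now, so the daemon-reload happens before the restart
    - meta: flush_handlers

    - name: Restart docker
      systemd:
        name: docker
        state: restarted

  handlers:
    - name: reload systemd units
      systemd:
        daemon_reload: yes

The systemd module also accepts daemon_reload: yes directly on the restart task, which gets you the same effect without a handler.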
I am not sure that’s reasonable. Isn’t the service module supposed to hide the underlying implementation (systemd, SysV init, Upstart, whatever)? It should not get stuck and break the whole ansible-playbook session, no matter what.
You mean, you hardwire "systemd-only" into your playbooks? Or
do you explicitly list the systemd hosts in your inventory file,
making sure the general service module isn't used by accident?
Ansible is very new to me, but I had the impression that it is
supposed to hide these internal details.
Regards
Harri
You can make some tasks depend on the OS version. For example, use the service module with RHEL <= 6 and the systemd module with RHEL >= 7.
A “when” statement using ansible_os_family, or ansible_distribution together with ansible_distribution_major_version, could be interesting facts to use here.
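Something along these lines, for instance (a sketch only; the task and service names are illustrative):

# Sketch: branch on gathered facts instead of static inventory groups.
- name: Restart docker via SysV init on EL6 and older
  service:
    name: docker
    state: restarted
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version | int <= 6

- name: Restart docker via systemd on EL7 and newer
  systemd:
    name: docker
    state: restarted
    daemon_reload: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version | int >= 7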
Regards
I don't like to distinguish between these hosts. The RedHat
host might have been upgraded from 6.x to 7.y, for example. It's
unreasonable to assume that everybody updates their inventory
files in such a case.
But that's not the point. I am highly concerned that one bad
host can force the whole ansible-playbook run to halt for
>70 minutes. Surely this is not best practice, but I wonder
if this is seen as an acceptable hiccup of using Ansible?
Regards
Harri
It depends on how you write the automation. You can have Ansible work through the inventory in order, or fire off the work on every host so that it doesn’t matter if one is held up. https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
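A fire-and-forget variant could look roughly like this (a sketch; the timeout value is arbitrary):

# Sketch: start the restart in the background and don't wait for it,
# so one hung host doesn't hold the others at this task.
- name: Restart docker without waiting
  systemd:
    name: docker
    state: restarted
  async: 300   # let the background job live for up to 5 minutes
  poll: 0      # fire and forget; move straight on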
The “ansible_” variables are facts gathered from the remote system and not set in your inventory. They allow you to customize your plays based on the target system.
Regarding your other question about how one bad host can force the playbook to halt: it happens, and you have to find ways to work around it. We’ve run into scenarios where fact gathering halts because of hung NFS mounts, the yum module hangs because yum is having problems, etc. There are ways to work around these. I frequently use “async” to set a timeout on tasks that could potentially hang, so that they at least fail without ruining the playbook for the rest of the hosts.
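For example, something along these lines (the values are arbitrary):

# Sketch: put a hard time limit on a task that might hang.
- name: Restart docker, but give up after 2 minutes
  systemd:
    name: docker
    state: restarted
  async: 120   # fail this host's task if it isn't done in 120 seconds
  poll: 10     # check on it every 10 seconds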
–Steve