Ansible lost tmp dir when run script module

Andy3 · May 30, 2016, 8:51am

Hi experts,
In our test environment, some test cases will failed due to ansible run script moudle, this is hard to reproduce, so I write a script to reproduce this issue, below is the detail a bout this bug:

ANSIBLE VERSION

[root@172-20-12-5 ~]# ansible --version
ansible 1.9.6
  configured module search path = None
[root@172-20-12-5 ~]#

CONFIGURATION

[root@172-20-12-5 ~]# cat /etc/ansible/ansible.cfg
# config file for ansible -- http://ansible.com/
# ==============================================

# nearly all parameters can be overridden in ansible-playbook
# or with command line flags. ansible will read ANSIBLE_CONFIG,
# ansible.cfg in the current working directory, .ansible.cfg in
# the home directory or /etc/ansible/ansible.cfg, whichever it
# finds first

[defaults]

# some basic default values...

inventory      = /etc/ansible/hosts
#library        = /usr/share/my_modules/
remote_tmp     = $HOME/.ansible/tmp
pattern        = *
forks          = 5
poll_interval  = 15
sudo_user      = root
#ask_sudo_pass = True
#ask_pass      = True
transport      = smart
#remote_port    = 22
module_lang    = C

# plays will gather facts by default, which contain information about
# the remote system.
#
# smart - gather by default, but don't regather if already gathered
# implicit - gather by default, turn off with gather_facts: False
# explicit - do not gather by default, must say gather_facts: True
gathering = implicit

# additional paths to search for roles in, colon separated
#roles_path    = /etc/ansible/roles

# uncomment this to disable SSH key host checking
#host_key_checking = False

# change this for alternative sudo implementations
sudo_exe = sudo

# what flags to pass to sudo
#sudo_flags = -H

# SSH timeout
timeout = 10

# default user to use for playbooks if user is not specified
# (/usr/bin/ansible will use current user as default)
#remote_user = root

# logging is off by default unless this path is defined
# if so defined, consider logrotate
#log_path = /var/log/ansible.log

# default module name for /usr/bin/ansible
#module_name = command

# use this shell for commands executed under sudo
# you may need to change this to bin/bash in rare instances
# if sudo is constrained
#executable = /bin/sh

# if inventory variables overlap, does the higher precedence one win
# or are hash values merged together?  The default is 'replace' but
# this can also be set to 'merge'.
#hash_behaviour = replace

# list any Jinja2 extensions to enable here:
#jinja2_extensions = jinja2.ext.do,jinja2.ext.i18n

# if set, always use this private key file for authentication, same as
# if passing --private-key to ansible or ansible-playbook
#private_key_file = /path/to/file

# format of string {{ ansible_managed }} available within Jinja2
# templates indicates to users editing templates files will be replaced.
# replacing {file}, {host} and {uid} and strftime codes with proper values.
ansible_managed = Ansible managed: {file} modified on %Y-%m-%d %H:%M:%S by {uid} on {host}

# by default, ansible-playbook will display "Skipping [host]" if it determines a task
# should not be run on a host.  Set this to "False" if you don't want to see these "Skipping"
# messages. NOTE: the task header will still be shown regardless of whether or not the
# task is skipped.
#display_skipped_hosts = True

# by default (as of 1.3), Ansible will raise errors when attempting to dereference
# Jinja2 variables that are not set in templates or action lines. Uncomment this line
# to revert the behavior to pre-1.3.
#error_on_undefined_vars = False

# by default (as of 1.6), Ansible may display warnings based on the configuration of the
# system running ansible itself. This may include warnings about 3rd party packages or
# other conditions that should be resolved if possible.
# to disable these warnings, set the following value to False:
#system_warnings = True

# by default (as of 1.4), Ansible may display deprecation warnings for language
# features that should no longer be used and will be removed in future versions.
# to disable these warnings, set the following value to False:
#deprecation_warnings = True

# (as of 1.8), Ansible can optionally warn when usage of the shell and
# command module appear to be simplified by using a default Ansible module
# instead.  These warnings can be silenced by adjusting the following
# setting or adding warn=yes or warn=no to the end of the command line
# parameter string.  This will for example suggest using the git module
# instead of shelling out to the git command.
# command_warnings = False

# set plugin path directories here, separate with colons
action_plugins     = /usr/share/ansible_plugins/action_plugins
callback_plugins   = /usr/share/ansible_plugins/callback_plugins
connection_plugins = /usr/share/ansible_plugins/connection_plugins
lookup_plugins     = /usr/share/ansible_plugins/lookup_plugins
vars_plugins       = /usr/share/ansible_plugins/vars_plugins
filter_plugins     = /usr/share/ansible_plugins/filter_plugins

# by default callbacks are not loaded for /bin/ansible, enable this if you
# want, for example, a notification or logging callback to also apply to
# /bin/ansible runs
#bin_ansible_callbacks = False

# don't like cows?  that's unfortunate.
# set to 1 if you don't want cowsay support or export ANSIBLE_NOCOWS=1
#nocows = 1

# don't like colors either?
# set to 1 if you don't want colors, or export ANSIBLE_NOCOLOR=1
#nocolor = 1

# the CA certificate path used for validating SSL certs. This path
# should exist on the controlling node, not the target nodes
# common locations:
# RHEL/CentOS: /etc/pki/tls/certs/ca-bundle.crt
# Fedora     : /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
# Ubuntu     : /usr/share/ca-certificates/cacert.org/cacert.org.crt
#ca_file_path =

# the http user-agent string to use when fetching urls. Some web server
# operators block the default urllib user agent as it is frequently used
# by malicious attacks/scripts, so we set it to something unique to
# avoid issues.
#http_user_agent = ansible-agent

# if set to a persistent type (not 'memory', for example 'redis') fact values
# from previous runs in Ansible will be stored.  This may be useful when
# wanting to use, for example, IP information from one group of servers
# without having to talk to them in the same playbook run to get their
# current IP information.
fact_caching = memory

# retry files
#retry_files_enabled = False
#retry_files_save_path = ~/.ansible-retry

[privilege_escalation]
#become=True
#become_method=sudo
#become_user=root
#become_ask_pass=False

[paramiko_connection]

# uncomment this line to cause the paramiko connection plugin to not record new host
# keys encountered.  Increases performance on new host additions.  Setting works independently of the
# host key checking setting above.
#record_host_keys=False

# by default, Ansible requests a pseudo-terminal for commands executed under sudo. Uncomment this
# line to disable this behaviour.
#pty=False

[ssh_connection]

# ssh arguments to use
# Leaving off ControlPersist will result in poor performance, so use
# paramiko on older platforms rather than removing it
#ssh_args = -o ControlMaster=auto -o ControlPersist=60s

# The path to use for the ControlPath sockets. This defaults to
# "%(directory)s/ansible-ssh-%%h-%%p-%%r", however on some systems with
# very long hostnames or very long path names (caused by long user names or
# deeply nested home directories) this can exceed the character limit on
# file socket names (108 characters for most platforms). In that case, you
# may wish to shorten the string below.
#
# Example:
# control_path = %(directory)s/%%h-%%r
#control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r

# Enabling pipelining reduces the number of SSH operations required to
# execute a module on the remote server. This can result in a significant
# performance improvement when enabled, however when using "sudo:" you must
# first disable 'requiretty' in /etc/sudoers
#
# By default, this option is disabled to preserve compatibility with
# sudoers configurations that have requiretty (the default on many distros).
#
#pipelining = False

# if True, make ansible use scp if the connection type is ssh
# (default is sftp)
#scp_if_ssh = True

[accelerate]
accelerate_port = 5099
accelerate_timeout = 30
accelerate_connect_timeout = 5.0

# The daemon timeout is measured in minutes. This time is measured
# from the last activity to the accelerate daemon.
accelerate_daemon_timeout = 30

# If set to yes, accelerate_multi_key will allow multiple
# private keys to be uploaded to it, though each user must
# have access to the system via SSH to add a new key. The default
# is "no".
#accelerate_multi_key = yes

[selinux]
# file systems that require special treatment when dealing with security context
# the default behaviour that copies the existing context or uses the user default
# needs to be changed to use the file system dependant context.
#special_context_filesystems=nfs,vboxsf,fuse

OS / ENVIRONMENT

[root@172-20-12-5 ~]# cat /etc/system-release
CentOS Linux release 7.2.1511 (Core)

SUMMARY

I have a test script which will re-run a task on the host(localhost), but after a lot of times(sometimes only a few minutes) re-run the task, the task failed by report:

TASK: [pre-install script] ****************************************************
<172.20.12.5> ESTABLISH CONNECTION FOR USER: root
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1464595244.34-181895261530692 && echo $HOME/.ansible/tmp/ansible-tmp-1464595244.34-181895261530692'
<172.20.12.5> PUT /tmp/tmp_l2Pf4 TO tmp_l2Pf4
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c 'chmod +rx tmp_l2Pf4'
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c 'LANG=C LC_CTYPE=C tmp_l2Pf4 '
failed: [172.20.12.5] => {"changed": true, "rc": 127}
stderr: OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 57: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 30030
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 127
Shared connection to 172.20.12.5 closed.

stdout: /bin/sh: tmp_l2Pf4: command not found

FATAL: all hosts have already failed -- aborting

STEPS TO REPRODUCE

Only need to replace your real host ip and run the test_script.py will re-produce the bug.
BTW, don’t forget to add your host ip to /etc/ansible/hosts

Andy3 · May 30, 2016, 9:17am

This issue seems not only a script module issue, after I re-run the test script, setup module also meet the same problem:

2016-05-30_17-00-57 executing shell command[ yaml_file=mktemp`
cat <> $yaml_file

Andy3 · May 30, 2016, 12:21pm

After debug this issue, I find it fails in function _make_tmp_path of file /usr/lib/python2.7/site-packages/ansible/runner/init.py
below is the debug output:

`
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=“/root/.ansible/cp/ansible-ssh-%h-%p-%r” -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1464610559.99-5504025515450 && echo $HOME/.ansible/tmp/ansible-tmp-1464610559.99-5504025515450’

{‘stdout’: ‘’, ‘stderr’: ‘OpenSSH_6.6.1, OpenSSL 1.0.1e-fips 11 Feb 2013\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 57: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 6427\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to 172.20.12.5 closed.\r\n’, ‘rc’: 0}
<172.20.12.5> PUT /tmp/tmpjiXC2N TO setup
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=“/root/.ansible/cp/ansible-ssh-%h-%p-%r” -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c ‘LANG=C LC_CTYPE=C /usr/bin/python setup’
`

Does this means the ssh connection broken? This is strange due to I run this task on localhost, so network issue should not happen.

在 2016年5月30日星期一 UTC+8下午4:51:36，Andy写道：

Johannes_Kastl · May 30, 2016, 12:46pm

Maybe I missed it in your first mail (which was very detailed, hats
off for the effort), but where did you specify 'connection: local'? If
you did not, then you would connect to your localhost (or its IP) via ssh.

Johannes

Andy3 · May 31, 2016, 1:56am

The host ip I specify in yaml file is my local host ip address.

在 2016年5月30日星期一 UTC+8下午8:47:03，Johannes Kastl写道：

Johannes_Kastl · May 31, 2016, 5:51am

Then you are connecting via ssh...

Try adding "connection: local" to your playbook.

Johannes

Andy3 · May 31, 2016, 8:09am

yes, you are right, after add ‘connection: local’ to my playbook, this issue seems gone away, but I still want to know how to avoid this issue when I didn’t run the task for local host.

Johannes_Kastl · May 31, 2016, 9:27am

With the information you have now (works with connection: local, fails
via ssh) I would say open a bug report/issue on github.

Johannes

Johannes_Kastl · May 31, 2016, 9:28am

Oh, and try setting this in ansible.cfg:

[ssh_connection]
control_path = %(directory)s/%%C

I had lots of issues with the control_path being to long before
setting this...

Johannes

Andy3 · May 31, 2016, 1:51pm

Thanks, Johannes , I have reported this issue on github, BTW, ansible didn’t recognize the config: %C

`
<172.20.12.5> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=“/root/.ansible/cp/%C” -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 172.20.12.5 /bin/sh -c ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1464702659.99-146678604562601 && echo $HOME/.ansible/tmp/ansible-tmp-1464702659.99-146678604562601’
fatal: [172.20.12.5] => SSH Error: percent_expand: unknown key %C
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

TASK: [pre-install script] ****************************************************
FATAL: no hosts matched or all hosts have already failed – aborting

PLAY RECAP ********************************************************************
to retry, use: --limit @/root/tmp.tjKk2GEYR5.retry

172.20.12.5 : ok=0 changed=0 unreachable=1 failed=0

`

在 2016年5月31日星期二 UTC+8下午5:28:50，Johannes Kastl写道：

Johannes_Kastl · May 31, 2016, 2:03pm

There should be two %% before the C...

Johannes

Topic		Replies	Views
Weird /tmp file issue Ansible Project	12	41	November 7, 2014
Failed to create temporary directory Ansible Project	41	323	June 3, 2022
Couldn't read packet: Connection reset by peer Ansible Project	11	2	July 29, 2013
UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory. Ansible Project	1	168	May 3, 2022
/tmp issue Ansible Project	2	69	February 27, 2024

Ansible lost tmp dir when run script module

ANSIBLE VERSION

CONFIGURATION

OS / ENVIRONMENT

SUMMARY

STEPS TO REPRODUCE

Related topics