Weird /tmp file issue

Hello,
I’m running ansible-playbook 1.7.2 on Centos 6 and after running ok for a couple of weeks, now I’m getting this:

Note that the /tmp full path is missing. I used to get this:

Same playbook, same command. The playbook has not changed, and the two examples you see above were run on the same day. The weird thing is that around half the time it now fails with the above symptom. The other half it continues to work just fine for the entire playbook. It is as if between the EXEC line and the PUT line that the path-to-be-appended is being lost.

I have tried reverting to 1.7.1 without success.
Suggestions?

Thank you,
Paul

Can you please paste this as text so things aren’t in screenshots?

Thanks!

Sorry, here goes again:

Hello,
I’m running ansible-playbook 1.7.2 on Centos 6 and after running ok for a couple of weeks, now I’m getting this:

PLAY [all] ********************************************************************

TASK: [Get stats] *************************************************************
<10.250.1.55> ESTABLISH CONNECTION FOR USER: root
<10.250.1.55> REMOTE_MODULE command cat /usr/src/ap/gateway.ini #USE_SHELL
<10.250.1.55> EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ControlMaster=auto’, ‘-o’, ‘ControlPersist=60s’, ‘-o’, ‘ControlPath=/root/.ansible/cp/%h-%r’, ‘-o’, ‘Port=22’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘ConnectTimeout=10’, ‘10.250.1.55’, “/bin/sh -c ‘mkdir -p /tmp/ansible-tmp-1414787762.25-20167395212794 && echo /tmp/ansible-tmp-1414787762.25-20167395212794’”]
<10.250.1.55> PUT /tmp/tmpRAIc9O TO command
<10.250.1.55> EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ControlMaster=auto’, ‘-o’, ‘ControlPersist=60s’, ‘-o’, ‘ControlPath=/root/.ansible/cp/%h-%r’, ‘-o’, ‘Port=22’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘ConnectTimeout=10’, ‘10.250.1.55’, u"/bin/sh -c ‘LANG=C LC_CTYPE=C /usr/bin/python command’"]
failed: [10.250.1.55] => {“failed”: true, “parsed”: false}
invalid output was: /usr/bin/python: can’t open file ‘command’: [Errno 2] No such file or directory
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_request_forwards: requesting forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 1749
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 2
Shared connection to 10.250.1.55 closed.

FATAL: all hosts have already failed – aborting

PLAY RECAP ********************************************************************
to retry, use: --limit @/root/j_gw_stats.retry

10.250.1.55 : ok=0 changed=0 unreachable=0 failed=1

Note that the /tmp full path is missing. I used to get this:

PLAY [all] ********************************************************************

TASK: [Get stats] *************************************************************
<10.250.1.55> ESTABLISH CONNECTION FOR USER: root
<10.250.1.55> REMOTE_MODULE command cat /usr/src/ap/gateway.ini #USE_SHELL
<10.250.1.55> EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ControlMaster=auto’, ‘-o’, ‘ControlPersist=60s’, ‘-o’, ‘ControlPath=/root/.ansible/cp/%h-%r’, ‘-o’, ‘Port=22’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘ConnectTimeout=10’, ‘10.250.1.55’, “/bin/sh -c ‘mkdir -p /tmp/ansible-tmp-1414784342.55-92617073727981 && echo /tmp/ansible-tmp-1414784342.55-92617073727981’”]
<10.250.1.55> PUT /tmp/tmpF_msTP TO /tmp/ansible-tmp-1414784342.55-92617073727981/command
<10.250.1.55> EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ControlMaster=auto’, ‘-o’, ‘ControlPersist=60s’, ‘-o’, ‘ControlPath=/root/.ansible/cp/%h-%r’, ‘-o’, ‘Port=22’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘ConnectTimeout=10’, ‘10.250.1.55’, u"/bin/sh -c ‘LANG=C LC_CTYPE=C /usr/bin/python /tmp/ansible-tmp-1414784342.55-92617073727981/command; rm -rf /tmp/ansible-tmp-1414784342.55-92617073727981/ >/dev/null 2>&1’"]
changed: [10.250.1.55] => {“changed”: true, “cmd”: “cat /usr/src/ap/gateway.ini”, “delta”: “0:00:00.072717”, “end”: “2014-10-31 19:39:07.888074”, “rc”: 0, “start”: “2014-10-31 19:39:07.815357”, “stderr”: “”, “stdout”: “[id]\r\ngatewayid = 1;\r\nhopseed = 1;\r\n\r\n[fw]\r\nfwname = "P01-0208.bin";\r\nfwlength = 79872;\r\nrecords = 2560;\r\n\r\n[noc]\r\nport = 23005;\r\n\r\n[debug]\r\ndebug=0;”}

…snip…

Any thoughts at all why the /tmp folder on the PUT action is not always being defined?

Thank you,
Paul

Shot in the dark: How are you setting remote_tmp? the default is
$HOME/.ansible/tmp but your output is showing /tmp/ (in the places
where it's being properly inserted).

-Toshio

remote_tmp = /tmp

And you're setting that by modifying /etc/ansible/ansible.cfg?

Yes, set by /etc/ansible/ansible.cfg

Looking through the code nothing's jumping out at me. The symptom you
describe where this sometimes works and sometimes doesn't screams race
condition but I'm not seeing where that's happening. I think we'll
need a reproducer in order to diagnose from our side.

Failing that, if you're comfortable with python code you might look at
instrumenting _execute_module in ansible/runner/__init__.py to see if
you can pin down where the value of tmp is being lost.

-Toshio

Thank you, we’ll look at it over here.
Paul

Ok, here’s a followup. I added a statement to print the contents of the result variable in _make_tmp_path (in /runner/init.py) and then ran my playbook. The output is kind of long, but here it is:

<10.250.1.55> EXEC [‘ssh’, ‘-C’, ‘-tt’, ‘-vvv’, ‘-o’, ‘ControlMaster=auto’, ‘-o’, ‘ControlPersist=60s’, ‘-o’, ‘ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r’, ‘-o’, ‘Port=22’, ‘-o’, ‘KbdInteractiveAuthentication=no’, ‘-o’, ‘PreferredAuthentications=publickey’, ‘-o’, ‘PasswordAuthentication=no’, ‘-o’, ‘ConnectTimeout=10’, ‘10.250.1.55’, “/bin/sh -c ‘mkdir -p /tmp/ansible-tmp-1415379422.44-263362473697853 && echo /tmp/ansible-tmp-1415379422.44-263362473697853’”]
result= {
‘stdout’: ‘’,
‘stderr’: 'OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013\n
debug1: Reading configuration data /etc/ssh/ssh_config\r\n
debug1: Applying options for \r\n
debug1: auto-mux: Trying existing master\r\n
debug1: Control socket “/root/.ansible/cp/ansible-ssh-10.250.1.55-22-root” does not exist\r\n
debug2: ssh_connect: needpriv 0\r\n
debug1: Connecting to 10.250.1.55 [10.250.1.55] port 22.\r\n
debug2: fd 3 setting O_NONBLOCK\r\n
debug1: fd 3 clearing O_NONBLOCK\r\n
debug1: Connection established.\r\n
debug3: timeout: 9665 ms remain after connect\r\n
debug1: permanently_set_uid: 0/0\r\n
debug1: identity file /root/.ssh/identity type -1\r\n
debug1: identity file /root/.ssh/identity-cert type -1\r\n
debug3: Not a RSA1 key file /root/.ssh/id_rsa.\r\n
debug2: key_type_from_name: unknown key type '-----BEGIN'\r\n
debug3: key_read: missing keytype\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug3: key_read: missing whitespace\r\n
debug2: key_type_from_name: unknown key type '-----END'\r\n
debug3: key_read: missing keytype\r\n
debug1: identity file /root/.ssh/id_rsa type 1\r\n
debug1: identity file /root/.ssh/id_rsa-cert type -1\r\n
debug1: identity file /root/.ssh/id_dsa type -1\r\n
debug1: identity file /root/.ssh/id_dsa-cert type -1\r\n
debug1: identity file /root/.ssh/id_ecdsa type -1\r\n
debug1: identity file /root/.ssh/id_ecdsa-cert type -1\r\n
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.0p1 Debian-4\r\n
debug1: match: OpenSSH_6.0p1 Debian-4 pat OpenSSH
\r\n
debug1: Enabling compatibility mode for protocol 2.0\r\n
debug1: Local version string SSH-2.0-OpenSSH_5.3\r\n
debug2: fd 3 setting O_NONBLOCK\r\n
debug1: SSH2_MSG_KEXINIT sent\r\n
debug3: Wrote 960 bytes for a total of 981\r\n
debug1: SSH2_MSG_KEXINIT received\r\n
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1\r\n
debug2: kex_parse_kexinit: ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-rsa,ssh-dss\r\n
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se\r\n
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se\r\n
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96\r\n
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96\r\n
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none\r\n
debug2: kex_parse_kexinit: zlib@openssh.com,zlib,none\r\n
debug2: kex_parse_kexinit: \r\ndebug2: kex_parse_kexinit: \r\n
debug2: kex_parse_kexinit: first_kex_follows 0 \r\n
debug2: kex_parse_kexinit: reserved 0 \r\n
debug2: kex_parse_kexinit: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1\r\n
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256\r\ndebug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se\r\n
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se\r\n
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96\r\n
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96\r\n
debug2: kex_parse_kexinit: none,zlib@openssh.com\r\n
debug2: kex_parse_kexinit: none,zlib@openssh.com\r\n
debug2: kex_parse_kexinit: \r\n
debug2: kex_parse_kexinit: \r\n
debug2: kex_parse_kexinit: first_kex_follows 0 \r\n
debug2: kex_parse_kexinit: reserved 0 \r\n
debug2: mac_setup: found hmac-md5\r\n
debug1: kex: server->client aes128-ctr hmac-md5 zlib@openssh.com\r\n
debug2: mac_setup: found hmac-md5\r\n
debug1: kex: client->server aes128-ctr hmac-md5 zlib@openssh.com\r\n
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent\r\n
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP\r\n
debug3: Wrote 24 bytes for a total of 1005\r\n
debug2: dh_gen_key: priv key bits set: 125/256\r\n
debug2: bits set: 480/1024\r\ndebug1: SSH2_MSG_KEX_DH_GEX_INIT sent\r\n
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY\r\n
debug3: Wrote 144 bytes for a total of 1149\r\n
debug3: check_host_in_hostfile: host 10.250.1.55 filename /root/.ssh/known_hosts\r\n
debug3: check_host_in_hostfile: host 10.250.1.55 filename /root/.ssh/known_hosts\r\n
debug3: check_host_in_hostfile: match line 1\r\n
debug1: Host '10.250.1.55' is known and matches the RSA host key.\r\n
debug1: Found key in /root/.ssh/known_hosts:1\r\n
debug2: bits set: 509/1024\r\n
debug1: ssh_rsa_verify: signature correct\r\n
debug2: kex_derive_keys\r\n
debug2: set_newkeys: mode 1\r\n
debug1: SSH2_MSG_NEWKEYS sent\r\n
debug1: expecting SSH2_MSG_NEWKEYS\r\n
debug3: Wrote 16 bytes for a total of 1165\r\n
debug2: set_newkeys: mode 0\r\n
debug1: SSH2_MSG_NEWKEYS received\r\n
debug1: SSH2_MSG_SERVICE_REQUEST sent\r\n
debug3: Wrote 48 bytes for a total of 1213\r\n
debug2: service_accept: ssh-userauth\r\n
debug1: SSH2_MSG_SERVICE_ACCEPT received\r\n
debug2: key: /root/.ssh/identity ((nil))\r\n
debug2: key: /root/.ssh/id_rsa (0x7ff98d26d620)\r\n
debug2: key: /root/.ssh/id_dsa ((nil))\r\n
debug2: key: /root/.ssh/id_ecdsa ((nil))\r\n
debug3: Wrote 64 bytes for a total of 1277\r\n
debug1: Authentications that can continue: publickey,password\r\n
debug3: start over, passed a different list publickey,password\r\n
debug3: preferred publickey\r\ndebug3: authmethod_lookup publickey\r\n
debug3: remaining preferred: \r\n
debug3: authmethod_is_enabled publickey\r\n
debug1: Next authentication method: publickey\r\n
debug1: Trying private key: /root/.ssh/identity\r\n
debug3: no such identity: /root/.ssh/identity\r\n
debug1: Offering public key: /root/.ssh/id_rsa\r\n
debug3: send_pubkey_test\r\n
debug2: we sent a publickey packet, wait for reply\r\n
debug3: Wrote 368 bytes for a total of 1645\r\n
debug1: Server accepts key: pkalg ssh-rsa blen 277\r\n
debug2: input_userauth_pk_ok: SHA1 fp 69:be:9f:27:19:27:09:6a:bc:a6:d4:0c:2f:79:d0:58:6d:72:15:2e\r\n
debug3: sign_and_send_pubkey: RSA 69:be:9f:27:19:27:09:6a:bc:a6:d4:0c:2f:79:d0:58:6d:72:15:2e\r\n
debug1: read PEM private key done: type RSA\r\ndebug3: Wrote 640 bytes for a total of 2285\r\n
debug1: Enabling compression at level 6.\r\n
debug1: Authentication succeeded (publickey).\r\n
debug1: setting up multiplex master socket\r\n
debug3: muxserver_listen: temporary control path /root/.ansible/cp/ansible-ssh-10.250.1.55-22-root.b522MvBV0wTcYeIJ\r\n
debug2: fd 4 setting O_NONBLOCK\r\n
debug3: fd 4 is O_NONBLOCK\r\n
debug3: fd 4 is O_NONBLOCK\r\n
debug1: channel 0: new [/root/.ansible/cp/ansible-ssh-10.250.1.55-22-root.b522MvBV0wTcYeIJ]\r\n
debug3: muxserver_listen: mux listener channel 0 fd 4\r\n
debug1: Entering interactive session.\r\n
debug2: set_control_persist_exit_time: schedule exit in 60 seconds\r\n
', ‘rc’: 0}

The important part is that rc=0, and stdout = ‘’. Stdout being empty causes _make_tmp_path() to return ‘’ because of this line:

rc = conn.shell.join_path(utils.last_non_blank_line(result[‘stdout’]).strip(), ‘’)

Apparently, something in my initial SSH connection is not entirely right, causing the tmp var to be empty, which then cascades to PUT having an incorrect output path, etc…

I found that if I change ControlMaster=no in my ssh_args (in /etc/ansible.cfg) that it WORKS every time. I am hoping that there’s a clue here somewhere that someone will benefit from.

Thanks,
Paul

Ah, mention of ControlMaster makes me think of this:
https://groups.google.com/forum/#!searchin/ansible-project/controlpersist/ansible-project/QUdxNK1zEH0/rQKnO827FUgJ

Do you have pipelining enabled? If so, you might be running into the
combination of updated CentOS6 openssh + pipelining bug that others
are reporting.

-Toshio

I’m not running pipelining, but I am running on CentOS.

In any case, I’m up and running now.

Thanks,
Paul