Ansible sometimes does not get the output from runner._low_level_exec_command, which causes the task to fail.

Ansible version 1.5.3

Our playbook looks like this:
debug.yml:

    - hosts: cnode463
      tasks:
      - include: roles/conf/tasks/hadoop.yml

hadoop.yml

    - name: copy hadoop conf
      sudo: yes
      template: src={{ TEMPLATE_DIR }}/hadoop/{{item}}.j2 dest=/etc/hadoop/conf/{{item}}
      with_items:
      - core-site.xml
      - hdfs-site.xml
      - hdfs-site.private.xml
      - log4j.properties
      - hadoop-env.sh

When running the playbook, it sometimes fails:

TASK: [copy hbase conf] ******************************************************* 
ok: [cnode463] => (item=hbase-site.xml)
ok: [cnode463] => (item=log4j.properties)
failed: [cnode463] => (item=hbase-env.sh) => {"failed": true, "item": "hbase-env.sh", "parsed": false}

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
cnode463                   : ok=1    changed=0    unreachable=0    failed=1   

I debugged the Ansible code and added the code below to print the result of running runner._low_level_exec_command:

        print "****"
        print "cmd "+str(cmd)
        print "out "+str(out)
        print "err "+str(err)
        print "____"

In the end I found that _low_level_exec_command may not capture the output of the command correctly.

The debug log is below:

****
cmd mkdir -p $HOME/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534 && echo $HOME/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534
out /home/hadoop/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534

err
____
****
cmd rc=0; [ -r "/etc/hadoop/conf/yarn-site.private.xml" ] || rc=2; [ -f "/etc/hadoop/conf/yarn-site.private.xml" ] || rc=1; [ -d "/etc/hadoop/conf/yarn-site.private.xml" ] && echo 3 && exit 0; (/usr/bin/md5sum /etc/hadoop/conf/yarn-site.private.xml ) || (/sbin/md5sum -q /etc/hadoop/conf/yarn-site.private.xml ) || (/usr/bin/digest -a md5 /etc/hadoop/conf/yarn-site.private.xml ) || (/sbin/md5 -q /etc/hadoop/conf/yarn-site.private.xml ) || (/usr/bin/md5 -n /etc/hadoop/conf/yarn-site.private.xml ) || (/bin/md5 -q /etc/hadoop/conf/yarn-site.private.xml ) || (/usr/bin/csum -h MD5 /etc/hadoop/conf/yarn-site.private.xml ) || (/bin/csum -h MD5 /etc/hadoop/conf/yarn-site.private.xml ) || (echo "${rc}  /etc/hadoop/conf/yarn-site.private.xml")
out
err
____
****
cmd /usr/bin/python /home/hadoop/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534/copy; rm -rf /home/hadoop/.ansible/tmp/ansible-tmp-1396508595.41-255928955172534/ >/dev/null 2>&1
out
err
____
failed: [cnode463] => (item=yarn-site.private.xml) => {"failed": true, "item": "yarn-site.private.xml", "parsed": false}
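
One small refinement that might make logs like this easier to read: printing repr() of the captured values would show whether out and err are truly empty strings or contain stray whitespace or control characters. A minimal sketch along the lines of the prints above, using the same variables:

    # repr() makes an empty string distinguishable from whitespace/control chars
    print("****")
    print("cmd %r" % (cmd,))
    print("out %r" % (out,))
    print("err %r" % (err,))
    print("____")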

This problem appears more and more frequently. Is it possible to fix it?

Haven’t seen this.

If you can set up a minimal example that can reproduce this one and file a ticket we can help take a look.
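
If it helps, one way to chase an intermittent failure like this is a small loop harness that reruns the playbook until it fails, so the failing run's verbose output can be captured for the ticket. A minimal sketch (the inventory file name is just an example):

    import subprocess

    # Rerun the playbook until it fails, keeping the -vvvv output of the
    # failing run for the bug report.
    for i in range(200):
        rc = subprocess.call(["ansible-playbook", "-i", "hosts", "debug.yml", "-vvvv"])
        if rc != 0:
            print("failed on iteration %d" % i)
            break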

I think it would be hard to set up a minimal example, because this problem only appears in one of our production environments; we have never seen it in our test environment.
We use Ansible to monitor the machines' ports, so multiple ansible-playbook processes may run at the same time.
Are there any parameters we should pay attention to when running multiple ansible-playbook processes at the same time?
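
For example, would it help to give each run its own remote temp directory, or to serialize the runs with a lock? A rough sketch of what we could try (the lock file path, inventory name, and the use of the ANSIBLE_REMOTE_TEMP environment variable are assumptions on our side, not a confirmed fix):

    import fcntl
    import os
    import subprocess

    env = os.environ.copy()
    # Give this run its own remote temp directory instead of the shared
    # $HOME/.ansible/tmp (assumes Ansible reads ANSIBLE_REMOTE_TEMP / remote_tmp).
    env["ANSIBLE_REMOTE_TEMP"] = "$HOME/.ansible/tmp-monitor-%d" % os.getpid()

    # Serialize the monitoring runs with an exclusive lock so only one
    # ansible-playbook process executes at a time.
    with open("/tmp/ansible-monitor.lock", "w") as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)
        subprocess.check_call(["ansible-playbook", "-i", "hosts", "debug.yml"], env=env)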

On Friday, April 4, 2014 at 6:12:38 AM UTC+8, Michael DeHaan wrote:

Can’t say for sure, but if you can get it to occur, -vvvv output MIGHT be interesting.

You might also have a .bash_profile type script or MOTD outputting something that looks like JSON and has confused the parser - MOTDs normally don’t show up the way we invoke SSH but they did with dropbear (which you very very likely aren’t using).
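
A quick way to check for that (a minimal sketch; the host name and marker string are just illustrative):

    import subprocess

    # Run a single command over non-interactive SSH and see whether anything
    # besides the marker comes back (e.g. from a login script or MOTD) that
    # could get mixed into the module output Ansible tries to parse.
    out = subprocess.check_output(["ssh", "cnode463", "echo ANSIBLE_MARKER"]).decode()
    extra = [line for line in out.splitlines() if line.strip() != "ANSIBLE_MARKER"]
    if extra:
        print("unexpected output from login scripts: %r" % (extra,))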

If it’s reproducible consistently on the one production machine it should be possible to debug things (though perhaps would require some modification of Ansible).