ansible find.py seems to fail on NFS drives with certain mount options

Hi,

I opened an issue some time ago which was dismissed as a user issue, and I was directed here instead:
https://github.com/ansible/ansible/issues/53959

I’ve discovered more information since reporting that issue. Similar issues have also been observed on Mac as well as Windows, which makes me suspect NFS mount options, since the filesystem permissions in all other cases are the same between working accounts and non-working accounts.

If we look at specific logs, it appears that once again find.py is implicated. I’m not sure what find.py is doing that is failing, but maybe someone here can help?

`

Our deploy script works for some users and fails for others. When it fails, it looks like this:

user2 ~/ansible-rubyvm> ansible-playbook deploy.yml --ask-become-pass -vvv
ansible-playbook 2.8.3
config file = /network/home/user2/ansible-rubyvm/ansible.cfg
configured module search path = [u’/home/user2/.ansible/plugins/modules’, u’/usr/share/ansible/plugins/modules’]
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
Using /network/home/user2/ansible-rubyvm/ansible.cfg as config file
BECOME password:

host_list declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it’s verify_file() method
script declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it’s verify_file() method
auto declined parsing /network/home/user2/ansible-rubyvm/hosts as it did not pass it’s verify_file() method

TASK [deploy : find deployments] *****************************************************************************************************************************************************************************************************************************************************************************
task path: /network/home/user2/ansible-rubyvm/roles/deploy/tasks/main.yml:2
ESTABLISH LOCAL CONNECTION FOR USER: user2
EXEC /bin/sh -c ‘( umask 77 && mkdir -p “echo /tmp/${USER}/ansible/ansible-tmp-1565734953.55-54346346260526” && echo ansible-tmp-1565734953.55-54346346260526=“echo /tmp/${USER}/ansible/ansible-tmp-1565734953.55-54346346260526” ) && sleep 0’
Using module file /usr/lib/python2.7/dist-packages/ansible/modules/files/find.py
PUT /home/user2/.ansible/tmp/ansible-local-1636121zxeQM/tmpd33rm8 TO /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py
EXEC /bin/sh -c ‘chmod u+x /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/ /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py && sleep 0’
EXEC /bin/sh -c ‘sudo -H -S -n -u root /bin/sh -c ‘"’“‘echo BECOME-SUCCESS-nydnzrdhipfrrspdeywfggmpcnzadjmp ; /usr/bin/python /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/AnsiballZ_find.py’”’"’ && sleep 0’
EXEC /bin/sh -c ‘rm -f -r /tmp/user2/ansible/ansible-tmp-1565734953.55-54346346260526/ > /dev/null 2>&1 && sleep 0’
ok: [localhost] => {
“changed”: false,
“examined”: 0,
“files”: [],
“invocation”: {
“module_args”: {
“age”: null,
“age_stamp”: “mtime”,
“contains”: null,
“depth”: null,
“excludes”: [
“sample_deployment.json”
],
“file_type”: “file”,
“follow”: false,
“get_checksum”: false,
“hidden”: false,
“paths”: [
“deployments/”
],
“patterns”: [
“*.json”
],
“recurse”: false,
“size”: null,
“use_regex”: false
}
},
“matched”: 0,
“msg”: “deployments/ was skipped as it does not seem to be a valid directory or it cannot be accessed\n”
}

script that is failing - deploy/main.yml:

- name: find deployments
  find:
    paths: deployments/
    patterns: "*.json"
    excludes: "sample_deployment.json"
  register: files_matched

directory permissions from within the VM:

user2 ~/ansible-rubyvm> ls -la
total 37
drwxr-xr-x 8 user2 users 20 Jun 6 14:29 ./
drwxr-xr-x 15 user2 users 18 Aug 1 15:22 ../
drwxr-xr-x 8 user2 users 13 Aug 14 14:36 .git/
-rw-r--r-- 1 user2 users 113 Jun 6 14:14 .gitignore
-rw-r--r-- 1 user2 users 0 Jun 6 14:14 .placeholder
-rw-r--r-- 1 user2 users 8946 Jun 6 14:14 README.md
-rw-r--r-- 1 user2 users 639 Jun 6 14:29 aliases.sh
-rw-r--r-- 1 user2 users 67 Jun 6 14:14 ansible.cfg
-rw-r--r-- 1 user2 users 88 Jun 6 14:14 deploy.yml
drwxr-xr-x 2 user2 users 3 Jun 6 14:14 deployments/
drwxr-xr-x 2 user2 users 6 Jun 6 14:14 doc/
drwxr-xr-x 2 user2 users 3 Jun 6 14:21 files/
-rw-r--r-- 1 user2 users 151 Jun 6 14:14 goodies.yml
drwxr-xr-x 2 user2 users 3 Jun 6 14:14 group_vars/
-rw-r--r-- 1 user2 users 47 Jun 6 14:14 hosts
-rwxr-xr-x 1 user2 users 114 Jun 6 14:14 install.sh*
-rw-r--r-- 1 user2 users 57 Jun 6 14:14 requirements.yml
drwxr-xr-x 11 user2 users 11 Jun 6 14:14 roles/
-rw-r--r-- 1 user2 users 342 Jun 6 14:14 rubyvm.yml
-rwxr-xr-x 1 user2 users 479 Jun 6 14:14 setup.sh*

user2 ~/ansible-rubyvm> ls -la deployments
total 5
drwxr-xr-x 2 user2 users 4 Aug 14 14:41 .
drwxr-xr-x 8 user2 users 20 Jun 6 14:29 ..
-rw-r--r-- 1 user2 users 0 Aug 14 14:41 app1.json
-rw-r--r-- 1 user2 users 212 Jun 6 14:14 sample_deployment.json
`

If I do a “mount | grep home” to find the current mount options I find the following differences between the account that works and the one that doesn’t:

`
working-server-nfs:/vmgr/home05/user1 on /network/home/user1
(nfs,
nodev,
automounted,
nobrowse)

broken-server-nfs:/vmgr/home06/user2 on /network/home/user2 type nfs
(rw,
relatime,
vers=3,
rsize=1048576,
wsize=1048576,
namlen=255,
hard,
noacl,
noresvport,
proto=tcp,
timeo=600,
retrans=2,
sec=sys,
mountaddr=x.x.x.x,
mountvers=3,
mountport=yyyyy,
mountproto=udp,
local_lock=none,
addr=x.x.x.x)

`

The only thing that stands out to me is the “noacl” option, which disables use of the NFSACL sideband protocol (if available). In general, the mount that doesn’t work seems to have a lot of “performance optimizations” (such as local_lock=none). It’s also possible that POSIX assumptions in Python and/or find.py rely on features that these options remove.
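To make a comparison like this less error-prone, the option lists can be diffed mechanically. The following is only a sketch: the helper names `parse_mount_options` and `diff_mount_options` are made up for illustration, and it assumes both lines come from `mount` on the same operating system, since macOS and Linux print different option names for the same mount.

```python
def parse_mount_options(line):
    """Return the set of options inside the parentheses of one `mount` line."""
    start = line.index("(")
    end = line.rindex(")")
    return {opt.strip() for opt in line[start + 1:end].split(",") if opt.strip()}


def diff_mount_options(line_a, line_b):
    """Return (only_in_a, only_in_b) as sorted lists of option strings."""
    a = parse_mount_options(line_a)
    b = parse_mount_options(line_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Shortened versions of the two mount lines quoted above.
    working = ("working-server-nfs:/vmgr/home05/user1 on /network/home/user1 "
               "(nfs, nodev, automounted, nobrowse)")
    broken = ("broken-server-nfs:/vmgr/home06/user2 on /network/home/user2 type nfs "
              "(rw, relatime, vers=3, hard, noacl, sec=sys)")
    only_working, only_broken = diff_mount_options(working, broken)
    print("only in working:", only_working)
    print("only in broken: ", only_broken)
```

If the two option sets have nothing in common, as here, that is a hint the lines were captured on different platforms rather than evidence of a real configuration difference.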

Is there anything obvious here?

Thanks!

Correction: it turns out I was comparing the mount options inside the Debian VM with those on the Mac. Compared properly, the mount options in both environments are identical, yet one works and the other doesn’t.

Trying to gather more information because this doesn’t make any sense.

Sorry!

Ok, I just ran the deploy script on my machine (which works, with exactly the same files). Here are the logs from the working machine:

$ ansible-playbook deploy.yml --ask-become-pass -vvv
ansible-playbook 2.8.0
config file = /network/home/user1/ansible-rubyvm/ansible.cfg
configured module search path = [u'/home/user1/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
Using /network/home/user1/ansible-rubyvm/ansible.cfg as config file
BECOME password:
host_list declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
script declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
auto declined parsing /network/home/user1/ansible-rubyvm/hosts as it did not pass it's verify_file() method
Parsed /network/home/user1/ansible-rubyvm/hosts inventory source with ini plugin
...
TASK [deploy : find deployments] ****************************************************************************************************************************************************************************************************************************
task path: /network/home/user1/ansible-rubyvm/roles/deploy/tasks/main.yml:2
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: user1
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p " echo /tmp/${USER}/ansible/ansible-tmp-1565811528.74-58741451285526 " && echo ansible-tmp-1565811528.74-58741451285526=" echo /tmp/${USER}/ansible/ansible-tmp-1565811528.74-58741451285526 `" ) && sleep 0’
Using module file /usr/lib/python2.7/dist-packages/ansible/modules/files/find.py
PUT /home/user1/.ansible/tmp/ansible-local-9685YDWEj4/tmpUMTB0r TO /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py
EXEC /bin/sh -c ‘chmod u+x /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/ /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py && sleep 0’
EXEC /bin/sh -c ‘sudo -H -S -p “[sudo via ansible, key=fnjawawcivjycahoaasqjgttvmdjzobb] password:” -u root /bin/sh -c ‘"’“‘echo BECOME-SUCCESS-fnjawawcivjycahoaasqjgttvmdjzobb ; /usr/bin/python /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/AnsiballZ_find.py’”’"’ && sleep 0’
EXEC /bin/sh -c ‘rm -f -r /tmp/user1/ansible/ansible-tmp-1565811528.74-58741451285526/ > /dev/null 2>&1 && sleep 0’
ok: [localhost] => {
“changed”: false,
“examined”: 2,
“files”: [
{
“atime”: 1565811338.5080934,
“ctime”: 1565811463.5763762,
“dev”: 54,
“gid”: 101,
“gr_name”: “users”,
“inode”: 88137,
“isblk”: false,
“ischr”: false,
“isdir”: false,
“isfifo”: false,
“isgid”: false,
“islnk”: false,
“isreg”: true,
“issock”: false,
“isuid”: false,
“mode”: “0644”,
“mtime”: 1565811437.9758408,
“nlink”: 1,
“path”: “deployments/app1.json”,
“pw_name”: “user1”,
“rgrp”: true,
“roth”: true,
“rusr”: true,
“size”: 281,
“uid”: 5954,
“wgrp”: false,
“woth”: false,
“wusr”: true,
“xgrp”: false,
“xoth”: false,
“xusr”: false
}
],
“invocation”: {
“module_args”: {
“age”: null,
“age_stamp”: “mtime”,
“contains”: null,
“depth”: null,
“excludes”: [
“sample_deployment.json”
],
“file_type”: “file”,
“follow”: false,
“get_checksum”: false,
“hidden”: false,
“paths”: [
“deployments/”
],
“patterns”: [
“*.json”
],
“recurse”: false,
“size”: null,
“use_regex”: false
}
},
“matched”: 1,
“msg”: “”
}

`

Maybe this explains why one works and the other fails, but I need help understanding where to look.

Thanks!

There are a few possibilities. Since you are using a relative path, the module might not be searching in the directory you expect.
The easiest way to see what happens is probably to run strace on the failing machine:

   strace -f ansible-playbook deploy.yml --ask-become-pass 2>&1 | grep deployments

This will show every system call that touches deployments.

The find module stops after it tests whether deployments/ is a directory with os.path.isdir. You can test this with the code below, which should print True. Since you are using become, you should run it with sudo.

   sudo -H python -c 'import os; print(os.path.isdir("deployments/"))'
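A plain `os.path.isdir` call only returns True or False: if the underlying `stat` fails (for example with a permission error), the exception is swallowed and the result is simply False. The sketch below, with a made-up helper name `explain_isdir`, calls `os.stat` directly so the actual error becomes visible. It is a debugging aid, not the find module's own code.

```python
import errno
import os
import stat


def explain_isdir(path):
    """Return a short string saying why os.path.isdir(path) would be True or False."""
    try:
        st = os.stat(path)
    except OSError as exc:
        # os.path.isdir() hides this error and just returns False.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "stat failed: %s (%s)" % (name, exc.strerror)
    if stat.S_ISDIR(st.st_mode):
        return "is a directory"
    return "exists but is not a directory"


if __name__ == "__main__":
    target = "deployments/"
    # A relative path is resolved against the current working directory,
    # so print the absolute form to confirm where the check really looks.
    print("%s -> %s" % (os.path.abspath(target), explain_isdir(target)))
```

Saved to a file and run from the playbook directory with `sudo -H python`, this would distinguish a missing directory (ENOENT) from one that cannot be reached (EACCES).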

Thanks Kai, I tried and narrowed down the following:

`

strace on isdir

$ diff nfs_{bad,good}.log
1,2c1,2
< user2 ~> sudo -H strace -f python -c ‘import os; print os.path.isdir(“nfsdir”)’
< execve(“/usr/bin/python”, [“python”, “-c”, “import os; print os.path.isdir("”…], [/* 17 vars */]) = 0

The other user’s home directory was missing execute permission, which is required to traverse into a directory.
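For anyone hitting the same symptom, here is a rough sketch that walks from the filesystem root down to a target directory and lists every level that lacks the execute (search) bit. The function names are made up for illustration. Checking the "other" bit by default is an assumption: it fits the case where the process reaching the directory is neither its owner nor in its group, which may happen when root is mapped to an anonymous user on an NFS export, but nothing in this thread confirms that this is what happened here.

```python
import os
import stat


def ancestors(path):
    """Return every directory from the filesystem root down to path itself."""
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the root
            break
        path = parent
    return list(reversed(parts))


def dirs_missing_search_bit(path, bit=stat.S_IXOTH):
    """Return the levels of path whose mode lacks `bit` (default: execute for others)."""
    missing = []
    for directory in ancestors(path):
        try:
            mode = os.stat(directory).st_mode
        except OSError:
            # This level cannot even be inspected, so nothing below it is reachable.
            missing.append(directory)
            break
        if not mode & bit:
            missing.append(directory)
    return missing


if __name__ == "__main__":
    for directory in dirs_missing_search_bit("deployments/"):
        print("no search permission for others:", directory)
```

Passing `bit=stat.S_IXGRP` or `bit=stat.S_IXUSR` checks the group or owner bit instead.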

Thanks for your help and patience!