Spaces in URL and file name with get_url and unarchive

Hi

This URL is for the PyDev plugin for Eclipse:

http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/PyDev 4.4.0.zip

Note that there is a space in the URL and in the file name itself.

Those spaces mean that the get_url plugin, at least on CentOS 7, does not work for me.

Note that a shell command with wget does work, using the same variables for the URL and file name that I pass to the get_url plugin.

Further, the space in the file name means that the unarchive plugin does not work, at least on CentOS 7.

This latter issue has started occurring in the last week or so.

Note that using a command with unzip does not work either.

Has anyone else seen anything like this?

Many thanks

Nathan

Could you give some more information? Which version of ansible? What error message? The task from your playbook that’s failing?

Thanks,
Toshio

Hi Toshio

Apologies for the lack of detail, and thanks for following up. The Ansible version is 1.9.4.

The playbook in question is here:

https://github.com/DevOps4Networks/ansible-eclipse

See this section at line 207 of tasks/eclipse.yml:

- name: eclipse | Get eclipse Pydev plugin bundle
  #TODO Test get_url again and raise a bug about spaces in the URL
  #shell: mkdir -p /tmp/pydev; cd /tmp/pydev; wget "{{ pydev_url }}/{{ pydev_bundle }}" -O pydev.zip
  get_url: url="{{ pydev_url }}/{{ pydev_bundle }}" dest=/tmp/pydev
  when: eclipse_plugins_pydev_enabled and eclipse_plugins_enabled

What I see with the get_url as shown above is :

failed: [localhost] => {"dest": "/tmp/pydev", "failed": true, "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/PyDev 4.4.0.zip"}

When I comment out the get_url, and instead use the shell command, I get:

virtualbox-iso: REMOTE_MODULE command mkdir -p /tmp/pydev; cd /tmp/pydev; wget “http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/PyDev 4.4.0.zip” -O pydev.zip #USE_SHELL
virtualbox-iso: EXEC [‘/bin/sh’, ‘-c’, ‘mkdir -p $HOME/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996 && echo $HOME/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996’]
virtualbox-iso: PUT /tmp/tmpVCp9bn TO /home/vagrant/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996/command
virtualbox-iso: EXEC /bin/sh -c ‘sudo -k && sudo -H -S -p “[sudo via ansible, key=tkmorsunmtemcrgbveavrpuzchnbshbv] password: " -u root /bin/sh -c '”’“‘echo BECOME-SUCCESS-tkmorsunmtemcrgbveavrpuzchnbshbv; LANG=C LC_CTYPE=C /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996/command; rm -rf /home/vagrant/.ansible/tmp/ansible-tmp-1447762454.08-148319008390996/ >/dev/null 2>&1’”‘"’’
virtualbox-iso: changed: [localhost] => {“changed”: true, “cmd”: “mkdir -p /tmp/pydev; cd /tmp/pydev; wget "http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/PyDev 4.4.0.zip" -O pydev.zip”, “delta”: “0:00:04.525719”, “end”: “2015-11-17 12:14:18.683421”, “rc”: 0, “start”: “2015-11-17 12:14:14.157702”, “stderr”: "–2015-11-17 12:14:14-- http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev%204.4.0/PyDev%204.4.0.zip\nResolving freefr.dl.sourceforge.net (freefr.dl.sourceforge.net)… 88.191.250.136, 2a01:e0d:1:8:58bf:fa88:0:1\nConnecting to freefr.dl.sourceforge.net (freefr.dl.sourceforge.net)|88.191.250.136|:80… connected.\nHTTP request sent, awaiting response… 200 OK\nLength: 20099782 (19M) [application/octet-stream]\nSaving to: ‘pydev.zip’\n\n 0K … … … … … 0% 268K 73s\n 50K … … … … … 0% 669K 51s\n 100K … … … … … 0% 12.7M 34s\n 150K … … … … … 1% 1.36M 29s\n 200K … … … … …

Which is to say that it works.

I tested the unarchive issue again with a space in the file name, and I can’t reproduce the issue for now. I’ll look out for it though.

Regards

Nathan

tl;dr: url escape your url string (replace the spaces with %20) and it
will work. You can use the urlencode jinja2 filter if you like, but it
will change how you store the elements of your url slightly:

- vars:
    pydev_url_scheme: http
    pydev_url: freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/
    pydev_bundle: PyDev 4.4.0.zip
  tasks:
    - get_url: url="{{ pydev_url_scheme }}://{{ pydev_url|urlencode() }}/{{ pydev_bundle|urlencode() }}" dest=/tmp/pydev

In depth:

I can duplicate the problem. It would be solved by running
urllib.quote on the path element of the url string (which replaces
spaces with "%20"). I'm not sure whether we should do that, though: it
would be a change in behaviour that breaks things for people who
already escape their urls. In my experience, tools commonly accept
spaces and escape them if they are not already escaped, while libraries
commonly force the user to do the escaping themselves.
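To make the trade-off concrete, here is a minimal Python sketch of quoting only the path element. It assumes Python 3, where the function lives at urllib.parse.quote rather than urllib.quote, and quote_url_path is a made-up helper name, not anything in Ansible. It also shows why a url that is already escaped would break if it were quoted unconditionally.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path component of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path structure survives.
    return urlunsplit(parts._replace(path=quote(parts.path)))


raw = "http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/PyDev 4.4.0.zip"
escaped = quote_url_path(raw)
print(escaped)

# Quoting an already-escaped url escapes the percent sign itself,
# so "%20" turns into "%2520" and the request points at the wrong file.
print(quote_url_path(escaped))
```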

Experimenting shows that curl does not automatically escape spaces
(although its interaction with this server is somewhat surprising...
it's returning a 302 to a page saying the mirror failed rather than a
404). Manually changing the spaces to %20 is required.

Looks like wget attempts to be smart. I haven't worked out the exact
algorithm it's using but it seems to automatically escape spaces. It
escaped a percent not followed by a number. But it did not escape a
percent followed by a number. So with wget, both
"http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev 4.4.0/"
and "http://freefr.dl.sourceforge.net/project/pydev/pydev/PyDev%204.4.0/"
work but if you have a file named ugly%42file.html, the url you'd have
to create for that would have to manually escape the percent sign like
this: ugly%2542file.html.
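A rough Python sketch of that kind of "smart" escaping follows. This is only a guess at a heuristic, not wget's actual algorithm: it assumes a percent sign followed by two hex digits is an existing escape and leaves it alone, and it escapes everything else. lenient_quote is a hypothetical name.

```python
import re
from urllib.parse import quote

# A percent sign that does NOT start a two-hex-digit escape such as %20.
# Assumption: this is how "already escaped" is detected; wget may differ.
_BARE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def lenient_quote(path):
    """Escape spaces and stray percents, leaving existing %XX escapes alone."""
    path = _BARE_PERCENT.sub("%25", path)
    # "%" is marked safe so escapes that are already present survive.
    return quote(path, safe="/%")


print(lenient_quote("/pydev/PyDev 4.4.0/"))    # space gets escaped
print(lenient_quote("/pydev/PyDev%204.4.0/"))  # left unchanged
print(lenient_quote("/100%.html"))             # stray percent gets escaped
print(lenient_quote("/ugly%42file.html"))      # ambiguous: treated as an escape
```

The last call shows the cost of the heuristic: a file literally named ugly%42file.html cannot be told apart from an escaped name, so the caller still has to write ugly%2542file.html by hand.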

So for ansible... we could take a few different paths, I think:
* Do nothing -- we match curl's behaviour here. People can just use %
escaping for the urls they specify and the jinja filter helps with
that.
* Do something like wget and escape spaces and certain uses of
percents. This will break backwards compat but probably for only a
few people.
* Add a parameter to get_url to let the user specify whether we should
escape or not, defaulting to no escape. This is fully backwards
compatible and lets the user choose whether to escape or not.
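The third option could look roughly like this in Python. Both prepare_url and the escape_url parameter are hypothetical names used for illustration; get_url has no such parameter today. The point is only that the default leaves the url untouched, so existing playbooks behave as before.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def prepare_url(url, escape_url=False):
    """Return the url to request, escaping the path only when asked to.

    `escape_url` is a hypothetical option name, not an existing parameter.
    """
    if not escape_url:
        # Default: pass the url through, the caller does any escaping.
        return url
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


url = "http://example.com/a b/c d.zip"
print(prepare_url(url))                   # unchanged
print(prepare_url(url, escape_url=True))  # spaces escaped in the path
```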

Willing to hear some discussion and a pull request to implement a
change based on that.

-Toshio

Toshio

This is very helpful, thank you very much.

My vote is for an option that turns it on explicitly.

My reasoning is as follows:

- That would obviate the need to change variable values explicitly, which may anyway not be possible if the variable value is external and not under the control or definition of the playbook, e.g. an environment variable or a variable generated by another system at runtime.
- It would support “intelligent” behaviour, and it is reasonable to expect tools such as Ansible to be intelligent and do such things for you.
- I find the current behaviour and the lack of an option surprising, obviously, so there is an element of failing to pass the test of “least surprise”.

Regards

Nathan