Downloading files with a specific regular expression from remote https host to target server local path.

I'm having some difficulty finding the Ansible code to do this:

Downloading files with a specific regular expression from remote http / https host to a target server local path.

So I would like to wget http://remote.com/rpm/app*.rpm to a set of target servers. get_url doesn’t appear to support patterns, as per the Ansible documentation. On the other hand, with_fileglob supports glob patterns, but it only matches files local to the control machine and doesn’t work with HTTP.

Looking for suggestions.

Cheers,

TK

It's fundamentally impossible to do what you want, unless the remote
server offers some sort of file system equivalent, like a directory
index.

Dick

Thanks Dick!

I’ve started to get that impression after searching for quite some time.

Currently using a shell command like this to get only specific files down:

wget -r -nd --no-parent -A '*pattern*' http://site.com/path/to/file/

Hence why I was thinking it might be possible in Ansible.

Cheers,
TK

The recursive (-r) option of wget only downloads files that are ‘visible’.
This works fine for stuff like a web page with indexed directory listings etc.
But anything that is not listed won’t be magically retrieved.
If a site does not contain any links to content that is actually there, wget will not know about it and hence won’t download it.

If wget works, then http://remote.com/rpm must have links to all the files.
So your best bet is to use the Ansible command module with said wget options, provided you want to use Ansible.
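
For reference, a minimal sketch of that suggestion as a playbook, reusing the URL and pattern from the original question. The download directory /tmp/rpms is a placeholder, and wget is assumed to be installed on the targets. The argv form is used so the pattern is passed to wget literally, without any shell expansion.

```yaml
- name: Download RPMs linked from the index page (sketch)
  hosts: all
  tasks:
    - name: Make sure the download directory exists
      ansible.builtin.file:
        path: /tmp/rpms          # placeholder path, not from the thread
        state: directory
        mode: "0755"

    - name: Run wget with the accept pattern from the question
      ansible.builtin.command:
        argv:
          - wget
          - "-r"
          - "-nd"
          - "--no-parent"
          - "-A"
          - "app*.rpm"
          - "http://remote.com/rpm/"
        chdir: /tmp/rpms         # wget -nd saves files into the working directory
```

As written, the command task reports "changed" on every run, since Ansible cannot tell whether wget actually fetched anything new.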

Having said that, maybe you can elaborate on what the underlying task at hand is, and/or share the real/actual URLs etc.
It might be possible to achieve the same thing in a different way.

Dick

Let’s assume a real site:

http://mirror.centos.org/centos/7/os/x86_64/Packages/

wget -r -nd --no-parent -A '*glib*' http://mirror.centos.org/centos/7/os/x86_64/Packages/

Grabs all the packages that have glib in the name. Now you’re also right. If it’s not listed it won’t be downloaded. I’m talking about the listed packages.

So I used the shell command to run the wget, but I want to see if there is a pure ansible way of doing this.

Example:

wget -r -nd --no-parent -A '*glib*' http://mirror.centos.org/centos/7/os/x86_64/Packages/

ls -altri|grep -v glib

total 72652
201326721 dr-xr-x---. 9 root root 4096 Mar 27 15:04 ..
135239432 drwxr-xr-x. 2 root root 8192 Mar 27 15:07 .

Well, maybe you will need to read the output of curl and then grep over some HTML tags.

I do something like this:
curl <URL> | grep -o -E 'href="([^"*]+)"' | cut -d '"' -f 2 | sort -n | sed -e 's////' | tail -1

This will give you the name of the last file in the list of href="([^"*]+)" matches; in my case those are links.
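
The same idea (read the index page, pull out the href values, keep the ones you want) can be sketched without shelling out, using the uri and get_url modules. This is only a sketch under assumptions: the index page is assumed to list files as relative, double-quoted href values ending in .rpm, and /tmp/rpms is a placeholder destination. The URL is the CentOS example from earlier in the thread.

```yaml
- name: Download indexed RPMs whose name contains "glib" (sketch)
  hosts: all
  vars:
    index_url: "http://mirror.centos.org/centos/7/os/x86_64/Packages/"
    dest_dir: /tmp/rpms          # placeholder path, not from the thread
  tasks:
    - name: Fetch the directory index page
      ansible.builtin.uri:
        url: "{{ index_url }}"
        return_content: true
      register: index_page

    - name: Make sure the download directory exists
      ansible.builtin.file:
        path: "{{ dest_dir }}"
        state: directory
        mode: "0755"

    - name: Download every linked RPM that matches
      ansible.builtin.get_url:
        url: "{{ index_url }}{{ item }}"   # assumes hrefs are relative file names
        dest: "{{ dest_dir }}/"
        mode: "0644"
      # Extract href targets ending in .rpm, then keep those containing "glib"
      loop: >-
        {{ index_page.content
           | regex_findall('href="([^"]+[.]rpm)"')
           | select('search', 'glib')
           | list }}
```

Like wget -r, this only sees files that the index page actually links to.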

If those files are tailor-made for you, then I would work with the team providing them to find an alternative way of doing this.

What if they created a json file containing all the file information and key that with the host info, after the files are generated?
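
One way that suggestion might look, purely as a sketch. The manifest URL and its layout (a JSON object mapping each host name to a list of file names) are hypothetical, since nothing in the thread defines them. It also assumes the server sends the manifest with a JSON content type, so that the uri module exposes the parsed document as `json`.

```yaml
- name: Download the files listed for each host in a manifest (sketch)
  hosts: all
  vars:
    base_url: "http://remote.com/rpm/"                    # URL from the original question
    manifest_url: "http://remote.com/rpm/manifest.json"   # hypothetical file
    dest_dir: /tmp/rpms                                   # placeholder path
  tasks:
    - name: Fetch the manifest
      ansible.builtin.uri:
        url: "{{ manifest_url }}"
        return_content: true
      register: manifest

    - name: Make sure the download directory exists
      ansible.builtin.file:
        path: "{{ dest_dir }}"
        state: directory
        mode: "0755"

    - name: Download the files listed for this host
      ansible.builtin.get_url:
        url: "{{ base_url }}{{ item }}"
        dest: "{{ dest_dir }}/"
        mode: "0644"
      # Hypothetical layout: {"host1": ["app-1.0.rpm", ...], "host2": [...]}
      loop: "{{ manifest.json[inventory_hostname] | default([]) }}"
```

Hosts that have no entry in the manifest simply download nothing.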