Get_url with wildcards


I have a problem downloading a file from an HTTPS repository.
What is the best way to download a file when I don't know the full file name?

Using a wildcard (*) or a regex (.*) in the url of the get_url module does not work.
I can fetch the entire HTML of the URL with the uri module, but then I would need a way to parse all that HTML to get a list of files,
and then I would need to build logic that selects the correct file.

The repository does not have an API; it is just a plain httpd directory listing.

How do you approach such a situation?

This is what I tried:

    - name: Try get_url with a wildcard
      get_url:
        url: "https://my-domain.tld/mydir/*.md5"
        dest: .
        validate_certs: false



Unfortunately, the get_url module is not going to work this way. You will have to do the discovery of the full URLs yourself first, then loop over the discovered URLs.
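A minimal sketch of that two-step approach in pure Ansible, assuming the directory index is an Apache-style HTML listing (the URL is taken from the question; the href regex and the md5_files variable are illustrative, not tested against your server):

```yaml
- name: Fetch the directory index page
  ansible.builtin.uri:
    url: "https://my-domain.tld/mydir/"
    return_content: true
    validate_certs: false
  register: listing

- name: Extract .md5 file names from the HTML
  ansible.builtin.set_fact:
    md5_files: "{{ listing.content | regex_findall('href=\"([^\"/]+\\.md5)\"') }}"

- name: Download each discovered file
  ansible.builtin.get_url:
    url: "https://my-domain.tld/mydir/{{ item }}"
    dest: .
    validate_certs: false
  loop: "{{ md5_files }}"
```

The regex only matches plain file names (no slashes), so parent-directory and subdirectory links in the index page are skipped.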


That’s simply not how HTTP is designed to work, regardless of what method you’re using for retrieval. You have to know the full location of the resource you’re requesting.


I guess you are correct.

I used a workaround and parsed the HTML of the repository with curl and bash commands to get the file names. If the repository had an API, there would probably be a more straightforward way.
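For reference, a sketch of what such a curl/grep pipeline can look like when wrapped in an Ansible shell task (the grep pattern assumes an Apache-style index page and is a guess, not the exact commands used):

```yaml
- name: List .md5 files by scraping the index page with curl
  ansible.builtin.shell: |
    curl -ks https://my-domain.tld/mydir/ \
      | grep -oE 'href="[^"/]+\.md5"' \
      | sed -E 's/href="([^"]+)"/\1/'
  register: found_files

- name: Download each file found by the scrape
  ansible.builtin.get_url:
    url: "https://my-domain.tld/mydir/{{ item }}"
    dest: .
    validate_certs: false
  loop: "{{ found_files.stdout_lines }}"
```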

As long as you're aware of how incredibly brittle and likely to break that can be … if the repo software changes the way it presents its HTML, you might be up all night trying to figure out why your playbooks broke.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.