remote-copy module?

Is there an existing module that works similarly to copy but allows
one to use http resources instead of local files? Something like

  $ ansible all -m remote-copy -a 'src=http://internal.host/status-dashboard.txt dest=/etc/motd'

There is the git module, which does get checkouts.

sftp, ftp, and http:// all in one would be slick!

> sftp, ftp, and http:// all in one would be slick!

I can give http:// a shot with httplib [1]... Shall I try? That should
be available in most installations, I think.
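
Roughly what I have in mind, as a minimal sketch (the host and paths are just placeholders):

  # rough sketch only: fetch a URL with httplib and write it to dest
  import httplib

  def fetch(host, path, dest):
      conn = httplib.HTTPConnection(host)
      conn.request('GET', path)
      resp = conn.getresponse()
      if resp.status != 200:
          raise Exception("GET failed: %d %s" % (resp.status, resp.reason))
      data = resp.read()
      conn.close()
      f = open(dest, 'wb')
      f.write(data)
      f.close()

  fetch('internal.host', '/status-dashboard.txt', '/etc/motd')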

        -JP

[1] http://docs.python.org/library/httplib.html

Sounds great!

I looked at the git module for this, but my use case is more
expansive: I am investigating using ansible to replace our
masterless puppet infrastructure, which makes extensive use of file
resources, but we are limited (almost hamstrung) by puppet's
requirement that file sources be on the local file system. We have
about 1G worth of files, mostly binaries, almost all tiny, some of
which churn quite a bit, and most of which come from third-party
vendors, so using a VCS isn't a good match for this. What I would
need is the ability to put our files behind a fast but dumb
load-balanced httpd instance.

My main concerns about a module of this type are performance- and
efficiency-related: every invocation would require either pulling
down the file to compare with the local copy, or that the remote
server be able to transmit the checksum to the client (which makes
the server less dumb). It would be nice to support all the options
that copy supports (like first_available_file), but that might not be
reasonable.

If other people see value in a module like this, I can attempt to
create one, although I would have to strongly resist the temptation
to make it simply do os.system("wget") ...

* Michael DeHaan <michael.dehaan at gmail.com> [2012/07/19 08:30]:

Another point to keep in mind here: what sort of files would you allow to be copied? Are you going to be filtering based on the Content-Type HTTP header?

For example, will the module just get the response of the HTTP call and, if it is 200 OK, simply create a file at the remote destination with the contents of the resource? Or are you going to allow only certain file types?
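
If you did filter, a check could be as simple as something like this (just a sketch; the whitelist parameter is made up):

  # hypothetical filter: only accept responses whose media type is whitelisted
  def content_type_allowed(content_type, allowed_types):
      if not allowed_types:
          return True                      # no filter configured, accept anything
      media_type = content_type.split(';')[0].strip().lower()
      return media_type in allowed_types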

Hi,

> My main concerns about a module of this type are performance- and
> efficiency-related: every invocation would require either pulling
> down the file to compare with the local copy, or that the remote
> server be able to transmit the checksum to the client (which makes
> the server less dumb). It would be nice to support all the options
> that copy supports (like first_available_file), but that might not be
> reasonable.

HTTP has ETags for just this. When you GET a resource, the server
returns its ETag. If you add the If-None-Match header with the
previously seen ETag, the server will respond with a 304.

Maybe the module should store the Etag of downloaded resources in
/var/lib/ansible?
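
As a rough sketch (the function is illustrative only, not a proposed API), a conditional GET with httplib would look something like:

  # illustrative only: conditional GET using a previously stored ETag
  import httplib

  def fetch_if_changed(host, path, etag=None):
      headers = {}
      if etag:
          headers['If-None-Match'] = etag
      conn = httplib.HTTPConnection(host)
      conn.request('GET', path, None, headers)
      resp = conn.getresponse()
      if resp.status == 304:
          conn.close()
          return None, etag                # unchanged, nothing to write
      data = resp.read()
      new_etag = resp.getheader('etag')
      conn.close()
      return data, new_etag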

Greetings,

Jeroen

Why not just write an rsync module and use it with ansible to sync directories? You can have one source-of-truth server that has all the files in one place (of course with proper backup and RAID). You can then write a module that runs an rsync command locally on that host.

Basically you can have an rsync module that you could probably use as follows:
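
(A hypothetical invocation only; neither the module nor these parameter names exist yet.)

  $ ansible all -m rsync -a 'src=rsync://files.internal/pub/ dest=/opt/files/ delete=yes'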

My gut feeling on this is that for 200s the content gets written,
for 304s the content is unchanged, 30x redirects are followed based
on a parameter (follow=true/false), and everything else is treated
as an error (in the same way that copy chokes if the src= is
missing). But there are a huge number of potential parameters for a
module like this: ssl/cert handling, retries, timeouts, following
redirects to untrusted domains, content-type filtering, etc.
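
A rough sketch of that dispatch (the parameter name is invented):

  # rough sketch of the status handling described above
  def handle_status(status, follow):
      if status == 200:
          return 'write'        # new content, write it to dest
      if status == 304:
          return 'unchanged'    # local copy is still current
      if 300 <= status < 400:
          return 'follow' if follow else 'error'
      return 'error'            # anything else fails, like copy with a missing src=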

* nix85 <firestarter.985 at gmail.com> [2012/07/19 06:03]:

* Jeroen Hoekx <jeroen.hoekx at hamok.be> [2012/07/19 15:03]:

I almost think that, since there are so many possible ways to do this, the shell module calling curl with the creates= parameter might be the way to go.
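
Something along these lines, assuming the shell module's creates= handling (URL and paths borrowed from the earlier example):

  $ ansible all -m shell -a 'curl -s -o /etc/motd http://internal.host/status-dashboard.txt creates=/etc/motd'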

Though then you don't know if the file is too old or not. Using wget/curl to simulate yum seems to be a bad course to take, and we also have the git module.

What use case am I missing?

Yes, the shell module + creates= is definitely a possibility for
this, and I didn't think of it when I was formulating my use case
and initial question. The only thing it misses is the ability to do
things like first_available_file, but that's not a deal-breaker for
me.

* Michael DeHaan <michael.dehaan at gmail.com> [2012/07/19 09:16]:

first_available_file looks at the local filesystem anyway, so it works exactly the same for all modules. It would not be able to talk to a remote resource.

I would think that rsync would be a lot more efficient in achieving what you need. Use rsync with the shell module or write your own module to sync files across.

I've dealt with ETags and If-Modified-Since headers quite a bit in a past life. I've found that most web servers provide one and usually both.

It's always good to be defensive, but this may be a bit premature to worry about. Hit some of the servers you'll be working with and check the headers to see if those tags are present. I'm guessing they will be.
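
For example, a quick check with curl (using the host from earlier in the thread):

  $ curl -sI http://internal.host/status-dashboard.txt | grep -i -E 'etag|last-modified'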

I'd also suggest looking at compressed (gzip) content. With a bit of configuration, Apache (and I believe any modern webserver, really) can transparently gzip files and send them to clients that say they will accept them.

http://www.diveintopython.net/http_web_services/gzip_compression.html
http://www.diveintopython.net/http_web_services/http_features.html#d0e27724
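
Roughly along the lines of those pages (a sketch using urllib2; the URL is just the earlier example):

  # sketch: ask for gzip and transparently decompress if the server used it
  import gzip
  import StringIO
  import urllib2

  req = urllib2.Request('http://internal.host/status-dashboard.txt')
  req.add_header('Accept-Encoding', 'gzip')
  resp = urllib2.urlopen(req)
  data = resp.read()
  if resp.info().get('Content-Encoding') == 'gzip':
      data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()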

<tim/>

Funny you mention it, Darren. I was just thinking about this last night.

My reasoning is different from Darren's, though. I'm writing out a whole lot of interconnected configuration files that sit in various subdirectories and want to copy a (potential) batch of them to remote servers.

The copy module or even this hypothetical remote copy module will work great with a file or directory or a static list thereof. That's not what I'm dealing with though.

These generated files don't make sense in version control because they can be reproduced easily.

In the past I ran an rsync command and let it do its magic, creating paths & files, using compression and selective file transfers.

While I could use the command module for rsync, I like having a module that can apply some smart defaults and return better information.

<tim/>

So it sounds to me like you have a master source for all these files.

If so, why not set up all your clients as read-only backup clients?
Then, with ansible, provision them with a set of files that they actually need/care about.

And restore them into place.

You could do ALL of that as a module if you wanted, or with just the command module.
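
e.g. something like this with the command module (host and paths invented):

  $ ansible all -m command -a 'rsync -a rsync://files.internal/pub/app/ /opt/app/'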

-sv

Also - this
http://fedorapeople.org/cgit/skvidal/public_git/scripts.git/tree/copy_if_changed.py

could be adapted pretty easily.
It uses urlgrabber, but swapping that out for urllib[2] wouldn't be terrible.
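
Roughly, the swap would just be (a sketch; error handling omitted):

  # sketch: the urlgrabber fetch replaced with plain urllib2
  import urllib2

  def grab(url, dest):
      resp = urllib2.urlopen(url)          # raises urllib2.HTTPError on 4xx/5xx
      data = resp.read()
      f = open(dest, 'wb')
      f.write(data)
      f.close()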

-sv

> Sounds great!

OK, I dared. Pull request [1] is en-route to you.

This was exciting ... :wink:
Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!

        -JP

[1] https://github.com/ansible/ansible/pull/634

> Sounds great!

> OK, I dared. Pull request [1] is en-route to you.
>
> This was exciting ... :wink:
> Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!

Glad to hear it, and very cool.

(I made a few comments about tweaks, nothing major as this looked good)

I want to get a little bit more feedback about the module import thing before we commit to it as an API; if we do, you can just resubmit it in a bit.

I want to avoid having to change all the modules if we change the API signature. If I don't hear anything today, we will consider it final: we can add functions, but will not take any away.