remote-copy module?

Is there an existing module that works similarly to copy but allows
one to use http resources instead of local files? Something like

  $ ansible all -m remote-copy -a 'src=http://internal.host/status-dashboard.txt dest=/etc/motd'

There is the git module, which does get checkouts.

sftp, ftp, and http:// all in one would be slick!

> sftp, ftp, and http:// all in one would be slick!

I can give http:// a shot with httplib [1]... Shall I try? That should
be available in most installations, I think.
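
Roughly what I have in mind, as a minimal sketch (the host and paths are just placeholders):

  # rough sketch only: fetch a URL with httplib and write it to dest
  import httplib

  def fetch(host, path, dest):
      conn = httplib.HTTPConnection(host)
      conn.request('GET', path)
      resp = conn.getresponse()
      if resp.status != 200:
          raise Exception("GET failed: %d %s" % (resp.status, resp.reason))
      data = resp.read()
      conn.close()
      f = open(dest, 'wb')
      f.write(data)
      f.close()

  fetch('internal.host', '/status-dashboard.txt', '/etc/motd')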

        -JP

[1] http://docs.python.org/library/httplib.html

Sounds great!

I looked at the git module for this, but my use case is more
expansive: I am investigating using ansible to replace our
masterless puppet infrastructure, which makes extensive use of file
resources, but we are limited (almost hamstrung) by puppet's
requirement that file sources be on the local file system. We have
about 1G worth of files, mostly binaries, almost all tiny, some of
which churn quite a bit, and most of which come from third-party
vendors, so using a VCS isn't a good match for this. What I would
need is the ability to put our files behind a fast but dumb
load-balanced httpd instance.

My main concerns about a module of this type are performance- and
efficiency-related: every invocation would require either pulling
down the file to compare with the local copy, or that the remote
server be able to transmit the checksum to the client (which makes
the server less dumb). It would be nice to support all the options
that copy supports (like first_available_file), but that might not be
reasonable.

If other people see value in a module like this, I can attempt to
create one, although I would have to strongly resist the temptation
to make it simply do os.system("wget") ...

* Michael DeHaan <michael.dehaan at gmail.com> [2012/07/19 08:30]:

Another point to keep in mind here: what sort of files would you allow to be copied? Are you going to be filtering based on the Content-Type HTTP header?

For example, will the module just get the response of the HTTP call and, if it is 200 OK, simply create a file at the remote destination with the contents of the resource? Or are you going to allow only certain file types?
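
If you did filter, a check could be as simple as something like this (just a sketch; the whitelist parameter is made up):

  # hypothetical filter: only accept responses whose media type is whitelisted
  def content_type_allowed(content_type, allowed_types):
      if not allowed_types:
          return True                      # no filter configured, accept anything
      media_type = content_type.split(';')[0].strip().lower()
      return media_type in allowed_types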

Hi,

> My main concerns about a module of this type are performance- and
> efficiency-related: every invocation would require either pulling
> down the file to compare with the local copy, or that the remote
> server be able to transmit the checksum to the client (which makes
> the server less dumb). It would be nice to support all the options
> that copy supports (like first_available_file), but that might not be
> reasonable.

HTTP has ETags for just this. When you GET a resource, the server
returns its ETag. If you add the If-None-Match header with the
previously seen ETag, the server will respond with a 304.

Maybe the module should store the Etag of downloaded resources in
/var/lib/ansible?
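
As a rough sketch (the function is illustrative only, not a proposed API), a conditional GET with httplib would look something like:

  # illustrative only: conditional GET using a previously stored ETag
  import httplib

  def fetch_if_changed(host, path, etag=None):
      headers = {}
      if etag:
          headers['If-None-Match'] = etag
      conn = httplib.HTTPConnection(host)
      conn.request('GET', path, None, headers)
      resp = conn.getresponse()
      if resp.status == 304:
          conn.close()
          return None, etag                # unchanged, nothing to write
      data = resp.read()
      new_etag = resp.getheader('etag')
      conn.close()
      return data, new_etag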

Greetings,

Jeroen

Why not just write an rsync module and use it with ansible to sync directories? You can have one source-of-truth server that has all the files in one place (of course with proper backup and RAID). You can then write a module that runs an rsync command locally on that host.

Basically you can have an rsync module that you could probably use as follows:
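
(A hypothetical invocation only; neither the module nor these parameter names exist yet.)

  $ ansible all -m rsync -a 'src=rsync://files.internal/pub/ dest=/opt/files/ delete=yes'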

My gut feeling on this is that for 200s the content gets written,
for 304s the content is unchanged, 30x redirects are followed based
on a parameter (follow=true/false), and everything else is treated
as an error (in the same way that copy chokes if the src= is
missing). But there are a huge number of potential parameters for a
module like this: ssl/cert handling, retries, timeouts, following
redirects to untrusted domains, content-type filtering, etc.
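
A rough sketch of that dispatch (the parameter name is invented):

  # rough sketch of the status handling described above
  def handle_status(status, follow):
      if status == 200:
          return 'write'        # new content, write it to dest
      if status == 304:
          return 'unchanged'    # local copy is still current
      if 300 <= status < 400:
          return 'follow' if follow else 'error'
      return 'error'            # anything else fails, like copy with a missing src=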

* nix85 <firestarter.985 at gmail.com> [2012/07/19 06:03]:

* Jeroen Hoekx <jeroen.hoekx at hamok.be> [2012/07/19 15:03]:

I almost think that, since there are so many possible ways to do this, the shell module calling curl with the creates= parameter might be the way to go.
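
Something along these lines, assuming the shell module's creates= handling (URL and paths borrowed from the earlier example):

  $ ansible all -m shell -a 'curl -s -o /etc/motd http://internal.host/status-dashboard.txt creates=/etc/motd'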

Though then you don't know if the file is too old or not. Using wget/curl to simulate yum seems to be a bad course to take, and we also have the git module.

What use case am I missing?

Yes, the shell module + creates= is definitely a possibility for
this, and I didn't think of it when I was formulating my use case
and initial question. The only thing it misses is the ability to do
things like first_available_file, but that's not a deal-breaker for
me.

* Michael DeHaan <michael.dehaan at gmail.com> [2012/07/19 09:16]:

first_available_file looks at the local filesystem anyway, so it works exactly the same for all modules. It would not be able to talk to a remote resource.

I would think that rsync would be a lot more efficient in achieving what you need. Use rsync with the shell module or write your own module to sync files across.

I've dealt with ETags and If-Modified-Since headers quite a bit in a past life. I've found that most web servers provide one and usually both.

It's always good to be defensive, but this may be a bit premature to worry about. Hit some of the servers you'll be working with and check the headers to see if those tags are present. I'm guessing they will be.
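
For example, a quick check with curl (using the host from earlier in the thread):

  $ curl -sI http://internal.host/status-dashboard.txt | grep -i -E 'etag|last-modified'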

I'd also suggest looking at compressed (gzip) content. With a bit of configuration, Apache (and I believe any modern webserver, really) can transparently gzip files and send them to clients that say they will accept them.

http://www.diveintopython.net/http_web_services/gzip_compression.html
http://www.diveintopython.net/http_web_services/http_features.html#d0e27724
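
Roughly along the lines of those pages (a sketch using urllib2; the URL is just the earlier example):

  # sketch: ask for gzip and transparently decompress if the server used it
  import gzip
  import StringIO
  import urllib2

  req = urllib2.Request('http://internal.host/status-dashboard.txt')
  req.add_header('Accept-Encoding', 'gzip')
  resp = urllib2.urlopen(req)
  data = resp.read()
  if resp.info().get('Content-Encoding') == 'gzip':
      data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()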

<tim/>

Funny you mention it, Darren. I was just thinking about this last night.

My reasoning is different from Darren's, though. I'm writing out a whole lot of interconnected configuration files that sit in various subdirectories and want to copy a (potential) batch of them to remote servers.

The copy module or even this hypothetical remote copy module will work great with a file or directory or a static list thereof. That's not what I'm dealing with though.

These generated files don't make sense in version control because they can be reproduced easily.

In the past I ran an rsync command and let it do its magic, creating paths & files, using compression and selective file transfers.

While I could use the command module for rsync, I like having a module that can apply some smart defaults and return better information.

<tim/>

So it sounds to me like you have a master source for all these files.

If so, why not set up all your clients as read-only backup clients?
Then, with ansible, provision them with a set of files that they actually need/care about.

And restore them into place.

You could do ALL of that as a module if you wanted, or with just the command module.
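
e.g. something like this with the command module (host and paths invented):

  $ ansible all -m command -a 'rsync -a rsync://files.internal/pub/app/ /opt/app/'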

-sv

Also - this
http://fedorapeople.org/cgit/skvidal/public_git/scripts.git/tree/copy_if_changed.py

could be adapted pretty easily.
It uses urlgrabber, but swapping that out for urllib[2] wouldn't be terrible.
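
Roughly, the swap would just be (a sketch; error handling omitted):

  # sketch: the urlgrabber fetch replaced with plain urllib2
  import urllib2

  def grab(url, dest):
      resp = urllib2.urlopen(url)          # raises urllib2.HTTPError on 4xx/5xx
      data = resp.read()
      f = open(dest, 'wb')
      f.write(data)
      f.close()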

-sv

> Sounds great!

OK, I dared. Pull request [1] is en-route to you.

This was exciting ... :wink:
Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!

        -JP

[1] https://github.com/ansible/ansible/pull/634

> Sounds great!

> OK, I dared. Pull request [1] is en-route to you.
>
> This was exciting ... :wink:
> Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!

Glad to hear it, and very cool.

(I made a few comments about tweaks, nothing major as this looked good)

I want to get a little bit more feedback about the module import thing before we commit to it as an API; if we do, you can just resubmit it in a bit.

I want to avoid having to change all the modules if we change the API signature. If I don't hear anything today, we will consider it final: we can add functions, but will not take any away.