Copy module optimization

I notice that the copy module transfers the file in every case, then replaces the destination and notifies only if it has changed. This moves a lot of bits that don't need to be moved. I found a few posts about this, but only a few.

I was thinking of an optimization where copy would arrive on the target system with the pathname, ownership, mode, and md5 of the file, along with a URL from which to fetch the file if what it finds does not match.

Does anybody think this is a good idea?

Other questions:

  • what sort of transport should be used when the client pulls the file? http? https? git? Others?
  • how do we keep unauthorized systems out of the repo?

Discussion?

Thanks,

Ed Greenberg

> I was thinking of an optimization where copy would arrive on the target system with the pathname, ownership, mode, and md5 of the file, along with a URL from which to fetch the file if what it finds does not match.

A URL and the like are not necessary. As previously discussed, the copy code needs to take a remote md5 before deciding whether to copy, at which point the actual copy module code can go away.
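The check Michael describes amounts to comparing two digests before any bytes move. This is a minimal sketch. It assumes the remote md5 has already been obtained somehow (for example, from `md5sum` output) and is `None` when the destination does not exist yet. `local_md5` and `should_copy` are hypothetical helper names.

```python
import hashlib
from typing import Optional


def local_md5(path: str, chunk_size: int = 65536) -> str:
    """Hex md5 digest of a local file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_copy(local_path: str, remote_md5: Optional[str]) -> bool:
    """True when the file must be transferred.

    remote_md5 is None when the destination does not exist on the remote side.
    """
    if remote_md5 is None:
        return True
    # Normalize whitespace and case so raw tool output can be passed in.
    return local_md5(local_path) != remote_md5.strip().lower()
```

Only when `should_copy` returns `True` would the file be transferred and handlers notified. An unchanged file costs one hash on each side instead of a full copy.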

This will need to wait for 0.6, since it is past the feature cutoff for 0.5, but it is still pretty simple.

> Does anybody think this is a good idea?
>
> Other questions:
>
> - what sort of transport should be used when the client pulls the file? http? https? git? Others?

Initially we should use nothing other than ssh, which needs only a change to the execute_copy_module code, followed by a truncation of the library/copy module.
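Over plain ssh, the remote half of that check could be a single `md5sum` invocation. The sketch below assumes the target has GNU `md5sum` on its path (BSD systems ship `md5` instead) and uses the stock `ssh` and `scp` clients. `remote_md5` and `push_if_changed` are hypothetical names, not the actual execute_copy_module code. The command runner is injectable so the logic can be exercised without a live host.

```python
import shlex
import subprocess
from typing import Callable, List, Optional

# A runner takes an argv list and returns a CompletedProcess; injectable for testing.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


def default_runner(cmd: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def remote_md5(host: str, path: str, run: Runner = default_runner) -> Optional[str]:
    """Return the md5 of `path` on `host`, or None if it cannot be read."""
    # ssh hands the command string to the remote shell, so quote the path.
    result = run(["ssh", host, "md5sum " + shlex.quote(path)])
    if result.returncode != 0:
        return None  # typically: the file does not exist yet
    fields = result.stdout.split()  # md5sum prints "<digest>  <path>"
    return fields[0] if fields else None


def push_if_changed(host: str, local_path: str, dest: str, local_digest: str,
                    run: Runner = default_runner) -> bool:
    """Copy local_path to host:dest only if the remote digest differs.

    Returns True when a transfer happened (i.e. handlers should be notified).
    """
    if remote_md5(host, dest, run) == local_digest:
        return False
    run(["scp", local_path, f"{host}:{dest}"]).check_returncode()
    return True
```

Depending on the `scp` version, the remote path may be interpreted by the remote shell, so destinations containing spaces or shell metacharacters could need extra quoting.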

(There is already a git module if people want to do git checkouts)

The fileserver idea has been discussed, but the remote md5 optimization needs to happen first.

--Michael

Hi Michael,

Thanks for the quick response. I searched for a discussion of this but did not find much at all. My search term was “copy module.”

Best,
Ed