unarchive always unzips

Is there any reasoning behind the fact that the unarchive module, when using zip, never checks the destination to see whether the files already exist? This is quite inconvenient when extracting very large files.

I understand that tar has a --diff argument which makes this easy, but it would be fairly trivial to implement with Python's zipfile, which is already in use in unarchive.py:

https://github.com/ansible/ansible-modules-core/blob/devel/files/unarchive.py

We could use ZipFile.infolist() to get filenames and file sizes and check them against the destination. There's already a method which decides that zip files are never considered unarchived:

    def is_unarchived(self, mode, owner, group):
        return dict(unarchived=False)
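As a rough sketch of what such a check might look like (function name and layout are mine, not from unarchive.py): compare each member's name and uncompressed size from ZipFile.infolist() against the destination tree, and only report "already unarchived" when everything matches.

```python
import os
import zipfile

def zip_already_extracted(zip_path, dest):
    """Return True if every zip member exists at dest with a matching size."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = os.path.join(dest, info.filename)
            if info.filename.endswith('/'):
                # Directory entry: only its existence can be checked.
                if not os.path.isdir(target):
                    return False
            elif (not os.path.isfile(target)
                  or os.path.getsize(target) != info.file_size):
                return False
    return True
```

This only proves the plumbing is cheap: infolist() reads the central directory, so no member is decompressed.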

I could definitely contribute if this sounds plausible.

  • t-m

Try adding a creates: clause to the unarchive task; that might do what you want.

Thanks for the answer.

I tried that, but while it doesn't overwrite the existing files, it takes the same amount of time as just overwriting.

It's not that usable when extracting multiple GBs of binaries.

tar, however, skips the whole thing with the --diff argument.

I believe it is just that no one has implemented that yet; however, I vaguely recall looking into it once and finding that it would be harder to implement than I had originally thought. If you take a look at what tar --compare checks (https://www.gnu.org/software/tar/manual/tar.html#SEC66), it seems to be file size, mode, owner, modification date, and contents. IIRC, checking some of those with zip wasn't as easy as looking into data structures available from the ZipFile API (some may have been extensions to the zip standard which would need a fallback, and others may not have existed at all... I can't recall). It would certainly be nice if we could do this, but if zip files don't carry enough information inside to do that, it might be easier to figure out why creates= is too slow for you and fix that.

-Toshio

Ah, checking the creates clause _after_ extracting the file seems like a bug.

That said, mine tend to be around the 50 MB mark - but are you unarchiving from a local file on the node, or directly off the network / control machine?

Even with files that small, up/downloading each time was pretty slow. Instead we do something like:

- name: download kafka {{ kafka_version }}
  get_url: url={{ kafka_tarball_url }}
           dest=/root/{{ kafka_version }}.tgz

- name: extract tarball to {{ kafka_dir }}
  unarchive: src=/root/{{ kafka_version }}.tgz dest=/opt/
             copy=no creates={{ kafka_dir }}

I believe it's because zip doesn't support it the way tar does. So I suppose there is no fix for that other than manually comparing the contents with the Python ZipFile API. Unfortunately, zip lacks most of the Unixy features tar provides, like user/group ownership, permission modes, modification dates, and so on.

So the final question is: do we choose convenience over control? Just check file sizes from the headers and don't extract if they match. Is this reasonable or unacceptable?
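One nuance worth noting here: zip can carry Unix permission bits, but only as a platform extension (the high 16 bits of external_attr, which are meaningful when the entry was created on a Unix system), and it stores no owner/group at all - exactly the "needs a fallback" situation described earlier in the thread. A sketch of reading those bits (the helper name is mine, not from any module):

```python
import stat
import zipfile

def member_mode(info):
    """Return the Unix permission bits for a ZipInfo, or None if absent."""
    if info.create_system != 3:       # 3 == Unix, per the zip appnote
        return None
    mode = info.external_attr >> 16   # high 16 bits carry st_mode on Unix
    return stat.S_IMODE(mode) if mode else None
```

Archives built on Windows or by tools that don't set these bits would return None, so any idempotence check would need a fallback for that case.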

Checking just those two would be too big a change in behavior. We'd need to support at least contents as well. If zip supports permissions as an extension, we should probably support that when present, too.

-Toshio
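On the contents point: a contents check need not re-extract anything, since the zip central directory stores a CRC-32 for every member. A sketch (helper name is mine, not from the module) that checksums the destination file and compares it against the stored value:

```python
import os
import zipfile
import zlib

def contents_match(info, dest_root):
    """Compare a destination file's CRC-32 to the one stored in the zip."""
    target = os.path.join(dest_root, info.filename)
    crc = 0
    with open(target, 'rb') as f:
        # Stream in chunks so multi-GB files don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b''):
            crc = zlib.crc32(chunk, crc)
    return (crc & 0xFFFFFFFF) == info.CRC
```

This still reads the whole destination file, but skips decompression and the network transfer, which is where the thread's multi-GB pain comes from.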

I must've misunderstood how 'creates=' is implemented, sorry.

It makes no sense to me that the type of archive would make any difference; surely if a 'guard' is there (one that just checks whether a file path exists), it should prevent the task from running regardless of what the task is?

Can anyone with a better understanding explain this?

I implemented exactly what you needed.

Could you test the following pull request and provide feedback?

     https://github.com/ansible/ansible-modules-core/pull/3307

Next I would like to make unarchive completely idempotent, using the native zipfile and tarfile modules. A lot of the code that makes zip support idempotent can simply be reused; however, it also means we have to implement whatever functionality tar has ourselves (with the added benefit that this functionality will work identically for all other archive formats).
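As an illustration of why the native modules make a cross-format approach feasible: Python's tarfile already exposes, per member, the same fields tar --compare consults (size, mode, owner, modification time), so the comparison logic can live in Python rather than shelling out. A minimal sketch (function name is mine, not from the PR):

```python
import tarfile

def describe(tar_path):
    """List (name, size, mode, uid, gid, mtime) for every tar member."""
    with tarfile.open(tar_path) as tf:
        return [(m.name, m.size, m.mode, m.uid, m.gid, m.mtime)
                for m in tf.getmembers()]
```

The same tuple-per-member shape could then feed one shared "is unarchived?" routine for both tar and zip, with zip filling in None where its format has no data.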

So first things first, improve the zip support as-is.

More information in the PR description linked above.