unarchive is slow to decompress very big tar.gz file

I have the below playbook which I am trying to run, but it takes forever. The size of "test.tar.gz" is 100 GB.


```yaml
---
- hosts: TEST_BOX
  serial: 1
  tasks:
    - name: copy and untar latest tar.gz file
      unarchive: src=test.tar.gz dest=/data/tasks/files/
```

I am using Ansible 2.4.3.0. Any thoughts on how I can make it faster?

Does anyone have any thoughts on this?

Maybe.

>
> I have a below playbook which I am trying to run but it takes forever.
> Size of "test.tar.gz" is 100GB.

What is taking forever, the copying or the untar-ing?

> ---
> - hosts: TEST_BOX
>   serial: 1
>   tasks:
>     - name: copy and untar latest tar.gz file
>       unarchive: src=test.tar.gz dest=/data/tasks/files/
>
>
> I am using ansible 2.4.3.0. Any thoughts how can I make it faster?

It depends on where the bottleneck is.
You are copying from your control machine; if that has a slow connection, you could put the file on another machine/server and have the target fetch it from there instead.
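
A minimal sketch of that approach, assuming a hypothetical fileserver.example.com hosting the archive: the target host downloads the file itself with get_url, and unarchive with remote_src then unpacks the copy that is already on the box, so nothing streams through the controller.

```yaml
- hosts: TEST_BOX
  serial: 1
  tasks:
    - name: fetch the archive directly on the target host (hypothetical URL)
      get_url:
        url: https://fileserver.example.com/test.tar.gz
        dest: /tmp/test.tar.gz

    - name: unpack the copy that is now local to the target
      unarchive:
        remote_src: yes
        src: /tmp/test.tar.gz
        dest: /data/tasks/files/
```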

Is there any way to avoid the copy entirely? Can we untar over the wire somehow?

Not with unarchive.

If you put the file on a web server you could do this:

```yaml
- shell: curl -s https://some.url/test.tar.gz | tar xzC /dest
```

But this solution is not idempotent; it will re-download and re-extract on every run.
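
One way to get rough idempotence back, assuming some path the archive is known to contain can serve as a marker, is the shell module's creates argument; the task is skipped once that path exists:

```yaml
- name: stream and untar the archive unless it is already unpacked
  shell: curl -s https://some.url/test.tar.gz | tar xzC /data/tasks/files
  args:
    # hypothetical marker file; use any path the archive is known to contain
    creates: /data/tasks/files/some-known-file
```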

Ok, got it. But if I try it this way using the shell module:


```yaml
---
- hosts: TEST_BOX
  serial: 1
  tasks:
    - name: copy and untar latest deals.tar.gz file
      shell: "cd /data/tasks/files/; tar -xvzf test.tar.gz"
```

it doesn't work and I get an error. Am I doing anything wrong in the task above?

You need to transfer the file to the host first.
That's why my example used curl: it fetches the file and streams it in memory to tar, which extracts the files and writes them to disk.

I have the same problem, and I noticed that for my .tar.bz2 archive the following command is run:

```
/bin/gtar --list -C /data/sync --show-transformed-names --use-compress-program=pbzip2 --exclude=data/tmp -f /tmp/data-backup.tar.bz2
```

and it indeed takes about twice as long, presumably because of the --list pass.

The relevant part of the playbook is:

```yaml
- name: Unpack data dump
  unarchive:
    remote_src: yes
    src: /tmp/data-backup.tar.bz2
    dest: "{{ destination }}/sync"
    extra_opts: ["--use-compress-program=pbzip2", "--exclude=data/tmp"]
  tags:
    - restore
    - data-restore
```

and having --list on the command line does not make any sense, since I didn't ask for it!

And the next command it runs is:

```
/bin/gtar --diff -C /data/sync --show-transformed-names --use-compress-program=pbzip2 --exclude=data/tmp -f /tmp/data-backup.tar.bz2
```
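
As far as I can tell, those extra --list and --diff passes are how unarchive decides whether anything needs extracting and whether to report "changed", so they cannot simply be switched off. If they dominate on a large archive, one workaround is to bypass unarchive and call tar directly, guarded by creates so the task stays roughly idempotent. A sketch, assuming a hypothetical marker path the archive is known to create:

```yaml
- name: Unpack data dump without the --list/--diff passes
  command: >
    tar -x --use-compress-program=pbzip2 --exclude=data/tmp
    -f /tmp/data-backup.tar.bz2 -C {{ destination }}/sync
  args:
    # hypothetical marker; pick a file the archive always contains
    creates: "{{ destination }}/sync/some-known-file"
  tags:
    - restore
    - data-restore
```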