git sparse-checkout

Hi

Just wondering if anyone else is doing git sparse checkouts in Ansible playbooks?

Maybe they have a better idea than me…

The way to do it is this:

mkdir <repo> && cd <repo>
git init
git remote add -f <name> <url>

Enable sparse-checkout in the repo:
git config core.sparsecheckout true

Configure sparse-checkout by listing your desired sub-trees in .git/info/sparse-checkout:

echo some/dir/ >> .git/info/sparse-checkout
echo another/sub/tree >> .git/info/sparse-checkout

Checkout from the remote:

git pull

If repo was already checked out before sparse checkout was done, do:

git read-tree -mu HEAD
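
For anyone who wants to try the steps above without touching a real server, here is a minimal sketch that runs them against a throwaway local repository. The directory layout (some/dir, other), the file names, and the demo identity are made up for illustration; only the git commands in the second half come from the steps above.

```shell
#!/bin/sh
set -eu

# Throwaway "remote" with one wanted and one unwanted directory (hypothetical layout).
work=$(mktemp -d)
git init -q "$work/upstream"
mkdir -p "$work/upstream/some/dir" "$work/upstream/other"
echo wanted > "$work/upstream/some/dir/a.txt"
echo unwanted > "$work/upstream/other/b.txt"
git -C "$work/upstream" add .
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid commit -q -m init
branch=$(git -C "$work/upstream" symbolic-ref --short HEAD)

# The steps from the post: init, add the remote with -f (fetch), enable
# sparse checkout, list the wanted sub-tree, then pull.
mkdir "$work/sparse"
cd "$work/sparse"
git init -q
git remote add -f origin "$work/upstream"
git config core.sparsecheckout true
mkdir -p .git/info            # normally created by git init; kept for safety
echo "some/dir/" >> .git/info/sparse-checkout
git pull -q origin "$branch"

# Only some/dir should be present in the working tree.
ls -R .
```

Note that the fetch still transfers the objects for the whole repository; the sparse-checkout file only limits what lands in the working tree.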

The way I’m writing my playbook is this:

tasks:

  - name: getting my git repo
    git: repo=gitolite@my.git dest=/bla remote=origin version=production

  - name: enable sparse-checkout
    shell: git config core.sparsecheckout true chdir=/bla creates=/bla/.git/info/sparse-checkout
    notify: sparse-checkout

  - name: reset folder to only contain sparse structure if it was previously created
    shell: git read-tree -mu HEAD chdir=/bla

handlers:

  - name: sparse-checkout
    copy: src=files/sparse-checkout dest=/bla/.git/info/sparse-checkout

What I don’t like about this is that I have to check out my git repo beforehand if I want to use the Ansible git module; otherwise I need custom commands to init the repo myself. Or am I missing something?

If anyone else is doing this, please show me if it can be done better.

This is not something the module currently has implemented as a feature.

In fact, this is the first I’m aware of git having this feature.

I’m open to pull requests that attempt to add a nice syntax for doing this to the git module, but I’m not sure there would be a nice one, and generally suspect this would be infrequently used.

FWIW, I’ve only ever had issues with sparse checkout. For some reason they just stop working sometimes. YMMV, but unless you really need a sparse checkout, I would avoid it.

We are moving away from it, but for now it is needed…

I’m interested in learning more about why someone would need it to decide whether to entertain the pull request or suggest other workarounds.

Let me know if you can!

I could possibly see it if you have a massive repo and only need one or two files and are short on space. Generally a shallow checkout is enough here.
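
To make the shallow-checkout alternative concrete, here is a small sketch using git clone --depth 1 against a throwaway local repository. The three-commit history and the paths are invented for the demo; the file:// URL is used because git ignores --depth for plain local-path clones.

```shell
#!/bin/sh
set -eu

# Throwaway upstream with three commits (hypothetical history).
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
for n in 1 2 3; do
  echo "revision $n" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "revision $n"
done

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

full=$(git -C "$work/upstream" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-list --count HEAD)
echo "upstream commits: $full, shallow clone commits: $shallow"
```

A shallow clone trims history, not directories, so it helps with a long history but not with a wide tree.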

Sparse checkouts are a feature that was somewhat hacked into git.

Purely because of low-bandwidth machines, where we are trying not to download too much extra bloat.

Although I do download it completely the first time, after sparse checkout is on, it only downloads the content that is specified.

We have a huge “local” www templating system, with different skins/themes, that are not needed on all sites.

We are luckily moving away from storing it in git soon, as it is not really ideal for this.

Another use case for sparse checkout is for website deployment. The development repository has the site code plus Vagrant file and development configuration files, deployment scripts, etc. so the environment can be shared or recreated easily. For deployment, you only want to pull the site code (a directory) from the project repo to the server.

Submodules don’t solve this well because the site code is changing during development requiring double commits (in the site-only repo and in the whole-project repo).

Subtrees require a separate repo for the site code and duplicate copies of code (in the site-only repo and the whole-project repo). Also, subtrees don’t move with the repo, so you have to reestablish them when you pull the project repo–and know to do it. And that introduces an opportunity for code to get out of sync between the two repos.

Sparse checkout keeps all the project resources in one repo (whole-project) and only pulls the site code to the staging and production servers for deploy.

(If anyone has better solutions for this scenario, I’m all ears.)

I believe I heard feedback recently (I didn’t dig in) that sparse checkouts were a bit of a hack, but someone was quite interested in bare repos, which are checkouts without the .git directory (and we have a pull request for that).

That being said, I’m probably open to both, but if the git documentation lists any caveats about when they might explode, we should link to them.

FWIW, sparse checkout is documented at the bottom of http://git-scm.com/docs/git-read-tree.

The only issue documented seems to be if you decide you want the whole repo at a later date, you have to force it.
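
As I read that documentation, getting the whole tree back means putting a catch-all pattern in the sparse-checkout file and re-running read-tree. A rough sketch on a throwaway repository follows; the directory names and demo identity are invented, and this is my reading of the docs rather than a tested production recipe.

```shell
#!/bin/sh
set -eu

# Throwaway repository with two directories (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p some/dir other
echo wanted > some/dir/a.txt
echo unwanted > other/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m init

# Narrow the existing checkout to some/dir only.
git config core.sparsecheckout true
mkdir -p .git/info
echo "some/dir/" > .git/info/sparse-checkout
git read-tree -mu HEAD
if [ -e other/b.txt ]; then narrowed=no; else narrowed=yes; fi

# Later: bring everything back by matching all paths, then switch the feature off.
echo "/*" > .git/info/sparse-checkout
git read-tree -mu HEAD
git config core.sparsecheckout false

echo "narrowed: $narrowed"
ls -R .
```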

Sparse checkout was originally developed to provide SVN-like functionality (externals or shallow checkouts depending on who you read) that some people found useful. http://vmiklos.hu/blog/sparse-checkout-example-in-git-1-7

Some people may consider it a hack because it requires a repo before you can set it up, but if you clone the repo, your checkout isn’t sparse. So instead of simply checking out the remote repo with a parameter that lists the parts you want, sparse checkout requires you to git init the repo, set up sparse checkout, repoint the origin, and then pull the code.

After a bit more research, sparse checkout in one line:

git clone --template=path/to/template-directory --config core.sparsecheckout=true <repository>

where path/to/template-directory/info/sparse-checkout becomes .git/info/sparse-checkout.
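
Here is a sketch of that one-liner against a throwaway local repository, in case anyone wants to check the behaviour on their own git version. The upstream layout, the template location, and the "site" destination name are all made up; the only assumption taken from the post is that the template's info/sparse-checkout ends up as .git/info/sparse-checkout.

```shell
#!/bin/sh
set -eu

# Throwaway upstream with one wanted and one unwanted directory (hypothetical layout).
work=$(mktemp -d)
git init -q "$work/upstream"
mkdir -p "$work/upstream/some/dir" "$work/upstream/other"
echo wanted > "$work/upstream/some/dir/a.txt"
echo unwanted > "$work/upstream/other/b.txt"
git -C "$work/upstream" add .
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid commit -q -m init

# Template directory: its contents are copied into the new clone's .git directory.
mkdir -p "$work/template/info"
echo "some/dir/" > "$work/template/info/sparse-checkout"

# Clone with the template and the config switch in one command.
git clone -q --template="$work/template" --config core.sparsecheckout=true \
  "$work/upstream" "$work/site"

ls -R "$work/site"
```

Because --template replaces the default template, the clone will not get the usual sample hooks or info/exclude unless the template directory also provides them.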

make it so!

I have working code for this including documentation updates in the header and a playbook I used to test it (rsync template-directory to remote, clone github repo with template and config to do sparse checkout). Reading the “contributing” guide, I’m not sure what “make tests” does besides run 0 tests–probably because I don’t know how to set up tests in nose. I’m not a Python programmer, just know enough to mimic similar code that’s in the git module already and make it work.

Should I submit a pull or is there a guide for setting up the tests?

Thanks.

“I’m not sure what “make tests” does besides run 0 tests–probably because I don’t know how to set up tests in nose.”

It runs quite a bit more than 0.

You have to install nosetests

“yum search nose” # etc

However, the focus of the unit tests is not module coverage – that’s more for an integration testing effort.

Debian here. I installed nose and it ran, but it reported no tests found.

Pull request below includes link to test playbook.

https://github.com/ansible/ansible/pull/4923