25

According to this thread, exclusion in Git's sparse-checkout feature is supposed to be implemented. Is it?

Assume that I have the following structure:

papers/
papers/...
presentations/
presentations/heavy_presentation
presentations/...

Now I want to exclude presentations/heavy_presentation from the checkout, while leaving the rest in the checkout. I haven't managed to get this running. What's the right syntax for this?

6 Answers

10

Sadly, none of the above worked for me, so I spent a very long time trying different combinations in the sparse-checkout file.

In my case I wanted to skip folders with IntelliJ IDEA configs.

Here is what I did:


Run git clone https://github.com/myaccount/myrepo.git --no-checkout

Run git config core.sparsecheckout true

Create .git\info\sparse-checkout with the following content:

!.idea/*
!.idea_modules/*
/*

Run 'git checkout --' to get all files.


The critical thing to make it work was adding /* after the folder's name.

I have git 1.9
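
The steps above can be tried safely in a throwaway repository. This is only a sketch: the scratch directory and the files in it (.idea/workspace.xml, src/main.c, README.md) are made up for illustration, and a local `git init` stands in for the clone. It relies only on the core.sparseCheckout setting and `git read-tree -mu HEAD`, so it should not depend on the newer `git sparse-checkout` command.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
mkdir -p .idea src
echo '<project/>' > .idea/workspace.xml
echo 'int main(void) { return 0; }' > src/main.c
echo 'demo' > README.md
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial import'

# Enable sparse checkout and write the three patterns from the answer.
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '!.idea/*' '!.idea_modules/*' '/*' > .git/info/sparse-checkout

# Re-apply HEAD to the working tree under the new patterns.
git read-tree -mu HEAD

# List what is still checked out.
ls -A
```

If the patterns behave as described, .idea/workspace.xml is removed from the working tree while src/ and README.md stay in place.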


1 Comment

Denis P’s answer can now do this in a single command.
9

With Git 2.25 (Q1 2020), Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command.

Git 2.37 (Q2 2022) makes the cone mode the default. See the last section of this answer.


First, here is an extended example, starting with a fast clone using a --filter option:

git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD

Using the cone option (detailed/documented below) means your .git\info\sparse-checkout will include patterns starting with:

/*
!/*/

Meaning: only top-level files, no subfolders.
If you do not want the top-level files, you need to avoid the cone mode:

# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false

# remove .git\info\sparse-checkout
git sparse-checkout disable

# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/

# populate working-tree with only the right files:
git read-tree -mu HEAD

I'm just trying to figure out what to pass to git sparse-checkout set to exclude something.

That is trickier.

A workaround might involve including everything else explicitly.

This should result in the presentations directory being included in your sparse checkout, but without the heavy_presentation subdirectory

That would be:

# Initialize the sparse-checkout feature in non-cone mode (exclusion patterns are not cone patterns)
git sparse-checkout init --no-cone

# Set the directories you want to include and exclude (quoted, so the shell does not expand them)
git sparse-checkout set 'presentations/*'
git sparse-checkout add '!presentations/heavy_presentation'

In detail:

(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)

So not only does excluding a subfolder work, but sparse checkouts also get faster with the "cone" mode (with Git 2.25).

See commits Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019:

sparse-checkout: add 'cone' mode

Signed-off-by: Derrick Stolee

The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.

The config man page includes:

`core.sparseCheckoutCone`:

Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.

The git sparse-checkout man page details:

CONE PATTERN SET

The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.

The accepted patterns in the cone pattern set are:

  1. Recursive: All paths inside a directory are included.
  2. Parent: All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.

By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:

/*
!/*/

This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now

/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/

Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.

If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.

So:

sparse-checkout: init and set in cone mode

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.
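
To see those automatically added parent matches, here is a small sketch in a scratch repository. The directory layout (A/B/C, A/other, top.txt) is invented to mirror the A/B/C example quoted above, and it assumes a Git that has the `git sparse-checkout` command (2.25 or later).

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
mkdir -p A/B/C A/other
echo root  > top.txt
echo a     > A/a.txt
echo b     > A/B/b.txt
echo c     > A/B/C/c.txt
echo other > A/other/o.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial import'

# Turn on cone mode, then ask only for the recursive folder A/B/C.
git sparse-checkout init --cone
git sparse-checkout set A/B/C

# Inspect the patterns Git generated for the leading directories.
cat .git/info/sparse-checkout
```

The printed file should resemble the A/B/C listing quoted earlier: files directly inside A and A/B stay checked out as parent matches, while the sibling A/other is dropped.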


Note, the --cone option is only documented in Git 2.26 (Q1 2020)
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)

doc: sparse-checkout: mention --cone option

Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee

In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.

Document it in git sparse-checkout:

That includes:

When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.

("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)


How much faster is this new "cone" mode?

Git 2.25 (Q1 2020) includes:

sparse-checkout: use hashmaps for cone patterns

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

That change swaps out regex-based matching for simple hashset lookups of prefix patterns in cone mode, getting performance on par with full checkouts even when using thousands of include/exclude rules.

For the OP's question: an exclusion such as !presentations/heavy_presentation/ is not a cone pattern and requires non-cone mode, because cone mode only recognizes these two pattern shapes (a recursive directory match and its parent guard):

/$folder/
!/$folder/*/

And:

sparse-checkout: respect core.ignoreCase in cone mode

Signed-off-by: Derrick Stolee

Adds case-insensitive matching when core.ignoreCase is set, by swapping to a case-folding hash and comparison in cone mode.

For the OP's question: If your presentations directory or the heavy_presentation name varies in case, the exclusion pattern will still match correctly under cone mode.

And (still Git 2.25):

sparse-checkout: list directories in cone mode

Signed-off-by: Derrick Stolee

Changes git sparse-checkout list to emit only the root directories used in cone mode rather than verbose include/exclude pairs.

For the OP's question: Reminds you that git sparse-checkout list will not show your !presentations/heavy_presentation exclusion directly, since it collapses down to directory names in cone mode.


unpack-trees: correctly compute result count

Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

Fixes a pointer-arithmetic bug in cone-mode skipping logic to accurately count how many index entries to bypass.

For the OP's question: A cone mode skips whole directories when included or excluded: so excluding a single file like heavy_presentation inside presentations/ still processes the entire directory at once.


With Git 2.26 (Q1 2020):

sparse-checkout: fix cone mode behavior mismatch

Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee

Corrects a bug where file paths fed into set were treated as recursive directory matches, causing some files to be omitted.

For the OP's question: Trying to exclude presentations/heavy_presentation as a file in cone mode without the proper pattern may not work as expected, because cone mode treats inputs as directories.

And:

sparse-checkout: create 'add' subcommand

Signed-off-by: Derrick Stolee

Introduces git sparse-checkout add for incrementally growing the sparse set, merging new directory inputs into the existing cone.

For the OP's question: While you can add new directories to include, there is no parallel remove or exclusion command in cone mode: you still need raw patterns for exclusions like !presentations/heavy_presentation.

And:

sparse-checkout: work with Windows paths

Signed-off-by: Derrick Stolee

Normalizes backslashes to slashes so Windows users can set A\B\C and have it map to Unix-style A/B/C.

For the OP's question: You do have path normalization before matching: so on Windows you would write your exclusion as !presentations/heavy_presentation/ (with slashes) even if you typed backslashes.


The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.

With Git 2.27 (Q2 2020), this limitation has been lifted.

sparse-checkout: stop blocking empty workdirs

Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee

Allows the sparse-checkout patterns to result in an entirely empty working tree without error.

For the OP's question: You now can exclude everything under presentations/ (including heavy_presentation) if desired: cone mode no longer forbids an empty result, though excluding single files still needs the correct pattern syntax.


With Git 2.37 (Q2 2022):

sparse-checkout: make --cone the default

Signed-off-by: Elijah Newren

Switches cone mode on by default, favoring directory-based sparse inputs over free-form patterns.

For the OP's question: Since exclusions (!path) require pattern syntax that cone mode disables by default, you must turn cone mode off (git sparse-checkout init --no-cone) to use !presentations/heavy_presentation.

And:

git-sparse-checkout.txt: wording updates for the cone mode default

Signed-off-by: Elijah Newren

Reframes documentation to treat inputs as directories unless core.sparseCheckoutCone is false.

For the OP's question: Confirms that to exclude individual subpaths like presentations/heavy_presentation, you need non-cone mode's pattern support rather than the default directory-only cone mode.

6 Comments

you paste the whole doc in there as if this whole thing was needed to answer the question....
Remove your copy/paste of the docs, and link to it instead. That's an unnecessary bifurcation of information. We're in the business of organizing information, not forking it.
I dedicated a good portion of my life to reading this answer twice and I swear it ignores the question and just pastes every part of the Git changelog that mentions sparse-checkout. I'm just trying to figure out what to pass to git sparse-checkout set to exclude something. I swear it doesn't support it and you need to edit the info file manually
@MichaelMrozek Good question. I have edited the answer to include the known workaround.
Another deny-of-service "answer".
9

With Git 2.37 (released in June 2022) it is much easier. To exclude one folder and a few files matching a mask (to give a more general example than the question asks for), I did this:

git sparse-checkout set --no-cone '/*' '!/folder/' '!/path/to/dist/*.map'

This worked quite intuitively (well, after a few hours spent to find this formula). The folder completely disappears, all the *.map files from path/to/dist folder, too. Nothing else was touched.
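
Applied to the layout from the question, the same command can be sketched as follows. The scratch repository and its file names (paper.txt, slides.bin, slides.txt, the light subfolder) are invented for illustration. The fallback branch is an assumption for a Git that rejects `--no-cone` on `set`: it writes the same two patterns by hand, the way the older answers do.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
mkdir -p papers presentations/heavy_presentation presentations/light
echo paper > papers/paper.txt
echo heavy > presentations/heavy_presentation/slides.bin
echo light > presentations/light/slides.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial import'

# Single quotes keep the shell from expanding /* or interpreting !.
if ! git sparse-checkout set --no-cone '/*' '!/presentations/heavy_presentation/'; then
    # Assumed fallback: same patterns written directly to the info file.
    git config core.sparseCheckout true
    mkdir -p .git/info
    printf '%s\n' '/*' '!/presentations/heavy_presentation/' > .git/info/sparse-checkout
    git read-tree -mu HEAD
fi

# Show the files that remain in the working tree.
find . -path ./.git -prune -o -type f -print | sort
```

Only presentations/heavy_presentation should disappear; papers/ and the rest of presentations/ stay checked out.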

A few important bits:

  1. I strongly suggest backing up your local repo before starting if it has any unstaged/ignored files. My first try (without '/*' etc.) was scary, as if most of my data had disappeared. #5 below seemed to help restore everything, but you never know for sure with a big repo...

  2. '/*' was the magic piece. It asks Git to include everything not excluded later on. Without it, the command removes lots of repo contents. It must come first in the list!

  3. If using double quotes instead of single quotes, you need set +H for the command to get through (inside double quotes, interactive bash treats ! as the history-expansion character). Run set -H afterwards to restore the default bash behaviour.

  4. I recommend checking how Git interpreted the paths you used by typing:

    cat .git/info/sparse-checkout

    Before finding the "formula" for my case I was surprised with the results quite a few times (e.g. see #6).

  5. Do ls for a few repo paths after running the command. If things go wrong, then git sparse-checkout disable should restore all the missing files. At least this worked very well in my case.

  6. Better use quotes for all your paths. Especially important in '/*'! Here is what I got in .git/info/sparse-checkout when I used it without quotes (each from new line, for some reason stackoverflow doesn't format that well):

    /bin /dev /etc /home /lib /lib64 /opt /proc /root /run /sbin /tmp /usr /var !folder/ !path/to/dist/*.map

    You can imagine that these patterns weren't what I wanted to say...

  7. Mind leading slashes everywhere ('!/folder/'). If omitted ('!folder/') then folders with such a name will be deleted everywhere in the hierarchy, not just on the top level.

  8. --no-cone is now important. This was the default mode in the past, and this may introduce lots of confusion when looking at older advice over the internet! GIT docs elaborate on that if you want to understand things better.
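
Point 7 is easy to check in a scratch repository. The sketch below uses an invented layout with a top-level folder/ and a nested sub/folder/, and writes the patterns straight into .git/info/sparse-checkout (the file from point 4) so that it does not depend on a particular `git sparse-checkout` version.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
mkdir -p folder sub/folder
echo top    > folder/a.txt
echo nested > sub/folder/b.txt
echo keep   > sub/c.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial import'
git config core.sparseCheckout true
mkdir -p .git/info

# Without the leading slash: every directory named "folder" is excluded.
printf '%s\n' '/*' '!folder/' > .git/info/sparse-checkout
git read-tree -mu HEAD
unanchored=$(find . -path ./.git -prune -o -type f -print | sort)

# With the leading slash: only the top-level "folder" is excluded.
printf '%s\n' '/*' '!/folder/' > .git/info/sparse-checkout
git read-tree -mu HEAD
anchored=$(find . -path ./.git -prune -o -type f -print | sort)

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The nested sub/folder/b.txt should be missing after the first pattern set and come back after the second.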

Hope this helps someone.


Update: Added leading slashes to the excluded paths, explained in #7 above.

7 Comments

Good feedback, that seems easier than the git sparse-checkout command as I initially reported back in 2019.
Exclamation marks are allowed with the single quote ', e.g., '!/folder/', even without set +H.
What pattern would you use to ignore files in top level of repository using --no-cone mode?
Try "!/file.txt". I think it should work.
I also spent an hour reading up on this, but one question is still unanswered: Will sparse checkout only remove unwanted files/folders from checkout, or also the objects in the .git folder?
8

I would have expected something like the below to work:

/*
!presentations/heavy_presentation

But it doesn't. And I did try many other combinations. I think the exclude is not implemented properly and there are bugs around it (still)

Something like:

presentations/*
!presentations/heavy_presentation

does work though and you will get the presentations folder without the heavy_presentation folder.

So the workaround would be to include everything else explicitly.
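
A sketch of that workaround for the layout in the question, in a scratch repository. The file names (README.md, paper.txt, slides.bin, slides.txt, the light subfolder) are invented for illustration. Every wanted top-level directory gets its own include line, and the exclusion comes last.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
mkdir -p papers presentations/heavy_presentation presentations/light
echo readme > README.md
echo paper  > papers/paper.txt
echo heavy  > presentations/heavy_presentation/slides.bin
echo light  > presentations/light/slides.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial import'

git config core.sparseCheckout true
mkdir -p .git/info
# Explicit includes first, then the exclusion.
printf '%s\n' 'papers/*' 'presentations/*' '!presentations/heavy_presentation' \
    > .git/info/sparse-checkout
git read-tree -mu HEAD

# Show the files that remain in the working tree.
find . -path ./.git -prune -o -type f -print | sort
```

One consequence of listing includes explicitly is that anything not listed, such as the top-level README.md in this sketch, is left out of the working tree as well.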

2 Comments

Thanks, confirmed. I have edited your post to add another example that was not working.
Your first solution worked for me in git 2.21.0 on windows
4

I had the same problem. I fixed it with something like:

!presentations/heavy_presentation
presentations/*

How I understand it to work: Git reads the file rule by rule. If something is included, it includes all paths that contain that word, and it doesn't change its status anymore until the end of the sparse checkout. If you add the exclude rule before the include rule, in my opinion it will delete the files first and then mark all as included.

I am not completely sure, this is what I have supposed based on my experience and has been working for me. I hope it will help someone.

0

Short Answer:

git sparse-checkout set /* !/presentations/heavy_presentation/
git sparse-checkout init [--cone]

--cone option: not relevant for only a few patterns or a small repo, but it speeds things up in general. It requires a certain canonical order of the patterns, as explained in the sparse-checkout documentation (CONE PATTERN SET section). It can also be introduced later with:

git config core.sparseCheckoutCone true

1 Comment

This won't work currently (I'm on Git 2.37 now), at least without quotes. /* is now expanded into /bin /dev /etc /home and so on. Also, the init command is now deprecated. I added a more detailed answer with what worked for me currently: stackoverflow.com/a/75264657/11575732
