50

For example, I want to download PCL 3d_rec_framework.

This is the git repository of PCL: https://github.com/PointCloudLibrary/pcl.git

How can I download this directory?

https://github.com/PointCloudLibrary/pcl/tree/master/apps

I tried this, but it didn't work:

sam@sam:~/code/pcl_standalone$ git clone https://github.com/PointCloudLibrary/pcl/tree/master/apps/3d_rec_framework
Cloning into '3d_rec_framework'...
error: The requested URL returned error: 403 while accessing https://github.com/PointCloudLibrary/pcl/tree/master/apps/3d_rec_framework/info/refs
fatal: HTTP request failed
sam@sam:~/code/pcl_standalone$ 

I don't want to clone the whole PCL repository and then remove all the directories that I don't need.

How do I download just a single directory?

Zanna
  • 70,465
sam
  • 6,831

9 Answers

67

dobey's answer has not been accurate since git v1.7: you can now check out only certain folders from a repository. The full instructions are found here.

git init <repo>
cd <repo>
git remote add -f origin <url>

git config core.sparseCheckout true

echo "some/dir/" >> .git/info/sparse-checkout
echo "another/sub/tree" >> .git/info/sparse-checkout

This tells git which directories you want to check out. Then you can pull just those directories:

git pull origin master
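For anyone who wants to try this without touching a real remote, here is a minimal sketch of the same steps against a throwaway local repository. The directory names (some/dir, other), the demo identity and the temporary paths are assumptions made up for illustration, not part of the original instructions.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical source repository with one wanted and one unwanted directory.
git init -q "$work/src"
cd "$work/src"
mkdir -p some/dir other
echo wanted > some/dir/a.txt
echo unwanted > other/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
branch=$(git symbolic-ref --short HEAD)

# The sparse checkout itself, following the steps above.
git init -q "$work/dst"
cd "$work/dst"
git remote add -f origin "$work/src"
git config core.sparseCheckout true
mkdir -p .git/info
echo "some/dir/" >> .git/info/sparse-checkout
git pull -q origin "$branch"

# List the files that ended up in the working tree.
find . -path ./.git -prune -o -type f -print
```

Only the file under some/dir should be listed; the other directory stays in the repository data but is not written to the working tree.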
skukx
  • 779
  • 5
  • 2
  • 3
    This implies all Ubuntu versions have 1.7 available. You should check that to be the case and comment on your answer here as to which individual versions will actually work. PowerShell is also not Ubuntu and therefore should not be included, in my opinion. – Thomas Ward Jul 06 '15 at 22:07
  • 2
    @ThomasW. All currently supported versions of Ubuntu do include at least git 1.7, and most are 2.x now. – dobey Feb 04 '16 at 23:38
  • 6
    Still this will clone the whole repository and then do that sparse checkout. – Clerenz Jun 07 '18 at 12:46
  • 2
    @dobey, Seriously you removed useful information that people finding this question with Google might very much be looking for?! If I was forced to use powershell I would definitely like to see the pipe details, they are not obvious!
    echo "some/dir/" | Out-File -Encoding ascii .git/info/sparse-checkout
    echo "another/sub/tree/" | Out-File -Append -Encoding ascii .git/info/sparse-checkout
    – Samuel Åslund Dec 19 '18 at 07:53
  • I am so glad that it worked, but how do I include .gitignore file? I tried echo '.gitignore' >> .git/info/sparse-checkout and echo './.gitignore' >> .git/info/sparse-checkout, neither worked. Thanks! – zyy Feb 19 '20 at 04:27
21

git clone --filter + git sparse-checkout downloads only the required files

E.g., to clone only files in subdirectory small/ in this test repository: https://github.com/cirosantilli/test-git-partial-clone-big-small-no-bigtree

git clone -n --depth=1 --filter=tree:0 \
  https://github.com/cirosantilli/test-git-partial-clone-big-small-no-bigtree
cd test-git-partial-clone-big-small-no-bigtree
git sparse-checkout set --no-cone small
git checkout

You could also select multiple directories for download with:

git sparse-checkout set --no-cone small small2

This method doesn't work for individual files however, but here is another method that does: https://stackoverflow.com/questions/2466735/how-to-sparsely-checkout-only-one-single-file-from-a-git-repository/52270527#52270527

In this test, clone is basically instantaneous, and we can confirm that the cloned repository is very small as desired:

du --apparent-size -hs * .* | sort -hs

giving:

2.0K    small
226K    .git

That test repository contains:

  • a big/ subdirectory with 10x 10MB files
  • 10x 10MB files 0, 1, ... 9 on toplevel (this is because certain previous attempts would download toplevel files)
  • small/ and small2/ subdirectories, each with 1000 files of size one byte

All contents are pseudo-random and therefore incompressible, so we can easily notice if any of the big files were downloaded, e.g. with ncdu.

So if you download anything you didn't want, you would get 100 MB extra, and it would be very noticeable.

In the example above, git clone downloads a single object, presumably the commit:

Cloning into 'test-git-partial-clone-big-small'...
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), done.

and then the final checkout downloads the files we requested:

remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), 10.19 KiB | 2.04 MiB/s, done.
remote: Enumerating objects: 253, done.
remote: Counting objects: 100% (253/253), done.
Receiving objects: 100% (253/253), 2.50 KiB | 2.50 MiB/s, done.
remote: Total 253 (delta 0), reused 253 (delta 0), pack-reused 0
Your branch is up to date with 'origin/master'.

Tested on git 2.37.2, Ubuntu 22.10, on January 2023.
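If you want to experiment with this offline, the following is a rough sketch that builds a tiny local stand-in for the test repository and clones it over file://. Everything about the stand-in (file names, the demo identity, the uploadpack settings that let a local repository answer --filter requests) is an assumption of this sketch. The sparse pattern is written to .git/info/sparse-checkout by hand, which is roughly what git sparse-checkout set --no-cone small does, so the sketch does not depend on that flag being available.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical stand-in for the remote: small/, big/ and a top-level file.
git init -q "$work/src"
cd "$work/src"
mkdir small big
echo s > small/s.txt
echo b > big/b.txt
echo t > top.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
# Let this local "server" honour --filter and on-demand object requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Partial clone over file:// (with a plain path git would not apply the filter).
git clone -q -n --depth=1 --filter=tree:0 "file://$work/src" "$work/dst"
cd "$work/dst"

# Roughly what `git sparse-checkout set --no-cone small` would configure.
git config core.sparseCheckout true
mkdir -p .git/info
echo "small" > .git/info/sparse-checkout
git checkout -q

# List the files that ended up in the working tree.
find . -path ./.git -prune -o -type f -print
```

Only the file under small/ should appear; big/ and the top-level file are left out of the working tree.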

TODO also prevent download of unneeded tree objects

The above method downloads all Git tree objects (i.e. directory listings, but not actual file contents). We can confirm that by running:

git ls-files

and seeing that it contains the directories large files such as:

big/0

In most projects this won't be an issue, but the perfectionist in me would like to avoid them.

I've also created a very extreme repository with some very large tree objects (100 MB) under the directory big_tree: https://github.com/cirosantilli/test-git-partial-clone-big-small

Let me know if anyone finds a way to clone just the small/ directory from it!

About the commands

The --filter option was added together with an update to the remote protocol, and it truly prevents objects from being downloaded from the server.

Unfortunately, the sparse-checkout part is also needed. You can also download only certain files with the much more understandable:

git clone --depth 1  --filter=blob:none  --no-checkout \
  https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git checkout master -- d1

but that method for some reason downloads files one by one very slowly, making it unusable unless you have very few files in the directory.

Another less verbose but failed attempt was:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small

but that downloads all files in the toplevel directory: https://stackoverflow.com/questions/75311408/how-to-prevent-git-clone-filter-blobnone-sparse-from-downloading-files-on-t

The dream: any directory can have web interface metadata

This feature could revolutionize Git.

Imagine having all the code base of your enterprise in a single monorepo without ugly third-party tools like repo.

Imagine storing huge blobs directly in the repo without any ugly third party extensions.

Imagine if GitHub would allow per file / directory metadata like stars and permissions, so you can store all your personal stuff under a single repo.

Imagine if submodules were treated exactly like regular directories: just request a tree SHA, and a DNS-like mechanism resolves your request, first looking in your local ~/.git, then on closer servers (your enterprise's mirror / cache) and ending up on GitHub.

I have a dream.

The test cone monorepo philosophy

This is a possible philosophy for monorepo maintenance without submodules.

We want to avoid submodules because it is annoying to have to commit to two separate repositories every time you make a change that has a submodule and non-submodule component.

Every directory with a Makefile or analogous should build and test itself.

Such directories can depend on either:

  • every file and subdirectory directly under them, at their latest versions
  • external directories, but only at specified versions

Until git starts supporting this natively (i.e. submodules that can track only subdirectories), we can support this with some metadata in a git tracked file:

monorepo.json

{
    "path": "some/useful/lib",
    "sha": "12341234123412341234"
}

where sha refers to the usual SHA of the entire repository. Then we need scripts that will checkout such directories e.g. under a gitignored monorepo folder:

monorepo/some/useful/lib
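A minimal sketch of what such a script could look like, assuming the SHA is stored as a quoted string and that the pinned directory lives in the same repository. The repository layout, the lib.txt file and the sed-based field extraction are all hypothetical; a real script would use a proper JSON parser.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Hypothetical library directory, committed so that there is a SHA to pin.
mkdir -p some/useful/lib
echo "v1" > some/useful/lib/lib.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "lib v1"
sha=$(git rev-parse HEAD)

# A later change that the pinned copy must not see.
echo "v2" > some/useful/lib/lib.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "lib v2"

# Metadata file in the format described above.
printf '{\n    "path": "some/useful/lib",\n    "sha": "%s"\n}\n' "$sha" > monorepo.json

# Naive field extraction, good enough for this two-field file only.
path=$(sed -n 's/.*"path": *"\([^"]*\)".*/\1/p' monorepo.json)
pinned=$(sed -n 's/.*"sha": *"\([^"]*\)".*/\1/p' monorepo.json)

# Export the directory as it was at the pinned commit into monorepo/.
mkdir -p monorepo
git archive "$pinned" "$path" | tar -xf - -C monorepo

cat "monorepo/$path/lib.txt"
```

The copy under monorepo/ holds the v1 content even though the live directory has moved on to v2, which is the "external directories only at specified versions" rule in practice.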

Whenever you change a file, you have to go up the tree and test all directories that have Makefile. This is because directories can depend on subdirectories at their latest versions, so you could always break something above you.


  • Awesome answer. Appreciate the sample repo and explanations. Follow-up question: how do you then checkout a particular branch? I tried git checkout <branch_name> but it fails with "error: pathspec '<branch_name>' did not match any file(s) known to git" – Saca Aug 30 '23 at 06:42
  • 1
    @Saca I've never tried it, but did you try to play with git clone --branch? – Ciro Santilli OurBigBook.com Aug 30 '23 at 06:48
  • 1
    That works. Same git clone command you had but just add --branch to it. Nice. Thanks again! – Saca Aug 30 '23 at 06:53
13

First, do:

git clone --depth 1 [repo root] [name of destination directory]

Then:

cd [name of destination directory]

...And lastly:

git filter-branch --prune-empty --subdirectory-filter [path to sub-dir] HEAD

It's that easy. Git will rewrite the repo so that only the desired sub-dir is included. This works even if the sub-dir is several layers deep. Name the destination directory after the sub-dir, then give the relative path to the sub-dir in the git filter-branch command. The --depth 1 option tells git to download only the tip of the branch (essentially dropping the history).
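Here is a small sketch of those three steps run against a throwaway local repository, so the result can be inspected safely. The directory names (apps/demo, docs), the demo identity and the temporary paths are made up for illustration. The FILTER_BRANCH_SQUELCH_WARNING variable is only there because recent git versions print a warning and pause before filter-branch starts.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical repository root containing the sub-directory we want.
git init -q "$work/src"
cd "$work/src"
mkdir -p apps/demo docs
echo code > apps/demo/main.txt
echo text > docs/readme.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# --depth needs a URL here; with a plain local path git ignores it.
git clone -q --depth 1 "file://$work/src" "$work/demo"
cd "$work/demo"

# Skip the warning and pause that recent git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --prune-empty --subdirectory-filter apps/demo HEAD

# The sub-directory's contents are now the root of the working tree.
ls
```

Note that the whole tip commit is still downloaded by the clone; the rewrite only changes what remains in the local repository afterwards.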

  • This allows you to download a single sub directory, but the question pertains to multiple directories.. is that possible this way? I have to say I don't see how this works, looking at the documentation. – Joeppie Jan 11 '18 at 10:25
  • 2
    Is there an easy way to refresh that directory from time to time? – Clerenz Jun 07 '18 at 13:11
7

You cannot. With git, you clone the entire repository, and the full history of the repository.

There are some workaround solutions to be able to get a single file out of a git archive, listed on a Stack Exchange answer for the same question, but you will still have to download the entire repository to get that single file or directory you want.

dobey
  • 40,982
4

Concise, modern (2020+) answer

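Yes, it can be done with a reasonably recent git: partial clone (--filter) has been available since about git 2.19, and the git sparse-checkout command used below was added in git 2.25 (January 2020).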

Sparse clone:

git clone --no-checkout --depth 1 --sparse --filter=blob:none \
    ssh://git@git.domain.tld:7999/$ORG/$REPO.git
cd $REPO

git config ... # as needed

Sparse checkout:

git sparse-checkout init --cone
git sparse-checkout add relevant/dir/  # the trailing / is said to be important
cat .git/info/sparse-checkout          # to verify

git checkout $BRANCH # should take only a moment

git status
On branch $BRANCH
Your branch is up to date with 'origin/$BRANCH'.

You are in a sparse checkout with '2%' of tracked files present.

nothing to commit, working tree clean
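To see what cone mode actually leaves in the working tree, here is a minimal local sketch. It is an assumption-laden simplification: it clones a throwaway local repository, so --depth, --sparse and --filter are left out, and it uses git sparse-checkout set rather than add. The file and directory names and the demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical repository: the wanted directory, a sibling, a top-level file.
git init -q "$work/src"
cd "$work/src"
mkdir -p relevant/dir other
echo keep > relevant/dir/keep.txt
echo skip > other/skip.txt
echo top > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
branch=$(git symbolic-ref --short HEAD)

# Local clone without populating the working tree.
git clone -q --no-checkout "$work/src" "$work/dst"
cd "$work/dst"

git sparse-checkout init --cone
git sparse-checkout set relevant/dir
git checkout -q "$branch"

# Cone mode keeps top-level files plus the selected directory.
find . -path ./.git -prune -o -type f -print
```

The listing should contain README and relevant/dir/keep.txt but nothing from other/, which shows why cone mode always brings the top-level files along.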

  • Good answer, works. The important part is to use --no-checkout in clone, as described. This means no files are downloaded. After that you can use git sparse-checkout to configure what should be downloaded (checked out). To exclude only a single directory, first include all /* and then subtract /excluded. Note that there are two modes: cone (new) and pattern (old). Cone mode does not support wildcards like *. – FireEmerald Mar 26 '24 at 19:16
1

For GitHub repos, you can clone any sub-directories of any GitHub repository (at any reference) using https://github.com/HR/github-clone

1

If the url of the repository is this

https://github.com/blah/blah2.git

and from there you want the folder images which you see through this url

https://github.com/blah/blah2/tree/master/images

Then do

# Install subversion to use svn
!apt-get install subversion

# Get what you want by adding "/trunk" to the repo url and the folder you want
#!svn checkout REPO_URL/trunk/DIRECTORY
!svn checkout https://github.com/blah/blah2.git/trunk/images

This works inside Google Colab

Rub
  • 171
1

I think I'm late, but you can now do this with:

git clone --depth 1 --filter=blob:none https://github.com/PointCloudLibrary/pcl.git --sparse

This makes a shallow clone (latest commit only) without any file contents (only the directory structure).

Then cd into the pcl folder and run:

git sparse-checkout init --cone

git sparse-checkout set apps

This tells Git to check out only the apps directory and its contents.

jpaugh
  • 552
-1

Simply change tree/master/ to trunk/ in the URL:

svn export https://github.com/REPONAME/examples/trunk/lite/examples/

For your case:

svn export https://github.com/PointCloudLibrary/pcl/trunk/apps

To install svn on Ubuntu: https://linuxtechlab.com/simple-guide-to-install-svn-on-ubuntu/

Windows: https://tortoisesvn.net/downloads.html
