git 2.25.0 (2020-01-13)

Sparse checkout management made easy

Over the past few releases, you may have seen topics like partial clone support and sparse checkouts mentioned in earlier blog posts.

In 2.25, Git takes another step closer to bringing mature and configurable partial clone support to all users.

What are partial clones

Before we dive into the new changes, it’s worth taking some time to discuss what partial clones are, and where they’re at today.

A clone of a Git repository copies all of its data: every version of every file in the history. For very large repositories, the cost of network transfer and local storage can make this awkward or even impossible, even if you’re only interested in a subset of the files.

Over the past several versions, Git has learned to perform a “partial” clone, which means that it can now clone and work with a repository without having all of its contents.

Partial clones are still considered an experimental feature from Git’s point of view. For instance, many providers (such as GitHub) don’t support this feature yet, and it’s continually changing and evolving within Git from release to release.

For now, let’s focus on better understanding partial cloning by reviewing the perspective from both the server and client.

The client must do two things. First, it must be able to tell the server that it wants only some objects from a repository. Second, it must be able to tolerate local repositories which lack a complete set of objects.

On the other hand, the server must be able to interpret the client’s request to serve only some objects, and be able to generate an adequate response.

How does this get done today? Let’s say that your repository has a manageable amount of history, but too many files to fit comfortably on your hard drive. In this case, you might want to clone only part of the repository’s contents, by executing something like:

$ git clone --filter=blob:none --no-checkout /your/repository/here

Let’s break down what that means.

First, specifying --filter= lets you tell the server you’re cloning from which objects you want. (In our example, we asked the server to avoid sending us blobs, but a number of other filters are available.)

Next, we have to tell Git that it can skip checking out the repository after it receives a response from the server. Why? Because if Git tries to check out the contents, it will realize that it has missing objects, and try to request them from the server. We can prevent this from happening with --no-checkout, which tells Git to avoid checking out the repository entirely.

Now we have a repository on disk that has some of the objects from the server, but none of them are checked out, so there are no files to read or edit yet.
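If you’d like to see this state for yourself, here is a minimal sketch that builds a throwaway repository and partially clones it. Everything about the setup is an assumption made for illustration: the temporary directory, the docs/ and src/ layout, the file names, and the main branch name are all hypothetical. The “server” is a local repository reached over file://, since a plain local path would skip the transport and ignore --filter, and it needs uploadpack.allowFilter enabled to honor the request. It assumes a reasonably recent Git.

```shell
set -eu

# Throwaway "server" repository (hypothetical layout under a temp dir).
work=$(mktemp -d)
server="$work/server"
client="$work/client"
git init -q "$server"
git -C "$server" symbolic-ref HEAD refs/heads/main
mkdir "$server/docs" "$server/src"
echo "readme" > "$server/docs/README"
echo "code" > "$server/src/main.c"
git -C "$server" add .
git -C "$server" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"

# Let the server honor --filter requests and later on-demand object fetches.
git -C "$server" config uploadpack.allowFilter true
git -C "$server" config uploadpack.allowAnySHA1InWant true

# Partial clone: ask for no blobs, and skip the checkout.
git clone -q --filter=blob:none --no-checkout "file://$server" "$client"

# Objects reachable from HEAD but absent locally are printed with a leading "?".
missing=$(git -C "$client" rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects missing locally: $missing"

# Nothing besides .git should exist in the working tree yet.
checked_out=$(ls -A "$client" | grep -vc '^\.git$' || true)
echo "entries checked out: $checked_out"

# The clone remembers that origin can supply the missing objects later.
git -C "$client" config remote.origin.promisor
```

The commits and trees arrive as usual; only the file contents (the blobs) are left behind on the server, which is why the history is browsable even though nothing is checked out.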

What do we do now?

Somehow, we have to tell Git which objects it is okay to skip when checking out the repository, so that we can actually check something out.

Thankfully, we can use a sparse checkout to make this happen.
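As a rough sketch of how the two features fit together, the following combines a partial clone with the git sparse-checkout command introduced in 2.25. As before, the temporary repository, its docs/ and src/ directories, and the main branch name are hypothetical stand-ins, and the server-side uploadpack settings are only there so that a local file:// repository can play the role of the server.

```shell
set -eu

# Throwaway "server" repository (hypothetical layout under a temp dir).
work=$(mktemp -d)
server="$work/server"
client="$work/client"
git init -q "$server"
git -C "$server" symbolic-ref HEAD refs/heads/main
mkdir "$server/docs" "$server/src"
echo "readme" > "$server/docs/README"
echo "code" > "$server/src/main.c"
git -C "$server" add .
git -C "$server" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"
git -C "$server" config uploadpack.allowFilter true
git -C "$server" config uploadpack.allowAnySHA1InWant true

# Blob-less clone with no checkout, as in the command discussed above.
git clone -q --filter=blob:none --no-checkout "file://$server" "$client"

# Tell Git that only docs/ should appear in the working tree.
git -C "$client" sparse-checkout init --cone
git -C "$client" sparse-checkout set docs

# Check out the branch; blobs inside the selected directory are fetched on demand.
git -C "$client" checkout -q main

# docs/ is materialized, src/ is not, and the blob for src/main.c was never downloaded.
ls -A "$client"
still_missing=$(git -C "$client" rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects still missing locally: $still_missing"
```

The order matters here: the sparse-checkout patterns are set before the first checkout, so Git only has to ask the server for the blobs it is about to write to disk.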