Subcommit Git

2024-07-20

Git is my go-to version control system, but I first learned source control on SourceGear Vault. While I've appreciated getting comfortable with git and would not go back, there are some features of Vault that I miss. In particular, Vault is organized as a mono-repo, so each project is simply a subfolder in that repo, and each folder has its own version at any given time. While you can have a mono-repo in git, and there are various ways of mixing and matching repos, it's still not very natural to treat each directory as having its own history and version information.

So I wanted to see how git could be modified to support this, and it turns out to be pretty easy to adjust the core data structure to accommodate it. This shouldn't be considered a serious proposal, just an exploration of an alternate universe of git.

To make this concrete, I implemented it on top of Write Yourself A Git, which makes for a simple starting point. My code is here.

Background

To see how we're modifying git, you'll need to know how git structures its data. If you don't know anything about this at all, I suggest the Git Internals chapter of the Pro Git book, particularly the Git Objects section.

In a regular git repository, a branch points to a particular commit. That commit points to its parent commit (or commits, for a merge) and to a tree object corresponding to the top-level directory of the repository. That tree object in turn points to blobs (files in the repository) and to other tree objects (subdirectories in the repository). In this way, the commit represents the state of the repository at the time the commit was made.

A diagram illustrating the connections discussed above
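For reference, here is a minimal Python sketch of how git derives the IDs of the three object types just described. It follows git's loose-object format (a `<type> <size>` header and a NUL byte, then the body, all hashed with SHA-1). The function names are illustrative and are not Write Yourself A Git's API, and tree-entry sorting and commit metadata are simplified, so treat it as a sketch rather than a replacement for `git hash-object`.

```python
import hashlib


def hash_object(kind: str, body: bytes) -> str:
    """Git's object ID: SHA-1 over a '<kind> <size>\\0' header plus the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is just the raw bytes of a file.
    return hash_object("blob", content)


def tree_id(entries: list) -> str:
    """Hash a tree from (mode, name, hex_oid) entries.

    Files use mode "100644"; subdirectories (which point to other trees)
    use "40000". Each entry is serialized as '<mode> <name>\\0' followed by
    the raw 20-byte ID. Simplification: entries are sorted by plain name,
    while real git compares directory names as if they ended in '/'.
    """
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda entry: entry[1])
    )
    return hash_object("tree", body)


def commit_id(tree: str, parents: list, author: str, message: str) -> str:
    """Hash a commit pointing to a top-level tree and zero or more parents.

    `author` is the full 'Name <email> timestamp timezone' string; for
    brevity it is reused as the committer.
    """
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return hash_object("commit", ("\n".join(lines) + "\n").encode())
```

As a sanity check, `blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391` and `tree_id([])` gives `4b825dc642cb6eb9a060e54bf8d69288fbee4904`, git's well-known empty blob and empty tree IDs.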

What we're changing

To give each tree its own history, we need a commit object pointing to each tree. This means that instead of pointing to other tree objects, trees will point to subcommits. Each subcommit then points to its own tree and to the previous subcommit for that directory (i.e. the subcommit from the last time that tree changed).

A diagram illustrating addition of subcommits
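To make the structural change concrete, here is a hypothetical Python sketch of writing a repository's first snapshot under this scheme. It assumes that a subcommit is serialized just like an ordinary commit and that a subdirectory's tree entry uses mode 160000 (the mode vanilla git uses for submodule commit pointers); both are assumptions of this sketch, not necessarily how the linked implementation encodes them. Author and committer lines are omitted for brevity, and `store_object`, `write_first_snapshot`, and the in-memory `store` are illustrative names.

```python
import hashlib

# Assumptions for this sketch (not necessarily the linked implementation's encoding):
SUBCOMMIT_MODE = "160000"  # vanilla git's mode for commit pointers (submodules)
FILE_MODE = "100644"       # regular, non-executable file


def store_object(store: dict, kind: str, body: bytes) -> str:
    """Hash an object with git's '<kind> <size>\\0' header and keep it in memory."""
    oid = hashlib.sha1(f"{kind} {len(body)}".encode() + b"\x00" + body).hexdigest()
    store[oid] = (kind, body)
    return oid


def write_first_snapshot(store: dict, contents: dict, message: str) -> str:
    """Write a directory given as {name: bytes | dict} and return its subcommit ID.

    Files become blob entries. Each subdirectory is written recursively, and
    its tree entry points to that subdirectory's *subcommit*, not its tree.
    Nothing has a parent because this is the repository's first snapshot.
    """
    entries = []
    for name in sorted(contents):
        value = contents[name]
        if isinstance(value, dict):
            mode, oid = SUBCOMMIT_MODE, write_first_snapshot(store, value, message)
        else:
            mode, oid = FILE_MODE, store_object(store, "blob", value)
        entries.append(f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid))
    tree = store_object(store, "tree", b"".join(entries))
    # Serialized like a commit: "tree <id>", a blank line, then the message.
    body = f"tree {tree}\n\n{message}\n".encode()
    return store_object(store, "commit", body)
```

For example, `write_first_snapshot({}, {"README": b"hi\n", "src": {"main.py": b"print(1)\n"}}, "initial")` returns the root's subcommit, which plays the role of the ordinary top-level commit; the root tree then holds a `160000 src` entry naming the subcommit for `src/`.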

Note that a subdirectory only gets a new subcommit when its contents change, so the chain of subcommits 'skips' commits in the main chain that don't modify that subdirectory.
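The skipping falls out of one rule: rebuild a directory's tree, and create a new subcommit only if that tree differs from the one its previous subcommit points at. The hypothetical sketch below shows this with a deliberately toy object model (tuples hashed via their `repr`, no author or message metadata) instead of git's byte format; `put` and `update_dir` are illustrative names.

```python
import hashlib
from typing import Optional


def put(store: dict, obj: tuple) -> str:
    """Store an object under a toy content hash (SHA-1 of its repr, not git's format)."""
    oid = hashlib.sha1(repr(obj).encode()).hexdigest()
    store[oid] = obj
    return oid


def update_dir(store: dict, contents: dict, prev: Optional[str]) -> str:
    """Return the subcommit for a directory {name: bytes | dict}.

    `prev` is the directory's previous subcommit (None if it is new).
    Objects: ("blob", data), ("tree", ((name, kind, oid), ...)),
    and ("subcommit", tree_oid, parent_oid).
    """
    prev_subs = {}
    if prev is not None:
        prev_tree = store[prev][1]
        # Each subdirectory's previous subcommit becomes its new subcommit's parent.
        prev_subs = {
            name: oid for name, kind, oid in store[prev_tree][1] if kind == "subcommit"
        }
    entries = []
    for name in sorted(contents):
        value = contents[name]
        if isinstance(value, dict):
            entries.append((name, "subcommit", update_dir(store, value, prev_subs.get(name))))
        else:
            entries.append((name, "blob", put(store, ("blob", value))))
    tree = put(store, ("tree", tuple(entries)))
    if prev is not None and store[prev][1] == tree:
        # Unchanged directory: reuse the old subcommit, so its chain "skips" this commit.
        return prev
    return put(store, ("subcommit", tree, prev))
```

Committing `{"docs": {...}, "src": {...}}` and then changing only a file under `docs/` yields a new root subcommit and a new `docs/` subcommit (whose parent is the old one), while the root tree's `src` entry still names the original `src/` subcommit.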

Use-Case: Extracting history for a subdirectory

It is possible in vanilla git to extract the history of a directory (using git subtree) and use it to share that directory's history with another repository. However, this approach has two problems: (1) the commits for the subtree have to be generated at the time of the split, and (2) after the subtree is integrated into another repository, there is no clear indication of where its commits came from. With subcommits, the directory's history already exists and is ready to split out. And when the subcommits are brought into another repository, they keep exactly the same hashes they had in the original repository.
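Because a directory's subcommits only reference that directory's own trees, blobs, nested subcommits, and earlier subcommits, splitting it out amounts to copying everything reachable from its latest subcommit. The sketch below reuses the same kind of toy tuple model as a hypothetical stand-in for real object storage (`history`, `reachable`, and `split_out` are illustrative names). Since IDs are derived only from object contents, every copied object keeps its ID in the destination store.

```python
import hashlib
from typing import Optional


def oid_of(obj: tuple) -> str:
    """Toy content address: the SHA-1 of the object's repr (not git's byte format)."""
    return hashlib.sha1(repr(obj).encode()).hexdigest()


def put(store: dict, obj: tuple) -> str:
    oid = oid_of(obj)
    store[oid] = obj
    return oid


def history(store: dict, head: str) -> list:
    """A directory's history: follow parent links from its latest subcommit."""
    chain = []
    oid: Optional[str] = head
    while oid is not None:
        chain.append(oid)
        oid = store[oid][2]  # ("subcommit", tree_oid, parent_oid)
    return chain


def reachable(store: dict, head: str) -> set:
    """Every object reachable from a subcommit: earlier subcommits, trees, blobs."""
    seen = set()
    stack = [head]
    while stack:
        oid = stack.pop()
        if oid is None or oid in seen:
            continue
        seen.add(oid)
        obj = store[oid]
        if obj[0] == "subcommit":
            stack += [obj[1], obj[2]]  # its tree and its parent (may be None)
        elif obj[0] == "tree":
            stack += [entry_oid for _, _, entry_oid in obj[1]]  # blobs, nested subcommits
    return seen


def split_out(src: dict, dst: dict, head: str) -> None:
    """Copy a directory's whole history into another repository's object store."""
    for oid in reachable(src, head):
        obj = src[oid]
        # IDs depend only on contents, so each object keeps its ID in `dst`.
        if oid_of(obj) != oid:
            raise ValueError(f"corrupt object {oid}")
        dst[oid] = obj
```

Nothing from the surrounding repository's top-level chain is copied; the destination receives only the directory's own subcommits and the objects beneath them, under their original IDs.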

Demo Repository

There is a demo subcommit-git repository in the project repository.

Is this a Good Idea?

No, probably not as such. Setting aside the fact that it is incompatible with vanilla git, this is more an exploration of an idea than a serious proposal.