Friday, July 31, 2009

git rebase Considered Awesome

git rebase is subject to quite a few controversial opinions, but in all my experience working with Git it seems that most people who use Git, even those who use rebase, myself included, often don't really know exactly what it does. Apparently, many people are afraid of using it because it does evil magic like changing the past and eating babies. The truth is that git rebase actually does so much more!

The critics of rebasing generally contrast it to Subversion, in which history is never modified after being recorded. However, much like the baby eating, this is just delusional superstition once you understand what Git actually does (and what Subversion doesn't do).

Personally, I think git rebase should be used all the time; that is, git pull --rebase should be used instead of plain git pull (this can actually be the default).
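As a rough sketch of what "the default" could look like: the two settings below are real Git configuration keys (pull.rebase only exists in Git versions that support it, and older releases rely on the per-branch settings instead). The scratch repository is an assumption made purely so that the example doesn't touch your real configuration.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"

# Make plain `git pull` behave like `git pull --rebase` in this repository.
git config pull.rebase true

# Have newly created tracking branches default to rebasing as well.
git config branch.autosetuprebase always

pull_rebase=$(git config --get pull.rebase)
auto_setup=$(git config --get branch.autosetuprebase)
echo "pull.rebase=$pull_rebase branch.autosetuprebase=$auto_setup"
```

Adding --global to the two git config calls would apply the same settings to every repository for your user.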

You probably shouldn't rebase published heads, since that can confuse people, but that's not a reason not to use rebase correctly. I don't like the anti-rebasing dogma:

  • Just because you can do evil doesn't mean you should (or even that it's easy)
  • Even if people do that anyway, if everyone uses git pull --rebase it won't actually be a problem[1].

In my experience rebase has always produced the most desirable results, creating a clean history instead of one riddled with meaningless nonsense. rebase is smart.

Explaining exactly what rebase does is a job for the documentation, but what I feel the docs are lacking is an explanation of what rebase is actually for.

Forward-port local commits to the updated upstream head

Actually, I take that back. This is an excellent way of explaining what rebase is for. Assuming, of course, that you already know what that means ;-)

Broken down, we have a few fnords to sort out:

  • upstream head
  • updated head
  • local commits
  • forward porting

I don't really want to explain what these mean, just to clearly define them.

Upstream head

Git is distributed. Even if you work with a single central Git repository for your project, that's just a special, simplified case of a distributed workflow.

When you clone a repository you get your own local copies of the remote heads (branches).

When you commit your work only your own local copies of these heads are modified.

git push will overwrite the references on the remote side with the local ones, but by default only if the operation is safe (Git calls this "fast forward"), when the local revision is derived from the remote revision (the commit in the remote version is an ancestor of the local revision).
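The "safe" condition can be checked by hand. This is only a sketch in a throwaway repository: remote_rev and local_rev are made-up variable names standing in for the remote and local revisions, and git merge-base --is-ancestor needs a reasonably recent Git.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first commit"
remote_rev=$(git rev-parse HEAD)   # stands in for the revision on the remote

echo two >> file.txt
git commit -q -am "second commit"
local_rev=$(git rev-parse HEAD)    # the local head, one commit ahead

# A push is a fast forward when the remote revision is an ancestor
# of the local revision.
if git merge-base --is-ancestor "$remote_rev" "$local_rev"; then
  push_is="fast-forward"
else
  push_is="non-fast-forward"
fi

# The other way around, the older revision cannot fast forward the newer one.
if git merge-base --is-ancestor "$local_rev" "$remote_rev"; then
  reverse_is="fast-forward"
else
  reverse_is="non-fast-forward"
fi

echo "push: $push_is, reverse: $reverse_is"
```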

If this is confusing then you should read up on how to work with remote repositories in Git first.

Updated head

Between the point in time when you started working on an up-to-date local copy of a head and the time you want to push your changes, other people may have modified the remote version.

When this happens the development has diverged: there are two chains of commits in history. One is the updated head, and the other is the local head.

Local commits

The local commits are the chain of commits leading from git merge-base updated-head local-head to the local head.

These are commit objects that are not visible by walking the history of the upstream head (updated or not), but only by walking the history of the local head.
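To make the definition concrete, here is a small sketch that builds a diverged history in a scratch repository. The branch names updated-head and local-head are the placeholders used above, not anything Git defines, and the file names are invented for the example.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

# Shared starting point.
echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -m updated-head
base_rev=$(git rev-parse HEAD)

# Two local commits on top of the shared starting point.
git checkout -q -b local-head
echo one > local.txt
git add local.txt
git commit -q -m "local 1"
echo two >> local.txt
git commit -q -am "local 2"

# Meanwhile, one new commit lands upstream.
git checkout -q updated-head
echo up > upstream.txt
git add upstream.txt
git commit -q -m "upstream 1"

# The point where the two chains diverged.
merge_base=$(git merge-base updated-head local-head)

# Commits reachable only from the local head: the local commits.
local_count=$(git rev-list --count updated-head..local-head)

# Commits reachable only from the updated upstream head.
upstream_count=$(git rev-list --count local-head..updated-head)

git log --oneline updated-head..local-head
```

The updated-head..local-head range is just shorthand for "everything reachable from local-head that is not reachable from updated-head", which is the definition given above.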

Forward porting

This is where the awesome of rebase comes into play.

You can run git merge updated-head in your local version to create a new merge commit that will combine the changes of the two diverged histories, leaving you with the results of both in both the commit log and the source tree.

This is what git pull does by default; it's the same as git fetch followed by git merge.

git rebase is much more clever.

The results of git merge and git rebase are the same in terms of the resulting trees: the files will end up containing the same merged changes, and you will similarly need to run git mergetool if there are any conflicts.

The difference is that git rebase will take your local commits and apply them one by one to the revision from the updated upstream head, effectively creating a brand new local head with new versions of the local commits, whereas git merge creates a single merge commit that is derived from both the local commits and the upstream ones.
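The same comparison can be tried out in a scratch repository. This is a sketch under assumptions: the branch names (updated-head, local-head, merged, rebased) and files are invented for the example, and the changes are chosen so that they don't conflict.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -m updated-head

# Two local commits.
git checkout -q -b local-head
echo one > local.txt
git add local.txt
git commit -q -m "local 1"
echo two >> local.txt
git commit -q -am "local 2"

# One upstream commit made in the meantime.
git checkout -q updated-head
echo up > upstream.txt
git add upstream.txt
git commit -q -m "upstream 1"

# Option 1: merge the updated head into a copy of the local head.
git checkout -q -b merged local-head
git merge -q --no-edit updated-head

# Option 2: rebase another copy of the local head onto the updated head.
git checkout -q -b rebased local-head
git rebase -q updated-head

# Both approaches end up with the same files...
merged_tree=$(git rev-parse "merged^{tree}")
rebased_tree=$(git rev-parse "rebased^{tree}")

# ...but only the merged branch carries a merge commit.
merges_in_merged=$(git rev-list --merges --count updated-head..merged)
merges_in_rebased=$(git rev-list --merges --count updated-head..rebased)

# The rebased branch is the two local commits replayed on top of upstream.
rebased_commits=$(git rev-list --count updated-head..rebased)
```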

A while ago I was unaware that rebase is actually very smart about how, and more importantly whether, to apply every local commit. rebase supports patch idempotence: local changes that have identical changes upstream (even if the commit metadata is different) are simply skipped without error. This means that changes that were merged upstream, even if they were cherry-picked, signed off, etc., will still be dealt with correctly.
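A quick way to see this skipping behaviour, sketched in a scratch repository with made-up branch and file names: one local commit gets cherry-picked upstream (with -x, so its message differs), another stays purely local, and after the rebase only the purely local one remains. This assumes rebase's default settings; newer Git versions have a --reapply-cherry-picks option that turns the skipping off.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -m updated-head

# Local work: a fix that upstream will also pick up, plus a local-only change.
git checkout -q -b local-head
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix a bug"
fix_rev=$(git rev-parse HEAD)
echo wip > local.txt
git add local.txt
git commit -q -m "local only change"

# Upstream cherry-picks the fix (-x changes the commit message) and moves on.
git checkout -q updated-head
git cherry-pick -x "$fix_rev" >/dev/null
echo up > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"

# Forward-port the local commits onto the updated head.
git checkout -q local-head
git rebase updated-head

# The fix is already upstream, so only the local-only commit is left on top.
remaining=$(git rev-list --count updated-head..local-head)
subject=$(git log -1 --format=%s local-head)
echo "$remaining local commit(s) left: $subject"
```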

Similarly, merge commits that are no longer meaningful will be omitted. If you run git pull followed by git pull --rebase the merge commit created by the first pull will be omitted from the results.
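Here is a sketch of that scenario without a real remote. Since git pull is a fetch followed by a merge, the first pull is simulated with a plain git merge and the later git pull --rebase with a plain git rebase. The branch names are the same placeholders as before, and rebase's default behaviour (no option to preserve merges) is assumed.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -m updated-head

git checkout -q -b local-head
echo one > local.txt
git add local.txt
git commit -q -m "local 1"

git checkout -q updated-head
echo up > upstream.txt
git add upstream.txt
git commit -q -m "upstream 1"

# Stand-in for a plain `git pull`: merge the updated head into the local head.
git checkout -q local-head
git merge -q --no-edit updated-head
merges_before=$(git rev-list --merges --count local-head)

# Upstream moves on again.
git checkout -q updated-head
echo more >> upstream.txt
git commit -q -am "upstream 2"

# Stand-in for `git pull --rebase`: rebase onto the updated head.
git checkout -q local-head
git rebase -q updated-head

# The merge commit is gone; only the real local commit was replayed.
merges_after=$(git rev-list --merges --count local-head)
local_commits=$(git rev-list --count updated-head..local-head)
```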

The new set of forward ported local commits is clean and minimal, and therefore easier to work with.

Habitual rebasing

If you always use git pull --rebase your local changes will never get out of control and turn into a mess of branches and merges. They will simply be the minimal set of changes needed to turn the upstream head into the version that you are trying to create in your development.

Correctly using rebase to create a clean local history is simple using git pull --rebase.

Furthermore, when working with local (and therefore probably unpublished) commits you can even modify them in other ways as you keep writing code. For a more managed approach to creating clean patches incrementally see Stacked Git. This sort of history (and change) rewriting is fair game. This is what Linus means when he says "Keep your own history readable" and "Don't expose your crap".

svn rebase

Under Subversion, when you run svn commit but you aren't up to date, you need to run svn update to forward port local changes to the updated upstream head. Does that sound familiar?

The difference is that in SVN you're rebasing history that hasn't been saved anywhere yet. With Git you can at least back out of it (git reset, git rebase --abort) if you run into hurdles like conflicts. Subversion is merciless: it will simply modify your working copy with conflict markers, and you have no choice but to clean up the mess.
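Backing out looks roughly like this. The sketch deliberately sets up a conflict in a scratch repository (both sides edit the same line of an invented file.txt), lets the rebase stop, and then aborts it; the branch names are the same placeholders used earlier.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch -m updated-head

# The local and upstream sides change the same line differently.
git checkout -q -b local-head
echo local > file.txt
git commit -q -am "local change"

git checkout -q updated-head
echo upstream > file.txt
git commit -q -am "upstream change"

git checkout -q local-head
before=$(git rev-parse HEAD)

# The rebase stops on the conflict; backing out restores the old state.
if git rebase updated-head >/dev/null 2>&1; then
  outcome="rebased cleanly"
else
  git rebase --abort
  outcome="aborted"
fi

after=$(git rev-parse HEAD)
branch=$(git rev-parse --abbrev-ref HEAD)
content=$(cat file.txt)
echo "$outcome: back on $branch with file.txt containing '$content'"
```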

So in a sense Subversion does just as much "evil history rewriting" as Git does, except that in Git every step of the process is recorded as explicit versioned data that you can compare with and revert to freely, whereas in Subversion it's done as one monolithic operation on your uncommitted working changes.

Breaking idempotent patches

Git is not omniscient.

There are a number of things that won't be automatically cleaned up by git pull --rebase (and rightly so):

  • commit --amend
  • rebase --interactive's edit and squash
  • git merge --squash

Applying such changes to a published branch is not just reorganization of history. The resulting commit history will have different changes when the deltas are actually computed.
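One way to see the difference between reorganizing history and changing the deltas is git patch-id, which hashes the diff a commit introduces. The sketch below is an illustration with made-up branch names (topic, picked) in a scratch repository, and it is not a claim about exactly how rebase is implemented: a cherry-picked copy keeps the same patch ID even though its metadata differs, while an amended commit whose content changed gets a new one.

```shell
set -e
scratch=$(mktemp -d)
git init -q "$scratch/repo"
cd "$scratch/repo"
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
base_rev=$(git rev-parse HEAD)

# The original commit.
git checkout -q -b topic
echo v1 > feature.txt
git add feature.txt
git commit -q -m "add feature"
original_id=$(git show topic | git patch-id | cut -d' ' -f1)

# A cherry-picked copy: different metadata, same delta.
git checkout -q -b picked "$base_rev"
git cherry-pick -x topic >/dev/null
picked_id=$(git show picked | git patch-id | cut -d' ' -f1)

# An amended version whose content changed: a different delta.
git checkout -q topic
echo v2 >> feature.txt
git commit -q -a --amend -m "add feature (amended)"
amended_id=$(git show topic | git patch-id | cut -d' ' -f1)

echo "original: $original_id"
echo "picked:   $picked_id"
echo "amended:  $amended_id"
```

Anyone who based work on the original "add feature" commit would find that their copy no longer matches anything upstream once the amended version is published, which is where the conflicts come from.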

Trying to rebase on top of something that had these operations applied to it will cause conflicts for the person running rebase, leading to confusion, which brings us back to the reason people are so afraid of rebase.

As long as you're aware of the implications of such operations, and make sure never to publish them where people won't expect them to happen, you can use them as much as you like.

[1] as long as there are no conflicts or other things that actually modify the deltas in a way that rebase can't automatically pick up


rcaputo said...

Subversion does let users dabble in minor revisionism via svn propset, but they keep their genie tightly stoppered behind the default pre-revprop-change hook. Git's rebase is wild and free, granting wishes on demand. That scares the bejeezus out of me.

Jakub Narebski said...

Actually "git pull --rebase" workflow should be familiar to Subversion users. This is commit-update-recommit workflow, an improved version of Subversion's update-before-commit workflow. :-)

Nicholas Braden ("LB") said...

"afraid of using it because it does evil magic like changing the past and eating babies. "
LOL, the 'eating babies' part had me laughing the whole rest of the article.