Thursday, June 11, 2009

Migrating from Subversion to Git

I've been seeing many Subversion repositories being hastily imported to Git. This is unfortunate because not having a cleanly and correctly imported history can reduce the effectiveness of Git's powerful tools, such as git bisect or git blame. Having an accurate revision control history is very helpful for tracking down regressions. Here's my take on how to do this properly.

Subversion issues

There are a few typical problems in Subversion repositories that I've seen:

  • History tends to be crufty (svn ci -m "oops"). Some people consider cleaning such history a bad habit (since it's not what "actually" happened), but IMHO the reason to preserve history is so you can figure out the purpose or nature of a change.
  • Merge metadata is missing. Even with merges created using SVK or Subversion 1.5, git-svn doesn't import this information.
  • Tags aren't immutable. People sometimes adjust them to reflect what the release really ended up being, but at that point the tag has effectively become a branch. Again, there's no metadata.

A checkout made with git svn can often be significantly improved:

  • Git has a very good representation for merges.
  • Git supports clean annotated tags (tags with commit messages).
  • Commit messages could be reformatted to follow Git conventions.

When I converted the Moose Subversion repository I wrote a small collection of scripts.

Preparing a git-svn checkout

If you have made any merges using SVK or Subversion 1.5 (Update: see comments) then you should probably use git-svn from Sam Vilain's svn-merge-attrs branch of Git to save a lot of time when restoring merge information. This version of git-svn will automatically add merge metadata into the imported repository for those commits.

Assuming you have a standard trunk, branches and tags layout, clone the repository like this:

git svn clone --prefix=svn/ --stdlayout --authors-file=authors.txt http://example.com/svn/
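The clone command above expects an authors.txt that maps each Subversion username to a Git identity, one "svnuser = Full Name <email>" entry per line. As a minimal sketch (the function name and the example.com addresses are placeholders of mine, not part of any tool), a skeleton can be generated from the output of svn log -q and then corrected by hand:

```shell
# Hypothetical helper: turn `svn log -q` output (read from stdin) into a
# skeleton authors file. The names and addresses it writes are placeholders
# that still need to be edited by hand.
authors_skeleton() {
    # Revision lines look like: r4 | alice | 2009-06-11 10:00:00 +0000 (...)
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
        sort -u |
        while IFS= read -r user; do
            printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
        done
}

# Usage (assumes the svn client is installed):
#   svn log -q http://example.com/svn/ | authors_skeleton > authors.txt
```

Every username that appears in the log gets exactly one line, so it is easy to spot committers you forgot about before running the import.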

For large repositories I like to use svnadmin dump and svnadmin load to create a local copy. You can also just run the conversion on your Subversion server. For local repositories use a file:/// URI.

Cleaning up tags and branches

git svn-abandon-fix-refs

will run through all the imported refs, recreating properly dated annotated tags (but only if they haven't been modified since they were created), and making branches out of everything else. It'll also rename trunk to master.

The resulting layout is more like what a Git repository should look like, so git tag -l and git branch -l work as expected.

Restoring merge information

If some of the merges were made by hand or if you didn't use Sam's git-svn then you'll need to recreate merge metadata by hand. Fortunately this is easily done using the .git/info/grafts file.

The grafts file is a simple table of overridden lists of parents for specific commits. The first column is the commit whose parents you want to override, and the rest of the line is the list of new parents to use. For a regular commit there is only one parent, the previous commit. Merges are commits with more than one parent.

Suppose we have a Subversion repository where revision 1 creates a project, revision 2 creates a branch, revision 3 modifies the branch, and revision 4 merges it back into trunk. If imported to Git without the merge metadata, revision 4 will have a single parent, revision 1, but its parents should be revisions 1 and 3.

If the IDs of the imported commits are:

Revision Git Commit
1 e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e
2 7448d8798a4380162d4b56f9b452e2f6f9e24e7a
3 a3db5c13ff90a36963278c6a39e4ee3c22e2a436
4 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1

The line in the .git/info/grafts file that fixes revision 4 would look like this:

9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e a3db5c13ff90a36963278c6a39e4ee3c22e2a436
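Typing 40-character IDs by hand is error-prone, so here is a small sketch that builds the same line from the IDs in the table above. The graft_line function and the rev1, rev3 and rev4 variables are hypothetical names used only for illustration, and the check assumes full 40-character IDs like the ones shown.

```shell
# Hypothetical helper: print one grafts line, "<commit> <parent> <parent>...".
graft_line() {
    for id in "$@"; do
        # Reject anything that is not a full 40-character lowercase hex ID.
        case $id in
            *[!0-9a-f]*) echo "not a hex id: $id" >&2; return 1 ;;
        esac
        [ "${#id}" -eq 40 ] || { echo "not a full id: $id" >&2; return 1; }
    done
    # "$*" joins all arguments with single spaces.
    echo "$*"
}

# IDs taken from the table above.
rev1=e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e
rev3=a3db5c13ff90a36963278c6a39e4ee3c22e2a436
rev4=9c6b057a2b9d96a4067a749ee3b3b0158d390cf1

# The commit to fix comes first, followed by its new parents.
graft_line "$rev4" "$rev1" "$rev3"

# To apply it, append the line from inside the repository:
#   graft_line "$rev4" "$rev1" "$rev3" >> .git/info/grafts
```

The order of the parents matters: the first parent is the branch that was merged into (trunk here), and the second is the branch that was merged.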

If you view the history using GitX or gitk then you should now see revision 4 has become a proper merge.

Rewriting history

Most people can happily skip this step.

If you'd like to change the history you can now run git rebase --interactive, using the edit command together with git commit --amend to clean up individual commits, or squash to combine commits. This is probably a topic for another post, but it's worth mentioning.

However, make sure you keep other tags and branches synchronized when you rebase. This can be done using the grafts file.

Final cleanups

The last bit of conversion involves running

git svn-abandon-cleanup

to clean up SVK-style merge commit messages (where the first line is useless with most Git log viewers), and remove git-svn-id strings.

The actual message filtering is done by the git-svn-abandon-msg-filter script. You can customize this to your liking.
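To give an idea of what a message filter looks like, here is a minimal sketch. It is not the git-svn-abandon-msg-filter script itself; strip_git_svn_id is a hypothetical name, and it only handles the git-svn-id lines, not the SVK-style merge messages. A message filter reads the original commit message on standard input and writes the new message to standard output.

```shell
# Hypothetical stand-in for a message filter: reads a commit message on
# stdin and writes the cleaned message to stdout.
strip_git_svn_id() {
    awk '
        # Drop the metadata line that git-svn appends to every message.
        /^git-svn-id: / { next }
        # Hold back blank lines so that trailing ones are never printed.
        /^[[:space:]]*$/ { blanks++; next }
        {
            # A non-blank line follows, so emit the blank lines held back.
            while (blanks > 0) {
                print ""
                blanks--
            }
            print
        }
    '
}

# git filter-branch runs its filters in a separate shell, so a function
# defined here is not visible to it. Inline the command instead, e.g.:
#   git filter-branch --msg-filter 'sed -e "/^git-svn-id: /d"' -- --all
```

Holding back blank lines means the empty line that normally precedes git-svn-id disappears along with it, while blank lines inside the message body are kept.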

Another important side effect of the git filter-branch --all step in git svn-abandon-cleanup is that the grafts entries are incorporated into the filtered commits, so the extra merge metadata becomes clonable.

Finally, all merged branches are removed (using the safe -d option of git branch).

Publishing

The resulting Git repository should be ready to publish as if you created it locally.

git remote add origin git@example.com:repository.git
git push --all
git push --tags

Nontrivial grafting

You can still cleanly import a repository that does not follow the standard directory layout or has other complications (e.g. the repository was moved without importing). Use git-svn to import each directory of history separately and then use grafts to stitch the parts back together.

You can write a script to create a grafts file using git rev-parse, like David Wheeler did for Bricolage.
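As a rough sketch of that idea (this is not David's script; grafts_from_refs, resolve and the ref names in the usage comment are all hypothetical), you can describe the parent relationships with ref names and let git rev-parse turn them into commit IDs:

```shell
# Resolve a ref name or revision expression to a full commit ID.
resolve() {
    git rev-parse --verify --quiet "$1^{commit}"
}

# Hypothetical helper: read lines of "<child> <parent>..." written with ref
# names and print the matching grafts lines with full commit IDs.
grafts_from_refs() {
    while read -r child parents; do
        # Skip blank lines and comments in the input table.
        case $child in
            ''|'#'*) continue ;;
        esac
        line=$(resolve "$child") || { echo "cannot resolve: $child" >&2; return 1; }
        # $parents is deliberately unquoted so it splits on whitespace.
        for parent in $parents; do
            id=$(resolve "$parent") || { echo "cannot resolve: $parent" >&2; return 1; }
            line="$line $id"
        done
        printf '%s\n' "$line"
    done
}

# Usage, from inside the repository (both ref names are made up):
#   grafts_from_refs <<'EOF' >> .git/info/grafts
#   # first commit of the later import, then the tip of the earlier import
#   new-trunk-root svn/old-trunk
#   EOF
```

Keeping the table in terms of names rather than IDs means you can rerun the import from scratch and regenerate the grafts file without editing anything.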

The grafts file can also be used to hide commits, clean up modified tags, etc.

11 comments:

nothingmuch said...

I just realized the assumption that you're using a proper authors file seemed so obvious to me that I neglected to mention it ;-)

You have been warned: if I see svnuser@uuid I will mock you on the internet!

monkeyhelper said...

git svn-abandon-cleanup isn't a git command. I assume you need to install something from here :

http://github.com/nothingmuch/git-svn-abandon/tree/master

Unfortunately the README doesn't explain how you go about doing this.

nothingmuch said...

Hi,

I've added a bit more info to the README, I hope that clears it up.

Thanks

c9s said...

Useful ! Thanks!

nuba said...

Hi there,

Thanks for the great post!

This snippet may need reworking, tho: You can still cleanly import a repository does not follow the standard directory layout or has other complications (e.g. the repository was moved without importing) you can still.

Regards, Nuba

nothingmuch said...

Fixed, thanks =)

Mina Naguib said...

Hi

I'm converting an SVN repo to GIT, and am suffering from the problem of merges.

Specifically, a "merge" done in SVN appears as a normal commit in git (single parent, not multiple parents as it should be).

From the article, I assumed that using samv's fork of git fixes this issue, however that is not so.

It appears that samv's fork only fixes the issue if the svn project used either svnmerge.py or SVK as merge helpers.

Merges done with vanilla svn, even 1.5+ which includes svn:mergeinfo metadata, are not handled by samv's fork (or any other fork/patch that I've seen).

It took me a couple of hours to conclude this, specifically since samv's fork appears to have tests that mention "mergeinfo", but not actual code to handle that case.

nothingmuch said...

Eep, thanks for the correction, I will update the post.

What did you end up doing?

SamV said...

Mina, you are correct - and given that there are quite a few projects now using the SVN 1.5+ merge support, I guess it's worth fixing.

Enjoy

Mina Naguib said...

@SamV

Fantastic! Thank you very much. Works like a charm.

szabgab said...

regarding grafting, what should be in the file if I have 1 the original file 2 is the first change on the branch then there are several changes on both the branch and the trunk and 10 is the merged revision?