I've been seeing many Subversion repositories being hastily imported to Git. This is unfortunate because not having a cleanly and correctly imported history can reduce the effectiveness of Git's powerful tools, such as git bisect or git blame. Having an accurate revision control history is very helpful for tracking down regressions. Here's my take on how to do this properly.
There are a few typical problems in Subversion repositories that I've seen:
- History tends to be crufty (svn ci -m "oops"). Some people consider cleaning up such history a bad habit (since it's not what "actually" happened), but IMHO the reason to preserve history is so you can figure out the purpose or nature of a change.
- Merge metadata is missing. Even with merges created using SVK or Subversion 1.5, git-svn doesn't import this information.
- Tags aren't immutable. People sometimes adjust them to reflect what the release really ended up being, but at that point the tag has effectively become a branch. Again, there's no metadata.
When you make a checkout with git svn the results could often be significantly improved:
- Git has a very good representation for merges.
- Git supports clean annotated tags (tags with commit messages).
- Commit messages could be reformatted to follow Git conventions.
Preparing a git-svn checkout
If you have made any merges using SVK or Subversion 1.5 (Update: see comments) then you should probably use git-svn from Sam Vilain's svn-merge-attrs branch of Git to save a lot of time when restoring merge information. This version of git-svn will automatically add merge metadata into the imported repository for those commits.
Assuming you have a standard trunk, branches and tags layout, clone the repository like this:
git svn clone --prefix=svn/ --stdlayout --authors-file=authors.txt http://example.com/svn/
For large repositories I like to use svnadmin dump and svnadmin load to create a local copy. You can also just run the conversion on your Subversion server. For local repositories use a file:/// URI.
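The authors.txt file passed to --authors-file maps each Subversion login to a Git identity, one "login = Name <email>" entry per line. Here is a rough sketch for generating a skeleton to edit by hand. It assumes the usual svn log --quiet output format; authors_skeleton and the example.com addresses are placeholders of mine, not part of git-svn.

```shell
#!/bin/sh
# authors_skeleton is a hypothetical helper: it reads `svn log --quiet`
# output on stdin and prints one "login = login <login@example.com>" line
# per committer. Edit the result by hand to fill in real names and addresses.
authors_skeleton() {
    # Revision lines look like: r42 | alice | 2009-01-01 12:00:00 +0000 (...)
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    sort -u |
    while read -r login; do
        printf '%s = %s <%s@example.com>\n' "$login" "$login" "$login"
    done
}

# Possible usage (the URL is the example one from the clone command):
#   svn log --quiet http://example.com/svn/ | authors_skeleton > authors.txt
```

git svn refuses to import a commit whose author is missing from the file, so it pays to generate the full list up front.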
Cleaning up tags and branches
This step runs through all the imported refs, recreating properly dated annotated tags (but only if they haven't been modified since they were created), and making branches out of everything else. It also renames trunk to master.
The resulting layout is more like what a Git repository should look like, so git tag -l and git branch -l work as expected.
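To give an idea of what recreating a dated annotated tag involves, here is a simplified sketch. It is not the script described above: convert_svn_tags is a name I made up, it assumes the clone used --prefix=svn/ so that tags live under refs/remotes/svn/tags/, and it does not check whether a tag was modified after it was created.

```shell
#!/bin/sh
# convert_svn_tags is a hypothetical function, not the actual cleanup script.
# Assumption: tags were imported under refs/remotes/svn/tags/.
convert_svn_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/svn/tags |
    while read -r ref; do
        name=${ref#refs/remotes/svn/tags/}
        # Reuse the tagged commit's date and subject so the annotated tag
        # is dated like the original release rather than like the import.
        date=$(git log -1 --format=%cD "$ref")
        subject=$(git log -1 --format=%s "$ref")
        GIT_COMMITTER_DATE="$date" git tag -a -m "$subject" "$name" "$ref"
    done
}

# Run inside the imported repository:
#   convert_svn_tags && git tag -l
```

The tagger date of an annotated tag comes from GIT_COMMITTER_DATE, which is why the sketch sets it for each git tag call.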
Restoring merge information
If some of the merges were made by hand or if you didn't use Sam's git-svn then you'll need to recreate merge metadata by hand. Fortunately this is easily done using the .git/info/grafts file.
The grafts file is a simple table of overridden lists of parents for specific commits. The first column is the commit whose parents you want to override, and the rest of the line is the list of new parents to use. For a regular commit there is only one parent, the previous commit. Merges are commits with more than one parent.
Suppose we have a Subversion repository where revision 1 creates a project, revision 2 creates a branch, revision 3 modifies the branch, and revision 4 merges it back into trunk. If it is imported to Git without the merge metadata, revision 4 will have a single parent, revision 1, but its parents should be revisions 1 and 3.
Given the IDs of the imported commits, the line in the .git/info/grafts file that fixes revision 4 lists the ID of revision 4 first, followed by the IDs of its new parents, revisions 1 and 3:
9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e a3db5c13ff90a36963278c6a39e4ee3c22e2a436
If you view the history using GitX or gitk then you should now see revision 4 has become a proper merge.
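If you have more than a couple of merges to restore, a tiny helper keeps the entries consistent. This is only a sketch: graft_line is a hypothetical function of mine, and the IDs are the ones from the entry above.

```shell
#!/bin/sh
# graft_line is a hypothetical helper, not part of Git: it prints one
# grafts entry, the commit to override followed by its new parents.
graft_line() {
    child=$1
    shift
    # "$*" joins the remaining arguments (the parents) with single spaces.
    printf '%s %s\n' "$child" "$*"
}

# IDs from the example: revision 4, then its parents, revisions 1 and 3.
r4=9c6b057a2b9d96a4067a749ee3b3b0158d390cf1
r1=e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e
r3=a3db5c13ff90a36963278c6a39e4ee3c22e2a436

# In a real repository you would append the entry to the grafts file:
#   graft_line "$r4" "$r1" "$r3" >> .git/info/grafts
graft_line "$r4" "$r1" "$r3"
```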
Most people can happily skip this step.
If you'd like to change the history you can now run git rebase --interactive, using the edit command with git commit --amend to clean up individual commits, or squash to combine commits. This is probably a topic for another post, but it's worth mentioning.
However, make sure you keep other tags and branches synchronized when you rebase. This can be done using the grafts file.
The last bit of conversion involves running git svn-abandon-cleanup to clean up SVK style merge commit messages (where the first line is useless with most Git log viewers), and to remove git-svn-id strings.
The actual message filtering is done by the git-svn-abandon-msg-filter script. You can customize this to your liking.
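If you only need part of that behaviour, a message filter can be very small. The following is a hypothetical stand-in, not the git-svn-abandon-msg-filter script itself: strip_svn_id is a name I made up, and it only handles the git-svn-id lines, not SVK style merge messages.

```shell
#!/bin/sh
# strip_svn_id is a hypothetical message filter. It reads a commit message
# on stdin, drops git-svn-id lines, and trims the blank lines left at the end.
strip_svn_id() {
    sed -e '/^git-svn-id: /d' |
    awk '
        { lines[NR] = $0 }
        /[^ \t]/ { last = NR }   # remember the last non-blank line
        END { for (i = 1; i <= last; i++) print lines[i] }
    '
}

# git filter-branch runs the filter as a separate command, so to use this
# you would save the pipeline as an executable script and pass its path:
#   git filter-branch --msg-filter /path/to/your-filter -- --all
```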
Another important side effect of the git filter-branch --all step in git svn-abandon-cleanup is that the grafts entries are incorporated into the filtered commits, so the extra merge metadata becomes clonable.
Finally, all merged branches are removed (using the safe -d option of git branch).
The resulting Git repository should be ready to publish as if you created it locally.
git remote add origin email@example.com:repository.git
git push --all
git push --tags
You can still cleanly import a repository that does not follow the standard directory layout or has other complications (e.g. the repository was moved without importing). Use git-svn to import each directory of history separately and then use grafts to stitch the parts back together.
You can write a script to create a grafts file using git rev-parse, like David Wheeler did for Bricolage.
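A minimal sketch of that idea, assuming you can name the commits involved by ref or revision expression. graft_from_refs is a hypothetical helper of mine (it is not David Wheeler's script), and the ref names in the usage comment are placeholders.

```shell
#!/bin/sh
# graft_from_refs is a hypothetical helper: it resolves each argument to a
# full commit ID with git rev-parse and prints one grafts entry
# (the commit to override first, then its parents).
graft_from_refs() {
    line=
    for ref in "$@"; do
        # --verify makes rev-parse fail on anything that is not a single
        # valid object; ^{commit} peels annotated tags to their commit.
        id=$(git rev-parse --verify --quiet "$ref^{commit}") || {
            echo "cannot resolve $ref" >&2
            return 1
        }
        line="${line:+$line }$id"
    done
    printf '%s\n' "$line"
}

# Possible usage inside the imported repository:
#   graft_from_refs master~3 master~4 some-branch >> .git/info/grafts
```

Resolving names at the time you write the entry matters because the grafts file only understands full commit IDs, not branch or tag names.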
The grafts file can also be used to hide commits, clean up modified tags, etc.