Git Commits are Automagically Recorded in IP Logs

I rolled out some changes last week that pull some information for Intellectual Property (IP) Logs from Git. IP Logs record where the various intellectual property contributions for a project come from; they are an important tool for tracking the provenance of a project.

With these changes, Eclipse project IP Logs now contain an entry in the “Contributors and Their Contributions” section for every Git commit authored by somebody who was not a committer at the time of the commit. Here’s a screenshot of part of the IP log for the Virgo project:

In this screenshot, you can see that Hristo made several contributions to the project before becoming a committer. The commit record below shows that the first of the listed commits (abbreviated to “197178”) has Hristo as its author and Glyn as its committer.

commit 1971786de1ffb81b6e1e610759e20302b036e675
Author:     Hristo Iliev <...>
AuthorDate: Mon Sep 27 12:00:48 2010 +0100
Commit:     Glyn Normington <...>
CommitDate: Mon Sep 27 12:00:48 2010 +0100

    bug 326156: set default ant target

1       1       build-documentation/build.xml

The IP Log tool identifies these contributions in two steps. In the first step, it finds the commit records that have different values in the author and committer fields and caches them in a database to speed up access (parsing the Git log can be pretty expensive). In the second step, it examines the cached records further to determine whether the individual named in the author field was a committer at the time of the commit. Only records authored by a non-committer make it into the log.
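The two steps might look something like the following Python sketch. It is not the IP Log tool's actual code. The `git log` format string, the SQLite table layout, and the `committer_since` mapping (from an email address to the time that person became a committer) are all assumptions made for illustration. The sketch also assumes that author and committer are compared by name and email, and that the committer timestamp counts as "the time of the commit".

```python
import sqlite3
import subprocess
from dataclasses import dataclass

# Assumed log format: fields separated by 0x1f, records terminated by 0x1e.
# %H = commit hash, %an/%ae = author name/email, %cn/%ce = committer name/email,
# %ct = committer timestamp (seconds since the epoch).
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%cn%x1f%ce%x1f%ct%x1e"


@dataclass(frozen=True)
class Commit:
    sha: str
    author_name: str
    author_email: str
    committer_name: str
    committer_email: str
    commit_time: int


def read_log(repo_path: str) -> str:
    """Run git log over every ref in the repository (the expensive part)."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--all", f"--format={LOG_FORMAT}"],
        capture_output=True, encoding="utf-8", check=True,
    ).stdout


def parse_log(raw: str) -> list[Commit]:
    commits = []
    for record in raw.split("\x1e"):
        # strip("\n") rather than strip(): Python treats \x1f as whitespace.
        record = record.strip("\n")
        if record:
            sha, an, ae, cn, ce, ct = record.split("\x1f")
            commits.append(Commit(sha, an, ae, cn, ce, int(ct)))
    return commits


def cache_candidates(db: sqlite3.Connection, commits: list[Commit]) -> None:
    """Step 1: cache commits whose author differs from their committer."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS candidates "
        "(sha TEXT PRIMARY KEY, author_email TEXT, commit_time INTEGER)"
    )
    rows = [
        (c.sha, c.author_email, c.commit_time)
        for c in commits
        if (c.author_name, c.author_email) != (c.committer_name, c.committer_email)
    ]
    db.executemany("INSERT OR IGNORE INTO candidates VALUES (?, ?, ?)", rows)
    db.commit()


def non_committer_contributions(db: sqlite3.Connection,
                                committer_since: dict[str, int]) -> list[str]:
    """Step 2: keep candidates whose author was not yet a committer at commit time.

    committer_since is a hypothetical map from email to the timestamp at which
    that person became a committer; authors missing from it never were.
    """
    result = []
    for sha, email, when in db.execute(
        "SELECT sha, author_email, commit_time FROM candidates ORDER BY commit_time"
    ):
        since = committer_since.get(email)
        if since is None or when < since:
            result.append(sha)
    return result
```

Keying the cache on the commit hash makes re-running the first step harmless, because `INSERT OR IGNORE` skips records that are already cached.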

The cache is updated every time you request the log. The update operation is pretty cheap, but there’s a wrinkle. To keep things speedy, each update reviews only those records whose commit timestamps are newer than the last time we updated the cache. A Git commit’s timestamp records when the commit was created in somebody’s local repository, not when it was pushed to ours, so it’s entirely possible (probable?) that somebody will push commits with timestamps that precede our last cache update. Those commits are skipped, which means some contributions may not be properly represented in the IP log. The only way to be 100% certain is to scan the complete repository.
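Here is a minimal Python sketch of that wrinkle. It assumes the cache is a plain dictionary keyed by commit hash and that "newer" is judged by the committer timestamp. The names `Commit` and `incremental_update` are hypothetical. A commit created at t=950 but pushed after the update at t=1000 is never picked up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    committer: str
    commit_time: int  # committer timestamp: when the commit was created, not pushed


def incremental_update(cache: dict, commits: list, last_update: int) -> None:
    """Cache author != committer commits stamped after the previous update."""
    for c in commits:
        if c.commit_time <= last_update:
            continue  # assumed already seen; this shortcut keeps updates cheap
        if c.author != c.committer:
            cache[c.sha] = c


cache = {}
history = [Commit("a1", "alice", "bob", 900)]
incremental_update(cache, history, last_update=0)     # cache updated at t=1000

# Bob commits Carol's patch locally at t=950, but only pushes it at t=1500.
history.append(Commit("b2", "carol", "bob", 950))
incremental_update(cache, history, last_update=1000)  # next update, at t=2000

print(sorted(cache))  # ['a1']: b2 predates the last update, so it was skipped
```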

Scanning an entire repository (or, as in the case of the Virgo project, multiple repositories) every time the IP Log is requested is too time-consuming to be useful, so a separate process periodically does a complete scan of the repositories to make sure that the caches are up to date.
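A periodic full scan can be sketched as rebuilding each repository's cache from scratch. As a side effect, the rebuild reports which commits the incremental updates missed. This is a hypothetical Python sketch rather than the tool's real implementation, and it models commits as simple `(sha, author, committer, commit_time)` tuples:

```python
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str
    author: str
    committer: str
    commit_time: int


def reconcile(cache: dict, commits: list) -> set:
    """Rebuild one cache from a complete scan; return the hashes it had been missing."""
    candidates = {c.sha: c for c in commits if c.author != c.committer}
    missed = set(candidates) - set(cache)
    cache.clear()
    cache.update(candidates)
    return missed


def reconcile_all(caches: dict, repositories: dict) -> dict:
    """Full scan of every repository a project owns (Virgo has several)."""
    return {
        name: reconcile(caches.setdefault(name, {}), commits)
        for name, commits in repositories.items()
    }
```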

The solution is not ideal, but it should do the job. I’ve looked into using a Git hook to update the cache as new commits are pushed into a repository, but there are some technical challenges that need to be overcome to make that option work. There’s some discussion in Bug 327594.
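One way to avoid running an expensive scan inside a hook is to have the hook do nothing but queue a request for a separate worker. After a push, Git runs the `post-receive` hook and feeds it one `<old-value> <new-value> <ref-name>` line per updated ref on standard input. An all-zero object name stands in for a ref that was created or deleted. The following Python sketch parses those lines and drops a job file into a spool directory. The spool directory, the job format, and the worker that would consume it are all hypothetical, and this is not necessarily the approach discussed in the bug.

```python
import json
import os
import time
import uuid


def is_zero(object_name: str) -> bool:
    """Git uses an all-zero object name for a ref that was created or deleted."""
    return set(object_name) == {"0"}


def parse_post_receive(lines) -> list:
    """Parse post-receive stdin: one '<old> <new> <ref>' line per updated ref."""
    updates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        old, new, ref = line.split(" ", 2)
        if is_zero(new):
            continue  # ref was deleted; there are no new commits to cache
        updates.append({"old": old, "new": new, "ref": ref})
    return updates


def enqueue_recache(queue_dir: str, repo: str, updates: list):
    """Write a job file for a hypothetical worker that refreshes the IP log cache."""
    if not updates:
        return None
    os.makedirs(queue_dir, exist_ok=True)
    job = {"repo": repo, "updates": updates, "queued_at": int(time.time())}
    path = os.path.join(queue_dir, f"recache-{uuid.uuid4().hex}.json")
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(job, f)
    os.replace(tmp, path)  # atomic rename: the worker never sees a half-written job
    return path
```

In an executable `hooks/post-receive` script, this could be wired up as something like `enqueue_recache(spool_dir, os.getcwd(), parse_post_receive(sys.stdin))`. Git runs `post-receive` with `$GIT_DIR` as its working directory, so `os.getcwd()` identifies the repository. The worker would process `recache-*.json` files and ignore the `.tmp` files that are still being written.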

My next step will be to update the entry on handling Git contributions in the Eclipse Wiki.

This entry was posted in EDP.

5 Responses to Git Commits are Automagically Recorded in IP Logs

  1. That’s awesome Wayne… finally, easy ip logging!

  2. Something like `git shortlog -e` would give some quick stats across a repository too…

    • waynebeaton says:

      Yup. But running that across 20 arbitrarily large repositories might get time-consuming. Long-running, resource-intensive web queries are the sort of thing that keeps the webmaster awake at night.

  3. Eike Stepper says:

    What about a post-commit hook that places a “recache” job in a separate processing queue?
