What’s in an IP Log Scan?

For the past week, most of my time has been spent neck-deep in IP Logs.

Eclipse projects are required to submit their IP Log for review prior to any release. With Eclipse Indigo due to release in just under a month (June 22/2011), I’ve been bit-buried in IP Logs. Of the 62 projects participating in Indigo, I have received IP Logs from 57 of them.

Processing an IP Log occurs in two stages. In the first stage, I compare the contents of the log against the code being distributed by the project. Once I give the IP Log my approval, the IP Team works their magic on it, comparing the contents of the log against their records.

The comparison that I perform is rather coarse: I have a script that lists all of the files found in the project’s directory on the download server (including those nested within ZIP/GZ/TAR/JAR files), extracts the “.jar(.pack.gz)?” files from that list, and then tries to resolve each of them. Many of the files (especially those that follow an “org.eclipse.*” convention) are from Eclipse projects; I identify those projects. Many of the files, however, are third-party libraries; I identify the contribution questionnaire (CQ) record (if it exists) for each of these libraries.
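To make the filtering step concrete, here is a minimal sketch in Python. It is not the actual script (which is BASH-based); the function name and the sample paths are made up for illustration, and it assumes the file listing is already available as a list of path strings.

```python
import re

# Matches names ending in ".jar" or ".jar.pack.gz" (the pattern quoted above).
JAR_PATTERN = re.compile(r"\.jar(\.pack\.gz)?$")

def select_bundles(paths):
    """Return the base names of the JAR-like files in a listing, in order."""
    names = [path.rsplit("/", 1)[-1] for path in paths]
    return [name for name in names if JAR_PATTERN.search(name)]

# Hypothetical listing; real input would come from walking the download area.
listing = [
    "rt/plugins/org.eclipse.example.core_1.0.0.v20110601.jar",
    "rt/plugins/com.lowagie.itext_1.5.4.v20110601.jar.pack.gz",
    "rt/readme.html",
    "rt/site.zip",
]
bundles = select_bundles(listing)
```

Only the two JAR-like names survive the filter; the HTML and ZIP entries are dropped. Listing files nested inside archives is left out of this sketch.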

Identifying the project based on the file name is harder than it seems. Many projects follow the third-portion-is-the-project-id convention and are relatively easy to sort out (though even this requires some gymnastics to get correct). Others require that I painstakingly root through the code repository for a project to determine which patterns belong to which projects (do you hear me, Web Tools?) and maintain a mapping of exceptions to the rule. This has allowed me to move past years of repeatedly asking the ECF project where the “ch.ethz.iks.slp” bundles come from (this is code that has been contributed to and is maintained by ECF). I need to be able to sort out which project an “eclipse.org” bundle comes from to ensure that the scanning tool is smart enough to identify third-party JARs that have been pulled in through reuse of Eclipse code (for which we do not require a CQ).
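The convention-plus-exceptions idea can be sketched roughly as follows in Python. This is an illustration only, not the real tool: the function name, the "ecf" identifier string, and the assumption that the bundle name ends at the first underscore are all mine.

```python
# Hypothetical exceptions table: bundle-name prefixes that break the convention.
# The ch.ethz.iks.slp case is the one mentioned above; "ecf" is an
# illustrative identifier, not necessarily the official project id.
EXCEPTIONS = {
    "ch.ethz.iks.slp": "ecf",
}

def project_for(file_name):
    """Guess the owning project of a bundle file, or None if unknown."""
    # Assumption: the bundle name is everything before the first underscore.
    bundle_name = file_name.split("_", 1)[0]
    # Exceptions are checked first, so they override the convention.
    for prefix, project in EXCEPTIONS.items():
        if bundle_name == prefix or bundle_name.startswith(prefix + "."):
            return project
    # Convention: org.eclipse.<project>... (third portion is the project id).
    parts = bundle_name.split(".")
    if len(parts) >= 3 and parts[:2] == ["org", "eclipse"]:
        return parts[2]
    return None
```

A name that matches neither the exceptions nor the convention returns None, which would mark it as a candidate third-party library that needs a CQ lookup.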

Mapping the third-party bundles is a bit challenging. Over the past couple of years, I have created a mapping that pairs file names with corresponding CQs. I have a mapping, for example, that connects CQ 2114 (iText PDF library Version: 1.5.4) with files of the form “com.lowagie.itext_1.5.4.*.jar”, “com.lowagie.itext.source_1.5.4.*.jar”, and “itext_1.5.4.*.jar”. The wildcards in the file names allow me to map to a file regardless of whatever qualifier might be included in the name.
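As an illustration of that wildcard mapping, here is a small Python sketch using the standard fnmatch module. The CQ 2114 patterns are the ones quoted above; the dictionary layout, the function name, and the sample qualifier are assumptions for the example.

```python
from fnmatch import fnmatchcase

# CQ number -> file name patterns. Only CQ 2114 comes from the text above.
CQ_PATTERNS = {
    2114: [
        "com.lowagie.itext_1.5.4.*.jar",
        "com.lowagie.itext.source_1.5.4.*.jar",
        "itext_1.5.4.*.jar",
    ],
}

def find_cq(file_name):
    """Return the CQ number whose patterns match file_name, or None."""
    for cq, patterns in CQ_PATTERNS.items():
        # fnmatchcase matches the whole name, case-sensitively;
        # "*" stands in for whatever qualifier the build added.
        if any(fnmatchcase(file_name, pattern) for pattern in patterns):
            return cq
    return None

# The qualifier below is a made-up example.
match = find_cq("com.lowagie.itext_1.5.4.v200909041530.jar")
miss = find_cq("com.lowagie.itext_2.1.7.v200909041530.jar")
```

Here match is 2114 and miss is None: a different version of the same library falls through, which is the case that needs a new mapping or a new CQ.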

The scanning tool makes confirming the integrity of project downloads a lot easier than the completely manual “scan the directories” approach I used to take. However, every time I run the tool on a new project, or a project with a new release, I inevitably have to add a mapping to my list. This is another painstaking process: often this mapping can be done based on the title of the CQ, but just as often it requires that I break open attachments on a bunch of CQs to hunt for the right one. Sometimes I have to ask the project to help me sort it out.

I do sometimes discover missing CQs. Most of the time, the problem can be solved by creating a new “piggyback CQ” (that is, a CQ for a library that has already been approved for use in another Eclipse project). Infrequently, I find an unexpected use of a library, or a version of a library, that requires a little more work to remediate. In the end, though, the downloads coming from an Eclipse release are squeaky clean.

As I said earlier, the current scan tool is pretty coarse. There’s some cleverness in it, but it depends on some pretty clunky notions. In its current form, the scan tool is a quick and dirty solution to a problem, written during my Linux BASH fascination phase. In the process of hacking it together, I’ve laid some groundwork for better tools (the project/bundle mappings and CQ/file name mappings are the most useful artefacts). I could do a better job, for example, if I could hack into the manifests and determine the dependencies. But I’m not going to do that with BASH/PHP scripts: that’s something that p2 is pretty good at. Any next generation tool will have to be built based on that.

Right now, I have to run the tools manually (from the command-line) on the server; they’re a little on the heavyweight side, so we don’t want to open them up to just let anybody run them at any time. At some point, I intend to set up a low-priority process that just grinds through the projects and generates static reports for each of them.


4 Responses to What’s in an IP Log Scan?

  1. Miles Parker says:

    Wayne, with most projects using either Maven or Buckminster for their project builds, have you considered using those tools for scanning these projects? In some cases you’ll run across false positives — for example I have some 3rd party dependencies that are build time only — but almost by definition you should get all of the artifacts since the build won’t work without them. I know that Buckminster will even create manifests and other such artifacts for you and it even supports the ability to create custom resolution mechanisms.

    • waynebeaton says:

      I did think about possibly using Maven or Buckminster. My concern is that I end up in a position where I have to use one set of tools for some projects and another set of tools for others. Using p2 should allow me to have one set of tools that works everywhere. Even that’s not true, however, as some Eclipse projects produce non-p2 artefacts.

      At this point, any talk of next generation tools is theoretical, anyway.

  2. christian campo says:

    That would be great if we get a “this is wrong in your download” report as soon as it happens…..

    christian
