For the past week, most of my time has been spent neck-deep in IP Logs.
Eclipse projects are required to submit their IP Log for review prior to any release. With Eclipse Indigo due to release in just under a month (June 22/2011), I’ve been bit-buried in IP Logs. Of the 62 projects participating in Indigo, I have received IP Logs from 57 of them.
Processing an IP Log occurs in two stages. In the first stage, I compare the contents of the log against the code being distributed by the project. Once I give the IP Log my approval, the IP Team works their magic on it, comparing the contents of the log against their records.
The comparison that I perform is rather coarse: I have a script that lists all of the files found in the project’s directory on the download server (including those nested within ZIP/GZ/TAR/JAR files), extracts the “.jar(.pack.gz)?” files from that list, and then tries to resolve each of them. Many of the files, especially those that follow an “org.eclipse.*” convention, are from Eclipse projects; I identify those projects. Many of the files, however, are third-party libraries; I identify the contribution questionnaire (CQ) record (if it exists) for each of these libraries.
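The filtering step can be sketched roughly like this (a hypothetical reconstruction, not the actual script; the file names are made up for illustration):

```shell
# Hypothetical sketch of the coarse first pass: given the file names
# found on the download server (including entries nested inside
# ZIP/GZ/TAR/JAR files), keep only the ".jar(.pack.gz)?" entries
# that still need to be resolved.
filter_jars() {
  grep -E '\.jar(\.pack\.gz)?$'
}

# Prints only the two jar entries; the HTML file is dropped.
filter_jars <<'EOF'
org.eclipse.core.runtime_3.7.0.v20110110.jar
epl-v10.html
com.lowagie.itext_1.5.4.v200805222100.jar.pack.gz
EOF
```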
Identifying the project based on the file name is harder than it seems. Many projects follow the third-portion-is-the-project-id convention and are relatively easy to sort out (though even this requires some gymnastics to get correct). Others require that I painstakingly root through the code repository for a project to determine which patterns belong to which projects (do you hear me, Web Tools?) and maintain a mapping of exceptions to the rule. This has allowed me to move past years of repeatedly asking the ECF project where the “ch.ethz.iks.slp” bundles come from (this is code that has been contributed to and is maintained by ECF). I need to be able to sort out which project an “eclipse.org” bundle comes from to ensure that the scanning tool is smart enough to identify third-party JARs that have been pulled in through reuse of Eclipse code (for which we do not require a CQ).
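For bundles that do follow the convention, pulling the project id out of the third segment is a one-liner; the exceptions get a lookup table. A minimal sketch (the exception entry is the ECF case mentioned above; the real mapping is much longer):

```shell
# Hypothetical sketch of the "third portion is the project id"
# convention, with an exception table for bundles that break it.
project_for_bundle() {
  case "$1" in
    ch.ethz.iks.slp*) echo "ecf" ;;     # contributed to and maintained by ECF
    *) echo "$1" | cut -d. -f3 ;;       # org.eclipse.<project>.<rest>
  esac
}

project_for_bundle "org.eclipse.jdt.core_3.7.0.jar"   # jdt
project_for_bundle "ch.ethz.iks.slp_1.0.0.jar"        # ecf
```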
Mapping the third-party bundles is a bit challenging. Over the past couple of years, I have created a mapping that pairs file names with corresponding CQs. I have a mapping, for example, that connects CQ 2114 (iText PDF library Version: 1.5.4) with files of the form “com.lowagie.itext_1.5.4.*.jar”, “com.lowagie.itext.source_1.5.4.*.jar”, and “itext_1.5.4.*.jar”. The wildcards in the file names allow me to map to a file regardless of whatever qualifier might be included in the name.
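The file-name-to-CQ mapping amounts to glob matching; the CQ 2114 patterns above can be expressed directly as shell case patterns (a hypothetical sketch of the idea, not the tool itself):

```shell
# Hypothetical sketch of the file-name-to-CQ lookup. The patterns for
# CQ 2114 (iText 1.5.4) are the ones described in the text; the
# wildcard absorbs whatever qualifier is baked into the file name.
cq_for_file() {
  case "$1" in
    com.lowagie.itext_1.5.4.*.jar|com.lowagie.itext.source_1.5.4.*.jar|itext_1.5.4.*.jar)
      echo "CQ 2114" ;;
    *)
      echo "unknown" ;;
  esac
}

cq_for_file "itext_1.5.4.v200805222100.jar"   # CQ 2114
```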
The scanning tool makes confirming the integrity of project downloads a lot easier than the completely manual “scan the directories” approach I used to take. However, every time I run the tool on a new project, or a project with a new release, I inevitably have to add a mapping to my list. This is another painstaking process: often the mapping can be done based on the title of the CQ, but just as often it requires that I break open attachments on a bunch of CQs to hunt for the right one. Sometimes I have to ask the project to help me sort it out.
I do sometimes discover missing CQs. Most of the time, the problem can be solved by creating a new “piggyback CQ” (that is, a CQ for a library that has already been approved for use in another Eclipse project). Infrequently, I find an unexpected use of a library, or a version of a library, that requires a little more work to remediate. In the end, though, the downloads coming from an Eclipse release are squeaky clean.
As I said earlier, the current scan tool is pretty coarse. There’s some cleverness in it, but it depends on some pretty clunky notions. In its current form, the scan tool is a quick-and-dirty solution, hacked together during my Linux BASH fascination phase. In the process of hacking it together, I’ve laid some groundwork for better tools (the project/bundle mappings and CQ/file name mappings are the most useful artefacts). I could do a better job, for example, if I could hack into the manifests and determine the dependencies. But I’m not going to do that with BASH/PHP scripts: that’s something that p2 is pretty good at. Any next generation tool will have to be built on that.
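For what it’s worth, the simple cases of the manifest idea don’t strictly need p2. A sketch of pulling the Require-Bundle header out of a MANIFEST.MF (this is a hypothetical fragment, not part of the existing tool; manifest headers wrap with a leading space on continuation lines, so those have to be joined first):

```shell
# Hypothetical sketch: join MANIFEST.MF continuation lines (which
# begin with a single space), then print the Require-Bundle header.
# To run against a real bundle, feed it:
#   unzip -p some-bundle.jar META-INF/MANIFEST.MF | require_bundle_header
require_bundle_header() {
  awk '/^ / { line = line substr($0, 2); next }
       { if (line != "") print line; line = $0 }
       END { if (line != "") print line }' \
  | sed -n 's/^Require-Bundle: *//p'
}
```

Resolving the version ranges and optional/re-exported dependencies in those headers is exactly the part p2 already does well, which is why any next-generation tool should lean on it.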
Right now, I have to run the tools manually (from the command-line) on the server; they’re a little on the heavyweight side, so we don’t want to open them up to just let anybody run them at any time. At some point, I intend to set up a low-priority process that just grinds through the projects and generates static reports for each of them.