So the Dash backend process is running. Again. In fact, it’s run to completion once already.
Dash is the Eclipse Foundation service that provides commit data. Dash is used when we generate IP Logs, show graphs on project pages, and run general activity queries via the Dash user interface. It’s designated a “tier 3” service by the Webmaster Team (next business day), it’s not redundant, and it has split ownership (Webmaster owns the server, I own the scripts).
The server that runs the script that generates Dash data fell over about a week and a half ago, causing all of the services that depend on the data to fail. The problems started with the server losing contact with the Eclipse project Git and SVN repositories that Dash mines for data; this resulted in a completely empty database. We solved that problem quickly and restarted the process. Mining data from the bajillion source code repositories we have is a very time-consuming process that takes days to complete. Unfortunately, the machine died before it completed, so it had to be restarted. Then it lost contact with the repositories again.
At this point, I should make it clear that I’m in no way being critical of the Eclipse Webmaster team. I inherited responsibility for this service a few years ago, and have been content to coast with the status quo with regard to service level agreements, even as the importance of the service has grown.
The Dash backend process was designed many years ago. It was, in fact, created when we only had CVS repositories and has been upgraded to include first SVN, and then Git repositories (I did the Git stuff). It sort of works like this:
- Create a new table named “commits_batch”;
- Walk through the CVS commits, summarize them, and add them to commits_batch;
- Walk through the SVN commits, summarize them, and add them to commits_batch;
- Walk through the Git commits, summarize them, and add them to commits_batch;
- Drop the table named “commits” and rename “commits_batch” to “commits”; and
- Generate aggregate data (used to report committer status, render charts, etc.).
Either this all works, or this all fails. When the server lost contact with the SVN and Git repositories, the script failed on those two points but powered through the rest, killing the data collected by the previous successful run in the process.
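To make the failure mode concrete, here is a minimal sketch of the batch-and-swap steps with a guard that leaves the existing “commits” table alone when any mining step fails. This is not the actual Dash script: the use of SQLite, the `run_batch` function, the column names, and the idea of passing each repository walker in as a callable are all assumptions made for illustration.

```python
import sqlite3


def run_batch(conn, miners):
    """Rebuild commits_batch and swap it in only if every miner succeeds.

    `miners` is a list of hypothetical callables (one per repository type),
    each returning (repo, author, revision) rows or raising on failure.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS commits_batch")
    cur.execute(
        "CREATE TABLE commits_batch (repo TEXT, author TEXT, revision TEXT)"
    )
    try:
        for mine in miners:
            rows = list(mine())
            cur.executemany("INSERT INTO commits_batch VALUES (?, ?, ?)", rows)
    except Exception:
        # A miner failed (for example, it lost contact with a repository):
        # discard the partial batch and keep the previous run's data.
        cur.execute("DROP TABLE commits_batch")
        conn.commit()
        return False
    # Every miner succeeded, so it is safe to replace the old table.
    cur.execute("DROP TABLE IF EXISTS commits")
    cur.execute("ALTER TABLE commits_batch RENAME TO commits")
    conn.commit()
    return True


def good_miner():
    # Stand-in for a repository walk that succeeds.
    return [("some.project", "alice", "abc123")]


def failing_miner():
    # Stand-in for a repository walk that cannot reach the repository.
    raise ConnectionError("lost contact with repository")
```

With a guard like this, a run that loses contact with a repository returns `False` and the data from the previous successful run stays in place.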
The server that had been running Dash was pretty old by modern standards, so the Webmaster has retired it and moved Dash to a new server that seems to be pretty solid. I’ve redeployed the scripts there and it’s running now. I still have a few front-end things to sort out, but the backend is running, and the services that depend on the data are catching up.
Damn my optimistic nature. Dash is severely broken. Ask me about workarounds for your IP Logs. We'll get this licked.
— Wayne Beaton (@waynebeaton) September 24, 2013
Dash worked well in the early days. But with changes in technology and some very steady growth in the number of projects and source code repositories, it’s starting to show its age. It was never designed to handle the sort of scale we have today and so it’s well past time to start thinking about a replacement.
I have some thoughts about the replacement:
- Keep the data as current as possible using incremental updates instead of long-running, infrequent batch processes;
- Track authors and committers;
- Track activity in Gerrit, Bugzilla, forums, and mailing lists (commits only provide one metric of project activity); and
- Make Dash a tier 2 service.
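As a rough illustration of the incremental idea in the first point, the sketch below remembers how far into each repository's history it has already read and only processes what is new. Everything here is hypothetical: the `incremental_update` name, the in-memory dictionaries, and the simplifying assumption that each repository's history is an ordered, append-only list of commits.

```python
def incremental_update(repo_commits, last_seen, store):
    """Append only the commits not yet recorded for each repository.

    repo_commits: dict mapping repository name to its ordered commit list
    last_seen:    dict mapping repository name to the count already processed
    store:        list of (repo, commit) tuples collected so far
    """
    for repo, commits in repo_commits.items():
        start = last_seen.get(repo, 0)
        new_commits = commits[start:]
        store.extend((repo, commit) for commit in new_commits)
        # Advance the marker so the next run skips what was just recorded.
        last_seen[repo] = start + len(new_commits)
    return store
```

Because existing rows are never dropped, a run that cannot reach a repository would simply add nothing for it rather than wiping out what is already there.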