The Eclipse Development Process doesn’t make any specific requirements on what projects are supposed to distribute. Projects are required to operate in an open and transparent manner, so I guess that we could say that projects are required to distribute their source code. And they do distribute source code via Git, SVN, and CVS. But how projects provide their code outside of the source code repositories is really left to the individual projects.
Most Eclipse projects distribute something from the download server. Some projects provide p2 repositories. Some provide archive files (generally in the form of ZIP or tar.gz) of project bundles. Some of these archives include bundles from other projects (including project bundles and third-party libraries). Eighty (80) projects, for example, distribute at least one bundle from EMF alongside their own bundles. Seventy-five (75) projects ship at least one bundle from Eclipse Platform. Thirty-five (35) projects include bundles from ECF with their distribution.
I can’t remember the last time that I installed any Eclipse software from an archive file (if you excuse the Eclipse for RCP and RAP Developers package, that is). It’s literally been years since I’ve downloaded any other project’s archive file and installed it “the old-fashioned way”.
I posed the archive question on Twitter. @dougschaefer replied with “Off-line installs”. This makes sense when you consider users who are stuck behind a firewall and can’t use p2. @irbull suggested that building target environments is still problematic and it is oftentimes easier to just piece together a handful of archives. @njbartlett added that “offline installs are crucial for setting up training courses”. These are great reasons to keep these archives, so I’m willing to accept that having them is valuable (my own usage patterns notwithstanding).
But then, why do so many projects include bits from other projects? Some of the downloads are “all-in-one” packages: convenient distributions that pull all the necessary bits together into a single handy download. This is also good stuff, but certainly adds a lot of weight on our download server when you think about multiple versions of “all-in-one” packages for multiple platforms. This can easily become gigabytes of data in very short order.
I started this discussion because disk space use continues to increase at an alarming rate. We are very concerned not only about the rising cost of maintaining our own servers and backups, but also about the cost to our many mirrors. Some mirrors are already selective with regard to what they rsync from our servers. And Denis has done a good job of identifying files that should not be replicated. Even after that pruning, however, there are a lot of bits left. As the volume increases, we run the risk of angering and possibly losing mirrors.
If we assert that all the different forms in which project code is distributed today are necessary and vital to the ongoing health of the projects and community, what should we say about retention? I know that many projects have a retention policy for the bits they provide for download. Some even document their policy. For most projects, however, it’s more ad hoc. How long before bits are moved from the download server to the archive server? How long before they’re just moved to the big bit-bin in the sky? What–if anything–do you keep forever? How long are nightly, integration, and milestone builds retained?
I really don’t want to create a formal policy for this. Projects have enough burden without adding still more to the pile. This is, however, the sort of thing that projects need to think about. At least a little. It’s another example of the tragedy of the commons: the longer we wait for others to step up and take care of the problem for us, the more likely it is that we’re all going to lose out.
It would be helpful, for example, if files–especially large ones–that aren’t downloaded all that much could be moved to the archive server, which is not mirrored. Assuming the download script is used, this move will be transparent to your community. If files start to become more popular, they can be moved back.
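For projects that want to find those files, here is a rough sketch of how the selection might look. Everything in it is an assumption made for illustration: the `DownloadFile` record, the `archive_candidates` function, the 50 MB size threshold, and the download-count cutoff are all hypothetical, and nothing here reflects an actual Eclipse Foundation tool or a source of download statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadFile:
    """Hypothetical record for one file on the download server."""
    path: str
    size_bytes: int
    recent_downloads: int  # downloads over whatever window a project chooses


def archive_candidates(files, min_size_bytes=50 * 1024 * 1024, max_downloads=10):
    """Return large, rarely downloaded files, biggest first.

    Both thresholds are illustrative defaults, not recommended values.
    """
    picked = [
        f for f in files
        if f.size_bytes >= min_size_bytes and f.recent_downloads <= max_downloads
    ]
    # Largest files first: moving them frees the most mirrored space.
    return sorted(picked, key=lambda f: f.size_bytes, reverse=True)
```

Running the same check in reverse (files on the archive server whose download counts have climbed) would identify the ones worth moving back.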
I consider this the start of the discussion (or–more likely–the continuation of an ongoing discussion). What can the Eclipse Foundation do to help?
For some reason I’m reminded of the bit in This is Spinal Tap, “Are there any requests? … get off the what?”