Simon Phipps
Columnist

What you can learn from the monster LibreOffice project

analysis
Feb 15, 2013

More developers with varying skills, yet better code? More iterations, yet greater stability? It's happening with LibreOffice

A large legacy code base is a challenge for any team to embrace and improve. So how well does a distributed team of volunteers address the problem?

A talk at FOSDEM shed light on how the large and diverse team assembled by The Document Foundation (TDF) is approaching the huge LibreOffice code base left in the wake of Oracle’s withdrawal from OpenOffice.org. The result is not only an impressive sequence of on-time releases but also a range of development innovations. In particular, the “bi-bisect” technique they’ve developed could be a great approach for others faced with large, complex code bases.


The talk, “LibreOffice: Cleaning and Refactoring a Giant Code Base,” was delivered by Michael Meeks, a developer employed by Suse who has been working on LibreOffice (and OpenOffice.org before it) since 2000. Meeks covered both the development challenges of LibreOffice and the new features of the 4.0 release, which InfoWorld covered on release day. But the narrative chronicling the development challenges was instructive, inspiring, and worth digging into.

Lowering barriers

How did the community form in the first place? A core of developers carried over their work from the former OpenOffice.org community, but the key move was to make development easy and fun to join by removing barriers to participation. The mailing lists were tuned to welcome newcomers rather than to favor existing developers, a page of “easy hacks” offered newcomers small, tasty morsels to chew over, and extensive README files were made easy to find.

The project adopted Git, the best-known and most widely used open source version control system, to make contributing easy. The process has been simplified further by adding the Gerrit code review system, which enables what Meeks described as “permission-free commits.”

The code base was hard to build, so the project set up automated Tinderbox continuous-integration build servers. Any developer can now work on the code without having to maintain complex build environments for multiple operating systems. The code has also been substantially cleaned up, including translating comments from German to English to make them accessible worldwide (most developers have English as at least a second language). The clean-up further involved refactoring many old approaches into modern ones and eliminating unused code left over from defunct platforms; this is 20-year-old code, after all.

Most recently, the project has dealt with larger, more ambitious refactoring, such as reimplementing the Microsoft document filters and introducing layout-based dialogs in place of hard-coded options. Meeks covered a number of significant tasks that are in progress; his slides (PDF) offer full details.

Upholding quality

All this change to a complex, fragile, legacy code base could well lead to breakages. Indeed, regressions were a constant issue for LibreOffice. To deal with them, the project has taken several approaches to raise code quality without sacrificing progress in the name of stability. They’ve greatly increased the number of unit tests, which lets more changes be made more quickly and helps the project keep pace with the ever-increasing flow of contributions from newcomers.

Two “cultural” improvements have helped a lot. First, the interface to the Bugzilla bug tracker has been greatly enhanced, making it relatively easy for end-users to report problems in enough detail that they can be reproduced. The Bugzilla Assistant is graphical and workflow-based, leading users through the otherwise complex process of reporting a bug. Second, the release timeline has been locked into a predictable schedule. Developers therefore know when their work will ship (if it meets the quality threshold), and people are prevented from playing politics by trying to sync releases with the work of high-status individuals. Both improvements have boosted LibreOffice.

All the same, breakage happens. A brilliantly simple invention by team member Bjoern Michaelsen of Canonical now lets anyone on the QA team locate the changes that cause regressions. Git includes the git bisect command, which helps pinpoint regressions. The developer identifies a version of the code that exhibits the defect under investigation, as well as a historic version that did not. Git then offers a commit between those two and asks whether that version exhibits the defect. By repeatedly halving the range of suspect commits, Git rapidly isolates the point at which the defect was introduced.
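At its core, this process is a binary search over history. The sketch below is a minimal Python model of the idea. It assumes a linear history where `commits[0]` is known good and `commits[-1]` is known bad, and where a defect, once introduced, persists. The names `first_bad_commit` and `is_bad` are illustrative only. The real git bisect also copes with merges and with commits that cannot be tested.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary search for the first commit that exhibits a defect.

    Assumes chronological order, commits[0] known good, commits[-1] known bad.
    Returns the first bad commit and how many commits had to be tested.
    """
    good, bad = 0, len(commits) - 1   # invariant: good index is good, bad index is bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2       # offer the midpoint commit for testing
        tests += 1
        if is_bad(commits[mid]):
            bad = mid                 # defect already present: search earlier half
        else:
            good = mid                # defect not yet present: search later half
    return commits[bad], tests


commits = [f"c{i}" for i in range(1025)]   # c0 known good, c1024 known bad
culprit, tests = first_bad_commit(commits, lambda c: int(c[1:]) >= 700)
print(culprit, tests)   # prints "c700 10"
```

Each answer halves the suspect range, so 1,024 suspect commits need only 10 tests.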

A new enhancement for Git

Standard enough, but LibreOffice is a huge, complex code base, often with 50 or more commits per day. Building any one of them could take several hours, making the bisect process impractical.
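A rough estimate shows why. Bisecting n suspect commits takes at most ceil(log2 n) tests, and without stored binaries every test means a full build. The 50-commits-per-day figure comes from the talk. The one-month window and three-hour build time below are assumptions, chosen only to illustrate "several hours."

```python
def bisect_tests(candidates: int) -> int:
    """Worst-case tests needed to isolate one regression among `candidates`
    suspect commits following a known-good version, i.e. ceil(log2(n))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, computed with
    # exact integer arithmetic rather than floating-point logarithms.
    return max(candidates - 1, 0).bit_length()


COMMITS_PER_DAY = 50   # figure quoted for LibreOffice
DAYS = 30              # assumption: regression introduced within the last month
BUILD_HOURS = 3        # assumption: stand-in for "several hours" per build

candidates = COMMITS_PER_DAY * DAYS   # 1,500 suspect commits
tests = bisect_tests(candidates)
print(f"{tests} builds, about {tests * BUILD_HOURS} hours of compiling")
# prints "11 builds, about 33 hours of compiling"
```

The number of tests stays small. The cost of each individual test is what makes the search impractical.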

Michaelsen realized this could be overcome by storing the binary builds the Tinderboxes create after each commit. He has established a repository specifically for them, so it’s now easy to check out the fully built version of LibreOffice that matches each commit offered for testing by git bisect. Using this “binary bisect” approach (shortened to “bi-bisect”), a relatively inexperienced tester who doesn’t even know how to build the code can locate a defect in a quarter of the time a single build would take. Even subtle bugs needing detailed investigation have surrendered to bi-bisect. Indeed, one bug I reported has been fixed this way.
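Bi-bisect leaves the search itself unchanged. What changes is the cost of each step, which becomes a lookup of a stored build instead of a compile. The Python sketch below models this. The `prebuilt` dictionary is a hypothetical stand-in for the repository of stored builds. The install paths and the `reproduces` check are also illustrative assumptions, not the actual LibreOffice tooling.

```python
from typing import Callable, Dict, Optional, Sequence


def bibisect(commits: Sequence[str],
             prebuilt: Dict[str, str],
             reproduces: Callable[[str], bool]) -> Optional[str]:
    """Bisect by running stored builds instead of compiling each commit.

    commits:    chronological list; commits[0] known good, commits[-1] known bad
    prebuilt:   hypothetical store mapping commit id -> path of a ready-made build
    reproduces: the tester's manual check, run against a build path
    Returns the first bad commit, or None if a needed build is not stored.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        build = prebuilt.get(commits[mid])   # a lookup, not a multi-hour compile
        if build is None:
            return None                      # no stored binary for this commit
        if reproduces(build):
            bad = mid
        else:
            good = mid
    return commits[bad]


history = ["a1", "b2", "c3", "d4", "e5"]                       # a1 good, e5 bad
store = {c: f"/builds/{c}/program/soffice" for c in history}   # hypothetical paths
print(bibisect(history, store, lambda p: p.split("/")[2] in {"d4", "e5"}))
# prints "d4"
```

The tester only needs to run each build and answer yes or no. No build environment is required.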

Does all this innovation deliver? Meeks showed statistics suggesting it does. Despite complex refactoring and a diverse developer community with varying skill levels, stability is improving and bug levels remain under control.

Meeks offered a lesson from all this. He noted that traditional enterprise development uses a slow, conservative process to achieve quality because conventional wisdom says a higher rate of change decreases quality. But he suggested the curve has two sides: beyond a certain point, moving faster can improve quality rather than erode it. TDF’s approach is so open and so rapid that, despite the high rate of change in the code, the low-process environment, rapid build approach, and rapid release timeline together mean new bugs are fixed fast.

There’s much to learn from this experience. Even if your development team is unwilling to embrace the super-rapid iteration approach, techniques such as bi-bisect, continuous integration, simplified bug reporting, and permissionless commits can all be adopted piecemeal.

People first

The most important point of all was right at the start. Meeks emphasized that an open source project may seem to be about code, but is actually about people.

Developers and end-users gather in an open source community because of their own interests and needs. To succeed, any community must focus on this: on ethos, on reciprocity rather than exploitation, on licensing as a constitution for the community, on friendship, and especially on keeping everything fun.

LibreOffice succeeds not because of the code but because of the way the people who work on it self-organize and treat one another, starting each involvement with assumed trust and empowerment. That’s a lesson every open source community needs to take away.

This article, “What you can learn from the monster LibreOffice project,” was originally published at InfoWorld.com.


Simon Phipps is a well-known and respected leader in the free software community, having been involved at a strategic level in some of the world's leading technology companies and open source communities. He worked with open standards in the 1980s and on the first commercial collaborative conferencing software in the 1990s. He helped introduce both Java and XML at IBM and, as head of open source at Sun Microsystems, opened its whole software portfolio, including Java. Today he's managing director of Meshed Insights Ltd, president of the Open Source Initiative, and a director of the Open Rights Group and the Document Foundation. All opinions expressed are his own.
