Thin Air

Versioning Smalltalk

Having worked in Smalltalk for a few years now, I find I occasionally forget just how different it is from the mainstream world of programming. The other day Avi posted about the recent interest in versioning systems and how what we're doing in Monticello is both similar to and different from what's going on in other languages.

On the one hand, we're wrestling with the same information-theoretic problems as all other versioning systems. Essentially we want to be able to merge the work done by developers working separately in such a way that changes that don't affect each other are handled automatically, but those that do conflict are detected so that a human can figure out how to harmonize them. We want the merge process to be fast, the history data to be compact, and the restrictions placed on how developers work to be minimal.

On the other hand, Smalltalk code isn't like that of other languages. The issue isn't so much where it's stored - text files or image files - but how it's created. The structures needed to execute the code at runtime, classes and compiled method objects, are built up directly by the development tools. The only text involved is little snippets that make up method bodies. Heck, even when Smalltalk is written out to a text file, that file just contains a series of expressions that can be compiled and executed to rebuild the same executable objects in another image.

So for large parts of a Smalltalk program, there is no text to version. This is a problem because it means versioning Smalltalk programs with the same tools that the rest of the world uses is very difficult.

It can be done, of course. The precursor to Monticello was called DVS, and was mainly concerned with representing Smalltalk code textually so that we could version it with CVS. It would scan the text files for CVS's conflict markers and present them to the user for resolution. This worked OK most of the time, and was an improvement over collaborating via change sets.

But CVS has problems (hence the need for Subversion, Arch, Monotone, darcs, Codeville, BitKeeper etc.), and DVS wasn't able to completely bridge the gap between the objects created by the Smalltalk dev tools and the textual representations that CVS was dealing with. The result was lots of bogus conflicts. If two developers created methods that sorted near each other alphabetically, for example, that would be a textual conflict as far as CVS was concerned, but not a conflict at all in the Smalltalk world.
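To make the bogus-conflict point concrete, here is a minimal sketch in Python of a three-way merge that works on definitions rather than lines of text. It is not how DVS or Monticello is implemented; the `merge` function and the idea of keying each method by a (class name, selector) pair are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each method is keyed by (class name, selector),
# and the value is the method's source text.
Key = Tuple[str, str]
Snapshot = Dict[Key, str]


def merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[Key]]:
    """Three-way merge at the level of definitions.

    Returns the merged snapshot plus the keys that genuinely conflict.
    Conflicting keys are left out of the merged snapshot so that a human
    can decide what to do with them.
    """
    merged: Snapshot = {}
    conflicts: List[Key] = []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b: Optional[str] = base.get(key)
        o: Optional[str] = ours.get(key)
        t: Optional[str] = theirs.get(key)
        if o == t:
            result = o          # both sides agree (or both removed it)
        elif o == b:
            result = t          # only their side changed this definition
        elif t == b:
            result = o          # only our side changed this definition
        else:
            conflicts.append(key)  # both sides changed it, differently
            continue
        if result is not None:
            merged[key] = result
    return merged, conflicts
```

With this model, two developers who each add a method to the same class (say `area` and `asString`, which sort next to each other) merge cleanly, because the two definitions have different keys. Only when both developers change the same method in different ways does a key land in the conflict list.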

In trying to work around these problems, DVS had grown from a "little utility" for versioning Smalltalk code with CVS into a versioning system that used CVS as a backend. The only way to improve it was to ditch CVS and do the versioning in Smalltalk. And this is where the lack of a textual representation turned into an opportunity.

A Monticello snapshot is a list of definitions that make up a package. Working with them is almost absurdly easy compared to working with text. The standard diffing and patching that tools like CVS do is trivial, and that let us put our effort into solving the harder problems that the post-CVS generation of tools is tackling. As Avi noted, the solutions we came up with work, but they're not very elegant, and now we're looking for better ones.
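As a rough illustration of why diffing and patching become trivial once a snapshot is a collection of definitions, here is a small Python sketch. The names `Patch`, `diff` and `apply_patch` are hypothetical, and a snapshot is reduced to a dictionary from (class name, selector) to method source; real Monticello definitions cover more than methods, so treat this as a toy model rather than the actual implementation.

```python
from typing import Dict, NamedTuple, Tuple

# Hypothetical model of a snapshot: (class name, selector) -> method source.
Key = Tuple[str, str]
Snapshot = Dict[Key, str]


class Patch(NamedTuple):
    added: Snapshot                        # definitions only in the target
    removed: Snapshot                      # definitions only in the base
    modified: Dict[Key, Tuple[str, str]]   # key -> (old source, new source)


def diff(base: Snapshot, target: Snapshot) -> Patch:
    """Compare two snapshots definition by definition."""
    added = {k: v for k, v in target.items() if k not in base}
    removed = {k: v for k, v in base.items() if k not in target}
    modified = {
        k: (base[k], target[k])
        for k in base.keys() & target.keys()
        if base[k] != target[k]
    }
    return Patch(added, removed, modified)


def apply_patch(base: Snapshot, patch: Patch) -> Snapshot:
    """Return a new snapshot with the patch applied; base is left untouched."""
    result = dict(base)
    for key in patch.removed:
        result.pop(key, None)
    result.update(patch.added)
    for key, (_old, new) in patch.modified.items():
        result[key] = new
    return result
```

There is no line matching or longest-common-subsequence search here: the diff is a few dictionary comprehensions, and applying `diff(base, target)` to `base` gives back `target`.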

Now, Smalltalkers tend to be enthusiastic about Smalltalk, and that can come across as arrogant. zippy's reaction isn't all that unusual. But I think language holy wars are a distraction from the intent of Avi's post. Smalltalk really is different from other languages, and that makes it interesting. What happened with Monticello is a recurring pattern. There are lots of tools out there that the Smalltalk community can't use, and so we're forced to write our own. Fortunately, doing so is easier than one might think, and what we end up with is pretty good.

The other thing that's easy to miss is that the Monticello approach can be applied to any language, not just Smalltalk. It's a bit more work, because you need to parse the language syntax before doing versioning operations, and of course, you lose the language-independence that text-based tools have traditionally enjoyed.

Even so, I think mainstream versioning systems will end up there eventually. IDEs are leading the way - Eclipse, IDEA and their ilk are gradually replacing generic text editors like vi and Emacs, opening the way for syntax-aware versioning. The Stellation project was pursuing this, though it doesn't seem to have made progress for a while.

In the meantime, it'll be interesting to see how Monticello evolves as we make the most of our handicap.
