Thin Air

Essential Code Implemented

A while back I posted on the question of which should be considered more authoritative, source code or byte code. The conclusion I came to was that neither is ideal as a "canonical" representation of a program; an abstract syntax tree would better fill that role.

Well, that notion stuck with me and I've started working on a simple tree-shaped representation for Smalltalk code. The idea driving this project is fairly simple: there's no "one true representation" of a program. It's really quite an abstract thing and it needs to be represented differently in different contexts. However, the most abstract representation is the AST, which can be easily converted to other forms as needed.

The AST form is most natural for manipulation by tools such as browsers, debuggers, type inferencers, version control, translators etc. ASTs can be executed directly - this is how the Ruby interpreter works, for example - but Smalltalk traditionally compiles to bytecode, which can be more efficiently interpreted by the VM.

For presentation to the programmer, you want yet another form - class browsers and source code. And there may be other representations that are useful for presenting to the programmer: class diagrams, pattern summaries etc. (This is one of the core concepts of Intentional Programming as Darius Clarke commented on my last post on this topic.)

So the idea is to shift between these representations as fluidly as possible, and preserve as much of the available information as possible. So the AST form preserves much of the formatting information that the programmer originally entered with the source code, and can reconstruct that source faithfully.

However, that goal shouldn't get in the way of optimizing a particular representation for its context, which is the whole point of multiple representations in the first place. I'm really interested in and excited by projects for optimizing Smalltalk execution, such as Eliot Miranda's AOStA or Bryce Kampjes' Exupery. In optimizing bytecodes or native code for fast execution, we may lose the information-equivalence between compiled methods and their ASTs, and that's OK.

In fact it's a good thing, because decoupling the representations used by the tools and the VM can make each more flexible and more powerful. Take the "senders" button in the browser, for example. If we optimize away certain message sends by inlining the methods they call, we interfere with the browser's ability to trace the senders. If the browser is operating on the AST, however, we don't have that problem. We are free to optimize the compiled methods for fast execution, the AST for ease of analysis, and the programmer's representation for clarity.
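To make the "senders" point concrete, here is a minimal sketch of a senders query that works purely on syntax trees. It uses Python's standard ast module as a stand-in for a Smalltalk AST, which is an assumption of this example and not the representation described above; the names SOURCE and senders_of are hypothetical.

```python
import ast

# A tiny "image" of two methods; the contents are made up for illustration.
SOURCE = '''
def area(shape):
    return shape.width() * shape.height()

def describe(shape):
    return "area: " + str(area(shape))
'''

def senders_of(selector, source):
    """Return the names of functions whose bodies contain a call to `selector`."""
    tree = ast.parse(source)
    found = []
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            callee = node.func
            # A call looks like name(...) or receiver.name(...).
            if isinstance(callee, ast.Attribute):
                name = callee.attr
            else:
                name = getattr(callee, "id", None)
            if name == selector:
                found.append(func.name)
                break  # one hit per function is enough
    return found

print(senders_of("area", SOURCE))
print(senders_of("width", SOURCE))
```

Because the query never looks at compiled code, it gives the same answer no matter how aggressively the calls are inlined or otherwise optimized at execution time.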

The first application of this new representation will be in OmniBrowser, which I'm in the process of adapting to operate on syntax trees rather than directly on the runtime. (Actually, OmniBrowser already has a layer of indirection between it and the runtime - this is what makes things like the Package Browser possible - so this will actually be a simplification of that layer.)

Further down the road, I'd also like to use the same package representations in Monticello, since they provide a much richer model of the package, and could allow versioning and merging at a finer grain than the current model allows.

Posted in compilers

Going Places

In the last couple of days I've had to do a lot of running around in the course of having my wisdom teeth removed. (Remember that chapter of Cryptonomicon that dealt with Randy's wisdom teeth? Yeah. I had the "easy" one out today. It took 2 hours instead of 10 minutes, and the dentist was amazed by the gnarly roots on the "monstrosidad" of a tooth he wrenched from my skull.) Anyway, I had to do a lot of running around. And along the way, I noticed once again what a livable city Quito is.

First, there usually isn't much need to go very far. I don't know what the zoning policies of Metropolitan Quito are, or even if they have any, but they work well. There's a good mix of residential and commercial usage just about everywhere, and so most things are within walking distance. Very little of it is fancy, but it works.

Then, if you do have to travel a significant distance, it's both cheap and easy. Taxis, for example, are everywhere, and you can go pretty well anywhere for less than $5. I rarely pay more than two. You have to be a little careful about shifty cabbies, but it doesn't take long to learn the ropes.

The buses, though, top everything. They're even more common than cabs, they go everywhere, and they only cost a quarter. However, riding a bus is not for the faint of heart. It took me a couple of months of acclimatization before I had the guts to try it. The thing is, there's no actual "bus system" in Quito. It's just an entrepreneurial free-for-all of little bus companies trying to make a buck moving Quiteños to and fro.

That's not to say that the buses are random. There are established routes, and these are posted on a sign in the front window of the bus. So you have about two seconds - between the sign becoming legible and the bus passing you - to decide if it's going your way. There's not much room on the sign, so it's a list of street names and landmarks, often abbreviated. You've got to be fairly familiar with the city to fill in all the gaps and do all the vector additions you need to make a decision.

The other thing is that, if you do decide to flag one down, it usually won't come to a complete stop. Ok, for little old ladies and children, the driver will take special pains. But such an obviously young and healthy fellow as myself just doesn't inspire that level of service. So you want to judge your opening carefully. You may have to cross a lane or two of traffic to get to the bus, and getting hit by a taxi won't advance your cause. Once you're there, getting on is pretty easy. The driver slows down, the barker gets out of the way (more about him in a moment), you grab the conveniently placed handholds, and up you go.

So the barriers to entry are fairly high, but once you've got things figured out, it's a nearly ideal system. Yesterday, I wandered out to the corner and hopped on a bus that looked like it was going in roughly the direction I wanted. As luck would have it, it dropped me off right in front of the radiograferia. Today my luck didn't quite hold: I had to walk 4 blocks to the dentist's office.

I think the division of labour has something to do with it. The driver drives, and the barker handles everything else. Mostly that means hanging out of the open door of the bus haranguing pedestrians with the bus route, but he also collects money, answers questions, keeps an eye out for the cops if there are more passengers than seats, and helps little old ladies aboard.

Going places in Quito is so much fun.

Posted in ecuador

More Static on Types

There's another thread on Lambda the Ultimate which I've been following with some interest. Unfortunately it seems to have degenerated into a static- vs. dynamic-typing flame war, which happens all too frequently for my taste.

It's quite a shame really, because I think the functional languages community and the dynamic languages community have a lot to learn from each other. (And since we're all outcasts from the mainstream computing world, we ought to stick together.) These are a few thoughts I've had about why the two groups communicate so poorly.

First, I think there's often confusion around terminology. In particular, the static/dynamic and strong/weak typing dichotomies are often conflated.

Labelling a language as statically- or dynamically-typed refers to the way variables are treated during compilation. In statically-typed languages, the compiler attaches type information to variables, and uses that information to catch type errors and perform optimizations. Compilers in dynamically-typed languages treat variables simply as named references to values, and leave it until runtime to determine how to perform operations on them.

The strong/weak dichotomy refers to how values are treated at runtime. Strongly typed languages attach type information to values, and programs cannot alter those types, while weakly typed languages treat values as "bits in memory" and how those bits are treated is largely a convention.

Programming languages are often either weakly- and statically-typed (e.g., C++) or strongly- and dynamically-typed (Smalltalk), which may be one of the reasons for confusion.
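A small sketch may help separate the two axes. Python is used here only because, like Smalltalk, it lands in the strongly- and dynamically-typed corner under the definitions above; the helper names describe and add_mixed are made up for this example.

```python
def describe(value):
    # Strong typing: the value itself carries its type at runtime,
    # whatever name happens to refer to it.
    return type(value).__name__

# Dynamic typing: the name x is just a reference, with no type of its own.
x = 42
first = describe(x)
x = "forty-two"
second = describe(x)

def add_mixed():
    # The runtime refuses to reinterpret a string's bits as a number;
    # the mismatch is reported instead of being silently "made to work".
    try:
        return "1" + 1
    except TypeError:
        return "TypeError"

print(first, second, add_mixed())
```

In a weakly- and statically-typed language the roles are reversed: the compiler pins a type to each variable, but a cast can make the same bits in memory be treated as something else.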

Another thing that seems to enter into these debates is perspective. They often have the feel of "mathematicians vs. engineers" or "theory vs. practice". The exchanges end up being endless repetitions of, "Static typing guarantees that you won't introduce certain kinds of errors," which is rebutted with, "Yes, but it also guarantees that I can't get my work done easily."

Dynamic folks like to take a gardening approach to programming. They're up to their knees in the mud, hands dirty, planting and pruning, swatting bugs as they appear and composting weeds for fertilizer. They view the system as a living, evolving thing, and value testing, feedback and iterative development for figuring out what works and what doesn't. They don't worry about ensuring that everything goes right from the beginning, because a little pruning or landscaping can fix any problems that come up.

Static folks, on the other hand, take the architecture approach. They sit at a drafting table, and design structures of concrete and steel. They view the products of their work as monuments which must withstand the pressures of time, and they work hard to imbue them with mathematical grace and harmony. They know that structural failures can be catastrophic, so they build safety into their designs from the beginning.

Ok, maybe I went a bit overboard with the metaphor, but I hope this illustrates the different perspectives I'm talking about. Being a Smalltalker, I tend to prefer the iterative approach to development. Or rather, having learned the hard way that I can't plan for every eventuality, I appreciate that Smalltalk lets me quickly develop possible solutions, gather feedback and begin the cycle again. That said, I also appreciate the utility of the mathematical tools used by the functional programming community. What I want is a system that combines the agility of Smalltalk with the robustness of, say, Haskell.

And there's no reason the two need to be placed in opposition. Strongtalk was a great example of a dynamic system that allowed rapid development, yet also provided tools for performing static type analysis in order to catch errors. In addition, the Strongtalk VM did optimizations based on type information gathered at runtime, which eventually wound up in Sun's HotSpot Java VM.

Frank Atanassow has promised to write up his take on the whole issue, in a paper called "How to argue about types," which I figure will be quite interesting as a reasonable view from "the other side."

Posted in compilers

Street Vendors

You can buy nearly anything from street vendors in Quito. Phone cards, sunglasses, CDs and DVDs, bananas, corn, limes or chochos. Just about anything you can carry easily. One time I passed a man standing next to an old-fashioned drugstore scale - 5 cents to weigh yourself.

Walking home from the grocery store this afternoon, I passed a man selling record players on the street. And not the kind of thing you would use to spin a party, either. No, these were genuine imitation Victrolas - wooden boxes with turntables on top and great big horns attached by complicated plumbing.

I declined to buy one.

Posted in ecuador

Assembling Turtles

Lambda the Ultimate recently had a post on High Level Assembly with a link to an example of Object Oriented HLA. Reading that code is just creepy. There's something very attractive about being able to create powerful OO abstractions, while at the same time being able to control the machine at a low level. This is one of the things I like about Smalltalk, actually, although in that case it's low-level control of the virtual machine. On the other hand, I shudder at the thought of writing real software in assembly. (I guess I'm showing my age with that statement.)

This reminds me of a point made by Ian Piumarta about "language levels" in Squeak. At the top is eToys, a prototype language used in education. It lets children create graphical objects using paint tools and attach scripts to them to provide behaviour. Despite its simplicity, eToys is quite powerful; it can be used to create complex animations and simulations.

At some point though, kids grow up, and some of them might like to peek behind the curtain and see what's going on at the next level down, in Smalltalk. At this level, objects belong to classes. We can create instances and send messages to them, or create new classes and methods for responding to those messages.

Below that is a level that might be called meta-Smalltalk. It's the part of Smalltalk that deals with its own implementation - Metaclasses, the Compiler, MethodContexts, etc.

Below meta-Smalltalk is the virtual machine, which is implemented in a Smalltalk-like language called Slang. Slang is syntactically valid Smalltalk, and in fact, the Squeak VM can run a copy of itself, but Slang's semantics are such that it can be translated into C and compiled into native machine code for fast execution.

But here there's a bit of a disconnect, because the VM level is different from all the levels above. The higher levels of Squeak are all integrated into the same environment - to move from eToys to Smalltalk, for example, one need only click a button in the script viewer to see the Smalltalk code for a particular script. To move from Smalltalk to meta-Smalltalk, one need only click the "class" button in a browser or bring up a debugger.

To make changes at the VM level, though, one has to generate and build a separate VM; the Slang implementation is not accessible from Smalltalk. And this is what is appealing about HLA. Its lowest level of execution is accessible from the level above. Of course, HLA only has two levels, and neither of them is accessible at runtime - as in Slang, the program has to be reassembled and launched again before the changes can take effect.

Still though, it makes me wonder about the possibility of creating a system that really is turtles all the way down. It seems like Exupery might be a component of such a system, as might Ian's VVM project. Perhaps a more dynamic conception of HLA might even play a role.

Posted in compilers

STS 2004: Getting Away With Smalltalk

Well, Smalltalk Solutions is over, and after several days on the road, I'm finally home. Blogging from the conference turned out to be a non-starter, as I ended up using my spare moments to redo my presentation slides. So I've got a backlog of posts to catch up on. Up-to-the-minute reporting wasn't my intent anyway - James Robertson and Michael Lucas-Smith did a fine job of that. Instead, I want to share what I learned at StS.

Avi's keynote introduced what turned out to be a key theme of the conference: there are certain areas of the software industry where you can get away with using non-mainstream languages, and these are opportunities to use the productivity we gain from Smalltalk to competitive advantage. Avi's focus is web applications, and his talk walked through their evolution from CGI to the use of sessions, components and ultimately, continuations. With each step, the application design becomes more sophisticated, with more powerful object-oriented abstractions. In the end, we have Seaside - Avi's Smalltalk framework for web apps.

At one point, Avi quoted Cees de Groot:

Seaside is to most other Smalltalk web toolkits as Smalltalk is to most
other OO languages; it's as simple as that.

That quote captures the gut-feeling reaction I have to Seaside. It's good in the same way that Smalltalk is good. It's the way things should be, but rarely are.

And Seaside is an especially attractive application of that Smalltalk goodness, because it's one of those where you can get away with using it. Nobody cares what's running on your server, and most of the time they can't even tell. What they do care about is the usability and functionality of your application, and your ability to tune it in response to their needs. And that's where Smalltalk is a competitive advantage.

That theme was picked up by Lars Bak in his keynote on Resilient, and in fact by just about all of the talks about Smalltalk for embedded and mobile systems. The Resilient VM fits into 32K of memory, and can run an application in 128K. It can be remotely debugged or dynamically updated without interrupting the functioning of the system. Try doing that in Java or C, which are the mainstream languages for these tiny systems.

Web applications and embedded systems are quite different beasts, but they do have two things in common: First, they both execute in a controlled environment, communicating with the world through standard protocols. Second, the business environment in which they are deployed is extremely competitive. The business value they provide and the ability to adapt to changing business conditions are more important than conforming to industry norms. I wonder where else we might find these conditions...

Posted in community

STS 2004: Cryptography and Smalltalk

Martin Kobetic made a really excellent presentation on the VisualWorks Cryptography API. He walked through the major elements of cryptography - stream cyphers, block cyphers, message authentication codes, hashes, digital signatures etc. In each case he demonstrated the VW implementation.

The surprising thing that became apparent as Martin spoke is just how simple this stuff actually is. The VW API is extremely simple; generally you can do straightforward tasks with a couple of message sends. Even the implementation seems pretty straightforward - the hard part isn't making it work, but making sure there are no hidden vulnerabilities.

One aspect of the presentation was very, very cool. The slides were presented using a custom VW app which had a very effective method of demonstrating the various cyphers. By selecting an area of the display, Martin could apply a cypher to that part of the display bitmap. This was really handy for visualizing encryption - after applying the cypher, blocks of the screen would appear to be white noise. He also did a great demonstration of the catastrophic effect of reusing a key with a block cypher - when two areas of the screen were encrypted with the same key and overlaid, the original images were plainly visible, no decryption necessary.
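The key-reuse effect can be reproduced in a few lines. The post doesn't say which cypher or mode was used in the demo, so this sketch assumes a setup where the cypher produces a keystream that is XORed with the data, and that "overlaying" means XORing the two encrypted areas. The toy keystream built from SHA-256 is an illustration only, not the VisualWorks API and not something to use for real encryption.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: hash the key with a running counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, len(plaintext)))

key = b"reused key"
image_a = b"first image bits"
image_b = b"other image data"

# Overlay (XOR) the two encrypted areas: the keystream cancels out.
overlay = xor(encrypt(key, image_a), encrypt(key, image_b))
print(overlay == xor(image_a, image_b))
```

What's left after the overlay is just the two originals combined, with no trace of the key, which is why the images on screen showed through without any decryption.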

Congrats to Martin for a very clear presentation of a difficult topic.

Posted in community

Essential Code

Not long ago, Avi Bryant posed an interesting question on the squeak-dev list: "which is more authoritative, the source code or the bytecode?" Or to put it another way, what is the essence of a program, and how can we represent that in the machine?

From an information-content standpoint, the two forms are nearly equivalent. Source code is compiled to bytecode which can be decompiled back into source code. But there are subtle differences - bytecode loses the temporary names and formatting of the original source code, and with it something of the author's intent. On the other hand, the bytecode is better connected to the runtime system - variables are bound, selectors have been interned, etc. But neither is really suitable as an "authoritative" form of a method, at least not from a tools perspective.

There are two problems with source code. The first is that it's out of date. It represents the method at the time the author compiled it, but (as Avi mentioned in his original post) that same string might not compile now, because of changes elsewhere in the system. At the same time, source code is really difficult for tools to work with. It has to be parsed for even such "simple" operations as selecting a message send or variable, to say nothing of an operation like "browse senders."

The problem with bytecode, on the other hand, is that it's an implementation detail. It's meant to be executed by the VM, and the performance of the system depends on the VM's ability to do that efficiently. So a CompiledMethod's ability to represent the abstract structure of the method is held hostage by the need to optimize its execution.

Now, for a lot of purposes, a method would ideally be represented as an abstract syntax tree. If it were carefully designed, the AST could carry enough information to reconstruct the original source code with the author's formatting intact, and would be equally easy to convert to optimized bytecode or even native code for execution. Best of all, it would be easier to write tools such as the Refactoring Browser or SmallLint which take advantage of an easy-to-examine-and-manipulate representation of methods.
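As a rough illustration of one tree feeding two targets, here is a sketch using Python's standard ast module (ast.unparse needs Python 3.9 or later). Python stands in for Smalltalk here, and its AST is not the carefully designed one imagined above: it drops comments and layout, so the regenerated source is only an approximation of what the author typed.

```python
import ast

SOURCE = (
    "def double(n):\n"
    "    total = n + n   # temp name and comment\n"
    "    return total\n"
)

tree = ast.parse(SOURCE)

# Target 1: regenerate source for the programmer.
# Temp names survive, but the comment and spacing do not.
regenerated = ast.unparse(tree)

# Target 2: compile the very same tree to bytecode for execution.
code = compile(tree, filename="<ast>", mode="exec")
namespace = {}
exec(code, namespace)
result = namespace["double"](21)

print(regenerated)
print(result)
```

The lost comment is exactly the gap a formatting-preserving AST would have to close before it could serve as the canonical form.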

In all the systems I'm familiar with, ASTs are very transient things - produced during compilation but immediately thrown away. We'd have to rearrange quite a few of the basic assumptions of the system in order to use ASTs as the canonical representation of source code. There would be practical considerations as well - how much space does an AST require compared to a CompiledMethod or a chunk in the .changes file? How long would it take to generate bytecode from an AST?

Serialization and compression may help overcome some of these problems. This paper by Stork and Haldar presents a way of encoding ASTs based on their grammar, and is designed for fast decoding by a Just-In-Time compiler.

I'm not planning on writing a new VM for Squeak anytime soon, but I can't help thinking that some of these ideas will find their way into Monticello and OmniBrowser, one way or another. OB already uses Squeak parse nodes to do syntax-based selection in code browsers, and I'm very interested in Andy Tween's Shout package, which was released today.

Posted in monticello

First Post

They call it the Middle of the World. I'm writing from Quito, Ecuador - a stone's throw from the Equator and 2850 meters above the sea. (For the metrically-challenged, that's a little shy of 10000 feet.) I'm several months into a year-long "sabbatical" - a chance to learn Spanish, expand my cultural horizons, and explore the beauty of the Andes.

Geography notwithstanding, I'm off to Seattle for Smalltalk Solutions next week. I'm giving a talk on the new tools being developed by the Squeak community and the culture that shapes them. This is a subject that's fascinated me for some time; I'm looking forward to seeing what people have to say about it.

Naturally I'm going to pay particular attention to Monticello and OmniBrowser, the projects I've been working on for the last couple of years. This site is my attempt to put a little more support behind them, so they can be useful to a wider audience.

As I learn more about building agile development tools, navigating Latin society and trekking in the Andes, I'll post my findings here.

Posted in ecuador