Thin Air

Everything about squeak

Scripting languages and IDEs

Posted by Colin Putney on October 20, 2006

On the Squeak development list there's been a lot of talk lately about creating a scripting language based on Squeak. On the surface it seems like a great idea. Scripting languages are popular, dynamism is in vogue, and it would be nice to be able to use Smalltalk for all the day-to-day utilities and admin tools that tend to get written in Perl or Ruby. On top of that, the main drawback of scripting languages is that there aren't any good IDEs for them. Squeak has a great IDE, and should be able to provide a great script development environment.

I'm pretty skeptical of the idea, because I think scripting languages and IDEs are like oil and water. They just don't mix. What follows is a post I made to the Squeak list defending this position. First, I'd like to define some terms.

IDE - This is a program that allows one to view and manipulate another program in terms of its semantic elements, such as classes and methods, rather than in terms of the sequence of characters that will be fed to a parser. IDEs might happen to display text, but they also provide tools like class browsers, refactoring and other transformations, and auto-completion of identifiers: things that require a higher-level model of the program than text. Examples include various Smalltalk implementations, Eclipse, Visual Studio, IDEA.

Scripting language - a programming language and execution model where the program is stored as text until it is executed. Immediately prior to execution, the runtime environment is created, the program's source code is parsed and executed, and then the runtime environment is destroyed. This is an important point - the state of the runtime environment is not preserved when execution terminates, and one invocation of a program cannot influence future invocations.

Now, one might quibble over my definition of "scripting language." Fine, I agree that it's not a good general definition of the term as it's used every day. But it describes an important feature of languages like Ruby, Python, Perl, JavaScript, and PHP, and one that makes IDEs for those languages particularly hard to write.

Damien Pollet brought up the key issue in designing a Smalltalk-based scripting language - should the syntax be declarative or imperative?

Imperative syntax gives us a lot of flexibility and power in the language. A lot of the current fascination with Ruby stems from Java programmers discovering what can be done with imperative class definitions. The Ruby pickaxe book explains this well:

In languages such as C++ and Java, class definitions are processed
at compile time: the compiler loads up symbol tables, works out how much
storage to allocate, constructs dispatch tables, and does all those other
obscure things we'd rather not think too hard about. Ruby is different. In
Ruby, class and module definitions are executable code.

Executable definitions are how metaprogramming is done in scripting languages. Ruby on Rails gets a lot of mileage out of this, essentially by adding class-side methods that can be called from within these executable class definitions to generate a lot of boring support code. In Java, we can't modify class definitions at runtime, and that's why Java folks use so much XML configuration.

Python does this too. Perl5 is pretty weird, but Perl6 is slated to handle class definition this way as well. JavaScript doesn't have class definitions, but we can build up pseudoclasses by creating objects and assigning functions to their properties.

When writing an executable class definition, we have the full power of the language available. We can create methods inside of conditionals to tailor the class to its environment. We can use eval() to create methods by manipulating strings. We can send messages to other parts of the system. We can do anything.
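To make this concrete, here is a minimal sketch in Python, one of the languages mentioned above. The class name `Config` and its methods are hypothetical, invented purely for illustration. It shows a class body acting as ordinary code: one method is chosen by a conditional, and two more are generated from strings.

```python
import os


class Config:
    """A class whose body is ordinary code, run when the class is defined."""

    # A method defined inside a conditional, tailoring the class
    # to the environment it is being defined in.
    if os.name == "nt":
        def path_separator(self):
            return "\\"
    else:
        def path_separator(self):
            return "/"

    # Methods created by manipulating strings: each pass through the loop
    # executes a generated `def` inside the class namespace.
    for _name in ("host", "port"):
        exec(
            f"def get_{_name}(self):\n"
            f"    return self._values[{_name!r}]\n"
        )
    del _name  # the loop variable would otherwise become a class attribute

    def __init__(self, **values):
        self._values = values


config = Config(host="example.org", port=8080)
print(config.get_host(), config.get_port())
```

Nothing in the text of the class body spells out `get_host` or `get_port`; they exist only once the definition has been executed.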

I'm making a big deal out of this, because I think it's a really, really important feature of modern scripting languages.

Declarative syntax, on the other hand, gives us a lot of flexibility and power in the tools. Java, C++ and C# have declarative class definitions. This means that IDEs can read in the source code, create a semantic model of it, manipulate that model in response to user commands, and write it back out as source code. The source code has a canonical representation as text, so the code that's produced is similar to the code that was read in, with the textual changes proportional to the semantic changes that were made in between.

This is really hard to do with scripting languages, because we can't create the semantic units of the program just by parsing the source code. We actually have to execute it to fully create the program's structure. This is problematic for an IDE for many reasons: the program might take a long time to run, it might have undesirable side effects (like deleting files), and in the end, there's no way to tell whether the program structure we end up with is dependent on the input to the program.
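The gap between what a parser sees and what execution produces can be sketched in Python. Everything here is an illustrative assumption: the `Report` class, the `OUTPUT_FORMAT` input, and the two helper functions are made up for the example. The static view looks only for methods written directly in the class body, which is roughly what a simple class browser would do.

```python
import ast

# A small script whose class structure depends on an input (OUTPUT_FORMAT)
# and on code generated from strings.
SOURCE = '''
class Report:
    if OUTPUT_FORMAT == "html":
        def render(self):
            return "<p>report</p>"
    for _field in ("title", "author"):
        exec(f"def get_{_field}(self): return {_field!r}")
    del _field
'''


def static_methods(source):
    """Method names written directly in class bodies, found by parsing alone."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            names += [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
    return names


def runtime_methods(source, output_format):
    """Method names that actually exist after executing the script."""
    namespace = {"OUTPUT_FORMAT": output_format}
    exec(source, namespace)
    members = vars(namespace["Report"])
    return sorted(
        name for name, value in members.items()
        if callable(value) and not name.startswith("_")
    )


print(static_methods(SOURCE))
print(runtime_methods(SOURCE, "html"))
print(runtime_methods(SOURCE, "text"))
```

A smarter static analysis could descend into the `if` and find `render`, but it still could not say whether `render` exists without knowing the input, and the generated getters are invisible to it entirely.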

Even if we did have a way to glean the program structure from a script, there would be no way to write it back out again as source code. All of the metaprogramming in the script would be undone, partially evaluated, as it were, and we'd be stuck with whatever structures were created on that particular invocation of the script.

So, it would appear that we can have either a powerful language, or powerful tools, but not both at the same time. And looking around, it's notable that there are no good IDEs for scripting languages, and none of the languages that have good IDEs lend themselves to metaprogramming.

There is, of course, one exception. Smalltalk.

With Smalltalk, we have the best of both worlds. A highly dynamic language where metaprogramming is incredibly easy, and at the same time, a very powerful IDE. We can do this because we sidestep the whole issue of declarative vs. imperative syntax by not having any syntax at all.

In Smalltalk, classes and methods are created by executing Smalltalk code, just like in scripting languages. That code creates objects which reflect the semantic elements of the program, just like in the IDEs for compiled languages. One might say that programs in compiled languages are primarily state, while programs in scripting languages are primarily behavior. Smalltalk programs are object-oriented; they have both state and behavior. The secret ingredient that makes this work is the image - Smalltalk programs don't have to be represented as text.

And that's why a Smalltalk-like scripting language wouldn't be worthwhile. It leaves out the very thing that makes Smalltalk work so well - the image. It would have to have syntax for creating classes - either imperatively or declaratively. We'd end up limiting either the language or the tools, or if we tried hard enough, both.

I'd much rather see a Smalltalk that let me create small, headless images, tens or hundreds of kilobytes in size, with just the little bits of functionality I need for a particular task. If they had good libraries for file I/O, processing text on stdin/stdout and executing other command-line programs, they'd fill the "scripting language" niche very well. If they could be created and edited by a larger IDE image, they'd have the Smalltalk tools advantages as well.

I have high hopes for Spoon in this regard. Between shrinking, remote messaging and Flow, it's already got most of the ingredients. It just needs to be packaged with a stripped down VM, and integrated into the host operating system.

Posted in ide semantics refactoring language design scripting squeak

Announcements

Posted by Colin Putney on July 8, 2006

The basic design strategy for OmniBrowser is simple: rather than modelling a browser with one large and complex object (like Browser does), break it up into a network of smaller, simpler objects. From there, the design is pretty straightforward, and it's much easier to build lots of kinds of browsers from the same code base.

This design does have a downside, though. It makes event handling more difficult, because the objects that need to communicate to respond to events are often in distant parts of the network, and can't rely on the structure of the network to find each other. Early versions of OmniBrowser responded to events, such as a click, with a cascade of messages, with each object letting its neighbors know about the event. This had the advantage that each object only needed to know about its immediate neighbors, but it was also fragile and prone to infinite loops as neighbors repeatedly notified each other of the same event.

My second attempt to address this problem involved the use of a Dispatcher. This was a central object that all notification messages would flow through. As the various parts of the browser were created, they would register with the dispatcher to receive messages. This was an improvement, because objects could send messages to "everybody" rather than to an explicit receiver. But it was still awkward, and the event handling code was still convoluted and difficult to understand.
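The Dispatcher idea can be sketched roughly as follows. This is not OmniBrowser's actual code: it is written in Python rather than Smalltalk, and the method names `register` and `broadcast`, along with the two listener classes, are assumptions made for illustration.

```python
class Dispatcher:
    """Central hub that every notification flows through (hypothetical API)."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        """Called by each part of the browser as it is created."""
        self._listeners.append(listener)

    def broadcast(self, message, *args):
        """Send `message` to "everybody": each listener that implements it."""
        for listener in list(self._listeners):
            handler = getattr(listener, message, None)
            if callable(handler):
                handler(*args)


class ClassList:
    """Hypothetical browser part that tracks the current selection."""

    def __init__(self):
        self.selected = None

    def selection_changed(self, name):
        self.selected = name


class CodePane:
    """Hypothetical browser part that shows source for the selection."""

    def __init__(self):
        self.shown = None

    def selection_changed(self, name):
        self.shown = f"source of {name}"


dispatcher = Dispatcher()
class_list, code_pane = ClassList(), CodePane()
dispatcher.register(class_list)
dispatcher.register(code_pane)
dispatcher.broadcast("selection_changed", "Object")
```

The sender no longer needs to know its receivers, but every listener sees every message name, which hints at why the handling code could stay hard to follow.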

I've just finished up the implementation of my third attempt, this time based on Vassili Bykov's notion of Announcements. I talked to the folks at Cincom about porting the code to Squeak, but that didn't work out. I ended up just doing a mini-implementation that meets my needs for OmniBrowser. (Actually this was probably what I should have done in the first place. It was probably less work for me to re-implement Announcements from scratch than it would have been for someone at Cincom to get corporate approval to release the code under an open source license.)

Despite all the positive things Vassili had to say about Announcements, I have to admit I was surprised at what an improvement it made in OmniBrowser's event handling code. My first pass at the conversion was simple. I replaced messages sent to the dispatcher with announcements sent to the announcer. Then I installed an announcement spy and browsed around the image a bit. It turned out that every event resulted in 3 or 4 redundant announcements, and probably even more unnecessary updates to the UI.
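For readers who haven't seen the pattern, here is a minimal Python sketch of the general Announcements idea: events are objects, subscribers register for an announcement class, and an announcer delivers each announcement to the matching subscribers. The method names, the `SelectionChanged` class, and the way the spy is built (a subscriber to the root class that logs everything) are assumptions for illustration, not the API of the Squeak or VisualWorks implementations.

```python
from collections import Counter


class Announcement:
    """Root class for all events; subclasses carry event-specific data."""


class SelectionChanged(Announcement):
    def __init__(self, selection):
        self.selection = selection


class Announcer:
    def __init__(self):
        self._subscriptions = []  # (announcement_class, handler) pairs

    def subscribe(self, announcement_class, handler):
        self._subscriptions.append((announcement_class, handler))

    def announce(self, announcement):
        """Deliver to every handler subscribed to this class or a superclass."""
        for announcement_class, handler in list(self._subscriptions):
            if isinstance(announcement, announcement_class):
                handler(announcement)


class AnnouncementSpy:
    """Logs every announcement by subscribing to the root class."""

    def __init__(self, announcer):
        self.log = []
        announcer.subscribe(Announcement, self.log.append)

    def counts(self):
        return Counter(type(a).__name__ for a in self.log)


announcer = Announcer()
spy = AnnouncementSpy(announcer)
seen = []
announcer.subscribe(SelectionChanged, lambda a: seen.append(a.selection))

# Two code paths announcing the same event: the spy makes the duplicate visible.
announcer.announce(SelectionChanged("Object"))
announcer.announce(SelectionChanged("Object"))
print(spy.counts())
```

Because the spy sees the full stream, a repeated announcement for a single user action shows up as a count greater than one.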

So I made a second pass, explicitly aimed at removing all the redundant announcements. In many cases, this meant finding the ultimate source of a particular announcement. For example, OBSelectionChanged should only be announced from two places in the code. All the other places where it was being announced were redundant, and had to be removed. By spying on announcements, I was able to get a clearer idea of the code flow in response to different events, and find other ways to simplify.

I suspect there's even more simplification that can be made, but even without it, moving to Announcements was a big improvement.

Posted in omnibrowser announcements visualworks smalltalk squeak

Monticello 2 alpha release

Posted by Colin Putney on May 24, 2006

One of the things that surprised me at Smalltalk Solutions this year was the continuing interest in Monticello 2 from outside the Squeak world. Now that I'm not working in VisualWorks day-to-day anymore, I've been more focused on solving the problems that we have with using Monticello 1 in Squeak.

However, there is a real need for tools to make cross-dialect development easier, and versioning is an important component of that. After doing a few demos, I had volunteers to maintain VisualAge and Dolphin ports. The VisualWorks folks all seem pretty busy, but I'm sure somebody will step up when MC2 gets to production quality.

With all that momentum coming out of the conference, I cleaned up the code a bit, wrote an installer and posted the first alpha to SqueakMap. The reaction has been mostly positive, particularly given that Monticello 2 is still very raw and there's no documentation at all.

To remedy that, I'll post some discussion of the architecture and features of Monticello 2 over the coming weeks.

Posted in smalltalk monticello dolphin visualworks visualage squeak