Thin Air

Everything about semantics

Scripting languages and IDEs

On the Squeak development list there's been a lot of talk lately about creating a scripting language based on Squeak. On the surface it seems like a great idea. Scripting languages are popular, dynamism is in vogue, and it would be nice to be able to use Smalltalk for all the day-to-day utilities and admin tools that tends to get done in Perl or Ruby. On top of that, the main drawback of scripting languages is that there aren't any good IDEs for them. Squeak has a great IDE, and should be able to provide a great script development environment.

I'm pretty skeptical of the idea, because I think scripting languages and IDEs are like oil and water. They just don't mix. What follows is a post I made to the Squeak list defending this position. First, I'd like to define some terms.

IDE - This is a program that allows one to view and manipulate another program in terms of it's semantic elements, such as classes and methods, rather than in terms of the sequence of characters that will be fed to a parser. IDEs might happen to display text, but they also provide tools like class browsers, refactoring and other transformations, auto-completion of identifiers etc, things that require a higher level model of the program than text. Examples include various Smalltalk implementations, Eclipse, Visual Studio, IDEA.

Scripting language - a programming language and execution model where the program is stored as text until it is executed. Immediately prior to execution, the runtime environment is created, the program's source code is parsed and executed, and then the runtime environment is destroyed. This is an important point - the state of the runtime environment is not preserved when execution terminates, and one invocation of a program cannot influence future invocations.

Now, one might quibble over my definition of "scripting language." Fine, I agree that it's not a good general definition of everyday use of the term. But it's an important feature of languages like Ruby, Python, Perl, Javascript, and PHP and one that makes IDEs for those languages particularly hard to write.

Damien Pollet brought up the key issue in designing a Smalltalk-bases scripting language - should the syntax be declarative or imperative?

Imperative syntax gives us a lot of flexibility and power in the language. A lot of the current fascination with Ruby stems from Java programmers discovering what can be done with imperative class definitions. The Ruby pickaxe book explains this well:

In languages such as C++ and Java, class definitions are processed
at compile time: the compiler loads up symbol tables, works out how much
storage to allocate, constructs dispatch tables, and does all those other
obscure things we'd rather not think too hard about. Ruby is different. In
Ruby, class and module definitions are executable code. 

Executable definitions is how metaprogramming is done in scripting languages. Ruby on Rails gets a lot of milage out of this, essentially by adding class-side methods that can be called from within these executable class definitions to generate a lot of boring support code. In Java, we can't modify class definitions at runtime, and that's why Java folks use so much XML configuration.

Python does this too. Perl5 is pretty weird, but Perl6 is slated to handle class definition this way as well. Javascript doesn't have class definitions, but we can build up pseudoclasses by creating objects and assigning functions to their properties.

When writing an executable class definition, we have the full power of the language available. You can create methods inside of conditionals to tailor the class to it's environment. You can use eval() to create methods by manipulating strings. You can send messages to other parts of the system. You can do anything.

I'm making a big deal out of this, because I think it's a really, really important feature of modern scripting languages.

Declarative syntax, on the other hand, gives us a lot of flexibility and power in the tools. Java, C++ and C# have declarative class definitions. This means that IDEs can read in the source code, create a semantic model of it, manipulate that model in response to user commands, and write it back out as source code. The source code has a cannonical represenation as text, so the code that's produced is similar to the code that was read in, with the textual changes proportional to the semantic changes that were made in between.

This is really hard to do with scripting languages, because we can't create the semantic units of the program just by parsing the source code. You actually have to execute it to fully create the program's structure. This is problematic to an IDE for many reasons: the program might take a long time to run, it might have undesirable side effects (like deleting files), and in the end, there's no way to tell whether the program structure we end up with is dependent on the input to the program.

Even if we did have a way to glean the program structure from a script, there would be no way to write it back out again as source code. All of the metaprogramming in the script would be undone, partially evaluated, as it were, and we'd be stuck with whatever structures were created on that particular invocation of the script.

So, it would appear that we can have either a powerful language, or powerful tools, but not both at the same time. And looking around, it's notable that there are no good IDEs for scripting languages, but none of the languages that have good IDEs lend themselve to metaprogramming.

There is, of course, one exception. Smalltalk.

With Smalltalk, we have the best of both worlds. A highly dynamic language where metaprogramming is incredibily easy, and at the same time, a very powerful IDE. We can do this because we sidestep the whole issue of declarative vs. imperative syntax by not having any syntax at all.

In Smalltalk, classes and methods are created by executing Smalltalk code, just like in scripting languages. That code creates objects which reflect the semantic elements of the program, just like in the IDEs for compiled languages. One might say that programs in compiled languages are primarily state, while programs in scripting languages are primarily behavior. Smalltalk programs are object-oriented; they have both state and behavior. The secret ingredient that makes this work is the image - Smalltalk programs don't have to be represented as text.

And that's why a Smalltalk-like scripting language wouldn't be worthwhile. It leaves out the very thing that makes Smalltalk work so well - the image. It would have to have syntax for creating classes - either imperatively or declaratively. We'd end up limiting either the language or the tools, or if we tried hard enough, both.

I'd much rather see a Smalltalk that let me create small, headless images, tens or hundreds of kilobytes in size, with just the little bits of functionality I need for a particular task. If they had good libraries for file I/O, processing text on stdin/stdout and executing other commandline programs, they'd fill the "scripting language" niche very well. If they could be created and edited by a larger IDE image, they'd have the Smalltalk tools advantages as well.

I have high hopes for Spoon in this regard. Between shrinking, remote messaging and Flow, it's already got most of the ingredients. It just needs to be packaged with a stripped down VM, and integrated into the host operating system.

Posted in ide semantics refactoring language design scripting squeak