Thin Air

Scalability

Now that I've been at Quallaby for a little while, I've begun to get a sense of what is going on in our app. The most striking thing is an apparent contradiction: At first glance it's an incredibly boring, even trivial application. We fetch files from remote machines, parse them, and load the data into a relational database. Users view the data via a web-based reporting tool. But when you look more closely, the gymnastics we have to go through to accomplish this are amazing.

One reason is scale. We're processing statistical data gathered from devices on very large networks. The exact volume varies from customer to customer of course, but it's pretty easy to get over a million records per minute going into the database, hour after hour, day after day. It's so much data that statistical reports on it can't be computed on the fly. It all has to be pre-computed as the data is loaded, or the reporting interface won't be responsive enough to be usable. Of course, that puts even more stress on the backend - that "trivial" application to fetch data and load it into the database.
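To make the pre-computation idea concrete, here's a minimal sketch of rolling raw per-minute samples up into hourly aggregates as they're loaded, so reports read small pre-computed rows instead of scanning millions of raw records. The names (`hourKey`, the sample shape) are invented for illustration, not Quallaby's actual code.

```javascript
// Truncate a sample's timestamp to the hour, keyed per device/port.
function hourKey(sample) {
  const hour = Math.floor(sample.timestamp / 3600) * 3600;
  return `${sample.device}:${sample.port}:${hour}`;
}

// Fold incoming samples into hourly aggregates as they stream in.
function load(samples) {
  const hourly = new Map();
  for (const s of samples) {
    const key = hourKey(s);
    const agg = hourly.get(key) || { count: 0, sum: 0, max: -Infinity };
    agg.count += 1;
    agg.sum += s.value;
    agg.max = Math.max(agg.max, s.value);
    hourly.set(key, agg);
  }
  return hourly; // in a real system, flushed to pre-aggregate tables
}
```

The point is that each report query then touches one aggregate row per hour rather than sixty raw rows per minute, at the cost of doing this bookkeeping in the loading backend.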

Another source of interesting complications is the nature of the statistics we need to compute. Conceptually they're pretty simple; for example, the number of packets sent or received on a particular port of a particular device. But the method for locating those bits of data is enormously variable, as each type of network device presents the information differently. So we make this part of the application scriptable, and turn over the job of dealing with the quirks of ATM paths, in-octets and QoS thresholds to the networking experts.
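One way to picture the "scriptable" part: the core pipeline stays generic, and a small per-device-type function knows where each statistic lives in that device's native record format. The device types and record shapes below are hypothetical, purely to illustrate the shape of the problem.

```javascript
// Each entry maps a device type to a function that pulls the common
// statistics out of that device's quirky native format.
const extractors = {
  'acme-router': rec => ({ port: rec.ifIndex, inOctets: rec.in_octets }),
  'zeta-switch': rec => ({ port: rec.portId,  inOctets: rec.octetsIn }),
};

// The pipeline calls this without knowing anything device-specific.
function extract(deviceType, record) {
  const fn = extractors[deviceType];
  if (!fn) throw new Error(`no extractor for ${deviceType}`);
  return fn(record);
}
```

Handing the networking experts the ability to write (or tweak) those little extractor functions is what keeps device quirks out of the core application.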

As a result, the subproblems we have to solve to get data from A to B are fascinating. Take scripting: we currently have several DSLs for specifying how data should be handled as it passes through the various stages of processing on its way to the database. There are too many of them, in fact, so we're working on consolidating the user-scriptable portions of the app around two languages: ECMAScript and SQL.


From a computer-science point of view these are really interesting choices. On one hand we have a dynamic, imperative, prototype-based OO/functional language. ECMAScript might be described as a cross between Self and Lisp, wrapped up in C syntax. It fits in nicely with many of the things we're used to doing in the Smalltalk world, but with a more mainstream syntax.
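A tiny example of what "Self plus Lisp in C syntax" means in practice: objects delegate to prototypes rather than instantiating classes, and functions are first-class values you can pass around and close over. This is generic ECMAScript, not anything Quallaby-specific.

```javascript
// Prototype delegation, as in Self: no classes, just objects.
const counter = {
  count: 0,
  increment() { this.count += 1; return this.count; }
};

// A new object that inherits counter's behavior and adds its own.
const resettable = Object.create(counter);
resettable.reset = function () { this.count = 0; };

// Functional style, as in Lisp: higher-order functions and closures.
const times = (n, f) => { for (let i = 0; i < n; i += 1) f(); };
times(3, () => resettable.increment());
```

For Smalltalkers, the fit is natural: everything is an object, behavior is looked up dynamically, and blocks/closures carry over almost directly.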

On the other hand, we have SQL, a declarative query language based on relational algebra. But instead of executing the queries against tables in a database, we're applying them to virtual tables representing data in network devices, intermediate results moving through the processing pipeline, or any one of several tables in the central database. Naturally, the implementations of both languages have to be robust, memory-efficient and fast.
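The "virtual table" idea can be sketched in a few lines (in JavaScript rather than SQL, to stay self-contained): a virtual table is anything that can produce rows, and the same declarative SELECT/WHERE shape runs against all of them, whether the rows come from a device, a pipeline stage, or the database. The table and column names here are made up.

```javascript
// A virtual table is anything with a rows() method yielding row objects.
// This one pretends to be the counters on a network device.
const deviceTable = {
  rows: () => [
    { port: 1, inOctets: 1200 },
    { port: 2, inOctets: 300 },
    { port: 3, inOctets: 4500 },
  ]
};

// A toy "SELECT columns FROM table WHERE predicate" evaluator.
function select(table, columns, predicate) {
  return table.rows()
    .filter(predicate)
    .map(row => Object.fromEntries(columns.map(c => [c, row[c]])));
}

// Roughly: SELECT port FROM deviceTable WHERE inOctets > 1000
const busy = select(deviceTable, ['port'], r => r.inOctets > 1000);
```

The payoff of the declarative approach is that the same query works unchanged no matter which stage of the pipeline supplies the rows; only the `rows()` implementation differs.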

Personally, I'm fascinated by computer languages, so for me this is the most interesting part of what we're doing. But there are gobs of other interesting problems that we run into: memory management, execution optimization, cluster computing, etc. Recently we've been digging into the research that Google Labs has been doing in this space. We're nowhere near Google's scale, of course, but we're running into many of the same issues they are, and every good idea helps.

Posted in design