Saturday, January 19, 2008

On Blub

There's an interesting (old) discussion thread at Raganwald.com:
http://weblog.raganwald.com/2006/10/are-we-blub-programmers.html

Blub Theory evolves over the course of the comments. First off, the requirements for Blubbiness are defined as:
  1. There's at least one language that's worse for the task at hand, and the programmer realizes (validly) that it's less suited for the task than Blub.
  2. There's at least one language that's better for the task, and the programmer doesn't realize that it's better for the task than Blub.
That's Blub from the programmer's perspective. For the adventurous programmer, Blub is the language you're fighting against when you try to introduce Python or OCaml to your programming team. (Citations: Beating the Averages, The Python Paradox)

Another commenter touches on the management perspective of Blub:
I misread "Blub" to be "bulb". As in when a programmer burns out his employer just throws him away and screws in a new one.
'Nuff said.

When we're not trying to pin it down too carefully, we know exactly what Blub is. The first Blub was Fortran, and in some circles it's still Blub. Guy Steele Jr. (who was involved in the design of Scheme, Common Lisp, and Java, each an earlier effort to take down an earlier Blub) is currently working on Fortress with the same goal. Fortress looks nothing like Fortran, but the name is close enough to make the point: it's intended to be much better suited for the tasks where scientific programmers instinctively reach for Fortran. C++ was the Blub of the '90s, and since real Blubs never die, it's still the Blub of choice for most performance-critical work. Java out-Blubbed C++, and now Java and C# are splitting the Blub market. Even so, examples of Blub code are still generally a simplified C++, since the equivalent in Java would take too much boilerplate to be worth the column space.

(Sidebar: Modern Fortran looks almost nothing like the original Fortran. C++, especially the variety now found in Visual Studio, and with the 200X extensions, is also a mutant. And it's not a superset of C99, either, defeating the original purpose of the language. Visual Basic, VBA and VB.NET are incompatible with one another, despite the naming. How can a language take its users for such a ride over the years, when another language with a different name might be the more logical next step? JavaScript and C# surely got a bit more mileage on name recognition. All the evidence points to human psychology still working on programmers.)

But the argument breaks down when we try to explain why another language is better suited for the task than Blub. There are C++ programmers out there who can bust out a better program than you can write in any other language. Of the last 10 years of the ICFP Programming Contest, 3 winners used Haskell, 3 used OCaml, and 2 used C++, and this is a contest run by functional-programming gurus. Python and Ruby have never received any prizes, though Perl was recognized thanks to the second-place team last year.

I see two axes for evaluating a language: roughly, the front end and the back end; semantics and implementation. Both get labeled "power", but on the front end that means what the language does for the programmer, and on the back end, what the program does for the machine.

Example 1: Ruby has an excellent front end. It's one of the most expressive languages available; that's probably why 37signals picked it for Rails. The best semantic ideas from Smalltalk, Common Lisp and Perl are all in there; most of the famous Design Patterns are built in either implicitly or explicitly. (It's not that the language makes them all obsolete; the language designer just had the foresight to implement the tricky ones for you.)

But the back end has been playing catch-up. Performance lagged well behind even Python and Perl until the very recent v1.9, and there's no native-code compiler. I could be misinformed, but I've also heard that: threading suffers from the same limitation as Python's, confining the interpreter to a single processor; there's no built-in foreign-function interface like Haskell's FFI or Python's ctypes module; Unicode support has rough spots; and large-scale parallelism and concurrency basically mean running a bunch of separate Ruby processes.
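For contrast, here's what a built-in FFI buys you. In Haskell the foreign-function interface is part of the language standard, so one declaration binds a C function. A minimal sketch (my example, assuming GHC and the C math library):

  {-# LANGUAGE ForeignFunctionInterface #-}
  import Foreign.C.Types (CDouble)

  -- One declaration imports sin() from libm; no extension module,
  -- no separate binding generator.
  foreign import ccall "math.h sin"
    c_sin :: CDouble -> CDouble

  main :: IO ()
  main = print (c_sin 1.0)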

There's a lot for a Blub programmer to pick on.

Example 2: Delphi, a.k.a. Object Pascal, has an excellent native-code compiler, with support for cross-platform compilation and single-file (DLL-free) executables; it can also run on .NET, with all these options available through the same IDE. It's competitive with C on benchmarks, often faster. Integration with databases and other external components is solid. Refactoring tools are included with the IDE, with lots of fun static analysis. Object Pascal itself was originally designed at Apple, some years before Borland picked it up for their offering, and was then abandoned by Apple; but there seems to be something inherent in the language that enables highly optimized compilation. The Free Pascal implementation, for instance, comes out well ahead of every other language in the Computer Language Benchmarks Game when memory and speed are weighted equally. On the combined benchmarks, Free Pascal uses only half as much memory as C (gcc)!

The catch is that Object Pascal is a cheesy-looking language. On the same benchmark set, comparing the size of the code in gzipped bytes (which emphasizes tokens over characters), Object Pascal comes in 24th out of 33 languages, just behind C. It beats Fortran, Java and C++, but not C#. I think I'd just buy more RAM rather than rewrite a Blub program in Object Pascal.

The benchmarks tend to be heavy mathematical algorithms rather than general-purpose applications, so certain things like I/O, libraries, and support for bottom-up programming and meta-programming are discounted. Regardless, Python, Perl and Ruby are the top 3 languages for code size on these benchmarks. I think Lisp was hurt more by this aspect of the benchmarks: its syntactic sugar isn't built in, and these small numeric kernels leave no room for code reduction via a mini-language. Haskell was probably helped by the near-absence of I/O. In general, the benchmarks show the Blub languages performing well but somewhat verbose, and the scripting and Web-friendly languages concise but slow; Prolog ranks badly in every way, while OCaml and Haskell do well in every way. That fits reality fairly well for number-crunching, but not for the Web or AI. Let's acknowledge once again that benchmarks aren't perfect, and forge ahead.

Fact: A language and a compiler are not the same thing.

But the Object Pascal example should show that a language can baby the compiler into giving better results. The three arguments go:
  1. Declaring static types and generally programming close to the metal gives the compiler the information it needs to generate an optimally efficient program. That's why C can be fast: it's a close fit to the hardware. Same goes for Java and the JVM.
  2. Using the right abstractions and strong type inference lets the compiler get a high-level view of what your algorithm is doing, allowing it to do more of the optimization itself. That's why OCaml and Haskell can be fast: they're a close fit to the pure algorithm. (See the sketch just after this list.)
  3. While the expressiveness of new languages like Ruby and Python is appealing, the race to incorporate imperative, object-oriented and functional programming styles into every major language is actually resulting in weaker languages. Borrowing features doesn't bring a language any closer to providing a new model of computation, and it certainly doesn't give a better angle of attack at the whole point of all of this — making the computer hardware do what we want.
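To make the second argument concrete, here's a minimal Haskell sketch (my example, not one from the thread). The source says nothing about loops, counters or memory, yet with optimization on, GHC's list-fusion rules can usually compile it down to a tight accumulating loop with no intermediate list allocated:

  -- Sum of squares, written against the pure algorithm rather than
  -- the machine. GHC can often fuse the producer [1 .. n] and the
  -- two consumers (map, sum) into a single loop.
  sumSquares :: Int -> Int
  sumSquares n = sum (map (^ 2) [1 .. n])

  main :: IO ()
  main = print (sumSquares 1000000)

The types and the high-level combinators are what give the compiler that much room to work.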
The third argument was made by John Backus, creator of Fortran, the Backus-Naur Form for defining programming language syntaxes, and later the FP and FL programming languages.
"Programming languages appear to be in trouble. Each successive language incorporates, with a little cleaning up, all the features of its predecessors plus a few more. [...] Each new language claims new and fashionable features... but the plain fact is that few languages make programming sufficiently cheaper or more reliable to justify the cost of producing and learning to use them."
— John Backus

The talk that began with this argument went on to introduce function-level programming. At the time, everyone thought Backus was talking about functional programming, so it unintentionally gave a boost to Lisp and later to the ML family, from which Haskell and OCaml are derived. But no: it was actually about a new language called FP, somewhat based on APL. FP begat FL, which went nowhere, but Morgan Stanley created an ASCII-friendly variant of APL called A+ (which is now free, GPL'd software), and the proprietary J and K have carried the torch since then. The use of these languages now seems to be mostly in the financial world. (Perhaps because the family is genuinely well-suited to financial tasks, and perhaps because that's where APL made its splash; who knows, it may even have Blubbed its way in.)

The main idea is point-free programming: rather than pushing values around (as even functional programming languages do), you compose functions together to create an algorithm that references only functions, never values. Then you create a basic set of operators that can be composed into higher-level functions. This is an excellent way to manipulate arrays and matrices. Haskell touches on this idea but doesn't emphasize it.
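Haskell's composition operator makes the contrast easy to show. A minimal sketch (mine, not the post's):

  -- Pointful style: the argument is named, and values are plumbed
  -- through by hand.
  longWordCount :: String -> Int
  longWordCount s = length (filter (\w -> length w > 4) (words s))

  -- Point-free style: the same algorithm as a pipeline of functions.
  -- Only functions are mentioned; no value is ever named.
  longWordCount' :: String -> Int
  longWordCount' = length . filter ((> 4) . length) . words

  main :: IO ()
  main = print (longWordCount' "composition beats pushing values around")

K and J push this much further: their basic operators work on whole arrays, so entire algorithms collapse into short chains of combinators.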

Benchmarks for any of these languages are hard to find, but I see one cryptic statement about K here:
 [k is much faster than c on strings and memory management.]

And another startling statement on Wikipedia:
The performance of modern CPUs is improving at a much faster rate than their memory subsystems. The small size of the interpreter and compact syntax of the language makes it possible for K applications to fit entirely within the level 1 cache of the processor. Vector processing makes efficient use of the cache row fetching mechanism and posted writes without introducing bubbles into the pipeline by creating a dependency between consecutive instructions.

It looks pretty convincing to me. Finally, a fresh look at how programming languages make a machine do work.

This seems to be the argument missing from every language war: a language becomes more powerful when its non-orthogonal parts are removed. K doesn't have objects or continuations, and it doesn't need them. Likewise, Haskell confines state modification to monads, Erlang's flow-control constructs throw out traditional iteration entirely, and Lisp virtually strips out syntax itself.
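Haskell's case is the easiest to sketch: mutable state isn't gone, it's quarantined behind a type, so the compiler always knows which code can touch it. A toy example (mine; it assumes the mtl package for Control.Monad.State):

  import Control.Monad.State

  -- Number each string with a counter. The counter is mutable state,
  -- but the State type confines it: nothing outside this monadic code
  -- can read or write it.
  label :: String -> State Int String
  label s = do
    n <- get
    put (n + 1)
    return (show n ++ ": " ++ s)

  main :: IO ()
  main = print (evalState (mapM label ["foo", "bar", "baz"]) 0)
  -- prints ["0: foo","1: bar","2: baz"]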

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.
— Revised5 Report on the Algorithmic Language Scheme


The corollary to (and irony of) the Blub paradox is that since these optimized languages are missing constructs found in Blub — by design — a Blub programmer will always have plenty to pick on.