Chad Perrin: SOB

10 February 2007

Java benchmarks, enterprisey apps, and the future of Ruby (is Smalltalk)

Filed under: Geek — apotheon @ 12:59

Assaf over at Labnotes recently wrote that timing is everything. I read that a couple days ago when it was first posted, but the discussion in comments has progressed into some hotly contested ground — as is almost inevitably the case when someone starts comparing languages, particularly languages socially set against each other as contenders for best-ness by highly visible talking heads in the programming world. In this case, the enemy languages are Java and Ruby, and the topic is performance benchmarking.

It is almost always the case that when people start comparing benchmarks between Java and another (non-Smalltalk) language, the Java-heads will demand that all benchmarks be run post warm-up, after the JVM has been running a bit, so that it has a moment to do its optimization magic. Similarly, the non-Java people demand that, just as the start-up times of other languages are always measured, so too should Java's be, and that the fact that Java starts slow and later runs fast should be taken as a whole rather than the slow start being ignored. After all, it's the total runtime of your operation that matters, not just part of it.
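For anyone who has never actually watched the warm-up happen, the effect is easy to see with a trivial (and admittedly toy) sketch like the one below, which times the same chunk of work several times inside a single JVM process; the early passes typically run noticeably slower than the later, JIT-compiled ones. This is only a rough illustration of why the two camps measure differently, not a rigorous benchmark, and the workload and class name are arbitrary:

    // Toy illustration of JVM warm-up: the same loop, timed several times in
    // one process, generally speeds up once the JIT compiler kicks in. Not a
    // rigorous benchmark; it only shows why "before warm-up" and
    // "after warm-up" measurements can disagree so sharply.
    public class WarmUpDemo {
        static long work() {
            long sum = 0;
            for (int i = 0; i < 10000000; i++) {
                sum += i % 7;
            }
            return sum;
        }

        public static void main(String[] args) {
            for (int pass = 1; pass <= 5; pass++) {
                long start = System.nanoTime();
                long result = work();
                long elapsedMs = (System.nanoTime() - start) / 1000000;
                System.out.println("pass " + pass + ": " + elapsedMs + " ms (result " + result + ")");
            }
        }
    }

Timing the whole process from the shell, start-up and first pass included, versus timing only the later passes: those are exactly the two numbers the two camps argue about.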

Since the very beginning of my experience with such debates, I've stayed out of them, because I could see points on both sides, and also because I loathe Java and don't like to get into debates with the Kingdom of Nouns crowd about their orthodoxy. It can get pretty grim wading through all that, as I've seen, and I've mostly managed to avoid it despite my cantankerous and confrontational nature on matters of opinion like this.

When I first read the post at Labnotes, there wasn’t much discussion. More happened later that day, though, and now I wish I’d come back to check on it, because it got interesting. Now, I have something to say. It’s a bit too long to just post as a comment, however, when it could make such a great post at SOB. Ignore the fact that I just referred to my own writing as great, please — move along, no arrogance to see here, we are just a hedge.

Assaf maintains in that post that Java's warm-up time should be measured in a benchmark, essentially for the commonly cited reasons I mentioned above. A participant in the discussion, self-identified as Charles Oliver Nutter, had this to say:

However, there’s one other aspect you’re ignoring: a large part of the world’s computing is not done in response to a single command, it’s in a long-running process where startup time pales compared to the runtime of the application as a whole. Yes, it’s probably quicker to use something like Ruby to run a single command at a command line, but it’s certainly faster to run a long-running process in Java. It’s all a matter of tradeoffs; to say that either is always the correct answer is foolhardy.

And also…if your Java-based command runs in five seconds plus two seconds to warm up while your Ruby-based command runs in ten, you do get more done with Java in less time. And the vast majority of computing problems will dwarf startup time with their longer runs.

He seems to be overstating the case quite a bit: his choice of phrasing implies that the whole world is running large, long-lived applications all the time, so that anywhere you shouldn't be using C or assembly language for maximum performance, you should more often than not be using Java instead. At least, that's the impression I got. On the other hand, Charles Oliver Nutter has a point, no matter how much I actually loathe Java and believe its mythic benefits are generally blown way the hell out of proportion to their actual applicability. That point covers the one and only area where Java is actually successful in a technical, rather than social, sense of the term: "enterprisey" applications.

If you start up your huge, sprawling, fifty-thousand-line "enterprisey" application and let it run for the next thirty years, and compare it to a non-VMed Ruby script that does the same thing as a particular bit of the Java code, in practice your massive Java app needs to be measured for performance after warm-up to make an honest case for either language as the better choice in that circumstance. If, on the other hand, your Java app is going to be restarted every time you want to perform a specific action with it, and it's meant to be a lightweight desktop application, it's more honest to include "warm-up" time in the performance measurement. If you'll be performing the same action over and over for a couple of hours before shutting down the process, you should include warm-up time and every iteration, and do the same for the Ruby code, just to be completely fair.
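To put some arithmetic behind that, take the hypothetical numbers from the quoted comment: two seconds of warm-up plus five seconds of work per run for Java against ten seconds per run for Ruby. Whether the warm-up cost matters depends entirely on how many times the work repeats within one process, and the break-even point falls out of a one-line formula. The sketch below just runs that arithmetic; every figure in it is an illustrative assumption rather than a measurement:

    // Break-even sketch: after how many in-process repetitions does a VM that
    // pays a one-time warm-up cost beat a slower interpreter that pays none?
    // All timings here are hypothetical, borrowed from the quoted comment
    // purely for illustration.
    public class BreakEven {
        public static void main(String[] args) {
            double warmUpSeconds = 2.0;  // one-time warm-up cost (assumed)
            double vmPerRun = 5.0;       // per-iteration cost after warm-up (assumed)
            double interpPerRun = 10.0;  // per-iteration cost in the interpreter (assumed)

            // VM total:          warmUpSeconds + n * vmPerRun
            // Interpreter total: n * interpPerRun
            // The VM wins once:  n > warmUpSeconds / (interpPerRun - vmPerRun)
            double breakEven = warmUpSeconds / (interpPerRun - vmPerRun);
            System.out.printf("VM pulls ahead after %.1f iterations%n", breakEven);

            for (int n = 1; n <= 3; n++) {
                System.out.printf("n=%d: VM %.1fs vs interpreter %.1fs%n",
                        n, warmUpSeconds + n * vmPerRun, n * interpPerRun);
            }
        }
    }

With those particular numbers the VM pulls ahead from the very first iteration, which is Nutter's point; give the warm-up a heavier cost or shrink the per-run gap and the break-even point moves out, which is the point about lightweight, frequently restarted programs.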

This is why these stand-alone toy algorithm benchmarks are so notoriously useless in reference to real-world situations. Everybody wants to measure how long it takes to do a single arithmetic calculation, and nobody takes running conditions into account. It takes a bit of work to interpret toy algorithm benchmarks properly so that they actually apply to the realities of the mission-critical runtime world. It can be done, but it requires more than the time utility and your interpreter, VM, or compiler for each language.
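One small step in that direction, sketched hypothetically here, is to give the toy workload a knob for its running conditions, so the same program can stand in for a one-shot command or a long-lived process, and then to wrap it with the shell's time utility so start-up gets counted too. The workload, class name, and iteration counts below are all arbitrary placeholders:

    // Hypothetical harness: run the same toy workload N times, where N is given
    // on the command line, so the benchmark can imitate a one-shot command
    // (N = 1) or a long-running process (large N). Total process time, start-up
    // included, still has to come from outside, e.g. the shell's time command.
    public class ToyHarness {
        static double workload() {
            double x = 0;
            for (int i = 1; i <= 1000000; i++) {
                x += Math.sqrt(i);
            }
            return x;
        }

        public static void main(String[] args) {
            int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1;
            long start = System.nanoTime();
            double sink = 0;
            for (int i = 0; i < iterations; i++) {
                sink += workload();
            }
            long totalMs = (System.nanoTime() - start) / 1000000;
            System.out.println(iterations + " iterations in " + totalMs + " ms ("
                    + (totalMs / (double) iterations) + " ms each, sink " + sink + ")");
        }
    }

Running it as time java ToyHarness 1 versus time java ToyHarness 1000 (with an equivalent Ruby script alongside for comparison) at least acknowledges running conditions, which is more than the usual single-calculation timing does.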

In other words, part of the reason there's so much debate over Java performance measurement is simply that the Java people are measuring performance for "enterprise" deployments, and everyone else is measuring performance for one-shot applications. If you adhere to the philosophy that rebooting is murder, as I do for most purposes, you effectively take on an almost ethical obligation to use the only-after-warm-up approach to measuring Java VM performance, since your applications should never need to restart. On the flip side, however, that concept properly refers only to systems, not utilities (though I don't recall Steve Yegge making that distinction): a system performs a utility function, and the utility act itself is nothing to worry about (for instance, running ls on a Unix system) as long as you don't have to kill the whole system along with the utility.

Ultimately, it makes some sense for a lot more software of certain types to move toward an optimizing VM with a perpetual runtime rather than the one-off model used for most applications today (even most Java applications, where you start one, wait for the warm-up, use it for a while, and shut it down). Java, however, probably won't be the language for that purpose in the long run. It's a mess — a huge, ponderous mess. There's nothing really stopping Ruby from moving into the position of "bestest perpetual runtime VM language" at some point, so far as I'm aware. Java's standing as the heavyweight in that arena has largely to do with the fact that it's:

A) of the major VM languages, the one closest to C++ in syntax, which was a benefit in terms of rapid adoption around the time it really hit its stride (notice how I toss that insightful statement in without fanfare)

B) one of a very few major VM languages in existence (that one as well)

C) very well marketed (not that one — everybody and his vbscript-programming dog has said that)

In fact, I think Ruby would make a truly excellent choice to “replace” Java in that position: its ubiquitous and seamless object orientation, extremely friendly and unambiguous syntax, and flexible and pervasive metaprogramming capabilities add up to a potent mixture for writing code for complex systems meant to run “forever”. It is only the current interpreter implementation that makes that so dismayingly impractical as a way to employ the language — not only because it’s an interpreter, but because it’s a slow interpreter. Ruby is doomed to the life of a web-and-scripting language until that’s fixed, even if it’s an incredibly beautiful, glorious web-and-scripting language with a complex systems language hiding inside, struggling to get out.

Just compare Ruby and Java in cases where your code isn’t running “forever”, and you can include “warm-up” time in your Java benchmarks — which will help prove once and for all that Java should never be used in situations where Ruby is currently a contender. It’s also true that Ruby with its slow interpreter should not often be used for “enterprisey” applications of the sort where Java dominates the field — even if it does manage to accomplish certain types of simple arithmetic routines pretty quickly.

On the other hand, we could skip bringing Ruby into the perpetual VM running model altogether, and just use Smalltalk for that purpose as God intended.


  1. How long does it take to fire up the JVM? Compared to Ruby?

    Sounds like a simple question, but I think the fact that most Java developers can’t answer it is revealing.

    You see, as much as this is a question about performance, it has nothing to do with performance.

    Comment by assaf — 10 February 2007 @ 01:32

  2. Sounds like a simple question, but I think the fact that most Java developers can’t answer it is revealing.

    I think it is, as well. On the other hand, what exactly it reveals is up for debate. All I know for sure is that it reveals something about the likelihood of most Java developers being able to construct a very solid case for Java performance — namely, that it’s unlikely.

    You see, as much as this is a question about performance, it has nothing to do with performance.

    I could interpret that several ways. I’d appreciate it if you’d elaborate on how you meant it.

    Comment by apotheon — 10 February 2007 @ 03:04

  3. The gut reaction many people had to my post is based on the ongoing belief that the Java startup time is measured in seconds.

    That was true in ’96.

    But many people still hold on to that world view and refuse to re-evaluate where we stand. They internalize its shortcomings, some of which no longer exist, and then rationalize to defend them.

    And it’s not just performance. A lot of the stuff that happens around Java is the result of cognitive dissonance. Of accepting fallacies instead of learning new facts.

    Comment by assaf — 10 February 2007 @ 10:47

  4. Good insights here, and in assaf’s subsequent response.

    I learned long ago that in benchmarks you must test exactly what you intend to run. The only meaningful benchmark is performed with the real application.

    1986: We were upgrading to a new version of the language we were using. It had a significant runtime component, so we wanted to benchmark it well before adopting. We constructed tests of every intensive operation we could think of, from I/O to floating-point math. Every operation was faster in the new version, sometimes by a factor of two or more. But when we ran the real application, it was three times slower.

    Why? The language vendor had implemented an automatic memory management algorithm that wasn’t nearly as smart as a human at mapping out what code to swap out of memory and when (this was back when a program had to run in 32KB). Applications thrashed the program images on disk. Our benchmark programs were small enough that we never reached that threshold (or should I say, thrashold).

    Comment by Sterling Camden — 12 February 2007 @ 03:40

  5. Yes, I think you should say “thrashold”. I like that term, and may use it myself.

    Comment by apotheon — 12 February 2007 @ 04:08

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License