Chad Perrin: SOB

30 September 2007

I learn something new every day — this time about Python and Ruby.

Filed under: Geek — apotheon @ 03:34

It’s sort of a generally recognized “fact” that Python executes a little faster than Ruby. Generally speaking, nobody in the Ruby community seems to have any objections to that estimation of execution speed, at least up through Ruby v1.8.x (though 1.9, soon to be 2.0, is another story).

On the ruby-talk list, someone whose identifier string is “Ruby Maniac” (e.g. From: Ruby Maniac <foo@bar.baz>) started two or three threads about how Ruby isn’t as fast as Python, and therefore isn’t as good as Python, among other very trollish statements. This has spawned some actually useful discussion about Ruby performance and other characteristics of both the language and the implementations thereof, largely ignoring “Ruby Maniac”. It has also spawned a fair bit of discussion about how to properly handle trolls, with just a little bit of discussion about whether “Ruby Maniac” qualifies as a troll. I enjoyed some kudos for discussing the fact that, as much as I personally dislike Python and as much as “Ruby Maniac” has decreased the signal:noise ratio on ruby-talk, his(?) worst offense was really to the Python community by (mis)representing Python fans as trollish pricks.

Mostly, however, the resulting discussion has proved fruitful, in terms of addressing subjects of performance and other features of the language and implementations.

An interesting tidbit about the relative performance of Python and Ruby came up on Friday, but I didn’t get around to reading it until today. For background, on Friday someone posted the following equivalent code snippets in Ruby and Python.

#!/usr/bin/env ruby

s = ""
i = 0
while line = gets
  s += line                 # rebinds s to a newly built string each pass
  i += 1
  puts(i) if i % 1000 == 0  # progress marker every 1000 lines
end

#!/usr/bin/env python
import sys

s = ""
i = 0
for line in sys.stdin:
  s += line
  i += 1
  if i % 1000 == 0: print(i)

The observation made about it was that the Python code was reasonably quick, with output coming smoothly and at a fairly steady pace, while the Ruby code’s output came more and more slowly as execution progressed and the string got longer. I tested it myself by piping a cat of several large text files into the script, and the gap between each successive thousand-line progress number does indeed grow, by a quite visible degree.

The reason, of course, is that in the above Ruby code every += allocates a new string object and copies the entire accumulated string into it, so each iteration costs more as the string grows. I watched CPU usage for that process jump to about 85% in the output of top in another term window. The fix suggested on the list was to use the << operator rather than += for the string concatenation, so that s += line becomes s << line. This performs a true in-place append, rather than allocating a new string object that holds the previous string object’s value concatenated with the newly input string.
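A minimal sketch of that difference, for anyone who wants to see it directly. The variable names s and t are made up for illustration and are not from the ruby-talk thread. Keeping a second reference to the string shows which operation changes the existing object and which one builds a new one:

```ruby
s = "abc".dup        # .dup guarantees a mutable string
t = s                # t is a second reference to the same String object

s << "def"           # in-place append: the shared object changes
puts t               # abcdef
puts s.equal?(t)     # true (still one object)

s += "ghi"           # sugar for s = s + "ghi"; String#+ returns a new object
puts s               # abcdefghi
puts t               # abcdef (the old object is untouched)
puts s.equal?(t)     # false (s now points at a different object)
```

The flip side of the in-place append is that anything else holding a reference to the same string sees the change too, which is worth keeping in mind before swapping += for << everywhere.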

I already knew all that. I didn’t know that using += (well, technically it’s syntactic sugar for s = s + line, where + is a method call on the string, but it looks like an operator) would have that dramatic an effect, however. That was news to me, especially considering that the Python code was doing the same thing and didn’t have the same slowdown problem.

The optimized version of the Ruby code (using << instead of +=) then kicked the crap out of the Python code in a performance test. What surprised me is that Python apparently doesn’t provide an equivalent optimization option. This is the part I learned today: “Florian Frank” posted the following to ruby-talk as an explanation for why Python has no equivalent concatenation performance optimization:

Yes, for a + b python allocates a new string of length(a) + length(b), then a’s and b’s content are copied into the new string. There is only an optimization if a or b is empty. a << b in ruby first reallocates memory of length(b) to a’s then copies only b’s content to a. If b is empty ruby does nothing. The more complex concatenation operation in python is caused by python’s immutable strings.
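For a rough feel of what allocate-and-copy versus append-in-place costs on the Ruby side, a timing sketch along these lines may help. It uses the standard library’s Benchmark module. The helper names build_with_plus and build_with_append, and the generated input, are hypothetical stand-ins rather than anything from the thread. Actual numbers will vary with Ruby version, machine, and input size, so none are shown here.

```ruby
require "benchmark"

# Hypothetical helper: accumulate with +=, which builds a new String per line.
def build_with_plus(lines)
  s = ""
  lines.each { |line| s += line }
  s
end

# Hypothetical helper: accumulate with <<, which grows one String in place.
def build_with_append(lines)
  s = "".dup                        # mutable accumulator
  lines.each { |line| s << line }
  s
end

# Made-up input standing in for the piped text files.
lines = Array.new(10_000) { |i| "line #{i}\n" }

plus_time   = Benchmark.realtime { build_with_plus(lines) }
append_time = Benchmark.realtime { build_with_append(lines) }

puts format("+= : %.4f s", plus_time)
puts format("<< : %.4f s", append_time)
```

Both helpers return the same final string, so the only thing being compared is how the string gets there. Raising the line count should widen the gap, since the += version recopies everything accumulated so far on each pass.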

The final lesson I get from this, at this point, is that there’s yet another reason for me to dislike Python’s immutable strings.

Granted, there may be some other “better” way to get similar optimization in Python that has not caught my attention. If so, I’d like to know about it. I certainly didn’t expect to find that something like an iteratively growing string operation would be so much more optimizable in Ruby than in Python.

It sorta flies in the face of the common “wisdom” about Python generally outperforming Ruby. Sure, it’s just one use case comparison out of many, but it seems like kind of a big one.

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License