The discussion in response to my post “I learn something new every day — this time about Python and Ruby” has gotten a bit lengthy. Twenty-five comments is too many to address properly within further comments; it’s rapidly approaching the point where I might just let others discuss the matter and not get involved. Rather than ignore it, though, I’ve decided to tackle it here, in a new “top level” post.
Jeremy Bowers made a good point about various languages’ implementations of a += operator. This is sort of an implementation detail, though, rather than a common linguistic design choice, and it tends to be more suited to a compiler (in Java’s case, a bytecode compiler) than to an interpreter. Python provides the ability to compile source code to bytecode, as I understand it, and Perl does a JIT parse tree compilation every time you run it. To some extent, such an implementation might be appropriate in both cases, though it may also impose some new limitations on how the language itself will be designed in the future. For this reason, even though VM implementations of Ruby are starting to appear, I don’t know that such an implementation of the += operator (or method, in this case) would be a good choice for Ruby. Luckily, Ruby has << as well, and a collect-before-joining approach can otherwise be implemented by the programmer if so desired.
Jeremy’s also right, as far as I’m aware, about Python “generally” outperforming Ruby on an algorithm-by-algorithm basis, for current stable releases of Python and Ruby. I certainly didn’t mean to suggest that Ruby outperforms Python consistently — only that certain language design decisions limit the ability to eke greater performance out of certain types of operations, even when the design decision in question doesn’t seem directly related. In this case, it results in a greater performance benefit to Ruby for a particular type of operation than for Python, for a roughly comparable algorithm.
The heart of his final statement:
If this post says anything, it speaks to the dangers of how high-level languages can obscure the underlying algorithms, and therefore obscure the performance implications of them.
It’s true. On the other hand, unless you’re a core maintainer for one of these languages, the part that’s of most interest to you is likely to be how this affects the way you program. Since I’m not a core maintainer for any programming language at present, and have only ever directly contributed at all to a language by doing some scut-work for a C compiler project completely unrelated to these languages in particular, my focus was on how the algorithms used in implementing these languages affect the algorithms I’ll use when writing code.
A lot of attention was given to the choice of algorithm and how idiomatic it is to Python, of course. For instance, someone using the name “nirs” said:
The idiomatic Python code is:
    s = []
    for line in lines:
        s.append(line)
    s = ''.join(s)
Similarly, Paddy3118 said:
In Python one is taught not to concatenate strings using += but to use the join method instead.
Going back to Jeremy for a moment, he made the salient point that needs to be made here:
When you use two different algorithms in two different languages, all bets are always off; language differences are generally swamped by the differences in algorithms.
To be fair, I did ask (toward the end of my original post about string concatenation in Python and Ruby) for better ways to improve string concatenation performance in Python. On the other hand, several responses seemed to be offering counterarguments rather than improvements. These two in particular are doing something completely different from what I originally addressed, which was string concatenation. Instead, they say that you can get similar results with better performance by doing something else. All this means, in the end, is that when you tell your doctor “It hurts when I do this,” he should tell you “Don’t do that.” Unfortunately, sometimes it would be nice to be able to do that.
Of course, I just picked two names out of a hat. The same point was made as well by Simon Willison, metapundit, JamesH, Brandon Corfman, DDP, Vincent Foley, someone identified as “Anonymous”, and somewhat rudely by someone using the name Masklinn.
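For readers who want to see the two approaches side by side, here is a minimal Python sketch. It is not the code from the original post or from any commenter; the function names and the sample input are made up for illustration. It only shows that the two algorithms build the same string by different means, and it measures each with the standard timeit module. No timings are quoted here, since they depend heavily on the interpreter, its version, and the input.

```python
import timeit


def concat_with_plus_equals(lines):
    """Build the result by repeated string concatenation."""
    s = ''
    for line in lines:
        s += line
    return s


def concat_with_join(lines):
    """Collect the pieces in a list, then join them once at the end."""
    parts = []
    for line in lines:
        parts.append(line)
    return ''.join(parts)


if __name__ == '__main__':
    # Hypothetical input, chosen only for illustration.
    lines = ['line %d\n' % i for i in range(20000)]
    for func in (concat_with_plus_equals, concat_with_join):
        seconds = timeit.timeit(lambda: func(lines), number=5)
        print('%-26s %.4f s' % (func.__name__, seconds))
```

Both functions return the same string for the same input, so whatever difference the timings show comes from the choice of algorithm rather than from the result.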
Someone else identified only as “Anonymous” posted the words “Python does have mutable strings. They’re called ‘lists’,” which points out that the same problem might be solved differently (as have the above-noted users of lists to avoid string concatenation), while simultaneously making a strictly incorrect statement.
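A short Python sketch may make the distinction concrete. The variable names are hypothetical; the behavior shown is standard Python: a str rejects in-place modification, while a list of characters accepts it but is not a string until it is joined back together.

```python
# A str cannot be changed in place; item assignment raises TypeError.
text = 'abc'
try:
    text[0] = 'X'
    str_is_mutable = True
except TypeError:
    str_is_mutable = False

# A list of one-character strings can be changed in place...
chars = list('abc')
chars[0] = 'X'

# ...but it is a list, not a string, until it is joined back together.
rebuilt = ''.join(chars)

print(str_is_mutable)          # False
print(isinstance(chars, str))  # False
print(rebuilt)                 # Xbc
```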
I was somewhat impressed with Spacebat’s response, in that it both suggested three different approaches to speeding up the operation in Python and discussed the downsides of each, rather than simply pretending that if you want a faster program you shouldn’t want the output you actually set out to get in the first place. It shouldn’t seem impressive that someone treated the matter reasonably, but it does.
Mark Thomas, meanwhile, pointed out that what I posted wasn’t idiomatic Ruby either, which is something the people complaining about the lack of Python “smarts” in the original example probably never thought to consider. I’m not really sure what qualifies as “idiomatic” Python style in this case: Mark suggests that the original example for Python was idiomatic, and I’m inclined to agree that it’s idiomatic Python for string concatenation operations, distinct from idiomatic “avoid string concatenation operations” Python. Mark’s example of idiomatic Ruby style is on the money, but would not have served as an effective comparison example to demonstrate the differences in the two languages’ handling of string concatenation (going back to the “different algorithms make a bigger difference than different languages” idea, again). Interesting (to me, at least) is the fact that, despite being visually quite different from the original example and leveraging the beautification and iteration capabilities of Ruby to positive effect, the execution-optimizing piece of the code is exactly the same: namely, the use of << in place of +=. Its execution performance is equivalent to the original “optimized” version as well.
Finally, there were several references to the idea of using an IO library call (by Chadwick Morris, Smel, Troy Kruthoff, one of Spacebat’s suggestions, and someone called Tom). While that’s useful to anyone thinking about writing code that behaves similarly, it also obscures the issue somewhat — after all, once we start doing library calls like that to make up for performance bottlenecks we lose any ability to compare the features of the core language. There are, I’m sure, at least a dozen Ruby libraries out there I could similarly use to change the performance characteristics of my examples.
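As a rough illustration of what such a suggestion might look like, here is a minimal sketch that uses io.StringIO from the Python standard library. This is an assumption about the kind of IO library call the commenters had in mind, not a reproduction of their code, and the function name is hypothetical.

```python
import io


def build_with_stringio(lines):
    """Write each piece to an in-memory text buffer, then read it all back."""
    buffer = io.StringIO()
    for line in lines:
        buffer.write(line)
    return buffer.getvalue()


result = build_with_stringio(['one\n', 'two\n'])
print(repr(result))
```

The result is the same string that concatenation would have produced, but the work is done by a library object rather than by the string operators of the core language, which is exactly why it makes a poor basis for comparing the languages themselves.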
- Justin James made some interesting comments as well, but I think they deserve their own response.
- In the future, people posting code here might want to take note of the note above the comment text entry box that reads “Markdown: You can also format text using Markdown syntax.” In particular, indenting every line of code by four spaces in addition to any other spaces your code needs for indentation should provide the code formatting you need.
- There’s also a note below that text box, just above the “Preview” button, that reads “Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.” I’m afraid I didn’t get back to checking on this post’s comment activity for a couple days, and as such a bunch of people ended up saying roughly redundant things. This is not their fault, but the fault of the necessity of comment moderation and my own slowness to get around to dealing with comment moderation.