Chad Perrin: SOB

3 October 2007

optimizing operations

Filed under: Geek — apotheon @ 03:00

A few days ago, in I learn something new every day — this time about Python and Ruby, I posted on the subject of how a core language’s limitation to immutable strings can lead to limitations in how certain operations may be performed within that language. In this case, “immutable strings” was just a stand-in for any “best practices” attempt to purify the way a language works. This doesn’t mean such attempts are necessarily bad, of course — just that there’s always a trade-off when you give up flexibility for some form of “correctness”.

I was also motivated to address the subject because it was an interesting case of where a common assumption about performance is violated, in part to help make the point that in many instances choosing a tool based on its performance characteristics is not only a case of premature optimization but subject to what pharmacists call a “paradoxical reaction”. In other words, choosing a tool for performance purposes can sometimes end up hurting performance more than it helps.

One of the responses I received was from Justin James. Unlike many responses, he didn’t immediately start trying to prove my example “wrong” by addressing subject areas that are irrelevant, orthogonal, or at best tangential to my points. Instead, he brought up the interesting subject of operation optimization (as contrasted with work-arounds or just suffering with a slow operation). I’ve decided to address some of his points directly here, but before I start quoting him I’ll provide a general overview of his opinion on the matter, as I read it:

He finds multiple operator choices and reimplementing operators with different algorithms at the cost of programmer time spent building scaffolding inferior approaches for operator optimization to “smart” compiler optimization or simply using the right algorithm the first time when implementing the language. In general, I agree. There are (as always) caveats.

To expect the run of the mill programmer to know the internal differences between “less than less than” and “+=”, or to even expect them to remember thousands of little rules like “use ‘less than less than’ instead of ‘+=’ under these circumstances” is rediculous.

I agree. That’s why all you should have to remember is the difference between functional operators and operators with side-effects. See, the += operator is to some degree effectively a functional operator — it’s an infix operator that takes two inputs and returns a value. It’s not really a “purely” functional operator, of course, because it reuses the label on the left-hand side of the expression, but it doesn’t reuse the object that label previously “contained”. The << operator, on the other hand, does an in-place modification of the object.

Knowing that, and knowing something about the craft of programming in general, “use ‘less than less than’ instead of ‘+=’ under these circumstances” becomes a deduction based on expertise rather than a rote-memorized rule.

The mere mortals who make up the bulk of the population who become programmers are just not built this way.

Alas, the bulk of the programmer population aren’t worth their paychecks. The way they’re built is at great expense by shoddy “carpenters” at Blub U, where the only language allowed is Java. They don’t even know what a functional programming paradigm really implies in terms of programming practice — so of course they shouldn’t be expected to know the difference between << and +=. That would require a passion for learning, an education in something other than “industry best practices”, and some critical thinking skills.

using an object instead of an operator makes it much less likely to be used!

To clarify, in the .NET example you provided, you seem to be saying that having to build a class and instantiate an object where something more like a simple operator should suffice greatly reduces the likelihood that people will use the “better” means of achieving equivalent results. I agree — great, heaping masses of scaffold code used to prop up the simplest operations is just a bad idea in general. I know you’re not speaking of objects per se, however, because everything you see in Ruby that looks like an operator is in fact a message sent to one of its apparent “operands” — an object.

This is the same reason why people use regex’s in Perl constantly and never in OO language (and halfway between “constantly” and “never” in PHP, where it seems to be a function call), even though the regex pattern syntax itselx is identical… doing it via object is a hassle and doing it through an operator is a cinch.

There are exceptions to this. Some of the more dogmatically designed OO languages definitely seem to revel in the act of throwing hurdles in the way of simple, straightforward use of regexen, of course. On the other hand, object oriented regex syntax in Ruby is every bit as easy to use as Perl’s regex syntax, in my opinion. The fact that Java seems hell-bent on dissuading you from ever leveraging the power of regexen — and even Python forces library calls before regexen are made available — does support your point in general, though.

In any event, I feel that this is the exact type of thing that optimizing compilers are for.

Optimizing operation algorithms is certainly a job for compilers, VMs, and even interpreters, as much as possible. Still, there are cases where a distinct need for doing something one way or the other in a manner where a compiler is unlikely to be able to automagically guess your intent. In such cases, it can be handy to have options such as those presented by Ruby’s += and <<.

5 Comments

  1. And what is pretty cool here, is that you and I are in total agreement! :)

    An amazing example of what I am talking about, is Perl. Perl always has 500 ways of doing something, and only 10 you would have known about in advance until something you wrote that shouldn’t even parse suddenly starts generating proper output. Here is a great example:

    open INFILE, $filename;
    
    foreach $line (<INFILE>) {
        print $line . "\\n";
    }
    
    close INFILE;
    

    This is a totally insane piece of code. In no other language that I have used would this work. Yet magically it does. Perl says to itself, “well, ‘foreach’ is an interator, so I need something to iterate through, an array… but all I have here is a file handle. I know, let’s try reading the file into an array by splitting it into elements with the default record seperator as the split character, and then I will iterate over that!” And lo and behond, “Magic Happens” and the code runs! And it runs about half as fast as the right code, which everyone who read the Perl docs learned was the right way to do it.

    Indeed, the only reason to ever write that code (and the person who wrote that code freely admitted that this is how it came to be) is someone unfamiliar with Perl, but who knows other languages, stumbling through things just enough to conjure that piece of “Magic Happens” up. And once this person sees that “Magic Happens”, that code becomes part of the codebase and even worse, copy/pasted for 5 more years as that programmer works on other projects.

    So, that being said, while I appreciate have the power and flexibility that choices like “+= vs. less-than-less-than” provide, I think it really hinders the average code quality. Most of us learn the string concatenation, for example, and never bother to look further. Heck, it was YEARS after I learned the .Net Framework that I learned about the StringBuilder class. This was actually a great example of how “expertise” can cut both ways. In this case, I considered myself an “expert”. I found the string concatenation operator and felt happy with myself. Case closed. If I had walked into .Net as a naive beginner, I would have done a lot more reading and learning, instead of blindly stumbling through it, using whatever compiled and didn’t throw errors.

    As a result, it was not until I started doing relatively low-level .Net work on some pretty CPU/RAM intensive projects that I started learning about things like StringBuilder, implementing IDisposable, generics, and so on.

    The comments to the blog that started this highlight the problem perfectly. You did a straight Ruby-to-Python translation, which is precisely what anyone who knows one language well will do when they write code in a language they do not know, or know poorly. As a result, you wrote some Python that compiled and ran without error, but could have been much improved (although many of those optimizations that people presented changed the logic of the code a bit more than a simple “optimization” should, IMHO).

    J.Ja

    Comment by Justin James — 4 October 2007 @ 09:40

  2. Ack, WordPress chewed up my Perl code! It stripped the “less than” and “greater than” signs that surrounded “INFILE” on the “foreach” line.

    J.Ja

    Comment by Justin James — 4 October 2007 @ 09:41

    1. If you indent every line of code by four extra spaces, the WordPress Markdown plugin will automatically turn your code into monospaced font without markup interpolation. Note that nesting this inside of a numbered list breaks things a little bit, so to get the sort of results I did with the Ruby code below you have to jump through some hoops. I recommend keeping code outside of numbered lists for now, in case you were considering doing as I do in your next post, or something like that.

    2. For some asinine reason, backslashes need to be escaped with additional backslashes, as I just discovered — though maybe only when inside quotes. I don’t know, haven’t really fully tested it.

    3. I edited your code for you, so it should now look about right.

    4. In Ruby, I’d write the same code thusly:

      File.open(filename).each do |line|
        puts "#{line}\\n"
      end

    5. Actually, I’d probably do it with puts line instead of puts "#{line}\n", but I wanted to make sure the code produced exactly the same output yours did. I suspect the extra newline was a bug (due to not testing the code, shame on you), because you may have for a moment forgotten that when reading in lines from a file Perl keeps the newlines unless you explicitly choose to throw them away.

    6. I’d probably prefer the brace syntax rather than do...end syntax if the puts line was made that short. Thus, my ruby example should look more like this:

      File.open(filename).each {|line| puts line }

    7. You said “I think it really hinders the average code quality.” What I think really hinders average code quality isn’t that the language gives you options — it’s that people “learn” just enough Perl to get by writing crappy code, and don’t bother really learning the language. I wouldn’t blame the language for that, any more than I’d blame Perl for the fact that pretty much everything on Matt’s Script Archive would fail spectacularly if “use strict;” were added to the beginning of every script there. Bad programmers are not the fault of a language having options. After all, options that allow you to write bad code if you really want to often also allow you to write good code — better code than you could otherwise write in that language — if you know how.

    8. You said “You did a straight Ruby-to-Python translation, which is precisely what anyone who knows one language well will do when they write code in a language they do not know, or know poorly.” Actually, what I did was intentionally choose a code form that wasn’t idiomatic for either language. I didn’t write good Ruby, then translate into Python. I likewise didn’t write good Python and translate it into Ruby. I wrote generic iterative string concatenation code of the sort that would run in basically every mainstream language with no changes that would confuse anyone, and translated it into Ruby and Python. The idiomatic Ruby code would actually look more different from the examples I initially gave than idiomatic Python code, because of the joy of iterating with Ruby blocks.

    9. You said “many of those optimizations that people presented changed the logic of the code a bit more than a simple ‘optimization’ should, IMHO.” The reason those optimizations did so is that Python has no optimization that doesn’t change the algorithm enough to completely reframe the question. That was central to my point — you can’t optimize it effectively without destroying the example algorithm or going outside of the core Python language.

    Comment by apotheon — 5 October 2007 @ 01:58

  3. > “The reason those optimizations did so is that Python has no optimization that doesn’t change the algorithm enough to completely reframe the question.”

    If by “reframing the question”, you mean “removing the pointless progress counter”, then I guess we agree on that.

    There’s no reason to perform string concatenation for this problem. Like I said before in the comments, you can just do “s = sys.stdin.read()” and be done. I’m sure there is an equivalent statement for Ruby.

    Heavy string concatenation pays a performance penalty in any language to various degrees, and it’s an simple indication that you need to look at alternatives.

    Comment by Brandon Corfman — 10 October 2007 @ 07:08

  4. If by “reframing the question”, you mean “removing the pointless progress counter”, then I guess we agree on that.

    Pointless? What if the progress counter is the point of the code you’re adding to the program?

    Comment by apotheon — 11 October 2007 @ 10:12

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License