Chad Perrin: SOB

21 June 2006

Golfing between languages: Ruby vs. Perl

Filed under: Geek,Metalog — apotheon @ 12:33

Sterling, aka “Chip”, posted an interesting challenge/exercise to his weblog earlier this month. Because I’m a slacker, I didn’t get around to reading that weblog entry until last night. I do this sort of thing regularly: I “forget” to read much of my regular online reading material for a couple of weeks, then I get a hankering for stimulating text to read and get caught up on some stuff.

Sterling has recently been the featured blogger on iBLOGthere4iM, and he referred to me as one of sixteen favorite bloggers of his (out of a blogroll of about sixty, last I checked). That’s pretty good, considering he’s also one of my favorites. I might just keep a narrower focus in my blogroll here at SOB than he does, or perhaps I just don’t have as many online friends (hard to believe), but my blogroll is significantly smaller than his. That might be giving me an inflated view of my own importance, to be picked out as a favorite from that list.

Regardless, I suspect that we read each other’s material for some of the same reasons, including some of the contemplative ponderings on programming and programming languages. The man has a lot more experience with programming than I have, but I seem to be a little more willing to go wandering off down the garden path of wild speculations than he is when discussing programming concepts. Maybe that’s a sign of his wisdom, but he seems to enjoy reading it. He provided something more concrete than my usual stuff when he brought up his Ruby mailing list management script golf challenge, in any case.

He reports that he was inspired by a webpage listing a bunch of Ruby/shell one-liners to write a brief Ruby script to manage mailing list data that comes to him in some very nastily inconstant file formats. They’re apparently CNSSTSV files, where CNSSTSV stands for Comma or Newline or Semicolon or Space or Tab Separated Values. Sometimes with multiples of each. He managed to get it done in seven lines of code and suggests he might need to practice his Ruby a bit more so he can get it done in one line of code, then challenges others to see if they can golf it down to a smaller script in Ruby, or even Perl. I decided to take up the challenge when I saw this last night, so here we are.
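
To make that format concrete, here is a minimal Ruby sketch of what splitting one such line looks like. The sample line and the variable names are invented for illustration; only the separator class /[;,\s]/ comes from the scripts in this post. Because separators can sit next to each other, a plain split leaves empty strings behind, which is why some kind of "is there anything here?" filter is needed before the addresses are collected.

```ruby
# Hypothetical sample line mixing commas, semicolons, spaces, and a tab.
line = "alice@example.com,, bob@example.com;\tcarol@example.com  alice@example.com\n"

# Splitting on a single separator character leaves an empty string
# wherever two separators are adjacent.
raw = line.split(/[;,\s]/)

# Drop the empty fields, then remove duplicates and sort.
addresses = raw.reject(&:empty?).uniq.sort

puts addresses
```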

I wrote two versions of the script in Perl, one optimized for line count and one for word count, as measured by the wc utility on my Linux laptop. Note that for the following results, I same-lined opening braces on my looping structures. I normally give them their own line, but went with the same-line option (which tends to be more common among other programmers, anyway, the philistines) in part to reduce line count for silly things like block delimiters.

In the output from wc that follows, preceding each block of code catted from its file, the three numbers listed are line count, word count, and character count, respectively:

$ wc rbmail
  7  22 168 rbmail
$ cat rbmail
@addresses = {}
while $<.gets
  $_.split(/[;,\s]/).each do |addr|
    @addresses[addr] = addr if (addr =~ /\S/)
  end
end
@addresses.sort.each { |addr| puts(addr[0]) }

$ wc plmail-0
  6  16 113 plmail-0
$ cat plmail-0
while (<>) {
  foreach (split /[;,\s]/) {
    $address{$_} = "$_\n" if /\S/;
  }
}
print(sort values(%address));

$ wc plmail-1
  5  18 117 plmail-1
$ cat plmail-1
$in .= $_ while <>;
foreach (split /[;,\s]/, $in) {
  $address{$_} = "$_\n" if /\S/;
}
print(sort values(%address));
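
For anyone who wants to check numbers like these without wc handy, the three figures can be approximated in a few lines of Ruby. This is a rough sketch rather than a description of how wc works internally: wc_counts is a hypothetical helper, and it assumes plain ASCII text in which every line ends with a newline. Fed the seven-line Ruby script, it should agree with the wc output shown for rbmail.

```ruby
# Hypothetical helper: counts newlines, whitespace-separated words, and
# bytes, which matches wc's default output for plain ASCII text.
def wc_counts(text)
  [text.count("\n"), text.split.size, text.bytesize]
end

# The seven-line Ruby script, quoted verbatim. The single-quoted heredoc
# keeps the backslashes exactly as written.
rbmail = <<'SRC'
@addresses = {}
while $<.gets
  $_.split(/[;,\s]/).each do |addr|
    @addresses[addr] = addr if (addr =~ /\S/)
  end
end
@addresses.sort.each { |addr| puts(addr[0]) }
SRC

lines, words, chars = wc_counts(rbmail)
puts "#{lines} #{words} #{chars}"   # prints "7 22 168"
```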

I'm pretty sure I could have done it in a single "line" of code (of reasonable length), if I were going to mix Perl with bash at the CLI as was done on the page of Ruby one-liners he indicated in his weblog entry, but he seems to be doing all his Ruby coding on a crippled OS that doesn't include a good shell like bash, so I decided to make the code cross-platform for him. The Ruby code might be pared down to a (slightly longer) single line when mixed with bash, too, but I haven't tried -- and, frankly, I don't know Ruby as well as he does, despite the fact I think my gushing about it was one of the reasons he decided to give the language a whirl.

If you have ideas on how to golf this further, please contribute. I've tried combining the print statement in the Perl scripts into the while() and foreach() loops, but always ended up running up against a problem (in some cases, the problem being that it actually increased line, word, and character counts). If you have any hints on how to do this without significant increases in code, I can probably use shebang line options to reduce the code "size" by making the while() and, perhaps, foreach() loop structures unnecessary.

Looking at the Ruby script, I don't immediately see a way to make it shorter without adopting habits so bad I hesitate to use them in golfing something that's actually going to be in use, aside from shortening the addresses variable name to address. That, of course, would reduce the character count by six (two characters at each of the name's three occurrences), but otherwise leave the length of the script untouched. Not really worth the effort of editing, all things considered, especially since I tend to consider character count a poor judge of language succinctness except where things like language keywords and function (or subroutine or procedure or method) names are concerned.
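
The six-character figure is easy to confirm mechanically. The sketch below is only an illustration of the arithmetic: script is the Ruby script pasted into a string purely for counting, and the rename is done with a plain text substitution rather than by editing the real file.

```ruby
# The Ruby script as a string, quoted verbatim for counting purposes.
script = <<'SRC'
@addresses = {}
while $<.gets
  $_.split(/[;,\s]/).each do |addr|
    @addresses[addr] = addr if (addr =~ /\S/)
  end
end
@addresses.sort.each { |addr| puts(addr[0]) }
SRC

# Rename the variable everywhere it appears.
shorter = script.gsub("@addresses", "@address")

saved = script.length - shorter.length
same_lines = script.lines.size == shorter.lines.size
same_words = script.split.size == shorter.split.size

puts "characters saved: #{saved}"          # prints "characters saved: 6"
puts "line count unchanged: #{same_lines}" # prints "line count unchanged: true"
puts "word count unchanged: #{same_words}" # prints "word count unchanged: true"
```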

NOTE for the kiddies: Do as I say, not as I do. Don't ever write production Perl code without using the strict and warnings pragmas. I actually used those when I wrote these scripts, then deleted them and the lexical variable declarations. I did that, of course, to reduce script length, since the code works the same either way -- but it's a bad habit, so don't do it except when playing golf.


  1. looks like your blogware took out some backslashes; we’ll see if it does the same for me. i’d definitely do this at the command line in perl… the uniqueness requirement makes it a bit tricky to make it down to a single line. here’s my first attempt:

    perl -wle 'do { $addr{$_}++ for split /[;,\s]+/ } while <>; print for sort keys %addr' [list of files]

    that’s actually reasonably efficient because it doesn’t slurp the whole file(s) in at once, though the output is probably still constructing a huge list before printing it. if you want an actual unbroken statement, this should work on files of reasonable size:

    perl -le 'print for grep { !$seen{$_}++ } sort map { split /[;,\s]+/ } <>' [list of files]

    i don’t think there’s ever been an easier language than perl for one-liner text manipulation :-)

    Comment by sosiouxme — 21 June 2006 @ 05:14

  2. minor refinements: since these are emails, case shouldn’t matter so might as well lowercase them to remove case-sensitive duplicates. also grepped out empty string and delayed sorting to post-grep.

    perl -le 'print for sort grep { /\S/ && !$seen{$_}++ } map { split /[;,\s]+/, lc } <>' [files]

    yeah, i’m a perl fool!

    Comment by sosiouxme — 21 June 2006 @ 05:43

  3. […] Recent Comments sterling on Maybe it isn’t the mark of the beast, but…Zach on Maybe it isn’t the mark of the beast, but…SOB: Scion Of Backronymics » Golfing between languages: Ruby vs. Perl on Ruby does mailing listssterling on Favorite programming projectJustin James on Favorite programming project […]

    Pingback by These are a few of my favorite blogs, part II -- Chip’s Quips — 22 June 2006 @ 11:56

  4. Thanks for both the one-liners and the heads-up about the missing backslashes.

    The code tags in WordPress don’t maintain indentation, and as you’ve pointed out the pre tags apparently don’t prevent WordPress from parsing out certain characters, so I ended up having to combine the two. I’m not too keen on the italics for code tags: I may have to edit the code that handles that later, so that I get a more teletype-like look.

    I should get back to learning Logo, and get far enough to be able to do something like this in that language as well. I’m pretty sure it’ll be a bit more complex, since I don’t think Logo has native regex support (alas).

    Comment by apotheon — 22 June 2006 @ 01:44

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License