Chad Perrin: SOB

30 April 2008

recursion confessional

Filed under: Geek,Profession — apotheon @ 11:45

I have taken a couple of college classes in programming. I remember conversations after class with the instructor from my Programming 101 (not sure that’s what it was actually called) course, where we strayed from C/C++ (the implementation language(s) for the class) to matters related to COBOL and Perl, among other things. I also remember being frustrated to discover, while taking a class in JavaScript, that I learned programming more slowly in a formal education environment than on my own.

Otherwise, my programming education has basically all been self-directed. As a result, I tend to learn the stuff for which I have a real use, and that’s about it. Oh, sure, I “learn” some things as an academic exercise, out of curiosity and fascination with how they work, but if I don’t have a real-world application for what I’m “learning”, I don’t really learn it in a way that remains useful when a real-world application of the concept does rear its head. It’s only then that I actually learn it in a practical, applicable manner.

I got to thinking about this in part because I reread Dan Kegel’s How To Get Hired — What CS Students Need to Know. In it, he identifies three basic programming skills that many programming job applicants (surprisingly) lack:

  1. looping (e.g. “Write a loop that counts from 1 to 10.”)

  2. doing basic arithmetic in your head (e.g. “What’s the number after F in hexadecimal?”)

  3. practical recursion (e.g. “Use recursion to solve a real problem.”)

The reason this got me thinking about my (lack of) “formal” education in programming is point three — practical recursion. See, I’ve never really had to use recursion in the real world, and it sure didn’t come up in introductory courses in C, C++, and JavaScript. The obvious use case for recursion is parsing arbitrary-depth hierarchical trees, such as directory structures — but the vast majority of my “real world” work has been in Perl, PHP, and Ruby, all of which make implementing the actual tree-parsing algorithm an exceedingly rare necessity for common cases. In such languages, the recursion is done for you. Ruby compounds the “problem” by providing such excellent iterators that even in the rare case, you can put together an easier, more elegant-looking solution with an iterator block than with recursion anyway (and it punishes deep recursion by not performing tail-call optimization, at least as recently as version 1.8).
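
To make that concrete, here’s roughly what the directory-tree case looks like in Ruby: a hand-rolled recursive walk next to the standard library doing much the same job with no explicit recursion in sight. (A minimal sketch of my own; the starting path is just a placeholder.)
  #!/usr/bin/env ruby

  # Hand-rolled recursion: walk a directory tree of arbitrary depth,
  # printing every path it finds.
  def walk( dir )
    Dir.entries( dir ).each do |entry|
      next if entry == '.' || entry == '..'
      path = File.join( dir, entry )
      puts path
      walk( path ) if File.directory?( path )    # recurse into subdirectories
    end
  end

  walk( '/tmp' )

  # Much the same job with the recursion done for you by the standard library.
  require 'find'
  Find.find( '/tmp' ) { |path| puts path }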

Of course, the first time I had a real-world problem to solve where recursion was the obvious best choice, I’d surely learn it sufficiently to solve that little deficiency in my skillset, but this still offers two problems:

  1. Being unfamiliar with recursion outside of gratuitous examples like Fibonacci sequence generators (a sketch of that sort of toy example follows this list), I may well not notice when a recursive solution is marginally better than the iterative solution I’m using instead. It would take a really obvious (to me) need for recursion for me to notice I needed it, I suspect.

  2. The ability to learn concepts quickly and apply them well while learning them doesn’t prepare me for the circumstance posited in Dan Kegel’s essay: job interviews. In other words, he probably wouldn’t hire me.
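
For what it’s worth, this is the sort of toy example I mean: the classic naive recursive Fibonacci, sketched in Ruby. It’s correct, but it looks nothing like the recursion a real-world problem actually demands.
  # The canonical "gratuitous" recursion example: naive Fibonacci.
  def fib( n )
    return n if n < 2
    fib( n - 1 ) + fib( n - 2 )
  end

  puts fib( 10 )    # => 55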

Every once in a while, I find myself kicking around the idea of getting hired on at a “normal” programming job. One of the things that holds me back (aside from all the other things, like my enjoyment of rolling out of bed at 10:30 instead of jumping up at 6 AM, choosing the projects I work on rather than toeing a corporate line, et cetera) is the fact that, at any place I’d want to work, in an environment with smart programmers doing interesting things, I’d probably lack some of the “signifier” skills interviewers would expect of me. The very best places would be those where someone would think to ask “Why don’t you know how to do this?” rather than just chalking up my ignorance to incompetence — thus perhaps allowing me the chance to get in anyway — but that’s such a rare case that I’m unlikely to find a hiring dev shop like that. It’s difficult enough finding a place that doesn’t have arbitrary, pointless requirements like “must have Master’s degree” for “entry-level programmer” jobs.

I’m probably a prime example of the sort of candidate that people who value certain levels of “higher education” in “computer science” (note that the quotes around CompSci are meant to denigrate the quality of most CS degree programs, not the concept of CS itself) like to point to as proof of how right they are to ignore matters like intelligence, passion, adaptability, integrity, and other actually useful characteristics, preferring sheepskin as the Metric Most High for a new hire. This isn’t because I’d be bad at the job, even if it required constant use of advanced recursive techniques. It might take me a month to spool up to the same speed on recursion as everyone else there, sure — but the initial investment would be worth it to get me rather than someone with a Master’s in CompSci who’s dumb, plodding, rigid, and prone to doing everything for CYA value rather than for the success of the project.

None of that changes the fact that, without the standard CS degree program course of study, I’m at a distinct disadvantage for certain types of interview skills-test questions.

At least I know that the next number after F is 10, though.
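
(For anyone who’d rather check that in code than in their head, a quick Ruby sketch:)
  # F in hexadecimal is fifteen; the number after it is written "10",
  # and its value is sixteen.
  after_f = 0xF + 1
  puts after_f               # => 16
  puts after_f.to_s( 16 )    # => "10"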

29 April 2008

wordpress.com != good domain hosting

Filed under: Geek,Metalog — apotheon @ 02:04

In required cookies == bad design, I pointed out that requiring cookies just for people to view free content is a bad idea. When denying cookies means you can’t view the Website in question, a lot of people simply aren’t going to view that Website — including me, much of the time.

I gave this advice to all and sundry Website maintainers:

Maybe you should actually visit your own site on a computer you haven’t used to do so before, and deny all cookies for the site. See what happens.

If you have registered a domain name and point it at a wordpress.com Weblog account, I can already tell you what happens on some systems: it enters an infinite loop during page load. The page stays white, and the browser just starts over the page loading process repeatedly, without end in sight. I haven’t let it run long enough to see if it eventually crashes the browser (I suspect it doesn’t, but I suppose there’s a possibility, depending on browser memory management).

So . . . if you’re smart about how you manage your online life, you won’t use wordpress.com for domain hosting. Interestingly, I’ve discovered that http://cameron.blogs.foxnews.com uses wordpress.com — which might say something about the people managing blogs.foxnews.com.

I looked into what it takes for an average, non-professional Weblog user to get domain hosting at wordpress.com (I figured immediately that this probably costs money). What I discovered at the premium features page for Domain Registration and Mapping is:

If the domain isn’t registered, it will ask you for some information and then register it for you, and add domain mapping, for $15/yr.

If you already own a domain you registered somewhere else, you can map it for $10/yr.

I personally recommend against ever doing both domain registration and Webhosting through the same company. It’s best to hedge your bets (as I discovered from personal experience years ago, when a Webhost/registrar I was using basically held my domain name hostage when I wanted to switch Webhosts to someone that provided better service for less money). Unless you have a really unscrupulous registrar like GoDaddy, or something equally dodgy, your most likely problem in outsourcing domain hosting is the Webhosting service provider — and if you registered your domain through someone else, it’s trivial to switch service providers for Webhosting. If not, things might get messy.

Setting up a WordPress Weblog at my Webhost is as easy as doing so at wordpress.com, plus I have more control over the form of the Weblog, and I get to do a whole lot else with the Webhosting account too (including SSL/TLS access to POP, IMAP, and Web email accounts). Most relevantly, when someone denies cookies while visiting SOB, it doesn’t enter an infinite refresh loop.

notes:

Testing on multiple platforms shows that the infinite refresh loop manifests on some systems, and not on others. I haven’t investigated enough to know why that is. So far, it has shown up on an MS Windows Vista machine and a FreeBSD machine with Firefox, but not on MS Windows XP with Firefox or IE, nor on Debian with Iceweasel (rebranded Firefox).

With IE on Vista, refusing cookies one at a time, the page eventually loads after denying the cookie about 20 times. With Opera on Vista, denying all cookies immediately, it loads. With Safari on Vista, it eventually loads (apparently all you can do for cookie handling is either deny all or accept all — not a good set of options for the security-conscious).

It has only been tested on one instance of each system configuration (IE+Vista, IE+XP, Fx+Vista, Fx+XP, Fx+FreeBSD, Iceweasel+Debian, Opera+Vista, Safari+Vista), so your mileage may vary.

re: email . . .

Looking at what wordpress.com offers for email addresses when hosting your domain name at a wordpress.com account, I see that the email options they offer (through Google Apps) are extremely limited. For instance, you don’t get the email addresses at all if you only map a subdomain (like sob.apotheon.org instead of apotheon.org itself) to the wordpress.com account. Hosting my own email also means I don’t have to let Google index all of it for search engine and targeted marketing purposes.

This little tidbit from wordpress.com’s Email and a blog on the same domain page is interesting too:

Custom urls are not supported.

So, yeah . . . just don’t do it. If you want a Weblog with your own domain (or subdomain!), don’t use wordpress.com.

If they fix the infinite refresh loop problem, it might start being worthwhile, depending on your budget, or if the only people you intend to access the site are a very few known visitors you can be sure will always accept cookies there — but otherwise, it’s just a bad idea.

27 April 2008

real closures

Filed under: Geek — apotheon @ 07:46

I haven’t talked about lexical closures in a while. There was a brief time, months back, when it was almost all I talked about at SOB — but then I went for long periods without touching the subject. I feel inspired to discuss it again. One of the reasons I feel so inspired is having read Closures in Ruby and Python at the weblog “A Ship at Port”. The author, “Matt”, expresses his disappointment in the lack of a clear explanation for why he should give a damn about the existence (or nonexistence) of proper lexical closures in any given language. I hope to provide an explanation that helps any of my readers with the same question understand, in basic terms at least, some of the practical value of closures.

What’s a “lexical closure”?

For those who aren’t entirely clear on the concept, a lexical closure (or “closure” for short) is a first-class function that has an enclosing lexical scope “closed” around it.

There are a number of implications of this state of affairs in a lexical closure — many of which are often included in definitions of closures, even though they are really consequences of lexically closing a function rather than defining characteristics. Among them:

  • A first-class function is a function that is treated, in computer science terms, as a “first-class object” — also known as a “value”, “entity”, or “citizen”. Such functions can be assigned to variables, passed around, operated on, evaluated, and otherwise manipulated exactly like data. “First-class objects” have occasionally been defined as entities that belong to a “first-class data type”, in fact — which means that first-class functions are functions that belong to a first-class data type as well. Generally, first-class functions only exist in languages that either do not differentiate between code and data (e.g. Lisp) or provide some mechanism for turning data into executable code and vice versa (e.g. Java, though it’s kind of an unnecessarily painful process in that language, last I checked, and some point to flaws in the resulting Java object that they say disqualify it as a true first-class function).

  • Lexical scoping, generally speaking, is a form of static scoping (determined at “compile time”) where a given scope has access to whatever’s in its enclosing scope, but the opposite is not true. Thus, if you have a scope containing variable food nested inside an enclosing scope that contains variable bard, the nested scope can also access variable bard, but the enclosing scope cannot access variable food — because in the enclosing scope, food is out of scope.

  • A scope is “closed” when whatever it contains “goes out of scope”. For instance, in a lexically scoped language a function has its own scope — and when that function ends, it “goes out of scope”. This means that any variables created within that scope cease to exist, their place in memory freed up and any data they carried “forgotten” by the program. The usual way to keep data from within a function that has ended, closing its scope, is to generate a return value that contains that data — which is, in fact, simply data and no longer part of that (now closed) scope.

  • A common way to create a first-class function is to use a normal, named function, and generate a return value that consists of an unnamed function that can be assigned to a variable or otherwise operated on like any other data. Doing this all by itself does not really create a closure, though, because it doesn’t have an enclosing scope “closed” around it — it just creates a first-class function (a function as data, essentially), indistinguishable in practice from a function created as a lambda without the aid of an enclosing function. The lambda method from Ruby’s Kernel module can be used for this purpose, for instance.

  • To close lexical scope around a first-class function in the example of creating it inside another function, you need to actually have some connection between the generated first-class function and the enclosing scope. Thus, if there’s a bard variable in the enclosing function’s scope, and the first-class function makes use of that variable somehow, the enclosing scope is “closed around” the first-class function (creating a closure). Otherwise, the enclosing scope simply closes, and that’s it: it’s gone. When a scope closes on nothing but itself, it evaporates; when it closes on a first-class function, however, it becomes a closure — a closed, but persistent, scope, with persistent state.

  • Protection, in computer science terms, is usually applied to concepts of object oriented programming. Encapsulation is a related term. Encapsulation basically means that a chunk of code is encapsulated as a single, (semi-)atomic entity by a surrounding API, so that the only way to access it is via that API. Object oriented languages achieve encapsulation to varying degrees of success. Protection is to a datum as encapsulation is to a chunk of code — it ensures that a given set of data is protected against access from outside its parent entity (i.e. an object) except via a specific API. As with encapsulation, object oriented languages achieve protection to varying degrees of success. In languages where code and data are not strictly separated from one another conceptually (e.g. Lisp), the distinction can get a little fuzzy.

  • A proper lexical closure effectively provides “perfect” protection. Because the enclosing scope of a closure has gone out of scope, it can no longer be directly accessed (in theory), leaving calls to the closure function’s API as the only means of operating on the data from that scope. If the closure function’s API does not provide any means of accessing some of that data, you simply cannot access it. There are, in practice, exceptions to this (of course) — such as in the case of pointers, references, and other ways to “get around” the closing of scope, but these involve basically reaching directly into the memory used by the program and twiddling bits in some way. That’s a messy, dangerous, and usually discouraged means of working on data. Unless you’re using such access to implement programming tricks like (for instance) objects, first-class functions, linked lists, and closures built from references in Perl, that kind of direct access to stored data is generally a sign you’re doing something wrong — like creating memory violation conditions as some kind of dirty hack rather than coding something up properly. (A brief sketch of the first-class-function/closure distinction, and of the protection that closing provides, follows this list.)
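
Here’s a minimal Ruby sketch of my own illustrating two of those points: a bare first-class function with nothing closed around it, and a closure whose captured data can only be reached through the function’s own API. (The names are placeholders, not anything canonical.)
  # A bare lambda: a first-class function, but no enclosing scope is
  # closed around it.
  shout = lambda { |word| word.upcase }
  puts shout.call( 'hi' )          # => HI

  # A closure: once make_vault returns, its parameter `secret` survives
  # only inside the returned lambda, and calling that lambda is the only
  # way left to get at it. That's the "perfect" protection described above.
  def make_vault( secret )
    lambda { |guess| guess == secret ? 'correct' : 'wrong' }
  end

  vault = make_vault( 'sekrit' )
  puts vault.call( 'password' )    # => wrong
  puts vault.call( 'sekrit' )      # => correct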

Why should I care about closures?

The canonical, “Hello World” style of closure example is an incrementor — something that, every time it’s called, produces a number greater than the previous number. I’ll show examples of how to generate incrementor closures that are configurable in two ways; the closure generator will take two arguments to achieve these configurations. The first is a starting number. The second is an incrementing quantity. Thus, you could generate an incrementor that starts at a number of your choosing and, when called, produces a number incremented by a value you chose as well. My examples should work fine on most Unix/Linux system configurations, depending on your execution path configuration, et cetera.

First, I’ll present an example that shouldn’t really work anywhere — the most basic incrementor closure example, presented in non-executable pseudocode. In this example, there are no arguments sent to the generator, and the resulting closure is non-configurable. It just starts at zero, and prints out a value incremented by one each time it is called.

Pseudocode:
  function generate {
    base = 0
    return lambda { return base += 1 }
  }

  incrementor = generate
  execute incrementor
  execute incrementor

Incrementor closure in Perl (where the my keyword creates a variable with lexical scope):
  #!/usr/bin/perl -l

  sub generate {
    my $base = $_[0];
    my $interval = $_[1];

    sub { $base += $interval }
  }

  $incrementor = generate( 5, 2 );
  print $incrementor->();
  print $incrementor->();

Incrementor closure in Ruby:
  #!/usr/bin/env ruby

  def generate( base, interval )
    lambda { base += interval }
  end

  incrementor = generate( 5, 2 )
  puts incrementor.call
  puts incrementor.call

The output of both of these looks like so:
  7
  9

You may ask why you can’t just create a function to do the above for you without all this mucking about with abstract concepts like first-class functions and closed lexical scope. For instance . . .

Perl example:
  #!/usr/bin/perl -l

  sub generate {
    $base = $_[0];
    $interval = $_[1];
  }

  sub incrementor {
    $base += $interval;
  }

  generate( 5, 2 );
  print incrementor();
  print incrementor();

. . . or in Ruby (where the dollar sign on a variable name gives the variable global scope):
  #!/usr/bin/env ruby

  def generate( base, interval )
    $base = base
    $interval = interval
  end

  def incrementor
    $base += $interval
  end

  generate( 5, 2 )
  puts incrementor()
  puts incrementor()

Both of these will produce exactly the same results, run as-is, as the previous examples. They ignore certain functionality that may become necessary in more complex programs, however. For instance, each of them provides exactly one incrementor function, and every time you need an incrementor that starts at a new number you must reset the values of $base and $interval. If you then want to go back to an earlier incrementor configuration, because you want to increment two things differently, you must manually restore the earlier values where you left off with them. Check out examples in Perl and Ruby.

With the closure example, however, you can simply create new incrementors for new configuration values, and use the old incrementors again when you need them, without so much duplication of code and data — and without the attendant maintenance nightmare. Check out examples for this state of affairs in Perl and Ruby.
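
As a rough sketch of that point (repeating the generate method from the Ruby closure example above), two incrementors with different configurations can coexist without stepping on each other:
  def generate( base, interval )
    lambda { base += interval }
  end

  # Each closure carries its own base and interval, so neither clobbers
  # the other.
  by_twos = generate( 5, 2 )
  by_tens = generate( 100, 10 )

  puts by_twos.call    # => 7
  puts by_tens.call    # => 110
  puts by_twos.call    # => 9 (unaffected by the other incrementor)
  puts by_tens.call    # => 120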

Is every single Ruby block really a closure?

So far, I’ve explained closures and provided examples not only of how to make a closure but of how their properties may be of some use to simplify code and improve maintainability. Last but not least, I’ll tackle another subject that helped inspire this post.

Some time ago, discussion of closures on the ruby-talk mailing list alerted me to a bizarre cultural characteristic of the Ruby community: it is the semi-official position of the most vocal members of the community on the subject that every single “block” in the Ruby language is a closure. Keep in mind, as I discuss this, that in Ruby the word “block” has a very particular, specific meaning, different from the casual way one might use the term in other languages to refer to a chunk of code within its own scope or routine.

The problem with the notion that a Ruby block is automatically and always a closure is that a block can be created that only exists as long as its parent scope — or even goes out of scope before its enclosing lexical scope does. In fact, a block could conceivably be created whose sole enclosing scope is that of the program itself. Obviously, if the program ends (thus going out of scope), the block does too.

This means that the lexical scope within which the block exists may never be closed around that block. The block is essentially a first-class object, satisfying one requirement, but the closing of the enclosing lexical scope around the block may never happen. This, it seems obvious to me, should disqualify some blocks from being closures — which means the statement that all blocks in Ruby are closures is false.
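
To put the distinction in code form (a minimal sketch of my own, not anything from the mailing list discussion):
  # A block handed to an iterator: it runs while its enclosing scope is
  # still alive and is never referenced again, so nothing ever actually
  # closes around it.
  def sum_list( list )
    total = 0
    list.each { |item| total += item }
    total
  end

  puts sum_list( [1, 2, 3] )    # => 6

  # A lambda that escapes its defining method: the method has returned,
  # yet `total` lives on inside the returned function. This is the case
  # where a scope really does close around the code.
  def make_accumulator
    total = 0
    lambda { |item| total += item }
  end

  accumulate = make_accumulator
  puts accumulate.call( 1 )    # => 1
  puts accumulate.call( 2 )    # => 3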

The rejoinder suggests that the lexical context of the block does not need to go out of scope for the block to be a closure. All that is needed, according to people who hold to this notion, is that the block needs to have access to that lexical context. It doesn’t even need to make use of that context at all — doesn’t have to actually access the enclosing scope, but only be allowed access.

There’s one small problem with that:

That’s a defining characteristic of lexical scope.

In other words, by that definition of a closure, anything that contains a lexical scope and exists within an enclosing lexical scope is automatically a closure. This, of course, would mean that any language that provides lexical scoping allows closures.

I don’t know about you, but I find this notion absurd. It’s poppycock. Closures consist of more than just a scope within a scope. WTF meaningful existence can a closure have if it’s nothing but a scope within a scope? Why would anyone have bothered to refer to it by the name “closure”? Where, for that matter, would the term “closure” have arisen if you never had to close a scope to get a closure?

. . . and why would everyone have been lying about Scheme being the first formal existence of a proper closure for all this time? Lexical scoping certainly existed before Scheme hit the scene.

What else is there?

For a previous discussion of lexical closures, it might be worthwhile to check out my SOB entry of August 2006 titled Linguistic Elegance Test: Closures and Lists. This includes some links to explanations and definitions of closures, too.

Any questions?

(dedicated to Sterling)

edit: It suddenly occurs to me that maybe some of the members of the Ruby community fail to differentiate between closures and callbacks. Most, if not all, Ruby “blocks” are effectively callbacks. They do not fit the definition of a closure, but the difference may be subtle and easily missed, depending on one’s perspective.


All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License