Chad Perrin: SOB

23 August 2006

Linguistic Elegance Test: Closures and Lists

Filed under: Cognition,Geek — apotheon @ 04:05

I have talked about lexical closures a fair bit here at SOB. They factor heavily into what I’m about to discuss, so I’ll try defining them for those of you who might be unclear on the concept. This assumes some basic understanding of programming concepts, however: my target with this is programmers, after all.

If you visit Wikipedia at the Closure disambiguation page, you’ll find this definition:

an abstraction binding a function to its scope

From there, clicking on the link for Closure (computer science) takes you to a page whose first line of text defines closures thusly:

In programming languages, a closure is a function that refers to free variables in its lexical context.

It then goes on to say this:

A closure typically comes about when one function is declared entirely within the body of another, and the inner function refers to local variables of the outer function. At run time, when the outer function executes, a closure is formed. It consists of the inner function’s code and references to any variables in the outer function’s scope that the closure needs.

Closures are commonly used in functional programming to defer calculation, to hide state, and as arguments to higher-order functions.

A closure combines the code of a function with a special lexical environment bound to that function (scope). Closure lexical variables differ from global variables in that they do not occupy the global variable namespace. They differ from object oriented member variables in that they are bound to function invocations, not object instances.

The thoroughly excellent book Learning Perl Objects, References, and Modules, published in later editions under the name Intermediate Perl, offers the following explanations:

The kind of subroutine that can access all lexical variables that existed at the time it was declared is called a closure (a term borrowed from the world of mathematics).
Closures are “closed” only on lexical variables, since lexical variables eventually go out of scope. Because a package variables (which is global) never goes out of scope, a closure never closes on a package variable. All subroutines refer to the same single instance of the global variable.

In short, then, a lexical closure is (in my own words):

a block of code that can be passed around as a first-class function that contains data that achieves protection and encapsulation because it is assigned to a lexical variable in the closure’s enclosing scope, and has subsequently gone out of scope, though the closure persists

I could probably make that prettier, with some effort, but there it is.

A lexical variable, in case you’re not aware, is a variable with static local scope. That is to say that its scope is definable at compile time and is local to its nearest enclosing scope. In Perl, a variable is lexical if it is declared with my(). In Ruby and UCBLogo, all new variables are lexically scoped by default. Python does not provide proper lexical scoping, so far as I am aware, as its local variables are not fully accessible within scopes enclosed by the variable’s scope (they can be read, but not written to, effectively changing them from variables to constants for enclosed scopes). This limitation can apparently be short-circuited for single array elements, but this behavior looks more like a bug of the language design than a feature.

Lexical closures (also known simply as “closures” for short) require two things:

  1. proper lexical scoping
  2. “anonymous” blocks of code that are first-class functions

Lexical scoping provides a number of benefits to programming languages that implement it. As time passes, more and more often the languages we encounter implement lexical scoping. Some languages, such as Perl, predate the common recognition of the benefits of lexical scoping, and incorporate it into their design later: this applies to Lisp, as well. Other languages have incorporated lexical scoping from day one, and in the future I expect this number to grow prodigiously, as it is likely that any language designer worth a damn will be using lexical scope for years to come (at least until someone invents or discovers something even better).

Lexical scope contributes to cleaner code, reduces the likelihood of scope-related bugs, and because it is defined at compile time, runtime execution for a program with only lexically scoped variables is faster. More benefits than these apply, of course, but I’m sure you can learn more about the subject on your own.

The power and flexibility of functions that can be passed around as parameters for other functions should be fairly self-evident to even a thoroughly mediocre programmer, whether that programmer can immediately envision instances where it is directly useful or not.

By combining these two characteristics of a language — lexical scope and first-class functions — to create closures, we introduce protection and encapsulation. These are characteristics of data bound to code whose value are quite central to the entire object oriented programming paradigm.

I have recently addressed the matter of elegance in programming here at SOB. Within that discussion, and in a follow-up comment to it, I explained and demonstrated the varied potential for elegance as I defined it within the designs of various programming languages. My initial, very simple, demonstration involved merely providing a series of “hello world” programs in a number of different languages so that the amount of scaffolding cruft necessary to achieve this simplest of operations — outputting a string — could be compared between languages. It was intentionally an extremely simple example, as the point was to demonstrate the lack of elegance necessary to achieve such simple results in some languages. This is, I believe, a very salient and useful means of comparing the elegance of code a given set of languages provide for their users.

It is not, however, the only salient and useful comparison, and is by no means a comprehensive metric for judging the potential elegance within a language. In recognition of this, Colin Alston in his karnaugh.za.net weblog suggested that list processing might provide some useful data in a comparison between languages. He went further than merely suggesting it, however, and went on to provide such a demonstration in his weblog entry Elegance: a function of consistency.

(Note: Much of the following may seem confusing due to the fact that the “consistency” essay has been moved. As part of the move, it seems that all comments on the original weblog post there have been lost, thus eliminating much of the context for the following.)

Colin’s heart seems to rest with Python, one of a series of languages that I tend to hold in high esteem in general. Its closest relatives, it seems to me, are Perl and Ruby: a little further afield are the various Lisps. Python imposes some arbitrary limitations on the programmer, however, such as limiting anonymous first-class functions (called “lambdas” in Python, probably as a direct homage to the use of the term in Lisp, despite their limitations in Python) to a single semantic line of code. Another apparently arbitrary limitation is its restriction of scope (as mentioned above) so that variables that might otherwise be lexically scoped are read-only constants from the perspective of enclosed scopes.

This aside, I was surprised to see the length of the Python example provided as a solution to Colin’s list-processing comparison problem. It was not terribly long, of course — much shorter than C, C++, and Java examples — but several syntactic elements longer than I expected. I rather suspect that, like the Ruby example provided by one of his readers, it can be shortened somewhat. The Python example provided contains nine distinct syntactic elements (where the list used to populate the array is taken as a single syntactic element).

The Ruby example provided by Colin’s reader “jerith”, meanwhile, contains eight distinct syntactic elements, but I have shortened it to a mere five elements. Surprisingly, Perl outperforms even Ruby handily: Colin’s reader Jonathan McKeown provided both a seven-element example (beating the previous Python and Ruby examples) and a four-element example (beating my own golfed Ruby example). One might make a case for each of the Perl examples from Jonathan McKeown containing an extra syntactic element in the form of qw// used to define the list, but this is used as a tool to make the code easier to read and write, and is not a necessity of the code, in addition to which it is merely syntactic sugar for quote punctuation in the list definition — and, even if it is counted as a distinct syntactic element, it only brings the Perl examples into a tie with Ruby.

My own Perl example, provided later in the discussion, further golfed the Perl down to three syntactic elements. That’s one third the number of distinct syntactic elements used in the Python example. Thus, my surprise.

I have here a third suggestion for a point of comparison between languages. In this comparison, I use a lexical closure that operates upon a list and returns the results of that operation. The results are output to STDOUT (generally, the command line). The list itself is defined as arguments to the otherwise noninteractive program, thus allowing the program to behave as a filter utility. Each element of the modified list is printed on a separate line in the program’s output. The input and output of the program should look like this:

# programname arg1 arg2 arg3
arg3
arg2
arg1

The number of arguments the program can take should be arbitrary. As one might guess by the above example, the list processing operation I’ve decided on is a simple reversal of the elements in the list. I intentionally, as with the “hello world” example, deliberately chose a task that should be simple enough for a programmer to understand, at least in general terms, what is going on in examples even from languages with which (s)he is not familiar — though in this case there is some general programming knowledge that is not universal (lexical scoping, first-class functions, and the closure that results from these) in use; hopefully any shortfall in required knowledge has been addressed here.

I hope that quite a few of you will contribute examples of how these requirements for a demonstration of linguistic potential for elegance. In the meantime, I will start things off with two examples of my own.

First, Perl:

#!/usr/bin/perl -l

sub foo { my @list = @_; sub { reverse @list }; }

$bar = foo(@ARGV); print for $bar->();

Unlike Colin’s example, one need not agonize over the number of distinct syntactic elements at all in this case, but we are left with another potential point of contention: the -l argument in the shebang line. My take on it is that this is a directive to the interpreter, effectively redefining a particular core function of the language, and not a syntactic element. That being the case, this Perl example contains sixteen distinct syntactic elements, by my count. The closure generator itself contains nine, the function call that actually creates the closure to be used contains four, and the closure call statement where its results are printed to STDOUT contains three.

The closure generator suffers significantly from Perl’s heritage as a language without lexical scoping in earlier versions, and from the fact that in the addition of lexical scope, it was not made the default scope used for new variables. As a result, the @list array must be explicitly declared and assigned for it to have lexical scope, adding three otherwise unnecessary distinct syntactic elements. Similarly, the closure call and print statement contains a syntactic element that is not necessary in some other languages — the for loop. This is only necessary because of the requirement for printing each list element on a separate line, however. If the input list elements contained newline characters, this would not be necessary, and neither would the -l in the shebang line.

Second, Ruby:

#!/usr/bin/ruby

def foo (list) lambda { list.reverse } end

bar = foo(ARGV) puts bar.call

Because of Ruby’s puts() kernel method, there is no need for an interpreter directive as in the Perl example. This does provide for potentially increased elegance of code, if you accept as a given that adding a newline to output as a matter for distinct syntactic treatment increases gratuitous cruft, rather than simply being a lack of syntactic sugar. As I went easy on Perl in this matter, however, I will do the same for Ruby: let us consider puts() a syntactic point of elegance.

That being the case, this program contains a grand total of thirteen syntactic elements (the language keyword end is not counted, being merely a delimiter, any more than braces are in Perl). The same number of distinct syntactic elements are in use in the call to the closure and output to STDOUT, and in the closure instantiation itself, but Ruby is a clear winner in the closure generator definition. While one might point to a number of superficially apparent reasons for this by examining the list of additional syntactic elements involved in the Perl example, these are all necessary only because of the difference between the two languages in how lexical scope is achieved. In Ruby, it is the default behavior of the language, while in Perl it must be explicitly declared.

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License