Chad Perrin: SOB

11 June 2007

The DRY Principle and Documentation

Filed under: Cognition,Geek — apotheon @ 04:33

Someone recently wrote a rather narrow, uninsightful complaint about things that suck in open source development. The problems with that lengthy complaint, in short:

  1. Software packaging appears broken sometimes simply because a given piece of software hasn't gained enough popularity for anyone to bother. Keep in mind that with open source software you get to see (and use) software before it's completely ready for prime time, pretty much by definition. Once it hits the "big time" (or even the "fairly small, but at least noticeable, time"), software in the open source world gets incorporated into the best software packaging systems known to man.
  2. Documentation, for all its problems in the open source world, is actually better on the whole than in the closed source world, from what I've seen. By far, the best documentation for any OS I've come across is FreeBSD's. OpenBSD's has a stunning reputation for completeness and usefulness as well. Debian's manpage coverage is so extensive it boggles the mind. The books you can get for Linux in general are many, varied, and extensive. Contrast this with the well-known problems of MS Windows documentation, for instance. MacOS X has great documentation — for a closed source, proprietary OS. That's not saying much, though, when contrasted with the completeness and extensiveness of FreeBSD's documentation.
  3. SourceForge isn't exactly a place for comparing the state of documentation and packaging of open source software with that of closed source software. That's like going around to all the proprietary software vendors and checking to see what they've got in their project queues, measuring the quality of documentation and packaging for software that isn't even guaranteed to get continued development funding (let alone to become release-worthy).

That's not to say that there aren't problems with packaging and documenting software in general. If "taw" referred to software in general, rather than specifically singling out open source software as the "bad apple" (and thus ignoring the fact that closed source software seems, on the whole, to fare worse), I'd have simply nodded my head in agreement. Something needs to be done about packaging (for distribution and installation) and documenting software as it's developed.

I've already talked about ensuring you achieve good results with software deployment by ceasing to consider deployment procedures as separate from development. Since packaging/installation, aka "deployment procedure", actually involves thinking about the structure of the software and writing code, it's an easy thing to incorporate deployment development in your application development process. The two are a natural fit, which might help people to realize that they are, in fact, one — not two — after all. The deployment procedure for a piece of software is part of its interface. Learn that simple truth, and you should be able to put it all together. Voila, you're set.

Documentation doesn't seem to be as easy — not by a long shot. Documentation, as it is practiced, is a long, drawn-out process of duplicating significant parts of the source code of your software in English. Add to that the fact that you must also essentially create an after-action application use flowchart using the English language (rather than one of those old plastic flowchart stencils), and it starts looking like a severe pain in the butt that will never get finished properly.

The DRY principle applies to documentation as much as to actual software development. The principle, at its core, is simply a statement of the fact that when you repeat yourself, you introduce inconsistency. Bugs are unavoidable with duplication. You don't duplicate code because when one part changes, the other part gets out of sync. One could even consider pretty much every single advance in programming practice in the last forty or fifty years to be an attempt at solving the problem of duplication. That's the whole point of programming in the first place — automation, which saves you from duplicating effort.
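
A minimal sketch of that drift, in Python, may make the point concrete. Everything in it (the CONFIG table, the max_retries setting, and both function names) is hypothetical and invented purely for illustration; it is not taken from any real project. One help string retypes a number by hand, while the other derives it from the single authoritative value:

```python
# Hypothetical settings table: the one authoritative place the limit lives.
CONFIG = {"max_retries": 3}


def usage_duplicated() -> str:
    """Help text with the limit retyped by hand (the repetition)."""
    # Nothing ties this "3" to CONFIG, so it silently goes stale
    # the day someone changes the setting.
    return "Retries up to 3 times before giving up."


def usage_dry() -> str:
    """Help text derived from the single source of truth."""
    return f"Retries up to {CONFIG['max_retries']} times before giving up."
```

Change the setting to 5 and usage_dry() follows along, while usage_duplicated() keeps promising 3 retries. That is the out-of-sync bug in miniature, and hand-written documentation is exposed to it in the same way.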

It's no wonder that documentation development, as essentially a plain-English duplication of software development, gets neglected, or ends up out of date and sometimes worse than useless. The only real solution to the problem, it seems, is to try to figure out how to eliminate duplication without making either programming or documentation suck. There have been some abortive attempts to achieve this, or something akin to it (see the invention of the eminent Dr. Knuth known as "Literate Programming", COBOL's syntax, and RDoc), but they haven't tended to be really successful as a means of producing end-user documentation, whether the failure was one of popularity or of technical effectiveness.
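
For a rough idea of what the documentation-from-source approach looks like in practice, here is a small Python sketch in the same spirit as RDoc-style tools. It is only an illustration under stated assumptions: greet and reference_entry are hypothetical names made up for this example, and the sketch relies on nothing beyond the standard library's inspect module. The reference entry is assembled from the function's own signature and docstring rather than written out a second time:

```python
import inspect


def greet(name: str, shout: bool = False) -> str:
    """Return a greeting for NAME, upper-cased when SHOUT is true."""
    text = f"Hello, {name}!"
    return text.upper() if shout else text


def reference_entry(func) -> str:
    """Build one reference-manual entry from the function itself."""
    # The signature is read from the code, so it cannot drift out of sync.
    signature = inspect.signature(func)
    # The summary is the docstring, which lives next to the code it describes.
    summary = inspect.getdoc(func) or "(undocumented)"
    return f"{func.__name__}{signature}\n    {summary}"


if __name__ == "__main__":
    print(reference_entry(greet))
```

Note what this does and does not remove. The signature can no longer disagree with the code, but the one-line summary is still English written by hand, and nothing here amounts to end-user documentation. That gap is exactly the part of the problem that remains unsolved.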

I'm not sure how to solve the problem, personally. It needs to be fairly universal in its applicability (it has to work just as well in Visual Studio, Emacs, and ed — vi too, of course, but that goes without saying since vi is the One True Editor), and it needs to truly eliminate the duplication problem. Finally, it needs to provide good end-user documentation. It's a real problem.

Does anyone out there have any ideas for solutions?

why open source code has to be better

Filed under: Cognition,Geek — apotheon @ 03:48

Just looking at the title of this post, before the post is even written, I'm struck by the myriad interpretations that could be applied to it. It can be read in any number of ways, spanning a wide and varied spectrum of meanings. We'll see, by the time I'm done, how many of those might still apply:

I've written elsewhere about why open source code has to be more secure. I didn't use a title so cryptic and protean in its meaning at the time, of course, because it was written for professional publication — where the kind of word play I used in the title here is generally a no-no. As such, I also constrained the content somewhat to avoid directly addressing many of the potential meanings of the phrase "why open source code has to be more secure". Think about this, though: peer review means exactly that your code has to be more secure. What code must be as secure as code that could be viewed — and reviewed — by (almost) literally anyone, for a period without known limits? It has to be more secure, because you never know who's going to see it, and your reputation as a programmer is attached to that code. Open source development is probably the fastest way to build a reputation for yourself as a programmer, but there's nothing that says that reputation has to be a good one other than your ability to turn out good code.

Peer review keeps us honest.

That's why open source code has to be better. You have to plan for the inevitable future when, if anyone ever cares at all about the code, other people will read it. You can't just hide your obfuscated tangle of "we don't know why it works, don't change it or you'll break it" spaghetti behind binary compilation and a big fat copyright notice. It doesn't have to be perfect, but it sure as hell shouldn't be embarrassing. When you're releasing the code to the world for free distribution, you want to release quality. Why do you think neophytes in open source programming write tiny little snippets to fix bugs, while neophytes in closed source enterprise Java shops write massive, tightly coupled modules as their on-the-job training? Nobody outside the closed source chop shop is going to see that tangled mess of Java. That's why.

. . . but wait, there's more:

It's increasingly an accepted truism that when you write software, you write it to be maintained. I'm convinced that's one of the reasons so-called Agile Programming hit the big time: it's a microcosm version of the true life-cycle of a piece of software. In the project manager's notebook, and in the corporate meeting room, software is something that is built, polished, and put to use. At its least realistic extreme, this view of software leads to programs being considered atomic, finished products, manufactured and boxed up to be sold in units. This means there's a beginning, middle, and end to a software development project, and that's it. Voila: you have a program.

That's all poppycock, of course. As the aphorism goes, software is never finished — only abandoned. It's true. It's more true than even most people parroting that phrase (adapted from a similar one about art) realize. It's more true of software by unimaginable degrees than it is even of art. Sure, art is always fiddled with until that critical "abandonment" occurs, the artist always seeking to perfect it, but in the end there really is an end. Beyond that point, it becomes useful. With software, abandonment means it has ceased to be useful. Software isn't just fiddled and tweaked in an asymptotic approach to perfection, the way an oil painting might be — oh, no, it's constantly developed, it continually evolves, and it changes. One single software life cycle might be half a dozen different applications entirely before it dies of loneliness after it is abandoned. The only thing that "finishes" software is stagnation. Software engineering is the manifestation of an "intelligent design" theory of evolution.

How does this make the code in an open source project better? Simply put, in the open source world people are far more aware of the fact that, when they write software, it will be read and rewritten by others. Software is written not simply to be compiled and shipped, but to be read. It must have understandable structure, pleasing form, and clarity. Clear code is good code. It's not just a matter of knowing your audience: the difference between closed source software development and open source software development is, more often than not, bound to the simple fact that open source developers are aware there's an audience for code at all. Think about that. Completely aside from why you want to write better code, if you're developing open source software, you must be aware that other developers are your audience, not just users. Users are secondary. As long as the software does what you want it to do, (other) end users are largely irrelevant. It's the source code that has an audience, and as such it must be valuable in its own right, completely aside from the sort of programmatic functionality it calls into the world as an "end" result.

Open source code has to be better because it's not just written for the (perceived) quality of the functionality — it is also, and perhaps even primarily, written for the quality of the code itself. Perhaps surprisingly to those who apply Waterfall development methodologies, Microsoftian pseudo-Hungarian notation, and typical "enterprisey" concepts of object orientation*, the quality (meaning, mostly, "clarity") of your source code bears a roughly direct relationship to the quality of the functional application itself.

You can blame the generally high quality of open source code on the fact that open source code is, generally, written specifically for someone else to be able to read and reason through.

. . . but wait, there's still more:

Innovation is someone coming up with a great idea and making it happen. I won't quibble with you over the use of the term "great"; it's just a placeholder for that actual je ne sais quoi that fits the definition of "innovation". Call it what you will, innovation is something that occurs to someone — to one individual — and gets turned into something tangible (or at least persistent). Innovation can happen anywhere. In software, it can happen in open source software development and in closed source software development with equal ease. Whether or not it ever sees the light of day tends to vary from one development model to the next, of course. Innovation survives to public release far more often in the average Agile Programming shop than the average Waterfall shop. It happens more often in open source development than in closed source development, too — and for much the same reasons (which I leave as an exercise for the reader, so that the reader's brain doesn't get too fat from sitting on the couch absorbing text all the time).

Implementation is part of innovation. The idea itself is without value to anyone but the guy who came up with the idea, and it doesn't become innovative until it's implemented. Innovators don't have to be good programmers. They can turn out severe crap every day of their lives. As long as it implements the idea, and others take note of the value of the idea as implemented, it's successful innovation. Implementation can even take the form of manipulating others into doing the scut work for you. Write a book about the perfect mousetrap, and let someone else build it and get all the retail sales profits of the thing — it's you that gets the credit for the idea, and it's your book people buy to try to learn how to be smart like you. You've innovated, but it wasn't real until someone did something with it. Until then, you were just a blowhard. There was no proof you had any clue what you were talking about. As Eric Raymond or Linus Torvalds (depending on which version of the story you like) might say, "Show me the code."

Ahh, right, the code. The truth is that they don't have to see the code for it to be innovative (though it can help). What matters is that it works. Software isn't a product, though; it's a process. You don't just create software and declare it finished, as I've already mentioned. It grows, guided by many hands (and many, many more if it's open source). The more hands there are that have a hand in it (so to speak), the better it gets, as long as it's allowed to. Think about that for a moment. In a closed-source shop, it is changed according to the corporate plan. In open source software, one group of developers may guide the software toward a particular end, but if that doesn't make it better it either withers away and dies from lack of interest (as it might when released to resounding silence in the closed source world) or, as is often the case, gets forked and done right. Ooh, look: it got better despite the "best" efforts of the people "in charge"!

Truly innovative code, no matter how poorly written (as long as it actually works), keeps its innovations for as long as they're relevant and valuable. The code may get refactored, but the innovation at the core is still there — even if differently expressed. It gets cleaned up and clarified. It gets better. It gets better just because changes that make it worse don't survive in the long run.

Oh, sure, there are exceptions. I've been generalizing so much throughout this thing, though, that I hardly think this generalization goes far amiss:

Open source code has to be better because if it isn't, it dies.

Survival of the fittest. See that? Darwinian evolution does work within the framework of intelligent design! You just have to give up on the idea of conscious, manipulative control to make it fit.

. . . kinda like the way open source code gets better.

(* "If our object model properly implements protection, our crappy developers cannot hurt each other's code, and the app as a whole will probably work no matter how crappy the developers are!")

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License