Chad Perrin: SOB

26 April 2006

OpenDocument Format (ODF) vs. MS OpenXML Format (OpenXML)

Filed under: Geek,Liberty — apotheon @ 07:47

I’ve decided to cobble together a sort of Frankenstein’s Monster entry from a number of different comments I’ve made elsewhere in debates relating to ODF vs. OpenXML file formats. I’ve expanded upon the original phrasing a little, and I’ve tried to clarify my statements and suit them to this venue, but otherwise it’s pretty much just plagiarizing myself. Err, I mean it’s referencing my own work, as academics often do the works of their colleagues.

First off, a disclaimer to hopefully dissuade cries of mere anti-Microsoft bias, or “bashing”, or “zealotry”, or whatever: If Microsoft gets a standard approved and then lets it go so that it stays “standard”, I don’t much care that Microsoft developed it. In fact, I’ll applaud Microsoft. What I don’t want to see happening is Microsoft getting a document format approved as a “standard”, then playing silly buggers with it as it has with HTML, CSS, C++, JavaScript, and basically everything else on which it has gotten its grubby mitts.

Allow me to summarize my position, clarify the above, and repeat a bit:

I’d be happy to use a document format that was initially designed by anyone, Microsoft included, as long as it becomes a truly open, truly standardized format with clear and public documentation so that everyone can use it, and as long as Microsoft doesn’t sabotage standards compliant adoption of the format by producing software that misuses it.

Until a week or so ago, I didn’t know nearly enough about the technical aspects of Microsoft’s proposed format, or even of ODF, to comment meaningfully on which is technically better. My technical knowledge of the formats still has big holes in it, so I try to avoid speaking outside the range of what I actually know. I’d prefer the technically better format, whichever it is, if that were the only concern. Unfortunately, it’s far from the only concern. Even better would be both formats supported by all major office suites. Either way, I won’t use a format that introduces significant security or stability issues, or that isn’t open and free, except when absolutely necessary, and even then only under protest. As it happens, security and stability don’t seem to be even a tertiary concern here, let alone a primary one.

I’ve been doing some research. Here’s what I found:

  1. ODF is necessarily a bit more resource hungry because it is more comprehensive than OpenXML. One can argue for either side of that: comprehensiveness and resource efficiency both have their positive points. I find it notable, however, that Microsoft’s sole point of argument here is in direct contradiction with other common Microsoft-sympathizer arguments. Specifically, I often hear the complaint that the reduced resource footprint (and I do mean dramatically reduced) of some piece of open source software as compared with its Microsoft-stack analog is functionally irrelevant because of the increased performance of hardware. See debates about Vista vs. Linux on the basis of resource-hungry operation and hardware requirements for examples. That being the case, one must wonder why there is suddenly such a distinct reversal of argument here, with the (marginally) reduced resource footprint of OpenXML as compared with ODF becoming Microsoft’s rallying cry. Yes, it’s marginal: the “100 times as much” resource footprint of ODF cited in some arguments is as compared with Microsoft’s binary formats, under very specific, narrowly defined test conditions that liberally mix application resource usage with document resource requirements. It does not compare ODF with Microsoft’s XML-based text data formats, which show only the above-mentioned marginal resource usage advantage.
  2. While Microsoft has signed a covenant of nonlitigation, this doesn’t actually open the format at all. It only opens the implementation of it. While this might at first glance appear to be nitpicking, it’s worth noting that Microsoft could easily pull the old bait-and-switch as it historically, and consistently, has with almost all its technologies. All it needs to do is get the standard approved, convince everyone that it’s “just as open as” (or even “more open than”) ODF, get it widely implemented to the extent that market dominance is maintained for its office software, then change the format specification for its next office suite release (or the next service pack, for that matter) without telling anyone until the day the new implementation hits the market. This artificially creates and reinforces a technical advantage by turning the document format upon which the industry standardizes into a moving target. The market dominance practices of Microsoft in this regard are clear and well demonstrated by Microsoft’s intention of supporting its own “open” format without also supporting the competing ODF, while its competition does everything in its power to support Microsoft’s formats alongside native and open formats. So long as Microsoft retains the ability to unilaterally and at its sole discretion alter the format specification (even if it must get “approval” of the new format each time, though the notion that it must do so is a dubious one at best), its format is not truly open, due to a conflict of interest for the sole specification-maintaining party.
  3. Aside from performance concerns, the sole technical benefit of OpenXML is its broader ability to incorporate additional custom-designed schemas, in both loosely and tightly coupled manners. Despite propaganda to the contrary, ODF is capable of easily incorporating custom schemas, primarily by way of embedded ability to support W3C-standard XForms. XForms support is intentionally constrained in its ability to support custom schemas, as compared with OpenXML’s support for custom schemas, for the purpose of obviating the detrimental aspects of custom schema definition and inclusion that are endemic to OpenXML’s specification. The primary reason this less constrained implementation of custom schema inclusion is considered undesirable is the fact that it fosters creation of nonportable documents: while conforming to the OpenXML document specification, they would include nonportable scripting and data formatting. Of particular concern here is the fact that this would create increased opportunity for Microsoft Office to be designed by Microsoft to leverage an “open” format for the purposes of producing nonportable documents, again to promote and maintain market dominance.
  4. Microsoft’s substantive excuse for preferring OpenXML is centered around making a document format backward compatible, when backward compatibility with closed formats while designing a new, supposedly “open”, document format should be confined to making an application backward compatible. Making a document format backward compatible with other (primarily binary) proprietary document formats is actually counterproductive to the purposes of designing and adopting an open document format standard. Rather than making the documents backward compatible (specifically with previous Microsoft document formats, ignoring other older document formats), make your new application that supports the new document format backward compatible so that it can translate freely between the two document formats. This solves the problem for the user and it provides encouragement for the real purpose of the open document format: moving documents, both old and new, to a format that makes better sense in terms of both portability and accessibility. In any case, there’s certainly nothing preventing Microsoft from implementing both ODF and OpenXML, one for widespread compatibility and the other for backward document format compatibility, other than Microsoft’s own intention of freezing out competitors through anticompetitive practices. Additionally, Microsoft’s history of ignoring document format compatibility between versions of its own applications, and providing only rudimentary and temporary application support for earlier formats, strikes me as a pretty clear indicator of its true intent: to manufacture excuses for trying to ensure sole control by Microsoft of the widely adopted “open” document format of the future.

According to Wikipedia, the Danish government’s definition of an “open standard” (and the Danish definition is accepted EU-wide as the minimum set of requirements to qualify as an “open standard”) is as follows:

  • The costs for the use of the standard are low.
  • The standard has been published.
  • The standard is adopted on the basis of an open decision-making procedure.
  • The intellectual property rights to the standard are vested in a not-for-profit organisation, which operates a completely free access policy.
  • There are no constraints on the re-use of the standard.

OpenXML basically fails, at least in part, on all but two of those points, and it has the potential to eventually fail on one or both of those exceptions.

Also according to Wikipedia:

“The primary goal of open formats is to guarantee long-term access to data without current or future uncertainty with regard to legal rights or technical specification.”

. . . and . . .

“A common secondary goal of open formats is to enable competition, instead of allowing a vendor’s control over a proprietary format to inhibit use of competing products.”

For an example of the benefits of open formats:

HTML, and its successor XHTML, are open standards. You can readily see what damage has been done to interoperability by Microsoft’s domination of the web browser market in the fact that there are a lot of websites that have (thanks to MS’s anticompetitive practices, leveraging market dominance to increase market dominance) been coded specifically to Internet Explorer’s quirks so that other browsers are “shut out”. Imagine for a moment how much worse it would be if there were no (X)HTML standard. As things currently stand, Microsoft’s stubborn use of a bastardized markup implementation in IE is finally being challenged, in large part because it is in egregious violation of standards.

As a result, we are seeing increased competition in the browser niche of the application market years after IE had pretty much sewn up that niche by destroying Netscape’s ability to compete. That competition is not only resulting in the advancement of browser technology in Microsoft’s competitors, but is also forcing Microsoft to try to keep up with the Joneses by improving upon IE and related software after years of technological stagnation. OneCare, the Windows Firewall, inclusion of tabs and other advanced browser features in IE7, the ability to turn off ActiveX capabilities, addition of granular control over script execution in the browser, sandboxing, and many other security “improvements” (or at least attempts at the appearance thereof) can almost directly be attributed, at least in part, to competition in the web browser market niche.

If there were not an open markup standard from which Microsoft couldn’t just deviate completely without incurring some negative consequences, Microsoft’s “HTML” would be something (unrecognizable to HTML) else entirely that nobody else would be allowed to use, XHTML would never have been invented, and competition in that niche and other, related niches would be nothing more than a fond memory.

Additionally, a couple of terms to keep in mind as reasons to avoid proprietary formats:

  • “vendor lock-in”
  • “embrace, extend, extinguish”

Someone recently asked for a list of reasons for preferring open formats for documents over closed/proprietary formats. Part of the problem with answering that question is that it is asking for a list (by which I think is meant “a list of very short statements about advantages to an open standard”), when lengthy explanations are needed above and beyond mere bullet point items to get the point across. I took a whack at it anyway. Don’t blame me if the reasons for some of these advantages are not immediately obvious within the context of the list, though. In addition, some of the list items overlap others because of the fact that I tried to address much of the explanatory necessities of trying to get the point across, which required looking at different angles of the same issues.

For fun, I presented this list within a simple Perl script that can be run on a unix-like system to spit out randomly generated selections from the list when the program is called from the command line. I’m not entirely familiar with use of Perl on a Windows machine, but this script should be 100% portable by simply replacing the shebang line of the script with whatever Windows needs in its place.

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Reasons to prefer open document formats, one per list element.
  my @reason = (
    'Open formats eliminate legal restrictions on implementation.',
    'Open formats ensure the full specification is available to implementors.',
    'Open formats are far more difficult to leverage for anticompetitive practices.',
    'Open formats do not lock organizations into reliance on a specific vendor.',
    'Open formats provide greater ease of access over wide distributions of data to varying populations.',
    'Open formats development tends to be pressured by common needs rather than marketability concerns.',
    'Open formats do not change for no reason other than driving new application version uptake.',
    'Open formats do not tend to involve sneaky ways to slip proprietary data formats into them.',
    'Open formats foster inter-application compatibility.',
    'Open formats do not allow imposition of royalty fees on implementors.',
    'Open formats are more conducive to third-party software innovation.',
  );

  # rand(@reason) returns a number from 0 up to (but not including) the
  # number of reasons; Perl truncates it to an integer when used as an index.
  print $reason[rand @reason], "\n";

From what I’ve seen thus far, it looks like both OpenXML and ODF specify a system of interrelated, modularized XML files to define a single document. Both, to some extent, allow for these to be combined into a single XML file for an alternative document saving format, or at least a drastic reduction in that modularization. When saved as a collection of interrelated files, however, they are then compressed using Zip-compatible compression. This bothers me, not only because the Zip algorithm is proprietary (though apparently free of implementation encumbrances), but also because the saved document format is no longer human-readable. Both format specifications claim human readability by pointing out the fact that they’re XML documents (and complex XML’s human readability is suspect anyway), but ignore the fact that by storing the files in Zip archives they are rendered in a binary compressed format that requires translation to a human-readable form. In this respect, both document formats fall down. I am, understandably, disappointed. What the hell were the ODF people thinking? Microsoft, of course, doesn’t really give a fig (they’d rather make it as human-unreadable as possible, to improve on the vendor lock-in characteristics of MS file formats), but this seems antithetical to the aims of ODF.
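To make the readability point concrete, here is a minimal sketch in Perl, using only the core IO::Compress::Zip and IO::Uncompress::Unzip modules. It is an illustration under stated assumptions, not a reader for either format: the XML snippet and its element names are made up, and the single-member archive is only a stand-in for a real document package, which holds several interrelated files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Stand-in for one XML part of a document package. The element names are
# invented for illustration and come from neither specification.
my $xml = qq{<?xml version="1.0"?>\n<document><p>Hello, world.</p></document>\n};

# Pack the XML into an in-memory Zip archive holding a single member.
# The member name is illustrative.
my $archive;
zip \$xml => \$archive, Name => 'content.xml'
    or die "zip failed: $ZipError\n";

# What lands on disk is binary. Show the first four bytes of the archive
# as hex rather than dumping unreadable bytes to the terminal.
printf "archive begins with bytes: %s\n", unpack('H8', $archive);

# Only after extracting the first (here, the only) member does the
# human-readable XML come back.
my $recovered;
unzip \$archive => \$recovered
    or die "unzip failed: $UnzipError\n";

print $recovered;
```

The XML is still there, but a reader has to go through an extraction step like the `unzip` call before a text editor is of any use.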

Anyhow, there it is.

a bit about the blog spam situation

Filed under: Metalog — apotheon @ 06:59

I started getting a lot of blog spam a little while ago. I started using the automatic post moderation feature that sends posts into moderation if they contain too many links. This worked for a little while, though I still had to delete the posts from the moderation queue myself. Better that than getting false positives and never knowing it, I reasoned.
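For the curious, the link-count rule amounts to something like the following Perl sketch. This is not WordPress’s actual code: the threshold value, the function names, and the pattern used to spot a link are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical threshold: comments with more links than this are held.
my $MAX_LINKS = 2;

# Count substrings in the comment body that look like the start of a URL.
sub count_links {
    my ($body) = @_;
    my $count = () = $body =~ m{https?://}gi;
    return $count;
}

# True if the comment should wait in the moderation queue for a human.
sub needs_moderation {
    my ($body) = @_;
    return count_links($body) > $MAX_LINKS;
}

for my $comment (
    'Nice post, thanks.',
    'Buy at http://a.example http://b.example http://c.example',
) {
    printf "%-8s %s\n", needs_moderation($comment) ? 'HOLD' : 'PUBLISH', $comment;
}
```

A rule like this only looks at the comment body, and held comments still have to be reviewed by hand.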

I started getting hit by blog spammers who included only one link in the body of the comment, or no links at all, and only used the ability to use a URL to make a link of the poster’s name to create links to whatever they were spamming. I of course was somewhat troubled by this, and after a while of deleting several a day I decided to change things.

I instituted a policy here at SOB where anyone that wanted to post a comment needed to register. Thus far, I haven’t gotten any blog spam at all, and the amount of discussion that my posts generate doesn’t seem particularly reduced by this. Of course, a problem here is that potential legitimate discussion that doesn’t happen is never noticed: I don’t know if someone refuses to comment due to the mandatory registration. I’m considering removing the necessity of an email address when someone registers, because I know that might be a barrier to casual commenting even when it isn’t spam, but I’m also hesitant to open the door even that much to spammers.

It was suggested by a reader known here as Alex that I should use Akismet and the WordPress Spam Image plugin, in his comments to my entry about requiring registration. I’ve looked at Akismet, and it appears to be a heuristic spam filter based on spam example blacklists, which if well-executed would be an excellent approach to the matter. I’ve chosen to eschew it for now, however, for reasons not easily articulated. Perhaps I’ll revisit this later. The Spam Image plugin looks easy to use and probably reasonably effective, though as long as I require registration I’m not sure it’s actually necessary. I’ll have to think about it.

Speaking of the Spam Image plugin, there’s something similar being used over at Chip’s Quips, another weblog I make an effort to follow. He always has interesting stuff to say, and has said quite a bit about blog spam. In particular, he seems disappointed with the performance of WordPress in blocking spam based on my own reports in comments to his weblog, but he probably shouldn’t be: I did almost nothing to stem the tide until I started requiring user registration to post comments, and I’ve done nothing since. I haven’t actually used any of the more advanced anti-spam technologies available to WordPress users, and thus really don’t have anything to say about them, positive or negative, except as a visitor who compares what he sees on others’ weblogs. What I see is this:

The image plugin being used to keep spammers out of Chip’s Quips is awful. Half the time, the characters you have to enter aren’t even legible to the visitor, let alone to a spambot. I tried to comment on this a couple times, and couldn’t get through the spam filtering to post the commentary, so I gave up. Sorry, Sterling: I’d rather have told you in a less public way than this, but I can’t get through. It’s like those child-proof caps that some manufacturers use that are so effective they even keep the adults out. This is the flip side of the “false positives” coin, and something I really would like to avoid: I don’t want to make it difficult for people to post legitimate commentary. I’m going to try to alert him to this post with a comment to one of his, but I don’t know if it will get through. I recommend checking out his posts on the matter of blog spam, in any case, which are basically all linked-to through this entry about ham, and jam, and spam.

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License