## 26 April 2006

### OpenDocument Format (ODF) vs. MS OpenXML Format (OpenXML)

Filed under: Geek,Liberty — apotheon @ 07:47

I’ve decided to cobble together a sort of Frankenstein’s Monster entry from a number of different comments I’ve made elsewhere in debates relating to ODF vs. OpenXML file formats. I’ve expanded upon the original phrasing a little, and I’ve tried to clarify my statements and suit them to this venue, but otherwise it’s pretty much just plagiarizing myself. Err, I mean it’s referencing my own work, as academics often do the works of their colleagues.

First off, a disclaimer to hopefully dissuade cries of mere anti-Microsoft bias, or “bashing”, or “zealotry”, or whatever: If Microsoft gets a standard approved and then lets it go so that it stays “standard”, I don’t much care that Microsoft developed it. In fact, I’ll applaud Microsoft. What I don’t want to see happening is Microsoft getting a document format approved as a “standard”, then playing silly buggers with it as it has with HTML, CSS, C++, JavaScript, and basically everything else on which it has gotten its grubby mitts.

Allow me to summarize my position, clarify the above, and repeat a bit:

I’d be happy to use a document format that was initially designed by anyone, Microsoft included, as long as it becomes a truly open, truly standardized format with clear and public documentation so that everyone can use it, and as long as Microsoft doesn’t sabotage standards compliant adoption of the format by producing software that misuses it.

Until a week or so ago, I didn’t know nearly enough about the technical aspects of Microsoft’s proposed format, or even of ODF, to comment meaningfully on which is technically better. My technical knowledge of the formats still has big holes in it, so I try to avoid speaking outside the range of what I actually know. I’d prefer the technically better format, whichever it is, if that were the only concern. Unfortunately, it’s far from the only concern. Even better would be both formats supported by all major office suites. I won’t use a format that introduces significant security or stability issues, or that isn’t open and free, except when absolutely necessary, and even then only under protest. Security and stability don’t seem to be even a tertiary concern here, though, let alone a primary one.

I’ve been doing some research. Here’s what I found:

1. The ODF is necessarily a bit more resource hungry because it is more comprehensive than OpenXML. One can argue for either side of that: comprehensiveness and resource efficiency both have their positive points. I find it notable, however, that Microsoft’s sole point of argument here is in direct contradiction with other common Microsoft-sympathizer arguments. Specifically, I often hear the complaint that the reduced resource footprint (and I do mean dramatically reduced) of some piece of open source software as compared with its Microsoft-stack analog is functionally irrelevant because of the increased performance of hardware. See debates about Vista vs. Linux on the basis of resource-hungry operation and hardware requirements for examples. That being the case, one must wonder why there is suddenly such a distinct reversal of argument here, with the (marginally) reduced resource footprint of OpenXML as compared with ODF becoming Microsoft’s rallying cry. Yes, it’s marginal: the “100 times as much” resource footprint of ODF cited in some arguments comes from comparing ODF with Microsoft’s binary formats, under narrowly defined test conditions that liberally mix application resource usage with document resource requirements. It does not compare ODF with Microsoft’s XML-based text data formats, which show the above-mentioned marginal resource usage advantage.
2. While Microsoft has signed a covenant of nonlitigation, this doesn’t actually open the format at all. It only opens the implementation of it. While this might at first glance appear to be nitpicking, it’s worth noting that Microsoft could easily pull the old bait-and-switch as it historically, and consistently, has with almost all its technologies. All it needs to do is get the standard approved, convince everyone that it’s “just as open as” (or even “more open than”) ODF, get it widely implemented to the extent that market dominance is maintained for its office software, then change the format specification for its next office suite release (or the next service pack, for that matter) without telling anyone until the day the new implementation hits the market. This artificially creates and reinforces a technical advantage by turning the document format upon which the industry standardizes into a moving target. The market dominance practices of Microsoft in this regard are clear and well demonstrated by Microsoft’s intention of supporting its own “open” format without also supporting the competing ODF, while its competition does everything in its power to support Microsoft’s formats alongside native and open formats. So long as Microsoft retains the ability to unilaterally and at its sole discretion alter the format specification (even if it must get “approval” of the new format each time, though the notion that it must do so is a dubious one at best), its format is not truly open, due to a conflict of interest for the sole specification-maintaining party.
3. Aside from performance concerns, the sole technical benefit of OpenXML is its more inclusive ability to incorporate additional custom-designed schemas, in both loosely and tightly coupled manners. Despite propaganda to the contrary, ODF is capable of easily incorporating custom schemas, primarily by way of embedded ability to support W3C-standard XForms. XForms support is intentionally constrained in its ability to support custom schemas, as compared with OpenXML’s support for custom schemas, for the purpose of obviating the detrimental aspects of custom schema definition and inclusion that are endemic to OpenXML’s specification. The primary reason this less constrained implementation of custom schema inclusion is considered undesirable is the fact that it fosters creation of nonportable documents: while conforming to the OpenXML document specification, they would include nonportable scripting and data formatting. Of particular concern here is the fact that this would create increased opportunity for Microsoft Office to be designed by Microsoft to leverage an “open” format for the purposes of producing nonportable documents, again to promote and maintain market dominance.
4. Microsoft’s substantive excuse for preferring OpenXML is centered around making a document format backward compatible, when backward compatibility with closed formats while designing a new, supposedly “open”, document format should be confined to making an application backward compatible. Making a document format backward compatible with other (primarily binary) proprietary document formats is actually counterproductive to the purposes of designing and adopting an open document format standard. Rather than making the documents backward compatible (specifically with previous Microsoft document formats, ignoring other older document formats), make your new application that supports the new document format backward compatible so that it can translate freely between the two document formats. This solves the problem for the user and it provides encouragement for the real purpose of the open document format: moving documents, both old and new, to a format that makes better sense in terms of both portability and accessibility. In any case, there’s certainly nothing preventing Microsoft from implementing both ODF and OpenXML, one for widespread compatibility and the other for backward document format compatibility, other than Microsoft’s own intention of freezing out competitors through anticompetitive practices. Additionally, Microsoft’s history of ignoring document format compatibility between versions of its own applications, and providing only rudimentary and temporary application support for earlier formats, strikes me as a pretty clear indicator of its true intent: to manufacture excuses for trying to ensure sole control by Microsoft of the widely adopted “open” document format of the future.

According to Wikipedia, the Danish government’s definition of an “open standard” (and the Danish definition is accepted EU-wide as the minimum set of requirements to qualify as an “open standard”) is as follows:

• The costs for the use of the standard are low.
• The standard has been published.
• The standard is adopted on the basis of an open decision-making procedure.
• The intellectual property rights to the standard are vested in a not-for-profit organisation, which operates a completely free access policy.
• There are no constraints on the re-use of the standard.

OpenXML basically fails, at least in part, on all but two of those points, and it has the potential to eventually fail on one or both of those exceptions.

Also according to Wikipedia:

“The primary goal of open formats is to guarantee long-term access to data without current or future uncertainty with regard to legal rights or technical specification.”

. . . and . . .

“A common secondary goal of open formats is to enable competition, instead of allowing a vendor’s control over a proprietary format to inhibit use of competing products.”

For an example of the benefits of open formats:

HTML, and its successor XHTML, are open standards. You can readily see what damage has been done to interoperability by Microsoft’s domination of the web browser market in the fact that there are a lot of websites that have (thanks to MS’s anticompetitive practices, leveraging market dominance to increase market dominance) been coded specifically to Internet Explorer’s quirks so that other browsers are “shut out”. Imagine for a moment how much worse it would be if there were no (X)HTML standard. As things currently stand, Microsoft’s stubborn use of a bastardized markup implementation in IE is finally being challenged, in large part because it is in egregious violation of standards.

As a result, we are seeing increased competition in the browser niche of the application market years after IE had pretty much sewn up that niche by destroying Netscape’s ability to compete. That competition is not only resulting in the advancement of browser technology in Microsoft’s competitors, but is also forcing Microsoft to try to keep up with the Joneses by improving upon IE and related software after years of technological stagnation. OneCare, the Windows Firewall, inclusion of tabs and other advanced browser features in IE7, the ability to turn off ActiveX capabilities, addition of granular control over script execution in the browser, sandboxing, and many other security “improvements” (or at least attempts at the appearance thereof) can almost directly be attributed, at least in part, to competition in the web browser market niche.

If there were not an open markup standard from which Microsoft couldn’t just deviate completely without incurring some negative consequences, Microsoft’s “HTML” would be something else entirely (unrecognizable as HTML) that nobody else would be allowed to use, XHTML would never have been invented, and competition in that niche and other, related niches would be nothing more than a fond memory.

Additionally, a couple of terms to keep in mind as reasons to avoid proprietary formats:

• “vendor lock-in”
• “embrace, extend, extinguish”

Someone recently asked for a list of reasons for preferring open formats for documents over closed/proprietary formats. Part of the problem with answering that question is that it asks for a list (by which I think is meant “a list of very short statements about advantages to an open standard”), when lengthy explanations are needed above and beyond mere bullet point items to get the point across. I took a whack at it anyway. Don’t blame me if the reasons for some of these advantages are not immediately obvious within the context of the list, though. In addition, some of the list items overlap others, because I tried to fold much of the necessary explanation into the items themselves, which required looking at the same issues from different angles.

For fun, I presented this list within a simple Perl script that can be run on a unix-like system to spit out randomly generated selections from the list when the program is called from the command line. I’m not entirely familiar with use of Perl on a Windows machine, but this script should be 100% portable by simply replacing the shebang line of the script with whatever Windows needs in its place.

#!/usr/bin/perl
use strict;
use warnings;

srand;

my @reason = (
'Open formats eliminate legal restrictions on implementation.',
'Open formats ensure the full specification is available to implementors.',
'Open formats are far more difficult to leverage for anticompetitive practices.',
'Open formats do not lock organizations into reliance on a specific vendor.',
'Open formats provide greater ease of access over wide distributions of data to varying populations.',
'Open formats development tends to be pressured by common needs rather than marketability concerns.',
'Open formats do not change for no reason other than driving new application version uptake.',
'Open formats do not tend to involve sneaky ways to slip proprietary data formats into them.',
'Open formats foster inter-application compatibility.',
'Open formats do not allow imposition of royalty fees on implementors.',
'Open formats are more conducive to third-party software innovation.'
);

print $reason[rand @reason], "\n";

From what I’ve seen thus far, it looks like both OpenXML and ODF specify a system of interrelated, modularized XML files to define a single document. Both, to some extent, allow for these to be combined into a single XML file for an alternative document saving format, or at least a drastic reduction in that modularization. When saved as a collection of interrelated files, however, they are then compressed using Zip-compatible compression. This bothers me, not only because the Zip format is proprietary (though apparently free of implementation encumbrances), but also because the saved document format is no longer human-readable. Both format specifications claim human readability by pointing out the fact that they’re XML documents (and complex XML’s human readability is suspect anyway), but ignore the fact that by storing the files in Zip archives they are rendered in a binary compressed format that requires translation to a human-readable form. In this respect, both document formats fall down. I am, understandably, disappointed. What the hell were the ODF people thinking? Microsoft, of course, doesn’t really give a fig (they’d rather make it as human-unreadable as possible, to improve on the vendor lock-in characteristics of MS file formats), but this seems antithetical to the aims of ODF.
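To put the readability complaint in concrete terms, here is a minimal sketch in Perl of what that translation step involves. It uses only the core IO::Compress::Zip and IO::Uncompress::Unzip modules. The XML snippet and the member name content.xml are stand-ins I made up for illustration, not an excerpt from either specification, and a real package holds several interrelated files rather than one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Core modules bundled with modern Perl, so no CPAN install is needed.
use IO::Compress::Zip     qw(zip   $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical stand-in for a document's main XML part. Real ODF and
# OpenXML packages carry several members with far richer markup.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <paragraph>Open formats foster inter-application compatibility.</paragraph>
  <paragraph>Open formats do not lock organizations into one vendor.</paragraph>
</document>
XML

# Pack the XML into an in-memory Zip archive, roughly as an office
# suite would when saving the packaged form of a document.
my $package;
zip \$xml => \$package, Name => 'content.xml'
    or die "zip failed: $ZipError\n";

# Check whether the markup can still be spotted in the packaged bytes.
my $visible = ($package =~ /<paragraph>/) ? 'yes' : 'no';

# One extraction call recovers the XML. The archive has a single
# member, so reading the first member is enough here.
my $recovered;
unzip \$package => \$recovered
    or die "unzip failed: $UnzipError\n";

print "markup visible in package: $visible\n";
print "round trip intact: ", ($recovered eq $xml ? 'yes' : 'no'), "\n";
```

Run it from the command line like the script above. It reports whether the markup can be seen in the packaged bytes and whether the round trip hands back the original XML. Extraction is a single call, but it is still a step standing between the file on disk and anything a person can read.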

Anyhow, there it is.

1. I’m not entirely familiar with use of Perl on a Windows machine, but this script should be 100% portable by simply replacing the shebang line of the script with whatever Windows needs in its place.

Replaced with the full path to the perl binary. ;) I once tried doing some perl self-training on Windows while I went on a Linux hiatus.

Comment by Alex — 26 April 2006 @ 09:42

2. How is the path represented? Does it still use a shebang at the beginning of the line? Would it look something like the following?

# !C:\path\to\binary.exe

Comment by apotheon — 26 April 2006 @ 10:08

3. The implementation of Perl I use (ActivePerl 5.005_03) uses file extension association like other Windows programs, so the shebang line is ignored, and you can just execute it. If the extension isn’t associated, you can always do:

perl reason.pl

Comment by SterlingCamden — 27 April 2006 @ 10:38

4. I don’t believe so, but it’s been a few years since I tried. Learning Perl describes how to do it correctly, IIRC.

Comment by Alex — 27 April 2006 @ 11:14

5. Sterling: Thanks for the tip, Chip!

Alex: Not my copy of Learning Perl, I’m afraid. I have Second Edition, which says “UNIX Programming” at the top. Hah.

Comment by apotheon — 27 April 2006 @ 02:29

6. Ouch. Hrm.

Go here (:

Comment by Alex — 27 April 2006 @ 10:48

7. Thanks. Now I know.

Comment by apotheon — 27 April 2006 @ 10:57

8. […] Chad Perrin writes of open formats, Microsoft, and true standards. Rather above my technical know how to summarize, so just go read it and be informed. […]

Pingback by Ameliorations » Boys and Girls in Toyland — 28 April 2006 @ 11:13

9. Hi Apotheon,

The fact that both ODF and MSOOX files “are rendered in a binary compressed format that requires translation to a human-readable form” is not a real issue, for 2 reasons:

1) As long as the compression algorithm is a well established, royalty-free, de facto standard, available on any platform as an end-user utility or through free APIs in a lot of programming languages, the compressed packaging of an office document doesn’t really prevent anybody from reading the content. Executing an “unzip” command on an average office document is a one-click/one-second operation that takes less time than the reading of the first line of content.

2) The ODF specification allows the compressed packaging but allows the flat, uncompressed XML storage as well. However, the OpenOffice.org software (and any future ODF-compliant office software) could just not survive in a competitive environment without using the compressed packaging, because nobody is happy to deal with a flood of unzipped XML over the disks and the networks.

Comment by jmgdoc — 9 May 2006 @ 01:54

10. Making a document format backward compatible with other (primarily binary) proprietary document formats is actually counterproductive to the purposes of designing and adopting an open document format standard. Rather than making the documents backward compatible (specifically with previous Microsoft document formats, ignoring other older document formats), make your new application that supports the new document format backward compatible so that it can translate freely between the two document formats.

Excuse me, but how is an application going to be able to “translate freely between the two document formats” if the formats are not compatible?

ODF cannot faithfully represent Word and Excel documents. Therefore, it isn’t even an option for most companies currently using MS Office. As it stands, the contest isn’t between ODF and OpenXML, it is between DOC and OpenXML.

Of course there is the little war between OpenOffice and KOffice over which one really implements ODF correctly.

Comment by Jonathan Allen — 24 January 2007 @ 05:01

1. Jonathan Allan —