Chad Perrin: SOB

28 August 2009

Significance of a Sample, in Ruby

Filed under: Geek — apotheon @ 11:18

The following is how to calculate the statistical certainty (i.e., the “statistical significance” of your sample taken as a percentage) that the results of a randomly selected sample of a given size are actually representative of the total population. Note that this all assumes a normal distribution (i.e., the well-known “bell curve”). Calculations will be represented using Ruby source code and results, since it’s easier to type (and, I think, to understand) Ruby code than the typical mathematical notation used for these calculations. Any line of “code” that starts with => is actually the result of the preceding one-line calculation.

Disclaimer: The following explanation is correct, to the best of my recollection. If there’s anything wrong with it, I blame the passage of time, because I haven’t done this stuff since college. Please correct any errors in my explanation in comments following this SOB entry. Feel free to offer suggestions for how to make my explanatory Ruby code more clear in the aggregate to readers who may not know Ruby (but can muddle through code in general), or how to prettify the scripts at the end, but keep in mind that the point was to explain how to do some simple statistical calculations and not so much to write Great Software in this case. In fact, clarity of explanation for people who might not know the language is why I chose Ruby instead of Scheme, since I don’t know that I’m familiar enough with Scheme to make it as readable as my Ruby code (especially considering some people might get hung up on the prefix notation).

Without further ado, the process of calculating statistical significance of your sample starts with computing the average of your raw data:

raw_results = [1,2,3,4,5,6]
=> [1, 2, 3, 4, 5, 6]

raw_total = raw_results.inject {|sum,n| sum + n }
=> 21

raw_mean = raw_total.to_f / raw_results.size
=> 3.5

Next, compute the differences for each data point from the average:

mean_difference = raw_results.collect {|n| n - raw_mean }
=> [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]

Then, compute the squares of each difference from the average:

mean_diff_squares = mean_difference.collect {|n| n ** 2 }
=> [6.25, 2.25, 0.25, 0.25, 2.25, 6.25]

Because you’re using a sample of the population rather than measuring the total population, you’ll subtract one from the sample size to calculate the standard deviation. The standard deviation is calculated by dividing the sum of the squared differences by that reduced count (the sample size minus one), then taking the square root of the result:

square_total = mean_diff_squares.inject {|sum,n| sum + n }
=> 17.5

square_mean = square_total / (mean_diff_squares.size - 1)
=> 3.5

stdev = Math.sqrt square_mean
=> 1.87082869338697

It’s difficult to pronounce stdev, and typing standard_deviation all the time is annoying, so let’s use the name of the Greek letter usually used to refer to the standard deviation in formulae:

sigma = stdev
=> 1.87082869338697

This all assumes a truly random sampling of the population, of course.
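
To make the effect of that subtract-one step concrete, here is a minimal sketch that wraps the steps above in a method and compares the two divisors. The names variance and stdev_of, and the sample: keyword, are made up for this illustration and are not part of the walkthrough above; keyword arguments also assume Ruby 2.0 or later.

```ruby
# Hypothetical helpers for illustration only.

# Average squared difference from the mean. With sample: true the sum of
# squares is divided by (n - 1); with sample: false it is divided by n.
def variance(values, sample: true)
  mean = values.inject(0.0) {|sum,n| sum + n } / values.size
  square_total = values.inject(0.0) {|sum,n| sum + (n - mean) ** 2 }
  divisor = sample ? values.size - 1 : values.size
  square_total / divisor
end

# Square root of the variance computed above.
def stdev_of(values, sample: true)
  Math.sqrt(variance(values, sample: sample))
end

raw_results = [1,2,3,4,5,6]

sample_sigma     = stdev_of(raw_results)                 # divides by n - 1
population_sigma = stdev_of(raw_results, sample: false)  # divides by n

puts sample_sigma
puts population_sigma
```

The sample version comes out a little larger than the population version, and the gap between the two shrinks as the sample size grows.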

Now you just need to decide how precise your results have to be. A common assumption is that 95% certainty is “enough” for initial experimental results, though until your experimental results can be confirmed by independent experimentation they’re just “interesting” and not “meaningful”. To determine statistical certainty, you first need to determine the standard error of the mean:

sem = sigma / Math.sqrt(raw_results.size)
=> 0.763762615825973

Next, you calculate the relative standard error — which is the standard error of the mean divided by the mean:

rse = sem / raw_mean
=> 0.218217890235992

That’s your uncertainty. To find out your certainty, just subtract the number from 1:

certainty = 1 - rse
=> 0.781782109764008

Translate that into a percentage:

(certainty * 100).to_i.to_s + '%'
=> "78%"

As you can see, your rate of certainty is only about 78% — well short of the 95% target certainty (which is to be expected from a sample size of only six). As your sample size grows, your estimated statistical certainty increases, all else being equal.
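
One rough way to watch that happen is to repeat the same six values several times, which keeps the mean fixed and the spread nearly fixed while the sample size grows. This is only a sketch: certainty_of is a made-up helper name that bundles the steps above, and repeating a data set is an artificial stand-in for actually collecting more data.

```ruby
# Hypothetical helper: the walkthrough's steps bundled into one method.
def certainty_of(values)
  n = values.size
  mean = values.inject(0.0) {|sum,v| sum + v } / n
  square_total = values.inject(0.0) {|sum,v| sum + (v - mean) ** 2 }
  sigma = Math.sqrt(square_total / (n - 1))   # sample standard deviation
  sem = sigma / Math.sqrt(n)                  # standard error of the mean
  1 - (sem / mean)                            # one minus relative standard error
end

base = [1,2,3,4,5,6]

# Repeat the base data 1, 10, and 100 times (6, 60, and 600 data points).
certainties = [1, 10, 100].collect {|copies| certainty_of(base * copies) }

certainties.each {|c| puts "#{(c * 100).to_i}%" }
```

The printed percentages should climb with each larger data set, starting from the same 78% as above, and with these inputs only the 600-point set clears the 95% target.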


If you wrote a simplified Ruby script called “stat_sig.rb” for all this, it might look like this:

#!/usr/bin/env ruby

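# Read the data points from the command line as floats.
raw_results = ARGV.collect {|s| s.to_f }

sample = raw_results.size

# Mean of the raw data.
raw_mean = raw_results.inject {|sum,n| sum + n } / sample

# Squared difference of each data point from the mean.
diff_squares = raw_results.collect {|n| (n - raw_mean) ** 2 }

# Sample standard deviation (note the sample - 1 divisor).
sigma = Math.sqrt( diff_squares.inject {|sum,n| sum + n } / (sample - 1) )

# One minus the relative standard error.
certainty = 1 - ( ( sigma / Math.sqrt(sample) ) / raw_mean )

# Truncate to one decimal place and print as a percentage.
puts "#{(certainty * 1000).to_i / 10.0}%"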

A simple script I wrote called “numbers.rb” to generate data sets for a sort of off-the-cuff heuristic “that looks right” test of the “stat_sig.rb” script looks like this:

#!/usr/bin/env ruby

(1..ARGV[0].to_i).each {|n| puts n }

I used them together by typing something like this (with the 100 indicating I want my data set to consist of every number from 1 to 100) at my Unix shell prompt:

stat_sig.rb `numbers.rb 100`
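
As written, the script assumes it is handed at least two numbers; with fewer, the sample - 1 divisor is zero or negative and the calculation breaks down. A minimal sketch of the same calculation with that check added, wrapped in a method so it can be tried without the shell, might look like the following. The method name stat_sig and the choice of ArgumentError are assumptions made for this example.

```ruby
# Hypothetical method form of stat_sig.rb with a minimum-size check.
def stat_sig(args)
  raw_results = args.collect {|s| s.to_f }
  sample = raw_results.size
  raise ArgumentError, 'need at least two data points' if sample < 2

  raw_mean = raw_results.inject {|sum,n| sum + n } / sample
  diff_squares = raw_results.collect {|n| (n - raw_mean) ** 2 }
  sigma = Math.sqrt( diff_squares.inject {|sum,n| sum + n } / (sample - 1) )
  certainty = 1 - ( ( sigma / Math.sqrt(sample) ) / raw_mean )

  # Truncate to one decimal place, as the script does.
  "#{(certainty * 1000).to_i / 10.0}%"
end

# Same data set that `numbers.rb 100` would produce, built in-process.
puts stat_sig( (1..100).collect {|n| n.to_s } )
```

For the numbers 1 through 100 this should print a figure a little under the 95% target, which passes the same “that looks right” test for a flat spread of values.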


Don’t forget that changing any of the underlying assumptions for the above explanation (such as that your results will conform to a normal distribution) can invalidate this methodology for calculating a measure of statistical significance. Also remember that 95% certainty is just a rule-of-thumb threshold for statistical significance, and that number may change depending on the circumstances of your statistical analysis.

26 August 2009

It’s not my fault your business model sucks.

Filed under: Liberty,Profession — apotheon @ 10:37

(The following was inspired by a question asked in response to The Mythology of Intellectual Property.)

There are innumerable ways to make money without copyright — and, in many cases, people are already doing so and may not even realize it.

For instance, for the most part signed bands use record sales solely to pay off debt incurred as part of their record deals for purposes of getting initial record publishing and distribution done; they get their actual living wages (on the rare occasion when they can make a living from music) by playing live gigs and selling merchandise. They may not actually realize it in many cases, but for most professional musicians the real financial benefit they get from record sales (the one part of the profession that requires copyright) is advertising. The record labels get profits directly from record sales, while the musicians just get well enough known to be able to make money at their live shows. News flash: it’s a lot cheaper to distribute yourself over the Internet, and let people burn your CDs if they want to, than to pay out the nose to have some suited schmucks at Sony/BMG make money off you and only advertise for you incidentally.

For other examples of how people can make money without jealously guarding their intellectual monopolies, look at Cory Doctorow (he keeps making his books and short stories available for free online); Radiohead, Nine Inch Nails, and Harvey Danger (made more net profits off records they basically gave away than on albums distributed through the usual channels); and Websites that use movies and other video productions to drive traffic to them so they can make money through secondary effects (such as on-site advertising, merchandising, and so on).

There are also services such as fundable that provide an easy framework for getting people to pledge money toward the eventual free release of something. You create something, ask for a particular target value in contributions and, once you get the money, release it to the world; voila, you’ve been paid. A number of writers have used this to finance authorship of books, and a number of musicians have done the same for production of album-length collections of music.

I make money by writing (both articles in English and software source code), in fact — far more than the piddly quantities I get from advertising on this obscure site — and I would love for copyright to go away. It’s not like I’m working in manufacturing and advocating for someone else’s industry to change all its rules. I’m talking about what I want to happen with the very fields of endeavor where I make money. This is why, every time I can reasonably do so, I attach a copyfree license to everything I create — usually the Open Works License.

The real answer to the question, though, is much simpler than all of the above:

It’s not my fault your business model sucks.

I’ve said it before (more than once in fact) and, given half a chance, I’ll say it again.

The Mythology of Intellectual Property

Filed under: Cognition,Liberty,RPG,Writing — apotheon @ 04:41

Intellectual Property may be the most pernicious myth of our time. The lies, misunderstandings, and myths of Intellectual Property so obscure the truth about copyright, patent, and trademark law that even those of us who oppose such legalisms must still work to shake loose our last remaining illusions. It seems like every few months I stumble across yet another insight into the nature of so-called Intellectual Property that leaves me surprised I never noticed the flaw in my thinking, and aghast at how deeply rooted the mythology of Intellectual Property has become.

The Product of the Intellect Isn’t Property

The first, most obvious, and perhaps most difficult to fight among all the superstitions surrounding Intellectual Property is the notion that it is property at all — at least in the way people talk about it being a matter of property. People talk about “owning” copyright, and needing the protection of law to defend one’s ability to profit from what one “sells”. The truth of the matter is that copyright and property laws have nothing in particular to do with each other. They are entirely distinct bodies of law.

Even the law doesn’t recognize copyright as “property”. If you violate copyright laws, it is not called “theft”; it is called “copyright infringement”.

Here’s a quick litmus test for your notion that copyright is property: Why does copyright have a limited period under law, while property is forever? While you’re at it, look into the US Supreme Court rulings on the subject starting with Wheaton v. Peters.

People Who Oppose Copyright Aren’t Thieves

Try disabusing someone of notions of the “obvious” moral imperatives of copyright law in a public online discussion forum, and you will almost certainly find yourself being called a thief. The “argument” tends to go something like this load of claptrap:

[You] like to steal things that [you] like. [You] have come up with several longwinded rationalizations for why [you’re] entitled to have everything that anyone in the world creates without paying for it.

Cries of “Thief!” are apparently the equivalent of calling someone a Nazi when it comes to a discussion of the ethicality of copyright law. Similarly to Godwin’s law, we seem unable to escape from the Law of Copyright Discussion Fallacy: As a discussion of copyright law grows longer, the probability of someone’s argument being fallaciously dismissed as mere justification for theft approaches one. The thief card gets played more often and with greater certainty in discussions of copyright than the Nazi card ever did in Usenet, and it doesn’t prove a thing about the rightness or wrongness of copyright law. Try telling that to some self-satisfied copyright-wing conservative who isn’t willing to actually think through the opposing argument, however, and you will find yourself frustrated by the difficulties of teaching a pig to sing.

Patents Don’t Encourage Innovation

Milton Friedman once articulated the core economic fallacy of patent law quite clearly:

For one thing, there are many “inventions” that are not patentable. The “inventor” of the supermarket, for example, conferred great benefits on his fellowmen for which he could not charge them. Insofar as the same kind of ability is required for the one kind of invention as for the other, the existence of patents tends to divert activity to patentable inventions.

In short, patents don’t encourage innovation; they simply skew market activity toward patentable innovations.

I remember, back in the ’80s, that the Big Thing for eco-hippies to complain about was the vanishing rainforests. All kinds of crazy excuses were advanced for why we should stop the slash-and-burn farming practices of South America, including the lunatic notion that we’d destroy the Earth’s ability to renew the oxygen content in the air and we’d all end up suffocating as a result, completely ignoring the fact that the vast majority of plant-based oxygen production happened in the ocean. Such arguments ultimately only harmed the eco-hippies’ case, when arguments based on real concerns over vanishing rainforests would surely have been much more successful. One such crazy argument was that the cure for cancer could be hiding in that forest, waiting to be discovered, amongst its many uncategorized species of life, and all we had to do to preserve it was ensure that nobody ever destroyed plants in the Amazon rainforest again (thus preventing them from finding the cure for cancer).

The real problem there, however, is that nobody will fund the search for a cure for cancer (or HIV, or ebola, or whatever) that comes from a natural source. Extracts from natural sources are not patentable. Only the process of creating synthetic compounds is patentable, which makes pharmaceutical research focus much more on developing salable synthetic compounds that require only the most minimal “innovation” rather than cures for the most problematic diseases. An artificial advantage has been granted to any pharmaceutical research firm whose focus is on convenience synthetics, creating a skewing of market forces away from pursuit of necessary cures regardless of source.

Copyright Isn’t the Natural State

People seem, for some reason, to think that copyright is an integral part of a natural state of property ownership. Self-styled libertarians in particular are often guilty of this line of thinking, particularly when the “we have rights because we own ourselves” set starts jawing about Intellectual Property. The truth is that copyright isn’t about property at all; it’s about censorship.

People may balk at the notion that copyright is censorship. They think of censorship as being something government does to suppress original speech. The truth of the matter, though, is that speech doesn’t have to be original to be censored. Simply repeating something you were told is a form of free speech — and if someone else said it first, that person has the power of law on his side to censor what you’re saying.

Even when confronted by obvious evidence of the fact that copyright is just a subset of censorship policy, as in the case of people who are threatened with DMCA takedown notices when they post customer service emails online while complaining about the company that sent the emails, people typically express their dismay that copyright is being “abused” to enact “censorship” when that’s “not what copyright is for at all”. Bad news, sweetie; that’s not abuse of copyright. That’s just the way copyright works.

Copyright is, in fact, such an unnatural state of affairs that it didn’t even exist as a policy until a mere 77 years before provision for copyright and patent law was written into the US Constitution, with England’s Statute of Anne.

Trademark Law Is Not Trouble-Free

Even many who oppose copyright and patent law subscribe to the notion that trademark law is the exception to the “Intellectual Property law is bad” rule. It seems clear, at first glance, that trademark law just protects us against fraudulent behavior, and for a while I thought it was exempt from the problems of copyright and patent laws. The truth of the matter is much more insidious, however.

Trademark law is not at all necessary to protect us from such fraud. The law should simply recognize malicious deception as a violation of rights in and of itself, regardless of any trademark claims. Meanwhile, trademark law has been used as a means of circumventing grants of license when distributing derivative works. An accidental case of this sort of problem is that of the trademark brouhaha over Firefox that caused the Debian project to rebrand it as Iceweasel. A much more intentional and malicious case is that of the way some third-party publishers deal with the OGL.

Take a look at the “open content” and “product identity” statements accompanying the OGL inside your D&D-derived game books at some point in the future (if you have any). Many of them will contain severely limited language about what qualifies as “open content”, such as the following from the Iron Kingdoms Character Guide, published by Privateer Press:

“Open Game Content” means the game mechanic and includes the methods, procedures, processes and routines to the extent such content does not embody the Product Identity and is an enhancement over the prior art and any additional content clearly identified as Open Game Content by the Contributor . . .

This boils down to saying “Anything copyrightable isn’t Open Game Content unless we specifically say it is, and anything that falls more into the realm of patents is Open Game Content.” Considering that there’s already case law pointing out that anything that falls under the rubric of patent law isn’t subject to copyright anyway, Privateer Press is just saying “Yeah, all that OGL stuff? Fuck you.” The publisher is just trying to get away with something. In fact, I’m pretty sure that if WotC/Hasbro wanted to, it could destroy the entire Privateer Press line of Iron Kingdoms RPG books and do some severe financial damage to this third-party publisher by taking it to court over violation of the OGL.

It gets worse when you see the Product Identity identification:

“Product Identity” means product and product line names, logos and identifying marks including trade dress; artifacts; creatures characters; stores, storylines, plots, thematic elements, dialogue, incidents, language, artwork, symbols, designs, depictions, likenesses, formats, poses, concepts, themes and graphic, photographic and other visual or audio representations; names and descriptions of characters, spells, enchantments, personalities, teams, personas, likenesses and special abilities; places, locations, environments, creatures, equipment, magical or supernatural abilities or efects, logos, symbols, or graphic designs; and any other trademark or registered trademark . . .

Hell, I’m probably violating their asinine license-violating version of the license by discussing it with you, all because of this misuse of “product identity” — which is essentially a euphemism for “pseudo-trademark”. For those of you familiar with Microsoft’s antics, think “look and feel”.

Plagiarism Isn’t a Subset of Copyright Law

Plagiarism is deception. Copyright infringement is copying. Even giving credit to someone is copyright infringement if you copy the part of the work that gives credit to the original author. Ironically, leaving out attribution for the original author actually reduces the amount of copyright infringement of which you’re guilty. I have no fucking clue how anyone can possibly think that eliminating copyright law is the same as endorsing plagiarism. If you copy something and redistribute it, as long as you don’t claim you created it when in fact someone else did, you aren’t committing plagiarism.

For some incredibly stupid and unfathomable reason, some people actually think that without copyright law plagiarism is “okay”, though.

Copyright Law is Not Enforceable

I recommend giving my recent TechRepublic article The Pirate Bay is back with a vengeance (which you’d already know about from an email if you were involved in the copyfree community . . .) a read for more on this subject. I’ve written about it before, and don’t feel like padding the word-count of this SOB entry much more.

And So On

I’m sure I’ve left a lot out that I could say. I could write a book on the subject, but you’re not here to read a book, and I have other stuff to do tonight — like eat dinner and do some work on some RPG materials (which I will be releasing under an open content license when it is “done enough” to bother, most likely the Open Works License).


I am not a lawyer. None of this is legal advice.


All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License