Thursday, September 21, 2006

Log-Log Plots, Power Laws, and Pirates

"What rolls downhill..."

I didn't blog about International Talk Like a Pirate Day, but at least I can have a belated entry that involves piracy.

Gazebo recently blogged about a study concerning the number of 'r's that appear in pirate exclamations (based on Google hits). The study thankfully advises caution in interpreting power laws from log-log plots (amen!), but their data made things look like there was a power law with an exponent of -4.

I had four immediate reactions (with (2)--(4) highly correlated):

1. I was amused. (I also thought about a certain graph on the FSM webpage...)

2. I thought that this was about as worthy of being published in Physical Review Letters as some of their prior articles claiming power laws in various contexts (more on this later).

3. I thought of Barabasi and his overblown power laws in his work on the Web as a network.

4. I immediately had the urge to rant about power laws.


(1) is straightforward, so let's move on. I'll just move straight to (4). The connection with (2) will come into play a bit, but I'll probably not get into (3) here.

I have given variants of this rant on numerous occasions, because there are some scientists out there (including several who study networks) who feel for some reason that log-log plots and power laws are surrogates for actual science.

It is true that power laws show up a lot. Their appearance is not a surprise at this stage (this used to be different, but people now understand that heavy-tailed phenomena occur a lot), but for lots of people (especially in physics, as far as I can tell), their appearance is the final conclusion of a paper when it is really a step along the way (or even almost the first step in the case of analyzing, say, a real-world graph). I'm not being entirely fair because I'm slamming the entire careers of some statistical physicists with this comment and there are some situations in which just finding a power law is useful, but there's also tons of bullshit that goes on.

OK, now let's suppose somebody is claiming a power law. People know that they can get a PRL by doing this, so they try to pull power laws out of their asses. To see a power law, one plots data using logarithmic axes so that a straight line over some range means "power-law" behavior. Now, one thing that is important is over what range this occurs. Unless this holds over a decent number of decades (powers of 10) of data, then it is meaningless to conclude that one has a power law. Those of us who took laboratory classes learned very early that to get straight lines in our data, we just took more logarithms until the lines looked straight. (It's kind of like drinking more alcohol until Roseanne Arnold suddenly looks hot, although this may not actually be mathematically possible, whereas making numerous curves look straight via logarithms most definitely is.) Even if you plotted Angelina Jolie using a log-log plot, she would look like a straight line!
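
To make the "more logarithms" point concrete, here is a minimal sketch in Python (assuming NumPy is available). The lognormal-shaped curve, the value sigma = 2, and the range from 10 to 10^2.2 (about 1.2 decades) are illustrative choices of mine and are not taken from any of the studies mentioned here. The curve is not a power law anywhere, yet a straight-line fit on log-log axes looks excellent over that narrow range.

```python
import numpy as np

def loglog_fit(x, y):
    """Least-squares line through (log10 x, log10 y); returns (slope, r_squared)."""
    lx, ly = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(lx, ly, 1)
    residuals = ly - (slope * lx + intercept)
    # Residuals of a least-squares line with intercept have zero mean,
    # so this ratio of variances is the usual 1 - SS_res / SS_tot.
    r_squared = 1.0 - residuals.var() / ly.var()
    return slope, r_squared

def lognormal_shape(x, sigma=2.0):
    """A curve that is a parabola, not a line, on log-log axes (illustrative)."""
    return np.exp(-np.log(x) ** 2 / (2.0 * sigma ** 2))

# About 1.2 decades of noise-free "data".
x = np.logspace(1.0, 2.2, 50)
y = lognormal_shape(x)

slope, r2 = loglog_fit(x, y)

# Fit each half of the range separately: a true power law would give
# the same exponent in both halves.
half = len(x) // 2
slope_lo, _ = loglog_fit(x[:half], y[:half])
slope_hi, _ = loglog_fit(x[half:], y[half:])

print(f"whole range: slope = {slope:.2f}, R^2 = {r2:.3f}")
print(f"lower half slope = {slope_lo:.2f}, upper half slope = {slope_hi:.2f}")
```

The R^2 of the single fit comes out close to 1, while the fitted "exponent" differs noticeably between the lower and upper halves of the range. A good-looking straight line over roughly one decade says very little by itself.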

A few years ago, there was a study of the average number of decades in "power laws" in papers reported in all the Physical Review journals (or maybe it was just PRL? I forget.), and the authors of this paper concluded it was something like 1.2. (I don't remember who wrote the paper, but either Mark Newman or Steve Strogatz brought this up when I audited Strogatz's complex systems course as a graduate student at Cornell.) That is not enough to conclude power law behavior, but it certainly didn't stop people.

There are many times that I have sat in the audience of a talk and seen someone claim a power law that I think is spurious because the number of decades of data over which the purported behavior holds is too small to justifiably make that claim. (Moreover, the claim doesn't actually help you with anything practical--such as renormalization--unless it holds over enough decades.) I remember one math grad student at Cornell even claimed a "power law" behavior over half a decade of data. He was a nice guy, but I ripped him apart for that. (Because he was discussing a bifurcation problem, there actually was something relevant in the functional relationship at a particular instant, but he was trying to infer something that could not even come close to being justified based on what he had.)
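
For a sense of scale, the number of decades is just the base-10 logarithm of the ratio between the largest and smallest values on the axis. The short Python sketch below turns that into numbers; the helper names are hypothetical and only there for illustration.

```python
import math

def decades_spanned(x_min, x_max):
    """Number of decades (powers of 10) between x_min and x_max."""
    if x_min <= 0 or x_max <= x_min:
        raise ValueError("need 0 < x_min < x_max")
    return math.log10(x_max / x_min)

def range_factor(decades):
    """Ratio x_max / x_min corresponding to a given number of decades."""
    return 10.0 ** decades

print(f"1.2 decades is a factor of about {range_factor(1.2):.1f}")
print(f"0.5 decades is a factor of about {range_factor(0.5):.1f}")
print(f"data from 1 to 1000 span {decades_spanned(1, 1000):.1f} decades")
```

So 1.2 decades means the largest value is only about 16 times the smallest, and half a decade means a factor of only about 3.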

Grrrrrrrrrrrrrrrrrr.....

2 comments:

Anonymous said...

If you clicked a particular series of links starting from my post you might have found this already, but if not: here's a Cosma Shalizi post where he talks about spurious power-law inferences.

Mason said...

Ah, I should have recognized the name before. Cosma and I have a co-author in common (and I also recognize several other coauthors on his publication list).