Sunday, February 22, 2015

The Big, the Rich, and the Good (Data)

Nate Silver has written a fascinating article about the stellar success and rapid progress of baseball analytics, in contrast to the less-than-rapid progress of data analytics in other areas (e.g., economic and earthquake forecasting).

Putting the baseball angle aside (and of course I like that angle very much), one thing I really like is his concise discussion of "big data" versus what he calls "rich data". He uses it to explain why sports analytics has genuinely improved the answers to many of the questions that matter most in sports, whereas other fields still struggle immensely to turn their data into genuinely large gains in understanding.

Note that I have sometimes used the term "good data" as a contrast to "big data". Importantly, good data can be either big or small, whereas I think that Silver regards rich data strictly as a subset of big data. See this blog entry of mine, as well as the additional blog entries referenced therein.

One could also, I suppose, ask whether these other systems are "more complex" than sports competitions, but I'm not sure (a) whether that's actually true and (b) how to quantify it in a way that goes beyond "Ooh, it's complex." Of course, measures of information content exist, but I expect that computing them for these systems would require a lot of assumptions. Anyway, it's a thought I had, so I figured that I should at least bring it up.
