My choice for this weekend’s Big Think post stems from a recent Wired article by Chris “The Long Tail” Anderson, in which he argues that the ability to sort through gigantic databases of information (something he associates with Google) will mean “the end of the scientific method.” As I understand it, his argument is that since we have so much data, we can just use algorithms to find correlations in it, and that this will produce as much insight as years of traditional scientific research. The piece is entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” and there’s a somewhat related post from Kevin Kelly (another Wired alumnus) on his blog, The Technium, entitled “The Google Way of Science.”
I think Anderson’s piece is an interesting thought experiment, and it forces us to think about how the sheer quantity of data available to us changes how we do things. However, like many others who have responded to his article (check the comments on the article for more), I think it has a number of serious flaws, and they are all summed up in the title, which implies that having a lot of data and some smart algorithms to sift through it means “the end of the scientific method.” That’s just ridiculous. It reminds me of political scientist Francis Fukuyama writing a book in the early 1990s about “the end of history,” in which he argued that the clash of political ideologies was more or less over, and that liberal democracy had effectively won. As we’ve seen since then, this was more or less complete rubbish.
Anderson argues that “The Petabyte Age is different because more is different.” There’s no reason to believe that this is true, however. Expanding the amount of data, even exponentially, doesn’t change the fundamental way that the scientific method functions; it just makes it a lot easier to test a hypothesis. That’s definitely a good thing, and I’m sure that scientists are happy to have huge databases and data-mining software and all those other good things, but that doesn’t change what they do; it simply changes how they do it. With all due credit to Craig Venter and his genome-sequencing work, sifting through reams of data about base pairs and sequencing them can help tell us where to look, but not what to look for, or what it means.
Whenever a game-changing technology like Google comes along, it’s tempting to extrapolate its benefits to virtually every sphere of our lives: “Hey, this thing Archimedes came up with called the screw is the best thing ever, so now we never have to use nails or pulleys ever again!” But to take what Google does with PageRank and extend it to all of scientific research is absurd (Kevin Kelly thinks so too). Even Google’s fiercest defenders would probably take issue with Anderson’s argument that its approach to ranking pages works because “If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.” The fact is that many of Google’s results are useless or just plain bad, despite the fact that PageRank is functioning exactly as advertised.
And for the record, correlation still doesn’t mean causation, and likely won’t for the foreseeable future. Correlation just means that you found some data that moves in step with other data; it can help suggest a causal link, but it doesn’t replace the work of establishing one.
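To make that point a little more concrete, here is a minimal Python sketch of what can go wrong when you simply mine a big pile of data for correlations. It is my own illustration, not anything from Anderson's article: the function names, the number of series, their length, and the random seed are all assumptions chosen for the example. It generates a few hundred columns of pure, independent noise and then goes looking for the most strongly correlated pair.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(num_series=200, length=10, seed=0):
    """Largest |r| found among independent random-noise series.

    Every series is Gaussian noise drawn independently of the others,
    so any correlation found here is coincidence, not causation.
    The default sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(length)]
        for _ in range(num_series)
    ]
    # Compare every pair of series and keep the strongest correlation.
    return max(
        abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)
    )


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"Strongest correlation between unrelated series: {r:.3f}")
```

With 200 short series there are 19,900 pairs to compare, so a search like this will almost always turn up a pair that looks tightly related even though nothing connects them. The algorithm can tell you where to look, but it takes a hypothesis to decide whether what you found means anything.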