Thursday, June 26, 2008 | Reason : In the News

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

by Wired

Thanks to SPS for the link.

http://www.wired.com/science/discoveries/magazine/16-07/pb_theory/#/

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
By Chris Anderson

"All models are wrong, but some are useful."

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.
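
To make the "statistics of incoming links" idea concrete, here is a minimal sketch of link-based ranking by power iteration, in the spirit of the published PageRank idea. The four-page graph, the damping factor of 0.85, and the name `rank_pages` are assumptions made for illustration; this is a toy, not Google's production algorithm.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages from link structure alone (toy power iteration).

    links maps each page to the list of pages it links to.
    Every page must appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its score on, split evenly across its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most pages link to "a".
toy_web = {
    "a": ["b"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a", "c"],
}
scores = rank_pages(toy_web)
best_page = max(scores, key=scores.get)
```

Nothing in the sketch inspects page content; the ordering comes from link counts alone. The damping factor and the handling of dead-end pages are still choices someone has to make.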

Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

The big target here isn't advertising, though. It's science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Now biology is heading in the same direction. The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.

In short, the more we learn about biology, the further we find ourselves from a model that can explain it.

There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including Google File System, IBM's Tivoli, and an open source version of Google's MapReduce. Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?

Chris Anderson (canderson@wired.com) is the editor in chief of Wired.

Comments 1 - 44 of 44

1. Comment #199762 by glenister_m on June 26, 2008 at 10:35 am

To paraphrase:

Your data is impressive, but is it science?

I have to wonder if during his work Venter collected some gene fragments, and two unrelated fragments were linked by chance into a unique sequence, whether that could be mistaken for a new unknown species? It would then take a lot of work to determine that no such species existed. Or what if the genes are leftovers from an extinct species that doesn't have modern analogs and therefore no basis for comparison?

I appreciate the value of discovering a new species, but on this planet at least can you really be credited with discovering it if you don't know anything about it? (Obviously if we detected another planet with both oxygen and methane in the atmosphere, that would indicate an unknown life form, and would be a big discovery).

2. Comment #199764 by Caudimordax on June 26, 2008 at 10:38 am

Yikes! It sounds very squishy to me - isn't there a lot of correlation out there that doesn't have a thing to do with causation? No mechanistic explanations? I'm very suspicious.

3. Comment #199765 by Oystein Elgaroy on June 26, 2008 at 10:39 am

Weird article. I always thought model building was an essential part of science...

5. Comment #199770 by Steve Zara on June 26, 2008 at 10:43 am

"Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all"

Pure tosh.

Collecting vast quantities of data alone isn't science. Venter presumably had models of what would be sufficient genetic difference to qualify as the identification of a new species.

Collecting huge amounts of data and statistically analysing this has always been part of mainstream science. There is no change in the scientific method that results from that.

6. Comment #199780 by Caudimordax on June 26, 2008 at 10:57 am

Thanks for the link Epeeist - that said it all, or most of it.

7. Comment #199781 by qomak on June 26, 2008 at 10:58 am

What a ridiculous article. Google's founding innovative idea was that internet users can decide more efficiently than some code written in AI; this idea was not really revolutionary. It was simply a cheap and code-wise efficient way to get around the semantic analysis of the documents.

Now, how the hell the author can generalize this to the above article and at the same time claim it is logically sound is a mystery.

The simplest objection is that data cannot give you any prediction, and without prediction, whatever you're doing is useless.

8. Comment #199782 by Sciros on June 26, 2008 at 10:58 am

Hmmm... throw lots of data at an algorithm and see what patterns it finds. This doesn't "supersede" anything other than worse approaches to finding those same patterns. The explanation for the pattern, and indeed choosing a good algorithm in the first place -- that is science.

9. Comment #199790 by epeeist on June 26, 2008 at 11:06 am

Comment #199770 by Steve Zara:
"Collecting huge amounts of data and statistically analysing this has always been part of mainstream science. There is no change in the scientific method that results from that."
Data mining in the finance industry comes to mind as well.

Having worked in industries where one does designed experiments and others where one just gathers huge amounts of data and hopes to sort it out later I know which tends to work better.

And as has been noted elsewhere, there is a correlation between the decrease in the number of pirates and the increase in global warming.
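
The pirates example is easy to reproduce. The sketch below computes a Pearson correlation coefficient for two series that simply trend in opposite directions. The numbers are invented for illustration and are not real pirate counts or temperature records.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented numbers, for illustration only: one series falls, one rises.
pirates = [5000, 4000, 3000, 2000, 1000, 500]
temperature_anomaly = [0.0, 0.1, 0.2, 0.35, 0.5, 0.6]

r = pearson(pirates, temperature_anomaly)
```

Any two quantities that drift steadily over the same period will produce a coefficient close to -1 or +1, whether or not anything connects them.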

10. Comment #199794 by qomak on June 26, 2008 at 11:10 am

"We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."


Ah, more juicy stuff. This just shows how ignorant he is of the status of current clustering algorithms.
At this very moment, there is almost no satisfactory theoretical model of clustering. Clustering is still in its infancy and the field is really chaotic. There are some statistical models but if you don't like models, I have no idea how you are going to use these algorithms.

Even worse: how do we test which of the algorithms based on these models is correct? We pick a data set for which we know the answer, run the algorithm, then sit back and see how close we get. Repeat this a few times (maybe nudge the data a bit here and there) and you have a new paper in clustering.

The nail in the coffin is that to write an algorithm you will need a model, either a theoretical framework or at least some practical heuristic assumptions. To claim somehow these algorithms can help us get rid of the models is like trying to saw off the branch you are sitting on.
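
The evaluation loop described above (take data with a known answer, run the algorithm, see how close it gets) can be sketched in a few lines. This is a deliberately tiny 1-D k-means with two clusters. The data, the function names, and the seeding rule are assumptions for illustration, not any particular published method.

```python
def two_means_1d(values, iterations=20):
    """Tiny 1-D k-means with k=2, seeded at the minimum and maximum."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def agreement(predicted, truth):
    """Fraction of points labeled correctly, allowing for swapped cluster ids."""
    same = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
    return max(same, 1.0 - same)


# Invented benchmark with a known answer: two well-separated groups.
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

score = agreement(two_means_1d(values), truth)
```

Even this toy fixes the number of clusters, the distance measure, and the starting points in advance, which is the kind of built-in assumption the comment is pointing at.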

11. Comment #199795 by Steve Zara on June 26, 2008 at 11:11 am

Comment #199790 by epeeist

I used to work in molecular modeling - Monte Carlo simulation (which, as the name suggests, involved statistics). I would have loved to have terabyte-scale systems with thousands of processors.

12. Comment #199800 by Apathy personified on June 26, 2008 at 11:18 am

"What can science learn from Google?"


IMO, sweet FA.

13. Comment #199803 by Edouard Pernod on June 26, 2008 at 11:24 am

Wired is dumb.

14. Comment #199813 by squinky on June 26, 2008 at 11:33 am

Pure crap. This sounds like a computer scientist's wet dream. I remember computational types saying that after the human genome was sequenced, we'd be able to cure diseases using a computer. Right! Get out of the darkened office, stop staring at the screen, and go get a beer or get laid or something you stupid neckbeards!

15. Comment #199816 by advocatus_diaboli on June 26, 2008 at 11:35 am

Another problem is that many of the algorithms involved, even if we pretend they are not themselves based on models, use a fair bit of fuzzy math themselves and only make relative comparisons. They sort out enough to ensure that a, b, and c have a certain relation to one another, but in complex systems, especially something as diverse as entire ecosystems, I have difficulty believing our dear Venter is hitting all his targets (suppose I am off to pull up information on him now).

I should clarify that not all genetics algorithms find only approximations, but it has been my experience that the larger the database of items being compared, the more approximation is favored over exact matching as a better solution (how many people have the same first 4 letters of their surname, same birthdate, and same last four digits of their SS number, for instance?).
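
One reading of the parenthetical example is that partial keys start to collide as the database grows. The sketch below estimates the chance that at least two records share a key, assuming keys are spread uniformly and independently. That assumption and the key sizes are hypothetical, chosen only to show the trend.

```python
def collision_probability(n_records, n_keys):
    """Chance that at least two of n_records share a key,
    assuming keys are uniformly and independently distributed."""
    if n_records > n_keys:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_records):
        # The (i+1)-th record must avoid the i keys already taken.
        p_all_distinct *= (n_keys - i) / n_keys
    return 1.0 - p_all_distinct


# Hypothetical key: birthday (365 values) plus last four digits (10,000 values).
n_keys = 365 * 10_000
small = collision_probability(100, n_keys)
large = collision_probability(10_000, n_keys)
```

Under these assumptions a collision is unlikely among a hundred records and close to certain among ten thousand, even though the key space holds millions of values.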

The next thing this guy will be telling us is that bubble-sort will lead to the singularity.

16. Comment #199819 by Don_Quix on June 26, 2008 at 11:37 am

Wired has been going continuously downhill for a decade. I pretty much only read it for the gadgets and ads anymore.

17. Comment #199823 by advocatus_diaboli on June 26, 2008 at 11:40 am

"This sounds like a computer scientist's wet dream."


Quite the opposite! Their reliance upon some god-algorithm that can determine anything and everything undermines the very processes by which useful algorithms are created, and in the end they still rely on models. They are basically just suggesting we throw away the construction instructions for those models and wing it.

The data is wholly useless without some means of interpreting it, which this article does not show us they intend to do. Their results may lead to further investigation into a specific area and help us bypass some tedious processes, but at the end of the day the scientific method comes home wanting to know where its dinner is and demands a quickie before the news at 10.

18. Comment #199829 by Cartomancer on June 26, 2008 at 11:52 am

Reminds me of the way the majority of nineteenth century academics (mainly German ones) used to do Classics and History - simply gather as huge and compendious a selection of documents as possible, then edit them all in as much painstaking detail as you can, publish in huge cyclopean volumes of antiquarian lore and call it scholarship. Very useful for giving us properly edited classical texts, but for understanding the way the Greek and Roman worlds worked? Not so bright there...

19. Comment #199842 by Veon on June 26, 2008 at 12:12 pm

I hope this article is written by an economist.
Going with whatever works best is economics (hence Google's success).
Figuring out how and why it works best, now that is science.

20. Comment #199845 by 8teist on June 26, 2008 at 12:19 pm

I hope that it's not Bill O'Reilly's emerging tech conference... today's topic: the wheel and its impact on society... ban it

21. Comment #199856 by LochRaven on June 26, 2008 at 12:34 pm

"There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show."

Is it just me or does this almost have a religious analogy to it? How about this?:

"There is now a better way. The Bible allows us to say: "God is enough." We can stop looking for models. We can analyze the scriptures without hypotheses about what they might show."

Maybe I'm just being paranoid.

22. Comment #199858 by Sciros on June 26, 2008 at 12:35 pm

"Pure crap. This sounds like a computer scientist's wet dream. I remember computational types saying that after the human genome was sequenced, we'd be able to cure diseases using a computer. Right! Get out of the darkened office, stop staring at the screen, and go get a beer or get laid or something you stupid neckbeards!"

Computer simulations/data miners are actually your friend if you're interested in a lot of current research. They're guided by MODELS, though, so the Wired guy basically doesn't know what he's talking about. Though, based on your last sentence, I submit that neither do you.

23. Comment #199867 by 82abhilash on June 26, 2008 at 12:48 pm


"There is now a better way. Petabytes allow us to say: "Correlation is enough.""


Because for Google to make lots of money all they have to do is correlate popular data on the net with monetizing instruments like ads, banners and product placement. The process is mechanistic and efficient. Correlation is enough for this particular business model, for now; business models too tend to get outdated over time. Wait, did I say model? A system visualized in the mind of an entrepreneur? A model he tests in the real world, where the success is defined by profit? Which means Google has to have an idea of why some people buy their ad space and why some people click on them? After all, they are still in business, are they not? And they are consistently making good decisions. Which means it is not just luck.

But in any case, it is a stretch of the imagination to say 'and therefore that is how science should work from now on.' That is plain stupid.

What else is plain stupid? This article. Even in business you need more than correlation if you plan to last. I think the people at Google know that.

24. Comment #199900 by kwhitefoot on June 26, 2008 at 1:29 pm

Statistics help generate theories by revealing correlations and certainly help validate them.

But surely the single most important feature of a model or a theory is its predictive ability. How does a statistical correlation engine do that without a model?

25. Comment #199927 by adamhaar on June 26, 2008 at 2:03 pm

Re: 25. Comment #199900 by kwhitefoot
Hear, hear!
That was the first thing that I thought of after reading the article. All data mining can do is examine the past; to attempt reasonable prediction one needs a model of some sort.

26. Comment #199956 by gr8hands on June 26, 2008 at 2:33 pm

I suppose I'm the first to mention the specific errors in Chris Anderson's inaccurate rant relating to language.

"That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German)."
Clearly Anderson has not actually looked at the Google website, where they actively solicit fluent native speakers in languages to help them make Google available in other languages.
"And why it can match ads to content without any knowledge or assumptions about the ads or the content."
What a joke! The ads have a tremendously low (and ridiculous) match rate! And Anderson is obviously ignorant about website metatags.

Anderson is also confused about the validity of the number of incoming links actually providing data -- when you're doing a search! If you want to find information on griffins, perhaps you would think www.griffin.com would be high in the listing, or a website that has the word "griffin" on it a large number of times, but not necessarily!

No, he's clearly . . . I'm not sure how to end that sentence.

27. Comment #199959 by advocatus_diaboli on June 26, 2008 at 2:39 pm

gr8hands, if I can find it, I had a link (I think I got it from the Code Project newsletter, if anyone receives it and can think of the date) to a list of amusing Google ad hacks where people were exploiting the great intelligence that is Google's ad system to give amusing results.

Any system that can be so easily exploited should not be the basis of scientific progress. Unless of course we have Yahoo and Ask.com peer review its search entries.

28. Comment #199969 by Barry Pearson on June 26, 2008 at 2:57 pm

How is this supposed to tell us what new data we need to seek?

What new telescope do we need? What new collider? What new biological probes?

29. Comment #199970 by TeraBrat on June 26, 2008 at 2:59 pm

"We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."


If it was that simple it wouldn't be so hard to prove global warming. The problem is that most natural systems are so complex you can't always be sure what numbers to throw in. There are things it's impossible for a computer to solve. I'd like to see a computer give the exact composition of lignin or humic acid. It will never happen because their structures are way too diverse. Even if you were able to give an exact description of a lignin molecule I doubt a computer could give you every possible molecule it could become under all possible conditions.

30. Comment #199972 by advocatus_diaboli on June 26, 2008 at 3:02 pm

"Even if you were able to give an exact description of a lignin molecule I doubt a computer could give you every possible molecule it could become under all possible conditions."


That puts me in mind of http://en.wikipedia.org/wiki/Rosetta@home

31. Comment #199983 by AmericanGodless on June 26, 2008 at 3:16 pm

This article reminds me of one I read in "Omni" magazine about 25 years ago that suggested that biologists should stop experimenting with animals, and just model them with computers. I wrote to them and asked how we would ever "model" the cellular biochemistry of animals without ever experimenting with a live animal. No answer. I dropped my subscription. What ever happened to "Omni?"

Having since then moved from biology to earth science, I took a break this afternoon from writing programs to make certain kinds of original data easier to visualize so that human data managers might better judge its quality, and read this article where I learned that what I am doing is worthless.

So, while they are crunching the numbers and finding all of those statistical correlations that need no model to tie them to a humanly-constructed conception of reality, here's a possibility to throw into the mix: What if the data they are using is wrong?

Oh, sorry, that would be a part of a "model" for interpreting the data, wouldn't it? We aren't supposed to care why the data play out the way they do, just report what the numbers do, how they correlate with each other, and go on to publication.

My job is working with historic data that have recorded (in a sometimes faulty way) some geophysical aspects of a world in the past that we cannot go back and measure again. There are many cases in which we have to flag the data as compromised in one way or another; sometimes it is correctable, sometimes it is not. All those corrections, all of those compromises to the data, become at least a small part of the "model" that is used to understand how the reality of today (and, we hope, of tomorrow) correlates with the raw numbers collected in the past. It is a necessary part of science.

Not that I think that computers are incapable of eventually becoming conscious, and making their own decisions on how to correlate the "bad" data with, say, a systematic problem that affected one or two out of a dozen investigations; or with a bad batch of instrument probes in another case; or a computer bit-shift in another case; or a units conversion problem in another. But before they do, they will have to have internalized a lot more information about how both humans and machines can mess things up. And when they do, what they will be doing is building models themselves, and will have taken a large step toward artificial intelligence. When the computers start, on their own, suggesting to the humans new ways in which to look at the data, ways that the human beings have not thought of on their own, then they will have earned a place on the author line of the published papers. Until then, science will continue to make progress through human-designed models, not unexamined and un-modeled correlations.

32. Comment #200047 by OrbitalMike on June 26, 2008 at 5:15 pm

The use of statistics to describe the data is meaningless unless the underlying model of the statistical method is defined. Gaussian distributions will only get you so far. Log-Log correlations may hint at an underlying reason and model where a straight linear model might give you unrecognizable garbage. Mr Anderson seems to have never had to do anything in science except write gushing reviews of information sciences, à la Google. Maybe he should consider taking some elementary statistics (yes, even 6-sigma crap) before he spouts off again on the utility of petabytes of data.
He could also use some rudimentary (remedial) courses in a real physical science. He seems to think that "Science" should return to the days of just collecting and classifying data and specimens without any regard for understanding what the data means.

33. Comment #200049 by ricey on June 26, 2008 at 5:20 pm

WIRED is losing the plot.

RealClimate rumbled these guys too:

http://www.realclimate.org/index.php/archives/2008/06/wired-magazines-incoherent-truths/

Maybe they think the controversy will improve circulation

34. Comment #200056 by chuckgoecke on June 26, 2008 at 5:48 pm

In my former field (now I'm in Horticulture) of Petroleum Reservoir engineering, we constructed huge (at the time) numerical models to predict the future performance and recoverable reserves of oil or gas from a reservoir. This modeling was mainly during the preliminary development, because we had no data about how the reservoir would actually work. Once production starts, and a sizable amount of production data, plus other supporting observations, such as pressures and ratios of fluids (oil, water, and gas), is collected, a much simpler but more robust method is used to predict future performance. Called decline curve analysis, it's basically graphical extrapolation of the production trends. Hopefully, once a decent amount of good climate data is collected, something like this will be possible for the climate, because the numerical models suuuuck! Only one step above a wild-ass-guess stab in the dark.
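
For readers unfamiliar with decline curve analysis, the extrapolation can be sketched as a straight-line fit on a log scale. The sketch assumes an exponential decline, which is one common curve shape; the comment does not say which form was used, and the production history below is invented for illustration.

```python
import math


def fit_exponential_decline(times, rates):
    """Least-squares fit of rate = q0 * exp(-decline * time) on a log scale."""
    n = len(times)
    logs = [math.log(q) for q in rates]
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    # Ordinary least-squares slope of log(rate) against time.
    slope = sum((t - mean_t) * (y - mean_log) for t, y in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    intercept = mean_log - slope * mean_t
    return math.exp(intercept), -slope


# Invented production history: 1000 units/month, decline constant 0.05 per month.
months = list(range(12))
history = [1000.0 * math.exp(-0.05 * t) for t in months]

q0, decline = fit_exponential_decline(months, history)
forecast_month_24 = q0 * math.exp(-decline * 24)
```

The forecast is simply the fitted curve carried forward, so it is only as good as the assumption that the decline keeps the same shape.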

35. Comment #200058 by dr joneZ on June 26, 2008 at 5:52 pm

Why do (most) humans always think that all you have to do is collect data and that somehow the data will do all your thinking for you? What about creativity?
"How is this supposed to tell us what new data we need to seek? What new telescope do we need? What new collider? What new biological probes?"
Barry Pearson's comment is the most perceptive so far IMO. In order to analyse anything there needs to be something there to analyse, doesn't there? Imagine a bunch of experts seated around a table, ready and fired up to do some hardcore number-crunching. They all go home early because nothing happened. Why? Nobody was able to come up with anything to analyse.

36. Comment #200086 by dragonfirematrix on June 26, 2008 at 8:07 pm

I cannot wait for the age of real enlightenment to begin:

Exabyte, then
Zettabyte, and finally a
holotabytes :)

37. Comment #200089 by Scot Rafkin on June 26, 2008 at 8:32 pm

"Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more science than a heap of stones a house."
- Jules-Henri Poincare

38. Comment #200125 by jo5ef on June 27, 2008 at 12:07 am

Usually I'm a fan of a bit of arm waving about the impact of the Internet etc on human knowledge myself, but this article seems to be incoherent and lacking in real insight.

39. Comment #200136 by Raiko on June 27, 2008 at 1:10 am

"There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."


Wow. Has this person ever done any science? Where does he think the ideas about where and how to start looking for useful data come from?

40. Comment #200210 by sean salvador on June 27, 2008 at 4:47 am

Not much to say here.
Veon got it spot on with the above comment:

"I hope this article is written by an economist.
Going with whatever works best is economics (hence Google's success)
Figuring out how and why it works best, now that is science"

41. Comment #200352 by inverse on June 27, 2008 at 11:37 am

Speaking as someone with a degree in computer science....

This article strikes me as complete nonsense. Google uses an algorithm to determine the relative importance or relevance of text. An algorithm is a model. The amount of data you apply it to is irrelevant. It only matters that the model works.

It's akin to saying quantum mechanics isn't science because the LHC is going to generate more data than we can readily analyze. The whole point is testing models and seeing which ones work.

Google has done well because their particular model works relatively well. I would also suggest that their model of a lightweight front end was and is a major contributing factor, but that's beside the point.

The whole thing strikes me as an argument from ignorance - the required models are too complicated to understand, so I'll just sit in the corner and have faith in the data.

42. Comment #200511 by Rational_G on June 27, 2008 at 6:50 pm

This article is complete nonsense.

The scientific method is alive and well.

Science literacy, however..............

43. Comment #201099 by Gynwer on June 29, 2008 at 3:02 am

Now, computer science "evolves" from one buzz to the next, so what the article claims can - in my humble opinion - safely be ignored.

What did surprise me is the absolute animosity I read in the above answers. And not only towards the article, but towards computer science as a whole.

I wonder why that is ...

44. Comment #202103 by latsot on June 30, 2008 at 9:24 pm

This kind of data mining is just another type of observation about the world and as such fits neatly within the scientific method with every other type of observation. It tells us where to look.

A few years back, I ran a project that used clustering techniques to identify targets for drug discovery. It worked pretty well, but all it did was tell us where to look so that we could form hypotheses and test them in an entirely conventional way. I for one would be rather reluctant to trust a drug that has been developed without a model of how it works.

These 'science is dead' buggers really piss me off. I understand that the author has just got caught up in his own enthusiasm and hasn't really thought it through, but it hurts the already skewed public perception of science when self-important idiots come out with highly-publicised crap like this.

And the Editor in Chief of Wired should simply know better.
