Nassim Taleb urges skepticism toward the claims of big data miners. He warns that big data “may mean more information, but it also means more false information”:
[B]ig data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.
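The cherry-picking effect described in the quotation can be illustrated with a small simulation. This is a minimal sketch, not Taleb's own method. The function names `pearson` and `max_spurious_correlation`, the Gaussian noise, the sample size of 50, and the variable counts are all assumptions chosen for illustration. Every column is pure noise, so any correlation it reports is spurious by construction.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_obs, n_vars, seed=0):
    """Largest |r| between a noise target and n_vars independent noise columns.

    Hypothetical helper: the target and every candidate are independent
    standard-normal draws, so there is no real signal to find.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        # Keep only the most impressive result: this is the cherry-picking step.
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Same seed each time, so a larger search reuses and extends the smaller one.
    for n_vars in (1, 10, 100, 1000):
        strongest = max_spurious_correlation(n_obs=50, n_vars=n_vars)
        print(f"{n_vars:>5} noise variables: strongest |r| = {strongest:.3f}")
```

Because the seed is fixed, each larger search contains the smaller one, so the strongest correlation found can only stay the same or grow as more noise variables are examined. The more candidates the researcher's computer is allowed to sift, the larger the deviation it will surface, even though none of them carries information.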