Spurious Correlations


That’s the name of Tyler Vigen’s site. How he describes it:

I created this website as a fun way to look at correlations and to think about data. Empirical research is interesting, and I love to wonder about how variables work together. The charts on this site aren’t meant to imply causation nor are they meant to create a distrust for research or even correlative data. Rather, I hope this project fosters interest in statistics and numerical research.

Adi Robertson examines the many graphs:

Sift through its data sets, and you’ll find all sorts of statistics that can be mapped onto each other — margarine consumption and the divorce rate, crude oil imports and number of train collision deaths, bee colony growth and the marriage rate. If you ever need to demonstrate that two things can appear connected purely by chance or some entirely separate factor, this is your site.

Nathan Yau highlights his favorites:

Some of the gems include: the divorce rate in Maine versus per capita consumption of margarine, marriage rate in Alabama versus whole milk consumption per capita, and honey produced in bee colonies versus labor political action committees. Many things correlate with cheese consumption.

Dylan Matthews joins the conversation:

Those all have correlation coefficients in excess of 0.99! That is very very high! By comparison, Alan Abramowitz’s extremely accurate “Time for Change” model of presidential elections (it predicted Obama would get 52.2 percent of the two-party vote; he got 51.4) has a correlation coefficient of 0.97, which Abramowitz correctly calls “extraordinary.” The point is that a strong correlation isn’t nearly enough to make strong conclusions about how two phenomena are related to each other. Abramowitz’s model is worth trusting not just because of its high correlation but because it predicts presidential elections based on factors that logically should matter to voters, like the state of the economy and what party currently controls the White House. That gives it theoretical plausibility, which a theory in which, say, US whole milk consumption is driven by the marital status of Mississippians, lacks.
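To see how easily a near-perfect coefficient can turn up by chance, here is a minimal Python sketch. It is not how Vigen's site works; it simply assumes 200 unrelated random walks of 10 points each (both numbers, the seed, and the helper names `pearson`, `random_walk`, and `strongest_pair` are made up for illustration), computes the Pearson correlation for every pair, and reports the strongest one.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant series; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """A series built from independent Gaussian steps (illustrative data only)."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_pair(num_series=200, length=10, seed=0):
    """Return (|r|, i, j) for the most strongly correlated pair of unrelated walks."""
    rng = random.Random(seed)
    walks = [random_walk(rng, length) for _ in range(num_series)]
    return max(
        (abs(pearson(a, b)), i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(walks), 2)
    )


if __name__ == "__main__":
    r, i, j = strongest_pair()
    print(f"strongest of 19,900 pairs: series {i} vs series {j}, |r| = {r:.3f}")
```

With nearly 20,000 pairs to choose from and only ten points per series, some pair will almost always line up very closely, even though every series was generated independently. That is the same search-then-report pattern that makes a high coefficient on its own weak evidence.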

Michael Byrne adds:

Humans love correlation. We love correlation because we love stories, narratives: this happened, leading to this, and next should be this other thing. We look for the forms of stories in the world, and a story is roughly the opposite of coincidence, which is things just happening together because time is just a substance of many layers, a stack of happenings.

Update from a reader:

Notice the icon of the site – it’s a small picture of the number 42. I emailed Tyler Vigen yesterday because my colleague and I had a guess as to why ‘42’. He confirmed that it’s a reference to The Answer to the Ultimate Question of Life, The Universe, and Everything from The Hitchhiker’s Guide to the Galaxy.

Neat :)