A bad chatbot might luck its way to victory if the judges aren’t familiar with tell-tale signs of chatbot-ness. That’s usually of less importance when your panel includes experts in the field of computer science. In this case, it included an actor from Red Dwarf and a member of the House of Lords, both of whom are incredibly accomplished and by all indications brilliant minds, but not specifically trained in this field.
David Auerbach argues that “Eugene Goostman” did in fact pass the Turing test – but that the test itself has a fatal flaw:
Trashing the Reading results, Hunch CEO Chris Dixon tweeted, “The point of the Turing Test is that you pass it when you’ve built machines that can fully simulate human thinking.” No, that is precisely not how you pass the Turing test. You pass the Turing test by convincing judges that a computer program is human. That’s it. Turing was interested in one black-box metric for how we might gauge “human intelligence,” precisely because it has been so difficult to establish what it is to “simulate human thinking.” Turing’s test is only one measure.
So the Reading contest was not the travesty of the Turing test that Dixon claims. Dixon’s problem isn’t with the Reading contest – it’s with the Turing test itself. People are arguing over whether the test was conducted fairly and whether the metrics were right, but the problem is more fundamental than that. “Intelligence” is a notoriously difficult concept to pin down. Statistician Cosma Shalizi has debunked the idea of any measurable general factor of intelligence like IQ. Nonetheless, the word exists, and so we search for some way to measure it. … The Turing test, famous as it is, is only one possible concrete measure of human intelligence, and by no means the best one.
Elizabeth Lopatto offers some background about how Turing turned imitating a conversation into a proxy for intelligence:
The strength of the test is obvious: “intelligence” and “thinking” are fuzzy words, and no definition from psychology or neuroscience has been sufficiently general and precise to apply to machines. The Turing test sidesteps the messy bits to provide a pragmatic framework for testing.
But this strength is also the test’s weakness. Turing at no point explicitly says that his test is meant to provide a measure of intelligence. For instance: human behavior isn’t necessarily intelligent behavior—take responding to an insult with anger. Or typos: normal and human, but intelligent?
Joseph Stromberg still believes the episode was noteworthy:
This announcement certainly doesn’t mean that self-aware robots are about to take over the world – and it doesn’t even mean that there’s one out there capable of consistently fooling people into thinking it’s a human. It does, however, mean that one has crossed the threshold Turing predicted would be passed by 2000, a meaningful milestone on the way to artificial intelligence.
That said, there are plenty more milestones that still need to be passed — even in terms of the Turing test. The Loebner prize, for instance, will award a silver medal for the first program to pass a text-only test, but a gold medal for one that passes an audio test — something that’s probably still a long way off.
But a less-charitable George Dvorsky makes the case that it’s time to abandon the “bullshit” Turing test:
Turing had no way of knowing that human conversation – or the appearance of it – could be simulated by natural language processing (NLP) software and the rise of chatterbots. Yes, these programs exhibit intelligence – but they’re intelligent in the same way that calculators are intelligent. Which isn’t really very intelligent at all. More crucially, the introduction of these programs to Turing Test competitions fails to answer the ultimate question posed by the test: Can machines think?
Though impressive, and despite their apparent ability to fool human judges, these machines – or more accurately, software programs – do not think in the same way humans do. … It’s all smoke and mirrors, folks. There’s no thinking going on here – just quasi pre-programmed responses spouted out by sophisticated algorithms. But because Turing’s conjecture was directed at assessing the presence of human-like cognition in a machine, his test falls flat.