Google is on it:
Jim Edwards calls the Google Translate app “the most astonishing piece of mobile software I’ve seen in months”:
Google Translate lets you read anything in a foreign language; translate any text, even handwriting; and carry on a live conversation with another person as the app translates what you’re saying. The software translates instantly, whether via text, photo, or voice.
Or as Lance Ulanoff explains, “once you select the two speaking languages, Google Translate can auto detect them as they are spoken.” Below is a demo that’s almost as remarkable as the GIF seen above:
Ulanoff tried it with a Turkish-speaking intern:
At first, we stumbled because the intern, who is also fluent in English, kept responding to what I was saying and not waiting for Google Translate’s interpretation. Eventually, she got it right and started waiting for my translated words to be spoken by the app.
Initially, she told me, the translation was perfect, but when I started to speak in longer sentences, it basically fell apart and got a lot of it wrong. As I tested with others who spoke in Greek, German and French, we noticed the same thing. We could never completely rely on Google Translate to get the words right.
He found a lot of similarities between Google’s app and Skype Translator. John Pavlus tested out the latter:
The limitations of Skype’s translation software are … revealing, since they show how difficult it is for even the smartest machine to mimic the subtleties of effective human conversation. Determining which meaning of a word is appropriate in different contexts can be vexing. “If software is translating between American and British English, and it recognizes the word ‘football,’ it also needs to know when to change it to ‘soccer’ and when to keep it as ‘football’ or ‘gridiron,’” says Christopher Manning, a professor of linguistics and computer science in Stanford University’s Natural Language Processing Group.
Matthew Braga explains why real-time translation is so difficult:
“The reason that real-time [translation] is difficult for most of us is that it’s really a matter of probabilities,” said Gerald Penn, associate chair of the University of Toronto’s department of computer science, and a specialist in natural language processing. In a modern speech recognition system, a computer is typically trained on a language model—essentially, a database of what people are most likely to say, and in what order. Using this model, a computer gathers speech data from a microphone, and makes some educated guesses about what was actually said.
“The modern approach is not to make the guess right away,” Penn explained, “but to collect the evidence, and then rank it, score it, and augment it.” The challenge is performing this process fast and accurately enough that you can create the illusion of a conversation, where the translation appears to happen in real-time.
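To make Penn's "collect the evidence, and then rank it, score it" a little more concrete, here is a toy sketch in Python. It is not how Google Translate or Skype Translator actually work. The tiny corpus, the acoustic scores attached to each guess, and the add-one smoothing are all assumptions made up for illustration. The idea is only that the recognizer holds on to several candidate transcriptions and lets a language model (here, simple word-pair counts) help decide which one people were most likely to have said.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a real language model's training data.
CORPUS = [
    "where is the train station",
    "where is the bus station",
    "the train is late",
    "is the station near here",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start-of-sentence marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def lm_log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def rank_hypotheses(hypotheses, unigrams, bigrams, lm_weight=1.0):
    """Combine each guess's acoustic score with its language-model score, best first."""
    scored = [
        (acoustic + lm_weight * lm_log_prob(text, unigrams, bigrams), text)
        for text, acoustic in hypotheses
    ]
    return sorted(scored, reverse=True)

unigrams, bigrams = train_bigrams(CORPUS)

# Invented candidate transcriptions with made-up acoustic log-scores (higher is better).
hypotheses = [
    ("wear is the train station", -4.0),
    ("where is the train station", -4.2),
    ("where is the terrain station", -4.1),
]

ranked = rank_hypotheses(hypotheses, unigrams, bigrams)
best = ranked[0][1]
acoustic_only_best = max(hypotheses, key=lambda h: h[1])[0]

for score, text in ranked:
    print(f"{score:8.3f}  {text}")
```

In this made-up case the guess that sounds closest on its own ("wear is the train station") loses once the word-pair evidence is added, which is the point of delaying the decision. A real system would do this over far more candidates, and fast enough to keep up with speech.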