SOMINI SENGUPTA June 24, 2012
Luis von Ahn, the founder of Duolingo, an online foreign language translation service. Photo: JUSTIN MERRIMAN
Language does not come naturally to machines. Unlike humans, computers cannot easily distinguish between, say, a river bank and a savings bank. Satire and jokes? Algorithms have great trouble with those. Irony? Wordplay? Cultural context? Forget it.
That human edge in decoding what things mean is what a computer scientist turned entrepreneur, Luis von Ahn, is betting on. His startup, Duolingo, which has just opened to the public, proposes to put armies of language learners to work translating text on the web.
For the learners, Duolingo offers basic lessons, followed by sentences to translate, one at a time, from simple to more difficult. For online content providers wanting translations, Duolingo offers, for now at least, free labour. Because it is still in its early days, there are no independent assessments available of how accurate or efficient it can be.
The site has been available by invitation only for the last five months and is now limited to English, Spanish, French and German. People and companies can submit their content to Duolingo for translation, a service the company may begin to charge for. To provide content for its lessons, Duolingo can also harness whatever text is not under copyright or is released under a liberal Creative Commons licence. Users vote for the best translations, providing some measure of quality control.
"You're learning a language and at the same time, helping to translate the web," von Ahn said. "You're learning by doing."
Google Translate, by contrast, relies entirely on machines to do the work — and while it usually captures the essence of a piece of text, it can sometimes produce bewildering passages. Google leverages vast amounts of data to produce its output, feeding its translation engine with texts that have been translated into multiple languages, including United Nations proceedings, which are then used to train its machines.
Von Ahn, for his part, is counting on crowds flocking to Duolingo for free language lessons.
Crowdsourcing is at the heart of von Ahn's ambitions. His last enterprise, ReCaptcha, makes use of those wavy letters and numbers that web users transcribe every day on sites to prove that they are not robots trying to break in. Von Ahn gathered those squiggles from digitized images of old manuscripts, books and newspapers. Every time they type out the wavy words, web users provide free help in transcribing fading texts that are hard for a machine to read. Google bought his startup in 2009.
Von Ahn, an associate professor at Carnegie Mellon University in Pittsburgh, where Duolingo is based, came up with the translation idea when he noticed that friends and relatives in his native Guatemala had far less content available to them online if they did not know English. The web, von Ahn argued, is inferior in Spanish.
"It's got much less information. I see people struggling with that a lot," he said. "They don't get the information we take for granted."
Human and machine translation can work in different scenarios, said Alon Lavie, another Carnegie Mellon professor who has a machine translation company called Safaba, aimed at corporate clients. When businesses need to translate large amounts of text into multiple languages, machine translation can be more useful, said Lavie, particularly if business confidentiality is at stake.
"Where I think Duolingo's crowdsourcing makes a lot of sense is in scenarios where a consumer or enterprise has a small translation job that needs to be done quickly and cheaply, and the translation needs to come out at 'human' quality — similar to what a human translator or bilingual speaker would generate," Lavie said.
The New York Times has been experimenting with Duolingo as a potential means to translate its digital content into other languages, said Marc Frons, the company's chief information officer, but has made no commitments to using the service.
Von Ahn is thinking of taking on Wikipedia as his first translation project.
Wikipedia has more content available in English, nearly 4 million articles, than in any other language. German, French and Dutch follow, with 1.4 million, 1.3 million and 1 million articles respectively. In other popular languages, Wikipedia content is sparse: In Spanish, there are only 900,000 articles, and in Swahili, spoken across East Africa, fewer than 24,000.
A spokesman for the Wikimedia Foundation, Jay Walsh, said that anyone who wanted to use Wikipedia material for translation was welcome to do so (it is published under a Creative Commons licence), but that feeding it back into the Wikipedia sites would require "a conversation" to make sure translations were accurate.
"The community that makes up Wikipedia — they are confronted with the simultaneous challenge of growth and also quality, making it excellent," Walsh said.
For Duolingo to work well, it needs a huge crowd of learners. The more proficient they become, the greater the chances of accurate translations. In Duolingo, a large piece of text is broken into easy and difficult pieces — by a computer, of course — then parceled out to students at varying levels and put back together, again by a machine. Von Ahn said that "eventually we intend to charge content providers either for faster or more accurate translations."
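The split, parcel-out, vote and reassemble cycle described above can be pictured with a rough sketch. This is only an illustration under stated assumptions, not Duolingo's actual system: every function name here is hypothetical, sentence length stands in for difficulty, and a simple majority vote stands in for the user voting the site relies on.

```python
from collections import Counter

def split_into_sentences(text):
    # Naive split on periods (assumption); a real system would use a proper segmenter.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def difficulty(sentence):
    # Hypothetical proxy: the more words a sentence has, the harder it is.
    return len(sentence.split())

def assign(sentences, learners):
    # Map each sentence index to the learners whose level is at least
    # that sentence's difficulty score.
    return {
        i: [name for name, level in learners.items() if level >= difficulty(s)]
        for i, s in enumerate(sentences)
    }

def pick_best(votes):
    # votes holds one candidate translation per vote cast; the most common wins.
    return Counter(votes).most_common(1)[0][0]

def reassemble(best_by_index):
    # Stitch the winning translations back together in original order.
    return " ".join(best_by_index[i] for i in sorted(best_by_index))

text = "The cat sleeps. The old river bank flooded after the heavy spring rain."
sentences = split_into_sentences(text)
learners = {"beginner": 5, "advanced": 12}  # hypothetical skill levels
assignments = assign(sentences, learners)
```

In this toy run the short first sentence goes to both learners, while the longer second one goes only to the advanced learner. Once votes come in, `pick_best` chooses a winner for each piece and `reassemble` joins the winners into the finished text.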
Duolingo has raised $3.3 million in venture capital. The actor Ashton Kutcher is among the backers, along with Union Square Ventures and the business advice author Tim Ferriss.
The New York Times