When I was young and studying anthropology and psycholinguistics, we learned that computer translation, like speech understanding, was essentially impossible, if only because we hadn’t the foggiest notion of how people do these things. So how could we program a computer to do them?
In my lifetime, this certainty has disappeared, though we still haven’t a clue how people do it. First speech generation, and then speech recognition, became extremely reliable, based on probabilistic computational models that successfully mimic human behavior without mimicking humans. It shouldn’t work, but it does.
By the way, similar advances are happening in the analysis of emotions and other non-verbal behavior. Computers are getting very good at inferring human emotion from sensor data, even though we have little idea how humans themselves do it.
In the past two decades, language translation has become feasible for computers through various machine learning methods. Again, these computer translations are very good, but the method has nothing to do with how people understand language (as far as we know). Again, it shouldn’t work, but it does.
This fall, researchers at Google deployed Google’s Neural Machine Translation, a system that learns to translate not only between two languages but among many languages at the same time. A side effect of this process is the ability to translate, at least roughly, between two languages for which there is no sample data (termed “zero-shot” translation).
In this sense, the system is learning something about “human language” in general, which linguists and psychologists have been seeking to understand for centuries, without clear success. Wow!
The basic idea is to use large samples of human translations to learn enough to translate sentences that are not in the training data. Interestingly, the system works from sentences: the data is a collection of sentences, each paired with its corresponding sentence in the second language. Given that a sentence is neither an atomic unit of meaning nor a complete context for that meaning, it is striking that learning works so well from this data. For that matter, sentences in two languages do not always correspond one-to-one. There is no a priori reason this method should work at all, but it does!
This approach works for one pair of languages, e.g., English to Spanish, but building translators one pair at a time means that you need on the order of N^2 separate translators for N languages. There are thousands of human languages, and Google currently translates among about 100.
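To make the arithmetic concrete: strictly, one model per direction between two distinct languages means N(N−1) models, because English to Spanish and Spanish to English are separate directions. The minimal Python sketch below just counts them; the function name is a hypothetical illustration, not something from the paper.

```python
def pairwise_model_count(n_languages: int) -> int:
    """Models needed if every direction between two distinct languages
    gets its own dedicated translator: N * (N - 1)."""
    if n_languages < 0:
        raise ValueError("number of languages must be non-negative")
    return n_languages * (n_languages - 1)


# Compare the pairwise approach with a single multilingual model.
for n in (2, 10, 100):
    print(f"{n} languages: {pairwise_model_count(n)} pairwise models vs. 1 multilingual model")
```

At about 100 languages that is 9,900 separate models, which is why folding everything into one model is such a big deal.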
The new system gloms all of that into a single model, tagging each example with its target language. (This aspect of the method is trivial!) With these tags, the model learns to translate everything into everything. Cool!
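Johnson et al. describe the tag as an artificial token at the start of the source sentence, such as `<2es>` for “translate into Spanish”; the source language itself is not marked. Here is a minimal Python sketch of that preprocessing step, assuming a corpus stored as a dict keyed by language pair. The function names, that layout, and the sample sentences are hypothetical illustrations, and the real system’s tokenization and training are not shown.

```python
from typing import Dict, List, Tuple

# A parallel example: (source sentence, target sentence).
Example = Tuple[str, str]


def tag_example(source: str, target: str, target_lang: str) -> Example:
    """Prepend an artificial token naming the target language, e.g. '<2es>'."""
    return (f"<2{target_lang}> {source}", target)


def build_multilingual_corpus(
    corpora: Dict[Tuple[str, str], List[Example]]
) -> List[Example]:
    """Merge per-pair corpora, keyed by (source_lang, target_lang), into one
    tagged training set. Only the target language is marked; the source
    language is left for the model to work out."""
    merged: List[Example] = []
    for (_source_lang, target_lang), pairs in corpora.items():
        for source, target in pairs:
            merged.append(tag_example(source, target, target_lang))
    return merged


# Hypothetical toy corpora: English->Spanish and Portuguese->English.
corpora = {
    ("en", "es"): [("Thank you.", "Gracias.")],
    ("pt", "en"): [("Obrigado.", "Thank you.")],
}
for tagged_source, target in build_multilingual_corpus(corpora):
    print(tagged_source, "->", target)
```

Everything downstream of this step is an ordinary single-pair model; the only change is that one model now sees every pair, distinguished by the tag.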
Why wasn’t this done before now? Scale. The combined model and dataset are absurdly large and require correspondingly large computing resources. The training step for the experiments reported by Johnson et al. takes weeks to run on 100 GPUs, which means it would have been impossible even a decade ago.
While the scale is impressive, and the notion of many-to-many learning in a single model is cool, the big headline is that this method seems to (somehow) learn to translate between language pairs for which it has no direct examples. So, when it learns from English-to-Spanish samples and Portuguese-to-English samples, the resulting neural model can also do a “transitive” Portuguese-to-Spanish translation reasonably well, though in the paper’s experiments somewhat below a model trained directly on that pair.
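Because only the target language is tagged, nothing prevents feeding the trained model a Portuguese sentence prefixed with a Spanish target token, a combination it never saw during training. The Python sketch below is pure bookkeeping under that assumption: a hypothetical helper that lists which source-to-target directions a model could be asked for that appear in none of its training pairs. It does not translate anything, and how well the model handles such requests is exactly the empirical question the paper addresses.

```python
from itertools import product
from typing import List, Set, Tuple

LangPair = Tuple[str, str]  # (source_lang, target_lang)


def zero_shot_pairs(trained: Set[LangPair]) -> List[LangPair]:
    """Directions from languages seen as sources to languages seen as
    targets that never occur together in the training data."""
    sources = {src for src, _ in trained}
    targets = {tgt for _, tgt in trained}
    return sorted(
        (src, tgt)
        for src, tgt in product(sources, targets)
        if src != tgt and (src, tgt) not in trained
    )


# Trained on English->Spanish and Portuguese->English only.
print(zero_shot_pairs({("en", "es"), ("pt", "en")}))
```

With only those two training directions, the single unseen combination is Portuguese to Spanish, the case described above.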
This is cool, and remarkable for “the pleasant fact that zero-shot translation works at all”; the authors also describe it as “the first demonstration of true multilingual zero-shot translation” (Johnson et al., p. 8).
This unprecedented result raises pretty basic questions about just what is going on here. How does this “zero-shot” translation work? In particular, we wonder whether the model is actually learning some kind of abstract, general metalanguage, an “interlingua”. And if so, how can we understand this interlingua?
The paper offers only a first look at these questions, with some data that show “hints at a universal interlingua representation”. My own view is that the data suggests the answer may be complicated, in that the model is likely learning more than one kind of translation. But there is certainly much to study here!
Given that this sort of machine translation was generally considered flat-out impossible a few decades ago, and that linguists have searched fruitlessly for an interlingua for centuries, this work is truly remarkable.
As I commented above, this is yet another case where computational methods have achieved performance roughly equivalent to human cognition, even though they are obviously not a model of how human cognition and language work.
When you think about it, this is one of the most remarkable areas of intellectual advance of the early twenty-first century. I suspect that, as dinosaurs like me die off, there will be a remarkable synthesis of the immense, laboriously handmade legacy of language theory and neurolinguistics with these empirically derived computational models. The result will be an elegant meta-theory of what “human language” actually is, with an understanding of the “design decisions” incorporated into human nervous systems (and specific computer models), and a concomitant story about how these evolved and relate to our kindred species on Earth.
- Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean, Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Google, 2016. https://arxiv.org/abs/1611.04558
- Sam Wong, Google Translate AI invents its own language to translate with. New Scientist, 2016. https://www.newscientist.com/article/2114748-google-translate-ai-invents-its-own-language-to-translate-with/