While his tea cools, Josh stares out the window at the imposing mountain not far off, the one described in the novel he's been reading and rereading for years. The cafe's crowded, but the conversations, beautiful and strange, wash over him. Josh is alone, but not lonely. When he next looks up from his book, however, someone's across from him, speaking. At first, the words are unintelligible, and Josh shakes his head back and forth while he slips a tiny cylinder into each ear. Then the language becomes familiar, and Josh catches the tail end of a sentence: "...my favorite book."
"This one?" Josh asks, holding it up.
"Mine too," he says. "It's a translation, of course, but I must've read it a dozen times."
His new friend smiles. "Which translator: Google or Amazon?"
Eventually, technology optimists argue, we will be able to read a machine-translated version of any book no matter how obscure its original language. Translation was one of the earliest non-numeric problems that computer programmers tackled, and in 1954, Georgetown and IBM co-released the first significant machine translator, capable of translating from Russian into English such sexy sentences as, “Magnitude of angle is determined by the relation of length of arc to radius.” In total, the program knew 60 sentences, followed six grammar rules, and stored 250 vocabulary words. Responding to a dazzled public, the authors predicted that the problems of machine translation would be solved in three to five years. Twelve years later, in 1966, a report from the National Academy of Sciences condemned machine translation research as colossally disappointing, and its funding was cut.
Years later, in 2001, Google entered the fray and quickly outpaced its competitors. Starting with six languages (English, Portuguese, German, Italian, Spanish, and French), Google Translate steadily expanded its repertoire and improved its quality and speed. By 2005, the company’s program, which then supported eight languages, won a machine translation contest by using 1,000 computers to tackle 1,000 sentences in 40 hours. Today, in 2016, entire websites in 103 languages can be translated in under a second. Every month, the service boasts over 500 million users, 92 percent of whom come from outside the United States. And each day, Google generates over a billion translations, more than the contents of a million books and more than all professional translators accomplish in a year. Translation of text, though, is but a warm-up for what programmers hope to accomplish (or claim they already have).
Last week, New York City-based Waverly Labs announced its latest invention, Pilot, a pair of earbuds that costs $299. Scheduled for release by spring 2017, the device purports to offer near-simultaneous translation for four languages. Andrew Ochoa, the company's founder, says he was inspired "when he met a French girl" and that Pilot promises "a life untethered, free of language barriers." After the announcement, Forbes questioned Waverly Labs' credibility but ignored the larger assumption at the core of the project: issues of funding aside, if you wanted to fall in love using machine translation, would it work?
Machine translation has progressed rapidly in the last few decades, but language is a far more complex data set than it seems, and no matter how fast translation technology evolves, the stochastic messiness of our speech will always outpace it. However, as Josh's encounter in the cafe will show, what may be considered machine translation’s failure is ultimately a human triumph.
* * *
Josh wasn't planning on meeting anyone, but the coincidence is too attractive to pass up. Just as he's about to explain that his mom gave him this book after he broke his leg in a car accident, that he's had the same paperback copy for over a decade, the waitress arrives to refill his mug with hot water. When she speaks, he notices something a little off about her cadence: her accents fall a little less rhythmically than those of the people around her. Now that his translators are in, Josh figures he may as well ask where she's from, the first true test of the device in his ear.
Computers, like the Georgetown-IBM machine, used to learn languages the same way that humans do: by internalizing the messy spattering of rules, exceptions, and exceptions to exceptions found in all languages. Because grammar is so complex, the programs had to master millions of commands, and beyond basic phrases, the resulting translations often sounded clunky. In 1949, the scientist Warren Weaver proposed an alternative to rule-based translation, an approach that later became known as statistical machine translation (SMT). Instead of attacking language one minutia at a time, Weaver suggested a two-pronged approach. First, the computer would mine millions of documents looking for statistically significant linguistic patterns, thereby discovering the grammar, syntax, and morphology rules for itself. At the same time, the program would build a model to predict how certain phrases are translated and where in the sentence they should appear. For example, after billions of iterations the computer would realize that, in many German clauses, the verb comes at the end.
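Weaver's memo described an idea rather than a working program, so the sketch below uses a later, textbook instance of the statistical approach: IBM Model 1, a word-translation model from IBM's statistical translation research of the early 1990s, trained with expectation-maximization (EM). Everything here is an illustrative assumption: the three-sentence German-English corpus is invented, there is no "empty word" for unaligned terms, and real systems train on millions of sentence pairs. The point is only to show a program discovering word correspondences from parallel text without being handed a single rule.

```python
from collections import defaultdict

# Invented (German, English) sentence pairs standing in for a parallel corpus.
CORPUS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate word-translation probabilities t(e | f) with EM (IBM Model 1, no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in corpus]
    english_vocab = {e for _, english in pairs for e in english}
    # Start uniform: every English word is equally likely for every foreign word.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times e is aligned to f
        total = defaultdict(float)  # expected number of alignments involving f
        for foreign, english in pairs:
            for e in english:
                # E-step: share e's alignment among the foreign words in its sentence,
                # in proportion to the current translation probabilities.
                norm = sum(t[(e, f)] for f in foreign)
                for f in foreign:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn expected counts into probabilities that sum to 1 for each f.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, f):
    """Return the most probable English word for the foreign word f."""
    candidates = {e: p for (e, ff), p in t.items() if ff == f}
    return max(candidates, key=candidates.get)


t = train_ibm_model1(CORPUS)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(t, word))
```

On this toy corpus the loop prints das -> the, haus -> house, buch -> book, and ein -> a. Nobody told the program that haus means house; it infers the pairing because das keeps appearing alongside the, which leaves house as the best explanation for haus. The model learns only word-for-word correspondences; capturing word order, such as where the German verb lands, would need an additional reordering or language model that this sketch does not include.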
Waverly Labs hasn't yet released the details of its software, but it likely works in the same fundamental way as Google Translate, which uses these rules and the predictive model to produce the most statistically likely translation, the one that best mirrors the patterns it has already found. But because we use language in multifaceted ways, the translation software also has to identify a linguistic context, called a domain, so that when it sees, say, customer reviews of a guitar, it knows how to translate the word "neck." However, to be statistically significant, a domain has to be large: a minimum of 2 million words. Therefore, most training material comes from organizations like the United Nations, which have large caches of documents that human translators have already rendered into several languages, each version containing exactly the same content. But even then, algorithms have their limitations.
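The role of a domain can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: the two domains, the keyword lists used to guess the domain, and the probabilities, which are invented rather than taken from Google Translate or Waverly Labs. Real systems estimate such probabilities from millions of words of in-domain text rather than from hand-picked keywords. The sketch shows only the final step the paragraph describes: given a domain, pick the statistically most likely translation, so that English "neck" becomes Spanish "mástil" in a guitar review and "cuello" in a medical complaint.

```python
# Hypothetical English->Spanish word tables, one per domain.
# The probabilities are invented for illustration only.
PHRASE_TABLES = {
    "anatomy": {"neck": {"cuello": 0.9, "mástil": 0.1}},
    "music_reviews": {"neck": {"mástil": 0.8, "cuello": 0.2}},
}

# Hypothetical keyword lists standing in for a real domain classifier.
DOMAIN_KEYWORDS = {
    "anatomy": {"pain", "doctor", "shoulder", "muscle"},
    "music_reviews": {"guitar", "strings", "frets", "tuning"},
}


def guess_domain(text):
    """Return the domain whose keywords overlap most with the text (first listed wins ties)."""
    words = set(text.lower().split())
    return max(DOMAIN_KEYWORDS, key=lambda d: len(words & DOMAIN_KEYWORDS[d]))


def translate_word(word, context):
    """Return the most probable translation of `word` in the domain guessed from `context`."""
    table = PHRASE_TABLES[guess_domain(context)][word]
    return max(table, key=table.get)


print(translate_word("neck", "the guitar neck feels great and the frets are smooth"))
print(translate_word("neck", "my doctor says the pain in my neck is a strained muscle"))
```

The first call returns mástil and the second returns cuello. Because guess_domain simply counts keyword overlaps, a sentence that mentions neither guitars nor doctors falls back to whichever domain is listed first, a small reminder of how brittle the statistical choice becomes when the context offers no clear signal.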
You probably couldn’t learn Romanian by reading 50,000 European Parliament reports translated from English, though a statistical translator learns a great deal from exactly that kind of material. Yet you could (fairly easily) decipher the linguistic junkyard that is YouTube comments and Facebook posts: the grammatical atrocities, orthographic abuses, and carte blanche punctuation styles. A computer, by contrast, cannot cope with even basic departures from the patterns it has learned.
As a result, although SMT has greatly improved translation fluency, it struggles with slang and variations in dialect, like the waitress's. This problem, however, is for the most part technical; in theory, larger data sets, faster computer processing, and more advanced algorithms will eventually solve it (think about the evolution of autocorrect for texts). The real issue with machine translation lies not in its technology, but in the nature of language itself.
* * *
Josh tells the waitress that her accent is charming and orders another cookie. His new friend doesn't seem bothered by the interruption, but Josh's hands are a little sweaty. He wipes them on a napkin. Not knowing how to segue back to the book discussion, he stutters for a second and then asks the first question that comes to mind: "How are you?"
Pleasantries are so mundane that their complexity is often ignored. Linguistically, they fall into a category called phatic language: expressions that accomplish a social task rather than convey information. For example, when asked “what’s up?” most people wouldn’t tilt their heads upward to describe what they see. With a large enough corpus, machine translators could recognize the “what’s up?” pattern and give the appropriate correlate, but phatic expressions, like all frequently used language, are especially volatile. Were Josh living in Chaucer’s day, he might have said “ey, maister, welcome be ye”; in Shakespeare’s England, “God ye good den” (“hello” was then an exclamation of surprise).
With the internet acting as a linguistic hall of mirrors, these transmutations are happening ever more quickly, and not just with phatic language. In 1986, the Birmingham Corpus, then the largest of its kind for American and British English, contained 20 million words. Today, the corpus for Oxford Dictionaries contains almost 2.5 billion words and also draws from Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa. Moreover, lexicographers, the official gatekeepers, now have access to blogs, emails, social media, television scripts, and message boards. Beyond them, on user-run sites like Urban Dictionary, readers can create and edit their own entries, legitimizing through public exposure such ephemera as Netflix and Chill.
Linguistic innovation occurs quickly, spontaneously. The meaning of Netflix and Chill, if the term survives past the next season of Fuller House, might shift as wildly as the viewership's mood. In contrast, SMT is inherently glacial, conservative, reactionary. Outdated expressions cling to the corpus, preventing the algorithm from recognizing the directions in which language marches. Even if these programs could adapt to linguistic change in real time, they would need access to all of our written and spoken communication to stay accurate, and they would still be unable to recognize spontaneous mutations before those mutations appeared in the data.
* * *
Before long, it’s just the two of them in the cafe. It’s amazing, really, all the similarities. They both have an older brother, wanted to be artists when they were younger, and try to go running every morning (and have been sleeping through the alarm!). The conversation doesn’t feel forced: no awkward pauses or self-conscious laughs. Josh hasn't even had a chance to eat the cookie the waitress brought him. After eyeing the baked good for several minutes, though, his friend asks him to share.
English offers an impressive array of affirmative words that have strong connotational differences. For example, Josh could reply with a decisive “yes,” a response that even the Georgetown-IBM computer could translate. But, for the sake of argument, let’s pretend he's unusually hungry, that this cookie is particularly satisfying, and that the kitchen is now closed. So instead of saying “yes,” he says “fine.” While both replies literally grant permission, the latter connotes reluctance, discomfort. “Fine” is how a teenager obeys his parent’s request to take out the trash while also communicating his indignation. Like phatic language, the true power of “fine” lies not in its literal meaning, but in its suggestion.
Fifteen minutes later, Josh has transcended the pleasantries stage of the conversation. He wants more but holds back because of the most archetypal of first-date feelings: uncertainty. Yes, his jokes are killing, but maybe it’s polite laughter? Josh wants a signal, goddammit, but if he's waiting for his machine translator to give the green light, he shouldn't hold his breath.
Should Josh's date speak Japanese, he'll have to tread carefully. In general, Japanese speakers consider silent listening rude and pepper the conversation with “yes,” “indeed,” and “really?” As encouraging as these interjections seem, especially translated into their English correlates, they don’t necessarily signal interest. Even worse for Josh, Japanese speakers rarely say “no” directly. For example, if Josh suggested that they Netflix and chill after dinner, he might receive a seemingly promising response: “I would love to say yes, but. . .” Any Japanese speaker would instantly recognize the dismissal inherent in that ellipsis, but Josh, fatally optimistic, would not.
However, Japanese is a great language for communicating intimacy, because it has four ways to modify its verbs and nouns based on the conversants’ relative social positions. If Josh's date threw all propriety to the wind and switched to the most casual form, that would be the glaring declaration of interest that Josh has been looking for.
If Josh were in Bogotá, however, his date would probably address him using usted, the second-person pronoun more formal than its sibling, tú. Switching to tú would certainly be encouraging, but how could Josh's machine translator communicate that shift using only English's you? Granted, this is a problem that even human translators grapple with. But more damning for Josh's computer, how would it know when to change his translated usteds to tús? Because computers rely on statistical models, and not an actual understanding of the words or of human behavior, they can’t judge intimacy. No statistical model exists — could exist — to determine how many coy smiles, jokes laughed at, and chairs pulled closer it takes to transition between usted and tú. Good luck making the first move, Josh.
* * *
Whack! Josh knocks his mug off the table, and it breaks into three large clay chunks. By now, though, he's infatuated, and a clumsy mistake or overly blunt response or silly second-person pronoun isn’t going to derail him. As the two stand to leave, dating history comes up. Without thinking, Josh says that he had been living with his partner of five years, but they recently broke up. Unfortunately, the nuance of the word partner will be lost in translation.
Gender, like politeness, is a capricious linguistic feature. Arabic, for example, genders its nouns, verbs, adjectives, and thirteen personal pronouns. In English, gender features less prominently, so it’s possible to hide the gender of a person entirely, as I’ve done throughout this essay for Josh's new friend. Instead of denoting a man or a woman, partner communicates a different kind of information.
Much of what we learn about people comes not from what they say, but from how they say it and what (we presume) they really mean. Consciously or not, we drop linguistic clues that even the densest listener will pick up on. Even an accent conveys information, if only to trigger a stereotype, as any Deep Southerner or Cockney Londoner knows all too well. Think of Josh trying to explain to his new friend, let alone to his machine translator, the differences between house and estate, y’all and you guys, yes and yaaaaas: like the prescription for his glasses, slight changes may seem insignificant, but they greatly affect how Josh sees his new (maybe) boyfriend.
* * *
Erudite linguistic theory be damned. Josh has conquered the phatics, maneuvered the honorifics, parsed the etymologies. Back in the hotel room, Josh's friend hunches his shoulders, looks around nervously, and wipes his hands on an imaginary napkin. "Uh," he says, pausing for a moment, "how are you?" The two laugh at the imitation and collapse onto the bed. Drawing a deep, luxurious breath, Josh says, "Let’s never leave this nosey little cook." He tries to correct his silly mistake and say “cozy little nook,” but snorts with laughter and falls onto the floor. When he finally recovers, his lover’s staring, blank-faced.
Unbeknownst to Josh, he has just uttered a Spoonerism, a phrase in which the initial sounds of two words trade places. His slip of the tongue was unintentional, but people play with words deliberately all the time, and that wordplay highlights the fundamental difference between humans and machines: computers can process unfathomable amounts of data, but they cannot leave the nosey little cook of their programming. Again, the problem is not in translating individual words, even words that purport to be “untranslatable,” like Portuguese's saudades. True, there may not be a perfect, one-word equivalent in English, but I can explain that it describes an intense nostalgia (a melancholic longing for someone or something that’s irretrievable), and you'll instantly connect with saudades. No, the real problem is that meaning itself is precarious, shifting, ambiguous, deceptive.
Despite their limitations, Google Translate and its peers, in certain contexts, far exceed the capacity of human translators and escape their economic constraints. As a software engineer at Google Translate pointed out during a conference, machine translation is perfect “when you need to get the gist of things. When you’re looking at reviews, like a hotel review, you don't need to worry about whether the grammar was perfect.” From a commercial standpoint, Google’s service also allows translation in situations where the sheer volume of data precludes human intervention: when, say, the American Red Cross needs the combined inventories of all the hospitals in Mexico City translated.
Even if it lags and stutters, Waverly Labs' Pilot is a remarkable invention that could change what it means to be a student, tourist, immigrant, and refugee. It could allow for more substantive engagement with the world. It could even lead to love. But will it deliver on its promise of a "life untethered, free of language barriers"? I would love to say yes, but…
When Josh gets to the airport, he sobs, reassured only by the conviction that he'll return home, stuff his belongings into storage, and fly back. He keeps talking to his lover. At first they Skype every day, but then the schedule for his yoga class changes, and with the crazy time difference, the calls shrink to weekends. And then to a phone call while he buys his groceries for the week. And then to an email hastily written before bed on Sunday. Six months later, he thinks about the tryst only periodically, and more with saudades for the idea of it than for the person.
While my scenario is absurd for many reasons, the real quixotism is the belief that language is nothing more than statistically significant patterns. How we speak is so complex that it should be beyond the abilities of even the cleverest algorithms. The alternative, in which our nervous flirtations are inputted, prodded for patterns, compared to parallel corpora, and regurgitated in a string of consonants and vowels, is far bleaker than the death of a sci-fi fantasy. So the next time your mother tells you to watch your tone, or a friend describes something as Emily-ish, or you ask your frat-boy cousin if he’s going to spring break forever, congratulate yourself. Your linguistic powers are infinitely more creative than a probabilistic model or a corpus composed of grammar, vocabulary, and syntax. Language is not a tool or a formula, but rather a rabbit hole adorned with a cautionary sign: Curiouser & Curiouser.
A few days later, I talked about the article on NPR.