Computer Science

Teaching machines to learn languages

Sanjeev Khudanpur tells this story: When IBM was making mainframe computers in the 1960s, an idea took hold. The idea was that if computers could be taught to process language, not just lines of computer code but human speech in English, French, German, or Japanese, great things would follow. So language processing became a substantial discipline in computer engineering.

What makes the story interesting is what makes so many stories interesting: once people set off down that road, they ran into one complication after another. Fifty years later, researchers are still working to fathom the complexity of language processing, both in synapses and silicon. Khudanpur is one of them. He is an associate professor in the Whiting School's Department of Electrical and Computer Engineering. One bright morning in his Hackerman Hall office, he describes his work in language processing and machine learning, but only after a primer on how scientists and engineers have come at the problem.

First, in the 1960s, came a straightforward approach—frequency spectrum matching. Khudanpur explains the concept: Each sound of spoken language, vowel sounds and consonant sounds and diphthongs and all the rest, can be analyzed as a spectrum of frequencies, just as light can be broken down into a spectrum of colors. Create a reference library of frequency spectra in the computer, then instruct the machine to analyze any sounds that it "hears" and match the spectra to those in its library. Match for a hard C, a short A, and a hard T, and there you go—"cat."
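
In code, that early approach boils down to template matching: store one reference spectrum per sound, then label incoming audio with whichever reference lies closest. The Python sketch below is a loose illustration under assumed details; the binning, the distance measure, and the random stand-in audio are all invented for demonstration, not a reconstruction of any actual IBM system.

```python
# A minimal sketch of 1960s-style frequency-spectrum matching.
# The reference library and audio here are illustrative assumptions.
import numpy as np

def magnitude_spectrum(samples: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Reduce a short audio frame to a coarse, loudness-normalized spectrum."""
    spectrum = np.abs(np.fft.rfft(samples))
    # Pool the raw FFT into a fixed number of bins so frames are comparable.
    binned = np.array([chunk.mean() for chunk in np.array_split(spectrum, n_bins)])
    return binned / (binned.sum() + 1e-9)

def match_sound(frame: np.ndarray, library: dict[str, np.ndarray]) -> str:
    """Return the label of the reference spectrum closest to this frame."""
    probe = magnitude_spectrum(frame)
    return min(library, key=lambda label: np.linalg.norm(probe - library[label]))

# Hypothetical library: one reference spectrum per phoneme-like unit.
rng = np.random.default_rng(0)
library = {label: magnitude_spectrum(rng.standard_normal(1024))
           for label in ["k", "ae", "t"]}
print(match_sound(rng.standard_normal(1024), library))  # e.g. "k"
```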

Except no two people pronounce words exactly alike. There was too much variability for the precise matching the computer had been instructed to perform. So researchers hit on a new idea: data-driven pattern recognition. Instead of giving the computer one frequency profile for each sound, give it a thousand different voices all pronouncing "cat" and tell it to create its own method of recognizing the word. That worked better. But researchers soon learned that left to themselves, computers drew too many wrong conclusions to accurately process language. Here Khudanpur uses another visual analogy. Suppose you have instructed the computer to find its own way of analyzing pictures in order to identify cars. Instead of looking for a car-shaped body and four wheels (what most people would do), the computer notes that the body of every car in the pictures presents a smooth surface against a textured background. OK, whatever works. Then one day, says Khudanpur, "you show the computer a plastic teddy bear with a smooth surface lying on textured grass, and the computer says, 'Car!'"
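
In code, the data-driven shift looks roughly like this: instead of one hand-built template per word, average features over many varied examples and let the learned pattern do the matching. The nearest-centroid rule and the synthetic "voices" below are illustrative stand-ins, not the actual statistical models of the period.

```python
# A toy illustration of data-driven pattern recognition: learn each
# word's pattern from a thousand varied examples rather than one template.
import numpy as np

rng = np.random.default_rng(0)

def features(samples: np.ndarray) -> np.ndarray:
    """Coarse spectral features, as in the template-matching sketch."""
    spec = np.abs(np.fft.rfft(samples))
    binned = np.array([c.mean() for c in np.array_split(spec, 64)])
    return binned / (binned.sum() + 1e-9)

def utterances(base_freq: float, n: int = 1000) -> np.ndarray:
    """Synthetic stand-ins for n different voices saying the same word."""
    t = np.linspace(0, 1, 1024)
    return np.stack([np.sin(2 * np.pi * base_freq * rng.uniform(0.9, 1.1) * t)
                     + 0.3 * rng.standard_normal(1024) for _ in range(n)])

# Learn one centroid per word by averaging features over all its examples.
centroids = {word: np.mean([features(u) for u in utterances(f)], axis=0)
             for word, f in [("cat", 110.0), ("dog", 220.0)]}

def classify(samples: np.ndarray) -> str:
    x = features(samples)
    return min(centroids, key=lambda w: np.linalg.norm(x - centroids[w]))

print(classify(utterances(110.0, 1)[0]))  # likely "cat"
```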

So computers still needed human coaching. This is where Khudanpur enters the story. He studies what colleagues in various neuroscience disciplines are learning about how the human brain processes speech. Then he takes their findings and constructs ever more complex and subtle models that form the basis for code that will guide the computer in better analysis of language input.

For example, a computer needs the ability to add words to its lexicon. To do that, it must first recognize that an unfamiliar sound is indeed a word. How do babies do that? Cognitive scientists have learned that babies first distinguish the stresses and cadences of speech, and begin to recognize those patterns as language. When they hear an unfamiliar sound, they compare it to their developing mental model of language, and if it fits the metrical pattern, they catalog it as a new word. Khudanpur is researching ways to program computers with the same metrical acuity. The code he writes is further informed by statistical analysis of language. For example, English has substantially more nouns than verbs, and when new words enter the English vocabulary they are most often nouns. New verbs are far less common. So Khudanpur builds that information into the computer's model of English. When the machine encounters an unfamiliar word, it now stands a better chance of understanding the meaning of a sentence because it can correctly sort the parts of speech.
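
A minimal sketch of how such a prior might be built into a tagger appears below. The tiny lexicon and the tag probabilities are invented for illustration; they are not Khudanpur's actual figures or code.

```python
# A minimal sketch of the "new words are usually nouns" prior.
# Both the lexicon and the probabilities are illustrative assumptions.
lexicon = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}

# Hypothetical prior over tags for out-of-vocabulary words, weighted
# toward nouns to reflect how new words tend to enter English.
oov_prior = {"NOUN": 0.70, "ADJ": 0.15, "VERB": 0.10, "ADV": 0.05}

def tag(sentence: list[str]) -> list[tuple[str, str]]:
    tagged = []
    for word in sentence:
        if word in lexicon:
            tagged.append((word, lexicon[word]))
        else:
            # Unknown word: fall back on the prior's most likely tag.
            tagged.append((word, max(oov_prior, key=oov_prior.get)))
    return tagged

print(tag(["the", "blorp", "sleeps"]))
# [('the', 'DET'), ('blorp', 'NOUN'), ('sleeps', 'VERB')]
```

Even this crude fallback shows the payoff the article describes: with "blorp" guessed to be a noun, the sentence still parses as determiner, noun, verb.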

"Many scientists believe that understanding language is the true mark of human intelligence because it captures all the complexities of comprehension and decision making," Khudanpur says. "If we can crack this problem open, all sorts of other intelligent machines can be built, machines that think, analyze, and respond to their environment in a way that everyone would agree requires 'intelligence.'

"We've come full circle," he adds. "We started off by having the computers mimic humans. Then someone said, 'Eh, we don't have to do that.'" Let the computers figure out how to process language. Then the limitations of that method became apparent. "Now," he says, "we're back to starting to modify the computer models to look more and more like the way humans process language."