Get in touch with technology with TechStuff from HowStuffWorks.com. Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer at HowStuffWorks and I love all things tech. In the last episode, I covered the history and technology behind speech recognition. So today we're going to look at a related concept called natural language processing, or natural language understanding.
The two are related. This technology and speech recognition are both part of what makes voice assistants like Siri, Alexa and Google Assistant work, though there are other technologies that also go into that. Now, this is a huge topic and it has a long and fascinating history, so this episode is just going to be the start of it. In the next episode, I will conclude the discussion on natural language processing and go into the history of these actual voice assistants. So, on a high level, what is
natural language processing? Well, simply put, it's programming a machine to interpret language the way we human beings use it. So in an ideal implementation, which would also require advanced artificial intelligence, you could speak to a machine or type whatever you like into a terminal and it would be able to understand what you meant, what your commands were,
no matter how you worded the phrase. In turn, the machine would be able to generate responses that made linguistic sense to us, and we could in effect hold entire conversations with those machines. This, as it turns out, is a very difficult challenge. Even creating a machine that can respond to basic commands delivered in a natural language is really really hard to do, and we haven't yet cracked the nut on making a machine that can actually hold
a real conversation with us. Yet we can sometimes forget that machines do not natively understand human language. Machines process information in machine code, which is difficult for humans to understand. I almost said impossible for humans to understand, but really it's just impractical. It's incredibly difficult. So, for example, computers that run on binary systems process all information in zeros
and ones, ultimately, when you get down to it. So if you were to look at a sheet of zeros and ones, it would probably seem completely incomprehensible to you, although to a computer it could seem perfectly logical. Our language is equally incomprehensible to machines. Programming languages make it easier for humans to make machines do what we want them to do. Programming languages create a level of abstraction between human language and machine language. It's kind of a
meeting ground in the middle. Programming languages tend to be highly structured with specific strict sets of rules. Programming within those rules will get you the results you want, assuming your code is good, but if you stray outside those rules, you start to get errors. Human language is much more variable and complicated and ambiguous, and that's something that machines
are not very good at handling. Now, if you've ever played a text based adventure from way back in the day, like Zork, you know that those adventure games have a very limited vocabulary. The game can accept certain commands, but only because the programmer built in the option in the game.
They incorporated that in the game's design. So you might be able to type something like go north or just north, and the game understands you want your character to move to a new location that's to the north of your current location. But maybe you type something else, maybe you type jog north or saunter north, and the programmer didn't
think of that. They didn't come up with all the different ways you might describe how you want to move north, so you might get a result that says something like I didn't understand that, or you can't do that here. Computers only have the illusion of understanding us. They don't actually know what we mean when we say something, at least not natively. Now, that meant that for most of our history with computers, humans have had to learn how to work with machines, not the other way around.
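To make the limited-vocabulary idea concrete, here is a minimal sketch of how a text adventure might match commands. The word lists and messages are hypothetical, chosen for illustration, and are not taken from Zork or any real game.

```python
# Hypothetical vocabulary for a tiny text adventure; not taken from any real game.
DIRECTIONS = {"north", "south", "east", "west"}
MOVE_VERBS = {"go", "walk"}

UNKNOWN = "I didn't understand that."


def parse_command(text: str) -> str:
    """Return a description of the action, or an error for unknown input."""
    words = text.lower().split()
    # A bare direction such as "north" counts as a move command.
    if len(words) == 1 and words[0] in DIRECTIONS:
        return f"You move {words[0]}."
    # "go north" works only because "go" was put in the verb list.
    if len(words) == 2 and words[0] in MOVE_VERBS and words[1] in DIRECTIONS:
        return f"You move {words[1]}."
    # Anything the programmer did not anticipate falls through to an error.
    return UNKNOWN


print(parse_command("go north"))
print(parse_command("saunter north"))
```

The program only appears to understand: "saunter north" fails simply because nobody added "saunter" to the verb list.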
We have had to learn commands and syntax that machines accept, and if we try to word those commands in a different way, we tend to get an error. Natural language processing attempts to turn the tables on this relationship and teach machines how to work with humans so that we don't have to go through any sort of learning curve. We don't need to formulate our commands in a specific way to be understood. The technology works on our terms,
or as close to those as we can manage. That means that programmers have to build systems that can parse language for meaning, and it also means having to build tools and machines that can handle stuff that you typically encounter in higher level language courses. So here's a quick rundown on some of the stuff a natural language processing
approach has to take into account. First, you have grammar. Now, grammar can refer to the study of language, but generally speaking, when we say grammar, or at least when I'm using the term in the context of natural language processing, I mean a set of rules for the organization of components of a language into meaningful statements or sentences. This is a broad concept. It is a big, big idea. It actually encompasses a couple of other also big ideas that
are important in natural language processing. One of those is the concept of morphology. Morphology has to do with word forms. Words consist of morphemes, and a word can actually have multiple morphemes. So, for example, let's take a word like skydivers. Skydivers technically has four morphemes, and they are sky, dive, er, and s: skydivers. The morphemes only make sense if we put them in that particular order for the word skydivers. Diveskyers
does not mean the same thing. Actually, it doesn't mean anything at all. So a good system will have to understand morphology and know how words can and cannot be formed. So again, with skydivers, it knows: all right, well, I know the word sky, I know what that means. I know what the word dive means. Er means that this is not an action. This is actually an entity that engages in that action, right? A skydiver is someone who skydives, and the s says it's plural, so
that there's more than one skydiver. That's what morphology is all about. It's this sort of internal logic of word formation. Syntax is another big concept within grammar. Syntax, however, does not refer to word formation. It refers to sentence structure. How do we arrange words to make meaningful sentences? For example, take the sentence you must have patience, my young Padawan. That follows good syntax, but patience you must have, my young Padawan, is a bit janky, because Yoda is all over
the place with his syntax. In addition to grammar, you also have to take into account semantics. Now, that is the study of the meaning within language. This is a tricky one because there's a lot to unwrap here. For example, words and phrases can actually stand for different meanings. They can denote different ideas. We might use many different phrases
or words to describe the same concept, right? So we might use a dozen or more different ways to say the same thing, or we might use two similar words or phrases to describe very different concepts. We might even use the same phrase to describe wildly different things or with very different meanings. Semantics gets down to what we
actually mean when we say something. If you've ever had a discussion with someone and that person says, you know what I meant, that's essentially a statement that indicates semantically the meaning was clear, even if the phrasing did not indicate it on the face of things. Then there is pragmatics that's all about context. Contextual information is incredibly important in communication, and it relates a little bit to semantics.
Semantics is about meaning, and pragmatics is about context. So if I say the weather sure is nice today, on the face of it, that sounds like I'm in favor of the way the weather is, right? It sounds like, oh, I like how the weather is. But if I say that same phrase while I'm standing in a downpour and I'm clearly not happy, I'm obviously being sarcastic. I mean
the opposite of what I actually said. The context of the situation changes the meaning of what I am saying, even though the actual phrasing would seem to indicate the opposite of what my meaning was. As we develop more technology that can communicate with us, we have to take pragmatics into consideration, or else machines are going to be misinterpreting what we actually mean when we say stuff. So machines are going to have to learn how to deal
with stuff like sarcasm. Yeah, right. Then we have phonology, that is, the sound of a language. I talked a little bit about this in the speech recognition podcast, about how different languages have different phonemes, so I'm not going to dwell on that again. You can listen to the speech recognition podcast to learn more about it. But it is an important element in languages, especially when you get into natural language processing that is taking verbal input
and not just textual input. Then you have lexicons. A lexicon is the total vocabulary for a system. Ideally, a lexicon has not just the words, but some sort of metadata attached that indicates the meaning of words or the relationship of words with one another, though you can fudge this a little bit depending upon the implementation of the system. I'll talk a lot more about that throughout these podcasts. Now, these can be tricky concepts for human beings, let alone for machines.
Machines are very good at following strict sets of instructions, but language can sometimes defy logic. Think of rules that apply to your native language, then just think of the exceptions that exist to those rules. Every language has exceptions for rules that are established, and depending upon the rule and the exception, there may seem to be no rhyme
or reason for the deviation from the rule. Moreover, if we want machines that are capable of understanding us and responding to our language in a meaningful way, those machines need to be able to handle the idiosyncrasies of individual speakers, to some extent. There may be regional turns of phrase or vocabulary that don't extend to the general
population of speakers of the respective language. So you might encounter a person who speaks in local idioms quite a bit, and if those are not frequently used in the broader general population of that language, then you're gonna have a lot of communication errors between that person and a machine
that is trying to process that language. Ideally, machines would be able to understand whatever we say and interpret the meaning correctly, although we haven't even gotten to a world where human beings can do that reliably, so I don't know why I'm holding machines up to such a high standard. We definitely would want them to reach a certain level of confidence and capability, however, and machines just are
not quite there yet. I'm going to talk a lot more about the history of natural language processing in just a moment, but first let's take a quick break to thank our sponsor. The history of natural language processing is pretty darn complicated because it involves multiple lines of research
and lots of different disciplines. So we have all sorts of things that play into this, like hidden Markov models, which I talked about in the speech recognition podcast, neural networks, representing language using mathematical vectors, and a lot more contributing to the evolution of natural language processing, and a lot of disciplines, like not just computer science, but linguistics and psychology. So there's not like a single line I can follow where it's A led to B led to C. So
we're gonna be jumping around a little bit. However, one of the sources I want to call out that I used while I was researching this episode was a paper written by Karen Spärck Jones called Natural Language Processing: A Historical Review. It's pretty dense, it's pretty technical, but it's also available to read online if you want a more thorough treatment of the history of the technology up to
two thousand. I'm gonna be skimming over quite a bit of it because, as I say, it gets really deep and really technical, and it uses a lot of shorthand to reference things, which meant that I had to do a lot of jumping down research rabbit holes to learn more. But it was a very useful starting point for this research. And also it was published in two thousand one. Obviously a lot has happened since then. We're almost two decades
out from that. But I'm gonna start at the beginning and then work my way up to what's going on today. So, early work in natural language processing: I was surprised at how old it was. It actually dates all the way back to the nineteen forties. Physicist and computer scientist Andrew Donald Booth proposed using computers to translate passages from one language into another, which is a
type of natural language processing. You have to be able to recognize the words of one language and then map them to a similar meaning in a different language. Now, Booth's approach involved creating a word for word model. If the model couldn't find a match between two words, it would automatically discard the last letter on the input word and try again. It would do this until it found a match, or if it didn't find a match, you've
got an error. But if it did find a match, it would search its memory to see if the ending of the input word could give information about what the ending
does to the meaning of the word. So, for example, if you were using this to translate from English into Russian and you use the word writer, maybe writer does not show up in the Russian lexicon, but write does, W R I T E. So the translating program tries to translate writer from English into Russian, cannot find a Russian equivalent to writer, drops the r, looks for the Russian word for write, and it finds it. Then it says,
all right, well, in English, what does writer mean? What does that r do to the word write? And it looks at its memory and finds out that the letter r makes a noun out of the verb. It creates an entity that does the action, which is to write. Then it looks in the Russian lexicon and says, all right, well, is there a word in that lexicon that matches this meaning? It's kind of a slow, laborious way of doing things, but it was also very, very early.
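As a rough sketch of that drop-a-letter-and-retry lookup, the snippet below trims an input word from the right until a stem is found, then checks what the trimmed ending contributes. The one-entry dictionary, the ending table, and the function name are assumptions made for illustration; this is not Booth's actual procedure or data.

```python
from typing import Optional, Tuple

# Toy stem dictionary (English -> transliterated Russian); illustrative only.
STEM_DICTIONARY = {"write": "pisat'"}
# Toy table of what a trimmed ending does to the meaning; illustrative only.
ENDING_NOTES = {"r": "noun: the one who performs the action"}


def lookup(word: str) -> Optional[Tuple[str, Optional[str]]]:
    """Drop letters from the end of `word` until a stem matches.

    Returns (translated stem, note about the trimmed ending),
    or None when no prefix of the word is in the dictionary.
    """
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem in STEM_DICTIONARY:
            return STEM_DICTIONARY[stem], ENDING_NOTES.get(ending)
    return None


print(lookup("writer"))
print(lookup("zebra"))
```

For "writer", the first try fails, the second try finds "write", and the leftover "r" is looked up separately, which mirrors the two-stage search described above.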
I mean, it was the following year, in nineteen forty nine, that Warren Weaver produced a memorandum about machine translation. Weaver admitted in the memorandum that such an application would likely be much more challenging than what he understood it to be, but that he was, quote, willing to expose my ignorance, hoping that it will be slightly shielded by my intentions, end
quote. And I think that's rather charming. In that memo, Weaver cites a letter he wrote to Professor Norbert Wiener of MIT, and that included the following paragraph. So here's a full paragraph. Actually, it's two paragraphs from the memorandum. Quote: recognizing fully, even though necessarily vaguely, the semantic
difficulties because of multiple meanings, etcetera, I have wondered if it were unthinkable to design a computer which would translate. Even if it would translate only scientific material, where the semantic difficulties are very notably less, and even if it did produce an inelegant but intelligible result, it would seem to me worthwhile. Also knowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods
in cryptography, methods which I believe succeed even when one does not know what language has been coded, one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say, this is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. End quote. So he got this idea because of activities that were going on in World War Two,
where teams were trying to decode messages. And they might decode the message, they might figure out what letters correspond to the code, but it may even be in a totally different language than one they speak. So while they are able to decode the message into its native language, they are not able to speak that language. He says, well, what if we just take that same step, and now we treat the other language as a code in and of itself and try to translate that into English, or
decrypt it into English. Weaver acknowledged that the word-for-word approach that Booth and his contemporaries were relying upon had limited utility. He wrote, quote, it is in fact amply clear that a translation procedure that does little more than handle a one to one correspondence of words cannot hope to be useful for problems of literary translation, in which style is important, and in which the problems
of idiom, multiple meanings, etcetera, are frequent. End quote. So there he's saying, you can't just take a foreign word, translate it into whatever the closest equivalent in English is, and hope to get the same meaning, especially in literary works, because there are all these different turns of phrase and cultural meanings that will get lost in that translation. You would have something that might technically be considered more or
less correct, but would not be actually correct. You wouldn't be getting across the meaning of the author in that translation. You would just have words in a syntactical order that would make sense from a syntax perspective. In other words, you would have sentences that held up grammatically, but they
wouldn't necessarily have the meaning of the original writing. Weaver's proposal was to perhaps expand the word-for-word model and create a system that would analyze not just the target word, but the words adjacent to the target, in order to determine the context of the word, the meaning of the word. As we'll see when we get a little bit further down in the timeline, this is one of the methods that folks working in natural language
processing incorporated into their approach. So this was incredibly forward thinking of Weaver. On January seven, nineteen fifty four, researchers from IBM and Georgetown University demonstrated a system that was able to translate around sixty sentences from Russian into English automatically. Now, the process wasn't exactly painless. It required an operator to take a sentence written in Russian but transcribed into the English alphabet. It wasn't in the Cyrillic alphabet. The person
would then encode that sentence on punch cards. They would feed the punch cards into an IBM seven oh one computer. I mentioned the seven oh one in the previous episode on speech recognition. Then they would wait for the translation program's response, which would take a few seconds. The program would attempt to translate the words from Russian to English. The demonstration was impressive,
but it was limited in scope. The program had a lexicon of only two hundred fifty words or so, and it required extensive programming to cope with syntax, because word order in Russian is different than word order in English. You can think of the programming as including metadata. The researchers would tag Russian words with little signs that related to specific rules. So, for example, one of the terms the
system could translate was a Russian two-word phrase. It was general mayor, and I'm butchering the Russian pronunciation, but in English it means major general. But the word order is reversed in Russian. If you did a strict word-for-word translation, you would get general major with the translation, because that's the order that the Russian phrase
would put it in. So the programmers would tag each word with a rule to kind of give the idea of what you should follow when you're making these translations, and by you I mean the computer system. So the word for general got the assignment of rule twenty one and the word for major got the sign one ten. So when the system encountered a word, it would look up any related rules to that word.
So if it comes across a word that has the associated rule one ten, it would say, all right, this rule tells me I have to go back over the message and look to see if there was a rule twenty one word in that same phrase. And if it finds a rule twenty one word, it would then know, I need to reverse the order of these two words. This word order that appears in Russian needs to be flipped for English. Now that's a pretty laborious process
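Here is a minimal sketch of a rule-tagged lexicon that flips word order, in the spirit of that description. The simplified transliterations, the second tag value of 110, and the function name are assumptions for illustration; this is not the Georgetown and IBM program itself.

```python
# Toy lexicon: transliterated Russian word -> (English gloss, rule tag).
# Tag 21 for "general" follows the description above; tag 110 is an assumed value.
LEXICON = {
    "general": ("general", 21),
    "mayor": ("major", 110),
}

LOOK_BACK_TAG = 110  # a word with this tag checks the word before it
SWAP_WITH_TAG = 21   # ...and swaps places if that word carries this tag


def translate(phrase: str) -> str:
    """Translate word by word, reversing order where the rule tags demand it."""
    output = []  # English words produced so far
    tags = []    # rule tag of each word in `output`
    for word in phrase.lower().split():
        gloss, tag = LEXICON[word]  # raises KeyError for out-of-vocabulary words
        if tag == LOOK_BACK_TAG and tags and tags[-1] == SWAP_WITH_TAG:
            # Place the new word in front of the previous one.
            output.insert(len(output) - 1, gloss)
            tags.insert(len(tags) - 1, tag)
        else:
            output.append(gloss)
            tags.append(tag)
    return " ".join(output)


print(translate("general mayor"))
```

Every reordering has to be anticipated with its own tag pair, which hints at why this style of system becomes hard to maintain as the vocabulary grows.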
and it doesn't work great for larger lexicons. The larger the vocabulary, the more complex the sentences can become, the more exceptions and rules you're going to encounter. It would be really hard to implement this on a big scale, but it was an impressive display of machine translation. The system was essentially a vocabulary list and a long series of if then rules. If the word is this, then look for this. If that is there, then switch the
word order, essentially. According to articles, it could translate sentences designed for the system in about six seconds. But again, those were designed for the system, with a very limited vocabulary, so it was a limited implementation. And it's good to point out that a lot of work in machine translation around this time focused on English and Russian, which is no big surprise. Keep in mind the time scale. We're talking about the nineteen fifties here. The USA and the then USSR
were not on great terms. Both countries were using pretty much every means at their disposal to analyze one another, to spy on one another, to maneuver to make certain the other nation didn't get a superior position. And we saw a lot of technological development during this period, including the space race, that was all wrapped up in this Cold War issue as well. And, perhaps no big surprise, the US government was pretty keen to fund research and
development in machine translation, up to a point, that is. In nineteen sixty six, Joseph Weizenbaum published a computer program called Eliza. I've talked about Eliza in previous episodes of TechStuff. This was a primitive chatbot, a text-based chatbot. It mimicked a Rogerian psychotherapist. That's a discipline that was pioneered by the psychologist Carl Rogers. It's sometimes also called person-centered therapy. Eliza was strictly this text
based terminal operation. You would see a line of text pop up. It would ask you how you're doing. You could type stuff in and then it would respond to you, so you would get responses that appeared to be semi-intelligent. Typically it would be a question to ask for more information, or sometimes it would be a phrase to change the subject. So you might say something along the lines of I'm so angry right now, and Eliza might respond with what has made you angry?
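A very small sketch of this kind of pattern-and-template reply is shown below. The two patterns, the fallback line, and the function name are hypothetical; Weizenbaum's actual script was considerably more elaborate.

```python
import re

# Hypothetical rules: (pattern, reply template). Not Weizenbaum's original script.
RULES = [
    (re.compile(r"i'?m so (\w+)", re.IGNORECASE), "What has made you {0}?"),
    (re.compile(r"i feel (\w+)", re.IGNORECASE), "Why do you feel {0}?"),
]
# Stock reply used when no pattern matches.
FALLBACK = "Please tell me more."


def respond(message: str) -> str:
    """Echo part of the user's own wording back as a question."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return FALLBACK


print(respond("I'm so angry right now"))
print(respond("The weather is strange"))
```

Nothing here models what anger is. The reply is assembled from the user's own words, which is enough to keep a conversation moving.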
So Eliza has flipped this around in order to sustain the conversation. Then you could type in something else. Maybe you type in everything is going wrong today, and Eliza might respond with can you give me an example? And then so on. Eliza would give the appearance of understanding the subject, but in reality it was simply taking the input, analyzing the parts of speech, then sending back a very similar message or a related message in an effort to
keep the conversation going. Like, it might just be a placeholder. The program did not understand language or context beyond being able to parse the basic parts of a sentence and then rearrange them, or go with several stock responses when it didn't have a way of figuring out what it should do. Nineteen sixty six also saw something else that would end up creating a bit of a big setback for natural language processing researchers. But I'll explain more about that when
we come back after a quick break to thank our sponsors. Okay, so nineteen sixty six. What happened that set back research in this field? Well, that's when a report was published that had a dramatic impact on funding for R and D in machine translation. It was called the ALPAC Report. ALPAC, A L P A C, stood for Automatic Language
Processing Advisory Committee. This was a group consisting of various experts in fields ranging from computer science to linguistics to psychology. The U.S. Government had established the committee back in nineteen sixty four, and they had a very simple assignment, or at least simple on the surface, which was: evaluate the progress that was being made in automatic machine translation
across the board, look at what everyone's working on. Give us an idea of where we are and where we're headed. The nineteen sixty six report essentially concluded that the field was still in its infancy, and that before any real advancements could happen, a lot more basic research in the field of computational linguistics would be required. So essentially, the report was saying, we're trying to move at a full gallop, but we still aren't really sure how to get on
the horse. I'm paraphrasing, of course. One result of this was that the US government began to scale back grants for research in the field of machine translation. This was, unfortunately exactly the opposite thing that needed to happen. The US government wanted more immediate results and decided, well, if you're not going to get results right away, we're gonna take that money away and put it to use somewhere else. And that made funding scarce, and it likely prolonged the
amount of time it took to advance the discipline. Although I should stress work was still being performed in the United States as well as elsewhere. It's not like this brought everything to a standstill. It just slowed down quite a bit. By nineteen sixty seven, NLP research was straining against technological limitations. They were starting to feel the very limit of what computers were able to do. Even advanced systems could take upwards of seven minutes to analyze
a long sentence. Programming was still largely in assembly language, so it wasn't easy to do. And you would still have to interact with machines using punch cards, so that was also laborious, and heaven help you if you dropped all your punch cards and you forgot to number them, because then you've ruined your program. Work was progressing on the linguistic side, but the technological side was kind of
lagging behind at this point. One of the big decisions researchers had to make around this time was what were they going to focus on first while building out computational linguistics. Because it's such a huge problem you couldn't really tackle it wholesale. You needed to kind of focus on specifics.
So should research focus on syntax, which is all about sentence form and structure, as I mentioned earlier, or should it focus on semantics, which is more about the underlying meaning of what was said and less about the structure of how it was said? And ultimately, most researchers, not all of them, but most of them, decided to focus on syntax. For one thing, it seemed like a more analytical thing
to concentrate on. Right like, you could define rules more easily for syntax than you could for semantics, and semantic ambiguity could be fudged a bit. You can rely heavily on output words that had a broad meaning. So using a word with a broad meaning might not produce a specific, precise result, but at least could be quote not wrong
end quote. So if a word might have several translations ranging from hut to villa to bungalow to mansion, the output word might be building, because the translating program might not know which variation of that translation it should go with, but knows that all of those different examples fall into a larger category called building. So that's not precise, but it gets the job done. You would understand what the actual noun was in general. You would know
it was a building. You might not know that it was a home, and you might not know what kind of home it was, but you would at least know that it was a structure. So much of the work in the late nineteen sixties focused on solving syntax problems for computers, with the researchers saying we'll worry about semantics later.
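The "not wrong" fallback can be sketched as a lookup into a hand-built category table. The table, the function name, and the tie-breaking choice of returning the first candidate are assumptions for illustration, not a description of any particular historical system.

```python
from typing import List

# Hypothetical table mapping specific words to a broader category.
CATEGORY = {
    "hut": "building",
    "villa": "building",
    "bungalow": "building",
    "mansion": "building",
}


def choose_output(candidates: List[str]) -> str:
    """Pick an output word when several translations are possible."""
    if not candidates:
        raise ValueError("no candidate translations given")
    if len(candidates) == 1:
        return candidates[0]
    # If every candidate shares one known category, use that broader word.
    categories = {CATEGORY.get(word) for word in candidates}
    if len(categories) == 1 and None not in categories:
        return categories.pop()
    # Otherwise fall back to the first candidate (an arbitrary choice).
    return candidates[0]


print(choose_output(["hut", "villa", "bungalow", "mansion"]))
```

The output loses precision but stays true to every candidate, which is the trade-off described above.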
Some notable groups went against the flow and decided to tackle semantics and semantically driven processing, partly because they recognized it as being a really tough problem and some engineers just love solving really hard problems. That's kind of what thrills them, and so they chose to go that route. They began building out semantic categories and worked on semantic pattern matching using semantic networks as a means of knowledge representation.
Karen Spärck Jones, who wrote that history I mentioned earlier, suggests that it was in the late nineteen sixties that the research moved out of its initial phase and into a second phase, and that second phase was largely marked by the incorporation of artificial intelligence, including incorporating world knowledge
in processing natural language. In nineteen sixty eight, Terry Winograd, who today is a Professor Emeritus of Computer Science at Stanford University, was working in MIT's AI Lab as part of his postgraduate studies, and he began to work on a virtual world he would call S H R D L U, SHRDLU. Um, that's what I'm going to call it, SHRDLU. It consisted of virtual objects on a virtual table, so it's all imaginary, right? He then programmed a grammar and lexicon specifically for this
very, very limited imaginary world. So anything that did not involve the things that were in this imaginary world, namely the table and these virtual objects, didn't need to be dealt with at all, because it was immaterial. It didn't exist in this universe. So he only had to focus on the elements he had created, and that limited the scope of his work and made it more manageable. His design even included the concept of persistence and memory.
So imagine a table with a collection of five objects on it. So you've got an imaginary table, and you've got five imaginary objects on it. Two of the five imaginary objects are spheres. One of them is a green sphere, and one of them is a red sphere. You then type a command into a terminal that's giving you information about this virtual world, and you say, I want to move the red sphere over to the far end of the table. And then you send another command,
only this time you don't specify red sphere. You just say move the sphere back. Winograd's system could actually remember that you had previously moved the red sphere, and it would apply your command to the red sphere again, under the assumption that's what you meant. When you didn't specify, you must have meant the same sphere that you had just moved. This is a concept that we're seeing rolled out into voice assistants today, like Google Assistant.
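That remember-the-last-object behavior can be sketched with a small class that keeps track of what was moved most recently. The class, its method names, and the position labels are hypothetical and are not drawn from Winograd's program.

```python
class TableWorld:
    """A tiny virtual table that remembers the last object it moved."""

    def __init__(self, objects):
        # Each object is a (color, shape) pair; all start at the same spot.
        self.positions = {obj: "start" for obj in objects}
        self.last_moved = None

    def move(self, shape, where, color=None):
        """Move an object, guessing the target when no color is given."""
        if color is not None:
            target = (color, shape)
            if target not in self.positions:
                raise ValueError("no such object")
        elif self.last_moved is not None and self.last_moved[1] == shape:
            # Underspecified command: assume the object moved most recently.
            target = self.last_moved
        else:
            matches = [obj for obj in self.positions if obj[1] == shape]
            if len(matches) != 1:
                raise ValueError("which one do you mean?")
            target = matches[0]
        self.positions[target] = where
        self.last_moved = target
        return target


world = TableWorld([("red", "sphere"), ("green", "sphere")])
world.move("sphere", "far end", color="red")
print(world.move("sphere", "start"))
```

The second command never mentions a color, yet it resolves to the red sphere because that is the object stored in `last_moved`.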
It's the ability to reference something you've already accessed without having to specify what you're talking about. So if I asked a voice assistant what the weather will be like today, and then I follow that up after I get the information, I say what about tomorrow, the system that has this kind of capability could infer that what I meant was what will the weather be like tomorrow, even though I
didn't say it specifically like that. That's pretty advanced for nineteen sixty eight, even though it was for this very restricted virtual world with a limited number of variables. However, Winograd discovered that the secret to his success was largely in this restriction. As you expanded the virtual world to incorporate more elements, it made the problem exponentially harder. His work, by the way, was an early example of what we call anaphora resolution, and an anaphor
is what I was talking about a second ago. It's a word or phrase that refers to an earlier word or phrase within a discourse. So if I said move the red sphere to the left, and then I said, now move it back, the it obviously refers to the red sphere. You would understand that, but a machine wouldn't necessarily understand it. You would have to say move the red sphere to the left, move the red sphere back. And even with back, that has an element of memory to it, because the
system has to remember where the red sphere used to be. Winograd's approach was one of the early attempts to incorporate anaphora resolution into NLP models. Other models concentrated on translating word by word or sentence by sentence. They were incapable of maintaining relationships between words beyond that. That shift marked a change in attitude among NLP researchers of
the time. A growing number of researchers felt that world knowledge and artificial intelligence were necessary if we wanted machines to be able to analyze and act upon longer forms of discourse. The early approaches to NLP were best suited to short, self-contained passages. In nineteen seventy one, ARPA launched the Speech Understanding Research Program. I also mentioned that in the speech recognition episode. It was very important for the
development of speech recognition. The goal of that program was to advance not only speech recognition but also NLP research, so that a computer could not just detect and transcribe speech, but also respond to it in some meaningful way, for example being able to index all that information so that it is searchable. The program lasted five years. However, at the conclusion, the agency was not satisfied with the results, which technically delivered upon what was asked, but in a pretty
limited implementation, so ARPA decided to cut funding. They stopped the project. This was another big blow to researchers in the United States, who had viewed the project as a positive development ever since the ALPAC report had pulled the rug out from under the funding earlier. Now, I've got a lot more to say about the development of natural language processing and where we are now, as well as the history of the various voice assistants that we're
familiar with today. But it's time to conclude this episode. In our next episode, we'll pick up where I left off today and we'll continue down and talk about all of our beloved friends like Siri and Alexa. Now, if you have suggestions for future episodes of TechStuff, write me. Let me know what you want to hear. There might be a specific technology or a company, or a person in tech. Maybe there's someone you want me to interview or have on as a special guest host. You can send me
an email. The address for the show is techstuff@howstuffworks.com, or you can drop me a line on Facebook or Twitter. The handle at both of those is TechStuffHSW. Don't forget, you can follow us on Instagram. I want to see you guys over there, and I'll talk to you again really soon. For more on this and thousands of other topics, visit HowStuffWorks.com.
