Welcome to Tech Stuff, a production from I Heart Radio. Hey there, and welcome to Tech Stuff. I'm your host, Jonathan Strickland. I'm an executive producer with I Heart Radio, and I love all things tech. And uh, you know what, today's episode, I was gonna make it a one parter, but it turns out there's just way too much stuff, not just about the topic at hand, but the various components that make up this topic that require me to do more than one. So
this is gonna likely be a two parter. But today I thought we could look back at the development and evolution of a famous AI personality. This virtual assistant celebrated an anniversary recently, and I must apologize for being a couple of days late with this, but this particular servant debuted on October fourth, two thousand eleven, technically for the second time, but the history of the actual technology dates
back much further. And of course, I'm talking about Siri, Apple's virtual assistant that can interpret voice commands and return results based on them. This is not just some dull history lesson, however. Siri really has an incredible backstory, ranging from a science fiction vision of the future to a secret project intended to augment the decision making capabilities of the United States military. Yeah, Siri had a pretty tough background.
The story of Siri is complicated, and not just because of the internal history of developing the technology, but also because the tool relies on a lot of converging technological trends. There are elements of voice recognition, uh, speech to text, natural language interpretation, and other technologies that fall under the very broad umbrella of artificial intelligence. So get settled, it's
time to talk about Siri. Also, if you're listening to this near Apple devices, I apologize because there's a good chance those devices might start talking back at me. But I refuse to do an episode where I just refer to the subject as you know who. You could argue that the origins of Siri can be found in a promotional video that Apple produced back in nineteen eighty-seven to show off a concept of an artificially intelligent smart assistant.
Now that alone is interesting, but what really is amazing is that the arbitrary date they chose as the setting for this video was two thousand eleven, probably September. We know that because there is a part within the video where a character asks for information that had been published five years previously, and the published information had a publication
date of two thousand six. Now this means that the actual debut of Siri as an Apple product was just one month after the fictional events in that video from nineteen eighty-seven. That's just a coincidence, but it's a cool one. The Knowledge Navigator video shows a man walking into a study, a really nice one, and unfolding a tablet style computer device. Then he walks off to stare at stuff as a virtual assistant reads off his messages and meetings on
his calendar. The virtual assistant appears as a video in a little window on the screen of the tablet, and it's, you know, like shot from the shoulders up, kind of the bust of a young man, and the video takes up that one little corner of the tablet device. So in this visualization, the virtual assistant isn't just a disembodied voice. It also has a face. Also, everyone in this video is extremely white, which I guess is kind of a given for the time period and the people involved,
but it just comes across as so white. I mean, we're looking at this with the benefit of twenty twenty hindsight. I just wanted to throw that out there. Anyway, the video goes on to have the real life man, who is a professor in this video, ask his virtual assistant to pull up lecture notes, uh, and unread articles that relate back to the lecture. He's asking for the lecture notes of a lecture he gave a year ago.
He's giving essentially the same lecture now, but he wants to update it with the latest information, and he even asks the virtual assistant to summarize those unread articles that had been published in the year since his last lecture. The virtual assistant is thus aggregating information, analyzing that information for context, and then delivering summaries, which is a
pretty sophisticated set of artificially intelligent tasks. The professor also uses the device and virtual assistant to call and collaborate with a peer in real time. Now, this was not the only video that Apple would produce to showcase this kind of general idea; however, arguably it is the
most famous of those videos. Now, as I said, Knowledge Navigator came out of Apple, and Steve Jobs would later play a pivotal role in how the company would introduce Siri, but this was not a Steve Jobs project, because Jobs had been ousted from the company Apple, or he had quit in disgust, depending upon which version of the story you're listening to. Anyway, he had left a couple of
years before this video was produced. The Knowledge Navigator was something that Apple CEO John Sculley had described in a book titled Odyssey. Now, of course, in science fiction stories we have no shortage of instances where a human is interacting with a computer or otherwise artificially intelligent device like a robot, but the Knowledge Navigator seemed to lay down the foundations toward future products like Siri and the iPad, not to mention the potential uses of the Internet, which
in nineteen eighty-seven was definitely a thing. It existed, but most of the mainstream public remained unaware of it because the World Wide Web wouldn't even come along for another few years. However, while you can look at this video and say, ah, this must be where Apple got that idea, they probably got to work right away on Siri, well, you'd be wrong, because the early work, in fact the vast bulk of the work on Siri to bring it to life, didn't start at Apple at all. It didn't involve the company.
So our story now turns to a very different organization, the Defense Advanced Research Projects Agency, better known as DARPA. Now this is part of the United States Department of Defense. Back in nineteen fifty eight, the then President of the United States, Dwight D. Eisenhower, authorized the foundation of this agency, though at the time it was called the Advanced Research
Projects Agency, or ARPA. Defense would be added later. This agency would play a critical role in the evolution of technologies in the United States, and the mission of DARPA, and ARPA before it, is quote, to make pivotal investments in breakthrough technologies for national security, end quote, and that wording is really precise. It's easy to imagine DARPA as being housed in some enormous underground bunker filled with scientists who are building out crazy devices like robo scorpions
or a blender that can also teleport or something. But in reality, DARPA is more about funding research than conducting research. Now, don't get me wrong, the agency relies heavily on experts to evaluate proposals and consider to whom the agency should send money. But the purpose of DARPA is to enable others to do important work. DARPA has played a huge
role in countless technological breakthroughs this way. Much of the technology that would go on to power the Internet started with ARPANET, a kind of precursor network to the Internet and one that was funded by ARPA, thus the name. The DARPA Grand Challenge helped get self driving cars into gear, you know, pun intended. They also created difficult scenarios for humanoid robots to go through. That was
a few years ago and was really cool. The competitions DARPA hosts have specific goals and metrics, and that guides the designers and engineers who are working on them as they build out technologies. It's good to define your goal. It really gives you focus when you're trying to develop
the technology to meet that goal. Winning a challenge is a big deal, though the cash prize may not even cover the amount of money participants have spent through the development of those technologies, and there are entire businesses, or at least divisions within businesses that can be borne out of these challenges. The Grand Challenges are just one way
DARPA encourages technological development. Often, the agency will create a specific goal such as the design of a robotic exoskeleton that can help you know, US soldiers carry heavy loads while they are on foot for longer distances, and then they'll send out an RFP, which is a request for proposal. The agency considers the proposals that it receives from this RFP and then decides which, if any, they will accept and then fund. Then after a given amount of time.
You know, it's dependent upon the specific project, we find out if anything comes out of it. Sometimes nothing does, as some technological problems may prove more challenging than others and may require more time to evolve the various technologies to make it possible. So it might push the field, but you might not have a finished product at the end of it. Other times you do get a finished
product anyway. In two thousand three, a decade and a half after the Knowledge Navigator videos came out of Apple, DARPA identified a new opportunity, and this was one that was borne out of necessity. The challenge was that we have access to way more information today than we did in the past. So decades ago, military commanders had to
make decisions based on limited information. They'd rely a great deal on their own expertise and experience in order to make up for the fact that they only had part of the picture. And while a great commander has a better chance of making the right call than an inexperienced commander would, the limited amount of information could still contribute to disaster. You might be the greatest commander of all time, but if you're lacking a key piece of information, you
might make a decision that is terrible. So flash forward to two thousand three, and now the story had kind of flip flopped. Now military commanders would receive more information than they could reasonably handle. The challenge now wasn't to use intuition to make up for blind spots, but rather, how do you synthesize all this information so that you
can make the right decision. Too much information was proving to be kind of as big a problem as too little information, at least in some cases, and so DARPA wished to fund the development of a smart system that could help commanders make sense of all the data coming in from day to day. Now, DARPA projects tend to be labyrinthine, with lots of bits and pieces, and a lot of different companies and research labs and other organizations might tackle all or part of one of these projects.
The cognitive computing section of DARPA had a program called Personalized Assistant that Learns, or PAL, which seems nice. It was this part of the program that would fund the development of a virtual cognitive assistant. The amount of funding was twenty two million dollars. What a great PAL. The organization that landed this deal was SRI International, itself an incredibly influential organization. It's a nonprofit scientific research institution.
Originally it was called the Stanford Research Institute because it was established by the trustees of Stanford University back in nineteen forty six, though the organization would separate from the university formally in the nineteen seventies and become a standalone, nonprofit scientific research lab. The organization has played a role in advancing materials science, developing liquid crystal displays, or LCDs, creating telesurgery implementations, and more. And now it was going to tackle DARPA's request for a cognitive computer assistant. SRI International created a project called the Cognitive Assistant that Learns and Organizes, or CALO, or KAY-lo if you prefer. And this appears to be another case where they landed upon that acronym first and then worked backward, as CALO seems to come from the Latin word calonis, which means soldier's servant, and I probably mispronounced that, because even though I was a medievalist, it's almost
criminal I never took Latin. The concept, however, hearkens back to some of what we would see in that Knowledge Navigator video from nineteen eighty-seven: a system that would be able to receive and interpret information, presumably from multiple sources, and provide a meaningful presentation or even interpretation of that data to humans, which is a pretty tall order, and let's break down a bit of what an assistant would need to do
in order to accomplish this. We'll leave off the voice activation parts for now, as that would not be absolutely critical to make this work. You know, you might have a system that gives daily briefings on its own, or you might have one that you activate through text commands or some other user interface. It wouldn't necessarily have to be voice activated. But on the back end, what has
to happen for this to work? Well, presumably such a system would need to pull in data from a number of disparate sources, so the assistant wouldn't just be reciting facts and figures that were coming from a centralized data server. Instead, it might be assimilating data from numerous sources into a cohesive presentation. On top of that, the data might be in different formats, meaning the system would need to be able to analyze the information inside different types of files.
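To make that idea of pulling from different formats a bit more concrete, here's a minimal sketch in Python. To be clear, this is just my own toy illustration of the normalization step, not anything from CALO or SRI, and the sample sources and field names are made up.

```python
import csv
import io
import json

def rows_from_csv(text):
    """Read CSV text into a list of plain dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def rows_from_json(text):
    """Read JSON text into a list of plain dictionaries."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]

# Two made-up sources that describe the same kind of thing in different formats.
csv_source = "unit,status\nAlpha,ready\nBravo,delayed\n"
json_source = '[{"unit": "Charlie", "status": "ready"}]'

# Normalize everything into one common structure before trying to analyze it.
records = rows_from_csv(csv_source) + rows_from_json(json_source)
print(records)
```

A real system would face far messier inputs than this, but the core move is the same: get everything into one common structure before you try to analyze it.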
This isn't an easy thing to do. There's a reason we have a lot of specialized programs for working with specific types of files. When I put together these podcasts, I use a word processor for my notes, and I use an audio editing piece of software to record and edit the podcasts. Now I need both of those programs because neither of them can do the job that the other one does. I don't have, like, an all purpose program that does everything. Accessing different file formats, even in
the same general family of applications, is tricky. Beyond that, the way information can be presented within each file could be very different. It's very possible for us to open up multiple spreadsheets, even using the same basic spreadsheet program, let's just say Excel. It's possible for us to open up half a dozen Excel spreadsheets that are all presenting the same information but doing so in different ways,
and that might not be obvious at a casual glance. You might look at one and the other and not immediately realize, oh, these are both saying the same thing. Just think about how information could be presented as a table or a graph or a chart. The AI assistant would ideally be able to access information no matter what format it was in, no matter what version of that format it was in, be able to interpret it, and then be able to
deliver a meaningful analysis to the user. Now, as data sets grow, this becomes increasingly difficult, which I should point out is the whole reason DARPA wanted to fund research into this in the first place. Military commanders were faced with a growing mountain of information that was increasingly difficult to parse. The analysis might also need to incorporate natural
language recognition features. And I've talked about natural language a lot in previous episodes, but if we boil it down, it's the language that we humans use to communicate with one another. It's our natural way of expressing our thoughts. But the way we humans process and communicate information is different from how machines do it. We can be subtle. We can use stuff like metaphors and allegories and just
different phrasing. Computers are, you know, a lot more literal. Hey, if you break it down to the most basic unit of machine information, you know, the bit, you see how literal computers are. A bit is either a zero or a one, or if you prefer, it's either off or on, or no or yes. But using lots of bits, we can describe information in a way that provides more
subtlety than just no or yes. But my point is that computers don't naturally process information the way we do, and so an entire branch of artificial intelligence called natural language processing evolved to create ways for computers to interpret what we mean when we express things within natural language. Making this more complicated is that, of course, there's no one way to say any given thing. We've got lots of ways to express the same general thought. And added to that,
we have lots of different languages. There are around seven thousand different languages spoken in the world today, though you could probably get away with a couple of dozen and cover the vast majority of the world's population that way. But these languages have their own vocabularies, their own syntaxes, their own expressions. So not only do we have multiple ways of saying things within one language, we
have all these different languages to worry about. If you were to send ten people into a room with an AI assistant, and those ten people have a task they're supposed to perform with the help of this AI assistant, odds are no two people are going to go about
it exactly the same way. And yet a working virtual assistant needs to be able to interpret and respond to every case and do so reliably. On the back end, an AI system needs to be able to interpret data coming from different sources that may have very different ways of expressing similar ideas. This is an enormous task. Now, when we come back, I'll talk more about what SRI was doing and how the military project would evolve ultimately into Apple's personal assistant. But first let's take
a quick break. Now, I've only scratched the surface of what the creation of an AI assistant capable of accessing information from numerous sources and making that information useful really requires. Let's talk a bit about the parameters of
this project itself. So if you remember, I said that the deal was initially for twenty two million dollars, and that would end up funding the creation of a five hundred person project, and the project spanned five years initially to investigate the possibility of building out such an AI system. Over time, more money would end up going into the research, and it totaled around a hundred fifty million
dollars by the end of the project. The lab where it all went down would receive the charming nickname Nerd City. A large part of the project focused on creating a program that could learn a user's behaviors. So not only could this personal assistant respond to what you were asking, it would gradually learn the way you behaved and it would adapt to you to work more effectively. Now this comes into the arena of pattern recognition. We
humans are pretty darn good at recognizing patterns. In fact, we're so good that sometimes we will quote unquote recognize a pattern even when there isn't a pattern there. In some cases, this can come across as charming, such as when we see a face in a cloud. Right, there's not really a pattern there. We're recognizing a pattern where none really exists. It's all based on our perspective and our imaginations. Now, in other cases, it's not so charming.
It can actually lead to faulty reasoning. So I'm going to give you a very basic example that I hear all the time, particularly now that we're in October and there's some full moon weirdness going on. So there's a fairly widespread belief that there's a connection between full moons and an increase in the number of medical emergencies that happen. Generally speaking, the idea is that people act irresponsibly during a full moon, and that often results in injury, which means greater activity
at hospitals. Now, this belief is most likely due to confirmation bias. That is, we already have a belief in place, and the belief is that full moons lead to more accidents because of people acting irresponsibly. That is what we believe. It doesn't have evidence yet, and then when things do get busy at a hospital and there happens to be a full moon, we register that as evidence for our belief. Aha,
says the mistaken person. The full moon explains it. However, on nights when it is busy but there is no full moon, there's no hit. No one takes notice of, huh, how odd, you know, it's crazy busy, but there's no full moon tonight. We don't do that. Likewise, if it happens to not be busy but there is a full moon, you're also not likely to notice. You're not likely to say, like, huh, it's not very busy tonight,
but there's a full moon out. So it's only when you have the full moon and the busy hospital where the evidence appears to support your belief and confirm your bias. But in truth, when you take a step back and you do an objective study and you look at the times when a hospital is busy, and you look at when there was a full moon, and you look to
see if there's any correlation, it falls apart. Now I got a little off track there, But the point I wanted to make is that we humans are biologically attuned to recognizing patterns. It's very likely that pattern recognition is one of the traits that really helped us survive thousands of years ago, which is why it's so intrinsic in
the human experience. But building programs, computer systems that are capable of identifying patterns and separating out what is signal versus what is noise, is its own really big challenge. SRI was hoping to create a program that could look for patterns in user behavior in order to respond with greater precision and accuracy to user requests, and ultimately to anticipate future requests. Now we see this sort of pattern recognition and response in lots of technology today.
There are several smart thermostats on the market right now, for example, that can track when you tend to raise or lower the temperature in your home, and after a while, the thermostat learns that, hey, maybe you like it nice and chilly at night, but you want it to be warm and toasty in the morning, and so the thermostat begins to adjust itself in preparation for that based on your previous behaviors. Now that is a very simple example.
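Here's a tiny sketch of what that kind of schedule learning could look like, just a toy in Python and not how any actual smart thermostat is implemented: average the user's past manual adjustments by hour of day and use that as the predicted setpoint.

```python
from collections import defaultdict

def learn_schedule(adjustments):
    """adjustments: (hour_of_day, temperature) pairs the user set by hand.
    Returns the average temperature the user chose for each hour."""
    by_hour = defaultdict(list)
    for hour, temp in adjustments:
        by_hour[hour].append(temp)
    return {hour: sum(temps) / len(temps) for hour, temps in by_hour.items()}

def predicted_setpoint(schedule, hour, default=68):
    """Pre-adjust toward the learned preference for this hour, if we have one."""
    return schedule.get(hour, default)

# Made-up history: chilly at night, toasty in the morning.
history = [(22, 64), (23, 63), (22, 65), (7, 72), (7, 73), (8, 71)]
schedule = learn_schedule(history)
print(predicted_setpoint(schedule, 7))   # warm in the morning
print(predicted_setpoint(schedule, 22))  # cooler at night
```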
Extrapolate that out and you begin to imagine a technology that is anticipating what you need or want, perhaps before you're even aware of it yourself, which can get kind of creepy but also sort of magical. But in truth, it's because this system is detecting patterns that we aren't even able to recognize ourselves. The danger there, of course, is that the systems can sometimes mistakenly identify a pattern
when in fact there's not really a pattern there, very similar to the case I was explaining about with the full moon and the busy hospital. Even computer systems can make those sorts of mistakes, and depending upon the implementation, that can be a real problem. But that's an issue for a different podcast. Now, when it comes to humans, pattern recognition is so ingrained in most of us that it can actually be kind of hard to explain.
You notice when something happens, and if that same thing happens later with the same general results as the first time, it reinforces your first perception of that thing, and if it happens over and over, your brain essentially comes to understand that when I see X happen, I can expect Y to follow, and from that you might eventually realize that there are other correlating factors that may or may
not be present when this goes on. With computers, the goal is to create systems that can analyze input, whether that input is an image file or typed text or spoken words or whatever, and it first has to interpret that input, has to identify it and figure out the defining features and attributes of that input, then compare that against known patterns to see if the input matches or doesn't match those patterns. And in a way, you can think of this as a computer system receiving input and
asking the question have I seen this before? And if so, what is the correct response? If the input matches no pattern, the system then has to have the correct response for that. So a very simple example might just be a failed state, in which case the virtual assistant might reply with something like I'm sorry, I don't know how to do that yet, or something along those lines. Now, remember earlier I mentioned that we humans have a lot of different ways to
say the same general thing. For example, with my smart speaker, I might ask it to turn the lights on full, meaning I want them to be all the way up. I might say make the lights. I might just say make it brighter. And the system has to take this input, analyze it, and make a statistical determination to guess at
what it is that I actually want to have happen. I say guess because in each case we're really looking at a system that has multiple options when it comes to a response, and each option gets a probability assigned to it based on how closely that option matches with the input. So I might say, make it brighter, and the underlying system recognizes that there's a good chance I mean increase the brightness of the lights in the room I'm in,
and the system has determined that that's the most probable answer. Right, it's probably correct, so it goes with that, but still, it's kind of a guess. Now, there are a lot of different ways to go about doing this, but the one you hear about a lot would be artificial neural networks. I've talked a lot about these in recent episodes, so we'll just give kind of the quick overview. So you've got a computer system that has artificial neurons. These are called nodes, and the job of a node is to accept incoming
input from two or more sources. The node then performs an operation on those inputs, and then it generates an output, which it then passes on to other nodes further in the system. You can think of the nodes as existing in a series of levels, with the top level being where input comes in and the bottom
level being where the ultimate output comes out. So the nodes a level down accept incoming inputs, then perform their own operations on them and pass the results further down the chain, and so on, until ultimately you get an output or response. Now, that's a gross oversimplification of what's going on, but generally you get the idea of the process. Now, let's complicate things a little bit. To get these sorts of
neural networks to generate the results you want, one thing you can do is mess with how each node values or weighs each of the inputs coming into that node. So I'm going to use some names, human names, for nodes here just to make things easier to understand. Let's say we've got a node named Billy. Billy is on the second layer of nodes, so it's one layer down from where direct input comes into the system. So there are nodes above Billy that are sending information to Billy.
We'll say that the two nodes that give Billy information are named Sue and Jim Bob. Sue and Jim Bob send Billy information, and it's Billy's job to determine what further information to send down the pipeline. Like, I need to do an operation based on these bits of information that are coming to me, and then
I have to come up with a result. Only Billy has been told that Sue's information tends to be a little more important than Jim Bob's information is, and so if there's a question as to what to do, it's better to lean more on Sue's information than on Jim Bob's information. We would call this weighting, as in W E I G H T I N G. Computer scientists weight the inputs going into nodes in order to train a system
to generate the results the scientists want. One way to do this is through a process called back propagation. Back propagation is when you know what result you want the system to arrive at. So let's use the classic example of identifying pictures that have cats in them. As a human, you can quickly determine if a photo has a cat in it or not. You'll spot it right away.
So you feed a picture through this system and you wait for the system to tell you if, yes, there's a kitty cat in the picture, or no, the image is cat free. And let's say that the picture you fed to the system in fact does have a cat in it. You can see it, but when you feed it through the system, the system fails to find the cat and says, nope, there's no cat here. Well, you know
that the system got it wrong. So what you might do as a computer scientist is you look at that final level of nodes right at the output level to see which factors led those nodes to come to the conclusion that there was no cat in the photo. You then look at the inputs that are coming into those nodes and you see how they are weighted, and you change the weights of those inputs in order to force that last level of nodes to say, oh, no, there
definitely is a cat here. And so on. You move up from the output level and you go up level by level, tweaking the weightings of incoming data so that the system is tuned to more accurately determine if a photo has a cat in it. Now, this takes a lot of work, and it also means using huge data sets.
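To make the node-and-weights idea a little more concrete, here's a deliberately tiny sketch in Python: a single artificial node with two weighted inputs, reusing the Sue and Jim Bob names from above, that nudges its weights after each wrong answer. This is only the single-node special case of that kind of error-driven correction; real networks have many layers, and back propagation pushes the corrections back through all of them, but the flavor is the same.

```python
import math

def node_output(sue, jim_bob, w_sue, w_jim_bob, bias):
    """One artificial node: weigh the two inputs, add them up,
    and squash the sum into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(w_sue * sue + w_jim_bob * jim_bob + bias)))

# Toy training data: (sue_input, jim_bob_input, desired_output).
# The desired answer happens to track Sue's input, so training should
# end up weighting Sue's input more heavily than Jim Bob's.
examples = [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]

w_sue, w_jim_bob, bias = 0.1, 0.1, 0.0
learning_rate = 1.0

for _ in range(2000):
    for sue, jim_bob, target in examples:
        out = node_output(sue, jim_bob, w_sue, w_jim_bob, bias)
        error = target - out                  # how wrong were we?
        nudge = error * out * (1.0 - out)     # slope of the squashing function
        w_sue += learning_rate * nudge * sue          # lean more (or less) on Sue
        w_jim_bob += learning_rate * nudge * jim_bob  # same for Jim Bob
        bias += learning_rate * nudge

print(round(w_sue, 2), round(w_jim_bob, 2))  # Sue's weight ends up much larger
```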
You know, you're feeding it hundreds of thousands or millions of images, some of them with cats, some of them without, and running the system over and over again to train it before you start feeding it brand new images to see if it still works. And this can be a laborious process, to train a machine learning system, but the result is that you end up with a system that hopefully is pretty accurate at doing whatever it was you were training it to do, you know, like recognizing cats. But
that's just one approach to machine learning. There are others. Some, like the version I just described, fall into a broad category called supervised learning. Others fall into unsupervised learning. In fact, CALO was largely built through unsupervised learning, meaning the machine had to train itself as it performed tasks using inputs that hadn't been curated specifically for training purposes. It's just an enormous amount of information coming in that
the system has to process. So, in other words, for CALO, the system wasn't dealing with, like, a stack of a million photos, some of which had cats and some of which didn't. CALO was working with real world information and attempting to suss out what to do with it in real time. Now, to go into how unsupervised machine learning works would require a full episode on its own, but it is a fascinating and complicated subject, so I probably will tackle it at some point. I'm just gonna spare you guys for
right now. The real point I'm making is that SRI International spent years building out systems that could do a wide range of tasks based on inputs. Pattern recognition was actually just one relatively small piece of that.
Creating an ability to pull data from different sources in a meaningful way is its own incredibly challenging problem, as I alluded to earlier, particularly as the number of sources you're pulling from and the variety of formats the data is in begins to increase, it becomes easier for the system to make mistakes as you throw more variety at it, and it requires a lot of refinement. Frankly, it's actually a task that's so big I have trouble grasping it.
The CALO project became the largest AI program in history up to that point. It was an incredible achievement. It brought together different disciplines of artificial intelligence into a cohesive project with a solid goal. By the two thousands, artificial intelligence was a sprawling collection of computer science disciplines, each with incredible depth to them. So you might find an expert in one field of AI who would have little to no experience with another branch under the same general
discipline of artificial intelligence. There was a prevailing feeling that the various branches of AI had each become so complex they would never work together. The CALO project proved that wrong. When we come back, I'll explain how part of this military project would break away to become the virtual assistant, ultimately finding its way onto iOS devices. But first let's
take another quick break. Adam Cheyer, whose name I'm likely mispronouncing, and I apologize, was an engineer at SRI working on CALO, and he worked with a team that had the daunting task of assimilating the work that was being done by twenty seven different engineering teams into a cohesive virtual assistant. So, as I mentioned just before the break, the disciplines of AI had each gotten very deep, very broad, and required a lot of specialization.
So you have these different engineering teams working within various disciplines, and it was Cheyer's team that needed to bring all these together and make it into a working, coherent whole. The results were really phenomenal. Now I'll give you a
hypothetical use for CALO. Let's say that you've got a project team and there are ten people on your team, including you, and let's say there's a meeting that's on the books for tomorrow morning at a particular conference room, and it's supposed to be a status update meeting for the project. It turns out that two out of the ten people on your team are no longer able to make the meeting due to last minute high priority conflicts,
so they've had to cancel out of the meeting. CALO would be able to detect the change in status of those two people and say, all right, these two are no longer going to the meeting. Then CALO could determine how important those two people were to the overall team, essentially asking, what are their roles? What role are they performing within the context of this team, and is
it a critical role for this meeting. It can also look at the importance of the meeting itself, like, oh, well, this is a status update, so it's really just to keep the team, you know, informed of what's going on. It's not a mission critical type of meeting. It could
take all that into account. Then CALO can make a determination on its own whether or not it should keep the meeting in place and go ahead just without those two people, and maybe just send updates to those two people, or to cancel the meeting entirely, notify all the participants about it, then look at the different calendars of those participants, book a new meeting, including securing a space for that
meeting and sending out new invites. It would even be able to look at the purpose of the meeting and flag information that's relevant to that meeting, essentially creating a sort of meeting dossier on demand. So it's really, you know, incredibly sophisticated stuff.
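Just to give a flavor of that kind of reasoning, here's a deliberately simple sketch in Python of a keep-or-reschedule decision. To be clear, this is a made-up toy, not CALO's actual logic, and the roles and rules are invented for the example.

```python
def meeting_decision(attendees, cancellations, meeting_type):
    """Toy keep-or-reschedule logic.
    attendees: dict mapping name -> role ("critical" or "optional").
    cancellations: names of people who dropped out.
    meeting_type: "status_update" or "decision"."""
    missing_critical = [name for name in cancellations
                        if attendees.get(name) == "critical"]
    if meeting_type == "status_update" and not missing_critical:
        return "keep", "Hold the meeting and send notes to the absentees afterward."
    return "reschedule", "Find a slot everyone can make, rebook a room, send new invites."

team = {"Ada": "critical", "Ben": "optional", "Cho": "optional"}
print(meeting_decision(team, cancellations=["Ben", "Cho"], meeting_type="status_update"))
print(meeting_decision(team, cancellations=["Ada"], meeting_type="status_update"))
```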
Now, that was the fully fledged CALO, but an offshoot of this project, or maybe it's better to say it was a smaller sister project that existed at the same time. It launched in two thousand three, along with CALO. This other one was called Vanguard, at least within SRI, and it was taking a more scaled down approach of building out an assistant and looking at how it could be useful on mobile devices. Now, again, this was in two thousand three, before smartphones would really become a mainstream product, because Apple wouldn't even introduce the iPhone until two thousand seven. But SRI was working on implementations of a more limited virtual assistant and
then showing it off to companies like Motorola. One person at Motorola who was really impressed with this work was a guy named Dag Kittlaus. Kittlaus attempted to convince his superiors at Motorola that Vanguard was a really important piece of work, but he didn't find any real interest over
at Motorola, so he did something fairly brazen. In two thousand seven, he quit his job at Motorola and he joined SRI International with the intent of exploring ways to spin off a new business that would develop an implementation of the CALO Vanguard virtual assistant, but for the consumer market. The result would be a new company called Siri, S I R I, which is kind of the way you would say SRI if you were trying to pronounce it as if it were an acronym as opposed
to an initialism. Adam Cheyer, after some convincing from Kittlaus, joined the venture as the vice president of engineering. Kittlaus would be the CEO. Tom Gruber, who had studied computer science at Stanford and then pioneered work in various fields of artificial intelligence, would become the chief technology officer for the company. Interestingly, the Siri team didn't initially call
their own virtual assistant project Siri. Instead, the new spinoff company, Siri, would call their virtual assistant HAL, H A L, after the AI system in the book and film two thousand one. They did take an extra step to reassure
people that this time HAL would behave itself. So, if you're not familiar with the story of two thousand one, the artificially intelligent computer system HAL begins to malfunction and begins to interpret its mission in such a way that it compels it to start killing off the crew inside a spacecraft, kind of a worst case scenario with AI. While Siri began to get off the ground, it was licensing technologies from SRI to power the virtual assistant, and it also began to hire the talent needed to
bring this idea to life. At the same time, Apple was pushing the smartphone industry into the limelight with the introduction of the first iPhone. This was all happening in two thousand seven. It was clear that the push for a virtual assistant was coming at just the right time, as Apple's implementation of smartphone technology was a grand slam
home run, to use a sports analogy. It soon became obvious that the future of computing was going to be, at least in large part, mobile. That in turn opened up opportunities to create new ways to interact with mobile devices in order to do the stuff we needed to do. Now, it's obvious to say this, but mobile devices have a very different user interface from your typical computer.
Interacting with a handheld computer by tapping on a screen or talking to it creates different opportunities for crafting experiences than someone sitting down to a computer with a keyboard and mouse. There's a potential need for a voice activated personal assistant that could help you carry out your tasks, particularly ones that might need multiple steps. Siri the company came along just as the need for Siri the app was beginning to take shape, so it was the right
place at the right time. In two thousand seven, Apple had not yet opened up the opportunity for independent app developers to submit apps for the iPhone. That wouldn't actually happen until July tenth, two thousand eight, essentially a year
after the iPhone had debuted. The Siri team was still hard at work building out the virtual assistant app they had in mind in two thousand eight. While they were licensing technology from SRI International, you know, from the Vanguard and the CALO projects, they still had to build out the systems that would actually power
Siri on the back end. Generally speaking, their approach was to create an app where a person could ask Siri a question and the app would record that request as a little audio file, send that audio file to a server in a data center, and the first step then would be to transcribe the audio file into text, so we're talking about speech to text here. Then the system would need to parse the request. What is actually being asked here? What is the command or request actually saying?
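Here's a very rough sketch of that kind of pipeline in Python, with the speech-to-text step stubbed out and a handful of made-up intents scored by keyword overlap. This is my own toy, not Siri's actual architecture; the real system used far more sophisticated statistical models, but the pick-the-most-probable-interpretation step works along these general lines.

```python
def transcribe(audio_bytes):
    """Stand-in for the speech-to-text step, which really ran on a server."""
    return "I want linguini"

# Each made-up intent is scored by how many of its cue words appear in the
# transcribed request; the scores are then turned into rough probabilities.
INTENTS = {
    "find_restaurant": {"want", "hungry", "restaurant", "near", "linguini", "pizza"},
    "find_recipe": {"recipe", "cook", "make", "ingredients", "linguini", "pizza"},
    "get_directions": {"directions", "route", "drive", "navigate"},
}

def rank_intents(text):
    words = set(text.lower().split())
    scores = {intent: len(words & cues) for intent, cues in INTENTS.items()}
    total = sum(scores.values()) or 1
    return sorted(((score / total, intent) for intent, score in scores.items()),
                  reverse=True)

request = transcribe(b"...")              # pretend audio payload
probability, best_guess = rank_intents(request)[0]
print(best_guess, round(probability, 2))  # the most probable interpretation
```

However the scoring is actually done, the last step is the same general idea: rank the candidate interpretations and go with the most probable one, which is still, as I said, kind of a guess.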
Now, in some systems, a computer will break down a sentence into its various components, you know, a subject, verb, and object, and then try to figure out what is actually being said. Adam Cheyer took a different approach with his team. They taught their
system the meaning of real world objects. So, rather than trying to parse out what a sentence meant by first figuring out what's the subject, what's the verb, and what's the object that the subject is acting upon, Siri started off by looking at real world concepts within the request. Siri would then map the request against a list of possible responses and then employ that statistical probability model that
I mentioned earlier. What are the odds that someone was asking for directions to an Italian restaurant versus asking Siri to provide a recipe for an Italian dish, for example? So if I activate my virtual assistant and say I want linguini, that's a pretty broad thing to say, right? The app has to guess at whether I mean I want to go someplace that serves linguini or I want
to make it myself. Now, my personal app would have learned from my behaviors that I am very lazy and would realize that I am actually asking for someone to bring me linguini. So there's no doubt Siri would return results of Italian restaurants that deliver as a result of my request. And keep in mind, Siri was intended to learn from user behaviors and attune itself to those behaviors over time. Beyond that, Siri would pull information from
multiple sources to provide results. So if I asked about a restaurant, Siri would provide all sorts of data about the restaurant, from user reviews, to directions to the restaurant, to menu items to what price range I might expect
at that place. Siri could also tap into other stuff like the phone's location, and thus give relevant answers based on my location, so I wouldn't have to worry about getting irrelevant search results if I happened to be far from home, right? Siri wouldn't suggest that I go and get food from a place that's right down the street from my house in Atlanta while I happen to be in New York City, for example. The team also gave Siri a bit of an attitude. Siri could be sassy
and had a bit of a potty mouth. In fact, Siri would occasionally drop an F bomb here or there. Now, according to Kittlaus, the goal was eventually to offer extensions to Siri so that end users could kind of pick the app's personality. Maybe you want a no nonsense virtual assistant that just provides the information you need and that's it. Maybe you wanted more of a goofy sidekick, or maybe you wanted a virtual assistant who could give you
some serious attitude on occasion. The goal down the line was to create options for people to kind of shape their experience, but that would end up on the cutting room floor due to a very big reason. The Siri app made its debut in the iPhone App Store in January. Three weeks after it debuted, Kittlaus received a phone call from an unlisted number, a call that he almost didn't even answer, but when he did answer, the person on the other end of the call happened to be
Steve Jobs, the CEO of Apple. Jobs was over the moon about Siri and wanted to meet with Kittlaus to discuss some pretty enormous options, the biggest one being that Apple itself would acquire Siri. Now, at the time, Siri the company was working on developing a version of the app for Android phones, having reached a deal with Verizon to create a version of Siri that could be the default app on all Verizon Android phones moving forward.
The Apple deal would ultimately derail that agreement, as Jobs was insistent that Siri be an Apple exclusive. In fact, when Apple would introduce Siri on October fourth, two thousand eleven, it seemed like it was being presented as a purely Apple product, that it didn't have a life outside of Apple at all. It came across as it just being Apple all along. And of course, the day after Apple would introduce Siri to the public, Steve Jobs himself passed away,
on October fifth, two thousand eleven. But that part of the story will have to wait for part two because, as
I said, this is going longer than I anticipated. So in our next episode we'll pick up probably a little earlier than where I'm leaving off here, actually, because there's still some other details we should talk about as far as how Siri works and the actual arrangement of Apple's acquisition, and then we'll talk about how the app has evolved and changed under Apple's ownership, and we'll also explore, you know, a little bit about Siri's distant cousins like
Alexa and Google Assistant and others, because all of these work in similar ways, though they have their own specific processes to handle requests, and so if you do an apples to apples comparison, it does break down ultimately once you start getting down to how things are working in detail on the back end. So I won't go into full detail on those, because it would require multiple episodes on that. But we will talk more about Siri and what has happened in the years since its acquisition in
our next episode. If you guys have suggestions for future topics I should tackle on Tech Stuff, let me know. The best way to do that is to reach out on Twitter. The handle we use is TechStuff HSW, and I'll talk to you again really soon. Tech Stuff is an I Heart Radio production. For more podcasts from I Heart Radio, visit the I Heart Radio app, Apple Podcasts, or wherever you listen to your favorite shows.