Get in touch with technology with TechStuff from HowStuffWorks.com. Hey everyone, and welcome to TechStuff. I am your host, Jonathan Strickland. Today I am incredibly fortunate: I have not one but two amazing guests to talk about our topic, which is Amazon's Alexa service, what it can do, what it's like to develop for it, and why
we should be excited about voice recognition services in general. I guarantee that by the end of this episode you'll be as excited about them as I already am. So let me introduce my guests. First, on the phone, I've got Dave Isbitski from Amazon. Hi, Dave. Hey, how are you doing? Great. Thank you so much for joining me, I really appreciate it. And here in the studio, in the flesh, Josh Skeen from Big Nerd Ranch. Now, Josh, you've been working on developing a sort of how-to guide
for developing for Amazon's Alexa. Is that correct? Yeah, that's right. I've been working with David and others at Amazon to build developer education tools for writing apps for the Alexa Skills Kit platform. Now, this is great. I'm glad I've got two experts here, because I do lots of research, I love to chat about technology, and I'm very passionate about the subject.
But it's always a thrill to have experts on the subject here as well, so they can fill in the gaps in my understanding. I'm coming primarily from a consumer standpoint; I don't have a deep education in coding or developing for this sort of thing. I have a liberal arts degree. So I'm glad to have you guys here to talk about it. Let's start at the very top and work our way down.
So Alexa is a kind of personal assistant that can do lots of different things, and it depends heavily on voice recognition, speech recognition, and natural language processing. I don't think a lot of people truly appreciate how big a deal that is, because the way we humans communicate and the way computers quote unquote think are very different. So, Dave, can you talk a little bit about the challenges of developing something
that can actually work with natural language? Certainly, and thank you for the intro. I love talking about technology too, and I am far from an expert; this has been a learning journey for me. There are people who have been working on voice, natural language, and AI for thirty-plus years, and I feel like we're in a sea change now, with the power of the cloud and how affordable technology has become for everybody. You may have a tablet or you may have a
smart home. That technology has become so affordable that we can finally do something like this. Amazon's vision is basically the Star Trek computer. If you remember, it doesn't matter whether it's The Next Generation, the original series, Deep Space Nine, or the movies; there was always a voice that a human being could
just call out to in the air. I remember watching The Next Generation: Worf would play opera, Klingon opera, or Picard would ask for music just as they walked into the room. I was thinking about that the other day walking into my office, and I thought, gosh, I'm asking a computer to turn
on the lights and play a piece of music. We are living in science fiction, right? So that's the basic idea behind Alexa. It is a service from Amazon that we make available to anyone for free, and we've also put it into hardware that we make and sell to Amazon customers. Some of your listeners may know that hardware as the Amazon Echo, the Echo Dot, the Amazon Tap, or the
Fire TV. The basic premise is that Alexa can understand us as human beings, and then she can talk to all the technology in our lives so that we don't have to learn each user interface. You know, the whole blinking twelve o'clock: how do I change that? I should be able to just ask the device itself to set the time. I should not have to worry about that as a human being.
And that's really what Alexa is all about. She can talk to human beings, she can understand human beings, and then she can tell machines what we're actually asking for. It's really important to have that
sort of translator between us and the machine world. One of the things I found extremely frustrating early in the era of home automation is that it was an all-or-nothing kind of approach. Depending on what you wanted to do, you essentially had to go with one provider, one manufacturer, for all of your stuff, because different brands wouldn't talk to each other. They had different protocols and different approaches to how they would
integrate with each other. If you had everything from one company, awesome: everything talks to each other, and it's fantastic. But if you're a regular human being who can't outfit an entire home all at once with the same technology, you wanted your devices to be able to
talk to each other. So in my mind, one of the big bonuses of an approach like Alexa is that you have this go-between that can do that work for you, compensating for the fact that these technologies don't necessarily talk to each other natively. Right. The great thing about standards is that there are so many of them, right? Yeah, yeah.
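The go-between described here is, in software terms, close to the adapter pattern: the assistant issues one kind of command, and a small per-brand adapter translates it into whatever protocol each device speaks. Here is a minimal Python sketch under that assumption; `VendorALight`, `VendorBBulb`, the `"PWR 1"` command string, and the `Hub` class are all invented for illustration and do not correspond to any real product API or to how Alexa is implemented internally.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class VendorALight:
    """Hypothetical vendor API: exposes a boolean power setter."""

    def __init__(self) -> None:
        self.powered = False

    def set_power(self, on: bool) -> None:
        self.powered = on


class VendorBBulb:
    """Hypothetical vendor API: accepts raw command strings."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, command: str) -> None:
        self.sent.append(command)


class Switchable(ABC):
    """The single interface the go-between speaks, whatever the brand."""

    @abstractmethod
    def turn_on(self) -> None: ...


class VendorAAdapter(Switchable):
    def __init__(self, device: VendorALight) -> None:
        self.device = device

    def turn_on(self) -> None:
        self.device.set_power(True)


class VendorBAdapter(Switchable):
    def __init__(self, device: VendorBBulb) -> None:
        self.device = device

    def turn_on(self) -> None:
        self.device.send("PWR 1")  # invented command string for this sketch


class Hub:
    """Maps spoken device names to adapters, so callers never see vendor protocols."""

    def __init__(self) -> None:
        self.devices: dict[str, Switchable] = {}

    def register(self, name: str, device: Switchable) -> None:
        self.devices[name] = device

    def turn_on(self, name: str) -> None:
        self.devices[name].turn_on()
```

Adding a third brand means writing one more adapter; nothing that calls `Hub.turn_on` has to change.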
I love the fact that the term standard means the opposite of what you would expect. And that is one of the most popular uses we have seen for the Echo device. That was my journey too. I started out using it for music and general queries, and then I heard this thing could do stuff with smart home. I had no idea what smart home was, or what any of these terms like IoT, the Internet of Things, meant. It didn't make
sense to me. I wanted a light I could turn on. So I went to Amazon and searched for a smart light bulb. I think it was fifteen bucks. I figured I'd order it and see if it would work. There was a general instruction on Amazon's site, but what you actually do with Alexa is get the device and say, "Alexa, discover devices," and then she figures out what you've put into your home. You don't have to go figure that out. And
we've found enough customer demand that we've actually created a destination for it: if you go to amazon.com/smarthome, you'll see all of the smart home devices that Alexa can talk to and that will automatically work. That makes it a much easier process than trying to figure out all the individual pieces to buy, who talks to whom, and everything you mentioned. You can just go ahead and screw in
a light bulb and be able to talk to it. Now, that's incredible from a consumer standpoint: you've made this a seamless experience, so you don't have that frustrating moment. Even something as simple as Bluetooth pairing is a barrier for some people. Do I have to hold the button on each device? The light is blinking; what does that mean? Just something as simple as that can be really frustrating
for some people. So taking that step away is a really ingenious and helpful thing to do. Now, Josh, you've worked very hard on the back end of this so that people developing for Alexa can take advantage of it and make Alexa do some pretty incredible things. So, first of all, I've got to lay down some vocabulary.
Right before we started recording, you mentioned that you now call an app on your phone a skill. Because I'm so deep in this Amazon world, I think in skills, not apps. So tell people: what exactly is an Alexa skill? Well, skill is an interesting term. You initially think of a skill as
something you acquire or learn over time. I believe Amazon chose that terminology as a nod toward the machine learning aspect of what they offer in the cloud, which David was speaking to a bit ago. That machine learning component addresses an incredibly complex problem: taking spoken words and resolving them to a format that software can
actually work with and treat in a predictable way. You have to imagine all the timing, the inflection, the variation, and the regional differences in how someone is going to ask for something. Then you also have to think about all the different ways someone could ask for the exact same thing. It's an astronomical number of possibilities. And it's the Alexa Skills Kit platform that actually resolves that problem for you, in a lot of ways.
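To make "all the different ways someone could ask for the exact same thing" concrete, here is a toy resolver that scores a spoken phrase against a few sample utterances per intent and declines to act when even the best match looks unlikely. It is a sketch only: word-overlap (Jaccard) similarity stands in for Amazon's actual language models, which the platform does not expose, and the intent names and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical training data: a few sample utterances per intent.
SAMPLE_UTTERANCES = {
    "FlightDelayIntent": [
        "flight delays at atl",
        "are there delays at atl",
        "is atl delayed",
    ],
    "WeatherIntent": [
        "what is the weather at atl",
        "weather at atl",
    ],
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common: 0.0 (none) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve_intent(utterance: str, threshold: float = 0.5) -> tuple[str | None, float]:
    """Score every sample utterance and return (best_intent, best_score).

    The intent is None when even the best match falls below `threshold`,
    i.e. the resolver is not "sure enough" to act.
    """
    heard = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            score = jaccard(heard, tokens(sample))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score
```

Rephrasings that share most of their words with a sample still resolve, while unrelated requests fall below the threshold and come back as `None`; a skill could then ask the user to rephrase.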
So I believe that's why they chose that terminology. It makes sense. And actually, Amazon's corpus, to throw another technology term into the mix, is really a collection of data. Amazon's service offers a large collection of that data, which is ever increasing, by the way. That simplifies the problem of resolving speech and converting it into
what the platform calls an intent, which is really an indication of something that someone would like to do at a given time. Yeah, it's interesting, because if you think about it from a classical computing standpoint, if you wanted your computer to do something specific, let's go back to the DOS days. We're not going to go all the way back. We'll go back to the DOS days because
that's my childhood. You would type in a command, and the computer knew exactly what you wanted to do, as long as the program was installed on your computer, because you were following a very specific protocol that does not vary. That's right, it's always going to be the same. It's a one-to-one thing, and it's
a textual interface, so it's very consistent. Yes, but when you get two different people, just two people, and you want them to ask for the same thing without guiding them in how to ask for it, that's where you start getting into this variability. And even if they're both saying the exact same phrase, if they're from my neck of the woods, there might be a bit of
a drawl right there. If they're up in Maine, it's going to be a different sound. If they're a non-native English speaker, there's going to be an inflection in their voice from whatever their primary language is. So these are all non-trivial problems
in the programming world. Yes, so there's that problem, and then there's also what Amazon refers to as the interaction model, which covers the many variations in how someone could ask for something, and the platform takes
a fuzzy matching approach to solving that problem. Right. So instead of providing an exhaustive list of all the ways someone could ask for airport information, for example, you provide a set of training data as a developer, and the platform uses an artificial intelligence approach to generalize from it and simplify the problem. Okay, yeah, because the first thing I was guessing was, I wonder if it's
going to be probabilistic, one of those things where it assigns a probability: I'm pretty sure this is what they're asking for, so let's go with that. Exactly. Yeah. We've talked about that with other artificial intelligence platforms, like IBM's Watson, which is a very simple example, simple in the sense that it's easy to understand. It's actually a
pretty complicated machine. But when it was playing Jeopardy!, it would never buzz in unless its certainty was above a certain threshold. Once you explain that to people, that's what we mean by probabilistic: a computer determines how quote unquote sure it is that this is the intent of the person giving
the command, and then acts upon it. One other thing I wanted to mention about Alexa skills: we've talked about how you give them voice commands. These commands have their own kind of anatomy. You've got the skill, and you've got how you activate the skill itself, what you call upon in order to make the skill happen. I wonder if you can go into that a little bit, because it might not be something the end user would necessarily think
about, but developers would. Yeah, exactly. It's got its own grab bag of terminology, but it's really not that complicated once you get past those initial things. As a developer coming to the platform, the first thing I asked was, well, what the heck is the name of my app, or skill? I had to update that in my mind. That is called the invocation name on the platform. So what does the invocation name do?
It maps a user's words to your skill. Basically, I think of it as a namespace. I came to Alexa Skills Kit development as a Java developer, and being able to give package names, or namespaces, to classes is very helpful. And really
that's what an invocation name does. For example, in the class we've built at Big Nerd Ranch for Amazon, we've built a skill that gives you information about airports. You say, "Alexa, ask airport info for flight delays at ATL." The words "airport info" are the invocation name, so they initially bring up the skill; they launch it, so to speak. And then you've got
sample utterances. I mentioned that you train the platform. I kind of think of it as a brain that lives in the cloud to do our bidding, resolving the spoken words. These sample utterances are what effectively resolve what someone is asking for to an indication, or intent; in this case, that they'd like airport information. And Amazon has made some really smart
decisions about making that a black box. As a developer, all we do is provide the training data, and on the other side we get an indication of what came out. We're not required to set up a machine learning server or deal with any of those artificial intelligence algorithms that who knows how many engineers and countless hours Amazon has invested in building.
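The anatomy Josh describes (launch phrase, invocation name, sample utterances, intent) can be sketched as a tiny pipeline: strip the wake word, launch word, and invocation name, then match what remains against sample utterances containing slot placeholders such as the airport code. The format below, the `FlightDelayIntent` and `AirportCode` names, and the list of launch words are assumptions for illustration; the real Alexa Skills Kit interaction model is defined through Amazon's developer tools and resolved by Amazon's models, not by regular expressions.

```python
from __future__ import annotations

import re

# Hypothetical skill configuration, modeled loosely on the airport example.
LAUNCH_WORDS = ("ask", "tell", "open")
INVOCATION_NAME = "airport info"

# Hypothetical sample utterances: "IntentName utterance", with {Slot} placeholders.
SAMPLE_UTTERANCES = [
    "FlightDelayIntent flight delays at {AirportCode}",
    "FlightDelayIntent are there delays at {AirportCode}",
]


def split_request(spoken: str) -> str | None:
    """Strip 'Alexa, <launch word> <invocation name> [for|about]' and return the rest.

    Returns None when the phrase does not address this skill's invocation name.
    """
    pattern = (
        r"^alexa,?\s+(?:" + "|".join(LAUNCH_WORDS) + r")\s+"
        + re.escape(INVOCATION_NAME)
        + r"(?:\s+(?:for|about)\b)?\s*(.*)$"
    )
    match = re.match(pattern, spoken.strip(), flags=re.IGNORECASE)
    return match.group(1) if match else None


def utterance_to_regex(template: str) -> re.Pattern[str]:
    """Compile 'delays at {AirportCode}' into a regex with one named group per slot."""
    # With a capturing group, re.split alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else rf"(?P<{part}>[\w ]+?)"
        for i, part in enumerate(parts)
    )
    return re.compile("^" + regex + "$", flags=re.IGNORECASE)


def match_intent(remainder: str) -> tuple[str, dict[str, str]] | None:
    """Return (intent_name, slot_values) for the first sample utterance that matches."""
    for line in SAMPLE_UTTERANCES:
        intent, template = line.split(" ", 1)
        match = utterance_to_regex(template).match(remainder.strip())
        if match:
            return intent, match.groupdict()
    return None
```

Splitting the work this way mirrors the namespace analogy: the invocation name selects the skill, and only then are that one skill's sample utterances consulted.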
But we've got that as a tool to resolve the information down to something our skill service can work with. So with an Alexa skill, you've got a skill interface and a skill service. The skill interface is where this brain I keep mentioning lives, and the skill service is really anything that can speak HTTPS. In our class, we're using Node.js, which is JavaScript on the server side. Everybody has dabbled in a bit of JavaScript here and there anyway, so it's an easy
language for people to pick up. If you've done any web development, you've likely had some JavaScript exposure. So it's really an extension of that, with some additional things like file I/O and being able to write
to a database using Node. With that JavaScript layer, we get events from the skill interface, process those events, and then send a message back using JSON, JavaScript Object Notation, to the device, and she's able to speak our response. It's pretty cool. This is really... well, I'm sorry, Dave, please go ahead. I was just going to
add on to that. We talked before about Alexa translating human language into a form she can use to talk to technology. It's the same when third-party developers enhance what she can understand
through skills. So if you're an Amazon customer with an Echo, and you talk to your Echo using what Josh described, a launch phrase and an invocation name, to launch something like, let's say, Fitbit or Uber or Domino's Easy Order, your voice goes to Amazon. Your voice is
never shared with a third-party developer. Think of Alexa almost as a friend who can not only translate to other pieces of technology but can also talk to all of these third-party developers who have their own technology. So in essence, with Fitbit, when you talk to her, she understands that you're asking
how you did today. Then she goes and talks to Fitbit, and Fitbit says, we have these servers and all this data, and we know you're a customer, so we'll return that; go tell the customer this. Then Alexa tells you the information Fitbit had for you. So in essence, she is translating between those things. And
you mentioned probability before. That's very interesting. As human beings having a conversation, we are constantly making choices like that too. I bring all of my experience to this podcast today. When we're having a conversation, it's based on all the years I've spent having conversations and my understanding of what words actually mean. And a lot of things are simply assumed. For example, we all know what time of day it is where we are, we know
where we live, and we know what country we're in. I know how to stand up, I know what eyes and a nose look like, I know what a computer is, and I knew what you meant when you referenced DOS. A lot of the challenge in machine learning is that a computer doesn't necessarily have that context. So when you're a third-party developer, think of the interaction model Josh described like this: you and I are going to have a discussion around a new
topic today. Maybe it's something to do with fitness, so we both go look at a wiki, or maybe at Reddit, and we define all the terms, and then you and I can have a conversation and actually understand each other. That's basically what you're setting up when you add a skill to Alexa: here are all the terms, and when people ask for it like this in language, here's what I want you to
tell me they asked for. And Dave, you touched upon something early on that I think is important to point out: when you communicate with Alexa, your information goes to Amazon. It's not being propagated across the internet willy-nilly,
being spread everywhere. That also ends up being an important part of the restrictions on the types of skills Amazon will accept for Alexa, one being that there aren't going to be skills aimed specifically at children, because there is a very real concern about privacy and information, particularly for children who may
not fully understand or recognize its importance. So are there other restrictions on the types of skills Amazon is going to accept? I would imagine anything that's against the law is right out the door. Yeah, in the scenario you described, it's because Alexa today has no way to distinguish between different voices. If we had an Echo in the room and I was talking and you were talking, to her it's both
a human being; it's language. She cannot differentiate, so she cannot tell whether it is a child or an adult asking for something. Because of that, and because there are COPPA regulations and everything else that apply to mobile and web today, we want to make sure we're honoring that and protecting folks. As for your second question, whether there's anything else we're not allowing,
I think there are, and the guidelines are very similar to what we already do elsewhere. We have an Android app store today, the Amazon Appstore, with a very similar policy; Amazon has been in that space for years, and we've learned as we've gone along. We just launched an update to the Alexa Skills section that adds categories, and those categories are very similar to what people would expect in a mobile app store:
games, technology, things like that. Content has to be G or PG-13; no R-rated content, nothing like that. No personally identifiable information: we're very careful about that. If you keep asking people all about themselves, why are you asking? You'd better have a privacy policy, and the person has to have allowed you to ask for that information. We do not
share any information about you. We don't share device information, although developers of course keep asking for it; they want to know whether a request is coming from an Echo or a mobile app, because Alexa will run in anything. There are apps for the iPhone and for Android where you can talk to Alexa and get access to all your skills, as well as all of your smart home functionality, using the same familiar Alexa experience. See, and I am glad
you were able to address that. I think that is a very important thing for any platform to do well. It seems counterintuitive, but you do need to have those guidelines and restrictions. If you ever want to see what can happen when you don't, look at some of the dramatic lessons we've learned through things like Microsoft Tay, which was certainly not intended to turn into a big problem. How long
did that last, like a day? It was twenty-four hours, and then they pulled the plug. Yeah, I did a full episode on Microsoft Tay. I'm not going to go back over that, and I'm not going to ask either of you to comment on it, but just to say: if you have a system without those restrictions, well, we are all in many ways like children, and sometimes children want to test boundaries, and
if you find there are no boundaries, problems happen. So I'm in favor of boundaries, personally. That's actually one thing Amazon has done a really good job of: curating the experience. When you submit your skill, they audit it pretty thoroughly and rigorously to make sure it conforms to
good voice user experience guidelines. Well, when you're talking about a device that people think of as listening to them, you obviously have a great responsibility to provide an experience that isn't going to be negative in any way. You have to take a lot of time and effort to make certain you avoid any problems that could come up later down the line. I mean, that's got to
be a top concern for Amazon. Yeah, that was very important for us at Amazon when we created the device. If you're not using an Echo but another device, maybe a mobile app, or something in the car or in a clock radio, all of those devices are push-to-talk. Those devices are never listening
unless you hit a button. If you have an Echo device, you can hit the mute button, and when you do, you see a red ring around the outside letting you know, and we cut power to the microphone as well. Otherwise, Alexa is never listening unless she hears her name. When you say "Alexa," we begin to record your voice, and again, your voice is only sent to Amazon. We do not send it to
third parties. You can also open the Alexa app itself and see every single thing you've ever said to Alexa after saying "Alexa." You have the ability to delete any one of those, or, if you'd like, you can contact us to remove your entire history. So we
do put all the control in the customer's hands. And obviously that was great foresight on the part of Amazon, because you can easily imagine that if you did not build that into your design from the ground up, you would very quickly realize you needed it, and that's not a good feeling.
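The "never listening unless she hears her name" behavior, plus the mute button, amounts to a gate in front of the recording path. Here is a toy Python model of that control flow, assuming the audio has already been turned into words; a real device spots the wake word in raw audio, and the class and field names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WAKE_WORD = "alexa"  # illustrative; the gate only ever looks for this one word


@dataclass
class WakeWordGate:
    """Toy model of wake-word gating: nothing is captured until the wake word is heard."""

    muted: bool = False
    recording: bool = False
    current: list[str] = field(default_factory=list)
    sent_to_cloud: list[str] = field(default_factory=list)

    def hear(self, word: str) -> None:
        if self.muted:
            return  # mirrors cutting power to the microphone: nothing is processed
        if not self.recording:
            # Only the wake word is checked locally; every other word is dropped.
            if word.lower().strip(",.!?") == WAKE_WORD:
                self.recording = True
            return
        self.current.append(word)

    def end_of_utterance(self) -> None:
        """Silence detected: send the captured request, then go back to waiting."""
        if self.recording and self.current:
            self.sent_to_cloud.append(" ".join(self.current))
        self.recording = False
        self.current = []
```

Only the words spoken after the wake word ever leave the gate, which is also why the history in the Alexa app contains only what was said after "Alexa."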
I think in this scenario, because it is very new for people, it's about building up trust. I use the terminology of crawling, then walking, then running. As a technologist I always want to run, but in a space like this it's important to start out crawling, even if that limits what you can actually do with the device. One of the things I feel is a sign of success is
that people come and ask for more: hey, I want Alexa to do more. It's okay, give me permission. I want to control this, I want to control that, and gee, it would be great if she could do this. People are now fine with the fact that they can talk to Alexa, and they want Alexa to do even more. They want Alexa to wake up and start talking to them even if they haven't talked to her, which is something
she doesn't do today. She will never interrupt you or start talking out of nowhere unless you engage first. I think that's a sign of customer trust and of people getting excited about where the technology is headed. That's pretty cool. I also want to say something else that I think is really cool, though you guys and many listeners may disagree. I went back and looked at the blogs written specifically for Alexa developers, which I recommend my listeners go out and
check out those blogs. They are not written in a way that's so dense or technical that they're inaccessible. They are very accessible, and I read quite a few of them before we had this conversation. Here's why I wanted to bring this up: while they are incredibly helpful and technical, one of the examples used in one of the blog posts spoke to something deep within me. It was being used to explain
what a launch phrase and an invocation name are, and it was about using Dungeon Dice for a D20. That was me. Yeah, I'm a hardcore D&D fan from way back. I hung out with Gary Gygax, creator of Dungeons & Dragons. Yeah, edition campaign right now. Actually, if I hold this up to the video, you can see all my D&D manuals right there. They're my originals. I can report that Dave does in fact have a stack of D&D manuals behind him that looks to be about two feet
tall, from first edition all the way through second. Oh, Dave, you and me, man. So that was one of those things. But I like that the examples you guys give are interesting and easy to understand, and they also get outside of the initial reaction I think a lot of people
would have when they hear the word Amazon. Of course, their first thought is going to go toward online shopping, and they're thinking, oh, well, this is going to be an app that just makes it easier for me to buy things. But then you get into something like this, and you're like, well, no. Imagine you've got a table and you don't have a metric ton of dice weighing the table down.
You actually have an app you can call on any time you need, and it becomes incorporated into your game. It's almost like Alexa is playing the game with you, and people start to realize, oh, there's other stuff this can do that isn't related to buying things. I personally think one of the cool decisions Amazon made is treating this like an interface. Implementing that interface for your own purposes, as a developer, is really what the
platform is great at. At work, for example, we had a hackathon recently, and I wrote a service in Elixir that runs on a Raspberry Pi to control the servo locks on one of the doors in our building. I was then able to write a skill that interacts with that Raspberry Pi over the web to open the door when we see a visitor come by.
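Josh's point about treating the platform as an interface, combined with his earlier description of the skill service (events in, a JSON message back), suggests a shape like the one below. It is sketched in Python for brevity, although the course discussed here uses Node.js. The request and response dictionaries follow the general layout of the Alexa Skills Kit JSON format (`request.type`, `intent.name`, `outputSpeech`, `shouldEndSession`) as we understand it and should be checked against Amazon's reference; `DoorSkill`, `OpenDoorIntent`, the `Door` slot, and the `unlock` callback standing in for the HTTPS call to the Raspberry Pi are all hypothetical.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from typing import Any, Callable


def speech_response(text: str, end_session: bool = True) -> dict[str, Any]:
    """Build a reply in the general shape of an Alexa Skills Kit JSON response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


class SkillHandler(ABC):
    """The 'interface' a developer implements: take a parsed request, return a reply."""

    @abstractmethod
    def handle(self, request: dict[str, Any]) -> dict[str, Any]: ...


class DoorSkill(SkillHandler):
    """Hypothetical door-opening skill; `unlock` stands in for an HTTPS call to the Pi."""

    def __init__(self, unlock: Callable[[str], bool]) -> None:
        self.unlock = unlock

    def handle(self, request: dict[str, Any]) -> dict[str, Any]:
        req = request["request"]
        if req["type"] == "LaunchRequest":
            # Skill opened with no specific request: ask a follow-up, keep session open.
            return speech_response("Which door should I open?", end_session=False)
        if req["type"] == "IntentRequest" and req["intent"]["name"] == "OpenDoorIntent":
            slots = req["intent"].get("slots", {})
            door = slots.get("Door", {}).get("value", "front")
            if self.unlock(door):
                return speech_response(f"Opening the {door} door.")
            return speech_response(f"Sorry, I couldn't reach the {door} door.")
        return speech_response("Sorry, I didn't understand that.")


def handle_http_body(skill: SkillHandler, body: str) -> str:
    """What the skill service does per HTTPS POST: JSON in, JSON out."""
    return json.dumps(skill.handle(json.loads(body)))
```

Injecting `unlock` as a plain function keeps the skill testable without a real door, and swapping in an HTTP client later does not change the handler.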
So the platform is totally open to developers. There's something else I wanted to touch on as well, which is mentioned in the blogs and which we've talked about a little: the coding side of this is a server-side service, not a device-side or client-side one. Josh, can you talk a little bit about why that is, for people who might wonder
why all the hard work, the crunching of numbers, if you will, happens in the cloud and not on a dedicated device? Yeah, that's a great question. I think there are
a couple of really solid reasons you can list right away. For one, with the Amazon Echo you'll likely have numerous devices, and because the skill lives in the cloud, enabling it is instantaneous across all of those devices. I think that's one really solid reason
Amazon went with that architecture. The other reason is that keeping everything on its cloud infrastructure allows Amazon to iterate incredibly quickly on the skill interface side of things, without being constrained by the device or worrying about shipping new hardware to make changes over time. Amazon can simply swap out that interface and interaction model pretty much seamlessly, and it works incredibly well.
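The first reason Josh gives, that enabling a skill in the cloud takes effect on every device at once, follows from where the state lives. Here is a hypothetical Python model: devices store nothing but their link to an account, and every request is checked against cloud-side records, so a single write is visible everywhere immediately. The class and method names are invented for this sketch and say nothing about how Amazon actually stores this data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CloudAccount:
    """Hypothetical cloud-side record: devices hold no skill code, only an account link."""

    devices: set[str] = field(default_factory=set)
    enabled_skills: set[str] = field(default_factory=set)


class CloudRegistry:
    def __init__(self) -> None:
        self.accounts: dict[str, CloudAccount] = {}
        self.device_owner: dict[str, str] = {}

    def link_device(self, account_id: str, device_id: str) -> None:
        self.accounts.setdefault(account_id, CloudAccount()).devices.add(device_id)
        self.device_owner[device_id] = account_id

    def enable_skill(self, account_id: str, skill: str) -> None:
        # One write in the cloud; no update is pushed to any device.
        self.accounts.setdefault(account_id, CloudAccount()).enabled_skills.add(skill)

    def can_use(self, device_id: str, skill: str) -> bool:
        # Each request is resolved in the cloud, so every linked device sees the change at once.
        owner = self.device_owner.get(device_id)
        return owner is not None and skill in self.accounts[owner].enabled_skills
```

The same placement explains the second reason: because the interaction model and skill lookup live behind `can_use`-style checks in the cloud, they can change without touching the hardware.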
That's a big part of it. It also lets third-party developers host their own services wherever they want, using whatever technology and language they want; it doesn't have to be on Amazon's cloud. We thought of some pretty interesting things for Alexa, but we knew we really needed to, in essence, crowdsource her
ability to get smarter. There are now over fifteen hundred skills out there from third-party developers who are creating things like what Josh described with controlling doors, putting things in Raspberry Pis, robots and drones, health and life sciences in doctors' offices and hospitals, games, and virtual reality: things we never would have been able to write ourselves, that people have just taken and run with.
And I continue to be impressed every day by what people are doing. Yeah, exactly, Dave. If you're a hardware designer, you can embed Alexa and build your own version of the Echo with extended capabilities. You could add a display, for example; if you have visions of what that could be like, you can embed what's called the Alexa Voice Service in that hardware and actually create your own version
of the Echo, which is incredibly powerful. Interesting. So let's talk a bit now, Josh. You work very closely at Big Nerd Ranch on developing the kind of curriculum that someone who wants to develop for Alexa would find very helpful for actually learning how to code and craft a skill. You've also mentioned that one of the test skills people develop is about getting information about airports and
potential delays. Let's let's kind of, in a layman sense, kind of walk through what is the process in general of of developing for Alexa, and then at the end of it, I think we can do a quick little demonstration. Alexa, by the way, I didn't introduce her. I feel like such a cad I didn't introduce her, but she is also in the studio with us. So Alexa, how are you doing today? Great. And if you're listening, make sure you're muted. Although you were already listening your your echo
has probably gone off several times. My parents are going to send me a very mad email. My parents, by the way, are the ones who asked me to do this podcast, because they own an Echo and they talk to it all the time
and demonstrate it to me all the time. The only reason I haven't picked one up yet is that I've got to wait for my wife to go out of town just long enough for me to get it and incorporate it into everything, so that when she comes back and sees that I bought it, it's so awesome and so integrated that she would never want to get rid of it. Yeah, I do the wife test with technology as well. She resisted at first, but she's slowly warmed up to it and
just accepted it into our home. Step one is to get my wife comfortable with Alexa. Step two is to get her into a D and D campaign. All right, so I come to you, Josh, and I say I am interested in starting to develop for Alexa. What's the process of learning to develop, and the process of actually developing a skill? Yeah. And so you mentioned earlier that you had read our
blog and followed that arc. As you get into the process, there are really a couple of things to get your brain wrapped around. Initially, most developers are going to be coming from a graphical user interface background, right? That's the lay of the land these days. The applications being written, though that's changing, are all graphical, because they're meant
for a smartphone screen or for a laptop or desktop. Yeah. Once the Mac OS and Microsoft Windows really took hold, we started to see other types of UIs disappear, and for a while we just assumed that was the only way you could do things. Yeah, exactly right. However, the paradigm, as we see, is changing. So the first thing to understand is the skill interface portion of what I consider to be a two-part thing.
It's a skill interface and then a skill service. The skill interface, like I said, is responsible for resolving the user's words. The first step of building a skill is defining the interaction model, and I consider that to start with setting an invocation name, which we talked about earlier. It's the name of your skill and how a user is going to address it. So this is so that you can create a distinction between the skill that you're developing and every
other skill that's available with Alexa. Yeah, that's exactly right, and it's going to be unique to the skill that you're building. Amazon will not allow you to use the same name as another skill. That introduces an interesting new domain-squatting type of problem.
I wonder what will become of that. At any rate, in our class, for example, one skill that we teach you to write, which gives you a pretty good cross-section of all the different capabilities of the platform, is the airport info skill. So my first step would be defining the invocation name of "airport info" in Amazon's skill interface. This is literally a web portal that you visit and configure in your web browser. So as
you configure that, the next step you get to is defining the sample utterances that I, as a developer, need to hook into, based on the interaction model's resolution of what someone has just spoken into the device. So this is the kind of magic that we were talking about earlier: the black box that Amazon offers you, which leverages artificial intelligence and machine learning, really the cutting-edge technology there, and we train that model up with
sample utterances. The sample utterances for airport info actually have a couple of different aspects. First, we want to be able to resolve the various ways that someone will ask for information about an airport. Second, we've got to have a way to pass a variable into the skill. For example, I want to be able to determine
if someone said SFO or ATL, right? They're asking about a specific airport code, and I need to be able to put that into a variable. The interaction model has a mechanism for doing that called a slot, which is Amazon's terminology for, basically, a variable assignment. So in that set of sample utterances, we give strings that represent
how a user could ask for airport information. I might say, "Alexa, ask airport info for flight delays at ATL," or "for flight information at ATL," or "for delay status at ATL." You see the variations in the ways that someone could ask for that info. Now, like I said, it's generalized; in other words, it's a fuzzy resolution between those phrases and an intent. An intent is an indication of what someone would like your skill to do.
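To make the invocation name, sample utterances, slot, and intent concrete, here is a minimal Python sketch (the course itself writes its code in Node). The intent name `AirportInfoIntent`, the slot name `AirportCode`, and the slot type are hypothetical, and the dictionary is not the format Amazon's portal actually uses. The `resolve` function is deliberately a toy: it only does exact template matching, whereas the real interaction model performs the fuzzy, machine-learned resolution described here. It also assumes the "Alexa, ask airport info" part has already been handled.

```python
import re

# Hypothetical interaction model for an "airport info" skill. The intent
# name, slot name, and slot type are illustrative placeholders.
INTERACTION_MODEL = {
    "invocation_name": "airport info",
    "intents": {
        "AirportInfoIntent": {
            "slots": {"AirportCode": "AIRPORT_CODE"},  # slot name -> slot type
            "samples": [
                "for flight delays at {AirportCode}",
                "for flight information at {AirportCode}",
                "for delay status at {AirportCode}",
            ],
        },
    },
}


def template_to_regex(template):
    """Compile 'for delays at {AirportCode}' into a regex with a named group."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal words
        else:
            pattern += f"(?P<{part}>.+)"      # slot placeholder
    return re.compile(r"^\s*" + pattern + r"\s*$", re.IGNORECASE)


def resolve(utterance, model=INTERACTION_MODEL):
    """Toy stand-in for Alexa's resolution: exact template matching only.

    Returns (intent_name, slots), or (None, {}) when nothing matches.
    """
    for intent_name, intent in model["intents"].items():
        for template in intent["samples"]:
            match = template_to_regex(template).match(utterance)
            if match:
                # Spoken letters such as "a t l" are normalized to "ATL".
                slots = {name: value.replace(" ", "").upper()
                         for name, value in match.groupdict().items()}
                return intent_name, slots
    return None, {}
```

For example, `resolve("for flight delays at a t l")` returns `("AirportInfoIntent", {"AirportCode": "ATL"})`, while a phrase that fits no sample utterance returns `(None, {})`.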
We want our skill to be able to give us information about an airport. So we set up that sample utterances list in the skill interface and say, okay, the words that fall at this portion of what someone says are going to be dropped into this thing called a slot. And I want to be able to have that as a variable in the second portion of building a skill, which is the skill service.
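Once a slot is filled, the skill interface hands the result to the skill service as a JSON request over HTTPS. Below is a rough Python sketch of such a request body and of pulling the intent name and slot value out of it. The nesting (`request.type`, `request.intent.name`, `request.intent.slots.<name>.value`) is modeled on the Alexa Skills Kit request format; the intent name, slot name, and ID strings are hypothetical placeholders.

```python
import json

# Roughly the shape of an IntentRequest body sent to the skill service.
# The layout is modeled on the Alexa Skills Kit request format; the intent
# and slot names and the ID strings are hypothetical placeholders.
SAMPLE_REQUEST = json.dumps({
    "version": "1.0",
    "session": {"new": True, "sessionId": "example-session-id"},
    "request": {
        "type": "IntentRequest",
        "requestId": "example-request-id",
        "intent": {
            "name": "AirportInfoIntent",
            "slots": {"AirportCode": {"name": "AirportCode", "value": "ATL"}},
        },
    },
})


def parse_intent(raw_body):
    """Return (intent_name, {slot_name: value}) from a JSON request body.

    Non-intent requests (for example a LaunchRequest) return (None, {}).
    """
    body = json.loads(raw_body)
    request = body["request"]
    if request["type"] != "IntentRequest":
        return None, {}
    intent = request["intent"]
    # An unfilled slot can arrive without a "value" key, so use .get().
    slots = {name: slot.get("value")
             for name, slot in intent.get("slots", {}).items()}
    return intent["name"], slots
```

Here `parse_intent(SAMPLE_REQUEST)` yields `("AirportInfoIntent", {"AirportCode": "ATL"})`; a `LaunchRequest`, which carries no intent, yields `(None, {})`.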
So once the skill interface has done that work for me and sent the information about what someone said to the skill service, that's where we're in Node land, or, like Dave said, really any programming language that can speak HTTPS and live on a server. One of the steps that actually happens at that point, too, is that after you've set up everything Josh talked about, which in essence says, "This is what I want to talk about, and here are some examples of me using it in
a sentence," that's where the computer science comes in. That runs in the cloud: we do a bunch of AI and machine learning on top of Amazon Web Services, and we generate a lexicon and a language model, so it's all done ahead of time. Me being a sci-fi geek, it's almost like The Matrix, when Neo says, "Teach me kung fu," and then he's like, "I know kung fu." That's
the part that we do through the portal. So now Alexa knows kung fu, and she can talk to you about kung fu. But you actually have to build something to be able to respond to that, and that's what Josh is talking about when we move on to the Node piece, right? Yeah. So this is the magic black box in the cloud that has the cutting-edge artificial intelligence technology that Amazon offers us on the interaction model side of things, and as
a developer, I provide the training data. When Dave said it prepares things ahead of time, that's literally what happens: it bakes that training data down into something that runs in near real time. That interaction is very, very fast, and there's little to no latency, because they use that sample data to prepare a model ahead of time. It's super cool. We're living in a sci-fi novel right now. I feel
like I'm in Cryptonomicon. Well, it's got to be exciting, too, to be a developer who gets to take advantage of that level of technology without having to build it yourself, to develop on top of something that already has this amazing capability. On the shoulders of giants. I mean, we're leveraging all of that platform and infrastructure and
research that's occurred there. And right now there is a language model out in the cloud that knows all about rolling Dungeons and Dragons dice, which is a reassuring thought, isn't it? I would say that was a critical hit. Good initiative in doing that. I'm terrible. I've lost my saving throw. It gets even cooler: she'll actually make a little scathing comment about you rolling a one. That's part of why I showed it.
It's the big delight of my kids, rolling some dice. And if you hit a twenty, she'll say, "Hey, do you have a vorpal sword? Nice, off with its head." There you go. All right, so you get to this point where you've defined, in code, the action that needs
to happen, and you've built it. So we've provided the training data and the interaction model, and we've said, when you hear words along the lines of our request for airport information, I want you to send an airport info intent as part of a JSON payload to my skill service. And I also want you to wrap up what they said at this portion of that utterance into
a variable that I can use in my skill service. The skill service in our class lives on another Amazon service called Lambda, which is an HTTPS server that spins up and shuts down on demand. It's kind of like Heroku, if anyone here has ever done Ruby on Rails development: Heroku is a real go-to for handling the DevOps side of things. Lambda is that kind of on-demand server platform, and it's what we're using in the
class. On that AWS Lambda instance, we've written Node code that handles parsing the JSON payload that comes across HTTPS from the skill interface, and it says, okay, I've gotten JSON information here, I've received an airport info intent, and I've got a variable of ATL here. With that information I can then do whatever I want with it. I can go make a web request, write to a database, or call off to
another service. And that's actually what we do with airport info: we hit the Federal Aviation Administration's servers and request the status for ATL. Then we build a string, which is what Alexa is going to respond with, and send it back to the skill interface, and that gets forwarded on to the device. So that's the round trip of how an interaction with the
skill would work and what we would do as the developer. So we've got the skill interface, the skill service, and the skill service responding to the interface, which hands it back to the device. Now, for the person who's actually using Alexa, all of that information, while awesome, is not something you have to worry about. If you did have to worry about it, you would not really have a
consumer product in your hands; you'd be a developer. But so you can understand what the end result is, I thought it'd be cool. Josh, would you mind asking Alexa to give an example of that app in action? Yeah, sure. It's a skill, I should say. Don't worry about it; that's a common blunder when you're first coming into this. I think I said "app" once today as well. So we don't have any apps. That's fair, that's fair. That's a good
skill idea. Yeah. All right, so first, let's unmute her. Okay. Alexa, ask airport info for flight delays at ATL. "There is currently no delay at Hartsfield-Jackson Atlanta International." Ladies and gentlemen, this is an amazing day. Not only did we get to hear a voice recognition service work in real time on this show, but there are no flight delays at the airport. Actually, yeah, it's not really calling the airport, it's
calling the FAA. But this is really interesting, again, showing just a very simple application. Obviously, you could ask for lots of different things. Sorry, skill. But you could ask for lots of different things, including: What's the weather going to be? What's the time of day? Can you play my favorite playlist on such and such?
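The round trip Josh walked through (intent in, FAA lookup, spoken string out) can be sketched as a Python Lambda handler, though the course writes this step in Node. `fetch_airport_status` and its fake data table are hypothetical stand-ins for the real web request to the FAA, so no actual endpoint is assumed. The handler uses the conventional `lambda_handler(event, context)` signature, and the returned envelope (`outputSpeech`, `shouldEndSession`) is modeled on the Alexa Skills Kit response format.

```python
# Hypothetical stand-in for the FAA status request. The real skill makes a
# web request; this fake table keeps the sketch offline and deterministic.
FAKE_STATUS = {
    "ATL": {"name": "Hartsfield-Jackson Atlanta International", "delay": False},
    "SFO": {"name": "San Francisco International", "delay": True},
}


def fetch_airport_status(code):
    """Hypothetical helper: status dict for an airport code, or None."""
    return FAKE_STATUS.get(code)


def build_speech(code, status):
    """Turn a status lookup into the sentence Alexa will speak."""
    if status is None:
        return f"Sorry, I couldn't find an airport with the code {code}."
    if status["delay"]:
        return f"There is currently a delay at {status['name']}."
    return f"There is currently no delay at {status['name']}."


def lambda_handler(event, context):
    """AWS Lambda entry point using the usual Python handler signature.

    When Alexa invokes the function, `event` arrives as the already-decoded
    JSON request (a dict), so no manual json.loads() is needed here.
    """
    request = event["request"]
    text = "Ask me about delays at an airport, for example A T L."
    if (request["type"] == "IntentRequest"
            and request["intent"]["name"] == "AirportInfoIntent"):
        slot = request["intent"].get("slots", {}).get("AirportCode", {})
        code = (slot.get("value") or "").replace(" ", "").upper()
        if code:
            text = build_speech(code, fetch_airport_status(code))
    # Response envelope modeled on the Alexa Skills Kit JSON response format.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

Keeping the lookup behind its own function is a design choice: swapping the fake table for a real HTTP call changes one function and leaves the request parsing and response building untouched.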
All of these sorts of things are possible. Essentially, if you come up with an idea that you could do through a computer web browser, and you're able to translate it into an experience that works in a speech role, then it's totally possible. Not only that, but I saw, because I
was curious about this... There are some things we do with computers where, if you ask a question, you might need a little more information, or it might be a little difficult to explain the answer simply in spoken word. I find that all the time doing an audio podcast: it can be kind of challenging to explain a certain
concept without the use of visual aids. I saw that with Alexa you can actually pair the response with a display in an app, where you can have a little more information about whatever the request is. Yeah, that's right. Amazon also provides an additional part of the response from the server called a card.
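A card rides along in the same response as the speech. Here is a minimal Python sketch that adds a `Simple` card (a title plus text) to an Alexa-style response. The field names are modeled on the Alexa Skills Kit response format, while the function name and its fallback of reusing the spoken text as card content are this sketch's own choices.

```python
def build_response(speech, card_title=None, card_content=None, end_session=True):
    """Build an Alexa-style JSON response, optionally carrying a Simple card.

    The outputSpeech is what the device says aloud; the card is what the
    Alexa companion app displays and keeps in its history.
    """
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": end_session,
    }
    if card_title is not None:
        response["card"] = {
            "type": "Simple",
            "title": card_title,
            # Fall back to the spoken text if no separate card text is given.
            "content": card_content if card_content is not None else speech,
        }
    return {"version": "1.0", "response": response}
```

Calling `build_response("There is currently no delay at ATL.", "Airport info: ATL")` produces a response whose card repeats the spoken sentence; passing `card_content` lets the card carry more detail than would be comfortable to hear read aloud.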
So if we send that information on to the skill interface, we can preserve or send additional info to the Alexa companion app, which is in your browser and on your phone, on the Android and iOS clients that interface with it. We'd have a history and additional info as well that we could display with that card mechanism. Right, that's a great way
of getting around what could be a real challenge. Because we've designed so much of our interaction to be primarily a visual experience, there are certain tasks that don't translate as easily into something that's audio-based. Yeah, it's a really interesting aspect of voice user interface design as well. It's such an ephemeral format: it's here and then it's gone, and that's an interesting problem coming from the
graphical user interface background that everybody's got. I think being able to hook a little bit into the GUI side of things, persist data, and allow people to refer back to it gives your skill legs beyond just that one interaction. For situational experiences and on-demand info, voice is great, but being able to carry it beyond that is also one of the things the platform offers. That's fantastic. Now I have a question for both of you.
I'm going to start wrapping this up, because I feel like we've got a nice foundation for a discussion here, and I think it's really enlightening to understand not just the challenges but also the potential benefits of this kind of technology. My question to both of you, and it's a pretty simple one, is: what is your personal favorite skill that you've had a chance to play with on Alexa? Dave, I'm going to
have you go first. Yeah. So, besides using the dice with my kids, the funniest one, and this speaks a little bit to my maturity level, is a skill I have enabled called "for a fart." You can ask for a fart, and you will get one through Alexa. I have Dots upstairs in my house, one in each of my kids' rooms, and then I have a full Echo in
my bedroom. There's actually a place in that upper hallway where I can say, "Alexa, ask for a fart," and a musical symphony comes back from across the entire upstairs, and I immediately hear, "Dad, not again." A nice chorus followed by groans. I can appreciate that as someone who loves puns and anything groan-worthy. Yeah, that's fantastic. All right, so Josh, your answer? That's
an interesting one. Well, I have to say, on the pragmatic side of things, I have used the Lyft skill to great effect. You just say, "Alexa, ask Lyft for a ride," and guess what? A car shows up at your door to pick you up. Amazing, and it's such a good fit for the platform as well. You're walking out the door, and you say, I don't want to get my phone out, let's have a car show up right now. And it happens.
That's very convenient. And since we talked a little bit about D and D: if you ever played anything that goes all the way back to the old bulletin board system days, like Space Empire, there is a skill called Star Lanes by Joe Jo Quinta that basically lets you do that. It's a multiplayer online Space Empire game through Alexa, and that is one
of my other favorites. I would highly recommend checking it out if you've ever played any of those games in the past. That could be quite addictive. That's really cool, and it really does speak to the potential for things that we can't even necessarily anticipate right now. It could end up being something that people talk about for
a little while, like, "Oh, that's really clever, a really neat use of the technology," or it could be truly transformative, to the point where we didn't think about it before and now we can't imagine life without it. That's the cool promise of this kind of tech: it stands to be really disruptive for a type of technology that's been fairly set in its ways for the last several decades. I love seeing this. I love
the discussions about machine learning and artificial intelligence. I love the discussions about natural language and the challenges we face when we try to create interfaces that can accept natural language as an input. Guys, thank you so much for joining me on TechStuff. Dave, thank you. Josh, thank you. I really hope you enjoyed your time here. Dave, I know you're not actually in the studio, but I'll pick up my laptop and show you around in a minute.
Thank you. Yeah, I really enjoyed being here today and having this conversation. Thank you so much for having me on the show. Absolutely. Yeah, thanks for having us, Jonathan. It was a really stimulating conversation. Yeah. And guys, like I said, if you want to learn more about developing for Alexa, or you just want to get a better idea of what's going on when you're using a device or a service with Alexa incorporated into
it and you're wondering what's actually happening, those blog posts are really accessible. I've read a lot of developer blogs over the last ten years, and these are written in a way that is much easier to understand, even if you're coming in purely out of curiosity, than some of the other ones I've encountered. That makes me feel good. And, not that I'm anything more than
a budding podcaster, but just last week I actually launched a podcast along the same lines, just talking about Alexa,
similar to the learning in the blog posts. The first episode aired with Charlie Kindel, who runs Alexa Smart Home, so it's me asking him what that team set out to accomplish. If your listeners are interested in checking it out, it's bit.ly slash Alexa Dev Chat, or just look for Alexa Dev Chat on iTunes, Stitcher, TuneIn, or any of the other podcatchers. Yeah, I should also mention that the video series for learning Alexa Skills Kit development
just rolled out. At least the first two videos have rolled out, and I believe Amazon is planning on rolling all of them out in the next few days, so those are going to be available online. If you search for Big Nerd Ranch Amazon training videos, you should be able to find them that way. Fantastic. Guys, thank you again. And listeners out there, if you want to get in touch with me, whether you have a suggestion for a future episode, you've got a follow-up question,
you want me to run down Dave or Josh and ask them something, or you just want to say hi, you can email me. The address is tech stuff at how stuff works dot com. Or drop me a line on Twitter or Facebook; the handle for both of those is TechStuffHSW. I'll talk to you again really soon. For more on this and thousands of other topics, visit how stuff works dot com.
