Effective Conversational AI: Chatbots that work

Speaker 1

00:00

Welcome to the Deep Dive, the show where we try to cut through the noise and get you truly well informed.

Speaker 2

00:06

Glad to be here.

Speaker 1

00:07

So today we're plunging into a topic we've all encountered and, let's be honest, sometimes really, really disliked: the chatbot. Oh yeah. You know the ones: they just don't understand a single word you say, or they send you in these endless circles, or, you know, make you desperately mash that "speak to a human" button.

Speaker 2

00:25

It's such a universal pain point, isn't it. And it really spotlights a critical challenge. Yeah, how do we build AI that actually understands us? Yeah, and you know helps that's.

Speaker 1

00:35

The core question, exactly. So for this Deep Dive, we're unpacking the secrets behind creating genuinely, well, delightful AI interactions.

Speaker 2

00:44

Hopefully. We're pulling insights from some really interesting new research, Effective Conversational AI: Chatbots that work, by Eniko Rozsa, Andrew Freed, and Cari Jacobs. It just came out in 2025.

Speaker 1

00:54

Yeah, and our mission today is basically to reveal why some bots succeed where others just spectacularly fail, and also how the newest advancements in AI are truly changing the game for the better.

Speaker 2

01:05

Think of this as your shortcut maybe to understanding how to build or even just identify a truly effective conversational AI.

Speaker 1

01:14

Okay, let's get into it then, So let's.

Speaker 2

01:16

Maybe start with a clear definition for you. Conversational AI. It's essentially a set of technologies designed to mimic human interaction or sometimes even replace it using natural language.

Speaker 1

01:28

Right.

Speaker 2

01:28

It goes by lots of names chatbots, virtual agents, AI assistants, sometimes even digital.

Speaker 1

01:33

Employees, digital employees, huh okay.

Speaker 2

01:36

And you mostly see it used for automating customer service, powering voice assistants like Alexa or Siri, and even sometimes pre-screening interactions before they actually go to a human.

Speaker 1

01:47

So it's way more than just that little chat window that pops up on a website. It's kind of everywhere, it really is, and the book breaks these down into I think three main functional categories.

Speaker 2

01:56

Is that right? Precisely, yeah. First, you have your question-answering bot. People often call them FAQ bots. They're designed to give direct responses to pretty simple factual questions, like "When are you open?" or "Where are you located?" No follow-up needed, really. They just spit out the information.

Speaker 1

02:14

So these are the quick hit ones. Get in, get the answer, get out. Is there like a common mistake people make when they're designing just these simple bots.

Speaker 2

02:22

Well, I think the main pitfall is underestimating the sheer variety of ways users might ask the same simple question. You know that mismatch leads straight to misunderstanding.

Speaker 1

02:32

Ah, right, makes sense.

Speaker 2

02:34

Then you have the process oriented or transactional solutions. Now, these are designed to guide users through a series of steps to actually achieve a specific goal.

Speaker 1

02:44

Like booking an appointment or checking an account balance, maybe.

Speaker 2

02:47

Exactly, checking an account balance, booking something. They might collect info for someone else to handle later, or sometimes they can even execute the transaction right then and there.

Speaker 1

02:55

Okay, And the last category, what's that?

Speaker 2

02:57

That's the routing agent. Its whole job, basically, is to figure out where to send you next. So like a dispatcher, kinda? Yeah, either to another, more specialized bot or, you know, when it's necessary, hand you off to a human agent.

Speaker 1

03:10

Okay.

Speaker 2

03:11

And what's fascinating here, and the book points this out, is that many real-world AI solutions are actually a clever mix of all three. Oh, interesting. Like how? Well, think of a retail banking chatbot. It might answer FAQs about bank hours, right, but it could also guide you through opening a new account, that's transactional, and then route you to a human specialist for something complex like fraud reporting.

Speaker 1

03:35

Right right. It blends the functions that really paints a clear picture of how versatile these things can be.

Speaker 2

03:40

Yeah, when they work well.

Speaker 1

03:43

So given this sort of intricate blend of categories, how does this sophisticated dance actually happen? Behind the scenes? The book describes this fascinating three step process.

Speaker 2

03:53

It is quite elegant when you break it down. But here's the real insight. I think the brilliance of a truly effective bot it lies in the seamless execution of these three fundamental steps. If any one of them falters, the whole experience just kind of collapses.

Speaker 1

04:07

Okay, so what's step one?

Speaker 2

04:09

Step one, the bot has to figure out what the user actually wants. This is done using natural language understanding, or NLU.

Speaker 1

04:17

NLU got it.

Speaker 2

04:18

Often this uses a machine learning text classifier. Think of it like an AI that learns to categorize text, maybe like sorting your emails into urgent or promotion. It uses that to figure out the user's intent.
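
A minimal sketch of the intent step described above, assuming a toy bag-of-words similarity scorer in place of a real trained text classifier. The intent names, sample utterances, and the 0.35 threshold are all hypothetical and chosen only for illustration; the fallback branch is what produces the "sorry, I'm not sure what you're asking" reply.

```python
import math
from collections import Counter

# Hypothetical training examples: intent name -> sample user utterances.
TRAINING = {
    "reset_password": [
        "i forgot my password",
        "reset my password please",
        "cant log in need new password",
    ],
    "find_store": [
        "where is the nearest store",
        "find a store near me",
        "store locations",
    ],
}
FALLBACK = "fallback"


def tokens(text):
    """Lowercase whitespace tokenizer (no punctuation handling in this sketch)."""
    return text.lower().split()


def train(examples):
    """Build one word-count vector per intent from its sample utterances."""
    return {
        intent: Counter(t for utterance in utterances for t in tokens(utterance))
        for intent, utterances in examples.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(model, utterance, threshold=0.35):
    """Return the best-scoring intent, or the fallback if nothing scores well."""
    query = Counter(tokens(utterance))
    best_intent, best_score = max(
        ((intent, cosine(query, vector)) for intent, vector in model.items()),
        key=lambda pair: pair[1],
    )
    return best_intent if best_score >= threshold else FALLBACK


model = train(TRAINING)
print(classify(model, "I need to reset my password"))
print(classify(model, "tell me a joke"))
```

A production system would swap the scorer for a trained classifier, but the shape stays the same: score every intent, pick the best, and fall back when confidence is low.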

Speaker 1

04:33

Okay, so intent when I type something? The first challenge is the bot figuring out what I'm actually trying to do, like distinguishing between me wanting to reset my password versus say, find a store.

Speaker 2

04:43

Exactly that, you nailed it, that's the intent. Step two: once it thinks it knows the intent, the bot needs to gather any extra information it needs to actually fulfill that want. Okay. So a dialogue engine will ask clarifying questions, and it might use something called orchestration layers to interact with other systems through APIs.

Speaker 1

05:01

APIs right like ways for computer systems to talk to each other.

Speaker 2

05:04

Precisely, it's the bot's way of securely talking to other databases or services to pull the specific details it needs, like your account info or whatever.

Speaker 1

05:12

Okay, intent figured out, info gathered? What's step three?

Speaker 2

05:15

Step three? Give the user what they want, simple as that ideally, whether that's fulfilling their request directly providing the information, or connecting them to a human agent.

Speaker 1

05:26

And throughout this whole thing.

Speaker 2

05:27

The critical takeaway, and the book really emphasizes this is it must be quick, easy, and crucially follow ethical guidelines.

Speaker 1

05:36

Ethical guidelines like what specifically like.

Speaker 2

05:38

Handling sensitive information securely, and a big one never ever pretending the AI is actually a human. Transparency is key.

Speaker 1

05:46

Okay, it sounds so logical laid out like that, Yet, as we said, for so many of us, the actual experience with conversational AI causes so much pain. Yeah, the book points out those classic frustrations, Right, the bot didn't understand the thing I said, or you get that robot voice initiating some totally confusing dialogue, or you just immediately hit the button to talk to a person.

Speaker 2

06:05

We've all been there.

Speaker 1

06:06

It really begs the question, what exactly causes this weak understanding? Why are they so often bad?

Speaker 2

06:12

Well, weak understanding shows up in several frustrating ways, right? The chatbot gives you the wrong answers, or it uses that fallback intent way too much, you know, the "Sorry, I'm not sure what you're asking" message. Yes. Or you see frequent escalations to human agents, declining user engagement over time, people just giving up and leaving, increasing abandonment rates.

Speaker 1

06:36

So if users are constantly being asked to rephrase or the bot just gives totally irrelevant responses, that's a dead giveaway.

Speaker 2

06:43

Absolutely clear sign the understanding just isn't.

Speaker 1

06:45

There, So what's behind it? Is it just like bad luck or is there something fundamentally limited?

Speaker 2

06:51

No, not usually bad luck. The book identifies a few really common culprits, and the insight here is that these are often design failures or sometimes maintenance failures, things that could have been prevented. Okay, like what? Well, one is manufactured training data, so examples that don't truly reflect how real users actually speak or type.

Speaker 1

07:09

Right, if you train it on perfect grammar, but people use slang or type fragments.

Speaker 2

07:14

Exactly, the bot's going to fail. Another big one is insufficient scope, or gaps in topic coverage. Basically, the bot just doesn't know enough about the things users are asking about, like.

Speaker 1

07:25

That Meti World pharma bot example, during the vaccine rollout.

Speaker 2

07:28

Perfect example, yeah. Initially it could handle general COVID-19 questions fine, but when people started asking about, you know, vaccine eligibility or booking appointments, the bot was totally stumped.

Speaker 1

07:39

Because it hadn't been updated. The world changed faster than the.

Speaker 2

07:42

Bot. Exactly, which highlights another cause: new information that the bot hasn't been taught. And the fourth one, which can often be the trickiest to sort out, is a lack of vetting or proper gatekeeping around changes.

Speaker 1

07:54

What do you mean by that? Like too many cooks in the kitchen?

Speaker 2

07:56

Kind of. Untested changes or updates made by team members who aren't familiar with the whole system can accidentally introduce duplication, or create conflicts between different intents, or mess up the balance of the training data.

Speaker 1

08:08

Wow.

Speaker 2

08:09

Yeah. The book mentions a client where they saw their classifier's accuracy just plummet from around eighty percent down to like fifty-five percent over time.

Speaker 1

08:17

Fifty five percent that's barely better than guessing for some things, right, And.

Speaker 2

08:21

It was because of all these unvetted changes piling up. The insight here is that building an effective chatbot isn't a one and done thing. It needs really diligent processes to stop that kind of entropy from creeping in.

Speaker 1

08:35

That's a huge drop. So how do we actually measure this understanding for traditional AI to stop that kind of decline from happening.

Speaker 2

08:42

Well, for traditional classification based AI, we rely on a few core metrics accuracy, precision, and recall.

Speaker 1

08:49

Okay, break those down for us accuracy seems straightforward.

Speaker 2

08:52

Accuracy is yeah, basically the overall percentage of correct predictions the bot makes simple enough. Recall is the bot's ability to identify the correct intent. Think of it as catching all the relevant questions for a specific topic.

Speaker 1

09:04

So if recall is low, it means.

Speaker 2

09:06

The bot is missing a lot of relevant questions. Like the example in the book: if a #login_issue intent had a really low recall, maybe 0.44, it means the bot missed more than half the questions that were actually about login problems.

Speaker 1

09:18

Ouch, okay, and precision.

Speaker 2

09:20

Precision, on the other hand, is the bot's ability to avoid giving a wrong intent. So if precision is low, your bot might be confidently misunderstanding.

Speaker 1

09:31

Users, which might be even worse.

Speaker 2

09:33

It can be, yeah, more frustrating than the bot just saying "I don't know." So the real insight here is how critical it is to balance both precision and recall. Sometimes improving one can actually hurt the other, so you need to watch both.
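
The three metrics discussed above can be sketched in a few lines. This is an illustrative calculation over made-up labels, not data from the book: `golden` holds the human-annotated correct intent for each message and `predicted` holds what a hypothetical bot returned.

```python
def accuracy(golden, predicted):
    """Share of all messages where the predicted intent matches the golden one."""
    return sum(g == p for g, p in zip(golden, predicted)) / len(golden)


def precision_recall(golden, predicted, intent):
    """Precision and recall for one intent.

    precision: of the messages the bot labeled `intent`, how many really were.
    recall:    of the messages that really were `intent`, how many the bot caught.
    """
    pairs = list(zip(golden, predicted))
    true_pos = sum(1 for g, p in pairs if g == intent and p == intent)
    false_pos = sum(1 for g, p in pairs if g != intent and p == intent)
    false_neg = sum(1 for g, p in pairs if g == intent and p != intent)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up evaluation set: five login questions, three store questions.
golden = ["login_issue"] * 5 + ["find_store"] * 3
predicted = [
    "login_issue", "login_issue", "find_store", "fallback", "fallback",
    "find_store", "find_store", "login_issue",
]

p, r = precision_recall(golden, predicted, "login_issue")
print(f"accuracy={accuracy(golden, predicted):.2f} precision={p:.2f} recall={r:.2f}")
```

Here the bot catches only two of the five login questions (recall 0.40) and one of its three "login_issue" predictions is wrong (precision about 0.67), which is why both numbers need watching per intent rather than accuracy alone.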

Speaker 1

09:47

That makes sense. It's a balancing act. So we're talking rigorous measurement. But how do we actually test this in a way that reflects the real world? You mentioned k-fold cross-validation or blind testing.

Speaker 2

09:58

Yeah, those are standard methods, and AI-generated data can be useful for blind testing, especially when you're just starting out. But what's really fascinating, and the book highlights this, is that the most reliable, least biased testing data, it comes from representative.

Speaker 1

10:12

Production logs, meaning the actual conversations people have had with the bot exactly.

Speaker 2

10:17

These logs show what users actually ask and precisely how they phrase it. It gives you the truest measure of how the bot performs in the wild.

Speaker 1

10:25

But that sounds like it requires a lot of work to go through and label correctly.

Speaker 2

10:29

It often does. It frequently requires careful, sometimes even manual annotation by humans to identify what the golden, or correct, intent should have been for each user message. But the insight is clear: real user data is the gold standard for testing.

Speaker 1

10:46

It sounds like incredibly diligent work making sure the bot's brain is truly learning the right lessons from real interactions.

Speaker 2

10:52

It is.

Speaker 1

10:53

Yeah, just as we're learning how to really fine-tune these traditional systems, there's been this monumental shift in AI that's just completely changing the rules of the game for chatbots. Oh, absolutely. Which brings us to the real game changer: generative AI. How is this revolutionizing the very nature of conversational interaction?

Speaker 2

11:11

Right, generative AI. It's kind of a blanket term, really, for AI that's powered by these foundation models. So specifically, we're usually talking about large language models, or LLMs.

Speaker 1

11:19

LLMs. We hear that term everywhere.

Speaker 2

11:22

Now, Yeah, think of them as these incredibly vast machine learning models. They've been trained on well basically all the Internet's text, or huge chunks of it anyway.

Speaker 1

11:32

Okay, and how do they work? Fundamentally?

Speaker 2

11:34

Their core function essentially is to predict the next word in a sequence, and because they're trained on so much text, they get incredibly good at it, good enough to generate everything from coherent paragraphs to you know, entire pages of texts that sound remarkably human.

Speaker 1

11:50

Wow. Okay, so how do these incredibly powerful LLMs help solve those common chatbot pain points we just talked about, the ones that make us want to, you know, throw our phones?

Speaker 2

12:01

They offer potential solutions across the board, really. For that weak understanding problem, LLMs can help train much stronger traditional intents or, and this is a big one, they can even entirely replace traditional intent recognition using something called retrieval-augmented generation, or RAG.

Speaker 1

12:19

Okay, we'll definitely need to dive.

Speaker 2

12:20

Into that. Yeah, we will. But the point is, LLMs are just far more adaptive to nuance in all the varied ways people phrase things.

Speaker 1

12:27

And what about the complexity issue bots getting too confusing?

Speaker 2

12:30

They can help there too. LLMs can assist in writing simpler, clearer dialogue for the bot, or they can even be used to test dialogue flows for unexpected complexity before you deploy them.

Speaker 1

12:42

Okay, and the immediate opt outs people just giving up right away.

Speaker 2

12:45

Generative AI can help write much more engaging, maybe even more empathetic prose for the bot's messages. Setting a better tone right from the start can make a huge difference in making the user feel heard and willing to continue.

Speaker 1

12:59

So they're not just for the end user experience, but they're also tools for the people actually building the bots. I saw a table in the source about key applications exactly.

Speaker 2

13:08

LLMs have both consumer-facing applications, like generating answers using RAG, which we mentioned, or maybe summarizing long conversation transcripts for human agents who take over a call. It's useful, yeah, hugely. And then they have powerful build-assistant tasks. They can help copy edit or even write dialogue flows from scratch, or they can augment training data for the human builders, which is just a massive time saver.

Speaker 1

13:30

But with all that power, especially if they're trained on all the Internet's text, there must be some pretty significant danger, some pitfalls we need to watch out for. Oh.

Speaker 2

13:39

Absolutely, that's a critical point. The Internet, as we all know, is full of bias, hateful speech, misinformation, you name it, and LLMs can unfortunately learn from all of that. So guardrails are absolutely crucial, non-negotiable.

Speaker 1

13:56

Really, what kind of guardrails are we talking about?

Speaker 2

13:58

Things like content filters, but also process guardrails, like a beforehand review process that means the LLM might assist a human maybe drafting your response, but the human always has the final say and is ultimately responsible for the output.

Speaker 1

14:12

Human in the loop exactly.

Speaker 2

14:14

Yeah, and perhaps most importantly, grounding the LLM's output in your own company's verified, accurate documents through RAG. That stops it from just pulling answers from.

Speaker 1

14:24

The wild web, like that unforgettable Canadian Airline chatbot example that went viral.

Speaker 2

14:29

Precisely, that case is legendary now. Their chatbot offered a bereavement discount that didn't actually exist, based on some information it hallucinated or pulled incorrectly.

Speaker 1

14:40

And the airline tried to argue the bot was separate.

Speaker 2

14:42

They tried to argue it was a separate legal entity, if you can believe it. The court strongly disagreed and made them honor the discount. Wow. It really underscores a critical insight: companies are responsible for what their bots say, which highlights the absolute necessity of these guardrails, especially ones like RAG, to ensure accuracy and, frankly, avoid legal nightmares.

Speaker 1

15:06

Hmm, that's a very expensive lesson, and it really drives home the need for proper implementation. So let's talk more about this RAG, retrieval-augmented generation. You said it's a big part of solving the weak understanding problem, especially for those less common, more specific queries.

Speaker 2

15:21

It absolutely is traditional intent based systems. They really struggle with what's called the long tail problem.

Speaker 1

15:26

Long tail.

Speaker 2

15:27

Yeah, think about it. Imagine trying to write a specific rule or train an intent for every single possible question someone could ask about your products or services. It's an impossible task, right, there's a long tail of very specific, infrequent questions.

Speaker 1

15:42

Right, you can cover the common stuff, but not everything exactly.

Speaker 2

15:46

So when users ask questions that deviate from those pre defined intents you did train, or questions that are simply too uncommon to have specific training data for, the traditional bot just breaks down. It throws up its hands because it has no rule to follow.

Speaker 1

16:01

Okay, so RAG is the answer to that long tail problem. How does it handle it differently, say, compared to just adding a search function to the chatbot?

Speaker 2

16:09

That's a great comparison. Let's think about traditional search within a chatbot first. It would work kind of like the pharma bot example before RAG. It finds relevant documents or passages, maybe on ibuprofen and blood pressure. Okay. The benefits are clear: you get a breadth of information, it's relatively easy to maintain, just add or edit your documents, and search technology is well established. But the downsides, the drawbacks, are significant for

16:32

the user experience. It often just returns links or maybe short snippets of text. It forces the user to click through, read, and piece together the answer themselves, which is really frustrating.

Speaker 1

16:43

You ask the bot for an answer, not homework.

Speaker 2

16:45

Exactly, and it's particularly bad for voice interactions. You can't exactly click a link when you're talking to a voice assistant.

Speaker 1

16:51

Good point. So how does RAG improve on that?

Speaker 2

16:54

This is where retrieval-augmented generation really shines. It offers a truly powerful leap forward. RAG combines that search-based retrieval step with the power of generative models, the LLMs.

Speaker 1

17:07

Okay, so it searches and generates? Precisely.

Speaker 2

17:09

The insight here is that it first retrieves the most relevant passages from your own verified knowledge base, your documents, your website content, whatever you feed it, and then the LLM takes those retrieved passages and synthesizes them into a cohesive, contextually aware answer in natural language.

Speaker 1

17:26

Ah, so instead of just giving me links about ibuprofen and blood pressure, the pharma bot with RAG would actually read those relevant bits and then write me a clear, single summary answer.

Speaker 2

17:35

Exactly, and crucially, that answer is grounded in that verified source information you provided.

Speaker 1

17:41

Grounded That seems like the key word it really is.

Speaker 2

17:45

It means the answer is based on your accurate, up-to-date data, not just the LLM's general knowledge scraped from the internet years ago. This dramatically expands the bot's versatility. It can answer way more questions far more accurately, and it significantly reduces those "bot doesn't understand" and "too much complexity" pain points.

Speaker 1

18:04

Okay, that sounds amazing. How is it actually implemented behind the scenes? It sounds potentially quite complex.

Speaker 2

18:10

It involves a few pretty fascinating steps. First, your large documents, think manuals, website pages, knowledge base articles, are broken down, or chunked, into smaller, manageable pieces, maybe paragraphs or logical sections. Chunking, got it. Then an AI model called an embedding model converts these chunks into numerical representations. We call these embeddings.

Speaker 1

18:31

Numerical representations like coordinates on a map.

Speaker 2

18:34

Kind of, yeah. It's like creating a unique numeric fingerprint for the meaning of each piece of text. Texts with similar meanings end up with similar fingerprints, or closer coordinates in this high-dimensional space. Whoa. These embeddings, these fingerprints, are then stored in a special kind of database called a vector database. Think of it as a super fast, intelligent library index that doesn't just look for keywords, but for semantic similarity, for similar meaning.

Speaker 1

19:00

Okay, so you've indexed all your chunked documents by their meaning. What happens when I ask a question?

Speaker 2

19:06

Right, at runtime, when you ask something, your question is also turned into an embedding using the same model. The system then searches the vector database to find the chunks whose embeddings are closest, meaning most semantically similar, to your question's embedding. So it.

Speaker 1

19:20

Finds the most relevant paragraphs based.

Speaker 2

19:22

On meaning exactly, and then those retrieved passages are fed to the LLM along with your original question with instructions like answer the user's question based only on this provided information. The LLM then synthesizes the final grounded answer.
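
The chunk, embed, retrieve, and prompt steps described above can be sketched end to end. This is a toy under loud assumptions: `embed` is a word-count stand-in for a real embedding model, a plain Python list stands in for the vector database, the sample document is invented, and the final LLM call is left out, so the sketch stops at building the grounded prompt.

```python
import math
import re
from collections import Counter


def chunk(document):
    """Split a document into paragraph-sized chunks on blank lines."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents):
    """Stand-in for a vector database: a list of (chunk, embedding) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index, question, k=2):
    """Return the k chunks whose embeddings are closest to the question's."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Combine retrieved passages and the question into a grounded prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the user's question based only on this provided information.\n"
        f"Information:\n{context}\n"
        f"Question: {question}"
    )


# Invented knowledge-base document with two paragraphs.
DOCS = [
    "Stores are open 9 am to 5 pm on weekdays.\n\n"
    "Returns are accepted within 30 days with a receipt."
]

index = build_index(DOCS)
question = "When are the stores open"
passages = retrieve(index, question, k=1)
print(build_prompt(question, passages))
```

The printed prompt is what would be sent to the LLM; because it carries only the retrieved passage, the generated answer stays tied to the indexed documents rather than to the model's general training data.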

Speaker 1

19:37

Wow, that's a lot of intricate steps: chunking, embeddings, vector databases, retrieval, synthesis. So for you, the listener, who might be thinking, okay, if the LLMs are so powerful, why not just ask the LLM directly? Why bother with all this RAG stuff?

Speaker 2

19:52

That's a really crucial question, and the reason is simple: control and reliability. LLMs used on their own.

Speaker 1

19:59

Can hallucinate, hallucinate, make things up.

Speaker 2

20:01

Yeah, literally make up facts, or provide outdated information because their training data isn't perfectly current. Remember, they're trained on a massive general data set, but they don't inherently know the specific, up-to-the-minute details of your company's policies or product features, and you have much less direct control over the answers they generate.

Speaker 1

20:20

Okay, So RAG fixes that.

Speaker 2

20:22

RAG directly addresses this. By grounding the LLM's answers in your specific, verified, up-to-date documents, you ensure accuracy and reliability. You maintain control over the knowledge source. It's about combining that amazing generative power of the LLM with controlled, accurate, trustworthy knowledge.

Speaker 1

20:39

That makes perfect sense. It's like giving the LLM guardrails made of your own information.

Speaker 2

20:43

That's a great way to put it.

Speaker 1

20:44

Okay, so you've built this amazing bot. Maybe it uses RAG, maybe it has finely tuned intents, it's got guardrails. You're done, right? Set it and forget it.

Speaker 2

20:53

Oh, if only. Wouldn't that be nice? No, conversational AI is definitely not static. It can't be. Why not? Which leads us straight to the critical insight of continuous improvement. Think about it. User needs change constantly, business rules and policies evolve, new technologies like generative AI itself emerge, and better AI models become available all the time.

Speaker 1

21:16

So if you don't keep up, the bot just gets worse over time.

Speaker 2

21:19

Absolutely, performance inevitably degrades if you don't actively maintain and improve it. The book calls it fighting entropy in a constantly changing environment. You have to keep investing just to stay level, let alone get better.

Speaker 1

21:31

Okay, So what does this essential continuous improvement cycle look like in practice? Is there a process?

Speaker 2

21:37

Yes, it's an iterative process, a constant loop of refinement. Really, you first measure the baseline performance of your system. Get your starting.

Speaker 1

21:44

Point, okay, establish the baseline.

Speaker 2

21:45

Then you identify a problem: what's not working well? And ideally you connect that problem directly to a business metric. Not just "the bot is confused," but maybe "too many calls about X are transferring to an agent" or "customer satisfaction scores are declining for Y reason."

Speaker 1

22:02

Make it concrete and business relevant?

Speaker 2

22:04

Got it. Exactly. Next, you devise a solution. What's your plan to fix it? Then you develop and deliver those changes. And crucially, the book advises making small, iterative changes rather than huge, big-bang updates. Right, small changes. Because they have a smaller blast zone. If something goes wrong, it's easier to roll back a small change that causes problems than a massive overhaul. Makes sense, safer. And the last step? Finally, you monitor and evaluate: did the changes

22:32

actually deliver the improvements you expected? Did that metric you were tracking actually move in the right direction? Then you start the loop again measure, identify, solve, deliver, monitor.

Speaker 1

22:42

And it's all about connecting this technical work back to actual business value, isn't it You mentioned metrics. It can't just be tech jargon for its own.

Speaker 2

22:49

Sake, absolutely not. You have to speak the stakeholder's language. That means focusing on things like cost reduction.

Speaker 1

22:56

How does a better bot reduce costs? Several.

Speaker 2

22:59

Ways through containment that's completing calls or interactions without any human involvement, by reducing average handle time or AHT for the human agents who do get involved because maybe the bot gathered info better, or by reducing human touches which means fewer calls getting routed to the wrong place and needing another transfer.
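
As a rough illustration of the cost metrics just named, the sketch below computes a containment rate and an average handle time from a handful of invented interaction records. The `Interaction` fields are hypothetical and not a schema from the book; real contact-center logs will look different.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    reached_human: bool      # False means the bot completed it alone (contained)
    agent_seconds: int = 0   # time a human agent spent on it, 0 if contained
    transfers: int = 0       # extra agent-to-agent transfers after the handoff


def containment_rate(logs):
    """Share of interactions completed without any human involvement."""
    return sum(not i.reached_human for i in logs) / len(logs)


def average_handle_time(logs):
    """Mean agent time (seconds) over interactions that reached a human."""
    handled = [i.agent_seconds for i in logs if i.reached_human]
    return sum(handled) / len(handled) if handled else 0.0


def misroute_rate(logs):
    """Share of escalated interactions that needed at least one more transfer."""
    escalated = [i for i in logs if i.reached_human]
    if not escalated:
        return 0.0
    return sum(i.transfers > 0 for i in escalated) / len(escalated)


# Invented sample: two contained chats, two escalations (one misrouted).
logs = [
    Interaction(reached_human=False),
    Interaction(reached_human=True, agent_seconds=300),
    Interaction(reached_human=True, agent_seconds=500, transfers=1),
    Interaction(reached_human=False),
]

print(containment_rate(logs), average_handle_time(logs), misroute_rate(logs))
```

Tracking these before and after each small change gives the baseline and the follow-up measurement that the improvement loop relies on.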

Speaker 1

23:20

Okay, cost reduction? What else?

Speaker 2

23:21

The other big one is customer satisfaction, or CSAT. You measure that with things like Net Promoter Score, or NPS, surveys, or by looking at metrics like time to resolution, how quickly did the customer get their issue solved, or even reduced customer churn. Are fewer customers.

Speaker 1

23:37

Leaving you? That example in the book about the medical insurer, though, that was fascinating. They improved the accuracy of their claim denial reason intent, right?

Speaker 2

23:45

And it worked. It increased containment, more people got the answer from the bot, but their NPS scores actually dropped because the unhappy callers who previously would have just escalated to complain to a human were now self serving with the bot and then taking the post call survey and expressing their displeasure.

Speaker 1

24:03

Wow, so fixing one metric hurt another. What a nuanced problem.

Speaker 2

24:06

It's a perfect illustration that business goals can sometimes actually contradict each other. The real insight here is that you need to deeply understand those nuances and potential trade offs. Solving one problem might just uncover or even create another.

Speaker 1

24:21

One, which brings us beautifully to the user journey itself: streamlining it, reducing complexity, and trying to stop those dreaded opt-outs. The book talks quite a bit about the pain of complexity.

Speaker 2

24:32

Yeah, complex conversations are just intimidating and confusing for users. They lead directly to friction, people having to retry things and ultimately abandonment giving up.

Speaker 1

24:41

Like that insurance company's voice system for checking claim status.

Speaker 2

24:44

Oh, that was a rough one. It had only a forty percent success rate. Forty percent? Why so low? Because it required users to verbally input five separate pieces of numeric information before it would even start searching for the claim. Think about that: provider ID, member ID, claim date, claim number.

Speaker 1

25:02

That sounds exhausting just listening to it. Who has all that ready?

Speaker 2

25:05

Exactly? It was grueling? And the insight here is that every single extra step, every additional piece of information you require from the user, creates another potential point where they just drop off.

Speaker 1

25:16

So how did they fix it? What was the impact?

Speaker 2

25:19

They simplified it by using the caller ID to potentially identify the member automatically and then only asking for, I think, four fields instead of five. They drastically improved the success rate. That relatively small simplification made a huge difference to the user experience and therefore the business outcome.

Speaker 1

25:36

So how can builders spot these overly complex dialogue flows in their own bots? Are there specific warning signs to look for? Anti patterns the book called them.

Speaker 2

25:46

Yeah, the book lists several good ones. Asking for information users are unlikely to have handy right then, having really rigid input requirements like it only accepts dates in one specific format, no flexibility.

Speaker 1

25:58

Yeah, that's annoying.

Speaker 2

25:59

Asking ambiguous questions where the user isn't sure what's being asked, Treating all users exactly the same, regardless of their context or history, Presenting too many options at once, choice overload, asking for information in a weird, disjointed order that doesn't feel natural, and a big one for voice delivering channel unsuited information, like trying to read out a really long complex URL over the phone. Clear signs of unnecessary complexity.
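
One of the anti-patterns above, rigid input requirements, has a simple countermeasure: accept several common formats instead of one. The sketch below shows this for dates using only the standard library; the list of accepted formats is an assumption for illustration, and month/day order is treated as US-style, which a real bot would need to confirm for its audience.

```python
from datetime import date, datetime

# Hypothetical set of formats the bot is willing to accept.
ACCEPTED_FORMATS = [
    "%Y-%m-%d",    # 2025-03-07
    "%m/%d/%Y",    # 03/07/2025 (assumes month first)
    "%B %d, %Y",   # March 7, 2025
    "%b %d %Y",    # Mar 7 2025
    "%d %B %Y",    # 7 March 2025
]


def parse_date(text):
    """Try each accepted format in turn; return a date, or None if none fit."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


for sample in ["2025-03-07", "03/07/2025", "March 7, 2025", "next week"]:
    print(sample, "->", parse_date(sample))
```

When `parse_date` returns `None`, the bot can re-ask with an example of an accepted format rather than rejecting the user outright.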

Speaker 1

26:25

Okay, so once you spot these complexity traps, how do you actively simplify the journey for the user?

Speaker 2

26:31

Well, a key strategy, and a really powerful insight here, is using contextual information to personalize the experience. How so? Based on things you might already know about the user: their location, maybe their time zone, the type of device they're using, their stated preferences, maybe even their past behavior or previous interactions.

Speaker 1

26:49

With you, like that banking chatbot example, MAX giving generic advice.

Speaker 2

26:53

Exactly. Max giving generic credit card advice to Emma, who's already a customer. That's just frustrating and unhelpful. But if Max knew Emma's transaction history, maybe her current credit cards, it could give much more tailored, relevant recommendations. That saves her time, saves her effort, and makes the interaction feel actually valuable.

Speaker 1

27:14

That's a great application of using what you know. What else helps simplify things?

Speaker 2

27:18

Slot filling is another really powerful technique.

Speaker 1

27:21

Slot filling? Like filling in blanks?

Speaker 2

27:23

Kind of. It allows the bot to capture multiple pieces of information from a single user utterance, letting it skip unnecessary follow-up questions.

Speaker 1

27:31

Oh, I see. Like if I say...

Speaker 2

27:33

Like if you say, "I'd like to make a reservation for two people this Saturday at eight p.m.," a good bot can pull out the party size, two, the day, Saturday, and the time, 8 p.m., all from that one sentence. It doesn't need to ask how many people, then what day, then what time.
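
The reservation example can be sketched in a few lines of code. This is a deliberately simple, rule-based illustration of slot filling, assuming three hypothetical slot names (party_size, day, time) and plain regular expressions. Production bots typically rely on trained entity extractors or an LLM rather than hand-written patterns.

```python
import re
from typing import Dict, List, Optional

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
NUM = r"\d{1,2}|" + "|".join(NUMBER_WORDS)


def to_int(raw: str) -> int:
    return int(raw) if raw.isdigit() else NUMBER_WORDS[raw]


def extract_slots(utterance: str) -> Dict[str, Optional[object]]:
    """Fill as many slots as possible from a single utterance."""
    text = utterance.lower()
    slots: Dict[str, Optional[object]] = {"party_size": None, "day": None, "time": None}

    size = re.search(r"\b(" + NUM + r")\s+(?:people|persons|guests)\b", text)
    if size:
        slots["party_size"] = to_int(size.group(1))

    day = re.search(r"\b(" + "|".join(DAYS) + r")\b", text)
    if day:
        slots["day"] = day.group(1)

    time = re.search(r"\b(" + NUM + r")(?::(\d{2}))?\s*(am|pm)\b", text)
    if time:
        hour = to_int(time.group(1)) % 12  # 12 am -> 0, 12 pm handled below
        if time.group(3) == "pm":
            hour += 12
        slots["time"] = f"{hour:02d}:{int(time.group(2) or 0):02d}"

    return slots


def missing_slots(slots: Dict[str, Optional[object]]) -> List[str]:
    """Only these slots still need a follow-up question."""
    return [name for name, value in slots.items() if value is None]
```

The point of the design is the second function: the bot asks follow-up questions only for the slots that are still empty, instead of walking through every question in a fixed order.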

Speaker 1

27:48

Right, much smoother. The insight is really about efficiency, then. Move the conversation forward faster.

Speaker 2

27:53

Exactly. Get to the point without tedious back-and-forth.

Speaker 1

27:57

And allowing for flexibility too. I imagine users don't always say exactly what the bot expects them to say.

Speaker 2

28:02

Oh, never. So you have to design for multiple correct responses, especially with things like yes/no or choice confusion.

Speaker 1

28:09

Like the example where the bot asks, "Text message or phone call?"

Speaker 2

28:12

Right, and the user just says "yes." A rigid bot might just say, "Sorry, I didn't understand," but a flexible bot could perhaps infer that "yes" means text message, if that's the first option or the more common preference, or at least ask a better clarifying question. The bot needs to try hard to meet the user where they are, not force them into a rigid script.
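
One way to picture the "yes" case is a small interpreter that accepts several kinds of reply to a choice question. This is a hedged sketch only: the function name, the list of affirmative words, and the wording of the prompts are all invented for illustration.

```python
import re
from typing import List, Optional

YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay"}


def interpret_choice(reply: str, options: List[str], default: Optional[str] = None) -> dict:
    """Map a free-form reply onto one of the offered options, if possible."""
    words = set(re.findall(r"[a-z']+", reply.lower()))

    # An option matches when the reply shares at least one word with it,
    # so "call me" selects "phone call" and "text" selects "text message".
    matches = [opt for opt in options if words & set(opt.lower().split())]
    if len(matches) == 1:
        return {"choice": matches[0], "prompt": None}

    # "Yes" to an either/or question: fall back to the default if one is set.
    if not matches and words & YES_WORDS and default is not None:
        return {
            "choice": default,
            "prompt": f"Okay, I'll use {default}. Just tell me if you'd prefer something else.",
        }

    # Otherwise ask a clearer question instead of "I didn't understand".
    return {"choice": None, "prompt": "Which would you prefer: " + " or ".join(options) + "?"}
```

Whether to infer a default or always ask again is a design decision. Inferring saves a turn, while asking avoids acting on a wrong guess, so the sketch confirms the inferred choice back to the user.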

Speaker 1

28:34

Okay, so we've talked complexity, but why else do users just bail on chatbots, give up and leave? Besides getting confused?

Speaker 2

28:41

Well, there are several reasons, and the insight here often comes down to trust and managing expectations. Sometimes it's just prior poor experiences with other bots. They come in already skeptical.

Speaker 1

28:52

Fair enough. Scar tissue.

Speaker 2

28:53

Yeah. Other big reasons include the bot clearly not understanding them or asking them to rephrase too many times, feeling like they're stuck in a loop, or just not making progress towards their actual goal, or sometimes they simply don't like the answer the bot gives them. Or they had different expectations for what the bot could even do.

Speaker 1

29:12

Okay, so knowing why they leave, what are the best strategies to actually reduce those frustrating opt outs? How do we keep them engaged?

Speaker 2

29:20

The book suggests several key things. First off, great greetings.

Speaker 1

29:24

The very beginning of the chat.

Speaker 2

29:26

Yeah, it sets the tone. A good greeting should affirm the bot's purpose, maybe briefly preview the journey ahead, set realistic expectations for completion time, and sometimes even incentivize the user.

Speaker 1

29:37

Like that utility company example where the bot started by saying something like, "This process should only take a few minutes."

Speaker 2

29:44

Exactly. That simple sentence manages expectations and gives the user a reason to stick with it. It's a small thing, but powerful.

Speaker 1

29:51

Okay, great greetings. What else?

Speaker 2

29:53

Second, try hard to understand, which goes back to what we discussed: continuous investment in good, relevant training data, and using conversational search or RAG for those broader domains so you can handle more queries.

Speaker 1

30:07

Makes sense. Be better at understanding.

Speaker 2

30:08

Third, try hard to be understood. Use really clear, simple wording, allow for those flexible responses we talked about, and employ graceful error handling. Instead of just saying "retry," maybe try to disambiguate, and offer a couple of options based on what you thought they meant.

Speaker 1

30:25

Okay, handle errors better.

Speaker 2

30:27

And fourth, implement smart opt-out retention flows. If a user does ask for a human, don't just immediately transfer. Try to discover their true underlying goal first. Assess if the bot actually could help with that goal. Maybe try to convince the user that the bot can handle it effectively, or if not, at least route them intelligently to the next best action: maybe another specialized virtual agent, or, finally, the right human team.

Speaker 1

30:53

So try to salvage the interaction if possible, or at least make the handoff smooth.

Speaker 2

30:56

Exactly, don't just dump them.

Speaker 1

30:58

And generative AI can actually help with crafting those messages themselves, right, making the bot sound less robotic and more helpful.

Speaker 2

31:07

Yes, absolutely, that's another powerful application. LLMs can be great at rewriting potentially rude or unhelpful system error messages, transforming something blunt like "You didn't provide a thirteen-digit number" into something much kinder and more instructive.

Speaker 1

31:21

Yeah, soften the edges.

Speaker 2

31:22

Exactly. And they can also help craft those greeting messages to be more helpful, more welcoming, and less like talking to a machine, making that initial interaction much less likely to cause an immediate opt-out.

Speaker 1

31:33

Okay, this has been fantastic. Finally, let's touch on the human-AI partnership. It seems like it's not just about building bots for humans, but increasingly about how AI, especially LLMs, can augment the human builders themselves. How do they become collaborators?

Speaker 2

31:48

That's a really exciting area. LLMs are becoming incredible partners in the actual development process. They can help solve the cold start problem, you know, when you're building a brand new bot and have literally no real user data to start training with.

Speaker 1

32:01

Right, where do you begin?

Speaker 2

32:02

LLMs can generate realistic-sounding initial training data, or they can help expand your existing data by filling gaps for maybe rare but really important intents like fraud reporting, where you might not have enough real-world examples to train effectively. The insight here is that LLMs don't replace human creativity or oversight, but they can seriously supercharge the process.

Speaker 1

32:24

So they can essentially generate data for both training and testing. That sounds like it could save a huge amount of manual effort.

Speaker 2

32:32

Oh, absolutely. They can generate lists of synonyms for specific terms, different nouns for "credentials," maybe, or various verbs for "lost" or "misremembered." But they can also generate entire utterances with lots of varied grammatical structures: statements, questions, even fragments or commands, just like real users.

Speaker 1

32:49

So instead of just forgot password, it might generate I can't remember my account information, or my account is locked, or help logging in.

Speaker 2

32:56

Precisely. This provides much more robust training data and also more realistic testing data. It helps reduce the inherent bias that might come from just a few humans writing examples, and it can significantly improve the classifier's accuracy beyond what manual efforts alone could likely achieve. It's about getting a truly comprehensive and diverse data set.

Speaker 1

33:17

And what about AI-assisted process flows? That sounds almost like the AI is designing the conversation itself, in a way.

Speaker 2

33:23

Yeah, it's about rapid prototyping and also hardening the system. LLMs can actually suggest or even design entire process flows from scratch. Like, you could ask it to design a flow for handling a medical insurance claim status check, and it might propose the steps, the questions to ask, and

33:39

even justify its design choices. Wow. They can also help execute dialogue flows at runtime in some newer architectures, acting as the chatbot's brain, dynamically figuring out what questions to ask next to collect the information needed for, say, an API call. Okay, and testing? And critically, for testing, they can simulate user inputs. They can generate all sorts of weird, unexpected, or edge-case inputs that human

34:02

testers might never even think of. This really helps to harden the chatbot against unexpected interactions, making it far more robust when it meets real users.

Speaker 1

34:11

That's a massive timesaver and quality booster for developers. Okay, one last piece: those inevitable handoffs to human agents. What about summarization? It must be incredibly frustrating for a human agent to have to read through a long, rambling chat log when they take over.

Speaker 2

34:29

It really is. It wastes time and forces the customer to wait or repeat themselves. And the insight here is all about efficiency and improving the customer experience during that transfer. Agents don't need the entire transcript. They need a brief, targeted summary of what happened and what the user needs.

Speaker 1

34:42

So LLMs can create that summary.

Speaker 2

34:45

Yes, yeah, LLMs are excellent at summarization. They can take a full chat transcript and condense it into concise prose, or even better, sometimes they can extract key structured details like the user's ID, maybe a claim number, the specific issue, even a sentiment analysis score, and present that clearly to the agent.
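
To make the "structured details" idea concrete, here is a small sketch of what such a handoff record might look like once it has been extracted. The extraction step itself, which would be done by an LLM, is left out. The field names, the sentiment scale of -1.0 to 1.0, and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HandoffSummary:
    """Structured details an agent might see instead of the full transcript."""
    issue: str
    user_id: Optional[str] = None
    claim_number: Optional[str] = None
    sentiment_score: Optional[float] = None  # assumed scale: -1.0 (negative) to 1.0 (positive)


def render_for_agent(summary: HandoffSummary) -> str:
    """Format the summary as a short block the agent can scan at a glance."""
    sentiment = (
        "unknown" if summary.sentiment_score is None else f"{summary.sentiment_score:+.2f}"
    )
    return "\n".join([
        f"User ID: {summary.user_id or 'unknown'}",
        f"Claim number: {summary.claim_number or 'unknown'}",
        f"Issue: {summary.issue}",
        f"Sentiment: {sentiment}",
    ])


# Hypothetical example values, not taken from any real conversation.
example = HandoffSummary(
    issue="Claim status still pending after three weeks",
    user_id="U-1001",
    sentiment_score=-0.6,
)
```

Missing fields are shown as "unknown" rather than dropped, so the agent can see at a glance what still has to be asked.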

Speaker 1

35:03

That sounds incredibly useful.

Speaker 2

35:04

It ensures a much smoother handoff. It saves the human agent valuable time, letting them get straight to the issue, and it dramatically improves customer satisfaction because the customer doesn't have to frustratingly repeat everything they just told the bot.

Speaker 1

35:17

What a deep dive this has been. We've really explored how conversational AI works, haven't we? From those basic intent-based bots all the way to the truly transformative power of generative AI and things like RAG.

Speaker 2

35:30

Yeah, and we've seen how thoughtful design, that constant cycle of continuous improvement, and this emerging strategic partnership between human builders and the LLMs themselves, how all that can overcome those common frustrations we started with.

Speaker 1

35:44

Right. Streamlining complex processes and ultimately creating virtual assistants that feel genuinely understanding and actually helpful.

Speaker 2

35:51

And the real aha moment here for me at least, is that the ultimate goal isn't just about building a smarter machine, although they're getting incredibly intelligent, incredibly fast.

Speaker 1

36:02

It's more than that.

Speaker 2

36:04

I think so. It's really about designing an intuitive, maybe even empathetic, and definitely valuable experience for you, the user. It's about leveraging all this amazing technology to meet people where they are, truly understand their unique needs in that moment, and deliver real value.

Speaker 1

36:21

Whether that value comes from a quick answer, or helping with a complex transaction, or even just making that handoff to a human completely seamless.

Speaker 2

36:29

Exactly. Making the interaction feel effective and respectful of the user's time.

Speaker 1

36:34

So here's a final thought to leave you with. As conversational AI continues to evolve at this, well, breakneck speed, what new ethical considerations, what new design challenges will become paramount, especially as that line between human and AI interaction gets increasingly blurred? How will our own expectations of what

36:51

understanding even means continue to shift? That's a big question, definitely something for all of us to mull over as these systems become more and more deeply ingrained in our daily lives.

Transcript source: Provided by creator in RSS feed