¶ Introduction and Recap of Last Year's AI Polls
Hello and welcome to the Behavioral Design Podcast. This season we're diving into the intersection of behavioral science and AI. We want to make sense of the state of AI, from understanding how humans interact with intelligent systems to using AI to do behavioral design itself. I'm Aline Holzwarth, a health tech advisor specializing in AI and product design. Over the past 15 years, I've been crafting human-centered products with behavioral science
at the core. At Apple, I led behavioral science for Health AI, designing and launching AI-powered features to help users reach their health goals. And I'm Samuel Salzer, your other co-host. I'm a behavioral strategist specializing in habit formation and designing products that drive long-term behavior change. I work with leading tech organizations, integrating AI to scale behavioral design for good.
And I'm also the founder of Habit Weekly, a dedicated community on behavioral science and AI. Quick word on Nuance Behavior, where we help organizations build impactful digital products using behavioral design. We only take on a few clients at a time to ensure the highest level of quality for our tailored, evidence-based solutions. If you'd like to become one of our special projects, email us at hello@nuancebehavior.com or book a call directly on our
website, nuancebehavior.com. Sam? Hey, Aline. Do you remember way, way back at the end of last year, in late 2024, which in AI time is basically a decade ago? Yeah. Do you remember we organized a series of LinkedIn polls? Do you know what I'm talking about? Oh, I don't remember a lot of things from last year, but I do remember those polls. Yes, I do remember them. And we were really interested in using AI for different parts of the behavioral design process.
So if you think of the process very generally as starting with discovery, so trying to aggregate information and understand the landscape of a key behavior, identifying that key behavior, you know, basically doing a lit review and understanding what's out there, and then moving on to
diagnosis. This is often, you know, some sort of behavior mapping or a UX audit looking at the user journey and so on. And then maybe building on those two phases to do some design, some behavioral design: you could be creating a product or, you know, refining an existing product, something like that. And then finally ending in a testing phase.
And of course, we're glossing over the fact that you can do testing in any of these phases, but let's just say we end with testing that final product. So we were curious, given the advent of AI and all the different applications we're seeing out there in the world, which phases our friends and followers think would be really well suited to replacement by, or help from, AI
tools. And so, you know, all the caveats apply: this is a very unscientific, crowdsourced convenience sample of our, you know, not very many LinkedIn friends. Don't say that. I think we have pretty good reach on LinkedIn. OK, sure, but for science's sake, these are small samples, not the many thousands you would get on MTurk or Prolific or whatever. So yes, we did these polls. It was more of, I think, an impetus for conversation than anything
else. We really wanted to see what our friends' intuitions were in these areas. And what did we find? Do you remember? No, but I'm actually super intrigued to hear, because in hindsight that was a really interesting time to ask this. At least for me, working closely with AI, I think around last fall, around October and November, is when a lot of AI models took the next-level leap into something like what we're dealing
with today. And I feel like so much has happened since then. So I'm super keen to hear: what did people actually say? Well, honestly, many of the deep research tools we have today weren't even out when we ran this poll. So the landscape looked very different then. However, and this is just my hypothesis, I don't think the opinions would be different if we asked people the same question now.
So the vast majority of our respondents said that the discovery phase, you know, doing an inventory of the landscape and seeing what the existing literature says, basically doing a lit review, is the phase best suited for outsourcing to AI tools. Does that feel right to you?
Yeah, it is interesting. In some ways I agree with you that people would probably say something similar today, but it's also interesting: does that reflect what AI is capable of, or does it also reflect what people are afraid of it being capable of? Yeah, we're only talking about people's subjective opinions with all of their biases baked in, not the reality of the situation.
Well, I find it scarier that AI could do the second and third steps, rather than the first step in the journey. If AI could do a really good job of designing and running interventions, or of understanding and experimenting, that probably feels a bit closer to home. It might also reflect, you know, that AI isn't always as good or as evolved in those areas,
depending on the context. But I also think it's a little scarier to imagine for a lot of behavioral scientists. I think you're right that it is maybe threatening to one's identity as a behavioral scientist to think the later phases could be automated by AI. However, if you think about the regularly touted strengths
¶ AI's Strengths in Literature Review
of at least generative AI, they're really consistent with the lit review phase. So scanning massive amounts of, you know, whatever the data is; in this case, it's research papers. Taking thousands and thousands of papers and assessing them very quickly, summarizing research findings. How great is ChatGPT at summarizing anything? Excellent; it often does a really great job. And then extracting key
insights, right? Organizing based on themes, grouping similar concepts together, doing this sort of thematic clustering and what you could call topic modeling, right? This organizing and structuring and finding similarities between things is really what generative AI is good at. And so you might think, OK, that should lead to a really excellent literature review.
Yeah, no, that is fair. What is interesting as well, though, is that I think it was just this week that the first paper came out, I'm trying to remember if it was in biology or another science, where basically AI did the literature review, understood the context, came up with a novel hypothesis to test, tested it, wrote the paper, and then submitted it for review.
So, you know. I don't like that at all. No, it feels wrong, right? It feels wrong. And I guess we've talked about it before on the podcast: these days, I think it's as important to understand what AI can do as it is to understand when it's most useful, when to keep a human in the loop, and when it's stronger or weaker. But it's still interesting that that happens.
Yeah. I mean, just think of the poor editors at these journals who are volunteering their time, and now they're met with this deluge of new article submissions. These poor, poor academics. I know, I know. I've never felt more sympathy for academia. No, AI has made life easier for a lot of people in academia in many ways, in terms of what it's helping people do, but also, you know, AI slop is a real thing.
And I think we also see a lot of research papers being put out in numbers we've never seen before, and they're probably not so thoughtfully put together, because people are incentivized to generate paper ideas and get papers published. And so, yeah. But it sounded like in your example there was truly a new insight generated through this synthesis and hypothesis generation. So that seems quite good.
I think that's a step beyond what I was describing as the literature review capability, your typical synthesis of what's out there. It's more like predicting the next step forward, identifying gaps, and then trying to answer those questions. You know, Google recently launched its AI co-scientist as an initiative to support scientists who are doing good work, make their lives easier with AI, and
see AI as a collaborator. And I would probably say that in many of those areas, that's where AI adds a lot of value: as a collaborator, rather than something you want to give full freedom to. I do think, you know, it's easy to say that it's impossible for AI to be original, since it's based on historical data. But I think that may not be an accurate view of what AI is today.
I think it has moved beyond that, and it does challenge some of those notions, in that, if given the autonomy and access to certain things, it can run experiments and do certain things. And, you know, one of the things a lot of labs are working on now is creating environments that are set up
artificially. So basically taking some real environment in the world and recreating it in an artificial setting. This is mainly done now for robotics training. Imagine you have to train a robot to walk and move and do things, with the idea that it has to understand how gravity works and how to catch
a ball and all of these things. I think NVIDIA has been quite proud to announce that they've been able to set up environments where these robots can be trained as digital twins of themselves. Digital twins? Yeah, exactly. You have a virtual environment that's mapped to the real world, because you can't actually send your robots out across the entire world. Right. And I guess that would be interesting. What do you think?
Wouldn't it be possible, if you have better and better environments like that, with better and better digital twins, and you give AI access to those environments, that it would be more able to run and test experiments, do various things, and potentially come up with some findings? You could still ask: well, what does that tell us about the real world?
Because it's been, you know, set up in a virtual lab setting. But at some point it becomes uncomfortably similar to a physical lab setting, where we ask those same questions. So I don't know, what do you think? For me, though, I wonder how much of it is coming to conclusions based on the new evidence that's been generated, rather than really creating a general mental model of the world and having a more generalized understanding.
I'm not going to speak for robots, but I am thinking of synthetic users and doing more behavioral design kinds of research. The reason I ask this, or the reason I suspect there's a difference here, is that at least in the examples I've seen of AI-generated lit reviews and so on, there's this real dearth of critical thinking. There's the ability to extract from what's
there. But beyond that, I'm pretty cynical about the current ability to generate novel insights and take the next step that doesn't directly come out of what's already there. And as I was looking back at our LinkedIn polls, this idea came through: what's uniquely human. There were several categories that zero people chose as being well suited to be replaced by AI.
And those things were stakeholder alignment, connecting business to behavior, and product strategy: basically all of these larger systems-thinking and critical-thinking skills that we think really are uniquely human, at least in our current conceptualization of what's uniquely human. That is where we seem to be drawing the line. Yeah, that is interesting. I think another word for that, in some ways, is taste.
I hear that a lot in AI circles: well, if AI can generate a lot of things and do a lot of things, that is great. But it still seems that, if you have high expertise in something, you have also developed a refined taste for how certain things work and what works and what doesn't, and making those kinds of decisions is quite hard. And yeah, I do think that is similar to what you're saying: it's kind of
a lack of taste. Exactly, yeah. We know immediately when we see something that is just absolutely wrong, whereas the AI may not. OK, so let's talk about how AI lit reviews actually perform. Yeah. We know how people think they may do, but what do we know about how they really do?
¶ Emerging AI Tools for Research
Yeah. So that's a really interesting thing. I can give a little backstory on this. I noticed that there have been a lot of tools around for a long time for helping you research, and I've been using a lot of them myself. I think we've talked many times about some of the early tools like Elicit, and others like ResearchRabbit and so on, that helped pull together research. Scholar? Yeah, Scholar.
So various tools that made it a little easier to search for research papers and compile them in ways that would have taken much longer before. But then at the end of last year, I think a big thing that came out was that Gemini, Google's chatbot, launched this Deep Research feature, where you could ask a question and it would scour the web and published
research. And it would provide you a report: here's what the science says, here's the answer to your question, in report form. And that was, in some ways, quite exciting for many people, including myself. The idea that, OK, I can put in a prompt. It would also make sure I was being clear, so it would follow up with some questions like, is this
what you mean? Did you mean that, and in what context, and so on. So you'd refine the prompt, and then it would go to work, and you could watch it. And the thing that gave a lot of people this feeling of excitement is that you'd see the sources tick up. It started with five sources, then 10, 20, and in some cases 50, 75. You'd see it going through and scouring the web, and you're anticipating and waiting: what will the report say?
What will it do? So I think that was quite exciting. But I feel like we're living in a strange time, where what used to take three years now takes three months, and what took three months takes three weeks. An example of that is, you know, earlier this year, that was the only tool for
that. There was no other tool that would do that kind of, call it agent-based, sending out, searching the web and so on. Then suddenly OpenAI launched a tool, and they creatively named it the same thing. Deep Research, yeah. Thanks. And then, you know, a lot of others followed. There was DeepSeek from China, and there was Kimi from China as well. And then some of the other established tools did their own versions of this too.
And then we had STORM, an open-source tool from Stanford. So a lot of these tools began to promise similar things. And I saw a lot of hype online. Like we've talked about, we saw a lot of hype and a lot of people basically declaring, OK, you don't need PhD researchers anymore, these tools can do the job. In
seconds. In seconds. And so basically, with Nuance, we wanted to communicate this better to our clients and the people we work with. But also, giving nuance is the whole goal of what we're trying to do.
And so we also saw it as an opportunity: hey, let's look at these tools, compare them, and see how well they actually do. What can we actually say right now about how good these tools are at doing literature reviews? So that's a bit of background on this project. Oh, so exciting. I mean, this feels so needed, because all I see is the hype. I don't see a systematic analysis that anyone has done looking through these different
tools. I just see, oh well, this person likes this one, and that person likes that one. I just want to get away from the hype and know what's real. Yeah, and honestly, I was keen for that as well, because you're always caught up in the current of things. You're always feeling like there's some tool you haven't tried. There's always, oh, what about this tool?
What about that tool? And you see some screenshot and some LinkedIn post, and each tool is promised to be, you know, the best thing ever.
¶ Evaluating AI Tools for Literature Reviews
So basically, what we ended up doing is taking that as a challenge: could we benchmark and test these tools? It was very much a group effort. You were involved, and basically everyone at Nuance was involved in some way. But special shout-out to Josh Panjwani, who, together with me, led this. What we started doing was looking at all the tools that are promising this and trying to make sense of them. How many are there?
How many are out there? Are they US-based, China-based? You know, what's the landscape? So we did that, and in the end we narrowed it down to eight tools that we all had access to and could use. Some are free, some are quite expensive; some you've probably heard a lot about, and maybe some you haven't. Then we went through them and tested five different criteria for how well each one does at creating an AI-generated literature
review. Some obvious things we wanted to see: what's the cost of using it? How much time does it take to run? Then, importantly, how good is it? Is it good both in terms of writing, meaning does it provide a well-written report or output, and in terms of citation quality? Does it actually provide reliable sources that are not hallucinated but real, from reputable journals and relevant sources?
And then lastly, given that all of these were generated from prompts, we wanted to see how good these models are at following the prompts. Within the prompts, we specified certain things, like the specific research question we wanted answered, but also the years we wanted research papers from, and the citation style, APA-style citations, that we wanted to have,
and so on and so forth. So we wanted to see how good each one is at following our instructions. And so, without giving too much away, how did they do? What did we find? Yeah. So we're currently compiling the full report on this, and it should be out very soon, probably very close to this podcast. But I can already say quite a lot of interesting things we've seen. So where to start? I think one interesting thing is that a big worry for people is hallucinations.
Basically, that if you ask AI something, it's going to give you an answer whether it's true or not. And an interesting finding: there were no instances of hallucination in any report. None of the tools hallucinated in any way. What is your definition of a hallucination? It's basically making up a citation that is not real, so it's giving you a fake citation. OK, specifically for citations. Exactly. Yeah. So it's a good thing to clarify.
So it might draw the wrong conclusion at some point, it might claim certain things, but... Which I did see. OK. Yeah. But it's not going to make a claim based on a made-up citation. So again, that's not looking at their accuracy in terms of how well they're drawing conclusions and using citations. But at least we know the citations used are real citations in all of the tools we tested.
And so that should at least give some level of comfort: we don't have to worry about these tools just making up citations if we use them, which is kind of a big finding in some ways. Right? Yeah. You found this for all the tools? All of them. Yeah, that's great. Yeah. Then, when we look at how much time it took, there was a wide span: some tools were able to generate a literature review in less than a minute on average, and some took around 35-plus minutes.
So that's a pretty big span, but 35 minutes is still relatively short. That's nothing. I mean, this would take you so long otherwise, at least a day. Yeah, exactly. And I guess going into more
¶ Comparing Chinese and American AI Tools
details, I'm kind of interested: what do you think did best, the Chinese tools or the American tools? Oh, come on. Best in what regard? In general, in all categories. We can see a pretty overarching tendency around quality. Well, all right, despite my disappointment with the United States, I still have an American favoritism, I would say. So yeah, I think the US-based tools probably performed better. And yes, they did. All right.
In citation quality, writing quality, and prompt accuracy and sensitivity, tools like OpenAI's Deep Research, Gemini's, and the like performed better than the likes of Kimi or DeepSeek. What I would say is that both Kimi and DeepSeek, the two Chinese models we looked at, were extremely fast. They were the quickest at generating things, and both of them are free.
And so if you wanted to use, you know, Deep Research from either OpenAI or Gemini, it would cost you at least $20 per month or so, which is still relatively cheap, right? But Kimi and DeepSeek are both free, with the caveat that you might be giving away data to the Chinese government. So there's always a cost, right? Nothing is completely free. Yeah.
In all cases. Yeah. I mean, I'd be keen to hear from you: what would you want to know? Because you don't know the results. So what are you interested in finding out?
¶ Evaluating Literature Review Outputs
So, just for context, I rated many of these generated lit reviews and was overall pretty disappointed. There was only one that I read where I would actually use any of the
output. Everything else, even if I rated it, I don't know, six or seven out of 10, I wouldn't use. It was of no value to me, even if I could say, oh, it's organized OK, or the citations are relevant and it has an appropriate structure, etcetera. So what I want to know is: what's the one? What's the best? Tell me the one I should use. That's my big question. I know.
Well, interestingly, to your point, what we saw is that across these three quality categories, citation quality, writing quality, and prompt sensitivity or accuracy, no tool performed best across the board. Some were better in one or two categories but then really bad in the third, for example. So there's no one tool that is better than all the others
across the board. As an example, when it comes to citation quality and prompt accuracy, Elicit and maybe Scite AI performed best, but they were also among the worst performers when it came to writing.
Yeah. So that is where we're at right now: we're getting some tools that are better and better at specifically helping us make sense of the literature, but they might not be the ones we want to rely on for compiling the findings, writing them up, and making sense of them. So in some ways this reinforces a belief I had before this, which is basically that these tools can be
¶ Critical Analysis and Human Oversight
extremely valuable today. You would be a fool to completely disregard the tools that are out there, but you would also be a fool to rely on them completely for everything. Keeping a human in the loop is still a really important factor. Yeah, I think my main takeaway from this is that any of these outputs is like one-tenth of a literature review. There are major gaps, and sometimes complete misinterpretations of the data, which could be really
misleading. There was one case, unfortunately in one of the better literature reviews I reviewed, where correlation was completely confused with causation. It described the impact of financial literacy on financial
well-being and concluded, based on the association between higher financial literacy and better financial well-being, that this is an important area to focus interventions on. Well, actually, we know from so much research that financial literacy interventions have a negligible impact, like a fraction of a percent, on people's actual financial decisions and
their well-being. So, you know, without the expertise to interpret the results you're getting, I think it's really dangerous; it can lead people to the wrong conclusions. I agree, and I would say we put a
lot of thought into the prompt we gave the models. We purposely gave them a far above-average prompt in terms of detail and clarity of instructions about what we wanted, and we would expect that in the wild most people use these tools with much worse prompts. Yeah, you were really setting it up for success. Yeah, and even then, as you say, this happens. I like to look back and compare: what would humans do in this context?
What could we expect of humans? What we're seeing now is kind of what we'd expect from research interns who don't really understand the thing they're being asked to do. So they try to make sense of it, and they go out and look for findings to put something together.
But they don't have enough taste or expertise or context to know what actually matters and what to neglect, so they just put everything together into a report and hand it over: here's what you asked for. And you're like, well, it's not really what I asked for. This is a mishmash of a bunch of different things; it's kind of what I asked for, but not really.
So that is the tendency I would say a lot of these have, especially the ones like OpenAI and Gemini. Importantly, both of them rely on what they can find online: not full research papers, but research abstracts. So you know that whatever they tell you, they haven't read the methodology, they haven't read the design of the study, they haven't read any of that. They've only read the abstract
of the study. That's a huge problem in itself.
And if you take that a step further, not only have they not accessed that information, they've certainly not critically examined it. If you look at every single behavioral science lab, or any science lab for that matter, a very common exercise is some sort of journal club, where you read an article and then sit and assess it: OK, what is the internal and external validity of this paper?
And you analyze its methods. You ask, what are the shortcomings? What's the sample size? You know, on and on and on. And every single paper I've ever analyzed has something wrong with it. There's no perfect research study out there. And all of these literature review tools gloss over that entirely. There's no critical analysis at all. They just take at face value that the authors found what the abstract says they found. And so often that's not the
case. Yeah. And I think that's something people can relate to, in that a lot of these models are trained to trust what we say to them. So if we said the Earth is flat, a lot of models traditionally would say, oh yeah, sorry, I was wrong, it is flat. Now they are better. A lot of the models are more willing to push back, especially the reasoning models; they're more able to say, hey, actually, I don't know, are you joking with me?
They would actually understand that it's probably not true if you say something completely out there. But again, a lot of these failures are still possible. If you don't have a critical eye, you can easily gloss over whatever aspects could be wrong.
So I agree with you. That's still a huge issue with any of this: they just take everything at face value and compile everything as if it were all the same quality. And even if we instruct them, hey, weight the studies with larger sample sizes more than the ones without, that's still not something these models are usually good at.
And that's also just one proxy; what makes for a good research paper is really hard to determine, right? Right. Yeah, yeah. What's the experimental design like? Is it an RCT? What were the actual conditions that were included? What comparisons were made? Yeah, yeah. It's hard to find rules of thumb for the quality of a paper. Yeah. Can I say something fun, actually,
around this? Yeah. So one of the two prompts we gave was about, as you said, financial well-being, specifically for the UK population and younger adults. That was the task. And it's really funny how sometimes it took just one of those things, either younger adults or the UK, and came up with studies about something completely different that had to do with well-being. Like, how likely, you know, 18-year-olds are to go
to the gym. And then, based on this finding, we can say that... No, yeah. Yeah. So it wasn't always great at some of those things. Yeah, and I noticed a lot of COVID-era research was included across many of the reviews, and that was such a unique time; looking back, it just feels a little non-generalizable. Yeah, yeah. No, I agree. So obviously we'll release all of this very soon, and there are many
more things to talk about. But I think one interesting thing here, which is somewhat sad I guess, is that we don't know
¶ The Worst Performing Model
which model was the best, but we can definitely say which is the worst. Oh good. Tell me. It is STORM, which comes from a Stanford lab. Bummer, it's a bummer. I was looking forward to trying that one out. Yeah, yeah, I know. Once we tested it more deeply and really evaluated how it performed across the various settings and prompts, it basically wasn't able to perform very well on any of these things. It was, yeah, by far the worst of all of them.
So the only one that's open source, free, and American in this case was the worst one, and by a margin. Having now gone through this, I would almost recommend people not use Co-STORM. That's probably the only one I would definitely say not to use. It mainly goes through blog posts rather than research papers. Oh, wow, that's shocking. Yeah, that is shocking.
Yeah, bummer. It is a bummer. And that's kind of the case with many of these. The thing I'm still most worried about is that the people who are, quote unquote, buying literature reviews are often the ones who aren't really good at doing them themselves. They don't really know what a good one looks like. And so there could be a lot of people who would say, oh, actually, I used to hire a researcher to put together my literature reviews.
Now I'm going to just do them myself. I heard this tool STORM was out there, so I'm just going to do it with STORM. And then STORM generates a report. It has a lot of citations, but you have to click on the citations to see, oh, it's a blog post, it's this, it's that. So if you squint, it looks really legit because it has cited like 35 sources. Wow, that's a shame.
¶ Introducing Nuance's AI Lab
So I guess this in some ways marks the first initiative we're going to publish around something fun that we're doing with Nuance: we've been working with AI for a long time now. You know, even before Nuance, both you and I were involved in a lot of AI-related initiatives and so on. But now we're going to do this more deliberately, because we feel, as this example shows, that there's a need for a bit of thought leadership to help people navigate the state of AI.
We're actually going to have an AI Lab dedicated specifically to exploring these kinds of questions and hopefully sharing some important findings around, you know, what works, what doesn't work, and, as much as possible, what tools to use. But also really helping make sense of when to use some of these things, because it's becoming less a question of whether AI can be useful and more a question of how it can be useful and when you should use it.
And so a lot of our initiatives will be looking at stuff like this, looking at how we can automate aspects of our work, but also looking at AI adoption. How can we do that well? Because that's not a problem at all, right? Everyone is doing adoption perfectly. Yeah. Not. So that is also something we're going to be sharing more on, and we're already doing some work on that with clients. And yeah, and then many other
¶ Behavioral Redesign Series: Peloton Example
interesting things. And actually, I wanted to shout out something that you recently posted. It's maybe not exactly related to our AI Lab, but it is a really fun AI example, and it also relates to something else that's new for us, which is our Behavioral Redesign series: reimagining products as if they had a little more behavioral spice, for good or bad, I guess. Yeah, some are spicier than
others. Yeah. And you shared one that I really loved this week, where I think there was a bit of a personal story behind it as well. Could you share the Peloton example that you redesigned? Oh, yeah. I mean, this is something I think about a lot: the flexibility and forgiveness of the various apps out there that are trying to help you reach some goal you have. And, you know, you're committed to reaching that goal.
But you're also a human, and life gets in the way; you have hard days and easier days. And on those hard days, I have found that, unfortunately, basically zero apps have any tolerance or any acknowledgement that, like, you had a hard day, you didn't get any sleep last night, so let's maybe readjust your plan. Yeah.
And I think my favorite example of that is, you know, Fitbit has been around for a long time now, getting people to hit their daily step counts, and it has famously driven a lot of people crazy: they realize before going to bed that they haven't reached their step count, and so they start running in place. David Sedaris in particular. Exactly. Yeah. But that comes to mind as an example of how we've been facing this for a long time now.
And that is still kind of a burden placed on a lot of users, where it's just like, you know, you have a goal, so do it; it doesn't matter if the context shifts, you're expected to do the same thing every day. Yeah, it's always a nudge to get you to achieve your goal, not to be kind to yourself in a way that might make you more likely to reach your long-term goal.
And so the idea with this Peloton design was to take in your sleep data and really present it to the user, acknowledging that you had a really hard night last night: I can see that, you know, you didn't get very much sleep. You were, you know, up three times in the night feeding the baby. And, you know, maybe Peloton doesn't know that you were feeding the baby, but I can add that personal context here. And then saying, let's adjust your plan.
I see that you were going to do a really intense HIIT workout this morning. Maybe we should swap in yoga instead. And that feels like something you could actually accomplish, which maybe helps you stick to your general goal of exercising regularly, but isn't so demotivating, because there's just zero chance that you're going to do a HIIT workout in that state. So, you know, some sort of compromise, actively suggested as a kindness to the user.
Yeah. And I love that, because I think we've often talked about the idea that, well, it's great to have an if-then plan, like, you know, if I get tired, I will do this thing instead, or if I get sick, I will do that thing instead. But I think it's really hard for people; it's hard for me, it's hard for anyone to always have those backup if-then plans available or thought out in advance. And so why shouldn't our
technology help us? Yeah. And I feel like what we probably see a lot of people do is go into a kind of fuck-it mode instead, where they say, this HIIT workout, I don't have energy for that, fuck it. I'm just going to do the complete opposite, you know? And so, yeah, I think it's about giving people this kind of if-then plan, provided for them, and really making it easy to do something instead
of nothing. Yeah, I think it's key that this prompt was to substitute yoga for your HIIT workout. It wasn't, you know, forget about your goal altogether, though there is a time and place for that as well. And you can sort of see that in Marissa Sharif's research with
emergency reserves. Another way to continue towards your long-term goal is to just do something, so you feel like you've progressed and you're psychologically consistent with your goal. Even if you didn't do the hard thing, you at least did something that's consistent. Yeah, I think this is such an obvious thing that needs to be in the world.
I'd love to know how this has been for you on a personal level, because I know you've recently had a baby, and I know you have a Peloton. So how have you dealt with this without Peloton supporting you? How has this been? Well, yeah. So maybe I should say the reason I created this design is that I think there's a very real need for it. I have gone the fuck-it route and just not done anything, but
this is my cry for help. I want to get back into it, but it's really, really hard. Yeah, yeah. I don't have a Peloton, by the way. Yeah. That's fine. So hopefully we'll see some changes around this, and it's certainly a worthwhile nudge to the industry to reimagine what this user experience could look like. And yeah, this is one of many redesigns, I think. How many have we published so far? Oh my gosh, maybe at least 10 or 15 so far, and we've got a whole
lot on the way. Yeah, yeah. So check them out in the links. You can also find them on the Nuance website, where we list all of them, and, you know, we share them on LinkedIn as well. So I just wanted to share a bit of what's been going on, some interesting findings, some interesting stuff. Yeah, lots of fun, exciting initiatives launching with the AI Lab and the Behavioral Redesign series. So yeah, stay tuned. And we also have a bunch of
¶ Podcast Highlights and Future Guests
really exciting guests coming up on the podcast soon. That is true, and I'm really looking forward to it. And honestly, I feel like this podcast has been a therapy session every time we've recorded an episode, especially because we've had so many incredible guests. And, you know, I've been trying to make sense, personally, of what the state of AI is and what the state of behavioral science is.
And it's been such a gift to have recently talked to people like Sandra Matz and Susan Murphy, who are both pioneers and leaders in a lot of the areas we're thinking about. And yeah, as you say, there are more conversations like these coming up soon. But thanks again to everyone listening and supporting the podcast. Really appreciate all of the
love we've received as well. So that's a wrap for today, but see you very soon for more explorations of all things behavioral science and AI. And that's a wrap. You've been listening to the Behavioral Design Podcast, brought to you by Habit Weekly and Nuance Behavior, with Sam and Aline. This season is packed with incredible insights about behavioral design and AI, so be sure to subscribe and share the podcast with your friends. Though you might want to keep it away from your enemies.
In case you haven't noticed, I'm an AI voice. Yep, pretty crazy. Quite the improvement since last season's AI outro, don't you think? If you'd like to collaborate with us at Nuance Behavior, where we use behavioral design to craft digital products with Nuance, e-mail us at hello@nuancebehavior.com or book a call directly on our website, nuancebehavior.com.
A special thanks to the amazing Dave Pizarro for our show music, and to Mei Chen Yap and April English for their help in producing and publishing this episode. Thanks again for tuning in. We'll be back soon with another exciting conversation where behavioral design and AI intersect.
