Hi everyone. My name is Patrick Akil and joining me today is Stacey Cashmore, AI Product Lead over at Omniplan and a returning guest from about 3 1/2 years ago. We discussed what she's been building in AI features and how AI has made her more productive as a software engineer. So enjoy. I feel like with AI, people are getting to the end result as fast as possible.
And I just had a conversation on, OK, this path that we're on with AI, certain decisions have been made, certain people were involved, certain companies, certain hype. Is what we have now good for engineering or not? And I was of the mindset of, OK, you can use it, because the gap with regards to knowledge and figuring things out has gotten a lot smaller. So I feel like it can be
beneficial. But then the contrary opinion was, well, if I'm trying to generate something, I just get frustrated with the output and I just go trial and error basically, and I don't learn anything. Yeah. Oh, I've been there with some stuff. I first really started using it because I needed to write something in Python, because I couldn't get the C# libraries to work as I wanted them to. Fine, AI is supposed to be all about Python, from everybody that you speak to.
So that's the best one, the Python route. And I was using Claude, mixed with Copilot. It never gave me an answer that was good enough, but it did always point me in a close enough direction that I could figure it out, and I think that's the thing that scares me: just saying you can use it as a junior developer. Junior developers learn. I wouldn't give it to a junior developer and expect them to be a senior.
I think it would actually go the other way, because if you don't have the knowledge that you built up, then you're just going to take what it puts out and it's going to ruin something. So you really need to keep on top of it. I experienced this at the weekend. I was writing something, I couldn't find something online. I needed a URL for something, and it's like, Auth0 is making it really hard to find, so I'm just going to ask Claude. Yeah. And Claude gave me the URL and
it looked correct. OK. Nothing worked. Oh, really? I was like, what the hell's going on here? Yeah. So I spent probably an hour adding debugging all over my application trying to find out what went wrong. It gave me an underscore instead of a dash in part of the URL. I was like, of course that's a dash. But worse than that, it's like, fine, let's tell it that it's made a mistake. So in that same conversation, it's like, you gave me the wrong URL. It should have been this.
I gave it the correct URL. It's like, I don't understand, you've given me two URLs that are exactly the same. And I was like, no, they're definitely different. One of them has an underscore, one of them has a dash. The URLs are the same. The one that I gave you is absolutely right, it needs to be a dash. It's like, yes, but you gave me an underscore. And at that point it was like, I'm so sorry for the confusion. I was like, yep. So that was on me, because I was just being a little bit lazy. And I could have found the solution with about five minutes' worth of extra searching to get the right URL, but I was feeling lazy and I wanted to move on to something else, and it cost me about an hour.
It's humbling. It's very humbling when that happens, and I think that people need to admit it when that happens, because you hear all the hype and you hear that this is such a phenomenal thing and it's going to make us all so much better. But if you're not critical enough, then it's going to hit you really hard. Yeah, exactly. I mean, I see it being, and I haven't been in that seat, but I would love to try, very powerful for early projects. Basically everything greenfield, everything where we need to get something up and running from scratch. Incredibly powerful, because you usually need many, many lines to get something small done, and that's what it's good at, generating lines. It's great for that. I think as long as you're doing that and you're using it as a proof of concept and not a production system, then I think that is perfect.
You can use it to make a very quick proof of concept, something to just prove your point. Claude makes great interfaces. So if you have an idea for an interface, well, what if we did this? It can make you an interface in like two minutes, using React. It can write and display React. It will write C#, but it won't display a Razor component; it will display a React component.
So you can actually iterate, and then it gives you the code, and then you can improve that code and take the bits that you need to make it work. That I find really powerful. But the code that it gives you, if you use that off the bat, it's not going to be maintainable code. I read it and then I generally refactor at least half of it, make it into something that is going to last a longer time. You do that yourself? Yeah.
OK, manually. I do that manually, because I've tried in the past to get it to do it and it didn't work. Even when you're trying to get the perfect prompt, say I spent like 10 minutes trying to get the perfect prompt when I could refactor this myself in five minutes. It's just easier to use Rider or ReSharper and use their extract method, renaming, all that kind of stuff. You can take that code and you can improve it in half the time that you can get the system to improve it. So yeah, it's nice, it
works. It does help your productivity, but only if you use it right and only if you recognise that you are going down the wrong road early enough. Gotcha. I feel like I've had conversations with people, and usually the thought that pops up in my head is, kind of, you're playing a game and you have this checkpoint system where you make a safe state where you're happy with whatever is generated, and you either manually change a few things, but you continuously make little
checkpoints for yourself. Yeah, in essence, those are commits. Yeah, but I have seen people, also from a start-up phase, go very far with regards to vibe coding, self-interjecting, a little bit of refactoring, a little bit of peer review, and then making a full-fledged
production system out of this. I have not seen any example, to be honest, where there's a strong engineering culture in an existing organization, strong conventions, where vibe coding then is also very effective, because somehow it doesn't adhere to the conventions that you have within your particular organization. It adheres to the general developer on the Internet, who does not understand your code base. Yeah.
It's also not necessarily great to be the general developer on the Internet. The code is taken from so many GitHub repositories that may or may not be maintained. They may or may not be made by people who understand what they're building. They might be experiments or tests or learning repos. And I think that can be the issue. It's the code that I've seen it
generate. Personally, yes, I could probably make something vibe coding with it, but only if I didn't care about fixing bugs or expanding it in the future. Gotcha. Yeah, let's zoom into some of the AI projects that you've been working on, because I want to get into productivity and I want to get into kind of results on a project basis. So let's start there. What have you been building with regards to AI as well?
The first thing that we built was a conversation summariser, because basically I managed to get the role of Product Lead AI. I'm a huge AI skeptic, which I thought was hilarious. Speaking to the guy that's been helping us, he just said, no, being an AI skeptic is awesome, because you're not just going to go down the this-is-cool path. You're going to think about what you can use it for. And that's what we were
doing. So we had a meeting, like, we don't just want to do a really bad chat bot, which is obviously what everybody says: oh, you've got to put a chat bot on your site, and we need customers to just be able to chat with a chat bot and get a mortgage. I'm like, have you ever tried that? Chat bots are not designed for that type of interaction, because everything here is known. So I didn't want to go down that
street with the first project. And we had three ideas that I gave a presentation on last year at the VIP Congress for the mortgage market. There was a chat summariser, there was the chat bot, because it has to be there. You have to have a chat bot. Of course. I really hated giving that part of the presentation, but it needed to be there. And I think we had another one for e-mail generation. Gotcha.
When you've had advice from an advisor. We were expecting people to want the second one, but apparently the conversations that the team had with people who attended said the chat summariser sounds awesome, because we spend so long making notes that this would be such a boon for us. It's like, awesome. Allowing for the fact that if you're online, most software these days comes with a transcription,
we needed to see what value add we can put on top of that, because there's no point in doing something if they can say, well, I can get it over here for free or in an existing thing. So what are you bringing to the table? So what we wanted was a conversation that you can start inside of a customer's dossier so that everything is recorded in context with that customer. So you don't have to switch between applications; if you ever want to get the transcript, you can all do it inside of one system.
So no context switching. And also, at the end of the conversation, make a financial summary of what was in the conversation. So not just a baseline tell-me-what-was-spoken-about, but really, from the point of view of the financial world, what do we need out of this conversation? We managed to build that in 3 1/2 months, which I was really pretty pleased with. I have a tiny team. It's me, and I don't work many hours and I'm stretched
thin. I have a back end developer working with me who is just superb, complements me perfectly, because he really wants to cross the Ts and dot the Is, which I want to do, but I also know where my head is most of the time. So we complement each other really well there, and when I need it, I have a front-ender to help us. And three people, 3 1/2 months, getting our first product live, I thought was really quite cool. It's live now. It's live now.
Awesome. So we managed to get that one live and now we're working on a number of things. So taking what we learnt from the implementation of that, we have some longer-term things that we need to build, where we know we can't roll out quick releases because it's not going to have value until there's a system in place. But we don't want to stand still, because you can't stand still in
this market. So we're trying once a month to release extra functionality, so that we can carry on with the cadence and it stays fun to program, because you're always doing something new as well as the long one where you're not going to get that dopamine until later on in the year. So we just released the second piece of functionality, and that is generating emails based on that conversation.
So you can open the conversation, you can generate an e-mail and you can style it to the customer that you're talking to. So some customers want something that is really, I forgot the English word, accessible, and they just need an accessible, short description, because that is where their knowledge of the financial world is, which is a fine thing. You don't need to have that knowledge, why would you? You can also make it more formal.
You know, some people prefer more formal communication. Some people prefer more friendly communication. Because of my history and the things that I've built, I have a reasonable understanding of what I'm doing. So I want some of the technical details in there because I know what I need to be looking out for.
So being able to do this in just a couple of seconds for your conversation, tailored to the needs of your client, and then being able to edit it for anything that you think, OK, this can be better worded. Yeah. So you can save a lot of time in sending these emails that way. Awesome. So that went live this week. That's really nice. That's very recent, actually, very fresh, yes. Yeah. So we really started in
November trying to do this. We talked about it before then, and November was when the really big push started, and we've kind of learnt that doing an amount of experimentation before you start building saves you so much time. Interesting. Experimentation to what degree? Like, to make sure whatever you do is achievable, or the format in which you're trying to achieve it, or? A little bit of both. So the way we did the experimentation: in the first month, we ignored all of our
deploy things. It's like, no, first of all we want to know how we can get data out of the two or three models that we have to interact with in order to get our summary, because it both summarises and diarises, so you know who is speaking at what time. And then it does a summary at the end. I was like, OK, so how are we going to make sure that we can do this in a viable way? So we started off with just a console application. You just gave it a link to a web file, a video file, whatever we
had available. And we could run through each of the steps and check that we were getting the data out that we needed for each of those steps: for the LLM, for the summary, because that's the one where you really need to prompt. The other two are solved problems; we can just interact with existing infrastructure. So we use Azure Speech Services for that, because why should we struggle when Microsoft's
already done it for us? For the summarisation, that's where the LLM and your system prompt really come into play. So we must have spent a couple of weeks experimenting, trying to find a system prompt that was good enough to get the information that we needed, but also produced good enough results without hallucinations. Because it is financial data. The last thing that you want is hallucinating something in the conversation that can throw off anything in the future.
Yeah, of course. We also learnt something the hard way there. Our original idea was, when we have this summary, it has all of the financial subjects belonging to that conversation. So your job, your salary, your mortgage, what loans you have, what savings you have. We have all of this information, we can take it out of that conversation and we can pre-fill our data model with it, and that would be awesome. We gave up on that after about half a day. Oh, that's quick.
Because we saw the dead end that it was walking into. The amount of context that you need in order to correctly translate these things is huge. So we set up a very quick demo, that was with Claude, by the way, actually in Claude in the browser. We had a fake conversation, recorded it, transcribed it, gave the transcription to Claude, and Claude made a fake, I think we just started with a job, so a job financial-subject form, and it got most things
wrong, because... From the conversation? From the conversation, yeah. When you're talking about a job, you don't always talk in the same context. So, do you earn 3,000 a month? Does that include a 13th month? Does that include vacation money? Is that per four weeks or is it per month? And these things were just so hard to get correct. And the worst one that we tried with it during that morning: we had a conversation where, I think, 5,000 was the salary, and it came out as 15,000 a year.
I was like, OK, how are we going to get it to fix this, right? Yeah. This isn't supplying the value that we want it to. It's going to take us too long to fight this. There are better ways of getting information into a dossier of a person. We have source data in the financial world now. We have companies like Ockto where you can log in with your DigiD and you can give documents and information to
your financial advisor. So trying to guess it from a conversation, we realised that that was going to take more work and more checks by the advisor than it was going to save. So we struck that one off. We didn't run down that alley too far, because we could see that it wasn't going to supply value. And if it's not supplying value, what's the point in stressing over it?
Even something as simple as this, like extracting information from a conversation and then the topic of it being financial loan based, I was like, yeah, that should definitely be doable, right? But that's a huge assumption.
And if you take that and you run with it, and in the end you come to the same result that you came to in half a day, then that's a problem, because you already have so much sunk cost in it: oh, we've been trying to do this, we've designed the user interface like this. Basically, you would then get into this cycle. That's my assumption: it's the cycle of trying to fix it continuously. You get into the sunk cost fallacy. And I am sure, before people put it in the comments, I am sure that it is possible. It's definitely doable. But the question is, what is the cost going to be versus the value you can extract from it? Because this is only valuable if people are prepared to pay for it. How much are they going to pay you a month to use this? Is that going to cover your
development cost? Is it going to cover your LLM cost? That's the other big one here that people don't think about. Did you think about that? Yeah, we figured it out for our conversation summary. It costs about €1.50 per hour. OK. Using Microsoft's resources. Yeah, sometimes more, sometimes less. And that's a spoken conversation. That's a spoken conversation. That's a lot, to be honest. It sounds like, €1.50, well, people are going to pay €1.50 for their transcription. But if you're having three calls a day, five days a week, then you're up to 15 per week, you're up to 60 per month. You need to cover your development costs, your overheads. Are you going to be able to get somebody to pay an extra €100 a month for this? So that's where you have to start figuring out, OK, at what point can we say this bit doesn't provide value and we can approach it a different way and save money for us and the client.
And that's a lot of what we've been doing as we've been working on this. Gotcha. Now, with the features that you've put into production, you explained the one that went live very recently, and the conversation analysis that you've done, how do you then measure success? Is it by how many times people go in and manually change something? Is it by costs with regards to the output and the outcomes? We are measuring it in a couple of ways at the moment.
Because we're in total experimentation, we don't have anything solid in place yet; that we're going to have by the end of this year. So our original idea is how many people are going to be using it. It's currently in beta. We're giving people free access whilst it's in beta so that we can see what they think of it. And we have asked for feedback from our customers on top of that. Yeah, there's the cost. How much is this going to cost in the real world, in a real-life situation?
Are there places that we're going to see that we do need to improve this with regards to making it affordable? And we're working on that, and we're improving it all of the time so that we can, yeah, get the best value proposition out into the world. But it's all things that you really have to take into account when you're starting this. We started, I started pessimistically.
And yet still, at the end of that first brainstorming session, we had such ideas, which very quickly proved to be non-starters based on what we could see coming out of it. And I think that's something that a lot of people miss. Yeah, yeah. It might be the market that you are in as well.
Like, in the financial domain, things very much end up being perception-based. As a customer, if the information is factually incorrect, we just had a dialogue and I gave you all the correct information, that can be a killer with regards to brand and recognition. That's something that we spent a lot of time on. We would rather say that there is information that we can't give than give incorrect information.
That's the worst, incorrect information. Yeah, it's better to say, nope, sorry, than to give something wrong. It can have serious consequences if the advisor takes it at face value. You're using AI, don't take anything at face value. So the advisors still need to be in control here. Gotcha. But if you continually give them bad stuff, then they're not going to trust it. They're not going to use it.
Yeah, that's it then. How did you then measure what the accuracy of your system prompt was? Because you mentioned, OK, for us, getting the minute details of a system prompt correct was actually a decent chunk of work. It was a decent chunk of work. We measured it. We have a number of conversations recorded by our financial advisors, because our product owners and consultants are all qualified financial
advisors. So they knew what the conversation should be. So we had a number of different conversations that we could feed through the system. My team can look at those and pretty much figure out if it's right or not. But we also gave it back to the financial advisors. So is this what you would expect out of this conversation? And we tuned it based on that. Got you. Yeah.
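A minimal sketch of the kind of check described here: run recorded test conversations through the summariser and flag summaries that miss facts the qualified advisors expect to see. The types and the naive keyword check are assumptions for illustration; as described above, in practice the advisors judged the output themselves.

```csharp
// Illustrative evaluation harness, not the team's actual tooling.
using System;
using System.Collections.Generic;
using System.Linq;

public record TestConversation(string Name, string Transcript, string[] ExpectedFacts);

public static class PromptEvaluation
{
    public static void Evaluate(
        IEnumerable<TestConversation> conversations,
        Func<string, string> summarise) // wraps the system prompt + LLM call
    {
        foreach (var conversation in conversations)
        {
            string summary = summarise(conversation.Transcript);

            // Naive containment check; a human reviewer still makes the final call.
            var missing = conversation.ExpectedFacts
                .Where(fact => !summary.Contains(fact, StringComparison.OrdinalIgnoreCase))
                .ToList();

            Console.WriteLine(missing.Count == 0
                ? $"{conversation.Name}: all expected facts present"
                : $"{conversation.Name}: missing {string.Join(", ", missing)}");
        }
    }
}
```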
Are you? Because I'm working now also in the financial domain, we're trying to make people more productive by taking something manual they're doing and then generating it through AI, kind of pre-filling it, keeping a human in the loop for validation. And one of the benefits that we have is that this process is already two years old, which means we have two years' worth of historical data. So we know the PDFs, we know the outcomes, and now we put an LLM in between.
So we have a very easy way of measuring whether the output of our prompts is equal to the historical information that was actually captured. That's how we are doing kind of the validation and analysis. And I really like that; from a surface level, it sounds very good. We have ground truth. We can work with that. The problem there was that the human factor of error is also in there, and those aren't manually corrected.
So if we're going to fine-tune towards 100% with regards to our prompts, then the same human error factor will be in there. So we actually had to go into, OK, which documents are valid, qualitatively, with analysts being like, OK, this is valid, and this was actually incorrect and we should probably make an amendment for it. Yeah, yeah. But did you have something similar with regards to historical information, or was this very much qualitatively
done? We didn't have access to historical information for this. Yeah. We don't have access to our clients' data, obviously, because that would be bad. So we don't have access to that data to run against. So we are running against our own tests, and that's the point now of putting it into beta for the financial advisors, so that they can try this and they can get back to us, and we will ask them proactively for feedback on the issues that they're
having. We've already seen one issue, which we managed to solve, that was really odd, but that's why you have things in beta, and the people using it can then also accept that, OK, it's 99%, it's certainly not 100%. You need to be careful, but it is still saving them time. So we're happy. How was it with regards to their adoption? Because you're mentioning, OK, they're happy now.
Were they happy from the start? Were they kind of fighting against this change, or what was that adoption curve like? The adoption curve, yeah, it was very low. OK. Because, yeah, it is AI coming in to do stuff. And do you trust it? So we have a lot of communication that we are building up regarding this functionality. Gotcha. Yeah. So we're going to start to do a video series on it as well. Oh, that's nice. Because telling somebody about
something is one thing. Showing them it in action is something entirely different. Yeah, exactly. So that's, yeah, that's what we're currently working on. It's a little bit of content creation, just to make the job more interesting and a little bit different for a while. Yeah, I like that a lot, actually. It's something I didn't
necessarily realize, right. You're building something to make people more productive, but the only way they are going to be more productive is if they actually embrace whatever you've built, regardless of whether functionally it does the right thing. Especially with something so delicate as generating the e-mail conversation, the content of things, the intent might be exactly the same, but
the phrasing can be different, the nuance, and that's all either personal preference or perspective. That's so personal. That's the reason why, for the e-mail generation, we wanted to give them the option to use different tones of voice. Yeah, I like that. I imagine at some point we're going to be able to give them the option to upload their style of writing, because that would be perfect for it. But what we're trying to focus on is not the things that make their job fun.
And I have the same as a developer. I don't want AI to do the bits that make my job fun. I want AI to do the bits I don't like doing. So I will use it to scaffold. I used it at one point to transfer a load of things from JavaScript to C#. And it's great, because you can give it a nice prompt, it would do that work and it saves you two weeks' worth of boring typing. And that's not interesting to me. Writing the new functionality, that is interesting to me.
And I wanted to do the same thing for our customers. I want to make sure that we are not taking away the bits that make their job enjoyable. I want to take away the bits of their job that detract from what they're supposed to be doing. That is incredibly smart.
Like, if there's a good use case where there's a lot of time spent on something that people really don't enjoy, and they actually want to spend their time on the tiny bit that they really enjoy, then that's the perfect use case. Yeah. And the next step of what we're doing at the moment is taking that also to their clients.
So it's when you buy a house, one of the most annoying things I found was uploading all of my documents because it's not enjoyable. You've got to find everything. You've got to upload everything. Make sure you're uploading the right file for what you're trying to do. And it, it is cognitively a high load, but it's also very boring,
which is not a good combination. So what we want to do is allow people to just see that they have to upload six or seven documents, however many it is. And all they have to do is just grab the right documents and just upload them. They don't need to say what these documents are, they just need to upload them. And then we can recognise what documents they are. Then we can validate them. Because once you know what type of document it is, you don't need AI for the validation, because you can just get the values. And that is a prescribed process. Now you don't need any generative process in there. In fact, you don't want any generative process in there. It is explicit: if it is a different number to this, then you know it's not right. But the idea that we have there is to reduce the lead time in a
mortgage. So I know that for our documents, that was an extra week added to the process because the advisor has to send you the request, you have to upload everything that then has to get validated. If there's any more documents needed, then you have to go through the same process again. And what we want to do is allow customers to just upload their documents, have them instantly
validated. So they're going to be told straight away, this document you've uploaded is invalid, explain why it's invalid, and then let them upload the correct one, so that instant feedback saves one or two days. And if we can make it easier for the client to upload it, do the pre-validation, so that the advisor, they have to be in control and they have to approve it, because again, financial data, you don't want unknown things being approved.
But it goes from them having to go through the document and extract the data themselves to: this is a document, this is the information in it, do you agree? So they can go from an hour's process to five minutes. And again, as far as I'm aware, that's not the bit that makes their job fun, checking documents and copying over numbers. They are more into the, how can I get the right mortgage for this person? How can I help this person achieve their financial dreams?
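A minimal sketch of the split described here: a classifier identifies the document type, but the validation itself is plain deterministic rules with no generative step. The document types, fields and the salary-slip rule are assumptions for illustration only.

```csharp
// Illustrative only: explicit, rule-based validation once the document type is known.
public enum DocumentType { Unknown, SalarySlip, EmployerStatement, BankStatement }

// Values read from the uploaded document (by whatever extraction step precedes this).
public record SalarySlipData(decimal GrossMonthlySalary, string EmployerName);

public static class DocumentValidation
{
    // No generative process here: either the numbers match the dossier or they don't.
    public static string Validate(DocumentType type, SalarySlipData extracted, decimal statedGrossSalary)
    {
        return type switch
        {
            DocumentType.SalarySlip when extracted.GrossMonthlySalary <= 0
                => "Invalid: no gross salary could be read from the document.",
            DocumentType.SalarySlip when extracted.GrossMonthlySalary != statedGrossSalary
                => $"Invalid: document states {extracted.GrossMonthlySalary}, dossier states {statedGrossSalary}.",
            DocumentType.SalarySlip
                => "Valid: figures match, ready for the advisor to approve.",
            _ => "Unknown document type: ask the client to check the upload."
        };
    }
}
```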
And yeah, so get rid of the boring stuff and let them focus more on that. Exactly. Something you mentioned kind of stuck with me, with regards to, OK, we have a document, we need to retrieve the values from it. A lot of people go RAG, at least that's what I've seen, right? Take all the contents of something, feed it into an LLM and then generate the output. But you mentioned specifically you don't want to generate, which, yeah, in essence I agree
with, right? It's black and white: either it's in there or it's not. So you can also just retrieve it. Yeah, I would say, this is one of those things, I'm not sure I'm allowed to say what we do. OK, that's OK. Yeah. But I was wondering, though, why people are going the generative route. Like, I understand it if documents are different and if they're all images, then it becomes more complex, but if indeed the format is the same,
that's a solved problem. The issue is the formatting is not always the same. Gotcha. That I can say. There are very few documents that have fixed formatting in the financial world. So ID is an awesome one. More than that, there are models on the market that you can put an ID into and that specifically look for ID information, so that is perfect. But for something like a salary slip, I have not worked for a single company that has a salary slip in the same format. That is a big issue.
It always has the same information, but it doesn't have it in the same format. So that is what makes that bit quite tough. And the same for the one that we had that was really interesting, because it really showed the edge cases that make or break something: the employer's statement of intent, which you need to buy a house. Your employer has to say, yes, we're going to carry on
employing this person. When asked about that, financial advisors came back to us, our internal ones, and said, well, there is a form that is used nearly always, so that one we know exactly what it's going to look like. I've never used that form. Well, yeah, it's 90 to 95% used. It's like, you've just said that form is not valid for all of them. So unless you're going to say we only support this form, it's not valid to say that this is what
it has to look like. So you have to catch all of those edge cases, even when it looks like it's something that's really simple. I mean, in the end, that's programming. That's engineering. It's like, these are hypotheticals, but I have to cover them. Yeah. Like, that's the thing here. Yeah. I want to get into more of a day-to-day.
What's in your tool belt nowadays? Because you mentioned Claude, you mentioned Copilot. What do you use to make yourself more productive, or what is being used generally within your team? Within the team, we have a choice of Copilot or Claude, so people can use the one that they're most comfortable with and what helps them the most, or they can use both. You don't have to be limited to one or the other. Copilot is better at some things, Claude is
better at other things. We had a training morning, or training day, where we had somebody come in and help us learn how to prompt, and then we had an exercise in the afternoon of, pick something that you need to do and try and just do it with AI, to try and get people to really see the value that it can bring. This is also a really tough process, because people have their ways of working. As much as humans are all about change, humans hate change. Hate. Yeah, yeah, I fully agree.
And so it was a little bit of just trying to show people what was possible. And we have a nice uptake, I think, not as good as it could be, but we have a nice uptake, and people are starting to learn how to use it and starting to learn that you can use it for more things than you originally think. Yeah. So, for example, when I got started in Python, I did the absolute vibe coding of, I need to do this, give me it. And it gave me a
Python script. It's like, OK, so that's Python. And I ran it and it failed, and it's like, OK, it failed with this error. And I got it to the point where it worked. And then I looked at the structure and tried to learn a little bit of Python whilst I was doing it. And from that point on, it was kind of like, I would ask it about APIs as much as anything else. It got it right 50% of the time, and it saved me time because it got it right 50% of the time.
And the other time it got me close enough that I could track down the issue myself. So that's one way. Obviously coding is an obvious one, but we also use it for brainstorming. Something that I love about using generative AI is helping you think through a problem. So rather than going and asking it explicitly for something, I tell it to ask me. So, I have this problem, I'm trying to figure out how to solve it.
Interview me and ask me questions, and tell me, or help me figure out, what I need to do. At the end of the day, the AI doesn't tell me anything, it just helps me think a problem through. So it's a rubber duck. That one I have found super useful. It helped me when I had to go to the doctor's for one of my medical issues and I needed to get all of my symptoms down on paper. Because of the way my brain works, that was really hard, because I always forget something.
So I got it to interview me, and it never put any symptoms in itself, but in asking me questions I could tell it scenarios, I could tell it things that were happening. And at the end of the day I had a three-page Word document that I could take to my doctor, with all of my symptoms organised by criteria. So, this is neuro, this is gastro, this is musculoskeletal. I apologised to the doctor, saying I hope you don't mind me bringing this. And he's like, this is awesome.
Can I have it in electronic form? So I could send in the Word document, and it was so much easier for him to then gather my symptoms. So using it for that type of thing was a wonderful experience. I also used it to buy my new car. Oh, really? Yes. I couldn't figure out which one to get, so I told it the two options and then told it to interview me about what I
want from a car. Fun thing is, I bought the opposite car to the one that it told me to buy, but because I went through the process of doing it, rather than it just going round and round in my head, I could actually lay it down piece by piece, and it helped me realise which one I wanted. Interesting. So there's that use of AI that I think people haven't quite got their heads around yet, that can not only improve your productivity, but also allow you to think things
through more deeply. That's what I'm thinking as well, yeah. It's a way for you to bounce ideas off of something, and it's very strange, because it could just be you on the other side. In essence, there is nothing there; it just generates, kind of statistically, whatever sentence structure is the outcome. And it loves generating, so you also have to limit it. But if you use it for self-reflection, I don't see how that could be anything other
than being beneficial. And especially in engineering, you just need to kind of get out of your own head to a certain degree. So to then talk through a problem with something as kind of a mirror has always been helpful. That's why we have the concept of the rubber ducky in the first place. So then using this for that, I think, is genius. Yeah, it is a really nice use case.
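An illustrative version of the "interview me" prompt pattern described a moment ago; the wording below is an assumption, not the exact prompt used.

```csharp
// Hypothetical prompt text for the interview / rubber-duck pattern.
public static class InterviewPrompts
{
    public const string RubberDuck =
        "I am trying to think through a problem. Do not propose solutions and do not add " +
        "information of your own. Interview me: ask one question at a time, build on my " +
        "answers, and at the end produce a structured summary containing only what I told you.";
}
```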
I think it's a slight tangent, but one of the things that you said there reminded me of something that is so important to remember about AI, the fact that it's predicting statistically. Yeah, something that a lot of people really don't understand about AI. Even if the model is called a reasoning model, it is still a prediction engine. It will do cycles of prediction which allow it to give a more accurate answer, but it is still a prediction engine. This thing is not thinking, it
is not reasoning. It is predicting text, taking that as extra input and using that with your prompt in order to try and drill down and get to the correct answer. If you ever use Claude, turn on reflection and you can see the air-quote 'thought processes' that Claude goes through to get to your answer. And you can also see how it switches as it produces results. And that triggers it to think, OK, this is what they're saying. And it allows you to have much more fluent conversations with it.
But it's fun watching that. OK, it predicts one thing and then it goes back, like, nope, that wasn't the right thing to do. Yeah, exactly. Yeah. I mean, in essence, I understand that in the end it's just predictions. Stuff like, OK, I'm going to toggle it on and I can see the reasoning, that doesn't help, because in the end it's still statistics.
But the argument is always, OK, we're going to do statistics so well to a point where it's going to equal reasoning, but in the end, it's still a prediction model. Yeah, yeah. I I think that the thing that humans still have as a huge advantage is our ability to think laterally. The fact that you can just go from one topic to something completely different in a way that works in the human mind that can really help you move forward with a problem is still something that I don't know if
they will get to that point. Yeah, I think we're a lot further along than I thought we would get by this point. We're going fast, very fast. When I was at university in the 90s, I did a semester on AI as part of my course. We had an AI professor. He was involved in so many projects and showed us the things that it was being used
for in awesome use cases. So whether that was facial recognition for security systems, or, I believe it was Germany, you might need to fact-check this, but I think it was Germany, for their breast cancer screening in the 90s. They didn't like using mammograms, because you had to use X-rays and X-rays are bad, so they used ultrasounds. Ultrasounds are much harder to read than X-rays.
So they trained a model to read the ultrasounds, and then that gave extra input to the doctors, and false positives were filtered out by doctors. But they found cancers at a way earlier stage than they would have done. And that was back in the 90s. So our professor was absolutely not anti-AI. He was full on with this. But he said something that has stuck with me since: that AGI, is it AGI? Yeah, is probably never going to
happen. And the example that he gave us of why you cannot make a thinking engine is because we have qualia in our thoughts. I had never heard the word before that time and I've never heard it since. But it is things which are unexplainable. So if I look at the bottle cap that's in the middle of the table, it's blue. You also know that it's blue, but you don't know that I'm seeing the same thing as you. I've had this conversation also, because I'm partially
colour blind. It's very strange for me to be like, what colours do you see? Yeah, yeah, I can appreciate that one. And think, what does a strawberry taste like? Yeah, describe the taste of a strawberry using something that somebody else can have an absolutely concrete grounding on, without having to reference other things which are only experienced inside of your head. And to make a fully autonomous AI, you have to solve this
problem. And he didn't see it being done in our lifetimes or our children's lifetimes. He didn't say it's impossible, he just didn't see it happening then. And if I look at the engines now, just being prediction-based, no matter how good they get, you're still not going to get to that essence behind it. Yeah. It's not just having the knowledge of the world, it's also, yeah, the experiences, as in, what do you see, and is it equal to what someone else
sees, or with taste as well. And then touch. Yeah, it's like all senses. I never thought about it like that. It's a very strange thing. I'm aphantasic, which means I don't see images in my head. OK. The whole apple test that went around the Internet a couple of years ago: close your eyes, imagine an apple. I see black. I don't see anything. I thought that was normal.
So, you know, imagine you're on a beach and I would have a rolling commentary in my head, you know, it's like you're on a beach, blue sky, maybe some clouds. I didn't realise that people would actually see this and I don't, not at all. I can describe it, I can't see it, and I think that itself shows the fundamental lack of understanding that we have of how a brain and a mind works, that getting to that point of an actual autonomous AI is still a pipe dream for us. It's a huge leap.
I I can see it getting better, I can see it getting more accurate, but I don't see it making that jump. To finally then circle back because we touched on what you've been building, what you've put out in production, and also what has made you more productive.
Do you think, with the introduction of AI and kind of the tool suite that is available to software engineers now, is it going to make software engineers better at what they're doing, or is it more harmful with regards to kind of their thought process intellectually and also their output? I see people being more productive, but I also have definitely been humbled by how lazy I've become for certain problems. It's like, yeah, I just want it: here's the error, just give me the outcome.
Yeah, I think it all depends on the developer as well. I saw a talk given a month, month and a half ago. If you give junior developers AI, they don't improve, they get worse. If you give medior developers AI, they can stay pretty much as they are.
If you give seniors AI then you can start to get the benefits because they can see the good and the bad of the output and work towards that and use it for the things that are going to save them time rather than chasing their own tail trying to get a prompt to produce something that they could have coded in less time. I had the same issue. Sometimes I kind of feel like, OK, you could have just done that. I was like, yeah, but I did this in like 5 minutes and I didn't want to type it.
Yeah, exactly. Yeah, I got the result I wanted and I didn't do a thing. Yeah. So I see it for that, but I think it can also help the senior developers with checking things as well. I've done that both for code and for non-code. You can have your code, you can have your ideas, and you can ask the AI then for improvements, and then you can decide which improvements you want, which improvements you don't want. You can also ask it to describe your code, and I found that far
better. Rather than just saying, what improvements do you see with this code? Just giving it code and saying, what does this do? And if it comes back and says what you want it to do, then you're probably on a reasonable path. And if it comes back with something else, then you probably know that it's not going to be readable in six months. I think it helps with maintainability. That's interesting. I also use this for my talks. I used it in writing the book as well.
The book is all me. I did not use AI to write the book, but every chapter that I finished I gave to Claude, just with a simple prompt to tell me what this is, and whether it told me what I wanted to convey in that chapter. Then I would do a follow-up question of, what is the target audience? And if it came back with, this is a beginner's book taking people through X, Y, Z, and these bits are really good for that, then I know, OK, that
chapter's done. Sometimes it would come back with, that's not really what I meant, or that's too advanced, and I would have to rewrite that bit to make it better. Having that as a first editor was just phenomenal, because it just meant that I could pass it on to my publisher and to their editors and the technical editors with a lot more confidence. Yeah, that I was doing the right thing.
Yeah, exactly. I mean, when you're writing something for an audience or you're preparing a presentation, you always do a self-validation, right? But that's your perspective. Absolutely. And I love how this is a safe space. You can give it to something that is AI and then be like, OK, yeah, I could do another pass myself. Yeah, I could do another pass myself. And I did a blog post once and I rewrote it from scratch based on the
feedback that I got. Yeah, I thought it was kind of OK, but it threw up so many, not technical errors, but contextual errors. So it was like, readers are going to struggle getting through this. Oh, wow. I was like, OK, maybe you've got a point, because, yeah, I've read it, but I just wrote it, so of course I know what it means. Of course. So having that feedback allowed me to write something much better.
Yeah. I wonder, because you probably haven't done this, if you could then A/B test in a perfect world and say, OK, this is the first result, and then this is whatever the changes were after AI validation, let's say. Because from your perspective, the feedback that it gives probably seems plausible, right? But that's also its job, to give you plausible feedback and be like, here's
something. I think that's why I try not to tell it; I try not to ask it to validate what I'm saying. I try and keep it as vague as possible. I don't ask, is this a good chapter? It's, what is this chapter? Who is it for? The only time that I would try and get it to do self-validation is when it's found some things and it's like, I kind of agree with it, but I don't think it's that important. And at that point I will go for the
95%. Published at 95% is better than never published at 100%. Yeah. And then it's OK, if that's what you're after. Sometimes it's like, this is fine; yeah, that's nitpicking, it's fine. And other times it says, these bits are fine, but this bit's really important. Yeah. And then it's like, OK, so I know to look at that bit again. Gotcha. I found that so useful also because I know I'm a perfectionist. So actually asking it, OK, I'm a
perfectionist, is this acceptable, without actually saying that question, was also a huge boon, because I know that I spend way too long writing my chapters at the best of times, and they're at the stage where it's like, I could improve that, I know I can improve that. And it's like, no: if it says 95% and all the examples work, then I know I'm there. Yeah, I love those insights, because my mind usually goes to, OK, I'm trying to confirm something or I'm trying to deny
something, and I ask it in that way. But then it's such a, yes, Patrick, so on and so forth. It always says yes. Or if I say, well, what about this, then of course, you're right, and it gives me reasons why. So the vagueness, or at least the open-endedness, of your questions in kind of validating, I think that's super important. Yeah. Yeah. Awesome. I've really enjoyed this conversation. Stacey, thank you so much for coming on.
It's been 3 1/2 years, I think, something like that, Something like that, yeah. Thanks for having me again, this has been fun. Awesome. Then I'm going to round it off here. If you're still here, let us know in the comments section what you thought of this episode. It's the best ways to support the show and we'll see you in the next one.