Can You Really Trust AI-Generated Code? - JSJ 699 | JavaScript Jabber podcast

Speaker 1

00:05

Hey, folks, welcome back to another episode of JavaScript Jabber. This week, I'm your host, Charles Max Wood, and I'm here with Itamar Friedman.

Speaker 2

00:14

Now, Itamar,

Speaker 1

00:14

do you want to let people know who you are and what you do? I see your shirt; it says Qodo, so do you want to talk about what they do? And then we'll dive in and talk about whether or not to trust your AI-generated code or code review.

Speaker 3

00:30

Really happy to talk about that topic and to be here, Charles. Really a pleasure. Developer communities are really awesome, especially as they're more

Speaker 4

00:38

specific, when you get into the details. Qodo, as you mentioned:

Speaker 3

00:42

The name Qodo stands for Quality of Development, and the main focus of our platform is AI code review. In general, we deal with quality, different AI quality workflows, et cetera. Basically, we help enterprise and professional dev teams standardize their quality via the review process or

Speaker 4

01:01

shift-left code review and testing.

Speaker 3

01:03

It's, okay, yeah, serving, for example, thousands around the world. But I think it's really, really exciting, for example, just to talk about whether code review is important at all, right now and in the future.

Speaker 4

01:15

Forget about Qodo. That's, I think, a cool topic.

Speaker 1

01:18

Yeah, absolutely, Well you know, you kind of hit on something that I think a lot of people are talking about with the AI stuff, you know, whether it's a code review or AI generated code. Maybe it's AI generated code that's code reviewed by AI.

Speaker 2

01:32

I mean, I don't know.

Speaker 1

01:33

It seems like there's a lot of concern as well, as far as, okay, well, if I've got, you know, an AI, an LLM, generating code, am I even going to have a job? Or, you know, maybe they just downsize most of my team, and so I have to be the most elite of the awesome elite at my company. And so, yeah, there is a lot of concern there. And then the other piece of it is, okay, I've got this powerful tool; am I using it right? And so let's talk about the code reviews first, since

02:04

that's where you're kind of living these days. And I don't think we've really gotten into the AI code reviews, to be perfectly honest. I saw that GitHub does some like you can turn this on for some of your repos and I haven't even tried it because it's just like I don't know, I mean, at work, you have to have a human code review your stuff anyway.

Speaker 2

02:25

And then on my personal stuff.

Speaker 1

02:26

It's like, like, I guess I did turn it on on one project and I was not that impressed. So so tell me where this fits in and maybe where I'm not using it to its full potential.

Speaker 3

02:39

Yeah, these are good points. So you touch on an interesting point. They're related and different. So let's tackle them. First, about losing your job: you got to that point really quickly. I think, first of all, about the next few years. I'm not talking about fifteen years. It's hard to predict, especially the future. So let's focus on five years, and we can go further out. I do have my opinion,

03:01

a strong opinion, about the next five years. Forget about it.

Speaker 4

03:05

Sorry.

Speaker 3

03:05

Dario from Anthropic and Sam Altman were claiming, I don't know, half a year ago, one year ago, that by twenty twenty-five you don't need more developers; ninety percent of code generated by AI; SWE-bench, if you know this benchmark, a software engineering benchmark, going to be at ninety-nine by the end of the year.

Speaker 4

03:20

We're far from all of that.

Speaker 3

03:22

And now I claim that while they decreased their predictions for twenty twenty-six, lower than twenty twenty-five, I think even there they're wrong. But that doesn't mean that the future is not going to be strong. Right now, it already is, and it's going to be more and more AI-empowered. And yes, at some point there is going to be an inflection point where we're actually going to see end-to-end automation of

03:44

certain aspects of software development. But we can talk about it via code review. Specifically about, you know, Copilot, et cetera: we are actually good friends. We have multiple clients where we are partnering together. They focus more on code generation and agentic workflows, or on pushing, like, a GitHub issue into a PR much quicker. We're more focused on how do you know that you can trust that line of code. And I'll elaborate on that now.

04:12

Sometimes there are some, you know, features. Think about the cloud: clouds have cloud observability tools, but you still have the Datadogs of the world. And that's the difference between a few features that the cloud has around observability and a full-fledged platform that gives

04:27

you the confidence. So I won't go deeper on that competition, because it's actually quite symbiotic: the more people use Copilot, the more you need Qodo in order to trust the code. So it's actually good.

Speaker 4

04:39

But code review specifically, I think some might

Speaker 3

04:43

think about it as a synonym for where you deal with software quality, but it's actually different; it is part of it. But let's think about the purpose of code review. Its meaningful purposes are, first, owning the code and learning: two different, very closely related buckets, as a team. First, the developer is mostly single-handedly developing a feature or a sub-feature.

05:08

And then there's that moment where the team takes responsibility. If that person is going to be on vacation, who's going to do root cause analysis if something happens, right? So it's a moment where you, like, learn each other's code, learn the software, and own it together, even if an AI generated it completely, one hundred percent, from one prompt to a PR, which is rare. But let's assume; we're going to maybe talk about spec-driven development,

05:32

et cetera. But let's say you did write a spec. Yeah, let's say you did write the spec completely, like, perfect, man, and you've got a feature. Even then a human needs to take over, at least for the next few years, and take responsibility. So that's one thing about code review. And the second thing is that until now, and it's going to be so in the next few years, even if you do the best spec and the best, like, work, you're still going to have AI that needs, sorry,

05:57

code that needs to be reviewed. And there's a difference between a tool that is meant to help you review and standardize quality and one that is trying to help you generate code. Same as how there's a difference between cloud observability and Datadog. So, TL;DR: hey, don't worry, in my opinion, about your job. I would even recommend to people, differently than all the others, I

Speaker 4

06:20

Think nine and a half the industry.

Speaker 3

06:22

do go learn, you know, computer science, but do work with AI as one of your main tools throughout the SDLC: code generation, code review, root cause analysis, generating your spec, everything. And you will just elevate together with the profession. And that's, I think, my point about that.

Speaker 1

06:44

Yeah, I agree with a lot of what you said there, and I'm going to back up through some of it, and then I've got a couple of questions as we go through. But one is you know, because yeah, you kind of wove in the am I going to lose my job?

Speaker 2

06:56

Or where where do I go with my job? Along with hey, where.

Speaker 1

06:58

does the code review stuff fit in? And that's where I have more questions. But I just want to reiterate a couple of things. One is that we're not getting away from people having to be involved in the process, is what I heard you say. And so, to a certain degree, yeah, you may have tools that take on certain aspects of the job or, you know, do some of the things that are involved in, you know,

07:23

understanding and figuring out what's going on. But at the end of the day, yeah, you need a human that's you know, that's going to take responsibility for this stuff and and shepherd it through. And if you're looking to enhance your career, you need you need to understand and be able to use these tools because they do make people more productive. And so the adoption of this stuff

07:45

is inevitable, basically. And so those are the transitions, and the role, right, the role of, sorry for interrupting, of the software developer right now versus in five years is completely different. I love the word that you used, shepherd. Basically, you're going to deal a lot with writing specs, writing your rules, writing your best practices, navigating your army of agents, specializing in how to deal with specific problems that might

Speaker 3

08:12

come around with that, how everything I said is going to evolve, et cetera. So it's going to be completely different: higher level, more architecture, guardrails stuff. And it's a process that has been happening throughout the last twenty, thirty years.

Speaker 4

08:26

Right.

Speaker 3

08:26

We used to punch cards and write assembly, et cetera. So it evolved. So it's going to be different. But there are a lot of things for a human to take responsibility for and ownership of.

Speaker 1

08:38

Yeah, and wherever we end up, right, because I think it's optimistic of anybody to say, well, this is really where I think we're going to be even in two or three years, because things are moving so quickly, right, And you pointed out, you know, Sam Altman and some of these other folks, they thought they knew where this was going, and they just you know, it's impossible to really predict it. But if you're on top of what's going on today, then it's a whole lot easier to

09:05

adapt to whatever comes later. And then as far as, like, the AI-run code reviews and things like that: so you talked a little bit about how somebody's going to have to understand and maintain and take ownership of the code, and a lot of that knowledge is passed through a code review, right? So are you advocating that you have a human reviewer and an AI reviewer, or... So, yeah, how does this fit into the life cycle of my features and things like that?

Speaker 3

09:40

Yeah, let's talk a bit about nostalgia just before. You remember those days? I think me and you are, sorry if I'm insulting, old enough to remember the days where we actually used books.

Speaker 4

09:53

Do you remember before.

Speaker 3

09:54

we went to Google or whatever other websites there were, we were actually using books to learn about things, et cetera.

Speaker 4

10:00

And I still use books.

Speaker 1

10:03

That's kind of my, maybe I'm going to sound old, but, and mostly, yeah, if it's technology, I do spend a lot of time on the internet kind of picking up the new things, right, because the books get out of date kind of fast, depending on how quickly technology moves. But yeah, I prefer picking up the books and just getting kind of the classic ideas that don't change, that

Speaker 2

10:26

Everything runs on.

Speaker 4

10:27

So so we are nostalgic about it.

Speaker 3

10:29

But if I have to force you to choose one, I think I know what you're going to choose. And think about how much we delegate to Google or other technologies on the internet and intranet that we're not going to validate right now down to the deepest part. So I think it's right now already a very human-plus-digital, call it AI or not, technology-based development, already as it is today.

10:58

And I think the same thing is going to happen with code review, where actually, if you have a hundred, and we are seeing customers that have more than that, like a set of guardrails, rules, standards, and I can give you an example, having a human reviewer doing all of that at once, in the limited time, is actually problematic. And if you can free up the time to focus on, you know, where you could be the most useful for that code review, it's

11:25

actually going to give better learning, especially if the AI tool you are using cares about learning. So even if it caught something, it's not just, oh, I fixed it. It's rather: hey, this is the reasoning, and that's what we learned from it, and by the way, this is

11:38

how we changed our second brain, like that. And so let me give you an example. It could be that someone in the company really cares about, you know, feature flags, and they didn't manage to tell everyone. Now, in a five-hundred-developer or seven-thousand-developer organization, that's what you care about. You can configure that in the system, the AI system. And now, if you developed a new feature, you get an alert: hey, you didn't put

12:03

a feature flag. Oh, sure, I should, you know? This is also learning. It's like, hey, oh, the company cares about feature flags. I have a hundred examples like this, but just one, you know, very, I think, famous one, funny or scary or not funny. I'm a developer. I wake up in the morning, I have a forty-minute slot to review a PR, a pull request. If you give me five lines of code in that PR, I'll give you fifty comments. If you give me fifty lines, I'll

12:35

give you five comments. If it's five hundred lines: looks good to me. Right? In forty minutes, how can I review five hundred lines of code? That's the reality. And by the way, AI could even help us with classifying which review, which PR,

Speaker 4

12:49

you can actually do in forty minutes.
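A minimal sketch of that triage idea, assuming a hypothetical PullRequest shape and a made-up reading speed. Nothing here comes from Qodo or any real review tool; it only illustrates sorting PRs into what fits a forty-minute slot and what does not.

```typescript
// Hypothetical summary of a pull request; not taken from any real API.
interface PullRequest {
  id: number;
  linesChanged: number;
  filesChanged: number;
}

type ReviewBucket = "quick" | "fits-slot" | "needs-split-or-ai-help";

// Assumed reading speed, invented purely for illustration.
const LINES_PER_MINUTE = 5;

// Estimate review effort and compare it with the reviewer's time slot.
function triage(pr: PullRequest, slotMinutes: number): ReviewBucket {
  // One extra minute per file as a rough context-switching cost (assumption).
  const estimatedMinutes = pr.linesChanged / LINES_PER_MINUTE + pr.filesChanged;
  if (estimatedMinutes <= slotMinutes / 4) return "quick";
  if (estimatedMinutes <= slotMinutes) return "fits-slot";
  return "needs-split-or-ai-help";
}

const reviewQueue: PullRequest[] = [
  { id: 1, linesChanged: 5, filesChanged: 1 },
  { id: 2, linesChanged: 50, filesChanged: 3 },
  { id: 3, linesChanged: 500, filesChanged: 12 },
];

const triaged = reviewQueue.map((pr) => ({ id: pr.id, bucket: triage(pr, 40) }));
```

With these invented numbers, the five-line PR lands in "quick", the fifty-line PR fits the slot, and the five-hundred-line PR is flagged as too large to review honestly in one sitting.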

Speaker 3

12:51

That's also how doctors work: they get an AI tool that helps them prioritize, not necessarily telling them what that person's diagnosis is, but just surfacing. Right? So the opportunities are awesome. Actually, the problem is real, already existing, around learning and catching issues. Tell me if you didn't have a major issue. Which developer did not have a major issue

Speaker 4

13:12

in production?

Speaker 3

13:12

I can tell a few horror stories myself. And so the opportunity is not reducing the learning or increasing bugs; it's actually trying to, you know, get this better. But we do need the right UX/UI and the right mindset from the team that's building the dev tool.

Speaker 4

13:28

So obviously like.

Speaker 3

13:29

Sorry, shameless plug: Qodo. But it's not only us, okay? We're not going to be the only ones who care about it.

Speaker 2

13:35

Yeah, I think I think the how do I put it?

Speaker 4

13:38

So?

Speaker 1

13:39

Programmatic checking of your code has been around for a long time, right? So you have the static analysis tools, you've got the linters and stuff like that. The thing with those is that typically, at least the ones that I've seen, they break your code down into the abstract syntax tree and then they look for patterns,

Speaker 2

13:58

Is basically what they do.
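To make the pattern-matching idea concrete, here is a small sketch of a linter walking a tree and applying explicit rules. The AstNode type is a hand-rolled toy for illustration, not the tree produced by any real parser, and the two rules are hypothetical examples.

```typescript
// Toy syntax tree, invented for this sketch; real linters use a parser's AST.
type AstNode =
  | { kind: "Program"; body: AstNode[] }
  | { kind: "Call"; callee: string; args: AstNode[] }
  | { kind: "Literal"; value: string };

interface Finding {
  rule: string;
  message: string;
}

// Each rule is an explicit, hand-written pattern over node shapes.
type Rule = (node: AstNode) => Finding | null;

const noEval: Rule = (node) =>
  node.kind === "Call" && node.callee === "eval"
    ? { rule: "no-eval", message: "Avoid eval()." }
    : null;

const noConsoleLog: Rule = (node) =>
  node.kind === "Call" && node.callee === "console.log"
    ? { rule: "no-console-log", message: "Remove console.log before merging." }
    : null;

// Return the child nodes to visit for each node kind.
function childrenOf(node: AstNode): AstNode[] {
  switch (node.kind) {
    case "Program":
      return node.body;
    case "Call":
      return node.args;
    case "Literal":
      return [];
  }
}

// Depth-first walk: run every rule on every node, collect findings in order.
function lint(root: AstNode, rules: Rule[]): Finding[] {
  const findings: Finding[] = [];
  const visit = (node: AstNode): void => {
    for (const rule of rules) {
      const finding = rule(node);
      if (finding) findings.push(finding);
    }
    childrenOf(node).forEach(visit);
  };
  visit(root);
  return findings;
}

const sampleTree: AstNode = {
  kind: "Program",
  body: [
    { kind: "Call", callee: "console.log", args: [{ kind: "Literal", value: "hi" }] },
    {
      kind: "Call",
      callee: "render",
      args: [{ kind: "Call", callee: "eval", args: [{ kind: "Literal", value: "1+1" }] }],
    },
  ],
};

const lintFindings = lint(sampleTree, [noEval, noConsoleLog]);
```

Every rule here has to be spelled out as an exact node shape, which is the prescriptiveness being contrasted with an LLM-based reviewer that can be given a looser description of what to look for.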

Speaker 1

13:59

And the AI systems they kind of do that. But yeah, you can be a lot more I guess, a lot less prescriptive as far as what the patterns are, right, and so you can train it on how to look at code and what to look for, but you can be a lot more broad and it'll figure out how to do some of the stuff that you have to explicitly tell the other the static analysis tools and the

14:24

linters how to do right. And so you can say, like your feature flag example, right, you know it can you know, I mean, depending on the training of the system and you know how good the data set is and things like that, and how good your prompt is.

Speaker 4

14:37

But you can teach it.

Speaker 1

14:39

Essentially, this is how you identify a new feature and right, and then you have to make sure that there's a feature flag around it. And it can figure out the other steps, like it can infer what to do and so on a lot of those things where you're saying

14:53

you set up a rule. Effectively, what you're saying is is my code has to conform to these ideas, and then it can with its latent space, you know, the stuff that it's been trained on, anything else that you add to the mix, with its context and things like that, it can then go in and intelligently figure a lot of that stuff out where trying to figure out how to explicitly program it to say, here's how you find something like this in the abstract syntax tree, or here's

15:21

how you break down this idea. Especially on you mentioned like the five hundred line pr right, it's like across all of these files and all of these changes. That's tricky. But the LLM can consume it and figure it out much much easier than you can figure out how to explicitly program it for every case you're going to run into. And it's not going to be perfect, but it can do a lot better job, and it can do it a lot more quickly and a lot more thoroughly.

Speaker 2

15:49

Than I can.

Speaker 1

15:49

Yeah, just kind of browsing through it and going, you know, hoping my pattern matching brain goes, yeah, that's a problem, and that's also a problem. And hey, there's so much context in here that I'm just trying to understand that. You know, I'm going to pick out all these little things. But then when I go in as an experienced programmer to do the code review, you know, I can see what it caught and I can say, yeah, ninety nine

16:14

percent of this is great. You don't have to worry about these couple of things that you know, it may not be quite right on that or it's close, but you should do this instead of that. And then the other thing that I can see with it is that there may be things that are just kind of aesthetic or organizational or other things that we do that we haven't codified into the rules where it's like, no, we

16:39

do things this way, not that way. And then you can also go back and you can retrain your, you know, you can rewrite your prompt to include those on the next one. But yeah, I can see where the LLMs looking at the code could do a much broader and more nuanced analysis than you can get out of some of these other tools, and be more thorough than a programmer doing it.

Speaker 3

17:02

I'd love to relate to the, you know, traditional, quote unquote old-world static analysis versus the semantic AI. But just before that, about the philosophy around how you would actually exploit LLMs, like AI, to work with rules or standards that are written down. So, for those who are watching the video, I actually have here a black shirt, which

17:32

is really rare; usually I wear purple. And I'll explain why, because I think the world, roughly speaking, divides into two, and I'm relating to how you use rules and standards in Markdown, et cetera. I'm going to explain; it's connected, I swear. So there's the blue team and the red team. In principle, at Qodo, we see ourselves as more in the red, with a mix. That's one of the reasons we chose purple. So I'll explain the blue team, where we have, like, the Windsurfs,

18:01

the Cursors of the world, et cetera, right? Claude Code, Copilot, everyone. Basically, they take those rules, you can write them in different ways, and they mostly put them in as part of their context, which in many cases the model is actually listening to.

Speaker 4

18:16

But there are two problems. One: I said many cases.

Speaker 3

18:19

The second is that maybe it took it as part of the context, but usually it doesn't work like that. You don't finish a feature, at least not a meaningful feature, from one prompt to code, even if it's an agent that runs for ten minutes or one hour, and usually you get to prompt it over and over,

Speaker 4

18:38

Like navigate it.

Speaker 3

18:39

then at your tenth, you know, prompt, it has very, very probably missed something from your spec or your rules or even your own prompt. And second, that's the blue way of thinking: what they prioritize, their KPI, is UX/UI, and we admire them for that, and speed from prompt to code, et cetera. The red team is how we take every fact that is there as an intent, whether functional or non-functional, and verify it. That's a totally different process.
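A rough sketch of the "verify every rule" approach described here, as opposed to placing all rules in one context and hoping they are followed. The rule IDs, the isFeatureEnabled helper name, and the stubChecker function are all hypothetical; in a real system the checker would be a model call per rule, and the keyword checks below only stand in for it so the example runs offline.

```typescript
interface TeamRule {
  id: string;
  description: string;
}

interface RuleResult {
  ruleId: string;
  passed: boolean;
  reason: string;
}

// A checker sees the diff and exactly ONE rule at a time.
type RuleChecker = (diff: string, rule: TeamRule) => RuleResult;

// Stand-in for an LLM call (assumption): simple keyword checks per rule id.
const stubChecker: RuleChecker = (diff, rule) => {
  switch (rule.id) {
    case "feature-flag": {
      // "isFeatureEnabled(" is a hypothetical helper name used for illustration.
      const passed = diff.indexOf("isFeatureEnabled(") !== -1;
      return {
        ruleId: rule.id,
        passed,
        reason: passed ? "New code is behind a flag." : "No feature flag around the new feature.",
      };
    }
    case "no-secrets": {
      const passed = !/api[_-]?key\s*=\s*["']/i.test(diff);
      return {
        ruleId: rule.id,
        passed,
        reason: passed ? "No hard-coded key found." : "Possible hard-coded API key.",
      };
    }
    default:
      // Unknown rules are reported as unverified instead of silently skipped.
      return { ruleId: rule.id, passed: false, reason: "Rule not checked; needs a human." };
  }
};

// Check every rule independently so none is dropped along the way.
function verifyAll(diff: string, rules: TeamRule[], check: RuleChecker): RuleResult[] {
  return rules.map((rule) => check(diff, rule));
}

const teamRules: TeamRule[] = [
  { id: "feature-flag", description: "New features must sit behind a feature flag." },
  { id: "no-secrets", description: "No hard-coded credentials." },
  { id: "docs-updated", description: "User-facing changes update the docs." },
];

const exampleDiff = [
  "+ export function newCheckout() {",
  "+   return startCheckout();",
  "+ }",
].join("\n");

const ruleResults = verifyAll(exampleDiff, teamRules, stubChecker);
const unverifiedRuleIds = ruleResults.filter((r) => !r.passed).map((r) => r.ruleId);
```

The design point is that each rule yields its own explicit result, so a missed feature flag shows up as a failed check with a reason rather than being lost somewhere in a long prompt.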

19:18

So you take the list of your one hundred rules, whether it's ten or one thousand, by the way, and you check them with the right LLM, turn by turn, to really increase, I can't say to one hundred percent, but really to ninety-nine-whatever, the chance that they're actually checked. I just want to, you know, say that it's important to differentiate between different philosophies, and it actually leads to a different UX/UI, et cetera. And

19:43

that relates to the first topic, of ASTs. I think they're actually very powerful tools, like Sonar, for example. I definitely have high expectations of that company.

Speaker 1

19:53

And but yeah, they do a great job within the set that they're you know, they're capable of doing.

Speaker 3

19:57

They do, too. And I think eventually, if you want to exploit AI, really exploit it, and move fast with confidence, you want that mix of, you know, LLM, AI-empowered code review and code quality with that static analysis. And it would be best if you can actually mix them together, and you will see integrations, like we have integrations with those tools, because they might catch in many cases the same thing, and you don't want that annoying, like,

20:25

double reviewing, double thing. But at the base of the technology, you want these technologies to work together, because otherwise you don't have the confidence to really, like... Okay, take Claude Code. In my experience, often, I wouldn't say daily but weekly, with one prompt I can get one thousand

20:47

lines of code changed in five minutes. Go review that, right? And now you want the maximum help of AI, or, sorry, any technology, to help you navigate to getting confidence about that code: quality, correctness, maintainability, et cetera. So I have a high expectation of these tools working together, being that counterpart, the red versus the blue. So you can get a purple as a dev organization.
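One way to picture the integration point mentioned above, where a static analyzer and an AI reviewer flag the same thing and the developer should see it only once. This is a sketch under assumptions: the ReviewFinding shape and the choice of file, line, and category as the deduplication key are invented for illustration and do not describe how any real tool merges results.

```typescript
// Hypothetical common shape for findings from both kinds of tools.
interface ReviewFinding {
  source: "static" | "ai";
  file: string;
  line: number;
  category: string;
  message: string;
}

// Merge two finding lists, keeping only the first report per location+category.
// Static findings are listed first, so they win ties (an arbitrary choice here).
function mergeFindings(
  staticFindings: ReviewFinding[],
  aiFindings: ReviewFinding[],
): ReviewFinding[] {
  const seen: Record<string, boolean> = {};
  const merged: ReviewFinding[] = [];
  for (const finding of staticFindings.concat(aiFindings)) {
    const key = finding.file + ":" + finding.line + ":" + finding.category;
    if (!seen[key]) {
      seen[key] = true;
      merged.push(finding);
    }
  }
  return merged;
}

const fromStaticTool: ReviewFinding[] = [
  { source: "static", file: "a.ts", line: 10, category: "unused-variable", message: "x is never read." },
];

const fromAiReview: ReviewFinding[] = [
  { source: "ai", file: "a.ts", line: 10, category: "unused-variable", message: "Variable x looks unused." },
  { source: "ai", file: "b.ts", line: 42, category: "missing-feature-flag", message: "New feature has no flag." },
];

const mergedFindings = mergeFindings(fromStaticTool, fromAiReview);
```

In this example the duplicate report on a.ts is shown once, while the semantic finding that only the AI reviewer produced is kept.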

Speaker 1

21:19

So, I mean, going into this, right, and, I mean, we may change the title when we publish, but I kind of, tongue in cheek, put the title as: can I trust AI-generated code? And so now you're talking about this other end of things, right, where it's like you've got the Copilots or Claude Codes or Google Gemini or ChatGPT

Speaker 2

21:39

Or whatever, right or whatever?

Speaker 1

21:42

Right, yeah. And so, you know, they're using these models to generate code, right? And then, yeah, Kiro: we did an episode with Erik Hanchett where, you know, in Kiro it helps you generate the entire spec, and then basically it kind of iterates through step by step and does it. It was funny, because we did that episode and I'm sitting there going, well, I've been using Cursor and VS Code with Copilot and neither of them do it, and literally

22:09

a week later it showed up in Cursor. And so I've used that feature a handful of times and it's really nice. The other thing that I'm going to point out, because you're like, yeah, you know, you have Claude Code generate you a thousand lines of code or whatever: I was having a conversation yesterday with Obie Fernandez. He's a Ruby developer, but I've seen the same thing from JavaScript friends, where, you know, they use Claude Code and they have it generate a bunch of the code,

Speaker 2

22:37

Right, and then they kind of review it on their own.

Speaker 1

22:39

And the thing that he basically said, because he used to run a consultancy, one of the bigger consultancies out there in the Ruby space, was: I built this app in a month. It would have taken us six months and probably five hundred thousand dollars of developer salary to build this, and I've done it

23:01

by myself in a month with AI help. And so I'm looking at it and go, well, so to a certain level, I guess the answer to can I trust AI generated code is yeah, I mean to it because it works, right, But yeah, so if you have the AI generating the code, I mean, where do you run into problems there? And then you know, especially with what you're seeing, where you have this system that then reviews

23:26

the code. So if you have the LLM generate the code and the LLM review the code, right, is that kind of a circular dependency that does or doesn't work? I mean, this is where I'm starting to get into. Okay, you know, I've got these tools that are supposed to empower me, but do they actually work nicely together to give me that red team blue team work out?

Speaker 4

23:49

Yeah?

Speaker 1

23:49

And I guess the other concern is, you know, am I actually moving faster, or am I moving faster with a gun pointed at my foot

Speaker 2

23:56

The whole time?

Speaker 4

23:58

Yeah, we hear that last point a lot.

Speaker 3

24:00

I think, in short, what I mostly want to relate to is the main question of the topic of today: should you trust AI-generated code? I do have an answer, but I'll just say, I do think the future is already happening now, like people at our clients connecting Qodo with Cursor, Qodo with Copilot, or whatever, and that's more...

Speaker 4

24:21

Yeah. I just want to like it, Okay, Yeah, I just don't want to like to spend too much time with that.

Speaker 3

24:25

I guess it's a lot of commercials today. So it's possible; it's going in that direction

Speaker 4

24:30

More and more.

Speaker 3

24:31

I do want to answer the question, but with a metaphor first. If I ask: do you trust human code? Wait, wait, don't answer, let me give you a second, but just think about it for a second. Do you trust human-written code? Let's think about it a little bit. Sorry, no, because you're, you know, it's a good point.

Speaker 1

24:56

The flip side, though, is that, because I don't know that it's necessarily: do you trust human-generated code? A lot of it depends on the human, right? And, you know, it's like, okay, how much experience do they have? Have they done this before?

Speaker 4

25:08

You know?

Speaker 1

25:09

Are they taking security and scaling and all the other things into account?

Speaker 2

25:14

Right?

Speaker 1

25:14

But the other thing is is that I don't think that that's the baseline that we have to look at AI from because it's not whether or not I trust human code and whether or not I trust AI code, because at this point, the human generated code is the baseline.

Speaker 2

25:31

We've been doing that for years and years and years and years.

Speaker 1

25:33

Is the AI code better or the other way you look at it from a business standpoint, is is it close enough? Given what I had to do to generate it, which is usually a prompt and not nearly as much time. Right, so maybe it's not as good as human code in a number of ways, but it's good enough and it's.

Speaker 2

25:55

A lot cheaper, and so it may be worth it anyway.

Speaker 4

25:59

I'll use you on it.

Speaker 3

26:00

But still, bear with me on the metaphor. If I ask, and I'll borrow something that you said to make my point, if you ask, do you trust human-developed code, written code, or coded code, then I think that the immediate

26:17

answer is, yes, that's what we do. But then you think about it a bit more, like you just did, and you ask whether you'd trust that person, even if it's a senior developer, to write the code in Notepad++ on the airplane and then push it to production. I'm not saying I never did that, but

26:34

push to production? I think your answer is no. And then what you are saying is: a senior developer who is going through the processes that are required, checking also for security, reviewing for the standards and all that, going through the review process and checking the CI results, et cetera, you do trust that human-generated code. That's my answer. Should you trust AI-generated code? It spits it out, you wrote a prompt, you got the code, trust it? No.

27:03

I suggest you don't. You also don't do that for the most senior developer. You have a process; the process could be quick, and I think that's what

Speaker 4

27:10

We're going to see.

Speaker 3

27:11

also for the AI-generated code. You're going to see AI being used to automate more and more parts, and then AI-generated code is going to look more and more like AI-generated software development, AI software development, where AI is going to do a

27:32

proper process. Humans, by the way, in a certain percentage, which in the beginning will be very big, very large, sorry, are going to be involved throughout the process, and over time the process is more like, you know, verifying that the pipeline works, you know, like we do in manufacturing, like in a lab. Right? It started with human labor, and slowly it became more and more automatic; humans are still involved there, et cetera. It took like fifty years to do that.

28:04

It will take also like fifteen twenty years. We've talked about.

Speaker 4

28:07

Predicting for totally being machines.

Speaker 3

28:11

Meanwhile, I just invented a number, like fifteen years, like twenty forty-one, just to have a guess. The four and the one look like "AI", so I guess

28:18

it's a sign for me. But until then, we're just going to see more and more portions being automated, and maybe some portion automated end to end, and at that point you will trust it, because it's not just statistically spitting out code, even if it's trained really, really well, but it's also going to validate itself, verify itself, going through the process, et cetera.

Speaker 4

28:38

And that's the future. That's the future we believe in

Speaker 3

28:40

at Qodo, right? That's where we focus: on code quality, code verification, et cetera.

Speaker 2

28:44

Right.

Speaker 1

28:45

I think the point is well taken, because a lot of people conflate where things are with where we're going to end up. And what you're saying is that, yeah, we're going to get better and better tools to do more and more things, and they're going to manage more and more pieces of the process, right? And yeah, I think that is absolutely true. It's funny, because for several years, you know, I've been using Grok or ChatGPT or Claude. I've used all of them for different things.

Speaker 2

29:21

Right.

Speaker 1

29:21

It's like, hey, I'm trying to explore this thing, right, and so it'll give me all kinds of feedback on, you know, health or whatever, right, and so I kind of use it as a coach, or at least, you know, and then sometimes I'll go fact-check it or verify this or that, or, you know, refine whatever it gave me.

29:40

But, you know, a lot of times it shortcuts a whole bunch of research that I would have to do, and so then I can just verify the pieces where I'm like, it doesn't seem quite right. But it's gotten more and more correct the longer, you know, the longer we go, because the models get better, the data that's, you know, in that latent space gets better.

Speaker 2

29:58

And so that's what I definitely see with software.

Speaker 1

30:02

Yeah. But as far as conflating where we are with where, you know... a few years ago, I think it was just last year, actually, I was having a conversation with my father-in-law. Now, granted, he's a general contractor, and by general contractor, I mean, like, he fixes crap in people's houses general contractor, and, you know, he'd just heard about the goof-ups in the news, where it was like, well, I heard somebody asked ChatGPT this thing and it told him this bogus thing.

30:29

And I'm like, yeah, dad, but we've moved like four models ahead since then and it doesn't do that anymore. And he's like, yeah, well, you just can't trust it for anything, and you know, and again I'm looking at him and going, well, actually, I use it all the time for this other stuff because you can trust it, right,

30:47

But yeah, I go and fact check stuff. It's like, you know, I think you have a bias here, and so I'm going to go fact check these pieces because I think I think the data that you were trained on isn't one hundred percent in line with my worldview

31:00

or the way I think things are. But at the same time, yeah, it's gotten way, way, way more accurate, especially when it gets into a lot of the, you know, easier stuff, like meal planning and, you know, hey, I've got to modify this workout or this or that, and, you know, it's terrific because it's got all that data in it. And I think that's where we go with the software, is my point. So even where we see it kind of fall short, is it much better and much more accurate, much more thorough

31:30

than I can be as a code reviewer? And it looks like the answer to that question is undoubtedly.

Speaker 4

31:36

Yes.

Speaker 1

31:37

It may give me some false positives and some false negatives, but it's going to be more thorough and much faster, and I can pick through that and it's still going to save me a whole bunch of time and effort, and it's only going to get better. Right, So where we end up in a few years it may be completely different, but it's almost certainly.

Speaker 2

31:54

Going to have better data to run on and make the process better.

Speaker 3

31:57

I think, like, I truly believe there is meaningful improvement in the LLMs. Some people claim that over time the velocity is diminishing, right, because it consumes what it's been putting out. I've heard that, for example, for that reason, et cetera. I think there are meaningful improvements. We have internal benchmarks around quality of code, et cetera. It's going up. And having said that, I

32:22

think there are other reasons why the extracted value is bigger. First, we use the LLMs in more and more areas. But specifically I want to relate to what you said. I think it's also that we're learning how to use it.

Speaker 4

32:35

So you know, like as a developer, you learned how to Google, or you learned how

Speaker 3

32:39

To use Stack Overflow. But now, I guess, you learned how to do a good Google, and that's still very relevant. And I think we are learning how to, you know, prompt or use it; that could be one of the differences between you and a friend or family, et cetera. And that's related, actually, to, for example, what we talked about with AWS Kiro and spec-driven development: we're learning that the better we, the

33:02

more information, which should be concise and accurate, but the more information we provide as part of the prompt, if we're talking about software development, like maybe a spec in most cases, the.

Speaker 4

33:12

Better job it will do.

Speaker 3

33:13

By the way, once in a while I'm giving a disclaimer or trying to be careful, because there's a lot of research done, for example, Anthropic are really good at it, showing that if you think you can push as much context and instruction as you want and expect it to really work well, they're actually seeing diminishing returns, even worse results, if you give it a spec the size of a full book, even if the context is bigger. But putting that aside, it is a really good idea to.

Speaker 4

33:40

Learn how to use these tools.

Speaker 3

33:41

And I think we're actually consciously and unconsciously doing that. And for software development specifically, and I think even JavaScript, where, you know, the language is maybe not that descriptive, et cetera, having a proper spec is a good idea. Although I have to say I'm not a big believer that spec-driven development is going to be the last thing that survives, or that it's going to be the biggest thing that actually makes

34:06

the difference. It's an important concept, but it's not going to be what solves everything.

Speaker 2

34:13

No, I think.

Speaker 1

34:14

And I'm just going to piggyback on what you're saying, because I think you're correct that one way is how we use the tool, right? And so, you know, spec-driven development is one way that we've, you know, this is a new way to use

Speaker 2

34:26

the LLM and, you know, maybe have a wider

Speaker 1

34:30

context on what it's doing and give it a step by step cohesive plan. But yeah, I don't think that's where we end up. I mean, we're going to invent other ways of using these tools, and this may be a stepping stone to something else. The other thing, though,

34:43

is that it's not just for me. Hey, we're getting better at using these tools, but also as it takes things off of our plate, we're able to refine in other areas, and I think those get better because the next versions of the models pick up some of those changes to the way we do things outside of how we use the LLM and make it better that way too. And so at some point does it kind of you know, are we getting smaller increments of value?

Speaker 2

35:12

Maybe?

Speaker 1

35:13

But again, I just see the ingenuity of people as we go continue to just be really cool and awesome, and so for the time being, we just see these astronomical leaps every time we get a major version update on these LLMs.

Speaker 3

35:30

Yeah. By the way, I think once upon a time... My background is in machine learning since two thousand and six, oh, okay, neural networks since twenty ten, so I allow myself to talk about history. I think once upon a time, until roughly speaking GPT three, three point five, every time you trained a model, in really like ninety-nine percent of

35:57

the cases, you tried to make it better than the others in a specific niche, right, even if it's a big niche. Still, with GPT three point five, I think we had like a year or two or more that we

36:08

were under the assumption that level matters. You know, the market was like, wow, look at this GPT three point five winning every benchmark, even human benchmarks. If you remember those graphs, the amazing graph from OpenAI: the x-axis is different professions, from lawyer stuff to history, and the y-axis is the percentile on their official tests. And GPT three point five crossed every model on all of these like fifteen different professions, and then GPT four the same, et cetera.

36:41

At this point, it's not the case anymore. I think since, roughly speaking, Sonnet three point five, if you're familiar, from Anthropic, et cetera. I think that model was suddenly better, some claim much better, than GPTs on coding, but probably not at all.

Speaker 4

36:58

Quite a few cases.

Speaker 3

37:00

Now, there was a moment where people thought only OpenAI or Anthropic or whatever, Google, are going to generate foundation models, but I'm seeing dedicated foundation models in health, medicine, customer success, et cetera.

Speaker 4

37:14

And we do see like GPT.

Speaker 3

37:17

Five, for example, and maybe there are new versions coming that are better on a specific aspect of software development, like specific in software development, right? So there is evidence that it's incremental, and I think that's actually somewhat meaning that we're now maturing. And I'm not sure if that's what people thought I was going to say, like, we're maturing. Okay, let's get to

37:43

let's get to those edge cases. Okay, we probably need a specific LLM and a specific agent for the specific quality measure that we want to track or help with, et cetera. So I think it doesn't mean that the whole solution doesn't keep evolving upwards, right, at the same or even bigger speed, you know, like the same.

Speaker 4

38:05

As uh uh.

Speaker 3

38:07

The book we talked about, Kurzweil's The Singularity Is Near. This is a notion that we're seeing exponential growth in technology. But if you're zooming in, then you're seeing an S-curve, and the thing is that with each S-curve, the time, the difference between each S-curve, is just smaller and smaller and smaller, right,

38:28

and if you zoom out, it looks exponential. So, yeah, maybe for that specific GPT, sorry, specific LLM attention architecture and specific training, et cetera, the low hanging

38:41

fruits are over, but we are seeing in the agentic world and other technologies more breakthroughs, and I can mention more, and overall we're going to see AI keep going upwards. And so I wouldn't say, you know, it's an incremental LLM. The solution we're getting, it's going to get better and better, and we should adopt it.

Speaker 1

39:07

Yeah. All right, Well we're getting toward the end of our time. I hate to cut this short because I could sit here and talk about this forever. But yeah, so we're gonna just roll into our picks. I've got like five minutes before my work, Yeah, my work anyway, so I'm going to jump in. I have a work meeting, so I'm going to jump in and move to picks.

Speaker 2

39:29

Now.

Speaker 1

39:29

Picks are just shout outs about whatever it is that we've been up to and enjoying lately. So the first pick that I have is, so on Friday, I'm going to be teaching board games. I do this periodically. Hang on. So yeah, I'm teaching board games at a board game conference, and I've picked most of the games we're teaching. The one game I haven't picked is, well, there are two of them. One of them I'm learning tonight and the other one I learned last week. And this one's called

40:01

Faraway. It has a board game weight on BoardGameGeek of one point nine one, which means that it's fairly approachable for the average board game player. And so what it is is, how do I explain? It's mostly cards. So you're playing cards in front of you, and you have like eight slots, and so you play your first slot and then your second slot, and then your third slot. But when you score it,

40:29

you score it back the other way. And so the last card you put down is the first one you score, and it's available for scoring on all the other cards that you played, and then you flip over the next to last card that you played and you score it against the two cards that you have down.

Speaker 2

40:49

If you play cards. So let's say you play.

Speaker 1

40:51

the twelve, and then you play the fifteen after the twelve, then you also get some other cards. I can't remember what they're called, but those ones count through the whole scoring process. And anyway, so you just kind of build up this deck and then you score it back the other way. It was really fun. I think it took us like a half hour. There were four of us playing. It says that you can

41:14

play it at ages ten plus. You probably can. If you're going to play competitively, as far as like, hey, you know, I'm stacking all these cards up so that all the resources on the earlier cards play nicely on the later cards, a ten year old might struggle with, you know, planning ahead that far, figuring out what to do, but they can definitely play the game, right, and enough of it is common sense to where they can probably at least wrangle their way through a lot of it.

41:45

So yeah, I'm going to pick that. It's called Faraway, on the board game

Speaker 2

41:50

pick, and then let's see, other picks.

Speaker 1

41:54

So I think I might have picked this last time, but I'm going to just pick it again. There's a movie that came out. It's called Truth and Treason, by Angel Studios. My wife and I are members of Angel Guild, so we pay every

42:08

month to be part of that. We get to vote on the movies that they make, and it's also part of our subscription to the Angel app, VidAngel, where, you know, you can watch videos and you can say, I don't want any of this kind of profanity or any of this kind of content, right, so it'll cut all the sex scenes out of your movies and stuff like that. But so we wind up getting tickets as part of our Angel Guild membership to all of these movies when they come out to theaters. So this one

42:35

is a World War II film. It's the story of three young men who become disaffected with the Nazi regime during World War Two after their Jewish friend gets disappeared by the SS, and so they start distributing leaflets by putting them in mailboxes and stuff and on cars and things around Hamburg, and they get caught, and the movie's about them and, you know, what happened to them. And so anyway, it was really, really good. One of the

43:06

things I like about these kinds of movies, I mean, it's a sad story, you know, in the end, the way that it all goes for them. But it's like, look, you know, how willing are you to stand up for what's right and what's true?

Speaker 2

43:19

And I think in.

Speaker 1

43:20

today's world, in certain parts of the world, yeah, you may be risking your life. You know, where I live, yeah, I guess they do kill people for that, because they killed Charlie Kirk. But I don't feel like somebody's going to kill me for standing up. But I've had people come after, you know, my reputation and things for things that I've said. But again, it's down to, you know, are you willing to stand up for truth?

Speaker 2

43:45

Are you willing to.

Speaker 1

43:48

You know, do the right thing even if it costs you. So anyway, it's called Truth and Treason. I don't know if it's still in theaters or not. I think it still is, probably until like Thanksgiving. So yeah, definitely worth seeing. Terrific, terrific film. And then my wife and I are still playing Jaws of the Lion, which is one of the Gloomhaven board game setups, and it's basically a self-directed D&D kind of game without a Dungeon Master, and so, you know, anyway, the

44:21

difference being that it's not as free form. You actually have cards to give you your abilities, and so you play the cards. Yeah, anyway, very fun. So I'm gonna pick that as well. Itamar,

Speaker 4

44:32

what are yours? Okay.

Speaker 3

44:34

I just saw the movie Good Fortune, and it's the first time I'm seeing a movie in a theater in New York. I just relocated here. And I felt that there was a somewhat funny mix of, you know, today's tech bro thing, this discussion, together with all the apps that we're using daily and how these apps, that are designed by supposedly tech

45:14

bros, influence our day to day. Keanu is playing there as an angel. I wouldn't give it ten out of ten, but coming from the tech industry, et cetera, I thought it's an interesting movie, also to see a little bit how other people see our industry, et cetera.

Speaker 4

45:33

So I definitely like.

Speaker 2

45:38

Very cool.

Speaker 1

45:39

All right. Well, one last thing, and then I have to jump off for a work meeting I'm already late for. If people want to check in, see what you're working on,

Speaker 2

45:48

check out Qodo, where do people go for any of that stuff?

Speaker 4

45:51

Yeah, totally. So first of all, qodo.ai.

Speaker 3

45:54

From there, we have everything. We are on social, obviously, as well. Personally, itamar_mar is my handle at X, Twitter, and we have multiple open source projects. We're actually going to contribute some of them to one of the open source foundations.

Speaker 4

46:11

Still learning which one is the best one.

Speaker 3

46:14

So for example, we have like a pull request code review agent, so you can find it.

Speaker 4

46:19

It's called PR-Agent.

Speaker 3

46:21

It's very different than our main product, by the way, very, very different, but it is part of our collaboration with the community. So there's a bunch of ways to reach out, and we love hearing from the community. Code review is subjective, quality is subjective, while we do need to standardize that. People think about it differently, so we love hearing from everyone. Please reach out about anything and we'll be in touch.

Speaker 2

46:44

All right, cool, Well, thanks for coming. This was fun.

Speaker 4

46:47

Yeah, same here. I really loved it.

Speaker 2

46:49

All right, folks, we'll wrap it here till next time.

Speaker 4

46:52

Max Out

Transcript source: Provided by creator in RSS feed