#165 - Learning to Program in the Era of Generative AI - Leo Porter & Daniel Zingaro | Tech Lead Journal podcast

00:00

As software engineers, only a fraction of your time is spent coding. A lot of your time is spent thinking, how should I lay out the interfaces? How do I work with the other software within the company? How do I make sure I've got really clear requirements for my code? Like, all of these things are the really big problems that still humans have to wrestle with, and I'm not seeing LLMS taking that away from us anytime

00:25

soon, at least for now. Hey everyone, my name is Henry Surya Virawan and you're listening to the Tech Lead Journal Podcast, the show where I'll be bringing you the greatest technical leaders, practitioners and thought leaders in the industry to discuss about their journey, ideas and practices that we all can learn and apply to build a highly performing technical team and to make an impact in your personal work. So let's dive into our journal.

01:04

Hello Doctor Dan, Doctor Leo, Good to see you in Technically Journal Podcast. Welcome to the show. Now, thank you for having us. We're excited to be here. So I always love to start my conversation by asking my guests to actually share a little bit more about your career. If you can maybe mention your highlights or turning points that we all can learn from, that will be great. Thanks, Henry. I'd be happy to. I think the beginning might not be too surprising.

01:27

So I started in university in computer science and I did my undergrad degree in computer science. And then I started a grad program. So I started a master's program and I was doing something called formal methods. So like normally when people want to get confidence in their programs, they run them with a bunch of test cases. Like you know, as many test cases as they can come up with. But formal methods is different. Formal methods, you try to prove mathematically that the program

01:56

is correct. And I was pretty interested in that, although I could kind of tell as I was working on it that it was just extremely difficult for me. And I don't mean to say, oh, if you're, you know, if you're not getting something right away, you should just give up. Like that's not what I mean. But I I mean, there was some extremely impressive people in this field and I was happy to be part of it. But I also just sort of realistically and realized that I wasn't going to be able to

02:20

make a huge impact in that area. But I was still having a great time with it and I guess that's all that mattered to me. But then my supervisor one day happens not to be able to teach. I think he wasn't feeling well. So he called me sort of last minute and said, can you cover my class For me?

02:36

It was a compiler's class and I was worried because I hadn't taught a class before, but I, you know, I gave it a shot and that was the turning point for me, going from this kind of abstract research to education research. So I taught that lecture and then it was over for me. I was just how can I start teaching more and studying education. So it was a complete shift at

02:59

that point. I think to this day my supervisor would probably say that like, his biggest mistake was not getting himself to class that day because we were planning on working together further in in that area and it

03:11

didn't happen. And then maybe in 2010 or 2011 or so, another big career change happened, which is I met Leo at a conference, we were at both at an education conference and just sort of met up there and immediately just had a lot in common, you know, not just about our research but just hobbies and sports and video games and I think just like worldview. And so we connected immediately and just we've worked on dozens of papers since then and our our book most recently.

03:43

And it's a career highlight. It's a real honor to be working with him on in so many capacities. So I I have a bit of a non traditional career path in that I did my undergrad in computer science. I actually switched into computer science as a major and then I did 4 years as an officer in the United States Navy. I was a navigator reguided, missile destroyer. And so a lot of my lessons about leadership and building teams and ethics actually come from my time there.

04:09

After I finished my time in the Navy, I went back for a PhD in computer architecture, did lots of processor design, very similar to kind of how Dan's describing formal methods. I enjoyed the work, but it was really when I started teaching and being for the classroom that I got the most excited. Right towards the end of my PhD. It started shifting into to computer science education research.

04:28

And it started really with a colleague, Simon, kind of introducing me to Dan. And as Dan kind of pointed out, it was just incredibly fortuitous me because I didn't know the methods of computing education at the time. Dan was in an education PhD and it was really through the two of us defining our research direction together that we did tons of really productive work. And I'm super appreciative to

04:50

Dan for all we've done. I mean, Dan's been in terms of the kind of big stuff we've done in our careers. We've investigated how effective this pedagogy called peer instruction is in computer science classes and done most the main research on that topic, at least up to like 2017 or so. And then other people kind of took over the research.

05:08

From there, we built this assessment of how people learn basic data structures that's actually validated and is used by the community, called the Basic Data Structures Inventory. The two of us use machine learning to predict which students are likely to succeed and fail, and we were being pretty accurate about predicting student success very early in

05:25

the quarter. And just in terms of our book, that happened because I happened to hear about some of these generative AI tools like ChatGPT and GitHub Copilot. And I tried them out, and I immediately was worried about our introductory programming courses. And I thought, everybody's going to be panicking. It did end up happening like that. Everybody was panicking, trying to figure out what to do. And it's like I had gone on a call with Leo and I I said, Leo,

05:54

there are these tools. We have to look at these. Maybe we should write a book. Somebody has to. And it didn't take Leo very long, maybe 15 minutes or so of playing with the tools himself before he agreed. He was like, yeah, I mean, somebody's got to do this for the community, just to give some direction, some perspective on

06:13

what was happening. And then Dan, as Dan said, he said, hey, I I started playing with these LLM things and they are terrifying for trying to teach programming because they're solving essentially all the tasks we used to give them. And so we're going to need to change how we teach interactive programming. And as Dan said, it took me about probably 15 minutes playing with this thing before I went, Oh my gosh, we're in serious trouble.

06:36

And we then sat down and started figuring out how are we going to build a class that would adjust to the fact these amazing tools are available. And then as we kept going with that, we said, well, wait a second, we need a book to help structure the class. And that's when Dan kind of suckered me into writing a book. And then we, we had a lot of fun writing in after that. So it was great. Thank you for sharing your story. I think one thing that I really picked up is like, how did you

07:00

start writing this book? Right, Learn AI assisted Python programming. I think you just thought about it in 15 minutes. That's really cool. So today we're gonna talk a little bit more about what you've done in terms of doing the research and also from your experience playing around with this LLMAI assistant. And also maybe the impact to people learning about programming. Hey, thank you for being part of

07:23

the tech regional community. This show wouldn't be the same without your ears and you are the reason this show exists. If you're loving TLJ and want to see it keep on growing. Consider becoming a patron at techledjournal dot dev patron or buying me a coffee at techledjournal dot dev coffee. Every little bit helps field the research, editing, and sleepless nights that go into making this show the best it can be. Thanks for being the best listeners any podcast could ask

07:53

for. And now let's get back to our episode. So let's start probably in the beginning to just level set our understanding what is actually AI Assistant. You know some people heard about Copilot, but maybe if you can maybe describe what is AI Assistant? So I guess I would define an AI assistant as a piece of software that helps you get work done

08:14

more efficiently. But the key to it is the way you communicate with it. So typically when people think about using computers, they think you need to be very rigid in how you communicate with a computer. And that's what we would say when we teach programming courses, right? Every symbol matters, every space matters. Every key press that you make is important. And that's the language of computing, right? It's very precise. What an AI system allows you to do is communicate in English or

08:47

any natural language. And there are way more people in the world to know natural languages compared to programming languages. And so the hope and the goal is that people be able to use their own language and have the computer translate that, because computers still can't run or do anything with these languages, like English for example. The only one I know. So I keep talking about it, but still it has to be translated into something a computer can

09:13

work with. That's what we're trying to automate with these AI assistants. It's like making it so that people can communicate in their language and have it automatically translated over to lower level stuff that the computer understands. And so I would expand and just say that what's great about these AI assistants is it's almost a step in the natural progression of making programming and interacting with

09:41

computers easier for humans. And so we've seen this evolution from having to write assembly code or actually before that you have to push buttons on a machine to make it do things. And then writing assembly code was a huge improvement of the stored program computer. And then we moved to the point that we could start writing in higher level languages that were more English readable than assembly, compile down to assembly and actually then the assembly ran.

10:08

And then over time, we've developed more and more advanced languages that become easier to express our goals in the language available to us. Now, what's unclear to us is whether or not these LMS are going to be just the next language in which we interact. Right now you can interact with them and get working code fairly often, but it's not always correct, and so it's not quite the same as a compiler which is deterministically going to be correct.

10:36

But it does seem like the next step in a really nice evolution of making it easier and more accessible to write software, right? So I think that's a very interesting thing. I've not played around with all these copilot tools a lot, I mean in my day-to-day role because of the nature of my job. But one thing that I think very interesting when I heard developers using it right, it's like it seems to improve their productivity.

10:59

So just like what you said, right, it could be the next evolution of how we write software. And I think in your lecturing role as well, I think it will be different now that you teach programming to new students, maybe a little bit, you know, how do you find the difference now that there is this LLM AI assistant and students now learning programming, does that become much easier or does it actually make it harder? Is there any kind of a stock

11:24

difference that you can tell? Oh geez, this is kind of a long answer, so I'll give it a go. So we've been experimenting with how do you teach new learners how to program in the presence of LLMS? And I'll be upfront that we don't have all the answers yet by any means. I think that's going to be probably a decade before we actually really know from the research what the best way is to

11:48

teach students. Now, these new tools are available, but I can say a few things, which is that students very quickly recognize how powerful the tools are. I actually downloaded in my class, the very first class, and they all were. I got a gasp from the crowd. They couldn't believe that I was just basically writing the code for them. And so the question then is what are we teaching the students? What changes here? And I think there's still a lot to teach students.

12:13

It's just the scoping has shifted and what the skills are shift. We also decide what our goal is. Are we training the next software engineers or are we training someone who works in business or data science or accounting to be able to write software that does something useful for them. And I think those are actually slightly different audiences in terms of what we want to teach.

12:33

What we've done is we've kind of realized that the skills that you need to interact with an LLM are actually fundamental programming and software development skills. So the general workflow in working with an LLM is you give it some, you have some desired function that you want to write, You know you want to accomplish some small tasks, and you then describe that task. And then the LLM is going to generate code for you. Now the code it's going to

13:00

generate may not be right. It may not even be close to kind of addressing what you want. If you read through it, you can quickly recognize this isn't what I want. And then you can pull up basically a window of saying, are there other solutions that are good for me? And so already students needed to know how to read code, understand what it's doing, potentially be able to trace code, and be able to pick from multiple code examples which

13:23

one's going to work. And then the next step for them, once they've kind of picked one they think works, it's the right test because you can't trust the LM. And this is actually a point that I think is a really encouraging piece for new students is testing has been a point where new students struggle. They tend to write code and by definition assume it's right, which is basically the opposite of what we all do as software

13:46

engineers, right? You write the code and you gather tons of evidence for it being correct before you have any faith in it. And so the students, I think because it's coming from a machine that they know that makes mistakes, they're actually more willing to test. And so that's a research question. We haven't. I don't have the data to support that yet, but I have suspicions that they're more willing to test when it's coming from a tool that they know can make

14:05

mistakes. So then you test the code. Writing good tests is super important. And saying that we haven't taught as well as we should have candidly in the past. And then once they've tested it, now they know that that piece of code is working. However the one catches, sometimes the code doesn't work and sometimes you can't get the LLM to give you the exact right answer. And so there's still this last

14:23

step of being able to debug. And so you have to teach explicitly how do you modify code that's slightly buggy to do what you want. So we're still teaching debugging skills, and that also is a fundamental skill, right? So, so far your listeners are probably saying, well, what's

14:37

different here? But there is a shift onto reading and modifying and testing code away from looking at a blank screen and writing code from scratch, which is what we used to do the most of. Yeah, and Leo, maybe I I could just say this may not be obvious to people who have been programming for a while, but the syntax of a programming language is something that takes some students weeks and some students just don't get past it. It's extremely stressful for

15:08

students, and also artificial, right? Like the only reason we need all this syntax. And you know, maybe for this nice little bit of discussion, people can think about some horrendous syntax, like the way that like C function pointers are defined or something like. The only reason it's like that is because the compiler has to be able to unambiguously understand what your goal is. And if you're getting stuck on syntax, some of us do program

15:36

for the sake of programming. I I get that a lot of people probably listening to this podcast just love programming, present company included. But there are many people who just want to do something with the code, right? For example, many of us may not care how our appliances work. Like if my microwave, it makes my popcorn for me, I'm happy. And that's how some people are with code, right? They don't need to know what exactly each piece of syntax is

16:03

doing. What's super exciting for me and Leo about LLMS is I think for the first time we see a future where people who don't know how to program could be afforded some of the same benefits as people who do. Thanks for the explanation right of how it changes the dynamics. Now for people to learn programming languages.

16:24

I think the skill set that you mentioned, you know, like still people needs to be able to read the code, needs to be able to test it, need to be able to debug it and maybe also need to be able to express the task that they want to solve, right? Because LLM cannot just solve everything in one go. You will probably need to play around, do a little bit back and forth before you come to a perfect solution so to speak.

16:46

But one thing I think I want to highlight about this tool is that there's a risk of not getting it right the first time, right? I mean, even if you ask the same questions, it might spit out different answers, right? Maybe a little bit of the underlying why, why the tool doesn't seem to be deterministic. That's the first thing. And maybe a little bit about LLM how it works for people to understand.

17:07

So I really appreciate you pointing out the you can't just give a large task to an LLM in one go. And that's actually a super important skill that we're now teaching in classes that candidly we did not in the past. So what we used to do in the past, and this is common across pretty much all of computing education and I bet many of your listeners had this in their classes is students are given

17:28

essentially just a function. The function's basically perfectly described because you need to be able to auto grade it and so every possible case is covered in that description, right. And then they have to just fill in the code for that function. Now, LLMS do that incredibly well. And so the shift now is, if I give a fairly vague task, like a large project to work through, how do they break apart that large project into smaller tasks that the LLM can then help them solve?

17:55

Now this for your audience and for all of us as software engineers, this is problem decomposition. This is probably one of the most important skills we learn as software engineers, and it used to be that we didn't teach that to new students learning how to program until much later in their careers, and now it's actually front and centre. Incredibly important to learn in your first programming class because that's how you have to interact with these LLMS.

18:19

And so in my class this last fall, I had students doing things that they would have never been able to do in a previous CS1 class. I gave projects like find a data set on Cagle, ask a question of the data, and then write the software to answer that question. That's way beyond the scope of what we'd ever asked in ACS One, especially given the scope of what many of my students did, where they often did really nice visualizations. They pulled in really interesting data sets from the

18:44

domains that they cared about. Some of them had interactive programs where they were you could interact with it and ask, I want to see the relationship between there's a stroke data set and you could say I want to see the relationship between age and stroke, and it would actually plot age against stroke in really clever ways. These are things again, you've never seen ACS one. It's really open-ended and the students are doing all the problem decomposition on their

19:08

own. And so we're really excited about teaching this skill. And if I'm real reflective, I'm a bit disappointed that we as a community stopped prioritizing that so early in the careers. It really should be a first word priority. You mentioned about problem decomposition, right? I think that's really important skills for any programmers with or without AI assistant, right?

19:27

It's like to be able to break down problems or even requirements into small tasks, into modules, design classes and things like that and decompose that such that we can make a good software right? That is maintainable rather than just one big function that does everything in one go. So I think problem decomposition definitely is very important and I want to come back to the question earlier about LLM because it's non deterministic so far.

19:52

Maybe if you can explain why is that non deterministic, How does it work underlying so that people actually understand that? I mean, it's not going to replace programmers in one day, right? Yeah, Thanks, Henry. It's a really important point and it's also counterintuitive for a lot of computing people because, you know, we all joke about shutting your computer off, turning it back on if the thing you're trying to do doesn't work.

20:16

But I think many of us go under the assumption in our day-to-day computing lives that computers are deterministic. If you do something, you're going to get the same response. I mean, they're always examples of this and race conditions and stuff, but overall, that's how we feel about computing. If you write a program and you run it, you're going to expect that if you run it a second time, the same thing is going to

20:38

happen. And if it doesn't, you probably start thinking about, oh I I maybe I have a memory allocation bug or some sort of like transient behavior problem in my program. But Henry, as you mentioned, LLMS are inherently nondeterministic. So you ask for some code and you get the code, and then you ask a second time and you'll get different code and you ask a third time and you'll get different code. And this is first of all kind of challenging as an instructor.

21:07

We have reported recently that it makes it very difficult to plan sometimes because typically, you know, our lectures are sort of scripted in some ways where we need to demonstrate specific things in our lecture. And it's very hard to do that when you don't know what the LLM is going to respond in real time in class. So there is that. But on the other hand there's also a benefit, believe it or not, of being non deterministic. And that is because these things can make mistakes.

21:35

Imagine how frustrating it would be if you asked it for some code and they gave you some code and it was wrong and then you were like, well OK, now what do I do? Do I try it again. And in this case you don't want it to be deterministic, right? Because then you're just going to get the same wrong code every time. So the fact that it is non deterministic means that you have a chance even if the most

21:54

probable response is wrong. Maybe you can ask again, or look at maybe the top five or top ten and maybe you could pick out the correct code from that list. And this is a skill, right? This is a skill that our students or learners did not need before, but they do now, right?

22:13

So because the first response may not be correct, students have to know how to go through the list of potential solutions and figure out which ones are perhaps not correct immediately, but which ones are worth further testing, further consideration. And so for that reason, Leo and I are very careful to continue to teach the programming language.

22:37

So it's true, our book and Leo's students are working with LMS throughout the course, but at the same time, we are still teaching the students Python, because right now Python is the language that we teach in the introductory computing courses, and it's still a important part of the loop. And so we're not yet at the point where you can give English or whatever natural language instructions to the LLM and get

23:08

back your language. You're still getting back Python code, and so learners still need to understand and work with Python, not at the syntax level. Like we said, we're spending less time on the low level syntax details, but they still need to understand Python, And one of the reasons is so that when this non determinism is happening, they can look at and evaluate a bunch of different solutions for which ones may be

23:35

correct. So I think the interactions that you mentioned, right, asking back and forth, you know if you got the first solution probably not quite right, you ask again and you ask again back and forth until you find the right solution. I think this comes to the term prompting, right? So I think many people would have heard now prompt engineering is kind of like a new job even. So tell us about this prompting, right. I think it's a skill set.

23:57

It's a new skill set that everyone needs to learn in order to get the best out of AI system. Maybe a little bit about prompting, Like what do you feel about this new skill set? So prompt engineering is just the task of writing a prompt in such a way that the LLM will give you back a good response. And what is tricky there when we're teaching people who don't know how to program to start, is the LMS do very well if you describe problems in a technical

24:25

language. So if you say I want this function to find the largest value in this list, right? Like that. I'm using terms that we know as computer scientists, right? I want to make sure I describe it as a list if it's in Python or if an array if it's in Java or things like that, right? And I'm saying I want to find the maximum value. So I'm specifically saying

24:44

exactly the behavior I want. I may even describe it even better to say something like I write a function that returns the largest value in the parameter list. Now I really specify the behavior and the LM does a much better job of that. The problem is, I'm using keywords that you have to teach learners, and so there is still this task of teaching them how we would speak about these functions. And I think, and by no means an

25:10

expert on how LLMS work. However, they are reading from large code bases and they're learning from those large code bases. And so in a sense, what we're trying to get them to do is just give that function header that a human would have written to describe their function. And if you can generate one that's very close to behavior you want, the model is going to find something very similar in its train set and then generate code that's going to be paired

25:33

with that. The other thing you do with prompts is there's a whole bunch of ways in which you can basically tell the LM to behave in a particular way, and Dan has a lot more experience with that, so I'll let him take it from there. Yeah, we had a good time near the end of our book, sort of going into these other prompt interaction patterns, people have started to catalogue the different ways of interacting with the LLM.

25:57

It kind of reminds me of the object oriented patterns, you know Henry, like Observer pattern and Model View, Controller pattern and Visitor pattern and all these patterns that people have identified and they documented over the past several decades. And people are starting to do that now with LLMS. And I probably could have kept going and going about it in our textbook, but I managed to control myself and I only, I think I only talked about a couple, but they're very interesting.

26:24

And so, for example, one of them is what if you don't know what information the LM needs to perform your task? And this links back to what Leo just said. You have to be very precise sometimes in your natural language, hopefully not as precise as you do with programming. But like Leo said, you would still need to know a lot of the terminology that you might not know. And so one thing you can do is you can use this flipped interaction pattern where you

26:52

ask the LM to ask you questions. I think the example we have in the book is you want a function to validate a password and you might not know how to ask for such a function. And So what you can do is you can ask the LLM to ask you for all the information and needs and once it's done asking you, it will write the function. So one of its first questions to you might be OK what are the parameters? And you might be like, well I don't know what parameters are.

27:20

And then it will tell you what the parameters are. And hopefully then you can be able to answer that question. I mean, there is a risk here of you going down a rabbit hole that you don't understand. And so we need to balance these prompts patterns against teaching the fundamentals of programming. But there's another pattern that I find kind of interesting, which is called the persona pattern. And a lot of educators are using this to good effects right now where you can ask the AI to act

27:47

like a specific kind of person. And So what educators right now are doing is they're using the persona pattern and they're saying, OK, LLMURA CS1 instructor. And that conveys a lot of information like don't use advanced programming concepts that have not been taught yet. Or the persona pattern could be like you are a student in an introductory computer science course, things like that. So that you try to scope the types of responses that you get.

28:16

You try to change the types of responses from the default ones, because there are a lot of situations where the default responses might include code that students have not seen before, might use types of code that you don't want them to see yet. It's like not in the scope of the course, and it's kind of amazing to me how much leverage you can get by just telling the LLM how to behave in the

28:39

upcoming interactions. So it's definitely an ongoing area of research and I'm definitely of listening carefully to what's going on there. Right. Very interesting to hear some of the patterns right. I've read your book as well, so I find it also very very interesting. So for people who might have applied AI assistant like with ChatGPT or bot, right, They would have seen this pattern as well. Some people also share if you wanna do something, here are the catalogs of prompts that you can

29:04

use to solve the problem, right. I think same thing applies for programming. So I mean flip interaction pattern, persona pattern, those are definitely interesting. So it's not just one way we ask question and the AI assistant will just give the solution. You can also be creative and sometimes use it differently, right? So I think Thanks for mentioning the patterns. I would love to see more patterns in the future. So I think we'll leave to those creative people to come up with

29:28

the patterns, right. So maybe if you can share from your experience so far using the tool, you know, cracking the code right? Are there any techniques that probably is a little bit less utilized for now, but for people to try it so that they can actually see the true power of LLM in their day-to-day workflow?

29:45

It's a good question Henry. So I I think a lot of people are used to, if they've played with these tools at all, they're kind of used to asking chat GDT or GitHub copilot for code and that's a great use case. Something else Leo and I have learned is you can also use these tools to ask for libraries or modules that you might be able to use to make your task

30:07

easier. So the chapter we have in our book is called Automating Tedious Tasks and it's amazing to me how many libraries in Python are available to help you. If you didn't know about these libraries like I'll just pick a random example. We have an example in the chapter about automating the tedious tasks where you've got two huge directories of images and the back story in the book is maybe they came from

30:39

different phones. So like your partner that has a bunch of pictures on their phone and you've a bunch of pictures on your phone and they're duplicates because you've been sending them back and forth. I think everybody listening kind of knows what kind of mess you can get yourself into. And so the idea is we want to remove the duplicate pictures. And this I think sounds like a super daunting task until you realize that if you ask copilot or ChatGPT, hey, here's a task I

31:05

want to perform. Is there a Python module or library that I can use? It will come back and tell you about the libraries that are available that might help you. For example, something that tells you if two pictures are the same picture, like all the pixels are the same. And then you can ask copilot for clarifications. You can say is this module built in? Is it something I have to install?

31:27

Are there other alternatives? And actually I want to throw it over to Leo for a second, because in your class at the end of 2023, I think you managed in one lecture to do an example of adding up all the word counts in like a ton of documents. And this is something I'm assuming you would not have done in a normal CS1. I guess is, you know, like, is that accurate? Absolutely.

31:52

I I I think it's really hard for students to work with new libraries, and so the LLM having that conversation is really clean. It gives you nice examples. And so I meant for this to be a whole lecture. And what happened was that I ran along from the previous lecture and we just had that quick conversation with the students. But I'm asking Copilot, what's a good library for me to find out how many words are in a Word document? It gave me a great answer. It gave me some star code to

32:15

work with. And within I think it was about 15 minutes I actually spent with my students and we we'd solved the problem. And that's just way beyond the scope of what we normally teach. NC is when I completely agree with you again. Yeah. And again, it gets at that higher level of abstraction, right? Like a lot of listeners have probably had to dig into API docs reading about how functions

32:38

are called. Oh, this thing takes 5 parameters, the first one is a pointer to a pointer to a stream, and the second one is blah blah blah blah blah and all you want to do is just use this thing and the sample code is out there and these LMS could access it. And I just found the example really cool. Leo, that something we would not have even attempted in a previous introductory computing course.

32:59

We wouldn't have attempted it because we would have wasted too much time learning or showing students how to use the library, how to call these functions correctly, and now we can just do it. And I think it speaks to the resilience of the students right when they learn this way, when they're working with a whole bunch of different libraries, because they're working with a whole bunch of different samples from the LLMI think they become more resilient than we can give

33:21

them to small little code snippets that perform specific tasks and specific domains. Jumping from domain domain I think helps them a lot in building kind of a robust understanding. Right. So I think another important thing about AI Assistant, right, I think Leo mentioned it a little bit. It is trained from a large code base. So if you're using Copilot, I think maybe most likely it is trained from GitHub repositories, right. I think there's a race about copyright.

33:47

There's a race about just bluntly copying, you know from those repositories. Any kind of race that you have seen so far from the introduction of these tools. Henry, that's a fantastic point about the ethics of using these tools, and I think there's a few directions we could take this discussion. And the first is the ownership of that code base that they used to learn from.

34:09

I don't think we as a society have figured out first how we should view this ethically and 2nd, how should we view this legally? We are obviously building tools that can help empower people. And so in some light we would say this is a good ethical thing. But we do have to ask how are the tools built and who aren't benefiting from their code potentially being taken or things like this? So that's kind of the first concern. The second concern would be a

34:34

copyright. Are the LMS commonly parroting code which might be under someone's ownership? Is it hard to assess, particularly for the kind of small pieces of code that LMS tend to be able to generate Well? But occasionally, I mean, I've seen it in my interaction with a copilot.

34:51

Occasionally it will generate an author name like in in its recommendations to me, and then clearly like, I don't know if that's the author or if it's just doing next word prediction and it happened to say author and then predicted some words after that. But it does give you some doubts about where this came from and whether or not we have ownership over it. And so we said fairly early on in our book, this hasn't been resolved legally yet.

35:14

Feel free to use these tools kind of for your own use, but if you were to go try to build a company off the software that you're writing, you should be a little careful until these laws get resolved. And the third piece I'd say in terms of the ethics is models. And we've seen this across artificial intelligence. Models reflect biases within society. And so if you ask for a list of names, it will probably give you a list of Caucasian male names

35:41

just on first try. And you have to ask a question, why would it do that? Like why is that its default? And it's obviously learning from a code base that probably has those more representative, but that's not a good sign for students who are coming in not from those groups.

35:55

What I think is important to do, since I don't think, again, we've worked out these issues as a society, is to bring in the readers for a book and bring in the students in our class into this conversation and say these are the ethical concerns of these models and have that a direct conversation about it and be frank about what we know and what we don't know.

36:14

I think the fear is if we aren't, if we kind of pretend these models don't exist and we try not to let the students use the models and they go on to use them on their own, they're going to run into these issues. And so it's better for us to teach them up upfront than to just leave them the blind on it. It also just sort of seems just to keep going off what Leo just

36:32

said. It also seems a little to me as a teacher, it seems a little dishonest to not show students these tools as soon as we do. But then what Leo said comes into the picture, right then. We cannot pretend that these issues don't exist. But there are people who try, you know, to pretend these tools don't exist and ban them so the students can't use them in their courses. And I totally get why it's a very upsetting thing that has

36:59

happened. I don't mean upsetting in terms of, you know, like making me sad. I mean in terms of, like, upsetting the status quo of how courses are taught. And it's very tempting to just try to pretend these tools away. But the tools are out there and our students are going to be using them. And I think more importantly, they're going to be using them when they get their next Co-op position or their next industry

37:24

job. Or at the very least, they're going to be asked about these at future companies and asked about their opinions of these tools. And I just have to super agree with Leo on this. We need to be teaching these ethical concerns. We may not have solutions to them, but I don't think a solution is to try to scare students away from using these tools or somehow trying to prevent them from using these tools, because it's never going

37:50

to happen. And I think it's more useful if we teach the tools along with the concerns that we have. Like, I think it goes without saying, but we have a lot of work to do, right? Like, the issues that Leo just mentioned are not small. There's a reason that they're at the beginning of our book and not at the end. Right? Like, these are not like a oh, by the way, you know, these things are going to reproduce like cultural norms. Like, still, this is a big deal,

38:13

right? We can't just say it at this. Oh, this is like, look at Appendix A for the problems. This is not an Appendix A stuff. This is like chapter one stuff. So Leo, you know in your course you talk about these early on, but I don't think that that means we can't use these tools. I think actually it makes it more likely that our students will use these tools appropriately.

38:34

I think one of the worst things we can do is introduce students to these tools and then not help them understand what the costs are. Because I even think once students understand what's going on, they'll be on the lookout for this and they won't just accept whatever the LLM tells them as the correct answer, right. So we're trying to balance the fact that they're out there and students are going to be using these tools with also training students to understand the deficits.

39:03

And who knows, our students might be the ones who end up in positions where they can make these kinds of improvements. Like you know, students are potentially a couple of years away from graduating and being able to inform how these tools are deployed and how these tools are used. So I definitely think that this is a very important part and a new component of an introductory computer science course.

39:27

So thank you for highlighting this potential risk of using AII think it's not just for coding or programming, right? I think it's a bigger conversation, responsible use of AI, copyright and for example ownership as well bias. I think all these is like it's a new thing, right? So people are trying to grasp some countries also try to come up with the guidelines, right? But I think you are right, maybe banning it all together might not be the wise idea.

39:50

We have to adapt with this tool. And I think we all as a user of this AI assistant, right? At the end of the day, when you use the code and apply it to your system, right, it is also your responsibility to actually make sure that the thing that you apply is correct, right? Because it might potentially affect other people's lives as well. So I think one related question about using this tool right in our day-to-day life is definitely people are afraid of

40:16

being replaced. Many people think that, oh, we don't need so many developers anymore. You can probably cut down the number of people that we have in the companies, right? The potential is there for people to think that we may not need so many, you know, developers anymore. What's your take about this? I know it's probably hard to know the actual impact, but what's your take about some people being afraid of OK, AI is going to take over the world and, you know, replace so many

40:43

people. OK, I think if any of your listeners spend a little time with Copilot, their fears will be quickly taken away. So, I mean, these tools are fantastic. They do great things, but they make mistakes. You realize very quickly there are still essential skills that are required to use them properly. And so I don't think we as programmers are going to go away and that that's kind of the first take away. The 2nd is of our jobs as

41:07

software engineers. And you know this far better than I do. But as software engineers, only a fraction of your time is spent coding. A lot of your time is spent thinking, how should I lay out the interfaces, how do I work with the other software within the company? How do I make sure I've got really clear requirements for my code?

41:27

Like, all of these things are the really big problems that still humans have to wrestle with, and I'm not seeing LLMS taking that away from us anytime soon, at least for now. And Leo, I guess just to add, I think if you look back at computing evolution, I wonder if people have had the same discussion when Visual Basic came out, you know, where you could drag and drop components onto a form. I wonder if back in 1995 people were saying, Oh, well, that's it. We have like rapid application

41:57

development. I heard that term. It was called RAD, I think, and I I wasn't around really. I was like a kid having a good time. But I think probably people back then were saying the same sorts of things, right? Like, oh, look at this, We can develop these applications by dragging and dropping.

42:12

And I think these advances, I don't know if they lead to more or less jobs, but I think it's likely that it's going to be a steady state and perhaps will be more productive with what we're able to do. Just to reiterate what Leo said, I don't think that they make jobs in programming go away. I should also add, and Leo, I wonder what your opinion is on this. Most of what we've been talking about in reading has been for introductory programming.

42:43

I don't know if we know what the impact will be on industry level projects. We know people are using these tools in industry and we know they're more productive, but I don't know if we know whether there are more or fewer jobs or if there will be in the future. I just I have a feeling that the result will be that the existing programmers are just going to be more efficient. I agree.

43:10

I suspect there's going to be AI think there'll be a shift to like with all the other major advances in technology. When Python came out, we didn't say, oh OK, well, we need fewer people to write code. It was a oh geez, we can write larger software or more quickly do data analysis or now deal with the influx of big data. Like we've just adjusted and done bigger and better things as the technology got better. And so naively, I think that's the case.

43:36

But I do think there's gonna be a bunch of research on this topic. I mean the next 10 years, probably in the software engineering community. Yeah, maybe one few things that I pick from the industry point of view, right? I mean there are maybe people saying that it improves their productivity, maybe like 3040%. Maybe the gap from junior and senior might be lesser now because the juniors might be able to take on more advanced

43:59

and complicated problems. But I agree with Leo that writing code is not the only job for software developers, right? So they still need to understand requirements. And we know in the industry a lot of times requirements are vague or not well specified, right? So I think it's the software developer's job to actually translate that into a good design, proper design. And also don't forget about evolving the code right, writing it in such a maintainable way, writing it in such a way it can

44:24

scale. I think Those things I still haven't heard that the LLMS can do for us. For example, you tell them build me a few microservices that can interact with these kind of APIs. I think that will be too much task for LLMS to solve, but maybe one day it would Happy to see that future.

44:41

But for now, I think it might take as well that we have to be able to live with it, leverage with it to improve the productivity so that we can move on to solve bigger and bigger problems just like Leo said, right? And I'd add that it's not great at writing efficient code. So if you're you say no, this is this is an inefficient algorithm, Could you use dynamic programming to solve this? At least in my experience it hasn't done very well.

45:04

And then I did try. I teach a really specialized class on writing high performance software that's architecture aware, so like knowing about caches and like super high efficient code, extracting cache locality, things like that. And it did terrible. Like I asked it to write like a blocked matrix, matrix multiply and it could not do that in any way. So I think there's still for the advanced code. There's a lot of room for us as software engineers to be

45:29

developing those ourselves. Yeah. I guess what I find kind of interesting about the discourse right now is because it's so new that people want to be able to make these claims like LMS or crap, right? Or LLMS are amazing. And, you know, it's very early, so people are going to make these kinds of claims right now. But I guess I'm more interested in what happens when the dust

45:51

settles. And I think all polarizing opinions right now, I don't think any of them are going to end up being what actually happens, right? Like, is every software engineer going to be fired? No. Are we going to have a different number of software engineers? Probably, right. Like it's to some extent. I don't know if it's going to be more or less, but I think there are many, many opinions right now. Leo, what's that statement you have about overestimating the effects of technology?

46:18

Oh, there's this famous quote. Let me see. Yeah, track it down. But there's a famous quote which is like we tend to overestimate the effects of technology in the short term and underestimate it in the long term. Yeah. And so perhaps that's what's happening and that's this kind of stuff that grabs headlines too, right? So I mean, we're still in the throes of this thing where it's very difficult right now to separate pipe from what's actually happening.

46:40

I guess I look forward to maybe getting to the point where we have more research backing, because until that it's fun and everything, but it's just people talking about the things that we don't really know the answer until the research gets done. And that it's Amara's Law is the name of it. So Amara's Law is the we tend to underestimate the effects in the long run, but overestimate in the short term.

47:05

Yep. So I think, yeah, one thing clear for sure, right, if you rely too much on LLM, I think still we are not there yet, right? So I think in your book you also mentioned it is not an expert, it is actually trained from existing code bases, right. So for example, if you want to solve a new problem, maybe quantum computing, let's say, it may not be able to even give you a proper solution, right. So let's not forget about that.

47:26

I think we still need to use our judgment as well as a human to actually apply what LLM is suggesting to us into our software. So maybe one last point I would like to ask. Since you are also part of the university teaching students, right? You mentioned about equity opportunity, probably is last time for people to learn about programming computer science,

47:47

right? There are only limited number of people now with this introduction of AI system, potentially more people will be able to get into computer science and learn about programming. Maybe about the syntax problem will soon becomes lesser of a challenge. So what is your take on this? Creating a more equitable kind of a society for people to learn computer science? Yeah. Thanks, Henry. So this is something that Leo and I have been thinking a lot about and we're excited by the

48:14

possibilities here. But we don't want to say anything too early because again, we don't know what's going to end up happening. But just to summarize for everybody, the deal is that people who already have prior programming experience they it's unsurprising, but they tend to do better in introductory CS courses.

48:35

So if they had more opportunities in high school, for example for you know, their parents had access to some maybe computing or courses or they directed the students into this field, then they tend to perform better. And I wouldn't necessarily have a problem with this except that these opportunities are not evenly distributed and so they're made more accessible to dominant groups.

49:00

And so then this gap in prior experience leads to a gap across different types of students, which is obviously not OK. It's what we're hoping, and the research is ongoing or hoping is that because there's a reduced emphasis on syntax using LMS, we're hoping that prior experience, while it will certainly still exist, the gaps will still exist. Perhaps the gaps in prior experience will not lead to the gaps in outcomes that we've been seeing in introductory CS

49:34

courses. So again, there are a lot of caveats here. One of them, for example, is that maybe the students with privilege are going to be using LMS earlier than other students, and then they'll have prior experience using LLMS too, and that may conveyed advantage just like a prior programming experience does right now. I guess our hope stems from the fact that learning syntax is so difficult and it's such a barrier for so many students.

50:05

And then maybe these LLM skills, maybe the gap can be made smaller more quickly. I want to ask Leo to jump in here too, because this is a question that's definitely worthy of multiple discussion points. Oh, absolutely. I I think you've summarized the issue really well. I think there's a couple of other reasons for optimism and I am being very cautiously

50:27

optimistic. As Dan points out, we have to do the research, but I mentioned earlier the kind of status quo of how we assess students in computer science classes and it's solving these really small functions that aren't particularly exciting to be quite frank.

50:42

And there's been a whole bunch of work within the community that has shown that students from demographic groups that are currently under representing computing tend to want to see that their work is going to help society, It's going to be for the societal good and they want to see that the computing can

51:02

serve that good. I think when we move to LLMS you end up, unless you want to do these outdated assignments that they LM solve for you, you have to move to these kind of open-ended large projects, which is what we were using in our class. And then they can pick the domain that matters to them and then it can be something that's

51:20

meaningful to them personally. And I think if you can do that, I think we're going to bring in a broader audience of people who are interested in competing because they see how it matters for them as people. So that'd be the first reason for optimism. And then the second one is 1, where I'm also kind of cautiously optimistic.

51:35

And as there's been a whole bunch of research and already started by members of our community, that's really interesting in terms of how can we turn these AI assistants into tutors, essentially intelligent tutoring systems. How could we help through prompt engineering, through really careful crafting of the introductory prompts.

51:55

How can we make these make it so when the students struggling, they don't have to wait till the next office hours of instructor, they can just have a quick conversation and they're going to get mostly correct answers. Which is how with LLMS you got to get correct answers, you're going to get encouraging answers ones, they'll encourage them to keep trying. How can we get them the help they need, when they need it is if there's a gap in terms of how much support different groups

52:19

need. Making sure everyone has lots of support will help everyone. That will help disadvantage groups more. Yeah, and Leo, It's not impossible that this happens. Like in case people are skeptical out there. Leo and I, of course are disinterested as well. Slash skeptical because we're scientists, but there is precedent for something good to happen here. And Leo specifically, I'm thinking about the way that we teach our classes.

52:43

So for example, using student discussion in classes through something called Peer instruction seems to be able to reduce this gap. Yeah, it seems that this techniques like active learning disproportionately help students who are underprivileged, and so it helps everyone kind of boat raises all waters, but the folks who are struggling are raised more when you see a larger impact for those struggling groups. Yeah. And and that's because the new supports are there, right?

53:10

Like other students, in the case of peer instruction, it's I think perhaps partially a community aspect. So now they have more students who can help them kind of catch up. And so this is the hope right now. So people are already referring to LL Ms. as like one-on-one tutors. And I'm not willing to go there quite yet. But I think that's the dream, right? Like Leo said, the dream is that they can reduce the time delay between having a question and getting an answer.

53:37

Because if we can reduce that to 0, like just imagine that any question a student has could be answered immediately. That bodes well for students to catch up, right? As a lot of the times I think the limiting factor is just resources, right? Like I only have office hours once a week for example. So if a student gets stuck before, maybe they have to wait for me to get them unstuck and maybe they can get unstuck sooner with LLMS and then catch up. So again, this is just kind of

54:05

the hope right now. Maybe in a few years we can revisit this and say yes, we were right or no we were not. But for now, it's definitely something we're interested in, I. Think that brings up a really good point of kind of comparison groups, which is where I've kind of shifted my thinking about how are we comparing. And so I'll give you kind of three examples here.

54:23

One is, you'll hear folks say we can't change what we're teaching in our introductory courses right now because students are learning the fundamentals and they they'll start kind of hammering on how great the currency is. One class is, but the evidence is indirectly students finishing interview programming class. The majority of them can't find the average of positive numbers in a list. That's like a super easy task for computer scientists, like

54:48

for software engineers. And the majority of students can't do that at the end of ACS one. So we we need to make sure that we're very clear about what we're comparing against. What's happening now isn't

54:56

successful for everyone. The second reason this is what you made me think of for the tutors, was we've done a whole bunch of research out of my lab, finding that both students and tutors have significant incentives to essentially just give away the answer and just fix the problem for the student right there. Basically act as human debuggers without actually teaching the process.

55:17

And so I think when we imagine that human tutor interacting with a student, we imagine the great teacher like Dan, like sitting down and going back to like step one and diagnosing the problem and giving them the right instruction, the right time to address their misconceptions. When the reality is, it's mostly students giving this kind of tutoring help and they're maybe not giving the best instruction.

55:39

And so we have to be honest with ourselves about what are these AI assistants being compared against, and then we can actually do a fair comparison, right? I think it's like what you said, right? We can be cautiously optimistic about this kind of equitable future, right. So I think really looking forward for more chances, more opportunities for people. It's been a great conversation so far, right? So I think we will have a lot of more topics if we don't cut it

56:04

short for now, right? I have one last question before we wrap up, which normally I ask for all my guests. I call this 3 technical leadership wisdom. You can think of it just like advice as well for people to learn from you. Maybe if you can share your version of three technical leadership wisdom. Henry I I love that question and if you don't mind, I've got a slightly long answer for my first one, and it's I had a really close colleague who is just a fantastic cyclist.

56:27

His name is Allen Snavely here at UC San Diego, and he was part of a race and he was a fantastic cyclist and they were in the second pack. Even if all of cycling knows that you're in packs and the first pack is up ahead of them, they can't catch them. But at one point along the race, like the front pack seems to go the wrong direction and Allen pretty quickly realizes, wait, that's actually not the direction to the finish, What are they doing?

56:52

And so he steers the second pack towards the finish and it's the one race that he ever got to win because the main back went off in the wrong direction. And whenever he tells that story, I I always get a kick out of it. But it reminds me from a leadership perspective that it's important to be good. It's important to be fast, it's important to be able to be productive, but it's just as important or even more important to know where we're going.

57:14

And so I spend a lot of time with my group and with my lab and making sure we have a vision for where we're going. It's that we are going the right direction. Thanks Leo. That's a powerful one, especially for researchers like us to remember. I have another one I think that relates to research too, which is actually even more important now I think for people who are not researchers as well because of the LLM discourse right now.

57:36

And that is always test assumptions or always be aware of assumptions that people are making. And I I bring this up specifically now because I think we're at the beginning of this in a flood of research and commentary that's going to come out about LLMS. I mean, obviously this applies to everything, right? Always, you know, take the time to understand where the writer is coming from or where your own assumptions are coming from.

58:03

But especially now I just want to caution that people are going to be making sweeping statements about LLMS and Leo. And I read a lot of research around LLMS, and often, you know, if you're a busy researcher, busy professor, you can get some summary of the paper by reading the abstract. Sort of great practice. But if you're very busy, you can get a sense of what the paper's doing. I don't think this necessarily works for LLM papers.

58:32

There's so many assumptions that are baked in to the experiments that people are doing right now, We can't even agree on the right skills that we want students to have when they're working with LLMS anymore. And so I think we're seeing a lot of papers that like there are the headlines, like LLMS suck or LLMS are amazing or whatever.

58:53

But I think we need to dig beneath the headlines to see exactly what's going on, especially in a new area like LLMS where there are so many assumptions that have not even been written down yet that people might be making. That's a brilliant point, Dan. And we see with the new papers coming in, they're coming in very quickly. And because we were trying in such a race to get the research done, it's really important for us to go to the methods and

59:19

actually read the paper fully. And I know, I know you're fantastic at that. And so I I really hope all the other practitioners, all the people teaching programming, do the same thing. Spend their time making sure they understand the studies that have been done. Yeah. And it's not even that anybody who's involved is being deceptive. I think everybody's being super honest about what's happening. But the assumptions, I think, are so new that we're not even

59:43

necessarily writing them down. Like if we're not being careful enough, we may be making assumptions about LLMS, like, So for example, I could just think in my head, OK, students still must know syntax. And maybe that's true, Maybe it's not true, but it might be so obvious to me one way or the other that I just might not even take it into account in my research. And this is one of the most dangerous things for researchers, right, Leo?

01:00:08

It's like an assumption that is apparently super obvious that you don't even question it, or even worse, you don't even write it down. And I think we're. As a community, we're at risk of doing this right now because everything is moving so quickly. Exactly. We've been studying how to teach programming for the last 40 years, and so we've so many assumptions built in about that. I think just even the assumption of what is the angle, like, is syntax an end goal of a intro programming class?

01:00:36

We don't know. Like, I think there's going to be a whole bunch of discussion about that. Yeah. Or like, does it make sense to compare what students learn with LLMS against what they learn without LLMS? Like, what do you compare? Right. What's important? Like, I don't think we know the answers to these questions. So I guess I'm asking more questions than I'm answering, which I don't think was what Henry wanted for this section.

01:01:00

Yeah. Yeah. And Leo, you have one more I think you wanted to share. Oh yeah, absolutely. So the last piece, and this is going to be me, honestly kind of just taking from the great wisdom that Henry's already shared previously with some of his guests. And that's I believe everything is done with people. Like if Dan and I work fantastically together, I love working with my lab. And so I believe very fervently in the notion of empowered

01:01:22

teams. And the message from Marty Kagan really resonates with me. I first heard from Monty Hammond Tree at Microsoft and it's just a really powerful message of you want to make sure your teams are empowered to be able to do the work they want to do and solve important problems.

01:01:36

And I think Dan and I have both seen this as PhD advisors in empowering our PhD students to find their own path is probably one of the best things we get to do as faculty of watching them, not really knowing what they want to study initially and us really being close to them on every project they do to six years later, five or six years later when they are now essentially running their own research program and we're just

01:02:00

giving them occasional advice. And so I I think for the tech leaders out there who've been listening to Marty Kagan's message of empowered teams, I think it applies more than just software engineering teams. I think you empower all the people who work with you and you, you end up in a better place. Yeah, it's like tutoring, really. It's like one-on-one work is the really the most powerful work you can do. I'll take my classes of 300 or 400 or whatever.

01:02:24

I'll you know, I'll do my best. But you can't match a small team just empowered to do great work. So I totally agree. So yeah, I think for people who want to learn more about this AI assistant from your book, is there any resources or a place where they can find you online? Yes.

01:02:39

If people are interested in getting our book and trying to learn how to write software with the AI assistant, they can just look for our book on Amazon. It's freely available in all countries and we candidly are very open to feedback. This is a very, very new space. As readers work through it, we would appreciate the emails or the comments on LinkedIn that would let us know how they're appreciating the book and what we could do better for a second edition.

01:03:07

Thanks for organizing this for us, Henry. Thank you for listening to this episode and for staying right until the end. If you highly enjoyed it, I would appreciate if you share it with your friends and colleagues who you think would also benefit from listening to this episode. And if you're new to the podcast, make sure to subscribe and leave me your valuable review and feedback. It helps me a lot in order to grow this podcast better.

01:03:33

You can also find the full show notes of this conversation on the episode page at techlitjournal dot dev website, including the full transcript, interesting quotes, and links to the resources mentioned from the conversation. And lastly, make sure to subscribe to the show's mailing list on techlitjournal dot dev to get notified for any future episodes. Stay tuned for the next Techly Journal episode, and until then, goodbye.

Transcript source: Provided by creator in RSS feed: download file

#165 - Learning to Program in the Era of Generative AI - Leo Porter & Daniel Zingaro

Episode description

Transcript