#222 - Closing the Knowledge Gap in Your Legacy Code with AI - Omer Rosenbaum | Tech Lead Journal podcast

⁠¶ Trailer & Intro

00:00

The greatest risk with AI is that if you don't understand what it does, then you really sabotage your own understanding and learning journey. I don't think it makes sense for someone in 2025 to work even the same as they worked in 2024. The danger will be even worse, right, if some of these agents hallucinate and they all hallucinate with each other. At some point it's going to be hard for a human to understand what's going on, and that's where I think they'll have to

00:25

stop and think. Will AI actually improve our critical thinking or actually reduce our critical thinking? I do think that people will get burned a lot because they will believe what AI sent them, and then they'll learn the hard way that they shouldn't have. You should resist the urge to send out anything you don't completely understand. Before, people were scared or reluctant to write because they would say, OK, no one will ever read it. Now AI is going to read it.

00:48

AI is not lazy. AI is going to read it, right? AI is a super powerful tool, especially in explaining things to humans and also allowing humans to ask questions in natural language. It's amazing. But we shouldn't expect AI to understand the code. No AI can look at the code and understand why five things are not implemented: because you tried them at some point and you know that it's not going to work. I think there's some danger here about hallucination.

01:13

How much context should you give to AI? When you provide it with complex code bases, AI makes mistakes. It has some very fundamental limitations, and that's where you need other techniques like... Hello everyone, welcome back to another new episode of the Tech Lead Journal podcast. Today I have with me the CTO and co-founder of Swimm, swimm.io, right, Omer Rosenbaum. He's here with me today in the

01:52

show. So I think we plan to have a lot, a lot of discussions about, you know, learning, research, AI, of course, and also other technical leadership wisdom that Omer will share today with us. So, Omer, looking forward to our conversation today. Me too, Henry. Thank you for having me, it's great to be here.

⁠¶ Career Turning Points

02:11

Yeah. Omer, I'd like to invite you to maybe share something about yourself first. Any career turning points that you think we can learn from? Of course. So I started my career and I had an opportunity to be a part of a special technological unit, and actually started with a very special training that kind of changed my life. It was 13 weeks throughout which I learned more than I had in all of my years before that,

02:40

professionally speaking. And it was really mind blowing, like how much you can learn in such a short amount of time and also the different methodologies. And it really got me interested in teaching and in learning, which also affected a lot of the things I did later. Then I went to the university, where I took all kinds of courses in a variety of topics including chemistry, math, linguistics, psychology and

03:12

others. And while I was a student I also taught a lot, wrote a few books and started a few training programs, one in Singapore by the way. So yeah. So I got to teach in different places and again think about teaching methodology as well as research methodology. And in 2019 I co-founded Swimm, and ever since then I've been the CTO and co-founder here. Well, thank you for sharing your interesting journey. I didn't know that you kind of like started a job first before

03:46

you go into university. What do you think is an advantage of starting the job first before you get into uni? Yeah. So first of all, you don't have to start a job per se. It's just that you basically have a few years of hands-on experience, right, which is more similar to a job. And then when you go to the university, you can choose to focus, say you study computer science, right? But you come with an understanding of what this looks like in application in the real world.

04:13

And you want, you want to learn about the science or about specific aspects of it like algorithms and so on. And you can also choose to learn other things and broaden your horizons while you already have a lot of knowledge and a firm basis based on the hands on experience you got. So I think it gives the whole learning experience a very different atmosphere. And you can approach it not as like, this is my way to get a job because I already have my skills, right? I'm here to learn.

04:47

I'm here to deepen my knowledge to understand things better. I think that's really interesting, Omer. So what I think as well, right? If you started kind of like a job first before you go into uni, you kind of like have a good understanding of what kind of roles that you like or don't like. And probably the hands-on experience will be also quite relatable, right, when you study. Because sometimes I imagine if we go to uni, we can't even relate to some of the subjects, right?

05:11

So we don't even know it's going to be useful or not. And I think a lot of challenges in parts of the world as juniors, right, is actually to understand whether what I'm learning now is going to be usable or relatable with my job

⁠¶ What Juniors Should Do in the Age of AI

05:24

or not. And especially these days, you know, the craze about AI, right? And there's a lot of fear for juniors about, you know, getting a job because some people think AI could replace the need for any kind of juniors. So maybe let's start our discussion by discussing this challenge, right? So what do you think juniors should do in the age of AI these days? About getting a good job or

05:45

about learning new skills? So first of all, I think juniors need to acknowledge that it's a challenging time for juniors. First of all, because it's unclear. I mean, we can postulate about the impact of AI on junior developers. Different people have different opinions. We don't know them for a fact, right? So clearly it's more challenging. And I think, I mean, as a junior, you can't impact the fact that now AI is changing the

06:14

world, right? What you can do is acknowledge that and try to find the opportunities. I think on one hand it's really challenging to justify the need for a junior developer for some tasks. If you think of a junior developer as someone who will do the small tasks right while gradually learning and becoming better, then now you could let some coding assistants do that for you, right? On the other hand, I don't think juniors are going to be extinct from the world, right?

06:48

Everyone starts as a junior and we eventually evolve to being more senior and more experienced. Everyone goes through that. And I think for junior developers, the important thing is to understand what's important to learn and what is not. For example, I think learning with AI and how to use AI effectively is important and it's one of the advantages they

07:15

can get, right? Like if you see, I don't know, whenever we see our grandparents operate the computer, we feel the difference, right? Even when you see like a 10 year old, they're computer native, right? They started with a computer, they know what it's like. So junior developers can actually start with AI and feel even more comfortable working with AI tools, perhaps. On the other hand, they should be super careful about not understanding things.

07:42

I think what differentiates experienced, highly qualified and effective engineers or researchers from the rest is the fact they understand deeply how things work. And the greatest risk with AI is that it seems to do, you know, you're asking it to do something, you get some code back, it kind of does what you wanted it to do. And if you don't understand what it does or you don't understand what the alternative is, then you really sabotage your own understanding and learning journey.

08:16

And as a junior, that's the most important thing, right? You need to learn and improve every day. And also you might introduce critical bugs without knowing that. So I think the urge to just ship something, let's say I got a task, OK, I sent it to one of the AI coding assistants, or I use the completion tool or whatever. I got a result. I can just issue the PR. I think, as juniors, you should resist the urge to send out anything you don't

08:45

completely understand. On the other hand, there is an amazing opportunity here. Before AI, they could get stuck on something and they'd just need a senior developer to help them. That's it. Which might be a waste of the senior developer's time. Maybe I don't feel so comfortable as a junior developer to approach a senior to help me and admit that I'm blocked, right? And AI can actually unblock you from lots of things.

09:10

It might be a generic thing about, you know, as a junior developer you might not know how to run some tests, which is something generic that would take you a long time to learn. But if you ask Claude or GPT or whatever, you'd get an answer really quickly, right?

09:27

And also if you select a specific part of the code that is hard for you to understand and you ask AI to walk you through it step by step, it might be an easier, a faster way to learn than the non-AI way of having to understand it yourself or, again, if you're stuck, asking for help from someone. So it's like having someone with you. I think it's in a sense the dream of every junior developer to have a senior developer who doesn't get tired of your questions. Right?

09:57

You just need to realize it's not really a senior developer. They don't really know the code base. They don't really understand all the pros and cons, right? They're a limited senior developer with some strengths and some weaknesses, and your goal is to make sure you understand what's going on. That's the main thing I think, as a junior developer, you have to do. I think you brought up very good points, right, specifically like

10:24

your term. You know I'm going to use it, like AI native generation, right? So because when they go into the workforce these days, right, they can use AI. It is simply available, right? And I think looking back at my time back then, right, probably the best that we had were books, right? Maybe Google. Stack Overflow was probably also just starting, right? So like now you have AI natively

10:45

that you can use straight away. So I think this gives a lot of opportunity for sure for anyone to upskill and learn about something. And especially if the workplace also allows AI usage, you know, like AI coding assistants, you can also get up to speed with the code base, asking about specific parts of the code that you don't understand without bugging the so-called senior developers, right?

⁠¶ Junior Developer's Responsibility When Using AI

11:05

And I think you brought up good points about, you know, understanding your code base, even though it's generated by AI, before you actually submit or even push it to a production environment, right? So tell us maybe about this workflow, right? So for example, a junior, you know, specifically has a task and it has to run specific, I don't know, like business logic or something like that, but AI gives them like a code that they don't understand.

11:28

What specifically should a junior developer do? As a junior developer, I mean, right, let's start from what you shouldn't do, right? So I gave a specific coding assistant a small task and I wanted to see what it does. And I wanted it to help me, save me time. And I told it, OK, write a test, implement the task and keep iterating until it passes the test, right? And then it says, OK, great, everything passes, you can work with it now. Then I look at the code, right?

11:59

And I see that it has an if for the specific case of the test and it does something to pass the test specifically, right? So it has a string of like the mock data that I used. So if that, return this. OK, the test passes, great. Right, now it's an extreme example, but it happened to me, right? I mean, very recently. And a junior developer might say, OK, I have it, the tests pass and I

12:22

will ship it, right? This would create a really bad impression of course, but also think of a less clear example where if you actually understand the code, you understand that it actually solved a very specific case and not the broader scope. Or maybe there are some other considerations, security considerations. Maybe there is something in there which introduces a vulnerability, maybe performance considerations, like it works, but it wouldn't work at scale.
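Editor's illustration: the failure mode described above (code that recognises the test's mock data instead of solving the task) can be sketched as follows. This is a minimal, hypothetical Python example; the function names, the discount table, and the values are assumptions made for illustration and are not the code from the anecdote.

```python
# Hypothetical business rule, invented for this sketch.
DISCOUNTS = {"SAVE10": 0.10, "SAVE25": 0.25}


def apply_discount_overfit(price: float, code: str) -> float:
    """Anti-pattern: special-cases the exact mock values used by the test."""
    if price == 100.0 and code == "SAVE10":
        return 90.0
    return price


def apply_discount(price: float, code: str) -> float:
    """General version: looks up the rate instead of matching test inputs."""
    rate = DISCOUNTS.get(code, 0.0)
    return round(price * (1 - rate), 2)


# The single original test passes for both versions...
assert apply_discount_overfit(100.0, "SAVE10") == 90.0
assert apply_discount(100.0, "SAVE10") == 90.0

# ...but a second, different input exposes the special-cased one.
assert apply_discount(200.0, "SAVE25") == 150.0
assert apply_discount_overfit(200.0, "SAVE25") == 200.0  # discount silently ignored
```

Reading the generated code, or adding one more test case with different inputs, is usually enough to catch this kind of shortcut.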

12:54

And in case this code should operate at scale, right, and performance is important, you must know that. So I think, as a junior developer, you can take the task, explain it. By the way, the fact that AI makes you explain in natural language what the task is, is great. Because whenever I got to teach people how to program, the first thing I told them is explain to yourself, I would say in natural language, right? Just explain to yourself what you're supposed to do and what the steps are.

13:22

Now if you write this to AI, it's actually a good practice. And then when you get back a response, you need to go over everything and understand everything and make sure it makes sense in the context of the task. At which point, if you don't understand something, you can use AI. Ask it: why did you do this? Is there another way? What are the pros and cons? Right, keep checking that, but never ship anything you don't completely understand.

13:49

I think it would be very challenging for people now. It would require a lot of discipline not to ship something you don't completely understand. Because they're just like, why would I even care, right? People say like, AI will write code, then AI will fix the code. At least for now. It's very far from the truth. And again, for a junior, you need to understand so you can grow. You won't grow by delegating all of your tasks to AI. Yeah, I think that's a very good reminder, I would say, right?

14:19

Because sometimes I think even seniors using AI, right, if they think, oh, this code looks OK just by looking at a glance, right? Sometimes a particular bug could just appear and before you realize it makes a, you know, like a production issue or something like that, right? So I think no matter whether you're senior, junior, right, always look at the generated code by AI, right?

14:39

And make sure that you understand, again, like the key point that you emphasize is like understand exactly what the code is doing and maybe also the design, right, and certain aspects suggested by the AI. I think that's a very good thing. And I think this brings me to

⁠¶ AI and Critical Thinking

14:52

the next question that I'd like to ask, right? Simply by understanding and, and maybe asking back, right, or maybe being curious about why certain things are suggested that way, it's actually like a critical thinking kind of a capability, right? And there's recent research saying that, you know, using AI a lot will actually impact your critical thinking ability. Maybe it will reduce it or even make you less critical. So maybe from your point of view, what's your view about this?

15:17

Will AI actually improve our critical thinking or actually reduce our critical thinking? I think that it's hard to tell, right? I'm not a prophet. I do think there is a risk there, but I do think that people will get burned a lot because they will believe what AI sent them, like the responses they get, and then they'll learn the hard way that they shouldn't have. So it might make them more willing to check everything they

15:44

get as a response. And so I'll be optimistic and say it would help our critical thinking. If some junior developers are listening, I would advise you to do that deliberately. Make sure you want to improve on your critical thinking rather than get burned and then iterate from there. Yeah. I think like any kind of technological advancement, right, it makes your life easier, right? And by making life easier, sometimes we get lazy, so to speak, right? So don't forget like you have to

16:17

be critical, right? That's the first thing, understanding fundamentals. And I think the other aspect

⁠¶ Understanding & Preserving Domain Knowledge

16:21

that is really important, especially for juniors, is actually understanding the domain knowledge, right? Maybe it's the business aspects of it, it may be, I don't know, other aspects of the code base that are not necessarily just technological. So maybe in your view, what's your take about juniors, you know, upskilling themselves in

16:36

domain knowledge? Because yeah, sometimes you can ask AI maybe if it's like generic domain knowledge, but there are specific things that are particular to an organization, where it's probably not easy for AI to suggest something. Of course. So I think we're raising two areas where juniors can grow here. One is the technological area, which will also be with you when you switch jobs or switch business domains, right? And it's the fundamental of being an engineer, I think. So it's a must.

17:06

But you're also raising another very valid point. AI excels at creating demos from scratch, right? Because it's something generic. It doesn't excel as much otherwise. It's getting better. We get other tools that enrich it with context, but AI never knows your company really. It doesn't understand the broader business logic. It wasn't in that meeting where the product manager met with the client, understood the needs, and then in the meeting where the product manager told you what they need, right.

17:34

So I think one of the responsibilities of every engineer, not just junior engineers, would be to communicate to AI and to the code you're writing yourself, the specific business logic, business rules, constraints, all of that context that is unique to your organization. And I think writing it, by the way, is crucial and something we've been neglecting as humanity for various reasons.

18:02

But I think with AI it can actually be easier because you explain to AI why you did something and then you can hopefully preserve that knowledge alongside the code.

⁠¶ The Importance of Written Knowledge for AI Usage

18:11

Yeah, so I think writing, I've been hearing it a lot of times, right, in so many different episodes, like writing skills are very crucial, right. Leaders especially, you can't just lead by talking to people. You have to write more, right? And writing here also means like some kind of knowledge base, right? And I know Swimm is, you know, kind of like dealing with the knowledge problem, right.

18:32

So tell us, how can we actually learn better by, you know, having a knowledge base within the company or the practice itself, right? Having more writing, more documentation within the organization. Right. So I think if you have an organization that values written knowledge and specifically documentation, you got lots of benefits. Before the era of gen AI and coding assistants, you would say it's crucial for developers and the communication between them, right?

19:03

I think one of the most extreme cases I've experienced myself was a crucial real-time system that two separate teams worked on. And there was a queue of messages where one team assumed that one was the top priority and 10 was the lowest priority. And the other team assumed exactly the opposite. And they didn't understand why they get random sequences of messages and it's not according to priority, right? So this is just an extreme, clear

19:34

example of miscommunication. But in general, when you write code and it accumulates over time, what you lose is the business logic context and why you did lots of things. There are things you can't deduce from code. No AI can look in the code and understand why you didn't do five things that are not implemented.
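Editor's illustration: the priority-queue miscommunication described above can be reproduced in a few lines. This is a hypothetical Python sketch; the `Priority` enum, the `drain` helper, and the message names are assumptions for illustration, not the system from the anecdote. The consumer treats lower numbers as more urgent, while one producer assumes the opposite.

```python
import heapq
from enum import IntEnum


class Priority(IntEnum):
    """Written-down convention (assumed here): lower number = more urgent."""
    HIGHEST = 1
    LOWEST = 10


def drain(messages):
    """Return message bodies in processing order, most urgent first."""
    # The sequence number keeps ordering stable for equal priorities.
    heap = [(int(prio), seq, body) for seq, (prio, body) in enumerate(messages)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, body = heapq.heappop(heap)
        ordered.append(body)
    return ordered


# Team A believes 1 is most urgent; Team B believes 10 is.
team_a = [(Priority.HIGHEST, "A-urgent"), (Priority.LOWEST, "A-routine")]
team_b = [(10, "B-urgent"), (1, "B-routine")]

order = drain(team_a + team_b)
print(order)  # Team B's urgent message is processed last
```

Nothing in the queue code itself reveals which convention is intended; a named, documented constant such as `Priority.HIGHEST` is one way to put that decision where both teams (and a coding assistant) can read it.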

19:55

They're not there because you tried them sometime and you know that it's not going to work because of this and that, and it doesn't know that it's a request by a specific client. So when you have such unique knowledge, it's critical to capture it and preserve it. And I think now, given that we provide the context to AI coding assistants to help us with the coding tasks, it's even more clear that you have clear value from

20:23

writing. Before, people were scared or reluctant to write because they would say, OK, no one will ever read it. Now AI is going to read it. AI is not lazy. AI is going to read it, right? Your coding assistant will read your docs and use them. And then one of the biggest challenges is actually keeping that knowledge up to date with the code as it evolves.

20:44

And this is actually one of the things we solved first in Swimm. Even before the era of gen AI as it is today, when it was not commonly used, we started by allowing developers to write documentation and make sure automatically that it's kept up to date with the code as the code evolves. With AI, it's even more crucial because AI can look at a piece of documentation, not know that it's outdated, and rely on it when generating code or other docs or tests and so on.
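Editor's illustration: one simple way to notice that a document has drifted from the code it describes is to store a fingerprint of the referenced snippet and compare it later. The sketch below is a generic, hypothetical approach in Python and is not a description of how Swimm implements this; `fingerprint`, `is_stale`, and the sample snippet are assumed names invented for the example.

```python
import hashlib


def fingerprint(snippet: str) -> str:
    """Hash a code snippet, ignoring leading/trailing whitespace on each line."""
    normalized = "\n".join(line.strip() for line in snippet.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_stale(doc_fingerprint: str, current_snippet: str) -> bool:
    """True when the code no longer matches what the document was written against."""
    return fingerprint(current_snippet) != doc_fingerprint


# Snippet as it looked when the document was written.
original = "def price(total):\n    return total * 1.17\n"
doc = {
    "text": "price() adds 17% tax to the total.",
    "fingerprint": fingerprint(original),
}

reindented = "def price(total):\n        return total * 1.17"   # cosmetic change only
changed = "def price(total):\n    return total * 1.18\n"        # behaviour changed

print(is_stale(doc["fingerprint"], reindented))  # False
print(is_stale(doc["fingerprint"], changed))     # True
```

Stripping indentation is a deliberate simplification here; for whitespace-sensitive languages a real check would need a more careful normalization, and a stale flag only says that a human or tool should re-review the document, not what the new text should be.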

21:13

So this is one of the pillars for having a knowledge base that is comprehensive, describes everything you need to know, especially things that you cannot deduce from the code, is kept up to date as your code evolves, and is then reachable to both humans who might need it and AI assistants and agents. Wow, I think I laughed when you mentioned that before we were not so sure anyone would read the documentation, right?

21:40

So sometimes we are lazy to write some documentation because we think nobody is gonna read it. But now I think you make a fair point, right? The AI will be the first audience, especially if the AI tool has access to your documentation, right?

⁠¶ Limitations of AI in Understanding Knowledge Base

21:51

So I think you have dealt with this kind of challenge, knowledge sharing and also understanding documentation with your company, right? So tell us maybe some use cases of how AI actually improves this kind of knowledge sharing and also documentation. Because for all we know, I mean, we are lay people, we think we just feed it to AI. AI will summarize, AI will tell you what to do. I think there's some danger here about hallucination.

22:14

You know the context, right? How much context should you give to AI? Do you actually also share everything within the organization to AI? So tell us a little bit more about this, because these are the nuances that I think some of us might not understand. I agree. And I think the difference lies in the small details sometimes, because when you give AI a piece of code in isolation and you ask AI, please explain what's happening here, it will do an amazing job most of the time.

22:41

However, when you talk about knowledge that you might need and you think what kind of knowledge is it? So one type of knowledge is understanding some code in isolation, right? But there are other things. The things that are not in the code cannot be deduced from the code, right? Obviously.

22:58

So again, some business logic that is not clearly translated into the code, requirements that stem from some regulation or client requirements, things you ended up not implementing for some reasons, all of that is not something you can deduce from just reading the code. And I think AI is great at writing things in a way that humans can understand, right? So if you provide it with very clear context and all the context it needs, then you say, now write it in a way that's

23:30

easy to understand. Perhaps translate it to another language because not all of us are native English speakers and not everyone wants their documentation or explanations to be in English. Translate that. This is amazing for AI, but we also need to understand and acknowledge the limitations of AI. One is not assuming it could understand things that are not in the code, but another is that code is also not always easily understood, even by AI.

23:57

Just to give a few examples here, if you give AI one function and it's written in a way that is very clear, the flow is linear, the function name is clear, it's documented, of course, it's easy to understand, right? You take the same function, you rename it, you remove all the comments, and you change the variable names to be something a bit less clear, which might sound like an exercise in bad coding, but I've seen enough code bases to know that it

24:23

happened in many companies. OK, then all of a sudden AI is starting to make some mistakes when understanding it. And then you get to complex flows. And when you have flows, you have lots of cases that compilers know to take into account and AI doesn't. For example, you have ambiguity resolution. So let's say you have a call to a function called find and you have the find of, I don't know, Node.js find. But you also have three different find functions in your

24:52

code base and you call find. And if AI picks the wrong implementation of find, it can get the whole flow wrong, right? So ambiguity resolution is one case. And also if the code involves looking at resources that are external, like reading from a database that it might not have access to. So in general, I think AI is a super super super powerful tool, especially in explaining things to humans and also allowing humans to ask questions in natural language. It's amazing.
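Editor's illustration: the ambiguity-resolution case mentioned above can be sketched with Python's standard `ast` module. The module names, the `DEFINITIONS` index, and the helper functions are hypothetical, and the example uses Python rather than Node.js. A purely textual match on the name `find` returns three candidates, while following the import statement identifies the single definition the call actually refers to.

```python
import ast

# Hypothetical call site, invented for this sketch.
SOURCE = '''
from inventory import find

def restock(sku):
    return find(sku)
'''

# Hypothetical index of every `find` defined across the code base.
DEFINITIONS = {
    "inventory.find": "look up a stock item by SKU",
    "users.find": "look up a user by e-mail",
    "search.find": "full-text search over documents",
}


def candidates_by_name(name):
    """What a name-only match sees: every definition called `name`."""
    return sorted(q for q in DEFINITIONS if q.split(".")[-1] == name)


def resolve_call(source, name):
    """Follow `from X import name` to the qualified definition, if any."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if (alias.asname or alias.name) == name:
                    return f"{node.module}.{alias.name}"
    return None


print(candidates_by_name("find"))    # three possible targets
print(resolve_call(SOURCE, "find"))  # the one the import actually binds
```

This only handles one import form; a real resolver also has to deal with scoping, aliases, methods, and dynamic dispatch, which is part of why deterministic tooling is useful alongside an LLM.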

25:25

But we shouldn't expect AI to understand the code. It doesn't really understand the code. And when you provide it with complex code bases that are convoluted, with mixed conventions, specific domain knowledge that it hasn't been trained on, of course, sometimes even misleading comments that are not up to date with the code, variable names that sometimes are cryptic and sometimes are even confusing and misleading, AI makes mistakes.

25:56

And I've tested it thoroughly, OK? Here at Swimm, we also create documentation automatically from code bases. And one of the first hypotheses was, OK, let's just use an LLM and let's try that. And we tried a lot, OK? And it made us understand it's an amazing tool, but it has some very fundamental limitations. And that's where you need other techniques like static code analysis and other things.

26:21

You can do that, and when you join it with AI, you get clear, coherent documents or other forms of written knowledge that explain things in a way that is useful for humans to understand.

⁠¶ The Limitations of LLM in Navigating Legacy Codebases (e.g. COBOL)

26:34

Yeah, I think so. For some people who have used AI coding assistants a lot, right? Especially working with like a bigger code base, very complex, you know, written by so many developers. I think that's also one thing, right? Because you can see the amount of inconsistency or like not so coherent kind of a code from one module to the others, right?

26:54

And variable naming as well, as some people like to use certain terms, the others use other terms, and they can be duplicates but mean different things. So I think that there's really a big challenge here if you just rely on an LLM, right? Because the LLM will just take it word by word. And I think you mentioned a very good point about combining it with, you know, maybe like static code analysis or other kind of, I don't know, like compiler's ability or something like that.

27:17

Because computers are also good at that, right? Not AI, right? Maybe tell us how you actually combine these results, like for example any specific study or maybe a customer case that you have solved using these

27:30

kind of techniques? I think the most extreme cases we had in that regard, and it's also our focus now, is actually legacy code bases and more specifically mainframe legacy code bases with COBOL, which is a language that I had never run into before I started working on this problem for those clients. And since then we doubled down on COBOL, right. But it's a language where most of the code is not available online, or there is almost no real COBOL code available

28:04

online. The code that you have on GitHub doesn't look like the code that companies run. And it stems from multiple reasons, but I think the most important one is that when GitHub launched, no one was writing COBOL in an organization that wanted to publish their code, right? Let's say you start a new project today and you work even not with an AI assistant.

28:27

You start with lots of libraries. You start with Python, you have your libraries for Python and frameworks. You start with JavaScript, whatever, right? You have lots of frameworks. And those frameworks are built on open source. When people developed big code bases in COBOL in the 70s, in the 80s, they didn't have libraries. They had to reimagine everything themselves. So every organization looked

28:54

very, very, very different. So LLMs don't have access to real world COBOL code and LLMs don't have access to the specific code of your organization. And in COBOL you have cryptic variable names all the time and the structure is different than other programming languages. So it was, I think, the most extreme we saw. And when clients tried to send some code to an LLM and ask, OK, what does it do? You get super generic and confusing and wrong results most of the time.

29:32

And there what we did was we wrote a COBOL parser that actually takes the code, parses it syntactically, and connects the dots together in a way that makes sense for an LLM. So in the end, for example, we have a variable and we want to understand what it does. We take the variable name. The variable name can be reused in lots of different places in the code base. So we find only the occurrences that are related to this occurrence of the variable.

30:02

And then we send all of that context to an LLM and we ask for a summary about this variable. So when the LLM gets the right context and only the right context, it does a great job at explaining it in natural language. If you just throw the whole code base at it, it starts combining different variables that have the same name. It happens a lot in COBOL and, you know, I'm not blaming it. It's really hard to parse all that code and understand what really belongs together and what doesn't.
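Editor's illustration: the idea of sending an LLM only the occurrences that belong to one variable, rather than every use of the same name, can be sketched as below. This is a heavily simplified, hypothetical Python model and not Swimm's parser: the `Occurrence` record, the program names, and the rule that scope equals the owning program are all assumptions. Real COBOL scoping (copybooks, sections, redefinitions) is more involved, and no LLM is called here; the sketch only builds the prompt text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    program: str  # hypothetical: the compilation unit that owns the variable
    name: str
    line: str


# The same name, WS-AMT, is used for unrelated things in two programs.
OCCURRENCES = [
    Occurrence("BILLING", "WS-AMT", "MOVE INV-TOTAL TO WS-AMT"),
    Occurrence("BILLING", "WS-AMT", "ADD TAX-AMT TO WS-AMT"),
    Occurrence("PAYROLL", "WS-AMT", "MOVE GROSS-PAY TO WS-AMT"),
]


def related(target, occurrences):
    """Keep only occurrences of the same name within the same program."""
    return [
        o for o in occurrences
        if o.program == target.program and o.name == target.name
    ]


def build_context(target, occurrences):
    """Assemble the text that would be sent to an LLM for summarization."""
    lines = [o.line for o in related(target, occurrences)]
    header = f"Explain variable {target.name} in program {target.program}:"
    return header + "\n" + "\n".join(lines)


context = build_context(OCCURRENCES[0], OCCURRENCES)
print(context)  # contains the two BILLING lines and nothing from PAYROLL
```

A name-only search would have included the PAYROLL line as well, which is the "combining different variables that have the same name" failure described above.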

30:28

So what we do is first analyze the code base statically. We build our own internal representation of the code, how things relate, what function calls other functions, what variables are used where, the hierarchy of, say, a flow. And then we slowly build the knowledge by sending small bits to an LLM to explain. After we do a lot of work on

30:53

cleaning everything. And this is an example I think of combining static analysis or code that analyzes code in a deterministic way, not something probabilistic that an LLM produces right? And then using an LLM for what it does best, which is taking text or specific parts of code and explaining it in natural, coherent language.
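Editor's illustration: the two-stage approach described above (deterministic static analysis first, then small pieces handed to an LLM) can be sketched on Python code using only the standard library. This is an assumed, minimal analogue and not Swimm's implementation: the sample source, `call_graph`, and `summarization_order` are hypothetical, and the LLM step is left out entirely. The sketch builds a call graph with `ast` and uses `graphlib.TopologicalSorter` (Python 3.9+) to order functions so that callees would be explained before their callers.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical module to analyze, invented for this sketch.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def report(path):
    return len(parse(load(path)))
'''


def call_graph(source):
    """Map each top-level function to the local functions it calls."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for name, node in funcs.items():
        callees = set()
        for sub in ast.walk(node):
            if (
                isinstance(sub, ast.Call)
                and isinstance(sub.func, ast.Name)
                and sub.func.id in funcs
            ):
                callees.add(sub.func.id)
        graph[name] = callees
    return graph


def summarization_order(graph):
    """Callees first, so each summary can build on the ones below it."""
    return list(TopologicalSorter(graph).static_order())


graph = call_graph(SOURCE)
order = summarization_order(graph)
print(graph["report"])  # the local functions that report() depends on
print(order)            # report comes last
```

In a full pipeline, each function in `order` would be sent to an LLM together with the already-written summaries of its callees, so the model only ever sees a small, relevant slice of the code base.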

31:18

At the last phase, we also ask the LLM to generate parts of the documents we show to the user, because again, the LLM is great at formulating that in coherent English that is clear, that explains the story, after it already has all the context that we build bit by bit. Wow. I think it's a very novel approach, so to speak, right? And especially you dealt with the most extreme code base available, I think, COBOL. I think I also didn't have experience with COBOL, right?

31:45

I can only imagine like the difficulty dealing with such legacy code base. And I think you brought up like a very good realization for people because I'm sure a lot of tech leaders or senior executives think, OK, now we have AI, any kind of code base it can understand and explain to us. So we probably don't need to be concerned so much about losing the ability of understanding the code base. And we can even probably hire some maybe less good developers and just use AI to fix all the

32:12

problems we have. I think maybe for simple cases, maybe more up-to-date libraries and programming languages, you can do that. But if you look back, we have so many legacy systems, right, written so many years ago, and with the people also leaving, the business knowledge is probably also changing a lot, right? So I think this is one task that AI probably would not be able to do. And maybe combining different kinds of approaches, like static code analysis, would make AI

32:36

work much better. So maybe in terms of knowledge

⁠¶ Effective Knowledge Sharing Culture in the Age of AI

32:40

base, right, with the abilities of AI these days, for any kind of medium-sized or maybe large organization, what would be your advice on practices or maybe cultural things that can actually feed into AI and help, you know, the knowledge sharing aspect or knowledge base aspect become much more effective, and maybe even have a multiplier effect within the organization, right? Any kind of practices and cultural things that you can share?

33:05

Yeah. So I think the first thing is to acknowledge that it's really, really important, for all the reasons we said before, right? Especially now that AI will read the knowledge base, right? So you want to invest in it, and it's going to have a multiplier effect, as you said, Henry. I think specifically you want to create a culture as a leader that values people who capture knowledge. Some organizations used to look at these people as wasting their

33:34

time, or doing the easy task, just explaining what's there instead of creating. I don't think it's a viable argument anymore. I didn't think it was a viable argument back then, but let's say it's arguable. I think now it's not a viable argument at all. It's clear that if you document what's happening, it will help AI accomplish anything

33:53

afterwards. The other thing is that you should put effort into finding the tools that will help you with the task of writing these comprehensive knowledge base documents and with keeping them up to date as the code evolves, because otherwise you just have misleading and wrong information. And third, you need to be able to find that information again: both humans and AI should be able to find the information they need when they need it.

34:20

So you should invest in those tools to help you with that task and also create a culture that values knowledge creation, preservation and sharing. Well, I like the emphasis you put on valuing people who actually do the so-called, I would say, hard job of capturing knowledge, distilling it, and summarizing it for other people to understand. It's actually not an easy job.

34:44

I would say it's maybe becoming more valuable now because you can feed it into AI as context and everyone can benefit from just one piece of writing, right? And maybe it can be reused multiple times. So I think the other challenge

⁠¶ Keeping Knowledge Base Up-to-Date

34:56

about, you know, knowledge base documentation and all that is keeping it up to date, right? Any documentation that you have within an organization, I'm sure most of it is not up to date, maybe even wrong when you read it again. So tell us maybe some good practices that we can follow to actually keep it up to date. Right. So I think there are two kinds of ways to approach it.

35:17

And we at Swimm spent a lot of time working on this specific problem, so I'm very emotionally attached to it, I would say. I would say that nowadays there are basically two approaches. One is to regenerate the documentation every time: if you generate documentation automatically, just regenerate it all the time. If you regenerate now, it will match the code's state right now.

35:40

I think it makes sense in some cases; for example, for API documentation it could make sense. But what you're going to lose if you regenerate every time is additional context that is not there in the code, and you need a way to preserve that, because that could be the most important piece of knowledge

35:56

that is written there. So another approach is to somehow track the changes made to the code that is referenced in specific documents and then update those parts of the documents based on the changes, and maybe ask for human intervention in case the code changed drastically, for example. So this is actually something we provide with Swimm.

36:17

When you create a document with Swimm, we track the changes made to the code you relied on in the document, and we either automatically update the document or, if the change is drastic, we tell you as a human: please decide what you want to do from here. Maybe you want to reselect this part of the code, maybe you want to rewrite it, maybe this part is no longer relevant, maybe you need to add some unique

36:40

information. But the goal here is to understand that a lot of the unique knowledge that only developers have in their minds is what you need to work so hard to preserve. Therefore you can't just rely on AI generating documents. Yeah.
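The second approach (track the code a document relies on, update automatically when the change is small, ask a human when it is drastic) can be sketched roughly as follows. This is an illustrative assumption, not how Swimm works internally: the `TrackedSnippet` type, the `review_snippet` function and the 0.6 similarity threshold are hypothetical, and the similarity measure is simply `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib
from dataclasses import dataclass


@dataclass
class TrackedSnippet:
    """A piece of code quoted by a document, as it looked when the doc was written."""
    path: str
    text: str


def review_snippet(snippet: TrackedSnippet, current_text: str,
                   threshold: float = 0.6) -> str:
    """Compare the quoted code with the current code and decide what to do.

    Returns "unchanged", "auto-updated" (small change, snippet refreshed in
    place) or "needs-human-review" (drastic change, snippet left untouched).
    The threshold is an arbitrary illustrative value.
    """
    if current_text == snippet.text:
        return "unchanged"
    similarity = difflib.SequenceMatcher(None, snippet.text, current_text).ratio()
    if similarity >= threshold:
        snippet.text = current_text  # refresh the quoted code in the document
        return "auto-updated"
    return "needs-human-review"
```

The human-written explanation around the snippet is never regenerated in this sketch, which is the point of the approach: only the quoted code is refreshed, and a person is pulled in when the code no longer resembles what the document described.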

⁠¶ Keeping the Organization Knowledge Base Accurate

36:56

So I think looking at the past, right, when I had difficulties finding knowledge, maybe sometimes the knowledge is there, but we just don't know where to find it, right? I think that's one thing, and the other is keeping it up to date, right? Because sometimes I find this piece of documentation, I read it, and if we assume it's correct but it's wrong, it's also quite dangerous, right? And I think we also have multiple tools within the organization, which are siloed, right?

37:18

So for example, some information is maybe in our ticket tracking system, right? Maybe, let's say, some in Jira, some in Confluence, some in Slack, some in email. How do you actually build these kinds of linkages and references, and again, prompt people: hey, these parts of the knowledge base are not up to date? I think this is a real-world challenge, if we can solve it, right.

37:41

Actually, AI has great promise in the sense of accumulating all of this information from across the organization, right? From, say, Jira, Slack, documentation tools and others. And I think it's suddenly possible to just ask a question and get a response drawn from various resources. The key here is to understand that some of these resources are more historical references than sources of actual up-to-date information, which is also sometimes valuable, right?

38:13

Like, the Jira ticket can tell you what a product manager wanted you to accomplish at some point, right? At least most of the time, it won't tell you what's actually happening right now, right? But if you have code documentation software that actually explains what happens in the code and keeps it up to date, then you can rely on it.

38:32

So I think AI coding assistants, or AI tools that help you find information from across your organization, should always explain what resources they're using to formulate their responses, and perhaps mark those messages or snippets of knowledge with how likely they are to be up to date. Yeah. So I think it's very challenging, right, in the first place, accumulating.

38:59

So I think, provided that we can give AI the tools and capability to maybe, I don't know, crawl our knowledge base and, you know, get the context and all that.

⁠¶ Fact Checking and Preventing AI Hallucination

39:08

I think we all know one danger of AI and LLMs is actually the hallucination part, right? You mentioned providing references and all that, but assuming that we have a large knowledge base, how do you actually ensure that it is not

39:21

hallucinating? Because sometimes the hallucination could happen in a very small part of the summary that it generates. Especially these days, we have deep research tools that can work on their own for hours, or whatever that is, and provide you a summary. But the challenge is always, how do you fact-check it, right? How do you know which part is hallucinated? Or maybe they provide statistics. How do you know it's actually

39:42

correct statistics? So do you have any experience in, you know, doing this fact-checking and preventing hallucination from making your decisions wrong? Yeah. So we worked a lot on it when working on Swimm, when we generate documents, to make sure they reflect the accurate state of the code. And for that we do lots of things.

40:03

But I think the most interesting part in terms of what the end user can get from it is that when we generate the documents, you see for everything we write what we relied on to provide that information. We show you that it's grounded in this part of the code or that part of the document. And you can validate it yourself.
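A minimal sketch of what "show what each statement is grounded in" could look like as a data structure, with a mechanical check on top. Everything here is hypothetical and for illustration only (the `Citation` and `Statement` types, the `ungrounded` helper, and exact-substring matching as the check); it is not a description of any specific product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Citation:
    source: str  # e.g. a file path or a document name
    quote: str   # the exact text the statement claims to rely on


@dataclass
class Statement:
    text: str
    citations: List[Citation]


def ungrounded(statements: List[Statement],
               sources: Dict[str, str]) -> List[Statement]:
    """Return statements that cite nothing, or whose quote is not in its source."""
    flagged = []
    for statement in statements:
        grounded = bool(statement.citations) and all(
            citation.quote and citation.quote in sources.get(citation.source, "")
            for citation in statement.citations
        )
        if not grounded:
            flagged.append(statement)
    return flagged
```

A check like this only confirms that the quoted text really exists in the cited source. It does not prove that the statement follows from the quote, so a reader still has to validate the claim itself.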

40:22

And that's in addition to, of course, making sure we do everything we can to avoid hallucinations, with all kinds of techniques that are available to eliminate or at least decrease hallucinations. It can always happen, right? There is always some hallucination. And as the end user, you can never tell if the LLM made a mistake or maybe the other providers made a mistake.

40:42

So I think you should trust tools that show you what they relied on. And if you want to incorporate a tool into your organization that gives you answers to questions based on your own knowledge base, you should insist that it gives you sources or citations for everything it outputs. Yeah, I think it's pretty dangerous if you do not have the citations, you know, the

41:09

references, right? And even if these days you do have the citation, from my experience, right, they give you the citation, but sometimes the summary itself can still hallucinate a little bit. So I think that's a very interesting experience as well. Yeah, I agree.

⁠¶ The Potential of MCP

41:24

The other thing these days, people are talking like crazy about, you know, this agentic capability of AI and also maybe the MCP protocol, right? Maybe tell us the next evolution you think you can see in using AI, you know, with all these cool things, and especially in the context of documentation and all the sharing. Is there anything that you can see up and coming? Well, I think MCP will be a game changer in the sense that you will see lots of information being fed all the time.

41:53

And I think it will help create a flywheel effect where, when you put the effort into generating valuable documents or a knowledge base, then all of the AI assistants will be able to reach it, find the relevant information and make use of it. And then we would get some mind

42:14

blowing things, right? You can have an AI assistant that analyzes your Jira tickets and provides a summary, and all of a sudden it knows what's happening in the code base and it gets the broader context from a document that was written partially by AI and partially by a human. And I think that's what we want to get, right. But to get there, we need to make sure we create these explicit knowledge fragments along the way and also that

42:40

we rely on them. So we guide, say, the AI coding assistant to rely on specific resources. Yeah, I think providing all these bits of information, again, coming back to what you said, right, the value of writing or, you know, capturing the knowledge, I think will become key. And then the next part is actually to expose that, right? Maybe in an agentic manner, right, using this MCP protocol, right?

43:03

I'm actually really excited about this MCP capability, especially when doing coding, right? You can communicate with different tools and ask them to do certain things just by natural language. That can be really super powerful, right? And I think I can be certain that once we see more and more agentic capability, maybe provided by different companies and tools, we can see this multiplier effect.

⁠¶ The Danger of AI Agents Hallucinating with Each Other

43:24

Although the danger would be even worse, right? Because if, let's say, some of these agents hallucinate and they all hallucinate with each other, we'd probably lose track of what kind of things they used to reach a decision, right? Any take on this from you? Now, just this week I saw a friend of mine post that he used Claude, and I think it was Cursor, and Cursor ran rm -rf and deleted lots of his valuable information by mistake,

43:53

of course. And you know, it's like a funny example, but those things will happen. So I think if we go back to the beginning of our discussion about junior developers taking code that they don't fully understand and committing it to the code base, right, if you have an agent doing that, you have to somehow constrain it and validate the output that it generates. And I think of the next step, this week, with an announcement

44:20

of A2A, like agent to agent. So we're talking about new protocols for agents communicating with other agents. And at some point, it's going to be hard for a human to understand what's going on. And that's where I think we'll have to stop and think, right, like what's actually happening here? Where should we have the human in the loop, and where not? I think it's going to be really exciting times in that sense.

44:44

Yeah, I think stop and think will be a point in time where we all realize, OK, we probably hallucinated ourselves, thinking, yeah, we'll solve a lot of problems. So yeah, probably one day we will have to build guardrails, you know, constraints, such that AI won't lead us somewhere dangerous, right. So I think these days people

⁠¶ How to Get Better at Research

45:01

talk about AI. I'm sure every team, every organization also wants to integrate AI somehow, build capability, you know, build something on top of an AI model, an LLM, whatever that is. And for that, they need to do some kind of research, right? Some companies have their capabilities, but most companies don't have this

45:17

knowledge and capability, right? And doing research is partly something that, you know, some organizations find challenging, maybe finding the time, finding the resources. And you brought up a good point before our discussion here. But you know, as a product company or maybe a business organization, if you want to do research, what's the best way to approach it, right? Maybe you can share a little bit so that people who want to build capability by doing research can

45:39

do it more effectively. Sure. So I think first of all, we need to kind of define what research is. Engineering organizations are usually called R&D, right? So it's research and development, and the research piece comes first. But I think in most teams, there is no pure research. And that's fine, right? Usually you have research as in problem solving: you need to find the best way to do something and you need to learn along the way.

46:05

That's fine. That's all fine, but it's part of development, right? And I think where I draw the line is: if you know the task is achievable and you know the right approach to get there, then it's development. It's research when you have a task and you're not sure if it's possible, or you're sure it's possible but you really don't know how to get there because there are so many different options and it's unclear. That's where it's research.

46:36

So for example, it could even be a really hard development task, of course, right? Let's say I want to implement, I don't know, VS Code from scratch, right? There are lots of things I don't know that I would need to learn along the way. I would need to design the architecture. I'll have to work hard on it, right?

46:55

But it's all development. I know what the output looks like and I know that it would involve a lot of engineering. Whereas with research, say I have a COBOL code base and I need to generate useful documents. I'm not even sure what documents are useful at first, right? I need to learn that and then I need to find different ways. Should I go with generative AI all the way? Should I go with static analysis? Combine them? Where? And that's more like research.

47:20

So I think that's when you want to work on a product and research is something that is blocking; for example, I don't know if it's possible to achieve this, right? And the difference between development and research is time estimation. In development, I mean, it's notoriously hard to give real time estimations. But usually when developers say this will take me a week, it won't take a year, right? Usually there is some relation between the estimate and how long it takes.

47:51

With research, you sometimes just don't know, right? Like, I don't know, maybe it will take me two days because there is an easy win, and maybe I'll run into a wall which will be much harder to get past. So I think it's something to acknowledge. And when you have research that is guided by a product, it means you have some problem you want to solve and you need to do a few things.

48:13

One is to lay out the entire flow from beginning to end, even though you can't solve all the intermediate stages. For example, if I take a COBOL repository and I need to generate useful documents automatically, the first thing I would do is take a COBOL repository to play with, generate the documents by hand, manually, for myself, and get feedback on them. Are these documents really valuable? Is this where I'm heading? Right. Once I know that's what I want,

48:45

I say, OK, what do I need to do? So I need to, say, parse the COBOL repository and I need to find, say, a few components, for example. OK, I don't know how to find the right components. So for now, I'll wrap that in a box and I'll keep going to the next step. Now that I know what components there are, how do I document a component? And I actually write that on a whiteboard with boxes and I keep them closed. I don't want to open the boxes now.

49:14

I want to make sure I understand what the process would be before I open a box, because the thing you most want to do is open an interesting box, peek inside, and try to solve it, right? But it might be irrelevant. So you need to first make sure you can achieve everything if everything works. You say, I assume all the boxes work. Will this work? Yes. OK, what don't I know? Now there is this specific box. I'm not sure it's possible.

49:43

I don't know if an LLM can read a COBOL program and describe it. OK. And then the key thing, and this is I think the crucial thing when managing research, is to deliberately pause and think about the different directions together, because every researcher will do the research themselves, right? Let's say you give someone a task of understanding what a

50:06

program does. OK, they will read the code and try to understand what's happening, but you can help them by stopping and thinking about what the best technique is. When I led a cybersecurity course, we taught reverse engineering, and there was one exercise we would give as part of this course, right? So the students would reverse engineer more and more applications, and then they would get a game. And the question was, what are the rules of the game?

50:35

And after an hour, we would stop them and we would show how to approach it correctly, which is: you open the game, you click on help, and you have a textual description of the instructions, right? And the lesson learned is you don't always have to reverse engineer by reading through the code, right? And what we wanted to teach them is that before you jump into one way of solving the problem, stop and consider different solutions.

51:03

So what I usually do when I work with people on research tasks is draw it as kind of a tree, like, OK, we're here. How can we solve it? We have options one, two, and three. OK, we don't have a time estimate because we don't know what we'll find out. If the LLM can just read an entire code base and give us great documentation, OK, we're done here. OK, let's give it a day and see what happens. And I usually call it time to

51:26

live. Like, for how long are we going to work on this before we stop and rediscuss what we found out and whether we should keep pursuing this specific direction or change to another direction of the research? And one of the most important things is to make everyone stop and think about the various ways to approach a problem. Sometimes the easy solution is there, but you need to think about it. Sometimes it's just clicking help and you have the solution. You don't have to read through

51:53

the code, right? And I have many, many, many examples of that. Also not from courses, right? From real life, where people, you know, in retrospect, say, oh, right, we should have done this, right. And when you work on a product, you don't have all the time in the world to just do research. You have to make sure you can provide a product to a user in a timely manner. So to summarize, I think one crucial thing is to understand

52:17

that time estimates are hard. What you can do is give it a time to live: how long am I willing to spend on it before I stop and reevaluate? The second thing is to make sure we get the end-to-end process, from the input to the end output that the user sees or that the other product takes into account as an input, and so on.
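The tree of research directions with a time budget on each branch can be written down as a small data structure. This is only a sketch under assumptions: the `Direction` type, its field names, and measuring the budget in days are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Direction:
    """One node in the research tree: a way of attacking the problem."""
    name: str
    ttl_days: float                 # time to live: budget before we stop and rediscuss
    spent_days: float = 0.0
    children: List["Direction"] = field(default_factory=list)

    def log(self, days: float) -> None:
        """Record time spent on this direction."""
        self.spent_days += days

    def due_for_review(self) -> bool:
        """True once the budget is used up and the team should reevaluate."""
        return self.spent_days >= self.ttl_days


def directions_to_review(root: Direction) -> List[str]:
    """Walk the tree and list every direction whose time to live has run out."""
    due = [root.name] if root.due_for_review() else []
    for child in root.children:
        due.extend(directions_to_review(child))
    return due
```

For example, a root such as "document a COBOL repository" could have one child "let the LLM read the whole code base" with a one-day budget and another child "static analysis plus LLM" with a longer one. Once a day has been logged on the first child, it shows up in `directions_to_review`, which is the cue to stop and decide together whether to continue or switch direction.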

52:38

And the third part is pausing and thinking together about the different ways to approach a research task, because again, what characterizes research tasks is that it's unclear how to make progress. So we need to stop and think together on how to approach this. Thanks. So many good nuggets, I would say, right, because doing research by itself is kind of unpredictable, right? So like you mentioned, right, you don't know the time estimation

53:02

required, the effort required. Sometimes it could be easy, right, if, let's say, one day you find, oh, there's a library that you can use. But most of the time, you know, you don't have the skills, you have to gather a lot of knowledge, maybe ask experts and things like that. So I think you have given some good things. I would just call out a few things that I could remember, right? So the first is to try to work towards the direction that you're going in, right?

53:22

Because sometimes we can go down a rabbit hole easily, especially playing with technologies, right? So techies, we all love playing with technologies. So we keep digging and digging, but maybe we go in the wrong direction. The time to live, I think, is also very crucial, right? You can't spend all the time just doing research that goes nowhere. And I think many people find it difficult to juggle or even

⁠¶ The Importance of Investing in Research

53:43

like, for example, justify the value of doing research. And we all know these days, especially again, bringing up the point of the age of AI, right? If you don't do research about capabilities of AI that could help your business, maybe you will lose out in the next, I don't know, the time span can be really short these days, right?

54:01

So how can maybe business leaders or executives keep this in mind, right, to spend some time doing research even though it's difficult, unpredictable, and maybe cannot be justified by the profits and revenue coming out of the research? Maybe you can give us some examples here. Right, so I think on a personal level, say if you're a CTO, I think one of your responsibilities is to know how technology can empower your business, and it is also

54:27

internally right? So as a very recent example, you're a CTO, you keep yourself informed with what's happening. You know that there are great AI coding assistants that can help people become more effective. You go to your engineering teams and you introduce them to that tool and you help them adjust or adopt new tools. Some people will always be a bit wary of trying new things, right?

54:54

And I think as leaders, one of our responsibilities is to show them, look how easy it is, look how useful it could be. So this is like on a personal level, and how to incorporate new tools or methodologies or techniques into the organization. So another great way to drive change is by giving talks. So get your company or team together and give a live demo of using the cool new tool, for example. So that's more about incorporating new methodologies,

55:27

techniques, and tools. When we talk about deep research, I think it's not for every organization. I'm not going to say that every organization needs a research person or team, right? But if you do, you need to acknowledge that it takes time and a different mindset. You can't expect a research team to operate the same way as a development team with a clear timeline for every milestone.

55:49

So you can adjust people there, assign them to research tasks, make sure that the value for the product is clear and easy to get, relatively easy at least, and get those people to understand what research means. Make them professionals at managing research, at assigning a time to live to different directions, and at having brainstorms about what the best way to approach a specific issue is. If you don't have this expertise, it's fine. Consult with others who do. There are people who are

56:24

experienced researchers. They work differently from people who are not experienced researchers. It's the same with engineers and anything else, right? But research is a skill. It's a skill that people can improve at. It's a skill you can learn, and you can get help from others if you don't have the experience. Well, I think that is really, really great advice, right? I particularly like the aspect of changing your mindset, right?

56:48

Because people think doing research is straightforward, right? You do research, you get something out of it, and you can use this straight away. Especially people think that, OK, now with AI you can even get more intelligence, right? You can speed up the research or whatever that is, right? But I think especially doing something that you are not capable of in terms of capability within the organization, it's something tricky, right? You can't sometimes justify the

57:09

effort. So hopefully people today learned a lot of things, you know, about knowledge bases, research, using AI for documentation and all that. So Omer, as we reach the end of our conversation, I have one

⁠¶ 3 Tech Lead Wisdom

57:19

last question that I'd like to ask you, which I ask all my guests. I call this the three technical leadership wisdom. You can think of it just like advice to us. What advice do you want to give us today? OK, so I think the first one would be: put your time and effort into your people. I mean, they make all the difference, right? And it means talking with them about how they are and what can help them, and making sure you have them grow in their position. Finding them is hard.

57:52

Leading people is sometimes hard, but I think it's also the most rewarding part of the leader's job. The second thing is, in this era, you must be open minded to try new things. I don't think it makes sense for someone in 2025 to work even the same as they worked in 2024. And it sounds almost childish, right? I used to make fun of people saying things like that, but nowadays it doesn't make sense. Things change really fast and in order not to stay behind, you have to be on top of it.

58:24

So you have to be open minded and keep yourself informed. And the third thing, in case it's viable for your organization, put the time into research because it can open up new directions and it can help you in ways that you haven't dreamt of before. Sometimes spending two days on a research task can make you change your decisions completely, so I think allocate the time in case it's relevant for your business case. Yeah, specifically about being

58:59

open-minded, I myself am also quite concerned, you know, about some of my habits and skills that I learned in the past, right? Whether they can still be relevant, especially since the pace of change these days is so rapid, right? Every day you probably hear, oh, there's a new way of doing things. There's a new tool that can help you do whatever. Being open-minded and willing to try and willing to be challenged,

59:19

I think it's also another thing that I feel sometimes as a senior, right, we think we know the problem well, we can solve it by heart, but sometimes there are new ways of doing things these days. So thank you so much for sharing that wisdom. So if people want to connect with you or, you know, ask you more things, is there a place where they can find you online? Sure.

59:38

So you can reach out to me via email. It's omer at swimm, that's S-W-I-M-M, dot io. Also on LinkedIn, though I don't really use social media as much, so if you send me a message and I don't get back, I apologize, but I probably didn't see it. I do answer emails and I'll be happy to stay in touch. Thank you again, Omer, for spending the time today. I think we all learned a lot about using AI, building knowledge bases and doing research, as what

01:00:07

you advised just now. So thank you again. My pleasure. Thank you for having me, Henry.

Transcript source: Provided by creator in RSS feed