AI in DevOps: Managing Non-deterministic Workflows and Ensuring Model Lineage and Transparency - DevOps 214 | Adventures in DevOps podcast

Speaker 1

00:05

To like already click the button.

Speaker 2

00:09

And that's how we're starting the episode.

Speaker 1

00:13

Oh for those of you just getting started, actually you are just getting started because I just clicked on the record button. But hey, welcome, thanks for joining the podcast. Warren. How are you?

Speaker 2

00:24

I'm I'm great. You know, I feel like, you know, we should go back to this is Adventures and DevOps. You know, for the first time listeners, if this is just the one episode they happen to click on, then they're in for a real treat today because I'm really interested in today's topic.

Speaker 1

00:39

Yeah, and today's topic, we have gorkam Erkan with us and we're gonna be talking about how to how to integrate AI and mL into teams using tooling that's a little bit more aligned with how we do DevOps.

Speaker 2

00:59

That sound of out right?

Speaker 3

01:01

Yeah? That sounds right?

Speaker 1

01:03

All right?

Speaker 3

01:03

Cool?

Speaker 1

01:05

Before we dig into the details of that, can you share with our listeners a little bit about your background?

Speaker 3

01:12

Sure. I am the CEA of josu U. The josu is a company that is trying to make it easier for enterprises to adopt AI and mL in into their applications. Uh. And prior to JOSU I was with red Hat for many, many years. I was as a distinguished engineer with red Hat worked on developer tools, which in red hat it covers anything from I D E S to C C D pipelines to them op so it's it's all everything that you can think of that is open source and that will tool My group kind of got involved with

02:04

those as part of that. I did Java tools early in my career for something called J two E if anyone remembers that. So and then I actually had the great idea of integrating Java tooling into vs code. The current Java tooling is something that I have done in a month or so and that has grown to what it is today. My group got involved with things like

02:47

tech Tile, CD Foundation and so and so forth. So I've been doing devobs for the last probably five years, or tools for DeVos for five years, and tools in general. Before that, before joining red Hat, I was working for Nokia, the phone company, so I was in the middle of the mobile revolution and again mostly doing open source projects at Noukia at the time.

Speaker 2

03:23

When you say open source and Java and red Hat, my mind immediately goes to Jenkins and I don't have a particular soft spot in my heart for that product, but I can imagine you've had to spend a lot of time with it. Is that what they were using.

Speaker 3

03:39

So yeah, we did start with Jenkins. We do support, We did support or red Hats still supports Jenkins to some degree. But at the time when we tried to do a project, code open shift that I owe and one of the things that we tried to do at the time was to run Jenkins in a manner that it is not supposed to be run at the time. I think Jenkins is doing better now, but at the time, Jenkins was created as a server, so it was meant

04:16

to run as a server. But then when you're on the cloud trying to run Jenkins, one thing that you want to do is you want to try to run its servers, like if I need to go and build something, especially if you're doing this as a SAS service, you want to just on demand start a Jenkins instance, do your bills, and then shut it down so that you're not spending too much on the cloud resources. That didn't work well for us. So one of the things that

04:51

happened at the time was the Canadia project. If you're familiar with it, and as part of the Canadia project first code drop before even it even got public, it was open to some of the redheaders, including myself, and one of the things that we noticed was there was a built piece in the k native code which was

05:15

actually doing server spiels essentially. So the whole idea of oh, can we actually make this and turned this into something It turned out to be the techtile project later and then it became the tech Time's cd c CD system that we know today. So it actually we got involved with that because of Jenkins because we couldn't actually make it cloud native at the time, so it felt like, oh, we should actually, you know, do something cloud native instead.

05:56

So that's that's how involvement with Jane. But on the other hand, I would still bet that a big chunk of workloads on kimbernites in the world is actually Jenkins.

Speaker 1

06:16

Survey to edit.

Speaker 3

06:17

I mean, I don't know, I don't have un versa. I like that would be my guess. It's it's it's in double digits. I'm pretty sure enough.

Speaker 2

06:25

I feel like I feel like this is where ignorance is bliss. Like I don't I'm going to pretend that's not happening and just continue living my life in whatever delusion I have currently going on.

Speaker 3

06:37

Unfortunately, it is true. It's there, It's it's in many enterprises. You can't ignore Jenkins. It's hundreds of instances for sure. Yeah.

Speaker 1

06:48

I have a love hate relationship with Jenkins. I hate working with it, but I love how it can just do absolutely anything you needed to do.

Speaker 3

07:00

And the ecosystem, Yeah, like the ecosystem behind Jenkins. You know, I think nowadays get up actions maybe at that level, but the ecosystem behind Jenkins, it's, it's it's pretty big. It's very important. You can find a plug in for anything essentially, yeah, or anything that you care about, right for sure.

Speaker 1

07:29

So let's talk a little bit about aim L since that's our topic for today. And there's like, you know, I whenever I think about this, the most common scenarios that I see coming up are people using chat, GPT or tools like that, And it seems like a very early stage product. So to get that enterprise ready, like, what the what are the challenges that people are facing?

Speaker 3

08:03

Yeah, so I think check GPT, Like what we are going through right now is that that in every enterprise there is one or two let's call it experiments that is going on. And usually when you're starting the experiment, the first thing that you do is like what is the least resistance that you can face, and that is use the check GPT API s right, the open API or Microsoft has Azure APIs as well, So use those and do whatever you need to do and come up

08:40

with your first early experiments. And when you look at those experiments in many of the organizations, what you see is there are a lot of success that is coming in. There's a lot of learning that they are going through,

08:56

but there's also successful stories as well coming out of them. Uh. The one thing about AI is in deterministic, right, It's like the first time that you do something with AI, you're going to fail, and then the second you're going to do another iteration, and because you're going to understand that behavior, and then you're going to do another iteration. I think check GPT and all the other APIs out there, of the open API and similar AI API is out there.

09:30

I think it provides them this opportunity to do the experimentation. But the way I see it is this is this is temporary, right, This is just dipping your toes into the water. When you actually want to start doing something that will change your business and try to get some financial benefits to your company. That means integrating with many main systems. That means like it's like we have seen

10:01

it with cloud, we have seen it with mobile. Right, It's like it's not a oh, here's one application that is mobile, or here's one application that runs on the cloud. You're starting to adopt it to your organization or if you are even older, there was a time where there was we talked about paper this office. Right, So it's like it takes a little bit of time, it's a process, and you actually need to internalize it into your organization. I think the next step for many of these organizations

10:34

is to start to internalize it. And when you're internalizing it, you cannot really depend on a third party service. And some of the industries cannot even share their data give their data to these services where there is concerns for privacy and security and so on and so forth. So they need to start running these essentially base models internally in their infrastructure or in their cloud infrastructure so that

11:12

they can get the benefits. And the one thing that I see is when you go to your data scientists or data data scientists or mL engineer. They are separate from the DevOps engineers in the company. They are not well integrated. They have a very different tool set. It's just because of the way the AI has been developed

11:47

and coming into the world. And one thing that you notice is there is no shortage of open source projects and tools in the AI world, but there is a shortage of projects and open source tools in the AI world that are standards. You can do. Just think of any subject and you can find ten alternates to do it. None of them talks to each each other. Nothing is standard. So this is one of the things that we have noticed.

12:21

We actually were trying to solve this for ourselves as well, and one of the things we decided what was okay, So I have so many ways of storing a model and moving that to production. I have so many ways of storing a data set and moving that to inference or training and testing and so on and so forth, but none of it is standard, like this should be easier than what it is right now. So what we did is we came up with an open source project called ki toops dot mL. And what kittops dot mL

13:02

provides is what we call model kits. A model kit is essentially an OCI artifact and an OCI artifact like a container image. It's not a container image itself, but

13:16

it's an OCI artifact. The advantage of being an OCI artifact is it can be stored in any OCI registry like docer hub ECR as you're wherever your image registry is, and using the information that is stored in the model kit, you can actually move your models as part of your c c D pipelines much much efficient because your c c D pipelines actually knows how to usually knows how

13:53

to talk to OCI registrict. For instance, when today we have a a registry that we are running ourselves in jose dot mL and you will see many signed model kits in there. The tool that we are using to sign model kits is cosign, which is what you would use for signing an imagery, a regular container image. So the idea behind having the OCI artifacts was to introduce a standard way of storing and standard way of sharing your AI artifacts between data scientists and mL engineers and DevOps engineers.

Speaker 2

14:51

You said something really interesting in there, and that's I haven't heard of put it this way before that your first foray into using any sort of AI or mL models in your company will be an experiment that ends in failure. And I, you know, I think there's something really to that, because I can imagine if you like walked around and put like a hammer on every engineer's desk and said, you will now start to develop with

15:20

this hammer instead of what you were using before. I feel like it's a ridiculous analogy, but I think there's a lot of sense there that like that for sure will not end in success there, Like, however you thought you could utilize it internally is probably not going to

15:35

be super effective. And that obviously extends into the area of if you are building AI into your product, not just using it as a as a tool for development, and you're really talking about this next level which isn't just using it but actually creating it to be effective as part of the product offering.

Speaker 3

15:56

Yeah, And the reason for that is it's really in the terministic right with the AI itself. So you may actually think that they're like when we are integrating applications together, the rule based the applications, the outcome is very deterministic. We know that these are the inputs and these will

16:18

be the outputs. In the case of AI, that's not the case, right, So when you go through your first initial cycle and go through your you know, you receive your inputs, and then you start to get your outputs, you start to see these outputs that you haven't expected before. And the usually, like, for instance, it's very likely that you're missing actually some data that you're feeding into the model,

16:44

and you're not getting the outputs that you're expecting. Because let's say that you're giving the customers can data, but then you when you give the eleventh data, the AI is now able to make the code nection or or or make the analysis better and give it a better result. So you will start to notice things like that, and it is not very easy to notice this at the design phase. It's not even possible to notice these things

17:16

at the design phase. So that's why I'm saying that your first experiment is going to fail, and you're gonna do another one with more data or less data or changed data, and then you will get to a state where you're you're you're happy with the outputs.

Speaker 2

17:33

I feel I have a fear here now that, like you know, if I'm incorporating it into one of our products. You know, we have a couple and we have tried going down the route of adding some AI into into one of them. Uh, there is this to only say non deterministic. We mean like literally the same inputs gives

17:50

us different output. And exactly, I wonder if we're training our customers to believe that they should just you know, turn it off and turn it on again in a sense, right, Like you know, you have to keep repeating the same question over and over again to get the answer you want. And that just that feels like a dangerous thing to push that expectation onto the users of our products.

Speaker 3

18:15

Yeah, there is a Yeah, I see what you're you're saying, and and and I think that's that's that's the part of the AI that that still needs to be enhanced a little bit. You know, the the whole subject of guardrails on AI is a little bit it feels early. Oh, I'm told, I'm totally I'm totally with you.

Speaker 2

18:44

No, there was actually a study that was released just recently, and I think we'll kind of appreciate this that it's not being like prompt engineering isn't being used for malicious purposes.

18:54

It's generally being used by power users to actually get value out of the system more than anything, which I think is really surprising that these systems aren't being abused in that way realistically yet, And so adding in guardrails of anything is just preventing those early adopters from being able to utilize your product more effectively, rather than preventing malicious attackers.

Speaker 3

19:16

Right. And we actually like since you said prompt Engini, we actually talked with an enterprise recently and one of their needs was to share their prompts, Like, uh, think of it this way. You have a database of all your all your enterprises data, all your customer data, and

19:40

so on and so forth. In the past, the sequel statements that you have used for finance was very different from the psychoal statesments that you have used for sales, so you'd never really actually care to share the psycholos statements. But now what they're seeing is, as you said, there are these users who are using these prompts and getting really good results out of those prompts that are querying

20:07

there or that are connected to their their data. Right, So what they want to be able to do is to to be able to share these prompts like between finance and and and HR and and and sales and so on and so forth, so that they don't have to reinvent the veil every time. But I think that we're starting to see a different picture now in the in the AI world where with with these kind of things. But when you are opening your data through and an AI model directed to your customers, I think there's still

20:48

work that needs to be done on the guard for this. Yes, yeah, for sure.

Speaker 1

20:55

It reminds me a lot of there's two things here that that seem like there's lessons we can learn here from other engineering or other industries. Like whenever I work with early stage startups, I always say that like the goal of your first product as a startup is to launch, like the let me rephrase that. To become a successful startup, you have to identify the difference between the product that you built and the product that your customers thought you

21:30

built and close that gap. And it feels like this is very similar to that. You know, because AI will answer whatever question you ask it, and then you have to figure out what assumptions it made that you didn't want it to make. And that's where I think really

21:53

sharing those prompt those prompt models can be helpful. For a while there on X there was a trend where peopleeople were you know, sharing what they were doing with different AI models and then also sharing the prompts that they used too, And I found that really interesting to see the different caveats that they would give in the prompt to keep it from wandering off some other path that gave you an answer that looked right, but as wasn't actually the answer that you needed.

Speaker 2

22:22

No, you say that and it's like, actually, really ridiculous. Like I'm going through right now. I'm making a presentation for the European as reinvent that I'm speaking at, and I want some pictures, and I will ask I'm using one of the tools, and I will ask the same

22:40

question multiple times. But when I get a picture that I actually like, I will take the prompt that I used for it, and I will save it as the file name of the picture so that I can get like, I mean, it's a ridiculous, but that's the name for me, that really is the thing that identifies it, so that I can, you know, potentially go back later and be like, what did I use to actually generate that image?

Speaker 3

23:00

So let me complicate that problem a little bit. So you have you're you're using a check GPT for instance, like.

Speaker 2

23:09

It's not the one from open Ai, but yeah, for sure, it's an l you're using.

Speaker 3

23:14

You're using and it's actually a check giv A service. So you don't really know the version number. Really, you don't really have to know the data sets that has been used and retrained and so and so forth. So for some of the organizations that's not going to work. So some organizations are going to say, hey, you know what, I need to know exactly the model, the data set that it was trained on, and I need to know where my fine tuning is coming from, where my RAGS

23:50

is coming from. So I need all the lineage of of of this when this prompt was issued and right. So you need systems that are capable of telling precisely what was the data set, what is the model, what is the fine tuning data set? Right? And what is if you have been using RAG, what is what is the version of the vector database or the lineage of

24:26

the vectors. All of this stored and available somewhere because something may happen at any point saying that, oh, you need to go back and figure out what went wrong with this? Answer what went wrong with this today? Like if you're using an API, you can't really say what what went wrong? And so if you think about it this way, we actually have a system that was used for this purpose. We used s BOMs to be able to say exactly what we have used in production for

25:13

our applications. Right, so can we actually utilize the same system with AI as well? Like, oh, I'm using the application x y Z, it's built out of these dependencies, these libraries and so on and so forth. An AI application is not two different. It's basically saying that, oh, I'm using the same application, the same dependencies. I'm using the inference engine x y Z, and then I'm using the model weights that are coming from here. I'm using a data set that was used in that is coming

25:53

from here. And these are all in an s bomb which existing systems know how to pars and get results from. And we know how to sign s bomps, we know how to store s bomps, you know CI artifacts. So I think that's kind of you know, is the missing link nowadays where we are doing these things in our enterprises, adopting models based models from somewhere, and none of the

26:28

data sets that we are using is enough. So some one of our engineers just goes to Google and searches for a data set and uses that to you know, do some of the training. You don't know what is in that data set. So it's it's a little bit wild west at the moment with the AI adoption. So and when it was wild west for the regular applications, you know, we we ended up with things like, uh, the we ended up with things like what was the

27:06

look for j right? So how many how many days any Joba they will prespend trying to figure out if they have used the wrong version of look for Jack when that came out right, So, uh so things happened. Uh and I think we need to be prepared for it. And it's a little bit wild wild West at the moment with the with the I amount.

Speaker 1

27:41

It's you know, as we talk about this more, it kind of feels like there is it's almost like a communication problem. One of the things you said was knowing the lineage of the data and with when it comes to these AI models, we have a very human feeling interface to it, and so we I think we naturally try to leverage that. But the difference being, if if I ask either of you who was the last president, you probably are going to assume that I'm talking about

28:18

current time. But if I'm talking to a data set and that data set is twenty years old and I ask it who the last president is, I'm going to get the wrong answer. And I feel like it's those hidden assumptions that we have when we communicate that make us have to be overly verbose with these tools.

Speaker 2

28:37

See, I don't think I don't think AI is special here. I think that our own culture and values propagate much too far into it, and as well as our neural diversity. Like I'm on some spectrum somewhere and you say, who is the last president? I you know, what I heard was like the last president there will ever be like as.

Speaker 4

28:59

If you know, you know, you're talking about side of history and like, you know, if I look at it and today's you know, I look at today's like political climate, you know that is a you know I am.

Speaker 2

29:09

I'm evaluating that based off you know, the model that I've trained in myself, and I can also imagine like for other cultures out there, depending on who you're talking to, you know, I feel like in the West or in the United States, it may be much more clear. But you know, depending on who you're talking to and where they've come from, Uh, they're gonna have much different answers

29:28

on that. And I feel like we may be more closely able to understand what some of those implicit inputs into that human model there are based off of what we know about them or the current world. And I think we definitely do throw a lot of that out the window when we are utilizing a prompt sitting on a website somewhere.

Speaker 3

29:52

Yeah, for sure.

Speaker 1

29:55

So what are the what are the big challenges that that we're facing with this from a DevOps perspective, like how do we how do we set the framework into guardrails to get that tool from the AI developers into a production grade capacity.

Speaker 3

30:18

So I guess the first one is, you know, have you packaged this, have you set this sober in a manner? And you know, we have one solution with kit tops for it. The second challenge that we have noticed and started working on is AI is more nondeterministic and it's more complex. So when you think about a workflow that you use on DevOps, it's more linear, and there is a starting point and there's an endpoint, and there's an art fact that you can record and send to production

30:57

and so on and so forth. With AI, it works a lot a little bit different because you sent the production, but you after that, you keep on watching and because drifts, the data changes the model, you know, twenty years passes, the president changes and so on and so forth. So you start to notice these drifts and you need to start to constantly observe the models and then react to their changes, and not just the models itself as well.

31:32

Before you know, the training needs to be like even if you're doing fine tuning, whether your fine tuned model is adequate or not, that's actually a bunch of tests that needs to be run and then compared and so on and so forth. So the one thing that we have noticed is this is almost like there's a lot of signals that are coming in from these develops systems that we're going to.

Speaker 5

32:01

Use for training, inference, production and observability of the AI, but there isn't a real solution out there that reacts to it, that brings everything together so that you can.

Speaker 3

32:15

Use these things together in a more orderly fashion or easier, integrate them, easier to be to cite it in a different way. So what we I don't know if this is the right language to use, but what we said was, oh, let's build a control plane for this so that it receives all these signals for all your AI m L applications, including everything that is coming in from observability, from your c I, c D pipelines, your training pipelines, your dataset changes,

32:59

your data extractions and so forth. And in the control plane we define what needs to be and then the control plane reacts to these signals and runs the data plane components like their CICD and some so forth, or your deployments in such a way that the system tries to get back to the state that you want it to be. So we started working on a control plane for AI and mL applications so that some of the some of the challenges of running and integrating AI and

33:43

mL into an enterprise is boxed into a control plane. Essentially.

Speaker 2

33:50

Maybe this is a controversial statement to make. I get the sense that our current capabilities for generating new models restricts those models in a way which makes them not very good at reasoning or analysis and If they're not good in that, then what we're left with is pretty much data replay construction content creation. But we know that the models that we're creating, they're not good at that because they suffer from hallucinations from making up stuff by

34:26

combining different pieces. I think I read somewhere that the summarization strategy that a lot of models have taken in their creation is to look at the headlines of articles that have been written and then consume the content of the article and index that and then do a reverse index look up. So you know, hey, here's my article, what should what summarizes for me? It will spit back out the title. And if we look at the content that humans have created so far, titles are all click baity,

34:54

you know, have nothing to do with the conclusion. And so even if they were right, you know, they're it's still not very accurate. And so, like I am, I'm wondering if we are not still a bit too early in trying to automate or improve some of these things when the base thing we're getting out still is fundamentally limited in capability.

Speaker 3

35:16

Uh, I don't think so, Like I think I need to I need to be very careful about this because I don't want to add to the hype. Yeah, because you know, I think it's the AI is over hyped at the moment.

Speaker 2

35:33

No, No, definitely not.

Speaker 3

35:37

What well, yep, I know it's a shock to me too, So it is a little bit hyped, let's call it. But on the other hand, I don't think with the capabilities and we we when we talk about the hype of the A I, we we are usually talking about the gen a I, right, the l l MS and generative AI. But but in reality there is also other kinds of AI and m l as well, so that

36:20

we don't usually talk about. But it's actually getting a lot of benefits from this hype essentially, So when I think about it, that whole space as A as A as one one piece of of of technology area, I think that this is this is not a this is not a like we can start adapting A I and m l projects into into our enterprises. Even with generative AI, which was what you were referring to, Essentially, you can

36:57

get really good results out of it. Like that was one article, I think it was not even an article. It was Walmart reporting their quarterly results. They basically said you know, we are more efficient with our online business because of generative AI. I think they were doing generating product the descriptions more efficiently or something, so they actually

37:30

started doing better financially. So there are pockets of even direct financial benefit that we are starting to see in from generative AI and from AI mL and other kinds of AI. There has always been in the five year last five years, there has always been projects that were bringing benefits to the business. What is changing with the hype is now there is more eyeballs look into this area so that there is an accelerated opportunity for adoption.

38:08

You know, in the past you had a chance to get one base model for something for categorizations, let's say, but now you have ten, and you can actually test the ten categorization based models and then get a better fit for your data. So there are things like that that is actually happening that's going to I think help the the enterprises and the businesses get more value out

38:40

of AI n mL. So I guess generative AI it's a little bit too much hyped up, but that is not a bad thing for the rest of DAI world.

Speaker 2

38:54

I mean, you say that and now you help me thinking and maybe a more philosophical perspective if if we're just lacking a number of people paying more attention and investing their resources in actually researching or building up the technology more.

Speaker 3

39:12

Is it is?

Speaker 2

39:13

Are the limitations that we're facing something fundamentally within the capability of human society to overcome, like or is reaching general artificial intelligence actually completely made up dream? Yeah, And for the record, this is not a I expect that you want to have an accurate answer to.

Speaker 3

39:37

This, but I don't have any accurate answer. Of course, I don't think anyone has, or they may have an interpretation for or expectation for it. But to be honest, the amount of compute that we have been using to trade models, even even the small ones, is enormous For us to get to a place where we have more capable models, I think it will take some kind of a leap before something needs to happen. This amount of compute that we need for that is just just enormous

40:29

at the moment. And the other thing that I'm actually worried about is without actually getting adoption and value, how much can we continue to spend on training more and more capable AI models At some.

Speaker 2

40:46

Point I don't know if it's actually a monetary problem. I understand that we're actually resource constrained, and I'm wondering if those chips are coming from Taiwan, of which has its own sort of issues, and we don't have another opportunity to actually produce what is necessary in order to accelerate further. It's like a fundamental limitation of we're limited not by our ability or by the technology's ability, but realistically our actual capability of a resource allocation.

Speaker 3

41:14

Produce fast enough. Yeah. And also not to forget the data, because the way that the AIS be trained today, it's troven large amounts of data to it, and we true pretty much everything that we can so far.

Speaker 2

41:31

It's worth I mean, that's its own sort of problem because the Internet is sort of over as far as what it used to be a free, free, public sharing of knowledge. That's no longer the thing, and so we're only going to be accessible to a smaller pool of information going forward for future models that that data now has finally have a price on it, and realistically, as you point out, it's not that much, and it's not even that good. If this is the best we can

41:59

come out out with it at this point. So it does seem like we're at some sort of obstacle to get further, both from a data creation standpoint and a resource creation standpoint.

Speaker 3

42:11

Right, But to be honest, like the general AI is a like it's a very good ambition to have, but most enterprises and businesses do not need it. They that what we have today with their data is plenty enough for them to actually get financial benefits and efficiencies and value out of AI. And about like, uh there, I think there's there are two facets to this. There is like, oh, there's these ambitions which we should actually try to pursue.

42:54

But then there is oh, can we actually start adopting the technology that we have generated over these years so that we start creating value to our businesses and communities? Right, So I don't think the two are connected as connected as we think. Where we don't need to wait the first to do the second?

Speaker 2

43:19

Yeah, yeah, for sure. Where is where is AI adoption on the WILL technology adoption scale?

Speaker 1

43:29

Not really a not really a priority for me just but I think I think just because of the industry that I'm in, for me, the most value that it has is like a personal value of using it to remember the things that I've forgotten, you know, like how do I you know, how do I do an a single weight in javascriptor or things like that. From a business perspective, I don't have person have any use cases that I'm actively involved with, though I think it has a lot of power and potential. It's not that I've

44:09

ruled it out. It's just that it's not a priority for me right at the moment.

Speaker 2

44:14

No, I totally get it right, like it always optimizes for like the norm, right, and so as long as the norm is better than where you are, it's a good thing to use it because you'll get to that norm. But will you're such a great engineer that what it's going to suggest to you is going to be worse than what you would create by default.

Speaker 1

44:33

Which comes back into I think that comes back into the prompt engineering topic that we were having, you know, to get useful information out of it, I have to give it all of the internal mental models that I would use to get a similar or better answer.

Speaker 2

44:52

And that's really where all the effort goes into, Like how do I even correctly? It's like I have an intuition about how to think about this, Now I need to convey that intuition to another being using natural language, of which I've never thought about how I'm even thinking about this problem before. I totally agree that's where all of the challenge really comes in to get valuable output from these models.

Speaker 3

45:15

So it's one of the things that I have the test, and I'm sure that there is a scientific explanation to it which I haven't really looked up up. But when you actually give a working example of what you're trying to achieve to the model, you usually get much much better results from the model. Like if you teach it, you know this is how it should work. Once, then the rest it goes much easier for you.

Speaker 2

45:52

I mean, and we know that must be true because like in the lms are designed by humans, and humans operate in this like sixth level of thinking hierarchy, where like the bottom level is like wrote, repetition and that means you saw the answer in some format before and

46:09

you can repeat it again. And I feel like that's the first level if you want an answer that looks like something you like, even another human, Right, I need to show you what success looks like before I expect you to output it, you know, especially in experienced engineers.

46:24

But then there's like five more levels of this where I know for sure that it's not going to get to without also deep conscious thinking and deliberate input on those levels and order like don't just do one I need, because I'm going to ask it to create unit tests for me. I love this example because I actually think this is something that really works well, especially going from one language to another.

Speaker 3

46:46

One.

Speaker 2

46:47

The steps are we were just doing this in an authoress as we were converting some of our code from JavaScript to RUSS. We say, okay, you know we have this code in JavaScript, These tests in JavaScript, right tests that look like this in RUSS wright the actual code, and then run the RUSS unit tests against the code, so we have the sort of validation and the first

47:05

level is really the unit test. But we then need to say things like oh, but also make sure to handle these weird edge cases and like specifically list out some edge cases so someone has to think about those.

47:16

And then there's like, okay, now I need to generate like ways in which this could fail, which is like another level on top of that, and you know, propose different ways in which we could re architect this whole aspect and I can't just say that, right, I can't just say, oh, yeah, you know, refactor this code for me in an optimal way. I need to actually like how do I know what to refactor? How do I

47:35

know what to do that? It's just such a challenge to articulate, even from one person to another one And I almost want to, like draw a diagram how do I feed that in? Though like there had a different len right exactly. You know, it does create this whole other challenge in order to really get the value out to a high enough level that where we really wanted to have.

Speaker 3

47:58

Yeah, I mean it's something that we have recently done with a you know, it's languaging language out right. Lms are are really good at those. It's like we have done a similar thing. We actually had to implement a piece of code that was implemented in Rust, but we had to run it in a different environment which didn't allow us to run Rust at the time. So we said, okay, here's some Rust code that implements this algorithm and this thing, and then just can you convert that to a typescript

48:42

in that at that time? It did like it did, and it runs, it runs as good as it like, it converts all the unit tests and as you said, it's like and it works perfectly, except that you don't get the the performance, Like there is no way that you can get the performance, and all of a sudden you're faced with the fact that, oh, okay, this code is maybe equivalant, but not equal. So there's also that

49:20

fact as well. But the good thing about it was we actually spend on the what forty five minutes to just get that code converted and learn that, oh, this will never perform the same, so that we had to go a different way. So I don't know how much time we saved by just doing that, but that was worth it. So it's not always about actually getting the result that you want. It's also sometimes about getting the wrongers out quicker as well.

Speaker 1

49:57

Oh true story.

Speaker 2

49:58

Yeah, yeah, go ahead.

Speaker 1

50:04

Well I was just gonna reinforce that because a lot of times that's the best learning that you can get.

50:12

You know, if you get the right answer the first time, you don't learn as much as you do when you think you were going to get the right answer and get something completely opposite, you know, Like it's like when tackling a new project or a new problem, one of the most important things to learn is what questions should I be asking that I don't know I should be asking, and getting the wrong answer helps you along that path.

Speaker 2

50:41

I think there's a huge challenge there when you almost know from intuition that the response is wrong. I mean, it's wrong, obviously it came from it all alum, but I mean it's like there's something special about it that you know, you just know. You're like, oh, that that can't really be the best way to do this, and you're like, do it differently, do it differently, And like I'm running out of words to basically say, try again, but don't use any of those constructs you use this time.

51:09

And it's just every time it's like just subtly wrong in a different way. And so there is certainly a learning there of, you know, all these ways in which not to do it. I would just, you know, one day, I would like it to actually give me a right answer.

Speaker 3

51:24

Yeah, it's it's the are sneaky. So they're gonna give you an answer that is ninety nine percent correct, but then that one percent they are going to hide a bug that you're gonna spend the next two days trying to find in production.

Speaker 2

51:43

Yeah, I mean you're lucky if it happens. You're definitely lucky if it's short term turnaround like that right where you know it happens quickly and not like you know, months or years down the road when you may not even be part of the team anymore. And now you've got that, which is where a lot of the you know, bugs come up there. I'm interested, though, if we flash back to something you were talking about before about where

52:04

the some of the real value is. We were talking about Walmart and the content automatic creation descriptions for the products that they're having. I wonder how many of those are out there, because I mean, I think a lot of the hype right now is focused on this. I'm gonna say lie that it increases the productivity of software engineers. A lot of companies are repeating this. I'm waiting for

52:25

some actual real data on it. But I'm really curious about the ways in which the enterprises, the larger companies, you know, are finding opportunities that aren't just like a chatbot on a website.

Speaker 3

52:38

Right, I mean, that's I think that's table states at this point. The software engineering and get psyche. Yeah, there are there benefits to it. There are no benefits to it. It's a controversial, but then there are other users like

52:54

furnish this with. One of the usages that I have seen is uh they the company, Uh they're running support service phone support service, and what they were doing was uh doing sentiment analysis of the calls so that they're uh they can do better callbacks to customers who are whose sentiment was was not as as good as as they they wanted to be. So I think that's one way of doing uh, you know, customer satisfaction and and and keeping your brand uh recognition, uh and so on

53:46

and so forth. So it's like and one of the things that they actually had was they first tried it the sentiment analysis, but they had to actually feed additional data to the model to get the sentiment analysis correct. It's one of those examples where you have, oh you may not get it right from the first run, but then you need to like adjust for it and then

54:12

try again. I wish I could give their name that that's a very exciting name, but uh, but what they did was really uh you know, it's it's they were really happy with it. Uh. They were to the point they were always doing callbacks to the customers, but the selection process was not something that they were able to streamline to the level that you know that they get more and more better results out of it. Now they were.

54:52

Now they're able to target the right customers because they are able to do the sentiment and now out of those goals.

Speaker 2

55:01

Center analysis is an interesting one. We actually have a product, it's like the number one stand up bought and Slack, and we tried to run some AI models on our responses for stand ups, and like it was very clear whether it was like a one or a five on a five point scale, but like you know, you get into and those are obvious, but the ones in the middle were quite the challenging spot there, and we ended up discarding the notion entirely because it is the sort

55:30

of thing that I feel like, you really want to get right and not be in the danger area. But your example made me think of something that I had read recently actually, where a call center realized that the site, the psychological safety of their staff actually being super important for the success of the organization, was being threatened by irate callers that would call up and yell and scream. Maybe obscenities at them because you know, obviously supports are

56:00

being made by people who are usually not happy. I mean, maybe that's a controversial statement, but I believe that's true. You know, generally speaking, you're calling up the support line because you're not thrilled with the service you're getting, and it was challenging for them. And what I heard that they were doing was actually uh using AI to alter the vocal expression of the calls they were getting so

56:26

that they seemed more demur in content. Like they weren't screaming, they weren't swearing, they weren't like high volume pitches, automatically changing what the staff were hearing so that they wouldn't be susceptible or subjugated to an environment that would be worse for them and the company.

Speaker 3

56:44

Yeah, I mean that is that is you know, that is a very no way to way of using sexually right. It's yeah, I it's it's always hurt when you're on the first line. The customer is not happy, so it's it helps, I'm pretty sure.

Speaker 1

57:14

And then there's the people where the more swear words they have is actually an indicator that they are happy. We may be able to identify someone who fits in that category sort of tangential to this. That's one of the things I did in using chat GPT is I told it. I don't know if you' allre familiar with who David Goggins is, but he's a very vocal and

57:50

blunt person. I think we can just say that. But I told chat gpt that it could adopt that person's speaking style and personality, and like instantly I was more productive with it because I could drop F bombs and it would respond with hell, yeah, let's go, And like all of a sudden, I was communicating with someone who spoke the way that I did.

Speaker 2

58:16

So I just assumed, and this is obviously not really knowing your history that if you've seen Full Metal Jacket and Drel Sargent, like I just assume that this is the life you've gone through, Will and your path being in the Navy, that you know there is something just for you special about being yelled at in this way which really gets you into gear.

Speaker 1

58:40

Right, Screaming F bombs is my love language.

Speaker 3

58:48

Okay, we're doing this podcast compete the wrong right?

Speaker 2

58:58

I mean, I guess we could release two verse is the stream, you know, the natural one and the one that's attenuated using AI that strips out all of the obscenities that are coming out of Will's mouth. So if you're not hearing him, yeah, I mean, if you're not hearing Will swear every other word. That's what's actually happening right now.

Speaker 3

59:16

The bill special, Right.

Speaker 1

59:19

There's so many ways that could go wrong.

Speaker 3

59:25

Yeah, but yeah, there's there's always that. I tried to do the same with the chick GPT back in the day on SWAL that didn't go well for me.

Speaker 1

59:43

So one thing I'm curious about, like, are there common use cases you're seeing from specific type of types of customers, like enough where you could say, like this industry or this segment is betting really heavily lead on AI and m L and they're actually making progress.

Speaker 2

01:00:04

That's a good question.

Speaker 3

01:00:08

Finance definitely, is there insurance companies both on the customer support side and also on the risk and analysis and so and so forth. Essentially the risk andalysos they have been doing for for many years, but as I said, the hype, the result of the hype is they are now able to adopt things that are more advanced, easier, and in some cases cheaper than they were able to do in the past. So those are the two that we are seeing. The one thing that I was surprised

01:00:53

about is the consulting companies. Was so because as we're providing a tool, perhaps that's why as well we get a lot of interest from consulting companies, which are essentially some of them are actually building develops pipelines for other companies that are very interested in adopting our tools. So that was, you know, that's something that I wasn't expecting

01:01:32

expected for me. So I do not really know what kind of industries these consultant companies are working for, and some of them are really large, So I don't know if you.

Speaker 2

01:01:45

Know, if you can tell you no, I mean you say consulting. And I had two things give to mind, like like management consulting, of which we knew all along that what the words coming out of their mouth could easily be puppeted by an LM and maybe third party

01:02:02

contractors who are you're hiring to outsource some work. I could imagine they're trying to sneakly basically use LLMS instead of even cheaper because you know, supposedly it's even more cheaper labor than even outsourcing to deliver the work items.

Speaker 3

01:02:18

No, in this case, no, it was no. In this case, they are actually hired as experts were building a pipeline for their For instance, one of the cases that we're helping with is they're building a pipeline for consuming base models and retraining them or fine tizing them. So like, it's not that kind of work. It's not like adapting the LM for doing something. It's more about you know, taking base models, making sure the lineagures there, as bounds

01:02:57

are there, and so and so forth. It's what it essentially is is they are building CICD pipeliness, very complex c CD pipelines for these companies, and as far as I understand, they are working on multiple projects at the same time. So there is a lot of consultancy activity that we are I'm sure that there is more. It's just we're seeing a few of examples for adapting our tools,

01:03:29

but there's a lot of that going on. So that only tells me that it's like there is a bit of activity happening in industries that are known to use consultancy. So and to be honest, I am impressed on the fact that they are actually doing the right thing. It's like you don't at this point of the hype, you don't usually get the people doing the right thing and you know, building provenance, attestations, s bombs and so on and so forth. So when you see that, you're like, hmm, I'm impressed.

Speaker 2

01:04:12

No, I totally get it. I mean different field. But you know, we offer login and access control as a SaaS product, and we're talking with our customers and they say them things like, well, you know, because of these reasons, we decided to go with your product, And it makes me feel like really happy that at least there are some people out there that are that are thinking through this effectively, like whether or not they decided to use

01:04:32

what we have or not. It's like I can actually see how they're thinking about it, and it's so much better than like the rest of what I see, which is like, oh, no, you know, we decided to hack this together, or you know, we're not tracking what we're doing or auditing.

Speaker 3

01:04:45

In it any way.

Speaker 2

01:04:46

You know, it's you know, I hate to say yoloing it out there, because you know they can it's not the most critically important thing for them as they see it. Even though we no longer term tracking the models that are being used, and how you're fine tuning it, like actually being able to trace it back. It's like going out there and as well loves to say right to

01:05:07

production using v I to edit those production files. You know, it's great in the moment, but you know you got to trace that back to the get commit if you care about your production reliability.

Speaker 3

01:05:19

I mean, we actually encourred to a case where the model that was deployed was forgotten completely and it drifted, but it was still part of the application flow. It was still producing data to the application, and the application was putting that into into databases essentially, So it was at the end it wasn't something critical that they were doing, but it took them like they couldn't understand at first,

01:05:59

Like that's what they told us. They couldn't understand at first how that happened, and they weren't even aware that there was a model there at that point that was that was generating the the data that they are seeing was drifting. I mean it's it was doing a simple categorization, but then it drifted so much that it was always putting things into irrelevant categories that it wasn't supposed to put into so, and that's what happens with with AI.

01:06:38

That's why you need to be really careful with the pipeline. Like the v I example that that you have given.

01:06:46

It's like you can get away with ed editing something with v I in production if if something is if if that deployment changes in one time in fifteen years, yeah right, so, but you know, if you have a deployment that changes only once in fifteen years, you may get away with that because your input and output never changes, right, It's like that thing is if it is working, it is working unless you have a bucket and software never

01:07:25

has bugs. So but then with AI, it is going to change, whether that happens in six months, that happens in a year. You need to monitor that. You need to have a pipeline that will retrain and get rid of the drift and so on and so forth and all that. So I think that's where the main differences start to happen between DeVos and maybe mlops.

Speaker 2

01:07:55

That's a really good point, actually, I think you're worded that really well. You are only going to use mL in situations that are already so complex or changed so frequently to get the value out that you have to imagine that you built such a complex like pretend you didn't use mL. Think about how complex of a system you would have had in its place. There must have been so much testing around that, so much reliability, so much concern and fine tuning of that to get it right.

01:08:23

You can't just throw away all those extra pieces because you're using some sort of model now. You still need all of them in place otherwise, you know, just think about that legacy system that's thirty years old and has you know, maybe one hundred million lines of code, Like that's just absolutely ridiculous to be running as a critical piece of software, right, I.

Speaker 3

01:08:42

Mean, there are problems like for instance, again this is a real one, let's say that you're producing shirts, yeah, and you're distributing these shirts to all over the world. How do you know what size is to produce and where to distribute them? Someone needs to make that decision. You cannot send the same amount of large to every country, and you cannot send the same amount of xx large to every country.

Speaker 2

01:09:18

There was actually, I think there was a good paper that was released. I think it was the US Army where they were trying to devise standard kits for sizes, and they performed a number of physical measurements of everyone. Everyone drafted to decide, okay, you know, we're gonna have a small medium in large kit how big should those

01:09:35

things be? And after measuring and calculating the norms, they realized that there was never there was not one person in the set of like a thousand people that were able to fit into a standard distribution of one of those kids. They were like, it was like, it's really ridiculous and a good reminder that there is no such thing as following the norm and getting it actually right, and so there's no reason to believe that it would work in the eye world either.

Speaker 3

01:10:01

Yeah. But on the other hand, given enough data, you can actually predict me you know, how many how many larges that you need in China, and how many largest you need in US, and how many large you need in Canada because you have the historical data as well as other demography fixs and so on and so forth.

01:10:27

So when you combine all of that, and this is actually used somewhere, when you combine all of that, you can actually come up with a data say that, oh, you know what I need this many excels in Belgium, right, so, and that is very important because you know, sending the wrong shirt sizes healthway around the world where you won't be able to sell is a lot of cost.

Speaker 2

01:10:53

I worked for a long time in manufacturing and manufacturing logistics, so I am well aware of the challenges there. So I appreciate the example. We're getting pretty up on time here, so I'm actually wondering if it may be the moment that we switched to doing some picks.

Speaker 3

01:11:12

Let's do some picks. That's exciting, Yeah, do it?

Speaker 1

01:11:19

So, Warren, you want to kick us off?

Speaker 2

01:11:21

Yeah, you know, I always go first. I think that's that's the secret format here.

Speaker 1

01:11:25

You know, feel free to call me out and I'll go first at any point.

Speaker 2

01:11:28

No, No, I think I go first. I think that's the uh, that's the rule here. All our listeners will know that by now. So my pick I sort of alluded this to this earlier about psychological safety on teams. There's actually a paper that was released by Google in twenty sixteen about trying to figure out what makes a great team, and the actual article is released on The

01:11:52

New York Times twenty sixteen. What Google learned from its quest to build the perfect team, and I feel like in twenty two, twenty four, this shouldn't be a shocker anymore. But it's psychological safety and it's sort of ridiculous how important that is is really outlined in the paper, and that I still talk with companies today. My colleagues at other companies are ones that I consult for and it's

01:12:17

not their priority. And it's like, you look at data like this and the only conclusion you've come to is it's the number one thing that everyone should be doing, because it's the guaranteed way of making the most successful company with the most revenue, the happiest customers and the best employees. And yet people still aren't doing.

Speaker 3

01:12:35

It fair enough.

Speaker 1

01:12:38

All right, GOK, what'd you bring for a pick today?

Speaker 3

01:12:41

Yeah, so it was a little bit last minute. So I'm gonna pitch a coffee machine.

Speaker 1

01:12:49

I'm super interested.

Speaker 2

01:12:50

You have my full attention.

Speaker 3

01:12:53

Okay, So a little bit of background. There is something called a golden ratio in the coffee world. If you're doing drip coffee right there, that there is a something called golden ratio, which is the amount of time that your beans are gonna spend with the with the boiled water and there is an institution I can't remember what that uh measures these these uh uh coffee drip coffee

01:13:26

machines and issues the Golden ratio certificate. So this, yeah, the my my first introduction to this machine actually goes all the way back to Nokia, where we had these kitchens in every floor in Nokia, and there was in every kitchen there was one or two coffee machine that which you would go and make your own coffee, and uh it was so ridiculous that the coffee would run out and someone would have to make a new page.

01:14:05

And one thing that we have done is we actually put cameras on top of them so that we would know if there is coffee in there or not before we live our spot. So all of these machines and okay, buildings were MoCCA masters. And the thing about MoCCA masters is they are the simplest machines that you can think of. Like all their parts, you can just take them up apart and then put them in. Probably the most expensive piece in it is the copper wire which boils the water.

01:14:44

And that's actually the secret of it, because it boils the water into the correct degree at the correct time and let the water go through the beans your coffee with the correct ratio, and therefore it's considered one of the better coffee drip machines that you can get, and they're expensive. Right.

Speaker 1

01:15:11

And on that note, I just learned last week that the webcam was invented and I can't remember who it was. I want to say it was at Nokia. The webcam was invented because the engineers were tired of going to the break room finding the coffee machine empty, so they figured out how to hook a camera up to their network and invented the first webcam.

Speaker 3

01:15:40

Uh well, by the time I was there it was invented, it was, but it was a very common practice, Inchia.

Speaker 1

01:15:50

Yeah, rightly, so, rightly so, all right. So my pick is a little bit related to what we've been talking about to just with the early stages of AIML, you know, and how do we make this production ready. And so my pick is actually Voyager one and Voyager two, the spacecraft that are outside of our solar system now. And the reason I pick those is because just go and watch some YouTube videos about this or read articles whatever you preferred format, is the level of engineering done on

01:16:25

these things is so amazing. The data storage on these things is an eight track tape drive that's still working forty seven years later, and you know, it's gone so much further. Both of them have gone so much further than what they were ever planned to do. But when the engineers were designing it, they kind of hoped that this would be the scenario, and so they built for this and here we are utilizing it today and crazy

01:16:54

things like the backups that they have on there. The primary thrusters worked for a long time and then they rightly so they failed, and so after thirty seven years of sitting dormant, they fired up the backup thrusters and they worked flawlessly. And so just the level of engineering in this in the Voyager spacecraft, I think is worth spending a little bit of time to just acknowledge and admire and then reflect on how that same engineering philosophy

01:17:28

can be applied to your common everyday tasks. So Voyager one and Voyager two are my picks for the week.

Speaker 2

01:17:37

You got to upstage everyone there?

Speaker 1

01:17:39

Oh well, now, I just it just it was on my YouTube feed last night. It wasn't planned at all, but after watching it, that was like, wow, this relates to what I do on so many levels.

Speaker 2

01:17:52

There's an inspirational YouTube video sitting out there that we should get in the link section of this episode.

Speaker 1

01:18:00

I'll pull I'll pull on YouTube history and get the link and make sure it gets into the show notes. Sounds good, Yeah, because it was a cool video, super cool and it's only like twenty minutes. It's not one of those two hour ones. And on that note, Gorkam, thank you so much for joining us today. This has been a cool conversation. I'm looking forward to see how

01:18:22

it comes out. And now be sure to keep in touch, let us know what kind of things you're on, and when you come across something cool, I'd love to have you back on to talk about.

Speaker 2

01:18:32

It and figure out how it works.

Speaker 1

01:18:35

Awesome, And to all the listeners, thank you so much for listening. Be sure and reach out to us if you have questions, comments, episode ideas, or someone that you want us to have on the show. We'd love your input. Jarren, thanks for joining me today. Yeah, as always, all right, and we'll see y'all next week.

Transcript source: Provided by creator in RSS feed: download file

AI in DevOps: Managing Non-deterministic Workflows and Ensuring Model Lineage and Transparency - DevOps 214

Episode description

Transcript