The AI Skills Software Engineers Need to Learn Now

Jan 07, 2026 · 44 min · Ep. 233

Episode description

Software engineers often think adding AI is just a simple API call, but moving from a Proof of Concept to a stable production system requires a completely different mindset.

Maria Vechtomova breaks down the harsh reality of MLOps, why rigorous evaluation is non-negotiable, and why autonomous agents are riskier than you think.


In this episode, we cover:

  • The essential MLOps principles every software engineer must learn
  • How to bridge the gap between a demo and a production-grade solution
  • Strategies for evaluating agents and detecting model drift
  • The security risks of customer service agents and prompt injection
  • Practical tips for using AI tools to boost your own productivity

Connect with Maria:

https://www.linkedin.com/in/maria-vechtomova


Timestamps:

00:00:00 - Intro

00:01:25 - Why the AI Hype Was Actually Good for Monitoring

00:03:07 - Real-World AI Use Cases That Deliver Actual Value

00:05:16 - MLOps Basics Every Software Engineer Needs to Know

00:08:08 - The Hidden Complexity of Deploying Agents to Production

00:12:02 - Minimum Requirements for Moving from PoC to Production

00:15:41 - Step-by-Step Guide to Evaluating AI Features Before Launch

00:18:08 - How to Handle Data Labeling and Drift Detection

00:21:55 - Why You Likely Need Custom Tools for Monitoring

00:24:56 - Why Engineers Build AI Features They Don't Need

00:26:01 - How Software Engineers Can Learn Data Science Principles

00:31:36 - The Dangerous Security Risks of Autonomous Customer Service Agents

00:34:44 - Why Human-in-the-Loop is Essential for Avoiding Reputational Damage

00:36:18 - Boosting Developer Productivity with Opinionated AI Prompts

00:39:20 - Using Voice Notes and AI to Organize Your Life


#MLOps #SoftwareEngineering #ArtificialIntelligence

Transcript

Intro

MLOps is hard. I'm always saying it's hard. It's not a beginner's job. It's very easy to deploy an AI application if you ignore all the hard parts. If you're using OpenAI's GPT-5, we can't even guarantee that today at 9:00 and at 12:00 it's exactly the same model. They may have changed something, you have no idea, and suddenly your system doesn't behave anymore. Step by step, you could trick it into revealing something that you're not supposed to know. How do you even put guardrails on that? No one knows.

Do I really need this complexity? Do I even need an agent? Do I need an LLM? Those are the best questions. If you don't need it, don't use it. That's why I believe MLOps will be very, very popular in the upcoming years. If you're a software engineer responsible for bringing AI models to production, this episode is for you.

We go through the basics of MLOps specifically for software engineers: how to build out your proof of concept into a production-worthy solution with evals, observability, and much, much more. Joining me today is Maria Vechtomova, cofounder of Marvelous MLOps and cofounder of Cauchi. She's been in the MLOps space for over a decade, which makes her the perfect person to have this conversation with. So enjoy.

What excites me personally the most is when you deliver value, and that's the tricky part, because everyone is overly excited about agentic development and trying to apply it everywhere, including where it doesn't necessarily make sense.

Why the AI Hype Was Actually Good for Monitoring

And that part doesn't necessarily excite me. But I think the whole LLM hype did a lot of good for MLOps. Finally, monitoring gets attention. It was always important, but no one cared about it that much. It was kind of, yeah, monitoring, we know it's important, we don't have it. And now suddenly everyone has started paying attention to it, also because software engineers have started doing it more; before, it was just data scientists. Now it's coming from a different angle. Yes, tangible, and a product. Yes, indeed, so that's one of the things I definitely like about it. And finally, we pay attention to serving: serving your agent becomes an important problem. In the past, all of these tools were kind of wacky, and you bumped into limits very fast. Now the vendors realize that's a limitation and they're working on it. So that's what I also like about this.

Yeah, I like that a lot. What are some of the implementations you've seen that really deliver value, that actually work? Yeah. Well, I don't know how much I can really talk about the use cases. I think for internal processes it makes a lot of sense: when you have some kind of documents, you need to do something with them every week, and you spend hours on it, you can largely automate it. And parts of this decision-making can be outsourced to an agent.

Real-World AI Use Cases That Deliver Actual Value

So that's where I see a lot of value: time saving. Also NLP-heavy applications. Take telecom: you have all these customer interactions and you need to forward them to the right agent or to the right problem solution. I worked for a telco before, so I know this process is already pretty well defined. It's just that someone had to go through it and route it to the right place. Now all of that can be done with agents. I think it's a great opportunity, and it definitely saves a lot of money there. Yes. I've seen it at Booking, for example, applied between the client, the user who just booked something, and the party offering the booking, like a hotel. They have something in between that gets the context from the ongoing dialogue, checks the tone of voice, and decides whether a message is actually appropriate to send to the other party. I thought that was very interesting. And it quite quickly gets into a level of scale and complexity, because you also have the language factor in there. Certain models are better with certain languages. I have no clue how they handle that, but it sounds like a fun challenge. Yeah, definitely.

And I think in general a team spends so much time on hands-on work that isn't per se needed and can be outsourced to an agent, for example analytics work, where the business comes with the same questions all the time. You could make dashboards, but instead you could have these kinds of AI/BI tools.

Of course, that must be applied with caution, but I think the business has a pretty good understanding of what they're supposed to see. In fact, when analysts come up with numbers, those are also validated with the business. So a lot of these steps can just be skipped by giving business users direct access to the data through this interface. That's also something I think is very valuable these days.

Yes. You've been involved in the MLOps community for a long time, even before the AI hype. And now software engineers are really looking into fitting GenAI solutions into their products.

MLOps Basics Every Software Engineer Needs to Know

What will they have to know about MLOps from a basics perspective? Yeah. Well, I think they need to learn a little bit of data science, the basics, because it's very easy to deploy an AI application if you ignore all the hard parts. If you look at it that way, it's just regular software, but all the parts around it are related to evaluation, and those are very close to data science.

The problem here is data gathering. You actually need humans to define the expectations: what is expected from an agent and what the correct answer is. And that answer can also be vague, so you don't really know. You need a clear definition of what you're evaluating for and examples of proper responses. That must be gathered, and it's something a lot of people just ignore, as if it doesn't exist, as if it's not a problem at all. Also, the fine-tuning of the agent, how the agent is built, is often not done properly. It kind of works; it comes up with something. OK, yeah, it's a black box, but it's not really a black box.

It can be evaluated pretty well if you come up with these metrics first, include business users or whoever is using the application from the very beginning, and have these kinds of datasets available. And there are nice tools like MLflow Tracing where you can get very deep insights into what's happening within the agent: LLM calls, tool calls, and reasoning steps. All of this can be logged, and you can also give feedback on these steps. Each of these steps can be evaluated by a human or by an LLM, and that can be used for monitoring purposes. Also things like: on average we have five tool calls per agent run, and now we have 30. Something is going wrong there, and you want to get alerted on these kinds of anomalies. That's also something you want to pay attention to.
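The "five tool calls, now 30" alert can be sketched as a simple z-score check against a known-good baseline. This is an illustrative assumption, not how MLflow or any tracing tool implements alerting; the function name `tool_call_alert` and the threshold of 3 standard deviations are hypothetical choices.

```python
from statistics import mean, stdev


def tool_call_alert(baseline_counts, current_count, z_threshold=3.0):
    """Flag an agent run whose tool-call count is far above the historical baseline.

    baseline_counts: tool calls per run from a recent, known-good period.
    current_count:   tool calls in the run being checked.
    """
    if len(baseline_counts) < 2:
        raise ValueError("need at least two baseline runs")
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # Perfectly stable baseline: any increase is unusual.
        return current_count > mu
    z = (current_count - mu) / sigma
    return z > z_threshold


baseline = [4, 5, 6, 5, 5, 4, 6]  # roughly five tool calls per run
print(tool_call_alert(baseline, 6))   # False: within normal variation
print(tool_call_alert(baseline, 30))  # True: worth an alert
```

In practice the counts would come from your trace store, and the baseline window would be refreshed periodically.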

From what I've seen, people mostly go the easy route, ignoring all of these hard parts, and they also detach all the pieces. Typically you have some kind of data processing: if you have PDFs, you parse them, run OCR, do the chunking, and use some vector search.

The Hidden Complexity of Deploying Agents to Production

You may also extract some metadata and store it in a SQL database. Then you define tools for your agent, and maybe you have an MCP server as well. This process needs to run periodically, whenever you get a new batch of data. In that sense, it's very similar to how data scientists work: there is data preprocessing work embedded in it. Then you have the agent definition itself, which consists of the agent's logic, the system prompts, and all these components that also change a lot. If the LLM changes, it may start behaving weirdly. There are so many moving pieces you need to control, and in that way it's much harder to control than a standard ML model.
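The chunking and vector-search steps of that ingestion pipeline can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a bag-of-words placeholder standing in for a real embedding model, and the chunk size and overlap values are hypothetical; a production pipeline would use an embedding model and a vector index instead.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text):
    # Placeholder for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = ["refund policy for damaged orders", "shipping times to europe"]
print(search("what is the refund policy", chunks))  # ['refund policy for damaged orders']
```

Because this whole step reruns on every new batch of documents, it is worth versioning its parameters (chunk size, overlap, embedding model) alongside the agent itself.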

And then you have the part where you deploy the agent. I like Databricks a lot and use it a lot, and there are processes that MLflow allows for. For example, when you register a new version of the model, a deployment job starts. It doesn't just deploy right away: it has an evaluation step, using the kind of evaluation I was talking about before, and then a manual approval step.

So a human actually looks at it, maybe evaluates it in certain ways, maybe looks at traces, and then we can go and deploy. You can also decide that for some processes you don't want a human to approve, and maybe an agent approves instead; that's also possible, because in the end it's just an API call. So there are these components you need to think about, but also the governance of these agents.
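The register, evaluate, approve, deploy flow described above can be sketched as a gated job. This is not the MLflow or Databricks API; `ModelVersion`, `deployment_job`, the threshold dictionary, and the `approve` callback (a human decision or an automated approver) are all hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelVersion:
    name: str
    version: int
    eval_metrics: Dict[str, float]  # results of the offline evaluation step


def deployment_job(candidate: ModelVersion,
                   thresholds: Dict[str, float],
                   approve: Callable[[ModelVersion], bool]) -> str:
    # Step 1: automated evaluation gate; every metric must meet its minimum.
    for metric, minimum in thresholds.items():
        if candidate.eval_metrics.get(metric, float("-inf")) < minimum:
            return f"rejected: {metric} below {minimum}"
    # Step 2: approval, by a human reviewer or, for some processes, an automated approver.
    if not approve(candidate):
        return "rejected: approval denied"
    # Step 3: deploy (stubbed out here).
    return f"deployed {candidate.name} v{candidate.version}"


candidate = ModelVersion("tone-classifier", 3, {"accuracy": 0.92})
print(deployment_job(candidate, {"accuracy": 0.90}, approve=lambda c: True))
# deployed tone-classifier v3
```

Keeping evaluation before approval means reviewers only spend time on versions that already pass the automated checks.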

If the agent is deployed behind an endpoint, you have governance on top of the agent, but also on top of the LLM it's using, to ensure there are guardrails, that no PII data is involved or being processed, things like that. The funny thing is, when I started with AI within a product and looked online, people were saying that for software engineers there's nothing new, because it's just an API call away.
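One guardrail mentioned here, keeping PII away from the LLM, can be sketched as a redaction pass on text before it is sent to the model. The two regex patterns are deliberately simple assumptions and will miss many real-world formats; a production system would typically rely on a dedicated PII detection service.

```python
import re

# Simplified, illustrative patterns; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact_pii(text):
    """Replace matches with placeholders and report which PII types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
        text = pattern.sub(f"[{label}]", text)
    return text, found


print(redact_pii("Mail jane.doe@example.com or call +31 6 1234 5678"))
# ('Mail [EMAIL] or call [PHONE]', ['EMAIL', 'PHONE'])
```

The list of detected types can also be logged, so governance reviews can see how often PII reaches the endpoint.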

And now, hearing you talk about evaluations, figuring out your context, figuring out what is actually valid when using an agent or a model behind the scenes, then governance and guardrails: if all of that suddenly also becomes a software engineer's responsibility, it's a lot on their plate, because they don't just have to think about implementation, and these are all data science principles. Yeah, yeah. MLOps is hard. MLOps is hard. What can I say?

I'm always saying it's hard. It's not a beginner's job. Yeah. And it's hard for pretty much anyone, because data scientists just work with notebooks. It's pretty much the industry standard, which I'm trying to fight, but that's the truth. They are not great at following software engineering practices. It's getting better now, but it's still not quite there. And software engineers often lack these data science principles.

They never learned them because it was never their job. Now we need to merge the two, and I think for the first time in history we actually have the opportunity to merge them. That's why I believe MLOps will be very, very popular in the upcoming years. That is a fun and exciting time, though. Yeah, definitely. I can get up and running really quickly, but that's just the proof of concept. I've had many people on the podcast, and they say the path from proof of concept to production is incredibly difficult. So from your perspective, what is the minimum set that I actually need in production? Because I feel like a really mature production solution is going to be very different from, let's say, my first version that goes to production.

Minimum Requirements for Moving from PoC to Production

Well, all of the components I just listed. That's already a lot. Yes, and I haven't even mentioned all of them; there are more. You want to monitor what's going into your systems and what's coming out of them, at each step really. You want to keep track of it, and there are tools for it today, like MLflow Tracing. You can also send all this information to OpenTelemetry-compatible tooling like Datadog, or sync it to Delta tables. But you also want observability on top: who was calling the API, how long it took to give the response back, and maybe limits on certain users so they can't use more than a certain number of tokens, things like that.
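The per-user token limit mentioned above can be sketched as a fixed-window budget. The class name `TokenBudget`, the window length, and the limit are hypothetical; real gateways often use sliding windows or token buckets, but the fixed window keeps the idea readable. The clock is injectable so the behavior can be tested without waiting.

```python
import time


class TokenBudget:
    """Per-user token budget over a fixed time window (illustrative sketch)."""

    def __init__(self, max_tokens, window_seconds, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock
        self._usage = {}  # user_id -> (window_start, tokens_used)

    def try_consume(self, user_id, tokens):
        """Record usage and return True if the request fits in the user's budget."""
        now = self.clock()
        start, used = self._usage.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0  # window expired: start a fresh one
        if used + tokens > self.max_tokens:
            self._usage[user_id] = (start, used)
            return False
        self._usage[user_id] = (start, used + tokens)
        return True


fake_now = [0.0]
budget = TokenBudget(max_tokens=1000, window_seconds=60, clock=lambda: fake_now[0])
print(budget.try_consume("alice", 800))  # True
print(budget.try_consume("alice", 300))  # False: would exceed 1000 tokens
fake_now[0] = 61.0
print(budget.try_consume("alice", 300))  # True: new window
```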

So there are these kinds of guardrails you want to implement on top as well, and I see that as part of MLOps too. You need data processing pipelines, evaluation pipelines, deployment pipelines, human-in-the-loop pipelines, and all the governance on top. And what I talk about a lot as an MLOps principle is traceability and reproducibility. You need to know what data was used, what code was used, and what environment was used. With ML models, that's quite straightforward. With agents it's harder because, as I said, there are so many more moving pieces. If you're using OpenAI's GPT-5, we can't even guarantee that today at 9:00 and at 12:00 it's exactly the same model. No, you can't. Probably it's not.
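Traceability and reproducibility start with recording, for every run, what data, code, environment, model, and prompt were used. A minimal sketch follows; the field names are hypothetical, `code_version` is assumed to be something like a git commit hash supplied by your CI, and `model_id` only records which model you requested, which, as discussed, does not prove the provider's model behind it is unchanged.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_run_manifest(dataset_bytes: bytes, code_version: str,
                       model_id: str, system_prompt: str) -> dict:
    """Record what data, code, environment, model and prompt a run used."""
    return {
        "data_sha256": sha256_of(dataset_bytes),   # fingerprint of the exact input data
        "code_version": code_version,              # e.g. a git commit hash from CI
        "python_version": sys.version.split()[0],  # part of the environment
        "platform": platform.platform(),
        "model_id": model_id,                      # the model identifier you requested
        "prompt_sha256": sha256_of(system_prompt.encode("utf-8")),
    }


manifest = build_run_manifest(b"id,text\n1,hello\n", "abc123", "example-model",
                              "You are a helpful assistant.")
print(json.dumps(manifest, indent=2))
```

Storing this manifest next to each trace or evaluation result makes it possible to tell whether a behavior change came from your data, code, prompt, or environment, or from something outside your control.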

They may have changed something, you have no idea, and suddenly your system doesn't behave anymore. I mean, it can happen. Unless you have monitoring on top of that, I guess, which checks in real time whether your answers get skewed, based on LLM evaluators, because there's no way you can have humans label things in real time, right? No, no. But at least it gives some kind of idea. OK. And what you say, it's actually funny.

At SREcon, we had a discussion about observability of ML systems, and there were some awesome people who have been doing it for a very long time. Todd Underwood, I think he's retired now, but back then he wasn't yet; he was heading SRE at Anthropic, and together with Niall Murphy he co-wrote a book on MLOps and SRE. So all of the things you talk about are still very much valid.

At Azure, for example, I think that to evaluate their models they track whether internal employees give the model a thumbs up, how often, and whether that changes over time, because it's really hard to tie it to standard evaluations. The more generic your model is, the harder it is, and likewise, the more generic your agent is, the harder it is. That's why I really believe in specialized agents that only do very specific tasks, or maybe not even agents, just LLM workflows. Today we call everything an agent. Let's put a pin in agents for a second, because I want you to walk me through this step by step.

Step-by-Step Guide to Evaluating AI Features Before Launch

A lot of this information is partially new to me, and I expect to the audience as well, since they're mainly software engineers. Let's say I have an existing product and I want to embed some type of GenAI feature, whether it's parsing a PDF, getting context, and prefilling a form, or sitting in between a chat and checking the tone of voice of the communication. I can call an API, I have a model, so I can get stuff up and running quite quickly. What would be the next step? Is it evals? Is it more observability, with metrics like time to first token, or measuring performance? What would be the next step to actually go towards production? Well, what you're talking about is not really an agent. If it's just an API call, it's a more or less deterministic LLM workflow. Indeed, what you need to add is evaluation. You need actual labels, created by humans. The tone-of-voice example is actually a classification problem, so you evaluate it the same way you would evaluate a classifier: you have your accuracy, F1 score, and recall, and the cost of your false negatives and false positives. That's what you need to account for, and you maximize for value. So I think that's a very straightforward example.
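Treating the tone-of-voice check as a classifier, the evaluation can be sketched with plain confusion-matrix counts plus a cost term. The cost values below (a missed inappropriate message costing five times a wrongly blocked one) are made-up assumptions for illustration; in practice they would come from the business. Here `True` means "inappropriate, should be blocked".

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, cost_fp, cost_fn):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "total_cost": fp * cost_fp + fn * cost_fn,  # what errors cost the business
    }


human_labels = [True, True, False, False, True, False]
llm_predictions = [True, False, False, True, True, False]
report = evaluate(human_labels, llm_predictions, cost_fp=1.0, cost_fn=5.0)
print(report["total_cost"])  # 6.0
```

Comparing prompt or model variants on `total_cost` rather than accuracy alone keeps the evaluation tied to business value.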

It's pure data science, except that instead of, say, a scikit-learn model, you're using an LLM to do it. When you start labeling things for evaluations, do you let something run in test, acceptance, or production where you have data you can label? How do you typically approach that? Well, I think that overall, development and production need access to the same data to start with. It must all be production data. And of course you shouldn't be able to write from development to production; it's read access only. So the data is the same. Labeling is a separate process outside your development cycle. It's tied together, but I think it must be viewed as a separate stream. But it's always production data that you're labeling. Yes, interesting. OK.

So then I'll typically go live in production with something unlabeled, to get to a point where I can label it? Yes. OK, interesting. Yeah, that's a very interesting part of data science in general. And once I'm in production and can label things, is this where you pull in your business experts and ask what their opinion is? Or who typically does this?

How to Handle Data Labeling and Drift Detection

Yeah. But I think before you deploy this kind of LLM-based classifier, you already need examples, and you need a human to have labeled them. Ideally you can do half human, half LLM labeling; there are different approaches. I think LLMs are pretty OK at labeling things, especially when they're as straightforward as this example. But some examples are just not that straightforward, and you need to align your human judgment with your LLM judgment, so that the LLM judge reaches a certain level of accuracy. So that's one part. You already need this labeled data, and you need to have validated your agent and maybe fine-tuned it.

So what can you fine-tune in this scenario? You can fine-tune your prompt; MLflow has prompt tuning, and there are other tools that do prompt tuning too. It's basically like in data science: instead of fine-tuning your model, you fine-tune your prompt. It's the same kind of process. For classification, I would probably do it periodically, just like retraining a model, because the actual model behind it may change, especially if you don't host it yourself.
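Both ideas in this answer can be sketched against one human-labeled set: first, checking how well an LLM judge agrees with human labels (Cohen's kappa, a standard chance-corrected agreement measure, chosen here as an example metric), and second, selecting a prompt by scoring candidates on the same labels, the prompt-level analogue of model selection. `classify` is a hypothetical stand-in for your LLM call, and the example data is invented.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters (e.g. human vs. LLM judge)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if expected == 1:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1 - expected)


def select_prompt(candidate_prompts, labeled_examples, classify):
    """Pick the prompt whose predictions best match the human labels.

    classify(prompt, text) -> label is a placeholder for your LLM call.
    """
    def accuracy(prompt):
        hits = sum(classify(prompt, text) == label for text, label in labeled_examples)
        return hits / len(labeled_examples)

    best = max(candidate_prompts, key=accuracy)
    return best, accuracy(best)


human = ["ok", "ok", "rude", "ok", "rude", "ok"]
judge = ["ok", "ok", "rude", "rude", "rude", "ok"]
print(round(cohens_kappa(human, judge), 3))  # 0.667


def fake_classify(prompt, text):
    # Stand-in for an LLM call: the "strict" prompt flags exclamation marks.
    if "strict" in prompt:
        return "rude" if "!" in text else "ok"
    return "ok"


examples = [("You idiot!", "rude"), ("Thanks a lot", "ok"), ("Stop it!", "rude")]
print(select_prompt(["lenient prompt", "strict prompt"], examples, fake_classify))
# ('strict prompt', 1.0)
```

Rerunning `select_prompt` on a schedule against fresh human labels is one way to do the periodic re-tuning described above.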

It may change, so you don't even know whether it's going to perform the way you expect. How do you figure out whether a model behind an API has changed? A self-hosted model doesn't really change without you knowing about it, but models that aren't yours, that aren't self-hosted, can change, as you mentioned. How do you measure whether they have changed? Yeah, well, you need to monitor in real time whenever possible, or with a delay when your labels arrive late.

That's where the LLM evaluator I was talking about comes in. Take the sentiment example: what the sentiment is, whether it's good. We can see what the LLM evaluator outputs, spot skew in that in real time, and put alerts on top of it. So you can implement drift detection there, for example. Gotcha.
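Drift detection on the LLM evaluator's outputs can be sketched by comparing today's label distribution against a baseline window. This uses the Population Stability Index, one common choice among several; the 0.2 alert threshold is an assumed rule of thumb that teams usually tune, and the label counts are invented.

```python
import math
from collections import Counter


def label_distribution(labels, categories):
    if not labels:
        raise ValueError("empty label sample")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def psi(baseline, current, categories, eps=1e-6):
    """Population Stability Index between two samples of categorical labels."""
    p = label_distribution(baseline, categories)
    q = label_distribution(current, categories)
    total = 0.0
    for pb, pc in zip(p, q):
        pb, pc = max(pb, eps), max(pc, eps)  # avoid log(0) for unseen categories
        total += (pc - pb) * math.log(pc / pb)
    return total


categories = ["positive", "negative"]
baseline = ["positive"] * 70 + ["negative"] * 30  # evaluator outputs last month
today = ["positive"] * 30 + ["negative"] * 70     # evaluator outputs today
score = psi(baseline, today, categories)
print(round(score, 2))  # 0.68
print(score > 0.2)      # True: raise an alert
```

A sudden shift like this does not say whether the inputs or the hosted model changed, only that something did, which is the signal to go and inspect traces.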

And are these the same kinds of metrics and signals you would use for, say, adjusting your prompt? Yeah, because then you can see the same changes against the dataset you already have. Exactly. Interesting. OK. So I understand evals. What tooling would you typically use for that? Do people build their own custom things, or what is mature on the platforms out there? Well, I think you always have to build something custom anyway, because your problems are never that easy, as we just discussed. They are pretty complicated, and what you're trying to evaluate for is usually pretty custom. For example, you process some documents, and in the end these documents must be stored in a very specific way. So you need to check: is it stored the way it should be, is it actually correct? Some of these checks are pretty deterministic, and you can just validate them in code, maybe even with Pydantic.
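A deterministic custom evaluator for the "documents must be stored in a specific way" case can be a plain function that returns pass/fail with reasons. The schema below (`document_id`, `invoice_date`, `total_amount`) is a hypothetical example; a library such as Pydantic could express the same rules declaratively, but this sketch sticks to the standard library.

```python
from datetime import date

# Hypothetical target schema for an extracted invoice record.
REQUIRED_FIELDS = {"document_id": str, "invoice_date": str, "total_amount": float}


def evaluate_record(record):
    """Check one extracted record against the expected storage format."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    if isinstance(record.get("invoice_date"), str):
        try:
            date.fromisoformat(record["invoice_date"])
        except ValueError:
            errors.append("invoice_date is not ISO formatted")
    if isinstance(record.get("total_amount"), float) and record["total_amount"] < 0:
        errors.append("total_amount is negative")
    return {"passed": not errors, "errors": errors}


good = {"document_id": "INV-1", "invoice_date": "2025-03-01", "total_amount": 120.5}
bad = {"document_id": "INV-2", "invoice_date": "01/03/2025", "total_amount": -5.0}
print(evaluate_record(good))  # {'passed': True, 'errors': []}
print(evaluate_record(bad))
# {'passed': False, 'errors': ['invoice_date is not ISO formatted', 'total_amount is negative']}
```

The pass rate over an evaluation set then becomes one of the metrics tracked per agent version, next to the non-deterministic, LLM-judged checks.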

But some of them are not that deterministic, and that's not an evaluator you can find in any library, so you have to create your own custom evaluation. There are tools that facilitate this kind of custom evaluation, like MLflow. I don't know how many of these tools will survive, but I'm pretty sure MLflow will. Yeah. I like your thought process that you'll need something custom, and that people typically build it around their own solution.

Why You Likely Need Custom Tools for Monitoring

But monitoring in ML in general is very custom. I mean, who cares about accuracy anyway? We kind of care, but what we really care about is business value, and monitoring business value is custom. It's always custom. That's why monitoring is hard. Monitoring accuracy is easy; I feel like monitoring business value is way more difficult. Exactly. Yeah, yeah.

How do you get to a point where you have proper metrics on the business value of the solution you've built? Yeah. Well, I don't really have an answer for that. I think no one really has, because it depends on the problem you're trying to solve. For some problems it's very straightforward; for some it's not. For classification it's typically easier: you can estimate the cost of your false negatives and false positives, and that's what you need to steer towards, minimizing the loss. Yeah, yeah. Maximizing profit. Yeah.

I feel like this is where a lot of product understanding will also come into the teams, even though those metrics are very far removed from the feature you're building. Things like: are users actually happy with the functionality on our platform, and is the functionality you're building contributing to that? And depending on your product's revenue model, whether it's retention or conversion, you need to be aware of those things to actually track business value.

Yeah, exactly. There are so many different business problems you're trying to solve. Take cross-sell on a web page, for example. We do care whether the person adds the product we show to their basket, but what we really care about is that the total basket value increases. If they add what we suggest but the basket value doesn't increase, why are we doing it anyway? It's just different. So what we steer our algorithm towards is often different from what we actually care about, and it's very hard to align the two in general.

Interesting. I understand how these concepts come together: calling an API that has an LLM behind it, adding evals and the right observability to catch drift, and figuring out whether the changes you make to a prompt actually benefit the feature and the product in the end. I've also seen people struggle with picking the right model, but if you have this in production, I feel like picking the right model for your feature becomes a lot easier. Then there's the question of whether your feature requires several models: with the language factor, for example, you may have to hook different models into the same feature. That adds another layer of complexity.

But if you have these fundamentals, you can really build on a solid foundation. Yeah. Is there anything still missing that you'd want to add on top of a setup like that? No, I think we covered pretty much the basics.

Why Engineers Build AI Features They Don't Need

You can always make things more complicated, for example with a multi-agent system, where each part must be evaluated. Observability gets harder with every extra level of complexity you add, and then you need to start asking: do I really need this complexity to start with? Do I even need an agent? Do I need an LLM? These are the kinds of questions we really need to ask ourselves. Those are the best questions.

If you don't need it, don't use it. Yeah, but it's always like that: people want to try all these shiny new things. It has always been like that. And it's often resume-driven development. That's what I see. Yeah, that's the thing. People do it because they want to learn it and think it's useful for their career, whether or not it's actually useful for the business and the problem they're solving.

You touched on two topics: one is knowledge and the other is organizational maturity.

How Software Engineers Can Learn Data Science Principles

I want to cover knowledge first, because you mentioned these are very much data science principles. In the teams you've seen build features effectively, do they usually have data science people embedded in the team? Or how do software engineers familiarize themselves with these concepts and the rest of MLOps? Yeah, well, I think it varies; it depends on the company.

There are some really good examples of teams that indeed think about this from a data science perspective, and I think these are organizations that were already mature and doing MLOps well in the past. They already have the data science and software engineering ways of thinking blended together. Those do it well; they know what they're doing.

But there are companies that just had some software engineering teams and rebranded them as agentic AI whatever. Those don't do well. And data scientists who use notebooks and then start developing these systems don't do well either. So I think maturity is the key here, and we have a very, very long way to go to get there. Yeah. What's the best way for people to gain that knowledge?

Is it watching videos on YouTube? Going to conferences, reading books? What is your advice? Yeah. I mean, there is already so much knowledge available, and I think you just need to know which people to follow, who say things that actually make sense. There are a lot of really nice courses on Maven that I like a lot. We also have a course on Maven, LLMOps with Databricks specifically. We talk about principles first and then tools: how do you apply the principles to build proper systems? We use Databricks just because it's easy, everything is integrated, and there are good examples of how to do it well on Databricks.

So that's why we teach it. But there are a lot of other courses; Hugo Bowne-Anderson, for example, also has a course on Maven. They focus on evaluations and building agents, so it's different from ours: we really focus more on the ops side of things, and they focus more on building good agents. Yeah. What you mentioned, that an organization needs to be mature, is really the foundation for a team to thrive in the first place.

The fact that some organizations I've been in don't have production data available in other environments, and that data availability has always been a challenge for them, means that getting whatever AI feature you're building to a mature level is going to be very challenging, because you don't have a solid foundation to build on, and then you add the complexity of building the feature itself. Yeah, for sure.

If you have these foundations, you'll be all right. If you don't, invest in those first instead of trying to do all the shiny things. Yeah. But from a business perspective, people want to start with the results. Now I get that organizational maturity has always been complex. Yeah, indeed.

And now that you've moved into consulting, I guess before you were exposed to a certain set of things, and now the set of things you look at becomes bigger and you just realize that everyone is struggling. Everyone is struggling. Yeah, it's hard. Yeah. I do like that a lot of people are looking into this, because it means that when people figure things out, the knowledge gets shared.

When new things come out, people try them, figure things out, talk to each other, and share that knowledge. That's always been the most fun part of being in this industry, and I don't know of any other industry that does that. Especially now, I feel like it's happening a lot, and I'm having a lot of fun with it. Yeah, I agree. I also really enjoy sharing my knowledge. I write a lot on LinkedIn.

We also have a Substack, and I'm also writing a book. Writing a book is a lot of effort, but it's fun because I think it's needed. Once you gain some knowledge and understanding, you need to share it with other people. But it must come from the right motivations, because some people just want followers and don't know what they're talking about. That's what I've seen as well, yeah.

Yeah, yeah, interesting. I like people that share knowledge, and I do feel like sometimes you also have to pay for great content, but a lot of great content is also out there for free, on YouTube, for example, or in a lot of news articles. So it's really whatever your preference is for consuming knowledge; I feel like it's out there. No, definitely. This is just very, very fresh. Yeah, no, we also have a free MLOps course on YouTube. It's MLOps, so it's not about agents.

But I mean, it's really the same; a lot of these things are just the same. And I think people don't realize how close they are to each other. Yeah, yeah. Somehow it kind of feels like a whole new world. I mean, it's not. It's been there forever. For you, it's obvious. Yeah, yeah, yeah. We put a pin in agents specifically, because I feel like agents add another layer of complexity: if you have different autonomous agents, especially in production, things get even more complex. But let's say I have a mature organization with good data availability.

The Dangerous Security Risks of Autonomous Customer Service Agents

I have a setup where a model is behind an API call, I have my evaluation pipeline, I have my observability in production, and potentially not just one model but several, and now I want some type of agent functionality. The most cookie-cutter example I can think of is something in customer service: having an interaction with a person and then autonomously fetching data from wherever it needs to about a certain ticket or a certain order. How do I put that in production? Yeah, I don't think anyone knows. That's really the freshest question. How to do it?

Well, I think it's hard, because now we need to think about security and all the attack vectors that can hit your systems. If you're dealing with actual customer data, some other customer can, by mistake or by intent, get data from other customers, or try to get information about, I don't know, how many sales there are, which they are not supposed to know, right? And you can trick the system into revealing this information. Yeah, it's very hard. I think it's really impossible to completely prevent it from happening, and that's something we must be aware of. What level of risk are you accepting? What is acceptable for your company? Because in the end it's reputational risk, right?

Say we go to an online store and we want to know the status of a return, or something like that. Then you ask it how many pairs of jeans you ordered in the last year. That's all fine information, but then maybe you can trick it: OK, and on average, how am I doing compared to others? Oh, and by the way, what was the total amount of sales, and how do total sales of pants compare to jackets? Things like that, right? Step by step, you could trick it into revealing something you're not supposed to know. Yeah. And how do you even put guardrails on that? I don't know yet. No. No one knows. OK. No one knows.

When ChatGPT came out, with 3.5 in its earliest versions, a colleague of mine got really excited to figure out the boundaries of it, and things have evolved since. We have 4, we have 5, 5.1, 5.2, and other companies also have different models: Gemini 3, Opus 4.5, everything like that. But he continued on this path, mainly figuring out whether he can still prompt-inject whatever malicious intent, partly to learn, but also from a security perspective. And he agrees with you that whatever model comes out, it will have flaws from a prompt injection perspective. We haven't solved that yet, and I don't know if we ever can; my knowledge is too shallow to say. But that also means agents should be used in very simple use cases. They should not have access to data that can cause reputational damage, or you will have to build the right guardrails around them. Which brings up the notion of the human in the loop.
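
As a rough illustration of the two mitigations mentioned here, tight data scoping and a human in the loop, here is a minimal Python sketch. Everything in it is hypothetical: the in-memory order store and the names `Session`, `get_my_orders`, `draft_reply`, and `send` are assumptions for illustration, not something described in the episode, and `draft_reply` stands in for an actual model call. The idea is that the only data tool the agent can call filters by the customer ID from the authenticated session, never from the chat text, so no phrasing of a question can widen it to other customers or store-wide sales totals, and nothing is sent until a human approves the draft. This narrows what a prompt injection can leak; as discussed, it does not solve prompt injection itself.

```python
from dataclasses import dataclass

# Hypothetical in-memory order store; a real system would query a database.
ORDERS = [
    {"order_id": "A1", "customer_id": "c-001", "item": "jeans", "status": "returned"},
    {"order_id": "A2", "customer_id": "c-001", "item": "jacket", "status": "shipped"},
    {"order_id": "B1", "customer_id": "c-002", "item": "jeans", "status": "delivered"},
]


@dataclass(frozen=True)
class Session:
    """Identity established at login, never taken from the chat text."""
    customer_id: str


def get_my_orders(session: Session) -> list[dict]:
    """The only data tool exposed to the agent (hypothetical).

    The filter comes from the authenticated session, so no prompt can widen
    it to other customers' orders or to store-wide sales totals.
    """
    return [o for o in ORDERS if o["customer_id"] == session.customer_id]


@dataclass
class Draft:
    text: str
    approved: bool = False


def draft_reply(session: Session, question: str) -> Draft:
    """Stand-in for a model call that drafts a reply (hypothetical).

    The question is deliberately not used to decide whose data to load;
    the draft can only draw on what the scoped tool returns.
    """
    orders = get_my_orders(session)
    jeans = sum(1 for o in orders if o["item"] == "jeans")
    return Draft(text=f"You have ordered {jeans} pair(s) of jeans.")


def send(draft: Draft, outbox: list[str]) -> None:
    """Human-in-the-loop gate: only human-approved drafts reach the customer."""
    if not draft.approved:
        raise PermissionError("draft must be approved by a human agent first")
    outbox.append(draft.text)


if __name__ == "__main__":
    session = Session(customer_id="c-001")
    draft = draft_reply(session, "And how do total sales of pants compare to jackets?")
    print("Draft for review:", draft.text)
    draft.approved = True  # a human agent reviewed the draft and clicked "send"
    outbox: list[str] = []
    send(draft, outbox)
    print("Sent:", outbox)
```

The design choice in this sketch is to enforce the boundary in the tool layer and the send step rather than in the prompt, since prompt instructions are exactly what an injection tries to override.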

Why Human-in-the-Loop is Essential for Avoiding Reputational Damage

Yeah. What I would actually like to see is agents as a kind of personal assistant. OK, right. Yeah. So for example, in customer service communication, a person types something and then an agent generates a response. Maybe in certain scenarios the agent could reply on its own, though is it safe to say that? If that is certain, I guess OK; otherwise a human actually clicks "yes" every time.

So the human is doing the sending, but not the typing anymore, and also not the execution of the tasks. I guess certain things can go automatically, generic things like asking the customer for certain information that's required to retrieve the details. You don't need a human agent to be involved for that. But at a certain stage, when we get into certain questions, a human must always be actually clicking, yeah. And that's not the most exciting work. That's the work you would typically want to automate as well. I don't think we can. I don't think we will ever be able to. Yeah, that's the challenge.

I've had a lot of fun working with agents specifically for software engineering, for producing my code in whatever context I need to develop features.

And not everyone agrees with me. Some people really like the craft part of things, writing good code that they think is really elegant or appropriate for the solution, and they take a lot of pride in that.

Boosting Developer Productivity with Opinionated AI Prompts

So among the people I talk to, there are two camps. Either they're in the first camp: I don't like it, code generation is taking away from my craft, and that's where I found joy. To them I say, just keep doing that and see how else it can make you productive. Or they're people like me, who really enjoy the productivity boost and want to work towards outcomes. Yeah. I'm wondering, in your experience, what have you found that works well for your own productivity, whatever that is? Yeah, I'm opinionated about how my code should look, but I already have enough examples of code for it to generate something similar to what I think is good.

And I have clear instructions on what it should look like. I think the more examples you have and the clearer the instructions are, the better, and the closer the output is to what you need, what you like, and what you find acceptable. Then it is a productivity boost for sure. So I think you just need to find the right balance between these two things. And I mean, it's not easy.

It's not easy. Let's zoom in on that, because I feel like opinionated people have it easy: they have an opinion, so they know how to define at least some kind of rules or guidance for something that generates code. If you don't have an opinion, you need to figure out what your opinion is in the first place before you can actually judge the output. But how do you then structure your opinion? Is it in skills?

Is it in a CLAUDE.md file, to take Claude as a specific example? Yeah. How do you structure that? Well, I have yet to try skills. I am behind on this, but yeah, it is typically in the CLAUDE.md file and other Markdown files that are referenced from CLAUDE.md. I mean, what I'm saying now is probably no longer the best practice if you look at skills and what's available there.

And I have some MCPs configured, depending on the tasks that I'm doing. For example, we now also work with Linear: we have brainstorm sessions and write down the things that should be there. Based on that, we also have a prompt, an instruction, and it generates tickets in Linear as expected. You can even dictate things on your phone when you're in the car. On the iPhone you can also get transcriptions right away, and you could use those too. You're always working, even when driving. Yeah, yeah, yeah. So I mean, there are so many different tricks, and I think these are real productivity boosts. When you have a conversation, you can record it, transcribe it, use it to create a summary, and make a logbook of everything that's happening in your life.

You will be very thankful for doing that half a year later, when you've already forgotten what you were doing, to have an interface to actually search through it. And I think Claude, for example, can connect to your Google Drive and retrieve this information for you. So for me, it goes way beyond just code. Yeah.

Using Voice Notes and AI to Organize Your Life

I like that a lot. I wonder if we'll get to a future where... When computers came out, we started typing, and when phones came out, we started typing on our phones. But I can speak way faster than I can type. Exactly. And now, if I want to be productive, there are more and more tools for me to actually speak to, and that then becomes context for something that executes. If we move to a society where people speak a lot more, then they'll also be better at communicating, better at explaining themselves, instead of typing. Because I feel like people who only type are not necessarily the best communicators. And for me, being more productive when I don't have to type would be good, I think, as a personal skill as well as a life skill. Yeah, yeah, yeah. No, there is something there for sure.

But I think it's still different talking to yourself versus talking to another human, right? Yeah, yeah. But yeah, I agree. I think it's a good exercise. It's interesting, to make your thoughts more concise. And yeah, I do like writing, though. I don't know, I like writing more than talking, somehow. I mean, it's... Do you like reading? No, not personally. Interesting. Well, no, actually... Well, I just don't have that much time for reading.

I think that's it, really. So I'm listening more, to podcasts and stuff, or books actually; I'm listening to books, but mostly because I just don't have time. When I'm driving, I'm trying to squeeze all the stuff I want to do into that time, so then I'm talking and listening. But it's not my natural preference, which is writing and reading. If I'm searching for information, I wouldn't go to a YouTube channel unless I really, really have to. I don't know, maybe it's my personality, but I rarely like listening to other people's voices. I don't know why; it's really just me, right?

But it's always been like that. I do like reading more because it goes more through my own lens somehow, right? Yeah, you get that. Yeah. So it's easier for me to get information out of text and visuals rather than, you know, hearing. I think it's really, really cool that you say, OK, this is my natural preference, but I just don't have time for that, so I do this instead, which is still in line with everything else around productivity.

Yeah. I learnt something new specifically about Claude and skills. Previously I had a big CLAUDE.md file with my code conventions, and a colleague of mine said I had so many that Claude started complaining. When the CLAUDE.md file you load in is too large (he said about 1,000 or 2,000 lines, I think; there are guidelines on this on the Anthropic website), it might actually miss parts of the context it loads in. And he says that's when he uses skills: code conventions, styles for a certain controller, and so on. That's what he's been using, and he's really happy with it. Everyone is so happy with skills, so now it's Christmas vacation. Yeah, you're gonna... You're gonna...

I'm gonna do that, yes. I'm finally gonna do it, because there's just so much stuff I need to do, and of course even this tiny thing to try already feels like too much. Yeah. Where I really use skills myself is mainly content creation. Episodes come out, I have a transcript of an episode, and I need to figure out titles, thumbnails, descriptions, and timestamps. I made that into a skill. I was like, let's try this thing out, and it actually works quite well. I'm happy with it.

That's awesome, yeah. Everything around personal productivity and experimenting is just really fun to do, to be honest. But I don't really have very large files, like CLAUDE.md and other Markdown files. I mean, they're still manageable, but they're per repo. I have many repos, and each repo has its own. But yeah, I do like monorepos, just not for personal projects, somehow.

How do you split those up? Personal projects, not in a monorepo, or...? No, it's not a monorepo. It's all separate things. Separate repos? Even for tiny things? Relatively tiny, I think, so different repos. Why is that? You just want them isolated? Yeah, I just want them isolated. Yeah, OK. Yeah, I know it's personal preference. Yeah, yeah, interesting. But when you talk about actual production, a monorepo makes a lot of sense for different use cases.

Yeah, for sure. Yeah, I think that might be controversial sometimes, or the thinking on the internet has shifted around. First monorepos were evil, and now people are coming back around on that and saying it's actually quite stable. Yeah, right. It's so funny how the world changes in that regard. Yeah. Yeah, definitely.

We've gone through, I feel like, more of the fundamentals of MLOps, specifically for software engineers going to production with a first model, then the challenges of having agents, and a little bit of personal productivity in that conversation. Is there anything we missed that you still want to share? No, I think we covered a lot. Awesome. And thank you so much for coming on. I really enjoyed this. Yeah. It was fun. Cool. We're going to round it off here. If you're still listening, let me know in the comments section what you thought of this episode, and we'll see you in the next one.

Transcript source: Provided by creator in RSS feed