¶ intro
Hello, everyone, and welcome to Open Observability Talks. I'm your host, Totan Horvitz, and here at Open Observability Talks, we talk about anything DevOps, observability, and open source. So, as I always say, may the open source be with you. Actually, I just came back from KubeCon North America in Atlanta. I even put on the T-shirt. For those of the video cast, let me show you this one. and as you can see from from my cube concert
We celebrated 10 years to the CNCF. Really, really exciting stuff. I still remember, by the way, blogging when the CNCF was established a decade ago and actually wrote a blog post now on this first decade in my own. personal experience also as a CNCF ambassador. So I'll share that on the show notes and I'll share some major updates on the breaking news section at the end of this episode. So stick around with us.
And with that, let's move on to today's episode. Generative AI is everywhere, right? Everyone these days builds some Gen AI apps. But how do we monitor and observe these workflows? OpenTelemetry has been a prominent tool and a standard for observability. We've been covering it extensively here on the show. And recently, the auto community has been aiming to expand its scope and cover the Gen AI.
workloads with semantic conventions and with tools. One initiative is around creating semantic conventions for Gen AI observability, and another is the Open LLMetry open source project that extends OTEL for this use case. In this episode, I invited Nir Gazit to tell us about these developments. Nir is the creator of the Open Telemetry project and is also a member of the Open Telemetry Gen AI SIG. So, hey Nir.
Hey, great to be here. Thank you so much for joining me on this podcast. Where are you joining us from? Tel Aviv, Israel. Finally, having someone on my time zone that makes it much, much easier to coordinate and someone who knows what it feels like to have this heat at the end of November. When the rest of the world is freezing cold. Yes. And you've been around data and AI for quite some time. Can you tell us about your background that led you for today?
Yeah, I worked at Google for many years, built ML models for predicting user engagement and retention for kind of other Google products. I also worked at Fiverr for a couple of years. Architect there. When I was there, we built, I don't know, a machine learning platform.
and data pipelines and also my co-founder who's by the way he's the co we are i'm not so creative of open energy i'm the co-creator together with this guy my co-founder and he was also working with me at fiverr and he was the lead for everything that I mentioned before. He was the director of platform at Fiverr. Amazing. And of course, today you mentioned co-founder Traceloop, your new venture today that is squarely in this area. So I'm going to say a bit about that one.
Yeah. So Trace Loop, we're a platform for monitoring and evaluations for Gen A High specifically. And I think, and we started it, me and Gal, my co-founder, two years ago, kind of right when... opening i released gpt3 was november 2022 and and we came you know with our background from fiverr building with open telemetry and we kind of it felt natural for us to take open telemetry to to the gen to the kind of now emerging genii world yeah sounds good and uh so i want to start
¶ what is observability for AI
before even talking about technology or open telemetry or anything else, just we talk about observability. I think my audience definitely knows a thing or two about the classic, let's call it observability in IT systems. But...
Maybe in the context of AI, what kind of observability needs are there for AI? So this is a really interesting question. I think, first of all... uh there are kind of two types of observability that you need for ai and this is a kind of different than if you compare it to kind of classic observability so you have let's say you have a database and you want to observe it so you want to look at kind of metrics like latency error rate uh maybe kind of
These are the basic things that you want to see when you observe a database. So if you go to AI, first of all, you have the same things, right? You want to look at latency. You want to see kind of slow... cause you're making to an LLM. You want to look at error rates. But then you have this whole new world of different types of error that you cannot just...
observed just by looking at the metadata of the request, right? So going back to the database example, you want to look at 500 errors and slow queries for your database. When looking at AI, you... probably going to have a lot of AI calls that ended up successfully, but from an engineer perspective, weren't successful at all because...
the ai hallucinated the ai just returned the wrong answer or i don't know there was a jailbreak like a jailbreak attempt so there's a lot of things that can go wrong with when you call an ai even though from By looking at the metadata, everything seems okay. Yeah. So, so.
it's important point because obviously the typical things that we typically make like red metrics the golden signals and so on are applicable but it's also important to realize the differences or the unique characteristics you mentioned a few like hallucinations like jailbreak attempts i maybe even add other things like i don't know
Bias detection, ethical LLM is a big thing these days that people obviously want to monitor some of them. You need to monitor to adhere to and conform to certain standards. So what do you see from the use cases, both in your startup and in the community work, are the main use cases that are, I would say, ai specific or special specialties that come out in observability you mean
Use cases for AI or like... For observability. No, no, for observability, for AI, but they're not, as you said, the classic one, but rather ones that people might not think of, people that don't come from AI workflow that, hey... If someone wants to, from our listeners, want to take away here and they're new to this field, be aware or at least pay extra attention in these areas that you want your observability there as well. So I think...
Everything that I mentioned before, you can kind of divide it into two types of metrics that you want to monitor. One is kind of the... a classic cost latency error rate but then the other hand you have like quality metrics so everything that we mentioned is kind of measuring the quality of the response and the classic way for measuring quality
Classic. Everything is super new. But the way it became common for measuring quality for AI systems today is by using another AI, another LLM. So this is this so-called LLM as a judgment method where you take... response you've gotten from one lm and then use another lm to judge that response and make sure that the response is correct and and this is kind of this opens another set of metrics and techniques on how to do it
accurately in a cost-sensitive way so that you don't go bankrupt just by monitoring your own AI system. Sounds good. And I think it's worthwhile mentioning also, if you look at the flows, they're single stage, they're multi-stage LLM workflows, maybe worth saying something about the complexity of what you said, because you said also layered LLMs.
right i think i think when you think about quality of ai there's a lot of things that you can measure right you can look at the kind of your lm workflow or agent is a black box you just okay there's an input the user has created the input and then the ai produced an output let's see if given that input this is the output that i expect to get or is this input like the
good one, they want the user to be able to send the AI. And then there are other methods to kind of go into the agent or the workflow and kind of make sure that everything inside was working correctly you can i like comparing it to testing to like classic engineer like classic testing for pieces of code that you write so you have the black post testing but then you can also have unit testing so you can kind of have an agent you can test each step the agent took separately you can test
And you can also test kind of like the overall flow of the agent. So if you think about it, it may be that the agent took the right decision each time, the output was correct, but then the flow, the route that he took to get to that answer was... really weird and really it took a really long time to get to that answer and it took a lot of different kind of path that you didn't expect to take this is sometimes called agent trajectories this is another like the third thing that you want to test
a monitor for an agent execution yeah and i think one of the things that you mentioned because you sort of said correctly that there's the black box and the white box approach i think the white box approach we all know again from the traditional i.t so i'm not going into the traditional i'm saying specifically in ai
there comes another dimension that is the dimension of explainability, which becomes more and more important. And I think here is the white box gets extra important also for being able to then take the reasoning. and report it back to either auditing purposes or to leadership or to regulatory reasons or whatnot, right? Right. But then...
But then this goes all the way back to just we need to be able to log everything that's happening in your AI system, right? You need to kind of have a standard way to log all the prompts, the completions, all the tool calling, every step. along the way that the agent took to get to the answer that it got to at the end
Yeah, so that's like sort of the multi-stage reasoning that when you get the visibility into this specific stage, you can at least try as much as possible. I know it's challenging and not trivial at all, but to try and provide some sort of explainability. your reasoning uh explain the reasoning uh to to justify why uh these things happen so uh and and you mentioned also we said tracing the the thing and i want this
It takes us maybe to the telemetry signal. So again, my audience knows traditional observability, as maybe you call it. So we know the traces, logs, metrics, and other signals. Again, I want to put the hat of the specific AI needs. What would be the types of telemetry signals and usage of these signals in the case of AI? So this is, again, where we diverge from classic observability. So again, going back to the database example.
uh you usually don't really care about like the actual query like the content of the query and everything that went between the database and your services or for example if you look at open telemetry then the uh Instrumentations for HTTP calls or instrumentations for database calls usually don't log the content of requests that are sent across your services.
In AI, it becomes really important. First, because I want all the quality metrics that I mentioned before. This will be another layer on top of the data that I want to collect. But also for, as you said, I want to explain why the AI made the decision.
it did there's a regulatory requirements like in the eu right now you have to log every prompt every completion everything that's went in and out of your ai system so you need to do more you can't just log the metadata of a certain kind of request you need to log what was happening inside so then if you want if you're thinking about ai monitoring and you're thinking about the trace of ai monitoring it's not just the different microservices that went and like how much time each one took
process that request you also want to log the content of each request like the prompts and the completions and of course then the metadata like tokens number of tokens latency another kind of request metadata that you have but the prompt and completion are really important and there alone are enough to to make everything much more complex when you think about how do i use open telemetry to to observe that
Yeah. And I think we should definitely dive into that. So, let's look again. Let's look at tracing, okay? What's tracing? uh ai what does it mean to trace ai apps are we tracing prompts are we tracing the model behavior what is the trace in the context of ai workflow so we're tracing everything that's happening on on the client side so we want to see
tool calling, you want to see multiple prompts, you want to see completions. We want to see everything that went out to the model and every step that you took.
in response to what the model returned to you so let's say you have an agent so how how how do you operate an agent you basically have a prompt that which is the user question with some kind of context that you gave it and then a couple of tools that the agent can call and then you send it to the aic the llm and the llm returns to you hey call this tool and and pass it these parameters we want to log this right we want to log the problem we want to log the
the the request to call the tool with the arguments that the llm asked us to to call then we want to log the tool response and then there's another llm called to kind of okay this is the this is what the tool responded to me now what should i do and and this kind of and fold within the element until we get to the final answer is what will become the trace of the agent execution and contains everything that we want to log.
Interesting. And it's important to say that this is, as you said, it starts on the client side and to the service side. So here we see in IT that many times the trades focus, for example, just on the backend or here, it's very, very important that the context flows.
from the original api call from the from the client side to the server side and back right although when i say client side i'm talking about your back end right i'm not talking i think there's an interesting differentiation between your back end and let's say open ai's back end so we have full visibility when i log everything on your back and everything that i said is happening on your back end but then
I don't have any visibility into what's happening in OpenAI. It's a shame. I would love to get an understanding of what's happening inside. Why, for example, and this is like a real story that happened to us, why when I upgraded to GPT-5, everything was just... so much slower without changing anything on my end i'm guessing it's it's something you know happening in there on their back end but i have no visibility to that so yeah
Client side is my backend in this case. You're calling to the backend of the LLM provider. Yes, exactly. makes sense and you touched before also on uh on metrics again metrics we know them and their value from traditional uh observability but what what are the metrics that you find or the best practices that you see around
AI observability with regards to metrics. So in this case, it's much more traditional, right? You want to see latency. You want to track cost. You want to track number of tokens, which is... kind of similar to cost. And then you can also track things like time to first token if you have streaming requests. And I would say that that kind of basically covers most of the metrics that you can extract from the LLM as is.
Then by applying some layer of processing, you can get the second layer of metrics, which I talked about before, all the quality metrics, right? These are also metrics that you want to... uh log and look at you at the end of the day you want to see like a grafana dashboard which contains cost latency but then also accuracy and and a hallucination rate or whatnot
Yeah, makes sense. And you mentioned cost. I think this is something that I see day in and day out, that this becomes the most significant cost factor with the cost of tokens when you use, of course, the not your own model. but the models provided and suddenly all the infrastructure, the things become negligible when you look at it. So I think here, while we know it from IT, I think in AI, as you said, the amount of tokens, the cost.
associated with that becomes like really, really crucial for the leadership to monitor, right? Exactly. Yeah, makes sense. So that was a bit of a background because I think it's important. And actually... maybe before we go on the technology the specifics i think one more thing is is um
that there are many concerns. Again, I think you touched it briefly, but I want to put the spotlight here because for those who don't know, it might not be intuitive. There are many concerns in AI that are different than classic cloud IT monitoring.
¶ AI observability differences from traditional observability
um like we talked about tracing but you know the amount so the throughput that you'll get of the amounts of invocations per second will not be as many but the latency on the other hand we're not less sensitive to the latency of the call itself because the The LLM takes factors, orders of magnitude longer. So can you tell us a bit how you see the difference of concerns between classic AI monitoring and observability and the AI ones?
Maybe it's a good time to talk a bit about OpenTelemetry, the project we started back in August 2023. So we knew about OpenTelemetry because of our days at Fiverr. kind of leading, helping implementing OpenTelemetry within Fiverr, which was great for us.
and and then when we went out and started trace loop we were looking for ways to observe ai systems and we you know we realized okay we should use open telemetry and we started building there were there weren't any instrumentations for any of the models that were out there you know open ai i think entropic had a kind of a cloud two model um and maybe bedrock has just started there back then and so we we didn't have the weren't any uh
open telemetry instrumentations for these models, so we built them. And then quickly enough, we discovered the issues. Okay, how do we log a prompt? So for us, okay, you know, you want to log... Let's go back. Let's think about, okay, I'm creating a span that will represent a call to OpenAI. What do I want to log as an attribute to that span?
So I want to log the model that I used, right? It makes sense. I want to log the number of tokens that it cost me. I want to log the cost maybe as well. I want to log other request parameters like... I don't know, the service level that I required from OpenAI and other kind of request parameters that I'm sending to OpenAI. And then I also want to log the prompt, the system prompt, the user prompt, and the completion, the response I got in front.
from OpenAI. And the problem is that these can get big. And, you know, with time, we got into bigger models with bigger context windows. So this means bigger prompts. and and we wanted to log them on on the as attribute on the span right is the only thing that makes sense the problem is that open telemetry wasn't designed for big attributes
And so there was a challenge. How can we log that? And we made the decision of logging them as an attribute because then all the post-processing that I mentioned before can easily be done. The problem is that...
All the other backend observability systems weren't designed to handle large attributes because, you know, in their mind, we are handling massive amount of volume of... of uh i don't know requests and spans that are created in systems that handle millions and hundreds of millions of of requests per second and so i cannot process
big attributes because then i will just you know i won't be able to process that many spans a second but if you look at the ai system so you have you you don't have that much scale you have much less spends per second just because each spend takes much longer to to complete right you call an open ai it will take you a couple of seconds no matter what you do uh so you have much less spends each spend costs you much more right each each request to open ai cost
much more than a request to Postgres. So by design, you're not making as many calls to an LLM as you would have made to a database. And so the scale is much lower. And then you may be able to make some compromises on the spend size because you have much more time to process each spend. And so there were a lot of kind of... discussions around, okay, how do we, how do we do that? And when, and when I built, when I founded Gen AI Sieg back in, it was April, 2024.
April 2024, we got into these decisions. And then OpenAI came up with the vision models, with the first vision models. And then the discussion was so much more complex because it's not just prompts that kind of, you know, take... one kilobyte, two kilobytes, now we have images which can wait and...
up until eight megabytes so you have eight megabytes of an image that you want to send on an attribute of a span it doesn't make really much sense so there were a lot of technical challenges that we need to to kind of solve when thinking about how to observe AI systems in a way that will be useful for the end developer.
Makes sense. And you mentioned briefly, and I also mentioned in the opening, so you mentioned the SIG. So we'll talk about a lot about the SIGs. Don't worry. SIG is a special interest group for those who don't know. several SIGs, special interest groups under the OpenTelemetry project, one of which is the one that was a fairly new one about the Gen AI, generative AI in OTEL.
with the aim to define the semantic dimensions. Don't worry, we'll dive into that. But I want to stay a bit with OpenLL imagery. So you talked about, first of all, the pains, and going back to my question about these... concerns in AI that are different than classic monitoring I think you brought it up just to summarize for for audience the several things first of all uh the throughput is we don't about I know millions of invocations per second because anyway it takes
maybe even a few seconds for the LLM to respond. So this puts a very strict upper limit to how many about the throughput. So on the one hand, we talk about much, much lower throughput for traces. On the other hand, uh and also on the latency of obviously because we're less in many many since systems with very very sensitive on the latency uh
And as well as the instrumentation latency adding to it here, latency anyway, again, the invocations to the LLMs are slow themselves. So the rest of the things and all the optimizations around that become much more negligible. On the other hand, payload size, the span attribute size. We talk about something much, much bigger, orders of magnitude bigger than what the traditional observability is because of this.
context that you need to pass along with the uh with the call so i think this is important first of all for the users to understand that on the one hand it's open telemetry on the other and there are aspects that are very different that needs to be considered also as end users also i think as background to understanding why we need all these this this uh brain power to think about tooling and about specifications and semantic dimensions to accommodate or to adjust
these use cases so i i think this was very clear and uh for those who are not living it day in and day out i think it's uh it's very good to understand which brings us to um as you said you started the project open elementary it's an open source project uh currently actually
¶ OpenLLMetry intro
uh it's apache 2.0 right uh if i'm not mistaken run by trace loop uh and don't give us so a bit of background what is open telemetry what's the give us the elevator pitch for open elementary
So OpenTelemetry began as an effort by us to build the missing instrumentations for Gen.AI that were missing from OpenTelemetry. So we started by covering all the LLM vendors that are out there. We continued to... vector databases and frameworks and so now we support almost 40 different providers vector databases and frameworks and we've been collaborating with a couple of them so you know
a couple of vector database owners are the co-maintainers of those instrumentations within open elementary and we've been also collaborating with other observability platforms that that wanted to support Gen.AI as well, namely Dynatrace, IBM, but also Google, Microsoft, and Amazon has been contributing code and supporting us through this journey. I think... so we started with you know building those instrumentations and quickly enough we realized that
it's not just that we're missing those instrumentations from the open telemetry project but we're also missing the current the correct semantic conventions and the solutions to how to log everything that i mentioned before images big prompts and and so on and so forth and so we've kind of defined our own set of semantic conventions that was back in august 2023 with time we've got in touch with the open 10mg community
And we established the Gen AI SIG. It was me and someone named Drew from Microsoft and Ludmila from Microsoft. Back then, now she's working at Grafana. And we began working on the Gen AI SIG in an attempt to kind of... standardize a semantic convention that we've built with open elementary and i think the main and and so the goal for the open elementary project has been
has become has become since then to be kind of the spearheading the open the open telemetry research for gen ai while we use like sig as a as a platform for standardizing the work and the best practices that we've built as part of the OpenLMG open source. So I think it's sort of like two main parts to OpenLMG as I see that one is the... the tooling or the instrumentation part and the other one is more of the
I get the open protocol extension and the associated semantic conventions. Would that be right? And then obviously the latter became or is feeding into the SIG now that we have a special interest. Exactly.
and the last thing we added was an sdk because you know we have the instrumentations if you go to our open source you will see in the packages the directory there's like lots of different directories for each instrumentations and these are all like separate pipe a python package so you can just use them directly but then also we have an sdk because we realize that people a lot of people who are coming from the ai world don't know how to use open telemetry and don't care
So we want to make it easy for them to just install OSDK in one line and then just use it. And then with time, we realized that there are a couple of kind of missing concepts in OpenTelemetry that we need for AI. And so, for example, we built annotations. a couple of decorators on Python that you can put on top of functions that will kind of mark the entry point.
for certain AI flows. So you're looking at agents, for example. So you want to mark, okay, this is an entry point for an agent, and these are entry points for different tools that the agent can call. You want to mark them. in your code so that we can log them and kind of rebuild the trace for you. We don't have these in OpenTelemetry. We can start a span in OpenTelemetry with an annotation, but that's it. We don't have like a notion of an agent or a tool.
that we need for AI. So we built them into the SDK. And then we realized that for AI observability, it's really, really important to log sessions because... As opposed to kind of, again, classic observability, you don't just care about a single trace. There's another concept of a session, which is composed out of multiple.
A session can be like a chat conversation you're having with a chatbot. You want to be able, we need to build another notion, which is bigger than a trace, that will kind of model. this entire conversation that the user had with an eye with in with an ai bot we wanna we have metrics that we want to log for these kind of sessions and so we needed to kind of extend on the concepts that we have in open telemetry to support the needs
of people who are building AI systems. Yeah, and I should say that the need for a session or whatever we will... define or flow or something like that that is sort of the augmentation of several traces together is definitely something that I've heard in the community in the past years as well also outside AI so I think this is a
provides yet another motivation and maybe riding this hype to get more attention to addressing this specification because you see that in other uh even if you look at traditional you know web-based interaction user interaction human user interaction if you think about that i don't know a login process in in a in a
uh on to a website could be not just one invocation of of a okay user login and get the success it could be a multiple multi-stage where you do some sort of a i don't know an authentication first and maybe even you go out to uh
know Google authentication or some other means of external and then you come back and then you go to the service and each one of these is its own trace maybe working with so I think uh on the one hand this is definitely need that I've seen arising in other use cases secondly uh this i think becomes even more relevant when it comes to ai so i i hope this gets us maybe this will be the incentive to get the community to address that across the board um
And you mentioned the decorators and annotations. You mentioned that for Python, right? do you what what programming languages are supported these days can you help uh so we support so python and types are the main languages we support we have some beta support for ruby and go and then we're also working on support for java and then we also built other ways to instrument your applications using open telemetry that we we realized that people will find useful
For example, we've recently released an open telemetry first LLM gateway. It's called Hub. And the idea is that for some companies, it might make more sense to just use an LLM gateway. that will sit between uh the lm uh the lm models and your kind of entire system and so and this can kind of log standard open telemetry spans for every request that's a going through it
And so we've built that. And it's also kind of classic open telemetry. Well, there's semantic conventions and everything just out of the box. When you say classic open telemetry, is that like an open telemetry collector instance or just to make sure? it's written in rust and it uses the rust as decay to output open telemetry spans yeah but then i mean it's it's standard open telemetry using all the semantic conventions that we
we've built with OpenTelemetry and GenAIC. And we've also been thinking about other ways that the OpenTelemetry can benefit from GenAI. So for example, we've, and this is super new, we just released it last week.
we've built an mcp server for open telemetry backends so any open telemetry backhand can now use another open source we build also apache too um can use that, can connect to that MCP server, and then you can basically connect, let's say, Grafana to Cursor or Cloud Code, which, you know, opens a whole set of possibilities on how can you kind of consume your production open dynamic traces.
metrics and logs in your development environment. And so we see kind of the way that Gen.ai... gen ai technologies is developing there will be a lot of these uh integration points between open telemetry and everything gen ai Makes sense. And yeah, MCP, that's hot news from last week, I guess. You can say another KubeCon hot update. Yes. That's great. And do you have for other, if you talk about these... protocols uh any anything else that you have in the works or in the roadmap
I don't know, A2A or anything else? Yeah, we've been also collaborating with all the A2A, Google's A2A, and agency, which is... which was founded by Cisco and now became part of the Linux Foundation, which is another agent-to-agent protocol. We've been collaborating with them all, and we have plans to kind of fully support them as part of open telemetry.
uh including you know propagating traces ids in the headers and everything that's that's needed and we've been also working hard on supporting all the frameworks that are coming out so you know you look at the at the at the space in the industry
At the beginning, LangGraph was the most prominent framework that everyone was talking about. Then came LangGraph, which kind of superseded LangChain. LamaIndex was kind of like the main competitor. But in you know november 2025 we have many many more frameworks that people are using and are becoming extremely popular so uh there's a crew ai uh on in python and master in typescript
and agno in python just to name a few and then also google became it came out with their own agents framework and amazon came out with their own agents framework and open ai came out with their own framework and we support them all and and i say it and it took you know
a lot of time to make sure that we support everything with all the semantics that we need and like the the domain and the market of agents framework today is huge and it's like there's more agent framework than than people building with agents yeah that's true that's uh
that's a lot happening there and uh keeping track of these is uh is a significant uh effort and also i think you know coming also wearing my my hat from the uh foundation hat both in the lf and the uh the lynx foundation the cncf you see that there's First of all, a journey to make these foundational and not vendor led. And secondly, to try and get the gather.
uh influencers and companies around that and i assume that this will ultimately get to a convergence process so we've seen that in the industry in many other things you get the divergence you know the diamond opening up and then the diamond closing uh on the divergence we're now definitely on the diversion side um but good good grasp for you and i think if we talk also about that
They're the main vendors that provide, you know, the LLM vendors. You mentioned OpenAI that you sound like you work a lot with them, with others like Anthropic and others.
how does it work with them like you mentioned before that you don't get the level of visibility from that black box that is the back end of the llms but what do you get what kinds of observability do we get into the workings of these vendors and maybe you have deeper integration with the ones other than over the others i have to hear more about that we none we literally don't get any visibility in what's happening within these models uh it's unfortunate i think
i think part of the reason is that they don't want to kind of tell their secrets and how do we how do they actually run this model behind the scenes because you know today we don't we know that
We know that these models are not just the neural networks that are powering them. We also have a lot of kind of engineering working behind the scenes to kind of... route to the right model or do you know run all those mixture of experts that we have since gpt4 and so we know that this is much more complex than just a neural network but we have no visibility in what's happening
Behind the scene. Unless you're using one of the open source models. So for example, VLLM. It's a popular open source for running LLMs. We've built together with IBM. instrumentation within vllm for vllm so now if you're using vllm you can get full visibility into what's happening inside the server that is powering the llm which is super great so If you're using your open source models, you get much more visibility in what's happening.
that's amazing so and then for the large one the commercial ones essentially so you get the granularity you get is the calls the call to uh anthropic or the call to api this is the granularity of what you can uh you can monitor right uh and you mentioned I think again just to uh emphasize a lot around vector databases right uh because these are for those who don't know these are the
storage for the embeddings that are generated by the LLMs, the embedding models and so on. So can you tell us a bit more about what you do with the vector database space? So again, vector databases, as you said, we need them to... uh for context engineering or for running rag pipeline so i want to send right context to the models to be able to answer a user's questions so i'm storing kind of my whole corpus of data.
in a vector database and then I'm fetching it whenever I need it. So we are logging... A knowledge base, essentially. Yeah, kind of a knowledge base. And so we are logging everything that's coming in and out of the vector database. So for example... the query and the contents of what we're
what was the vector that we sent to the vector database, what were the vectors or the responses, the data that the vector database has outputted, along with the scores. Because, for example, looking at the scores... maybe two sentences on vector databases. The way they work is that they use semantic similarities to fetch the most relevant texts to the query that you send them.
So the way they do it is that they have some scoring mechanism. Sometimes it's just a vector search, sometimes something else. And then... when they return the responses, they usually attach a score to each response. So we log that as well. So it can be used to kind of detect how well your regular database is performing. If you get low scores, then you might need to tweak. Maybe you're missing...
You're missing some data in your vector database for the queries that your users are asking. Maybe something else. We also do that. Sounds good. And I think in terms of when you look at your...
¶ OpenLLMetry latest updates and roadmap
So you mentioned a lot, you've been doing a lot in the integration space and everything. And actually some of these came hot off the presses in some of the integrations that you mentioned with new frameworks and so on. So all of that is really, really new. Really happy to see that coming out. What do you have on the roadmap?
well on the open elementary roadmap uh so i would say so now the open elementary is not just open elementary right it's open elementary plus hub plus the mcp server so we want to continue building and investing time in all all three of them and and for us you know number one priority is keeping up with the industry and specifically for gene i hope it will slow down in 2026 i don't know let's see but uh so far everything is moving fast
There's a new framework every other day, new models, new capabilities that we want to support. And so we want to keep up with this. This is like our number one priority. How can we make sure that we're supporting? everything, every new advancement that we see in the Gen.AI world or domain. And then B, we want to kind of make all the... tooling, all the OpenTelemetry native tooling that we've built work well and be as feature-rich as expected.
and be as useful to people so you know take the lm gateway as an example there's a lot of features you want to build into an lm gateway that we want to build and we're building this year and next year If you think about an MCP server, there's a lot of complexities into, okay, how do you make an MCP server useful for... for the coding agent that is using it. So the more we get users, and we just released it last week, the more users we get.
to use this mcp server the more we understand what kind of tools that we need to expose so that cloud code can actually use and fetch you know the right logs and and be able to solve production bugs more easily for example or like investigate and do root cause analysis for uh outages yeah And that's a good opportunity maybe to tell our audience where can they see all these great stuff where they can check them out and maybe even chime in on the discussion, get involved.
yes so everything is on github so github.com slash trace loop you'll find all the open source the main one is open elementary so you can also go just directly there is github.com slash trace loop slash open elementary and then the other ones are also a part of the organization uh we divided them into you know different languages so you have open elementary dash js
or go dash open elementary open elementary dash ruby and then separate from that we have the hub which is the lm gateway and open telemetry mcp which is the open mcp server And is there a place for people to chime in on discussions, chat, slacks, and other means of communication beyond the GitHub? Yes, we have the open elementary desktop community, which I welcome everyone to join if you want to kind of contribute code.
uh you welcome i'm always there i always if you go there you will see how many kind of answers i'm answering to our community members it's you can easily join at traceloop.com slash slack so we are on slack
And you can also find it if you go to the open source, there's a link there. And we love contributions. I always, I said it here, but I always like saying that everything is moving so fast and just... making sure that we support every new development in every so imagine you have four different models with databases and languages we and framework and frameworks we support across multiple languages
It's not easy to always make sure that we support the most up-to-date SDK, the most up-to-date API changes that these providers are making all the time. And so we love contributions and we try to make the process as painless as possible. We have tests that are making sure that we don't break existing behaviors for our users. And as long as the tests pass and lean past. I will make sure to kind of merge any PR and release new versions as fast as possible.
Amazing. And, you know, wearing my CNCF ambassador hat, obviously, and talking about open telemetry, the natural question would be, is this project that is currently hosted by TraceLoop? uh there's intentions to make it part of the cncf and specifically part of the open telemetry project i know the short answer is yes there is an invention but do you want to update the audience where where things are
Yes, there's an intention. I think it takes a bit more time to actually do that because there's a difference in speed, you know, open telemetry.
as a project which is much more mature than open telemetry moves fast moves slow uh because you know it needs there's a lot of constraints around open telemetry and open elementary needs to be able to move fast because it operates in an industry that moves much faster than like the classical ability industry and so we want like we are working on measuring it upstream but at the same time we are still working on open elementary because we want
Like for us, number one priority is to match and to be and support everything that is available on the GNI industry.
¶ OpenTelemetry GenAI Semantic Conventions SIG
makes sense and talking about the again on the on the cncf side and open telemetry side we talked about the uh uh open telemetry llm semantic conventions uh sig so uh it's been just as a background it's been like hotel had suggested goal to introduce semantic conventions for modern AI for some time. And I think this was the right way to plug it in. I think it even started as like LLM.
Otel LLM Semconv, right? The LLM Observability Sematic Inventions. And then it was rebranded as a generative AI observability or Otel Gen AI instrumentation. But it's essentially the same thing, right? Right. And so... the the the semcon is there it's a semantic convention it's there since uh you said april last year so it's already over over a year now yeah a year and a half ago uh and there's already uh
Again, for the audience, the SIG means that you have a project board. You can see what's been worked on. You have meetings, regular meetings. You have the Slack channel. So maybe you can give a bit of a background about just for people.
to chime in and follow the same maybe even join yes so uh there's the cncf uh slack workspace that you know everyone can join and if you search at google for the open uh telemetry gen ai sig you will find uh all the meeting summaries so everything is public i think this is what i love about open telemetry you can easily people ask me sometimes can i how can i join the open and what you see
you can join just search online you will find all the documentation if you go to the open telemg community repo so it's github.com open telemg community You will find all the information there. Everything is public. All the meeting summaries are documented there. There's a Google Doc. that we keep up to date with all the meeting summaries. And there's a Zoom link you can join. It's Tuesday at 7 p.m. Israel time, which is 9 a.m. Pacific.
So you can join the meetings. It's every week. We also do every other week kind of APAC-friendly meetings, which will be at, I think... 7 a.m. Israel, which is 9 p.m. Pacific, for people who are in East Asia. So we really want to have everyone from no matter where we are in the world to join the SIG and influence the decisions on how should we kind of instrument the Gen AI-based applications.
And who's, just to get a sense, you mentioned a few names before, but who's on the SEMCONF SIG, who is involved these days just for people to get a sense? So it's actually, it's really... quickly enough, became one of the largest SIGs in OpenTelemetry, in my opinion. Currently, we have their folks from Grafana, Microsoft, Google, and also other Gen. AI, it was really startups, you know, Arise.
and epidentic ai and we have really really people from all across the industry joining on a weekly basis discussing uh how to do air accessibility which is really cool if you think about it Yeah, for sure. And the output, just for people to understand, if you summarize the mission statement, what the outputs, the collaterals, the deliverables, sorry, that it's supposed to be producing?
we want to define all the semantic conventions that we need and we also want to work on the implementations for all the instrumentations so as we have a lot of them in openlm2 we want to figure out ways where we can eventually either merge them upstream or keep them within the openlm repo and and kind of standardize them there
Okay, sounds good. And for those who don't know the semantic convention, I know it's a bit abstract for many who are not involved with that. But ultimately, once semantic conventions are stabilized, they're then baked into, and then they're stamped, then...
each and every one of the sdks and the and the agent auto instrumentation agents and whatnot will then uh down it will sort of cascade down and they will support that so once and i'm also a lead for example in the cicd semantic conventions and in the service and and the resource metadata so
and deployment metadata. So I'm saying all of these, it doesn't matter which aspect of the system it is. Once the semantic conventions are there, Then they're being propagated into all the implementation side, both on the pipelines, on the auto collector, both on the generation side of the telemetry with SDKs and agents and whatnot. And then you get it out of the box. That's essentially the beauty of it. So for those who are less familiar with the Semantic Legends, because it's
For many people, it sounds like, yeah, it's very abstract. Why do I care about it? You care about it because once it's stabilized, you get it out of the box. And the other part where you should care about it is that when it's not yet stabilized, you can influence it. you can get involved. If you have a passion into the topic, please join, join Nir, join the community, join the calls or join the Slack channel for the async part of the discussion and help us shape.
these semantic conventions in a way that will best describe and enable your own workflows and your own applications. Yes, exactly. I think semantic conventions, yeah, sometimes it's hard to understand because at the end of the day... the output is a constant in Python. You have like a const which says, okay, what is the name? When I'm creating a span for an LLM call,
what is the name of the attribute of the request, of the model requested, okay? This is like an example of a semantic convention. And the type, and if it's an enum, then the values. Yeah, exactly. The reason it's important, what... okay so first you can think okay why why would i want to influence it and the reason you want to influence it because if you have something that is important for you to you want this to get logged then you want to go and come to this meeting and say hey
i want to log the a service level agreement that i'm sending to open ai this is like really specific to open ai but i want this log because it's really important for me okay comes with some of the conventions and bring you know and tell tell us about why is it important and then we can add it to semantic conventions which means that every opening instrumentation will automatically log that the good thing is that once this is stabilized
all the backends can use and can rely on this information to do all sorts of things to help you monitor your systems better. So if everyone knows that the requested model is in... gen underscore ai dot a model dot request request dot model then they can use this information to build dash to build automatic dashboards for you and to kind of surface information more easily if
on the other hand, each instrumentation logs this data in different attributes, then the backends can't really use this information and produce all this kind of nice tooling on top of it. So this is why it's really, really important. And this is why... Once it's stabilized, all the users can get a lot from that.
I think you explained it beautifully. And maybe one last question before we move on to the last part. Where can people actually follow Junia and maybe get in touch with you after the episode? So, first of all... I am available on LinkedIn. You just search for my name on LinkedIn. And then you can also find me on Twitter. I'm fairly active there. It's near underscore GA.
I'm also active on TikTok, even though I don't, I'm not Gen Z myself, but I started, I'm uploading these videos on LinkedIn for a while, talking about AI, Gen AI, observability and everything. and and at some point someone told me yeah you should just upload it on on tick tock as well maybe it will work
And it worked. So now I'm a TikTok influencer as well. So you can also follow me on LinkedIn, Twitter. I'm sorry, on TikTok as well, near underscore GA. I think I'm like, people... send me emails you can you can reach out to me even on my kind of work email near at tracer.com people sometimes ping me they say like hey can you talk to me can you walk me through the open source project i want to contribute and i'm saying i say sure yeah like let's go on a kind of a
30-minute Zoom call. I will walk you through the project. I will help you set it up so you can kind of contribute code. So if anyone wants my help, feel free to ping me. I will just send you my canadly and we can book some time.
Sounds good. And also, of course, on the show notes, I will put some references to follow up with Neil. And with that, I'd like to switch to a few quick breaking news. Neil, please stay with me. I'm sure that you will probably have some interesting insights on that as well.
¶ KubeCon updates: CrossPlane, Knative, Dragonfly, in-toto reached CNCF graduation
hot from uh kubecon beyond all the goodies that came with the open elementaries releases and the sig uh i guess that i will at least focus on the on the graduated projects i think a k-native as graduated for those who don't know middleware for building deploying managing modern serverless workloads on Kubernetes that takes care of all the auto scaling routing event delivery all of that stuff so
I think this is a major milestone for a project that has been around for quite some time. Actually, we recently had an episode here on the show about CADA, which is another CNCF project in the space. I'll share the link for that on the show notes as well. Kudos to all the maintainers bringing Knative to graduation, to maturity. And a few on this also cross-plane project reached graduation under the CNCF.
uh really nice stats like over 3 000 contributors from more than 450 organizations really uh really impressive uh momentum around cross plane top 10 of all the CNCF projects for contributor engagement and authorship. Really, really nice. And it nicely aligns with the V2.
a major release that crosspoint just had uh i think three months ago if i remember that i think the most notable was the composition and how you can now include any communities resource not just the cross-plane managed infrastructure and the vast integration just like in uh that we talked here the the integration is key so having integration with the equinities and helm and argo and flux and you know uh opa prometheus backstage whatnot this is also again a very
clear sign of the maturity of the project. We also had graduation of Dragonfly and Intoto. To be honest, I have less experience directly with them to comment on them. But maybe Nir, do you have any insights on any of these or your thoughts on these?
uh graduation of these projects uh it's uh i think it's great to hear that uh these projects projects regulation i think like uh i love the cncf kind of community and i think you know you're looking at looking at like these organizations that are helping the open source community prosper i think open like thinking about open telemetry. So I think CNCF is clearly becoming like the, you know, the top organization and the place where you want to be working on open source projects.
Yeah, definitely. And maybe the last note is that we at the CNCF, we just released a report with Slashdata about the state of cloud native development. Also interesting to see. I proposed the momentum. one of the more interesting stats that I've seen, I've been preaching that finally I have the quantitative measure, is that it's not just for DevOps. It started from there and people think Kubernetes and Prometheus and whatnot. But actually, 77% of backend developers reporting
using at least one cloud native technology. So this is to demystify this thing that no, it's primarily DevOps. I think it was very nice to see 77%. And we see also growing uses hybrid cloud usage. uh reason from two to thirty two percent almost a third across all developers
use a hybrid cloud model, also very interesting, and also multi-cloud deployments getting to 26%, again, over a quarter now. So really nice to see, because we all talked about Kubernetes being sort of a... common language in a way that abstracts across it doesn't matter which infrastructure you use here you can really see that in action as part of that so check out the stats i'll share obviously the the links there on the um on the survey and the the results
there, the state of cloud native development. That's all we had for today's episode. We're on time. So, Neil, thank you again so much for joining me on this episode. Thank you very much.
¶ outro
Great. And I'd like to obviously thank all the listeners and the viewers for the episode. As always, all the episodes are available on your favorite podcast app or on YouTube. So just...
Check out Open Observability Talks and you'll find it. You can also follow us on Blue Sky, on LinkedIn, on X. uh at open observe to uh stay up to date about uh about these topics about the episodes about sometimes we do also live streams so maybe you can make catch these live streams and just carry on the conversation I'm Dutan Horwitz. Thank you very much for listening and see you on next month's episode. And until then, may the open source be with you.
