Hello, and welcome to the Data Engineering Podcast, the show about modern data management. If you lead a data team, you know this pain. Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work.
Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data while keeping it all secure. Type a prompt like "build me a self-service reporting tool that lets teams query customer metrics from Databricks," and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos.
Check out Retool at dataengineeringpodcast.com slash Retool today, that's r e t o o l, and see how other data teams are scaling self-service. Because let's be honest, we all need to retool how we handle data requests.
Your host is Tobias Macey, and today I'd like to welcome back Lucas Thelosen and Drew Gillson to talk about the application of semantic layers to context engineering for agentic analytics. So, Lucas, for anybody who hasn't listened to the past episodes, can you give a quick introduction?
Yeah. Thank you so much for having us here today.
I was a data analyst for many years, ran data teams at a few different companies after that, and found my way to Looker to be part of the executive team there. Looker is a BI tool that then got acquired by Google, and at Google I had the opportunity to run the data analytics and AI product team. So that was a really interesting time. And with my co-founder Drew, who will introduce himself in a second, we got to see what is happening and what is about to happen. I'm a passionate chess player, and so I thought about what is seven moves out. What is gonna happen next? And that's where we thought, you know, we need to leave Google, because we can move much faster if we are out there building the product for what will happen a few moves out, a couple of years out. And so we left and built Gravity, with Orion as our product. I'm very excited to touch on that a little bit here today, but before I get too excited and go too deep into it, let me hand it over to Drew. Quick note, I'm a father of four daughters here in Boulder, Colorado.
Thanks so much, Lucas. So similar to Lucas, I spent years in the Looker ecosystem, first as a customer, and then I ran a forward deployed engineering team helping Looker's largest customers be successful with the product. And then I joined Google through the acquisition. Lucas and I have worked together through both the Looker journey and the Google journey for many years now. I finished my career at Google in the Cloud AI product group, and then we left together in April 2024.
And as Lucas said, we were just so excited when we saw the capabilities emerging with the large models that we have access to today that we imagined a very different and exciting future for the analytics and data engineering space. We'll tell you more about that today.
As you were looking ahead, you decided that you needed to use gravity to make some waves?
Yeah, absolutely. I mean, Google is an amazing company, and I love the people there, but you're constrained when you're operating inside it. Cloud alone is over 50,000 people.
If you can be out here, you can test all the different AIs, all the different LLMs, connect to all the different databases, and work with teams not just in the Google ecosystem but also, you know, Snowflake, Databricks, and the dbt community, which is amazing. You can just move so much faster and learn so much quicker.
It is interesting to think that there is an alternate reality where, if OpenAI had not decided to expose GPT-3 as a chatbot, we may actually not have gone down the path that we've gone down over the last couple of years. It might still be that inside of Google we had, maybe, a GPT-4-class model, but perhaps nobody else would have gotten the benefit of it. So I think that would have been unfortunate. I'm glad things happened the way they did.
And so digging now more into what you're building with Orion at Gravity. We've talked a little bit about it in past episodes and I'll add links in the show notes, but I'm just wondering if you can talk through some of the capabilities that are unlocked by the introduction of these sophisticated transformer models and some of the ways that as we've gone through the discovery journey as an entire ecosystem, how you have worked through some of those early learnings in terms of how to actually use these models effectively, particularly for a use case where you need to have a high degree of trust in the output in order to be able to drive a business in a particular direction?
Yeah. I mean, trust in any enterprise is the most important thing, right? Once you lose trust, it's really, really hard to ever gain it back. So we had to make sure, first and foremost, that we will never lose that trust, that we're 100% accurate. And with AI, right, that presents some unique challenges. We can dive a bit deeper into that in a moment. But what we got to see and experience through leading the team at Looker and then at Google was that we got to work with Walmart. Before the acquisition, we got to work with Amazon. We got to work with Disney and Uber and all these larger, really cutting-edge data companies out there. And the themes we saw were the same over and over. Every time we came in, we found hundreds of thousands of dollars a week in savings or additional revenue opportunities, where things just weren't allocated correctly, right? And it wasn't really the data team's fault that these were not uncovered. They just didn't have the bandwidth to actually dig deeper and ask, well, why did revenue change? Why wasn't inventory in stock over here when it should have been?
And so as AI matured and the LLMs became much more capable, we thought, now we can actually build sophisticated AI analyst patterns of, you know, how to perform great analysis to uncover the why and then the recommended actions to take, which I think has been missing a lot of the time too. We built a dashboard, people look at it, the numbers are green, and nobody does anything. But actually peeling back a couple of layers and discovering what is going on under the hood, why these things are happening, even a green number can tell us a lot about what we can do better in the business. That was the starting point of Orion, and of course the technologies have matured a lot since then, including our own, over the last year and a half or two years. And now it's just phenomenal how it can build ad hoc dashboards and slide decks, prep you for the next meetings as it sees them coming up, right, make sure your account managers are fully informed before they meet with the customer. All these different use cases are now popping up.
And I think it's important for the data engineers and AI engineers and analytics engineers listening to the podcast to understand that our focus is really on making sure that all the great work that you've done over the last, you know, ten to fifteen years is used by the business as the business gets access to these amazing new tools. And so we're really not looking to change the hard work that goes into providing a clean data pipeline that allows the business to ask questions. We're more focused on making sure that the folks in the business are asking great questions, because part of the software coaches them to do so, and that there's a feedback loop that's maybe a little bit tighter than it used to be to make sure that the data products that you're providing are exactly what your stakeholders out in the business need. And that's something that historically hasn't really been well served by business intelligence.
And that the business understands how to use the data. Right? That's the other thing. I think on a really well resourced team you can have an analyst sitting right there with the decision makers, telling them, hey, I saw this spike over here, I saw this drop over here, and this is what I think it means and what we might wanna do about it to test if we can change the numbers. You know, with the amazing context engineers now helping put this together in Orion in a governed way, Orion is then able to actually tell, you know, the head of marketing or an associate on the inventory team, here's what I'm seeing, and here's what you could do about it. I think that is really important, because
we invested so much in data. Our data engineering teams built pipelines, we set up data warehouses, we built all these dashboards, and still people didn't necessarily take actions. That's really the gap we want to close. So can we provide a platform where you can do the context engineering that then helps Orion provide one on one guidance to the business users on here's what I'm seeing. Do you have any questions about this? Here are the actions I think you should be taking.
And so one of the key aspects of what you might think of as context when you're talking about things like data warehouses and data analytics is this idea of the semantic layer that's been gaining a lot of popularity alongside tools like dbt and Looker and LookML, just the more software engineering oriented aspect of building these analytics workflows.
And given some of the lessons that you've learned from your experience at Looker and at Google, how has that concept of the semantic layer and semantic modeling informed the ways that you're thinking about the product and use cases and user experience around the Orion product?
Yeah. So I was the head of data for a company when I discovered Looker back in the day. My idea was, I need every team to be able to self-serve, so I need to create a foundation that is the source of truth where they can explore in a safe, guardrailed environment. And I think that still is very much what we need. In any enterprise out there, you as a data leader wanna provide the guardrails, but then also let people go and explore, because a lot of the time they know the questions they should be asking or where to look a bit further. Right? They have a lot of domain expertise. And you as a data leader don't have all of that, but you understand the data really well. So as we were leaving Looker and then Google, you know, there are some challenges with Looker and LookML. There are some challenges with all context layers and data model layers, and that is that keeping them up to date can be a lot of work. So what we now did with Orion is that it actually reads your model files, your dbt model files, it reads any kind of dashboarding tooling you have, including Looker, of course. It reads the metadata in your database, and it puts together an understanding, its knowledge base, of your data. That is about 50% of Orion. The other 50% is the business context. It quickly builds out the business context and an understanding of the functions that it serves. And then you, as a data leader, can say, I agree or I disagree over here on what you learned. Those are really good guesses, you know, that Orion made based on the patterns it sees, but then you have the control of saying, no, over here we wanna do this, or actually, this business we discontinued.
Ignore this. You know, there's just no historic data about it, so that's why you didn't see that. So you can really quickly get to that controlled environment, and we actually went to the point of partitioning Orion. So you can run a multi-tenant instance, or you could call it custom agents if you want. A couple of people have asked if they can call it custom agents because I think they have descriptions that they wanna build a custom agent. So you can build these custom agents within Orion where it's guardrailed, and you say, okay, for the executive team, you know, this is how Orion should think. There are some nuances to how I want you to deal with things. Or this particular custom agent or project is for one of our customers, and in that, we actually want you to handle things in x y z way, and there the access is restricted.
They can't ask any questions about competitors, right? Those things are all set up within Orion, so you can be, I use this analogy, the conductor of the orchestra. You know, the orchestra is all these different tenants of Orion executing for you, and you control at the knowledge base level, at the context level, who can see what and how they interact with a very, very capable AI to get the data stories.
I'll add a little bit of color about how our understanding of the value of a semantic layer has evolved as we've gone on this journey, because I think that's always very interesting, particularly because we've been on this podcast now a few times talking about it. So even in the last six months, I think our understanding of the different sources of context has evolved quite a bit. I think we've learned that a semantic layer is one source of context, and it's often a really good one. But it's not necessary, and it's not sufficient. It's not necessary because we've seen in the latest class of frontier models that if you have an LLM with access to a well-described schema, your key business terminology, and potentially common query patterns and a few other details, the LLM is very likely gonna write correct SQL without needing a compiled semantic layer. And it's not sufficient because even if the semantic layer was perfect and everybody agreed upon it, it still doesn't tell you why a metric matters and who cares about it. And so that aspect of context engineering is extremely important.
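As a rough illustration of the inputs Drew lists (a described schema, key business terminology, and common query patterns), here is a minimal sketch of assembling them into plain-text context for a model. The function name, section labels, table, and glossary entry are all hypothetical and are not taken from Orion.

```python
def build_sql_context(schema, glossary, example_queries, question):
    """Assemble a plain-text prompt from schema, business terms, and query patterns."""
    # One line per table, e.g. "orders(order_id, customer_id, ...)".
    schema_lines = [f"{table}({', '.join(cols)})" for table, cols in schema.items()]
    # Business terms carry the "why it matters and who cares" part of the context.
    term_lines = [f"{term}: {meaning}" for term, meaning in glossary.items()]
    sections = [
        "SCHEMA\n" + "\n".join(schema_lines),
        "BUSINESS TERMS\n" + "\n".join(term_lines),
        "COMMON QUERIES\n" + "\n".join(example_queries),
        "QUESTION\n" + question,
    ]
    return "\n\n".join(sections)


# Hypothetical inputs, for illustration only.
prompt = build_sql_context(
    schema={"orders": ["order_id", "customer_id", "amount", "ordered_at"]},
    glossary={"net revenue": "sum of orders.amount minus refunds; owned by the finance team"},
    example_queries=["SELECT SUM(amount) FROM orders WHERE ordered_at >= DATE '2025-01-01';"],
    question="Why did net revenue change last month?",
)
```

The resulting string would be passed to whatever model client is in use; that call is deliberately left out because it depends on the provider.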
And I just wanna make sure that we talk about it because for the folks listening, I think it's not necessarily intuitive to go gather that kind of information about your organization. The context that we gather about how the business operates has, in some respects, very little to do with the data itself. And I think you can provide a much richer and more valuable experience to the business user if you take the time to go and gather the information about what it is that they are trying to achieve, what they are accountable for, what it is that their boss expects of them, and what actions they can take in the business. So what is their span of control? A lot of our product thinking is aligned around that because there's a huge opportunity there. There are many different ways we can structure data and provide context about the data. dbt is one of them. LookML is one of them. There's a bunch of other options, but it's only a small piece of the puzzle. And had you asked me that a year ago when we were on this podcast, I think I might have disagreed. It's just fascinating. Of course the space evolves quickly, and I like hearing people on podcasts talk about how their opinions change. So that's a changed opinion on my part.
And the idea of context engineering is possibly one of the most frequently used terms that I have heard both in the podcast and in my day to day work over the past six months to a year. And, obviously, the semantic layer is a key element of that context, but there are also a number of organizations that haven't actually built a dedicated semantic layer yet. And I'm just wondering how you're helping them bootstrap some of that by virtue of having some of these other elements of context that you can ingest such as warehouse schemas, dbt, or LookML code bases, doing some data sampling from the tables and things like that, and also some of the elements where you need to have a human in the loop in order to build a proper representation beyond just the mechanical and kind of physical elements that you have access to programmatically.
Yeah. I mean, you know, quick flashback to back in the day, right? Ten years ago, we landed one of the biggest Looker customers we had at that point, and I flew out there with our team. We were just like, we'll do everything possible to make them successful. And we got there, and they had 6,000 Tableau workbooks sitting on thousands of materialized and temporary views on tons of tables underneath. We were just like, shoot, even if we get a warehouse full of whiteboards, it would take us a year to map out this spaghetti net that they built, you know, with acquisitions and all these different things. How are we ever gonna build a semantic layer out of this? And I mean, I couldn't sleep. My team wouldn't sleep. We had made them promises, and they were gonna go IPO and all these things. So now, you know, we connect Orion to the spaghetti web of things going on, and, man, it's having a field day. It's not perfect, of course, if you have gigantic, messy data. Right? If you have a beautiful semantic layer, man, you're in a really good spot. You're golden, and it will be fantastic. If you don't, it is still really good. So we just started with a bloodshot company this week where we connected, and Orion did this initial report, you know, and it was phenomenal. They were blown away by the things it found and what it recommends using and doing really quickly. Right? It was like, over here, we actually discontinued this business. And so just a couple of clarifying questions, and we're good to go. And that is, you know, within a span of less than ten years. And I could have picked an example from just a couple of years ago. It's just amazing to see this rapid change. To get to the second part of your question, I was ready for Drew to jump in on data model recommendations and where to store logic, so I didn't jump on it. Drew, do you want to take it?
Sure. Yeah. I can answer that question. So I think that now that we have a fundamentally unstructured approach to the world, to the digital world, we have a model that can reason over a whole bunch of inputs with an enormous context window that, to Lucas' point, we can now connect it to a customer who has, let's just say, a normal operating environment. Right? Because a normal operating environment has a whole bunch of stuff in it, and it sometimes looks messy. That's normal. And the model surprisingly can make correlations and
assumptions that are pretty good. Now you have to validate them. There's no doubt about it. And there's also, if you, for instance, just over index on some signal like query
patterns, let's just say, it doesn't necessarily tell you much about the unknown unknown opportunities in the business if you were to just learn based on what queries people are running. But I guess what we do is we absorb as much as we can through what we call ambient listening into both our graph database that runs underneath Orion as well as more traditional structured databases.
One of the most important things that we tag as we learn about the environment that we're operating in is recency and frequency, which are fairly well understood concepts. And then just feedback, you know. So if we run in a channel and somebody issues a correction, like, oh, don't use that field when you report to Bob because Bob needs to have the other field, that's an extremely important signal that will get filed away and cross-referenced with the project that the person is currently working on in Orion. So Orion has a concept of projects where you have assignments that run within it, all with their own context store and memory associated with it. And then when the system runs that analysis or a similar analysis next time, it will find that the information about the preferences of the stakeholder has been recorded, and that will be preserved for the next time around. It's hard to visualize this because it doesn't look like a traditional database. In some ways, it looks kinda like the messy operating environment out there in the real world. We have a lot of work on retrieval going on. Our former Google engineers, of course, are really good at retrieval, which makes sure that we surface those facts and then bring them to bear when they matter.
I think that it's equally important to know which facts no longer are relevant when you do this kind of work, and that's one of the things I would caution those folks who are trying to do this themselves to really think through, because it's easy in the first few weeks to build a really compelling AI data agent. But as time goes on, the context that it collects becomes a technical debt of its own, and the management of that technical debt can be very challenging. If you remember that Bob actually only needs that field through the end of the first quarter, that's gonna lead to a very different and correct experience in the second quarter; if you don't, you get the opposite, which is not gonna be what Bob expects.
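To make the point about stale context concrete, here is a minimal sketch of feedback notes that carry a recorded date and an optional expiry, so a correction like the one about Bob stops applying after the first quarter. The ContextNote class, its field names, and the sample notes are hypothetical; how Orion actually stores memory is not described in this conversation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ContextNote:
    text: str
    recorded_on: date                   # recency signal
    valid_until: Optional[date] = None  # None means no known expiry
    times_used: int = 0                 # frequency signal


def active_notes(notes: List[ContextNote], today: date) -> List[ContextNote]:
    """Return notes still valid on `today`, most recently recorded first."""
    live = [n for n in notes if n.valid_until is None or today <= n.valid_until]
    return sorted(live, key=lambda n: n.recorded_on, reverse=True)


# Hypothetical notes, for illustration only.
notes = [
    ContextNote("Use the other field when reporting to Bob",
                recorded_on=date(2025, 1, 15), valid_until=date(2025, 3, 31)),
    ContextNote("Ignore the discontinued business", recorded_on=date(2024, 11, 2)),
]

in_q1 = active_notes(notes, date(2025, 2, 1))   # Bob's preference still applies
in_q2 = active_notes(notes, date(2025, 4, 15))  # Bob's preference has expired
```

Putting the expiry on the note itself means the filter runs at retrieval time, so nothing has to remember to delete the note when the quarter ends.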
In terms of the actual manifestation of the semantic layer, obviously, everybody has their own opinion about it, but I'm wondering what you see as some of the most effective ways of actually representing it and storing it, particularly given the fact that you want to expose it for retrieval by these agentic use cases.
Yeah. Text. Really, it's text. It's unbelievable what you can do with a frontier model that has access to simple Unix command line tools to interact with text. Now, of course, you need to tag the text with some additional attributes, for instance, like to do recency analysis.
But at the end of the day, text is king. I mean, these are large language models after all. I think what's really interesting is that the amount of research and development that's gone into embeddings over the last couple of years has been, in some ways, less important than I thought it would be because the model capabilities have just caught up so quickly.
And it used to be that you'd have to think through very carefully how you would embed chunks of unstructured text to make retrieval easier. But as it turns out, in the last six months, the models have gotten so good that if you just give them a giant, essentially, data lake of unstructured text and tell them a little bit about it, how it's organized,
the filing system that may have been at play when it was encoded into the storage bucket or whatever, it'll happily just crawl through it with grep, awk, sed, and all sorts of stuff to just manipulate what's in there. And in some cases, that would work even better than a complex embedding strategy. So I think that's pretty cool. So I would, I guess, wrap this point up just by saying let's lean into the strengths of the models, which is to manipulate language.
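As a small sketch of the text-first idea, the function below does a grep-style scan over plain-text notes and orders the hits by a recency tag instead of using embeddings. The "updated:" header convention, the file paths, and the note contents are assumptions made for illustration; an agent would more likely call the actual Unix tools Drew mentions.

```python
import re
from datetime import date

# Hypothetical plain-text notes; the first line of each is a recency tag.
DOCS = {
    "finance/revenue.txt": "updated: 2025-02-10\nNet revenue excludes refunds and the discontinued business.",
    "marketing/glossary.txt": "updated: 2023-06-01\nRevenue here means gross bookings.",
    "ops/inventory.txt": "updated: 2024-09-12\nIn-stock rate is measured daily per warehouse.",
}


def grep_recent(docs, pattern):
    """Return (updated, path, line) for every matching line, newest file first."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path, body in docs.items():
        header, _, text = body.partition("\n")
        updated = date.fromisoformat(header.split(":", 1)[1].strip())
        for line in text.splitlines():
            if rx.search(line):
                hits.append((updated, path, line))
    # Tuples sort by their first element, so this orders hits by recency.
    return sorted(hits, reverse=True)


hits = grep_recent(DOCS, "revenue")
```

Here the two definitions of revenue both surface, with the more recently updated one first, which is the kind of conflict the recency tag is meant to help resolve.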
And in some ways, think of all of the harnesses that we've put around the models over the last two years, which we inevitably will try over the course of a few weeks or quarters. We'll try one approach. It'll work well for a long time. The next model checkpoint comes out. We have to try something slightly different. So it's just been this, you know, game of constant iteration for us, and I'm sure for everyone else, over the last couple of years. But what we are finding is that as the models get better, you need less of that stuff. And I suppose that's the point that I'm making. The models themselves are getting pretty good at manipulating a giant amount of text.
Yeah. And I think as a guide, right, we're not advocating for one data model over another or, like, you know, where you should put things. I do think it's important that it should be accessible. You know, you want to make sure that you're not locked in. And, I mean, I think that's one of the challenges, and Looker has been criticized for that forever, since the beginning. Right? You have LookML, and LookML is proprietary.
There are some other BI tools that have their own proprietary data models. Like, why do that? Right? You want various AIs to use it, Orion being one of them. So I think it's about making it accessible. And then you can do cool things. Orion is ambient learning and always thinking about, hey, this is maybe not classified correctly, or this might be different. So it also reports back to the data team and lets them know that there might be some things you wanna change, and here are the suggested descriptions you might wanna add. Right? So you can then copy and paste that over if you decide it's correct. And I think that's interesting, right? This is one of the big takeaways from the decade with Looker. Nobody ever ran their financial reporting through Looker. Like, no company out there ever ran their quarterly earnings report through Looker. And it's like, well, here we are priding ourselves on the greatest data model ever created, yet we're not doing that, you know, the very basics. Drew pointed that out to me a couple of months ago, and I was a diehard believer in data models. I still think it's a good thing to do, but that really stuck with me. So why is that the case? Well, because we don't keep it updated, right? We don't maintain it as much as we potentially should, and so let tools like Orion help you with that. And the best way to do it is by making it accessible, so it's not locked down with one specific vendor.
In terms of that self reinforcing improvement loop, there are a number of ways that that can factor in, one of them being populating and maintaining that semantic model as you get feedback and corrections. Another one that is gaining a lot of popularity recently is the idea of agentic memory, which can come in different formulations.
And I'm just wondering if you can talk through some of the elements of self improvement that you've built into the system to allow for that reinforcement cycle to be able to course correct and say, no. This wasn't correct. This analysis was made up, or you're generating data over here that doesn't actually belong and just some of the ways that you're fighting against some of the potential drift both in terms of model capabilities,
but also the semantic drift that is just the nature of businesses evolving.
Yeah. Let's paint a picture here for the audience. Of course, I wish that we had a screen to share as well, but we'll try. So there are all sorts of little details in the UI that make that very easy. For instance, you can select text in the output or perhaps a section of a slide in the output, and there's a flyover that comes up that gives you a few options. You can ask Orion to explain where it sourced that particular
fact from, and that opens up essentially a full lineage. And so we really believe strongly that this should be a glass box, not a black box. And we are able to trace all the way back to the input data how that particular statement came about, and that would be traced through various transformations, potentially Python scripts, and any number of things. Right? So that's very helpful. And if it happens to be something that you wanna correct or make a note, potentially, the interpretation
is slightly different because you know something that the model doesn't. You can do that in a couple of ways. You could just simply give it a thumbs up or a thumbs down. So that would be maybe the quick thing if you had to rush off to a meeting. Or if you had more time, you could provide some specific feedback that would then get encoded into either the memory or potentially a wiki entry in our knowledge base that had to do with that particular part of that analysis.
And then the next time that that analysis runs or a similar analysis runs, that feedback is going to be incorporated. And so it gets compacted regularly with all of the feedback from all of the users who are using it to essentially learn dynamically based on how you interact with it in the user interface. And we spent a lot of time trying to polish that journey.
With the various points of input that you have, one of the other challenges when you're dealing with AI and agentic use cases is the risk of exploding the context budget or stuffing too much information into the context and running into the catastrophic forgetting problem.
And I'm wondering what are some of the failure modes that you have experienced as you have gone on this journey of trying to think that seven steps ahead and figure out where are we going, how can I build towards that eventual future that we all want to be in where the AI does all my work for me, and I could sit back and twiddle my thumbs and just some of the hard engineering or just failure modes that you've had to deal with in order to be able to make sure that the system is resilient and can
maybe even self identify when it's starting to go down a wrong path?
I think providing the input to the model about its own token usage is critical for those who are building these types of systems or playing with them on your own. You know, we also only show the beginning and the end of a file, and then it depends on the file, but we have essentially an algorithm that determines how much
and which sections of the file we'll show to the model. And as I said earlier, because the models are so good at manipulating text, if they see something interesting, they have the tools to then go and continue to follow that thread and potentially get the full file. But as they do that, they're receiving feedback about how much they're using in terms of tokens,
which will then guide their behavior. And that's very important because if you don't do that, they could make one tool call that then blows up the context window and then game over.
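The two ideas in this answer, showing only the beginning and end of a large file and telling the model how much of its budget it has used, can be sketched as follows. The class name, the four-characters-per-token estimate, and the message formats are assumptions for illustration and are not Orion's actual algorithm.

```python
def preview(text, head=3, tail=3):
    """Keep the first and last few lines and mark how many were left out."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... [{omitted} lines omitted] ..."] + lines[-tail:])


def estimate_tokens(text):
    # Crude heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


class ContextBudget:
    """Track token usage and report it back with every tool result."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def add_tool_result(self, text):
        shown = preview(text)
        cost = estimate_tokens(shown)
        if self.used + cost > self.limit:
            # Refuse instead of letting one call blow up the context window.
            remaining = self.limit - self.used
            return f"[tool result withheld: needs ~{cost} tokens, {remaining} remaining]"
        self.used += cost
        return f"{shown}\n[context used: {self.used}/{self.limit} tokens]"


big_file = "\n".join(f"row {i}" for i in range(100))
budget = ContextBudget(limit=50)
result = budget.add_tool_result(big_file)
```

Because the usage line is appended to every tool result, the model sees its remaining budget before deciding whether to request the full file.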
Yeah. There's a lot to it. We sometimes work with larger companies that say, well, we're also thinking of building something like this in house. And that's when, you know, a conversation like this is super helpful. Here, you have a team dedicated to thinking about which context to use when, and how to archive memory. Right? When is memory no longer needed? When is it needed again, because this thing from six months ago is actually relevant now? All these different pieces had to be built, and, you know, there's a team dedicated to that here at Gravity for analytical use cases, right? So we often talk about building with Orion. There is a lot to still do as a context engineer, a data engineer, an architect of data in your company, where, you know, you can put your IP onto Orion and into the context, you know, the different ways to manage the context of the knowledge base in Orion, and then it handles things like that for you. So we handle the accuracy, the checks, running it multiple times, checking on itself, evaluation frameworks, all of that, so that you can focus on, okay, my executive team needs this, and they want to hear it in this kind of way. They like dashboards, or they like slide decks, right? And I want them to always include the action recommendations, those kinds of things. You can focus on that, on where the data should come from, on additional data pipelines you want to add to it, and not on all of that. I don't even know anymore, Drew, how many different agent groups we have running under the hood at this point.
With that proliferation of agents, there's also the question of how you figure out the appropriate segmentation of context across those different agents, use cases, and roles, and what elements of shared context need to be maintained as you continue to expand the level of agency of the overall system.
I think it's really important to remember that the context, as I said earlier, is spread between the facts about the data, which I think comes naturally to the folks who would be listening to us, and then facts about how the business operates. And so those facts generally would sit above and guide some of the activities that you would do within Orion
in a different way necessarily than, say, the definition of a particular table. And so I think I'll give an example that might help. In our world, the context about what somebody might ask Orion and why is just as important as the context about the data tables that are available to the system. And the reason for that is that we don't really think necessarily about one shot accuracy,
which is, if you read a lot of the research about text to SQL and products like ours, most of the focus, at least in the academic world, about benchmarking happens to be: given a set of inputs, can we produce the SQL that will produce the correct answer? And we don't really think that that's the way it works in the world most of the time. It's really more of a dialogue. You need to provide the model the right inputs so that if it isn't absolutely
certain that it can get the right answer, it has the questions to go ask the user to get the information it needs so that it can ultimately give them what they're looking for. And the only way that it can ask those questions is if it knows a little bit about who the user is, how they work, what it is that they need to do in their day to day job,
and, again, the actions that they can take. And once you provide that, the models are smart enough that if the data context isn't perfect, if there are various, say, strategies to calculate a metric, you can create a dialogue. And that's really, really hard to measure. That's actually a bit of a nightmare if you were to do that in an academic setting because it's multi turn. It can potentially take an hour. And if you can dial in that experience,
it actually doesn't matter that the model maybe didn't know the answer right at the very beginning and spit it out in one shot. Because what you've done is you've become a virtual coworker, a very intelligent virtual coworker for that person, who then sort of coached and guided them through the process of thinking through everything that they know about the business and the data to then ultimately get to an
output that they can stand behind and then go advocate for, potentially in the next meeting with a customer or with their boss or with a partner organization or whatever it might be. And so I think that, you know, the question really was about how do we decide whether to include different types of context in different settings. And so just to answer that directly, given what I just shared, it really is,
all about understanding what is it that that person needs from us right now in this moment. Did this happen because there's a calendar event on their calendar, and now they're engaging with us because
they've got a meeting coming up and they're in a bit of a rush because they don't have the facts to share during that meeting? You know, that type of context would probably be the most important context of all in that interaction, not necessarily the data table schema. And so, you know, there's not really a hard and fast rule so much as just we're really trying to model the workday of our users, and we're only just getting started with that. And I expect, you know, by 2027,
we'll be quite a bit further along, and we'll be providing a very magical experience for folks.
Yeah. I was just thinking of this example of a retailer we're working with, a global one, where they
like, one of the team members has a question, what was revenue last week? You know? And that's a good question. And, of course, Orion can get the answer right away. But why is this person asking what was revenue last week? Right? The underlying reason is actually that there was a little bit of a drop from what was expected. And so Orion then asks, why was that the case? Okay. South Korea was down. And in South Korea, actually specifically on Thursday, there was a drop that was significant. And then Orion can say, you know what? I want to overlay weather data and actually see if the retail stores were affected by particularly
bad weather in South Korea on Thursday. Orion goes through all these thoughts and questions to ask, data to pull in, to then present back the answer, and I think that is really powerful. If you had an analyst that had forty hours or so available, they might have been able to do that, but now you have a model that is actually fully aware of the context
and the data sources available and the business and what it does, and it's able to do all that for you just by, okay, this person has a question about revenue last week. Let me give them some responses that are even much deeper than what they were asking for. And it really equips that person to then, if they go get challenged on their finding, to say, well, look. Here's the set of assumptions that I made and the questions that I asked throughout this process of analytical inquiry. And I think then that will build trust faster than,
of course, just the single correct answer. Quite frankly, if you get a single correct answer back without going through the process that Lucas has described, I think you're doing a disservice to your workplace. You know, there's so much more that can be done and should be done than just getting those one shot answers back from a language model. That's not at all what we're building here.
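The revenue example above follows a recognizable drill-down pattern: compare actuals to expectations, find the segment with the largest shortfall, then repeat one level deeper. A minimal sketch of that pattern follows. It is only an illustration of the idea, not how Orion works internally, and the row layout (dictionaries with "country", "day", "actual", and "expected" keys) is an assumption made for the example.

```python
from collections import defaultdict


def largest_shortfall(rows, key):
    """Group rows by `key` and return (group, shortfall) for the worst group.

    Shortfall is expected minus actual, summed per group.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["expected"] - row["actual"]
    return max(totals.items(), key=lambda kv: kv[1])


def drill_down(rows):
    """Find the country with the largest shortfall, then the worst day within it."""
    country, _ = largest_shortfall(rows, "country")
    in_country = [r for r in rows if r["country"] == country]
    day, gap = largest_shortfall(in_country, "day")
    return country, day, gap
```

The result of `drill_down` is the kind of lead ("this country, this day, this gap") that would then prompt the next question in the conversation, such as overlaying an external source like weather data for that place and date.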
Another element that has been perennial in this overall context of building data warehouses and analytic use cases is the idea of master data management, for one. And one of the key elements of that, or at least a way of approaching that, has also been the idea of building an organizational knowledge graph in order to help with some of that entity resolution
of those conflicting records or fuzzy matches or things like that. And I'm wondering how you're seeing those practices get factored into the ways that you're thinking about the Orion product as well and just additional ways to converge on a correct representation of the business and its clients?
I'll take the first crack at that. I think that there are a lot of very well intentioned master data management projects out there, and there's been a ton of great work done in that governance space. I am not aware of any that have, let's just say, finished because I think it's an ongoing process, of course. Right? This is work that's never done. I think that given that you can use large language models as universal function approximators and they're really flexible classifiers, it's never been easier to sort of apply
the rules that you would want to apply to cleanse data to get it to fit whatever domain rules you specified. So although that's not what we're building, I'm very excited about the opportunity to make a lot of the pain go away because we can use the language models to sort of apply the same common sense judgment that a human would apply when we're correcting inputs and rationalizing them. So I think that there's certainly a place
for it. I think that the old heuristic based or rule based approach
is probably not something that I would bet on as we go forward. I think that even if the data isn't clean at this time, you know, if I, say, share an extract with you and it's sales orders and there are some folks on that list multiple times with different spellings or perhaps different capitalization, if I just put that into a large language model today, a good one, actually, it will be smart enough to do a minimum amount of deduplication and inquiry to make sure that it produces
an answer that's right. I'm not suggesting that it's magic, that it will figure it all out, but I think that there's less value in getting it all perfectly governed up front when we have the ability to work in a more flexible way just like a person would. Again, not that I would bet on that, but I think it's a very interesting space to watch. I think everything has changed. Data governance and data catalog management absolutely
will change as a result of what we're discussing today.
I mean, I think that's good news for all of us. Like, there is pretty much no organization I've ever come across, well, okay, maybe one or two, that has it perfect.
Perfect is, you know, the goal that you never attain and you always try to get to and, you know, you complain to leadership about resources and all these other things. And so it's a solid foundation. I mean, these are good investments to make. Clean data is better than messy data, but messy data won't prevent you. And I think that's very exciting. So, the foundations you're building are valuable.
The, you know, the explanations you put together, the definition of what good looks like is helpful. And so anything you do will get you to better outcomes, but it's not necessary to be perfect to get great outcomes with the technology we have now here.
Yeah. No waiting until it's perfect. Right? I think that's a key takeaway. If anybody is waiting at this point to try experimenting with the technology that we're talking about, stop waiting. Go do it.
Yeah. Back in our Looker days, the number one reason why people wouldn't start with us would be, hey, we're doing a data warehouse migration right now. You know, I think that answer of, like, we're migrating, you know, we're moving because the old house was messy, so we're moving to a new house in hopes that the new house is not going to be messy because of the move, you know, that is not really necessary anymore as an excuse to stop getting the business the data they need to make real decisions and take actions. I think about that little bridge between seeing data and then acting on it, because I've seen very good companies with great dashboards where people still did not take data-driven actions. So we focus on getting you from dashboards to the actions you take, and I think that's much more interesting because then you really realize the value. There are so many times where, you know, we built out an amazing data warehouse. It took us a couple of years to make it really great, but you actually never got the business to change their behaviors. You never got account managers to be prepared in meetings with all the data they needed to get an upsell positioned. That
is a conclusion of the data journey, in essence, that we stayed away from so many times because the data warehouse wasn't quite ready yet, and that excuse no longer really exists. Now we can really shine with taking the data from being there, fully analyzed, visualized, understood, to the actions you should be taking and the ROI we can actually track.
Another interesting element of where we are at today is that the warehouse is not the only repository of information and knowledge that businesses are relying on. Not that it really ever was, but that was the ideal, which I think we've probably largely given up on ever fully realizing. It's that asymptotic approach.
And how does that shape the ways that you also think about the types of data that you're working with, particularly as LLMs unlock more ability to work with unstructured sources and convert them into structured useful information that can be joined or enriched by that warehouse data or for organizations that have multiple different warehouses because they have different suborganizations that need to have their own scope and control, etcetera.
Mhmm. I'll just jump right in because it's one of the first things we built. So the ability to upload documents and data and bring your own data was extremely important to us. It was one of the feature requests that Looker had forever and I think recently has finally served. But if you have a spreadsheet that you had locally, say, from an event, maybe it has information about folks who came to a local event that you wanna then match back to your CRM,
that was a real pain in the butt if you had a centrally governed data model that you would have to get in line, open a ticket, ask for that data to be modeled, and then it would be a couple weeks, maybe, you know, maybe longer, and then you'd get the answers you were looking for. So one of the first things that we built was the ability to just upload data or unstructured documents
to then inform the analysis. So that could be, as I said, a spreadsheet with names on it. It could be a slide deck with the goals for the quarter for that organization or any number of
different types of files like PDFs or Word documents. Right? So that was important. And then also, we made it possible to connect to multiple governed data sources and then mix them to do analysis across them. And so if you have half of your data sitting in an older warehouse, an incumbent warehouse, and then maybe a partially completed migration to a modern cloud data warehouse, we can connect to both of those sources,
ingest the data that we need from either place, and then produce a finished analysis so that you're not blocked by the state of the migration or where the data sits. So that's my take. Lucas looked like he got pretty excited. So why don't we give Lucas an opportunity to answer the question too?
Yeah. So we just raised a big round. We still need to announce it, but the cash is already in the bank. But one of the questions I was asked a lot was, like, well, how do you differentiate from all these databases adding AI capabilities in the database? Right? And, I mean, I think if you are in the console, right, if you are in the database every day, and that's all you want to do, like, that's a totally fine approach, right? They're probably quite capable. I mean, I tested some. They are quite capable within that realm where you know the questions to ask, and you're comfortable in that environment and that UI. But when does
a business user... Back to the example of weather data in South Korea to cross-reference if that had an impact on your retail store performance. When was that data ever all present in that one database?
With Orion, one of the things we built is that you don't necessarily need to join all the data. We don't need to actually have a hard join on the weather data that Orion has access to with the data in your data warehouse. We have the context as in PDFs. The goals for the quarter are actually in a slide deck that Orion pulled in. It can do the analysis across the slide deck, across the PDF, across the weather data it can independently access
and the data in your warehouse. And that's where some beautiful things can happen. And it doesn't need to join it on a specific key, right? The agents actually have a conversation, where one agent talks about the goals for the quarter, another agent talks about the weather data it has access to, and another agent actually accesses your data warehouse. So I'm very excited for, you know, of course, for our product, but also products that are not locked in with one specific data source, one specific, you know, data silo, so to speak, because there's, I think, a lot you can bring in beyond
what is in your central data warehouse.
As the models themselves start to be repositories of knowledge, particularly if you're doing things like fine tuning, maybe even outside the bounds of Orion specifically, but when you're dealing with organizational agents, how does that also factor into some of the ways that you need to start thinking about the boundaries of Orion and ways to interface with these other intelligent systems that organizations are building either for internal use or for exposing to their customers?
Yeah. I think one of the challenges that we have, and I think, you know, other companies as well, is that we have a very strong, like, glass box focus. Right? We want you to be able to see what Orion did, and all the different steps, and the different pieces of memory that it has, and the knowledge base it uses, so you have full control as the administrator and visibility into what's happening. There's no black box here. But then on the flip side, we don't want it to feel overwhelming.
Right? And we don't want it to feel like this is just a lot. Like, this is not a space shuttle with a ton of knobs. Right? So how do you handle this balance of giving you full visibility into all the things Orion is doing and thinking about and is aware of, but then also condensing it back down so you can say, okay, these pieces of memory are now no longer needed, or this understanding? So what we had to build is, of course, self-reflection within Orion so it can surface things where it now believes, okay, I think this memory should be deleted. Because what John said three weeks ago might no longer be relevant. But I think that, you know, a lot goes into architecting that kind of system.
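To make the memory review idea concrete, here is a minimal sketch of surfacing stale memory for an administrator to look at. It is an assumption-laden illustration rather than Orion's actual design: the MemoryItem fields, the propose_for_review name, and the 21-day idle threshold (echoing the "three weeks ago" remark) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    # Hypothetical record shape for one remembered fact.
    author: str
    text: str
    last_used: datetime


def propose_for_review(items, now, max_idle=timedelta(days=21)):
    """Return memory items that have been idle longer than max_idle, oldest first.

    Nothing is deleted here. The list is only a proposal, so a person
    keeps the final say over what gets removed or archived.
    """
    stale = [m for m in items if now - m.last_used > max_idle]
    return sorted(stale, key=lambda m: m.last_used)
```

Returning a proposal instead of deleting keeps the glass box property described above, and it leaves room for archiving rather than removal, since something from six months ago can become relevant again.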
Tobias, I think you were getting at something else too, which is just how Orion interacts with other bots that might have a bit of agency in the workplace. And that's something that I'm absolutely fascinated by, and I look forward to exploring with our customers. Internally,
we have a few autonomous bots that operate in our Google Workspace as if they were people. I'm not aware of anyone else, you know, besides probably Anthropic and OpenAI, who has the level of autonomy in their organization that we do. I'd love to have people reach out to me if you are operating similar bots that have access to everything that your humans do. Orion interfaces with those bots. And so we're right on the very cutting edge of this experience,
and we're learning a lot about bot to bot communication and how to essentially prompt our product's agents
to understand that they might in fact be interacting with something that isn't a human. And, of course, that can go both ways. It can be very interesting, and it can also go into a spiral of who knows what. Right? It can be quite amusing as well. We haven't yet had the opportunity to do that inside of a customer just because this is so new. I don't think there are very many people out there with truly autonomous agents running around, like, say, an OpenClaw type agent that is running with a heartbeat and can do whatever it wants. Will they be present in another year or two? Of course. And I think that there's so much that needs to be done to make sure that that experience is helpful and safe and governed and, as Lucas said, visible, really, at the very least,
visible and observable so that we can tune it, at least the things within our control, which is our own product. But there's gonna be so much outside of our control, people's AI chiefs of staff, the note takers that are now becoming more than note takers, the, you know, proliferation
of AI powered products, all of which might be talking to each other here in the very near future. Boy, that's a fascinating and difficult problem. You know, for those of you in software engineering who are wondering what to do with the amazing skills that you've got, you know, go solve that because it's not gonna be boring. I'll tell you that much.
Yeah. I mean, it's fascinating to be in the space right now and working in this way. Like, Orion talks to a meeting notetaker and asks, like, hey, was there anything interesting in this meeting that might be helpful for, you know, analysis that I'm doing for Lucas or for Drew? But it's not agent to agent everywhere yet. Right? And then there are some guardrails, so we often pull the transcripts, and then Orion looks at it. Or, you know, the same with Google Drive. Right? Could it talk to Gemini and just get, hey, who is Lucas? Right? I'm onboarding this new user. What do I need to know about him? What does he care about? Which tone does he like? You can do much more accelerated onboarding experiences, and I think we're gonna be there soon. It's just everybody has to figure out the right guardrails and how it moves. Right now, we are doing a lot of this with just traditional API calls, but
everybody's talking about agent to agent, and we have to make sure we have the right guardrails in place. Everybody else is pretty protective, though, of their agent to agent communication as well. So there's not a lot, you know, that another agent that just was onboarded can ask from your existing agents. Right? Like Slack, for example, is a wealth of information, and it's amazing when you have Orion in there as an ambient listening agent to learn what this team cares about right now, how does this change my analysis
patterns and what I focus on. But the first time it comes in, it can't just pull the entire history and get a condensed version.
I am also interested in the ways that the introduction of autonomy changes the landscape of what are those trusted repositories of knowledge and how do we think about the role of data warehouses and the ways that they are hydrated and populated as the data continues to increase the pace and diversity of where it comes from and what we need to keep track of in order to be able to actually run our businesses?
I think one of the things that you would find is that if you look at the queries that are actually made against most of the data warehouses out there, you would probably see a, you know, Pareto distribution just because we see that everywhere. But that's to say, probably a good portion of it doesn't get used very frequently. And I think that humans are very reluctant to go make changes to systems that are already in place that they don't necessarily own.
If somebody inherits a data warehouse, the last thing you're gonna go do most of the time is delete a whole bunch of tables in your first thirty days because who knows what impact that might have on your career. Now there's tools to help us do a better job of that, but it's still pretty scary. I think that an AI agent is not encumbered by some of the social
norms or worries that might cloud a decision like that. And so, you know, this is something we're gonna have to be careful with because, I mean, not that I'm suggesting anybody give an agent the ability to manage their data warehouse, at least not yet. But it might point out right away, well, this has actually not been used for several quarters now, and you should do something about it. I mean, one of the toughest jobs I had while running the field engineering team at Looker was cleaning all this stuff up. I mean, everybody knows that, but it's pretty quick to do it if you just get the information about what data is in use and then make some, let's just say, broad, you know, cross cutting decisions with conviction. So, you know, I would assume that our AI agents are gonna be suggesting some things that might make us uncomfortable over the next couple years, that might challenge the value of the work that we've done or might make it necessary
to make a decision like, well, you know, maybe we should clean this up and get rid of this. Maybe it would be more valuable to the business because it's there. It's all there in the logs, and it might not be as reluctant to make those types of recommendations. So, it's just a little bit of a prediction. I'm not sure, again, that we should ever allow agents to have this role in the workplace, particularly in mission critical data systems. But somebody's gonna do it, and then we're gonna see this type of interesting behavior. And I'm sure we'll be reading about it on Twitter.
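The "it's all there in the logs" point can be illustrated with a small report that lists tables nobody has queried recently. This is a minimal sketch under stated assumptions: the query log is taken to be a list of (table name, timestamp) pairs already extracted from the warehouse, and the unused_tables name and 180-day window (roughly two quarters) are hypothetical choices for the example. It only recommends; it does not drop anything.

```python
from datetime import datetime, timedelta


def unused_tables(all_tables, query_log, now, idle=timedelta(days=180)):
    """List tables with no queries inside the idle window.

    Returns (table, last_seen) pairs sorted by table name, where
    last_seen is None for tables that never appear in the log.
    """
    # Track the most recent query time per table.
    last_seen = {}
    for table, ts in query_log:
        if table not in last_seen or ts > last_seen[table]:
            last_seen[table] = ts

    report = []
    for table in sorted(all_tables):
        seen = last_seen.get(table)
        if seen is None or now - seen > idle:
            report.append((table, seen))
    return report
```

The output is a candidate list for a human to review, in line with the caution above about not giving an agent direct control of a mission critical warehouse.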
And as you have been building this business and product and exploring this overall space of agentic capabilities, what are some of the most interesting or innovative or unexpected ways that you've seen Orion applied?
Yeah. So the reason we're hesitating, Tobias, is we can't talk about some of our most interesting use cases, and that will change as we release some of the success stories associated with them. But, you know, generally speaking, I can tell you that we've had multiple public companies adopt the system and have business users who are very excited by its capability.
And when we have a case study that's available publicly, of course, you can go check our website and read all about it.
I can maybe, without naming anyone, right, like, there's a couple of fascinating ones. Like, we did this POC, and then it quickly became a customer, a publicly traded company, and Orion found something during the POC that actually was a follow-up item from the last board meeting, where it came up during the board meeting and the board members decided this requires further investigation, and Orion found it and actually presented it to them.
They bought very quickly because now they can use it to prepare for the next one. But that's where I think the ability to ask some of the not asked questions, you know, beyond the specific, you know, job description that everybody in the organization has right now and the specific things they're looking for, it can find some of these anomalies.
And if you let it, right, like not everybody wants to, but some of our customers let it go and just explore what else can you find for me. You can get some quite interesting results because it's always curious, right? It's very aware of your business. There are fascinating
like, we have a company where our stakeholder had to speak on Capitol Hill, and he was about to go on the plane to speak there the next morning. One of the competitors in the field had released this research paper just before he got on. And so he actually took the entire research paper, put it into Orion, and said, look at our own data and help me prepare
supportive points or counterpoints based on the market data we have available. When he landed, he had the entire report with all the data that he had access to, that the organization has access to, and he was able to speak very intelligently about this competitive research paper just hours after it was published. I
didn't think about that. I didn't think about uploading a giant PDF into Orion. We thought about spreadsheets and a couple of QBR decks or something. It is, I mean, it is fascinating to talk to customers and see how they then decided to use it. It's interesting to see some of the executives feeling a bit more empowered. Right? And actually, I want unfiltered answers on this. You know, I just wanna understand my business a bit more before I go into the executive meeting on Monday morning. So it's really interesting and good to see, you know, people actually interacting a bit more with the data and also being vulnerable. Right? I think that's one of the things that was always challenging for a leader in the past, where do you really go to your, you know, head of data and tell him, hey, I don't quite understand how we calculate and think about, you know, out of stock inventory and lost revenue?
While you're the CEO of the company, right, do you really have that vulnerability? Whereas, you know, with Orion, they do ask those questions, and they get answers to those. And I think that is nice to see, right, to have people be more informed and actually not just pretend they understood what was on the dashboard.
Yeah. That's a good point. The dialogue that folks have with AIs like Orion can be very intimate and ultimately very enriching.
Right? We can see people getting better all the time through those interactions. One of the most rewarding things for me lately is we're working with a global footwear brand. And just in the first day that we connected to their Looker instance, we were able to tell them because of the data that Looker generates, because, of course, Looker has data about how the product itself is being used, we were able to tell them how their business users were using the data products that they'd made available and shine some new light on all the hard work that they put into that deployment and that implementation. And so that's maybe
it's something that we do with all of our Looker customers, but it's not necessarily the thing that you think of first. And it's very validating for that team, particularly because they can see, wow, we've invested so much into this product and it is being used in all sorts of interesting ways that maybe we hadn't imagined when we built the Explores in the first place. And so that's something that, I guess, I would emphasize because it's not necessarily just about the data that's in the warehouse
that's meant to serve the business users. There's a lot of data that can serve the data team as well. And sometimes you gotta start there because looking at your own data about how effective you are in the business and which questions you give great answers to, where you get further engagement, maybe where you leave people behind or people get lost, all that's available in the rich metadata that comes from some of the incumbent products like Looker. And so that's something that I think everybody should be doing, making great use of all the data available to them, which might not have been something that you would be willing to spend time on yourself because it's not in your ticket queue.
And as you look at the competitive landscape as more businesses and teams start to build offerings that are applying these agentic capabilities to analytics use cases, what are some of the ways that you think about your differentiating factors and maybe some of the situations where you would advise against using Orion?
Yeah, I think we come from the Looker background. I was head of data for ten years before that. So we really come from that perspective of how can I set up an environment for my business users so that they can serve themselves, and how can I do that with guardrails and governance and the right context at the right time? So that is why we focus on enterprise deployments. I think if you're a small company,
you know, you're a couple of people and everything is in your database, maybe just give them all access to Databricks Genie or Cortex on Snowflake or something, you might not need, you know yeah. I mean, smaller companies also sometimes don't necessarily need the most optimized,
you know, x, y, and z. Some of the data problems are much more simple. We're definitely focused on larger companies. That said, right, a large company doesn't have to have that many employees. So we have a broad spectrum there. Yeah, we're focused on governance layers to control the experience for your business users and then getting it to them in the output they want. So if it's dashboards, slide decks, or, you know, something to read, all of that is definitely possible. There are a lot of options out there. Right? There are a lot of legacy tools that are trying to pivot, adding AI bots to their tooling. Right? They are, of course, limited in what they can do because they have tech debt to some extent, but also a lot of existing customers that are used to using the product in a certain way. So you don't necessarily get the most interesting or, you know,
a novel way of doing things. You're stuck in the last decade. So I would say be a bit open minded. Like, one of the things Orion does with these almost ephemeral dashboards is that they don't need to be there forever. Right? They need to evolve, and the story needs to evolve as the business changes. So if you are a lover of very detailed dashboards that you build over weeks,
you know, then Orion is probably not the right tool for you, but you can test it out yourself. But it's like, it's a different way of doing things. It's more of a conversation and an ongoing, you know, a changing story. As you change your business,
so should the analysis change slightly. Of course, the KPIs should stay the same, but underneath, right, things are changing all the time. And so we built a product that can be very fluid with you. And make it possible to not have to go look at the dashboard just to underscore. We'll look at it for you. We'll tell you when you need to go look at it.
And as you continue to iterate on and expand the capabilities and explore the edges of what is possible with these AI-powered systems, what are some of the things you're excited to explore, or any new features or problem areas that you're digging into?
Yeah. I'm very passionate about connecting data to action. I like doing something with it. And so one of the first things Orion does is, like, it gives a slide deck to the account manager so they're prepared when they go into the meeting with their customers. So they have everything ready. They have all the data points they need, and it's beautifully done. That is a very simple action. There are much more sophisticated actions, you know, where we optimize, make sure we don't run out of inventory, or help order more inventory potentially, right? Like, data really sits at the heart of things, and that's why the job of everybody who is listening here is very secure.
Data sits at the heart of the organization, and data-driven decision making does as well. And now that you have a tool that allows you to connect the insights to the actions to take, you are at the center of the organization as well. And I think that's a really interesting future to be in. So, you know, we have scenarios like, hey, I wanna use Orion to optimize my AI spend. That's an interesting scenario, right? Like, we internally use Orion to optimize Orion.
Like, we have cut our cloud bill down by 60% because Orion constantly looks for inefficiencies and optimizes them for us, right? So it can take actions on your behalf, of course, controlled and governed. Right? And so that sits at the center of what we built.
Are there any other aspects of this overall space of context engineering for agentic analytics or the work that you're doing at Orion that we didn't discuss yet that you'd like to cover before we close out the show?
I think I would just say aspirationally, like, our goal is to be a trusted colleague to you in your organization. We're getting there very quickly. In the next few years, we're all gonna be working side by side with AI coworkers.
And I think that, you know, my vision is that Orion will become an entity that knows much more than any one individual person at your company and an entity that you can trust, just like a trusted colleague who you've worked with for ten years. So there's not necessarily a list of features associated with that. It's much more than that. It's much deeper than that. But the signs that we've seen over the last two years really do point to that: we will achieve that, no doubt, in the next few years, and this is coming now. And it's an extremely exciting time. And everybody here is gonna have a role to play in figuring out how to usher this new world into the workplace.
Yeah. I second that: trust is at the center. Like, we started the podcast with the topic of trust. And I think that, you know, it has been our focus the whole time. It's gonna continue to be our focus. And I think that, for anyone out there, we are building systems here that need to be trusted, first and foremost. You know, it's not about the flashy features per se. They're cool and they're great, and we have them as well. But trust is at the center of it all. That needs to be the first thing we have.
For anybody who wants to get in touch with the both of you and follow along with the work that you and your teams are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management and AI systems today.
You know, the biggest gap, unfortunately, is the speed of our existing data infrastructure. The speed of thought is quite a bit faster than the speed of the average database. And so it's not necessarily a gap where we don't know what will fill it. I think it's a very obvious gap. But if you, your organization, are gonna deploy the technology that we're talking about on top of your legacy database systems that can take minutes, you know, potentially ten or twenty minutes, to return an answer, then unfortunately, I wouldn't have high hopes for the success of that initiative.
And so rather than, you know, suggesting we need to invent things that haven't been invented yet, I think that if you're in a place of influence in your organization to go solve that problem, you'll really pave the way to get the value that you're looking for out of the new AI tools.
Alright. Well, thank you both very much for taking the time today to join me. I appreciate all of the time and effort that you're putting into harnessing these generative capabilities into helping organizations operate more effectively and efficiently. It's definitely a very fascinating space to be working in, and I wish you the best as you continue to push the boundaries of what you're able to do with that. So thank you again for taking the time, and I hope you enjoy the rest of your day.
Thank you so much for having us. Thank you very much.
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. And the AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
