Data Acquisitions, Designing Agents vs. Software, and Metadata and Semantic Model for AI

Jun 04, 2025, 56 min

Summary

Robert and Joseph delve into a flurry of recent data and AI acquisitions, including Databricks' purchase of Neon, Hex acquiring Hashboard, and moves by Informatica and Datadog. They discuss how AI is fundamentally reshaping product strategy, leading to consolidation and a shift towards agent-native software over human-centric GUIs. The conversation also critiques the limitations of current semantic layers and data catalogs for AI, proposing new approaches, and concludes with an analysis of the difficulties investors face in navigating the rapidly expanding, yet undifferentiated, data+AI market.

Episode description

00:00:28 Recent Acquisitions and Strategic Moves in the Data Space
00:06:27 AI and Product Strategy Shift
00:17:10 Consolidation in BI and Data Catalogs
00:36:35 The Role of Semantic Layers and Knowledge Graphs in AI Analytics
00:46:29 Challenges in Investing in Data + AI Companies
00:55:45 Future Topics



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit agenticdata.substack.com

Transcript

Intro / Opening

If the demand for data insights like that kind of Q&A went up by 10x or 100x or even 1000x, can we possibly rely on this manual way of mining for insights from your data warehouse?

Welcome and Recent Acquisitions

Welcome back, Robert. Welcome to episode two. Yes, I'm very excited. Should we get right into it? Yeah, let's get into it. All right. So a lot of stuff has happened this week. I feel like it was... Acquisition week, no? The past few weeks, yeah. Past three, four weeks. What, Neon and Databricks, Hex acquiring Hashboard, Informatica, data.world, Numbers Station.

What else? I need to check. Some fundraisers. Some fundraisers, yeah. Like Kyle's Hex. Should we just go through them? Yeah.

Databricks Acquires Neon: Agentic Play

All right. Tell me about, I mean, since we're somewhat proximate to the Neon folks, what happened with Neon? What happened with Databricks? Also, I want to know why it makes sense because I'm reading this and I'm like...

My first reaction is this is bizarre, but I mean, I guess I can see it, but I just haven't tied together all the pieces yet. But I'm sure you have your own take on this. Yeah, so we got the chance to work with Nikita Shamgunov, who is currently the CEO of Neon, back when he was still a partner at Khosla Ventures. We worked with Nikita pretty closely during our Hyperquery days, and Nikita has been amazing to us and gave us a lot of great feedback and helped us grow the company.

Nothing but good things to say about Nikita, but I think this acquisition news was quite surprising. And at first glance, it seemed kind of weird that Databricks is acquiring an OLTP company. But I think upon a second look it might make more sense, and it is making more sense to me. Databricks, being one of the canonical data companies in our space, I think it has made a few really good acquisitions in the past year or two.

And it's clear that Databricks wants to move away from being purely a data lakehouse company into being a data plus AI company. And part of that was through acquiring MosaicML, which allowed it to go into foundation model and LLM territory. So that really signified its movement towards that space, and I think this is sort of maybe act two

to that movement, that chess move, a series of chess moves. And Neon being a serverless Postgres provider, I think it really is cementing itself as one of the de facto providers of serverless Postgres for AI agents. So I think what's happening is there are a few vibe coding and AI-based agents

building applications, like Vercel's v0, Bolt (bolt.new), and lovable.dev. These services allow folks to easily build general, generic software applications using pure English. And I think these applications need a way to easily acquire databases to use, and I think Neon is extremely well positioned

to be one of the de facto ways to get an agent to spin up a new Postgres instance and start using it as a production-grade database. And the fact that Neon is serverless, I think it allows one, and one could be a human or an agent, to really rapidly scale up and scale down these instances. So I think the architecture is perfectly suited for this purpose.

I think it's really in the thesis of, in an agentic world, what infrastructure is necessary for you to really capture a lot of the value here. And there is definitely a data play here and a serverless Postgres that definitely fits in this thesis. Yeah, that's interesting. I hadn't thought of it that way. I just have my immediate reaction to this news. I'm reading it.

My thought was just like, not even the vibe coding side of things, but there's this parallel between OLTP and OLAP. I don't have a good sense of what...

AI Driving Market Consolidation, Product Shifts

it looks like to build agents against Postgres because that's not really what we do. But I guess I'm getting to, you know how like OLTP, OLTP has ORMs, right? Yeah. And OLAP has a semantic layer and they're not, they're not like that different. I guess abstractly they're about the same thing. Yeah. Like they provide this like gated programmatic way of interfacing with

a fully functional Turing complete SQL database, right? And so I guess I'm wondering, I'm just kind of going off the cuff here, Joseph, but like Databricks...

It feels like there's something there. Databricks going into OLTP feels like it's an expansion into a unified... consolidation play against like both because an agent, maybe you want the agent to be able to both access data and then also access transactions and then having like a unified interface against both would make a lot of sense to me. But this might be not at all what they're thinking, but I can't help but see some sort of parallels here. Who knows what they're actually thinking?
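
The parallel drawn here between ORMs and semantic layers, both acting as a gated programmatic interface over a full SQL database, can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `orders` table, the `METRICS` and `DIMENSIONS` dictionaries, and the function names are all hypothetical and do not come from the episode or from any real semantic layer product. The idea is that a caller (a human or an agent) can only ask for named metrics and dimensions, and the layer compiles that request into SQL.

```python
import sqlite3

# Hypothetical definitions: the only metrics and dimensions a caller may request.
METRICS = {"revenue": "SUM(amount)", "order_count": "COUNT(*)"}
DIMENSIONS = {"region": "region", "status": "status"}


def compile_query(metric: str, dimension: str) -> str:
    """Compile a (metric, dimension) request into SQL, rejecting unknown names."""
    if metric not in METRICS or dimension not in DIMENSIONS:
        raise ValueError(f"unknown metric or dimension: {metric!r}, {dimension!r}")
    dim = DIMENSIONS[dimension]
    return (
        f"SELECT {dim} AS {dimension}, {METRICS[metric]} AS {metric} "
        f"FROM orders GROUP BY {dim} ORDER BY {dim}"
    )


def run_metric(conn: sqlite3.Connection, metric: str, dimension: str) -> list:
    """Run a gated request; arbitrary SQL never reaches the database."""
    return conn.execute(compile_query(metric, dimension)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, status TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("east", "paid", 10.0), ("east", "paid", 5.0), ("west", "refunded", 7.0)],
    )
    return conn
```

Calling `run_metric(demo_connection(), "revenue", "region")` goes through the gate, while a request for an undefined metric raises a `ValueError` instead of producing SQL. An ORM plays a similar gating role on the transactional side, which is the sense in which the two are "about the same thing" abstractly.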

And to some degree, you are making some shots in the dark here. And everybody, right? Like nobody really knows how the market is going to evolve. But you still need to make your chess moves.

So it does seem like that. And I think even Snowflake is going into more transactional data processing as well. So I think companies grow with a singular wedge, and then they expand their wedge to multiple offerings. And if you are starting to tap out of the OLAP data space, then naturally you want to

move contiguously to both OLTP and also into something that is contiguous in a, in a different direction. Like for example, like AI agents, I think last episode you said, um, AI people are essentially data people just doing slightly different things, right? And if that's the case, then, you know, I think going, buying Neon is sort of a move in both directions. It's a direction in the AI direction.

and also a direction in the OLTP direction. So it's a pretty good strategic move if you think about it from that perspective, from an optionality perspective. You know, this makes me wonder, like there's... There's so much bundling and consolidation that seems to be happening as a result of AI. You know, last week we talked with Madison Faulkner from NEA, that Tobiko meetup, right?

I remember she brought up this interesting point, and this is not related to OLTP, OLAP, but she was purporting that there would be some sort of consolidation of like ML and analytics, finally, right? Yeah, yeah, yeah. It's interesting because I see this theme of like bundling and consolidation seem to be easier with AI or like perpetuated by AI or something. But I mean, you're getting at one of the, one of the, one of the.

canonical directions that people can go into to eat more of the market. But I wonder if there's more of a systemic dynamic that exists. There probably is a systemic dynamic. I think folks thought about things more naively a few years ago, where if you said data infrastructure is consolidating, then you just thought, okay, Snowflake is going to do both OLAP and OLTP

in one database. That's the naive way of thinking about it. And then the same thing with Databricks, right? Like, okay, you have a lakehouse, so you should be able to do not only structured data but also unstructured data,

and not only OLAP queries, but also OLTP queries. That's sort of the more naive way to think about it. Just like, let's just expand the scope of the database, which happens all the time, right? That's just a natural progression of a database company.

The non-naive, the more novel way that this has panned out is that AI is becoming this new interface for humans to interact with. And if the interface changes, then the underlying abstraction could change as well. So if the human was interacting directly with a database, then as a human, I guess it makes more sense in some

scenarios that one database does everything. But if the human is interacting with a data agent, with an AI agent, then I don't really care if this AI agent uses two systems, because it's just going to handle things for me. So the AI agent using Snowflake for everything versus using Snowflake plus Neon for some other things doesn't really matter for the human as much. So I think

Hex Acquires Hashboard: BI Evolution

having this abstraction in between the human and the data systems allows a company to actually expand, and the expansion doesn't have to be contiguous with your product offering. So I think this is one of the things that's happening. Previously you had sort of

One of the canonical ways to build a SaaS company is you just have one product and the product starts to grow and it just becomes like a monstrosity. But I think what's happening right now in the market is you have a lab or a product lab.

You know, you have a company and then you have multiple offerings. You have multiple offerings of different models, you have basically different SKUs. And that's strange, right? Because that's contrary to everything that the SaaS boom has taught us, which is just build one Figma, build one Notion, and that's going to be it for your entire company. But now it's okay, you have this

multi-SKU world. And that has a lot to do with the fact that, I think, with AI agents as one of the interfaces, it doesn't really matter for the end user whether you have multiple SKUs or a single SKU, because the agent is interacting with the software. And so from that perspective, I think the psychological difference between humans using software and humans using agents and then the agents using the software is a profound change. And it's something that you alluded to

last episode when you said you need the data systems to actually be built natively for agents, and not retrofit things that have been built for humans and then just add an agent on top of it. I think it's a similar concept here. It's so interesting because, I mean, I never thought about it in the way that you articulate it now, but I guess you can really hop, skip a lot further. And I'm wondering if all of our intuition around product building is

maybe wrong in some sense because of this. Because the unifying thread between SKUs is now the LLM, whereas all of our intuition is built on the unifying thread being, I don't know, product proximity or user journey. I don't even know how to articulate this, but there does seem to be that shift, and it seems like traditional product thinking or design knowledge must

have some sort of reckoning as a result. Yeah, probably, probably. I think, I mean, at every platform shift, this probably happens. Like we're a little bit too young to have experienced the internet boom. But we did experience the mobile boom. And we were still a little bit too young. We weren't really in the workforce fully when that transition was happening. Like I was still in high school.

But I would imagine folks building on the web or for desktop had to really quickly realize, okay, mobile is its own thing. And in particular, for mobile...

Having an application that had fewer features but had higher virality made a lot more sense than a web application that was more fully fledged. So from that perspective, a lot of web companies and desktop software companies probably thought this app thing is a joke, this is like a toy. And then, you know, they got their lunch

eaten by these app companies. So from that perspective, it's the same thing. If I were one of the web-based data software companies, then I would think, okay, these fucking idiots running around, these AI agents don't have a lot of the GUI capabilities that a traditional data system has. So these companies are just going to get destroyed. But maybe the counter is like, no, this is a new paradigm. And what matters is.

What matters is that the agent knows how to do a lot of different things rather than it's very specific for a particular GUI-based workflow. I think that's one of the things that we're thinking through. And especially because I think... There is a scaling law in the sense of if you talk to researchers from OpenAI or Anthropic, I think one very common thread exists, which is you would think that building...

You would think that building sort of more like an expert system that is really good at doing one task at coding, that would outperform a more generalized model that just has many, many more parameters, but it just knows how to do many more things. But it does seem like in actuality.

Data Catalog Acquisitions: Metadata for AI

in terms of performance, of accuracy, not speed, but from an accuracy standpoint, it does seem like having a massive and general neural net outperforms more specialized neural nets, which is completely contrary to human intuition. And from that perspective, then if you think about it, then like, of course, an agent that knows how to do more and more things is in actuality, the better agents. So you have to build your agents that way.

And so that's like very, very different from software building because software building is like, you need to do this like thing very, very well in this niche. And then, um, and then if you, if you put too many features inside, then it.

the product becomes a monstrosity, right? But that's just because the human brain can't take too much GUI complexity, because of the limitations of our brains. But if you build software for agents, let's just call it headless software, without much GUI, just a bunch of API calls, the agent knows how to handle this complexity pretty well. So

having more capabilities is actually better than not having them and then building things beautifully with the GUI, right? So I think from that perspective, the best advice for building product might get shifted. Yeah, I guess I can see the paradigm shift, akin to mobile coming out, in the user experience. Now you have

I never thought about that way. I never thought about how it was so dramatic that now you can just talk to a thing. Right now, it just seems like a toy in some ways. But of course, that's as profound a shift as going to mobile. But then on top of the... That changes also the shift. I mean, it's similar to the shift in building products that are physical versus building them on a computer, right?

You just have to think about it drastically differently. There are different constraints, there are different scaling laws, there are different dynamics that govern what makes an object successful. But yeah, I guess with this in mind, I'm... Maybe we should talk through some of the other acquisitions. I feel like there's a theme here. I can't quite tease apart what it is yet, Joseph.

I feel like if we talk about enough things, in the same way that LLMs will do better if you just give them more data, maybe if we just keep talking about all the acquisitions, we'll get to something interesting. Yeah, sure. So what else? Hex. Hex acquired Hashboard and they raised their Series C. And you had this thesis around this, around everything becoming BI, which we've talked about before. Yeah, I think everything becoming BI is really Benn Stancil's

blog post. But of course, this is not a new idea, but that phrase is extremely catchy. This one is a consolidation play. And of course, we were in the data notebook space, so we have some views here. What we have found is that data notebooks, even though they're amazing, being able to do exploratory analytics with SQL and Python in a notebook-like shape, the market for that is

not super large, due to the simple fact that there are not that many people who want to use a SQL and Python notebook. It's as simple as that. From that perspective, you're going to run out of market at some point. And then you need to start expanding your workflows, expanding your job to be done in a company. And of course, if you're a notebook company,

you're going to come across situations in which you naturally want more BI-like functionality inside of your tool, including, say, a semantic layer, including the ability to drag and drop components to make dashboards and data applications and things like that, and really allowing non-technical users to actually do some self-serving as well. And we've also seen the story play out for Periscope Data

and Mode, if you saw these companies. So Mode and Periscope Data both started out as a SQL runner. I mean, this is a little bit before our time, so we didn't actually experience this firsthand, but if you just study the history of these companies,

Datadog's Strategic Expansion into Data

you start out as the SQL runner with some viz, and then, well, Periscope got acquired, I think by Sisense, and so that really folded into an actual BI company. And then Mode eventually became legitimately a BI company. And so from that perspective, yeah, these ad hoc surfaces and notebook-like

things eventually become full-on BI products. So this feels very, very obvious, that that company should be going in this direction. It makes a lot of sense to me too. I mean, we know that Hashboard team, right? But I imagine the direct thing that they'll be able to help with. Their acquisition probably indicates they have some immediate semantic layer

ambitions. I see, I see. I mean, but it's all in the service of just getting the full AI end-to-end thing over BI working, right? But I just, I wonder if that's the play here. That is a very reasonable play if you are a BI or BI-like company, I would say. But then our play is a little bit different, which is we're not a BI company. We want to be completely native, and that

has its own challenges but also has some structural advantages. Yeah, that's right. So I'm connecting some dots here. I feel like, you know, we obviously learned that a difficulty with the notebook space was that there just aren't a lot of people that are willing to use a notebook to do SQL and Python work. But I wonder if this is also kind of going back to our last point around the unified interface. I wonder if this becomes also the biggest barrier

to entry for people wanting to use, say, like an AI native analytics tool if you build on something like Hex. Like simply because it's a notebook, you've already kind of restricted, you've started with an interface that makes it more difficult for... I think so. Yeah, I think so. And I think if we have learned anything, I mean, we were, we're pretty technical founders and we didn't really know that much about marketing, but if we.

put our technical hat on and try to analyze what marketing really is, I think the fundamental thing about marketing is positioning. How do you position your product? If you position yourself as an SUV, then, you know, people who are not so hardcore about outdoorsy things are not going to want to buy the product unless you say, okay, it's a luxury SUV now. And so you get all the soccer moms to buy the car. And sports cars, not everybody wants

Reimagining Semantic Layers for AI

sports cars either. So I think the positioning of what the category of your product is is very important. And of course you could start one way and then expand, but if we've learned anything, how you start really matters for what you become. Like Snowflake started out as an analytical query engine plus database. So it really has this focus in analytics. And Databricks started out as basically a query engine for machine learning and AI, right?

And so, of course, it's much better for those workloads. And of course, it's less good for analytical workloads. It's not as convenient, really. Right. So from that perspective, of course, these two cars.

claim to do the same thing, but the category is different, so different people want to buy it. And so if you start out as this data notebook, it's kind of like a sports car. Only the people who really love cars want this thing, like they want to drive stick shift, right? And you know, if you started out as a sports car company, it's really hard to say, no, JK, we're, I don't know, Toyota Camry now. It's hard to do that.

Yeah, it reminds me of the, I mean, you gave the analogy before when we talked about like Blackberry versus iPhone too, right? It's like you want to be iPhone, you don't want to be Blackberry where you're kind of catering to this like technical one-color group first. We've gone through a couple.

different acquisitions now, and just give me a sec to orient myself, because there's OLTP and OLAP, and that's one kind of more obvious consolidation, but it's also being perpetuated by AI. And now this one is more, I guess, consolidation along an orthogonal axis, where we're just making analytics

consolidated, right? So there's also the data catalog acquisitions that have happened recently. Jesus, what is happening? Yeah, what is happening is a lot of consolidation. So Informatica went to Salesforce? Yeah, I think Salesforce. Is that right? I guess it's the missing half of Tableau, right? I think Informatica is a full stack data integration and data cataloging and governance.

all these things, they provide all these things for larger and legacy organizations. And it does seem like the data integration piece is a piece that is missing from the Tableau acquisition. So that's from a super high level perspective. But from a strategic perspective, I guess folks are starting to come up with a thesis that metadata is really important, because metadata,

Having data about the data assets that you have is really important for AI. But this is also in the theme of these are traditional data systems trying to tack on this AI play. who knows if this is going to work out or not type of situation. And so what do you think? What do you think about these metadata companies like Data.World and Informatica consolidating? And then, of course, the strategic buyer.

It's actually being strategic about this. This is likely not just buying the revenue, but it's saying, hey, I own the strategic asset and I could do something with it. And doing something with it is very, very obvious what they want to do with it. They want to do AI with it. But does that make sense to you? I mean, you know, just like we did, we ran against all those benchmarks, the SQL benchmarks like two years ago, right? And we found that.

Just having a catalog and injecting the right context gets you to quite robust text to SQL accuracy. But I, and so like, of course, like I think the whole industry and everyone and their grandma has figured out that you need a catalog. or some sort of central source of truth for all the business context to have a robust text-to-SQL offering or just generally a robust AI offering, AI analytics offering. But I remember like...

You talked to Bob Muglia, right? And he'd mentioned this knowledge graph idea, which has been sticking in our heads for the last year. And I'm thinking more and more that something is wrong.

with the catalog, and the graph feels like it's more the right solution. But I can't articulate it. It's easy to just say a graph and then point vaguely at the fact that connectedness can get you some advantage, but I haven't been able to articulate yet what that advantage looks like. And I think people don't have that intuition, which is why everyone's still anchoring on this catalog. That is an interesting point.

So I think fundamentally, the details matter. If you're really engineering a system so that the context gets injected correctly into the LLM or the agent, then of course the details matter, like the very granular details of how that context gets injected. Literally, I mean, prompt engineering is very, very fickle. It's not very robust. So how you do it specifically really matters. So of course the format of

how that data gets injected matters. And from that perspective, of course, how it's stored and retrieved matters. And it's not super clear to us that the traditional data cataloging systems, I mean, they probably contain a lot of very useful information, but it's not clear if that's enough. Just because, I think, for data catalogs the killer use cases were

basically in governance, like compliance, data sensitivity and security, and also financial compliance for banks. I think that's what Collibra really focused on, and so forth.
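
The point about context injection, that what gets retrieved from a catalog and how it is formatted into the prompt both matter, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the hosts' system: the `TableEntry` class, the keyword-overlap retrieval, and the prompt layout are all hypothetical, and no LLM is actually called.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Hypothetical catalog entry: a table, its description, and column notes."""
    name: str
    description: str
    columns: dict = field(default_factory=dict)  # column name -> description


def _tokens(text: str) -> set:
    """Lowercase word tokens used for a naive keyword match."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_tables(catalog: list, question: str, limit: int = 2) -> list:
    """Retrieve the entries sharing the most words with the question."""
    words = _tokens(question)
    scored = []
    for entry in catalog:
        text = " ".join([entry.name, entry.description,
                         *entry.columns.keys(), *entry.columns.values()])
        score = len(words & _tokens(text))
        if score:
            scored.append((score, entry.name, entry))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [entry for _, _, entry in scored[:limit]]


def render_context(tables: list) -> str:
    """Format the retrieved entries; this layout is one arbitrary choice."""
    lines = []
    for entry in tables:
        lines.append(f"TABLE {entry.name}: {entry.description}")
        for column, note in entry.columns.items():
            lines.append(f"  - {column}: {note}")
    return "\n".join(lines)


def build_prompt(catalog: list, question: str) -> str:
    """Assemble the text-to-SQL prompt from retrieved catalog context."""
    context = render_context(select_tables(catalog, question))
    return f"Schema context:\n{context}\n\nQuestion: {question}\nSQL:"
```

Swapping `select_tables` for a graph traversal, or changing the layout in `render_context`, changes what the model sees without touching the question, which is the sense in which storage, retrieval, and format are each levers on accuracy.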

Investing Challenges in Data + AI

It's really not about the consumption use cases. It was really about governance. And from that perspective, if your killer application on top of your data store is governance, then your data is shaped in a format that adheres to that killer application. And the killer application now is AI. So the question is, could we do something with the existing catalog, or are we going to see some creative destruction here?

Our stance is, just like BI companies introduced the semantic layer as a bundled entity, you need to have a killer application that necessitates that semantic layer as a concept. And this is precisely why there are not that many

semantic-layer-only companies, like Cube or AtScale or something like that. There are not that many of them. There are literally dozens if not hundreds of BI companies, but not that many semantic layer companies, because it's really hard to sell that as a standalone thing.

It's probably going to be hard for a standalone data catalog company to layer the AI part on top of it and expect it to work. The system has to work in tandem, right? And so it does seem like these acquisitions are story-based or optionality-based acquisitions, but it doesn't seem like something that will actually result in

a massive amount of creative destruction here. Yeah, I can see that. This is a meta comment here. As you were explaining this, I find it kind of funny that catalogs went from the least sexy part of the data stack into apparently now the darling child of AI. I mean, for now, right? And I think you think it's going to...

It's going to change in the next few months, especially as we build our platform and go to market. But what about the other way around? So there was Numbers Station. I mean, obviously we have Informatica going into Salesforce. We also have Numbers Station going into Alation. You think this is also, like it's no longer like an option. It's more like a, it seems like

an acquisition that is aimed at doing the very strategic thing and going after AI. I mean, if I were Alation, I'd probably be like, we have a catalog and everyone's talking about how the catalog makes AI better.

We should go after AI because we've solved the hard part, which is building the catalog. Yeah. It feels like a talent grab play, it seems like. Yeah. Does that make sense for... It makes sense for Alation. And maybe for Numbers Station, they thought, okay, actually having access to, first of all, the sales channel going into all these customers, and then having access to all the metadata already mapped out,

could be a really good way for them to accelerate their go-to-market. It makes sense at a high level. And I've been hearing about some companies that are in the market for a seed and then they just get acquired, because companies are just really, really hungry for AI talent. So from that perspective, I mean, Numbers Station, it looks like they have great folks. So it does make sense to me that that would be the play from

the Alation standpoint. I can see that. Oh, there was also, I guess, another set of acquisitions, like our friend Kevin. But also the Datadog acquisitions, right? And this is another consolidation play, but from another contiguous yet orthogonal source, meaning it's not...

Datadog is not a traditional... It's not a data company, but it's contiguous to a data company. I would say it's adjacent to a data company. What do you think? I haven't thought about this one at all, but it's weird. Okay, the proximity makes sense to me. Metaplane and Datadog makes a lot of sense, right? But Eppo... I guess experimentation...

So for people who don't know, Eppo is an experimentation platform that was also acquired by Datadog. But experimentation never seemed like it was part of observability. But I guess it is if you just say observability is whether an app works, period. And then I guess experimentation is whether the app works for users. I see, I see. Yeah, I see. It's kind of like part of the APM

paradigm then. If you squint, you could technically say that, because APM is a way to measure all sorts of things about the product's health, and part of that is experimentation data. But it feels like a lot of mental gymnastics to try to get there. But I mean, I can see it making sense, but I wonder what they're going to do there. Or maybe it's just different SKUs, like we were saying before, right?

Yeah, I mean, this does feel like Databricks entering into the AI space by acquiring MosaicML and also acquiring Neon to go into a contiguous but adjacent space. It does feel like that to me where...

Future Discussion Topics

you know, if you're an observability company, I mean, observability is effectively a data company, right? You're looking at metrics and traces and logs, and you have different ways to cut that space. You have a way to consolidate all the logs into one platform. You have APM, and a part of that is getting other data into their platform and their sales force.

Getting product data is very important. Getting data from the modern data stack, like from the business intelligence stack, makes a lot of sense, I guess. In terms of expanding into... an adjacent field that is not observability in the traditional sense but observability from a semantic sense like it has nothing to do with business continuity but semantically

Business intelligence is, of course, part of observability. If you're thinking about observability over your entire business, you have observability over your metrics and traces and logs from your machine systems, and then you have... Human systems observability, which is business intelligence. Like, okay, like about like my KPIs about revenue and profit and all these like lagging indicators about my business. Isn't that part of observability?

From a semantics sense, I guess it makes sense. So semantically, it makes sense. And then also from a core competency of the business and being able to sell data systems like build. sell data systems from a competent sense. I think it makes sense that Datadog will try to acquire all these companies. Okay, so from that perspective, I guess Salesforce, the thing that is missing from the data stack is really...

the data warehouse or data lake house, they're really missing that piece. Datadog is also, if they're really expanding horizontally, they're missing a data warehouse, especially I think with players like ClickHouse. coming in and taking observability, right? Like they're taking the observability workloads away from Datadog because Datadog is so expensive. So it makes sense for Datadog to try to own a data warehouse. Otherwise they're going to become like this intermediated or

like they're going to be cannibalized by the data warehouses. So they, they probably desperately want a data warehouse and, and, you know, because they can't buy one and like everything that is out in the market is quite expensive. or not played out. So I guess the next best thing is buy a few companies in that space that are connected to the data warehouse.

I see. I see. And that makes a lot of sense to me. I mean, kind of dip your toes in the water of the data space, see if it sticks in some way. And then you can kind of extrapolate out what would happen if you did buy a warehouse company. Let's talk about semantic layers, as it's completely related to all the stuff that we talked about. So I guess just adding a data catalog and then shoving that

together with AI and then expecting something magical to happen, that's pretty naive thinking. We probably also think that taking a semantic layer that was built for BI, putting AI on it, and then just calling it, you know, AI for BI seems quite naive, by the exact same chain of logic as the catalog. But what do you think about this? Like, what do you think is going to happen with

the semantic layer plus AI situation? Yeah, I mean, I think you believe this as well, but I think it has to get rebuilt from the ground up. I think the difficulty is this feels like a P is not NP problem. We like to say P is not NP when something is kind of obvious in hindsight, but actually getting there is quite difficult. I don't know what the solution is, but I think once we see it, we'll be like, oh, great, that makes total sense. Because, like, clearly...

if you use any of these, and we've talked to folks who are using the semantic layer companies' AI tools, it theoretically makes sense, right? Like, you've somehow mapped the space with all the semantic information, and you now have this beautifully guardrailed system that has access to just vetted things. And so it can spit out answers to questions. But something is... it's just not quite flexible.

And this kind of brings me back to the concept of the knowledge graph again, which I don't want to get too highfalutin about, but it feels like when you really know what you're looking for, a semantic layer can perhaps suffice. But most of the time people don't know exactly what they're looking for. Like, most questions, almost by definition, are things where you don't know what you're looking for. And I think when ambiguity is high, the value of traversal along a graph to find...

more loosely connected objects seems like it would be pertinent. But again, practically speaking, I doubt this is something about the underlying data structure. It's not like having a graph database or something will magically solve this. I think it's like a hierarchical semantic structuring problem,

or something along those lines, where you figure out the new primitives that work to map the flexibility correctly. Yeah, hierarchical semantic structuring, I like that. HSS. But how do you think about hierarchical semantic structuring? So it's probably not as naive as,

you know, let's shove in the semantic layer or something like that. And it's probably not as naive as, let's literally just use Neo4j or something, and Neo4j is going to handle everything for you. It's probably not that. Yeah, it's probably something more sophisticated than that. Okay, I think a really simple way, a really simple

question that you can ask that obviously can't be answered directly by a semantic layer is something like, why did this metric drop? To some extent, yes, you can have a reasoning engine take all the different dimensional cuts and all the drivers of metrics, and kind of break out that Cartesian product and search through all of them systematically. But I mean, first of all, the LLM has to think to do that. But secondly,

you now have this level of non-determinism that you're trying to work through with the LLM. And it feels like that semantic layer alone isn't quite the right shape to solve that problem. Whereas you might want something that actually understands... I don't know exactly what it looks like, but we have analysis, right? So, like, deep research came out everywhere, right?
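The systematic search described above, taking every dimensional cut and checking which segments account for a metric's change, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not something the speakers describe building: the function name `rank_drop_drivers`, the row layout with a `period` field, and the `signups` sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rank_drop_drivers(rows, dimensions, metric, max_cut_size=2):
    """Rank segments of every dimensional cut by their change in `metric`.

    Assumes each row is a dict with a "period" key equal to "before" or
    "after" (a hypothetical layout chosen for this sketch).
    """
    results = []
    # Enumerate single dimensions, then pairs, up to max_cut_size.
    for size in range(1, max_cut_size + 1):
        for cut in combinations(dimensions, size):
            totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
            for row in rows:
                key = tuple(row[d] for d in cut)
                totals[key][row["period"]] += row[metric]
            for key, t in totals.items():
                delta = t["after"] - t["before"]
                results.append((delta, dict(zip(cut, key))))
    # Most negative change first: the biggest contributors to the drop.
    results.sort(key=lambda item: item[0])
    return results


# Hypothetical sample data: signups by country and plan for two periods.
rows = [
    {"period": "before", "country": "US", "plan": "free", "signups": 100},
    {"period": "before", "country": "US", "plan": "paid", "signups": 50},
    {"period": "before", "country": "DE", "plan": "free", "signups": 40},
    {"period": "before", "country": "DE", "plan": "paid", "signups": 20},
    {"period": "after", "country": "US", "plan": "free", "signups": 98},
    {"period": "after", "country": "US", "plan": "paid", "signups": 49},
    {"period": "after", "country": "DE", "plan": "free", "signups": 10},
    {"period": "after", "country": "DE", "plan": "paid", "signups": 19},
]

ranked = rank_drop_drivers(rows, ["country", "plan"], "signups")
for delta, segment in ranked[:3]:
    print(delta, segment)
```

With this sample data the three largest negative changes are the `plan=free` cut, the `country=DE` cut, and the combined `country=DE, plan=free` segment. A semantic layer can serve each of those aggregates; deciding to enumerate and rank them is the extra step that, as noted above, the LLM has to think to do.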

But from deep research, we have these parallelized systems where you can go down all these different rabbit holes, come to a bunch of different answers to a specific question, and get to a, like, wisdom-of-the-crowd solution that ends up being extremely accurate. I mean, as far as I can tell, right? Like, deep research has been phenomenal in Perplexity. And the question is, how do you take that

and then apply the same constraints that you have in analytics, maybe around determinism or around accuracy or what have you? And then how do you adjust the semantic layer such that it fits with that paradigm? And I think that's still an unsolved question. I think it's even just an unstated question. I think everybody is just hoping the semantic layer or catalog will solve the AI problem, without realizing that

there are different kinds of questions than the ones that you would just compose dimensions and measures to answer. Like, it's not like everyone's just going to be asking, how many users do we have in the US? People are going to be asking much,

much more ambiguous questions, and I think that's where real value comes from and also where we have no real solution. And now I'm going to get off my soapbox. But is it possible, okay, is it possible that this is a midwit curve and we are overcomplicating things? Oh, absolutely, absolutely. I mean, Joseph and I talk about this frequently, for everyone listening. I think there are just cycles of midwit that we all go through.

Maybe if I just use some metaphors, it kind of helps me figure things out and make some bets here. I feel like the semantic layer is like Legos, right? And yes, you can build a lot of things with Legos.

But sometimes you need building materials. And the question is whether we're solving a problem where you're just trying to give a toy to a kid, and Legos actually are an extremely elegant way to give a kid unfettered access to a creation machine, right? Like, the components that they need to really use their creativity as much as they want. And that's...

Or is it more like we're trying to build houses and all we have are Legos? I don't know where we sit, but I think where on this spectrum we sit will determine whether or not the semantic layer alone is enough. There's a very small point, which is, if we think about using LLMs to retrieve and synthesize information from data, like structured data,

as something like a search problem, we're stuck in this Yahoo-like world where everybody has to semantically model things, data model things correctly, and then suddenly things start to work. It feels like having a bunch of directories everywhere, like Yahoo did, is probably not the right answer to organizing large

sets of information for consumption. That seems like it's not the right way to do things, especially when the sheer query volume increases. So I guess in the old world, if you don't have English-based Q&A, then people have to be reliant on BI people, like data analysts, to make dashboards and things like that for you.

And from that perspective, just semantically modeling a small subset of things that really move the needle, basically just KPI-based questions, that was sufficient, because that's all you could do relying on a human system. If we just thought for a second, just from first principles, if the demand for data insights like that kind of Q&A went up by 10x or 100x or even 1000x, can we possibly rely on this manual

way of mining for insights from your data warehouse? Probably not. And so even from that perspective, you probably need something a little bit more automated in building that kind of semantic knowledge in your organization, and that likely has to be more algorithmic and self-learning than just doing semantic modeling. Yeah, yeah. Like, the access pattern... And that naturally leads to some sort of refactoring of the information architecture and how you...

how you should be constructing things. Yeah, and we'll see. Starting as a semantic layer company or a catalog company may end up being something as damning as being Yahoo and sticking with that pattern, but it might end up just being the scaffolding that you use to build up Google, right? So I won't die on this hill.

But obviously we have our own bet, but it's hard to see through the fog. It's hard to tell. On this point, Joseph, something that we've encountered a lot when talking with investors is that people have a really hard time trying to pick between companies to invest in.

Do you have any thoughts on why this is? I think this kind of relates to investing in the data plus AI space. Yeah, in the data space. Yeah. I mean, if my confusion is any indication, it must be hard for investors too, even as a practitioner trying to build a company.

I mean, it would be hard for us, too, I think. If we were investors, even though we know a lot about the space, it would probably be very, very difficult for us to invest in the winner. Just because, first of all, of the sheer volume of companies out there that are purportedly doing data plus AI. It could be, you know, an AI data engineer or an AI data analyst or something like that. The ideas are easy, but the execution is hard.

The navigation through the idea maze is hard. And also I think it's like a midwit curve idea, which is, it's very, very obviously a good idea. So, you know, folks would just jump in to solve this very obvious problem. But at the same time it is actually an excellent idea, just really hard to execute on. And then, you know, maybe the naysayers are really in the middle part of the midwit curve and overcomplicating

things. And so from that perspective, of course, you have a lot of competitors. I think there are literally like a hundred AI data analysts out there in the market. There's like one popping up every week. But the idea is, how do you cut through the noise, and how do you actually build a sustainable system? And how do you build a system

that is differentiated, right? And from that perspective, of course, you have to get away from this "semantic layer plus AI can solve everything" mentality. Because you can't build a differentiated architecture

and method if you just follow everybody else. And I think this whole data game is not just a function of building things so that they work. It needs to be built in a way that gets the maximal adoption from the community. And I think the adoption part is really hard. So if I were to invest, you want to invest in a team that

will get data folks to adopt you. And from that perspective, you need a certain level of experience and gravitas in the space, and trustworthiness from the space, to work together with your customers to actually figure things out. And I think pretty much 99% of the companies that are coming out there just don't have the trustworthiness that is required to, you know, really

push through the noise. And from that perspective, it is easy to just disregard almost every single company. But if you're not seasoned in the data space as an investor, it might be hard to, you know, overcome all the noise in the space. Sounds like you're describing a team we know well. Yeah. But hey, as you're saying this, it kind of reminds me of the social media battles from, like, 2000, I don't know, where, like,

it wasn't a game of picking the space. Everyone knew social media would be a sticky thing. And MySpace had already validated this. But it wasn't clear that Facebook would be the winner, for example. And there were hundreds, right? I remember signing up for all these different ones because I was like, oh, I want to do the cool kid thing, the avant-garde thing, and not sign up for Facebook and MySpace and do some other thing.

And in that sense, I guess it's like, if you want to make a non-team bet, but a principled bet about the product, you have to make a bet on what you think the eigendifferentiation is going to be for the space as a whole, right?

I guess, okay. So the similarity between the social media companies in, I guess, the early 2000s and the data plus AI companies now is that the only thing that really matters at the end of the day is who gets distribution first, who gets to the network effects first, like who basically builds the standard in the space. And I guess it's business dependent. So for social media you had a bunch of, like, 20-something-year-old kids like

Evan Spiegel or Zuck, or, like, Kevin Systrom or something like that, coming up with these new companies that really spread like wildfire, like the founder of Telegram, etc., building these services. So it really helped to be young and helped to be savvy about this new paradigm. But for the data space, I don't think that it's going to be a very young team that...

gets to distribution first, just because I think there are so many idiosyncratic things in the data space that prohibit you from getting to distribution. And I think a key part of that is probably... I mean, this is very obvious. This is another P versus NP thing. We have realized that it's really hard to sell data products in a PLG fashion, just because

smaller companies don't really have that many data requirements. And from that perspective, there are not that many small companies that need to adopt data systems. And if you think about that, then your traditional, like, young person's game of PLG and viral growth type of situations is generally not going to work. And so you have to devise a way to get the trust of

larger companies to give you their business. And data is a space where there's a lot of sensitivity around proprietary data. So it's really hard to get a company to open up their kimono, so to speak, and plug their data system into your data system. It's just not so frivolous

compared to other systems. Even though, you know, things like the CRM obviously hold incredibly important data, somehow people are okay with using a web-based CRM, which makes no sense to me, versus, oh, here's our data in a database and we can't connect to this system because we have all these security hurdles. And so from that perspective, the rate-limiting step to distribution is not these viral distribution

methods that young people are generally good at. It's probably getting the trust of the customers that's really the rate-limiting step. Of course, that's from a distribution standpoint. And then here, I do think that, having thought about this and then having actively been building this system for the past half a year or so, it does seem like

the research problem of actually figuring out how to do this hierarchical semantic segmentation (is that right? This HSS thing that you just came up with) is actually a really hard problem. So, yeah, you have two problems. One is, you need to know the

prerequisite AI and ML knowledge to actually build this hierarchical semantic segmentation algorithm to actually get to the right data and the right insights. That's number one. And then number two, you have to figure out the distribution so that people trust you in deploying these systems. These two things are going to be pretty hard for a really young team to tackle, so the DNA is going to be different. At the very least,

you're going to need a team that really understands the nuances of machine learning and AI, as well as a team that not only gets that, but is able to get the trust of the data folks required to get the distribution. So it's a two-part problem, which makes it doubly difficult. And from that perspective, we're a little bit bullish

about our own team trying to tackle this problem. So what's our secret sauce, Joseph? Are we going to tell everybody? The secret sauce is, you know, the hundreds of different decisions that you have to make to navigate the idea space. That's very true. Yeah, very true. All right. Well, Joseph, do you have anything else? No, I think that's...

a good place to stop. Next time we'll talk about some other spicy topics like MCP and A2A and orchestrator agents and perhaps multi-agent architecture, et cetera. There are a lot of things to discuss. Things are moving fairly quickly. Lots of fun stuff. Cool. All right, Robert. We'll see you next time. All right. Bye, Joseph.

This transcript was generated by Metacast using AI and may contain inaccuracies.