The Graph Problem Most Developers Don't Know They Have

00:00

If you're interested in graph databases and curious about knowledge graphs and how you can use both within generative AI, this episode is for you. Joining me today, I have two guests: Pascal Brokmeier, head of engineering over at Every Cure, and Niels de Jong, product manager over at Neo4j. We go through the basics. What are graph databases? What are knowledge graphs, and what problems do they solve? You might actually have more graph problems than you think,

00:23

so enjoy. Do you have any experience with RAG solutions or implementing stuff in production with regards to Gen AI? Oh, it's difficult. I mean, that's the reality of it: getting to production is, I think, currently the biggest challenge with Gen AI. Yeah, yeah, I think it's quite easy to set something up. But once you start work, especially, I mean, I'm living in a graph world,

00:47

right. When you're converting documents to a graph, that's something that's quite easy to do for a first demo, but you're going to have to start thinking about chunking. You have to model, you have to do so many things to get to something that's production worthy, and scaling to tens of millions of documents is where the challenge is. What do you mean when you say OK, we're going from a document to a graph? Like visually, I don't know what that means.

01:09

Yeah. So I mean, you can probably speak about this as well, but in my world, when you're building a knowledge graph from unstructured data, you're taking a document and converting it into a knowledge graph, which means that you have to define a model. You take the model, which has your entities in it. For example, say you work in a bank. Your entities would be the bank itself, a client, an account. These will be your nodes in the

01:33

graph. And then you use your large language model to extract these entities and convert that into a knowledge graph. So this is something that is kind of crucial when you want to do graph RAG, which I'm sure we'll discuss later: to build the right knowledge graph. And that's where most people kind of struggle. Maybe you can say something about that, because I know you've had a lot of very ambitious ideas to construct

01:57

knowledge graphs from data. Yeah. So I mean, my team right now, we mostly grab existing knowledge graphs, and they're based on certain ontologies, and we merge all of them together to try and create one very large set. Because then you can still introduce filtering steps between your modeling and the graph itself where you can say, OK, I'm going to slice away stuff. I'm going to take away certain categories, I'm going to take away certain kinds of nodes.

02:26

But you can't add afterwards, right? So we first throw everything together and make sure that it's all harmonized within the ontology. And then we do data experiments. So we say, OK, what if we remove all of the proteins from the knowledge graph? Does that lift our performance? Actually, there's the experiment I told you on Friday was running, where you perturb the edges. So I love how we just go

02:52

straight to the graph. So the question was what happens if you mess with the knowledge graph? Does it actually impact your model performance? Because the hypothesis you want to refute, the thing you really want to avoid, is that you can mess the knowledge graph up completely but the model still performs the same way. If you have that, then it means you're not actually deriving information from the knowledge graph. Your model isn't performing well because of the structure.

03:20

Your model is performing well because it's memorizing answers or because, in our case, it may be latching onto the idea that certain drugs, which we call frequent flyers, just always help. So if you inject someone with adrenaline, that usually resolves most symptoms, because adrenaline just has that impact on the human body. It shocks your system. But that's not necessarily something that helps people long term. And so that could be memorized by the model.

03:48

What we then found out in the experiment is what happens if you mess up 99% of the knowledge graph. So I took the target nodes. You always have two nodes that are connected by an edge, and the edges are directional in our case. So I took the edges that point at that object and I changed them to point at another object of the same class. So I would say drug treats disease, but it would be a different disease and that wouldn't actually be true, right? It's false information most of

04:16

the time. But I wouldn't say drug treats, I don't know, we have food types. We also have a food type category. So you wouldn't say drug treats hamburger, because of course that's silly. And so we perturb the edges within each category: 1%, 20%, 50% and 99%. And what is interesting to see is that the model performance doesn't drop to zero when you perturb 99%; it actually drops by about half. So half of our model's performance is from, at least that's my hypothesis.

04:49

It's from the data. So it's actually picking up signal from the graph. But the other half is not from the data. So yeah, that was just an experiment. Yeah, yeah. So you cut down your knowledge graph and you have a model that uses this as a base for reasoning. Do I understand that correctly? Actually, we open sourced the whole thing last week, so it's all in the giant monorepo. But we take the knowledge graph, and we then create vector embeddings for each node.
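The perturbation experiment described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the function name `perturb_edges`, the triple format, and the toy data are all assumptions. It also samples across all edges at once, whereas the conversation describes perturbing within each category.

```python
import random
from collections import defaultdict


def perturb_edges(edges, node_class, fraction, seed=0):
    """Rewire the target of a fraction of directed edges.

    Each chosen edge keeps its source and relation, but its target is
    replaced by a different node of the same class (e.g. another disease).
    `edges` is a list of (source, relation, target) triples and
    `node_class` maps every node to its class label. Both are
    hypothetical formats chosen for this sketch.
    """
    rng = random.Random(seed)

    # Group nodes by class so replacements stay within the same class.
    by_class = defaultdict(list)
    for node, cls in node_class.items():
        by_class[cls].append(node)

    n_perturb = round(fraction * len(edges))
    chosen = set(rng.sample(range(len(edges)), n_perturb))

    result = []
    for i, (src, rel, tgt) in enumerate(edges):
        if i in chosen:
            candidates = [n for n in by_class[node_class[tgt]] if n != tgt]
            if candidates:  # leave the edge alone if the class has one node
                tgt = rng.choice(candidates)
        result.append((src, rel, tgt))
    return result


# Toy data (made up): drug-treats-disease edges plus an unrelated food node.
node_class = {
    "aspirin": "drug", "ibuprofen": "drug",
    "headache": "disease", "fever": "disease", "flu": "disease",
    "hamburger": "food",
}
edges = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "treats", "fever"),
]
perturbed = perturb_edges(edges, node_class, fraction=0.5)
```

Training the same model on graphs perturbed at increasing fractions and comparing the scores is then what tells you how much of the performance actually depends on the graph structure.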

05:22

And there's a variety of ways you can do embeddings. At the moment we do node2vec because it's the highest performing one. But you really want to encode the neighborhood and kind of the topology around a node in the numbers that describe it. And then you give those numbers to a model, because models, of course, always need a numerical representation. And then you try to predict what the probability is that a certain drug treats a certain disease. So I think it's the class of

05:48

edge prediction essentially. Yeah. You mentioned ontologies as well. Can you lay out what ontologies mean? Do you want to do that? Because I'm not a big ontology fan. Yeah, I mean, this is a very scary topic because people have different definitions for an ontology. But in my world, an ontology is just a model definition, OK? It's just a specification, as he was saying: this drug treats this disease, and this disease is related to that disease.

06:13

So in the world of property graphs, we call it a model. In the RDF world, they like to use the word ontology. I kind of use them interchangeably. Yeah, what's RDF? RDF is a... oh, I will redirect that to you. So there are basically two big philosophies in the graph world on storing graph data. Yeah, one camp says property graphs are the way to go and the other side says RDF is a much better way to go. OK, so it's a way of storing your graph data. Yes, it is.

06:40

Yeah. Gotcha. Yeah. What does it stand for then? Resource Description Framework. OK. Yeah, just a different way of storing it. Yeah, yeah, it's all graphs, right? It's just a different way of storing the data. It's a different way of querying the data. Yeah, yeah. And graph thinking is not something I'm used to. I was thinking before the show:

07:00

Have I seen any graphs in real life? Like, you mentioned the entities, and there I can draw a relation to kind of DDD when it comes to defining your domains within a certain field. So that was kind of similar. I've recently been really structuring my notes and I'm trying to be a Zettelkasten practitioner. And there one of my colleagues said, OK, don't even think about the folder structure. Just make sure that you have references from one document to another when they make sense.

07:27

And then I have a graph visualization. That's the only thing I can think of where I think, OK, this might be one. I don't know if I would call it a knowledge graph or if that's comparable. Would you say that it is? Yeah. Yeah, what you're describing, I have a colleague who calls it the graph problem problem. OK. Which means you don't know that you have a graph problem. That's your problem. OK. And I love that.

07:46

And I think it's very fun because once you start thinking about graphs, you start seeing everything as a graph. OK? Like going to Amazon, buying a laptop, it says other users who bought this laptop also purchased a mouse. Yeah, that's also a graph. OK. You're a node as a person. You bought a product, that's a node, and then there's another person that bought that same product. So you're building this little graph of connected entities and you're doing a recommendation.

08:10

These two nodes should be connected, you and this other person, because you purchased the same product. And then from that other person, again, next step in the graph: what other products did they buy? A mouse. So then you're thinking in terms of connected entities. Do they use a graph under the hood then also to recommend those things? I'm pretty sure. I'm not 100% sure, yeah, but it feels like a very natural graph problem. Interesting. Yeah, yeah, yeah.
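The laptop-and-mouse traversal above can be sketched as a two-hop walk: from you to the products you bought, to the other people who bought them, to whatever else those people bought. This is only an illustration with made-up names and data; nothing here is claimed to reflect how any real retailer implements recommendations.

```python
from collections import Counter, defaultdict

# Hypothetical purchase edges: (person, product).
purchases = [
    ("you", "laptop"),
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("bob", "mouse"),
    ("bob", "keyboard"),
]

# Store the relationship in both directions so it can be walked either way.
bought = defaultdict(set)   # person  -> products
buyers = defaultdict(set)   # product -> persons
for person, product in purchases:
    bought[person].add(product)
    buyers[product].add(person)


def recommend(person):
    """Rank products bought by people who share a purchase with `person`."""
    scores = Counter()
    for product in bought[person]:                        # hop 1: my products
        for other in buyers[product] - {person}:          # hop 2: co-buyers
            for candidate in bought[other] - bought[person]:
                scores[candidate] += 1                    # hop 3: their products
    # Most frequently co-purchased first; ties broken alphabetically.
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


recommendations = recommend("you")
```

With this toy data the mouse ranks ahead of the keyboard, because two co-buyers of the laptop bought a mouse and only one bought a keyboard.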

08:33

So that apparently is a graph problem. Yes. I never think about that, what happens under the hood. But it's very interesting that you go from one thing to an action to then a person to their actions. And you can kind of follow steps in that way. Yeah, you can traverse those paths, and you can use that with regards to Gen AI for reasoning then. Yes, yeah. So my philosophy with Gen AI, and speaking of graphs, right: context is king.

08:59

The more context, the more relevant context you pass to a large language model, the better it performs. And graphs are just extremely good at context because you've got a network of connected things. It's a document attached to entities that are connected to other entities. And you can go very, very deep in that graph. So you have a very rich context of an ontology or a model that you've defined. And that is just why I think graphs and graph RAG are very

09:22

cool for this kind of thing. Interesting. I really wonder how it compares to RAG. In essence, how I've seen it is we use text, and text is interpretable for humans, and then we make embeddings from that and then we do reasoning on that. It's a different way of kind of filtering this context. But the way you explain a graph, if I have an action and then I purchase a product and we have other people that also purchased that product, but they also bought different products, it

09:47

very much narrows the context. It makes it hyper specific. And that might be a good thing. I don't know though. It's, yeah, it's relevant, right? You're defining: these are the relevant things. Yeah. Have any of you experimented with embeddings? You mentioned node2vec is the highest performing one. Well, it's just in our specific situation, right? So we used GraphSAGE before, OK, which was a combination. We first took the node's name and

10:16

description and category. So it's like a certain drug, the name, maybe a certain description from a drug database or something. And you take all of this and you throw it at an embedding model, the classic, well, not the OpenAI one, right, the text embeddings. And then you have a text embedding which describes the node in isolation without its neighborhood.

10:41

And then GraphSAGE. I'm not going to be able to perfectly recall it, but I believe it does a bunch of random walks and then perturbs the base embedding to also encode the neighborhood. And so then this number contains both the information of the node in isolation and of its neighborhood. And the interesting theory is, because you're throwing in a couple of things now, right? You have language models that read the entire Internet.

11:10

So they have very generalized knowledge about the world. And you take that as a baseline. And then you kind of manipulate that vector and make it point at something slightly

11:20

different given the environment. And so then the resulting number is a mix of both the structure of your graph and the general knowledge of the language model that the embedding model was based on. That is nice because in theory it should contain information both from your knowledge graph, which is very codified, as well as just general

11:42

knowledge. But it's also problematic because it might leak information. Because how do we know that we really have a withheld set? You know, you do a training, and then you withhold some of the data, and you check whether your model is able to predict a certain edge. And how do you know the language model that you use for the embedding doesn't have that information stored in its weights? And that kind of leaks through the language model.

12:09

So that's why node2vec is probably more pure, because you use a system that you have entire control of. You start with a vector that is completely randomized, and then with node2vec, purely from the topology of the graph, you get a vector. So the problem now is you're throwing away this magic power from the machines that we have invented in the last couple of years, transformer models, because we're just using node2vec. Yeah.
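To make "purely from the topology" concrete, here is a rough sketch of the random-walk step that node2vec-style methods build on. It is a simplification under stated assumptions: the walks here pick neighbors uniformly, whereas node2vec biases each step with its return and in-out parameters, and the step that turns walks into vectors (a skip-gram model trained on the walks as if they were sentences) is not shown. The function name and toy graph are made up for illustration.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks over a graph.

    `adjacency` maps each node to a list of its neighbors. Only the graph
    structure is used: no names, descriptions, or text embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency.get(walk[-1], [])
                if not neighbors:      # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (made up), written as symmetric adjacency lists.
adjacency = {
    "drug_a": ["disease_x", "protein_p"],
    "disease_x": ["drug_a"],
    "protein_p": ["drug_a", "disease_y"],
    "disease_y": ["protein_p"],
}
walks = random_walks(adjacency, walk_length=5, walks_per_node=3)
```

Nodes that keep showing up in each other's walks end up with similar vectors, which is how the neighborhood gets encoded without any language model in the loop.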

12:38

I mean, for me, bringing Gen AI applications to production, things that provide value, that's already very challenging. And I feel like what we're discussing now with regards to the knowledge graphs, it feels like it's two steps above that even. When would I use that from a use case perspective? I know you have a good one, but was it the first thing you gravitated towards or did you evolve into kind of this way of thinking? Did it just make sense for the

13:01

use case? So in my case, my founders essentially decided that there's unique moment in time where we have now machine learning models at our disposal that can connect all the different information and they wanted to figure out, OK, all the drugs, all the diseases, can we just search for any repurposing opportunity? That is why we use knowledge graphs for this I think was just because at the time that seemed like the highest probability to be able to actually do a

13:34

prediction all versus all. But we're now mixing in a number of other methods, because we realized it's actually just a ranking problem: trying to rank everything. You can rank things in many different ways. But I think when you both were talking about how it's so hard to get things into production, I'm actually very happy that not everything makes it to production, because I have real beef with the complexity of large corporations.

14:05

I sat in a boardroom a couple of years ago and someone mentioned, kind of offhand, a major insurance company with 23,000 IT systems, and everyone seemed to be completely fine with it. That's ridiculous. I mean, how many IT systems do you need to manage? I don't know. Even if you have 100 different forms of insurance, one Postgres database probably could handle all of your customers, all of the insurance contracts that you have, all of your employees.

14:40

Why do you need 20-plus thousand IT systems? And I think the answer is they don't. But now they have them and it's so hard to get rid of them. So what I'm really waiting for is getting to the point, and maybe graphs are the answer there, I don't know, where models are able to just stare at the system as a whole and go: all of these things we can consolidate, so that people have much less chaos to worry about and can instead try to do this one thing that is actually valuable to the

15:15

business. Yeah, I think we're seeing early steps in that. Like, we have some customers at Neo4j where they're trying to consolidate all these different data sources into a massive enterprise knowledge graph or digital twin or whatever you want to call it, which currently is somewhat automated and a little bit manual. But 23,000 systems, that's obviously

15:36

far in the future. But I do think that's a big area and I think we're going to see that. The downside is, if there are 23,000 systems, there will be a lot of people's jobs depending on them. So they're not going to be very eager to participate and get their stuff out there. I followed a training.

15:51

It was about architecture and architecture patterns, from Gregor Hohpe. And he said for every dysfunctional organizational behaviour, there's a vendor, and they'll be happy to take your money to kind of fix that. But the behaviour is still dysfunctional.

16:04

So all of a sudden if you have a process problem and you think, OK, I'm going to have a SaaS solution that's going to solve that, it might not solve it or it might just alleviate it, and it's still contributing to this cost of ownership. I feel like it's very challenging to keep things simple. It's very easy to just buy, acquire, add things and then increase your complexity. But yeah, it has long term consequences. And I'm very much not happy operating in a complex

16:29

landscape. Like, I like to be productive. I like to be effective. I think things should just make sense. And when there's a lot of dysfunctional behaviour and we have a lot of applications or tools to try and solve that, I'm just like, what are we doing here? But still it grows like that. It reminds me of that programming thing, where it's super easy to add 200 lines of code, but to remove 5 lines of code, no one wants to do

16:51

that. That feels like the coding equivalent of what you're just saying. Yeah, yeah, yeah, you get that. Yeah. I still don't know, if I'm in an organization and I have an actual use case, where to go with that. What do I do with my data? Do I actually need a graph database or a knowledge graph? Like you already said, I live in this graph world and I can see graph problems. What problems do you have, other than the Amazon example for example, that you've solved?

17:14

I think there's a lot out there and I'm still being surprised to this day. Unfortunately, I can't tell you about all of that, of course. I mean, there are the very obvious ones like route planning, right? You're taking a route from a station to another station: Dijkstra's algorithm, finding the shortest path. The telecommunications industry, building a network of cables to provide everyone fast Internet. That's a graph. Controlling the

17:41

load, understanding, if one link in my network goes out, what will be impacted? Where do I have to reroute? You can take that idea and copy that over to logistics problems as well, right? There's a boat stuck in the Suez Canal. How do I get around and get my stuff out there? But I think that the really interesting ones are the ones that are not so obviously graph

18:06

related. I think what I was just saying, the Amazon example, is one of them, right? You have people, you have products. You can think of any organization really. You have teams with people in them. The people are working on a tool. They have a certain set of skills or expertise. You can imagine a skill is a node in the graph, a team is a node in the graph. Your team needs a Kotlin expert. We just discussed it.

18:29

You can look at your enterprise graph and see, OK, there's someone working on the other side of the org that's a Kotlin expert. Maybe I can bring that person in. Identify skill gaps in teams. And then you're starting to think, OK, yeah, everything in my organization is a graph. I can combine that in different ways. I can keep expanding that as well. Right. Interesting. Yeah, yeah, I mean, the reasoning part, that for me is

18:52

most interesting, right? Because if I have something like that that depicts, in a graph way, what my organization structure looks like, and indeed, if I have a question or a query like who do I need, or do I know a person that is related to any people in my network, that might be really helpful. Especially nowadays, where getting a job today is more and more difficult, relationships are I think

19:12

something you build upon. I think in consulting, almost any consulting company should have at least a program in place. Whether they try to solve that with graphs is one question, but: how can we bring our best possible person to this specific client? And I think, I mean, I've worked in three consultancies now. It's always very organic and human organized. You've got staffers, and staffers have their network, and then they try and think, maybe they call some partners up or some

19:45

colleagues in different offices. And sure, you've got some keyword based search systems where maybe you're looking for, like you said, a Kotlin expert, right? So then you search for Kotlin experts in your consultancy's internal directory. What if the person didn't label themselves with Kotlin? Because no one likes constantly updating their labels for themselves. It's kind of...

20:10

And the ones that are really good definitely don't need to because they're constantly getting one project after another. So having some form of a representation of what everyone's expertise is and who the most knowledgeable person is that could answer this question. And then probably that person's going to be completely swamped, because if you're the most knowledgeable person about a popular topic, everyone's going to come to you.

20:31

Who are the people around that person that still have capacity and are probably good enough for the job? To do some form of load balancing across talents and also kind of nurture that. That all happens in organizations quite organically, but it's certainly not optimized in the sense of what is a near optimal way of leveraging our resources. Yeah, yeah. I like that use case a lot. For me, the tagging principle that you mentioned really hits home, because we have exactly that.

21:01

We have a specific board on Monday and you're supposed to tag what technologies you're good at, what you are comfortable with. It's going to help sales and it's going to help your future assignment. So if you want a great assignment, then you kind of work to update that part of the tagging. But wouldn't it be nice if... do you guys use Slack? Yeah. If something just constantly watches all the Slack channels and sees you talk about certain topics and others

21:25

appreciating it. And you can read that from how many people respond, how many people react to it, and then pull that all into a graph. I guess you could do a lexical graph where you see every post as a node, and you're attached to the node as an author, so that's your relationship. And then how many people refer

21:41

to that. You can now pull out of the graph structure the fact that you're going to be a central node for a specific topic, and so you could find the experts through that and you don't have to manually label. Yeah. I like what you said about this: you'd never want to get the ultimate expert because they're

21:58

going to be busy. But having a kind of community, again, I'm thinking in graph algorithms, bring a community detection algorithm to find, you know, Pascal's the expert on X, but these are the people that work with him. So by definition, they might also be experts on Kotlin or whatever. I think that's very interesting. And then you start using data that's not super apparent. So not just these tags, but you're starting to look at graph patterns as part of your feature set.
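The expert-finding idea above can be sketched as a tiny lexical graph: posts are nodes, each post has an author and a topic, and reactions are edges from people to posts. Everything here is hypothetical (names, data, function names), and the ranking is a plain count of incoming reactions standing in for a real centrality measure. `people_around` is a one-hop neighborhood lookup, a much cruder stand-in for the community detection algorithm mentioned in the conversation.

```python
from collections import Counter

# Hypothetical lexical graph extracted from chat messages.
authored = {"p1": "ana", "p2": "ana", "p3": "ben", "p4": "cem"}       # post -> author
about = {"p1": "kotlin", "p2": "kotlin", "p3": "graphs", "p4": "kotlin"}  # post -> topic
reactions = [                                                          # (person, post)
    ("cem", "p1"), ("ben", "p1"), ("cem", "p2"),
    ("ana", "p3"), ("ben", "p4"),
]


def rank_experts(topic):
    """Rank authors by reactions received on their posts about `topic`."""
    scores = Counter()
    for person, post in reactions:
        if about[post] == topic and authored[post] != person:
            scores[authored[post]] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def people_around(expert, topic):
    """People who engaged with the expert's posts on `topic` (one hop away)."""
    return sorted({
        person
        for person, post in reactions
        if authored[post] == expert and about[post] == topic and person != expert
    })


kotlin_experts = rank_experts("kotlin")
kotlin_backups = people_around("ana", "kotlin")
```

The second function is the load-balancing angle: if the top-ranked person is swamped, the people who regularly engage with their posts on that topic are reasonable next candidates, and nobody had to maintain a tag.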

22:23

Yeah. Now you get that. I mean, for me, information right now in the landscape that I'm in is distributed, because we don't just use Slack. We have Slack. Some organizations or some parts of the organizations use Teams. We have emails. So already there, the information is scattered, but if you have indeed an automated way of already plugging into Slack, that's incredibly valuable. I know what's stopping you from also plugging in the other data

22:45

sources, I feel like. So this was an interesting one during the classical RAG days, where everyone wanted to stuff things into a vector database. My hypothesis to this day still is that it's because most engineers who were put in charge of a certain project were like: vector database? I didn't even know these things exist. I want to build one. I want to deploy one. So everything was about trying to suck information into the vector representation and then copy stuff.

23:12

And of course, then you've got the outdated problem, you've got the access problem. Because if I take your direct messages between the two of you and I chunk them, embed them, put them in the knowledge graph, how do I now make sure that that result isn't served to someone else, right? So I need to attach metadata who can read it. It's a huge pain, I guess. Now I think the term that's

23:31

emerging is agentic RAG. So you have an agent search through search APIs. So it calls the Slack search API, it calls the Teams API, it calls SharePoint and Google Drive and whatever. And then those systems return the most likely results. And then the agent kind of, which is the same as what a normal human would do, right? You kind of search through all the different systems.

23:54

But I was listening to a presentation from, I think, I don't know if it was the founder of Exa, but there's a company called Exa and they're calling their way of searching neural search, where they ask: can you reinvent search in the age of neural networks? Where yes, they do embed everything. They do that for web-based search. But their argument is that that way they actually outcompete companies like Google, which do keyword based search.

24:26

And so I think it's a really interesting field, to see what is ultimately going to win. Is it going to be a hybrid? Probably, because everything usually ends up being a mix. Is it keyword search? Is it semantic, vector embedding based search? Is it an agent or some kind of system that just calls a bunch of APIs and looks through the different systems? Or, and the Valhalla I guess, is it having this one unified knowledge

24:53

structure of your company? And I know graph companies say this is what you want, but I don't know if we can ever reach that, because it requires you to pull in all of the different IT systems and connect them all to your graph system. Have, in the insurance case, 23,000 ETL pipelines that all translate things into one unified ontology. Sounds like a project that's never going to go live. Yeah, that would be painful.

25:20

Yeah. But again, like what we just said about not getting into production: I think one thing that makes these projects fail is the ontology or the model, right? So imagine you do have the ETL pipelines to suck in these 23,000 systems. If your model is not right, if you're not modeling the right entities, if you're not modeling it in the right way, you're not gonna succeed. But if you have the right model, I think then you can do well.

25:47

And you can do this step by step, right? You can start with 100 systems and then 100 more, but the model is really where you fail or succeed. What ensures you have the right model? Is it starting small and expanding? Or how do you kind of test your assumptions and validate? Yeah, I think so. The first thing when you're making a graph model and you're coming from relational is you throw away your relational model. That's step one. That's step one? Step one, throw it away. I have nothing now.

26:12

And then you... So the thing is, the way graphs work, they don't do joins. There are no joins done at query time. We store the relationships at write time, so you have pointers between elements. So what you typically do is you take a whiteboard, you start drawing, you say, this is my employee, this is a skill, this is a team. You draw that on a board. And almost always you can very easily take that whiteboard model and convert it to a graph

26:42

model. So it's very much closer to the conceptual model. You don't need join tables and all these awkward structures. Interesting. So that's kind of step one. And then I really like what you were doing in your experiment. Just getting back to that, like iterating on the model, seeing what edges we can drop, what edges we can keep. I think that's really interesting, especially in the age of LLM built ontologies and, you know.
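As a rough illustration of "relationships stored at write time, pointers between elements", the whiteboard model of employee, skill, and team can be written down almost literally. This is an in-memory sketch under stated assumptions, not a description of how any particular graph database stores data: the `Node` class, the relationship names `HAS_MEMBER` and `HAS_SKILL`, and the sample people are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                # e.g. "Team", "Employee", "Skill"
    name: str
    out: dict = field(default_factory=dict)   # relationship type -> list of Node

    def relate(self, rel_type, other):
        """Store a direct reference to `other` when the relationship is written."""
        self.out.setdefault(rel_type, []).append(other)


# The whiteboard drawing, transcribed: boxes become nodes, arrows become relate().
kotlin = Node("Skill", "Kotlin")
python_skill = Node("Skill", "Python")
dana = Node("Employee", "Dana")
eli = Node("Employee", "Eli")
platform = Node("Team", "Platform")

platform.relate("HAS_MEMBER", dana)
platform.relate("HAS_MEMBER", eli)
dana.relate("HAS_SKILL", kotlin)
eli.relate("HAS_SKILL", python_skill)
eli.relate("HAS_SKILL", kotlin)


def team_skills(team):
    """Follow stored references team -> members -> skills; no join table needed."""
    return sorted({
        skill.name
        for member in team.out.get("HAS_MEMBER", [])
        for skill in member.out.get("HAS_SKILL", [])
    })


skills = team_skills(platform)
```

In a relational schema the same question would typically go through two join tables (team membership and employee skills) matched up at query time; here the traversal just follows references that were already stored when the relationships were created.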

27:04

I feel like right now, and it's more so now than ever, there's a lot of information that's new and fresh. And we were talking about graph databases and knowledge graphs in the first place. For me, these are all things I knew existed. I've never experimented with them, because I also see there are two types of engineers. One is the engineer that, like you said, thinks vector databases are really cool and up and coming. Let me see if I can put that in production.

27:27

Regardless of the use case. Maybe they just want to play around with technologies. I know I'm not that type of engineer. I'm the engineer that's like, I want to keep things simple and I want to keep things maintainable and rather find a good solution fit. And even if I have many options, I'll try and pick one as fast as possible and then validate my assumptions. And if it doesn't work, I'll

27:44

pivot. And I know for like really good solutions, you probably need both aspects because otherwise one person that's going to look at established solutions will always lag behind when it comes to trailblazing and kind of newer technologies. And the people that experiment, they will get a better feel for what works where and form an opinion on that. And then they can actually apply

28:03

that to production. So you need this balance of, I think, newer tech and experimenting versus actually looking at things and keeping things maintainable and simple. Companies very often have these, maybe not student labs, but a lot of young, career driven individuals joining these labs of larger companies, which is a bit of a ring fenced area where you can experiment a lot. You can try stuff, you can throw things away, because they can let their creativity go wild in

28:32

there. While I guess the more seasoned people that have burned themselves a couple times have gone through the pain of now I have to deal with all this legacy and all this complexity. They're the ones who then act as the stage gate to say, do we take this forward, yes or no. Ideally they're not the grumpy kind that just doesn't want anything. They don't want to be the You don't want the person who just wants to sit through this for another 10 years until they hit retirement.

28:58

But you want someone who's interested in new things, but has burned themselves enough times that they don't want to deploy the 74th Postgres database that just stores 3 tables. But then rather, let's consolidate and clean up. Yeah, yeah, yeah. I'm interested then from your perspective, Niels, looking at a product, and you have kind of a complex product, a graph database. It has interesting users because those are then going to be developers.

29:28

Do you work on ease of use? Do you work on education? Or how do you actually kind of inform users and make sure they know this is the best solution for what problem they have? Yeah, that's a good question. I don't know if graphs are necessarily complex, but it's a different way of thinking. Yeah, I think that's the biggest thing. The modeling is different, the querying is different, the visualization is different. So there's just a a big step

29:47

that you need to take. So we do a lot of education. One thing that is actually really cool is that at Neo4j, we had a query language, Cypher, for a very long time, and because we open sourced that, it was the de facto graph language. And now there is a standard language, GQL, like SQL, that's for graphs. Which is super cool because this is the first standard query language in 40 years. Again, I'm geeking out on this.

30:12

I know, I know, this really maybe sounds boring, but there has never been a standardized query language for NoSQL databases. They just couldn't get to the consensus. But the fact that there's now one for graphs is awesome because you can then pivot between different graph databases.

30:27

It's standard and it's just, I don't know, that just feels really cool to me. And actually, the people that were involved in creating the GQL standard were some of the people working on the original SQL standard 40 years ago. So it's amazing that these people just came back. Some came back from, I don't know what they were doing, retired maybe, but it's just amazing and I think it's.

30:45

Really nice to have them on. Yeah, I think that in terms of education, you know, first of all, if you have a standard language, people will start learning that, and then this other thinking, modelling, visualization, that comes later. But having a good language is really important. Yeah, yeah. But that's not really what you focus on in a product sense, right? Is that more the education side that you focus on?

31:05

Yeah. So my job right now is I'm working on visualization tools, right? So I'm working on a dashboarding tool for graphs. So you can imagine your standard BI tool works really well with tables. Yeah, graphs, not so much. So we're making a tool that has a graph visualization component in it, with ease of use, using then Cypher or GQL. And from that perspective, what I'm focusing on is user

31:30

experience. So my thinking is graphs are for many people difficult, new. I don't know if difficult is the right word, but it's new, it's different. So I feel like we should have a higher standard of user experience compared to established tools because there's already the cognitive jump to move to graphs. If there's a cognitive jump to move to a different user interface as well, or the experience, yeah, I think that's difficult for people.

31:53

So I really care about user experience and I've had hour-long discussions with my team about very minor things like button placement. But I love that. I think it's really nice to have amazing user experience. Yeah, yeah. I've said this many times on the podcast, like if I have to go through a hurdle to try something out, I'm gone. I'll just search for something else. Yeah, if there's a paywall, I'm out. But that's me.

32:17

I think people are impatient. I think it's also, with the age of LLMs, getting even worse. Like people don't have patience anymore. And if you build for the most impatient person, then the more patient people will just also be happy because for them it feels like a breeze. You know they won't struggle. The struggles that they would normally have patience with, they're not there. So it feels amazing to them. And the impatient ones will still be grumpy, but at least they'll stick.

32:41

They won't give up. Yeah. Yeah. Have you seen more adoption with regards to Gen AI? Because we mentioned RAG, we mentioned graph RAG as this way of kind of distilling your context, giving it in a different way to reason about, and it might even outperform some other embeddings or might have higher accuracy. Have you seen improved usage of graph databases in general because of that? Oh yeah, 100%, yeah, yeah. Everyone is moving towards graph RAG, especially right now.

33:06

We've had a lot of people come to us and say, hey guys, we tried RAG, but it's just not working out. OK. That's the pattern that we see. OK. And then you have to cater to that, because they're already frustrated, right? So you now have to be the saviour for them. That's a tough spot to be in. We tried vector databases, it didn't cut it for us. Now we've got one more shot at getting this right before we're going to get our funding cut.

33:31

Now fix it. Yeah. No, your database now has to be the saviour here. It's a tough one. Yeah. I mean, on the upside though, people realize that simple RAG doesn't cut it. So they say, OK, we're OK with investing this time into making a model and getting things set up right to get that boost in accuracy from graph RAG. Yeah, for sure. But things are moving so fast. Like I have had people on half a year ago, a few months ago, maybe a year ago.

33:54

They were like, vector DBs are the thing, and now you're already saying it's not cutting it and we need to go to this graph database to fix it. I don't know, I think vector DBs have a good place, but I think if you really care about accuracy, traceability is something that comes up all the time. People say, I want to know where this answer comes from, how was it derived?

34:11

And again, back to the graph, what are the entities in my graph that were used to compose this answer from this large language model? So that's another thing. And for big companies, the difference between 80% accurate and 90% accurate, or 90% to 95, that's huge. That's huge. Yeah. Gotcha. What I find a bit curious is that all these companies are trying to build this themselves. RAG came about and then every company decided to build

34:38

their own knowledge assistant. And I get the desire. Having a Volkswagen GPT or a Continental GPT or, you know, HSBC GPT, well, it's too many acronyms, but that's very powerful. You want that. But then at the same time, Google dominates search, because all of this is a search problem. Microsoft tried to come in and it took them ages to get a decent search engine going. It's very, very hard to build a good search engine.

35:07

So it's curious that now every company under the sun has a team somewhere that is supposed to build their internal knowledge agent. And that's one problem out of many. I mean, they have to get usability right, access management right: what is the information that you can actually have access to? All the ETLs. And then they also just have to, like on the side, solve the retrieval problem, which is a search problem.

35:31

It just seems like I would rather say, OK, can we sit this one out and wait for a year, and instead get all the stars aligned. So what will we need to do? We will need to make sure that our data is accessible, well structured, that we have some form of a grip on what are the main queries that we need answering. And then if and when a start-up from Y Combinator or maybe one of the big tech companies brings something that solves this problem.

36:00

So you can really feel like you can find the information in your entire internal knowledge world the same way as you can just ask ChatGPT for something. When that technology comes about that gets everything right, the right mix of graph database, vector database, keyword search, index lookups, ranking, re-ranking, synthesizing.

36:20

Then you plug that one in. Yeah, I gave this advice two years ago and I would still give it to companies: sit this one out, wait until someone solves and productizes it, and focus on the things you need to actually get right. Data, use cases, what are the main queries. Yeah. Where are the biggest levers of optimization for you as a company? Yeah. I mean, I feel like there's never been as much FOMO as there is right now.

36:50

Yeah, right. And I think I can kind of think of how to explain it. Proofs of concept are quite easy, right? Models are decently accessible. They're getting more and more accessible, giving access to a proof-of-concept data set to prove the value of a certain business case without even looking at cost factors. Because especially in larger organizations, cost doesn't come into play in proofs of concept yet.

37:12

Especially if you look at like say your IT landscape versus using everything that's out there. If you can do everything that's out there in a proof of concept, you can get something decently up and running. And then that's the next step is like, OK, user adoption is like a huge thing. If people have this ingrained way of working and they do that over and over again, it's like their pride. That's what they're there for, That's their job. And you come in with a tool and

37:34

that's like 75% accurate. Now let's see if you're more productive. People will be like, what is this shit? And even though it might be accurate, let's say from a theoretical sense, if the linguistics are not the same, if the tone of voice is not the same, people will be like, it doesn't work for me. They'll throw it in the trash faster than anything else because they also see that it's just another tool.

37:54

And then instead of going from 14 screens, they now have a 15th screen that they sometimes have to do something with and it's just another tool in their tool belt. I feel like that is the most underestimated part is the adoption sense. There might definitely be really good use cases in organizations. And I do agree that taking something off the shelf versus like really looking at the context that you have, leveraging that to make a more

38:16

optimal solution. I think the second one is better, but I also recognise that not a lot of people, not a lot of organizations are equipped to be able to execute to the high standards that it needs to succeed in that way. For me, I feel like that's the bigger challenge. We saw that, at least in consultancy, a lot of organizations have this FOMO fear, like fear of missing out,

38:37

basically. I don't know why I said FOMO fear, but in any case, they're like, we need to do something with AI because our competitors are doing something with AI. And if everyone has that mindset, then indeed it ups this fear of missing out completely. And then people were looking at their own organizations and seeing what's out there. And then chatbots were like the fast-food Gen AI solution that a lot of companies were building themselves.

38:57

And then I asked the same exact question that you did. Why are you building this yourself? Like there are companies and there are start-ups for which this is the problem: how do we distill context from an organization and create tooling for that to have an optimal user or customer journey with regards to chat. And companies think they can do that better themselves. I was like, I don't see this yet, but waiting it out.

39:19

I feel like there's also not really a convincing alternative, which means you need to start experimenting and start small and kind of incrementally grow, which I think is healthy. But then, yeah, you see a lot of failure and hopefully that's good. If there's a culture of failing is OK and we learn, then that's the best. I think there's also, of course, it's not black or white, right? But let me paint 2 extreme cases. And then there's companies all in between.

39:41

You've got companies where, and I think that's probably the vast majority of the companies out there, there's going to be dozens of initiatives going on at the same time. People are running 55, you know, hundreds of projects depending on the size of the company, of course, and that number keeps

39:59

growing. But I think that the hallmark or the marker that I would look for is if you're in a group of managers that are responsible for a department or for a certain region or whatever and you ask all of them, what are the three overarching priorities for us for this year or for this quarter? In a company that is very chaotic, when everyone writes them down individually and then you put them side by side, they will be very misaligned.

40:28

Everyone will kind of be in their own mind, in their own headspace. And then you've got laser-focused companies on the other end of the extreme where you could probably go through all of the different levels of the company and you could say again, write down what are the three top priorities for you? They will be very unified because everyone's aligned, everyone's kind of on the same radar, on the same track. And of course it starts at the top.

40:55

You need someone to just give a laser-sharp vision. This is our objective, this is our vision. This is what this company was made for. If you're over here, then you have such clarity. People are able to now deploy a new project. They can say, OK, we are now going to go after customer support agents or customer support AI bots because it'll drastically improve our customer retention, whatever.

41:21

The companies over here I think are the ones that you and I are familiar with where these projects actually get kicked off. And then it's one project that gets drowned in all the other things that are going on. All the other initiatives, probably they have three half-finished transformation projects still going on. Guilty, having been in a consultancy where you then also have the access to the

41:43

executive level, right. So then there's probably stuff that gets pushed through with, like, you have the rank, right. So this is an executive-sponsored project, so it gets priority treatment. So then others have to wait. That's frustrating. In that chaos, it's very hard to get a project to succeed unless you make it absolutely minimal, absolutely simple. Yeah. But just whipping up a company-wide knowledge agent there, tough one. Yeah.

42:13

It's not good enough. I feel like you need to figure out indeed, what does it contribute to in business outcomes, and if there's no laser focus from an organizational sense, then in some way, shape or form you need to figure that out bottom up, right. This use case that I have, does it actually make sense with regards to cost and benefit, and is there going to be enough time saved for people to actually work to

42:35

adopt it? Like is it going to stand true or do we need to indeed focus on kind of the exact decision and like the highest priority cases and do we focus on that? I also wonder what would happen if you take a lot of the companies and you just said, OK, we're going to have to get rid of 50% of all of our initiatives. That's just an extrinsic requirement. We're going to crank up the evolutionary pressure a little bit. You do not have capacity to do all these things. So 50% has to go.

43:02

That doesn't mean that people have to go. It just means that the multitasking of the organism as a whole has to be toned down. I think that would actually be a very healthy exercise for a lot of companies. Yeah, I'm there with you. But the only reason I'm there with you is because I had an inkling that was the case. And then I was responsible for product and I inherited a road map that had like many tracks in parallel. And I was like, this is impossible.

43:27

We cannot do all of this in parallel. Nothing's going to work. Everything will go slower as a response to that. So then, and I also thought that was my role, was to say we do one thing at a time, we execute, we deliver and we move on to the next. And there will not be many things in parallel. There will be nothing, because I have this ordered list and we go top to bottom and then that's it. I did not have a road map. I didn't have any Gantt charts.

43:50

Just made one Excel sheet, it has ordered rows and it's very easy to follow how we do things: top to bottom, that's it. In its simplest form, and kind of in essence, it's priority, and that's how we executed it. And in the end it was really hard to get through, but it was quite effective. I have this love-hate relationship with Gantt charts. My first full-time job, I worked for a subcontractor for the European Space Agency and we built the Gantt chart that plans out everything that you need to

44:18

bring to the space station. Because it was the, I forgot what the name was, but it's like the management platform for the ISS. The ISS has some pretty hard deadlines because the rocket is going up. If you forgot your screwdriver, you're screwed. You can't go back. That was a good pun.

44:37

I'm kind of proud of that one. And so there was no Gantt software for this kind of use case, because Gantt software is made with this idea that you've got dependencies, you can move them forward and backward, you can make the blocks larger and smaller. But it didn't have this concept of, you have launch windows and you're going to launch something and you have a certain capacity. And so all of this linear optimization had to be custom built.

45:01

So I built Gantt software and I was kind of proud of it. It's super hard to build on the front end. You've got all this state management with React and Redux and whatever. But then you see a lot of project plans and you just see all of these blocks on top of each other. And I often look at these and then I just go, yeah, this project's going to be massively delayed. And then if you get asked, like, how do you know? I don't know, just intuition.

45:32

This is not going to work because we can't possibly do all these things at the same time. People are going to be distracted. This is just modelling what you know. There's just all the stuff that you don't know and your team of five people already has 25 active things going on just in your project management tool, plus all the stuff that you don't even model.

45:48

No way. But then if you distill it down to a Gantt chart where you would really say, OK, we have 3 objectives and that's the most we can parallelize. So when we finish those, then we start with the next thing. Then you don't need a Gantt chart anymore because you always have three things. You can just write them on a list and you cross one off and then a new one makes it on. So then Gantt charts, they come

46:13

from engineering. That makes sense if you build a skyscraper, but they don't make sense in project management in software, in my mind. It sounds like a graph problem. Dependencies. Yeah, exactly. Dependencies between, sorry, broken record. Like, as a final thought, I was wondering what you both think of this because for me, newer technology pops up, right?

46:33

We have, first of all, newer models that just keep popping up. There's tools that are trying to solve organizational dysfunctional behaviour, but also behaviour that like alleviate certain problems that you didn't even know you had. And all of a sudden now with tooling and technology, it might be a problem or it might be a solution that's offered to you. Which means me as an individual or let's take me as a software engineer.

46:55

I have a challenge with regards to my learning journey like you with regards to Gantt charts. I'm not going to learn everything at the same time incrementally. I like to focus and prioritize. But then also from a software engineering hat, I need not just the theory, I need to be able to walk the walk. So I also need to build stuff. And I feel like, especially nowadays, I feel like I need to build more. I don't feel like I'm building

47:14

enough. How do you guys learn or familiarize yourself with new topics with regards to executing and then trying to be kind of an authority on that topic? How do you handle that? Good question. Yeah, so I started being a product manager about a year ago and I was an engineer before, a solution architect doing a lot of hands-on work.

47:36

And what I found myself doing recently is instead of writing an RFP, I just vibe code something, which is really fun because it kind of helps you think about a problem one step ahead. And it also is an opportunity to get in touch with new technologies, right. If you're building something front end, you can try a

47:51

different stack every time. So I think coding something, whether it's vibe coding or actually learning like a language, whatever, just having these tiny problems in your life and thinking about those in a software way could really help, yeah. What have you vibe coded lately? What have I vibe coded recently? Actually something really fun. So we're building a Cypher

48:14

editor. So for querying the graph, and what I wanted to do is I wanted to have intelligent suggestions based on the questions to the user. So, OK, a little bit of context. OK, so I'll keep it short. For the graph, we have a Cypher interface, which is queries. And then we have a natural language interface where you can type normal text and it's translated to a graph query using an LLM. Love that. That's really cool.

48:36

So you don't need to learn the graph language because you can just use natural language. But there's still the problem of looking at a blank screen. It says natural language input right here. What am I going to write? You know, I need to think about what is the question? So what if I could suggest questions automatically by looking at the user's schema, by looking at the type of chart

48:59

they want to create? So I vibe coded this super quickly, this little demo that generates natural language queries that then could also be translated to Neo4j queries. So you kind of have this automated question generator for graphs. Yeah, yeah. Is that going to make it in? Because it's very tangible to go then from this idea that I have, look, here's how it works, to making it even in the product at the end. Let's see.
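The question generator described above can be sketched roughly as follows. This is a minimal, template-based illustration, not the demo from the conversation: the schema format, the `suggest_questions` function, and the chart type names are all assumptions made up for this sketch, and a real version would more likely hand the schema to an LLM.

```python
# Hypothetical schema for a small banking graph (names are illustrative only).
SCHEMA = {
    "nodes": {"Client": ["name"], "Account": ["balance"]},
    "relationships": [("Client", "OWNS", "Account")],
}


def suggest_questions(schema, chart_type):
    """Return natural language question suggestions for a chart type.

    Assumed chart types: "bar" (counts per node), "table" (property
    listings), and "graph" (connections between node labels).
    """
    suggestions = []
    if chart_type == "bar":
        # Counting questions map naturally onto bar charts.
        for start, rel, end in schema["relationships"]:
            suggestions.append(
                f"How many {end} nodes does each {start} have via {rel}?"
            )
    elif chart_type == "table":
        # One listing question per node label, naming its properties.
        for label, props in schema["nodes"].items():
            suggestions.append(f"List every {label} with its {', '.join(props)}.")
    elif chart_type == "graph":
        # Connection questions suit a graph visualization.
        for start, rel, end in schema["relationships"]:
            suggestions.append(f"Which {start} nodes are connected to which {end} nodes?")
    return suggestions


if __name__ == "__main__":
    for question in suggest_questions(SCHEMA, "bar"):
        print(question)
```

Each suggestion could then be passed to the natural language interface, which would translate it into a graph query.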

49:23

Yeah, let's see. I hope it makes it in. I think, I don't know, as an engineer, it's always fun to do these things. But yeah, it does help you think and it helps you reach your limitations. Like OK, maybe I should have shorter questions. Maybe I should look for other questions. I'll check in in two months and then see if it's in the product and try it out. Yeah, what about you, Bosco? What have I coded or how do I learn? How do you learn?

49:49

I think I actually have the inverse problem, so I'm very curious about just trying all sorts of things, but I realize I can't spend my whole day trying things because I'm being paid to run a team and, you know, lead engineering.

50:05

Do the job. So what I want to make sure is I time box my kind of curiosity journeys to a certain amount of time where I say, OK, I allow myself 10 percent, 15% of my time budget, let's say, and then whatever I do after work hours, that sometimes goes out of hand, but you know, whatever. And then within those boxes I just follow whatever I feel like. I think you can say I'm going to be really structured about it and I'm going to do this.

50:35

I'm going to build an agenda of my learning journey and I'm going to do it like in a university, but I hate it. I hate structured learning. I'd much rather go off and let my curiosity be my guide, and then I just hope that I'm curious about things that, on average, other people are also curious about. Statistically speaking, it's likely, right?

50:56

I'm most likely going to be interested in something that other people are also interested in, because it's very unlikely that I'm in the 0.001% of people that only like this one weird little niche corner. So I'm going to try and learn, or I'm going to be interested in a topic that others find interesting. I'm going to learn about it. I'm driven by my intrinsic curiosity, so I'm going to go deeper and I'm going to have more fun with it.

51:17

Time's going to go quickly, and I'll learn more quickly. Yeah, I think with learning you shouldn't overthink it in a way of, this is a job, do your job. And then just learn whatever you find cool. Like right now I find multi-criteria decision analysis and stage-gate modelling really interesting. So I'm trying to think of, can you model a lot of business problems as like a multi-stage gate system with MCDA ranking at each level? I don't know why I find that

51:52

interesting. Just an article somewhere random. Yeah, I think it reminds me of decision theory in university. And I think it's what we do is we rank drugs, right? So we have to figure out which of the 60 million do we want to do. But I saw a podcast you had where you were talking about people asking how do I get like a high paying remote job from the US and how do I get in there? I think another big one in Europe is where do I live?

52:17

I'm in this bubble where everyone constantly talks about which is the best city to live in. Is it London, is it Amsterdam, is it Paris? Should I move to New York? Should I try Asia? So this is a ranking problem. Which is the best city for my unique situation? What is my criteria grid and what is the weight of each criterion? And so what is my personalized ranking for all the options of all cities?
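The city ranking described above can be sketched as a simple weighted-sum model, which is one common way to do multi-criteria decision analysis. The criteria, weights, scores, and the `rank_options` function are all invented for illustration; none of the numbers come from the conversation.

```python
# Hypothetical criteria weights; a higher weight means the criterion matters more.
WEIGHTS = {"job_market": 0.40, "affordability": 0.35, "weather": 0.25}

# Hypothetical 0-10 scores per city (higher is better on every criterion).
SCORES = {
    "London": {"job_market": 9, "affordability": 3, "weather": 4},
    "Amsterdam": {"job_market": 7, "affordability": 5, "weather": 5},
    "Paris": {"job_market": 8, "affordability": 4, "weather": 6},
}


def rank_options(scores, weights):
    """Rank options by weighted sum, best first.

    Weights are normalized by their total, so they do not have to sum to 1.
    """
    total_weight = sum(weights.values())
    ranked = []
    for option, criteria in scores.items():
        value = sum(weights[name] * criteria[name] for name in weights) / total_weight
        ranked.append((option, round(value, 3)))
    # Sort descending by the aggregated score.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for city, value in rank_options(SCORES, WEIGHTS):
        print(city, value)
```

Changing the weights is what personalizes the ranking: two people with the same scores but different priorities can end up with a different top city.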

52:41

So that's a multi criteria decision problem and there's many, many like it. What is the stock that a company should invest in? What is a compound that a pharmaceutical company should put more money into and which should they kill? We have the same problem, so I find the class of problems interesting, and so I'm reading

52:58

up on the literature. If I had to force myself to read that literature because it's on my schedule now, I would not read it, no. I feel like if I ever need a project, or I'm not aware of any problems that I can solve, I just need to have a conversation with you, because I feel like you just have many already. The one where, you know, what's the best city to live in for my use case, I think that's fascinating.

53:20

I don't know if there's a tool out there somewhere, but I feel like that solves the problem that many people have. That's a funny one. I think once you're out of high school or university, you have this, this possibility now to guide your own learning journey. We were all forced to follow an agenda where 9th grade, we have to learn about Napoleon. You don't. Who says every 9th grader in September is interested in Napoleon at that point in time in their life, in their unique journey?

53:47

Doesn't matter. It's on the agenda. General knowledge. Yeah. As an adult now, you get to just do whatever you want. Just make sure you time box it so that you also get to still do the things that other people expect from you. Yeah, yeah, I love that. Cool man. Thanks, guys. This was a lot of fun. I think it kind of went everywhere. I do have a better picture of graph databases, knowledge graphs. I'm definitely going to have

54:09

many conversations, I think. I hope also after the show see if I can build some stuff with the rest of that because I think it's fascinating. It wasn't necessarily the most structured. These are all three topics conversation, but that's what graphs are all about. They're a network, they're messy, they're kind of chaotic. Good stuff, good mess. Then we're going to round it off here. If you're still listening, leave a like if you liked the episode, they're

54:30

free. They only take a second and otherwise we'll see you in the next one.

Transcript source: Provided by creator in RSS feed: download file