KCAA: Inside Analysis with Eric Kavanagh (Sun, 9 Jun, 2024)

00:00

The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis dot com. And now here's your host, Eric Kavanagh. Ladies and gentlemen, hello, and welcome back once again to the Bloor Group's webinar series. I'm very pleased to have a very special guest today with

00:35

us. A former Gartner analyst and all around expert who has been in the field for quite some time, Sumit Pal, is with us today, and we're going to talk about knowledge graphs and the whole concept of a center of excellence. And if you were here in the pre show, I was chit chatting with Sumit about all the different applications of knowledge graphs, in particular within the context of all this generative AI and, beyond that, foundational models. Generative AI is just one flavor

01:00

of artificial intelligence. There are many other forms of AI. But I think, as we've been discussing, by and large, we're going through a major transformation in enterprise software now and in how business gets done, quite frankly, and these AI models are going to subsume most of traditional enterprise software sooner or later. Some of the low hanging fruit is definitely in the customer service space,

01:26

obviously in copywriting and content creation, things of this nature. But you can rest assured that business intelligence, analytics, most of what enterprise software does, is going to find itself in the crosshairs of these foundational models. And here's the good news, folks. Graphs, especially knowledge graphs, are extremely valuable at

01:49

being able to true the wheel of GenAI, if you will. In other words, whether it's part of a RAG architecture, and that's probably most of what you'll see, or some other type of implementation, perhaps fine tuning, knowledge graphs provide a tremendous foundation of the concepts and constructs and key ideas that you're

02:08

trying to manage with your enterprise. So with that, I'm going to hand it over to Sumit Pal. Forgive my long intro there. Knowledge graph center of excellence, take it away, Sumit. Thank you, Eric, thank you for the introduction and for giving us the opportunity to discuss knowledge graphs and a center of excellence around them. Hi, my name is Sumit and I'm delighted to

02:28

have all of you here. My role here at Ontotext straddles marketing, pre sales and solutions engineering, to bring in thought leadership across various industries for adoption of knowledge graph and semantic technologies. So I'd like to welcome you to this buffet, the data buffet. Right. This is a picture from Matt Turck's MAD Landscape twenty twenty four. I'm not sure how many of you have seen it. There are about twenty four hundred logos here on this one

02:55

single chart. It's called MAD for a reason, because it covers machine learning, AI and data. And what we see here is that the data ecosystem today is crowded with shiny objects and dazzling buzzwords, and data ecosystems have sort of become data jungles where data teams are struggling, grappling with the high entropy in this ecosystem, to create in this disaggregated system a functional modern

03:24

data experience. What is happening as a result of this is that data skills have become sort of the most sought after skills, and job descriptions are shifting as quickly as the new tools hit the market. This unbundling of the data ecosystem has led to the problem that there is no one end to end tool, causing data teams and data personas to duct tape different products and frameworks together to build these end to end automated, agile and repeatable

03:55

data driven systems. Let's look at this data maturity spectrum, right. We call it the DIKW pyramid, which we'll show in the next slide. All organizations today are on this data journey, but it turns out that almost every organization is stuck at the information layer. They find it very difficult to cross that chasm from the information layer to the knowledge layer. They have no dearth of data.

04:23

The data is hidden in data lakes, lakehouses, data warehouses. There's lots of data, but it lacks the context to make the connections across these data sources, across these data units, to gain valuable insights. So we have this thing where we say: data, data everywhere, not a drop of insight, nor a drop of context. We have heard about big data, wide data and so

04:47

on, but actually no one talks about the context aspects. And as we get more and more data, the context gets diluted unless it's managed well. So this is the DIKW pyramid, and what we see is most organizations just stop at that information layer. There are some organizations that can make this transition to the knowledge layer through the use of graphs and knowledge graphs. And the critical misstep, I would say, in adopting AI and data

05:23

technologies is the major disconnect between business needs and technology. Organizations rush to embrace technologies without clearly defining the problems they are going to solve. They bring in different technologies in a more bottom up approach, instead of taking a top down approach to solve business use cases that leverage data and the technologies, and hence miss the fact that they are building solutions that

05:53

often don't serve the business. And this is where we come up with this term called the bad data tax. In this race to become data driven, the efforts of most organizations have resulted in a tangled web of data integrations, often point to point integrations, as well as reconciliations across these data silos, and this has resulted in a huge cost to the organization, often forty to sixty percent of an enterprise's annual technology spend, which amounts to millions of

06:29

dollars. We call this the bad data tax, and these investments often don't translate into the insights needed to deliver better decisions or build better processes. Hence there is a solid justification in every organization to fix this, so that those who need access to the data can convert it to insights and drive business decisions and processes, and to make data available and accessible in a format that

07:00

is flexible, accurate, and machine readable. It's also been seen, you know, from my experience at Gartner, when I used to talk to customers and clients all the time, that only half of the CDOs are able to drive innovation using data, and just about forty percent of the CDOs manage data as a business asset. These are some of the numbers that we have

07:27

seen. What also comes to my mind is that about twenty to twenty five percent of the CDOs also do not have a single point of accountability for data within their organizations. There was a survey done by Salesforce a year back which illustrated this point: ninety percent of the data teams agree that the need for trustworthy data is higher than ever, and today it's even more so, because it's again the whole idea of garbage in, garbage out. If

08:03

you're feeding garbage to your AI technology, you're going to get garbage out. So this mind map sort of summarizes what we discussed in terms of the challenges that most organizations are facing. One of the biggest challenges, the one you'll see right at the bottom, is findability. Most data personas in most organizations cannot link the data, cannot find the data

08:28

they need, because of lack of context. And McKinsey and IDC did research a few years back where they found that most data personas in organizations spend about thirty percent of their time just finding the right data for their use case. So we ask ourselves these questions, right: why do most data lake efforts fail, why is data getting increasingly harder to find, and why are existing data catalogs not working for most enterprises? The reason is that to be effective,

09:01

to work with data, it's not just technology and more data. It requires context and semantics to make the data more powerful. What has happened is that we have allowed the data to lose consistency and precision of meaning, because most organizations haven't thought about context and semantics as they build their enterprise data platforms. So data

09:30

by itself is powerful, but the challenge is again the context. As data has grown in volumes, the need for automated context has grown as well, and that's why organizations cannot aspire to or cannot dream to become data or AI driven unless, in my mind, they are context driven and data context here

09:54

includes business and technical metadata, governance, privacy, and accessibility issues. It is context that makes data more valuable, okay, and in my opinion, data engineering will still remain a huge cost center for most organizations until it matures from a pure ETL oriented or ELT oriented approach to an ECL oriented approach, where C is the context, by leveraging knowledge graphs and ontologies for knowledge management.

10:35

So what is a knowledge graph? A knowledge graph is sort of a network of entities representing real world domain objects like people and organizations, as well as concepts and topics, and their semantic relationships and attributes. Knowledge graphs, with their emphasis on semantic relations between the entities, create the context

11:00

for both humans and machines to do automated reasoning. Knowledge graphs go beyond simple storage and querying of data, and focus more on the definitions of the connections between the entities. As you will see, this requires connecting the dots across the organization, which is where most organizations struggle, and knowledge graphs help to build this foundational semantic graph layer by semantically linking

11:31

the data across the silos, whether it's structured, unstructured, or semi structured, and that reduces or eliminates bottlenecks in the process of becoming data driven. What makes the knowledge graph useful and powerful is the semantic model, the ontologies and taxonomies that include domain concepts and their inter relationships, their hierarchies, their dependencies. It is this semantics that actually enriches data with the context that both machines

12:05

and humans can interpret unambiguously. So knowledge graphs actually give a holistic view of the data, revealing these intricate hierarchies within the data and the precise definitions that give meaning to the data. As I said earlier, data by itself is powerful, but context is what gives meaning and real value to the data. You know, real quick, Sumit, we got a question from the audience, and it's a very good question about data literacy, it seems

12:39

to me, and you've already mentioned data catalogs. Obviously, data catalogs are great for improving data literacy because the whole point is to capture the definitions and the meanings of these concepts and to share them in a useful, accessible way. But it seems to me a knowledge graph is an incredibly powerful tool for

12:56

improving data literacy. What do you think? Absolutely, absolutely, that's a very good question and very good point that knowledge graph has embedded in it that semantics and the context, as I keep saying again and again, which are again very domain specific. A lot of the times, the data literacy aspect is understanding the data and understanding the nuances associated with the data, which is oftentimes embedded in the heads of domain specialists, data stewards, even data engineers.

13:30

They are encoding all those business rules in their SQL code, and that sort of, you know, decouples the data from the logic, whereas in a knowledge graph everything is connected. The data and the metadata are in the same place. That provides enormous benefits from a data literacy perspective. Spot on. Okay, go ahead. So the knowledge graph platform adds semantics and meaning to the data, as we said, but

14:03

how does it do it, right? It treats the connections between the different entities, the relationships, as first class citizens. A knowledge graph, or any graph based technology, uses nodes, edges, and labels to depict these entities, their inter relationships and their properties. It's the semantic layer that comes out of this that contextualizes the data, giving it a formal representation and meaning, making

14:31

it machine interpretable. The other advantage of knowledge graphs, especially those built with the RDF stack, is that they follow open standards. Everything in a knowledge graph that is built with an RDF stack is thereby reusable, it's interoperable, and it's very

14:52

amenable to data sharing with unambiguous semantics. The other aspect of graphs is that they can very easily accommodate change. Enterprise data systems are always changing, and graphs provide a flexible schema to make adjustments and changes. This shared meaning resolves a lot of the ambiguities

15:18

associated with data, especially when you're building, let's say, a simple data pipeline. As you're handing off data or bringing in data from operational systems to analytical systems, a sort of impedance mismatch happens, where a term can mean one thing in the operational system while it means something else in the analytics system. That's where knowledge graphs, with their ideas of semantics, with ontologies and taxonomies,

15:43

remove these ambiguities. Now, there are different ways in which semantic technologies increase the value of data. First, they help in data integration. We all do data integration. Data integration is probably one of the richest categories of tools you would see in that first slide or the third slide that I showed you. But those are all doing data integration with a lot of code, with a lot of the business logic in the minds of the people who

16:08

are implementing it. But with knowledge graphs, we do what is called semantic data integration, where you are semantically joining the data across different data silos. That's why you will see a lot of the data fabrics are powered by knowledge graphs. The second aspect is data quality, as it captures relationships between things and

16:30

adds context through ontologies, as well as doing inferencing through the relationships. And a side benefit of this is entity resolution, so we don't have to deduplicate nodes in a graph. In a knowledge graph built with an RDF stack, you cannot have duplicates. The system will not allow you to have duplicates. It removes that whole data engineering aspect of, you know, where we do

17:00

deduplication, which is a huge set of work that needs to be done. But when you're ingesting data into a knowledge graph, especially with an RDF stack, it will prevent duplicates from happening. So all this leads to trust, with data validation, lineage and provenance. Out of the box, a knowledge graph built with RDF provides you provenance. Think about the tools that provide you provenance and lineage, and how much work they have to do to provide you the

17:29

whole lineage stack. Knowledge graphs, by contrast, have built in versioning capabilities and capabilities to do provenance. There is a provenance based ontology. It comes to you out of the box. Same thing with what we call the FAIR principles. Data, in our opinion, needs to adhere to these FAIR principles, which is findable, accessible, interoperable, and reusable. And knowledge graphs provide you with all

17:56

these capabilities. So this slide sort of summarizes all that we discussed earlier about the capabilities of knowledge graphs to remove ambiguities, to represent data consistently, and to integrate and unify the data sources. Now, this graph foundation, this

18:15

is what I've been talking about. A graph foundation, with a knowledge graph based on taxonomies and ontologies, enables you to do the things that we have all been doing in a much more seamless, much more cost effective way, where you get these new capabilities out of the box. And how does a knowledge graph do it? It does it with these semantic standards.
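The points above about identifiers and duplicates can be illustrated with a minimal sketch in plain Python. It is not how any particular RDF product is implemented; it only models the idea that an RDF graph is a set of subject, predicate, object statements named by identifiers. The `http://example.org/` namespace and the helper names `iri`, `add` and `objects` are assumptions made up for this illustration.

```python
# A toy model of an RDF-style graph: a set of (subject, predicate, object)
# statements. The namespace and all names below are hypothetical.
EX = "http://example.org/"


def iri(name: str) -> str:
    """Build a globally unique identifier for an entity or relationship."""
    return EX + name


# Using a set mirrors the fact that an RDF graph is a set of triples:
# asserting the identical statement twice leaves the graph unchanged.
graph: set[tuple[str, str, str]] = set()


def add(subject: str, predicate: str, obj: str) -> None:
    graph.add((subject, predicate, obj))


def objects(subject: str, predicate: str) -> list[str]:
    """Return everything the subject is linked to through the predicate."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


add(iri("acme"), iri("type"), iri("Organization"))
add(iri("alice"), iri("worksFor"), iri("acme"))
add(iri("alice"), iri("worksFor"), iri("acme"))  # same statement, ingested again

print(len(graph))                                # number of distinct statements
print(objects(iri("alice"), iri("worksFor")))
```

The set only collapses statements that are identical. If two source systems mint different identifiers for the same real world entity, those identifiers still have to be reconciled, for example by linking them explicitly, so the sketch covers the duplicate statement case rather than entity resolution in general.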

18:44

Semantic standards have been there since the beginning of this century or the end of the last century. They use identifiers to represent entities and concepts, and all of this results in foundational capabilities that are essential to the whole data management practice: data quality with validation capabilities, reuse, governance and lineage. And on the right hand side you

19:15

see the value drivers of all these foundational capabilities. Now, this is the outline of an enterprise knowledge graph platform and how it interfaces, how it interconnects, with the different tools and engines that most organizations have in their legacy systems. It supports two major design patterns here, the semantic knowledge hub and the semantic data fabric. The knowledge hub part of it uses knowledge graphs

19:45

to manage documents and unstructured content. Unstructured content is all around us, and the knowledge hub improves the way documents are found, especially in terms of relevance, precision and accuracy. The data fabric side, the semantic data fabric side, is the pattern that provides better unified access across multiple structured or semi structured data sources, and its objective is to enable querying all of them as if they were a single

20:17

federated database or data source. Both these use cases leverage the semantic metadata, which, as you see in the center, is the conceptual model, which is based on domain specific ontologies and other metadata capabilities. When you do this, it makes data much more discoverable, interpretable, unambiguous and also consistent. And at the bottom we see that to manage these

20:49

platforms we need different engines and capabilities. We include integration with LLMs, with machine learning tools, especially for text analytics, a document store, maybe full text search engines and vector databases, and again integration with other LPG based graph sources

21:08

to do graph analytics. And these are some of the major use cases that Ontotext has been working on in the last twenty years to solve some of these data management problems, along with the high level architectural patterns that support those use cases. Now, some of the next slides are mostly quotes, which I'd quickly

21:34

go through. You can read them once these decks are shared. However, there is one particular example. Even Gartner has been talking in the last five to seven years about the value of graphs and knowledge graphs, as well as the semantics and the metadata needed to become successful with your data management practices. And if you have seen Gartner's twenty twenty four Impact Radar, you see knowledge graphs at the center here, where they're talking about knowledge graphs with

22:03

the metadata aspects, and how knowledge graphs can help with GenAI and LLMs. One particular example I'd like you to look at, and this is available even on YouTube if you search: at the Knowledge Graph Conference in twenty twenty two or twenty three, Gregor womb, he's the head of Data Architecture at UBS, spoke about implementing their next

22:26

generation data management based on this foundational graph layer, the knowledge graph layer. They're doing all the things that I highlighted in the earlier slides, building common data models and ontologies with a unified meaning that could be shared across the organization. They built this with schema dot org, with all the shared models and schemas standardized across the organization. They built a data service to enrich their

22:53

data by converting it into a knowledge graph based on these shared ontologies. They cataloged all their data assets to build this conceptual layer, and mapped data to this layer to power their downstream applications, analytics and data products. So, before I finish, just the last two slides: what is this graph center of excellence? Right? How can something like this be implemented in larger organizations?

23:25

Most large organizations already have a few isolated graph projects for specific niche use cases like fraud detection or recommendation, but this data management case is very

23:37

different. We see companies like UBS which have gone down this path, and they have adopted this graph center of excellence with strategic prioritization of graph use cases across the organization and with C suite sponsorship, to start with a single C level executive who champions the strategic vision to build or bring in knowledge graph based approaches to solve the data problems. And finally, these are some of the key takeaways.

24:14

Becoming data driven in the age of AI requires organizations to shift to a more connected and contextualized way of thinking about their data, with graph technologies as the foundational layer for their modern data management requirements, because graph enables semantic data integration, traceability, resolving ambiguity and ambiguous entities, promoting consistency, sharing and reuse, and following the FAIR data

24:48

principles to connect the dots across the organization with this semantic graph layer, using ontologies, taxonomies and controlled vocabularies, which are domain specific models for the organization. That was, in short, a quick run through of the graph approach. That's pretty impressive. We've got a couple of questions, and I know you have to run in a few minutes here, but a couple of quick questions. How does a graph maintain lineage to the source once it's been loaded? Is that in

25:21

the relationships between the entities, or how do you actually preserve lineage? Yeah, so if you have to preserve lineage, then, both, yes. There is a

25:33

specific ontology that can be incorporated into the graph. It's called PROV-O. PROV-O is a well known, publicly available ontology based on the RDF stack that, when incorporated into your knowledge graph, can help you to do what current data lakes are also doing in terms of time travel and versioning, to keep track of how the data is morphing as it moves through your system. Because remember, your data

26:06

pipelines are not getting replaced by knowledge graphs or the graph layer. The graph layer is just the metadata around all your data workflows. This metadata will help you enrich your provenance, lineage, time tracking, versioning, those aspects of it. Yeah, right. I mean, I'm reminded of how we got here with the whole concept of data warehousing. Years ago we realized that you cannot analyze, you cannot run queries on operational systems very effectively.

26:37

So we pulled out key aspects, transactional data, from these systems, but in doing so, stripped out all the context. And what you're suggesting here is that the graph, especially via a center of excellence, can preserve all of that context, which can then be leveraged downstream in any application, whether it's a data warehouse or a BI tool or just some front end that a user has to get access to information about customers or products or whatever. Is that about

27:06

right? Exactly, exactly. Because as we move from one system to another, right, a lot of organizations are copying data, and copying data is bad, right? And as we're moving data, at each of these handshake points there is a lot of information that is getting lost in translation, and the knowledge graph, the graph foundation layer, helps to capture it and interconnect it: what has happened, why it has happened, and what was the reason it happened. Right? So you're spot on, Eric,

27:37

in terms of that kind of an interpretation. Yes. Yeah, we've got a couple more questions here. One was: could you explain the difference between a data vault approach, a data vault modeling approach, and a knowledge graph? That's a pretty long discussion, a data vault versus a knowledge graph. However, please reach out to us. I would say it's an interesting conversation. Reach out to us and we'll be able to explain to you how

28:07

we see it. What we are saying is, yes, knowledge graphs can be built with an RDF data model or an LPG data model. The RDF data model by itself has the semantics and the context associated with it. Now the data vault model, or any of the other data modeling approaches, are more at the logical layer, while the knowledge graph is more at the foundational layer

28:37

or at the implementation layer. Interesting. We've got a bunch of good questions coming in here. I'll just throw a couple more at you with the time we have left. One attendee is asking: how can we ensure the scalability and efficiency of querying and updating large scale dynamic knowledge graphs while maintaining data consistency and minimizing latency? I mean, how do you maintain performance over time? Now, one of the things is, don't think about copying your data

29:11

from your source systems into knowledge graphs. Right. Knowledge graphs are more about the metadata aspect of it. Right, you may be doing graph traversals or querying a graph where you have billions of nodes. A lot of our customers have knowledge graphs where the metadata as well as the instance data runs into millions of nodes and edges. However, your knowledge graph is not your OLAP system, where concurrency and latency are that deadly combination, where you need high concurrency

29:45

and low latency. Right, here you are doing most of the traversals or most of the querying of the knowledge graph for figuring out the dependencies, for figuring out, as maybe one example, how the whole traceability happens, how the lineage happens, as you traverse the graph. But you're not doing aggregations, right, where you are rolling up data or doing a lot of analytical

30:10

type, OLAP type, reporting type queries. Gotcha. There's one last question here as well, and folks, for any unanswered questions, we will pass these on to Sumit. He can get back to you. To be able to design and implement knowledge graphs, do I need to learn graph databases or vector databases? Well, vector databases of course are purpose built really for large language models, and they represent entities like concepts, people, et cetera. They represent them

30:45

as vectors, so as numbers, more or less a formula. Right, so there's always going to be some loss with storing something as a vector, because you're converting it from text or imagery into a mathematical representation. So they're different entities. There is a lot of overlap, but, final thoughts from you, Sumit, go ahead. Yeah, they are totally different things, right.

31:07

Vector databases are mostly there to store embeddings: they take your concepts, your data, every piece of instance data, convert it into an embedding format, store it and retrieve it quickly. Retrieving it quickly is what vector databases do well, because of their support for indices and all that. Now, a lot of the existing databases, whether relational or non relational, are also adding vector capabilities. Right. Graph databases

31:33

are also adding vector capabilities. We at Ontotext integrate with vector databases. Choose your own or bring your own vector database and we'll integrate. That's the way we do GraphRAG, where we build on top of RAG with graphs and store the embeddings in an external vector database. We could add vector capabilities on our own, but right now we don't see a huge need for it. But vector databases and graphs are totally separate things.
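As a rough illustration of why the two are separate things, here is a small sketch in plain Python. It is not Ontotext's GraphRAG implementation and uses no real vector database; the three dimensional embeddings, the entity names and the helper functions `cosine`, `nearest` and `neighbors` are all invented for the example. The vector side answers "what is similar to this?", while the graph side answers "what is this explicitly connected to, and how?".

```python
import math

# Hypothetical, hand-made embeddings. A real system would obtain these from
# an embedding model and keep them in a vector database.
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "bill": [0.8, 0.2, 0.1],
    "employee": [0.0, 0.1, 0.9],
}

# Hypothetical graph edges as (subject, relationship, object) statements.
edges = {
    ("invoice", "issuedTo", "customer"),
    ("customer", "managedBy", "employee"),
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: str, k: int = 1) -> list[str]:
    """Vector-style lookup: the k entries most similar to the query entry."""
    scored = sorted(
        ((cosine(embeddings[query], vec), name)
         for name, vec in embeddings.items() if name != query),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


def neighbors(node: str) -> list[tuple[str, str]]:
    """Graph-style lookup: named relationships leaving the node."""
    return sorted((p, o) for (s, p, o) in edges if s == node)


print(nearest("invoice"))    # similarity: no explanation of why they are close
print(neighbors("invoice"))  # explicit, labeled relationship
```

In a retrieval pipeline along the lines described above, the similarity lookup would typically find candidate entities and the graph lookup would then supply the explicit relationships around them; how the two are combined in any given product is outside what this sketch shows.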

32:00

Yeah, and the last question, I'll answer this and you tell me if I'm wrong. An attendee asks: is it a separate knowledge graph ontology database for each use case you deploy? Not necessarily. I mean, it could be. There are different ontologies that make sense for different chunks of data, basically, but you do want to reuse the ontologies that you deploy, right? Real quick.

32:22

Absolutely, yes. Different use cases can have an overlap of different ontologies, and that is what the knowledge graph platform is meant for: sharing and reusing. But again, if it's a siloed use case, you could definitely use a separate one. Wow. This has been fantastic. Thank you so much for your time. Sumit has to jump to a customer call. Customers always come first, but thank you so much for all these excellent questions, folks. We'll be sure to pass these along to the Ontotext folks. And this is the

32:49

second in a series of three. So I have one more webinar coming up on the bad data tax. And if anyone wants to be on a show like this, email info at dm radio dot biz; that comes right to me. And with that we'll bid you farewell, folks. Thanks so much for your time and attention. Thank you, Sumit, we'll talk to you. Take care. Thank you, Eric, and send me the questions please. I think you will have them in the database somewhere, right? Yeah. Thank you, take care, folks. Bye bye. KCAA.

33:16

The information economy as a rod. The world is teeming with innovation as new business models reinvent every industry industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at inside analysis dot com, insideanalysis dot com And now here's your host, Eric Kavanaugh. Ladies and gentlemen, Hello, and welcome back once again to the Blower Group's webinar series. I'm very pleased to have a very special guest

33:52

today with us. A former Gardner analyst and all around expert has been in the field for quite some time. Suem It Palace with a today and we're going to talk about knowledge graphs and the whole concept of a center of excellence, And if you were here in the pre show, I was chit chatting with assuming about all the different applications of knowledge graphs, in particular within the context of all this generative AI and beyond that foundational models. Generative AI is

34:19

just one flavor of artificial intelligence. There are many other forms of AI. But I think, as we've been discussing, by and large, we're going through a major transformation in enterprise software now and in how business gets done quite frankly, and these AI models are going to subsume most of traditional enterprise software sooner or later. Some of the low hanging fruit is definitely in the customer service space, obviously in copywriting and content creation, things of this nature.

34:50

But you can rest assured that business intelligence, analytics, most of what enterprise software does, is going to find itself in the crosshairs of these foundational models. And here's the good news, folks. Graph especially knowledge graphs, are extremely valuable at being able to true the wheel of GENAI, if you will, In other words, whether it's part of a RAG architecture. And that's probably the most of what you'll see or some other type of implementation, perhaps

35:19

fine tuning. Knowledge graphs provide a tremendous foundation of the concepts and constructs and key ideas that you're trying to manage with your enterprise. So with that, I'm going to hand it over to submit Pal for my long intro there, knowledge Graphs Center of Excellence, Take it away, Submit. Thank you, Eric, thank you for the introduction and giving us the opportunity to discuss about knowledge graphs and center of excellence around it. Hi, my name is Simmith

35:45

and I'm delighted to have all of you here. My role here at Ontotext straddles marketing, pre-sales, and solutions engineering, to bring thought leadership across various industries for the adoption of knowledge graph and semantic technologies. So I'd like to welcome you to this buffet, the data buffet, right? This is a picture from Matt Turck's MAD Landscape twenty twenty-four. I'm not sure how many of you have seen it. There are about twenty-four hundred logos here on

36:14

this one single chart. It's called MAD for a reason, because it covers machine learning, AI, business intelligence, and data. What we see here is that the data ecosystem today is crowded with shiny objects and dazzling buzzwords, and data ecosystems have sort of become data jungles, where data teams are grappling with the high entropy in this ecosystem to create, in this disaggregated system, a functional modern data

36:44

experience. What is happening as a result of this is that data skills have become sort of the most sought-after skills, and job descriptions are shifting as quickly as new tools hit the market. This unbundling of the data ecosystem has led to the problem that there is no one end-to-end tool, causing data teams and data personas to duct-tape different products and frameworks together to build these end-to-end automated, agile, and repeatable data-driven

37:15

systems. Let's look at this data maturity spectrum, right? We call it the DIKW pyramid, which we'll show in the next slide. All organizations today are on this data journey, but it turns out that almost every organization is stuck at the information layer. They find it very difficult to cross that chasm from the information layer to the knowledge layer. They have no dearth of data.

37:40

The data is hidden in data lakes, lakehouses, and data warehouses. There's lots of data, but it lacks the context to make the connections across these data sources, across these data units, to gain valuable insights. So we have this saying: data, data everywhere, not a drop of insight, nor a drop of context. We have heard about big data, wide data, and so

38:07

on, but actually no one talks about the context aspect. And as we get more and more data, the context gets diluted unless it's managed well. So this is the DIKW pyramid. And what we see is that most organizations just stop at that information layer. There are some organizations that can make this transition

38:30

to the knowledge layer through the use of graphs and knowledge graphs. And the critical misstep, I would say, in adopting AI and data technologies is the major disconnect between business needs and technology. Organizations rush to embrace technologies without clearly defining the problems they're going to solve.

38:54

They bring in different technologies in a bottom-up way instead of taking a top-down approach to solve business use cases that leverage data and the technologies, and hence miss the fact that they are building solutions that often don't serve the business. And this is where we come up with this term called

39:16

bad data tax. In this race to become data driven, the efforts of most organizations have resulted in a tangled web of data integrations, often point-to-point integrations, as well as reconciliations across these data silos, and this has resulted in a huge cost to the organization, often forty to sixty

39:40

percent of an enterprise's annual technology spend, which amounts to millions of dollars. We call this the bad data tax, and these investments often don't translate into the insights needed to deliver better decisions or build better processes. Hence there is a solid justification in every organization to fix this, so that those who need access to the data can convert it to insights and drive business decisions and processes, and

40:14

make data available and accessible in the right format, a format that is flexible, accurate, and machine readable. It has also been seen, you know, from my experience at Gartner, when I used to talk to customers and clients all the time, that only half of the CDOs are able to drive innovation using data, and just about forty percent of the CDOs manage data as a business asset. These are some of the numbers that we have seen.

40:46

What also comes to my mind is that about sixty, sorry, about twenty, twenty-five percent of the CDOs do not have a single point of accountability for data within their organizations. This was a survey done by Salesforce a year back, where they illustrated this point: ninety percent of the data teams agree that the need for trustworthy data is higher than ever, and today it's even more so, because it's again the whole idea of garbage in, garbage out. If you're

41:22

feeding garbage to your AI technology, you're going to get garbage out. So this mind map sort of summarizes what we discussed in terms of the challenges that most organizations are facing. One of the biggest challenges you will see,

41:36

the one which is right at the bottom, is about findability. Most data personas in most organizations cannot link the data, cannot find the data, because of lack of context. McKinsey and IDC did research a few years back where they found that most data personas in organizations spend about thirty percent of their time just finding the right data for their use case. So if we ask ourselves these questions, right, why do most

42:08

data lake efforts fail, and why is data getting increasingly harder to find? Why are existing data catalogs not working for most enterprises? The reason is that working effectively with data is not just about technology and more data. It requires context and semantics to make the data more powerful. What has happened is that we have allowed the data to lose consistency and precision of meaning, because most organizations

42:38

haven't thought about context and semantics as they build their enterprise data platforms. So data by itself is powerful, but the challenge is again the context. As data has grown in volume, the need for automated context has grown as well. And that's why organizations cannot aspire or cannot dream to become data or AI driven unless, in my mind, they are context driven. And data context here includes both business and technical metadata, governance, privacy, and accessibility issues. It

43:20

is context that makes data more valuable. Okay, and in my opinion, data engineering will remain a huge cost center for most organizations until it matures from a pure ETL-oriented or ELT-oriented approach to an ECL-oriented approach, where C is the context, by leveraging knowledge graphs and ontologies for knowledge management. So what is a knowledge graph? A knowledge graph is sort of a network of entities representing real-world domain objects like people and organizations, as well

44:02

as concepts, topics, and their semantic relationships and attributes. Knowledge graphs, with their emphasis on semantic relations between the entities, create

44:17

the context for both humans and machines to do automated reasoning. Knowledge graphs go beyond simple storage and querying of data and focus more on the definitions of the connections between the entities. As you will see, this requires connecting the dots across the organization, which is where most organizations struggle, and knowledge graphs help to build this foundational semantic graph layer by semantically

44:50

linking the data across the silos, whether it's structured, unstructured, or semi-structured, and that reduces or eliminates bottlenecks in the process of becoming data driven. What makes the knowledge graph useful and powerful is that semantic model, the ontologies and taxonomies that include domain concepts and their interrelationships, their hierarchies, their dependencies. It is the semantics that actually enrich data with the context

45:22

that both machines and humans can interpret unambiguously. So knowledge graphs actually give a holistic view of the data, revealing these intricate hierarchies within the data and the precise definitions that give meaning to the data. As I said earlier, data by itself is powerful, but context is what gives the meaning and real value to the data. You know, real quick, Sumit, we got a question from the audience, and it's a really good question about data

45:57

literacy, it seems to me. And you've already mentioned data catalogs. Obviously, data catalogs are great for improving data literacy, because the whole point is to capture the definitions and the meanings of these concepts and to share them in a useful, accessible way. But it seems to me a knowledge graph is an

46:14

incredibly powerful tool for improving data literacy. What do you think? Absolutely, absolutely. That's a very good question and a very good point: a knowledge graph has embedded in it the semantics and the context, as I keep saying again and again, which are very domain specific a lot of the time. The data literacy aspect is understanding the data and understanding the nuances associated with the data, which are oftentimes embedded in the heads of domain specialists, right, data

46:46

stewards, even data engineers. They are encoding all those business rules in their SQL code, and that sort of, you know, decouples the data from the logic, whereas in a knowledge graph everything is connected. In a knowledge graph, the data and the metadata are in the same place. That provides enormous benefits from a data literacy perspective. Spot on. Mm-hmm. Okay, go ahead. So the knowledge graph platform adds semantics and meaning to the

47:21

data, as we said, but how does it do it, right? It treats the connections between the different entities, the relationships, as first-class citizens, using, as in any graph-based technology, nodes, edges, and labels to depict these entities, their interrelationships, and their properties. It's the semantic layer that comes out of it that contextualizes the data, giving it meaning, a

47:46

formal representation and meaning, making it machine interpretable. The other advantage of knowledge graphs, especially those built with an RDF stack, is that they follow open standards. Anything in a knowledge graph that is built with an RDF stack is thereby reusable, interoperable, and very amenable to data sharing with unambiguous semantics.
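
As a rough illustration of the nodes, edges, and labels idea described above, here is a minimal Python sketch of subject-predicate-object statements. It is not any real RDF library or product; the `TinyGraph` class and every `ex:` name are hypothetical, made up for this example, and a real RDF stack would use full IRIs and a proper triple store.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TinyGraph:
    """Toy in-memory triple store; a stand-in for a real RDF database."""

    def __init__(self) -> None:
        # A set, so asserting the same statement twice stores it once.
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


g = TinyGraph()
# Nodes are entities; the predicate is the labeled edge between them.
g.add("ex:alice", "rdf:type", "ex:Person")
g.add("ex:acme", "rdf:type", "ex:Organization")
g.add("ex:alice", "ex:worksFor", "ex:acme")
g.add("ex:alice", "ex:worksFor", "ex:acme")  # repeated statement, no new triple

employers = [o for _, _, o in g.match(s="ex:alice", p="ex:worksFor")]
print(employers)
```

Because the relationship `ex:worksFor` is stored as data rather than buried in application code, the same pattern-matching query works for any entity or edge label.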

48:15

Also, the other aspect of graphs is that they can very easily accommodate change, because enterprise data systems are always changing, and graphs provide that flexibility of a flexible schema to make adjustments and changes. This shared meaning resolves a lot of the ambiguities, especially when you're building, let's say, a simple data

48:39

pipeline. As you're handing off data or bringing in data from operational systems to analytical systems, there is sort of an impedance mismatch that happens, where a term can mean one thing in the operational systems while it means something else in the analytics system. That's where knowledge graphs, with their semantics, ontologies, and taxonomies, remove these ambiguities. Now, there are different ways in which

49:07

semantic technologies increase the value of data. First, it helps in data integration. We all do data integration. Data integration is probably one of the richest sets of tools you would see in that first slide or the third slide that I showed you. But those are all doing data integration with a lot of code, with a lot of the business logic in the minds of the

49:27

people who are implementing it. But with knowledge graphs we do what is called semantic data integration, where you are semantically joining the data across different data silos. That's why you will see that a lot of the data fabrics are powered by knowledge graphs. The second aspect is data quality, as it captures relationships between things, adds context through ontologies, and does inferencing through the relationships.
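
To make semantic integration and inferencing a little more concrete, here is a small Python sketch under stated assumptions. The two "silos", their column names, the `ex:` vocabulary, and the tiny class hierarchy are all invented for illustration; a real deployment would rely on an RDF store and an ontology language such as RDFS or OWL rather than a hand-written loop.

```python
# Two hypothetical silos that describe the same customer with different field names.
crm_rows = [{"cust_id": "42", "full_name": "Alice Smith"}]
billing_rows = [{"customer": "42", "segment": "PremiumCustomer"}]


def customer_iri(key: str) -> str:
    """Mint one shared identifier per customer, whatever the source system calls the key."""
    return f"ex:customer/{key}"


triples = set()
for row in crm_rows:
    subject = customer_iri(row["cust_id"])
    triples.add((subject, "ex:name", row["full_name"]))
    triples.add((subject, "rdf:type", "ex:Customer"))
for row in billing_rows:
    subject = customer_iri(row["customer"])
    triples.add((subject, "rdf:type", f"ex:{row['segment']}"))

# A toy ontology fragment: each class points to its parent class.
SUBCLASS_OF = {"ex:PremiumCustomer": "ex:Customer", "ex:Customer": "ex:Party"}


def infer_types(facts, subclass_of):
    """Add rdf:type facts implied by the class hierarchy until nothing new appears."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type" and o in subclass_of:
                implied = (s, "rdf:type", subclass_of[o])
                if implied not in inferred:
                    inferred.add(implied)
                    changed = True
    return inferred


closed = infer_types(triples, SUBCLASS_OF)
types_of_42 = sorted(o for s, p, o in closed
                     if s == "ex:customer/42" and p == "rdf:type")
print(types_of_42)
```

The join across silos happens because both sources resolve to the same identifier, and the extra `ex:Party` type is never stored in either source; it follows from the hierarchy.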

49:57

And a side benefit of this is entity resolution, so we don't have to deduplicate nodes in a graph. In a knowledge graph built with an RDF stack, you cannot have duplicates. The system will not allow you to have duplicates. It removes that whole sort of data engineering aspect where, you know, deduplication is a huge set of work that needs to be done. But with knowledge graphs, when you're ingesting data into a knowledge graph, especially with

50:25

an RDF stack, it will prevent duplicates from happening. So all this leads to trust, with data validation, lineage, and provenance. Out of the box, a knowledge graph built with RDF provides you provenance. Think about the tools that provide you provenance and lineage, and how much work they have to do to provide you the whole lineage stack, while knowledge graphs have built-in versioning capabilities and capabilities to do provenance. There is a provenance-based ontology. It

50:58

comes to you out of the box. Same thing with what we call the FAIR principles. Data, in our opinion, needs to adhere to these FAIR principles, which are findable, accessible, interoperable, and reusable. And knowledge graphs provide you with all these capabilities. So this slide sort of summarizes all that we discussed earlier about the capabilities of knowledge graphs to remove ambiguities, to represent data consistently, and to integrate and unify the data sources. Now, this graph foundation,

51:32

this is what I've been talking about. A graph foundation with a knowledge graph based on taxonomies and ontologies enables you to do these things that we have all been doing in a much more seamless, much more cost-effective way, where you get these new capabilities out of the box. And how does a knowledge graph do it? It does it with these semantic

52:01

standards. Semantic standards have been there since the beginning of this century and the last century, where it uses identifiers to represent the entities and the concepts, and all these result in these foundational capabilities that are very essential to the whole data management practice: data quality with validation, reuse, governance, and lineage. And on the right-hand side you

52:32

see the value drivers of all these foundational capabilities. Now, this is the outline of an enterprise knowledge graph platform and how it interplays and interconnects with the different tools and engines that most organizations have in their legacy systems. It supports two major design patterns here, the semantic knowledge hub and the semantic data fabric. The knowledge hub part of it uses knowledge graphs

53:05

53:36

54:07

platforms we need different engines and capabilities. We include integration with LLMs, with machine learning tools, especially with text analytics, document stores, maybe full-text search engines and vector databases, and again integration with other LPG-based graph sources

54:28

to do graph analytics. And these are some of the major use cases that Ontotext has been working on in the last twenty years to solve some of these data management problems, along with the high-level architectural patterns that support those use cases. Now, some of the next slides are mostly quotes, which I'll quickly go through. You can read them once these decks are shared.

54:58

However, there is one particular example. Even Gartner has been talking in the last five to seven years about the value of graphs and knowledge graphs, as well as the semantics and the metadata needed to become successful with your data management practices. And if you have seen Gartner's twenty twenty-four Impact Radar, you see knowledge graphs at the center here, where they're talking about knowledge graphs

55:21

with the metadata aspects, and how knowledge graphs can help with GenAI and LLMs. One particular example I'd like you to look at, and this is available even on YouTube if you search: at the Knowledge Graph Conference twenty twenty-two or twenty-three, Gregor Wobbe, the head of Data Architecture at UBS, spoke about implementing their next-generation data management based on this foundational graph layer, the knowledge graph layer.

55:51

They're doing all the things that I highlighted in the slides above, building common data models and ontologies with a unified meaning that could be shared across the organization. They built this with schema.org, with all the shared models and schemas standardized across the organization. They built a data service to enrich their data by converting it into a knowledge graph based on these shared

56:16

ontologies. They cataloged all their data assets to build this conceptual layer and mapped data to this layer to power their downstream applications, analytics, and data products. So before I finish, just the last two slides: what is this graph center of excellence, right? How can something like this be implemented in larger organizations? Most large organizations also have a few isolated graph projects for specific niche

56:51

projects and use cases like fraud detection or recommendation. But this data management case is very different. We see companies like UBS which have gone down this path, and they have adopted this graph center of excellence, with strategic prioritization of graph use cases across the organization and with C-suite sponsorship, starting with a single C-level executive who champions the strategic vision to build or bring in knowledge graph based

57:25

approaches to solve the data problems. And finally, these are some of the

57:30

key takeaways. Becoming data driven in the age of AI requires organizations to shift to a more connected and contextualized way of thinking about their data, with graph technologies as the foundational layer for their modern data management requirements, because graph enables semantic data integration, traceability, resolving ambiguity and ambiguous entities, promoting consistency, sharing, and reuse, and following the

58:07

FAIR data principles to connect the dots across the organization with this semantic graph layer, using ontologies, taxonomies, and controlled vocabularies, which are domain-specific models for the organization. That was, in short, a quick run-through of the graph approach. That's pretty impressive. We've got a couple of questions, and I know you have to run in a few minutes here, but a couple of quick questions. How does a graph maintain lineage to the source once it's been loaded?

58:39

Is that in the relationships between the entities, or how do you actually preserve lineage? Yeah, so if you have to preserve lineage, then there is a specific ontology that can be incorporated into the graph. It's

58:55

called PROV-O. PROV-O is a well-known, publicly available ontology based on the RDF stack that, when incorporated into your knowledge graph, can help you do what current data lakes are also doing in terms of time travel and versioning, to keep track of how the data is morphing as it moves through your system. Because remember, your data pipelines are not getting replaced by knowledge graphs

59:28

or the graph layer. The graph layer is just the metadata around all your data workflows.
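
As a hedged sketch of the lineage answer above, the following Python snippet records a few provenance statements and walks them back to the source. The property names `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, and `prov:used` are real terms from the W3C PROV-O vocabulary; the datasets, the pipeline run, and the `lineage` helper are hypothetical, and the plain set of tuples stands in for a real RDF store.

```python
# Provenance facts kept alongside the data, as (subject, predicate, object).
prov_triples = {
    ("ex:report_v2", "prov:wasDerivedFrom", "ex:report_v1"),
    ("ex:report_v1", "prov:wasDerivedFrom", "ex:raw_orders"),
    ("ex:report_v2", "prov:wasGeneratedBy", "ex:nightly_etl_run"),
    ("ex:nightly_etl_run", "prov:used", "ex:report_v1"),
}


def lineage(entity, triples):
    """Follow prov:wasDerivedFrom edges from an entity back to its sources."""
    chain = []
    seen = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop(0)
        for s, p, o in sorted(triples):
            if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                seen.add(o)
                chain.append(o)
                frontier.append(o)
    return chain


upstream = lineage("ex:report_v2", prov_triples)
print(upstream)
```

The pipeline itself is untouched here; the graph only holds metadata about what was derived from what, which is the role the speaker describes for the graph layer.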

Transcript source: Provided by creator in RSS feed