#107 - Data Mesh: Delivering Data-Driven Value at Scale - Zhamak Dehghani

Oct 03, 2022 · 57 min · Ep. 107

Episode description

“If you want to unlock the value of your data by generating data-driven values, and you want to do it reliably and resiliently at scale, then you need to consider data mesh.”

Zhamak Dehghani is the author of the book “Data Mesh”. In this episode, we discussed in depth the data mesh, a concept she founded in 2018 that has since become an industry trend. We started our conversation by discussing the current challenges of working with data, such as the data centralization approach and why current data tools are still inadequate. Zhamak then described data mesh and why organizations should adopt it to generate data-driven value at scale. She then explained the 4 principles of data mesh: domain ownership, data as a product, the self-serve data platform, and federated computational governance.

Listen out for:

  • Career Journey - [00:06:49]
  • Challenges Working with Data - [00:10:19]
  • Centralization of Data - [00:13:53]
  • Why Current Tools Not Adequate - [00:16:00]
  • Data Mesh & Its Drivers - [00:19:32]
  • Principle of Domain Ownership - [00:25:54]
  • Principle of Data as a Product - [00:35:57]
  • Principle of The Self-Serve Data Platform - [00:40:51]
  • Principle of Federated Computational Governance - [00:46:01]
  • 3 Tech Lead Wisdom - [00:52:23]

_____

Zhamak Dehghani’s Bio
Zhamak Dehghani works as the CEO and founder of a stealth tech startup reimagining the future of data developer experience. She founded the concept of Data Mesh in 2018 and has since been implementing the concept and evangelizing it with the wider industry. She is the co-author of Software Architecture: The Hard Parts and the author of Data Mesh.

Zhamak serves on multiple tech advisory boards. She has worked as a technologist for over 24 years and has contributed to multiple patents in distributed computing communications. She is an advocate for the decentralization of all things, including architecture, data, and ultimately power.


Our Sponsors

Mental well-being is a silent pandemic. According to the WHO, depression and anxiety cost the global economy over USD 1 trillion every year. It’s time to make a difference! Learn how to enhance your lives through a master class on mental wellness. Visit founderswellbeing.com/masterclass and enter TLJ20 for a 20% discount.

The iSAQB® Software Architecture Gathering is the international conference highlight for all those working on solution structures in IT projects: primarily software architects, developers, professionals in quality assurance, and also system analysts. A selection of well-known international experts will share their practical knowledge on the most important topics in state-of-the-art software architecture. The conference takes place online from November 14 to 17, 2022, and we have a 15% discount code for you: TLJ_MP_15.

DevTernity 2022 (devternity.com) is the top international software development conference with an emphasis on coding, architecture, and tech leadership skills. The lineup is truly stellar and features many legends of software development like Robert "Uncle Bob" Martin, Kent Beck, Scott Hanselman, Venkat Subramaniam, Kevlin Henney, and many others! The conference takes place online, and we have the 10% discount code for you: AWSM_TLJ.


Like this episode?
Subscribe on your podcast app.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Pledge your support by becoming a patron.
For episode show notes, visit techleadjournal.dev/episodes/107.

Transcript

Mental well-being is a silent pandemic. According to the WHO, depression and anxiety cost the global economy over one trillion US dollars every year. It's time to make a difference. Learn how to enhance your life through a master class on mental wellness from Founders Wellbeing and my good friend, Sunday prowl. Visit founderswellbeing.com/masterclass to enroll and enter TLJ20 for a 20% discount. Be a better version of yourself and make an impact.

The iSAQB Software Architecture Gathering is the international conference highlight for all those working on solution structures in IT projects: primarily software architects, developers, and professionals in quality assurance, but also system analysts who want to communicate better with their developers. A selection of well-known international speakers will share their practical knowledge on the most important topics in state-of-the-art software architecture. The conference takes place online from November 14 to 17, and we have a 15% discount code for you. Enter TLJ_MP_15 for a 15% discount.

If they want to get value from their data, generate data-driven value, and they want to do that by applying analytics and AI in almost every aspect of their business, they need to utilize data for all aspects, all touch points, and all applications inside the company and outside. If they have such a mission, and they want to do that reliably and resiliently, and do that at scale, fast, then they've got to consider data mesh. It's all about really unlocking the value of the data.

Hey everyone, my name is Henry Suryawirawan, and you're listening to the Tech Lead Journal podcast, the show where I'll be bringing you the greatest technical leaders, practitioners, and thought leaders in the industry to discuss their journeys, ideas, and practices that we all can learn and apply to build a highly performing technical team and to make an impact in your personal work. So let's dive into our journal.

Hello again, my friends and my listeners. Welcome to the Tech Lead Journal podcast, the show where you can learn about technical leadership and excellence from my conversations with great thought leaders in the tech industry. You're listening to episode number 107. If this is your first time listening to Tech Lead Journal, subscribe and follow the show on your podcast app and on LinkedIn, Twitter, and Instagram. And if you'd like to support my journey creating this podcast, subscribe as a patron at techleadjournal.dev/patron.

My guest for today's episode is Zhamak Dehghani. Zhamak founded the data mesh concept in 2018 and has since been evangelizing it to the wider industry, including writing her latest book, titled Data Mesh. In this episode, we discussed the data mesh concept in depth, which is starting to become an industry trend nowadays. We started our conversation by discussing the current challenges of working with data, such as the outdated data approach and why the current data tools are still inadequate. Zhamak then described data mesh and why organizations should adopt it to generate data-driven value at scale. Zhamak then explained the four core principles of data mesh, which include domain ownership, data as a product, the self-serve data platform, and federated computational governance.

I really enjoyed my conversation with Zhamak, learning the data mesh concept in depth, which has been something I would love to learn more about, and this episode taught me a lot about it. If you also enjoy listening to this episode, will you help share it with your friends and colleagues who can also benefit from listening to it? My ultimate mission is to spread this podcast to more listeners, and I really appreciate your support in any way towards fulfilling my mission. Before we continue to the conversation, let's hear some words from our sponsors.

DevTernity is the top international software development conference with an emphasis on coding, architecture, and leadership skills. The lineup for this year is truly stellar and features many legends in software development, names such as Robert "Uncle Bob" Martin, Kent Beck, Scott Hanselman, Venkat Subramaniam, Kevlin Henney, Allen Holub, Mary Poppendieck, and many other prominent names, including some of those who have also appeared on this podcast before. The conference takes place online, so you can enjoy it from the comfort of your couch. We spoke to the DevTernity organizers, and I'm happy to share that Tech Lead Journal has got a 10% discount code for you. Enter the promo code AWSM_TLJ when you purchase the ticket on devternity.com. Here's the promo code one more time: AWSM_TLJ. Depending on when you purchase the ticket, the early price may still be available. See you there.

Today's episode is proudly sponsored by Skills Matter, the global community and events platform with more than 100,000 software professionals. Here, members can organize their learning experiences around the technology topics they care about most. You get on-demand access to their latest content and thought leadership insights, as well as an exciting schedule of tech events running across all time zones. So whether DevOps or data science is your buzz, or you are a fan of functional programming or all things cloud, you can make real connections with people who share your interests. Head on over to skillsmatter.com to become part of the tech community that matters most to you.

It's free to join, and you will find it easy to keep up with the latest tech trends.

Hello everyone. Welcome to another new episode of the Tech Lead Journal podcast. Today I'm so excited to meet Zhamak Dehghani. She was most recently the Director of Emerging Technologies at Thoughtworks. She was there for probably around 11 years, I think. I've been following her work since I was working at Thoughtworks as well. She was part of the Technology Radar committee and always came up with all these emerging technologies. And recently, in the last few years, Zhamak came out with this concept called data mesh. I think it was around 2018, if I'm not wrong. Since then, the data mesh concept has taken many people by surprise, and many people rave about it. So today we'll be talking a lot about data mesh, and I'm really looking forward to having this conversation with you, Zhamak.

It's a pleasure to be here, and thank you for that perfect pronunciation of my name.

Oh, I'm surprised by that. Okay, yeah. I would love to know more about your career.

So I always start by asking my guests to share their career journey, or any turning points or highlights in their career. Maybe you can share a little bit about yourself.

Sure. I guess my journey is filled with detours and going to new places led by curiosity. I started as a software engineer. For the first 14 years of my career, I worked in tech R&D companies that were building technology products: distributed systems before cloud, and large-scale distributed systems. Building monitoring and observability from scratch, building steering systems, building databases that basically get signals from critical infrastructure, analyzing those signals, turning them into reports, and so on. The full stack of what a real-world distributed system could look like. And I did that on multiple operating systems: various flavors of Unix, HP Tandem, Windows, and so on. That really gave me a good insight, bottom-up, into the technologies.

I realize that a lot of technologists today maybe start with web development or app development, and they don't get the opportunity to really look inside the kernel, to learn how to do system programming, how to build protocols. I did all of that, which was awesome. And then I took a bit of a detour. I went to hardware for a little bit. I worked for a company that was building various hardware from scratch. We were building the firmware for digital pen systems, and that was interesting as well. Again, a lot of great learnings on how to build embedded firmware. And then, another detour: I came to consulting with Thoughtworks. At Thoughtworks, we worked on large-scale execution. So that led to microservices and, again, building large-scale distributed solutions with microservices and that ecosystem. I did quite a bit of work in that space for quite a while, and I was excited about service mesh and Kubernetes and all of the technologies that really made it possible.

And as you said, about four years ago I started putting my nose into the data space. To my surprise, the world of data was far from the agility, nimbleness, distribution, and decentralization that we had seen in the operational world. It's slow moving. It's based on paradigms around centralization of data and middlemen moving data around. It was a very sad observation, to be honest. So I thought, okay, I could be the little kid who points at the naked emperor. And I don't mind doing that, even if I get attacked. So I started talking about how we could shift the paradigm, what the pain points are, to wake people up. And data mesh really came as a question with some answers, some basic answers. Since then, I've been building it and evangelizing it. And as of two weeks ago, I decided to start a company to build a product, a deep tech kind of developer-facing product, that is going to make it easy to work with data under the mesh principles, to really show the world a different way of doing things and, most importantly, enable developers.

Wow, sounds really exciting. Yeah, I saw your post about quitting your Thoughtworks job. It was surprising to me, but looking at the opportunity of investing more effort in building the tools around data, I think that will definitely be crucial and will help a lot of people. But before we go into this exciting journey, first of all, the topic of this conversation is data mesh. You wrote a book with the same title, Data Mesh, with the subtitle Delivering Data-Driven Value at Scale. So maybe you can share a little bit about what problems you actually saw when you were working with data. You mentioned some old paradigms, centralization, and things like that. What kinds of challenges and problems did you see when working with data during that time? And maybe you can give an overview of how data has always been approached in delivery or in day-to-day development.

Yeah, that's a good way of positioning data mesh.

The problem that I saw was that we had an assumption that to get value from data, value being through BI (business intelligence), through reporting, through training machine learning models, or all sorts of analysis of data, data needs to come, modeled or raw, and get centralized. That was a response to the siloing of data in application databases, which really didn't allow cross-cutting analysis of data across applications. But that centralization, that idea of centralization, had led to a very fragile, very slow-moving architecture and to bottlenecks for really getting value from data. So what are the points of fragility that cause so much waste? Those are data pipelines. To centralize data from applications, we have this concept of ETL or ELT, you know, extraction of data from application databases. I think it's perhaps the biggest violence we can do to an architecture. It's so intrusive, right? Because there's no contract. There is no abstraction. So you have constantly breaking, very task-oriented, job-oriented, labyrinthine pipelines, complex pipelines, moving stuff around and converting it from one thing to another. And that is very fragile and causes a lot of waste.

By the time the data has popped out of the other end of the pipeline, the source has moved on and you've got problems to solve. And then, on the other hand, you have this big bottleneck. The assumption that there is a data team responsible for the data from everywhere, and that they put it in a warehouse or lake, is a flawed assumption. In an organization that needs to move fast and needs to share data more peer-to-peer, they become a bottleneck. The team and the architecture itself become a bottleneck. So you have frustrated data users: they can't find the data they need, they can't access it, they don't trust it, and by the time the data is made available to them, the source has moved on. You have data people in the middle, and honestly, I have all the empathy in the world for them. They have been given this impossible task of getting data from people who have no intention of sharing data and giving it to people when they have no idea how they're going to use it. They're stuck in the middle, wrangling and troubleshooting data with no ready purpose, to be honest. They are under a lot of pressure and they don't perform. And then the application teams really don't have much visibility or even opportunity to use that data. They just put data in their databases and use it for their operational needs, and they're never in the conversation in analytics. So they're isolated from the real possibility of ML embedded into their applications, of becoming analytical data users, because they've been isolated from this data world. So fragility, the long time from data to value, and bottlenecks: those are the major problems that you see with the past paradigm.

Yeah. In my career so far, I've experienced things like, for example, a big giant database, where you have one database like Oracle or SQL Server, right? Then we moved into the paradigm of the data warehouse, where you have another set of tools, so you pump the data from the OLTP database into this analytical database. And then in the last maybe 10 years, we moved to another concept called the data lake, where you probably release raw data, put it there, and from there move it into different data marts, maybe different small data warehouses. And now we move into this cloud model, where we also see cloud technologies helping. I'm much more familiar with BigQuery, and on Amazon maybe you have Redshift.

So if you look at all this history, the one thing about it is, like you mentioned, the centralization. Why do people actually move more towards centralization rather than decentralization?

Yes. It's easy to get started with, right? When your problem space is new or your solution space is new, you require specialization. You've got a new set of tools and new ways of working with data. You can't really push that responsibility into every team across the organization. You have to centralize specialized people. And also it's easier to control and easier to get started with. Like, we know, right? For any startup that starts building an application or solution, we say start with the monolith, find your market, and then break it down, because decentralized distributed systems are inherently complex.

So there has to be a pivotal point where the complexity of your environment increases to the point that the simple centralized solution does not work anymore. And I think data warehouses or lakes or lakehouses, as an architectural paradigm, not so much as an underlying technology, have been suitable for a world where data wasn't ubiquitous. Perhaps we weren't capturing it from every touch point. The data wasn't being used in every single application domain, and having a centralized team and a centralized technology stack to deal with it was acceptable. But we no longer live in that world. We've passed that pivotal point of complexity.

So you mentioned this point about the operational versus analytical divide, where the application team probably just dumps data in the database, and maybe there's another set of data analysts and data engineers trying to get the data from those databases, put it in a central place, and get insights. There are a lot of complexities, definitely, but in the last few years we have also seen a lot of advancement in data technologies. I think I saw this in your book and presentation.

You have this one slide where there are so many different technologies that the logos are too small. Why couldn't those tools still solve this kind of problem? Because it seems like there has been so much advancement.

Yeah, that's a very good question. I mean, the tools are solving these problems, actually. Nevertheless, they are created for an operating model that, at the end of the day, is: pipeline data, transform it, put it into a storage layer with metadata, sprinkle ML on top, and voilà, you get value at the other end. So they have been organized around this very centralized pipeline model at a very macro level, right? If you zoom out and start fitting these tools into an overall big picture, a meta picture, that's what they're organized to do. They're organized to solve ingestion problems. They're organized to solve big, hairy pipeline problems, like Airflow and that sort of technology. They're designed to solve the big data storage or parallel processing problem. So unless you change the operating model and that meta architecture, no matter how well you locally optimize a particular solution, you're going to optimize and solve the really hairy centralized data pipeline. You're still going to have a hairy centralized data pipeline. You get a little bit better at maybe detecting errors or connecting to more sources, but ultimately you're still stuck in the past paradigm, and those fundamental assumptions that need to be invalidated and need to change remain. In fact, people who are deep in the data space and try to use these tools say they struggle and they suffer.

They've got this massive landscape of fragmented technologies that, frankly, work with each other only with difficulty. The people who work with them have to integrate them themselves. The cost of integrating these technologies into a meaningful, scalable solution is very high. I mean, if you look at every vendor on that diagram and go to their connector pages, the business is built around custom proprietary connectors to yet another data source. The distinct lack of standardization is just mind-boggling in this space. It's like the Tower of Babel. In my mind, it just feels like nobody speaks the same language. To a large degree, tools are built to solve a very custom problem, and the foundation of this house is about to fall apart. So you still have a rocky foundation. I know I can be very polarizing as I describe this, because I'm very passionate about it. Let's fix the foundation. Let's rethink our operating model.

Totally makes sense. The way you mentioned the cost of integration, I can imagine that with every data product I see, you will see all these integrations, and the more the better. After you adopt it, there's the investment and a kind of lock-in: you put in so much effort and maybe money to actually move your data, but eventually, if you need to switch, there is also the cost of moving the data out to a different technology. I think that's so painful. I've been in some of these kinds of projects, so I agree with you about that problem. So you invented this data mesh, right? If I may describe data mesh from your book, you mention that data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments, within or across organizations. There are so many interesting topics there, but the first that I picked is that you describe it as a decentralized sociotechnical approach. So tell us more about this.

Sure. It actually started as an architecture, because I am a technologist, so I apply the lens of technology to solve problems. I saw this as really an architecture to organize how we decouple and break down this big problem of how to get value from data. But very quickly I realized, as we know from Conway's law and just real-life experience, that technology and architecture mirror and get influenced by the way we organize our organizations and teams. So very quickly I had to self-correct. No, this is not just a technical solution. No, this is not just an architecture. We've got to rethink the organization of teams, the modes of communication, the contracts for data sharing between the teams, and the roles; data product owner, for example, was a new role that we introduced. So hence it became sociotechnical, as in we try to find excellence in our solutions involving the interaction of people and teams and the technology. Some people ask, is it techno-social or is it sociotechnical? I don't really care which one comes first, as long as they're both involved. Hence the word.

Thanks for sharing that. It seems like Conway's law is really a true principle in many software architectures, right? I think data is probably one of them. And you mentioned that it is an approach to solve complex and large-scale data problems. So does it mean that not everyone will need to go to data mesh from the beginning?

Yeah, at this point in time. I usually answer this question by saying, well, at this point in time, if you don't have the organizational complexity, if your lake or lakehouse or warehouse model is not a bottleneck for you, if the centralized data team is doing a great job and everyone's happy, well, why introduce a concept that's rather complex? It creates a kind of system complexity.

So yes, the short answer is yes, it's not for everyone at this point in time. I mean, in the future, technology may advance, or our thinking, approaches, or processes may advance, in such a way that bootstrapping with data mesh is as easy as, or even easier than, bootstrapping a centralized approach. At that point you could say, well, data mesh is for everyone, because the maturity of the supporting environment has reached that level.

So you have shared all these problems and challenges that you saw before, and you came up with this concept. But for those people who are already in this state of complexity in dealing with their data, whether it's the data architecture, pipelines, and things like that, in an organization that is very large scale, maybe global, where they have all these data challenges, what are some of the reasons you would give them to consider moving to data mesh? Maybe in terms of business value or some kind of impact-driven, value-driven benefits.

Yeah, simply the subtitle of my book: if they want to get value from their data, generate data-driven value, and they want to do that by applying analytics and AI in almost every aspect of their business.

And to do that, they need to utilize data for all aspects, all touch points, and all applications inside the company and outside. If they have such a mission, and they want to do that reliably and resiliently, and do that at scale, fast, then they've got to consider data mesh. It's all about really unlocking the value of the data. So let's give a real-world example. Let's say you are in a particular part of the business. I use this example in my book, a Spotify-like company called Daff Inc.; it's a digital streaming company. And if you have a team whose job is really to create immersive musical experiences, personalized for every moment of every person in the world depending on what they do, that team constantly comes up with new hypotheses on how to use data about music, artists, listeners, and their behavior to create a more immersive experience, more personalized to moments in life. Every one of those hypotheses requires discovery of the data and access to the data.

So are they going to be more successful at discovering and getting access to the data, and even asking people to provide the data if the data is not there, if they were working in a peer-to-peer fashion? Or are they going to be more successful if there was a centralized team in between all parts of the business? As an example, if the playlist team that generates immersive playlists wants to create targeted music for people who are cycling or running, are they going to be more successful if they go and talk directly to the teams that are taking care of partnerships with cycling platforms and say, look, we need to see what people are responding to when they're on that Peloton, I suppose, as an example? Or are they more successful if they go to a middleman, a data broker team, and say, look, I have this hypothesis, so ask all of these other teams, which you now need to put on your centralized backlog and plan, and somehow get me the data that I need? That doesn't scale. So imagine your organization, the emerging missions and the value that can be enabled through data, and see if you have bottlenecks that need to be addressed.

Then if you do, think about data mesh.

You described this use case, and it's very interesting because, yeah, maybe not all organizations are in this state where you have data, you do discovery, maybe shape the next set of data that the application produces, again form a hypothesis, analyze, and then reshape the data over and over, iteratively. And then maybe one day you will come up with new insights and maybe new business lines as well, because the data has transformed so much with the scale of the discovery and the scale of the hypotheses that the team makes. I think that's really a very interesting concept. I haven't experienced it myself because I haven't worked in this kind of organization, but thanks for sharing that. So let's move on to the principles. When I read about the data mesh concept that you share, you always come up with these four principles of data mesh.

And when I read all of them, I find it interesting because you kind of like use other kind of framework or maybe approach from application development and mix it into the approach, dealing with data. So maybe if we can Go briefly one by one, right? The first one is you call it. The principle of two main ownership and this is something like applying domain-driven design to data. Maybe if you can tell us more about this principle. Sure you arrive. Your observation is very

correct. That I wasn't as clever as creative. I basically contextualize the things that I had seen working in 24 years of my career in complex environments in operational systems and say, let's contextualize and let's apply them to the world of dating. As we've seen these principles work before, why shouldn't they work with this bottleneck? If sole purpose? Not only? So, the wind ownership is about, it's the same as what domain

driven design. Meant at the Strategic design level applications, which is you have this smaller business domain oriented teams and groups of people that are collaboratively across, functionally working to solve business problems with technology. So there Ultimately responsible to develop applications and software to enable that business outcome, but they're also responsible for using and sharing data out for analytical

purposes. For application of machine, learning model, again, back to the Show streaming, if you have a team that is working on your player application and their job is to give the best digital experience to a user. That's playing music or playing and liking it recording whatever the interactions are. You are also responsible

Responsible as a teen. Well, augmented for house, but new roles and team members for sharing that data in a way that data can be used to directly by some sort of an analytical work load, and that analytical work of might be a machine learning model, that is being trained by your data and some other data from other domains or it could be the reports that we are producing in terms of errors and anomalies of this application.

So we can improve it over time. The core of it is to break down the responsibility of data sharing around the seams of the organization, so you have this infinitely scalable model. As you introduce new domains, you introduce new data sets that other domains can use and share. And give the responsibility of this sharing

of data for analytics, for this kind of cross-cutting use case, to people that are capable of being responsible for it because they're so close to it. They understand it, they know what the data is about. Don't give that responsibility to someone downstream who actually doesn't know the domain, because it's very hard for them to keep the cognitive load of knowing all the domains in their team's heads. So that's the first principle. The way you describe it,

it sounds intuitive, right? Yeah.

Why not? But we came from the traditional approach where we have a centralized team that has to understand all the domains within the organization, understand the data model and the evolution of all the data, and try to put it all in one central place. But I think your approach here, the principle of domain ownership, means that the domain team itself is responsible for not just the operational data, but also the analytical data part, where

they will probably transform the operational data and be responsible for sharing it for analytical purposes as well. And since they are the experts of the domain, they are probably the best people to come up with that kind of data model. So I think it's really intuitive after you explain it. There's also one concept under this principle, right, where you mentioned

that the data pipeline is now not the responsibility of a central team, but becomes an internal implementation of the domain team itself. So maybe you can describe why this is so important. Sure. Well, if data mesh is really successful the way I had imagined, there will no longer be any data pipelines. I think if you go to the macro level, the macro view of your architecture, you really shouldn't see data pipelines anymore between these domains, between the data products.

That's a concept that we can introduce in the next principle. So if you think about pipelines, what's their purpose? Pipelines are job-oriented, task-oriented computations that take some input data, transform it, and put it on some output sink, and you repeat this task-oriented process until the data gets transformed into a form that somebody can use. Usually they're structured around tasks: we need to extract information, we need to cleanse it, we need to model it.

And that's usually done in between the seams of the data, outside of the source, outside of the destination, somewhere in between. So data mesh completely challenges that concept, because we're no longer working in this task-oriented kind of environment. We're working in this value-oriented, outcome-oriented, data-product-oriented environment. So then the job of the transformation is really an

implementation detail of one of these data products in one domain. It's not something that happens in between. And it really follows a similar principle to one we had in the microservices world. The enterprise service bus was an anti-pattern, right? Because we want to localize logic, computation, and complexity inside the boundary of a contract, abstracted within a service, and the enterprise service bus wasn't doing that.

So we came up with this idea of smart endpoints and dumb pipes. Your pipes are super dumb, they just transfer data, and your endpoints are smart because they implement the logic behind the APIs. That led to a kind of more API-thinking world. It's the same process. So these pipelines are analogous to the enterprise service bus.

And they have the same challenges, so we've got to break them up. Fold the pieces that are relevant into where they should be owned; they can be implemented as pipelines, if you want to, within the implementation, or in other ways. Again, very intuitive, if you compare it with application development, right?
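
As an illustration of the point above (not something shown in the episode), here is a minimal Python sketch of a data product that owns its transformation, while the pipe between products only moves records. The names `DataProduct`, `output_port`, `dumb_pipe`, and the play-event fields are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict  # hypothetical record type: one row of data as a plain dict


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical data product: the transformation lives inside it."""
    name: str
    transform: Callable[[Iterable[Record]], list[Record]]

    def output_port(self, upstream: Iterable[Record]) -> list[Record]:
        # Consumers only see this port; how the data is produced is internal.
        return self.transform(upstream)


def dumb_pipe(records: Iterable[Record]) -> list[Record]:
    """A 'dumb pipe' only moves records; it applies no business logic."""
    return list(records)


def play_events_to_daily_counts(events: Iterable[Record]) -> list[Record]:
    """Domain logic owned by the player team: count plays per day and track."""
    counts = {}
    for e in events:
        key = (e["day"], e["track"])
        counts[key] = counts.get(key, 0) + 1
    return [{"day": d, "track": t, "plays": n} for (d, t), n in sorted(counts.items())]


daily_plays = DataProduct("player.daily_plays", play_events_to_daily_counts)

events = [
    {"day": "2022-10-01", "track": "a"},
    {"day": "2022-10-01", "track": "a"},
    {"day": "2022-10-02", "track": "b"},
]
result = dumb_pipe(daily_plays.output_port(events))
```

The design choice mirrors "smart endpoints and dumb pipes": all domain logic sits behind `output_port`, so replacing `dumb_pipe` with another transport would not change what consumers receive.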

So yes, the ESB has long been an anti-pattern. It's the same concept in data: if you want to connect two different data products, you should not create complex data pipelines. It should just be transferring the data. There is also another concept under this principle that you mentioned: there is now probably no notion of one source of truth for the data anymore. You will have maybe multiple shapes

of the data, multiple copies. Maybe it is in the domain team itself, or maybe it's already copied to another consumer of the data, who will use it for their own use case. So why is this the case? Why no more single source of truth? I thought in the data world people love to have this single source of truth: where is the data that I can trust? Because the single source of truth is not a real thing. It just doesn't exist in the real world. Okay?

So let's unpack that. I don't intend to claim that we are becoming irresponsible about data, that you will have contradictory copies of the data lying around and have those kinds of concerns. That's not the aim. The aim is that we still want to be able to get a consistent view and understanding of the data, but we want to do that in a way that doesn't slow movement, doesn't slow value generation, and doesn't become a stale

source of truth very quickly. We want to do that in a way that supports the chaotic reality of organizations: different teams move at different paces, and they generate different bits and pieces of the properties of the same entity, but those properties come from different sources at different cadences. So we want to embrace that complexity, almost chaos, and yet create a system that gives the same outcome that the single source of truth wants to give,

which is: if I search for information about a customer, even though that information comes from different places, I can understand that a particular snapshot of the customer at a point in time has a set of consistent values.

That's why I challenge this notion of a single source of truth. Data mesh, which people don't always actually read about or understand, has a lot of constraints and disciplines built into it. For example, the data that the data nodes provide is read-only. It never changes. It's temporal, with two time axes: bitemporal.

So at any point in time, data is streamed across different nodes in a way that, as new data arrives upstream, the downstream nodes that are copying it and transforming it into a new shape of the data get notified. They have a responsibility to either react to it and generate a new slice of the data

for that point in time, or not. And then data mesh also provides a means of stitching together the same concepts: a customer that comes from the call center versus a customer that comes from the e-commerce site, the polysemic concept of the customer, can be stitched together by the consumer because there are links between those systems, between those data products. So there is a set of constraints.
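
To make the constraints above concrete (read-only data, two time axes, and consumer-side stitching through shared links), here is a minimal Python sketch. It is an editorial illustration under assumptions, not an implementation from the episode or the book: `Slice`, `DataProductLog`, `as_of`, `stitch`, and the customer fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: a published slice is read-only
class Slice:
    entity_id: str             # shared identifier linking the same customer across products
    values: tuple              # immutable payload, e.g. (("phone", "555-0100"),)
    actual_time: datetime      # when the facts were true in the real world
    processing_time: datetime  # when the data product published this slice


class DataProductLog:
    """Append-only log of slices; nothing is ever updated in place."""

    def __init__(self):
        self._slices = []

    def publish(self, s: Slice) -> None:
        self._slices.append(s)

    def as_of(self, entity_id, actual, processed):
        """Latest slice true at or before `actual` and published by `processed`."""
        candidates = [
            s for s in self._slices
            if s.entity_id == entity_id
            and s.actual_time <= actual
            and s.processing_time <= processed
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda s: (s.actual_time, s.processing_time))


def stitch(entity_id, actual, processed, *logs):
    """Consumer-side stitching: merge the as-of view from several data products."""
    merged = {}
    for log in logs:
        s = log.as_of(entity_id, actual, processed)
        if s is not None:
            merged.update(dict(s.values))
    return merged


t1, t2, t3 = datetime(2022, 1, 1), datetime(2022, 2, 1), datetime(2022, 3, 1)
call_center, online = DataProductLog(), DataProductLog()
call_center.publish(Slice("c1", (("phone", "555-0100"),), t1, t1))
call_center.publish(Slice("c1", (("phone", "555-0199"),), t3, t3))
online.publish(Slice("c1", (("email", "a@example.com"),), t2, t2))

snapshot_feb = stitch("c1", t2, t2, call_center, online)
snapshot_mar = stitch("c1", t3, t3, call_center, online)
```

Because slices are never modified, asking for the February snapshot always returns the same consistent values, even after the call center publishes a newer phone number in March.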

There is a set of operational disciplines, like this polyseme linkage, bitemporality, and immutable data, that still results in the same outcome as a single source of truth. But it's designed for an inherently complex business model and operating model, if that makes sense. And you mentioned it as the most relevant copy, right? So maybe you don't get the latest up-to-date data. It's like the concept of asynchronous or maybe eventual

consistency. Maybe you just need a snapshot of the data at a certain point in time, and the consumer will decide: okay, I just need this kind of data, instead of always getting the latest. Let's move on to the next principle, which is the principle of data as a product. I see that you are applying product thinking to this principle. So tell us more about why it is important to treat data as a product.

Yeah, I think it's a mind shift that needs to happen, where we put data sharing and serving, delighting the experience of people using data, as a first-class concern. It's also an antidote, first of all, to the problems of the first principle. Domain-oriented data ownership, one can imagine, can lead to data siloing: I'm the player domain, I've got the data that I need to improve my application, so why should I care about sharing that and being responsible for consumers, which adds a ton

of work? So data as a product tries to incentivize people to share that data, to be part of an ecosystem that is generating value through exchange and data sharing. And again, it puts some discipline and constraints in place for that to be done effectively. So it defines roles for the people that are responsible for that, defines success metrics for data

as a product, and defines an architectural concern. If we define usability characteristics around discoverability, addressability, all the things that make the experience of the data user really easy, really delightful, those need to be translated into structural, architectural components that are built into this data as a product. So then you have technology that needs to shift and change.

So, yeah, in short, it's an antidote to the problems that arise from the first principle, and it also really focuses on, again, getting value from data. Who gets value from data? Data users do. So let's put them first. I like the way you explain why the concept of a product applies here, because you mentioned that a successful product needs three attributes: it must be feasible, valuable, and usable. And I think traditionally, again, we just treat data this way:

Okay, this is just data, you go and figure it out. So sometimes it's not usable. Sometimes maybe the query language is different, or the database technology is different, because we all have these polyglot database technologies. Or maybe it's so ancient, right, legacy technologies that we don't know how to deal with. So I think if we treat it as a product, we also need to think

about the usability aspect. So I think that is definitely key. When I read about this principle, there are also new roles being created because of this concept of data as a product. You mentioned the data product owner and the data product developer. Not all teams have these roles yet. Tell us more about the importance of these two roles.

Yeah, so if a data product becomes a thing that we are creating, maintaining, operating, evolving, and retiring when it's useless and nobody uses it, then there have to be people and roles to take that responsibility on. And it's very unfair, I think it's impossible, to say to app developers, who are working for very different personas, since they're serving the end user, oh, from now on, all of you also have this

other responsibility. So not only are you serving these end users that are interacting with your player application, pressing buttons, and making sure they have a responsive app and all of those great things, but also for half of your day you have to turn around and face these data analysts and data scientists in different domains who want to use the analytical data that you're generating, and you've got to serve them too. It's

impossible. You end up with two bosses, because you're serving two purposes at the same time. If there are superhumans that can do both jobs, so be it. That's fine. Maybe it is possible to share your time and split it that way. But nevertheless, you need to have a very explicit responsibility and accountability for that part of the job.

If there are people that are pairing, like the data product developer and the app developer, pairing and collaborating closely, two different people playing these roles, that's fine too. Yeah, so this idea of the data product doesn't happen just out of good intentions. We have to allocate space, we have to empower people, and we have to keep people accountable for it. Hence, the roles. Yeah. And also, not to mention the

skill set. They are totally different technologies, different paradigms. I think it's very difficult to find people who can master both application development and data engineering. For data to be usable as a product, you mentioned other usability attributes as well, like discoverability and addressability. It should also be understandable and trustworthy,

and it should be interoperable. Imagine you have multiple consumers that want to access your data, but they each have a set of constraints on how they would integrate with it. I think interoperability is also a concern that the data domain owner should think about. And that's why I think having a data product owner who can define the set of requirements and all these usability concerns is key, and why these

roles exist. So let's move on to the next principle, which is the principle of the self-serve data platform. I think this is also interesting because you apply platform thinking to data mesh. Why should we have a self-serve data platform? Yeah, I think it's an obvious one, but let's go deeper into it and maybe answer: what is this data platform? So, when we think about the role of platforms in general,

platforms are often shared infrastructure on top of which you build domain-specific solutions. So they're often the domain-agnostic infrastructure that empowers other teams to build solutions on top. So they're really horizontal, not really vertical so much. So in an organization that's implementing data mesh, you have these domain teams. They have app dev folks in them, you have data product folks in them, and they're doing their

daily job. Their daily job should be focused on delivering value based on the outcomes of that domain. Their daily job should not be focused on metawork, creating the domain-agnostic pieces of technology that they need. So I think platforms, as a means to empower autonomous teams and lower their cognitive load to do what they need to do more easily, are wonderful. They're necessary. In terms of data mesh, many data platforms exist and

lots of technology is out there. They're built to give basic tools: you want storage? Sure, I will give you storage. I can provision that, right? You want workflow processing? That's fine, I'll give you that. But they're too low-level for developing a data product. So we need a new layer of the platform that really takes the data product, the way data mesh, at least as I defined and envisioned it, treats it as a first-class concern, and

hides away those details of, oh, I need storage, I need this and that, and it gives life to a completely new concept, this new concept of the data quantum, the data product. So the job of that platform, the reason I put it in, was making it feasible for independent domain teams to do data work. And some of the attributes you mention for this data platform are that it should be autonomous, interoperable, and domain-agnostic.

I think one of the challenges when I was working with data-related stuff is also that, yes, you have all these tools like you mentioned, but bootstrapping them takes a long time. You have to set up clusters, you have to install dependencies, and on top of that, you have to write code, you have to understand the source, the sink, and things like that. And then you have to write the pipeline itself and then

deploy it, and things like that. It takes a lot of effort just to come up with a very simple pipeline. And I could imagine having this kind of self-serve data platform, maybe something like a UI console, where you can just log in and click: I want this data from this source, move it to my sink, and then it just creates everything for you. I think that would be a perfect scenario, though maybe some of the tools are not yet catching up.

But hopefully one day we will reach that experience for data engineers, or maybe for business people, so they don't even need to deal with data engineers themselves. So, on this concept of the self-serve data platform, one of the challenges of building it is that you will need to make it agnostic. I think this is probably one of the challenges because we have so many different data technologies.

So how should we think about building this platform so that it becomes agnostic? Because we have so many technologies, right? We have so many different shapes of database technologies. Yeah, I mean, it depends. The agnosticity, I guess, of the platform, its independence from the underlying technologies, the level of it

depends on the appetite of the organization, on how independent they want to be. So I think what we need is interoperability between the different technologies, so that if I build a solution on top, and the solution requires data from across two different platforms, there is a level of interoperability so that I can access data across two different clouds or across two different technology stacks.

So at minimum we need a layer that creates that interoperability, even if the vendors themselves are not incentivized to do that, right? Now, when it comes to being agnostic to the underlying infrastructure, again, I don't think it's meaningful in all organizations, because you end up with the lowest common denominator of the features that are available on all the platforms. And that's not ideal. That's a lot of work and very little result. So I don't know if it needs to be completely

agnostic. But we have to have the pieces of it that enable interoperability and movement from one to another, and remove lock-in as much as possible. And those pieces are usually around cross-cutting concerns, right? How do I manage security? How do I build in automation, so that if tomorrow I want to move to a different set of infrastructure, my processes are automated and not hand-cranked?

Through automation, I can facilitate the movement much faster. That's the way I think about being tech agnostic, as opposed to a nice layer on top. I don't think that's really realistic. Speaking about cross-cutting concerns, this also touches on the next principle, the last principle, which is the principle of federated

computational governance. It's quite a mouthful, but it's taking care of all these cross-cutting concerns. You mention things like, for example, security policies. You are applying systems thinking to this principle so that we can govern the data better. So please share more about this principle, because to some people it might be hard to understand. Yeah.

It's a mouthful already. And if I could sneak in another word, I probably would have called it the principle of embedded federated computational governance. But I think Martin Fowler would not have posted my article if I did that. Yeah, so I think the concept is really, again, an antidote to the problems that arise from the previous principles, which is that we need interoperability.

We now have these disparate sets of data products: domain-oriented, each with its own team, its own cadence, its own life cycle. How can we apply a set of concerns that need to be standardized across all of them? What's the best way to go about defining them? What's the best way to go about implementing, observing, and enforcing them? That leads to this principle. So as an example, if you need to have secure data, or you need to have high-quality data, let's go

with high quality. So: serving quality data, the definition of quality, and then enforcing quality and sharing quality data. One way of doing it is to say, okay, I'm going to put a quality control team, my governance team, in the process of generating every piece of data, and it's going to sit in the middle and verify, at some point in the life cycle of the data, that the data is acceptable to be shared. That's where systems thinking comes

into play. That system is going to have massive bottlenecks, and it's not going to scale. So how can we achieve a defined level of quality without creating these unnecessary controls? We need a consensus, a definition, around what constitutes quality, as in what attributes we use to describe the quality of data. Is it completeness? Is it integrity? Is it timeliness? It's all of the above and others. So let's define those, and in that definition, let's have subject matter experts.

Let's have domain people, who actually know their data and can articulate quality, involved in defining that. And once that's defined, let's automate it. Let's put it into the platform as a platform capability. The moment you instantiate a data product, you get out of the box a library, or some SDK or something, that gives you the ability to calculate, capture, and share

these quality metrics. And then you will have observability that runs across the mesh, across all of these data products, and captures that information, shares that information, and also validates whether you are meeting the quality requirements that you've defined. So that's the

computational part. And the embedded part, which I haven't put in the title, is that enforcing quality and measuring quality becomes an embedded concern in every single data product. It's not something that's smeared over and added later on. It's built into the data product itself from the ground up. It's embedded in there. So hopefully that gives a good example of achieving, I guess, a whole and cohesive mesh of interconnected data products through embedding policies,

standard policies, in an automated fashion in every data product, and having the teams that are responsible for guaranteeing those policies involved in defining what they are. Yeah, if we can borrow from application development, right, we have this concept as well: policy as code. There are some tools for Kubernetes clusters where you can embed this kind of policy. So before you apply something, you check it against the policy.

And if it doesn't comply, you reject it. So data governance is probably one of the least sexy parts of data management, because there are things like PII and data security, data that should not be leaked or exposed, and things like data quality, right?
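
The idea discussed above, a quality definition agreed on across the mesh and then computed and checked inside every data product, can be sketched in a few lines of Python. This is a hedged illustration only: `QualityPolicy`, `completeness`, `check`, the thresholds, and the row fields are hypothetical and do not come from any real platform SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class QualityPolicy:
    """Hypothetical mesh-wide policy, defined together with domain experts."""
    min_completeness: float   # required fraction of rows with all fields present
    max_staleness: timedelta  # how old the newest record is allowed to be


def completeness(rows, required_fields):
    """Fraction of rows in which every required field is present and not None."""
    if not rows:
        return 0.0
    complete = sum(all(r.get(f) is not None for f in required_fields) for r in rows)
    return complete / len(rows)


def check(rows, required_fields, policy, now):
    """Return measured metrics and a list of violations (empty means compliant)."""
    if not rows:
        return {}, ["no data"]
    metrics = {
        "completeness": completeness(rows, required_fields),
        "staleness": now - max(r["updated_at"] for r in rows),
    }
    violations = []
    if metrics["completeness"] < policy.min_completeness:
        violations.append("completeness below threshold")
    if metrics["staleness"] > policy.max_staleness:
        violations.append("data too stale")
    return metrics, violations


now = datetime(2022, 10, 3, 12, 0)
rows = [
    {"user": "u1", "track": "a", "updated_at": datetime(2022, 10, 3, 11, 0)},
    {"user": "u2", "track": None, "updated_at": datetime(2022, 10, 3, 10, 0)},
]
policy = QualityPolicy(min_completeness=0.9, max_staleness=timedelta(hours=2))
metrics, violations = check(rows, ["user", "track"], policy, now)
```

In this sketch the same `check` would run inside each data product before a new slice is shared, so a non-compliant slice can be rejected automatically rather than by a central review team, and the returned `metrics` can be published for observability.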

How much lag, for example, the data can have. And I think all this definitely needs to be governed, because otherwise it's really difficult. You mention observability, and you use a concept from SRE, where you also have data SLOs. So maybe you can touch a little bit on data SLOs? Yeah, absolutely. So with data products, for people to trust your data, you need to share a set of real-time

SLOs, or at least as real-time as your data is, this information that gives people trust that this is suitable data. So, again, the dimensions of that, I think I unpacked them in the book and probably don't remember all of them off the top of my head. But the dimensions are around quality, around timeliness, around completeness. There's a whole set. Often, in the language of data people, these are called metadata, which is language I don't like and don't really use, because it's just a

catch-all bag of all things. But there are classes of information, additional data, that you've got to provide. For what purpose? So that the people who want to directly self-serve and use the product can self-assess whether this data suits their use case or not. As an example, the distribution of the data: if I'm doing an analysis or training a machine learning model for a particular use case, I'd perhaps like to have a very, I don't know,

nice bell curve distribution of the data and the samples that I can get for training that machine learning model, rather than biased data. So how biased is the data? So again, SLOs in the app dev world are more about uptime, response time, downtime, and so on. And in the data world, the computational part of it still has those concerns, but the data part has a different set of concerns that define usability metrics

of the data. So thanks so much for explaining all this. It feels like a real crash course in data mesh. I hope people do study data mesh, maybe by reading your book, your articles, or watching some of your talks. I think it's really eye-opening for people who work with traditional data management. So thank you again for this. I have one last question before I let you go. Normally, I ask about these three things called three technical

leadership wisdom. Maybe you can share some of your wisdom with us, so we can learn from your journey, your experience, or your expertise. It's a hard one. So maybe just a few things that I didn't do as well that I can share, or things that maybe I did okay. I separate leadership from management. I'm a terrible manager, and you don't want me as a manager, but maybe I'm an okay leader, because I believe in the mission. I'm a very mission

oriented person. So, as a leader, you need to believe in your mission and create a mission-oriented team and organization, and continuously, through communication and through embodying the right behavior to achieve that mission, reinforce it, remind your teams, and keep realigning the team. Maybe there are different styles of leadership, but that mission-oriented, vision-oriented leadership resonates with me. I love working with people like that.

And then to get to that mission, you have two ways of going there. You have the way of leaving maybe some casualties behind, going in a way that not everyone can catch up with, and you lose some soldiers along the way. But if you want to make sure that everybody's alive, you need to think about every single member of the team, their

needs, their pace, their specific hopes. Really, it's about not only having the vision and charging the path, but also making sure everyone can come along, and

think beyond yourself. That's probably the area that I need the most help with personally, because my mission-oriented kind of leadership usually has casualties along the way. Your people need to be able to trust you and believe in you. You need to be very self-aware in terms of your strengths and your weaknesses, where you want to delegate and where you want to actually take something on.

And if you're a technical leader, I personally respect technical leaders that still stay close to their craft. They still keep up to date with their craft. As we all know, technology moves really fast, so you have to find a way to keep yourself relevant and up-to-date. And sometimes that means going really deep for a moment in

time, getting your hands dirty, and coming back up. And of course, as your scope of leadership grows, your ability to go really deep diminishes, because time doesn't allow for that. So carve out space to go deep when it's needed, even for a very short period of time. I've seen some technical leaders do that. I admire people who can strike a balance between the depth and the breadth of their knowledge, and its relevance. Wow, really beautiful.

Thanks for sharing that. I think it speaks to some of the leaders who are more mission-driven as well, rather than managing well. So thanks for sharing that. So, Zhamak, for people who want to learn more from you, maybe about data mesh, or just to reach out and follow up on this discussion, is there a place where they can reach you? Well, as of now, really, Twitter and LinkedIn will be the place.

I listen to both channels. But hopefully soon my company's website will be up, and we will have jobs and places for people to reach out directly through that, but that's not up yet. When it is, I will let you know, and you can share it with your network. Really excited to hear about that. Maybe many different data mesh technologies will be coming from that. So thanks so much for your time. It was really a pleasure to have this discussion with you.

It was wonderful to be here with you. Thank you. Thank you for listening to this episode and for staying right until the end. If you highly enjoyed it, I would appreciate it if you shared it with your friends and colleagues who you think would also benefit from listening to this episode. And if you are new to the podcast, make sure to subscribe and leave me your valuable review and feedback. It helps me a lot in order to

grow this podcast better. You can also find the full show notes of this conversation on the episode page at the techleadjournal.dev website, including the full transcript, interesting quotes, and links to the resources mentioned in the conversation. And lastly, make sure to subscribe to the show's mailing list on techleadjournal.dev to get notified of any future episodes. Stay tuned for the next Tech Lead Journal episode. And until then, goodbye.

Transcript source: Provided by creator in RSS feed: download file