Today we are discussing data at Goldman Sachs. We're speaking with Neema Raphael, the firm's head of data engineering and chief data officer.

I've been at Goldman about 20 years, always as a software engineer, data engineer, strat, or quant, or as we call them here, strats. And now, as you mentioned, I run our data engineering team and am chief data officer. Data engineering at Goldman is probably a little bit different than at other places. I organize my brain and my team into three buckets. We have our platform team, which hopefully I'll have a little more time to talk about, and our content and curation team. And also, with my chief data officer hat on, our governance and quality teams. So that's the background of the work we do.

Data is obviously so important to financial services, but it would be really interesting to hear your perspective on the role of data and how data fits into the world of Goldman Sachs.

Information is the lifeblood of financial services, and people say, well, what is that?
And I think in financial services, and at Goldman of course, information is actually our currency. A lot of decision making, a lot of helping our clients, a lot of innovation is all based on what information we see out in the world and what information we see internally. And so organizing that, making sure it's readily accessible, and making sure people can get to that data quickly to do their work or to help their clients starts becoming a core component of doing business at Goldman Sachs.

You mentioned the various aspects of your role and how you break things up. Do you want to dive into that and tell us a little bit more?

The first thing I think about data is that it has to be a first-class asset, right? At Goldman, people have to believe that it can help solve problems, help our clients, and help us innovate in financial services. So the first thing we did was ask: okay, how do we want to work with data at Goldman? And again, being an engineer always pushed me to think about data as an engineer would think about their code. The analogy I always use, or maybe it's not even an analogy, but the way I always talk about data is: you've got to think about your data just like you think about your code. And that has pushed us to build a platform team.
That team really thinks about data engineering as an engineering function. It thinks about the workflows of data engineers and how we can help get leverage and scale to our businesses so that they can start making decisions better, faster, cheaper, easier. And so the platform team within my team is really laser focused on that, right? How do you make data engineering workflows just seamless, beautiful, and all the things that developers are used to, right? IDEs, code completion. Making sure that we have reproducibility of our data like we have with code, things like that. So all of that becomes their first-class role: to think only about how we make data awesome, how we make working with data awesome.
And then we have a content team. Again, like I said, a lot of our workflows at Goldman run on data. So you can imagine part of our team is a real-time market data team: all the data streaming from exchanges or different venues, things like that, in real time, super low latency. We're talking about millisecond latency here, right, to get to trader screens or to get into the algos. And so there's a whole team of mine which is in the curation business, the content business, whether it's real-time market data or things like our reference data: who our clients are, what products we can trade, things like that.
And so that team is really, I'd say, a data-engineering-bent team that uses our platforms to do their work and make sure the data is amazing quality-wise, accessible, and ready to go. And then the third bit is really about the framework that we do data governance and data quality around: a little bit around policy and frameworks, but then also how we push our platform team to build those mechanisms or controls into the platform. So when you do your work at Goldman, you get those for free, I say, or complimentary. So that's a little more about the three buckets we talked about.

What I find particularly interesting is that, unlike most businesses, which of course have their business data, you're dealing with huge volumes of real-time data on which enormous amounts of money and risk are at stake. I have to imagine that places an additional, intense burden on you as you're thinking about this whole technology landscape.

I would flip it into a positive. We don't think of it as a burden. We think of it as a real opportunity, right? That, I think, is the first mindset shift you have to make when you're in these high-stakes games. Yeah, of course it's high stakes.
Of course things could go wrong. But really it's more about the opportunity to make sure that, again, we're helping our clients do the right thing, we're helping the economy, we're helping the financial system do the right thing. So we take pride in that instead of treating it as a burden. And the other thing I'd say is, you said huge amounts of data. I actually like to say that at Goldman Sachs we're in the medium-sized data but very complex data world. Our data is super complex. The types of data are complex. The speed is complex: there's everything from super real-time, low latency all the way to end-of-day batch processing. The relationships between our data are complex, right? The product catalog is complex, right? It's not just shoes or whatever; it's actual financial instruments that people have made up. So the complexity is the really interesting and cool challenge here, versus, I'd say, the volume.

What makes the data complex? You were just describing it a little bit. Can you drill into that?

The complexity again comes from various angles. Like you mentioned, the speed is one angle, one dimension, right? But there are also the relationships of the data.
So you could imagine there's data about the stocks and bonds that our clients want to trade, for one example. But then there's a whole infrastructure around those. I don't know how many people here are familiar with financial services, but there are derivatives on those products, right? So now you're not just talking about stocks and bonds; you're talking about really complex algorithms plus complex data elements that have a relationship to, and have been made up in the world on top of, those stocks and bonds. So the layering of this data and the complexity, now that people have come up with such creative ideas to help our clients and things like that, starts pushing the boundaries of how this data is related, interconnected, and curated. Hopefully that gives a little bit of a sense.

You are managing these very large volumes of real-time data. Do you want to tell us about your technology infrastructure? It's not a question I usually ask, but in this case, given the equities, the derivatives, and everything else you just mentioned, it seems the technology infrastructure has to play a very important role.

It absolutely plays a huge role here. We have an incredible infrastructure team, and you could imagine, again, that this goes all the way down to the hardware layers: the networking stack, the computers you use, the network cards in those computers. All this stuff actually matters at this sort of latency and scale. But I don't want to over-index on the real-time stuff.
The real-time stuff is a very critical and important part of our world, for sure, and the team does an amazing job all the way from the hardware layer up. But we also have data challenges that are just as hard as the real-time data. As in, when you do a trade, right, that story also starts being complex. And so we have built a data platform here that we call Legend, and we've actually open sourced it in the last couple of years. We built this tech stack internally over the last 10 years. About two years ago, maybe three now, we open sourced it. It's fully on GitHub; you can check it out. We gave the code to an open source nonprofit called FINOS, and it's all on GitHub. Happy to talk more about that as well.

Why did you open source it? At first glance it seems like an odd decision to me, because one would think that a firm like Goldman Sachs would want to keep all of that infrastructure to itself.

There are a couple of reasons. One is that, as we were building this, we talked to a lot of clients who were having exactly the same data challenges we were. And we're obviously a very client-centric firm. So we thought, look, this platform has helped us so much internally; we'd love to give it a chance to help our clients and to help push the industry forward as well. And you also have to separate the platform itself from the content that we curate and work with in the platform.
So we haven't yet, or maybe never will, open source the actual content, but we could talk about that as well. But the platform and the work we've done to standardize how we work with data, we thought that was so powerful that giving it out to the community, and building a community around it, could actually help the whole industry, our clients, and ourselves.

You want to share best practices, approaches to working with data, and things like that with the community, and then of course everybody is going to have their own data content within those constructs.

That's exactly right. The interesting part is that even if we don't give out the content, we are working with industry standards bodies to describe the data, structure the data, and define the linkages in the data. So it's a really cool and powerful technique, working with these standardization bodies on, you know, what does a derivative look like? And even if our data looks slightly different from our clients' data, at least we can standardize on how we talk about those things. And that's what the Legend platform really excels at.

So it's really a contribution to the broader data science, data, and analytics community, essentially.

Ultimately, absolutely.
I think the platform is a general-purpose platform for working with data, and then we specialize in some standards and things like that, which I think is another contribution to the financial industry in general: how do you get to that interop and standardization of at least data contracts, or how we talk about different terms and the relationships between different pieces of information?
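To make the idea of a data contract a little more concrete, here is a minimal Python sketch. It is not Legend's API or modeling language; the names `FieldSpec`, `DataContract`, and `OPTION_CONTRACT`, and the fields on the example option, are hypothetical and chosen only for illustration. It tries to show how two parties could agree on terms and relationships and check records against them, even when their underlying data differs.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One agreed term in the contract and the Python type its values must have."""
    name: str
    type_: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """Shared description of a dataset: its terms and its links to other datasets."""
    name: str
    fields: tuple[FieldSpec, ...]
    # Hypothetical convention: field name -> name of the contract it points at.
    relationships: dict[str, str] = field(default_factory=dict)

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable problems; an empty list means the record conforms."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                if spec.required:
                    problems.append(f"missing field: {spec.name}")
                continue
            if not isinstance(record[spec.name], spec.type_):
                problems.append(
                    f"wrong type for {spec.name}: expected {spec.type_.__name__}"
                )
        # Anything outside the agreed vocabulary is flagged rather than silently kept.
        known = {spec.name for spec in self.fields}
        for extra in sorted(set(record) - known):
            problems.append(f"unexpected field: {extra}")
        return problems


# Illustrative contract for a derivative that references an underlying equity.
OPTION_CONTRACT = DataContract(
    name="equity_option",
    fields=(
        FieldSpec("option_id", str),
        FieldSpec("underlying_id", str),
        FieldSpec("strike", float),
        FieldSpec("expiry", str),
    ),
    relationships={"underlying_id": "equity"},
)
```

Calling `OPTION_CONTRACT.violations(record)` on a dictionary returns a list of problems, which is empty when the record conforms. The `relationships` mapping is only descriptive here; a fuller version might resolve and validate the referenced contract as well.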
Please subscribe to our YouTube channel and hit the Subscribe button at the top of our website. We have a number of questions on Twitter, so why don't we jump over to some of those? We have an interesting question from Arsalan Khan. Arsalan is a regular listener, so thank you, Arsalan, for listening and for this great question. He says, given that data is such an important asset, how do you assign financial value to that data? Who is it that decides what data has financial value and how much it's worth? And I'll just add to that: I'm assuming that attachment of value is one of the things that guides your priorities in terms of where your team focuses.

What we do is work with our business lines. I run our core data team. We work with our business lines on what they need to serve their clients better, what they need for their business to scale, and what they need from us to help them get an edge or innovate in the data realm.
And so a lot of it is about the value of the overall outcome versus just ascribing value to some piece of data. The way I think about it, as a core team helping people, is really: what is the outcome we're trying to drive for our clients, for our businesses, and for innovation? And we really look at that holistically. Again, I think it's a little bit of a mistake to try to say, okay, this dataset creates this much value and that one creates that much value. It's more like, what did we do for that business and for our clients? And really we take those wins as platform and data wins collectively with those teams.

That makes perfect sense, because the point is not some piece or body of data. The point is, what are we doing with that data? And as you said, what are the resulting business outcomes? Then you have a framework for valuing the data, because we're trying to get to the outcome. Sorry, I didn't mean to answer for you.

No, no, that's exactly right. That's a great summary, exactly the right summary. And that's exactly how we do our ROI, our return on investment, on the work we do: we actually work hand in hand with those teams and say, okay, were we able to reduce risk by X? Or were we able to help our clients do Y better or faster, or get them into a better position for Y? Or did we enable the business to do some new thing that they were not able to do before? And that's really the ROI calc, at that level.

We have another question from Twitter.
And another great question: this is from Natalie Bean, who says, how do you balance data quality and data quantity? We haven't even spoken about data quality yet, so this is a great question.

When you think about data as a first-class concept, when you do the same things with your data that you do with your code, like think about your data architecture up front, think about how you structure your data and how it relates to other pieces of data, right, and you do that work up front, we have seen that, yeah, the up-front work takes a little bit longer, but the huge benefits become apparent as you're trying to scale these things. So we deal with the scale and the volume and the complexity like any engineering org does: we build tools, we build platforms, and then we make sure that those scale to those problems.
And so now when we do data at Goldman Sachs, we do it on the platform, and everyone knows, okay, now the quants are going to get their data in days instead of months, because we have set up the right platform constructs and engineering constructs for that. So again, it's not a perfect silver-bullet answer; as things grow, there's some equation. But we have seen that the investment in the platform and the tools and the workflows is the thing that helps us scale.

Again, I find it really fascinating that the platform and, as you said, the workflows play such a crucial role. But I suppose it's entirely logical: when that data is so important, and it needs to be right, and it needs to be consistently right, you need that infrastructure and the automation to make it happen at that level of quality.

That's exactly right. This is my personal view, coming from an engineering background: I try to solve scale problems, volume problems, and complexity problems by breaking them down into engineering steps and engineering platforms. And so that's been the ethos of this whole Legend platform: how to attack that problem through an engineering lens.

We have another excellent question from Twitter. This is from Lizabeth Shaw, who is alluding to governance, which you mentioned earlier and which we definitely should talk about. She says, how do you build in the mechanisms that support data policies?

The point of the platform isn't that it just magically makes these things go away. It's that it makes you think of them up front. It makes you think of these concepts
as you're doing your data design and your data work. And so really, when you're on the Legend platform thinking about your data, the first thing you think about is: what is your data contract, right? That's actually the first bit of the workflow: describe your data, describe how you want to publish it, describe how you want other people to see it or consume it, and how you want to track things like lineage. All of that is built in, baked in, as a first-class concept. So again, I don't want to oversell the thing. You still have to think about it; you still have to do it. But the point is bringing it up front instead of hiding it as some secondary thing. And so you have to think about the entitlements, the security, the encryption, all of that. You think of those as first-class concepts as you're designing your data flow or your data workflow, or your data production or data consumption patterns, and all of that comes together in the platform.

It's fascinating to me, again, because it seems that your emphasis is placed more heavily on that infrastructure and those platforms than that of other chief data officers I've spoken with. But at the same time, you are dealing with a level of data complexity, combined with the speed and the financial consequences associated with it, that I think few other companies would face in combination.

Exactly. The quality of the data has to be right. It has to be consistently right, as you said, and the ramifications of that are pretty big.

We have another question relating to governance, and this is again from Arsalan Khan. He says, how do you decide what data is good to use and what is not? How do you address biases in the data you're collecting and using? And, this is interesting, he says, do the business lines agree with your conclusions? So how do you also get everybody on the same page around this stuff?

We are not some isolated team in the corner with pointy hats, right?
We're not doing this in isolation. I think that's the first, most important thing to get. We are hands on keyboard together with the businesses, making sure that the data we're using is, first, right, but also right for them and right for the use case, and that it's really solving their problem. So the first point I would make is that aligning ourselves with business outcomes and with the businesses is the first thing. So not being in some isolated back room, like, oh, we know best about everything, every piece of data. And again, the financial domain is so complex that we would never even pretend to do that, right? We have people who are highly skilled in financial areas, so that we can make those decisions with our businesses in a joint-venture fashion, but we would never say we're the experts in everything.
So I think that's the first piece: you have to be connected to the business and the business outcomes. Then, again, the quality and the governance become a matter of aligning incentives in a joint venture, right? There's a healthy tension, right? Of course we want to do things at scale, and they want to solve the problem immediately.
But the point is, again, that bringing these two teams together really helps accelerate that and get to the right answers and the right data.

So this then leads to the question of the composition of your team. It's obvious that you have very deep technology and data expertise, but as you just alluded to, in order to do your job I have to assume that you need an equivalent depth of financial expertise, especially when you go into concepts like derivatives, as you were describing earlier.

We do have a team of strats, or quants in the maybe more financial world, who come from being on the desk, understand how data is used on the desks, and have a STEM background, whether it's technology, math, physics, whatever. They are the equivalent counterparts in my team who work with these groups and actually understand the financial domain but cross over to the technical domain. That's what we call our data design and curation team. A pretty cool, high-powered crew that really is bound to those businesses.

Those folks, which team are they part of? Are they part of your organization or part of the finance or trading organization? Where do they actually fit?

In our team, we have a subset of that team, right? But those sorts of people are also embedded in the businesses as well. So there are what we call embedded desk strats or embedded quants in various businesses. But then I have a small selection of that team that works specifically on data with those other business teams, if that makes sense.

What are the elements that comprise a successful data team such as the one you have?

It really follows the structure of the team. The platform team is high-powered software engineers, right? They come from a software background. Their goal is to make the software bulletproof and build the right workflows for data engineers.
And the second bucket is basically data engineers: people who are content experts but can use the platform, configure the platform, build data pipelines, build curation pipelines, and build data models that then get shared out. Then the third bucket is really this hybrid, and again I'll use the word strat because that's what we use internally, this hybrid team that really straddles deep finance and deep tech together. And then the fourth is our governance framework team. They're the people who set the policies, set the framework, and work with the divisions to make sure they're working within the bounds of our framework.

One of the topics that we have not really touched upon is the notion of data interoperability, and I know that's important to you. So can you tell us about that?

We touched a little bit on it. A big reason we open sourced our Legend platform was exactly that. We felt that if we could bring some platform standardization to the industry, for us, our clients, and our counterparties, then at least in the discussions about how data works, or how it should be connected, or how it should be described in the industry, we could help push that forward. And we can help push that forward with standards, with other bodies, but they could all do that work in one sane way in our platform.
Well, it's not even our platform; it's the open source community's platform, and that will benefit everybody. So it's absolutely a big, big play. I'll mention one project we're doing in the FINOS community, which is the open source nonprofit that we gave Legend to. We brought in ISDA, which is the derivatives standards body. They have built a model called CDM, the Common Domain Model, for derivatives, and that data model is now available in Legend for people to collaborate on in a collaborative environment out in the wild. It has nothing to do with Goldman Sachs; they're just using the platform. So these are pretty cool things that we're trying to do to push the interop and standardization out.

How has the uptake been in the broader community
of this platform and data interoperability?

The cool part is that our code is out there. We have clients actually deploying it and using it. We have other counterparties also looking at it, doing POCs in their own environments. But the flip side of that is, building an open source community is very difficult. I mean, we vastly underestimated how much time and effort and energy goes into making sure people know how to use the thing: the documentation, do they understand the value, can they contribute code back? All of these things, we knew would take time, but I think we grossly underestimated how much time it takes to build that big community around these projects.

What are the benefits that you have presented to the community as to why they should engage in this?

It has helped us do data better at Goldman. It has helped us organize our data. It has helped break down silos. It has helped data quality massively. It has helped data governance massively, internally. And so we point to those success stories about how we do it internally, and say that we could also help clients do the same thing. And, if there's appetite of course, we could also help the broader industry break down those silos as well.

We have another couple of questions coming in from Twitter.
Do you think of your data or your platform as a product that you can sell, given that you've built all of this infrastructure?

I'll say that's not the intention of doing it. I think, though, again, we have seen that as clients use it and as other banks use it, people come back to us and ask, okay, well, what could the support model be? Or, hey, can you host this as a SaaS for us, so that we don't have to deal with setting up the infrastructure, running the infrastructure, or things like that? So again, just not to oversell anything, I think we're still in the very crawl stages of even thinking about that community and how we could help it do better and better. But our intention isn't to make money off of this
thing. It's really about the standardization part. Now, if people see large adoption and see value, I think it could be an exciting potential opportunity.

We have another question from Twitter. Can you give any examples where using the platform has resulted in innovative ways for users to use and/or combine the data?

We took, again, a data-driven approach to helping our firm and our risk managers and our salespeople and our traders understand that risk from a data-driven perspective, right? We have combined all that data and information into our platform and have basically, I hate to use this word, democratized the access to it. Our team has linked a bunch of that data together and done some interesting analysis. But more importantly, we have put those base sets of data, and the relationships between those data that may not be obvious, in our users' hands. Now they have access to that information, and they can come up with creative things or risk-mitigating things on top of that data. So I think that's been maybe one that resonates pretty recently here.

It's kind of the reverse of the shadow IT that CIOs didn't like some years ago, going back maybe five years. You're really taking the data, and I assume the tools, and placing them in the hands of users so that they can be creative and innovate with that data.

Exactly, but in a fully governed, quality way, right, where that shadow IT problem isn't really a problem. We have put all the right guard rails and all the right governance around that. So now people can actually innovate in a safe space, in a safe way, but it's not just locked to our team.
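Guard rails of this kind could look something like the following Python sketch, assuming a simple in-memory catalog. `GovernedCatalog`, `AccessDenied`, and the sample dataset and user names are hypothetical; a real platform would back entitlements with its own policy and identity systems. In this sketch every read goes through one entitlement check and is recorded, and callers receive copies they can experiment with.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a user asks for a dataset they are not entitled to."""


@dataclass
class GovernedCatalog:
    """Hypothetical in-memory catalog where every read passes an entitlement check."""
    datasets: dict[str, list[dict]] = field(default_factory=dict)
    # Dataset name -> set of users allowed to read it.
    entitlements: dict[str, set[str]] = field(default_factory=dict)
    # (user, dataset, allowed) for every attempted read.
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def publish(self, name, rows, allowed_users):
        """Register a dataset together with the users entitled to read it."""
        self.datasets[name] = list(rows)
        self.entitlements[name] = set(allowed_users)

    def read(self, user, name):
        """Return a copy of the dataset if the user is entitled, else raise."""
        allowed = user in self.entitlements.get(name, set())
        self.audit_log.append((user, name, allowed))
        if not allowed:
            raise AccessDenied(f"{user} is not entitled to {name}")
        # Copies let callers experiment without mutating the governed source.
        return [dict(row) for row in self.datasets[name]]


catalog = GovernedCatalog()
catalog.publish(
    "positions",
    [{"book": "A", "exposure": 10.0}],
    allowed_users={"risk_analyst"},
)
```

With this setup, `catalog.read("risk_analyst", "positions")` returns rows the analyst can reshape freely, while a read by any other user raises `AccessDenied` and is still written to `audit_log`.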
Can you give us some insight into what kind of governance you have? How do you balance the need for security and privacy against making that data available, accessible, and easy to use for folks?

Before we do anything with data, right, we have, first of all, very strict policies and frameworks about understanding who's allowed to see what data. And even if you're allowed to see it, should you see it? We call it need to know, right? Just because you're allowed to see it, maybe you don't actually need to know that client information. So first of all, as an overarching theme, I'd say one of the most critical pieces is writing those policies down and then actually enforcing them in the platform. Before we do anything, that is a clear thing, and we have very clear rules about who can get access to what, and how. And then again, to be a little bit of a broken record, we encode those rules in the platform and make sure that is a base layer. If we get that wrong, nothing else matters, right? So the first thing is to make sure that baseline works before giving access.

So in other words, as long as the foundation of governance, risk control, compliance with regulations, whatever is necessary, as long as those elements are in place, then you're able to share
the data and let people, I was going to say have free rein, obviously that's not the case, but have enough rein that they can use that data creatively in the service of whatever their business goals happen to be.

Much better said than the way I summarized it, but exactly, exactly the right mental model.

What advice do you have for folks who are building a data strategy, based on what you've learned and done at Goldman Sachs?

Make sure you attach yourself to business outcomes, the things that people care about at your company, right? I think, as technologists in general, and maybe I'll make a broad generalization, sometimes we care a lot more about the tech than the outcome. But specifically for data strategy, I think it's even more important to over-index on the outcomes, because data becomes this nebulous thing where people ask, well, what does it mean to have a data strategy? What does data even mean? Why do I need that? In the abstract, I get why I want information, but what are you talking about?
So the first thing I always talk about is: attach yourself to the business outcomes and show how the data and the data strategy actually make those cheaper, faster, better, easier; make money for your clients; help reduce risk; help save money. That's the first baseline advice I give to everybody. And then below that, it becomes, okay, set out what the platform strategy is going to be.
How are you going to actually make the engineering work? Think about the business workflows and the pro workflows. And then we talked a little bit about the org structure and the framework. I think those are also key pieces, right? Make sure that if you're going to do it as an engineering strategy, you have a strong platform team. If you're going to have a content team, make sure they actually understand the domain you're working with.
And so bring those org pieces together. But the number one key is: make sure you're driving outcomes.

What's the relationship between your data strategy and cloud? Where does cloud fit into all of this? I can't believe we haven't spoken about that.

I think of the cloud, again, as a tool in the toolbox, right? It's a means to an end; it's in my arsenal for how I want to build my platform. Yeah, I want infinite scalability. I want other people to have done the hard work on the infrastructure front. I want great databases that people have built, so I don't have to rebuild that on my own. And so to me, the cloud is a great enabler. It's a great tool in my toolbox. It's a great way to get scale. So it's definitely a big part of the overall data strategy. But again, just to be clear, I make sure it's not a thing for the sake of a thing. It's like, okay, these are great components, great pieces of infrastructure that other people have built, giants whose shoulders I can now stand on to execute. That's the way I think about it. It is a super important part of the strategy, just to be clear.

I can see that you're very purpose-driven. You are always coming back to the reference point: why are we doing it, what are we doing, and what are we getting out of it from the business outcome standpoint?
That's very, very clear. And finally, one last question: what is the relationship between data and building business models, economic models, financial models? How do those pieces fit together?

In technology, it comes down to algorithms and data, and so those are the two big inputs, right? You need the data to be great. You need it to be clean, and you need it to be organized. You need to make sure that all those pieces are set: it's easily accessible, it's findable, it's governed. And then that becomes a major input into the algos. Whether you're doing forecasting or algorithmic trading or helping your client with something, those pieces just have to work together as a team to get the job done.

Again, I'm putting words in your mouth, but just summarizing what you've been talking about: it's the linkage to the outcome, and being clear about what data is going in and what the expected result is at the other end, and ensuring that the two match up.

Yep, yep, exactly.

And with that, unfortunately, we're out of time. Neema, I just want to say a huge thank you for spending time with us. I really, really appreciate it.

Thank you. Thanks for the great questions. Thanks, audience, amazing questions. I had a really fun time talking to you, Michael. So thanks for having me.

Everybody, thank you for watching. I just want to say a huge thank you to Neema Raphael. He is the head of global data engineering and the chief data officer of Goldman Sachs. Now, before you go, please subscribe to our YouTube channel and hit the Subscribe button at the top of our website. Actually, you know, the Subscribe button has moved to the bottom of our website, so hit the Subscribe button at the bottom of our website so we can send you our newsletter and keep you up to date on our upcoming live shows. Thanks so much, everybody. Hope you have a great day, and we'll see you next time.
