
#333 Creating an AI-First Data Team with Bilal Zia, Head of Data Science & Analytics at Duolingo

Nov 24, 2025 · 45 min

Summary

Bilal Zia shares his journey of revitalizing Duolingo's data science team, focusing on building trust with leadership through impactful projects like accurate user forecasting. He details the successful "hub-and-spoke" model for team structure, emphasizing both embedded product context and centralized technical collaboration. The episode also explores Duolingo's AI-first strategy, viewing AI as an augmentation tool for data scientists to enhance productivity, automate tasks like anomaly detection, and unlock new product capabilities such as conversational language practice, while looking towards synthetic A/B testing.

Episode description

Data science leadership is about more than just technical expertise—it’s about building trust, embracing AI, and delivering real business impact. As organizations evolve toward AI-first strategies, data teams have an unprecedented opportunity to lead that transformation. But how do you turn a traditional analytics function into an AI-driven powerhouse that drives decision-making across the business? What’s the right structure to balance deep technical specialization with seamless business integration? From building credibility through high-impact forecasting to creating psychological safety around AI adoption, effective data leadership today requires both technical rigor and visionary communication. The landscape is shifting fast, but with the right approach, data science can stand as a true pillar of innovation alongside engineering, product, and design.

Bilal Zia is currently the Head of Data Science & Analytics at Duolingo, an EdTech company whose mission is to develop the best education in the world and make it universally available. Previously, he spent two years helping to build and lead an interdisciplinary Central Science team at Amazon, comprising economists, data and applied scientists, survey specialists, user researchers, and engineers. Before that, he spent fifteen years in the Research Department of the World Bank in Washington, D.C., pursuing an applied academic career. He holds a Ph.D. in Economics from the Massachusetts Institute of Technology, and his interests span economics, data science, machine learning/AI, psychology, and user research.

In the episode, Richie and Bilal explore rebuilding an underperforming data team, fostering trust with leadership, embedding data scientists within product teams, leveraging AI for productivity, the future of synthetic A/B testing, and much more.


Transcript

DataCamp AI Skills Introduction

Generative AI is transforming industries at an unprecedented pace. But as AI changes how you or your team work, one thing is clear. Your skills also need to evolve. At DataCamp, we offer everything you or your team need to adapt and thrive with AI. Whether it's business users looking to get the most out of ChatGPT and Copilot, or developers and data scientists looking to fine-tune models, you can learn the entire AI skills spectrum on DataCamp. Power your AI transformation today.

Start learning at datacamp.com.

The biggest success I had was investing in the people, not in ideas, because I think that order is really important. Ideas don't become successful if you don't have good people in the team. So finding that sweet spot between being able to communicate technical expertise and technical concepts in a simple, intuitive way is probably the most valued skill at Duolingo. AI is not going to replace what data scientists do. I believe that AI is going to augment what we do. So I think of AI as basically an army of research assistants. So every single data scientist can have access to an army of research assistants. From my perspective, there is still a strong need for a human in the loop.

The Challenge of Data Teams

Welcome to DataFramed. This is Richie. There are a lot of stories about data teams that have found amazing insights that have changed companies' tactics, helped them make a lot of money or improve the customer experience, and they've all become heroes. Unfortunately, there are even more stories about how data teams had zero impact, either because their analyses weren't trusted or because the way they communicated their results wasn't understood by executives.

This is a big problem. And on top of this, the role of the data team is changing. Many organizations have a mandate about becoming AI first or AI enabled or some variation on trying to make more use of AI. I want to know how data leaders can approach transforming their teams. Our guest is Bilal Zia, a senior director and head of data science and analytics at Duolingo. He moved into data science from economics, having had stints at Amazon and the World Bank.

Bilal is now running a high-performing team at a company with a reputation for great use of data and AI. However, when he first took the job, things weren't looking that rosy, with pervasive problems with trust and performance. Let's find out how he took his team from zeros to heroes. Hi, Bilal. Welcome to the show. Hi, Richie. Nice to see you. Happy to be here. Yeah, great to speak to you. Now, to begin with...

Rebuilding Trust and Performance

I know when you joined Duolingo, you inherited an underperforming data team. So I'm curious, what had gone wrong? Yeah, great question. So typically when a new leader joins and inherits a team, they can either start at the ground level or they can build up a team from scratch. I like to joke that what I inherited was not the ground level, but the sub-basement. And the reason I say that is because there were a few data scientists at Duolingo when I joined.

But there was no head of data science. So the team was a little bit rudderless. It's not like the leadership wasn't aware. They were aware of this deficit and they had been looking for a head of data science for a while. Obviously, their standards are very high, and they were looking for a very specific type of person, and they were just looking and looking. What ended up happening was that during this time, there was no advocate for these data scientists at the leadership level.

And hence, there was a big wedge between what the data scientists themselves believed they were working on and the contribution they were making, and what leadership believed that they were contributing. As a result, there was basically this lack of trust in what data scientists did, and also a lack of motivation among the data scientists themselves, who didn't feel like their work was being appreciated very well.

So in that sense, building up that trust with leadership and giving the team the motivation were priorities number one for me when I joined, and I had to get up to the ground level first before building the team up from there. Happy to go into more details of that, but that's the high level of what I inherited.

That seems like a common problem, when you have data scientists who think, oh, I'm working really hard, and then the manager's like, well, what are the data scientists actually doing? There's often this communication gap. And you mentioned that solving the trust problem was the first thing you had to do. Talk me through, how did you get started solving trust? Yeah, so when I joined, there was essentially a need to understand what the biggest data-related problems are that the company is facing, or what some of the big problems are that company leaders are facing that data can help solve. At times it wasn't even seen as an option that data science, or data scientists or economists, could help solve a certain problem.

So the first thing I did was to go to all business leaders, all possible stakeholders, from finance to the leads of growth to monetization to language learning, etc., just to understand the biggest problems that they're facing, understand the constraints they're facing, and then try to match the types of work that we can do to help solve those problems. So that matching exercise was sort of priority number one for me. And that alone did two things. One, it actually helped me direct resources to the right places. At least I tried. At least the probability of success was higher because I wasn't just doing something that nobody cared about.

And the second thing is that it gave the leaders I was talking to something to appreciate: look, there's this new leader. He's not just coming in and doing whatever he wants. He's actually listening to us. This is a bottom-up approach, and it's a collaborative approach. And I think both of those things were super important. Some of the things we did initially ended up being quite successful. That built the trust battery with leaders, and then ultimately we were able to take some big swings as well.

And most of them were also successful because they were all informed by what are the biggest constraints and what is the likelihood of success? What is the investment needed? So ultimately, I think there's no magic bullet to how to do this. It was basically just listening to the stakeholders and bringing them along that was, I think, the key ingredient to building trust.

Structuring a High-Impact Team

It sounds so simple when you say it out loud. I just talked to all the people who had business problems, found out what they were, found out how data could help them. So it's just matching up what's needed and what you can actually do to help. Fantastic stuff. So we're going to get into a lot of the details of how you improved the team, but I want to skip to the good bit. Just for a bit of motivation, can you talk me through some success stories you've had? Yeah, for sure. I think the...

The first thing that I will say is the biggest success I had was investing in the people, not in ideas, because I think that order is really important. Ideas don't become successful if you don't have good people in the team.

As I mentioned, I inherited sort of a smattering of data scientists. They were reporting into different parts of the engineering org. And their managers are great. They're great engineers, but they're not data science managers and they weren't coordinated among each other. So it was acknowledged that this was a stopgap measure. So the first thing was to bring everybody under one umbrella and then to build some team structure.

And what that means is building some level of management layers so that junior data scientists have a reporting layer. They have some people to look up to. They have mentors. They have people they can brainstorm with, et cetera. So identifying who are the right folks to be leaders within the org, moving people around, letting a few people go, because I think people get demotivated by people who are not performing well. That actually does have a negative externality on everyone else.

So maintaining a high quality bar, hiring people where there was a gap once this initial internal audit was completed, and then building a leadership layer. So ultimately where we've ended up is that although I lead the data science org, under me I have people leaders who lead every single vertical that we work in. And then the org grows under them. So that's how the tree is going to grow. And it's scalable and sustainable. Each of those leaders individually is fully invested in their particular pillar, and then they talk to each other as well. That system, I believe, is much, much better because it gives the blueprint from where impact can actually materialize. So I think the thing I'm proudest of is building that structure from which impact can be generated.

Forecasting Success Story

Specifically, to give you an example of where we had impact, I think the example I've shared in other forums before, and can repeat here, is that the first thing we did was to pick something where the likelihood of success was high. We're a public company, and we put out a user forecast. So we have daily active users on our platform. Every quarter we have to forecast what the growth is going to be for the next quarter and for the next year.

Underlying this user forecast is a scientific model. It's a forecasting model. When I joined, there was a model, but there was a lot of discretion that was added on top of this model, partly because there wasn't a lot of trust in the science. And because of this discretion, the forecast was measurably off actuals. So there was about a 10% discrepancy between where we forecast we would be and where we ended up.

And as you can imagine, the CEO didn't really have a lot of trust in this forecast. There would be jokes in their meetings that, oh, this is the forecast. Nobody really cares. So the first step was, well, from my perspective, this is a science problem. We can solve this by just doing better science.

And so we actually worked really hard to dial up the rigor on the scientific model that generates the forecast. We did back simulations on data from past quarters to see what the results would have been had we applied this methodology, and we showed all the leadership in the company those results, which was that we would have been very close to actuals.
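[Editor's illustration] The "back simulation" described here is commonly called a backtest: at each past quarter, forecast using only the data available up to that point, then compare with what actually happened. The sketch below is one minimal way to set that up. The growth-rate model, the function names, and the numbers are all invented for illustration; the episode does not describe Duolingo's actual forecasting model.

```python
from statistics import mean


def growth_forecast(history, window=4):
    """Hypothetical stand-in model: extend the last value by the
    average period-over-period growth seen in the recent window."""
    recent = history[-(window + 1):]
    rates = [later / earlier - 1 for earlier, later in zip(recent, recent[1:])]
    return history[-1] * (1 + mean(rates))


def backtest(series, min_history=5, model=growth_forecast):
    """Replay history: for each period t, forecast from series[:t] only,
    then record the absolute percentage error against the actual value."""
    results = []
    for t in range(min_history, len(series)):
        forecast = model(series[:t])
        actual = series[t]
        pct_error = abs(forecast - actual) / actual
        results.append((t, forecast, actual, pct_error))
    return results


# Invented quarterly DAU figures (millions), for illustration only.
quarterly_dau = [40.5, 42.4, 46.6, 49.2, 50.0, 56.5, 59.5, 65.0, 74.1]

for t, forecast, actual, pct_error in backtest(quarterly_dau):
    print(f"quarter {t}: forecast={forecast:.1f} actual={actual:.1f} error={pct_error:.1%}")
```

The point of the design is that the model never sees the quarter it is being scored on, so the error column is an honest estimate of how the method would have performed had it been in use at the time.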

That built an initial trust battery: okay, let's try this. And since then, we have been very accurate in our reporting. We have been within zero to 3% of actuals for the last four or five quarters. The last quarter was a little bit of an oddity, we had some unexpected shocks, but prior to that we've been between zero and three percent, which is sort of the gold standard of accuracy.

The result of that has been that now all the folks who contributed to changing the scientific model and adding the discretion are doing other things. They're the chief of product, the chief engineering officer, the chief technology officer. They have other things to do that are more important than tinkering with the forecast, and they're busy doing that. They trust the forecast. I think that's been a really big win. That's pretty amazing. And I have to say,

You can tinker around with all the other bits of your business, but if the data science isn't right and your forecasts are off, then yeah, nothing else is going to work. So I love that you've focused on getting the fundamental analytics right. And then the easy way to build trust is, like, get the right answer.

Embedded vs. Centralized Data Science

Okay. So beyond that, you mentioned that the organizational structure is something you're very proud of. And I kind of feel like in a lot of organizations, they're sort of pushing data analysts and data scientists to be more embedded within business units, but you said actually you wanted to bring everyone together so all the data scientists could communicate with each other and have mentors and things like that. Do you want to talk me through some of your decisions around the organizational structure there? Yeah, I think I see advantages of both of those models. So what we have is...

is a hub-and-spoke model. So essentially, the data scientists are embedded in individual product teams across the company. The company itself is divided up into key pillars. So there's the growth pillar that focuses on user growth. There's the monetization pillar that focuses on making money through subscriptions, ads, and in-app purchases. And then there's the learning pillar that focuses on our content and teaching better.

And within each of these pillars are several teams that are focused on individual parts of the mandate. And those teams are typically staffed by engineers, product managers, and designers, and now data scientists are embedded in that team structure as well. And I think that is super important for me because without context of what the product teams are thinking, what their direction is, what their successes are, we can't be successful. We can't be successful as a sort of ivory tower team sitting in a corner doing our own thing. Nobody will trust that. Nobody will think we're thought partners.

So from my perspective, being embedded in the teams, being included in the team, ourselves thinking that we are part of the team, is really, really important, and that has been a key part of how we have been successful and how we've built trust. At the same time, I feel like the hub part of the hub and spoke is really important as well, which is that there needs to be centralized reporting within data science. Because yes, you are a really good thought partner to product, engineering, and design, but you also need folks with the same technical know-how as you across pillars to understand how the other pillars' ideas and constraints interact with the work that you're doing, because then you can come up with better solutions. And at the same time, you can brainstorm ideas.

So for example, one of the things that our team has pioneered over the last several years is the use of machine learning in product. This started off in monetization, where we developed a model to predict the likelihood of a user subscribing to our product, and that model was very successful in monetization. We have since migrated that same mentality, or that same model structure, over to user growth, where we now have models that predict the likelihood of a user churning so that we can act before they churn and try to convince them to stay. So that's an example of cross-pillar pollination that's happened because we have a hub structure. And I feel that's really important. It's also important for culture. It's important for team spirit. We actually are a really fun bunch. We're not just colleagues, we're friends.

We make fun of each other. Our Slack channel is super active. We have a meme wall in our New York City office. We just make fun of ourselves, basically. So I feel like that culture aspect is not really possible if you're scattered across different small teams. I feel like a central home is important, where you come to be with your people, and then you go be with your other type of people, which is the teams that you're embedded in. So I feel like that dual structure is quite important. Absolutely. Yeah.

If you need the domain knowledge, and you want to really be solving business problems, then you've got to be close to those commercial teams. But also, yeah, you don't want every data scientist to be inventing stuff on their own. You do need to share with other data scientists and, yeah, nerd out a little bit together. You mentioned that this is a happy ending, with all the team being friends. But earlier you said when you took over, you had to get rid of some people. And I'm sure those are very tough management decisions. But can you talk me through how you decided?

Hiring for Communication and Scrappiness

Who do you keep? Who do you get rid of? Who needs to be worked on? What are the criteria for the people you want to keep? Yeah, I don't think there is hard data on this. A lot of this is subjective. A lot of this is based on feedback from stakeholders and also just talking to the data scientists themselves. I'll come to the specific question about who to keep and who to let go in a minute. But I think even before that, what was important is how to decide whether someone is capable of managing.

So typically in engineering and even in data science, I think there is a tendency to see management as just a route to seniority: everybody's an IC, and then at some point in your career you have to start managing because that's the only way to progress. I am a complete disbeliever in that methodology. I think ICs can grow to very senior levels and remain ICs if that's what they're passionate about. And management should not be seen as kind of an "I must do this." You must be passionate about leading people and being a good mentor if you want to do that.

So identifying those types of people. For example, one of the people leaders that I have on my team leads our user growth, forecasting, and BI teams. When I joined, she was actually just an IC. I think she was managing one person, and that was also a stopgap. But she was informally mentoring a bunch of people. People would just go to her and say, hey, I need help with this. And she would make time, sit down with them, get them through the problem, et cetera. So to me, that was already a sign: I think she's ready. This is what she wants to do. And this is what she's good at. And now she's thriving. She's one of the best managers we have at Duolingo, and not just in data science. In terms of how to decide who to let go, I think stakeholder feedback is very important.

And then just the output of their work. How long are they taking? What is their mentality around working around constraints? So one thing I've noticed is that the really good data scientists can be very scrappy. If our data is not in perfect order, if there are other constraints, if they're facing some other blockages to their work, they will find a way. They'll find a way to deliver, or they'll be very clear and crisp in their communication.

The people who don't do well are the people who complain all the time. They come to me to solve their problems. I think that's fine to do, that's part of my job, but if that's the majority of our discussion, then I think there is a bit of a problem. There's a lack of effort. There was a bit of that. So just letting go of a few people, and then using the backfill to bring on people who are significantly aligned with the way that the rest of the data science org is working, I think that was quite a big game changer for us. Yeah, certainly. I can see how, if someone is complaining about problems all the time, they're just full of endless problems. So, yeah, you need to have that level of autonomy and self-motivation in order to try and solve stuff, or it's not going to work.

And I love the idea that if you are informally mentoring colleagues, that's a really good sign that you're ready for management. That seems like a good indicator. You talked about bringing on new people. Are there any particular skills you were looking for when you were hiring new people? Like, what do you think are the most important skills for data scientists at the moment? Yeah, I think the... So this is something that...

I believe I have changed my mind about over the course of my tenure at Duolingo. So when I first came, I thought that if we hire the best technical folks, they're going to be successful at Duolingo.

And there's a reason why I thought that. I came from Amazon, and Amazon is at a very different place in its growth trajectory than Duolingo. Amazon is already pretty well established. They're on the flat part of their S-curve, if you want to think about it in business cycle terms. So in order for a data science team or a science team to have an impact, the level of rigor needs to be super duper high. And who are the folks who can do that? The people who have really high technical skills. And everything else doesn't matter so much. If they're terrible at communicating, that's okay. We're never going to put them in front of a stakeholder, etc., which is also fairly standard in some of the big tech companies.

At Duolingo, things are very different. Here, even our most junior team member gets a chance to present to the CEO. So technical skills will get you through the door, but what will make you successful is how well you can communicate your ideas in simple, intuitive terms. And that's where people were just really struggling. We had so many candidates who came in, aced our analytical brainstorm, aced our technical tests. But then when we asked them to present to us, like do a 45-minute presentation on any topic you want to talk about, they completely failed. Either their talk was so technical that they lost everybody in the room, including the data scientists, let alone the product managers who were in the room.

Or their talk was so simplistic that the data scientists in the room were like, well, I don't really understand the contribution you made here. So finding that sweet spot between being able to communicate technical expertise and technical concepts in a simple, intuitive way is probably the most valued skill at Duolingo.

We are very selective in our interview process to find those types of people. It is painful because we have to go through a lot of interviews. Most of the people who come through that pipeline stumble at some point. But the people who make it through, obviously people make it through because we do hire. Those people have been very successful at Duolingo. So I feel like technical skills for sure, yes.

But then being able to communicate that to stakeholders, partner with them closely, and bring them along are equally important priorities for us. Yeah, certainly finding that communication sweet spot is something I've spent my whole career looking for. So yeah, incredibly important stuff. And it is interesting that, I guess, the smaller the company you are, the broader the skill set you need. So you need the technical skills and those communication skills, whereas at a larger company you can sort of get away with being a bit more narrowly focused. Okay, so, you know, we've gone 22 minutes without mentioning AI, which may be, I think, a record for this year. But I'm curious as to whether the rise of generative AI has changed the profile you're looking for in data scientists.

Duolingo's AI-First Approach

Let me just speak to it at a company level first. I think Duolingo has been an adopter of AI since even before generative AI became such a big thing in the last couple of years. So we've been using AI in the way that we generate our content. We have used AI in tweaking the difficulty of the exercises that our users see based on the exercises that they've just done. And most recently, we have a conversation practice tool where one of our world characters has a live conversation with you so you can practice speaking in the language that you're learning. And that's been a pretty big game changer for us because speaking has always been sort of the monkey in terms of learning a language. But now, thanks to the advances in LLMs, we can actually have an intelligent conversation at a very low cost with our users in their learning journey.

AI as an Augmenting Force

So we are definitely pretty high up in terms of AI adoption. For data science in particular, my belief is that AI is not going to replace what data scientists do. I believe AI is going to augment what we do. I come from a research background. I spent some time in academia before I switched careers, and one of the really important productivity drivers in academia is research assistants. These are poor graduate students who faculty members just really torture and get a lot out of and pay no money at all. So I think of AI as basically an army of research assistants. Every single data scientist can have access to an army of research assistants. And these research assistants can go and do simple tasks, even more complicated tasks, but definitely simple tasks that are repetitive, that are predictable. AI can go do this.

From my perspective, there is still a strong need for a human in the loop. So there needs to be the supervisor who's sitting on top of these RAs and making sure that they're doing good work, auditing their work, and then being the final quality screen before the work actually gets to stakeholders. So that's the way that we have been thinking about AI, and that's the way I think it's going to evolve for us over time as well. Yeah, that's interesting. And certainly, I mean, I've been a longtime Duolingo user, so yeah.

It's always been very good for the sort of reading and writing side of things, but having that actual conversation thing, that's a big game changer compared to just, oh, I'm saying something out loud to myself. So yeah, fantastic technology and innovation there. Going back to your data science examples of being able to automate repetitive tasks, have you got some specific examples there of things you have automated using AI?

So one of the things that we're working on, and we're still playing around with this tech, is being able to query our database in a simple, intuitive way. Typically the way that you query a database is to write a SQL query. And SQL is a specialized language. Not everybody knows how to write good SQL. And typically data scientists would get tasked every time. So we have some internal tooling that allows us to query some of our data, and it's pretty good. But what GenAI has enabled us to do is query that data in a conversational style. So you can basically go to the interface and say, show me the average DAUs in India from this date to this date, or whatever you want. It's literally like you're having a conversation. That has the potential to unlock a lot more usage from across the company, for anybody who wants access to that.

The way that we benefit from that is that those people are not coming to us anymore. So there's an opportunity cost, right? We save time, and then we can spend our time on building the next big thing. So we are investing in that technology, building that interface. And we believe that's going to be a pretty big help for the rest of the company.
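[Editor's illustration] To make the example question concrete, here is roughly the SQL that a request like "show me the average DAUs in India from this date to this date" has to be turned into, whether a data scientist writes it by hand or a conversational interface generates it. This is only a sketch: the table name `daily_active_users`, its columns, and the figures are hypothetical, and the code shows the target query rather than any language-model translation step.

```python
import sqlite3


def average_dau(conn, country, start_day, end_day):
    """Average daily active users for one country over an inclusive date range.
    Parameters are bound with placeholders rather than pasted into the SQL."""
    row = conn.execute(
        """
        SELECT AVG(dau)
        FROM daily_active_users
        WHERE country = ? AND day BETWEEN ? AND ?
        """,
        (country, start_day, end_day),
    ).fetchone()
    return row[0]  # None when no rows match


# Hypothetical schema and invented rows, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_active_users (day TEXT, country TEXT, dau INTEGER)")
conn.executemany(
    "INSERT INTO daily_active_users VALUES (?, ?, ?)",
    [
        ("2025-01-01", "IN", 100),
        ("2025-01-02", "IN", 120),
        ("2025-01-03", "IN", 140),
        ("2025-01-02", "US", 999),
    ],
)

print(average_dau(conn, "IN", "2025-01-01", "2025-01-02"))
```

Even in this tiny case the person asking has to know the table, the column names, and the country code, which is the knowledge gap a conversational layer is meant to cover.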

place where we're investing heavily is uh in our business intelligence capability so business intelligence for example one aspect of that is anomaly detection so we have metrics that we track And if those metrics are at some point in a day or in a week, they deviate from normal trend. That typically is a red flag that something has gone wrong or something has gone really well.

It depends on which side the deviation is. And we want to know. We want to know about what it is and why is it and what can we do about it. All of those questions, typically a data scientist would handle, including detecting the anomaly in the first place. So what we have, we are investing in automating is...

Moving away from a world where a data scientist literally has to open 50 dashboards every morning to an AI doing that. So it's automated. So we can get more sleep in the morning. That's great. And not only that, but... The future that we hope we get to is a place where not only does the AI detect the anomaly, but it actually even runs. the first set of sql queries that a data scientist would run anyway that okay well here's the deviation let me run a few data

queries to understand what are the key drivers, which segments is this affecting, which geographies is this affecting the most, and then present that data to the data scientists when they log in. So it's not just there was an anomaly. OK, let's figure it out. There was an anomaly. Here's where it's prevalent. Here's the way it showed up in our DAUs. And then it's up to the data scientists to do the next step. And by that time, like 50%, 60% of the work's already done.

So I feel like that's a productivity accelerant.
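
A minimal sketch of the two steps described above: flag a metric that deviates from its recent trend, then break the deviation down by segment so the data scientist starts from a first cut rather than a blank page. The z-score rule, the threshold of 3, the function names, and all the numbers are assumptions for illustration, not a description of Duolingo's system.

```python
from statistics import mean, stdev

def detect_anomaly(history, today, threshold=3.0):
    """Flag today's value if it is more than `threshold` standard deviations from the recent mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Flat history: any change at all counts as a deviation.
        changed = today != mu
        return changed, (float("inf") if changed else 0.0)
    z = (today - mu) / sigma
    return abs(z) > threshold, z

def rank_segments(baseline, today):
    """Rank segments by absolute change against their own baseline, largest first."""
    changes = {seg: today[seg] - baseline[seg] for seg in baseline}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical daily totals for the past week, then a sharp drop today.
history = [1000, 1010, 990, 1005, 995, 1000, 1000]
is_anomaly, z = detect_anomaly(history, 900)

if is_anomaly:
    # Hypothetical per-country values: typical day versus today.
    baseline = {"IN": 400, "US": 350, "BR": 250}
    today = {"IN": 310, "US": 343, "BR": 247}
    for segment, change in rank_segments(baseline, today):
        print(segment, change)
```

The sign of the z-score tells you which side the deviation is on, and the ranked list is the kind of "here's where it's prevalent" summary that could be waiting when someone logs in.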

Fostering AI Adoption and Product Focus

Yeah. I mean, I love that your main metric for success is like, did I get a lie-in? Like, am I sleeping better? It feels like an incredibly important thing. And I guess, yeah, you solve the problems that keep you awake at night. So that's a good decision-making process. Sleep is super important. As I'm getting older, I'm realizing. Absolutely.

All right. So, yeah, I like the second example about automating monitoring stuff. I think this is something that software teams in general do very well, particularly infrastructure teams, but it's not really been that pervasive in data science until maybe recently. So yeah, that seems like a very good use case.

On your first use case, this was about getting better self-service analytics, I think. I always thought of this as being something that helps other teams, like your commercial teams or less technical teams, but actually you're saying it's really a benefit to the data science team because you're not being bugged all the time to jump in and do stuff. I just find the hard part is getting other teams to adopt it and have that confidence to do things themselves. Can you talk me through how you've persuaded all these less technical teams to actually go and do their own analytics?

Yeah, I think it's still a work in progress. It is a good thing that you're pointing your finger at. I think the way that we've done it is we have these sort of pilot sessions, or pilot periods, where we invite the more data-savvy non-technical users into the pilot. They volunteer because they are very interested in this capability, and we have them play around with it first and then ask a ton of questions. That helps us make the prompts better, that helps us make the interface better, et cetera. And we're at that stage right now, where we have the more data-savvy folks helping us make the product better. Ultimately the goal is that once they're satisfied with it, they can help evangelize with their teams and their direct reports. And a product manager who, say, is not so tech savvy is more likely to listen to their manager who is a PM rather than to a data scientist. So we believe that that way of evangelizing is likely to be more successful.

Okay, so I like this sort of order of who gets to trial these new ideas. So it starts off with the most technical people, so your top data scientists, whatever, gradually moves on to product managers, and then it's sort of everyone else. So yeah, interesting flow.

Yeah, I mean, I'm a big believer that you can do the best data science in the world, and if nobody uses your output, then it's useless. Well, of course, yeah, you've got zero impact if you're just doing fun stuff by yourself.

Okay, so actually, in terms of who the data team has to collaborate with, would you say product managers are the most important people from your perspective, or are there other teams, or other roles, where you think, okay, we really need to get on their good side?

The way that decisions get made at Duolingo, or historically have been made, is basically the EPD trifecta. So that's engineering, product, and design. So you have engineers, product managers, and designers. And now, increasingly, we are the fourth leg in that stool. So it will be EPDD: engineering, product, design, and data science. On many, many teams across these pillars, we are already considered an integral fourth pillar, a fourth leg in the stool. And in some ways, the way to summarize the way I think about the role of a data scientist is that a data scientist is basically a product manager with a technical hat.

They think like a product manager. They think like owners of the product. They're invested in its success. They're constantly problem solving, and they're bringing data to come up with a better hypothesis for what to do next.

That's very cool. I've not really thought of data scientists being a type of product manager. Suppose you are a data scientist and you want to get more product focused. What's step one in doing that?

Just investing more time in the context. So I can give you my own example. When I joined, and I really do credit our CFO, Matt Skaruppa, for this, I joined from a culture where basically you join and you hit go and you go. You have no time, for better or for worse. It's just a very fast-paced culture. I really enjoyed it when I was there. When I joined Duolingo, the first piece of advice that he gave me was, "Bilal, you've just joined. We have a very rigorous selection process, obviously. We've been looking for a head for two years, and now finally we have you. So I don't have any expectations from you for the first month. I want you to sit and be a sponge and get context of what Duolingo does, what our product teams work on, how they work on things, what we have tried in the past, why things have failed, if they have failed, and why things have succeeded. Build that mental model and then augment it."

That advice really resonated with me, and I pass it along to everybody who joins the company and talks to me, because I feel that's really important. Without the right context, you can do the best data science in the world, you can build the best economic models in the world, but you're just going to be shooting in the dark and probably misfiring most of the time. And if you have the context right, you still get an A for effort, even if your solution is 80%.

I do love the idea of just making sure that new hires get to soak up a bit about company culture, what is needed, what the problems are that you're actually trying to solve. So having that thorough onboarding just seems like a good way to set people up for success.

Yeah, we now invest a lot in preparing the onboarding documentation. So we have basically a master doc that provides, for each pillar, depending on which pillar you're joining, a history of which teams have been working on what types of problems, links to the experiment dashboard, the types of experiments they run, and a list of people that the new hire should meet in their first day, their first week, their first month.

Driving Product Innovation with AI

And just to give them a head start to start developing that context and those relationships.

That seems incredibly useful. All right. So I'd love to zoom out a bit to the company level. There are a lot of companies where, I mean, there are a few buzzwords being thrown about, like they're trying to become AI first or AI ready or AI enabled. What does that mean for you as a manager at Duolingo?

I think the way that we think about AI is how can it actually enhance our productivity. So I believe a lot of hype around AI, at least initially, has been around cost-cutting. How can AI be cost-cutting?

I think that's important, but I think more important, or more exciting for us as a company, is how it can actually make our product better. So the example that I gave you of the video call feature, where you can actually have a conversation, is an example where it's enhancing our product. It's unlocking a capability for providing speech practice which never existed before. I mean, the only way to do that before AI was for us to actually hire tutors, like human tutors, and for them to interact with users. And when we have 50 million DAUs, that's just not possible. It's not feasible. So AI has unlocked that. Our stance towards AI is we are invested in making AI improve our product. And that's where we see the promise the most.

I love that because, yeah, certainly cost cutting is important, but it's only your CFO who is going to be really excited about it. Whereas actually, if you're trying to improve the customer experience, you're making 50 million people happy.

And if we sit and do nothing, AI costs are going to go down because of everything that's happening in the industry. The costs have been coming down because of new models coming out, things getting cheaper, the foundational labs making better and cheaper models that go faster with less compute. So the costs are going to come down on their own through market dynamics. So even if we do nothing, that cost component is going to help us.

Okay. And do you have any advice for other data leaders on what they should be doing to encourage their data team to make more use of AI?

I think definitely creating an environment where they have the incentives to try different AI tools and are not penalized when there is a productivity drop. A lot of times we use AI tools, and they really suck. And then we evangelize that learning among everyone else, or we do more trial and error to see what works, what doesn't work. But just making room in the team to do that, and for that not to be punitive, I think is really important because that's how you discover what's useful.

An example of how it's been really helpful for me is custom GPTs. So I have a custom GPT that I call something like a leadership ally. It's been a game changer for me because it's kind of like having an executive coach. It's not perfect. A lot of times it's very agreeable, which I don't like. And I've tried to prompt it to be very critical. And at times it does a really good job.

But it's been very, very helpful in the way I develop communication with the rest of the company, the way I communicate with my direct reports, the way I communicate with the team. And I find that super duper helpful. It's been a very big game changer for me.

I do love the idea of having something to just second-guess or at least critique your decisions. But yeah, certainly AI being too agreeable, it's a common problem. I want you to tell me things that are wrong. It's like, "Oh yeah, you've done a great job here."

The Future: Synthetic A/B Testing

Okay. I think one of the big fears is around job security with AI. Are you going to automate yourself away? I know you talked a bit about psychological safety. Do you have any tips on how you can ensure that your team feel comfortable around that?

So it's an interesting question around job security. If you believe my hypothesis about AI augmenting, not replacing, productivity, then you can imagine that the demand for data scientists is actually going to go up rather than down. Because if we can do more, then the marginal cost of the output that a data scientist produces goes down, because each data scientist can do more. And if that's the case, well, every company is going to want more data scientists, because they can do more with us, in my view.

I believe in that hypothesis. Giving the teams the right incentives to play around with AI, get comfortable with it, and use it in their work is really, really important. And it's important for data scientists to be open-minded about it, not closed-minded about it, because, yeah, it's definitely here. It's not going away. It's a productivity enhancer. They should think about it that way, and they should find ways in which it's super helpful for them.

Absolutely, yes. I feel like in most companies, it's not like there's a shortage of data science problems to be solved. So productivity enhancers, productivity boosts, they're going to be incredibly useful, I think.

Well, there may be some companies where they're like, "Okay, the data team, it's just too expensive, we'll get rid of them." But if you're showing your value, you've got that trust with management, then it ought to be an overall benefit, I think. Okay, wonderful. All right, so can you talk with me about what you are most excited about in the world of data and AI?

Yeah, it's a good question. So this is aspirational. I believe you're asking an aspirational question.

What I'm going to discuss is something that I believe is the future of how we think about experiments and testing, and A/B testing in particular. So we are very big into A/B testing. We run hundreds of A/B tests every quarter at Duolingo. Every single product feature that gets shipped out is A/B tested first. And this is true for a lot of companies, not just Duolingo. Now, one of the perils of A/B testing is that there typically is a blast radius around A/B tests. Around 50%, I think, on average, of A/B tests are not successful. They don't get launched. So our capability of generating great hypotheses is not very good. We test things, and then half the time we're wrong. It doesn't work out.

And what happens with those things that don't work out is that at times things actually get worse. We do more harm than good in some experiments. So there's a blast radius. What if there was a way for us to actually pre-assess the blast radius and not run experiments on our users that would actually not benefit them?

I believe there is a lot of research going on in this, and there is definitely a world where we will be doing synthetic A/B testing. So instead of real users, we can actually use LLMs to mimic the behavior and decision-making of actual humans, and then be able to run hundreds of A/B tests, not in a quarter, but overnight. Now, there are a lot of problems to solve around this. There are problems of how best to mimic human decision-making. There are problems of cost. But I believe this is the future, and I'm pretty excited about it.

That's a pretty radical idea, the idea that you've got... or bots instead of real people. You can run your A/B test against those and see what performs best, so you're doing optimization in a safe way before you're exposing your users to sketchy new features. All right, I really like that as an idea. I guess fingers crossed we can make it happen, or someone can make it happen. Finally, I always want new people to follow. Whose work are you most excited about at the moment?

Along the lines of this type of synthetic A/B testing, there is a team at Amazon, my former team actually, that is working on this, and they're collaborating with some really top academics. So there's an economist at MIT, Victor Chernozhukov. He's a very famous econometrician. I follow his work, and he's involved in some of this. They just released a paper. I believe it's called agentic economic modeling. So that's basically an economics-based framework for thinking about synthetic A/B testing. So I do follow his work quite a bit. So yeah, that would be something, or at least a person, that I'm following currently.
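
To illustrate the shape of the synthetic A/B testing idea discussed above, here is a minimal sketch in which simulated users stand in for real ones. A seeded random draw is used as a placeholder where an LLM-based model of user decisions would go. The conversion probabilities, the function names, and the two-proportion z-test used to compare the arms are all assumptions for illustration, and this is not the method from the paper mentioned.

```python
import math
import random

def simulate_arm(n_users, convert_prob, rng):
    """Count conversions among n simulated users.

    The random draw is a placeholder for an LLM-based model of user decisions.
    """
    return sum(1 for _ in range(n_users) if rng.random() < convert_prob)

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference in conversion rate between two arms (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
control = simulate_arm(n, 0.10, rng)    # hypothetical baseline conversion rate
treatment = simulate_arm(n, 0.12, rng)  # hypothetical effect of the new feature
z = two_proportion_z(control, n, treatment, n)
print(f"control={control / n:.3f} treatment={treatment / n:.3f} z={z:.2f}")
```

The statistics are the easy part. The open question raised in the conversation, how well simulated decisions match real user behavior, is exactly what this sketch leaves out.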

Okay, wonderful. Yeah, I mean, it just seems such a fascinating topic. I think it's definitely something to dive into in more depth. Maybe a future DataFramed episode. Wonderful. All right. Thank you so much for your time, Bilal. Thank you.

This transcript was generated by Metacast using AI and may contain inaccuracies.