Why We Need More Black Data Scientists w/ Matthew Finney

Speaker 1

00:01

So why isn't fairness part of our process here? It's because, well, as data scientists and statisticians and researchers, we had good intentions. We lack those mechanisms for action. We lack things in our process that force us to consider hard questions. We need to use our brains a little bit more than we need to for other problems that we solve every day. And so why don't we solve these hard problems? It's because we lack incentives as

00:27

a community of data scientists to do something. Um, it's a hard problem, and we have no transparency and no accountability for the models that we produce. Right, so that means that we have little hard business reason to prioritize fairness and to spend time working on addressing this hard problem. Will Lucas here, Black Tech Green Money. Let's talk about algorithmic bias. You're probably like, yo, Will, what in the

00:53

world is algorithmic bias? Wikipedia says it describes systematic and repeatable errors in a computer system that create unfair outcomes, such as privileging one category over another in ways different from the intended function of the algorithm. Now we can debate whether these things are intended or not intended. But

01:14

that's a different conversation for another day. But these can have a direct impact on you when they determine which political ads you see, or how many cops are deployed in your neighborhood, or even your insurance premiums, how much

01:28

you pay for insurance. There was a study that showed that even though Black Americans are four times more likely to have kidney failure, an algorithm used to determine the priority of patients on a kidney transplant list put Black patients lower on the list than white patients, even when all other factors remained identical. So today on Black Tech Green Money, we're hearing from Matthew Finney, who's a data scientist and

01:53

strategy consultant at Harvard. He was speaking from AfroTech World, and in his day job, he develops AI decision systems to help large organizations make an impact on their most challenging business and mission problems. I can sometimes be a reluctant technologist. Don't get me wrong, in the last decade we have achieved some amazing feats with artificial intelligence.

02:17

We've been able to figure out what you want to buy before you knew you wanted it. We can have a self-driving, artificially intelligent electric car, and if that wasn't enough, we put it in space. We've trained AI to read mammograms with particular skill at diagnosing a set of highly invasive cancers that radiologists had missed, but we still haven't figured out how to make our technology treat others the way that we would want to be treated. So I promise I'm not just gonna stick to that

02:47

gloom and doom topic today. So what are we gonna do? First, we're gonna define and measure algorithmic bias. Then we're gonna figure out how we can isolate the root causes of poor algorithm behavior, and finally, we're going to learn how we can all take action to make algorithms more fair. So let's get started. I want to evaluate algorithmic bias here through the lens of a case study, and we'll learn how to, through this case study, apply the tools

03:13

more generally. Kidneys are really important. Obviously, their main function in our body is to help us filter out waste, and so there's a metric of kidney function called the glomerular filtration rate that's very important for diagnosing kidney disease. However, this metric is really hard to measure directly. If you were going to measure it directly, you'd need to collect the waste from the kidney over a period of twenty-four hours. So it's not practical, it's not fun

03:41

for anyone. That's why in the seventies they developed an algorithmic way to estimate this metric. Doctors can take a sample of your blood and measure the level of a substance called creatinine that's in your blood sample, and there's a regression equation that takes that creatinine metric and turns it into a kidney function index. This creatinine metric

04:07

that they use, when researchers were developing the model, they realized that creatinine is highly sensitive to someone's muscle mass, you know, given that it's actually a byproduct of muscle activity.

04:19

And so when they were trying to make the algorithm as accurate as they could, researchers determined that because African Americans have higher muscle mass, they have higher baseline creatinine levels, and so they decided that they were going to adjust the CKD-EPI algorithm, this kidney function algorithm, to increase kidney function index scores for African Americans to

04:41

control for this muscle difference. Here, a higher kidney function score indicates that your kidney is healthier, so African Americans were being given kidney index scores that were showing their kidneys were healthier than a white person's with the same observable metrics. Interestingly, the United States is the only place in the world where we do this race correction for kidney function, and there are many other places in the world where we have a large population of people with

05:08

African heritage. This is because people see that this correction is unfair. There are two specific definitions of fairness that we use in the algorithm community. The first is group fairness, and the idea behind group fairness is that in your data set, you have groups that are identifiable and they should be treated similarly to the population as a whole. Right, So a group could be all people with blue eyes, people with red hair, everyone who lives in Minnesota, all men,

05:40

people of Latin heritage. All those are examples of groups. And if you have an algorithm that is group fair, that means that the algorithm treats all of these groups similarly to the rest of the population, regardless of whether or not the algorithm has information about the sensitive attribute that determines whether someone's in a group or not. So let's look at the second definition, individual fairness. Individual fairness means

06:05

that similar individuals should be treated similarly. An example of that is, let's say you have two people who have equal incomes and equal credit history, and they're applying for credit at a bank, and the bank uses an algorithmic decision system to determine whether or not to extend

06:23

credit and a certain credit limit to the customers. So, given that they have the same income and the same credit history, even though one is male and the other's female, both individuals should get the same credit limit if the algorithm is individually fair. So now let's dive into this kidney function algorithm again and let's think: is this algorithm fair? So first we'll look at the group fairness of the

06:47

CKD-EPI algorithm. Um, the chart here on the right is taking a look at the median number of days that adults in the United States who received kidney transplants spent on the waiting list for a kidney before they received the transplant. Um, something stands out almost immediately here, and it's that African Americans can spend over twice as long as Caucasians on the waiting

07:15

list for a kidney in the United States. Right, so African Americans are spending years on the waiting list, and part of this is because of the CKD-EPI algorithm that's giving them higher kidney function scores even though their kidneys might not be functioning well, and that puts them at a lower priority on the waiting list for a kidney. So this is treating African Americans as a group differently from groups of other Americans, and

07:42

that's something we should be concerned about. This algorithm is not group fair. So now let's consider: is this algorithm individually fair? Individual fairness means that we treat similar individuals similarly. And in this algorithm, we can have two individuals who have the same muscle mass and the same level of

08:00

creatinine measured in their blood. But if one of them is white and one of them is Black, they're going to get different scores for their kidney function, such that the Black person will get a score indicating a healthier kidney than the white person. Um, this is concerning, right? This is not individually fair, and the medical community is starting

08:21

to come around to this. So last year in the Journal of the American Medical Association, they published an article asking to reconsider the use of race in the kidney function algorithm. And there was a sentence here that I thought was really important: with the eGFR equation that's being used, it asserts that existing organ function is different between individuals who are identical except for race.
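
The individual fairness check described above can be sketched in a few lines. This is only an illustration, assuming the 2009 CKD-EPI creatinine equation in its commonly published form; the coefficients are not quoted in the talk and should be verified against the original publication before any real use. The two patients and the helper `individually_fair` are hypothetical.

```python
def egfr_ckd_epi_2009(creatinine_mg_dl, age, female, black):
    """Estimated GFR under the 2009 CKD-EPI creatinine equation (assumed form)."""
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scale
    alpha = -0.329 if female else -0.411    # sex-specific exponent below kappa
    ratio = creatinine_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159                        # the race "correction" discussed in the talk
    return egfr


def individually_fair(score_a, score_b, tolerance=1e-9):
    """Hypothetical check: similar individuals should receive similar scores."""
    return abs(score_a - score_b) <= tolerance


# Two hypothetical patients, identical in every model input except race.
white_patient = egfr_ckd_epi_2009(1.4, 50, female=False, black=False)
black_patient = egfr_ckd_epi_2009(1.4, 50, female=False, black=True)
race_ratio = black_patient / white_patient

print(f"white: {white_patient:.1f}, Black: {black_patient:.1f}, ratio: {race_ratio:.3f}")
print("individually fair:", individually_fair(white_patient, black_patient))
```

Under these assumptions, the only thing separating the two scores is the race multiplier, so the Black patient is always reported with a higher (healthier-looking) score for the same blood test.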

08:46

Race is causing African Americans to get unfavorable scores of their kidney measurement function that might lead them to get a lower priority on the waiting list to receive an organ that's desperately needed. This might seem obvious that these types of scenarios are bad, right, and we shouldn't be using race for something that could have unfair outcomes that cause life or death situations for people. But this keeps

09:15

happening over and over again. Any week you can open up the newspaper and see a new algorithm that was racist or sexist. You know, name your -ism, there's an algorithm that is suffering from it. So let's talk about how and why this happens. First, I want to just talk about how we make models. Algorithmic models are a function of

09:36

three things: technology, people, and process. On the technical front, you know, that's where we consider the data that you're using to train your model and the specific algorithm. For example, that could be a neural network, that could be

09:50

a linear regression, that could be anything in between. On the people front, you know, that's where we consider the role of people like myself, data scientists, business owners who come up with the business requirements for these algorithms, and the end users who actually take the algorithms and put them into practice to make decisions. And the last component here is the processes, the processes that we use to train our models, to evaluate our models, and apply them

10:17

in practice. And by breaking down the process of building a model into these three components, we can evaluate them individually when we want to determine the root cause of algorithmic fairness or algorithmic bias. So how did we make a biased kidney function model in the context of these

10:35

three components? First, let's look at technology. So when researchers were developing the CKD-EPI algorithm, they had many different options they could consider that were technologically feasible to measure and estimate eGFR. There was a direct way of measuring glomerular filtration rate, which was very difficult but not impossible, and

10:59

we could have gone with that as a medical community. There were other alternatives among things that we can measure in the blood. Beyond looking at creatinine, which is sensitive to muscle mass, we could have instead decided to look at cystatin C, which is another indicator of kidney function that has no sensitivity to muscle mass. And there were also better ways of measuring muscle mass that were technologically possible beyond just looking at someone's race to

11:27

estimate muscle mass. Right, so technology wasn't the constraint here that led us to have an unfair algorithm for measuring kidney function. Let's evaluate the people. Now, it's gonna sound like I'm glossing over this one, but I really do want to assume the researchers' best intentions here when they decided to build this regression model for measuring kidney function.

11:49

And I also want to assume that the doctors have only the best intentions and the best interests of their patients in mind when they make decisions on ordering this test and recommending patients for kidney transplants. So I don't think that people are the constraint here either that led us to have a biased model. So now let's look at the process. The process here for building this model

12:11

was optimized for overall accuracy of the model. So we mentioned how when researchers decided to include race in the model that they were training, they got a slight overall accuracy boost in the model, and that was the driving factor in the decision to include race as a predictor of kidney function. That process, that's where I want to

12:31

dive deeper. That's where our failure was. We had a process that was optimized for accuracy and not for fairness objectives, and because of that, researchers developed a kidney function model that was biased racially and led to

12:50

unfair outcomes. A couple of years ago, the US Department of Education's Civil Rights Data Collection released information showing that Black and Latino students lack access at the high school level to high-level science and math classes. In predominantly white schools, calculus was offered across fifty percent of them. In predominantly minority schools, just thirty-three. Physics: sixty-seven percent for white, forty percent for minority. Algebra: eight fo

13:34

percent for white, seventy-one percent for minority. Now this matters because these have downstream effects. High aptitude in these STEM fields means higher representation in STEM careers. So when we're not represented well, the systems don't get built for us or even with our input appropriately considered. So how can these systems that weren't built with our input play

13:57

out negatively in our communities? As data scientists, you know, we are in a profession where there's a high emphasis on overall accuracy and a number of procedural and technical controls that promote that. On the technical side, we have many metrics like just overall vanilla accuracy, MSE, precision, recall, you name it, specialized metrics to measure the accuracy of our models. And then we have procedures like A/B testing that help us make determinations about whether or not we should deploy

14:32

a certain model into practice. But we don't have that same infrastructure for fairness. Um, as someone who's been in the room where it happens, you know, I can tell you where I think, specifically, this type of process breakdown affected the kidney function model that we've been evaluating. So let's look at specific things that they missed. Um, first, let's address this chart here on the right. This is a chart that shows muscle mass by race among a

15:00

population of US adults. The blue line represents white Americans and the red line represents Black Americans. So we can see that while, on average, Black Americans have a slightly higher muscle mass than white Americans, um, this shift is so slight that the distributions of muscle mass by race overlap almost entirely. What this tells me as a data scientist and a statistician is that an individual's race tells me next to nothing about that person's muscle mass.
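
To make the overlap argument concrete, here is a rough sketch using two synthetic normal distributions. The numbers are assumptions chosen for illustration, not values read off the chart in the talk: both groups share the same spread, and one mean is shifted by 0.2 standard deviations.

```python
from statistics import NormalDist

# Hypothetical, standardized muscle-mass distributions for two groups.
# The 0.2 standard deviation shift is an illustrative assumption only.
group_a = NormalDist(mu=0.0, sigma=1.0)
group_b = NormalDist(mu=0.2, sigma=1.0)

# Shared area under the two density curves (1.0 means identical distributions).
overlap = group_a.overlap(group_b)

# Standardized mean difference between the groups.
d = (group_b.mean - group_a.mean) / group_a.stdev

# Share of total variance in muscle mass explained by group membership,
# assuming the two groups are the same size.
variance_explained = d**2 / (d**2 + 4)

print(f"overlap of the two distributions: {overlap:.3f}")
print(f"variance explained by group:      {variance_explained:.4f}")
```

With this assumed shift, the two curves share roughly 92 percent of their area and group membership accounts for about 1 percent of the variance, which is the sense in which a group label says next to nothing about an individual's value.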

15:32

And so, as a researcher developing a kidney function algorithm, if I was concerned about muscle mass, I would have seen this chart and said, wow, race is not a predictor of muscle mass that's going to help us, uh, improve the accuracy of our algorithm in a way that's fair, because, you know, if we treat individuals as just members of a race, we're actually not going to give that

15:54

person the best healthcare. So nothing in their process forced them to look at whether or not race is predictive, um, in a broad sense for their objective, which was to control for muscle mass. Nothing also forced them to consider what the impact of using race would be on the fairness of their model. So they didn't consider

16:17

the societal impacts of using race in healthcare. They also didn't consider, um, how that would impact individuals, you know, who are on the waiting list for a kidney, and how that might lead to individuals who are equally qualified to receive a kidney, uh, being differentially prioritized on the list to receive that kidney based on race. So why isn't fairness part of our process here? Um, it's because, well, as data scientists and statisticians and researchers, we

16:50

had good intentions. We lack those mechanisms for action. We lack things in our process that force us to consider hard questions. Um, it would be really easy to say that we have biased algorithms because there are biased individuals who want to encode their bias in the algorithms. Um, and while I can't rule that out completely, let me tell you that most of the time that is not the case. Right.

17:15

Here's my hypothesis. Fairness is context specific, um, meaning that depending on what type of algorithm we're training, there might be a different fairness objective, and there might be different rules for what's fair and what's unfair. So, for example, there could be some healthcare scenarios where race is actually an important predictor of a person's overall health or risk for a disease, and those scenarios might be areas where it's fair to include race in an algorithm.

17:50

But in something like this kidney function algorithm, we can see that including race is clearly unfair. Um, and it's because there are these multiple notions of fairness with different context dependencies that fairness is actually a hard problem to solve. And for data scientists, you know, this is a hard problem without a unique, closed-form mathematical solution, meaning we need to use our brains a little bit more than we need to for other problems that we

18:17

solve every day. And so why don't we solve these hard problems? It's because we lack incentives as a community of data scientists to do something. Um, it's a hard problem, and we have no transparency and no accountability for the models that we produce. Right, so that means that we have little hard business reason to prioritize fairness and to spend time working on addressing this hard problem if no one's ever going to be able to see, you know, the steps that we took to address it and the

18:47

impact of our work. So, considering this process and mechanism failure for fairness, how will we end algorithmic bias? So I want to return to this idea that algorithmic models are a function of three major components: technology, people, and process. This is actually a question I ask often, and I've asked it in conversations about algorithmic fairness with all kinds of people: technologists, computer scientists, mathematicians, lawyers, ethicists, activists,

19:21

policy makers, and sociologists and many more. Right, and so I found through these conversations and through some of my own research that there are many existing approaches to addressing algorithmic bias, and they generally fall in the technology and people veins. And so that's what we're looking at here, just a couple of those different approaches that are already out there that allow us to address algorithmic fairness. On the

19:47

technology front, I want to highlight that we already do have classes of algorithms that are always fair or fair within certain constraints, and we're not always using them in our work. That's the problem. But there are tools out there that allow us to implement these very directly. So IBM, for example, recently released a toolkit called AI Fairness 360, um, and it has fair machine learning algorithms and machine learning diagnostics already implemented in Python that can be adapted to

20:22

any other type of prediction problem. Now, if you're a little bit more adventurous, there's also a community of academics who are on the cutting edge of research on algorithmic fairness. And I'll point out the Symposium on the Foundations of Responsible Computing as one place where you can go and learn about a lot of those really cutting edge

20:43

research topics. All these videos from the symposium are actually publicly available on YouTube, so that you can, at your leisure, learn about these topics from the academics who developed them themselves. On the people front, right, we have a lot of existing organizations that are tackling education and tackling the social movement component of this as well. Just to name a few organizations that are doing many great things.

21:08

We have Data for Black Lives and the Algorithmic Justice League that are tackling that social movement and social activism approach to encouraging algorithmic fairness. And then there's also an organization called AI4ALL that is tackling

21:25

the education. So given that we see a lot of existing work out there on the technology and people fronts, I want to turn our attention to process, where there's relatively less existing work, and that's where the focus of my research is: what mechanisms can help us to build

21:43

fair algorithmic models? I'll return to those challenges that we discussed before, the fact that algorithmic fairness is hard to define and hard to measure, and because of a lack of transparency and accountability, we have few incentives to actually go in and tackle the hard problem. So first I want to propose an approach that will allow us to make this hard problem a little bit easier for us to solve. And it's called a fairness statement.

22:11

So what is a fairness statement? It's an application-specific commitment to defined and measurable fairness goals. The scope of this fairness statement is going to include defining the relevant fairness objective or constraint for the specific algorithm that we're working on developing. So, for example, that could be: we want to make sure that African American people and white

22:35

people receive similar kidney function scores for similar actual kidney function. Now that we've defined a fairness objective, we can document potential sources of bias that might impact our fairness objective and also the downstream impact we'll see to individuals or groups, right? So this might be the place where we raise, well, if our algorithm's racially biased, we might see African Americans prioritized at a lower priority on the kidney

23:05

waiting list, and that might lead to adverse healthcare outcomes for that population. Finally, once we've documented the sources of bias, we can identify appropriate procedural and technical controls that we would take to mitigate the unacceptable risks. Right. So that could be, for example, implementing one of the classes

23:26

of fair algorithms that we discussed before. One of the key benefits of the fairness statement is that it gives data scientists a named goal they can work towards, and that helps them make informed choices and trade-offs in the

23:40

development of algorithms and the deployment. So, for example, if we had a fairness statement that was in place for the researchers who developed the CKD-EPI algorithm for kidney function, uh, that might have helped them say, hey, we could include race and have a slight bump in overall accuracy for our algorithm. But that presents a high risk of unfair outcomes. Therefore, the cost of this solution outweighs the small benefit of controlling for race in measuring

24:15

kidney function. Now, the other key benefit here is that this allows algorithm developers to catch problems early, at the stage when the algorithm is still in development and before it's been deployed into the world. This might mean that we catch an issue before it actually creates harm for people in real life. So now that we've talked about how we can make the, uh, fairness problem a little bit less hard, let's talk about how we can incentivize people to actually tackle it. I want to

24:50

propose an approach called the algorithmic practice audit. So what is this? It's an independent third-party review of an organization's algorithmic decision processes and outcomes. On the process front, we might evaluate questions like, are we using a representative training data set to train our model? We might also question whether or not the organization is using fair classes of algorithms, when they exist, to train models. On the outcome front, we might evaluate the actual fairness objective that was in

25:24

the fairness statement: is the model meeting the stated fairness goals? We might also be able to look at whether or not bias is introduced by humans in the last mile of the algorithmic decision-making process, right, so in that stage where the algorithm has made a prediction and then it takes a human to go and implement it and turn

25:44

it into a decision. So a key benefit of this is that it's a forcing function that allows our data scientists to actually invest time in algorithmic fairness because there are penalties. There are real penalties, um, to not actually having a fair algorithm. And another key benefit is that this can be a signal from your organization to your customers and shareholders that any algorithmic services you provide are

26:10

correct and fair. Right, so imagine that you're a customer and you can transact with an organization that you know has fair algorithms and that is certified as such, or you can spend your money with another organization where it's unclear whether or not their algorithms are fair. You might choose as a customer to spend your money with

26:28

an organization that has fair algorithms. Now, if you're a shareholder, you might also be more confident in an organization that you know is, uh, spending time and energy on algorithmic fairness, because that might be a signal to you that the organization won't end up on the front page of the New York Times for having an unfair, racist

26:50

algorithm in the future. And I want to just highlight that while this seems like a hard problem, these types of mechanisms actually work and we can implement them to make change in the way that algorithmic predictions happen. So let's look at the example of the System Risk Indicator. The Dutch government developed the System Risk Indicator to

27:16

detect benefit fraud. Right, but while the government developed it, it was only applied by a certain number of cities, and the cities that applied this algorithm, um, only applied it to some of the applications for benefits that they received, and specifically it was applied in low-income and immigrant neighborhoods. So these populations of people were specifically targeted by the

27:39

algorithm to identify possible benefit risk. This is unfair, and the Dutch court actually, uh, did an investigation and found just as much. Um, they shut down this algorithmic system because of the possibility of discrimination based on socioeconomic status, ethnicity, and religion. Essentially, what they found was that the algorithm did not meet the stated fairness objectives of the Dutch government because it was discriminating against people

28:12

based on immutable characteristics. And because of that, they stopped using this algorithm, uh, in benefit processing for Dutch citizens and residents. So we know it works. What will you do to create fair algorithms? I want to leave you with a couple of my suggestions, um, and this is something that we can tackle as organizations and also as individuals. In an organization, you might question whether or not you're using existing classes of fair algorithms, such as those released

28:47

by IBM in the AI Fairness 360 toolkit. You might also consider whether or not you have mechanisms in place to ensure algorithmic fairness, such as the fairness statement and the algorithmic practice audit. As an individual, you might do an inventory of all the algorithmic decisions that occur in your life. You know, with customers that you work with, with companies that you buy from, with your employer, with

29:13

your apartment building. These are everywhere. And then once you've done that inventory, you might request and review algorithmic audits from the organizations that you know are making some of the most impactful decisions about you using algorithms. Black Tech Green Money is a production of Blavity AfroTech from the Black Effect Podcast Network and iHeartMedia. It's produced by Morgan DeBaun and me, Will Lucas, with

29:49

additional production support by Love Beach Merissa Lewis. Special thank you to mikead Davis, your main Hall of It Necessarianto. Learn more about my guests and other tech disruptors and innovators at afrotech.com, and enjoy your Black Tech Green Money. Leave us a five-star rating on iTunes. Go get your money. Peace and love.

Transcript source: Provided by creator in RSS feed