Welcome to How to Citizen with Baratunde, a podcast that reimagines citizen as a verb, not a legal status. This season is all about tech and how it can bring us together instead of tearing us apart. We're bringing you the people using technology for so much more than revenue and user growth. They're using it to help us citizen. I have been working over the past year to try to integrate my own thinking around technology, and last year
I wrote a bit of a manifesto, back when I was invited to speak at Google I/O, an annual developer conference held by Google. They wanted me to share my thoughts on what the future of technology could look like. I went on a journey to try to understand how all my data existed amongst the major platforms, amongst app developers, and what came out of that was a set of principles to help guide us more conscientiously into the future. Now, the first principle of my manifesto is all about transparency.
Like I wanted to understand what was going on inside the apps, behind the websites I was spending all my time on. When I want to know what's in my food, I don't drag a chemistry set to the grocery store and inspect every item point by point. I read the nutrition label. I know the content, the calories, the ratings. I shouldn't have to guess about what's inside the product.
I certainly shouldn't have to read thirty-three thousand words of legalese terms of service to figure out what's really happening inside. It's pretty simple. We make better decisions about the things we consume when we know what's in them. So if I'm checking out an app on the app store, right, and I see upfront that it's going to harvest my data and slang it on some digital street corner, can I interest you in some data? I can ask myself, Hey, self, are you okay with this app harvesting your data and
slanging it on a digital street corner? And then, having asked myself that question, I can decide whether or not to download it. I don't have to hope that it won't screw me over. I can know. But check it out: this nutrition label idea hasn't just existed in the vacuum of my own brain. It's a real thing. There are actual people making nutrition labels in the world of tech.
In the same way that I walk into a bakery and I see a cake that's been baked, and I might think to myself, I wonder what's in that cake. We would want the same thing for a data set, where even if you encounter that data set in the wild, you, as a data practitioner, will think to yourself, I wonder if this is representative. Kasia Chmielinski is one of those people. These labels are a little different from what
I proposed at Google I/O. Their data nutrition labels aren't for consumers like me and you at the end of the assembly line. Instead, they're for the people at the very beginning: the data scientists. Now, Kasia's data nutrition labels are an easy-to-use tool to help data scientists pick the data that's right for the thing they're making. We interact with algorithms every day, even when we're not aware of it. They affect the decisions we make about hiring,
about policing, pretty much everything. And in the same way that we the people ensure our well-being through government standards and regulations on business activities, data scientists need standards too. Kasia is fighting for standards that will make sure that artificial intelligence works for our collective benefit, or at least doesn't undermine it. Hi. Hello, how are you feeling right now? Kasia? I'm feeling pretty good, the
beginning of another week. Kasia is the co-founder and lead of the Data Nutrition Project, the team behind those labels. They've also worked as a digital services technologist in the White House, on COVID analytics at McKinsey, and in communications at Google. Yeah, so I've kind of jumped around. Yeah, so why don't you introduce yourself and just tell me what you do. My name is Kasia Chmielinski, and I
am a technologist working on the ethics of data. And I'd say, you know, importantly to me, although I have always been a nerd and I studied physics a long time ago, I come from a family of artists. Actually, the painting behind me is by my brother. There's another one in the room by my mom. And so I come from a really kind of multidisciplinary group of people who are driven by our passions. And that's kind of what I've tried to do too, and it's just led me
on many different paths. Where does the interest in technology come from, for you? You know, I don't think that it's really an interest in technology. It's just that we're in a technological time. And so when I graduated from university with this physics degree, I had a few options, and none of them really seemed great. You know, I could go into defense work, I could become a spy, or I could make weapons, and that really wasn't so
interesting to me. Was being a spy really an option? Yes, so you know, I could do that, but I didn't. And none of these were really interesting, because I wanted to make an impact and I wanted to drive change, and I think that was around, you know, the early two thousands, and technology was the place to be. That's where you could really have the most impact and solve really big problems. And so that's where I ended up. So I actually don't think that it's really about the
technology at all. I think that the technology is just a tool that you can use to kind of make an impact in the world. I love the way you describe the interest in technology as really just an interest in the world. So do you remember some of the first steps that led you to what you are doing now? So when I graduated, I actually applied to many things and didn't get them. And what I realized was that I really didn't know how to tell a story at all. And coming out of a
fairly technical path, I couldn't really make eye contact. I hadn't talked to a variety of people. I mean, I was definitely one of the only people who had my identity in that discipline at that time. I went to a school where the head of the school at the time was saying that women might not be able to do science because biologically they were inferior in some way. Oh, that's nice, very welcoming environment. Oh yeah, super welcoming. And I was studying physics, and at the time
I, you know, was female-identified. I now identify as non-binary. But it wasn't like a great place to be doing science, and I just felt like, coming out of that, I didn't know how to talk to people. I didn't know what it was like to be part of a great community. And so I actually went into communications at Google, which was a strange turn into that industry. I went from this super nerdy, very male dominated place to, like, kind of the
party wing of technology at the time. Right? So people who are doing a lot of marketing and communications and talking to journalists and telling stories and trying to figure out, like, what's interesting, how does this fit into the greater narratives of our time. So while at Google, I got to see inside of so many different projects, which I think was a great benefit of being part
of that strategy team. So I got to work on core Search, I got to work on image search, I got to work on Gmail and Calendar, and I started to see the importance of, first of all, knowing why you're building something before you start to build it, right? And there were so many times that I saw a really, really cool product at the end of the day, an algorithm or something technical that was just really cool, but there was no reason that it needed to exist.
Right, from a person perspective, from a society perspective. I am relieved to hear you say that. That's been one of my critiques of this industry for quite some time. It's like, whose problems are you trying to solve? And so you were at the epicenter of one of the major companies, seeing some of this firsthand. Yeah, that's exactly right. And it was endemic. I mean, it just happens all the time, and it's not the fault of anyone in particular.
You just put a bunch of really smart engineers on a technical problem and they just find amazing ways to solve that. But then at the end of the day you say, well, how are we actually going to use this? And that would fall to the comms team, right, or the marketing team, to say, okay, now what are we
going to do with this? So that was one thing, and that's why I actually ended up moving into product management, where I could think about why we want to build something to begin with, and make sure we're building the right thing. So I got closer to the technology after that job. The second thing that I became aware of is the importance of considering the whole pipeline of the thing that you build, because the thing that you build, its DNA is in
the initial data that you put into it. And I'm talking specifically about algorithmic systems here. So one example I have from my days at Google: I actually worked out of the London office, and there was a new search capability, and it was trained entirely on one particular accent, and then when other people tried to use it, if they didn't have that very specific accent, it wasn't working so well. And I really didn't know
much about AI at the time. I hadn't studied it, but I realized, you know, bias in, bias out, like garbage in, garbage out. You feed this machine something, the machine is going to look exactly like what you fed it. Right, you are what you eat. We'll be right back. We use these terms data, algorithm, and artificial intelligence. And so before we keep going, I'd love for you to pause and kind of explain what these things are and their relationship to each other. Data, algorithms,
artificial intelligence. How does Kasia define these? Yeah, thank you for taking a moment. I think that something that's so important in technology is that people feel like they aren't allowed to have an opinion or have thoughts about it because they quote unquote don't understand it.
But you're right, it's just a definitional issue, often. So, data is anything that is programmatically accessible that is probably in enough volume to be used for something by a system. So it could be records of something, it could be weather information. It could be the notes taken by a doctor that then get turned into something that's programmatically accessible. There's a lot of stuff, and you can feed that
to a machine. I'm really interested in algorithms because they're kind of the practical way of understanding something like AI. It's a mathematical formula, and it takes some stuff and then it outputs something. So that could be something like: you input where you live and your name, and then the algorithm will churn and spit out something like, you know, what race or ethnicity it thinks you are.
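To make that input-to-output framing concrete, here is a minimal, purely illustrative sketch. The tiny lookup table, the features, and the labels are all hypothetical; this is not any real system, just the shape of one: an algorithm that has been fed a few labeled examples and now guesses a category for new inputs based only on the patterns in that data.

```python
# Toy illustration of "data in, prediction out" (all data here is made up).
from collections import Counter

# The "food" this algorithm was fed: tiny, hypothetical training records.
# Each record pairs an input (surname prefix, zip code) with a label someone assigned.
training_data = [
    (("SM", "10001"), "group_a"),
    (("SM", "10002"), "group_a"),
    (("LI", "94102"), "group_b"),
    (("LI", "94103"), "group_b"),
]

def predict(surname: str, zip_code: str) -> str:
    """Guess a label for a new input by matching it against patterns in the training data."""
    prefix = surname[:2].upper()
    votes = Counter(
        label
        for (features, label) in training_data
        if features[0] == prefix or features[1] == zip_code
    )
    # The guess can only reflect whatever the training data contained.
    return votes.most_common(1)[0][0] if votes else "unknown"

print(predict("Smith", "10001"))  # -> "group_a", because that's all it has ever seen
```

The point is not the arithmetic; it's that the output is entirely a function of the examples the algorithm was fed, which is the garbage in, garbage out dynamic from earlier.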
And that algorithm, in order to make whatever guesses it's making, needs to be fed a bunch of data so that it can start to recognize patterns. When you deploy that algorithm out in the world, you feed it some data and it will spit out what it believes is the pattern that it recognizes, based on what it knows. You know, there's different flavors of AI. I think a lot of people are very afraid of kind of the Terminator type AI. 'I'll be back.' As
we should be, because the Terminator is very scary. I've seen the documentary many times and I don't want to live in that world. Yeah, legitimately very scary. And so there's this question of, okay, is the AI going to come to eat our lunch? Right? Are they smarter than us and all the things that we can do? And that's, like, you know, generalized AI,
or even kind of super AI. We're not quite there yet. Currently, we're in the phase where we have discrete AI that makes discrete decisions, and we leverage those to help us in our daily lives, or to hurt us sometimes. Data as food for algorithms: I think it's a really useful metaphor. And a lot of us out in the wild who aren't specialized in this, I think we're not encouraged to
understand that relationship. I agree. And I think the relationship between what you feed the algorithm and what it gives you is so direct, and people don't necessarily know that or see that. And what you see is the harm or the output that comes out of the system, and what you don't see is all the work that
went into building that system. You have someone who decided in the beginning they wanted to use AI, and then you have somebody who went and found the data, and you have somebody else who cleaned the data, and you've got somebody or somebodies who then built the algorithm and trained the algorithm, and then you have the somebodies who coded that up, and then you have the somebodies who deployed that, and then you have people who are running that.
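As a rough sketch of that chain of somebodies, here is a compressed, hypothetical workflow in the style of a typical Python data-science stack. The file name, the columns, and the loan scenario are all invented for illustration; the point is only how a skew baked into step one travels untouched through every later step.

```python
# Hypothetical end-to-end pipeline: the decision at the end inherits the data at the start.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# 1. Somebody decided to use AI, and somebody went and found historical data.
raw = pd.read_csv("loan_history.csv")  # past applicants and whether they were approved

# 2. Somebody cleaned it. Cleaning removes noise, but any skew in who got approved survives.
clean = raw.dropna(subset=["income", "debt_ratio", "approved"])

# 3. Somebody built and trained the algorithm on that history.
model = LogisticRegression()
model.fit(clean[["income", "debt_ratio"]], clean["approved"])

# 4. Somebody deployed it, and now it hands out yes/no decisions in production.
def decide(applicant: pd.DataFrame) -> bool:
    return bool(model.predict(applicant[["income", "debt_ratio"]])[0])

# If the decisions at step 4 look biased, "just change the algorithm" won't fix it:
# the model can only reproduce the patterns present in loan_history.csv at step 1.
```

Under those assumptions, repairing an unfair outcome really does mean walking back up the chain to whoever chose and collected the data, which is exactly the point Kasia makes next.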
And so when the algorithm comes out the end and there's a decision that's made: you get the loan, you didn't get the loan; the algorithm recognizes your speech or doesn't recognize your speech, sees you or doesn't see you. People think, oh, just change the algorithm. Oh no, you have to go all the way back to the beginning, because you have that long chain of people who are doing so many different things, and it becomes very complicated to try to
fix that. So the more that we can understand that the process begins with the question, do I need AI for this? And then very quickly after, where are we going to get the data to feed that, so that we make the right decisions? The sooner we understand that as a society, I think the easier it's going to be for us to build better AI, because we're not just catching the issues at the very end of what
can be a years-long process. Mm hm. So what problems does the Data Nutrition Project aim to tackle? We've kind of talked about them all in pieces. At its core, the Data Nutrition Project, which is this research organization that I co-founded with a bunch of very smart people. We were all part of a fellowship that was looking
at the ethics and governance of AI. And so when we sat down to say, what are the real things that we can do to drive change as practitioners, as people in the space, as people who had built AI before, we decided, let's just go really small. And obviously it's actually a huge problem and it's very challenging. But instead of saying, let's look at the harms that come out of an AI system, let's just think about what goes in. And I think we were maybe eating a
lot of snacks. We were holed up at the MIT Media Lab, right? So we were just all in this room for many, many hours, many, many days, and I think somebody at some point picked up, you know, a snack package, and we were like, what if you just had a nutritional label like the one you have on food, and you just put that on a data set? What would that do? I mean, is it possible? Right? But if it is possible, would that actually change things? And we started talking it over and we thought, you know,
we think it would. In our experience in data science as practitioners, we know that data doesn't come with standardized documentation, and often you get a data set and you don't know how you're supposed to use it or not use it. There may or may not be tools that you use to look at things that will tell you whether that data set is healthy for the thing that you want
to do with it. The standard process would be: a product manager or CEO would come over to the desk of a data scientist and say, look, we have all this information about this new product we want to sell. We need to map the marketing information to the demographics of people who are likely to want to buy our product or click on our product. Go make it happen. And the data scientist goes, okay, and the person goes, oh yeah, by Tuesday, and the person's like, oh okay, let me
go find the right data for that. There's a whole world. You just google a bunch of stuff and then you get the data, and then you kind of poke around and you think, this seems pretty good, and then you use it and you build your algorithm on that. Your algorithm is going to determine which demographics or what geographies or whatever it is you're trying to do. You train it on that data you found, and then you deploy
that algorithm and it starts to work in production. And you know, no fault of anybody, really, but the industry has grown up so much faster than the structures and the scaffolding to keep that industry doing the right thing. So there might be documentation on some of the data; there might not be. In one case, we're working with a data partner that was very concerned about how people were going to use their data, and the data set documentation was
an eighty-page PDF. That data scientist who's on deadline for Tuesday is not going to read eighty pages. So our thought was, hey, can we distill the most important components of a data set and its usage to something that is maybe one sheet, two sheets, right, using the analogy of the nutrition label, put it on a data set, and then make that the standard, so that anybody who is picking up a data set to decide
whether or not to use it will very quickly be able to assess: is this healthy for the thing I want to do? It's a novel application of a thing that so many of us understand. What are some of the harms you've seen, or some of the harms you're trying to avoid, from the data scientists who are building these
services not having access to healthy data? Yeah. Let's say you have a data set about health outcomes and you're looking at people who have had heart attacks or something like that, and you realize that the data was only taken from men in their sixties. If you are now going to use this as a data set to train an algorithm to provide early warning signs for who might have a heart attack, you're gonna miss entire demographics of people, which may or may not matter. That's a question. Does
that matter? I don't know. But perhaps it matters what the average size of a body is, or the average age of a body is, or maybe there's something that is gender or sex related, and you will miss all of that if you just take the data at face value and you don't think about who's not represented here. I remember examples that I used to cite in some talks. One was the Amazon hiring decisions. Amazon software engineers recently uncovered a big problem: their new online recruiting tool did
not like women. It had an automated screening system for resumes, and that system ignored all the women, because the data set showed that successful job candidates at Amazon were men. And so the computer, garbage in, garbage out, the way we've discussed, said, well, you've defined success as male. You've fed me a bunch of resumes where female doesn't equal success. Therefore my formula dictates they get rejected. And that affects people's job prospects. You know, that affects people's sense of
their self-worth and self-esteem. That could open up the company to liability. All kinds of harms in a system that was supposed to breed efficiency and help. Yeah, that's a great example, and it's, you know, a very true one. And I think that one was pretty high profile. Imagine all the situations that either have never been caught or were kind of too low profile to make it into the news. It happens all the time, because the algorithm is kind of a reflection of whatever you've
fed it. So in that case, you had historical bias, and so the historical bias in the resumes that they were using to feed the algorithm showed that men were hired more frequently, and that was success. It also comes down to, in terms of the metrics, how you're defining things. If your definition of success is that someone was hired, you're not necessarily saying that your definition is that that person
ended up being a good worker. Or even if you're looking at the person's performance reviews and saying success would be that we hire somebody who performs well: historically, you hired more men than women. So even then, if your success metric is someone who performed well, you're already taking into account the historical bias that there are more men than women who are hired. So there are all different kinds of biases that are being captured in
the data. Something that the Data Nutrition Project is trying to do with the label that we've built is highlight these kinds of historical issues as well as the technical issues in the data, and that I think is an important balance to strike. It's not just about what you can see in the data. It's also about what you
cannot see in the data. So in the case that you just called out there with the resumes, you would be able to see that it's not representative with respect to gender, and maybe you'd be able to see things like, these are all English-language resumes. But what you would not be able to see are things like socioeconomic differences, or people who never applied, or, you know, what the job market looked like whenever these resumes were collected. So you'll kind of not be able to see any of
that if you just take a purely technical approach to what's in the data. So the data set nutrition label tries to highlight those things as well, to data practitioners, to say, before you use this data set, here are some things you should consider. And sometimes we'll even go as far as to say, you probably shouldn't use this data set for this particular thing, because we just know that it's not good for that. And that's always
an option, to say don't use it. Right. It doesn't mean people won't do it, but at least we can give you a warning, and we kind of hope that people have the best of intentions and are trying to do the right thing. So it's about explaining what is in the data set, or in the data, so that you can decide as a practitioner whether or not it is healthy for your usage. After the break, it's snack time. I'm back, hungry. So I'm holding a package of food right now, and I'm looking at the nutrition
label, the nutrition facts. It's got servings per container, the size of a serving, and then numbers and percentages in terms of the daily percent of total fat, cholesterol, sodium, carbohydrates, protein, and a set of vitamins that I can expect in a single serving of this product. And then I can make an informed choice about whether and how much of that foodstuff I want to put in my body, how much garbage I want to let in. In this case,
it's pretty healthy stuff. It's dried mangoes, if you're curious. What's on your data nutrition label? Yeah, a great question. And now I'm, like, kind of hungry. I'm like, oh, it's snack time. I feel like it's snack time. This is the hardest part to me about this project: what the right level of metadata is. So what are the right elements that you want to call out
for our nutritional label? You know, what are the fats and the sodiums and these kinds of things? Because, you know, the complication here is that there are so many different kinds of data sets. I can have a data set about trees in Central Park, and I can have a data set about people in prison. So we've kind of identified that the harms that we're most worried about
have to do with people. Not to say that we are, you know, not worried about things like the environment or other things, but when it touches people or communities is when we see the greatest harms from an algorithmic standpoint in society. And so we kind of have a badge system that should be very quick, kind of icon-based, that says: this data set is about people, or not. This data set includes subpopulation data, so, you know,
includes information about race or gender or whatever status. Right, this data set can be used for commercial purposes, or not. We've identified, let's say, ten to fifteen things that we think are kind of high level, almost like little food warning symbols that you would see on something, like organic, or it's got a Surgeon General's warning. Right, exactly. So at a very high level we have these kind of icons.
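For a rough feel of how those at-a-glance badges might look as machine-readable metadata, here is a small hedged sketch. The field names are purely illustrative and are not the Data Nutrition Project's actual schema.

```python
# Illustrative only: badge names are hypothetical, not the project's real schema.
from dataclasses import dataclass

@dataclass
class DatasetBadges:
    """At-a-glance flags for a data set, in the spirit of food warning icons."""
    about_people: bool            # does this data describe people or communities?
    has_subpopulation_data: bool  # is race, gender, or other status recorded?
    commercial_use_allowed: bool  # does the license permit commercial purposes?

# A hypothetical data set's badges, readable before anyone digs into the rows themselves.
badges = DatasetBadges(
    about_people=True,
    has_subpopulation_data=True,
    commercial_use_allowed=False,
)
print(badges)
```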
And then underneath that there are additional very important questions that we've highlighted that people will answer, like who owns the data set? And then finally there's a section that says, here's the reason the data set was made, here's probably an intended use, here are some other use cases that are possible or ways that other people have used it, and then here are some things that you just shouldn't do. So how do we make this approach
more mainstream? Mainstream is a tough word, because we're talking about people who build AI, and I think that is becoming more mainstream for sure. But we're really focused on data practitioners, so people who are taking data and then building things on that data. There's kind of a bottoms-up approach. It's very anti-establishment in some ways, very hacker culture. And so we've been working with a lot of data practitioners to say, what works, what doesn't,
is it useful or is it not? Make it open source, right, open licenses, use it if you want, and just hope that if we make a good thing, people will use it. A rising tide lifts all boats, we think. So, you know, we're not cagey about it, because we just want better data, we want better data out there, and if people have the expectation that they're going to see something like this,
that's awesome. There's also the top-down approach, which is regulation and policy. And I could imagine a world in which, in the future, if you deploy an algorithm, especially in the public sector, you would have to include some kind of labeling on that, right, to talk about the data that it was trained on and provide a label for that.
So it's kind of a two-way approach, you know. Yeah, no, I mean, when I think of analogues, like, most of us don't know civil engineers personally, but we interact with their work on a regular basis through a system of trust, through standards, through approvals, through certifications. And data scientists are on par with, like, a civil engineer in my mind, in that they erect structures that we inhabit on a regular basis. But I have no idea what rules they're operating by.
I don't know what's in this algorithm, you know, I don't know what ingredients you used to put this together that's determining whether I get a job or a vaccination. What's your biggest dream for the Data Nutrition Project? Where does it go? So I could easily say, you know, our dream would be that every data set comes with a label. Cool. But more than that, I think we're
trying to drive awareness and change. So even if there isn't a label, you're thinking about, I wonder what's in this and I wish it had a label on it. In the same way that I walk into a bakery and I see a cake that's been baked, and I might think to myself, I wonder what's in that cake, and I wonder, you know, if it has this much of something, or maybe I should consider this when I decide whether to have four or five pieces of cake.
We would want the same thing for a data set, where even if you encounter that data set in the wild (someone's created it, you just downloaded it from some repository on GitHub, there's no documentation), you, as a data practitioner, will think to yourself, I wonder if this
is representative. I wonder if the thing I'm trying to do with this data is responsible, considering the data: where it came from, who touched it, who funded it, where it lives, how often it's updated, whether they got consent from people when they took their data. And so we're trying to drive a culture change. I love that, and I love the idea that when I go to a bakery, one of the questions I'm not asking myself is, is that muffin safe to eat? Right? Is that cake gonna
kill me? It literally doesn't enter my mind, because there's such a level of earned trust in the system overall: that, you know, these people are getting inspected, that there's some kind of oversight, that they were trained in a reasonable way, so I know there's not arsenic in the muffins.
So this brings me to zooming out a little bit further, to artificial intelligence and the idea of standards, because I'm getting this picture from you that there's kind of a wild West in terms of what we're feeding into the systems that ultimately become some form of AI. What does the world look like when we have more standards in the tools and components that create AI? I think that our understanding of what AI is and what kinds
of AI there are is going to mature. I imagine that there is a system of classification where some AI is very high risk and some AI is less high risk, and we start to have a stratified view of what needs to occur in each level in order to reach an understanding that there's no arsenic in the muffins. So at the highest level, when it's super super risky, maybe we just don't use AI. This seems to be something that people forget, is that we can decide whether or
not to use it. Like, would you want an AI performing surgery on you with no human around? If it's really really good? Do you want that? Do you want to assume that risk? I mean that is dealing with your literal organs, your heart. So I think that you know, ideally what happens is you've got a good combination of regulation and oversight, which I do believe in, but then also training and you know, good human intention to do
the right thing. So when I think about these algorithms, I think of them as kind of automated decision makers, and I think they can pose a challenge to our ideas of free will and self determination because we are increasingly living in this world where we think we're making choices, but we're actually operating within a narrow set of recommendations. What do you think about human agency in the age
of algorithms? Whoa, these are the big questions. Well, I mean, I think that we have to be careful not to give the machines more agency than they have. And there are people who are making those machines. So when we talk about, you know, the free will of people versus machines, it's like the free will of people versus the people who made the machines. To me, technology is just a tool, and I personally don't want to live in a world that has no algorithms and no
technology because these are useful tools. But I want to decide when I'm using them and what I use them for. And so my perspective is really from the point of view of a person who has been making the tools, and I think that we need to make sure that those folks have the free will to say, no, I don't want to make those tools, or this should not be used in this way, or we need to modify this tool in this way so those tools don't run
away from us. So, I guess I kind of disagree with the premise that it's people versus machines, because people are making the machines and we're not at the Terminator stage yet. Currently it's people and people, right? So let's, like, work together to make the right things for people. Yes, Kasia, thank you so much for spending this time with me. I've learned a lot, and now I'm just thinking about arsenic in my muffins. Thanks so much for having me. I've really
enjoyed it. Garbage in, garbage out. It's a cycle that we see that doesn't just apply to the world of artificial intelligence, but everywhere. If I feed my body junk, it turns to junk. If I fill my planet with filth, it turns to filth. If I inject my Twitter feed with hatred, that breeds more hatred. It's pretty straightforward, but it doesn't have to be this way. In essence, Kasia is trying to standardize thoughtfulness, and that fills me with so much hope.
We're all responsible for something or someone, so let's always do our best to really consider what they need to thrive. If we put a little more goodness into our AI, our bodies, our planet, our relationships, and everything else, we'll see goodness come out. And that's a cycle I can get behind: goodness in, goodness out. This is just one part of the How to Citizen conversation about data. Who
does data ultimately benefit? If the data is not benefiting the people, the individuals, the communities that provided that data, then who are we uplifting at the cost of others' justice? Next week, we dive deeper into how data is collected in the first place, and we meet an Indigenous geneticist reclaiming data for her people. See you then. We asked Kasia what we should have you do, and they came up with a lot. So here's a whole bunch of beautiful
options for citizening. Think about this. Like people, machines are shaped by the context in which they're created. So if we think of machines and algorithmic systems as children who are learning from us, we're the parents. What kind of parents do we want to be? How do we want to raise our machines to be considerate, fair, and to build a better world than the one we're in today?
Watch Coded Bias. It's a documentary that explores the fallout around MIT Media Lab researcher Joy Buolamwini's discovery that facial recognition doesn't see dark-skinned faces as well, and the film captures her journey to push for the first-ever legislation in the US that will govern against bias in the algorithms that impact us all. Check out this online buying resource called the Privacy Not Included
buying guide. Mozilla built this shopping guide, which tells you the data practices of the app or product that you're considering, and it's basically the product reviews we need in this hyper-connected era of data theft and hoarding and non-consensual monetization. Donate. If you've got money, you can distribute some power through dollars to these groups that are ensuring that the future of AI is human and just: the Algorithmic Justice League, the ACLU, and the
Electronic Frontier Foundation. If you take any of these actions, please brag about yourself online. Use the hashtag how to citizen. Tag us up on Instagram at how to citizen. We will accept general direct feedback to our inbox, comments at how to citizen dot com, and make sure you go ahead and visit how to citizen dot com, because that's the brand new kid in town. We have a spanking new website. It's very interactive. We have an email list you can join. If you like this show, tell somebody
about it. Thanks. How to Citizen with Baratunde is a production of iHeartRadio Podcasts and Dustlight Productions. Our executive producers are me, Baratunde Thurston, Elizabeth Stewart, and Misha Euceph. Our senior producer is Tamika Adams, our producer is Ali Kilts, and our assistant producer is Sam Paulson. Stephanie Cohen is our editor, Valentino Rivera is our senior engineer,
and Matthew Lai is our apprentice. Original music by Andrew Eapen, with additional original music for season three from Andrew Clausen. This episode was produced and sound designed by Sam Paulson. Special thanks to Joel Smith from iHeartRadio and Rachel Garcia at Dustlight Productions.