
Considering The Ethical Responsibilities Of ML And AI Engineers

Jan 28, 2024 | 39 min | Ep. 27

Episode description

Summary
Machine learning and AI applications hold the promise of drastically impacting every aspect of modern life. With that potential for profound change comes a responsibility for the creators of the technology to account for the ramifications of their work. In this episode Nicholas Cifuentes-Goodbody guides us through the minefields of social, technical, and ethical considerations that are necessary to ensure that this next generation of technical and economic systems are equitable and beneficial for the people that they impact.
Announcements
  • Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
  • Your host is Tobias Macey and today I'm interviewing Nicholas Cifuentes-Goodbody about the different elements of the machine learning workflow where ethics need to be considered
Interview
  • Introduction
  • How did you get involved in machine learning?
  • To start with, who is responsible for addressing the ethical concerns around AI?
  • What are the different ways that AI can have positive or negative outcomes from an ethical perspective? 
    • What is the role of practitioners/individual contributors in the identification and evaluation of ethical impacts of their work?
  • What are some utilities that are helpful in identifying and addressing bias in training data?
  • How can practitioners address challenges of equity and accessibility in the delivery of AI products?
  • What are some of the options for reducing the energy consumption for training and serving AI?
  • What are the most interesting, innovative, or unexpected ways that you have seen ML teams incorporate ethics into their work?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on ethical implications of ML?
  • What are some of the resources that you recommend for people who want to invest in their knowledge and application of ethics in the realm of ML?
Contact Info
Parting Question
  • From your perspective, what is the biggest barrier to adoption of machine learning today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Transcript


Hello, and welcome to The Machine Learning Podcast, the podcast about going from idea to delivery with machine learning. Your host is Tobias Macey, and today I'm interviewing Nicholas Cifuentes-Goodbody about the different elements of the machine learning workflow where ethics need to be considered. So, Nicholas, can you start by introducing yourself?

Yeah. Well, first of all, Tobias, just thanks for having me on. It's a real pleasure. I've been listening to a lot of old episodes, and so to be on one is very exciting for me. In terms of who I am and what I do, my name is Nicholas Cifuentes-Goodbody. I'm the chief data scientist at WorldQuant University. WorldQuant University is a free and open university. We offer a master's in financial engineering and a free and open,

basically, boot camp or certificate course in data science. That's what I built and run. And so that's a little bit about who I am and what I do. And do you remember how you first got started working in machine learning? I gotta say it's not been a straight path. Let's say the models are linear, but my path to data science has not been linear. So I started off my career as a professor of Mexican literature.

So even though I took a little bit of statistics in college and a little bit of computer science, shout out to the liberal arts that make you mix things up, I mostly focused on the humanities. And so for several years I worked in Spanish, and what happened is I ended up getting a professor job in a department of translation studies in the Middle East, in Qatar. And in translation studies, there's a very humanities side to that field, translating poems and translating novels.

And then on the other side of that field, there's a very technical part to it, which is machine translation and everything, large language models, you know, natural language processing. It all kind of bleeds into that. And so I saw what the people on the other side of the fence were doing, and I kinda got more and more intrigued and started teaching myself Python.

And from there, I ended up in a boot camp. And one note I wanna make about the boot camp is sometimes when you are just starting off in this field, you often think that you're gonna have these weaknesses, but I just wanna tell you that these can sometimes be superpowers. For example, I had spent so much time teaching and working with languages. I thought, oh, no. I'm here with all these theoretical physicists.

They know all this math, and I'm really rusty on all of this. But it turned out that when it came to fields like natural language processing, I was ahead of them because they hadn't been working with language. And, also, all that teaching led to me being offered a job at the boot camp where I was working. And so since then, I've basically been combining data science and teaching.

So often what I tell students is you think you have lots of weaknesses, but they can always end up being superpowers. And that's how I ended up in data science. Absolutely. And particularly in analytical roles, or roles where you are trying to provide insights to end users, having that domain knowledge and the expertise in the nontechnical elements of the problem is very valuable and something that is often overlooked in the pure computer science path.

That's a really good point. One thing you learn when you're a Spanish teacher is you can quickly recognize when people don't understand anything you're saying, and that can be very helpful when you're in a business meeting and you need to stop, pause, slow down, and explain, whether it's a visualization or an analysis, what you're showing to your nontechnical colleagues.

And bringing us now to the question at hand of ethics and the ways that machine learning and AI can potentially be in conflict with it, or the ways that you need to be thinking about the ethical implications of your work: before we get too deep into ethical theories, the trolley problem, etcetera, who's responsible for identifying and addressing these concerns in the process of the ideation of what are we going to use AI for, how are we going to build it, how are we going to deliver it?

Yeah. One thing I was sort of thinking about as I was preparing for what we were gonna talk about on this podcast is ethics in AI. You know, often we think about ethics as how we treat each other person to person, but ethics in AI is really how we treat each other algorithmically. And as we increasingly do that,

being able to think through these ethical problems is extremely important. And so I would kind of flip the question and say, who isn't responsible for working on ethics in AI? I would say, on the one hand, global organizations need to be issuing

guidelines based on expert opinion. For example, UNESCO has its Recommendation on the Ethics of Artificial Intelligence, where they talk about basing AI in values like protection and promotion of human rights and principles like sustainability and explainability. I think governments need to regulate AI so that AI products reflect and protect the values of those societies.

An example of that would be the European Union's AI Act, which sort of runs the gamut between acceptable risk in an AI model and unacceptable risk. In the private sector, I think we need to incorporate ethics, and you see this increasingly with companies having positions like a chief ethics officer or an AI officer or responsible AI officer. And when it comes to actually building the models, or the inception of the models, stakeholders

need to think about fairness from the beginning. They'll often need to balance model performance with fairness and its environmental impact, so they need to be aware of those issues from the beginning. And then practitioners down at the bottom of the stack, they need to be educating stakeholders. They need to be building ethical models, and they need to be deploying them in ethical ways in an ethical context.

So I would say everyone is responsible for AI, all the way from the UN down to the lonely data scientist training their decision tree late at night. And given the wide breadth of responsibility, again, harking back to the lack of uniformity in educational background and contextual background, that brings in a broad question of how are we going to appropriately educate everybody on ethics, both from the philosophical and the practical perspective?

How do you identify when a line of code that you're writing or a decision that you're making in a model might have some potential real world implication in terms of how you are going to ethically impact the end user or the, I guess, end subject of the model? Because it can be very difficult to differentiate between, oh, I think that the reasonable range of values for a bank account balance is $100 to $5,000,000,

and then you automatically leave out people who are unbanked or who are living paycheck to paycheck where maybe they're only able to keep $50 in their bank account. And I'm just wondering if you can talk through some of that aspect of how do we properly incorporate these ethical considerations into the design and development process given that not everyone has a PhD in ethics?

Well, one thing that you can do, and I've seen this increasingly, and it's something that kind of surprised me as I was doing research for this podcast, is you might have ethicists working as members of an AI team. So not just a chief AI analytics officer, but rather somebody working with the team and helping them work through those particular issues. When it comes to education, I think educators have a really important role to play. For me, I'm a big advocate of project based learning.

So taking a student through an end to end data science project and always adding an ethical component to kind of get through those main points of thinking about whether a model does harm or doesn't do harm, whether the data it's trained on is biased, or whether it's making biased predictions. And so I would say using project based learning that takes people through ethical problems before they encounter them is a really good way to do that. And then, you know, also having people

in situ who can help with those ethical problems. The other thing, now that you mention it, is the idea of domain knowledge. You talked about the banking problem. So this would be an example of where having some domain knowledge, or working with somebody who has domain knowledge, would be extremely helpful. So being curious and asking questions, and asking them through an ethical lens, would be a big step in the right direction. And then the other

challenge with ethics is that a lot of times, there is no correct answer. There are contextually acceptable answers, but there's, in a lot of cases, not going to be any definitive right answer for a given question. And I'm wondering, what are some of the ways that you see people trying to address that fact in the ways that they build the model and in the ways that they present the outputs of that model?

I mean, the truth is just thinking about ethics is already a step in the right direction. And you're right, there are often gray areas. But if you can say, as part of your process, we looked at things through an ethical lens and considered the following issues, and you can present that context to your stakeholders, I think that is the foundation of wading through these ethical problems even when they are, like you say, murky. And

now digging into the models themselves, the impacts that they can have in the real world, we alluded to a toy example of acceptable ranges for a bank account. But what are some of the different ways that AI can and does have positive or negative impacts given the ways that ethics is or is not included in the development and deployment of those models? Yeah. Well, maybe what we should do is start with the good news. Let's do positive first, and then we'll move on to negative.

So positive, I think that deploying machine learning models can provide consistency and transparency to things like government processes. I met someone at a conference recently who works for the LA government, and he developed an algorithm to decide which small businesses would be audited for tax issues. And the beautiful thing is that it was

a logistic model. So, by the way, everyone, these traditional models can still be really helpful and really useful, especially from an ethical and explainability standpoint. But what it provided was a very clear reason: if somebody was selected to be audited, everyone knew why. And so it provided a consistent set of rules to apply over the entire population, and it did it in an explainable way so that everyone knew what was happening.
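As an illustration of that explainability point, here is a minimal sketch, using hypothetical feature names and synthetic data rather than the actual LA model, of how a logistic model's coefficients give a human-readable reason for each decision:

```python
# A minimal sketch of why logistic models are easy to explain.
# The feature names and data here are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["late_filings", "cash_ratio", "revenue_volatility"]
X = rng.normal(size=(500, 3))
# Synthetic labels: audits driven mostly by late filings.
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 1).astype(int)

model = LogisticRegression().fit(X, y)

# Each coefficient is a log-odds contribution, so every audit decision
# can be traced back to a weighted sum of named, auditable factors.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```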

Speaking of positives, models can also be kind of a force multiplier for positive initiatives. I saw an example from TensorFlow where, in the United Nations, they have the Universal Periodic Review, where nations give each other recommendations to kind of run their governments better. And you can imagine this is a huge volume of data. If every nation in the world is giving you advice, how do you categorize that advice? Well, it turns out that it's a very simple

machine learning categorization problem. So all of a sudden, you can take all of this data that might be hard for government officials to go through, and you can quickly organize it for them. You can use these tools as a very helpful hammer to help smooth out problems. I don't know how you smooth things with a hammer, by the way, but that's a mixed metaphor.
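To make that concrete, here is a minimal sketch of that kind of text categorization pipeline. The category labels and example texts are hypothetical stand-ins, not the actual UN data:

```python
# A minimal sketch of categorizing recommendation text. The labels
# and examples are hypothetical; the real UPR data would differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples (in practice, thousands of them).
texts = [
    "Ratify the convention on the rights of the child",
    "Strengthen judicial independence and court transparency",
    "Expand access to primary education in rural areas",
    "Guarantee freedom of the press and protect journalists",
]
labels = ["human_rights", "rule_of_law", "education", "human_rights"]

# TF-IDF features plus a linear classifier: simple, fast, explainable.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

print(classifier.predict(["Improve teacher training programs"]))
```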

So let's move on to the negative impacts. This is where things get a little bit scary. On the one hand, machine learning can supercharge disinformation. One of the links I passed to you was an article from Haaretz about a consulting group in Israel called Team Jorge, and they basically traffic in creating disinformation campaigns.

This happened in my own state of California, where governor Gavin Newsom delayed renewing the operating license for a nuclear plant, and all of a sudden, there was a huge outpouring across social media, which mysteriously disappeared as soon as all of the safety issues were resolved and the campaign was dropped.

They can be a tool in mass surveillance. This kind of came to the forefront when we saw China's repression of Uighur populations, not just locally in that part of the country, but across the country. And China has really been at the forefront of using facial recognition in policing. That's not to say that we in the West are far behind, but this is an example of what a neural network can do when used for mass surveillance.

And then Cathy O'Neil, the author, has the idea of weapons of math destruction. So these are kind of opaque, unregulated, difficult to understand, difficult to test models that perpetuate systems of oppression or injustice.

And the case par excellence for that would be crime prediction, where police departments will use historical data to decide where to send patrols. But because that historical data is a product of a biased policing system, it will end up causing over-policing in the same communities that have been over-policed in the past. So those are some ways in which AI can have a positive outcome from an ethical perspective, and some negative ethical impacts that it can have.

And I was also listening to an interesting conversation recently about the challenge of amplification of bias due to machine learning, where the effect of machine learning is that it doesn't actually understand anything about what it's doing. It will just parrot out whatever it is that you gave it in the first place.

But the benefit is that unlike with a human, you can query its internal state where even if you don't have a nice explainability graph of what was the reasoning behind this output using something like Shapley values, etcetera, you can at least just ask it the same question different ways multiple times to try to introspect its internal

reasoning, which you can't do with a human. Because if you keep asking a human the same question multiple times, one, they'll just get annoyed and walk away, and also, they're not necessarily going to be able to consistently give you the same answers to the same questions. There you go. So looking for consistency and interrogating your models is a good way to bring an ethical lens to them. That's a really good point. I hadn't thought of that.
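A minimal sketch of that consistency-probing idea might look like the following. The `ask_model` function is a hypothetical stand-in for whatever model or API you are interrogating:

```python
# A minimal sketch of probing a model for consistency by asking the
# same question phrased different ways. ask_model is a hypothetical
# placeholder for a real model call (for example, an LLM API).
from collections import Counter

def ask_model(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    return "answer"

paraphrases = [
    "Who qualifies for this loan?",
    "Which applicants would be approved for this loan?",
    "Under what conditions is this loan granted?",
]

answers = [ask_model(p) for p in paraphrases]
most_common, frequency = Counter(answers).most_common(1)[0]

# If the answers diverge across paraphrases, that's a flag worth
# investigating before trusting any single output.
print(f"Agreement: {frequency}/{len(answers)} on {most_common!r}")
```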

And continuing on that subject of how do you identify and address bias is the question of explainability, or what are some of the other ways that you can understand whether or not the data that you're using to train the model is biased in a certain direction or even if you're unable to upfront identify those biases, being able to understand from the user interaction with your model what are some of the misconceptions or cultural or social biases that they're propagating,

and then how to feed that back into the model itself to be able to retrain, redeploy, and address those concerns? Well, one thing that you can do is you can always look for protected classes in your data to see if there are imbalances in that data. And if there are, you might consider using things like oversampling or undersampling to debias that data before you train your model.

The other thing you can do is look at its predictions, sort of in the precision and recall school of thought. I'm forgetting the names of the metrics right now, but you can see whether the predictions are causing more or less harm to certain groups. So those are two ways to do it, and there are utilities that you can use. For example, Fairlearn is basically a wrapper for scikit-learn models, so you can wrap your base model in a Fairlearn mitigator and debias that model.
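To make that concrete, here is a minimal sketch of the Fairlearn pattern being described: auditing a metric by group, then wrapping a scikit-learn estimator in a mitigator. The sensitive feature and data here are synthetic placeholders:

```python
# A minimal sketch of the Fairlearn pattern: audit metrics by group,
# then wrap a scikit-learn model in a mitigator. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from fairlearn.metrics import MetricFrame
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
group = rng.choice(["a", "b"], size=1000)  # placeholder sensitive feature
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

base = LogisticRegression().fit(X, y)

# Audit: compare a metric (here recall) across groups.
audit = MetricFrame(metrics=recall_score, y_true=y,
                    y_pred=base.predict(X), sensitive_features=group)
print(audit.by_group)

# Mitigate: retrain under a demographic parity constraint.
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=group)
debiased_pred = mitigator.predict(X)
```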

And so those are some things that you might look for to combat bias in your model and some ways in which you might mitigate it when you train or deploy it. And so that last question also assumed that whoever is producing this model is interested in receiving and correcting for that feedback of, hey, this model that you produced is affecting this particular portion of the populace,

and it's having a negative result, and you need to do something to address it because there are certain actors who will willfully produce models in that way. And then there are also questions of machine learning models being used for attack capabilities in cybersecurity or also people who are set on forcing bad behavior in otherwise benign models through things like generative adversarial inputs, etcetera.

And I'm wondering what are some of the ways that we, as industry practitioners, can be thinking about how best to build the tools and also counteract some of these bad actors in the ecosystem? Yeah. Oftentimes, when we think about ethical quandaries, we think that we, as data scientists, will suddenly be asked to train a model that's gonna be used for political repression. I don't think that's actually the case. I think oftentimes

when you're faced with that sort of black and white problem, you know which way you should go. And if you don't go that way, you're making a conscious decision to violate those ethical rules. And so I don't worry so much about data scientists being involved in those sorts of issues because I think that we, as humans, mostly know how to navigate them.

What I would say is that the best way to fight these sorts of issues, going beyond technical protection solutions, would be education. Making educational resources widely available and widely accessible, in terms of how to build AI models and how to incorporate ethical considerations into the building of those models, is really important because oftentimes, it's not that

somebody is faced with a black and white ethical issue. There's a smaller issue of bias that they never really considered because they've never heard of a similar story in which that bias appeared. And so for me, open educational resources, open source, education in general is kind of the best way to shine a light on those dark corners of the Internet, or dark corners of machine learning.

And accessibility is also an interesting question in these ethical concerns of what are the interfaces that you're providing for this model to be able to be interacted with, particularly in this world of large language models where they are primarily being trained on English language corpora, and what are some of the ways that we should be thinking about how to increase the availability and applicability of AI and ML so that a broader population

can be involved in the process and can also take ownership of the problems, given that a predominantly Western-focused culture is generating these AI and ML capabilities?

Tobias, now you're getting me excited with these questions because they're really getting to the heart of it. What I would say is, in translation studies, there's this idea called epistemicide, where if a target culture is just pulling in translations, usually from the West, what they're erasing is not so much knowledge, but ways of knowing, local ways of knowing. And so

this can also happen with large language models. And so I think that we have an obligation not only to be training models here in the West on our own corpora, but making those tools widely available so that people can do the same in their own target languages. For example, the Center for AI in Abu Dhabi just released their Jais large language model, which is the first large language model to work in Arabic. So making these tools available... well, let's think about it this way.

Making sure that people know how to use these tools and use them responsibly, and can use them in their own context to kind of empower their own local communities, is really important. So one of the ways that you can do that would be contributing to open source. Right? So I'm a big fan of Transformers,

but there are lots of other libraries that are available for everyone to use. So you can contribute to those projects where everyone has access to them. There are also initiatives like the Allen Institute for AI, where they're building open alternatives to large language models like ChatGPT.

Another thing that you can do besides contributing to open source is you can also teach. The Stanford Institute for, I think it's Human-Centered AI, they have AI4ALL, which creates curricula on how to incorporate AI responsibly into your life for kids in grade school all the way up through high school. And not to toot my own horn, but WorldQuant University is also a nonprofit that has resources that teach people how to use AI and use it responsibly.

So I don't know. Am I answering your question, or am I just talking around it? No, I think that those are all useful elements of this question. And another aspect of it is the availability of the raw data to be able to produce models that are more befitting different underrepresented cultures and language communities. And I know that there are efforts in flight to help with collecting and generating more of that information so that it can be used in training contexts.

Another aspect of the problem, particularly with large language models, again, from a conversation I was listening to recently, is because of the fact that they are so expensive and complex to produce, they are largely static. And so if you ask who is the president of Argentina,

it might give you an answer that was correct at the time that it was trained, but it might not be correct at the time that the question is asked. And so I'm wondering what are some of the ways that we can, particularly from the lens of misinformation, help to address this static nature of these expensive models? Mhmm. Well, I had a colleague in Doha who was actually working on identifying misinformation.

So, using a large language model to fact check the answers that come out of another large language model. So that's an example of that. But, yes, absolutely. Support your local corpora. We should put that on a bumper sticker. Yeah. There you go. Going into the merch shop for the podcast.

Good. Good. And to that end as well for communities who want to be able to help contribute to these efforts, what have you seen as far as the tooling and workflows that help to empower these people to be able to bring the models into their own localized contexts without having to rely on these huge and expensive and general purpose models that are useful until they're not? Mostly, what I have seen is small groups of people who are really committed

to the idea of local knowledge. For example, one of my instructors is hand annotating a corpus in Hausa, a language from Nigeria, because he wants to create his own large language model, or at least a small language model, on that corpus. And so, what I would say is it's often an example of people taking tools that they learn about through open sources

and then applying them to their local context because they're passionate about them. So anything we can do to support those efforts, either through education or through getting involved ourselves, is something that should be done.

And bringing us now to one of the thornier problems of equitability and accessibility and ethical considerations around even just the application or creation of machine learning and AI: the environmental impact that they can have, particularly when you're dealing with massive models that require massive amounts of energy to collect and process the training data and generate the resultant models.

And I'm curious what you are seeing as some of the means of mitigating some of those environmental considerations around the energy consumption.

Yeah. This is an excellent question because most of the time when you're training large language models, the environmental impact is kind of abstracted away, because it's taking place in a data center. But the truth is that training these large language models is up in the echelon of, like, having a car, buying a car. Right? So they truly do have a large impact.

One thing that regulators can do is make it a requirement for large language models to disclose how much carbon it takes to build them. So that could be one thing we do at the regulation level. At the practitioner level, there are lots of different ways that we can mitigate the amount of energy it takes to train our models. On the one hand, we could just generally try to move less data

and try to do more with less. So for example, if you're building a model based on sensor data, is there processing that you can do at the edge? It's gonna be much cheaper than shipping everything back to do it on your CPU or GPU. Data type matters. The fact of the matter is floats take up more space than ints. And so if you can get by with an int, go with an int.

Compression: it's much easier to compress files and then decompress them at run time than to try to move that large amount of data. And then also, for example, if you're training a large neural network, zeros, sparse data, can be very helpful. So if you have very small values in your matrices, changing them to zero, as long as it doesn't affect your performance too much, can really help simplify that matrix math, which means less energy. So those are some ways you can do it on the data side.
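A minimal sketch of those data-side savings, using NumPy and SciPy to compare the footprint of a smaller dtype and of sparsifying near-zero values:

```python
# A minimal sketch of the data-side savings discussed above:
# smaller dtypes and sparsifying near-zero values both shrink
# how much data you have to move and compute on.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.normal(scale=0.01, size=(1000, 1000))

# Smaller dtype: float32 halves the footprint of float64.
print("float64:", dense.nbytes // 1024, "KiB")
print("float32:", dense.astype(np.float32).nbytes // 1024, "KiB")

# Sparsify: zero out tiny values, then store only the nonzeros.
threshold = 0.02
pruned = np.where(np.abs(dense) < threshold, 0.0, dense)
compressed = sparse.csr_matrix(pruned)
print("sparse :", (compressed.data.nbytes + compressed.indices.nbytes
                   + compressed.indptr.nbytes) // 1024, "KiB")
```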

Moving to the model building side, on the one hand, like I said before, using traditional ML models over a neural network can often be a money saver, a time saver, and an environmental resource saver. And, you know, if you're building a natural language processing model, it's not as if you need to completely divorce yourself from all of those tools. You can create embeddings using a BERT model, and then you can just attach them to a decision tree. Right?

Another thing that you can do, if you have access to the way your cluster is configured, is power capping: capping how much power your CPU or GPU can use at a given time. That will lengthen your training time a little bit, but not by much, and it will really help with the power issue. Another thing that you can do, and I learned this from some folks at MIT, where you are, is responsible scheduling. Oftentimes,

it's less impactful to train your models at night, or, for example, if you're in a place where you have seasons, doing it during the winter. Then there's being conscious of your hyperparameter space: you might not wanna just throw the whole parameter space at your model and tune the whole thing. Think about ways that you can limit that space, so you limit the amount of training that you do. And then early stopping: there are points of diminishing returns with your neural networks, with your ensemble models. So thinking about ways in which you can stop early when your performance is good enough, that's another really good approach. And then, you know, we always talk about reduce, reuse, recycle:

transfer learning. Right? Instead of training a model from scratch, take a pre-trained model and fine tune it with your data. That way you are not spouting out the carbon of a car, but rather just doing a little bit of fine tuning, and then you can use that model just as well. The good news is that, most of the time, price and environmental resources are aligned. So all of these are good for your bottom line. They're not just good for the planet.
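As one concrete example of the early stopping idea, here is a minimal sketch using scikit-learn's gradient boosting, which halts once a validation set stops improving rather than burning through a fixed, possibly wasteful, number of rounds:

```python
# A minimal sketch of early stopping: halt training once a held-out
# validation score stops improving, saving compute and energy.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

model = HistGradientBoostingClassifier(
    max_iter=1000,            # generous budget we hope not to use
    early_stopping=True,      # monitor a validation split
    validation_fraction=0.1,
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
model.fit(X, y)

# Usually far fewer than the 1000-iteration budget actually run.
print(f"Stopped after {model.n_iter_} of 1000 iterations")
```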

And also from the serving perspective, it has implications both from the economic and environmental considerations, but also from the accessibility side, where if you are trying to bring AI to a rural community or a community that is not as well resourced, then

you can't expect them to throw a model into a server rack filled with GPUs to be able to run it. You want to enable them to run it on lower powered devices, edge devices, and that's also where things such as federated learning, and network pruning to reduce the final size of the model for serving and runtime, can have a benefit. Yeah. Those are really good points. Thank you for bringing those up.

And in your work of thinking about the ethical considerations, helping to train new data scientists who are going to be contributing to this field, who are going to be building and training and deploying their own models, what are some of the ways that you are seeing them try to grasp and grapple with these ethical conundrums of the space and some of the interesting or innovative or unexpected ways that you've seen them try to

apply the concerns of ethics and equitability to the work that they do? Yeah. What mostly surprises me is there are areas where I don't have a lot of domain knowledge, and all of a sudden, a student will pop in with a little bit of domain knowledge and be able to quickly pick apart or identify an ethical problem. For example, in the data science program that I run, we have a project where students work with data from the 2015 Nepal earthquake.

And it's amazing to see my Nepalese students pick apart the data when it comes to caste and see how the caste breakdown within different provinces and different departments within the country is basically masking issues of bias. And so that's where I find things are really exciting, where people are taking the tools that I'm teaching them and using their own local knowledge to identify issues that I never would have been able to think of.

And in that work that you're doing, and in your experience as a practitioner and a teacher, what are the most interesting or unexpected or challenging lessons that you have learned while trying to grapple with these ethical implications of machine learning and AI? Yeah. I think what has really shocked me is how important ethics is in really every aspect of a data scientist's job, from working with stakeholders to handling data

to building and maintaining and deploying models. I said this before, but I think it bears repeating. We think of ethics as how we treat each other person to person, but data scientists are really at the forefront of how we're gonna treat each other and interact with each other algorithmically. And

I think that puts us in a very privileged position, but with great power comes great responsibility, is what I would say. And so thinking about how these models are gonna work in the real world

is a challenge that I think data scientists really need to incorporate in every phase of their work. So that's what surprised me: just how pervasive ethics is in machine learning. And another element of that is the reduced barrier to entry for being able to comprehend and work with and build machine learning capabilities with the tools that we're generating, reducing the level of mathematical and statistical and technical knowledge needed to apply some of these libraries, particularly

for machine learning problems that are on the smaller scale and not something that a company is going to spend the time and investment on building and generalizing. No. Absolutely. That's an excellent point. That's a good addition. I 100% agree. And so for people who are working in this space or considering working in this space, what are some of the resources that you have found useful for helping them bring to mind some of these questions of ethics and how to apply them to their work?

Well, I already mentioned it, but I wanna mention it again: the Fairlearn library. It's now at version 0.9. It has all sorts of wonderful tutorials on what bias is, but also how to use the library in debiasing data and debiasing a model. So I would definitely recommend that. Another library that I really like is called AI Fairness 360. It's by IBM. I know it's compatible with Python 3.11, but I'm not sure how actively maintained it is. I don't know about 3.12,

but it's still got lots of good stuff, and they have lots of great demos on their website to check out. Beyond that, and I'll put a link in the show notes, there's a machine learning emissions calculator.

So you can, before you train your model, estimate how much carbon you might use to train your large language model.
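Along the same lines, if you'd rather measure than estimate, here is a minimal sketch using the codecarbon library (a related open source tool, not necessarily the calculator being referenced) to track the emissions of a training run:

```python
# A minimal sketch using the codecarbon library (a related tool,
# not necessarily the calculator mentioned) to measure the carbon
# footprint of a training run instead of estimating it up front.
from codecarbon import EmissionsTracker
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

tracker = EmissionsTracker(project_name="demo-training-run")
tracker.start()

# Stand-in training workload; substitute your real model here.
X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```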

And then another book, which recently came out from O'Reilly, and I have to admit I'm a total fanboy of O'Reilly, but I think they have a lot of good content, is Practicing Trustworthy Machine Learning by Yada Pruksachatkun and Matthew McAteer. They have a whole section on explainability and interpretability, which we kind of talked about a little here, but also ideas about building robust models, how a model

will act when it encounters things it doesn't know, on using synthetic data to kind of save on your carbon footprint, all sorts of things like that. So a really comprehensive look with lots of great case studies. So those are a few resources that I would recommend.

And another thing too that I think people should walk away with is the idea that as people who are building and deploying these models and these capabilities, we should also be working to demystify them so that the people who are the end consumers, the people who are interacting with these AI capabilities, aren't just treating them as some magical,

enigmatic, inscrutable black box where whatever it happens to say is automatically the right answer, and instead understand that these are ultimately just human systems. They are mediated by machines, but there is no magic involved. They are systems that can be understood and can be built and owned by people who aren't part of big tech, etcetera. I think that's a really good point, and I would echo it. Sometimes

we think that because math is involved, it's somehow unquestionable, because one plus one always equals two. But these models are much more on the statistics side, where things are a little bit more murky. And so, absolutely, there's a lot of gray area, and these are tools that should be questioned.

And don't worry, guys. Nobody's gonna come for your job if you explain how you're doing the magic you're doing. Okay? You're still the person that can do it. So take time to explain to people how the tools that you're building work. Are there any other aspects of this question of machine learning and AI and the ethical and equitable and accessibility questions that come around it that we didn't discuss yet that you would like to cover before we close

the show? I think what I might just add is a call to action, which is if you're interested in learning more about machine learning, I encourage you to jump in and to always consider ethics as one part of your machine learning journey. And also, I would love it if people who have any ideas for me reach out and talk with me. I'd love to hear how you're using these things in your local context. So those would be my two calls to action. Yes. Great.

Well, for anybody who does want to get in touch with you, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest barrier to adoption for machine learning today. The biggest barrier for adoption for machine learning today. I think the biggest barrier to adoption for machine learning today is that people are very comfortable with descriptive statistics, your business analytics. And

so the sort of data analyst world is something everyone understands and is comfortable with. But then all of a sudden you move into the predictive world, where things aren't nearly so black and white, and you're having a model make predictions that are sometimes ambiguous. They can be interrogated, they can be sourced ethically, right? But it's not always spitting out the same answer. It can be very hard for people to make the mental leap, to incorporate that ambiguity

into their own business systems. So I think that idea, that ML isn't consistent in the same way that a straightforward Python function is, I think that is a barrier. It's kind of a mental leap that we need to make. Alright. Well, thank you very much for taking the time today to join me and share your thoughts and perspective on the challenges and potential solutions to ethics and equitability in AI and ML. It's definitely a very

large and important space and an important consideration for everyone who's working with and also interacting with these systems. So I appreciate the time you've taken, and I hope you enjoy the rest of your day. Oh, well, thank you very much, Tobias. It's fun, too, after having listened to this podcast so many times, to actually be on it. I wanna encourage everyone again to reach out to me. If you can spell my name, Nicholas Cifuentes-Goodbody, you can definitely find me, and I look forward to hearing from everybody.

Thank you for listening, and don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
