Banana Data Podcast

Dataiku•www.dataiku.com

Welcome to the Banana Data Podcast! We're a data science podcast focused on the latest & greatest of the DS ecosystem, sprinkled in with our musings & data science expertise. With topics ranging from ethical AI and transparency to robot pets, our hosts, Christopher Peter Makris & Corey Strausman, are here to keep you up to date on the latest trends, news, and big convos in data. If you're looking to keep the knowledge up, be sure to also subscribe to our weekly Banana Data Newsletter! Register here: https://banana-data.com/


Episodes

Banana Byte: Understanding the Value of Deep Learning

Deep Learning has become a mainstay in today's data science and AI practices - but what makes it so valuable? On this Banana Byte, we explore when, why, and how to use deep learning, and how it compares to (and might replace!) other common algorithms. During our off-season break, we'll be releasing more of these Banana Bytes - which are short, bi-weekly segments we run live on LinkedIn and Twitter, where we discuss the latest headlines and topics in the data science space. Be sure to t...

Jul 24, 2020•16 min

Banana Byte: The Hidden Costs of Cloud Computing

Many claim that Cloud has stolen the computing show - providing scalability, cost savings, loss prevention, and more - it's taken the world (and the headlines) by storm. So, on this Banana Byte, we ask - is cloud computing inevitable? Or is it just a disruptive buzzword whose negatives outweigh the benefits? During our off-season break, we'll be releasing more of these Banana Bytes - which are short, bi-weekly segments we run live on LinkedIn and Twitter, where we discuss the latest he...

Jul 17, 2020•16 min

Banana Byte: Zoom Privacy

Zoom conferencing software recently made headlines for its huge leaks in privacy and security, pushing a number of big corporations to block the software and push for new privacy legislation. During this Banana Byte session, we cover the things Zoom overlooked - and what it means for data privacy, usability, and user experience. During our off-season break, we'll be releasing more of these Banana Bytes - which are short, bi-weekly segments we run live on LinkedIn and Twitter, where we discu...

Jul 10, 2020•16 min

Data Nuance & Human-in-the-Loop Monitoring

For our Season 3 finale, we're taking a look at model accuracy, the threat of generalized results, and how to understand and demonstrate the nuanced results of your models. Is the onus on scientists and journalists to subdue buzzy headlines or should media consumers be more wary of extrapolated statistics? We also take a peek into how the NYT applies Machine Learning to their comment moderation, and how human-in-the-loop monitoring works behind the scenes, especially in fast-paced and ethic...

Jun 05, 2020•22 min•Season 3•Ep. 6

Fighting Cheating AI & Redefining AI companies

AI is meant to help us expedite processes and get to the conclusions quicker. But, what happens when the process that AI takes to get to the end goal is erroneous? In this episode we discuss how you can prevent your AI from cheating and define what it means to be a successful AI company in today’s tech-saturated world. Specification Gaming: The Flip Side of AI Ingenuity (DeepMind Blog) The New Business of AI (and How It’s Different From Traditional Software) by Martin Casado and Matt Bornstein (...

May 22, 2020•27 min•Season 3•Ep. 5

The Messiness of Data

With the upcoming 2020 presidential election, there's a lot for data scientists and analysts to learn from the political realm and its unending streams of messy data. Will and Triveni sit down with seasoned political data expert, Grace Turke-Martinez, Analytics Director at The Messina Group to understand how political data professionals extrapolate insights from messy data, work around human indecision, and forecast using imperfect data sets. Why You should Care about the Nate Silver v. Nas...

May 08, 2020•22 min•Season 3•Ep. 3

Analytics in the NFL & Revolutions in Data Discovery

This episode, in honor of draft season, we’re discussing the NFL’s newest tactics to quantify and predict players’ success, and diving into Spotify’s case for data discovery. Leaving behind the problems of “not enough data,” Will and Triveni ask new questions: when we have so much data, where do we start, how do we organize it, and how can we use it? Catch up on what we’re reading: How We Improved Data Discovery for Data Scientists at Spotify - https://labs.spotify.com/2020/02/27/how-we-improved...

Apr 24, 2020•20 min•Season 3•Ep. 2

Deepfakes & Data Upskilling

In our season 3 kickoff, we’re challenging ourselves to ask: who grants authority to those in charge of validating content? How do we remain cognizant of big tech and corporations that shape our content and decisions? In a landscape filled with big, competitive players, we explore how data scientists should focus their learning. Check out what we’ve been reading: Attestive CEO on Using DLT to Fight Fake News, Insurance Fraud, and Deep Fakes by Samuel Haig (CoinTelegraph) Expanding at-home lea...

Apr 10, 2020•26 min•Season 3•Ep. 1

Is AI Worth it?

In our season 2 finale, we’re asking about the business impact and ROI of data science - what are our measures of success, who calls the shots, when should we see returns, and how do we know this is all worth it? From ROI To RAI (Revenue From Artificial Intelligence) by AJ Abdallat (Forbes) What’s the Best Approach to Data Analytics? by Tom O’Toole (Harvard Business Review) Making Data Science Useful by Cassie Kozyrkov (Strata Data Conference) BI and Analytics Delivering over 1300% ROI according...

Mar 27, 2020•24 min•Season 2•Ep. 11

The Roles in Data Science, feat. Tristan Handy, CEO & Founder of Fishtown Analytics

With Tristan Handy, CEO & Founder of Fishtown Analytics, we ask: who should be part of the data science process? Bearing both technical requirements and business objectives, the data scientist cannot run the show on her own. We ask what it means to collaborate within, across, and outside of teams, when to bring heads together, and how to do it successfully. To download DBT, be sure to check out https://www.getdbt.com/ . You can also learn more about Fishtown Analytics and Tristan Handy at h...

Mar 14, 2020•25 min•Season 2•Ep. 10

How We Talk about AI, feat. Karen Hao, MIT Technology Review

On this week’s episode, Karen Hao, Senior AI Reporter at the MIT Technology Review, shares what it’s like to cover AI in the peak of the hype cycle. We’ll walk through the dangers of inaccurate AI reporting, striking the delicate balance between realistic and exciting, and the what, where, and how we should be reading about AI in the news. Karen Hao is the artificial intelligence reporter for MIT Technology Review. In particular she covers the ethics and social impact of the technology as well a...

Feb 28, 2020•26 min•Season 2•Ep. 9

The Everyday of AI

So many pieces of our lives are intertwined with AI - from our phones to our commutes, we’re constantly being supported by (and maybe relying on) algorithms to select our next move. On this episode, Will & Triveni take us through the many unexpected places we find AI - and challenge what it means to be a responsible AI consumer. We tried Amazon’s bizarre Alexa microwave and weren’t convinced by Sarah Perez (TechCrunch) Why Some Cities Have Had Enough of Waze by Tala Salem (U.S. News) This Ne...

Feb 14, 2020•24 min•Season 2•Ep. 8

The Future (and the now) of AI with Azalia Mirhoseini, Senior Researcher at Google Brain

AI constantly promises the cutting edge. So, what’s behind the newest, hottest AI trends out there? This episode, Triveni & Will sit down with Azalia Mirhoseini, Senior Researcher at Google Brain, named one of Technology Review's 35 Innovators Under 35, to explore what’s really going on behind the scenes, and what’s actually overrated, underrated, and just right in the field. Azalia Mirhoseini, 35 Innovators Under 35: Visionaries (MIT Technology Review)...

Feb 01, 2020•25 min•Season 2•Ep. 7

Do I do AI?

This AI podcast has been live for two seasons - but we haven’t stepped back to ask - what even is AI? In this episode, Triveni & Will work through their definitions of AI, exploring theories, use-cases, and examples of what they think qualifies as AI - and how we measure it. Do statistics count as AI? Does AI need to include Arnold Schwarzenegger? Who has actually achieved AI? Is it AI or not? A Score Card with 4 Dimensions by Florian Douetteau (Medium) AI Is Not Just for Big Tech Companies ...

Jan 17, 2020•25 min•Season 2•Ep. 6

Finding Community in Data Science with Reshama Shaikh, key scikit-Learn sprint organizer

Now that we’ve covered how open source works, we’re looking to pull back the curtain and see who’s actually contributing. In part 2/2 of our series on open source, we sat down with Reshama Shaikh, a statistician and key organizer of scikit-Learn sprints, to learn about the ups & downs of open source contributing, as well as how a Sprint in Nairobi benefits Fortune 500 companies in the US. Reshama Shaikh is an independent data scientist/statistician and MBA with skills in Python, R and SAS. I wo...

Jan 03, 2020•25 min•Season 2•Ep. 5

Why Open Source? feat. Andreas Mueller, a Core Contributor of scikit-Learn

Open Source software such as scikit-Learn, Python, and Spark form the backbone of data science. In a two-part series, we’re covering the ins and outs of open source - and how this special type of software supports 98% of enterprise-level companies’ data science efforts. In part 1, we’re chatting with Andreas Mueller, a core contributor of scikit-Learn, about the value in open source versus corporate software, and what it looks like to run and govern this type of community-written (and driven) proj...

Dec 20, 2019•27 min•Season 2•Ep. 4

Predicting AI Trends for 2020

As we near the end of the decade, Will and Triveni place their bets on the biggest data science trends for 2020, including AutoML, explainable AI, Cloud computing, and federated learning. They’ll also reflect on whether or not the trends of 2019 lived up to their hype. - Keras inventor Chollet charts a new direction for AI: a Q&A by Tiernan Ray (ZDNet) - On the Measure of Intelligence by François Chollet (Cornell University) - Federated Learning: The Future of Distributed Machine Learning by...

Dec 07, 2019•24 min•Season 2•Ep. 3

Life after Production, a Tale of Technical Debt with Dan Shiebler, Twitter Eng

Triveni and Will sit down with Dan Shiebler, Senior ML Engineer at Twitter to tackle the final frontier of data science: production. From technical debt to model maintenance, they’ll look at what it means to have a model in production, when it's time to take a model out of production, and how challenges of technical debt can affect the entire data science pipeline. Be sure to subscribe to our weekly newsletter to get this podcast & a host of new and exciting data-happenings in your inbo...

Nov 15, 2019•30 min•Season 2•Ep. 2

The Essentials (and not-so essentials) of Data Science Pipelines

In our season 2 inaugural episode, we’re debating how to approach data science pipelines (are they cyclical or linear? How should we test them?) - and how tools like Python and Kafka may not be all they’re hyped up to be in AI. Be sure to subscribe to our weekly newsletter to get this podcast & a host of new and exciting data-happenings in your inbox! Learn more about the articles referenced in this episode below: Standards comic (xkcd) We are Living in “The Era of Python” by Rinu Gour (Towar...

Nov 01, 2019•23 min•Season 2•Ep. 1

What Makes a Good Data Science Practice

For our season 1 finale, Triveni and Will give their two cents on the most important aspects of a data science practice. From intentional data to getting outside perspectives, they walk us through how to build not only a scalable AI practice, but one that is responsible, ethical, and interpretable. We’ll be back for Season 2 in October - but in case you miss us too much, be sure to subscribe to the Banana Data Newsletter, and rate our podcast! Articles mentioned: Poor Quality Data, Fraud in GPS...

Sep 13, 2019•29 min•Season 1•Ep. 10

The Death of Data Viz, Cross-Cultural AI, and AI Auditing

In our second-to-last episode of the season, Triveni and Will explore the data world’s shifting attitude toward standalone data visualizations (are they dying? Who are they for?), how to respond to global AI practices (what are global AI standards? How do different countries vary in their AI approaches?), and the feasibility of an AI audit. We’ll also see how Spark fits into the infrastructure of our data science systems. Be sure to subscribe to our weekly newsletter to get this podcast & a ...

Aug 30, 2019•26 min•Season 1•Ep. 9

Prioritizing training data, model interpretability, and dodging an AI Winter

This episode, Triveni and Will tackle the value, ethics, and methods for good labeled data, while also weighing the need for model interpretability and the possibility of an impending AI winter. Triveni will also take us through a step-by-step of the decisions made by a Random Forest algorithm. As always, be sure to rate and subscribe! Be sure to check out the articles we mentioned this week: The Side of Machine Learning You’re Undervaluing and How to Fix it by Matt Wilder (LabelBox) The Hidden Co...

Aug 16, 2019•27 min•Season 1•Ep. 8

Building accessible queries, codes, and speech using AI

Accessibility, by definition, is about making tasks more achievable. In episode 7 of the podcast, Triveni and Will explore how AI is shaping our world to become more accessible, and how we as data scientists can help it get there, diving into Salesforce’s new “unstructured” querying tool, the various physical manifestations of AI, and even questioning some of their previous takes on ethics. They’ll also walk us through the contributions of BERT to the NLP space, and the how and why it’s been so ...

Aug 02, 2019•22 min•Season 1•Ep. 7

AI Meets World: GDPR, the AI Job Apocalypse, and AI’s carbon footprint

When we release our AI into the world, its impact extends far beyond the business and tech we’re working on. On this episode, we’re diving into the consequences of AI on consumers, housing, and the environment through the lens of GDPR, the supposed “AI job apocalypse” and some controversial takes on models’ carbon emissions. As always, be sure to rate and subscribe! Be sure to check out the articles we mentioned this week: How is the GDPR Doing? By Josephine Wolff (Slate) The Answer to the A.I. ...

Jul 19, 2019•27 min•Season 1•Ep. 6

A New Kind of Relationship with AI: Robopets, AI Art, and AI EQ

As AI becomes further embedded in our lives, we will have to start evaluating how we as humans interact and build relationships with our artificial intelligence applications. On this episode of Banana Data, we’re taking a look at what AI means for the individual - from emergent Robopet friendships to AI art as a medium, to what emotional intelligence looks like in AI - and how we can produce better, more human AI systems. The Second Coming of the Robot Pet by Arielle Pardes (WIRED) The Past...

Jul 05, 2019•24 min•Season 1•Ep. 5

The future of data according to predictions, Python 3.0, and people.

On episode 4 of Banana Data, we’re taking a look at how our data is changing. With models in the wild skewing our future data sets, the impending shift to Python 3.0, and navigating a public distrust of Machine Learning, Triveni and Will talk through how our current decisions in AI will heavily influence its future. They’ll also take a stab at explaining GANs - in English. Learn more about the articles referenced in this episode below: Here’s a prediction: In the future, predictions will only ge...

Jun 22, 2019•27 min•Season 1•Ep. 4

Culpability in AI failures, Fooling NNs with NNs, AI for cancer screenings, and Epsilon Greedy Multi-Armed Bandits

This week we’re diving into some deeper impacts of AI’s successes and failures, asking where responsibility lies for an algorithm’s failures, and exploring the endless benefits of accessibility and responsibility that come with AI implemented in healthcare. We’re also taking a deep dive into Epsilon greedy multi-armed bandits and how we can more accurately describe our successes (and our failures) in AI. When algorithms mess up, the nearest human gets the blame by Karen Hao (MIT Technology Review) Google ...

Jun 07, 2019•23 min•Season 1•Ep. 3

Biased Data & the Perfect Answer, Multi-Armed Bandits, and the GPUs Behind Your Neural Networks

On episode two of the podcast, Triveni and Will look at how digital assistants may perpetuate biased data, how multi-armed bandits can build a top-notch recommendation system (and win over Triveni’s heart), and their interview with Mark Buckler, PhD candidate at Cornell and author of the article, “How to Make Bad Deep Learning Hardware” on why understanding hardware may be the key to building your best models yet. Learn more about the articles referenced in this episode below: How Digital Virtu...

May 23, 2019•33 min•Season 1•Ep. 2

Being an Ethical Data Scientist, Federated Learning in Healthcare, and Dropping the “Best Model” Approach

Welcome to the Banana Data Podcast! For our inaugural episode, our hosts Triveni and Will challenge the idea that the “best model is the most efficient,” the current ethical gaps of data collection, and how methods like federated learning can help keep private user data, well, private. Be sure to subscribe to our weekly newsletter to get this podcast & a host of new and exciting data-happenings in your inbox! Learn more about the articles referenced in this episode below: One Model to Rule T...

May 09, 2019•22 min•Season 1•Ep. 1
