Data Skeptic

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Last refreshed: June 24th, 2026 at 1:55 AM

Follow on

Apple Podcasts

Spotify

RSS

Podcasts are better in Metacast mobile app

Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episodes

Give Users the Wheel

What if you could simply tell a recommendation system what you want instead of relying on likes, dislikes, and watch history? Kyle Polich talks with Fuyuan Lyu about the DPR framework, which combines large language models and traditional recommender systems to give users direct control over recommendations through natural language. Together they explore how conversational interfaces could transform platforms like YouTube, TikTok, and news feeds while preserving the strengths of modern recommenda...

Jun 23, 2026•35 min

AutoLike

This episode explores AutoLike, an innovative reinforcement learning framework designed to systematically audit "black box" social media recommendation systems like TikTok's "For You Page." Guest Hieu Lee explains how the tool can drive a feed towards specific benign or sensitive topics, revealing the system's personalization mechanics and potential harms. The conversation also covers the challenges of scaling such audits, the engineering behind the framework, and the crucial role AutoLike plays in informing regulators and platform designers about platform accountability and content moderation.

Jun 17, 2026•35 min

Student Spotlight: Aaron Payne, Data Analyst

Aaron Payne, an MBA student and Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to explore the practical application of business analytics. They delve into a challenging forecasting project for Comfama in Colombia, detailing data cleaning, interpretability, and the selection of advanced models like SARIMAX and ensemble methods. The conversation highlights how data science can drive meaningful social impact and offers advice for career growth.

May 01, 2026•26 min

The Future is Agentic in Recommender Systems

Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations. The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommen...

Apr 25, 2026•49 min

Book Ratings and Recommendations

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

Mar 27, 2026•39 min

Disentanglement and Interpretability in Recommender Systems

Ervin Dervishaj, a PhD student at the University of Copenhagen, discusses his research on disentangled representation learning in recommender systems, finding that while disentanglement strongly correlates with interpretability, it doesn't consistently improve recommendation performance. The conversation explores how disentanglement acts as a regularizer that can enhance user trust and interpretability at the potential cost of some accuracy, and touches on the future of large language models in ...

Mar 10, 2026•31 min

Collective Altruism in Recommender Systems

Ekaterina (Kat) Fedorova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning...

Feb 27, 2026•55 min

Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

Feb 18, 2026•34 min

Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-ba...

Feb 02, 2026•27 min

Fairness in PCA-Based Recommenders

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and colla...

Jan 26, 2026•50 min

Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challeng...

Dec 26, 2025•38 min

Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research...

Dec 18, 2025•52 min

Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering fo...

Dec 08, 2025•40 min

Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net , Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities spa...

Nov 23, 2025•37 min

DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, an...

Nov 13, 2025•33 min

Shilling Attacks on Recommender Systems

Kyle Polich and Aditya Chichani discuss shilling attacks, where malicious actors use fake profiles to manipulate recommender systems, particularly collaborative filtering. They explain various attack types like segmented and bandwagon attacks, and explore detection methods such as behavioral analysis and PCA. The conversation highlights the evolving challenge, including the use of LLMs for fake reviews, and the substantial economic cost of these fraudulent activities.

Nov 05, 2025•35 min

Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal fr...

Oct 29, 2025•52 min

Bypassing the Popularity Bias

Oct 15, 2025•35 min

Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that ...

Oct 09, 2025•38 min

Interpretable Real Estate Recommendations

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations" The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee, explains how his team deve...

Sep 22, 2025•33 min

Why Am I Seeing This?

In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferring the influence of opaque algorithms.

Sep 08, 2025•50 min

Eco-aware GNN Recommenders

In this episode of Data Skeptic, we dive into eco-friendly AI with Antonio Purificato, a PhD student from Sapienza University of Rome. Antonio discusses his research on "EcoAware Graph Neural Networks for Sustainable Recommendations" and explores how we can measure and reduce the environmental impact of recommender systems without sacrificing performance.

Aug 30, 2025•45 min

Networks and Recommender Systems

Kyle reveals the next season's topic will be "Recommender Systems". Asaf shares insights on how network science contributes to the recommender system field.

Aug 17, 2025•18 min

Network of Past Guests Collaborations

Kyle and Asaf discuss a project in which we link former guests of the podcast based on their co-authorship of academic papers.

Jul 21, 2025•34 min

The Network Diversion Problem

In this episode, Professor Pål Grønås Drange from the University of Bergen, introduces the field of Parameterized Complexity - a powerful framework for tackling hard computational problems by focusing on specific structural aspects of the input. This framework allows researchers to solve NP-complete problems more efficiently when certain parameters, like the structure of the graph, are "well-behaved". At the center of the discussion is the network diversion problem, where the goal isn't to block...

Jul 06, 2025•46 min

Complex Dynamic in Networks

In this episode, we learn why simply analyzing the structure of a network is not enough, and how the dynamics - the actual mechanisms of interaction between components - can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond. BarzelLab BarzelLab on Youtube Paper in focus: Universality in network dynamics, 2013...

Jun 28, 2025•56 min

Github Network Analysis

In this episode we'll discuss how to use Github data as a network to extract insights about teamwork. Our guest, Gabriel Ramirez, manager of the notifications team at GitHub, will show how to apply network analysis to better understand and improve collaboration within his engineering team by analyzing GitHub metadata - such as pull requests, issues, and discussions - as a bipartite graph of people and projects. Some insights we'll discuss are how network centrality measures (like eigenvector and...

Jun 22, 2025•37 min

Networks and Complexity

In this episode, Kyle does an overview of the intersection of graph theory and computational complexity theory. In complexity theory, we are about the runtime of an algorithm based on its input size. For many graph problems, the interesting questions we want to ask take longer and longer to answer! This episode provides the fundamental vocabulary and signposts along the path of exploring the intersection of graph theory and computational complexity theory.

Jun 14, 2025•18 min

Graphs for Causal AI

How to build artificial intelligence systems that understand cause and effect, moving beyond simple correlations? As we all know, correlation is not causation. "Spurious correlations" can show, for example, how rising ice cream sales might statistically link to more drownings, not because one causes the other, but due to an unobserved common cause like warm weather. Our guest, Utkarshani Jaimini, a researcher from the University of South Carolina's Artificial Intelligence Institute, tries to tac...

May 24, 2025•41 min

Power Networks

May 16, 2025•42 min

Hosted on Libsyn

For the best experience, listen in Metacast app for iOS or Android