Daily Paper Cast

Jingwen Liang, Gengyu Wang•dailypapercast.transistor.fm

We publish 10 episodes every day, covering 10 AI research papers. Both the podcast scripts and the audio are generated by AI. The 10 papers are selected from the highest-voted ones on Hugging Face Daily Papers (https://huggingface.co/papers).

Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creators:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, NLP, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang, https://kawen.art

Last refreshed: July 27th, 2025 at 9:35 AM

Episodes

PixelWorld: Towards Perceiving Everything as Pixels

🤗 Upvotes: 10 | cs.CV, cs.CL Authors: Zhiheng Lyu, Xueguang Ma, Wenhu Chen Title: PixelWorld: Towards Perceiving Everything as Pixels Arxiv: http://arxiv.org/abs/2501.19339v1 Abstract: Existing foundation models typically process visual input as pixels and textual input as tokens, a paradigm that contrasts with human perception, where both modalities are processed in a unified manner. With the rise of embodied and agentic AI, where inputs primarily come from camera pixels, the need for a unifie...

Feb 04, 2025•20 min•Ep. 471

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

🤗 Upvotes: 8 | cs.RO, cs.AI Authors: Gaoyue Zhou, Hengkai Pan, Yann LeCun, Lerrel Pinto Title: DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning Arxiv: http://arxiv.org/abs/2411.04983v2 Abstract: The ability to predict future outcomes given control actions is fundamental for physical reasoning. However, such predictive models, often called world models, remain challenging to learn and are typically developed for task-specific solutions with online policy learning. ...

Feb 04, 2025•20 min•Ep. 470

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

🤗 Upvotes: 6 | cs.CL, cs.AI, cs.CR, cs.LG Authors: Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, Peter Lofgren, Francesco Mosconi, Clare O'Hara, Catherine Olsson, Linda Petrini, Samir Rajani, Nikhil Saxena,...

Feb 04, 2025•21 min•Ep. 469

Scalable-Softmax Is Superior for Attention

🤗 Upvotes: 6 | cs.CL, cs.AI, cs.LG Authors: Ken M. Nakanishi Title: Scalable-Softmax Is Superior for Attention Arxiv: http://arxiv.org/abs/2501.19399v1 Abstract: The maximum element of the vector output by the Softmax function approaches zero as the input vector size increases. Transformer-based language models rely on Softmax to compute attention scores, causing the attention distribution to flatten as the context size grows. This reduces the model's ability to prioritize key information effec...

Feb 04, 2025•24 min•Ep. 468

The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training

🤗 Upvotes: 3 | cs.LG, math.OC, stat.ML Authors: Fabian Schaipp, Alexander Hägele, Adrien Taylor, Umut Simsekli, Francis Bach Title: The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Arxiv: http://arxiv.org/abs/2501.18965v1 Abstract: We show that learning-rate schedules for large model training behave surprisingly similarly to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedule ...

Feb 04, 2025•22 min•Ep. 467

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

🤗 Upvotes: 3 | cs.LG, cs.AI Authors: Bartosz Cywiński, Kamil Deja Title: SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders Arxiv: http://arxiv.org/abs/2501.18052v2 Abstract: Diffusion models, while powerful, can inadvertently generate harmful or undesirable content, raising significant ethical and safety concerns. Recent machine unlearning approaches offer potential solutions but often lack transparency, making it difficult to understand the changes they int...

Feb 04, 2025•20 min•Ep. 466

GuardReasoner: Towards Reasoning-based LLM Safeguards

🤗 Upvotes: 46 | cs.CR, cs.AI, cs.LG Authors: Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi Title: GuardReasoner: Towards Reasoning-based LLM Safeguards Arxiv: http://arxiv.org/abs/2501.18492v1 Abstract: As LLMs increasingly impact safety-critical applications, ensuring their safety using guardrails remains a key challenge. This paper proposes GuardReasoner, a new safeguard for LLMs, by guiding the guard model to le...

Feb 01, 2025•21 min•Ep. 465

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

🤗 Upvotes: 22 | cs.CL Authors: Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu Title: Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Arxiv: http://arxiv.org/abs/2501.18585v1 Abstract: Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thi...

Feb 01, 2025•23 min•Ep. 464

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

🤗 Upvotes: 15 | cs.CL Authors: Arthur Douillard, Yanislav Donchev, Keith Rush, Satyen Kale, Zachary Charles, Zachary Garrett, Gabriel Teston, Dave Lacey, Ross McIlroy, Jiajun Shen, Alexandre Ramé, Arthur Szlam, Marc'Aurelio Ranzato, Paul Barham Title: Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Arxiv: http://arxiv.org/abs/2501.18512v1 Abstract: Training of large language models (LLMs) is typically distributed across a large number of accelerators to reduce ...

Feb 01, 2025•24 min•Ep. 463

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

🤗 Upvotes: 15 | cs.AI, cs.CL, cs.CV, cs.LG Authors: Yuxin Zuo, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, Ermo Hua, Kaiyan Zhang, Ning Ding, Bowen Zhou Title: MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding Arxiv: http://arxiv.org/abs/2501.18362v1 Abstract: We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 bod...

Feb 01, 2025•19 min•Ep. 462

Large Language Models Think Too Fast To Explore Effectively

🤗 Upvotes: 10 | cs.AI, q-bio.NC Authors: Lan Pan, Hanbo Xie, Robert C. Wilson Title: Large Language Models Think Too Fast To Explore Effectively Arxiv: http://arxiv.org/abs/2501.18009v1 Abstract: Many intellectual capacities have emerged in Large Language Models. While numerous benchmarks assess their intelligence, limited attention has been given to their ability to explore, an essential capacity for discovering new information and adapting to novel environments in both natural and artificial sys...

Feb 01, 2025•26 min•Ep. 461

WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

🤗 Upvotes: 10 | cs.LG, cs.CL Authors: Benjamin Feuer, Chinmay Hegde Title: WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Arxiv: http://arxiv.org/abs/2501.18511v1 Abstract: Language model (LLM) post-training, from DPO to distillation, can refine behaviors and unlock new skills, but the open science supporting these post-training techniques is still in its infancy. One limiting factor has been the difficulty of conducting large-scale comparative analyses of synthetic ...

Feb 01, 2025•20 min•Ep. 460

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

🤗 Upvotes: 10 | cs.CV, cs.AI, cs.CL, cs.LG, cs.RO Authors: Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang Title: PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Arxiv: http://arxiv.org/abs/2501.16411v2 Abstract: Understanding the physical world is a fundamental challenge in embodied AI, critical for enabling agents to perform complex tasks and operate safely in real-world environments. While Vision-Language Models (VLMs) hav...

Feb 01, 2025•25 min•Ep. 459

o3-mini vs DeepSeek-R1: Which One is Safer?

🤗 Upvotes: 6 | cs.SE, cs.AI Authors: Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura Title: o3-mini vs DeepSeek-R1: Which One is Safer? Arxiv: http://arxiv.org/abs/2501.18438v1 Abstract: The irruption of DeepSeek-R1 constitutes a turning point for the AI industry in general and the LLMs in particular. Its capabilities have demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at appa...

Feb 01, 2025•20 min•Ep. 458

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

🤗 Upvotes: 1 | cs.AI, cs.CL, cs.HC Authors: Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig Title: CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation Arxiv: http://arxiv.org/abs/2501.16609v1 Abstract: While much work on web agents emphasizes the promise of autonomously performing tasks on behalf of users, in reality, agents often fall short on complex tasks in real-world contexts and modeling user preference. Thi...

Feb 01, 2025•21 min•Ep. 457

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

🤗 Upvotes: 28 | cs.CL Authors: Yubo Wang, Xiang Yue, Wenhu Chen Title: Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Arxiv: http://arxiv.org/abs/2501.17703v2 Abstract: Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and propose Critique Fine-Tuning (CFT), a strategy where models learn to critique noisy responses rather than simply imitat...

Jan 31, 2025•23 min•Ep. 456

Atla Selene Mini: A General Purpose Evaluation Model

🤗 Upvotes: 24 | cs.CL, cs.AI Authors: Andrei Alexandru, Antonia Calvi, Henry Broomfield, Jackson Golden, Kyle Dai, Mathias Leys, Maurice Burger, Max Bartolo, Roman Engeler, Sashank Pisupati, Toby Drane, Young Sun Park Title: Atla Selene Mini: A General Purpose Evaluation Model Arxiv: http://arxiv.org/abs/2501.17195v1 Abstract: We introduce Atla Selene Mini, a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini is a general-purpose evaluator that outperforms the best SLMJs and G...

Jan 31, 2025•25 min•Ep. 455

Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts

🤗 Upvotes: 14 | cs.AI, cs.CY, cs.LG Authors: Clément Desroches, Martin Chauvin, Louis Ladan, Caroline Vateau, Simon Gosset, Philippe Cordier Title: Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Arxiv: http://arxiv.org/abs/2501.14334v2 Abstract: The rapid growth of artificial intelligence (AI), particularly Large Language Models (LLMs), has raised concerns regarding its global environmental impact that extends beyond greenhouse gas ...

Jan 31, 2025•29 min•Ep. 454

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation

🤗 Upvotes: 8 | cs.SE, cs.AI Authors: Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura Title: Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation Arxiv: http://arxiv.org/abs/2501.17749v1 Abstract: Large Language Models (LLMs) have become an integral part of our daily lives. However, they impose certain risks, including those that can harm individuals' privacy, perpetuate biases and spread misinformation. These risks highligh...

Jan 31, 2025•22 min•Ep. 453

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

🤗 Upvotes: 8 | cs.CV Authors: Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu Title: Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks Arxiv: http://arxiv.org/abs/2501.15891v1 Abstract: Image-based virtual try-on (VTON) aims to generate a virtual try-on result by transferring an input garment onto a target person's image. However, the scarcity of paired garment-model data makes it challenging for existing methods to achieve h...

Jan 31, 2025•22 min•Ep. 452

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

🤗 Upvotes: 6 | cs.CR, cs.AI, cs.CL, cs.LG Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu Title: Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation Arxiv: http://arxiv.org/abs/2501.17433v1 Abstract: Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks -- models lose their safety alignment ability after fine-tuning on a few harmful samples. For risk mitigation, a guardrail is ty...

Jan 31, 2025•22 min•Ep. 451

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text

🤗 Upvotes: 6 | cs.CL, cs.AI Authors: Jenna Russell, Marzena Karpinska, Mohit Iyyer Title: People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text Arxiv: http://arxiv.org/abs/2501.15654v1 Abstract: In this paper, we study how well humans can detect text generated by commercial LLMs (GPT-4o, Claude, o1). We hire annotators to read 300 non-fiction English articles, label them as either human-written or AI-generated, and provide paragraph-length ex...

Jan 31, 2025•20 min•Ep. 450

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

🤗 Upvotes: 29 | cs.AI, cs.CV, cs.LG Authors: Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine, Yi Ma Title: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Arxiv: http://arxiv.org/abs/2501.17161v1 Abstract: Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities...

Jan 30, 2025•23 min•Ep. 449

Optimizing Large Language Model Training Using FP4 Quantization

🤗 Upvotes: 15 | cs.LG, cs.CL Authors: Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng Title: Optimizing Large Language Model Training Using FP4 Quantization Arxiv: http://arxiv.org/abs/2501.17116v1 Abstract: The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 prec...

Jan 30, 2025•22 min•Ep. 448

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

🤗 Upvotes: 11 | cs.CV Authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu Title: DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Arxiv: http://arxiv.org/abs/2501.16764v1 Abstract: Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian sp...

Jan 30, 2025•23 min•Ep. 447

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

🤗 Upvotes: 10 | cs.CL, cs.LG Authors: Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou Title: Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Arxiv: http://arxiv.org/abs/2501.16975v1 Abstract: Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocab...

Jan 30, 2025•23 min•Ep. 446

Open Problems in Mechanistic Interpretability

🤗 Upvotes: 10 | cs.LG Authors: Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath Title: Open Problems ...

Jan 30, 2025•26 min•Ep. 445

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

🤗 Upvotes: 5 | cs.LG, cs.AI, cs.CL Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain Title: Low-Rank Adapters Meet Neural Architecture Search for LLM Compression Arxiv: http://arxiv.org/abs/2501.16372v1 Abstract: The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of the...

Jan 30, 2025•22 min•Ep. 444

IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

🤗 Upvotes: 4 | cs.CL, cs.AI Authors: Sankalp KJ, Ashutosh Kumar, Laxmaan Balaji, Nikunj Kotecha, Vinija Jain, Aman Chadha, Sreyoshi Bhaduri Title: IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding Arxiv: http://arxiv.org/abs/2501.15747v2 Abstract: Known by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage...

Jan 30, 2025•20 min•Ep. 443

Histoires Morales: A French Dataset for Assessing Moral Alignment

🤗 Upvotes: 3 | cs.CL, cs.AI Authors: Thibaud Leteno, Irina Proskurina, Antoine Gourru, Julien Velcin, Charlotte Laclau, Guillaume Metzler, Christophe Gravier Title: Histoires Morales: A French Dataset for Assessing Moral Alignment Arxiv: http://arxiv.org/abs/2501.17117v1 Abstract: Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align wit...

Jan 30, 2025•21 min•Ep. 442

Hosted on Transistor