Daily Paper Cast

Jingwen Liang, Gengyu Wang | dailypapercast.transistor.fm
We publish 10 episodes every day to discuss 10 AI research papers. Both the podcast scripts and audio are generated by AI. The 10 papers are selected from the highest-voted ones on Huggingface Daily Paper (https://huggingface.co/papers). Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creator:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, NLP, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art

Episodes

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

🤗 Upvotes: 46 | cs.LG, cs.AI Authors: Chenwei Lou, Zewei Sun, Xinnian Liang, Meng Qu, Wei Shen, Wenqi Wang, Yuntao Li, Qingping Yang, Shuangzhi Wu Title: AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Arxiv: http://arxiv.org/abs/2505.11896v1 Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities but often face challenges with tasks requiring sophisticated reasoning. While Chain-of-Thought (CoT) prompting significantly enhances re...

May 21, 2025 | 21 min | Ep. 771

Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

🤗 Upvotes: 39 | cs.LG Authors: Jeffrey Willette, Heejun Lee, Sung Ju Hwang Title: Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction Arxiv: http://arxiv.org/abs/2505.11254v1 Abstract: The attention mechanism of a transformer has a quadratic complexity, leading to high inference costs and latency for long sequences. However, attention matrices are mostly sparse, which implies that many entries may be omitted from computation for efficient inference. Sparse attentio...

May 21, 2025 | 21 min | Ep. 770

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

🤗 Upvotes: 34 | cs.AI, cs.CL, cs.CV, cs.HC Authors: Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, Yiheng Xu, Junli Wang, Doyen Sahoo, Tao Yu, Caiming Xiong Title: Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Arxiv: http://arxiv.org/abs/2505.13227v1 Abstract: Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user...

May 21, 2025 | 22 min | Ep. 769

Faster Video Diffusion with Trainable Sparse Attention

🤗 Upvotes: 29 | cs.CV Authors: Peiyuan Zhang, Haofeng Huang, Yongqi Chen, Will Lin, Zhengzhong Liu, Ion Stoica, Eric P. Xing, Hao Zhang Title: Faster Video Diffusion with Trainable Sparse Attention Arxiv: http://arxiv.org/abs/2505.13389v1 Abstract: Scaling video diffusion transformers (DiTs) is limited by their quadratic 3D attention, even though most of the attention mass concentrates on a small subset of positions. We turn this observation into VSA, a trainable, hardware-efficient sparse atte...

May 21, 2025 | 26 min | Ep. 768

Thinkless: LLM Learns When to Think

🤗 Upvotes: 28 | cs.CL, cs.AI Authors: Gongfan Fang, Xinyin Ma, Xinchao Wang Title: Thinkless: LLM Learns When to Think Arxiv: http://arxiv.org/abs/2505.13379v1 Abstract: Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference. However, applying elaborate reasoning for all queries often results in substantial computational inefficiencies, particularly when many problems admit straightforward ...

May 21, 2025 | 18 min | Ep. 767

Model Merging in Pre-training of Large Language Models

🤗 Upvotes: 27 | cs.CL, cs.LG Authors: Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Zhou Xun, Siyuan Qiao, Liang Xiang, Yonghui Wu Title: Model Merging in Pre-training of Large Language Models Arxiv: http://arxiv.org/abs/2505.12082v2 Abstract: Model merging has emerged as a promising technique for ...

May 21, 2025 | 23 min | Ep. 766

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

🤗 Upvotes: 23 | cs.LG, cs.AI, cs.CL Authors: Hengli Li, Chenxi Li, Tong Wu, Xuekai Zhu, Yuxuan Wang, Zhaoxin Yu, Eric Hanchen Jiang, Song-Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng Title: Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Arxiv: http://arxiv.org/abs/2505.13308v1 Abstract: Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although ...

May 21, 2025 | 25 min | Ep. 765

Qwen3 Technical Report

🤗 Upvotes: 117 | cs.CL Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Lianghao Deng, Mei Li, Mingfeng Xue, Mingze Li, Pei Zhang, Peng Wang, Qin Zhu, Rui Men, Ruize Gao, Shix...

May 20, 2025 | 22 min | Ep. 764

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

🤗 Upvotes: 43 | cs.AI, cs.CR Authors: Yue Liu, Shengfang Zhai, Mingzhe Du, Yulin Chen, Tri Cao, Hongcheng Gao, Cheng Wang, Xinfeng Li, Kun Wang, Junfeng Fang, Jiaheng Zhang, Bryan Hooi Title: GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Arxiv: http://arxiv.org/abs/2505.11049v1 Abstract: To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to incentivize the guard model to deliberatively reason befo...

May 20, 2025 | 24 min | Ep. 763

MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

🤗 Upvotes: 42 | cs.CV, cs.CL Authors: Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman Title: MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Arxiv: http://arxiv.org/abs/2505.10610v1 Abstract: The rapid extension of context windows in large vision-language models has given rise to long-context vision-language models (LCVLMs), which are capable of ha...

May 20, 2025 | 19 min | Ep. 762

Visual Planning: Let's Think Only with Images

🤗 Upvotes: 33 | cs.LG, cs.AI, cs.CL, cs.CV Authors: Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang, Anna Korhonen, Ivan Vulić Title: Visual Planning: Let's Think Only with Images Arxiv: http://arxiv.org/abs/2505.11409v1 Abstract: Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely on pure text as the medium for both expressing and structuri...

May 20, 2025 | 22 min | Ep. 761

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

🤗 Upvotes: 76 | cs.CL Authors: Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li Title: Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Arxiv: http://arxiv.org/abs/2505.10554v1 Abstract: Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. Prior work has shown that outcome-based reinforcement learning (RL) can incidentally elicit advanced reasoning behaviors such as s...

May 17, 2025 | 22 min | Ep. 760

System Prompt Optimization with Meta-Learning

🤗 Upvotes: 48 | cs.CL, cs.AI, cs.LG Authors: Yumin Choi, Jinheon Baek, Sung Ju Hwang Title: System Prompt Optimization with Meta-Learning Arxiv: http://arxiv.org/abs/2505.09666v1 Abstract: Large Language Models (LLMs) have shown remarkable capabilities, with optimizing their input prompts playing a pivotal role in maximizing their performance. However, while LLM prompts consist of both the task-agnostic system prompts and task-specific user prompts, existing work on prompt optimization has focu...

May 17, 2025 | 22 min | Ep. 759

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

🤗 Upvotes: 49 | cs.CV, cs.AI Authors: Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu Title: BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Arxiv: http://arxiv.org/abs/2505.09568v1 Abstract: Unifying image understanding and generation has gained growing attention in recent research on multimodal models. Although design choices for image unders...

May 16, 2025 | 19 min | Ep. 758

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

🤗 Upvotes: 36 | cs.CV Authors: Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian Title: DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Arxiv: http://arxiv.org/abs/2505.04410v1 Abstract: Dense visual prediction tasks have been constrained by their reliance on predefined categories, limiting their applicability in real-world scenarios where visual concepts are unbounded. While Vision-Language Models (VLMs) like CLIP have shown promise in open-vocabulary tasks, t...

May 16, 2025 | 19 min | Ep. 757

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

🤗 Upvotes: 26 | cs.DC, cs.AI, cs.AR Authors: Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei Title: Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Arxiv: http://arxiv.org/abs/2505.09343v1 Abstract: The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architec...

May 16, 2025 | 22 min | Ep. 756

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

🤗 Upvotes: 83 | eess.AS, cs.SD Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He Title: MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Arxiv: http://arxiv.org/abs/2505.07916v1 Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-t...

May 15, 2025 | 22 min | Ep. 755

Seed1.5-VL Technical Report

🤗 Upvotes: 86 | cs.CV, cs.AI Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui ...

May 14, 2025 | 21 min | Ep. 754

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

🤗 Upvotes: 53 | cs.CL, cs.AI, cs.LG Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, QingKai Fang, Kang Zhou, Kangyang Zhou, Lei Li...

May 14, 2025 | 22 min | Ep. 753

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

🤗 Upvotes: 48 | cs.CV Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan Title: Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets Arxiv: http://arxiv.org/abs/2505.07747v1 Abstract: While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D gen...

May 14, 2025 | 22 min | Ep. 752

Learning from Peers in Reasoning Models

🤗 Upvotes: 34 | cs.CL Authors: Tongxu Luo, Wenyu Du, Jiaxi Bi, Stephen Chung, Zhengyang Tang, Hao Yang, Min Zhang, Benyou Wang Title: Learning from Peers in Reasoning Models Arxiv: http://arxiv.org/abs/2505.07787v1 Abstract: Large Reasoning Models (LRMs) have the ability to self-correct even when they make mistakes in their reasoning paths. However, our study reveals that when the reasoning process starts with a short but poor beginning, it becomes difficult for the model to recover. We refer t...

May 14, 2025 | 21 min | Ep. 751

Unified Continuous Generative Models

🤗 Upvotes: 32 | cs.LG, cs.AI, cs.CV Authors: Peng Sun, Yi Jiang, Tao Lin Title: Unified Continuous Generative Models Arxiv: http://arxiv.org/abs/2505.07447v1 Abstract: Recent advances in continuous generative models, including multi-step approaches like diffusion and flow-matching (typically requiring 8-1000 sampling steps) and few-step methods such as consistency models (typically 1-8 steps), have demonstrated impressive generative performance. However, existing work often treats these approac...

May 14, 2025 | 18 min | Ep. 750

REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback

🤗 Upvotes: 26 | cs.CL Authors: Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal Title: REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback Arxiv: http://arxiv.org/abs/2505.06548v1 Abstract: Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data...

May 14, 2025 | 21 min | Ep. 749

Bielik v3 Small: Technical Report

🤗 Upvotes: 53 | cs.LG, cs.AI, cs.CL, 68T50, I.2.7 Authors: Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej Title: Bielik v3 Small: Technical Report Arxiv: http://arxiv.org/abs/2505.02550v2 Abstract: We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterpar...

May 13, 2025 | 25 min | Ep. 748

Bielik 11B v2 Technical Report

🤗 Upvotes: 44 | cs.CL, cs.AI, 68T50, I.2.7 Authors: Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej, Remigiusz Kinas Title: Bielik 11B v2 Technical Report Arxiv: http://arxiv.org/abs/2505.02410v2 Abstract: We present Bielik 11B v2, a state-of-the-art language model optimized for Polish text processing. Built on the Mistral 7B v0.2 architecture and scaled to 11B parameters using depth up-scaling, this model demonstrates exceptional performance across Polish language benchmarks ...

May 13, 2025 | 24 min | Ep. 747

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

🤗 Upvotes: 79 | cs.CV, cs.CL Authors: Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang Title: Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Arxiv: http://arxiv.org/abs/2505.04921v1 Abstract: Reasoning lies at the heart of int...

May 10, 2025 | 23 min | Ep. 746

On Path to Multimodal Generalist: General-Level and General-Bench

🤗 Upvotes: 55 | cs.CV Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang Title: On Path to Multimodal Generalist: General-Level and General-Bench Ar...

May 10, 2025 | 21 min | Ep. 745

Flow-GRPO: Training Flow Matching Models via Online RL

🤗 Upvotes: 36 | cs.CV, cs.AI Authors: Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang Title: Flow-GRPO: Training Flow Matching Models via Online RL Arxiv: http://arxiv.org/abs/2505.05470v1 Abstract: We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equ...

May 10, 2025 | 24 min | Ep. 744

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

🤗 Upvotes: 57 | cs.CV Authors: Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang Title: Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Arxiv: http://arxiv.org/abs/2505.02567v2 Abstract: Recent years have seen remarkable progress in both multimodal understanding models and image generation models. Despite their respective successes, these two domains have evolved indepen...

May 09, 2025 | 18 min | Ep. 743

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

🤗 Upvotes: 35 | cs.CL Authors: Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang Title: ZeroSearch: Incentivize the Search Capability of LLMs without Searching Arxiv: http://arxiv.org/abs/2505.04588v1 Abstract: Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities b...

May 09, 2025 | 22 min | Ep. 742
Hosted on Transistor