Daily Paper Cast

Jingwen Liang, Gengyu Wang • dailypapercast.transistor.fm
We publish 10 episodes every day to discuss 10 AI research papers. Both the podcast scripts and audio are generated by AI. The 10 papers are selected from the highest-voted ones on Hugging Face Daily Papers (https://huggingface.co/papers). Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com
Creators: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/; Gengyu Wang, NLP, http://wanggengyu.com
Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcasts: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236
Cover image by Kawen Kuang: https://kawen.art

Episodes

Distilling LLM Agent into Small Models with Retrieval and Code Tools

🤗 Upvotes: 49 | cs.CL, cs.AI
Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang
Title: Distilling LLM Agent into Small Models with Retrieval and Code Tools
Arxiv: http://arxiv.org/abs/2505.17612v1
Abstract: Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent works have focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of...

May 27, 2025 • 22 min • Ep. 801

QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization

🤗 Upvotes: 39 | cs.CL
Authors: Weizhou Shen, Chenliang Li, Fanqi Wan, Shengyi Liao, Shaopeng Lai, Bo Zhang, Yingcheng Shi, Yuning Wu, Gang Fu, Zhansheng Li, Bin Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan
Title: QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
Arxiv: http://arxiv.org/abs/2505.18092v1
Abstract: This technical report presents QwenLong-CPRS, a context compression framework designed for explicit long-context optimization, addressing prohibitive computati...

May 27, 2025 • 23 min • Ep. 800

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

🤗 Upvotes: 38 | cs.AI
Authors: Hui Shen, Taiqiang Wu, Qi Han, Yunta Hsieh, Jizhou Wang, Yuyue Zhang, Yuxin Cheng, Zijian Hao, Yuansheng Ni, Xin Wang, Zhongwei Wan, Kai Zhang, Wendong Xu, Jing Xiong, Ping Luo, Wenhu Chen, Chaofan Tao, Zhuoqing Mao, Ngai Wong
Title: PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
Arxiv: http://arxiv.org/abs/2505.15929v1
Abstract: Existing benchmarks fail to capture a crucial aspect of intelligence: physical reasoning, the integrated ability to combi...

May 27, 2025 • 22 min • Ep. 799

Scaling Image and Video Generation via Test-Time Evolutionary Search

🤗 Upvotes: 33 | cs.CV, cs.AI, cs.LG
Authors: Haoran He, Jiajun Liang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Ling Pan
Title: Scaling Image and Video Generation via Test-Time Evolutionary Search
Arxiv: http://arxiv.org/abs/2505.17618v1
Abstract: As the marginal cost of scaling computation (data and parameters) during model pre-training continues to increase substantially, test-time scaling (TTS) has emerged as a promising direction for improving generative model performance by allocating a...

May 27, 2025 • 25 min • Ep. 798

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

🤗 Upvotes: 25 | cs.CL, cs.AI, cs.CE
Authors: Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang
Title: MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
Arxiv: http://arxiv.org/abs/2505.17873v1
Abstract: Hypothesis ranking is a crucial component of automated scientific discovery, particularly in natural sciences where wet-lab experiments are costly and throughput-limited. Existi...

May 27, 2025 • 20 min • Ep. 797

NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

🤗 Upvotes: 86 | cs.AI, cs.CL, cs.CV
Authors: NovelSeek Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Zhiyin Yu, Xiaohan He, Songtao Huang, Shaowei Hou, Zheng Nie, Zhilong Wang, Jinyao Liu, Runmin Ma, Tianshuo Peng, Peng Ye, Dongzhan Zhou, Shufei Zhang, Xiaosong Wang, Yilan Zhang, Meng Li, Zhongying Tu, Xiangyu Yue, Wangli Ouyang, Bowen Zhou, Lei Bai
Title: NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
Arxiv: http://arxiv....

May 24, 2025 • 21 min • Ep. 796

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

🤗 Upvotes: 49 | cs.CL, cs.AI
Authors: Tingchen Fu, Jiawei Gu, Yafu Li, Xiaoye Qu, Yu Cheng
Title: Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Arxiv: http://arxiv.org/abs/2505.14810v1
Abstract: Instruction-following is essential for aligning large language models (LLMs) with user intent. While recent reasoning-oriented models exhibit impressive performance on complex mathematical problems, their ability to adhere to natural language instructions ...

May 24, 2025 • 24 min • Ep. 795

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

🤗 Upvotes: 43 | cs.CL, cs.AI, cs.LG
Authors: Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen
Title: Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Arxiv: http://arxiv.org/abs/2505.16410v1
Abstract: Recently, large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). However, leveraging the RL algorithm to empower effective mu...

May 24, 2025 • 22 min • Ep. 794

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

🤗 Upvotes: 37 | cs.CV, cs.AI, cs.CL
Authors: Alex Su, Haozhe Wang, Weimin Ren, Fangzhen Lin, Wenhu Chen
Title: Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Arxiv: http://arxiv.org/abs/2505.15966v1
Abstract: Chain-of-thought reasoning has significantly improved the performance of Large Language Models (LLMs) across various domains. However, this reasoning process has been confined exclusively to textual space, limiting its effectiveness in visu...

May 24, 2025 • 18 min • Ep. 793

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

🤗 Upvotes: 36 | cs.CV
Authors: Yongliang Wu, Zonghui Li, Xinting Hu, Xinyu Ye, Xianfang Zeng, Gang Yu, Wenbo Zhu, Bernt Schiele, Ming-Hsuan Yang, Xu Yang
Title: KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Arxiv: http://arxiv.org/abs/2505.16707v1
Abstract: Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based ...

May 24, 2025 • 21 min • Ep. 792

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

🤗 Upvotes: 31 | cs.CV, cs.AI
Authors: Benjamin Schneider, Dongfu Jiang, Chao Du, Tianyu Pang, Wenhu Chen
Title: QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Arxiv: http://arxiv.org/abs/2505.16175v1
Abstract: Long-video understanding has emerged as a crucial capability in real-world applications such as video surveillance, meeting summarization, educational lecture analysis, and sports broadcasting. However, it remains computationally prohibitive for VideoLLMs, ...

May 24, 2025 • 20 min • Ep. 791

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

🤗 Upvotes: 23 | cs.CV, cs.AI, cs.CL, cs.LG, cs.MM
Authors: Chengqi Duan, Rongyao Fang, Yuqing Wang, Kun Wang, Linjiang Huang, Xingyu Zeng, Hongsheng Li, Xihui Liu
Title: GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
Arxiv: http://arxiv.org/abs/2505.17022v1
Abstract: Visual generation models have made remarkable progress in creating realistic images from text prompts, yet struggle with complex prompts that specify multiple objects with precise ...

May 24, 2025 • 25 min • Ep. 790

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

🤗 Upvotes: 22 | cs.LG, cs.CL, cs.CV
Authors: Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, Chongxuan Li
Title: LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Arxiv: http://arxiv.org/abs/2505.16933v1
Abstract: In this work, we introduce LLaDA-V, a purely diffusion-based Multimodal Large Language Model (MLLM) that integrates visual instruction tuning with masked diffusion models, representing a departure from the autoregressive paradigms domi...

May 24, 2025 • 20 min • Ep. 789

Scaling Diffusion Transformers Efficiently via $\mu$P

🤗 Upvotes: 21 | cs.LG, cs.AI, cs.CV
Authors: Chenyu Zheng, Xinyu Zhang, Rongzhen Wang, Wei Huang, Zhi Tian, Weilin Huang, Jun Zhu, Chongxuan Li
Title: Scaling Diffusion Transformers Efficiently via $\mu$P
Arxiv: http://arxiv.org/abs/2505.15270v1
Abstract: Diffusion Transformers have emerged as the foundation for vision generative models, but their scalability is limited by the high cost of hyperparameter (HP) tuning at large scales. Recently, Maximal Update Parametrization ($\mu$P) was proposed f...

May 24, 2025 • 23 min • Ep. 788

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

🤗 Upvotes: 80 | cs.CL
Authors: Hyungjoo Chae, Sunghwan Kim, Junhee Cho, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, Jinyoung Yeo
Title: Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Arxiv: http://arxiv.org/abs/2505.15277v1
Abstract: Web navigation is a unique domain that can automate many repetitive ...

May 23, 2025 • 23 min • Ep. 787

MMaDA: Multimodal Large Diffusion Language Models

🤗 Upvotes: 56 | cs.CV
Authors: Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
Title: MMaDA: Multimodal Large Diffusion Language Models
Arxiv: http://arxiv.org/abs/2505.15809v1
Abstract: We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i...

May 23, 2025 • 21 min • Ep. 786

Scaling Law for Quantization-Aware Training

🤗 Upvotes: 56 | cs.LG, cs.CL
Authors: Mengzhao Chen, Chaoyi Zhang, Jing Liu, Yutao Zeng, Zeyue Xue, Zhiheng Liu, Yunshui Li, Jin Ma, Jie Huang, Xun Zhou, Ping Luo
Title: Scaling Law for Quantization-Aware Training
Arxiv: http://arxiv.org/abs/2505.14302v1
Abstract: Large language models (LLMs) demand substantial computational and memory resources, creating deployment challenges. Quantization-aware training (QAT) addresses these challenges by reducing model precision while maintaining performance...

May 23, 2025 • 20 min • Ep. 785

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

🤗 Upvotes: 43 | cs.CV
Authors: Sule Bai, Mingxing Li, Yong Liu, Jing Tang, Haoji Zhang, Lei Sun, Xiangxiang Chu, Yansong Tang
Title: UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Arxiv: http://arxiv.org/abs/2505.14231v1
Abstract: Traditional visual grounding methods primarily focus on single-image scenarios with simple textual references. However, extending these methods to real-world scenarios that involve implicit and complex instructions, particularly in c...

May 23, 2025 • 18 min • Ep. 784

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

🤗 Upvotes: 41 | cs.CL
Authors: Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao
Title: Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Arxiv: http://arxiv.org/abs/2505.15045v1
Abstract: Large language model (LLM)-based embedding models, benefiting from large scale pre-training and post-training, have begun to surpass BERT and T5-based models on general-purpose text embedding tasks such as document retrieval. However, a fundamental limitation ...

May 23, 2025 • 21 min • Ep. 783

Efficient Agent Training for Computer Use

🤗 Upvotes: 32 | cs.AI, cs.CL, cs.LG
Authors: Yanheng He, Jiahe Jin, Pengfei Liu
Title: Efficient Agent Training for Computer Use
Arxiv: http://arxiv.org/abs/2505.13909v1
Abstract: Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajecto...

May 23, 2025 • 23 min • Ep. 782

This Time is Different: An Observability Perspective on Time Series Foundation Models

🤗 Upvotes: 28 | cs.LG, cs.AI
Authors: Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ramé, Qiqi Ren, Afshin Rostamizadeh, Jean Ogier du Terrail, Anna-Monica Toon, Kan Wang, Stephan Xie, David Asker, Ameet Talwalkar, Othmane Abou-Amal
Title: This Time is Different: An Observability Perspective on Time Series Foundation Models
Arxiv: http://arxiv.org/abs/2505.14766v1
Abstract: We introduce Toto, a time series forecasting founda...

May 23, 2025 • 22 min • Ep. 781

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

🤗 Upvotes: 24 | cs.CL, cs.AI, cs.LG
Authors: Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He
Title: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Arxiv: http://arxiv.org/abs/2505.15612v1
Abstract: Large Reasoning Models (LRMs) have shown remarkable capabilities in solving complex problems through reinforcement learning (RL), particularly by generating long reasoning traces. However, these extended outputs often exhibit ...

May 23, 2025 • 18 min • Ep. 780

Emerging Properties in Unified Multimodal Pretraining

🤗 Upvotes: 87 | cs.CV
Authors: Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, Guang Shi, Haoqi Fan
Title: Emerging Properties in Unified Multimodal Pretraining
Arxiv: http://arxiv.org/abs/2505.14683v1
Abstract: Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multim...

May 22, 2025 • 23 min • Ep. 779

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

🤗 Upvotes: 48 | cs.LG, cs.AI, cs.AR, cs.CV, cs.PF
Authors: Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jun Zhu, Jianfei Chen
Title: SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Arxiv: http://arxiv.org/abs/2505.11594v1
Abstract: The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP...

May 22, 2025 • 21 min • Ep. 778

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

🤗 Upvotes: 30 | cs.LG, cs.AI, cs.CL
Authors: Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin
Title: Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Arxiv: http://arxiv.org/abs/2505.13438v1
Abstract: Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, su...

May 22, 2025 • 22 min • Ep. 777

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

🤗 Upvotes: 28 | cs.CV
Authors: Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, Kede Ma
Title: VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Arxiv: http://arxiv.org/abs/2505.14460v1
Abstract: DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been t...

May 22, 2025 • 21 min • Ep. 776

Visual Agentic Reinforcement Fine-Tuning

🤗 Upvotes: 26 | cs.CV, cs.AI
Authors: Ziyu Liu, Yuhang Zang, Yushan Zou, Zijian Liang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang
Title: Visual Agentic Reinforcement Fine-Tuning
Arxiv: http://arxiv.org/abs/2505.14246v1
Abstract: A key trend in Large Reasoning Models (e.g., OpenAI's o3) is the native agentic ability to use external tools such as web browsers for searching and writing/executing code for image manipulation to think with images. In the open-source research communi...

May 22, 2025 • 24 min • Ep. 775

Neurosymbolic Diffusion Models

🤗 Upvotes: 25 | cs.LG
Authors: Emile van Krieken, Pasquale Minervini, Edoardo Ponti, Antonio Vergari
Title: Neurosymbolic Diffusion Models
Arxiv: http://arxiv.org/abs/2505.13138v1
Abstract: Neurosymbolic (NeSy) predictors combine neural perception with symbolic reasoning to solve tasks like visual reasoning. However, standard NeSy predictors assume conditional independence between the symbols they extract, thus limiting their ability to model interactions and uncertainty - often leading to over...

May 22, 2025 • 24 min • Ep. 774

Chain-of-Model Learning for Language Model

🤗 Upvotes: 70 | cs.CL
Authors: Kaitao Song, Xiaohua Wang, Xu Tan, Huiqiang Jiang, Chengruidong Zhang, Yongliang Shen, Cen LU, Zihao Li, Zifan Song, Caihua Shan, Yansen Wang, Kan Ren, Xiaoqing Zheng, Tao Qin, Yuqing Yang, Dongsheng Li, Lili Qiu
Title: Chain-of-Model Learning for Language Model
Arxiv: http://arxiv.org/abs/2505.11820v1
Abstract: In this paper, we propose a novel learning paradigm, termed Chain-of-Model (CoM), which incorporates the causal relationship into the hidden states of eac...

May 21, 2025 • 24 min • Ep. 773

AdaptThink: Reasoning Models Can Learn When to Think

🤗 Upvotes: 58 | cs.CL, cs.AI, cs.LG
Authors: Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
Title: AdaptThink: Reasoning Models Can Learn When to Think
Arxiv: http://arxiv.org/abs/2505.13417v1
Abstract: Recently, large reasoning models have achieved impressive performance on various tasks by employing human-like deep thinking. However, the lengthy thinking process substantially increases inference overhead, making efficiency a critical bottleneck. In this work, we first demonstrate tha...

May 21, 2025 • 21 min • Ep. 772
Hosted on Transistor