Daily Paper Cast

Jingwen Liang, Gengyu Wang•dailypapercast.transistor.fm

We publish 10 episodes every day to discuss 10 AI research papers. Both the podcast scripts and audio are generated by AI. The 10 papers are selected from the highest-voted ones on Hugging Face Daily Papers (https://huggingface.co/papers). Feedback and suggestions are welcome!

Email us: dailypapercast.ai@gmail.com

Creators:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, NLP, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art

Last refreshed: July 27th, 2025 at 9:35 AM

Episodes

ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

🤗 Upvotes: 12 | cs.CL Authors: Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam Title: ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Arxiv: http://arxiv.org/abs/2502.04306v1 Abstract: Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods rema...

Feb 08, 2025•21 min•Ep. 501

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

🤗 Upvotes: 10 | eess.AS, cs.AI, cs.CL, cs.MM, cs.SD Authors: Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi DAI, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue Title: Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Arxiv: http://arxiv.org/abs/2502.04128v1 Abstract: Recent advances in text-based large language models (LLMs), parti...

Feb 08, 2025•23 min•Ep. 500

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

🤗 Upvotes: 90 | cs.CL Authors: Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf Title: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Arxiv: htt...

Feb 07, 2025•22 min•Ep. 499

TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets

🤗 Upvotes: 27 | cs.CE, cs.CY Authors: Yuzhe Yang, Yifei Zhang, Minghao Wu, Kaidi Zhang, Yunmiao Zhang, Honghai Yu, Yan Hu, Benyou Wang Title: TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets Arxiv: http://arxiv.org/abs/2502.01506v2 Abstract: The study of social emergence has long been a central focus in social science. Traditional modeling approaches, such as rule-based Agent-Based Models (ABMs), struggle to capture the diversity and complexity of human behavior, pa...

Feb 07, 2025•23 min•Ep. 498

Demystifying Long Chain-of-Thought Reasoning in LLMs

🤗 Upvotes: 26 | cs.CL, cs.LG Authors: Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue Title: Demystifying Long Chain-of-Thought Reasoning in LLMs Arxiv: http://arxiv.org/abs/2502.03373v1 Abstract: Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL) has emerged as a crucial method for developing these capabilities, yet the conditions un...

Feb 07, 2025•21 min•Ep. 497

LIMO: Less is More for Reasoning

🤗 Upvotes: 24 | cs.CL, cs.AI Authors: Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu Title: LIMO: Less is More for Reasoning Arxiv: http://arxiv.org/abs/2502.03387v1 Abstract: We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reason...

Feb 07, 2025•24 min•Ep. 496

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

🤗 Upvotes: 10 | cs.CL Authors: Jinyang Wu, Mingkuan Feng, Shuai Zhang, Ruihan Jin, Feihu Che, Zengqi Wen, Jianhua Tao Title: Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Arxiv: http://arxiv.org/abs/2502.02339v1 Abstract: Multimodal large language models (MLLMs) exhibit impressive capabilities but still face challenges in complex visual reasoning. While recent efforts attempt to enhance MLLMs' reasoning by incorporating OpenAI o1-like structured thinking through explicit...

Feb 07, 2025•22 min•Ep. 495

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

🤗 Upvotes: 7 | cs.CV Authors: Yiren Song, Danze Chen, Mike Zheng Shou Title: LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Arxiv: http://arxiv.org/abs/2502.01105v1 Abstract: Generating cognitive-aligned layered SVGs remains challenging due to existing methods' tendencies toward either oversimplified single-layer outputs or optimization-induced shape redundancies. We propose LayerTracer, a diffusion transformer based framework that bridges this gap by learning de...

Feb 07, 2025•26 min•Ep. 494

On Teacher Hacking in Language Model Distillation

🤗 Upvotes: 6 | cs.LG, cs.AI, cs.CL, stat.ML Authors: Daniil Tiapkin, Daniele Calandriello, Johan Ferret, Sarah Perrin, Nino Vieillard, Alexandre Ramé, Mathieu Blondel Title: On Teacher Hacking in Language Model Distillation Arxiv: http://arxiv.org/abs/2502.02671v1 Abstract: Post-training of language models (LMs) increasingly relies on the following two stages: (i) knowledge distillation, where the LM is trained to imitate a larger teacher LM, and (ii) reinforcement learning from human feedback ...

Feb 07, 2025•21 min•Ep. 493

A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

🤗 Upvotes: 5 | cs.LG, cs.AI Authors: Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava Title: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Arxiv: http://arxiv.org/abs/2502.01618v2 Abstract: Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computat...

Feb 07, 2025•24 min•Ep. 492

Jailbreaking with Universal Multi-Prompts

🤗 Upvotes: 4 | cs.CL, cs.AI, cs.CR, cs.LG Authors: Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen Title: Jailbreaking with Universal Multi-Prompts Arxiv: http://arxiv.org/abs/2502.01154v1 Abstract: Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. While most p...

Feb 07, 2025•20 min•Ep. 491

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

🤗 Upvotes: 29 | cs.CV Authors: Hila Chefer, Uriel Singer, Amit Zohar, Yuval Kirstain, Adam Polyak, Yaniv Taigman, Lior Wolf, Shelly Sheynin Title: VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Arxiv: http://arxiv.org/abs/2502.02492v1 Abstract: Despite tremendous recent progress, generative video models still struggle to capture real-world motion, dynamics, and physics. We show that this limitation arises from the conventional pixel reconstructi...

Feb 06, 2025•19 min•Ep. 490

Inverse Bridge Matching Distillation

🤗 Upvotes: 22 | cs.LG, cs.CV Authors: Nikita Gushchin, David Li, Daniil Selikhanovych, Evgeny Burnaev, Dmitry Baranchuk, Alexander Korotin Title: Inverse Bridge Matching Distillation Arxiv: http://arxiv.org/abs/2502.01362v1 Abstract: Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBM...

Feb 06, 2025•20 min•Ep. 489

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

🤗 Upvotes: 16 | cs.SE, cs.AI, cs.CL Authors: Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen Title: ACECODER: Acing Coder RL via Automated Test-Case Synthesis Arxiv: http://arxiv.org/abs/2502.01718v1 Abstract: Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data/model in the code domain. In this paper, we add...

Feb 06, 2025•20 min•Ep. 488

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

🤗 Upvotes: 12 | cs.LG, cs.AI Authors: Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang Title: QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search Arxiv: http://arxiv.org/abs/2502.02584v1 Abstract: Language agents have become a promising solution to complex interactive tasks. One of the key ingredients to the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training ...

Feb 06, 2025•18 min•Ep. 487

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

🤗 Upvotes: 12 | cs.CL, cs.AI Authors: Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan Title: Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Arxiv: http://arxiv.org/abs/2502.02508v1 Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time comput...

Feb 06, 2025•24 min•Ep. 486

Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?

🤗 Upvotes: 7 | cs.CL, cs.LG Authors: Wenzhe Li, Yong Lin, Mengzhou Xia, Chi Jin Title: Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? Arxiv: http://arxiv.org/abs/2502.00674v1 Abstract: Ensembling outputs from diverse sources is a straightforward yet effective approach to boost performance. Mixture-of-Agents (MoA) is one such popular ensemble method that aggregates outputs from multiple different Large Language Models (LLMs). This paper raises the question in...

Feb 06, 2025•24 min•Ep. 485

COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation

🤗 Upvotes: 7 | cs.CV Authors: Xueqing Deng, Qihang Yu, Ali Athar, Chenglin Yang, Linjie Yang, Xiaojie Jin, Xiaohui Shen, Liang-Chieh Chen Title: COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation Arxiv: http://arxiv.org/abs/2502.02589v1 Abstract: This paper introduces the COCONut-PanCap dataset, created to enhance panoptic segmentation and grounded image captioning. Building upon the COCO dataset with advanced COCONut panoptic masks, ...

Feb 06, 2025•25 min•Ep. 484

The Differences Between Direct Alignment Algorithms are a Blur

🤗 Upvotes: 84 | cs.LG Authors: Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daniil Gavrilov Title: The Differences Between Direct Alignment Algorithms are a Blur Arxiv: http://arxiv.org/abs/2502.01237v1 Abstract: Direct Alignment Algorithms (DAAs) simplify language model alignment by replacing reinforcement learning (RL) and reward modeling (RM) in Reinforcement Learning from Human Feedback (RLHF) with direct policy optimization. DAAs can be classified by their ran...

Feb 05, 2025•20 min•Ep. 483

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

🤗 Upvotes: 83 | cs.CV Authors: Gaojie Lin, Jianwen Jiang, Jiaqi Yang, Zerong Zheng, Chao Liang Title: OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Arxiv: http://arxiv.org/abs/2502.01061v1 Abstract: End-to-end human animation, such as audio-driven talking human generation, has undergone notable advancements in the recent few years. However, existing methods still struggle to scale up as large general video generation models, limiting their potential in r...

Feb 05, 2025•26 min•Ep. 482

Process Reinforcement through Implicit Rewards

🤗 Upvotes: 44 | cs.LG, cs.AI, cs.CL Authors: Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding Title: Process Reinforcement through Implicit Rewards Arxiv: http://arxiv.org/abs/2502.01456v1 Abstract: Dense process rewards have proven a more effective alternative to the sparse outcome...

Feb 05, 2025•22 min•Ep. 481

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

🤗 Upvotes: 25 | cs.CL Authors: Ahmed Masry, Juan A. Rodriguez, Tianyu Zhang, Suyuchen Wang, Chao Wang, Aarash Feizi, Akshay Kalkunte Suresh, Abhay Puri, Xiangru Jian, Pierre-André Noël, Sathwik Tejaswi Madhusudhan, Marco Pedersoli, Bang Liu, Nicolas Chapados, Yoshua Bengio, Enamul Hoque, Christopher Pal, Issam H. Laradji, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar Title: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Arxiv: http://arxiv.org/...

Feb 05, 2025•23 min•Ep. 480

SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

🤗 Upvotes: 25 | cs.CR, cs.AI, cs.IR Authors: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang Title: SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Arxiv: http://arxiv.org/abs/2501.18636v1 Abstract: The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating extern...

Feb 05, 2025•24 min•Ep. 479

Preference Leakage: A Contamination Problem in LLM-as-a-judge

🤗 Upvotes: 25 | cs.LG, cs.AI, cs.CL Authors: Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu Title: Preference Leakage: A Contamination Problem in LLM-as-a-judge Arxiv: http://arxiv.org/abs/2502.01534v1 Abstract: Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhances the efficiency of mod...

Feb 05, 2025•22 min•Ep. 478

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

🤗 Upvotes: 19 | cs.CV, cs.GR, cs.LG Authors: Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin Title: SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Arxiv: http://arxiv.org/abs/2502.01639v1 Abstract: We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for eac...

Feb 05, 2025•25 min•Ep. 477

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

🤗 Upvotes: 15 | cs.AI, cs.CV Authors: Huanqia Cai, Yijun Yang, Winston Hu Title: MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models Arxiv: http://arxiv.org/abs/2502.00698v1 Abstract: IQ testing has served as a foundational methodology for evaluating human cognitive capabilities, deliberately decoupling assessment from linguistic background, language proficiency, or domain-specific knowledge to isolate core competencies in abstraction and reasoning. Yet, artificial int...

Feb 05, 2025•25 min•Ep. 476

AIN: The Arabic INclusive Large Multimodal Model

🤗 Upvotes: 12 | cs.CV, cs.AI, cs.CL, cs.HC, cs.LG Authors: Ahmed Heakl, Sara Ghaboura, Omkar Thawkar, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan Title: AIN: The Arabic INclusive Large Multimodal Model Arxiv: http://arxiv.org/abs/2502.00094v1 Abstract: Amid the swift progress of large language models (LLMs) and their evolution into large multimodal models (LMMs), significant strides have been made in high-resource languages such as English and Chinese. While Arabic LLM...

Feb 05, 2025•21 min•Ep. 475

s1: Simple test-time scaling

🤗 Upvotes: 54 | cs.CL, cs.AI, cs.LG Authors: Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto Title: s1: Simple test-time scaling Arxiv: http://arxiv.org/abs/2501.19393v1 Abstract: Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its...

Feb 04, 2025•23 min•Ep. 474

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

🤗 Upvotes: 28 | cs.CL, cs.AI Authors: Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong Title: Reward-Guided Speculative Decoding for Efficient LLM Reasoning Arxiv: http://arxiv.org/abs/2501.19324v1 Abstract: We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target...

Feb 04, 2025•22 min•Ep. 473

Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models

🤗 Upvotes: 12 | cs.CL, cs.AI Authors: Qika Lin, Tianzhe Zhao, Kai He, Zhen Peng, Fangzhi Xu, Ling Huang, Jingying Ma, Mengling Feng Title: Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Arxiv: http://arxiv.org/abs/2501.18119v1 Abstract: Due to the presence of the natural gap between Knowledge Graph (KG) structures and the natural language, the effective integration of holistic structural information of KGs with Large Language Mode...

Feb 04, 2025•21 min•Ep. 472

Hosted on Transistor
