Daily Paper Cast

Jingwen Liang, Gengyu Wang•dailypapercast.transistor.fm

We publish 10 episodes every day to discuss 10 AI research papers. Both the podcast scripts and the audio are generated by AI. The 10 papers are selected from the highest-voted ones on Hugging Face Daily Papers (https://huggingface.co/papers). Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creators:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, NLP, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art

Last refreshed: July 27th, 2025 at 9:35 AM

Episodes

LongRoPE2: Near-Lossless LLM Context Window Scaling

🤗 Upvotes: 21 | cs.CL
Authors: Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang
Title: LongRoPE2: Near-Lossless LLM Context Window Scaling
Arxiv: http://arxiv.org/abs/2502.20082v1
Abstract: LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length, while preserving the performance on the original shorter context window. This is achieved by three contributions: (1) ...

Mar 01, 2025•23 min•Ep. 621

FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

🤗 Upvotes: 19 | cs.CL
Authors: Guizhen Chen, Weiwen Xu, Hao Zhang, Hou Pong Chan, Chaoqun Liu, Lidong Bing, Deli Zhao, Anh Tuan Luu, Yu Rong
Title: FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Arxiv: http://arxiv.org/abs/2502.20238v1
Abstract: Many challenging reasoning tasks require not just rapid, intuitive responses, but a more deliberate, multi-step approach. Recent progress in large language models (LLMs) highlights an important shift fr...

Mar 01, 2025•26 min•Ep. 620

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

🤗 Upvotes: 15 | cs.CL, cs.AI, cs.SE
Authors: Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen
Title: CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
Arxiv: http://arxiv.org/abs/2502.16645v1
Abstract: Large Language Models (LLMs) have exhibited exceptional performance in software engineering yet face challenges in adapting to continually evolving code knowledge, particularly regarding the frequ...

Mar 01, 2025•22 min•Ep. 619

UniTok: A Unified Tokenizer for Visual Generation and Understanding

🤗 Upvotes: 15 | cs.CV, cs.AI
Authors: Chuofan Ma, Yi Jiang, Junfeng Wu, Jihan Yang, Xin Yu, Zehuan Yuan, Bingyue Peng, Xiaojuan Qi
Title: UniTok: A Unified Tokenizer for Visual Generation and Understanding
Arxiv: http://arxiv.org/abs/2502.20321v1
Abstract: The representation disparity between visual generation and understanding imposes a critical gap in integrating these capabilities into a single framework. To bridge this gap, we introduce UniTok, a discrete visual tokenizer that encodes fine-...

Mar 01, 2025•25 min•Ep. 618

NeoBERT: A Next-Generation BERT

🤗 Upvotes: 11 | cs.CL, cs.AI
Authors: Lola Le Breton, Quentin Fournier, Mariam El Mezouar, Sarath Chandar
Title: NeoBERT: A Next-Generation BERT
Arxiv: http://arxiv.org/abs/2502.19587v1
Abstract: Recent innovations in architecture, pre-training, and fine-tuning have led to the remarkable in-context learning and reasoning abilities of large auto-regressive language models such as LLaMA and DeepSeek. In contrast, encoders like BERT and RoBERTa have not seen the same level of progress despite bein...

Mar 01, 2025•24 min•Ep. 617

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

🤗 Upvotes: 9 | cs.LG, cs.AI
Authors: Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Zhixu Li, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang
Title: Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Arxiv: http://arxiv.org/abs/2502.16944v1
Abstract: Proximal Policy Optimization (PPO)-based Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human preferences. It requires joint training of an actor and ...

Mar 01, 2025•22 min•Ep. 616

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

🤗 Upvotes: 9 | cs.CV, cs.CL
Authors: Liang Chen, Shuai Bai, Wenhao Chai, Weichu Xie, Haozhe Zhao, Leon Vinci, Junyang Lin, Baobao Chang
Title: Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
Arxiv: http://arxiv.org/abs/2502.20172v1
Abstract: The field of advanced text-to-image generation is witnessing the emergence of unified frameworks that integrate powerful text encoders, such as CLIP and T5, with Diffusion Transformer backbon...

Mar 01, 2025•22 min•Ep. 615

GHOST 2.0: generative high-fidelity one shot transfer of heads

🤗 Upvotes: 49 | cs.CV
Authors: Alexander Groshev, Anastasiia Iashchenko, Pavel Paramonov, Denis Dimitrov, Andrey Kuznetsov
Title: GHOST 2.0: generative high-fidelity one shot transfer of heads
Arxiv: http://arxiv.org/abs/2502.18417v3
Abstract: While the task of face swapping has recently gained attention in the research community, a related problem of head swapping remains largely unexplored. In addition to skin color transfer, head swap poses extra challenges, such as the need to preserve stru...

Feb 28, 2025•19 min•Ep. 614

Kanana: Compute-efficient Bilingual Language Models

🤗 Upvotes: 47 | cs.CL, cs.LG
Authors: Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, Minchul Lee, Miok Lee, Shinbok Lee, Gaeun Seo
Title: Kanana: Compute-efficient Bilingual Language Models
Arxiv: http://arxiv.org/a...

Feb 28, 2025•22 min•Ep. 613

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

🤗 Upvotes: 32 | cs.AI, cs.CL, cs.CV, cs.MM
Authors: Max Ku, Thomas Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen
Title: TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Arxiv: http://arxiv.org/abs/2502.19400v1
Abstract: Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstra...

Feb 28, 2025•22 min•Ep. 612

Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance

🤗 Upvotes: 27 | cs.CL
Authors: Xueqing Peng, Triantafillos Papadopoulos, Efstathia Soufleri, Polydoros Giannouris, Ruoyu Xiang, Yan Wang, Lingfei Qian, Jimin Huang, Qianqian Xie, Sophia Ananiadou
Title: Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Arxiv: http://arxiv.org/abs/2502.18772v1
Abstract: Despite Greece's pivotal role in the global economy, large language models (LLMs) remain underexplored for Greek financial context due to the linguistic complexity of Greek...

Feb 28, 2025•25 min•Ep. 611

Language Models' Factuality Depends on the Language of Inquiry

🤗 Upvotes: 19 | cs.CL, cs.AI
Authors: Tushar Aggarwal, Kumar Tanmay, Ayush Agrawal, Kumar Ayush, Hamid Palangi, Paul Pu Liang
Title: Language Models' Factuality Depends on the Language of Inquiry
Arxiv: http://arxiv.org/abs/2502.17955v1
Abstract: Multilingual language models (LMs) are expected to recall factual knowledge consistently across languages, yet they often fail to transfer knowledge between languages even when they possess the correct information in one of the languages. For example, ...

Feb 28, 2025•22 min•Ep. 610

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

🤗 Upvotes: 16 | cs.CL
Authors: Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng
Title: Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Arxiv: http://arxiv.org/abs/2502.19361v2
Abstract: Recently, o1-like models have drawn significant attention, where these models produce the long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Langu...

Feb 28, 2025•24 min•Ep. 609

Towards an AI co-scientist

🤗 Upvotes: 15 | cs.AI, cs.CL, cs.HC, cs.LG, physics.soc-ph, q-bio.OT
Authors: Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, Khaled Saab, Dan Popovici, Jacob Blum, Fan Zhang, Katherine Chou, Avinatan Hassidim, Burak Gokturk, Amin Vahdat, Pushmeet Kohli, Yossi Matias, Andrew Carroll, Kavita Kulkarni, Nenad Tomasev, Yuan Guan, Vikram Dhillon, Eeshit Dhaval Vaishnav, Byron Lee, Tiago R D Costa...

Feb 28, 2025•25 min•Ep. 608

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

🤗 Upvotes: 15 | cs.CL, cs.AI
Authors: Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li
Title: Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Arxiv: http://arxiv.org/abs/2502.19328v1
Abstract: Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correct...

Feb 28, 2025•18 min•Ep. 607

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

🤗 Upvotes: 13 | cs.LG, cs.SE
Authors: Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu
Title: Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
Arxiv: http://arxiv.org/abs/2502.19414v1
Abstract: There is growing excitement about the potential of Language Models (LMs) to accelerate scientific discovery. Falsifying hypotheses is key to scientific progress, as it allows claims to be iteratively refined over t...

Feb 28, 2025•23 min•Ep. 606

Rank1: Test-Time Compute for Reranking in Information Retrieval

🤗 Upvotes: 11 | cs.IR, cs.CL, cs.LG
Authors: Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme
Title: Rank1: Test-Time Compute for Reranking in Information Retrieval
Arxiv: http://arxiv.org/abs/2502.18418v1
Abstract: We introduce Rank1, the first reranking model trained to take advantage of test-time compute. Rank1 demonstrates the applicability within retrieval of using a reasoning language model (i.e. OpenAI's o1, Deepseek's R1, etc.) for distillation in ...

Feb 28, 2025•20 min•Ep. 605

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

🤗 Upvotes: 122 | cs.CL, cs.AI, cs.LG
Authors: Deepak Nathani, Lovish Madaan, Nicholas Roberts, Nikolay Bashlykov, Ajay Menon, Vincent Moens, Amar Budhiraja, Despoina Magka, Vladislav Vorotilov, Gaurav Chaurasia, Dieuwke Hupkes, Ricardo Silveira Cabral, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach, William Yang Wang, Roberta Raileanu
Title: MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Arxiv: http://arxiv.org/abs/2502.14499v1
Abstract: We introduce Meta MLGym and MLGy...

Feb 22, 2025•26 min•Ep. 604

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

🤗 Upvotes: 82 | cs.CV, cs.AI
Authors: Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen, Andreas Steiner, Xiaohua Zhai
Title: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Arxiv: http://arxiv.org/abs/2502.14786v1
Abstract: We introduce SigLIP 2, a family of new multilingual vi...

Feb 22, 2025•25 min•Ep. 603

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

🤗 Upvotes: 81 | cs.CL
Authors: M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin, Zhenlin Wei, Chujie Zheng, Kaixing Deng, Shuyue Guo, Shian Jia, Sichao Jiang, Yiyan Liao, Rui Li, Qinrui Li, Sirun Li, Yizhi Li, Yunwen Li, Dehua Ma, Yuansheng Ni, Haoran Que, Qiyao Wang, Zhoufutu Wen, Siwei Wu, Tianshun Xing, Ming Xu, Zhenzhu Yang, Zekun Moore Wang, Junting Zhou, Yuelin Bai, Xingyuan Bu, Chenglin Cai, Liang Chen, Yifan Chen,...

Feb 22, 2025•24 min•Ep. 602

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

🤗 Upvotes: 51 | cs.CL
Authors: Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhail Salnikov
Title: How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Arxiv: http://arxiv.org/abs/2502.14502v1
Abstract: The performance of Large Language Models (LLMs) on many tasks is greatly limited by the knowledge learned during pre-training and stored in the model's parameters. Low-rank adaptation (LoRA) is a popular and effic...

Feb 22, 2025•23 min•Ep. 601

S*: Test Time Scaling for Code Generation

🤗 Upvotes: 39 | cs.LG, cs.AI
Authors: Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica
Title: S*: Test Time Scaling for Code Generation
Arxiv: http://arxiv.org/abs/2502.14382v1
Abstract: Increasing test-time compute for LLMs shows promise across domains but remains underexplored in code generation, despite extensive study in math. In this paper, we propose S*, the first hybrid test-time scaling framework that substantially i...

Feb 22, 2025•24 min•Ep. 600

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

🤗 Upvotes: 24 | cs.CL, cs.AI
Authors: Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo
Title: Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Arxiv: http://arxiv.org/abs/2502.14768v1
Abstract: Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data du...

Feb 22, 2025•25 min•Ep. 599

Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning

🤗 Upvotes: 22 | quant-ph, cs.AI, cs.IT, cs.LG, math.IT
Authors: Austin Yubo He, Zi-Wen Liu
Title: Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
Arxiv: http://arxiv.org/abs/2502.14372v1
Abstract: The realization of scalable fault-tolerant quantum computing is expected to hinge on quantum error-correcting codes. In the quest for more efficient quantum fault tolerance, a critical code parameter is the weight of measurements that extract informat...

Feb 22, 2025•21 min•Ep. 598

LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

🤗 Upvotes: 20 | cs.CV, cs.AI, cs.CL
Authors: Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li
Title: LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
Arxiv: http://arxiv.org/abs/2502.14834v1
Abstract: Existing Large Vision-Language Models (LVLMs) can process inputs with context lengths up to 128k visual and text tokens, yet they struggle to generate coherent outputs beyond 1,00...

Feb 22, 2025•20 min•Ep. 597

Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

🤗 Upvotes: 18 | cs.CL, cs.AI
Authors: Yein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong, Jaewoo Kang
Title: Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
Arxiv: http://arxiv.org/abs/2502.14258v1
Abstract: While the ability of language models to elicit facts has been widely investigated, how they handle temporally changing facts remains underexplored. We discover Temporal Heads, specific attention heads primarily responsible for processing ...

Feb 22, 2025•23 min•Ep. 596

S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

🤗 Upvotes: 15 | cs.CL, cs.LG
Authors: Ruotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li
Title: S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Arxiv: http://arxiv.org/abs/2502.12853v1
Abstract: Recent studies have demonstrated the effectiveness of LLM test-time scaling. However, existing approaches to incentivize LLMs' deep thinking abilities generally require large-scale data or significant training efforts. Mean...

Feb 22, 2025•23 min•Ep. 595

Qwen2.5-VL Technical Report

🤗 Upvotes: 97 | cs.CV, cs.CL
Authors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin
Title: Qwen2.5-VL Technical Report
Arxiv: http://arxiv.org/abs/2502.13923v1
Abstract: We introduce Qwen2.5-VL, the latest flagship model of...

Feb 21, 2025•21 min•Ep. 594

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

🤗 Upvotes: 31 | cs.CV, cs.RO
Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
Title: RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Arxiv: http://arxiv.org/abs/2502.13144v1
Abstract: Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such...

Feb 21, 2025•21 min•Ep. 593

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

🤗 Upvotes: 28 | cs.SD, cs.AI
Authors: Zihan Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
Title: SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Arxiv: http://arxiv.org/abs/2502.13128v1
Abstract: Text-to-song generation, the task of creating vocals and accompaniment from textual inputs, poses significant challenges due to domain complexity and data scarcity. Existing approaches often employ multi-stage...

Feb 21, 2025•24 min•Ep. 592
