🤗 Upvotes: 27 | cs.CL, cs.LG Authors: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger Title: Tina: Tiny Reasoning Models via LoRA Arxiv: http://arxiv.org/abs/2504.15777v1 Abstract: How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-efficiency. Notably, Tina demonstrates that substantial reasoning perfor...
Apr 25, 2025 • 23 min • Ep. 711
🤗 Upvotes: 24 | cs.LG, cs.AI, cs.CV, cs.IT, math.IT Authors: Shaden Alshammari, John Hershey, Axel Feldmann, William T. Freeman, Mark Hamilton Title: I-Con: A Unifying Framework for Representation Learning Arxiv: http://arxiv.org/abs/2504.16929v1 Abstract: As the field of representation learning grows, there has been a proliferation of different loss functions to solve different classes of problems. We introduce a single information-theoretic equation that generalizes a large collection of mode...
Apr 25, 2025 • 21 min • Ep. 710
🤗 Upvotes: 94 | cs.CL, cs.AI Authors: Khalil Hennara, Sara Chrouf, Mohamed Motaism Hamed, Zeina Aldallal, Omar Hadid, Safwan AlModhayan Title: Kuwain 1.5B: An Arabic SLM via Language Injection Arxiv: http://arxiv.org/abs/2504.15120v1 Abstract: Enhancing existing models with new knowledge is a crucial aspect of AI development. This paper introduces a novel method for integrating a new language into a large language model (LLM). Our approach successfully incorporates a previously unseen target la...
Apr 24, 2025 • 20 min • Ep. 709
🤗 Upvotes: 60 | cs.CL, cs.LG Authors: Yuxin Zuo, Kaiyan Zhang, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, Bowen Zhou Title: TTRL: Test-Time Reinforcement Learning Arxiv: http://arxiv.org/abs/2504.16084v1 Abstract: This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference while not having access to ground-truth i...
Apr 24, 2025 • 25 min • Ep. 708
🤗 Upvotes: 51 | cs.CL Authors: Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang Title: The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Arxiv: http://arxiv.org/abs/2504.15521v1 Abstract: As large language models (LLMs) continue to advance in linguistic capabilities, robust multilingual evaluation has become essential for promoting equitable technological progress. This position paper examines over 2,000 mul...
Apr 24, 2025 • 22 min • Ep. 707
🤗 Upvotes: 42 | cs.CV, cs.AI Authors: Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui Title: Describe Anything: Detailed Localized Image and Video Captioning Arxiv: http://arxiv.org/abs/2504.16072v1 Abstract: Generating detailed and accurate descriptions for specific regions in images and videos remains a fundamental challenge for vision-language models. We introduce the Describe Anything Model (DAM), a model designe...
Apr 24, 2025 • 24 min • Ep. 706
🤗 Upvotes: 35 | cs.AI, cs.CL Authors: Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr Title: Learning Adaptive Parallel Reasoning with Language Models Arxiv: http://arxiv.org/abs/2504.15466v1 Abstract: Scaling inference-time computation has substantially improved the reasoning capabilities of language models. However, existing methods have significant limitations: serialized chain-of-thought approaches generate overly long outputs, ...
Apr 24, 2025 • 21 min • Ep. 705
🤗 Upvotes: 59 | cs.LG, cs.AI, cs.CL Authors: Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang Title: Learning to Reason under Off-Policy Guidance Arxiv: http://arxiv.org/abs/2504.14945v2 Abstract: Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning (RL) with simple rule-based rewards. However, existing zero-RL approaches are inherently `...
Apr 23, 2025 • 22 min • Ep. 704
🤗 Upvotes: 50 | cs.CV Authors: Guo Chen, Zhiqi Li, Shihao Wang, Jindong Jiang, Yicheng Liu, Lidong Lu, De-An Huang, Wonmin Byeon, Matthieu Le, Tuomas Rintamaki, Tyler Poon, Max Ehrlich, Tong Lu, Limin Wang, Bryan Catanzaro, Jan Kautz, Andrew Tao, Zhiding Yu, Guilin Liu Title: Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Arxiv: http://arxiv.org/abs/2504.15271v1 Abstract: We introduce Eagle 2.5, a family of frontier vision-langua...
Apr 23, 2025 • 20 min • Ep. 703
🤗 Upvotes: 36 | cs.AI Authors: Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang Title: FlowReasoner: Reinforcing Query-Level Meta-Agents Arxiv: http://arxiv.org/abs/2504.15257v1 Abstract: This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretel...
Apr 23, 2025 • 18 min • Ep. 702
🤗 Upvotes: 33 | cs.LG, cs.AI, cs.CL Authors: Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji Title: ToolRL: Reward is All Tool Learning Needs Arxiv: http://arxiv.org/abs/2504.13958v1 Abstract: Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement learning (RL), parti...
Apr 23, 2025 • 24 min • Ep. 701
🤗 Upvotes: 25 | cs.CR, cs.AI, cs.CL, cs.LG, cs.MA Authors: Salman Rahman, Liwei Jiang, James Shiffer, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel Title: X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Arxiv: http://arxiv.org/abs/2504.13203v1 Abstract: Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet, the vast major...
Apr 23, 2025 • 21 min • Ep. 700
🤗 Upvotes: 21 | cs.CV Authors: Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, Shengqi Liu, Yiying Yang, Xianfang Zeng, Gang Yu, Ming Li Title: StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians Arxiv: http://arxiv.org/abs/2504.15281v1 Abstract: 3D Gaussian Splatting (3DGS) excels in photorealistic scene reconstruction but struggles with stylized scenarios (e.g., cartoons, games) due to fragmented textures, semantic misalignment, and limited a...
Apr 23, 2025 • 23 min • Ep. 699
🤗 Upvotes: 64 | cs.AI, cs.CL, cs.CV Authors: Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang Title: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Arxiv: http://arxiv.org/abs/2504.13837v1 Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning capabilities of LLMs, particularly in mathematics and programming tasks. It is widely b...
Apr 22, 2025 • 22 min • Ep. 698
🤗 Upvotes: 31 | cs.CL, cs.AI Authors: Yicheng Chen, Yining Li, Kai Hu, Zerun Ma, Haochen Ye, Kai Chen Title: MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Arxiv: http://arxiv.org/abs/2504.13835v1 Abstract: Data quality and diversity are key to the construction of effective instruction-tuning datasets. With the increasing availability of open-source instruction-tuning datasets, it is advantageous to automatically select high-quality and d...
Apr 22, 2025 • 19 min • Ep. 697
🤗 Upvotes: 30 | cs.AI Authors: Tianyang Xu, Haojie Zheng, Chengze Li, Haoxiang Chen, Yixin Liu, Ruoxi Chen, Lichao Sun Title: NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Arxiv: http://arxiv.org/abs/2504.11544v1 Abstract: Retrieval-augmented generation (RAG) empowers large language models to access external and private corpus, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich t...
Apr 22, 2025 • 21 min • Ep. 696
🤗 Upvotes: 69 | cs.CL Authors: Shizhe Diao, Yu Yang, Yonggan Fu, Xin Dong, Dan Su, Markus Kliegl, Zijia Chen, Peter Belcak, Yoshi Suhara, Hongxu Yin, Mostofa Patwary, Yingyan Lin, Jan Kautz, Pavlo Molchanov Title: CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Arxiv: http://arxiv.org/abs/2504.13161v1 Abstract: Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like C...
Apr 19, 2025 • 23 min • Ep. 695
🤗 Upvotes: 52 | cs.AI, cs.CL Authors: Yash Savani, Asher Trockman, Zhili Feng, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter Title: Antidistillation Sampling Arxiv: http://arxiv.org/abs/2504.13146v1 Abstract: Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without comprom...
Apr 19, 2025 • 18 min • Ep. 694
🤗 Upvotes: 28 | cs.CV Authors: Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan Title: Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Arxiv: http://arxiv.org/abs/2504.13169v1 Abstract: Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-c...
Apr 19, 2025 • 20 min • Ep. 693
🤗 Upvotes: 24 | cs.CV Authors: Lvmin Zhang, Maneesh Agrawala Title: Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Arxiv: http://arxiv.org/abs/2504.12626v1 Abstract: We present a neural network structure, FramePack, to train next-frame (or next-frame-section) prediction models for video generation. The FramePack compresses input frames to make the transformer context length a fixed number regardless of the video length. As a result, we are able to process a lar...
Apr 19, 2025 • 24 min • Ep. 692
🤗 Upvotes: 23 | cs.CV Authors: Zeqi Xiao, Yushi Lan, Yifan Zhou, Wenqi Ouyang, Shuai Yang, Yanhong Zeng, Xingang Pan Title: WORLDMEM: Long-term Consistent World Simulation with Memory Arxiv: http://arxiv.org/abs/2504.12369v1 Abstract: World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in p...
Apr 19, 2025 • 22 min • Ep. 691
🤗 Upvotes: 23 | cs.CL, cs.AI, cs.LG Authors: Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Conghui He, Lijun Wu Title: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Arxiv: http://arxiv.org/abs/2504.12322v1 Abstract: While data synthesis and distillation are promising strategies to enhance small language models, current approaches heavily rely on Large Language Models (LLMs), which suffer from high computational costs, environmental ineffic...
Apr 19, 2025 • 24 min • Ep. 690
🤗 Upvotes: 35 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Yijun Liang, Ming Li, Chenrui Fan, Ziyue Li, Dang Nguyen, Kwesi Cobbina, Shweta Bhardwaj, Jiuhai Chen, Fuxiao Liu, Tianyi Zhou Title: ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Arxiv: http://arxiv.org/abs/2504.10514v1 Abstract: Color plays an important role in human perception and usually provides critical clues in visual reasoning. However, it is unclea...
Apr 18, 2025 • 22 min • Ep. 689
🤗 Upvotes: 35 | cs.CL, cs.LG Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei Title: BitNet b1.58 2B4T Technical Report Arxiv: http://arxiv.org/abs/2504.12285v1 Abstract: We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical...
Apr 18, 2025 • 20 min • Ep. 688
🤗 Upvotes: 27 | cs.CL, cs.AI Authors: Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, Wanjun Zhong Title: ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Arxiv: http://arxiv.org/abs/2504.11536v2 Abstract: While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL), excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computat...
Apr 18, 2025 • 23 min • Ep. 687
🤗 Upvotes: 63 | cs.CL Authors: Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li Title: xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Arxiv: http://arxiv.org/abs/2504.10481v1 Abstract: With the release of the o1 model by OpenAI, reasoning models adopting slow thinking strategies have gradually emerged. As the responses generated by such models often include complex reasoning, intermediate steps, and self-reflection...
Apr 17, 2025 • 22 min • Ep. 686
🤗 Upvotes: 41 | cs.CL, cs.AI, cs.LG Authors: Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Qiushi Sun, Kanzhi Cheng, Junxian He, Jun Liu, Zhiyong Wu Title: Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Arxiv: http://arxiv.org/abs/2504.08672v1 Abstract: Advancing LLM reasoning skills has captivated wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, w...
Apr 17, 2025 • 20 min • Ep. 685
🤗 Upvotes: 30 | cs.LG, cs.AI, cs.CL Authors: Ming Li, Yanhong Li, Ziyue Li, Tianyi Zhou Title: How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Arxiv: http://arxiv.org/abs/2504.10766v1 Abstract: As the post-training of large language models (LLMs) advances from instruction-following to complex reasoning tasks, understanding how different data affect finetuning dynamics remains largely unexplored. In this paper, we present a spectral a...
Apr 17, 2025 • 22 min • Ep. 684
🤗 Upvotes: 28 | cs.AI, I.2.7 Authors: Wenlei Shi, Xing Jin Title: Heimdall: test-time scaling on the generative verification Arxiv: http://arxiv.org/abs/2504.10337v2 Abstract: An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself. Recent work on long Chain-of-Thought reasoning has demonstrated great potential of LLMs on solving competitive problems, but their verification ability remains to be weak and not sufficiently investigated. In this p...
Apr 17, 2025 • 20 min • Ep. 683
🤗 Upvotes: 23 | cs.CV Authors: Tao Zhang, Xiangtai Li, Zilong Huang, Yanwei Li, Weixian Lei, Xueqing Deng, Shihao Chen, Shunping Ji, Jiashi Feng Title: Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Arxiv: http://arxiv.org/abs/2504.10465v1 Abstract: Multimodal Large Language Models (MLLMs) achieve remarkable performance for fine-grained pixel-level understanding tasks. However, all the works rely heavily on extra components, such as vision encoder (CLIP), segmentation experts, ...
Apr 17, 2025 • 21 min • Ep. 682