🤗 Upvotes: 43 | cs.CL, cs.GR, cs.MA Authors: Zhenran Xu, Longyue Wang, Jifang Wang, Zhouyi Li, Senbao Shi, Xue Yang, Yiyu Wang, Baotian Hu, Jun Yu, Min Zhang Title: FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Arxiv: http://arxiv.org/abs/2501.12909v1 Abstract: Virtual film production requires intricate decision-making processes, including scriptwriting, virtual cinematography, and precise actor positioning and actions. Motivated by recent advances in au...
Jan 24, 2025•25 min•Ep. 411
🤗 Upvotes: 42 | cs.CL Authors: Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng Title: Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Arxiv: http://arxiv.org/abs/2501.12895v1 Abstract: Large language models (LLMs) demonstrate impressive performance but lack the flexibility to adapt to human preferences quickly without retraining. In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns LLM outputs with human preference...
Jan 24, 2025•23 min•Ep. 410
🤗 Upvotes: 39 | cs.AI, cs.LG Authors: Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Z...
Jan 24, 2025•19 min•Ep. 409
🤗 Upvotes: 31 | cs.CL, cs.AI, cs.LG Authors: Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan Title: Autonomy-of-Experts Models Arxiv: http://arxiv.org/abs/2501.13074v1 Abstract: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet...
Jan 24, 2025•20 min•Ep. 408
🤗 Upvotes: 13 | cs.CL Authors: Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao Title: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Arxiv: http://arxiv.org/abs/2501.12570v1 Abstract: Recently, long-thought reasoning LLMs, such as OpenAI's O1, adopt extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities a...
Jan 24, 2025•23 min•Ep. 407
🤗 Upvotes: 13 | cs.CL Authors: Yantao Liu, Zijun Yao, Rui Min, Yixin Cao, Lei Hou, Juanzi Li Title: Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament Arxiv: http://arxiv.org/abs/2501.13007v1 Abstract: Best-of-N (BoN) sampling, a common strategy for test-time scaling of Large Language Models (LLMs), relies on reward models to select the best candidate solution from multiple generations. However, traditional reward models often assign arbitrary and inconsistent scores, limiting the...
Jan 24, 2025•22 min•Ep. 406
🤗 Upvotes: 7 | cs.CL, cs.AI, cs.LG Authors: Elad Levi, Ilan Kadar Title: IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems Arxiv: http://arxiv.org/abs/2501.11067v1 Abstract: Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, ...
Jan 24, 2025•25 min•Ep. 405
🤗 Upvotes: 3 | cs.CV, cs.AI, cs.GR, cs.RO Authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli Title: Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Arxiv: http://arxiv.org/abs/2501.13928v1 Abstract: Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading met...
Jan 24, 2025•22 min•Ep. 404
🤗 Upvotes: 61 | cs.AI Authors: Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, Jiecao Chen Title: Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Arxiv: http://arxiv.org/abs/2501.11425v1 Abstract: Large Language Models (LLMs) agents are increasingly pivotal for addressing complex tasks in interactive environments. Existing work mainly focuses on enhancing performance through behavior cloning from stronger experts, yet such approaches often falter in rea...
Jan 23, 2025•21 min•Ep. 403
🤗 Upvotes: 59 | cs.CV, cs.AI, cs.CL Authors: Yilun Zhao, Lujing Xie, Haowei Zhang, Guo Gan, Yitao Long, Zhiyuan Hu, Tongyan Hu, Weiyuan Chen, Chuhan Li, Junyang Song, Zhijian Xu, Chengye Wang, Weifeng Pan, Ziyao Shangguan, Xiangru Tang, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan Title: MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Arxiv: http://arxiv.org/abs/2501.12380v1 Abstract: We introduce MMVU, a comprehensive expert-level, multi-discipline benchmark for evaluatin...
Jan 23, 2025•25 min•Ep. 402
🤗 Upvotes: 51 | cs.LG, cs.CL Authors: Zihan Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin Title: Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Arxiv: http://arxiv.org/abs/2501.11873v1 Abstract: This paper revisits the implementation of Load-balancing Loss (LBL) when training Mixture-of-Experts (MoEs) models. Specifically, LBL for MoEs i...
Jan 23, 2025•24 min•Ep. 401
🤗 Upvotes: 32 | cs.CV Authors: Daniel Garibi, Shahar Yadin, Roni Paiss, Omer Tov, Shiran Zada, Ariel Ephrat, Tomer Michaeli, Inbar Mosseri, Tali Dekel Title: TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Arxiv: http://arxiv.org/abs/2501.12224v1 Abstract: We present TokenVerse -- a method for multi-concept personalization, leveraging a pre-trained text-to-image diffusion model. Our framework can disentangle complex visual elements and attributes from as little as ...
Jan 23, 2025•26 min•Ep. 400
🤗 Upvotes: 31 | cs.AI, cs.CL, cs.CV, cs.HC Authors: Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi Title: UI-TARS: Pioneering Automat...
Jan 23, 2025•21 min•Ep. 399
🤗 Upvotes: 26 | cs.CV, cs.CL Authors: Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Ziyu Liu, Shengyuan Ding, Shenxi Wu, Yubo Ma, Haodong Duan, Wenwei Zhang, Kai Chen, Dahua Lin, Jiaqi Wang Title: InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Arxiv: http://arxiv.org/abs/2501.12368v1 Abstract: Despite the promising performance of Large Vision Language Models (LVLMs) in visual understanding, they occasionally generate incorrect outputs. While reward models (RMs)...
Jan 23, 2025•21 min•Ep. 398
🤗 Upvotes: 20 | cs.CL, cs.CV Authors: Zhenhailong Wang, Haiyang Xu, Junyang Wang, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Heng Ji Title: Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Arxiv: http://arxiv.org/abs/2501.11733v1 Abstract: Smartphones have become indispensable in modern life, yet navigating complex tasks on mobile devices often remains frustrating. Recent advancements in large multimodal model (LMM)-based mobile agents have demonstrated the ability to perceive and...
Jan 23, 2025•23 min•Ep. 397
🤗 Upvotes: 18 | cs.AI, cs.CL Authors: Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler Title: Reasoning Language Models: A Blueprint Arxiv: http://arxiv.org/abs/2501.11223v2 Abstract: Reasoning language models (RLMs), also known as Large Reasoning Models (LRMs), s...
Jan 23, 2025•22 min•Ep. 396
🤗 Upvotes: 16 | cs.CV Authors: Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, Huiwen Shi, Sicong Liu, Junta Wu, Yihang Lian, Fan Yang, Ruining Tang, Zebin He, Xinzhou Wang, Jian Liu, Xuhui Zuo, Zhuo Chen, Biwen Lei, Haohan Weng, Jing Xu, Yiling Zhu, Xinhai Liu, Lixin Xu, Changrong Hu, Tianyu Huang, Lifu Wang, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Yulin Cai, Jiaao Yu, Yixuan Tang, Hao Zhang, Zheng Ye, P...
Jan 23, 2025•21 min•Ep. 395
🤗 Upvotes: 15 | cs.LG, cs.AI Authors: Hongjin Su, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan Ö. Arık Title: Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Arxiv: http://arxiv.org/abs/2501.10893v1 Abstract: Autonomous agents powered by large language models (LLMs) have the potential to enhance human capabilities, assisting with digital tasks from sending emails to performing data analysis. The abilities of existing LLMs at such tasks ar...
Jan 23, 2025•19 min•Ep. 394
🤗 Upvotes: 48 | cs.CV Authors: Jiwen Yu, Yiran Qin, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu Title: GameFactory: Creating New Games with Generative Interactive Videos Arxiv: http://arxiv.org/abs/2501.08325v1 Abstract: Generative game engines have the potential to revolutionize game development by autonomously creating new content and reducing manual workload. However, existing video-based game generation methods fail to address the critical challenge of scene generalization, limiting their...
Jan 22, 2025•23 min•Ep. 393
🤗 Upvotes: 8 | cs.CV Authors: Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin Title: VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Arxiv: http://arxiv.org/abs/2501.09781v1 Abstract: This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs). We develop VideoWorld, an auto-regressive video generation model tra...
Jan 22, 2025•19 min•Ep. 392
🤗 Upvotes: 2 | cs.AI, cs.CR Authors: Giyeong Oh, Saejin Kim, Woohyun Cho, Sangkyu Lee, Jiwan Chung, Dokyung Song, Youngjae Yu Title: SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Arxiv: http://arxiv.org/abs/2501.09284v2 Abstract: Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, the issue of copyright protection for LoRA weights, especiall...
Jan 22, 2025•22 min•Ep. 391
🤗 Upvotes: 53 | cs.CL, cs.AI, cs.LG Authors: Zhenru Zhang, Chujie Zheng, Yangzhen Wu, Beichen Zhang, Runji Lin, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin Title: The Lessons of Developing Process Reward Models in Mathematical Reasoning Arxiv: http://arxiv.org/abs/2501.07301v1 Abstract: Process Reward Models (PRMs) emerge as a promising approach for process supervision in mathematical reasoning of Large Language Models (LLMs), which aim to identify and mitigate intermediate errors in the ...
Jan 15, 2025•19 min•Ep. 390
🤗 Upvotes: 38 | cs.CL, cs.AI, cs.LG Authors: Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao Title: Tensor Product Attention Is All You Need Arxiv: http://arxiv.org/abs/2501.06425v1 Abstract: Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that use...
Jan 15, 2025•21 min•Ep. 389
🤗 Upvotes: 25 | cs.LG, cs.AI, cs.CL Authors: Qi Sun, Edoardo Cetin, Yujin Tang Title: Transformer²: Self-adaptive LLMs Arxiv: http://arxiv.org/abs/2501.06252v2 Abstract: Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce Transformer², a novel self-adaptation framework that adapts LLMs for unseen tasks in rea...
Jan 15, 2025•27 min•Ep. 388
🤗 Upvotes: 21 | cs.CL, cs.AI, cs.HC, cs.SD, eess.AS Authors: Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou Title: Mi...
Jan 15, 2025•24 min•Ep. 387
🤗 Upvotes: 21 | cs.CV Authors: Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang Title: VideoAuteur: Towards Long Narrative Video Generation Arxiv: http://arxiv.org/abs/2501.06173v1 Abstract: Recent video generation models have shown promising results in producing high-quality video clips lasting several seconds. However, these models face challenges in generating long sequences that convey clear and informative events, limiting their ability to support ...
Jan 15, 2025•22 min•Ep. 386
🤗 Upvotes: 18 | cs.CL Authors: Zhongzhen Huang, Gui Geng, Shengyi Hua, Zhen Huang, Haoyang Zou, Shaoting Zhang, Pengfei Liu, Xiaofan Zhang Title: O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Arxiv: http://arxiv.org/abs/2501.06458v1 Abstract: Building upon our previous investigations of O1 replication (Part 1: Journey Learning [Qin et al., 2024] and Part 2: Distillation [Huang et al., 2024]), this work explores the potential of inference-time scaling in large la...
Jan 15, 2025•25 min•Ep. 385
🤗 Upvotes: 16 | cs.CL, cs.AI Authors: Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang, Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang Title: WebWalker: Benchmarking LLMs in Web Traversal Arxiv: http://arxiv.org/abs/2501.07572v2 Abstract: Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question-answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handl...
Jan 15, 2025•24 min•Ep. 384
🤗 Upvotes: 12 | cs.LG, cs.AI, cs.CL Authors: Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu Title: SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Arxiv: http://arxiv.org/abs/2501.06842v1 Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks, yet their training remains highly resource-intensive and susceptible to critical challenges such as training instability. A predominant source of this instability...
Jan 15, 2025•21 min•Ep. 383
🤗 Upvotes: 8 | cs.CV, cs.AI, cs.GR Authors: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny Title: UnCommon Objects in 3D Arxiv: http://arxiv.org/abs/2501.07574v1 Abstract: We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available collection of...
Jan 15, 2025•22 min•Ep. 382