This paper introduces a novel two-player reinforcement learning (RL) framework, RLAD, designed to enhance the reasoning capabilities of large language models (LLMs). This framework jointly trains an **abstraction generator** and an **abstraction-conditioned solution generator** to propose and utilize **concise natural language descriptions of procedural and factual knowledge** called "reasoning abstractions." The core objective is to move beyond conventional chain-of-thought methods, which often...
Oct 29, 2025•16 min
The academic paper introduces **ADVISOR MODELS**, a novel framework for dynamically steering the behavior of rigid, **black-box Large Language Models (LLMs)** that are only accessible via an API. Unlike static prompting methods, this approach employs a second, lightweight model, the "advisor," which is trained using **reinforcement learning (RL)** to generate instance-specific, natural language advice for the main LLM. The research demonstrates that this method excels at personalization and adap...
Oct 29, 2025•13 min
This research paper introduces and evaluates a novel framework called Test-Time Self-Improvement (TT-SI) for large language model (LLM) agents. This approach focuses on improving model performance efficiently during inference by adapting to challenging examples on the fly. The method involves three key steps: Self-Awareness (identifying uncertain test inputs), Self-Data Augmentation (generating similar training examples from these uncertain inputs), and Self-Improvement (performing a lightwei...
Oct 27, 2025•23 min
The academic paper investigates the common belief that Kullback-Leibler (KL) regularized reinforcement learning (RL) objectives, particularly when used for post-training large language models (LLMs), inherently promote or inhibit output diversity based on the choice between reverse and forward KL divergence. The authors challenge this intuition, demonstrating both mathematically and empirically that mode coverage and diversity primarily depend on factors like regularization strength and the rela...
Oct 27, 2025•16 min
The research paper explores how Large Language Models (LLMs) utilize their depth during inference, proposing a "Guess-then-Refine" framework to explain layer-wise prediction dynamics. The authors use the TunedLens method to trace intermediate representations, revealing that early layers function as "statistical guessers" by promoting high-frequency tokens as initial predictions due to limited contextual information. As processing continues through deeper layers, these initial guesses undergo "ma...
Oct 27, 2025•12 min
The academic paper proposes "thought communication," a new paradigm for multi-agent collaboration that allows large language models (LLMs) to exchange latent thoughts directly, akin to telepathy, instead of relying on lossy natural language. The authors formalize this process using a latent variable model where agent states are generated from underlying thoughts, proving that both shared and private thoughts can be mathematically identified. Guided by this theory, the proposed THOUGHTCOMM framew...
Oct 27, 2025•17 min
The research paper titled "Reasoning with Sampling: Your Base Model is Smarter Than You Think" by Harvard researchers introduces a novel, training-free iterative sampling algorithm inspired by Markov Chain Monte Carlo (MCMC) techniques to enhance the reasoning capabilities of large language models (LLMs) at inference time. This method, termed "Power Sampling," leverages the base model's own likelihoods to simulate sampling from a "power distribution," which sharpens the distribution toward highe...
Oct 26, 2025•16 min
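As a rough illustration of the "power distribution" mentioned in the Power Sampling episode above, the sketch below raises each probability of a toy categorical distribution to a power and renormalizes. It only shows the sharpening effect; it is not the paper's MCMC-based sampling algorithm, and the function name `power_distribution`, the exponent name `alpha`, and the toy probabilities are assumptions made for illustration.

```python
def power_distribution(probs, alpha):
    """Raise each probability to the power alpha and renormalize.

    Hypothetical helper for illustration only: with alpha > 1 the result
    puts more mass on outcomes that were already likely.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


# Toy base distribution over three candidate outputs (made-up numbers).
base = [0.5, 0.3, 0.2]
sharpened = power_distribution(base, alpha=4.0)
print(sharpened)
```

With an exponent above 1, the most likely outcome gains probability mass and the least likely loses it; an exponent of exactly 1 leaves the distribution unchanged.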
The paper by Meta and Berkeley proposes a novel approach to address catastrophic forgetting in large language models (LLMs) during continual learning, introducing sparse memory finetuning. This method utilizes memory layer models, which are designed for sparse parameter updates, to selectively update only the memory slots that are highly activated by new knowledge relative to existing, pre-training data, using a TF-IDF ranking mechanism. The authors evaluate this technique against full finetunin...
Oct 26, 2025•14 min
The academic paper claims that pairwise-comparison-based RLHF is incapable of learning heterogeneous preferences, whereas ternary comparisons make this possible. It proposes **Expectation-Maximization Direct Preference Optimization (EM-DPO)**, a clustering algorithm that discovers latent user preference groups and trains an ensemble of specialized LLMs for each group. Crucially, the authors establish a theoretical link to econometrics, arguing that **binary comparisons are insufficient** for identifying hetero...
Oct 24, 2025•12 min
This paper provides a theoretical analysis of next-token prediction in language models, introducing the concept of the coverage profile ($\text{Cov}_N$) as a superior metric to cross-entropy for predicting downstream performance with Best-of-N (BoN) sampling. The authors establish a "coverage principle," demonstrating that maximum likelihood, or next-token prediction, implicitly optimizes the coverage profile, leading to faster generalization that avoids the spurious dependence on sequence lengt...
Oct 24, 2025•16 min
This paper introduces Reinforcement Learning from Human Interaction (RLHI), a new method for aligning large language models by learning directly from in-the-wild user conversations rather than expert-annotated data. This paradigm is built on two complementary approaches: User-Guided Rewrites, which leverage users' natural language follow-ups to revise unsatisfactory model outputs, and User-Based Rewards, which uses a reward model conditioned on a user's long-term interaction history (persona) to...
Oct 24, 2025•14 min
This paper discusses the "early experience" paradigm as a method for training autonomous language agents, aiming to bridge the gap between reward-free Imitation Learning (IL) and reward-dependent Reinforcement Learning (RL). This novel approach allows agents to learn from their own generated interactions, or "experience," without needing explicit external rewards, addressing a major challenge in real-world environments where dense feedback is often unavailable. The paper explores two core strate...
Oct 24, 2025•13 min
This paper examines emergent exploration in reinforcement learning, specifically using a goal-conditioned contrastive learning algorithm called SGCRL. The authors employ methodologies inspired by cognitive science, such as rational analysis and controlled intervention experiments, to analyze the implicit drivers of agent behavior in this reward-free setting. They demonstrate both theoretically and empirically that SGCRL's exploration is driven by an intrinsic reward signal based on representatio...
Oct 22, 2025•15 min
This paper introduces an experimental recipe for interventional analyses designed to study how training data specifically affects the behavior of language models (LMs). This methodology, termed "Rewriting History," involves a three-stage process: selecting target evaluation items, matching relevant pretraining documents to those items, and then modifying those documents before retraining the model to measure the effects. The authors demonstrate the utility of this approach through case studies o...
Oct 22, 2025•19 min
This paper attempts to provide a comprehensive framework grounded in the Cattell-Horn-Carroll (CHC) theory of cognitive abilities, which breaks down general intelligence into ten core, equally weighted cognitive components, such as General Knowledge (K), Mathematical Ability (M), and various forms of Memory (WM, MS, MR). The paper details specific, measurable tasks and examples—often drawn from human-centric tests like AP exams or psychometric assessments—to evaluate AI performance in each area....
Oct 22, 2025•16 min
This paper introduces a new formal framework called Learning from Language Feedback (LLF), which addresses the challenge of training AI agents, particularly large language models (LLMs), using rich natural language critiques and guidance instead of traditional scalar rewards. The authors formalize the LLF problem and introduce the transfer eluder dimension as a complexity measure to quantify how effectively language feedback reduces uncertainty about latent rewards, demonstrating cases where lea...
Oct 21, 2025•20 min
This paper introduces In-Context Pure Exploration (ICPE), a Transformer-based architecture designed to efficiently solve active sequential hypothesis testing problems, also known as pure exploration. ICPE meta-trains a model to map observation histories to actions and predicted hypotheses, enabling in-context learning to actively gather data and infer the correct hypothesis on new tasks without requiring parameter updates. The paper frames this as splitting the process into a supervised inferenc...
Oct 21, 2025•17 min
This academic paper investigates the concept of Preference Variance (PVar) as a metric for improving the efficiency of Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human feedback. The authors establish a theoretical foundation demonstrating that the magnitude of the DPO training gradient is bounded by the PVar of a given prompt, meaning prompts with low PVar contribute minimally to learning. Experimentally, the paper validates that training LLMs u...
Oct 20, 2025•15 min
This research paper by Sergey Levine's group introduces a self-supervised method for fine-tuning Large Language Model (LLM) agents to be more effective and aligned assistants, particularly in code generation. The core idea is to train agents to maximize the human's empowerment, defined as the user's ability to effect desired changes in the environment, rather than relying on costly explicit human feedback or inferred rewards. The paper details the mathematical connection between their Logit Thres...
Oct 20, 2025•14 min
Once again, instead of introducing new research, we discuss the controversial argument by Richard Sutton, a pioneer of reinforcement learning (RL), that large language models (LLMs) represent a "dead end" for achieving general intelligence. Sutton contends that LLMs, which rely on imitation learning from vast datasets of human text, cannot learn continuously "from experience" and lack a meaningful goal or ground truth, both of which are core to RL and animal intelligence. The conversation, ...
Oct 20, 2025•13 min
The research paper systematically investigates how reinforcement learning (RL) can enhance the agentic reasoning capabilities of Large Language Models (LLMs), particularly in tool-integrated environments. The authors conduct a comprehensive empirical study across three main dimensions (data curation, algorithmic design, and reasoning mode) to demystify optimal practices for agentic RL. Key findings include that real end-to-end trajectories are crucial for strong Supervised Fine-Tuning (SFT) initi...
Oct 19, 2025•15 min
This paper introduces an **information-theoretic framework** designed to determine when multi-agent Large Language Model (LLM) systems transition from simple aggregates to integrated, synergistic collectives. The research utilizes a **group guessing game without direct communication** to experimentally test how different prompt designs—specifically, a control condition, assigning agent **personas**, and adding a **Theory of Mind (ToM)** instruction—influence emergent coordination. Findings sugge...
Oct 19, 2025•14 min
This paper introduces Learning-to-Measure (L2M) to address the challenges of meta-Active Feature Acquisition (meta-AFA), a sequential decision-making problem. Traditional AFA methods often struggle with scalability because they are designed for a single task and fail when trained on retrospective data containing systematic missingness in features. L2M overcomes these limitations by formalizing the meta-AFA problem to allow learning acquisition policies across diverse tasks and leveraging a pre-t...
Oct 19, 2025•16 min
Today, instead of introducing new research, we go deeper into Andrej Karpathy's insights. In his recent interview, he presents his perspectives on the current state and future of Artificial General Intelligence (AGI) and Large Language Models (LLMs). Karpathy argues that AGI is still about a decade away, asserting that the challenges, while tractable, are difficult and require incremental progress across many domains, including better datasets, hardware, and algorithms. He frequently contrasts c...
Oct 19, 2025•16 min
This research paper, by authors affiliated with NVIDIA, Carnegie Mellon University, Boston University, and Stanford University, focuses on the optimal strategy for incorporating reasoning data into Large Language Model (LLM) training. The central finding challenges the conventional approach of relying solely on post-training, demonstrating that "front-loading" reasoning data during the pretraining phase is critical, yielding a durable 19% average performance gain on expert-level tasks. The resea...
Oct 18, 2025•13 min
This paper investigates the effectiveness of deliberate exploration in enhancing the reasoning capabilities of large language models (LLMs) trained with reinforcement learning (RL). The authors propose and evaluate a novel representation-based exploration (RepExp) strategy, which uses a bonus derived from the LLM's hidden states to encourage the discovery of diverse and novel behaviors. The study employs a two-pronged evaluation methodology, first testing RepExp in an inference-time setting for ...
Oct 18, 2025•17 min
The academic paper discusses the critical flaws in current methods used to evaluate the robustness of large language model (LLM) defenses against jailbreaks and prompt injections. The authors argue that testing defenses with static or computationally weak attacks yields a false sense of security, as demonstrated by the fact that they successfully bypassed twelve different recent defenses with an attack success rate exceeding 90% in most cases. Instead, they propose that robustness must be measur...
Oct 18, 2025•16 min
The research empirically investigates the role of pretraining distribution and a new concept of task diversity in the emergence of in-context learning (ICL), particularly using models trained on linear functions. Findings indicate that increasing task diversity causes transformers to shift from a specialized solution to one that can generalize across the entire task space, a transition also observed in nonlinear regression problems. The authors constructed a phase diagram to characterize how task diversity and the nu...
Oct 16, 2025•20 min
This paper studies scaling reinforcement learning (RL) compute for large language models (LLMs), introducing a principled framework to predict performance. The authors develop ScaleRL, a best-practice recipe derived from ablating various algorithmic choices, and demonstrate its predictable scaling trajectory using a sigmoidal function to fit compute-performance curves. Accompanying figures illustrate validation performance over increasing GPU hours (log scale) for different RL configurations, sh...
Oct 16, 2025•14 min
This white paper by Anthropic, the UK AI Security Institute, and The Alan Turing Institute demonstrates that a small, fixed number of malicious documents (as few as 250) can successfully create a "backdoor" vulnerability in LLMs, regardless of the model's size or the total volume of clean training data. This finding challenges the previous assumption that attackers need to control a percentage of the training data, suggesting that these poisoning attacks are more practical and accessible than previo...
Oct 16, 2025•14 min