This research introduces A*-PO, a new reinforcement learning approach for refining large language models to enhance their reasoning capabilities. Unlike existing methods that are often computationally expensive and memory-intensive due to requiring multiple generations per prompt or explicit critic networks, A*-PO streamlines the process. It accomplishes this by initially estimating the optimal value function offline using samples from a reference policy, then performing on-policy updates with ...
May 31, 2025•23 min
This paper proposes HulC, a statistically sound and computationally efficient method for constructing confidence regions from the output of online algorithms, such as Stochastic Gradient Descent (SGD). Unlike traditional methods like the Wald interval, which require multiple passes over the data and can be computationally expensive, HulC provides rate-optimal and asymptotically valid confidence intervals without needing to estimate the asymptotic variance. The paper presents theoretical ana...
May 31, 2025•15 min
This paper investigates how data diversity impacts the generalization of large language models (LLMs), particularly in reasoning tasks. The authors introduce G-Vendi, a novel metric that quantifies diversity based on the entropy of model-induced gradients, showing a strong correlation with out-of-distribution performance. Building on this, they propose Prismatic Synthesis, a framework for generating diverse synthetic data by focusing on underrepresented gradient space regions. Experiments demo...
May 31, 2025•19 min
This position paper argues for a reassessment of uncertainty quantification in large language model (LLM) agents. The authors contend that the traditional division between aleatoric (irreducible) and epistemic (reducible) uncertainty is insufficient for the interactive nature of LLM agents, especially given their propensity to produce incorrect outputs. They highlight how existing definitions of these uncertainties can be conflicting and fail to apply effectively in dynamic conversational setti...
May 31, 2025•25 min
This paper from Microsoft Research discusses the potential for generative AI to transform the economy by enabling an agentic economy, where AI agents act on behalf of users and businesses to facilitate transactions. The authors argue that the most significant impact of AI will be in reducing communication frictions between consumers and businesses, which could lead to markets being reorganized and power redistributed. They explore the distinction between unscripted (technical) and unrestricted ...
May 30, 2025•39 min
This academic paper explores the critical role of statistical foundations in the development and application of Large Language Models (LLMs). The author argues that LLMs are fundamentally statistical due to their reliance on vast datasets and their probabilistic generation processes. Their "black-box" nature, stemming from complexity and scale, further necessitates statistical methods, as purely mechanistic analyses are often impractical. The paper highlights specific areas where statistical a...
May 29, 2025•19 min
This academic paper presents Bayes-adaptive Monte-Carlo Planning (BAMCP), a novel algorithm designed to tackle the computational challenges of Bayesian model-based reinforcement learning. The core idea is to use Monte-Carlo tree search within a modified framework that avoids the computationally expensive posterior belief updates at every step within the search tree. Instead, BAMCP employs root sampling, where a single model is sampled from the posterior distribution at the start of each simul...
May 29, 2025•21 min
This paper explores how to enhance Large Language Model (LLM) reasoning by moving beyond conventional reinforcement learning (RL) methods. Standard RL confines exploration to the training phase and relies solely on the current state, failing to fully utilize reflective reasoning at test time. The authors propose Bayes-Adaptive RL (BARL), a framework that explicitly optimizes for test-time generalization by maintaining uncertainty over potential solutions and updating beliefs based on observed o...
May 29, 2025•22 min
This paper introduces Planning with a Natural Language Critic (PNLC), a novel approach for improving the planning capabilities of large language models (LLMs) in complex interactive tasks without relying on computationally expensive reinforcement learning (RL) fine-tuning or extensive inference-time search. PNLC trains a lightweight, goal-conditioned value function offline that predicts the likelihood of various future outcomes based on a thought or strategy proposed by the LLM agent. During in...
May 29, 2025•25 min
This paper introduces Value-Guided Search (VGS), a novel method for improving the reasoning capabilities and efficiency of large language models (LLMs) on complex tasks like competition math. Unlike prior methods that rely on fine-grained, step-by-step feedback, VGS uses a token-level value model trained on large datasets of reasoning traces. This model guides a block-wise search process, selecting the most promising continuations at intervals rather than individual steps. The paper demonstrate...
May 29, 2025•18 min
This academic paper investigates the phenomenon of shallow preference signals in large language models (LLMs), where the critical information for determining human preference in a response is often found in the early tokens. Experiments show that training reward models and Direct Preference Optimization (DPO) models on truncated preference datasets, using only the initial part of responses, can achieve performance comparable to or even better than using full datasets, suggesting efficiency gains...
May 29, 2025•14 min
This academic paper explores a significant vulnerability in how large language models (LLMs) select and use external tools, which are crucial for their agentic capabilities. The research demonstrates that subtle modifications to a tool's natural language description, without altering its function, can dramatically influence whether an LLM chooses to use it, sometimes by a factor of more than 10. Through experiments testing various descriptive changes, including assertive cues, claims of activ...
May 29, 2025•19 min
This paper explores whether artificial agents can develop an understanding of their partners' abilities during collaborative tasks without being explicitly programmed to do so. Researchers trained recurrent neural network (RNN) agents to play a cooperative game called "Overcooked-AI" with a variety of partners having different skill levels. The study found that these agents developed structured internal representations of their partners' task abilities, allowing them to adapt and generalize to ...
May 29, 2025•13 min
This academic article investigates the emergence of social conventions in populations of large language models (LLMs), exploring whether these AI agents can establish shared behaviors through interaction. The research demonstrates that LLM populations spontaneously develop these conventions, similar to human groups, and that collective biases can arise during this process, even without individual agent bias. Furthermore, the study examines the impact of minority groups of adversarial agents, re...
May 29, 2025•16 min
This paper examines the potential and significant challenges of using large language models (LLMs) to create realistic digital personas for simulating human behavior in fields like social science and marketing. The authors categorize existing persona generation methods based on how much LLM-generated content is used, from simple structured data (Meta Personas) to highly detailed, freeform descriptions (Descriptive Personas). Through large-scale experiments, including simulating U.S. electio...
May 29, 2025•18 min
Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions
This academic paper introduces a novel and extensive dataset of over 2,000 human participants who answered more than 500 questions covering demographics, psychological traits, cognitive abilities, and economic behaviors across four waves of data collection. The dataset is designed to validate and improve the use of Large Language Models (LLMs) in creating "digital twins" that simul...
May 29, 2025•21 min
We discuss the evolving role of Reinforcement Learning (RL) in Large Language Models (LLMs). Initially, RL was primarily used as a distillation technique to align LLM outputs with preferences and improve performance on verifiable tasks by leveraging LLMs' ability to verify outputs better than generate them. However, the rise of LLM-based agents marks a shift where RL enables agents to learn autonomous behaviors for complex tasks in dynamic environments, moving from refining static output to lea...
May 29, 2025•27 min
We discuss the importance of prompts in human-Large Language Model (LLM) interaction, describing them as the primary interface through which users communicate intent and guide model behavior. However, the papers also critically examine the limitations of this prompt-centric approach, highlighting issues like LLM hallucinations, bias, reasoning struggles, and the "brittleness" of prompts that makes interaction unreliable. They explore the rise of auto-prompting as an attempt to automate prompt ...
May 29, 2025•17 min
We discuss the concept of textual gradient-based optimization for Large Language Model (LLM) workflows, contrasting it with traditional embedding space methods like soft prompt tuning. The papers introduce TextGrad as a foundational framework and focus heavily on LLM-AutoDiff as an advanced system designed to optimize complex, multi-component LLM applications represented as graphs, including features like pass-through and time-sequential gradients. While academic papers highlight its practical appli...
May 29, 2025•24 min
This academic paper explores the theoretical underpinnings of large language models (LLMs), particularly their generalization abilities. The authors propose an equivalence between autoregressive transformer-based LLMs and finite-state Markov chains as a framework for analysis. They use this framework to examine LLM inference, generalization during pre-training on dependent data, and in-context learning on Markov chains, deriving sample complexity and generalization bounds. Experimental results...
May 28, 2025•16 min
This research explores Chain-of-Thought (CoT) reasoning in large language models by viewing it as a metastable Markov process. The authors model easy reasoning steps as dense clusters and hard steps as sparse connections, proving that search strategies rewarding these sparse edges improve efficiency by reducing the time to navigate between concept clusters. The study demonstrates that information from search can be used to fine-tune pretrained models through reinforcement learning and distill t...
May 28, 2025•21 min
This research explores how transformers adapt to changing causal structures in data, which is crucial for understanding their success in language processing. The authors introduce a new test using interleaved Markov chains with varying "lags" or dependencies. The paper shows that a three-layer transformer can learn to identify the correct lag and predict the next token, a mechanism termed selective induction heads. A detailed construction for how attention weights achieve this is provided, demonstrati...
May 28, 2025•14 min
This document studies how transformers learn to predict sequential patterns in context, focusing on Markov Chains, a fundamental type of sequence. The research introduces a task called ICL-MC to investigate this, where models must learn from input sequences generated by different Markov Chains. The findings indicate that transformers develop "statistical induction heads" capable of calculating next-token probabilities based on the sequence's history, achieving near-optimal performance. Notably...
May 28, 2025•14 min
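As a rough illustration of the kind of statistic a "statistical induction head" would need to compute, the sketch below estimates next-token probabilities from transition counts inside a single context. The function name `in_context_bigram` and the add-`alpha` smoothing are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

def in_context_bigram(context, vocab, alpha=1.0):
    """Estimate P(next token | last token) from transitions seen in `context`.

    Hypothetical helper: the add-alpha smoothing is an illustrative choice.
    """
    last = context[-1]
    # Count which tokens followed earlier occurrences of the last token.
    counts = Counter(
        nxt for prev, nxt in zip(context, context[1:]) if prev == last
    )
    total = sum(counts.values()) + alpha * len(vocab)
    return {tok: (counts[tok] + alpha) / total for tok in vocab}

# Toy sequence over a two-symbol vocabulary.
probs = in_context_bigram([0, 1, 0, 1, 0, 0, 1, 0], vocab=[0, 1])
```

In this toy context, the token 0 was followed by 1 three times and by 0 once, so the estimate puts more mass on 1 as the next token.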
This research investigates how transformers learn causal structure through gradient descent, focusing on their ability to perform in-context learning. The authors introduce a novel task involving random sequences with latent causal relationships and analyze a simplified two-layer transformer architecture. They demonstrate theoretically that gradient descent on the first attention layer recovers this hidden causal graph by computing a measure of mutual information between tokens. This learned ca...
May 28, 2025•14 min
This document introduces LLMFP, a framework for using Large Language Models (LLMs) to solve complex planning tasks by converting them into optimization problems solvable by formal planners like SMT solvers. It details the five-step process of LLMFP, which involves LLMs defining the problem, formulating variables, generating code, formatting results, and performing self-assessment and modification. The paper presents experimental results across nine diverse planning problems, demonstrating LLMFP'...
May 28, 2025•20 min
This paper introduces Automated Design of Agentic Systems (ADAS), a new research area aiming to automatically invent building blocks and design powerful agentic systems. The authors argue that this could be a faster and more effective approach than manual design, drawing parallels to the history of machine learning, where learned solutions surpassed hand-designed ones. The paper proposes and evaluates Meta Agent Search, an algorithm where a "meta" agent iteratively programs new agents in code, learns f...
May 28, 2025•14 min
This paper investigates the mathematical basis of large language model (LLM) prompting by framing LLMs as discrete stochastic dynamical systems and employing control theory. The authors formalize LLM systems and introduce concepts of controllability and reachability in this context. They present a Self-Attention Control Theorem that provides a theoretical limit on controlling self-attention outputs based on singular values of parameter matrices. Empirical results demonstrate that short prompts c...
May 28, 2025•14 min
This academic paper from the University of Chicago addresses the problem of aligning large language models (LLMs) with human preferences. The authors analyze best-of-n sampling, a technique where an LLM generates multiple responses and selects the best one, finding it to be nearly optimal for maximizing win rate while minimizing changes to other aspects of the output. To avoid the computational cost of repeated sampling, they introduce BoNBoN Alignment, a novel method for fine-tuning LLMs to m...
May 27, 2025•21 min
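Best-of-n sampling as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical names, and the toy "model" and "reward" are stand-ins rather than anything from the paper.

```python
import random

def best_of_n(generate, reward, prompt, n=8, rng=None):
    """Draw n candidate responses and return the one with the highest reward."""
    rng = rng or random.Random(0)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins (assumptions): the "model" emits a random integer and the
# "reward" simply prefers larger values.
def toy_generate(prompt, rng):
    return rng.randint(0, 100)

def toy_reward(response):
    return response

best = best_of_n(toy_generate, toy_reward, "prompt", n=8, rng=random.Random(0))
```

The cost is n forward generations per prompt, which is the repeated-sampling overhead the blurb refers to.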
This paper proposes a novel Bayesian inference perspective for understanding and improving fine-tuning methods for large language models (LMs). The authors argue that traditional Reinforcement Learning (RL) approaches, when applied naively, lead to distribution collapse, where the LM generates only a limited set of high-reward outputs. They demonstrate that the commonly used KL-regularized RL objective, which adds a penalty for deviating from the original LM distribution, is equivalent to vari...
May 27, 2025•13 min
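For reference, one standard way to write the KL-regularized objective mentioned above is sketched below. The notation ($\pi$ for the fine-tuned LM, $\pi_0$ for the original LM, $r$ for the reward, $\beta > 0$ for the penalty weight) is assumed here and may differ from the paper's.

```latex
% KL-regularized RL objective (notation assumed for illustration)
J(\pi) = \mathbb{E}_{x \sim \pi}\big[ r(x) \big] - \beta \, \mathrm{KL}\big( \pi \,\|\, \pi_0 \big)

% Its maximizer reweights the original LM by the exponentiated reward:
\pi^{*}(x) = \frac{1}{Z} \, \pi_0(x) \, \exp\!\big( r(x) / \beta \big),
\qquad Z = \sum_{x} \pi_0(x) \, \exp\!\big( r(x) / \beta \big)

% Equivalently, maximizing J minimizes a KL divergence to \pi^{*}:
J(\pi) = -\beta \, \mathrm{KL}\big( \pi \,\|\, \pi^{*} \big) + \beta \log Z
```

Under this form, $\pi_0$ plays the role of a prior and $\exp(r(x)/\beta)$ the role of a likelihood-like weighting; as $\beta \to 0$ the penalty vanishes and the maximizer concentrates on the highest-reward outputs.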
This academic paper investigates language model alignment, a process of adjusting a base language model to better align with desired outcomes, often guided by a reward model. It specifically examines two common alignment methods: KL-constrained reinforcement learning (RL), which maximizes reward while limiting divergence from the original model, and best-of-N selection, where the highest-reward output from multiple samples is chosen. Under simplifying assumptions about the language and rewar...
May 27, 2025•15 min