This collection of excerpts introduces TEXTGRAD, a novel framework that applies the concept of automatic differentiation to complex AI systems composed of multiple large language models and other components. Instead of using numerical gradients like traditional deep learning, TEXTGRAD employs natural language feedback from LLMs to guide the optimization process. The framework, designed with PyTorch-like syntax for ease of use, transforms AI systems into computation graphs where LLMs provide tex...
May 21, 2025•18 min
We investigate the reliability of language model (LM) steering methods, which aim to modify model behavior without retraining. Researchers examined three techniques (DoLa, function vectors, and task vectors) on a wide range of LMs, finding that their effectiveness varies significantly across models and tasks. Contrary to prior research that suggested consistent performance or localization of function within models, this study reveals that these steering methods are often brittle, with assumpti...
May 20, 2025•17 min
This research presents Past Token Prediction (PTP), an auxiliary technique designed to improve long-context diffusion policies for robots learning tasks through imitation. The core idea is to explicitly train the policy to predict past actions along with future ones, which helps address the issue of modern diffusion policies failing to capture strong temporal dependencies. A multi-stage training strategy is introduced, separating visual encoder training from long-context policy training using ...
May 20, 2025•16 min
This paper details research on recovering coherent event probabilities from Large Language Model (LLM) embeddings, as current LLMs often produce probability judgments in text that violate the axioms of probability theory. The authors propose a novel unsupervised learning method using a variational autoencoder (VAE) approach to enforce axiomatic constraints, specifically the additive rule for complementary events, within the latent space of LLM embeddings. They demonstrate that this method extr...
May 20, 2025•14 min
This academic paper proposes a method to improve the reasoning abilities of Large Reasoning Models (LRMs) by moving beyond inconsistent emergent behaviors. The authors introduce a system to explicitly train models in three key meta-abilities: deduction, induction, and abduction, using automatically generated, verifiable tasks. Their three-stage pipeline involves individual alignment of these abilities, merging them into a single model, and then applying domain-specific reinforcement learning. ...
May 20, 2025•17 min
This paper investigates how the predictability of the training environment influences the balance between two distinct learning modes in Transformer models: in-weights learning (IWL), which is analogous to genetic encoding, and in-context learning (ICL), which is compared to phenotypic plasticity. Drawing parallels from evolutionary biology, the authors explore how environmental stability (consistency of tasks) and cue reliability (clarity of in-context examples) affect which learning strategy...
May 20, 2025•22 min
This Google DeepMind paper investigates efficient exploration strategies for improving large language models (LLMs) through reinforcement learning from human feedback (RLHF). The authors propose and evaluate various active exploration algorithms, contrasting them with passive methods. Their experiments, using a human preference simulator and the Gemini Nano model, demonstrate that active exploration, particularly using double Thompson sampling with epistemic neural networks (ENN) for uncertain...
May 19, 2025•14 min
This paper, authored by Google DeepMind researchers, explores the increasing reliance on large language models (LLMs) within information retrieval (IR) systems. It examines how LLMs function as rankers, judges, and assistants, powering various aspects from content creation to evaluation. The paper highlights the potential for biases to emerge from the interaction of these LLM-based components, providing empirical evidence that LLM judges exhibit a significant bias towards LLM-based rankers. Th...
May 18, 2025•26 min
This document introduces **BC-LLM**, a novel method for creating **Concept Bottleneck Models (CBMs)** that are both accurate and interpretable. Traditional CBMs rely on a predefined set of concepts, limiting their effectiveness and requiring significant human effort. **BC-LLM** addresses this by integrating **Large Language Models (LLMs)** within a **Bayesian framework** to iteratively discover relevant concepts. The LLMs serve as a **concept extraction mechanism** and provide **prior informatio...
May 17, 2025•22 min
This paper **explores the theoretical underpinnings of using transformer networks for in-context reinforcement learning (ICRL)**. The authors propose a **general framework for supervised pretraining in meta-RL**, encompassing existing methods like Algorithm Distillation and Decision-Pretrained Transformers. They demonstrate that transformers can **efficiently approximate classical RL algorithms** such as LinUCB, Thompson sampling, and UCB-VI, achieving near-optimal performance in various setting...
May 17, 2025•15 min
The sources discuss the importance of robust evaluation for Large Language Models (LLMs) throughout their lifecycle, highlighting a shift from traditional software testing methods due to the non-deterministic nature of LLMs. They cover various evaluation methodologies and metrics, including quantitative, qualitative, and the emerging "LLM-as-a-Judge" approach, while also acknowledging the limitations and biases inherent in these methods. The text outlines key challenges in LLM evaluation such a...
May 17, 2025•23 min
This research explores the challenge of learning human preferences over a large set of items using a limited number of ranked comparisons. The authors frame this as learning a Plackett-Luce model from K-way comparisons where K is much smaller than the total number of items. To address the computational complexity of selecting the most informative K-item subsets for comparison, they propose a novel algorithm called DopeWolfe, a randomized variant of the Frank-Wolfe method. DopeWolfe leverages ef...
May 16, 2025•13 min
This research explores efficient methods for gathering human feedback to build accurate preference models, crucial for modern AI development, especially large language models. Focusing on optimal designs, a technique for determining the most informative data to collect, the paper adapts this approach to lists of items representing potential questions and answers. The methodology is shown to be effective for both absolute feedback, where humans provide scores for items, and ranking feedback, w...
May 16, 2025•14 min
This document introduces a novel method for improving the alignment of large language models (LLMs) with human preferences using Reinforcement Learning from Human Feedback (RLHF). The core contribution is a dual active reward learning algorithm that strategically selects both conversations to be labeled and the most appropriate human teachers to provide that feedback, thereby optimizing the data collected for training a reward function. It acknowledges the costliness of human feedback and the h...
May 16, 2025•18 min
This document explores active learning strategies for Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences by directly optimizing the policy based on feedback. The authors propose a framework and two algorithms, ADPO and ADPO+, designed for both online collection of new feedback and offline selection from existing feedback, aiming to efficiently choose the most informative preferences. Their approach linearizes the DPO objective at the...
May 16, 2025•13 min
This document introduces Active Preference Optimization (APO), an algorithm designed to enhance the sample efficiency of Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs). The authors highlight the costly bottleneck of collecting high-quality human preference data in current RLHF methods, which often rely on uniform sampling of prompt-generation pairs, leading to sub-optimal alignment under limited data. They demonstrate theoretically that uniform sampling can ...
May 16, 2025•12 min
This text introduces Diffusion Alignment as Sampling (DAS), a novel approach for aligning diffusion models with desired characteristics by treating the problem as sampling from a reward-aligned distribution. DAS utilizes a Sequential Monte Carlo (SMC) framework enhanced with tempering and a specially designed proposal distribution to efficiently generate high-reward samples without requiring additional training of the diffusion model. The method demonstrates superiority over existing guidance a...
May 16, 2025•28 min
This academic paper introduces Test-Time Preference Optimization (TPO), a novel method for improving the performance and safety alignment of large language models during inference without altering their core parameters. Unlike traditional alignment techniques that modify the model during training using numerical gradients, TPO leverages the model's own abilities to interpret numerical reward signals into textual feedback, iteratively refining generated responses through text-based critiques an...
May 16, 2025•24 min
This arXiv paper, titled GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment, introduces a novel approach for aligning Large Language Models (LLMs) with human preferences during the inference stage, without requiring expensive retraining. The authors propose the Autoregressive Reward Model to address the limitations of existing methods that use trajectory-level reward models, which are unsuitable for the autoregressive nature of text generation. They dem...
May 16, 2025•9 min
This paper introduces and explains Advantage-Weighted Regression (AWR), a simple and scalable off-policy reinforcement learning algorithm that utilizes standard supervised learning techniques. The paper details AWR's theoretical basis, highlighting its connection to constrained policy optimization and its ability to effectively handle off-policy data through experience replay. The authors demonstrate AWR's competitive performance against existing methods on benchmark tasks and complex simulat...
May 16, 2025•19 min
This research explores ways to make Reinforcement Learning from Human Feedback (RLHF) more sample-efficient by leveraging imperfect reward models. The authors identify a key property of the KL-regularized RLHF objective, showing that a policy's ability to cover the optimal policy is linked to its sub-optimality, which suggests that higher policy value indicates better coverage. Building on this insight, they propose a novel transfer learning approach and a theoretically-sound algorithm, Trans...
May 16, 2025•18 min
This paper explores how transformers can be used for in-context linear regression in the presence of endogeneity. The authors demonstrate theoretically that transformers can effectively handle endogeneity by implementing instrumental variables (IV) techniques, specifically the two-stage least squares (2SLS) method, through a gradient-based approach that converges exponentially. They propose an in-context pretraining method with theoretical guarantees and show through experiments that trained tr...
May 15, 2025•12 min
This document introduces BC-LLM, a novel method for creating Concept Bottleneck Models (CBMs) that are both accurate and interpretable. Traditional CBMs rely on a predefined set of concepts, limiting their effectiveness and requiring significant human effort. BC-LLM addresses this by integrating Large Language Models (LLMs) within a Bayesian framework to iteratively discover relevant concepts. The LLMs serve as a concept extraction mechanism and provide prior information, enabling the model to...
May 15, 2025•21 min
This academic paper explores the difference between Bayesian and frequentist statistical approaches within the context of deep learning. The paper compares point estimation and distribution estimation methods, specifically focusing on how they perform when used with in-context learners, which are trained to make these estimations based on observed data. The authors conduct experiments across various models and tasks to evaluate the performance of these two paradigms, ultima...
May 15, 2025•12 min
This paper investigates whether Large Language Models (LLMs) utilize in-context learning (ICL) to perform reasoning consistent with a Bayesian framework. By using a simplified setting of biased coin flips and dice rolls, the authors analyze how LLMs update their internal probabilities based on provided examples. They find that LLMs often start with inherent biases (miscalibrated priors) but demonstrate behavior that broadly follows Bayesian updates when given sufficient evidence through ICL. Th...
May 15, 2025•13 min
This academic paper investigates whether in-context learning (ICL) in large language models (LLMs) functions like a Bayesian learner, aiming to explain why performance increases with more examples. The authors propose and derive novel Bayesian scaling laws that model the relationship between the number of in-context examples and prediction accuracy. Through experiments on synthetic data with toy models and real-world LLMs on various tasks, they demonstrate that their Bayesian laws accurately pr...
May 15, 2025•17 min
This paper introduces Posterior Mean Matching (PMM), a novel generative modeling technique rooted in Bayesian inference. Unlike traditional diffusion models, PMM employs conjugate pairs of distributions to flexibly model diverse data types such as images and text. The core mechanism involves iteratively refining noisy data approximations through online Bayesian inference updates, with the convergence of the posterior mean to the true data sample forming the basis for generating new data. The a...
May 15, 2025•19 min
This work explores evaluating when a Conditional Generative Model (CGM) is suitable for an in-context learning (ICL) task. It introduces the concept of the generative predictive p-value, extending Bayesian model criticism techniques like posterior predictive checks (PPCs) to contemporary CGMs by generating simulated data to approximate sampling from a Bayesian model. This approach allows for assessing whether a model's inferences are reliable for a given ICL problem without requiring explicit a...
May 15, 2025•22 min
This paper highlights the challenge of aligning diffusion models with desired outcomes by optimizing reward functions, especially when gradient information is unavailable. The core contribution is the proposal of DSearch, a novel gradient-free method that reframes this alignment as a search problem on a dynamically constructed tree representing the diffusion process. DSearch utilizes heuristic functions and dynamic scheduling to efficiently explore the search space and identify high-reward sam...
May 15, 2025•14 min
This academic paper explores whether the in-context learning (ICL) process in Large Language Models (LLMs) behaves like Bayesian inference. The authors use the martingale property, a key characteristic of Bayesian systems with exchangeable data, as a framework for their analysis. They demonstrate that violations of this property and deviations in how LLMs' uncertainty scales with more data provide evidence that ICL is not Bayesian. The findings suggest that LLMs lack a principled understandin...
May 12, 2025•16 min