This academic paper introduces PC-SUBQ, a prompting strategy designed to improve the ability of Large Language Models (LLMs) to infer causal relationships from correlations. The strategy breaks down the complex task into sequential sub-questions that mirror the steps of the PC algorithm, a formal causal discovery method. Evaluating PC-SUBQ on the CORR2CAUSE benchmark, which provides correlation data to test causal reasoning, the researchers found that it significantly boosted performance acr...
May 24, 2025•12 min
This academic paper presents the parallel knowledge gradient method (q-KG), a novel approach for batch Bayesian optimization designed to efficiently find the global optimum of costly, derivative-free functions when multiple evaluations can be performed concurrently. Unlike previous methods that build batches greedily, q-KG uses a decision-theoretic analysis to select a set of points that is Bayes-optimal for sampling in a single iteration. The authors address the computational challenge of maxi...
May 24, 2025•15 min
This paper introduces FunBO, a novel method utilizing Large Language Models (LLMs) to discover and refine acquisition functions (AFs) for Bayesian Optimization (BO). The core idea is to treat the discovery of effective AFs as an algorithm discovery problem, leveraging FunSearch, an LLM-based approach for mathematical sciences. FunBO iteratively generates and evaluates candidate AFs written in code, aiming to improve BO's sample efficiency and performance across diverse optimization problems, ...
May 24, 2025•16 min
This paper introduces an innovative method for automating social science research, focusing on generating and testing hypotheses in a simulated environment. Leveraging recent strides in large language models (LLMs), the approach centers on structural causal models (SCMs) as the foundational language for hypotheses and the blueprint for experiments. The authors detail a system that uses LLMs to propose hypotheses, design agents with specified attributes, run experiments involving these agents ...
May 24, 2025•11 min
This research proposes a novel approach to understanding the self-attention mechanism within Transformer neural networks, interpreting it through the lens of structural causal models (SCMs). By viewing self-attention as a method for estimating an SCM for input sequences, the authors demonstrate that pre-trained Transformers can be used for zero-shot causal discovery, even in the presence of unobserved factors. This allows for learning the causal structure over individual input sequences by an...
May 24, 2025•14 min
This paper investigates whether Generative Pre-trained Transformer (GPT) models, trained solely for next-token prediction, implicitly learn a causal world model. By proposing a causal interpretation of GPT's attention mechanism, the authors suggest that these models can perform zero-shot causal structure learning for input sequences. Experiments in controlled game environments like Othello and Chess show that GPT is more likely to generate legal moves for out-of-distribution sequences when th...
May 24, 2025•20 min
This paper introduces Trace, a novel framework designed to optimize complex computational workflows end-to-end. Trace extends the concept of automatic differentiation by propagating the execution trace of a workflow as feedback, enabling the optimization of both differentiable and non-differentiable operations. The authors propose a general setup called OPTO (Output-Feedback Optimization), where an optimizer iteratively updates parameters based on rich feedback and the computational graph. Th...
May 24, 2025•24 min
This paper presents research exploring adaptive inference-time compute for large language models (LLMs) to enhance performance and efficiency. The core concept involves training LLMs to perform capability-aware and mid-generation self-evaluations, allowing them to predict whether restarting a response would yield a better result without needing external reward models. The paper demonstrates two key techniques leveraging this capability: adaptive sampling, which resamples only when predicted as...
May 24, 2025•19 min
This paper introduces PRL (Prompts from Reinforcement Learning), a novel method that automatically generates and refines prompts for Large Language Models (LLMs) using reinforcement learning. Unlike previous methods, PRL can create new, task-specific few-shot examples that were not part of the training data, leading to state-of-the-art performance across various natural language processing tasks, including classification, summarization, and simplification. The approach incorporates a reasoning ...
May 24, 2025•19 min
This academic paper proposes a novel method called Plugin for adapting closed-source Large Language Models (LLMs) to specific tasks without needing to access their internal weights or original training data. The key idea is to leverage token logits, the raw, unnormalized scores the model produces before the final token selection, and to view the adaptation as a label noise correction problem in a sequence setting. A separate, smaller model is trained on task-specific data to reweight these probabilities durin...
May 24, 2025•14 min
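As a rough illustration of the reweighting idea in the summary above, the sketch below combines a base model's next-token logits with those of a smaller task-specific model. The combination rule (adding log-probabilities and renormalizing) is an assumption made for illustration, not the paper's stated procedure, and the names `log_softmax` and `reweight` are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits into log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def reweight(base_logits, plugin_logits):
    """Return next-token probabilities after reweighting the base model.

    Assumption: the two models are combined as a product of their
    distributions (sum of log-probabilities, then renormalized).
    """
    combined = [
        b + p
        for b, p in zip(log_softmax(base_logits), log_softmax(plugin_logits))
    ]
    return [math.exp(c) for c in log_softmax(combined)]


# The base model is indifferent between three tokens; the small model prefers token 0.
probs = reweight([0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```

Under this rule, a small model that outputs a uniform distribution leaves the base model's probabilities unchanged, which is one reason a product form is a convenient choice for a sketch.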
This academic paper proposes a novel approach to understanding how large language models (LLMs) learn from example demonstrations provided within the input, a process called in-context learning. The authors suggest viewing LLMs through a Bayesian perspective, considering them as implicitly inferring a latent variable that encapsulates task information. Based on this theory, they developed an algorithm to select the most effective demonstrations by training a smaller LLM to identify examples mos...
May 23, 2025•18 min
This academic paper presents Inference-Time Intervention (ITI), a novel method for improving the truthfulness of large language models (LLMs) like LLaMA. ITI works by adjusting internal model activations during the process of generating a response, aiming to align the model's output with known facts and avoid common misconceptions. The research demonstrates that this technique significantly boosts performance on benchmarks like TruthfulQA, even with limited training data, while remaining compu...
May 23, 2025•14 min
This academic paper explores various methods for improving the text generated by large language models (LLMs) after they have been trained, focusing on inference-time algorithms. It categorizes these techniques into three core areas: token-level generation algorithms that operate on individual tokens, meta-generation algorithms which structure multiple generation steps, and strategies for efficient generation concerning both token cost and speed. The work formalizes the objectives of different g...
May 23, 2025•32 min
This academic paper investigates the mechanism behind in-context learning (ICL) in large language models (LLMs). The authors propose a theoretical analysis suggesting that ICL can be understood as kernel regression, where the model uses input-output examples within the prompt to make predictions on new data. Through analysis of attention patterns and experiments across different tasks, the study provides evidence that LLMs allocate significant attention to the demonstration samples, particular...
May 23, 2025•13 min
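To make the kernel regression view in the summary above concrete, here is a minimal sketch of a Nadaraya-Watson estimator: the prediction for a query is a similarity-weighted average of the labels of the demonstrations. The Gaussian (RBF) kernel, the `bandwidth` parameter, and the function names are illustrative assumptions; the paper's own kernel is defined in terms of the model's internals and is not reproduced here.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian similarity between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_regression(demos, query, bandwidth=1.0):
    """Predict a label for `query` from (features, label) demonstration pairs.

    Each demonstration contributes its label in proportion to how similar
    its features are to the query (Nadaraya-Watson estimator).
    """
    weights = [rbf_kernel(x, query, bandwidth) for x, _ in demos]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed to zero")
    return sum(w * y for w, (_, y) in zip(weights, demos)) / total


# Two demonstrations; the query sits exactly halfway between them.
demos = [([0.0], 0.0), ([1.0], 1.0)]
midpoint_prediction = kernel_regression(demos, [0.5])
```

In this picture, the weights play a role loosely analogous to attention scores over the demonstrations: examples closer to the query dominate the prediction.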
This paper introduces PANDA, a novel approach to personalizing large language models (LLMs) at the point of generating text, known as inference time. Unlike traditional methods that require costly retraining for each new preference, PANDA dynamically adjusts an LLM's output based on learned user preferences without altering the core model. By using context-aware preference weights and reward models, PANDA enables flexible and efficient tailoring of LLM responses to individual needs, validated ...
May 23, 2025•16 min
This research introduces InferenceGuard, a novel method for aligning large language models (LLMs) at inference time, aiming to ensure safe responses with high probability. Traditional alignment methods are costly and modify model weights, while existing inference-time techniques often lack strong safety guarantees. InferenceGuard reframes safe generation as a constrained Markov decision process (MDP) within the LLM's latent space, using state augmentation to guarantee almost sure safety. By tra...
May 23, 2025•14 min
This comprehensive survey examines in-context learning (ICL) in large language models (LLMs), a capability that allows them to learn tasks from examples provided within the input. The paper explores advancements from theoretical viewpoints, such as mechanistic interpretability and mathematical foundations, and empirical perspectives, analyzing factors influencing ICL like pre-training data, model properties, and demonstration characteristics. Understanding ICL is crucial for improving LLM per...
May 23, 2025•33 min
This research investigates the location of task recognition within Large Language Models (LLMs) during in-context learning. By employing layer-wise context masking on various LLMs and tasks (Machine Translation and Code Generation), the study identifies a "task recognition" point where the model no longer needs attention to the input context. The findings indicate potential for computational savings by reducing redundant processing and reveal a correspondence between this task recognition point ...
May 23, 2025•13 min
This document introduces LLM-AutoDiff, a novel framework for Automatic Prompt Engineering (APE) that aims to automate the challenging task of designing prompts for complex Large Language Model (LLM) workflows. By viewing these workflows as computation graphs where textual inputs are treated as trainable parameters, the system uses a "backward engine" LLM to generate textual gradients – feedback that guides the iterative improvement of prompts. Unlike previous methods that focus on single LLM ca...
May 22, 2025•19 min
This academic paper introduces metaTextGrad, a novel meta-learning approach designed to enhance large language model (LLM) performance during inference by learning better loss functions and initialization strategies, referred to as inference templates. While existing methods like TextGrad refine LLM outputs iteratively, they often require extensive manual tuning and are sensitive to prompt wording. metaTextGrad addresses these limitations by employing a meta-learning framework that optimizes t...
May 22, 2025•19 min
This paper introduces semantic operators, a declarative model for AI-powered data processing that leverages the capabilities of large language models (LLMs) for complex data transformations. The core concept is to provide a structured way to perform operations like filtering, joining, and aggregating data using natural language descriptions. By defining a "gold algorithm" for each operator, the system ensures accuracy while an optimization framework enables significant performance improvements ...
May 22, 2025•17 min
This paper explores how to measure the isolated causal effect of a specific linguistic feature, such as "netspeak" or "profanity," within a text on a reader's perception or behavior, like finding a review helpful. The core challenge lies in accurately representing the non-focal language (everything in the text except the targeted feature), because approximations of this non-focal language can introduce omitted variable bias and reduce the accuracy of the estimated effect. The authors introduce a frame...
May 22, 2025•18 min
This academic paper explores "sleep-time compute" for large language models (LLMs), a concept where models process information from a given context while idle, anticipating potential future queries. The authors introduce Stateful GSM-Symbolic and Stateful AIME, datasets created by splitting existing reasoning problems into context and questions to test this approach. Their experiments show that sleep-time compute significantly reduces the need for test-time compute to achieve similar accuracy, ...
May 22, 2025•12 min
This paper presents J1, a new method for training large language models to act as judges that evaluate other models' responses. The J1 approach utilizes reinforcement learning to encourage these judge models to produce detailed, step-by-step reasoning before making a judgment, similar to a chain of thought. By converting both straightforward and subjective tasks into verifiable problems with rewards for accurate judgments and consistency, J1 demonstrates improved performance across various ben...
May 22, 2025•19 min
This paper introduces ShiQ, a novel offline reinforcement learning algorithm designed for fine-tuning large language models (LLMs) by adapting traditional Q-learning methods. The authors address the challenges of applying Q-learning to LLMs, such as computational cost and initialization issues, by deriving theoretically grounded loss functions from Bellman equations. ShiQ enables off-policy, token-wise learning and is evaluated on various benchmarks, including multi-turn settings, where it dem...
May 22, 2025•18 min
This academic paper proposes a new causal framework for learning optimal strategies in natural language tasks that involve multiple steps, where the final result is only known at the end. Unlike methods requiring extensive data and multiple models, their approach utilizes Q-learning with a single model to estimate multi-stage decision processes. By performing gradient ascent on language embeddings, they optimize the process, coupled with a decoding strategy to convert optimized embeddings back ...
May 22, 2025•15 min
This paper introduces Multi-Objective Preference Optimization (MOPO), a novel algorithm designed to align large language models with complex human preferences that involve multiple, potentially conflicting goals like helpfulness and harmlessness. Unlike prior methods that often reduce multi-objective alignment to a single score, MOPO frames the problem as a constrained optimization, maximizing a primary objective while ensuring secondary objectives meet certain thresholds. The paper demonstrate...
May 22, 2025•17 min
This paper presents research on end-to-end learning for stochastic optimization, focusing on a Bayesian perspective. The authors propose that the standard algorithm used in this field has a Bayesian interpretation, effectively training a map that performs a posterior Bayes action. Building on this understanding, they introduce new algorithms for training decision-making tools for problems involving empirical risk minimization and distributionally robust optimization. The paper investigates t...
May 21, 2025•35 min