Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

Best AI papers explained

Jun 14, 2025 • 28 min


Episode description

This academic paper investigates whether large language models (LLMs) can substitute for human participants in social science research. The authors examine LLMs' reasoning abilities using the "11-20 money request game," a test designed to evaluate strategic thinking. Across their experiments, LLMs generally fail to replicate human behavioral patterns, showing shallower reasoning and less consistent responses than human subjects. The study highlights several limitations of LLMs: their reliance on probabilistic patterns rather than genuine understanding, their sensitivity to subtle changes in prompts or language, and the risk that memorization of training data is mistaken for true reasoning. The paper concludes that caution is essential when considering LLMs as human surrogates, and suggests they are currently better suited to generating novel ideas than to simulating human behavior.
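The episode description names the 11-20 money request game but does not spell out its rules. The sketch below assumes the standard version introduced by Arad and Rubinstein (2012). Each of two players requests an integer amount from 11 to 20 and receives it. A player whose request is exactly one less than the opponent's earns a bonus of 20. It also assumes a level-k model of reasoning depth, in which a naive level-0 player asks for 20 and each higher level best-responds to the level below. The function names are hypothetical and for illustration only. This is not the authors' experimental code.

```python
# Hypothetical sketch of the 11-20 money request game (rules as in
# Arad & Rubinstein, 2012) and a level-k reading of "reasoning depth".

BONUS = 20              # extra payment for undercutting the opponent by exactly 1
REQUESTS = range(11, 21)  # allowed requests: integers 11..20


def payoff(my_request: int, other_request: int) -> int:
    """Payoff for one player: the amount requested, plus BONUS if it is
    exactly one less than the opponent's request."""
    if my_request not in REQUESTS or other_request not in REQUESTS:
        raise ValueError("requests must be integers between 11 and 20")
    bonus = BONUS if my_request == other_request - 1 else 0
    return my_request + bonus


def best_response(other_request: int) -> int:
    """Request that maximizes payoff against a known opponent request."""
    return max(REQUESTS, key=lambda r: payoff(r, other_request))


def level_k_request(k: int) -> int:
    """Request of a level-k reasoner, assuming level-0 naively asks for 20
    and each level best-responds to the level just below it."""
    if k < 0:
        raise ValueError("k must be non-negative")
    request = 20  # level-0 anchor (assumption of the level-k model)
    for _ in range(k):
        request = best_response(request)
    return request
```

Under these assumptions, `level_k_request(3)` returns 17. A request of 20 - k therefore signals k steps of reasoning for k up to 9. At level 10 the best response to 11 wraps back to 20, because no lower request exists to undercut it.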
