Learning Compositional Functions with Transformers from Easy-to-Hard Data

Jun 02, 2025, 12 min

Episode description

This paper presents a theoretical analysis of how transformers can learn k-fold composition tasks, which involve combining multiple permutations. It argues that transformers can do this hierarchically, with each layer learning progressively more complex compositions, referred to as "hops." The paper details a curriculum learning strategy (Algorithm 1) and a mixed training approach (Algorithm 2) and shows that both allow transformers to learn these tasks efficiently. The analysis also includes lower bounds on the learnability of these functions within the Statistical Query (SQ) framework, along with proof sketches and lemmas supporting the main theorems on the learning guarantees of the two algorithms.
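
To make the idea of composing permutations in "hops" concrete, here is a minimal Python sketch. It is an illustration only, not the paper's construction: the function names (`compose`, `k_fold_sequential`, `k_fold_hierarchical`) are hypothetical, and the pairwise-merging scheme is an assumed reading of "each layer learns progressively more complex compositions," in which each round plays the role of one layer.

```python
def compose(p, q):
    """Return the permutation that applies p first, then q."""
    return [q[p[i]] for i in range(len(p))]


def k_fold_sequential(perms, x):
    """Apply the permutations one hop at a time to a single input x."""
    for p in perms:
        x = p[x]
    return x


def k_fold_hierarchical(perms):
    """Merge adjacent permutations pairwise, round by round.

    Assumption for illustration: each round stands in for one layer,
    so the number of hops covered per entry roughly doubles per round.
    Requires at least one permutation.
    """
    level = list(perms)
    while len(level) > 1:
        # Compose neighbours (0,1), (2,3), ... while preserving order.
        merged = [compose(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd one out carries over unchanged
        level = merged
    return level[0]
```

Because composition is associative, the merged permutation agrees with the hop-by-hop result on every input, while needing only about log2(k) rounds instead of k sequential steps.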