Benefiting from Proprietary Data with Siloed Training

Best AI papers explained

May 06, 2025 • 18 min

Episode description

This episode covers a presentation on training language models (LMs) on distributed, or siloed, data: data that is often proprietary and cannot be pooled into a single dataset for joint training. The speaker stresses how much LM performance depends on data and notes that valuable data is increasingly proprietary, which makes conventional joint training hard to apply. The presentation proposes a method, termed SILO Open LM, that adapts the Mixture-of-Experts (MoE) architecture and relies on "MoE-aware silo training" and "obtaining proxy data" to train LMs on isolated datasets and then merge them into a single, general-purpose model. Experiments compare the approach with existing methods such as weight merging and ensembling, and report significant performance gains on various benchmarks. The work also acknowledges limitations and open research questions, including improving performance on specialized tasks and scaling the approach to a larger number of datasets.
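For listeners who want a concrete picture of the three ways of combining separately trained models named in the description, here is a toy sketch in plain Python. It is not the speaker's method or code: the linear "experts", the `router` matrix, and every function name below are hypothetical stand-ins chosen only to contrast weight merging, ensembling, and MoE-style routing on a two-class example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, x):
    # Matrix-vector product: one output logit per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merge_weights(experts):
    # Weight merging: average the parameters, keep a single model.
    n = len(experts)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(E[r][c] for E in experts) / n for c in range(cols)]
            for r in range(rows)]

def ensemble_predict(experts, x):
    # Ensembling: run every model, average the output probabilities.
    probs = [softmax(linear(W, x)) for W in experts]
    n = len(experts)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def moe_predict(experts, router, x):
    # MoE-style routing: an input-dependent gate weights each expert's logits.
    gates = softmax(linear(router, x))
    outs = [linear(W, x) for W in experts]
    mixed = [sum(g * o[k] for g, o in zip(gates, outs))
             for k in range(len(outs[0]))]
    return softmax(mixed)

# Two hypothetical experts, each imagined as trained on its own data silo.
expert_a = [[2.0, 0.0], [0.0, 0.5]]
expert_b = [[0.5, 0.0], [0.0, 2.0]]
experts = [expert_a, expert_b]

# Hypothetical router that prefers expert_a when the first feature dominates.
router = [[1.0, -1.0], [-1.0, 1.0]]

x = [1.0, 0.2]
merged_probs = softmax(linear(merge_weights(experts), x))
ensemble_probs = ensemble_predict(experts, x)
moe_probs = moe_predict(experts, router, x)

print("weight merging:", merged_probs)
print("ensembling:    ", ensemble_probs)
print("MoE routing:   ", moe_probs)
```

In this toy setup, merging and ensembling treat both experts equally for every input, while the router shifts weight toward whichever expert looks more relevant to the input. How the talk's approach actually trains and combines experts is described in the episode, not in this sketch.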
