#84- FineWeb, the best dataset to pre-train LLMs.

Jun 13, 2024 · 12 min
Episode description

Hey guys, in this episode I talk about FineWeb, the best open-source pre-training dataset to date. I explain how the Hugging Face team created the dataset, and I also share some of their results.


Link to the huggingface blog: https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1

Instagram of the podcast: https://www.instagram.com/podcast.lifewithai

LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai

For the best experience, listen in Metacast app for iOS or Android