Caio Temer Episode 13 part 1

The Paleo Cyborg Podcast

Jan 17, 2024 • 1 hr 22 min • Season 1, Ep. 14

Episode description

Caio Temer explains LLMs and machine learning.

Question to ChatGPT: "Found that generalized learning happens after 10k epochs with a considerable amount of data per epoch. Are there studies in this regard?"

ChatGPT's answer: Yes, there are studies on generalization after many epochs with a large amount of data per epoch. A notable example is the "Deep Bootstrap Framework", presented at ICLR 2021. This framework connects generalization to online optimization. It compares a "real world", where a model is trained on a finite number of samples for multiple epochs, with an "ideal world", where each training step uses fresh samples from a virtually infinite data set. The research found that models that train quickly on infinite data are the same as those that generalize well on finite data. The study offers a new perspective on design choices in practice and provides a theoretical path to understanding generalization.

Caio Temer show notes: Part One summary

Interesting topics for trading:
- how I created the first bots
- how models combine language models with autonomous robots
- why conventional methods don't work

About language models:
- language models are simpler to obtain with other structures
- the problem in the current structure and how I solved it
- development of intuitive neural networks using hash abstraction

Caio developed a way for neural networks to learn to abstract and understand deeply and quickly, using cryptography techniques based on sums. He created a library for a tiny language model that learns to predict all tokens at an absurd speed by using distance vector embeddings instead of tokens.

Transformers: https://arxiv.org/abs/2205.13504

For example, if the network is exploding, either something is wrong with a component or the update rate is wrong. If the rate is right, the linearities are wrong: swap a linear or more linear connection, such as ReLU, for tanh, or add a tanh.
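The ReLU-for-tanh advice can be illustrated with a toy forward pass. This is a minimal sketch, not code from the episode: the width, depth, and constant weight are hypothetical values chosen so that each layer has a gain of 2, and `peak_activation` is an illustrative helper. Under those assumptions ReLU lets the signal double at every layer, while tanh keeps it bounded below 1.

```python
import math

def peak_activation(x, depth, width, weight, activation):
    """Run x through `depth` identical dense layers; return the largest |value| seen."""
    peak = 0.0
    for _ in range(depth):
        # All weights are equal, so every unit receives the same weighted sum.
        pre = weight * sum(x)
        x = [activation(pre)] * width
        peak = max(peak, abs(x[0]))
    return peak

def relu(v):
    return max(0.0, v)

# Hypothetical demo values: per-layer gain is WIDTH * WEIGHT = 2.
WIDTH, DEPTH, WEIGHT = 4, 10, 0.5
x0 = [1.0] * WIDTH

relu_peak = peak_activation(x0, DEPTH, WIDTH, WEIGHT, relu)       # grows as 2 ** DEPTH
tanh_peak = peak_activation(x0, DEPTH, WIDTH, WEIGHT, math.tanh)  # stays below 1
print(f"ReLU peak: {relu_peak}, tanh peak: {tanh_peak:.3f}")
```

A real network would use random weights and a training loop; the point here is only that a saturating activation caps the forward signal when the layer gain is above 1.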
Softmax in deep layers, used as a projector, reduces network convergence time by forcing layer alignment.

Training the transformer one layer at a time, adding a copy of the first layer and freezing the first, accelerates training and improves retention.

In sequence-to-sequence tasks, a permuted linear dense layer solves 99% of problems in less time and better than the transformer.

Giant embeddings work like entire networks, needing only one activation at the end after reshaping.

"One tanh layer of 1 million units solves everything better than any other network with the same number of parameters, at least in the tasks I use. I always start with it, then see how to reduce."

Colabs:
Making a language model using Facebook's FastText:
https://colab.research.google.com/drive/1wVQrpzyY-SkCZTRZCcP6xJGRdg1ik0jR#scrollTo=pIW6-VwMuRlz
https://colab.research.google.com/drive/1oABIZr1xiIu7DKc7AbbZfeBJFLkZW6Ep#scrollTo=XfR-3PpPYoFU
https://github.com/rtkclouds/fast-js-language-model
https://x.com/canalCCore2/status/1735044356535472278?s=20

GitHub gists:
https://gist.github.com/rtkclouds/50b81d10736793f07cdca354516e8757
https://gist.github.com/rtkclouds/a6ee9afd96461ca94b3e9c22f78bda3a
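The "one wide tanh layer" starting point quoted in the notes can be sketched as a network with a single hidden tanh layer and a linear readout. This is an assumption-laden illustration, not the guest's code: the function names, the input and output sizes, and the initialization scale are all hypothetical, and no training is shown.

```python
import math
import random

def init_wide_tanh(n_in, n_hidden, n_out, seed=0):
    """Randomly initialize one hidden tanh layer followed by a linear readout."""
    rng = random.Random(seed)
    scale_in = 1.0 / math.sqrt(n_in)        # assumed scaling, keeps pre-activations moderate
    scale_out = 1.0 / math.sqrt(n_hidden)
    return {
        "w1": [[rng.gauss(0.0, scale_in) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.gauss(0.0, scale_out) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }

def forward(params, x):
    """Input -> single tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["w2"], params["b2"])
    ]

def count_params(n_in, n_hidden, n_out):
    """Weights plus biases for both layers."""
    return n_hidden * (n_in + 1) + n_out * (n_hidden + 1)

# Small hidden width so the demo runs instantly; the quote uses 1 million units.
params = init_wide_tanh(n_in=8, n_hidden=64, n_out=4)
print(forward(params, [0.5] * 8))
print(count_params(8, 1_000_000, 4))  # 13000004 parameters at the quoted width
```

Shrinking `n_hidden` after a first working run corresponds to the "then see how to reduce" step in the quote; `count_params` gives the budget to compare against other architectures.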
