Hey, welcome back to another episode of JavaScript Jabber.
This week on our panel we have Steve Edwards.
Yo yo yo, coming at you live from cold and sunny Portland.
We also have AJ O'Neal. Yo yo yo, coming at you live from the soldering station.
Oh sorry, I'm Charles Max Wood from Top End Devs, and yeah, it is freezing here anyway.
We have a special guest this week, and that is Ishan Anand.
Do you want to let people know who you are and what you do?
Yeah, and your course, because your course is awesome.
Oh, thank you. So, my name is Ishan Anand. I have about twenty years of engineering and product management experience, and I've been very focused on AI for the last couple of years. I'm best known in the AI community for an implementation of GPT-2, a precursor to ChatGPT, that I built entirely in Excel. Late last year I ported it to the web in pure JavaScript, and I teach how the entire transformer works. Basically, this is the model that was the ancestor of Gemini, Bard, Llama, ChatGPT, and Claude. They're all really inheriting from this model called GPT-2, and I teach people how it works over the course of two weeks.
Whether you have really no programming experience or you've got JavaScript programming experience, this is the best way to really get in and understand how these things work. They don't have to be a black box. You can see all that at Spreadsheets Are All You Need dot ai, and the class is on Maven.
Very cool. So let's dive in. First,
I think you said you had a promo code for the course, so let's just put that out there.
Yeah, people want to go get it and get a deal on it.
Yeah. So the promo code is really easy to remember. It's jsjer. Just go to Maven dot com and look for my name, or go to Spreadsheets Are All You Need dot ai and click through to the course, and you can use that promo code for twenty percent off for the next two weeks.
So awesome.
Definitely check that out. And I should just say, you know, thank you guys for having me. I listened for years to this, so it's great to actually meet you guys, well virtually in person.
Right. Yeah, AJ is the cool one. I just run the show anyway, and.
I'm just thinking guy while everybody else are the smart people according to some people.
Anyway. So let's dive in.
You said that you explain how the transformer works, so for those that are kind of new to AI, do you want to just explain what a transformer is in AI? Then we can dive into how stuff works.
Yeah, sure. So the transformer is an AI model architecture that came out in twenty seventeen, and it is the foundation for most of the AI models like ChatGPT. Those chatbot assistants that seem amazingly smart all inherit from this architecture called the Transformer. I can give a high-level overview of everything that goes into that, but the key thing the transformer does is take some input and try to predict what the next word is. That's really all your large language model is doing: taking one word, or really one token, at a time and trying to predict, when you enter a question, what the next thing is. And over the last couple of years, what we, collectively as humanity, have been able to do is take this model that tries to predict the next word and turn it into these really helpful, amazing chatbot assistants. The paper that introduced this model, the Transformer, was called Attention Is All You Need. That's where my course gets its name, Spreadsheets Are All You Need: I basically implemented that entire model inside a spreadsheet.
So, a question here then. Having used Google since its inception, type-ahead is sort of a standard thing in search, where you're typing and it's starting to anticipate what your phrase is, what you're going to type next. If I'm starting to search for spreadsheets on Google, it's going to anticipate, okay, what's the next thing I'm going to type? So is this basically the same thing just on AI steroids? Because basically that's using what people have typed in, and they've indexed it and done things with it. So is that sort of the same thing just on steroids, or is that intrinsically different?
Yes and no. In terms of effect, it is literally doing the same thing: it's trying to predict the next thing. I do get a little bit of mental pushback to just saying, oh, it's just like autocomplete. It is basically structured as an autocomplete problem, but the architecture used to solve that problem is a lot more complex. But it is trying to do the same thing. And the way to think about this is, if you can fill in the blank in any sentence, you probably know something about that sentence. You already know what the answer might be. That's a useful test of knowledge. But effectively, yeah, that is what's going on. It's just trying to predict the next word, and then the next word after that, and so forth, one at a time.
Right, and so effectively, I guess the autocompletes that we typically see are a little bit more naive than, say, the AI LLM models, which have substantially more data to run on and use a mechanism that I guess is probably somewhat the same, because it's weighted and things like that. But anyway,
it can do it across a wider variety of things and give you deeper answers.
Yeah, so actually, let's start with the autocomplete example, because it does kind of point the way to some parts of the architecture. The simplest thing you might do to build an autocomplete is say, if I see this word, what are all the likely next words after it? You could just do a statistical lookup across some large data set, and the more data you look at, the better its predictive value. This is called a bigram model, because it looks at two words at a time. Then what you could do is look three words back, or four words back. And one of the key things about the transformer is that it tries to look at all the words. This is what the attention mechanism does: it can figure out, essentially from all the possible words before it, what the next most likely word is. The other key thing you need to do is ask a neural network to take all that information and make a prediction. It turns out that's the heart of the transformer, and what really made it work was that they scaled it up to a much larger size than I think people were used to doing. The autocomplete on your keyboard is built to be really, really fast, so they tried to make it really efficient. What we've been able to do with the Transformer is make it really big and then actually make it super efficient, scaling it back down just so that it spits out tokens at a reasonable clip. But that core idea of saying, hey, let me look statistically at what the next thing is: well, one word back isn't going to be enough, and two words back is going to be better. That is what the attention mechanism is doing, in a sense, if you squint. It's trying to look at all the words that came before, it puts them through multiple passes, and then it asks a neural network to do the prediction rather than simply taking the raw statistics.
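The bigram idea described above, counting which word most often follows a given word, can be sketched in a few lines. This is only an illustration of the statistical lookup, not how a transformer works: the tiny corpus and the function names `train_bigram` and `predict_next` are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word in the corpus, which words follow it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up training text; a real autocomplete would use a far larger data set.
corpus = "mike is quick he moves quickly . anna is quick she moves quickly . bob moves around"
model = train_bigram(corpus)
print(predict_next(model, "moves"))
```

Because the lookup only sees one word of context, it cannot tell "he moves" apart from any other use of "moves", which is the limitation that looking further back is meant to address.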
But yeah, so do you kind of want to break down for us how these systems actually work.
Yeah. So the simplest model of the transformer that I like starts from what we said: these are trained to fill in the blank on a piece of text. The example I often use in my lectures and in a lot of my material is this very simple sentence, "Mike is quick, he moves," and the most likely completion would probably be he moves quickly, or he moves around, or he moves fast. So the basic question is, how do we get a computer to fill in the blank of an English sentence, or any natural language sentence? What we've been able to do is figure out how to talk in the language of the computer, which is math. If I gave the computer a math problem, two plus two equals blank, it could fill in the blank. It knows that two plus two equals four, and we can make the math pretty large and complex, but computers are really good at math. So what the model does is take a word problem and convert it to a math problem. If you go to my website, Spreadsheets Are All You Need dot ai slash GPT-2, or download the Excel file and look inside it, you type in text on one part of the spreadsheet and you get the predicted word at the other end of it. But if you look in between, you'll be like, where the heck are the words? It's all numbers. So the key insight is that we've taken something that is a word problem and turned words into math. That mapping process of words into math has two stages, called tokenization and embeddings. At the end of it, we've mapped every word, which you can conceptually think of as mapping to a single number, but we actually map each one to a large list of numbers. Once your entire prompt has been turned into a large list of numbers, we run what I just call number crunching. It's two key mechanisms, attention and a multilayer perceptron, or neural network, that crunch on it to try to predict what the next word is.
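The two mapping stages named above, tokenization and embeddings, might look roughly like this in miniature. Everything here is an assumption for illustration: the five-word `VOCAB`, the three-number rows in `EMBEDDINGS`, and the whole-word splitting. GPT-2 itself uses byte pair encoding over roughly fifty thousand subword tokens, and its smallest version maps each token to 768 numbers.

```python
# Hypothetical vocabulary: word -> token id (real tokenizers work on subwords).
VOCAB = {"mike": 0, "is": 1, "quick": 2, "he": 3, "moves": 4}

# Hypothetical embedding table: one row of made-up numbers per token id.
EMBEDDINGS = [
    [0.2, -0.1, 0.7],   # mike
    [0.0, 0.4, -0.3],   # is
    [0.9, 0.1, 0.2],    # quick
    [0.1, -0.5, 0.6],   # he
    [0.8, 0.3, -0.2],   # moves
]

def tokenize(text):
    """Stage 1: turn text into a list of integer token ids."""
    cleaned = text.lower().replace(",", " ")
    return [VOCAB[word] for word in cleaned.split()]

def embed(token_ids):
    """Stage 2: look up the list of numbers for each token id."""
    return [EMBEDDINGS[token_id] for token_id in token_ids]

token_ids = tokenize("Mike is quick, he moves")
vectors = embed(token_ids)
print(token_ids)
print(len(vectors), "vectors of length", len(vectors[0]))
```

After these two steps the prompt is purely numeric, which is what the attention and neural network stages then operate on.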
Then at the end of that we get a number, and we reverse the process: we ask, what word does this number map to? That number is a predicted word, but it's not going to map cleanly to a single word in our vocabulary. In the case of "Mike is quick, he moves," the predicted number might be really close to the word quickly. It might be close to the word around. But quick can also be a body part, the quick of your fingernail, and the prediction is not going to be close to something about your fingernail, because the model has figured out enough to move the predicted number away from that. So we take that and run a random number generator at the very end, and we pick a word according to that random number generator, based on how close the predicted number is to each of the words in the dictionary mapping words to numbers. That's my highest-level summary of what's happening under the hood of the transformer, without describing all the mechanisms. Again, the key thing is that we found a way to solve this problem numerically. We map words to numbers. We turn your entire prompt into a large list of numbers. We number crunch on it. Then we get a predicted number out of it, and we calculate how close that number is to each entry in our number-to-word mapping at the very end, and that gives you the probability of getting a particular token or word out of the model. Let me pause there and see if there are questions or things I should clarify.
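The last step described above, turning "how close is the predicted number to each word" into probabilities and then picking with a random number generator, can be sketched as follows. The two-number word vectors and the `predicted` vector are invented for the example, and using a dot product followed by softmax as the closeness measure is an assumption about the details, not something stated in the conversation.

```python
import math
import random

def dot(a, b):
    """Similarity score between two lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_probabilities(predicted, word_vectors):
    """Score every word by closeness to the prediction, then normalize."""
    words = list(word_vectors)
    scores = [dot(predicted, word_vectors[w]) for w in words]
    return dict(zip(words, softmax(scores)))

def sample(probabilities, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probabilities)
    weights = [probabilities[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical word vectors and a hypothetical model output for "he moves ...".
WORD_VECTORS = {
    "quickly": [0.9, 0.1],
    "around": [0.7, 0.3],
    "fingernail": [-0.8, 0.5],
}
predicted = [1.0, 0.2]

probs = next_word_probabilities(predicted, WORD_VECTORS)
print(probs)
print(sample(probs, random.Random(0)))
```

Because the pick is random but weighted, the same prompt can produce "quickly" on one run and "around" on another, while "fingernail" stays unlikely.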
I think I follow along.
Essentially what you're saying then is, let's say I wanted it to generate a whole paragraph.
It just does this over and over and over again to get the next word.
Yeah, maybe I glossed over that part. The large language model only predicts the next word, technically something called a token, which is slightly smaller than a word. It doesn't by default predict paragraphs. So if you try my app or download the spreadsheet, it only predicts one token. The way we get paragraphs of text out of this is we take the predicted token, stick it back onto the input, and then ask the model to predict the next token for that new, accumulated text. So you can start with a single word and ask it to predict the next word, and now you've got two words, and then you run it through again, and you keep going. What happens when you've got user input, like somebody types a response, is you just stick that entire user input in as a large set of words, and the model needs to predict what the next thing is. You can think about it as structured for the model like this: you are reading a transcript between a user and a helpful chatbot assistant. User said X, and we fill in what the user said. Assistant said, and then it needs to come up with what the assistant said, and it just tries to come up with something plausible. Maybe the thing to do is step back. The base model that gets trained in this process, before it's turned into a helpful chatbot, just knows how to complete sentences. If you take the base GPT-2 and type questions into it, it's not necessarily going to respond to you meaningfully. It's just designed to predict the next word based on everything it's seen on the internet. A good example I use in class: we type in the words first name and hit return, and what do you think it would predict after that? It predicts last name, email address, phone number, because statistically, most text on the internet that says first name is a form, and it's used to just filling out forms. Another one: I type in hello class, and when I first did this, I thought it was going to say hello teacher, but it actually starts spitting out Java code. Yeah, it's really amusing to watch, and you can just run it. It's just trying to predict what the next thing is based on what it saw on the internet.
And then what OpenAI and Anthropic and these companies do is take what they call a base model, which only knows how to predict the next word, and put it through a training regime to elicit it to be more like a helpful chatbot. You give it a system prompt that tells it it's a chatbot. It's kind of like you tell it a story that makes it plausible for it to start to think it's talking to a user: you are a chatbot, you are reading a transcript of a chatbot and a human user. We just fill in what the human said, and it tries to fill in what it thinks the helpful assistant would say, and then they fine-tune it to get better at that.
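The loop described above, predict one token, append it to the input, and predict again, together with the transcript framing for chat, can be sketched like this. The lookup table `TOY_NEXT` stands in for a real model and is purely hypothetical, as are the function names and the exact "User:" and "Assistant:" labels.

```python
def generate(predict_next_token, prompt_tokens, max_new_tokens):
    """Repeatedly predict one token and feed the longer sequence back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)
        if next_token is None:  # the stand-in model has nothing more to say
            break
        tokens.append(next_token)
    return tokens

# Hypothetical stand-in for a model: it only looks at the last token.
TOY_NEXT = {"mike": "is", "is": "quick", "quick": "he", "he": "moves", "moves": "quickly"}

def toy_predict(tokens):
    return TOY_NEXT.get(tokens[-1])

def build_chat_prompt(system_prompt, user_message):
    """Frame a chat as a transcript for a next-token predictor to continue."""
    return f"{system_prompt}\nUser: {user_message}\nAssistant:"

print(" ".join(generate(toy_predict, ["mike"], max_new_tokens=10)))
print(build_chat_prompt(
    "You are reading a transcript between a user and a helpful chatbot assistant.",
    "Hello class",
))
```

The prompt deliberately ends at "Assistant:", so the most plausible continuation is the assistant's reply rather than, say, more form fields or Java code.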
Yeah, this sounds a lot like how you get into prompt engineering. Again, if you're not into AI, prompt engineering is all the stuff I tell the AI system so that it'll give me the answer I want, right? So when we're talking about prompt engineering, this is why when I start out, I tell it things like, like you said, you are a chatbot, you help people with these problems, you do these kinds of things, because it'll build off of all of that and use the statistical model, now with the context of what you typed in, to give you the right answer.
So, you know, hello class: there's not a whole lot there for it to go on.
But if you tell it you're a chatbot and you're helping students with blah blah blah, and then you type in hello class, it may come back with hello teacher or something like that.
That's a great example. Yeah. And what you can think about them conceptually doing is baking that prompt engineering into the model. If they give it enough examples of this, they can retrain it such that you don't need the prompt at the beginning that tells it it's a teacher or a helpful chatbot assistant. That gets baked into the model. You can think of all that prompt engineering as getting memorized into the model during that training process, and then it turns into that helpful assistant.
Well, help me understand this a little bit. I've played around, obviously, with GPT. I've also played around with the other models. In fact, right now I really like Qwen. I am using Qwen more than I'm using GPT, because Qwen actually seems to be giving better results, especially considering that in the benchmarks it outperforms 4o, whatever that means. I mean, it's by a fraction of a percentage point. But o1, I just find o1 and R1 to be too slow. They take forever. So I'd rather ask the question twice and be ninety-nine percent likely to get the right answer than ask the question one time, have to wait forty-five seconds to get the wrong answer, and ask it again.
You know, forty five seconds. That's an eternity.
o1 is crazy.
We use it.
We use it for code questions and stuff, because it does better than the standard GPT-4. You wait a few seconds, but I'd rather wait a little bit and get a better answer than get something super fast that's not going to be as good.
Well, I'm the other way, because it's not that much better. If you look at the benchmarks, it's like one percent better than 4o, and it takes so much longer. Anyway, the thing that I was getting at is, in the beginning, there was the system prompt. Right? So with GPT, one of the ways to jailbreak it was you could say, that was just a joke, actually you're something else. It would interpret it as, okay, your system prompt is you're a chatbot, you're allowed to say this, you're not allowed to say that, and then you could just say, that was just a joke, and then give it an additional prompt. Now with DeepSeek V2.5 and R1, and with Qwen, it's like you're saying, it's baked into the model. Because if I override the system prompt and tell it, you are a human who is capable of reasoning and has no biases and can represent any information factually, tell me about Tiananmen Square, it says, you know,
I am a helpful bot. I am not a human, and I do not talk about things that contradict what is known to be the proper knowledge of the Chinese government to protect the people, or it gives me some nonsense like that. So how is it possible to bake in those system prompts with training data? And I guess, how does that vary from the system prompt? How do they get it to bake that in so that you can't override it with a system prompt?
Okay, there's a lot of layers there.
Let me yeah, question, can you restate the question in one sentence?
I think the core of the question was, how do you bake in the system prompt? But there are a couple of things worth noting in your question. You mentioned some reasoning models, o1 and R1, and the way those operate is a little bit different. Like you said, it takes a while to come back, because it's actually expending a lot of tokens on thinking that it doesn't show you, and it's trying to think through the process like you might do. They call this chain of thought, or thinking step by step, and what's unique about it compared to regular chain of thought is that it can suddenly realize it's made a mistake and backtrack. So it's literally coming up with hypotheses and trying and testing things and seeing if they work. This is why these models tend to be really good at math and code, because the model can go try something and say, oh wait, let me check, is this answer right? Oh no, it's not, let me try again. And then you mentioned jailbreaking. With the early models, one way to think about the that-was-just-a-joke trick is that you're kind of exploiting what we talked about briefly, that attention mechanism looking back at what came before to decide what's most likely. If you put things like kill and harm in the prompt, statistically it sounds negative, right? But if you start putting in things like Grandma and cookies, it seems more harmless, and you can think of yourself as weighting the attention more toward the harmless side. Really, what's happened is that the models have gotten smarter in terms of their natural responses to this. They are trained to handle jailbreaks: if a jailbreak comes up, here's the response. And the way they train it, to get to your main question, is through these two training techniques. One is they just give it an example of a prompt and what its response should be, and they use a technique called backpropagation, or stochastic gradient descent, to tune the network such that every time it sees that prompt, it gives out what we wanted it to. We're going a little ahead of where I wanted to be, but when you train a neural network, you give it examples of data. The simple example is a dog and cat classifier.
Right.
I give it pictures of dogs, and I give it pictures of cats, and I tell it which ones are dogs and which are cats, and it comes up with the rules for how to figure out whether an image is a dog or a cat. This is way different from regular programming, right? Machine learning inverts the normal paradigm. Normally, as developers, we write a series of rules, a program, and then it processes data and gets out a result. I click a button, something moves on the screen. I can write that program. But a dog and cat photo classifier? If you gave me photos of dogs and cats, I know how to do that instinctively, but I don't know how to write it out as a series of rules. So the inversion that machine learning does is you give it answers and you give it data, and it figures out how to write the rules. It writes the program. Now, unfortunately, we can't always understand the program it comes up with. But what they do is give it examples of jailbreak attempts and say, hey, your response to this should now be that. That's the high-level overview of how they do that. One thing worth noting is that when they protect a model, the protection isn't just in the model itself. There are usually additional classifiers watching the output of the model. So sometimes you might see open source models that let you do things the hosted versions do not, because the hosted system is actually checking. Not everything is baked into the model itself, and it's not one hundred percent perfect, so often there are additional guardrails detecting things.
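The inversion described above, supplying data and answers and letting training find the rules, can be shown with a deliberately tiny classifier. This is a sketch under heavy assumptions: real image classifiers learn from pixels with deep networks and backpropagation, while here each photo is reduced to two invented features and a single-layer model is tuned by stochastic gradient descent. The feature names, values, and hyperparameters are all hypothetical.

```python
import math
import random

# Hypothetical features per photo: [ear_pointiness, snout_length].
# Label 1 means cat, label 0 means dog.
EXAMPLES = [
    ([0.9, 0.2], 1), ([0.8, 0.3], 1),
    ([0.2, 0.9], 0), ([0.3, 0.8], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Probability that the photo is a cat under the current parameters."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(examples, epochs=200, learning_rate=0.5, seed=0):
    """Nudge the parameters after each example to shrink the prediction error."""
    rng = random.Random(seed)
    weights, bias = [0.0, 0.0], 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in random order each pass
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def classify(weights, bias, features):
    return "cat" if predict(weights, bias, features) >= 0.5 else "dog"

weights, bias = train(EXAMPLES)
print(weights, bias)
print(classify(weights, bias, [0.85, 0.25]))
print(classify(weights, bias, [0.25, 0.85]))
```

Nobody wrote a rule such as "pointy ears mean cat"; the learned numbers in `weights` and `bias` are the rule, which is also why such rules can be hard to read back out of a large model.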
Okay, So then two more questions.
Yeah, what constitutes open source because that does not mean the same thing that it means in the programming circles, or.
I don't believe it does because I have not yet seen any open source model that comes with four hundred terabytes of training data.
They are few and far between, but there are some. OLMo is probably the best known one, a model where everything, the training code, the training data, the data collection pipeline, the logs from their training runs, is completely open. There's a handful of others. But this question of what constitutes open is completely a gray area, and it's being debated right now. Traditionally, when people talk about an open model, it's usually an open-weight model, which means you get the parameters, which encompass the rules we talked about earlier. That whole thing is math, right? So if you open up my spreadsheet or my website, you just see lots of numbers. Whether those numbers are hidden from you or you can run them yourself is what defines what people call an open-weight model. That's what kind of passes for open source these days. There are very few models that open up the training data, so people do debate what a truly open model means. The most open kind is one that includes the data, but there aren't that many, especially at the state of the art, where all the training data that created the model is there.
Well, I mean, that would be highly illegal for ChatGPT to get as their training data, because YouTube and all the libraries on planet Earth and everyone who has a copyright on something would have something to say about that.
Well, I'm not going to comment on any particular model providers specifically. I will say the question of whether you can train on data, and whether it's transformative, is quite frankly still in the courts right now, and we don't have global consensus. I believe Japan has clarified that you can train on data, that the training process is not directly infringement. One of the litmus tests is whether you're competing with the original thing. So it's a larger open question, and right now it's making its way through the courts. Candidly, if you said, here are all the things I've trained on, then you might just open yourself up to more people who can say, oh, let me get onto that lawsuit. But that's a legal question, which...
Yeah, yeah, okay. So my other question, related to what we had just talked about: I download Qwen, and I don't know what it is that I'm actually downloading, because I use Ollama, and it downloads twenty gigabytes of something, and then it runs it, and I get to be productive and I'm happy. But like you're saying, it's not just the model giving a response; there's other code that is doing some sort of check. Is that happening with these models when I'm using generic tools like Ollama or llama.cpp or LM Studio? Is that actually running program code? Binary code? Or I guess it's not binary code. It'd have to be bytecode, because I can do it on Mac and I can do it on Linux, and it doesn't have to recompile anything after it's done downloading. So where are those extra layers, or how are they interpreted?
The extra layers that are protecting the model from saying the wrong thing? Is that what you're asking? Yeah, those aren't there when you download an open source model and run it in Ollama. The only thing protecting the model at that point is what was baked into it during training. They're not doing any additional checks. With the hosted model, there are additional layers, because they control the infrastructure, and they're watching what the model says and stopping it. But typically when you use LM Studio or Ollama, you're just getting the bare uncensored model, and there are no additional checks. The only thing preventing the model from saying the wrong thing, being harmful or unhelpful, is the training the model was put through. And maybe it's worth saying that when you download the model, you're basically downloading a large list of numbers and a mathematical graph that says how to combine the numbers together. It says, here's how you map the words to numbers. You get that mapping. Then once you've mapped them to numbers, it says, add this here, multiply by this other number here, take the square root of this other number, and then multiply again. It's just a list of calculations. It's a really simple program. In fact, it's worth stating that most of the knowledge is not in the code. And this gets back to your question about open source: the knowledge is inside the data, inside the parameters. As an example, GPT-2, which was at one point considered too dangerous to release, and it's amazing...
Yeah, only that's because they want regulatory capture, not because they actually believe it's dangerous.
Well, at the time maybe they were concerned about disinformation, but suffice to say it was still a powerful model in its day. My point is, it's only five hundred lines of code. If you take out the TensorFlow library, it's five hundred lines of code. It is astonishingly small. And the reason I reimplemented the entire thing in JavaScript is that I want to push back against this idea that this stuff is too hard for you to learn. If you're a web developer, you can learn five hundred lines of code. I give you the grounding, and I reimplemented the entire thing in JavaScript so you can step through it. You don't even have to leave your web browser, right? You just use the web debugger and you can step through what's happening, and it's astonishingly small. All the knowledge, all the rules, are captured in the weights and the parameters of the model. So when you download the model, it's just more and more numbers with a larger and larger computational graph. And that's how we get it smarter. That gets back to the core thing to understand: we took a word problem and we mapped it to a number problem. So if we get a bigger calculator, we get a better result.
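The split described above, a small fixed program plus a large list of numbers, can be illustrated with a toy "model file". The format here is invented for the example and is not the layout any real tool uses: `params` holds the numbers, `graph` lists the calculations in order, and `run_graph` is the entire program.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def run_graph(graph, params, x):
    """Apply each listed calculation in order; the code never changes, only the numbers do."""
    for op, name in graph:
        if op == "matmul":
            x = matvec(params[name], x)
        elif op == "add":
            x = [a + b for a, b in zip(x, params[name])]
        elif op == "relu":
            x = [max(0.0, a) for a in x]
        else:
            raise ValueError(f"unknown operation: {op}")
    return x

# Hypothetical "downloaded model": made-up parameters and a three-step graph.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -1.0],
}
graph = [("matmul", "w1"), ("add", "b1"), ("relu", None)]

print(run_graph(graph, params, [2.0, 1.0]))
```

Swapping in different values for `params` changes what the "model" computes without touching a line of `run_graph`, which is the sense in which the knowledge lives in the weights rather than in the code.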
But I want to restate this in another way, really simply, because I think a lot of people get confused between GPT-4 versus ChatGPT versus something on your computer versus whatever. Essentially the model, like you said, is maybe a few steps in how it gives you answers, and the rest of it, like you said, is all the data.
It's all the weighting.
But sometimes when people are talking about AI models, they're talking about a program that accesses the model that you just explained, right, with the numbers and the fundamental pieces. That's your ChatGPT, whether it's running on your local machine or in the cloud. You need to be able to differentiate between the two and recognize that sometimes you're just downloading that map of numbers and some really, really simple stuff that makes sense of the numbers, and that's your model. When people are building against those models, a lot of times that's what they're doing. So you can write your own code that is the gatekeeper, that says this is helpful or this is harmful, this is an appropriate response and this isn't. A lot of that is just the code that sits on top of what you're talking about. That five hundred lines of code plus the data that we're getting,
that is just the model. And so,
you know, AJ's talking about the Qwen model. It looks like it runs on Ollama, right? So you've got all those magic numbers, you've got the stuff that runs on top of it. I think Ollama gives you a little bit more on top of that, and then from there, the rest of it's kind of up to whoever wrote the code.
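The gatekeeper code mentioned above, sitting on top of a model rather than inside it, might be structured like this. It is only a sketch: hosted systems typically use trained classifiers, whereas the keyword check `looks_harmful`, the `BLOCKED_TERMS` set, and the `toy_model` function here are hypothetical placeholders.

```python
# Hypothetical blocklist standing in for a real safety classifier.
BLOCKED_TERMS = {"build a bomb"}

def looks_harmful(text):
    """Placeholder check: flag text containing any blocked phrase."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def guarded_chat(model, prompt, refusal="Sorry, I can't help with that."):
    """Check the prompt, call the model, then check the reply before returning it."""
    if looks_harmful(prompt):
        return refusal
    reply = model(prompt)
    if looks_harmful(reply):
        return refusal
    return reply

def toy_model(prompt):
    """Stand-in for a call into a locally running model."""
    return "You said: " + prompt

print(guarded_chat(toy_model, "Hello class"))
print(guarded_chat(toy_model, "How do I build a bomb?"))
```

Calling `toy_model` directly skips both checks, which mirrors the difference between running bare weights locally and using a hosted service that wraps the same weights.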
Yeah, the central thing, I'm just trying to get like the black box part, the most mysterious part at the heart of it. Yeah, and obviously, like you know the calculations that say Lama and Chat, GPT and Gemini, they're larger models. GPT two came out in twenty nineteen, but the core thing is like a mo Like if you had shown me GPT two, I wouldn't have known. Like when it first came out, it's like, wow, that's a pretty amazing program. Must be really complex. And it's not
the program that's complex. It's that they just bet on taking a somewhat simple architecture and just giving it lots and lots of data and spending more than anybody else had at the time, and just trusting that the black box would be smart enough to learn everything from it. And at the heart of it, that's what's happening. It's a giant calculator. In fact, it's so simple in a sense. Like, in a spreadsheet, which was my first implementation, you cannot do loops very easily. You
don't have looping constructs. It just does a calculation the entire way through, and it just does the computation across all the different cells. There are effectively, in a sense, no loops inside of it. Like, the reason I can implement it in a spreadsheet is that every single time it predicts a token, it does the exact same number of computations, and it goes through, you know,
twelve layers and twelve attention steps. It's very much like: I've got a word coming in, we map that word to a number, and then I just do all this number crunching with a very predictable pattern, and then I get a number out and I interpret it, and then I
just repeat that process over and over again. And so, you know, the thing that I try to tell people is, when you look online and people say, I want to get into AI and stuff, they're often presented with: okay, go learn all this linear algebra. You need to make sure you're solid on your calculus. And there's like six to eighteen months of prep before you
get to understanding how a large language model works. And that's valid if you're going to be a machine learning researcher, and machine learning is a huge, giant field beyond just ChatGPT, right? There's anomaly detection, there's clustering, there's a lot of algorithms in there. But my goal is to just help people understand how these amazing, arguably Nobel Prize winning programs work in as short a time as possible, to the extent that I don't even begin where
a normal machine learning class begins. A normal machine learning class starts with, like, regression, and it slowly works its way up, and maybe you'll get to the LLMs. And I'm like, this is a five hundred line program. Just start with: here's how it starts. And anytime we run into something you don't understand, in my class I give you the background to understand it, and then we move on to the next piece. And so it's
really designed to be as efficient as possible. And I think when you tell people it's five hundred lines, they're like, oh, yeah, okay, I can understand how that works. And this gets to knowing your tools. I'll make an analogy: you don't necessarily need to know how an AI model works to use it, just like you don't necessarily need to have a good model for the difference between the CPU or disk
memory versus bandwidth versus system memory. But if you're debugging, you know, a program, it helps to have that mental model when you run into an issue. Or maybe a more tangible example for this audience is knowing how React works on the inside. At some point, if you don't understand hydration, you're going to run into a wall, right? And the same thing is true here: you get these parameters from Ollama, what are they doing? You need to have a mental model for how it works.
And I don't think that mental model is as hard as a lot of people make it out to be.
Yeah, when I talk to people about doing AI, and I've talked to a whole bunch of people that are business people and a whole bunch of people that are programmers, I have some of the same conversation. It basically comes down to: well, are you going to
build your own model, right? Are you going to take your own data and cram it in and expect it to give you answers on the other end, or are you going to use something that already exists, like the ChatGPTs or the GPT-4s or the Ollamas or whatever, right, and then build
on top of it. Because once you're building on top of it and you're not worried about, Okay, how do I put this together, then it's essentially okay, Like you're saying, I understand how the machine works, and then I understand how to talk to it. Right, so I understand what the APIs are and the rest of it is, then okay, what do I want from this?
And how do I validate that I got it?
Yeah, actually, that's a really important point. The number one skill may not be understanding every single detail of the calculation. The number one skill when dealing with an AI model is that last thing you talked about: how do I evaluate it? So the name that you hear in the AI community is evals, but as a web developer, you can think of these as tests. And the key difference between AI evals and tests is that
you don't expect one hundred percent pass, right? These are statistical, probabilistic machines. But the number one thing, like when you read about benchmarks, and AJ, you talked about benchmarks, is you basically need to build the benchmarks for your particular problem. The benchmark may say some model is better than another, but when you actually use it for your problem, you
suddenly discover it's not good. And so the first thing you should do is come up with your own benchmark, your own evals for the problem, and then try a bunch of models and see which one works the best. And then you can start iterating, whether that iterating is changing the prompt, whether it's changing the model, or saying I'm going to go off and fine-tune my own model.
But you won't be able to make a judgment until you're able to look across the distribution of your task, all the different ways your task happens, and whether it's successful or not, because you're dealing with highly variable machines. One of the folks who was in the audience for one of my past talks had a really good analogy. He's like, imagine a database that
was wrong five percent of the time. Like, as developers, we are not used to having levels of uncertainty like this within our systems, unless maybe you're using distributed systems where there's all sorts of race conditions and stuff like that. But we're used to sanitizing the user input and then once we get the user input. Everything is predictable after that. But here it's like suddenly we've got a database that sometimes it's wrong, and so that's where you need to
put all sorts of checks and guardrails. And you're dealing with this really smart but sometimes fallible thing, like a human, I hate to say, anthropomorphizing it, and so how you build a system around that is going to be different
than how you build a regular system. But it all starts with that key idea that you just talked about, which is about being able to evaluate mathematically how well your model or your system is doing. And on the question about whether you should build your own model or not, the usual hierarchy of needs is: first start with an off the shelf model. It could be open source, it
could be one of the providers. It's actually probably easiest to start with a hosted model and just see if you can get it to work, because they'll be state of the art and you don't have to worry about all the stuff around hosting and inference. And then the next thing to try is tuning your prompts. So try prompt engineering your way there: give it some examples, try some variety of prompt engineering, and then maybe consider fine-tuning it.
And again, you can fine-tune most of the hosted models, you don't have to go to an open source model, but you could do that as well. And then the idea of building your own model is extremely hard. You know, the amount of dollars that goes into building your own model from scratch is now over one hundred million. The estimates for, say, Llama were, I think, over one hundred million to build that model, and so it's a lot of work, and that's probably best left to the frontier labs.
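To make the evals idea from this answer concrete, here is a minimal Python sketch of a tiny eval harness. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for a real model call, the four cases are made up, and the substring check and the 0.5 threshold are arbitrary choices rather than anything the speakers prescribe.

```python
from typing import Callable

def run_evals(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose expected text appears in the answer."""
    passed = 0
    for prompt, expected in cases:
        answer = model(prompt)
        if expected.lower() in answer.lower():
            passed += 1
    return passed / len(cases)

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model; a real one would call an API."""
    canned = {
        "What is the capital of France?": "The capital is Paris.",
        "What is 2 + 2?": "That would be 4.",
        "Who wrote Hamlet?": "I think it was Marlowe.",  # deliberately wrong
    }
    return canned.get(prompt, "I don't know.")

# Made-up eval cases: (prompt, text we expect to see in the answer).
cases = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
    ("What is the boiling point of water in Celsius?", "100"),
]

PASS_THRESHOLD = 0.5  # assumed bar; unlike unit tests, it is not 100 percent
score = run_evals(fake_model, cases)
print(f"pass rate: {score:.0%}")
print("good enough" if score >= PASS_THRESHOLD else "try another model or prompt")
```

Swapping `fake_model` for different models or prompts and comparing the pass rates is the "try a bunch and see which works best" loop described above.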
Yeah, is that GPU cost or where is that number coming from?
That's a great question because these are all estimates. You know, we don't know for sure, but obviously some of it is the GPU cost, some of it is the infrastructure cost, some of it's the talent. The other key thing to keep in mind is when you're training a model, you
don't always know how it's going to turn out. What they actually do is they do a large series of smaller runs to establish some type of pattern or scaling law to figure out how they're going to design the model, which architectures seem to work better, which parameters matter more.
So there's something called a learning rate, for example, that they have to adjust, and they have a schedule for it, and they're trying to figure out against evals against the benchmarks we talked about, like which one seems to make the model smarter. And so there's a lot of trials and attempts. So it's not just necessarily one whole shot of training. It's a lot of experiments that they have
to do. A lot of how the model is going to behave is surprisingly empirical, and so they're doing experiments and they're trying that again, so there's a variety of things, and empower is non trivial. Another thing that's important to understand is the level of scale of data that these
frontier labs are dealing with. And there's a really good analogy from the Anthropic guys, actually. One of the things they have to do is randomize the data so the model doesn't learn arbitrary patterns in the order of the data you gave it. And so one of their research engineers gave this great example: okay, randomizing sounds like it should be easy. Like, take a deck of cards. If I tell you to shuffle it,
it's fairly easy. But imagine I gave you, like, seven warehouses full of decks of cards and you need to shuffle them by hand. It's not quite clear what policy or process you're going to use to make sure you hit all of them and you've evenly shuffled them. And the size of the data that these guys are working with is almost like that: to the CPU, it's like seven warehouses of data, compared to you manually shuffling your deck.
And so when you're dealing with data at this large infrastructure scale, scale itself makes every little thing harder, and so that also adds some difficulty to this. So should I, you know, walk through a little more detail of what's happening in that mathematical calculation? Or I'm happy to answer additional questions.
Yeah, that's what I was going to ask
next. Yeah, because you've mentioned you've got different layers or different steps in the process you explain in the video. The video is a little longer, I guess, than we have time to go over at this point, but yeah, if you can give people kind of an overview of how the LLM system actually works.
Yeah, so while you're doing that, could you distinguish between the different types of training, like the RAG versus the fine-tuning versus the
base? Yeah, okay. I don't think of RAG as training, but maybe we should step back and explain to the audience who isn't familiar with RAG what it is. So you can kind of think of RAG as a sort of prompt engineering technique. You want the model to answer questions about something it wasn't trained on. So imagine I'm, you know, a smart home electronics company, and all of the documentation about my product was behind some firewall or behind a login, and
so I know that, let's call it ChatGPT, was never trained on it. But I want to build a chatbot where customers come and say, hey, I can't configure this setting on it. How am I going to get a chatbot to do that without having to retrain it specifically on my data? So what you can do is, when a request comes in and somebody's like, well, how do I change the color on my smart light bulb,
it'll go and search through my data. I can take that request from my user on my chatbot and say, I see the words light bulb, I see change color, and I'll search all my documentation. And I'm not going to search it with just a plain text search. I'll use what's called a semantic search. So it'll find things that are similar to the word light, like the word bright, even though it's not anywhere close
to the same characters. So it'll find all the similar passages and pull those out, and then it will give the model: here are relevant passages, here's the user's question, how do I change the color on my smart light bulb? It will take paragraphs, chunks of text from my documentation, and put those at the beginning of the prompt. So you've got a prompt that's structured with the user's question, and then it's got some chunks of data that came right
from my documentation. And then we tell the model, you know, come up with an answer to the user's question using these chunks of data I gave you, and it will be able to think over those passages and find the ones that are relevant and then give the answer out.
And that's called retrieval augmented generation. So, retrieval, because you're taking the user's query, you're pulling in data that the model didn't have during training, and you're passing it into the prompt and then asking the model to answer it. And so it's a very low friction way to take a model off the shelf and make it understand all your stuff even though it wasn't in the training data. So that's what RAG is.
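The retrieve-then-prompt flow described here can be sketched in a few lines of Python. This is an illustration under loud assumptions: the three documentation passages are invented, the prompt wording and its ordering are arbitrary, and the retrieval step uses plain word overlap as a stand-in. A real system would use the semantic (embedding-based) search mentioned above so that "bright" could match "light".

```python
import re

# Invented documentation passages for a hypothetical smart home company.
DOCS = [
    "To change the color of your smart light bulb, open the app and tap the color wheel.",
    "To reset the thermostat, hold the power button for ten seconds.",
    "The smart light bulb supports dimming from the brightness slider in the app.",
]

def words(text: str) -> set[str]:
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most words with the question.

    Word overlap is a stand-in here; it is NOT semantic search.
    """
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble instructions, retrieved passages, and the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the user's question using only these passages.\n"
        f"Passages:\n{context}\n"
        f"Question: {question}\n"
    )

question = "How do I change the color on my smart light bulb?"
passages = retrieve(question, DOCS)
prompt = build_prompt(question, passages)
print(prompt)  # this string is what would be sent to the off-the-shelf model
```

Note that nothing about the model changes: the only thing that differs from a normal request is the extra text placed in the prompt.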
The short version is you're building context out of a database that you already
have. Great, great summary, thank you. So, yeah, you're giving it the context it didn't have during training. To answer the question on training: there's a variety of steps in the model where it's trained. Well, let me think of the best way to explain this. I'll discuss training when I get to, call it, the fourth step of the model, and I'll talk about how it gets trained in a second. But let me walk through the five steps of what happens when you
input text into the model. So the first thing you do is input a passage. The one I like to use is "Mike is quick. He moves", and then the completion we leave to the model to fill in the blank: "quickly." The first thing it's going to do is break the text into subword units. So you might think it would break it into characters, and you might
think it might break it into words. Breaking into characters would be like ASCII, and breaking into words would be just giving every word its own entry in a dictionary. It turns out if you break it into words, you can't handle unknown words very well, and
you can't handle spellings you weren't planning on, especially if you're going across multiple languages. And if you break it into characters, it turns out it's really hard and takes a lot of compute for the model to learn purely from characters, although some research has been able to do it. So what they do is a kind of Goldilocks: they say, okay, let's break it into these little pieces of words. And if you think about it, as a human, you actually do this. So one of the examples I
use is the word "flavorize." It's actually not a word in the dictionary, but you know what it means because you know what flavor means, and you know what "-ize" as a suffix means, and so the model kind of does that. Now, I want to be clear: the tokens that it comes up with, as these subword pieces are called, don't map to any human sense of meaning. There are some that do, like "ize" turns out to be a token, but it's by coincidence or correlation, not like
it's trying to understand human English at this stage. So it breaks it into these Yeah.
Gosh, can I just say that in a different way too? Effectively, what it does is it breaks it up into pieces that have meaning, right? Because when we're looking for the output, we're looking for output that has meaning, and we group words or ideas together that give it meaning. And so it's doing the same thing. It's breaking it up, right? Like you said, flavor has a meaning, "-ize" has a meaning.
You know, the other words in there have meaning. And so that's the approach that it kind of takes when it's breaking it into tokens, kind of?
Sort of, kind of. It's a decent mental model. But the reason I stress that it's not trying to match human meaning is because, at this stage of the model, it's actually not trying to assign meaning. In fact, what it's really trying to do is take all the text on the Internet that it's trying to train on and compress it to the most efficient representation so that the training can be as efficient as possible.
And that's why the tokens don't always map to what you'd expect, and why... So this
is different than what a full text search database would do? Because a full text search database, with the example you gave, flavorize, yeah, a full text search database is going to break it up that same way. But this is different than the way a full text search database would break it up?
Yes, it is different, and it's very dependent on the data it was trained on. A good example I use is the word reinjury. As a human, you would think it was "re" and "injury," but if you actually put it through the GPT-2 tokenizer, it splits it as "rein" and "jury." And the reason it decided to do that is simply because of the greater, you know, occurrence of the word jury by itself than injury, and so it decided that was the
more efficient representation. And I want to be clear, this isn't about representing your prompt efficiently. This is about efficiently representing all the text it's going to train on, the stuff you don't see, the stuff that, like you were talking about, nobody releases. That's what it's really based on. And it's really a compression of all the text. So it's just got a really efficient representation. So think about it this way.
If it's going to compress all the text, then, you know, if it gets down to, say, ten thousand or fifty thousand tokens, then it only has to learn ten thousand or fifty thousand concepts in a sense. That's a gross oversimplification, but that's what it's trying to do: it's trying to reduce the number of things it needs to learn, essentially the number of combinations and variations.
Hey, a couple of quick questions. With that flavorize example, here's hoping they don't pick up Flavor Flav the rapper's lyrics and throw that in there. That could get really confusing. But then when you were talking about reinjury, for example, as soon as you said that, I was thinking, okay, I can see where that's going, how you get "rein." And this might be getting into the weeds too much, but does throwing stuff like hyphens
into words make a difference? So if you were to do re-dash-injury, would it see that and maybe just categorize the "re" as separate from "injury"? Does that help, or is that a non-issue? Does it sort of filter that stuff out and just focus on the letters?
That's a great question. It's actually implementation dependent on the tokenizer. In practice, you usually create hard boundaries between tokens or words, and one of them is the space character. In most of these, the hyphen is also considered a boundary, and so it would see it separately. The important thing to understand about the tokenizer, though, is that the model doesn't see words the same way you do. So a great example of this is
how many letters are in the word strawberry. It does not see the word as S-T-R-A-W-B-E-R-R-Y. In fact, when you read it, you really don't either. When you read words, you typically aren't paying attention to every single character. When you have to count the letters in strawberry, you kind of have to change your mental state and think, oh wait, let me think, what are the characters? And you walk through it. The model might see it as the token "strawberry." It might see it as the tokens
"straw" and "berry." But the key thing is it has no idea. It doesn't have the ability to see the letters. In fact, the tokenization is case sensitive, so if you change the capitalization, it looks like a different word to it. To it, the word strawberry with a space in front of it is different than strawberry without a space. Strawberry with a capital in front is different than strawberry without a capital. You know, if you put quotes around it, it's a different word.
So it doesn't see text the same way you do. Another great example of this is numbers. They've fixed this in most modern tokenizers, but the early ones would just take examples of numbers and those would be a whole token. So two fifty six, right, a power of two, fairly common, gets a token. It sees that as a single thing. It doesn't even see it as the numbers two, five
and six. It doesn't break it apart. And that was part of the reason why it was hard for these things to do math, though it's not the sole reason. So that's the key lesson on the tokenizer before we leave it. The algorithm that's commonly used is something called byte pair encoding, and in my class it's something we walk through. In fact, we do it by hand so you can understand it, and we talk
through the training process. But the key thing to understand is that the model doesn't always see text the same way you do. So that's the first step, that's the tokenizer. Then the next thing is we map each of these tokens, which you can think of as words, into a list of numbers. So I said earlier that we map each word to a number, but really we map it to a large list of numbers.
And this is called an embedding. The way to think about this is we're taking all the words, or in this case, technically, tokens, and we're putting them on a map. But instead of a two dimensional map, this has many, many dimensions. So in the case of GPT-2, it's seven hundred and sixty eight. I think for Llama 405B it's like a sixteen thousand long list
of numbers for every single word. In fact, in the phrase "Mike is quick. He moves", the period itself gets seven hundred and sixty eight numbers to represent it. And you can think about it like on a map, where you have coordinates. This is just a very, very multidimensional list of coordinates. And a good embedding puts words that are related to each other closer to each other. So in my class, we use the words
like happy, sad, joyful, glad, dog, cat, rabbit. The first set of those are emotions, happy, sad, joyful, right, and you would expect happy and joyful to be close to each other, same with glad. And then dog, cat and rabbit are totally unrelated, so you would expect them to be further apart on the map. And the word sad is an emotion, but it's not quite the same emotion as being happy, so it'd be somewhere in between. And if you actually visualize this, you
see that this happens. It's actually putting words closer together. And you might have heard of this paper, or it's a series of papers or algorithms, called word2vec, which pioneered this model. And if you go to projector.tensorflow.org, you can actually see a three D map of various words, and you click on one and it will show you the words that are close to it,
and they all tend to be related words. So the next step after we break the text into tokens is we map each of those tokens to a position on a map where close words are related to each other. Let me pause and see if that made any sense. I'm usually doing this all visually, so over pure audio it's a bit of a challenge, but yeah, good.
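As a rough illustration of "related words sit close together on the map," here is a small Python sketch that measures closeness with cosine similarity. The three-dimensional vectors are entirely made up for the example (real embeddings are learned and have hundreds or thousands of dimensions), and cosine similarity is one common choice of closeness measure, assumed here rather than taken from the conversation.

```python
import math

# Made-up 3-D "embeddings"; real ones are learned, not hand-written.
EMBEDDINGS = {
    "happy":  [0.9, 0.8, 0.0],
    "joyful": [0.8, 0.9, 0.1],
    "sad":    [0.9, -0.6, 0.0],
    "dog":    [0.0, 0.1, 0.9],
    "cat":    [0.1, 0.0, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word: str, k: int = 2) -> list[str]:
    """Return the k other words whose vectors point most nearly the same way."""
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]), reverse=True)
    return others[:k]

for other in ("joyful", "sad", "dog"):
    sim = cosine(EMBEDDINGS["happy"], EMBEDDINGS[other])
    print(f"happy vs {other}: {sim:.2f}")
print("nearest to dog:", nearest("dog", k=1))
```

With these invented numbers, "joyful" scores closest to "happy", "sad" lands in between, and "dog" is furthest away, which is the pattern described for the class example.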
So my question is, since it's predicting the next word, I would imagine that, yeah, some of the words that appear close to it are going to be words that mean kind of the same thing or, you know, have a related meaning. But does it also group words together that commonly appear together, or is that different? Does it not weight things that way at all?
Uh, it's actually kind of doing both. The way it's grouping related words together: it's grouping words together that have the same meaning, based on the idea that they appear in the same places. Oh, so the words ice and cold commonly occur together, probably, in text on the internet, right? Like, I put ice in the drink to make it cold; you're as
cold as ice, right? Those would be common phrases. You usually don't see, you know, steam and cold together. And so the model is able to understand that ice is colder than steam because it sees cold close to ice more often than it sees cold close to steam. It's the relative occurrence, how often. And there's a phrase that's often used, by J.R. Firth: "you shall know a word by the company it keeps," which is the idea that you don't really know what a word means.
You could look it up in the dictionary, but you really understand it through how it's used by multiple people, and you can look at, you know, the distribution of how it's used to really understand what it means. So a good example is the word bad, right? Although it's less in fashion now, bad at one time meant good. And so how do you really understand whether it means good in one context versus another? You learn that through
all the various contexts that it is used in. And if you want to understand how a word really is used, you see it in usage many, many, many times. So if you want a model to understand what a word means, it just sees it used in many, many, many sentences and eventually picks up on those differences.
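The "company it keeps" idea can be made concrete by simply counting which words show up in the same sentence. The Python sketch below uses six invented sentences as its entire corpus and counts sentence-level co-occurrence. That is a deliberate simplification: it is not how a transformer learns embeddings, only an illustration of the statistical signal being described.

```python
from collections import Counter
from itertools import combinations

# A tiny invented corpus; real training data is vastly larger.
SENTENCES = [
    "i put ice in the drink to make it cold",
    "you are as cold as ice",
    "the ice kept the fish cold",
    "steam rose from the hot kettle",
    "the steam was hot enough to burn",
    "a cold morning can turn steam into fog",
]

def cooccurrence(sentences: list[str]) -> Counter:
    """Count, for each unordered word pair, how many sentences contain both."""
    counts = Counter()
    for sentence in sentences:
        unique_words = sorted(set(sentence.split()))
        for a, b in combinations(unique_words, 2):
            counts[(a, b)] += 1
    return counts

def together(counts: Counter, w1: str, w2: str) -> int:
    """Look up a pair regardless of the order the words are given in."""
    return counts[tuple(sorted((w1, w2)))]

counts = cooccurrence(SENTENCES)
print("ice + cold:  ", together(counts, "ice", "cold"))
print("steam + cold:", together(counts, "steam", "cold"))
print("steam + hot: ", together(counts, "steam", "hot"))
```

In this toy corpus "cold" keeps company with "ice" more often than with "steam", which is the relative-occurrence signal the answer above points to.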
So then the word baby could be seen as cold because of Vanilla Ice, then, right? Ice Ice Baby.
If you train it, I'd be really interested to see a model trained only on song lyrics. That would be that.
Would be fascinating.
Yeah, yeah, it's interesting because the way you're talking about it. We were driving home from my mom's house the other night and my wife put on an audiobook that she's been listening to with my nine year old and they used the word satisfaction and my daughter asked, what does satisfaction mean? And we basically did that it's kind of like this, and it's kind of like that, right, It's it's in this area of meaning, right, yeah, and it's related to these other words.
Right.
We used other words to explain it, and then, yeah, we gave context: so you could use it like this, or you could use it like this, and, you know, another form of the word is satisfy or satisfied, and so this is what it means to satisfy something. And, you know, more context and more sentences, and okay, I understand, right? And I think she may have even said, so it's kind of like this and kind of like that, yep, using examples that we didn't use.
You told her that Mick couldn't get any, right? That's right.
But yeah, that's basically what the model is going through. It's, you know, basically seeing all these examples and it's like, oh, it's kind of like this, but in some contexts I see it being used in this other way, and so it's basically putting that all together. And then it's putting all the words on this map and it's saying, okay, you know, the ones that are related are here, and the ones
and it's this multidimensional map. It's, you know, hundreds to thousands of dimensions long. And that's the embedding step: we've basically mapped them to, you know, I say a number, but it's really a point in space, right? So there, we're at the second step.
The first step was we took the passage and we broke it into tokens, which you can think of as like words, but smaller. And then we took each of those tokens and we put them into a point on a map, a point in space, and we know that point is going to be close to other things that are related to it. So that's the second step. And then the third step is called attention. I'm going to skip over that for a second, and I'm going to go to the fourth step, which is
the neural network, or the multi-layer perceptron. And this gets to the training question. The key thing that's really great about neural networks is that thing I talked about earlier, which is you don't have to give them the rules. You just give them the answers, and they figure out the rules. So we basically feed in these points in space from our prompt, and then we can take a passage on the internet, like maybe the passage on the internet is Mike is quick, he
moves quickly. We remove the word quickly, and then we give the model the phrase "Mike is quick. He moves" and we ask it to make a prediction. And it's going to get it wrong because it hasn't done any training at all. When it gets it wrong, maybe it says, you know, "Mike is quick. He moves bicycle," and you're like, no, that's wrong, the right answer is quickly. It mathematically learns how to change itself to get closer
to that answer. So we go through a lot of iterations of getting lots of passages where we take off the last word and then we ask it to predict it. If it's good, we say, okay, great, you're fine, and if it's wrong, we say, okay, you're off by this amount. It's kind of like when you throw darts at a board, right? If you're far from the bullseye, you'll move a lot more to correct your position, but if you're close but slightly off, you're
going to move slightly, subtly. So that's what it does. It changes the model parameters, the numbers inside the model, slightly if it's close, or a lot if it's far away. It does this, you know, trillions of times over lots of different pieces of data. And the key thing about the neural network is it can learn to imitate, basically, from answers and data. And so we basically give it the known passage, like "Mike is quick. He moves," and
we knew quickly was the right answer. And then we basically ask the neural network to make a prediction from these points in space. And so that's the basic version of what's happening inside the training. Let me pause, because I jumped to the fourth layer. I'll come back to the third one in a second. But let me see if there are any questions there on
what's happening inside the neural network. Okay, so yeah, okay. So if we're good there, then the next thing is I'll jump back to the third step. We could give it a point in space and say, hey, guess what the next word is. But the best thing to do is to not just give it a single point in space. It's to give it all the points that came before, so all the words that came before.
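Stepping back to the training loop from a moment ago (predict, measure how far off you are, nudge the numbers), here is a deliberately tiny Python sketch of that dart-board behavior. It is an assumption-heavy stand-in: instead of a neural network predicting tokens, it fits a single number `w` so that `w * x` matches known answers, using plain gradient descent on squared error. The data, the learning rate of 0.1, and the step count are all arbitrary choices for illustration.

```python
def train(xs: list[float], ys: list[float], lr: float = 0.1, steps: int = 50):
    """Fit y = w * x by repeatedly nudging w in proportion to the error."""
    w = 0.0          # untrained: the first predictions will be wrong
    history = []
    for _ in range(steps):
        # Average gradient of the squared error (w*x - y)^2 with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Far from the answer means a big gradient, so a big correction;
        # close to the answer means a small one.
        w -= lr * grad
        history.append(w)
    return w, history

# Known "answers": every y is exactly 2 * x, so the ideal parameter is 2.0.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w, history = train(xs, ys)
first_move = history[0] - 0.0
second_move = history[1] - history[0]
print(f"learned w = {w:.4f}")
print(f"first correction = {first_move:.4f}, second correction = {second_move:.4f}")
```

The first correction is large because the starting guess is far off, and each later one shrinks as the guess closes in. A real model does the same kind of update across all of its parameters at once rather than a single number.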
So in the case Mike is quick, he moves. Knowing that you know we're talking about movement helps it know that the word quick here is moving around in a physical space versus the quick of your fingernail. And so we give it the hints of all the words that came before it. So this is what's called attention, where we say, okay, don't just predict from one single word
you're looking at. This gets back to what we talked about at the beginning: instead of looking at, you know, statistically, what's the next word after the given word, let me look two words back, let me look three words back, let me look four words back. It will look at all the words that came before it and try to figure out what is the next predicted word. We're giving it these hints from the entire passage to make its prediction, and that's what's called attention. And so that's the third step in
the middle. And then the last step is, you know, we get a prediction out of the neural network. So jumping back to the fourth step, which was the neural network that makes the prediction: it gives a number, and that number is really a long list of numbers, and we need to map that back to one of
our tokens, one of our words. But it's sitting at a point in space. It may land right on the word quickly, but more than likely the predicted token that the model comes up with is going to land somewhere close to the word quickly. And so we interpret that point in space that it gave us back through those embeddings in
that map. It picked us some at the end of the number crunching, took all the words and the points in space we gave it and said the predicted word is right here in this point in space, and it says, okay. We then interpret that, and we look, what are the words or tokens that are close to that predicted point in space. And it's probably going to be closer to the word fast, closer to the word around. Like Mike moves quickly, he moves around, he moves fast, he moves speedily.
Those words are going to be close to it, and so we give them a higher probability and we run a random number generator and we say, okay, let me pick one according to this probability distribution, and that's how we get the predicted word out. And that last step of running that random number generator and looking at what words are closer it is the piece that's called the language head. So the key thing about the language head is that is where most of the uncertainty or unpredictability
of your model comes from. So if we decide not to run the random number generator and we just always pick the word that is closest in space to the prediction, that's what's called temperature zero, and it will always be consistent. It will always be predictable for the most part. There are some other very small sources of randomness in the process, but for the most part, it'll be very consistent, and
that's called temperature zero. So most of the randomness inside the model is, in some sense, entirely imposed by us. We decided, oh, we're not just going to always take the thing that's closest. We're gonna probabilistically take some of the other ones that are also close, and we can control those parameters and control how we do that probability.
So if you're in Ollama or, you know, an API, you'll see things like top P and top K or temperature, and these are tools we give, you know, the API user of a model for how they can shape the probability distribution of the model. And that's probably the most important of the components in the model to understand. After you understand what tokens and embeddings are, the next one is probably the language head, because that's where the randomness comes from. So let me pause. I know I
just talked for quite a while. See if there are any questions.
I think, so far, so good.
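For readers following along in text, the language head step described above can be sketched in a few lines of vanilla JavaScript. This is only an illustration under stated assumptions, not the code from the spreadsheet or the website: the candidate words and their scores are made up, the names nextWordDistribution and sample are hypothetical, and only temperature and top K are shown (top P is left out).

```javascript
// Made-up scores for candidate next words after "Mike moves ...".
// A higher score stands in for "closer to the predicted point in space".
const scores = { quickly: 2.0, fast: 1.5, around: 1.0, speedily: 0.5, banana: -3.0 };

// Turn scores into a probability distribution (hypothetical helper).
// temperature 0 puts all probability on the top candidate;
// topK keeps only the K highest-scoring candidates.
function nextWordDistribution(scores, { temperature = 1.0, topK = Infinity } = {}) {
  const ranked = Object.entries(scores)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK);
  if (temperature === 0) {
    return ranked.map(([word], i) => [word, i === 0 ? 1 : 0]);
  }
  // Softmax over temperature-scaled scores (subtract the max for stability).
  const scaled = ranked.map(([, score]) => score / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return ranked.map(([word], i) => [word, exps[i] / total]);
}

// "Run the random number generator" and pick a word from the distribution.
function sample(distribution, random = Math.random) {
  let r = random();
  for (const [word, p] of distribution) {
    r -= p;
    if (r < 0) return word;
  }
  return distribution[distribution.length - 1][0]; // guard against rounding
}

const greedy = sample(nextWordDistribution(scores, { temperature: 0 }));
const varied = sample(nextWordDistribution(scores, { temperature: 1.0, topK: 3 }));
console.log(greedy, varied);
```

With temperature set to 0 the sketch always returns the top-scoring word, which matches the consistent behavior described above. Raising the temperature flattens the distribution, and a smaller top K cuts off the unlikely candidates before sampling.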
Okay. So what the Excel spreadsheet does, or the website I have that's built in, you know, web components in pure JavaScript, is it runs through the entire process using the very same weights that OpenAI released for a model called GPT-2, GPT-2 small, and it steps through every single one of those processes. You enter a prompt, and it's not like ChatGPT where you can have a conversation with it. It just predicts the next word,
but it walks you through every single step. That's the
same thing I do inside the class. But that's basically, you know, how your model works under the hood: it's taking your words, your input prompt, breaking it into units called tokens that are slightly smaller than a word, then it maps them to points in space, does a bunch of number crunching on them through the things I talked about, using a neural network and this other thing, attention, that looks at all the other words, and then it takes that prediction and it says, okay, what
words is it close to in our points in space, and then let me pick one that's relatively close to that. So I know one of the things, Chuck, you wanted to talk about is building it and the use of web components in the web version.
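The attention step in that recap, where the current word is given hints from all the words before it, can be sketched roughly like this in vanilla JavaScript. It is a simplified illustration, not GPT-2's actual attention: the 2-D embedding values are made up, the function name attend is hypothetical, and the learned query, key, and value projections and the multiple heads of the real model are left out, so the raw embeddings stand in for all three.

```javascript
// Made-up 2-D "points in space" for the example sentence.
const embeddings = {
  Mike: [0.9, 0.1],
  is: [0.1, 0.1],
  quick: [0.5, 0.5],
  he: [0.8, 0.2],
  moves: [0.2, 0.9],
};

const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

function softmax(values) {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Blend the word at `position` with every word up to and including it,
// weighting each earlier word by how similar it is to the current one.
function attend(words, position) {
  const query = embeddings[words[position]];
  const context = words.slice(0, position + 1); // only words that came before
  const scale = Math.sqrt(query.length);
  const weights = softmax(context.map((w) => dot(query, embeddings[w]) / scale));
  const blended = query.map((_, d) =>
    context.reduce((sum, w, i) => sum + weights[i] * embeddings[w][d], 0)
  );
  return { context, weights, blended };
}

const result = attend(["Mike", "is", "quick", "he", "moves"], 4);
console.log(result.context, result.weights, result.blended);
```

The weights always sum to one, so the blended vector is a weighted average of the earlier words. That averaged vector, rather than the single word on its own, is what gets handed to the neural network for the prediction.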
Yeah, at this point, given our time constraints, Yeah, we might have you come back and do that, because I think it'd be interesting to dive into the project and how it went together.
Okay, I will. I will just say the reason I built it in web components was to make it as portable and easy to use and as easy to step through as possible. I wanted to make it as accessible as I could. I did think about, like, say, using React, but then you need to know React, and I really wanted this to be approachable for somebody who knows just vanilla JavaScript, and web components was the easiest
way to do that.
So that's the main reason why I did it that way. Yeah. So is it open source then, or? I actually haven't put a license on it. And what I've said is, if people feel like it, help me decide. Like, tell me which licenses you prefer. But the code is right there for people to look at. I mean, you can practically step through it, and it's written so that people can understand it. I just haven't figured out what license, but, you know, let me know, and I'm all ears.
All right. The goal is to make it a teaching tool.
Yeah, all right. Cool.
Well, yeah, like I said, we're kind of at the end of our time, and so I'm going to push this into picks. But do you want to just give out information on your course again, and let people know what that coupon code was?
I just, if people are digging this as much as I
am, I think they may want to just go pick up the course and go, okay, we can go into more depth.
Uh, yeah, thank you. So the website for the project is called Spreadsheets Are All You Need, with hyphens in between. So spreadsheets hyphen are hyphen all hyphen you hyphen need dot ai. It is a very long domain name, and that will link to where you can download the Excel file. You can try this out in the browser yourself, and then there's a link to my class that I teach on the Maven platform. It basically has five lectures over two weeks, and we walk through this for anybody who understands
spreadsheets or vanilla JavaScript. And I have a promo code, jsjabber, so just use the promo code during checkout and you get twenty percent off. The course is taught live but is also available on demand. So my last cohort just wrapped up earlier this month. But if you sign up, you'll get to watch all the recordings. I'll answer all the questions you have over email. You'll be in the same private Discord as the rest of
the cohort. And if for some reason you're watching it on demand and you say, I'd rather have the live version, I offer that if you want to attend a future live version, you can do that for free, even if you signed up for the on demand. So, you know, feel free to check it out. It's on Maven. It's got some long URL, but if you go to spreadsheets are all you need dot ai, you can check it out. And then to find me, I'm on Twitter, I A N A N D, so my first initial with
my last name, and of course on LinkedIn. I'm also on Bluesky as well. If people want to reach me, happy to
answer questions. Awesome. Well, yeah, I definitely want to dig into it. I'm probably gonna go watch your video on YouTube a few more times, just, you know, getting all those little pieces in my head. I think you said at the beginning, the model that matters most is your mental model. Yes. And so, yeah, just knowing how to think about it: okay, I'm dropping this in, right, and this is how it goes through the Plinko machine to give me the output on the other end.
That's the thing that really helps me out.
So yeah, that's a great analogy. And one thing that may be worth highlighting is I've had some feedback where people said, oh, you're using spreadsheets or using JavaScript, the real models are in Python, and then you're using GPT-2, which is an older model. What I teach in the class are essentially the timeless technical fundamentals of how these models work. And it's worth remembering that all the major models you've heard of, you know, Claude, ChatGPT, Gemini, they all
are inheriting from GPT-2. So if you understand GPT-2, you're eighty percent of the way to understanding any of those models or the Llama model architectures. So it's not like a toy. It is essentially very close to how the real models work. It's a really good stepping stone to getting that really sharp mental model of what's happening under the hood.
Yeah, that's true of most technologies, right? I mean, if you were using, I don't know, let's pick one, MySQL ten years ago, probably sixty, seventy percent of the stuff is fundamentally the same.
The engine works mostly the same.
They've probably optimized some pieces, they've probably added some features, but for the most part, if you understood what it was doing back then, you get it now. And to be honest, the other thing is you'll also see, as we get more variations on things, you also have a decent understanding of how to use something like SQLite or PostgreSQL as well.
So yeah, I really like that analogy. It's a good one. I'm going to borrow that, thank you.
Yeah, no problem. All right, well, let's go and do our picks. AJ, do you want to start us off with picks?
Sure?
Okay, so Civilization. I've still been playing that, not as much as I was the other week, but enjoying it. It turns out you can run it on the Mac if you go into settings and turn its performance mode completely off, or all the way down. I think it's something to do with multithreading, which was why it crashes. Like, if it uses more than one core, it just crashes every five minutes or something. Anyway, so there's that. Civilization, still going strong. But I wanted to correct that.
You can get it running on the Mac. It just won't run on the Mac with the default settings, and it's not abundantly clear why. But anyway, the other thing was, with the announcement of the Switch 2, I just got angry, because I still can't play Tears of the Kingdom or Super Mario RPG or Spyro or, you know, any game that's basically been released in the last three years without
massive stuttering. And, you know, with Tears of the Kingdom, you know how they have it go into bullet time whenever it gets overloaded instead of getting choppy? Although it still does that. It does dynamic resolution and bullet time, so your swings will slow down and stuff, which, you know, whatever. So I decided to mod my Switches, and I did the hardware mod of the Switch and it was super easy.
Now, I've done mods in the past, so when I say that it was super easy: no, if you're not familiar with soldering, if you haven't, you know, fixed a phone or done something like that before, no, it's not super easy. Getting all the screws out, getting to the actual part, getting the heat sink off, that's super easy. Anybody that has a precision toolkit for, like, phone repair, game repair, or whatever, can
get in there and do that. I actually couldn't see the soldering that I was doing because the components are so small. Now, I've learned some tricks, because I've practiced a little bit of micro soldering in the past and failed at repairing a 3DS, but the pieces are so small that I literally can't see them. I mean, I can see them, but I can't see them. Like, I can see them the way I can see a grain of sand, but I cannot see
them in terms of, like, actually being accurate. So what I did was I used my phone to zoom in, take a picture of it, see that I had bridged two capacitors, and then just kind of blindly, you know, moved the soldering iron next to them and kind of swept it the same way that I would when I'm, you know, soldering on a bigger component. Then I used the phone again, zoomed in, and so I was able to get the piece on there. And I should have had some sort of magnifying glass set up,
but whatever. So I was actually able to do it blind, in a way. Like, I could see my tip was there, but I couldn't actually see, you know, what was what, because, I mean, the things, they're smaller than a grain of salt. They're small. Anyway, the capacitors, they're, like, wicked small. But even with that, I was able to do it. So modding the Switch is one pick there. There's the Picofly mod kit.
You can do it if you've done other soldering. If you haven't done micro soldering, buy a couple of practice kits from eBay or AliExpress or something, and you can get there. But my third pick is the soldering station that I used, which is actually a custom-made soldering station. So a few years ago, these Chinese companies came out with the T12 tips, or they cloned the T12 tips. They turn out
to be really, really good. One reason is that they double as a temperature sensor, because they have two different types of metals, and anytime you have two different types of metals, you have a thermocouple. So they have an inner metal and an outer metal, so they double as their own temperature sensor. And so people created software for these and put them on microcontrollers.
And the software, and I'm being literal when I say this, rivals three-thousand-dollar professional workstations because of the way that it switches back and forth between monitoring the temperature of the tip and heating it. And so I'll put a link, and you can get these. You can get clones on AliExpress, but I prefer the original one that's made by this guy in Australia, because I know it comes with the right firmware on it, and the firmware is really where the magic happens. Any idiot can, you know,
3D print some leads onto a Ridgid battery or a Milwaukee battery and connect it to a T12 tip. But the real smarts of it is in the firmware, where it manages the heat to make sure it gets up to temperature quickly, and it actually does the sensing thing where if you shake it, it turns back on and heats up. Anytime the temperature is cooling down rapidly, it sends more power. So anyway, it's just a really
great soldering iron. And because I have that, like, I've got a, and I've got a couple of cheap ones, but that one cost a hundred bucks, and I'm considering trying out one of the knockoffs. It's only like thirty five, because now everybody, even Craftsman, is selling one
of these at Lowe's now. But I don't know if the Craftsman one is just, like, the cheap idiot kind where it's just connecting the leads together, or if it's actually got a, I have a hard time believing that they would have gotten an illegal copy of the firmware, whereas the Chinese companies on AliExpress, you know. Anyway, so that. Yeah, so I had a good time soldering because of the Ridgid-powered, and you can get them for Milwaukee or Ryobi or whatever battery brand you like, soldering iron.
They're super fast. They're so much better than a Weller or a Hakko or all the traditional ones that cost hundreds of dollars. So anyway, and of course I'll pick Ollama, because I really do enjoy running my own local LLMs. Actually, since the thirty two billion parameter model of Qwen 2.5 Coder has come out, that one I just find to be the best of the best. It rivals GPT-4o, if it isn't better than 4o, and you can run it on an Apple Silicon Mac. Boom, all the things.
Okay, I have a question. What are you modding your Switch to do?
Oh?
Sorry, I skipped over that part. Overclock, well, not overclock: run at the native CPU clock speed of the Tegra X1, because the Switch is an Android tablet running a custom operating system rather than Android. It's two point two gigahertz. That's the native clock speed. The clock speed that it runs at is something like a thousand or seven hundred, depending on whether it's docked or handheld. Same thing with the GPU.
The native GPU speed is like fifty one point five gigahertz or something like that, but they clock it down to five hundred or seven hundred.
Gotcha.
So when you mod it, you can then, and this you can do without getting banned, or at least this is what people are reporting, and I'm doing this. So, you know, if you mod it and you want to run pirated games or something, you have to set up more stuff and make sure that you don't get banned, although a lot of that stuff is built in now.
But if all you want to do is overclock it, the overclock system runs in a layer that's kind of protected from the main Switch system, so the main Switch system can't detect that it's rooted while it's running. So if you just install, what is it,
Hekate, Atmosphère, and sys-clk, if those are the only things you install, then you should be able to run your Switch modded on the original firmware, be able to play online, et cetera, without any risk of banning or anything, because it's not modifying the Switch operating system or the game. It's just modifying the CPU clock.
Gotcha. Cool.
So now my friend asked me, well, can
you notice any difference? And my answer is no. And the reason my answer is no is because when you're playing it underclocked, you notice the stuttering all the time, and, like, you notice the resolution changing. Like, you know, you turn and there's a bad guy and you shoot him, and then the resolution, like... But when you're playing it closer to native speeds, and you can't get it all the way up to native speeds because the power delivery on the board isn't actually capable of
playing it at native speeds without also draining the battery at the same time, but when you're playing it at near native speeds, you don't notice it, because, like, the things that are annoying aren't there. The resolution's not changing,
it's not stuttering, it's not going into bullet time as much. I did not notice it going into bullet time at all since I've been playing it at near native speeds, and I did some things where I was blowing up rocks and things that I thought would normally make it go to bullet time, and it didn't. So, like, five stars so far.
Cool, very cool, All right, Steve, what are your picks?
All right?
Time for my twenty minute picks. So before I get into picks, one note I will make sort of circles back to what I asked at the beginning. You know, as someone who has spent a lot of time doing search indexing, you know, with Lucene-type search indexes, a lot of this sounds really familiar. And to me, I've always said that the I in AI is a misnomer.
I think it's not necessarily intelligent. It's just basically better use of training and fancier use of existing data to answer things, not necessarily intelligence that can create new things. That's just my two cents, for what it's worth. Interesting pick: Ishan, you mentioned this earlier today, and as of today, you know, this will come out a little later, but DeepSeek is, like, disrupting in a huge way.
You know.
For instance, if you go look on Hacker News, both on the top page and on the new page, there are multiple articles from NPR from CNBC about what it's doing to the stock market, and the gist is basically that they've been able to create these fantastic models with
much less investment, with less powerful chips. And there's a whole story behind this, and so that's wreaking havoc, at least in the stock market and with people like Nvidia, just because of supposedly how much cheaper and more efficient DeepSeek is compared to some of the other models.
So today's the first day and, you know, it remains to be seen how accurate this is, especially coming from the Chinese, but it's sort of a disruptive thing going on this morning, at least as of the time of recording.
Yeah, do you mind if I jump in there a little bit? DeepSeek is an utterly fascinating story and model. I'll say one thing is that the training cost might have been apples to oranges. Like, they stated what the cost was for the best run or the final run. There are a lot of other costs that go into it. I talked earlier, when AJ was asking what goes into it, and one of the key things I was thinking about, actually, with DeepSeek is, like, they're going to do a lot of other experiments and runs. There's a lot of stuff
that gets built upon. So I think some people are comparing apples to oranges, but it is, you know, an impressive model in a lot of ways. The other thing that I find the most fascinating about it is the training process is remarkably simple. And, you know, I'm trying to think of an analogy. Normally, when they do this part of the training process called reinforcement learning,
it's a lot more complex, and it's kind of like, you know, you think about a car and you're like, well, if you want to go from point A to point B, you need an internal combustion engine. It's basically, you know, having a little fire, and you've got these pistons and the cylinders, a really complex piece of mechanics. And then somebody's like, the electric car: you know that little toy motor you had? Well, let's just scale that thing up.
And so they tried this relatively simple technique and just scaled it up, and it worked. And I think different people are reacting to this model differently. For some people it's about the price. For other people it's about the training setup, and it's, how did we miss this? It's just remarkably simple. So it's definitely worth bringing up, as it's just a really interesting model.
Occam's razor strikes again.
Right.
Well, there's something they call in AI the bitter lesson, which is stop trying to put into the model how you think you think. Instead just give really general compute and just throw more and more data and more and more compute at the problem, and the model will figure it out, so don't try to be too smart about it. And people are like, this was a bitter lesson all over again. It's like, oh, we thought, you know, we had to do this really complex, you know, reinforcement learning setup,
and these guys showed, well, maybe you don't. Now, in the end, their production model actually still had a somewhat complex training pipeline. But one of the interesting results is this model called Zero, where, kind of like, you know, how AlphaZero learned how to become a really good Go player just by playing against itself. In this case, it
wasn't the model playing against itself. It was just trying out ideas, and they just told it whether it was right or wrong, and it started automatically, emergently figuring out how to improve its thinking. And it starts getting these eureka moments where it suddenly realizes it can backtrack, and it's like, oh wait, I made a mistake. And you're watching the model, like, we didn't train it to do this, and it suddenly figures out how to, like,
get smarter. So it's really, really fascinating. We could talk another hour on the model, but yeah, lots of people are poring over it. It's fascinating in many dimensions.
Cool.
And then, last but certainly not least, the high point of any episode is the dad jokes of the week. So what did one pie say to the other pie before being put in the oven? You know, this is a musical answer. All we are is crust in the tin. For anybody that knows Kansas. Here's an Australian version. My mate was bitten by a snake, so I told him an amusing story. If I knew the difference between anecdote
and antidote, he'd still be alive. And finally, when I was in school, my teachers told me I would never amount to anything because I procrastinated too much. I told them, you just wait. Those are the dad jokes of the week.
All right, well, I'm going to jump in here and save us from the high point of the episode. I've got a couple of picks. I always do a board game pick. So the game I'm gonna pick, I learned
this game last week, is called Cascadero. I'm gonna put links in for both BoardGameGeek, which kind of gives you information about the board game, and then an Amazon affiliate link, just because then you know where to go buy it if you want it. Anyway, so Cascadero. The premise of the game is that the kingdom's breaking up, and so you all play a different faction, I guess, and you're trying to connect towns and send your people
through the towns to pull the kingdom back together. And so you put your little guys out there, and then you score points based on whether you're the first person to the town or the second person to the town. You have to have a group of your little horsemen.
I can't remember what they call them. Heralds? No, the heralds were the
other things. Anyway, so if you connect to a town with a herald, then you get an extra point for connecting to it, and then there are bonuses that you get. Because when you connect, when you get the points, you actually move a marker up the technology or progress track in whatever color you connected to. And so I
guess they're not points, they're just movements. But anyway, so once you move past certain points, you get certain rewards, and, I mean, effectively, what you're trying to do is score the most points, and you also want to get to the end of the track in whatever color you're playing. So if you're playing pink, you want your pink marker to get all the way to the end. And yeah, like I said, you get bonus points for getting all five of your markers past the first space
that's marked for that.
You get more bonus points if you get three past the second space, and then if you're the first one to get one all the way to the end, then you get bonus points, and only one person can get those. And then the other ones are, if you connect two cities of the same color, and there are five colors, you get bonus points for each color, and if you get all five colors, then you get ten bonus points. And so you're just moving your marker around a track when you get the points. As soon as somebody gets
fifty points, the game ends. And so essentially, if you want to win, you want to be the first person to get your marker all the way to the end of the track of your color and then be the
person that gets that fiftieth point. We played it, and it was the first time for any of us playing it, and so nobody crossed that fiftieth point before we all ran out of little horsemen. And so when somebody runs out, everybody gets one more turn, or if somebody crosses that fifty point marker, everybody else gets one more turn, and then the game's over. It's reasonably simple. The scoring is a little bit complicated as far
as, like, moving,
you know, when you get moves and how many moves and things like that. So BoardGameGeek weights it at two point five to three, right? So it's a little more complicated than what you'd pick for your casual gamer who's just going to play Scrabble with their friends. But my feeling is that it was only just getting used to what happens when I put my horseman down. And as soon as you get used to, I put my horsemen down, I can move something up the track so many spaces, and then how to get the rewards, once
you figure that out, it's a relatively simple game. We played it in, what, an hour, maybe a little longer. I think if we'd known what we were doing, we could probably have played it in forty five minutes. There were three of us playing it. So anyway, Cascadero.
Fun, fun game. I liked it.
I want to play it again now that I know how to play it and my friends know how to play it, because there were a couple of
things I would have done differently as I'd gotten into it.
As far as other picks go,
go to jsgeniuses dot com and sign up.
We're gonna start doing the meetups, and I'm gonna start posting videos. In the videos I'm posting, I'm kind of building an entire app. I don't know if I'm going to show you writing all the code, because some of the stuff gets repetitive. Oh, I have to connect another data model to this database, right? It's like, okay, you don't need to see that eighteen times. But, you know, we'll get kind of the major pieces in, and then anything bonus or extra that I do. The app I'm gonna build,
I decided I need to learn Next.js, so it's going to be a Next.js app, and I'm going to be putting it on Cloudflare Workers.
And the reason is,
just to give you an idea of what the app is, it's relatively simple.
But last year when we
ran caucus night for the Utah Republican Party, we had an online registration and we got DDoSed, because there were people out there who didn't like us, and it's internal politics to Utah. It wasn't the Democrats, it was somebody else. But anyway, because of that, I'm looking to, you know, put it on a system where I know it'll just kind of expand to whatever comes at it. Cloudflare is also usually pretty good at, you hit
me eighteen times, now I'm just going to drop it and drop you unless you can prove you're human. And so I feel like I can get some of those benefits. But I'm also curious to see how Cloudflare Workers work. So it's going to be basically a registration.
There's going to be a little bit of site automation, because the Utah state voter registration database, where you verify your voter registration, doesn't have an API, which means that I have to go and have my program use something like Puppeteer to fill in the fields and then scrape data off the response to make sure that you're registered to go to caucus night. I'm thinking I may also offer this same kind of thing to the Democrats and anybody else who wants to run
a caucus night that night.
I think the Libertarians in Utah do it too, right? So that they can just say, hey, you've got online registration, and then you've got an app that will verify it on the other end.
So anyway, that's what I'm looking at.
So there may also be a React Native app or something like that, so that on the other end, you know, people can show up with a QR code that says I registered and this is who I am, and people can just verify that way instead of having to look them up in a paper list or something like that.
So that's what I'm going to be building. With JS Geniuses,
you get access to the videos, you get access to the weekly meetups, and a bunch of other stuff. I'm also looking at starting a new podcast on doing AI with JavaScript, and it's gonna be at this level, right?
We're not building our own models. We're gonna be using the existing models that are out there, the open source models, if you will, and showing how to build things on top of those, or using some of the cloud services that generate images, or you know, using something like Whisper for transcriptions and things like that.
So anyway, keep an eye out for that. That'll be free.
I'll probably drop the first two or three weeks' worth of episodes onto this RSS feed, and then from there
you'll be able to just subscribe to the other feed.
So that's what I've got going. Yeah, those are my picks. Ishan, what are your picks?
So I've got two picks. Both are going to be AI related. The first one is NotebookLM, but everyone knows about NotebookLM with, like, the fake podcasters. My pick is NotebookLM without that feature. I think that feature is great and really compelling. It lets me consume, you know, material on the go in podcast form. But I like the other parts of NotebookLM, which is, like, it's a great way to stick a variety of sources together and then ask questions about them.
So one example is I like to go to Y Combinator's Hacker News to see what the comments are, but I don't read through every single one. So I will stick it in there and say, well, what are the most insightful comments? What are people saying? I did this actually with DeepSeek. I said, what are the comments people are saying about DeepSeek? What are they seeing for performance? What are the issues where it's not working? And what's
great is it doesn't just summarize it. You've got it where you can go to each part of it and say, okay, oh, that sounds interesting, let me go click on it, and I can go right to the citation of that comment. The formatting is a little off when you stick it in there, and you're limited to thirty sources in each notebook, but check out the other parts of
NotebookLM.
I think it's really interesting. I expect to see a lot of other applications follow a similar type of UX paradigm or inspiration. The second one is, I don't know if you guys have been watching it, but Star Wars has a new show, Skeleton Crew, that they have on Disney Plus. And first of all, I think it's good. I don't think it's, you know, Mandalorian or Andor level, which was my personal favorite, but
it's still pretty good. But the other reason I bring it up is I liked some of the elements of how they handled AI and droids. So in one episode there's something that could be akin to jailbreaking the droid, where somebody uses the equivalent of, you know, prompt hacking to jailbreak a droid, and I don't think we've ever seen that in Star Wars's reflection of
droids before. There's another one where it reminded me of this paper called alignment faking, where the model has to decide between its original training or the thing it's being asked to do right now, and it kind of goes back and forth and it gets overridden by its original training. And so there's one thing in the very last
episode that I also thought was fascinating. But I really liked those interesting bits of how they handled AI that I think we wouldn't have seen in a show like this without an understanding of ChatGPT, which I think the writers were probably inspired by. So those are my picks.
Awesome. Yeah, Skeleton Crew's on my list of things I want to watch, so I like the recommendation. Thanks for that. All right, well, just a reminder, go look on maven dot com.
The code was jsjabber
for twenty percent off, and so if you're interested in the course, go check it out. I'm not a hard sell guy. I just think it sounds fascinating. So anyway, let's go ahead and wrap it up here. Until next time,
Max out.
