Welcome to Ed Tech insiders where we speak with founders operators investors and thought leaders in the education technology industry and report on cutting edge news in this fast evolving field from around the globe. From AI to xr to K 12 to l&d, you'll find everything you need here on edtech insiders. And if you liked the podcast, please give us a rating and a review so others can find it more easily. Dr. Satya Nitta. Welcome to EdTech insiders.
Thank you, Alex. Great to be here.
It's really amazing to talk to you. I feel like your work and what you've been doing. And Merlyn mind is really cutting edge in terms of what we know about generative AI and how it can be used in the classroom. Start by telling us about your professional journey. You've been in AI for a long time. And what led you to EdTech and timberland mind?
Well, first of all, thank you for your kind words, Alex. So I have indeed been in AI for a long time I've been I guess it's been over a decade closer to 15 years that I've been aided thinking building or
playing with AI. And according to AI just before IBM Watson won Jeopardy, but even before that, I was working in a field called neuromorphic computing, which was kind of related to AI in that we were looking at, you know, how systems could be architected a little bit more like the human brain where memory and compute are intricately intertwined. Whereas classical one moment computers have a distinct separation between memory and computing. And so we were looking at
hardware architectures. And that eventually led me to the algorithm side of AI and my professional journey over the last 15 years has led me to this moment. And the majority of it, I would say, about 10 years, 11 years have been really in thinking very deeply about how we could use AI in education. My interest in education started with, you know, looking at the
history of AI. And you know, as a study that one of the things that stayed with me, was, you know, this incident very early on and in the history of the field. So there was a very famous conference called the Dartmouth conference and 1950s 1957, in fact, where the term artificial intelligence was
coined. And at the time, some of the luminaries from Marvin Minsky, Herb, Simon, Alan Newell, all these Turing Award winners, future Turing Award winners, we're all coming together at a meeting of the minds to talk about how do you make computers think. And so if you go back to the time, and this is well documented, if you go back to the time and go back to these people and ask them this fundamental question, but why do you want an intelligent machine? Why do you want an
intelligent computer? Now keep in mind, this is 1950s, we didn't know about computers, there, were just some adding machines at best that a very small handful of people even encountered leave alone thought about the potential for so one of the canonical use cases that throw out along with things like we'd like machines to get to tell jokes, and to create humor, and, and so on. One of the canonical use cases was if a machine can teach, that's a sign
of an intelligent machine. And so when I was starting this effort at IBM Research, and I was given the opportunity to bring AI into education with Watson, I went back to the time I went back to this grand challenge, it became a grand challenge in AI. How do you get machines to teach? And I said, you know, that would be fun to build. Because along the way, you'd actually have to invent
new technology. And I didn't, of course, realize how complicated the task was six years later, I left IBM, I concluded that, in fact, getting machines to teach is probably a terrible problem to solve for multiple reasons. So anyway, that's a story for a different day. But you know, that's how I got into the field.
Your background is so rich. And you know, the experience that you've done with Watson for education, I think is so meaningful for this moment where people are trying to figure out the best way to put these two very complex fields together. So tell us a little bit about Merlin minds primary mission, how does the Merlin mind assistant help teachers manage the complexities of technology in the classroom?
Thank you, Alex. So we started mine in mind with the idea that we wanted to bring the latest advances in AI into education in a safe, private, domain specific manner, and in particular, to focus on problems that teachers and students have in the learning process. So the very first product we built is called the Merlin assistant. It's focused on teachers in the classroom. It's a voice activated assistant, much like
an Alexa or a series. And currently, teachers are basically using it through what we call a symphony classroom, which is an AI hub or a smart speaker, and it has farfield mic and teachers can walk around the classroom and basically talk to the device. So the device itself actually doesn't do a lot, but its primary job is to essentially control the teacher laptop and project the teacher laptop onto the front of
classrooms screen. And so the main job of Merlin assistant is to automate a whole bunch of teacher workflows. So we studied the problem very carefully. And we said, what could a voice assistant do in education? And one of the things we landed up on was, certainly it can do question answering it can ask her offensive questions, and we'll answer them in the moment.
But when we studied teacher workflows, and we designed this, actually, with a lot of teachers input in mind, we started teacher workflows, we realized that thanks to COVID, and this was true even before COVID teachers are using 20 3040 different ad tech caps during the course of teaching week. They also have their own laptop, they have a front of classroom display, maybe document camera 20 other student laptops, they're trying to orchestrate. So their primary job is to impart knowledge, wisdom,
motivation to students. And they've been distracted by having to control all this technology, and in particular, the applications and some of these things. So you're giving somebody say, 10, Class Dojo points, or sending an email, sending an email link to your students, these involve being tethered to your desk tethered to your computer, opening up new tabs, navigating multiple menus to get to the right point and
performing an action. So we basically said, you know, that's something AI can do really well, all you have to do is issue a voice command, you know, send this link to my class, and the teacher can be anywhere in the classroom, they can be walking around and issue the voice command and like magic, the thing happens. And so we basically that was kind of the core premise AI assistants made automation. And the automations are deeply embedded in teacher
workflows. So we do all kinds of things like, you know, launch lesson for lesson plans from Google Drive, control presentations advanced the sixth slide, go to the three minute 50 Mark in a YouTube video, all with voice with an intelligent Atmos. So that's what the Milan assistant does today. And we've given it a Jenai makeover, which I'm happy to talk about as well.
Oh, yes, definitely talk about generative AI. But there's a lot to unpack there already. So I love your sort of term orchestrate the classroom, you know, any educator listening to this. And so in edtech, know that teachers are juggling so many different
tools. Right now, they have so many different technology pieces that have to work together and they're onstage, they're literally up there in real time having to make them work in any one, if you're missing a password, or this connection goes down, the whole thing goes offline. So I picture you know, literally an orchestra of
different technology tools. And Merlin assistant is sort of acts as the baton, the conductor's baton that allows the teacher to make them all work synchronously in the right time at the right pace, without having to you know, fiddle with a lot of wires and buttons. You mentioned, you studied a lot of teacher workflows to get to this. And one of the things I admire about Merlin asst, is the voice
interface. You know, we use voice interfaces for a variety of things in our mobile devices, but I don't think teachers are that used to things happening when they use their voice with technology. That's really exciting. How did you get to that type of interface? And how did teachers react to it in the classroom?
That's a good question, Alex, I like love, have you, you know, fresh the value prop is orchestration, because that's exactly what it is, what we're trying to do is give the teachers you know, time and cognitive space back so they can actually go back to doing
what they do best. So the way we landed on voice assistants was, you know, at the beginning of the company, in fact, the company was founded on this idea that, you know, at the time, Alexa Siri, Google Assistant, were very much involved as AI tools that were exciting. And that basically, you know, foretold the future right, in fact that we're now currently live in of AI doing some really interesting things for people.
And we realized that one of the things that this consumer assistants don't do is they're built for anyone anywhere, you know, trying to do anything, and they were trying to kind of boil the ocean at some level from our perspective. So we were inspired by them, the idea that you could actually command your computer command, your various tools, and whatever you're trying to do within your workflow with voice
was just very alluring. And so we decided, you know, let's actually attempt to build something like an Alexa or google assistant, but focused on the teaching profession. And let's do so in a private and safe and focused way where we're not actually taking a data and selling it, we're going to sell you a product, and your data is yours. And we'll we'll respect all the privacy regulations, and we'll be in compliance with COPPA FERPA GDPR. So that's kind
of how the idea came about. And voice is really interesting modality. So the second part of your question is how did teachers take to it? So one of the things we realized is, in fact, the primary tool that teachers use in education is their voice. And we felt if we do this, right, they should be so natural. One moment, they're talking to the children about something about say cell biology or earth sciences and tectonic
plates. Another moment is simply saying, hey, Merlin, send this link to my class or Hey, Merlin, pull up Google Alerts. So we can navigate to a particular spot that we're talking about to kind of the register intercontinental register, what what have you. And so our vision was, you know, like the build the Star Trek computer for the classroom, and let the teachers just interact seamlessly with the kids as well as the AI, which is there as an assistant doing all the mundane
things for them. And so a huge part of, you know, effort went into actually designing the user interface, you know, we realized, teachers might not actually know exactly what to
say in the moment. So the user interface has commands on screen that pop up in a contextual fashion, if they're navigating presentation, they can simply say, Okay, what is the command again, that allows me to go to the fifth, sixth slide, etc. So in fact, with the Generico that we're doing with the Assistant, we wouldn't even need that you can just talk to it, like you talk to a human. And so basically perform the actions. And the reception from teachers
has been amazing. It's the most gratifying part of what we do. We do this for them. Because we, we realize, by the way, why Teacher, why the teacher, the teacher is the single biggest lever on learning outcomes and realize that it's not technology, it's not some personalized tutoring system. It's not the fancy front of classrooms screen you have what the amazing computers you bought, it's a teacher, it's a human, it's a human in the loop.
And their primary job is to motivate and impart knowledge and wisdom to the next generations of children so they can grow up and take care of the planet. And we said, this is the most important thing we could be doing with AI in the moment. And that's why we built a system focused on the teacher. And I love the response. There are these courts that one teacher basically said, you can pry Merlin from me over my dead body and things like that, you know?
So, yeah, it's amazing. And I can give you a few stats, actually, the average teacher uses Merlin something like 30 times a day, our super users use it 100 times a day, once every 345 minutes. And we just love that engagement. And they do this. So we can even tell we've been looking at the data trends, you know, when there's a break, when this exam season, they come back for a new school year and the pickup right where they left
off. So we have all kinds of evidence of teachers being completely in love with this product,
it makes a lot of sense. I mean, it really feeds into one of the toughest parts of the educator profession, especially classroom teachers, which is just juggling, balancing, you know, making sure that you're keeping track of all of these different techniques and technologies and students all at the same time. And I can only imagine that teachers who get used to using it, as you say, 30 times 100 times a day, realize the power, it connects all their different
apps together. And then it becomes integral to how they think and how they teach and how they plan. I'm sure you know, instead of having to build an all this buffer time of like, we'll spend five minutes getting the video cued up, they know it's just going to be ready to go and you know, all the presentation. So it's it's powerful. And I think a lot of us in edtech would make our day is hearing the end users in your case the classroom teachers just glow and say this is changing
their life. It's really exciting. You mentioned that Merlin asst is in the midst of getting a gin a AI upgrade. And you know, one of the things that I really admire about the way that Merlin mind and you in particular see the field is that you really think very deeply about the education specific use
cases of generative AI. Tell us a little bit about how you think about that space and what you've been doing to sort of support the entire edtech ecosystem in making sure that Gen AI and education become best friends and not mortal enemies.
Well, I'm again, again, thank you so much for those kind words. So I'm so you're right, we're thinking very hard about Gen AI, we think it's obviously one of the most thrilling advances in technology. Since maybe even the birth of the Internet or the advent of the iPhone, I agree. It's tough, something very profound, the shift that's very profound. And the reality, though, is that language models have been around for a long
time. In fact, we've been working on language models since the start of the company with you know, all voice technology has language models built in. And so one of the things we knew about language models, and that we know is these things have probabilistic systems. And so Ingenico AI, as the term itself suggests, is basically an AI that's trying to output you know, the next word as a
prediction. And so when we studied this very carefully, and we've been interested in this ever since the transformer paper came out, then there's really cool things that came out with Google releasing Bert and t zero and T five and flan and a whole bunch of open source models. We've been playing with them. And one of the things we realized is, you know, these outputs can be sometimes problematic. These things can hallucinate. They can make up
facts. They seem to be very confident, but they're actually giving you false information. So we tune into that as something that could be extremely dangerous for education because it's really important to get your facts right in this profession. And you cannot I mean, imagine teaching a kid around facts or misconceptions. they'll carry with them for the rest of their lives. That's just, you know, scary from our perspective. And I'm sure you will agree. So we studied this
very carefully. And of course, one of the things as GPD, three came out, we started building our own set of large language models. And we started zooming in on some of the issues that all these models have even GPT for the state of the model today exhibit same sort of issues, which are endemic to the technology. And that's important thing to this set of architectures with the transformer architecture, what's endemic is they will make up facts, they will basically capture all kinds of information
that's going back and forth. So they're not entirely private. So an example of that is that as a student, I might say, Look, I'm being bullied in Mrs. Morelos class by x and y. And suddenly, you know, the fact that this student is getting bullied in somebody's class by x and y, there's a lot of sensitive information that just went between the school or the classroom, and the cloud provider and the API provider.
And potentially an application of this actually was fronted by some kind of an application that's hitting GPT, three GPT. For Bob, what have you, there's multiple people who was was in possession of highly sensitive PII. And the probably all in violation of some of the privacy regulations that are out there.
So we will focused on privacy safety is another major thing you can basically trick GPD for into giving you all kinds of things that are inappropriate for a classroom, they've done a great job in improving safety from when they released to where where it has now. But age appropriateness for classroom is just an entirely different issue altogether, you really don't want these things telling you how Molotov cocktails are made
and rushed, you know. So we looked at safety, privacy, hallucination, and also efficiency. So these models are very big, gigantic, costs a lot operate them costs a lot to even perform a simple inference where I asked you to perform a command, and it's basically taking several seconds and then coming back. And it's just very expensive in the whole process.
So we said, it's possible to potentially build Gen AI, or a different set of language models that are not based on big tech that can once again be made to be private, safe, age appropriate. And, importantly, hallucination free. And this is where a majority of our effort has gone. So that's kind of our approach to Gen AI.
Yes. And that litany of potential problems that you just named is causing a lot of school districts and educators and parents to be hesitant and concerned and rightfully so about, you know, what is the exact relationship between generative AI and especially student use cases, it's one thing for a teacher, it still possibly cause problems, but it's one thing for a teacher to ask AI to generate something, and then at least they're seeing it before it goes in front of
students. It's another to put it directly in front of students, you know, privacy, security, age appropriateness, hallucinations. And as you mentioned, at the end, efficiency is a big deal, because whenever you call a giant, generalized LLM, like a chat GPT for it costs money, which can raise the price of the tool, it can raise the budget of the school, there's a lot of reasons why you don't want a sort of open ended cost coming your way as well. So all makes a lot of sense. So let's break it
down. You've been working on these problems in a couple of different ways. You've been looking at purpose built, generalized LLM that can answer lots of questions, but will ideally not have any of these problems. You've also been looking at these LLM classification models that can take the output of any other LLM, including a general one and evaluate it against some of the possible problems that you've
just outlined. So especially age appropriateness, is this sort of appropriate answer or not, you can catch that answer coming out, evaluate whether it's appropriate or not through a AI model, and then make sure that students aren't exposed to it, unpack all that I did my best to summarize, but unpack all that in a way that our listeners can really get their heads around it.
So you write what we're interested in is. So like I said in the previous response, we're really interested in tackling hallucinations, efficiency, privacy safety, with generative AI. And so what we're trying to build is basically a suite of large language models. So we are building, you know, a suite of large language models, each of us will be composed of a series of much smaller models. And the interesting thing here is each of the smaller models is
specialized in some tasks. So for instance, I might have a model specialized and question answering. And another model is specialized in Topic detection or summarization. Okay, and so the reason we're taking this approach is, you know, we can basically get the same performance as a generalist model with a much smaller model on specific tasks. You just want to have a much smaller model, doing all the things that a generalist model does at the
same time. Okay, and so what this does then is make much more efficient models that much faster, much cheaper to build and operate. And education is very cost sensitive business. So that's the basis of our platform. And that will power both our own solutions, as well as anybody else who wants to build solutions with us. Now, importantly, as a part of this platform, we are building and we have built one specific model that checks for the safety of inputs and outputs from large
language models. Okay. And this is important because one of the techniques that has been around in AI for a very long time is called ensembling. ensembling, is this idea that you can stitch together multiple AI tools in the service of a solution. So for instance, I'm a student or I'm a teacher, and I'm asking the language model a question. And sometimes a question I asked might be, you know, inappropriate, high school teenager wants to have fun,
might be inappropriate. So before the questions even sell to a model, and before we give it a chance to answer, it should be possible to put a filter in there to say, hey, is this something I should be passing on to the model? And at the other end, it should be possible to put another filter to say, hey, is the answer The Model gave appropriate for my audience. And all of this is basically done with a second large language
model. And so what we did as a public service was, we fine tune the large language model on a variety of categories. So we pick them, but we encourage interested entities to reach out to us and partner with us because we're happy to continue evolve in this particular model. But we picked over 100 150 topics that we felt were inappropriate for education, but which large language models generalist models would actually be perfectly happy to answer.
And we dumped it out there. And we basically are encouraging people to, you know, use it as a quick safety check for inputs and outputs of alums. So that's the whole idea. The idea here is, you know, build a platform, you know, have multiple small models focused on specific tasks that are of interest for education, for instance, question answering question generation, summarization, topic detection, and also build models
to check safety. And in our own solutions, we actually use the safety model both as an input and output filter along with, you know, routing, the models and routing, the user prompts to the appropriate model that we pick to respond. So that's kind of the basis of our platform,
it's incredibly important work, especially at this moment in time, where we're, you know, sitting here at near the end of summer of 2023, where we're almost at the year anniversary of generative AI sort of bursting onto the scene. And I think, you know, one of the things that and people are still struggling on how to use it, and how safe it is to use. So I
think this is so vital. And I think one of the things that I'd love to highlight about your approach and your answer that without ensembling technique that, you know, I've been doing my best to learn as much about this world as I can is that all of us who got exposed to the chat GPT, fours or bards or mid journeys, they are these hyper generalist models, and that's partially what makes them feel so amazing and magical. You can ask it anything, and it'll know
how to do it. And it does that because it's trained on unbelievable amount of data, although sometimes people say not even enough data to do as much as it does. That's why hallucinates partially. But it's a generalist model. It's an answer anything machine. And I think what you're focusing in on is that is two things, right? That are purpose built for education that answer anything machine, you know, bringing in a wise person to a classroom will
answer any question. And then putting them in front of a room of teenagers might not actually be that smart, because the teenagers might ask some very strange things, and you have no way to stop them. So that filter approach is incredibly important. But as well as the filters, there's this concept of education specific MLMs, that do specific things that students might need, and commonly done, like note taking, or question generation, or summarization, or
all of these things. And you can train models to do a specific task with much less data than you need to train a model to do every kind of task. And that is a key point. Yeah. Can you discuss that a little and how that affects this world?
You're absolutely right. So models are basically, you know, judged by the size of the model. Okay. So before I go into kind of a little bit of a journey about discussing, you know, what makes a lot more larger or not. The important point to stress here is that you can use transformers to build language models, and focus them on specific tasks and train them to do that particular task extremely well. And they will do it. So it's kind of a specialized brain, if you will.
I genuinely dislike some of the comparisons to the human brain. But a good metaphor is to say, look, you know, from the age of three or four, I learned how to play the violin. So I'm an amazing violinist as opposed to, you know, people like you and I were, you know, we're generalists. We do a lot of things reasonably well, but we're not you know, kind of an era you'd sack Berman level of
violent play. So And then that's true across a number of human endeavors, you can be a great athlete, you're trying to train yourself to be a great athlete. So we're literally doing the same thing with language models, we're saying to a language model, I don't really care that you're, you're amazing at a 5060 100 different tasks, I want you to be amazing at this one, or this two, or these three tasks, which are all related to
each other. Okay. And the way we can do this with much smaller models is the size of the model is basically the number of parameters in the model, okay, the weights, the layers, the nodes, and what have you. So if you build smaller models, and you train it on specific data, lesser data, now the size of the data, you need to train, the model is basically 20 Exercise parameters. So 500 billion parameter model, and it 2
trillion tokens. If I have a 1 billion parameter model, I only need you know, 20 billion tokens. And this makes a big difference in terms of the compute efficiency, the compute cost, how much time it takes to build a model and retrain it and so on. And crucially, it also makes a difference in terms of, you know, the efficiency with which the model will run. And so what we basically decided to do
was to go down this route. And this is an open question in AI circles, which is, what is the smallest model you can build, that will compellingly do a job that a much better journalist model will do just as well, okay. And we don't know what the limit is yet. But we're pushing it, we're down to, you know, a billion parameter models, and in some cases, they do extremely
well for certain tasks. However, more complex tasks like reasoning, chain of thought, etc, they require larger, larger models, but you still don't need, you know, the gigantic models, we still think this is an open research issue, no AI, as I mentioned, but even you know, for, there are certain tasks that according to the more demanding and and we still think it's possible to do them with much smaller models and these very large generalist models, and that's kind of our approach.
Yeah. And that difference between specialists in generalized models is really a big difference. So you know, that you're mentioning the number of tokens and the number of parameters, tokens is a measure of input data into how many sort of discrete piece units of data go into a model. And parameters are a measure of complexity how complicated the model is in its, you know, quote, unquote, thought process. And this might be a really goofy
metaphor. But this is what came to mind as I was hearing you talk about this, I was on a plane recently. And you know, when you write about to get on a plane, the last agent scans your ID with a little tiny machine. And the only thing that machine does is tell you if an ID is real or not, it's the only thing it can do nothing else totally specialized. And it's a classification model, right? It has two outputs, yes or no, it's real, it's not real. That's all
you need. At that moment, you would never want to ask a supercomputer to tell you if your idea was real, because it would cost a lot it would be hard to use and train on. And that kind of specialist use case, especially when you chain them together potentially or you know, can be incredibly powerful, and also much safer, because it can be trained not on the internet, it can be trained on whatever data you need it to
be trained on. So talk about the datasets that you could train these kinds of smaller models on if you wanted them to be safe.
Yeah, no, that's a good point, Alex. So you're right. I mean, one of the issues with machine learning and the spin true for as long as the field has been around, is garbage in, garbage out. So if you have a model trained on everything on the internet, then it will occasionally we shouldn't really be surprised to us that it'll occasionally put
out garbage. And so what we're doing is our approaches, you know, actually, if you filter and if you filter all the data, and you train it on the most important sets of datasets, in particular, if I want a model to be fantastic at something very pedagogic, like, you know, spaced repetition. And in fact, I'm kind of giving away a little bit of a preview, one of the things we're working on is pedagogically aligned large
language models, okay. And we're deeply informed by you know, some of the world's leading cognitive neuroscientists, Bruce McCandless from Stanford, has joined the company as an advisor to help us on this effort. We'll have a lot more to say about this in the coming months. But the idea is, if I want this thing to act as a pedagogical agent, then I need to give it a whole bunch of data that shows a human acting and pedagogically
sound ways. So, for instance, you know, a teacher getting a student to diagnose or to get to an answer by themselves by asking them a series of questions and giving them some examples and metaphors and analogies. And suddenly, if you were to have imagine, you have all kinds of amazing learning moments like that capture into a model, and then the model learns that, okay, this is how I'm
supposed to act. So one of the attributes of specialist models isn't just that, you know, I can make these much smaller, but I can make them do things that journalist models just weren't
trained to do. In fact, that's really kind of the most important thing that we're attempting to do, which is, by the time we're done We should have models here that educators will will will say, Okay, I want to use this thing all the time, because this does something that GPT four does, that bar doesn't, it just acts in ways that I simply cannot get the larger models to do. And it's, it's actually very sound from the science of pedagogy that I've
learned in college. So these are the attributes of kind of a bespoke approach to building language models.
That's a great point. Because, you know, lest we think that a smaller model might just do a portion of what a generalized model does, and nothing else, it can actually do that portion much better, if you instruction fine tune it to be specifically reactive to that.
So the idea of pedagogically trained models that can, you know, really find the teaching moment or the right metaphor or the right, you know, pace to teach, if you're talking about spaced repetition and paste it to test memory is so powerful. And then, you know, with the ensemble learning approach, that becomes a really powerful model.
And you can also have a model that's there to make it fun, or a model that's there to make it level to their students, right, you know, reading level or a model that and these all can be better than the generalist models if they're trained the right way, at that specific task. So it's almost like getting a committee of these super educators in the same place. And you can choose as an end user, which of them to use. That's, that's how I'm thinking about I'm not sure if that's
really accurate. But it's like, you can really put together incredibly specialized use cases for education, which are very different than the kinds of things that we use chat GBT for generally.
No, you're absolutely right. So that's I mean, that the last bit is exactly that. So what we're really interested in is, you know, taking the technology, the underlying technology and extending it to specialize things that are important to a domain that the generalist model suggests we'll never get to.
Okay. So in fact, one of our thesis is that this is how generative AI will really play out in the in the medium term, which is, each domain will specialize already we have met GPT, which models are shown on medical data, and I think Bloomberg pro bloom, GPT, or something for finance. And so we're envisioning something like that for education models built from the ground up with specific training and data to say this is how you act as a really great educator. And that's what we
want to put out there. For our teachers.
Yeah. And then you can have, you know, trained educators putting together the models in all sorts of ways, maybe even someday, an AI metamodel that says, given the tasks that you're trying to do, oh, you're trying to do a lab report that involves X, Y, and Z? Well, I'm going to suggest these models that you put together to do that there's so much potential in the idea of lots of specialized models versus one giant generalized
model. When it comes to education use case, it gets the mind going very, very quickly. And I'm excited to hear that you're working with Stanford, obviously, we will all keep our eyes out for any additional announcements about some of the
things that are coming out. So one of the things that's incredibly interesting about this moment, is that there are different ways to slice specialized ed tech or special specialized LLM, as you mentioned, you know, medicine and finance, those are specialized by domain by, you know, what they know. And then there's pedagogical or safe, or you know, appropriateness, LLM 's that are specialized in sort of how they communicate or what they do. How do you see these playing together in the future?
Yeah, that's such a brilliant question. That's a good observation. Yeah, I mean, so there is actually a couple of different slices that one can look at. One is tasks. And so the way we look at it as a specialist on a particular task, and if so, is it as good as the state of the art out there like GPT, four, do you prefer, by the way is the best model on the planet, it is still the benchmark, then something else better comes along as a generalist model, this is the
benchmark. And so what we would like to do, and what we're attempting to do, is both simultaneously, so on specific tasks, we want to be as good as GPD for and on specialized domains like education, we want to be the state of the art, because you want to get a model to, as you said, you know, use spaced repetition or be some kind of Socratic instructor, if you want to model to be these things, then there is no other model out there that does this. Well. So then you become kind of
the benchmark. And so that brings us a fascinating question, but how do you know more is actually working well, and to the benchmark discussion that you and I were having before we started this podcast. So a good way to look at it as tasks and domains. But even within domains, you're still doing tasks, which are the generalist NLP functions that you want to model to do really well.
Right? So if there's specific tasks, and then there are domains of knowledge that underlie those tasks. So if you're asking it to, you know, teach me about the French Revolution, then it has to both know The history of the French Revolution and be trained on enough data to understand it, but also know the pedagogy to actually teach it to you in a way that you'll retain it and understand it and be able to synthesize it at a deep level. And those two things are pretty different. If I might just
quickly interject, that kind of forgot a really important point to make here as well, which is, one of the ways we reduce hallucination is education to hyperlocal business. And so the French Revolution example, got me thinking about this. So you don't want a model to just teach the French Revolution from whatever it has been trained on. You want it to teach the French Revolution, from the corpus of material that your school district or your teacher or your
professor chose. And so all of education is a hyperlocal business. And that's the most important thing, education is hyperlocal. Each school district democratically chooses its board, which says this is what I want you to teach from this is a curriculum, these are the standards. And we want to support that. And so in fact, a really big part of our architecture allows for people to upload their own content and to get the model to only operate
on that content. So it's domain specificity with hyper locality in mind, if you will,
yes. And that hyper locality, that idea of having a controlled data set, or a data set that's focused on your particular curriculum or publisher means that you combined with the fact that you don't need as much data to train models to be very specific. So you don't need the whole internet to train a model, you could use the textbooks that are in your particular school district and have already been approved and, you know, procured and that amount of data is
enough. Yes, in theory to be able to do exactly the things you need. Is that right?
Yeah, I think it's too late to put in some additional data like Wikipedia, etc. Because one of the interesting things is language models need just enough data, to learn to understand the rules of language, if they've seen enough samples, then what they understand is, here are the patterns of say the English
language. And so what what so so our approach, if I were to summarize our approach, we're trying to build a smallest models possible, that understand the formal rules of English language, and the specific instructions we give it. But then we want this model to follow the instructions on specific corpuses of information that we want them to. And then we want them to be safe and private. So that's basically the overall approach that we're taking,
that makes a lot of sense. So there's sort of a foundational layer of learning and data from large corpuses, such as all of Wikipedia, but safe corpuses what you know, ones that are all of the open Coursera content, all of Encyclopedia Britannica, you know, no discussion boards on strange websites, and that allows it to understand instructions, respond to them know how to compose sentences,
all of that. And then you can build in your own data and have it actually teach through the lens that you have decided as a school or district or an educator. Exactly, incredibly exciting work. So you mentioned the benchmark discussion we were having earlier. Right. Before we started recording, we talked a little bit about this, but I'd love to talk a bit about it on the podcast, I'm an open book
here, at least. So one of the things that, you know, I've sort of been realizing again, and again, as I start to learn more about this space, is we keep talking about models that are better, that work better than others. But you know, What Does better mean is actually a little bit of a moving target. And what people do is actually create their own benchmark assessments for lack of a better word benchmark sets of tasks that a model has to do to prove that it
has a certain trait. So you know, on hugging face, they have all these open source models, and they compete on a specific set of tasks that have been phase has approved that said, these are the ones that we want it to pass. And you know, being at the top of the leaderboard means that you're the best at those particular assessments, or
benchmarks. And it strikes me that, you know, we don't have something like that for education, we don't know what are, you know, we don't have a way to measure if a model is doing great pedagogy, or whether they're being very safe and age appropriate or whether they're, you know, making sure to be really careful with personally identifying information. I'm curious how you think about that, because I know you have gone down that idea maze in the past.
Wonderful observation. So when we were putting our defining our age appropriateness model that's out there on GitHub and having phase, we discovered that, you know, there isn't actually a very good way to measure whether this is doing a good job or not, because nobody actually invented a benchmark. Okay, and the benchmark discussion, the point you made about benchmarks as a way to measure the goodness of a
model is spot on. And the closest we come to any education specific benchmark is one that's called GSM. 8k, which is grade school math, equilibrium school math or something like that. That's what the benchmark is. Okay. But none of these things actually check for look, is that PII being transferred across the wires? Is it appropriate for a particular grade level or age
level? So we debated that very much about whether we should create a benchmark, then we realize, you know, we could start it, but we really liked it
to be a community effort. And by that, I mean, you know, there's a few very well known institutions that can come together and say, let's define this benchmark, then we can potentially, you know, build a model to check for the benchmark and get into somebody like hugging face or someone to host it and to basically, you know, test models against these benchmarks. I do think that's absolutely necessary for
education. Because, you know, learning is actually a very sacrosanct activity, I realized that whenever I speak to teachers and and the care and the thought they put into how they allow certain things in classrooms or don't allow certain things, and they really care. And it's time we treated, teaching and learning as just as important as the medical profession, as you know, patient care and therapies for curing
patients. So what we are very eager to do is to see if there are people who could collaborate with AI, we don't have to be the lead institution, we just want to participate. And that's also why we put out this model out there. Because I think it's important for education to use appropriateness, filters, and safety checks and PII checks before the hedge and AI models. And I do think it's time we came together to did this as a community, I love
that call to action. It's a great challenge for the entire ed tech community to rally around because it's the right set of institutions doing this in the right way, which would involve research and, you know, trust and educator input in all sorts of ways could
really be transformative. I would also imagine, by the way that companies like you know, the Disney's of the world are trying to say, well, I'd love to build an interesting chat bot for kids who want to talk to Disney characters, but I can't because you don't know what's age appropriate. And I don't want to be liable for a seven year old hearing something very strange from you know, Donald Duck. Exactly. There's commercial reasons for this as well.
Terrific example, actually,
the education use case is actually arguably a subset of the age young child use case. Yes. So there's all sorts of overlapping stuff here. I'm so excited, we will definitely continue this conversation. But we are coming on time. And I've got to ask you the two questions, we end every podcast on what do you see as one of the most exciting trends in the EdTech landscape right now? And I know we're all we've been talking about AI and MLMs and specialized MLMs. And you
can use that. But I'm curious if you're on a zoom out and said, what else are you seeing too?
Well, I think one of the most exciting things I'm seeing is an embrace of generative AI. So we've been doing this for a long time. And for the longest time, we were talking about AI and its potential to transform education. And people would look at us as you know, I believe in Skynet, I believe Terminator two, something along those
lines. Science fiction has taught us to mistrust and to be in fear of these intelligent machines, then, thanks to the chat, GPT moment and November 2022, everybody kind of understands that, okay, there's a really powerful technology here that we can embrace and use. But along the right, the path to embracing AI, what I'm really encouraged by In fact, the pandemic was a terrible
thing for the world. But one of the very few good things that came out of it is that education got digitized properly, and America, a huge amount of resources were poured into the industry. And people realize that it's actually a really important thing to educate our children, even if they're, you
know, at home. And so, I really love this trend of more and more technology, community education in the right way, I think it's only going to help us in the long run, because we are in a fully digitally interconnected one. So that's, that's the most exciting trend outside of this larger embrace of AI and awareness of AI that's happening,
it makes sense. And you have a front row seat to that because your users of Merlin assistant are using multiple digital tools every day in concert with each other. And you know, the number of tools that teachers use every day has gone up enormously over the last few years, partially because of the pandemic, and what is a resource or two that you would recommend for somebody who wants to learn more about the really exciting topics we discussed today?
The best paper I've read, that makes sense of the general revolution thus far, is a paper by Kyle mahoe and co authors on language and thought it's I think it's I think the title might be something like this associating Language and Thought in large language models, but it's worth Kyle was originally at MIT, where he got his PhD. And there's a long story where, you know, I nearly ended up, you know, enticing him to join us at IBM research, but he slipped through the through
the cracks and, and then anyway, he's one of the stars and the cognitive science horizon, one of the rising stars, brilliant cognitive scientists and and a great AI technologist as well. And the paper that he wrote, along with several co authors, including Josh Tenenbaum is required reading for anybody who's really interested in what's actually happening here. Regenerative AI, this models really came All of thinking, Where is the gap between human
cognition and AI today? So I love it because it's not an AI technologist writing it's a proper cognitive scientists, and one of the world's leading ones, really taking a careful look at what does cognition really look like the human brain and do these models actually exhibited? So that's the resource I would point people to
phenomena. And as always, we will put a link to that paper disassociating Language and Thought in large language models call in a cognitive perspective by Kyle Maha Wald, Joshua Tenenbaum on a even Nova, etc, about six or seven authors. We will put that link and I'm going to ingest that just tonight because I'd love to learn more about this as well. Dr. Satya Nitta, CEO of Merlyn mind, which is changing how AI and technology in general is being used in the classroom.
Thank you so much for being here with me on edtech insiders.
Pleasure Solomon. Alex, thank you so much for having me.
Thanks for listening to this episode of Ed Tech insiders. If you like the podcast, remember to rate it and share it with others in the tech community. For those who want even more Ed Tech Insider subscribe to the free ed tech insiders newsletter on substack.