HockeyStick #8 - Generative AI in Action | HockeyStick Show podcast

⁠¶ Welcome to Hockey Stick: Diving into Generative AI

Miko Pawlikowski

00:00

I'm Miko Pawlikowski, and this is Hockey Stick. Starting with generative AI can be daunting. There's a lot of hype, a lot of development, and a lot of change, daily. It's easy enough to launch ChatGPT and ask for a poem on how Vim is superior to Emacs, but to get value from it professionally requires a bit more skill. Today, I'm joined by Amit Bahree, the author of "Generative AI in Action", a brand new book published by Manning.

00:29

Amit is a principal group technical program manager at Microsoft, where he leads the engineering team that builds the next generation of AI products and services on the Azure AI platform. He has over 25 years of experience in technology and product development, including in the artificial intelligence and cloud platform fields. And yes, you will learn what his mom thinks about ChatGPT. Welcome to this episode and thank you for flying Hockey Stick. Let's start right away.

⁠¶ The Genesis of "Generative AI in Action"

00:59

Why did you write the book?

Amit Bahree

01:01

Being in the AI platform team at Microsoft, one of my roles, which more or less became the day job over the last year and a half, was meeting with a lot of our customers, which are generally large enterprises, where everybody wanted to know, "How do I use Gen AI?" It obviously took over the world. As I joke, my mom's a ChatGPT expert.

Miko Pawlikowski

01:23

Oh yeah, I bet she is.

Amit Bahree

01:26

And I basically, at the end of the day, got tired of answering and guiding on the same thing again and again across multiple customers. So I said, what if this could be put down on paper and they could just learn themselves, rather than us being the bottleneck in many ways, right? So, in full transparency, it was a selfish exercise, so I don't have to repeat myself again and again. I could just point them and say, "Hey, go read this", and that'll at least give you a jumpstart.

Miko Pawlikowski

01:55

Yeah, exactly. Read the book. I love that. That's a completely valid, perfect origin story. So you mentioned that your mom is a generative AI expert, so I guess we'll interview her next time. But, for you, what was your moment?

⁠¶ Amit Bahree's Journey into AI

02:11

When did you decide to go into AI? Obviously you've been in it for a while. It wasn't as hot back then as it is right now. Can you tell us a little bit about your story and how you ended up doing what you're doing?

Amit Bahree

02:24

I am actually not a data scientist. I'm not a machine learning engineer. I know how to build models, but that's not what I live and breathe and dream up in the middle of the night, as I know many of my colleagues do. That's their passion. In my previous role before Microsoft, one of the things I was learning was emerging technologies: understanding from a technical point of view what they are, how they work, and how they could be used, mostly in the context of an enterprise setting.

02:51

And one of the technologies among a few was AI, a few years ago. My role of looking at emerging tech is how I got into AI. Of course, Gen AI, or the underlying architecture principles that power these things today, didn't exist. But I was quite fascinated. It was still a side job in the sense that it was one of a few areas of emerging technologies to go dig deep into. And then as that started getting more traction, I was the one-eyed king in the kingdom of the blind.

03:23

Because I knew more than the others did, which doesn't mean I knew the most. And then I was stuck with that. And then I grew into that and got fascinated.

Miko Pawlikowski

03:32

As they say, 'the rest is history'.

Amit Bahree

03:34

It's still early days.

⁠¶ The Multifaceted Role of a Technical Program Manager at Microsoft

Miko Pawlikowski

03:37

So what does a principal group technical program manager do? That's a mouthful. Is that how you introduce yourself at parties?

Amit Bahree

03:45

No. Microsoft likes long names and titles. Titles being an aside, I basically have officially two day jobs, unofficially three day jobs. So I sit in what we call the AI platform team. We are the product team that builds all the AI products that power other products or our end customers. I have formally two buckets of responsibilities. At Microsoft, our leadership signs large contracts with customers, within which we promise them either new or better AI features.

04:20

It could be brand new things that we're building with them or for them, or it could be improving existing features. So once we sign those contracts, those land on my plate to go deliver from a platform team perspective. So I'm responsible for a lot of the custom engineering on the platform. That's my first bucket of responsibilities. My second bucket of responsibilities is to make sure that whatever we do custom in the first is in the platform.

04:46

Because if you keep being custom, then there's no platform left. So the way I want you and the listeners to think about it is that these large deals we sign are the catalyst for us to go do things in the platform that we are already thinking about, but that maybe aren't prioritized enough. So they are a forcing function to go improve the platform at the end of the day. And that helps not just that one specific customer, but all the rest of them as well.

05:14

And then my third, unofficial one is anything and everything related to Azure OpenAI coming from our CEO and what we call our SLT, which is the CEO and his direct reports, in the context of customers, where it's top of mind for many. And for many folks, their understanding is varied, which somewhat ties back to the genesis of the book.

05:35

So when Satya meets other CEOs and they have a question, or they're not happy about something, or they need guidance, those get sent over with, "Here's the team that's going to go help you." And so then I go in and, from an engineering point of view, support them and see what they need or what they want. So that's my day job, right? Custom engineering, and then supporting Azure OpenAI related things from our leadership team.

Miko Pawlikowski

06:03

And then there's your fourth job, which is writing books.

Amit Bahree

06:07

That, indeed, that is also a moment of insanity in some ways, but yes, that is the graveyard shift, as I call it, because, it's after the work is done, and, which is never done, these days at least, so yes.

⁠¶ The Sam Altman Saga and Microsoft's Position in AI

Miko Pawlikowski

06:20

Of course. I have to ask you, obviously, not that long ago, there was this entire drama of Sam Altman being fired, and then rehired, and all of that. And a lot of people were wondering a lot of things. Satya was quite prominent during that entire conversation. What's your take on what happened?

Amit Bahree

06:44

Couple of things. We were learning along with the rest of the folks on Twitter or Reddit or wherever one follows things, right? The conversations that Satya and Sam were having were above my pay grade, just to be black and white about it. So we were following along and listening along just like the rest of the world. I think the one difference is, we had a little bit in the machinery.

07:06

Obviously in our team, we do work closely with OpenAI from an engineering perspective, and they're a massive partner to us. So I think in some cases, maybe we are a little more empathetic, I would say, because it's a little closer to home. And it's one big virtual team, loosely speaking, is how to think about it.

Miko Pawlikowski

07:26

So there was one particular thing that I think is interesting, and it might be that people are just reading way too much into it, but I think Satya went and said something along the lines of, 'Don't you worry, even if OpenAI stops existing tomorrow, we're basically well positioned to continue the innovation' and all of that. And a lot of people took it as saying, okay, they basically bought themselves OpenAI. Is that roughly what's happening?

Amit Bahree

07:52

Couple of things. One is, I'm not a Microsoft spokesman. I'm just talking on my own behalf. We don't own OpenAI. I don't think that is correct. I think people are reading too much into it. I think the thing I want folks to understand is, Microsoft and Microsoft Research investments in AI have been going for over 30 years. So it's not that we've woken up just today, or a few years ago, and said, 'Look, this is the thing to go into'.

08:19

I think the difference really is my mom didn't know about it, nor did she care. Now she does. So I think, where we're coming from, in some ways it's not new. It's just become more in the limelight and people are becoming more aware, but we've been at it for a while, both from a research perspective and a products perspective. It's just more in the limelight now.

Miko Pawlikowski

08:41

Okay. Let's leave Microsoft alone and talk a little bit closer to your book.

⁠¶ Why Generative AI is a Game Changer

08:45

So one of the questions that I keep asking everybody is their reason to think GenAI is such a massive deal. Why is it such a big deal, and, again, your mom: why does she know about it now when she didn't before? I don't think she knew about BERT, I suspect, but she does know about ChatGPT, and there's a good chance she's using ChatGPT, which is next level. What do you think was so special recently?

09:13

What's the hockey stick moment from your perspective? What's changed so that it became a household name?

Amit Bahree

09:20

It was ChatGPT itself that changed things and made it a household name. As we all know, and perhaps most people understand, the roots of ChatGPT were as a demo. It wasn't meant to be where it is right now. And there's the fact that one doesn't have to know BERT or any of the other sort of technical mumbo jumbo, and I can just talk to it. I can just use it as an end user. I think the simplicity is the power of it.

09:46

And there's the breadth of language understanding it can do, versus what we now call traditional AI, which is very odd, by the way, in the first place. The old AI, the pre-GenAI, which again is not old and is very much valid today, was very task specific, where you go deep in a certain task. So if you're in a company, in an enterprise, doing a certain thing using that, you understand it, you get its value, you know why it's powerful.

10:13

But you can't have a generic, free-ranging, wider set of conversations and thoughts around it. So if you take a previous chatbot, for example, which is not powered by GenAI, and Miko goes and says, 'Hey, I'm hungry', it won't know what to do: 'I'm sorry.' Whereas these things understand, they adapt. So I think the simplicity from a usage perspective is the power. And that's why the likes of my mom and others in the world are talking about it, right?

10:47

Because it's not technical mumbo jumbo that a handful of people understand and you geek out in the corner. I can just use it.

Miko Pawlikowski

10:56

Are you behind this comparison, that this is the iPhone moment for artificial intelligence in general, and for large language models in particular?

Amit Bahree

11:05

Is it the iPhone, the original one, or the one which got 3G support, or when the App Store came out? Is it that one? There are some variants out there, right? But I look at it even more simply, because I think the iPhone is still a very consumer thing, at least. My world is very enterprise. Consumer is one side of the house. Enterprise is a very different kettle of fish in the sense of the problems and what they're trying to solve.

11:29

So I think if you look at a consumer sense, like my mom, that is an iPhone sort of comparison moment.

Miko Pawlikowski

11:34

If you go to Manning.com, you can actually browse portions of the book for free. So if you're listening along, go to Manning.com, find the book, and look for Figure 1.1. It's a graph that Amit took from ourworldindata.org, and it's called 'Language and image recognition capabilities of AI systems have improved rapidly'.

11:56

And it's basically plotting performance against the human performance benchmark, on a scale which goes from minus 100, meaning that it's pretty bad, all the way to zero, where it's comparable, I think, or maybe equivalent, to human.

12:09

And for everybody who's listening to this as a podcast and not seeing it on video, it's showing different machine learning and AI trends. It's got handwriting recognition, speech recognition, image recognition, and then it's got reading comprehension and language understanding. And what's mind-blowing to me, and I suspect this is why you chose this particular graph, is that we've got the handwriting and the speech recognition that kind of go slowly, looking linear.

12:37

There was a little bit of progress, and then somewhere in the mid 2010s, it just goes out of control and goes all the way up to very good results. And then in 2016, I think, the graph starts the reading comprehension. It's basically an arrow going straight up, and the same for language understanding. Within two years, it goes from literally nothing to basically comparable to human performance. Why did it happen then?

⁠¶ The Evolution and Impact of AI Technologies

13:07

What needed to happen? This is not even a hockey stick. This is just like a right angle here. How do you explain that?

Amit Bahree

13:15

True. I actually never thought of the right angle. I think it's three things coming together, right? So one is that aspects of AI and the research behind it have gotten better in that time frame, right? So we started getting deep learning; transformers, I don't think, quite existed at that point in time. So, fundamental architecture changes or improvements from a model architecture perspective. I think that's one.

13:41

But I think crucially, maybe equally, maybe more crucially, is availability of data at the scale you need. And then also compute, more specifically GPUs, to train and crunch through these. I think it's that perfect storm of those three things coming together. If one of them didn't happen as much, it would still be slower. And that's why you see the linear progression in the others versus, I don't know, is that a rocket thing?

Miko Pawlikowski

14:05

Basically vertical.

Amit Bahree

14:08

So I think it's those three things coming together. I personally believe, I don't think anything was planned or orchestrated. I think it's one of those happy accidents: how GPUs work, and the floating-point math they need to do for graphics, which is gaming, is the same thing that AI models need to do. We as humans started spitting out more data, maybe thanks to social, thanks to actually iPhones and other smartphones and devices and whatnot.

14:34

And then cloud capabilities in the context of GPUs and compute improved. I guess there's a fourth one, which is inherent: a lot of systems engineering things started coming online, right? How do you run these? Because it's not like running them on one GPU, for example. You need clusters of machines. So there's a fair amount of systems engineering, in the sense of reliability, resilience, and so on, under the covers that has to make it all happen. Otherwise it won't run.

15:02

Lots of physics and computer science. I keep saying that to my team, for example. So I think that's maybe a fourth dimension, which most people don't talk about, but I think those are the things that perhaps enabled a bunch of this to get to where we are right now.

⁠¶ Exploring the Landscape of AI Models

Miko Pawlikowski

15:16

There's another interesting reference that you have. It's called 'A Survey of Large Language Models', and somehow I'd missed it; I found it in your book. So thank you for that. I think page nine is where I found figure three. It's going to be very difficult to describe verbally, but imagine a little anthill with a bunch of ants in it, swarming. And each one of those ants is basically a model.

15:45

And the figure is making a distinction between the ones that are basically open source, publicly available, and the ones that are closed source, and it's only graphing up to GPT-4 and LLAMA-2. So there's way more than that. I think at some point I saw that Hugging Face had a hundred thousand models uploaded to it. And I suspect after LLAMA-3, it's probably doubled since. It gives you a little bit of perspective. It's not just ChatGPT and it's certainly not just OpenAI.

16:16

And it shows you how much variety there is. And, frankly, I've been looking at these things for a while now, and there's still probably half of this graph that I haven't even heard of, let alone tried. I keep using this phrase, Cambrian explosion, but it really does feel like that. They're just crawling out from under every rock and hole, which is amazing. This is such an exciting time to be alive, if that's the right way of putting it.

16:44

Why did you choose that figure for your book?

Amit Bahree

16:48

I had two schools of thought when I originally said this would be the right one. I think one of them is what you touched on: yes, OpenAI and ChatGPT have the world's attention, but there's a lot of other innovation, a lot of other companies, a lot of other stuff going on as well. It's not only that. So I think it is more about awareness in that sense, because the book is also written in my personal capacity. It's not a Microsoft-sponsored or a Microsoft book, right?

17:16

So in that sense, I felt I would be doing a disservice if I didn't make folks at least aware, because you just know what you know. So I think that was my one aspect. I think the second aspect was also showing lineage, because a lot of these models are complex. As base models to train, they're super expensive, both in the sense of data gathering, cleaning it up, actual training costs, and so on and so forth.

17:48

Many don't really have the appetite, or the ability resources-wise, to do that. So what I also wanted to show was that, at the end of the day, it's still only a handful of base models that are further trained or fine-tuned and derived from. So it's a lineage aspect I also wanted to show, because that gets lost in the noise as well. And again, the framing of the book is mostly in enterprises, so if you're in an enterprise setting, you just need to know the roots of the model you're using and the lineage it has.

18:17

So you can make an informed decision about whether that's the right thing or not.

⁠¶ Introducing Phi-3: A New Benchmark in AI

Miko Pawlikowski

18:22

Speaking of which, that reminds me, Phi-3 was released last week. It seems to be punching above its weight category quite heavily. Were you involved in any capacity in that project?

Amit Bahree

18:36

In a minor way. So if you go read the technical paper, I'm one of the 70-some people listed on it. It's a team sport. The team that built the SLM, Phi-3, is originally from our platform team. They've been moved out of that into the new GenAI team we've recently formed and publicly announced, so we work very closely with the team. Even though I have roots in applied research, I don't think I can take credit to say I built the model, but I've been involved with it for sure.

Miko Pawlikowski

19:08

You're on the paper. That means you built it. You can claim that.

Amit Bahree

19:13

I think Sebastian and the others have been very kind, where some of us have been involved in providing feedback and input and guidance and what have you. I think they've been quite kind and they've done the right thing. But that doesn't mean I can take full credit. The way I think of it is, it takes a village. Each village needs an idiot, and that's me. It's an important role. Somebody has to do it.

Miko Pawlikowski

19:34

Oh, wow. That is a lot of authors. I just opened it, and the paper was released three days ago, it looks like, and it is an impressive number of people working on that. I've been reading people's opinions. I haven't actually read the paper, so I don't know how it explains how it's possibly this good. It happened a few days, was it a week, after LLAMA-3 was released? (Amit Bahree: Roughly.) The main selling point being that they trained it on 15 trillion tokens or some ridiculous number like that.

20:04

And they were surprised that it kept getting better. It sounds like this one was trained on a much smaller corpus of text. How do you explain why it's so good?

Amit Bahree

20:15

So there's two things here. I think, and it's in the paper, it's 3 trillion tokens. Again, this has its genesis in Phi-2, which has its genesis in Phi-1, which has its genesis in ORCA2. Those are all research models. One of the things we've come around to seeing, in the context of this new category of small language models, is that highly curated data sets are better.

20:44

So one reason why you see Phi-2 and Phi-3 doing so much better relative to bigger models is because a good chunk of the data is highly curated. There's two aspects to it, which we also publish. So there's this other paper, 'Textbooks Are All You Need', if you or your readers have seen it. So basically a good portion of the corpus is high quality textbooks as input into the model to train on.

21:06

And then the second aspect of the data is that it's not Common Crawl, sucking stuff off the web, but again highly curated web data, or a very small subset of the web data, combined with the textbooks. That is also an interesting research direction, which is to say that for these smaller models, higher quality data sets carry a lot of weight. And that's really a lot of what you're seeing.

Miko Pawlikowski

21:35

So when people say curated, does it mean an army of humans selecting, reading, annotating, and discarding low quality stuff? Or is there another model that does that work to pre-select, and it's models all the way down?

Amit Bahree

21:53

It's not an army of humans because that's not scalable and doable at, you can do it as a one

Miko Pawlikowski

21:58

trillion tokens. Yeah.

Amit Bahree

22:00

Yes, you can do it as a one-off maybe, but especially, Phi-3 is a product. Phi-2 was a research model. Two different things. And from our perspective, the minute we're saying it's a product and we release it to production, it has to go through the right rigour and cycles from a Microsoft perspective. That means we have to support it for a number of years. We have customers who are going to use it, and so on. So we can't just publish it with an army of people, because that doesn't really scale.

22:27

So there are other models helping. When you ask how you create it, at least in the context of this, it is synthetic data generated using GPT-4, but then humans are involved to make sure that it is curated. Again, it's not an army of people, but it's machinery, evaluations and so on, machinery running to

Miko Pawlikowski

22:47

So this is synthetic data we're talking about. It's literally all generated by ChatGPT?

Amit Bahree

22:52

Most, most is generated by GPT-4.

Miko Pawlikowski

22:56

So that always makes me wonder, if we train things on data coming out of a model. I'm obviously no expert on this, but intuitively it seems to me that data generated by GPT-4, or any model really, is going to have certain attributes to it that don't necessarily represent the web. Is that not a problem?

Amit Bahree

23:20

Yes and no. I think one shouldn't be using the output of another model as your only general data input. I think you have to look at it in certain domains, specific to what you're trying to do. And then in that context, it would be okay. But that's where the human aspect also comes in. You have to make sure evaluations are right. Because guess what? The old school 'garbage in, garbage out' is still very much valid.

23:48

But I think your intuition is correct in the sense that one shouldn't think of it as, 'Hey, I can go use an LLM, have it spit things out, and then use that to go train my own model', in the broad sense of it. But you'll also hear of more recent papers and more recent news where, in general, and this is not Phi-3, we have reached the point where we are sucking in all of the available Internet that one can reach or is allowed to reach.

24:14

And to train the models more and more, we are also then complementing it with synthetic data, which other AI is generating. So I think you have to tie it back to which aspects your existing model is not doing great on, evaluating those, and then using that as a basis to strengthen that dimension, rather than something more horizontal and generic, if that makes sense.

Miko Pawlikowski

24:35

Yeah, it certainly does.

⁠¶ The Future of Small Language Models

24:36

And I think what I appreciated, I actually hadn't seen the shortcut, the abbreviation SLMs for small language models, until I opened your book (oh, okay), which is an indication of just how much focus we put on LLMs, the large language models.

24:54

And I think that, to me at least, I don't know if it's just the part of me that loves running things on Raspberry Pis and gets excited about the possibility of running a decent enough model that I can speak to and that actually runs on my phone or something like that. So, 3 billion parameters: does that mean, roughly, with 4-bit quantization, that we can run it on effectively any phone at this stage? Like, it's going to need maybe a couple of gigs?

Amit Bahree

25:21

Everyone's asking that. So, on a certain profile, I think we talk about an iPhone 14 with a Bionic processor. You can run it. It can do a certain number of tokens per minute in generation. I think to be able to run it for Miko or Amit, as in 'I can run it on a phone' as an experiment or what have you, is one thing, versus the ability to run it at scale for a production deployment, which is a different thing.

25:51

So yes, these are small language models, and we do believe that, just as LLMs after ChatGPT became a lot of hype, some of it good, some of it not so good, SLMs will be the next set in the context of the hype. But as I go remind many of the folks I talk to, it's a small language model in relation to a large language model at the end of the day. I think it's 2.8 or 3.8 or whatever parameter count we have on the mini one, because this is Phi mini, Phi-3 mini. It's also a family of Phi models.

26:25

This is the smallest of the ones that should be coming out, and the paper touches on the others. At the end of the day, three billion parameters, or whatever the exact number is, isn't small from a computer science perspective. It is still a pretty big, complex thing. Yes, compared to hundreds of billions of parameters, it is small, but it is not small. I think I have to go remind people of that.

26:49

In relation, or relative, to an LLM, yes, it's small, but by itself, it is still pretty complex and beefy in the sense of compute requirements and GPU requirements and what it needs. It doesn't mean you'll go off and deploy a bunch of these on your Raspberry Pi with inference in milliseconds and whatnot.
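
As a rough aside on the "couple of gigs" question in this exchange, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate only, assuming every parameter is stored at the same bit width and counting the weights alone (no KV cache, activations, or runtime overhead, so real usage would be higher). The function name `weight_memory_gib` is hypothetical and not taken from the book or the Phi-3 paper.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width, and
    nothing else (KV cache, activations, runtime) is counted.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Parameter counts mentioned in the conversation: "3 billion" and "3.8".
    for num_params in (3.0e9, 3.8e9):
        for bits in (16, 8, 4):
            gib = weight_memory_gib(num_params, bits)
            print(f"{num_params / 1e9:.1f}B params at {bits}-bit: {gib:.2f} GiB")
```

Under these assumptions, 3 billion parameters at 4 bits come to roughly 1.4 GiB of weights, so "a couple of gigs" is a plausible order of magnitude once overhead is added. It says nothing about speed, which is the point Amit makes about production deployments.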

Miko Pawlikowski

27:12

I'm asking this, but one of the reasons I am asking is because, I don't know if you followed the launch of Humane AI, that little gadget that kind of looks like something from Star Trek. It looks like it hasn't been particularly well received because it's a little slow and a little clunky.

27:31

I think I watched a YouTube review of that, and they basically destroyed it a little bit by showing just how long you have to wait, because it's effectively just uploading it to a cloud somewhere and then downloading the response, and it's just not there. And with Phi-3 and the smaller models, all of a sudden everybody's thinking the same thing. Can we make it native?

28:00

I think Apple announced some things about how they're going to work on making sure that the hardware in the newer iPhones is able to run this stuff at reasonable speed. And it feels like this would be another hockey stick moment for these things: small language models where Siri doesn't suck, 'Hey Google' or 'Okay Google' works, and Alexa actually listens to me, that kind of stuff. Do you think that was one of the motivations for the smaller model?

Amit Bahree

28:22

The premise you're touching on was one of the motivations. So if I rewind for a second, for a large language model, and again, if you cut through the hype for a second, it's laws of physics and computer science. These large language models are enormously complex and need a lot of compute resources to run. And as any developer, programmer, or computer scientist will tell you, laws of physics: the scale means complexity, which means latency. I have to process more things.

28:52

It takes time to get results back, and there are no ways to cut those corners at the end of the day. So where you're seeing latency, or things are slower, it's because of that. From our perspective, there's also another dimension: running this in the cloud, at Azure level, globally across the hundreds of data centers and what have you. That's not simple or cheap. So if we can reduce our costs to run this at scale, we can make sure the service is cheaper for our customers as well.

29:24

I think this is also where we as humans are awesome and we forget things. Because many of these models are exposed as an API, we, at least developers for sure, have the expectation that it's an API call, so I'm going to get my response back in milliseconds and what have you, because that's what we have been used to. The difference is, yes, it's an API call, but the machinery that's running behind it, including the models themselves, is super complex. And when things are slow, we get unhappy.

29:56

So I think there also needs to be a mindset shift. So if you package all of this up, that's a big motivation for why, in some cases, a small language model would make more sense. But I also want to point this out: it doesn't have the same power as a large language model. I see a lot of comparisons to the bigger models and all, which is good. It's early days, but at the end of the day, it's not an apples-to-apples comparison.

30:21

For example, a lot of people, including me, have been guilty of just using GPT-4 as a knowledge database. More and more people, instead of googling or binging or whatever you do, just ask the thing. So you're using it as a big, fancy database.

30:33

So if I put that in the sense of world knowledge, and again, that's not factually correct, because it doesn't have the world's knowledge, it only has the publicly accessible knowledge as of its training cutoff, but ignoring that point, the small language models will not have that, because they've not been trained on that volume of data. So I think the other dimension is, whilst the compute profile we've been talking about is one, you have to think of the SLMs in the right use case.

31:01

What am I trying to do? If I'm trying to understand an entity in a workflow, I can use a small language model. I don't necessarily need the power of these large language models. Equally, if there are different languages that one has to use, not English, for example, a small language model may not be as powerful or as good as a large language model. So the way we should think about it is that they shouldn't be competing. They're complementing each other.

31:27

In what you're trying to solve, at the right step, use the right model, because the beauty again is they're an API call. So it's not that if you're developing an application, you're stuck with one thing for the whole duration. You can choose at the right step for the right thing, for the right power. So I use,

31:42

often with my teams and others, this analogy: if a GPT, or pick your model, is like a Ferrari, then if you're going racing, you need a Ferrari. But if an SLM is like a Honda (and by the way, pick your brand) and you're stuck in morning rush-hour traffic, the Honda is better. So you pick the right thing for the right purpose. That is really what I'm getting at, beyond the compute profile.
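
As a rough illustration of choosing a model per step rather than per application, here is a minimal Python sketch. The model names, the task labels, and the routing rules are all hypothetical, invented for this example; it only shows the shape of the idea, not how any particular product does it.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider exposes.
SMALL_MODEL = "example-slm"
LARGE_MODEL = "example-llm"

# Task types assumed, for illustration only, to be simple enough for a small model.
SIMPLE_TASKS = {"entity_extraction", "classification"}


@dataclass
class Step:
    task: str              # e.g. "entity_extraction" or "open_qa"
    language: str = "en"   # language of the input text


def pick_model(step: Step) -> str:
    """Choose a model for one workflow step."""
    # Non-English input goes to the large model, since a small model
    # may be weaker there.
    if step.language != "en":
        return LARGE_MODEL
    if step.task in SIMPLE_TASKS:
        return SMALL_MODEL
    return LARGE_MODEL


workflow = [
    Step("entity_extraction"),
    Step("open_qa"),
    Step("classification", language="hi"),
]
for step in workflow:
    print(step.task, step.language, "->", pick_model(step))
```

Because each step is just an API call, swapping the routing rule later does not require changing the rest of the application.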

Miko Pawlikowski

32:06

I completely agree. And I think these are separate use cases where I just want my okay Google and my Siri to not suck so much. I want it to understand what I mean half of the time and not have to say the thing three times. Not to wonder every time, what did I say differently now that it didn't catch the song that I wanted to play kind of thing. And that would already be like a big improvement for me, just interacting with that thing.

⁠¶ Real-time AI: The Quest for Seamless Interaction

32:33

When you were saying all those things, I was wondering whether there is a certain minimal level, a certain number of tokens per second, that will feel to most humans like real time, which is really what we're talking about here, and beyond that point it probably doesn't matter. If you can't read it faster than it's being produced, you're not going to have that feeling of slowness. And there are some interesting

33:00

things like Groq, the one with a Q at the end (I think they're suing Elon Musk over that), that has some dedicated hardware. I saw a demo where it was doing something ridiculous, like 800 tokens a second on Llama 3. Was it 70B or something? Is it not just a matter of waiting 5-10 years for the dedicated hardware to get cheap and plentiful enough, and then it won't be so much of an issue?
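
To put a rough number on "faster than you can read", here is a small back-of-the-envelope sketch in Python. The reading speed (250 words per minute) and the words-per-token ratio (0.75) are assumptions chosen for illustration, not figures from the conversation; only the 800 tokens per second comes from the demo mentioned above.

```python
# Assumed values for illustration; real readers and tokenizers vary.
WORDS_PER_MINUTE = 250.0   # assumed silent-reading speed
WORDS_PER_TOKEN = 0.75     # assumed average for English text


def reading_rate_tps(words_per_minute: float = WORDS_PER_MINUTE,
                     words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Tokens per second a reader consumes at the given reading speed."""
    return words_per_minute / 60.0 / words_per_token


def headroom(generation_tps: float) -> float:
    """How many times faster than reading speed the model generates."""
    return generation_tps / reading_rate_tps()


print(f"reading rate: {reading_rate_tps():.1f} tokens/s")
for tps in (5, 30, 800):
    print(f"{tps:>4} tokens/s is {headroom(tps):.1f}x reading speed")
```

Under these assumptions a reader consumes only a handful of tokens per second, so for text that a human reads as it streams, generation speed far beyond that threshold mostly stops being noticeable.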

Amit Bahree

33:29

That's the whole story of computing history, if you go look at it, right?

Miko Pawlikowski

33:32

Yeah.

Amit Bahree

33:34

As hardware improves, yes, but I think we also have to put it in the context of the scale of use. For example, if you have access to a data center with hundreds of today's best-of-breed GPUs, let's say, and there's nobody else in it, it won't feel slow to you. It'll be like, what's everyone complaining about? But if in the same data center you have 4,000 other users coming in concurrently, it's a different story.
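
The load effect can be sketched with a deliberately naive model in Python: aggregate throughput is shared evenly among concurrent users, capped by what a single stream can use. The capacity figures below are hypothetical, and real serving stacks (batching, queuing, scheduling) behave in more complicated ways.

```python
def per_user_tokens_per_second(aggregate_tps: float,
                               concurrent_users: int,
                               single_stream_cap: float) -> float:
    """Evenly share aggregate throughput, capped by what one stream can use."""
    if concurrent_users < 1:
        raise ValueError("need at least one user")
    return min(single_stream_cap, aggregate_tps / concurrent_users)


AGGREGATE_TPS = 40_000.0    # hypothetical cluster-wide tokens per second
SINGLE_STREAM_CAP = 100.0   # hypothetical ceiling for one request stream

# 4_001 = you plus the 4,000 other users from the example above.
for users in (1, 100, 4_001):
    rate = per_user_tokens_per_second(AGGREGATE_TPS, users, SINGLE_STREAM_CAP)
    print(f"{users:>5} concurrent users -> {rate:.1f} tokens/s each")
```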

33:58

I think I also have to remind people: when you are doing comparisons, or setting expectations, you have to think in terms of the load, the traffic, how much. Now, we as a cloud provider, that's a lot of our headache, with a lot of customers saying, why do you think I'm paying you? But I also go back to: yes, and the laws of physics don't change either. But overall, I think NVIDIA announced a whole bunch of new stuff at their conference quite recently, and network speeds are improving or have been.

34:26

So if you step back for a second, I think the history of computing has just been that hardware scales up and improves and helps the software. Personally speaking, I can't predict the future. The one thing is, and this is back to your hockey stick point, the scale is almost at a global level. OK, it's not every human on the planet using ChatGPT or LLMs in some fashion, but quite a big percentage of people are,

34:54

in some manner, on a daily basis. For some people, it is eight hours a day. My mom, maybe once every other day, once a week, or whatever it is she does.

⁠¶ The Evolution of User Interfaces: From Apps to Voice Commands

35:04

But the breadth of humans using it is much broader than it ever has been. So with that context, even as the underlying systems and hardware improve, I think the perception of whether it's actually improving may be slower than in the past, when it was still more niche, if that makes sense.

Miko Pawlikowski

35:29

It does. And I think to follow the train of thought that you started here, the potential is probably higher just because it's so much more intuitive. You just talk to it. When my mom needs to install an app, it's a whole thing. It takes a while. She needs to get used to it. She needs to get comfortable with it, needs to remember the password. There might be another PIN.

35:51

It's a whole thing. But once she gets some kind of interface that's built into her phone, or whatever, where she can just talk to it, that clears a lot of barriers. A lot of people are picturing this future where your phone is slowly turning into effectively the listening device from Star Trek, and it's just doing what you want it to do, and maybe integrates with all the apps. I ordered that Rabbit R1. I'm still waiting.

36:18

I don't know where the delivery is supposed to be, but that's one of the visions of the future right there. You just talk to it and the model does things on your behalf, goes to these dodgy apps and clicks things, and you don't have to worry about that. And you don't have a learning curve. And I think that's a vision of the future that excites a lot of people. And I suspect we might see something like that in the near future because I don't see any roadblocks for

Amit Bahree

36:44

No, I'd actually argue the other way: I see it already happening now.

⁠¶ Democratizing Technology: Real-World Examples of AI in Action

36:49

And I can give you two real examples. For example, I'm originally from India. And in India, as much as the country's made progress, there's still a decent percentage of the population who are not very literate. Either they haven't finished school, or they dropped out early, or they've actually not gone to school. Now, it may be a small percentage at a country level, but in a country of 1.4 billion, a small percentage is still a big number in absolute terms, a chunk of humanity.

37:21

And in that, we're seeing, for many people who are not comfortable reading or writing, that some of the cheaper devices they have (it's not an iPhone or an Android phone) have speech. So there's a big mic

37:36

in the middle of the phone. They can press that and talk to it in natural language, in their own language. They're asking questions and talking to it, and, as it happens, in some of these cases it's some of our speech AI that is understanding it and then responding back. So it's lowering the barrier and opening this up to a broader segment, which in the past was not possible.

37:59

So that's one example, because they don't need to know the language to go type it in or what have you. They can just talk to it how they normally would.

38:06

And then the second one was actually more of a ChatGPT example, which I think Microsoft also published, where it's plugging in different languages, again, in rural areas in India, for farmers. For those who don't know, India is not like the U.S. and others, where you have big farms with hundreds and thousands of hectares or acres. They're usually small farms. Usually it's the family which owns it.

38:31

And they don't really have the muscle, at an individual level, to go understand pricing and markets and what's happening when they want to go sell their grain or whatever they're growing. So in that sense, what we talked about as democratizing was how they're using ChatGPT to get, basically, real-time market information.

38:48

So they're empowered to go make a better decision, which until now was impossible, because you need a computer, you need a modem, or you need to be online, and those are the barriers. And either they don't know how to use it, or it's not in a language that they understand.

39:04

So these are actually happening today, in production, so to speak, live. And the way we want to think about it is as democratizing AI. When I go back to how you started asking me the question when we started talking, if I go back to my mom, or the example you used with your mom, of the barrier of a new app or a new interface: if we free them up or make it easier in many ways, those are the democratizing elements that are happening. It's not only about

39:32

how literate you are or not; it's easing barriers, basically. So of course it doesn't do everything. It doesn't mean all barriers are gone, but we see a lot of real examples, day-to-day life things that people are using it for, which is absolutely fascinating.

Miko Pawlikowski

39:48

There was this very popular demo of, I think it's called Bland AI, where they had a billboard with a phone number to call, and you can have thousands and thousands of parallel conversations with an AI to do things like booking and basically get a first-line, human-like experience, really. And the demos were amazing.

⁠¶ The Dark Side of AI: Ethical Dilemmas and Security Concerns

40:09

And there's, like, a million startups doing things around that at the moment. It also obviously has the dark side, right? People are worried: what does it mean? Can you go and sway an election now by just calling everybody in the US and telling them something that they want to hear, and personalize the message?

40:28

It is a brave new world, a weird world that we're entering here. You could always technically go and call everybody in the U.S., but it would take a while. Now, with those things, maybe you can do it convincingly in a shorter period of time, and maybe not that expensively. Does that scare you?

Amit Bahree

40:51

Yes and no. I think that's true with any aspect of humanity or technology. You can use it for good, you can not use it for good. And it's a choice you have to make. So I think that's sort of one. So in that dimension, it's not something new that we haven't been doing. I think what is new, or what is more dangerous, if that's the word I want. I'm not sure if that's the word I want, but I can't think of a better one. More concerning is how easy it is. And what things do you watch out for?

41:20

How do you know what's true or not? So I think there are of course dimensions to it where we as humans have to recalibrate ourselves on, do I trust it or not? For example, robocalling has been around for decades. The fact that I can cheaply call everyone is not the problem. What's new is that it may sound like Amit or Miko is calling, whereas in the past you knew it was not Amit or Miko calling. I think those are really the things to think about and worry about.

⁠¶ Navigating the New Landscape: Security and Ethical Considerations

41:44

The way I position it as well, from a Microsoft perspective (and I also have a whole chapter in the book on that), is that there are new emerging threats from a security perspective. So if you think of the traditional security aspects of your application or developer stack, what we're saying is, look, there are additional new security threats you have to go think about.

42:07

And it's easy to get wrapped up in all the negative. But step back and look at earlier paradigm shifts: as you went from client-server, two-tier applications (and I'm going to show my age now) to distributed applications and then to web applications, there was a lot of goodness, but it also opened up exposure to a different threat vector. The surface area was different. In some cases it was broader, in other cases it actually contracted.

42:32

And in that sense, this is no different. There are new emerging threats you have to think about and be cognizant of, and then also understand what the risk of each is. Sure, a threat could happen, but how often will it happen? And how do I mitigate that? You're not going to solve 100 percent of everything. But you have to then hone it back down into what your use case is, how you're thinking about it, and so on.

42:53

So you can either ignore it, which is not good, or put your head in the sand like it's all doom; neither of those is going to be helpful. So I think part of it is understanding that, yes, there is a new set of threats that are emerging. Be aware of those. How do you solve for those? How do you manage those? And then do that in the context of a use case, in the context of how you're using it.

Miko Pawlikowski

43:18

It's a little bit like passwords, isn't it? We rely on the fact that it's not practical for someone to go and brute force your password because it would take a thousand years. And if someone goes and figures out how to make a computer that goes around the limitations of physics and can do it a thousand times faster, all of a sudden a lot of passwords would be useless. And I think it's a little bit like that, right?

43:48

We got a technology that makes things possible now, where we were relying on them just not being practical from a time and cost perspective. And now we have to deal with that. The genie's out of the bottle, as they say, and the cat's out of the bag.
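
The password analogy comes down to simple arithmetic, sketched here in Python. The alphabet size, password length, and guess rates are hypothetical values picked for illustration; the point is only that a thousandfold speedup shrinks the attack time by the same factor.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def worst_case_seconds(alphabet_size: int, length: int,
                       guesses_per_second: float) -> float:
    """Time to try every password of the given length, in seconds."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second


def worst_case_years(alphabet_size: int, length: int,
                     guesses_per_second: float) -> float:
    """Same as worst_case_seconds, expressed in years."""
    return worst_case_seconds(alphabet_size, length, guesses_per_second) / SECONDS_PER_YEAR


# Hypothetical attacker speeds: a baseline and one a thousand times faster.
BASELINE_RATE = 1e10
FASTER_RATE = BASELINE_RATE * 1000

# 94 printable ASCII characters, 12-character password (assumed example).
slow = worst_case_years(94, 12, BASELINE_RATE)
fast = worst_case_years(94, 12, FASTER_RATE)
print(f"baseline attacker: {slow:,.0f} years")
print(f"1000x attacker:    {fast:,.0f} years")
```

What was impractical only because of time and cost becomes practical once the rate changes, which is the same shift being described for automated, personalized calls.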

Amit Bahree

44:04

That's a great analogy. I actually like that. I'm going to steal that in other places. But you're right, there was a time where we didn't need passwords. It wasn't a problem. And then there was a time where we needed passwords, but it was simple passwords. You could do hello1234 or password1 or what have you. And then it was a time where, okay, it needs to be a little more complex. And now you can find these, buy these, on the dark web, and hence you need more complex passwords.

44:29

My one PSA is: please use a password manager. Ten or 15 years ago, if we were chatting, the concept of a password manager would have been so alien. And now, as I'm sure you do, I do tech support for my family. Unpaid, of course. My answer to everything is: go use the password manager. Here's how you set it up. Here's why you shouldn't reuse passwords. Let the thing do the heavy lifting for you. But you save it, right? With your master password and whatnot.

44:56

I think it's, yeah, the same analogy in that sense. It goes back to your threat vectors changing. Society, things are changing, and part of it is adapting. Some is good. Some is not good.

⁠¶ Generative AI in Action: A Guide to Practical Applications

Miko Pawlikowski

45:07

Let's circle back to your book. Ultimately, that's how I learned about you existing. As I was reading it (for anybody who's interested, go and pick it up on manning.com), it's a very practical guide. It's called "Generative AI in Action" for a reason. There is little time spent on the underlying details.

45:32

There is obviously the intro that covers everything that you would expect in terms of what is generative AI, the architecture, high level, what it means, references, overview of LLMs, transformer, smaller language models, that kind of stuff. And then it turns into basically a guide to show you what's possible with it, show you how you can go and call some API and get magic text being generated.

46:02

It shows you how to generate pictures, shows you how to generate other things like music, video, I think briefly code, all that kind of stuff. So I'm picturing this really as a kind of guide that you get yourself when you want to get into this without wasting any time on things that are not necessary for your journey. It will get you from zero to one on that. Is that an accurate description? Am I doing a good marketing pitch here?

Amit Bahree

46:30

Mostly. Yeah.

Miko Pawlikowski

46:31

Mostly.

Amit Bahree

46:34

I'm not in marketing. Yes, I think that is an accurate description. I think the emphasis is on the "in Action" part. The premise of this is, you want to go build an app, and right now. I go back to my year and a half of conversations, from CEOs down, across the Fortune 500 or whatever, which is, from a work point of view, right,

46:55

a lot of these large enterprises. But this is not just about large enterprises. It's about, if you're a company, you have a set of products you want to improve or make new: how do I use this GenAI and ChatGPT and LLMs that everyone's heard about? And they don't know where to start or how to start. So that's really what I was trying to do, right? There are, broadly speaking, three parts to the book.

47:19

The first part is introductions, because you just know what you know; I can't just go dig into things without giving you some context and basis on what's possible and what's not possible. That's the first part you're touching on. What I stay away from is: it's not a science research book. I link to papers for people who are genuinely curious or want to go deeper.

47:43

So we leave those crumb trails, in a way, saying: if you want to dig more in your own time, here are the things you can go read up on, and that'll expose you to more dimensions, right? So it's not a science book or a techie book in that sense, because at least in an enterprise setting, most developers and CTOs and CIOs or CEOs want to see, how is it going to solve my business problem? How do I do it?

48:09

Some are interested in the science and the depth, but most just want to know at a high level how it works, deep enough but not in the guts, at least on the AI side of the science. So we leave the breadcrumbs and the trails pointing to papers where people can go deeper should they want to. But if you're a developer and you can use a set of APIs and SDKs, this is really for you, and the way we say it is, because these LLMs, at least, are exposed as an API,

48:36

you really don't need to know any of the AI mumbo jumbo; any developer can pick it up easily. So that's how I was trying to position it. Part one is getting you a sense of the world from a technical perspective, but not going super deep. And then part two and part three are where we start going deeper on, okay, how do I use this in my production, solving my business problem, what I'm trying to do.

Miko Pawlikowski

49:01

Making it very applicable. For example, at some point the book is talking about image generation, and there is a short description of generative adversarial networks, and it doesn't include Ian Goodfellow getting drunk and going to a fellow student's graduation and then arguing with them and then going home and implementing a proof-of-concept algorithm to prove the other people wrong, and the next day discovering it's actually working. It's giving you the applicable part:

49:34

this is used for scenarios where the data is complex and diverse, requiring realism; suitable for high-quality images, data augmentation, style transfer. So it's prescriptive in a way, I would say. You give people what they need to get cracking with it.

⁠¶ Exploring Image Generation Techniques and Their Applications

49:51

Speaking of which, let's talk a little bit about the images, because you do cover a few interesting things like VAEs, GANs, diffusion, vision transformers. To give people a sneak peek of what to expect, can you talk about why they're interesting and why they might be something that you should be paying attention to? What are the breakthroughs?

Amit Bahree

50:14

I think one aspect is that ChatGPT and the LLMs, which is just the language part, are taking the hype. And I think most people understand that there's a related but different set of tech on images, right?

50:26

And image understanding, image editing. On one hand, on whichever social thingy you go to, when Stable Diffusion came out there was a lot of creativity on the image generation, but in a social setting. The thing really is, how do you expand that into a corporate application setting, and what can you do? It's one thing to be fun and wonderful in a personal, social setting. But then how do I transfer that, and which area do I use it for in a work setting?

51:02

It doesn't even have to be work; each of these techniques has its own power. I think most people don't really care, and maybe nor should they, but in some cases where it would matter, it's good to know what the underlying tech is, so I know what to ignore versus not to ignore. Because, again, the hype wraps up a lot of this. If you come back to it, it's more about helping people ground themselves a little, because at the end of the day, the tech is still the tech, right?

51:28

What it is meant to do and how it is meant to do it doesn't fundamentally move. If you're trying to solve for one set of images, one kind of thing, diffusion models would be great for that set of categories. And now there are multiple diffusion models; you can go pick which one you want, versus a transformer model. So again, we don't go deep. I have a few diagrams and images to outline at a high level how these work, because there are papers on each topic; you can go read hundreds of them.

51:53

But the intention is just to know, look, there are different buckets and categories. Each has its own strengths. And whatever you're trying to solve for, just make sure you connect those dots. I guess the other analogy is, if you're writing a book, Word is easier than Notepad, kind of a thing, right?

Miko Pawlikowski

52:10

I was a little surprised to see a prompt engineering chapter, but I guess it makes perfect sense. You need a little bit of basics. What was your thinking, with that chapter? What was the goal you wanted to achieve with it?

⁠¶ The Art and Science of Prompt Engineering

Amit Bahree

52:26

In the context of LLMs, prompt engineering is pretty crucial. It is how you steer the model, fundamentally, in many ways. The beauty of it is that it's half art and half science. The frustration of it is that it's half art and half science. But fundamentally, at least with today's technology, prompt engineering is quite crucial.

52:50

And the way we also tell many of our customers is, look, you have to start thinking about prompts as your IP in many ways, and I'm not talking about simple prompts. In the book, I use simple prompts to make the point. So 'tell me a story about a panda' is not really IP in the context of a prompt. And then prompts are also closely tied to how a model understands them.

53:14

So again, outside of simple prompts, when you're using this concept of RAG, for example, as you start using a specific model or a family of closely related models, you start picking up nuances on how the model is interpreting things and working with things and so on. And then you're tweaking your prompts along with that, right? So it's cohesive together. And that intuition, as you learn, is also part of your IP and how you want to think about prompt engineering.

53:44

That also means there are no universal prompts. Again, outside of the simple ones; I'm not talking about the simple, straightforward prompts. So one should not just say, if I'm, let's say, using GPT-3, 4, whichever, that the same prompts, the complex ones, I can pick up and expect to work on, let's say, Llama or something else. They will work, but do they work at the same level, with the same evaluation, the same criteria?

54:07

Probably not, because they are quite tied into how the model behaves. This is very loose, right? It's not a scientific thing. But prompt engineering is quite crucial. Even though you are calling an API, how you're talking to the model is through those prompts. So I think it's worth spending time to understand what these are and how they work. There's a lot of hype around prompts as well. I would say don't believe all of it.

54:32

The one final point I want to make is that prompts are also one of the new threat vectors. I touch a little bit on prompt injection in the chapter you've seen, but we go a little deeper in one of the later chapters. Prompt injection, as an example, is one of the new threat vectors. It's not the only one. So again, it's about understanding that as well. But prompts, in today's world, are quite crucial.

54:55

At the end of the day, it's how we, in quotes, talk to the model.

Miko Pawlikowski

54:58

That makes sense. Prompt engineering might be getting a bit of a bad rep just because of how many people are walking around saying that they have the ultimate prompt, stuff like that. But at the end of the day, you do need to learn how to talk to these things. And it is one of the biggest frustrations. It's almost like you're talking to a cat sometimes: it can suddenly freak out and do something very weird at a moment's notice. And there is little you can do to prevent that.

Amit Bahree

55:28

Yeah. And so the way I call it, at least, is you have to think of prompts, when you're talking to the model, as a parent. So for those who have had children or who have toddlers right now, it is what we call parentology. Somebody said this to me in one of my meetings and I loved it and I stole it from them. If you're a toddler, your memory retention is lower, so often you have to keep repeating. It's the classic 'don't stick your finger in the wall socket' kind of thing.

55:58

Saying it one time doesn't help; you have to keep repeating. The way I want folks to think, generally speaking, is: your model's like a toddler, you have to keep repeating, keep thinking about it, right? And as silly as it may sound, it's basic stuff. For example, one of the side effects is what's called hallucinations, where the output is not grounded. So you will get responses back which are made up and not factual. That may be okay in one dimension,

56:22

if you are writing a creative story. It may not be okay in another dimension where, in a business setting, you're answering things based on some policy or information or what have you. So in the prompt it's simple things like 'do not make up any information, only answer from this'. You would think that would be obvious. So your intuition of a cat is not very far off.
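
A grounding instruction of that kind can be sketched as a small prompt builder in Python. The function name, the wording of the rules, and the `<policy>` delimiters are assumptions made for this example, and repeating the rules after the context is just one way to act on the "keep repeating" idea; none of it guarantees a model will comply.

```python
# Hypothetical wording; tune it for the model you actually use.
GROUNDING_RULES = (
    "Answer only from the policy text below. "
    "Do not make up any information. "
    "If the answer is not in the policy text, reply: I don't know."
)


def build_grounded_prompt(policy_text: str, question: str) -> str:
    """Wrap a question with grounding rules before and after the context."""
    parts = [
        GROUNDING_RULES,
        "",
        "Policy text:",
        "<policy>",
        policy_text.strip(),
        "</policy>",
        "",
        f"Question: {question.strip()}",
        "",
        # Repeat the rules after the context, in the "keep repeating" spirit.
        f"Reminder: {GROUNDING_RULES}",
    ]
    return "\n".join(parts)


prompt = build_grounded_prompt(
    "Employees accrue 1.5 vacation days per month.",
    "How many vacation days do I accrue per month?",
)
print(prompt)
```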

Miko Pawlikowski

56:48

That's an interesting comparison. Let's do one more. You talk about RAG in your book, and I think a lot of people have heard the term and know that it has something to do with getting fresher data. Can you give us an explain-it-to-a-five-year-old version of what that is and how it works?

Amit Bahree

57:10

I should open ChatGPT on the other screen and say, 'explain RAG for a 5 year old in summary'. RAG is retrieval-augmented generation: retrieve, augment, generate, right? The technique originally came from Meta, Facebook, as a research paper. But fundamentally, it is crucial when you are using large language models, specifically in the context of a company or a business or what have you.

57:36

Basically (and it is also a little clunky right now), the starting point is that the model you're using just knows what it knows, what it's been trained on, which is public data. That's one. And then, as with these things, there's a training cutoff, right? At some point, you say, okay, I'm done collecting data.

57:54

I need to go off for a few weeks or a few months or whatever it is and go train this thing and then spit out a model and go through a bunch more alignment and this and that, and then eventually have a model available. So online, you'll see a lot of people using RAG to get fresh information, which is post-training data. That is an absolutely valid use case. For many others, the other thing is my proprietary information.

58:20

So especially in a company setting, your proprietary internal information, corporate knowledge, the model doesn't know because it's never seen it. In fact, if it does know that, then fundamentally there's a different problem. Because it shouldn't know that. for your business workflow, you often need to bring in your internal proprietary knowledge, whether it's a CRM or a database or an ERP, or you're solving a ticket or what have you, depending on the use case.

58:49

The only way you can bring in that knowledge is through this technique of RAG: retrieve, augment, generate. Retrieve means I'm retrieving the information, which could be from my corporate enterprise systems, or from Google or Bing to get fresher information. I'm augmenting my prompt with it, which goes back to prompt engineering. And then based on that, I'm saying please generate, or whatever I'm trying to do.

59:14

Generation could be a summary, or entity extraction, or whatever I'm trying to do. But that's what RAG is doing. It also gets clunky, by the way, because it is the first generation. I do expect things to improve there.

59:27

You can talk about the complexities of RAG, but if I have to get proprietary in-house information, or if I have to get fresher information, the only way I can do that is through RAG, without retraining a whole model, which in theory is an option but practically, for I guess 99% of people, is not an option. I don't know if that was for a five year old, but
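
The retrieve, augment, generate steps described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the documents are invented, keyword overlap stands in for a real search index or vector store, and `fake_generate` is a hypothetical stub where a real model API call would go.

```python
import re
from typing import Callable, List

# Invented stand-ins for proprietary, in-house documents.
DOCUMENTS = [
    "Refund policy: customers may request a refund within 30 days of purchase.",
    "Support tickets are triaged within one business day.",
    "The office cafeteria is open from 8am to 3pm.",
]


def words(text: str) -> set:
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Retrieve: rank documents by how many query words they share."""
    query_words = words(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & words(doc)),
                    reverse=True)
    return ranked[:k]


def augment(query: str, passages: List[str]) -> str:
    """Augment: place the retrieved passages into the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def fake_generate(prompt: str) -> str:
    """Generate: hypothetical stub; a real model call would replace this."""
    return f"(a model response would go here; prompt was {len(prompt)} characters)"


def rag_answer(query: str, documents: List[str],
               generate: Callable[[str], str], k: int = 1) -> str:
    """Run retrieve, augment, generate in sequence."""
    return generate(augment(query, retrieve(query, documents, k)))


question = "How many days do customers have to request a refund?"
print(augment(question, retrieve(question, DOCUMENTS)))
print(rag_answer(question, DOCUMENTS, fake_generate))
```

The model itself is never retrained here; the in-house knowledge reaches it only through the prompt, which is why the augment step ties back to prompt engineering.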

Miko Pawlikowski

59:50

Yeah, that might have been a six and a half, maybe even seven, but I let, we'll let it

Amit Bahree

59:56

Thank you.

Miko Pawlikowski

59:57

time. Okay. So this is basically what you're going to see if you look at the early access version of the book. It tells me that there are six more chapters coming very soon, chapters 8-13, that cover things like more on RAG, tailoring models, application architecture for GenAI apps, evaluation, and ethical GenAI. I think at some point we're going to have to get you back to talk about the rest of the book.

01:00:31

But before I let you go, I wanted to ask you for a few predictions.

⁠¶ Future Predictions: Multimodality, SLMs, and System Improvements

01:00:37

From where you stand, where are we going to see the next evolutions and breakthroughs?

Amit Bahree

01:00:42

One is multimodality. Basically, a lot of people today, when they're using GenAI and the likes of ChatGPT, are primarily in one mode, i.e. language, text. But I do expect multimodality, where I'm starting to combine language, images, text, video, and what have you, together. Not just generation, but input. We already are seeing that, by the way. That's already here today, with GPT-4V, which is vision, being one example of that.

01:01:10

But more and more multimodality, because our real world is that as well, right? So that's one I see happening. I do see SLMs accelerating more, as we touched on. Again, they're not better, they're different. There's times you need one, and there's times you need the other, and there's times you need both. But I do see more and more on that front, because for many use cases, I need simple things. I don't need all the other power. So I do see that accelerating a lot.

01:01:37

And then the third dimension I see is the underlying systems engineering improving, be it cost-effectiveness, from how much hardware and how many GPUs I need to run it, to latency, to things like memory profile and so on and so forth. So I do see those three, and I guess I want to sneak in a fourth one, which is all of the responsible AI aspects, which is one of the later chapters. We touched on the likes of prompt engineering.

01:02:04

I know I talked a little bit about hallucinations, but there are also the new harmful things one can do. That's also a cat-and-mouse thing. I do see more research breakthroughs

Miko Pawlikowski

01:02:15

Do you expect we're still going to be doing transformers a year or two or three from now? Do you think it was big enough of a breakthrough that it's going to stay?

Amit Bahree

01:02:25

I honestly don't know. What I can tell you is it's what everybody's doing at the moment, which is not going away anytime soon. That's one side of it. Having said that, I think it's also pushing a lot of other areas around it where we can do things better. For example, we didn't touch on it, but each model has this concept of what we call a context window: how big my prompt can be, in reality how many tokens it can be, and how much it can send back.

01:02:52

So on one hand a lot of people get happy: hey, if I have a longer context window, it means I can stuff in more things, I can ask you more things, or I can generate more things. On one hand, that's good. People get happy about it. What I then have to come and remind them is that each increase in token length is a quadratic increase in compute. So it's four times extra costly in the sense of compute profile. So just having a longer context window isn't necessarily good.
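
To put numbers on the quadratic point: in standard self-attention every token is compared with every other token, so the attention-score work grows with the square of the sequence length. One extra token adds very little; it is doubling the window that roughly quadruples this part of the compute. A back-of-the-envelope sketch that counts only pairwise attention scores and ignores everything else in the model (the function names and window sizes are illustrative assumptions):

```python
def attention_pairs(seq_len: int) -> int:
    """Number of token-to-token scores in full self-attention: n * n."""
    return seq_len * seq_len


def relative_cost(old_len: int, new_len: int) -> float:
    """How many times more attention scores new_len needs than old_len."""
    return attention_pairs(new_len) / attention_pairs(old_len)


# Hypothetical window sizes: one extra token, a doubling, and an 8x jump.
for old_len, new_len in [(4096, 4097), (4096, 8192), (4096, 32768)]:
    print(f"{old_len} -> {new_len}: {relative_cost(old_len, new_len):.4f}x")
```

Doubling the window gives a factor of 4 and an eightfold increase gives a factor of 64, which is why a longer context window is not free.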

01:03:26

So there is research going on now to say, how can we do that? How can we, with derivatives of the transformer architecture, increase the token windows without having a quadratic increase in the compute profile? And that ties back to the attention mechanics of how the transformer architecture works. The way I would say it, it's the first one which has reached the scale. And now there's other research that's happening to make those profiles better.

01:03:51

Will there be another big one, which is better than this? I'm sure. In the sense of humanity, absolutely.

Miko Pawlikowski

01:03:58

Plus, like you alluded to, there is a lot of value to being the first thing that's good enough, right? When we look at how technology works, it's better to have something that's good enough today than to have the perfect or ideal solution much later. And typically there is enough momentum by the time the better thing comes that it might not be as attractive as one would think.

01:04:24

What was the paper from Google, I think, talking about basically a method of achieving infinite attention span? Or was that some other research that you're alluding to?

Amit Bahree

01:04:33

There's a few papers. So there's one, in fact, Microsoft has on, like, how can I do 2 million tokens. That's one example. There's another one, which is research going on called Ring Attention, which is different. I can't remember, I think it was Google? I can't recall off the top of my head. So there's multitudes of things going on in parallel, like active research, on how do we look at this differently. And then, that's just the context window.

01:04:59

There's other things. For example, when we touched on RAG, I said it's a clunky way of doing things. We didn't go deep in it, but it's a clunky way. So there's other things happening, like graph: can I do graph with RAG, and so on and so forth. It's not only in one dimension. I was using one of these as an example, but across multitudes of dimensions there is active research going on to improve those.

01:05:20

And as that starts formulating, 'cause look, research is one thing; getting something as a product that is deployable and running consistently is a whole separate sort of scale, with its own separate complexities. But across these multiple dimensions, as they come together, it'll just suddenly get improved. To your point, this is like version one, and it's a mad race across the board, from academia to commercial and whatnot. So it'll just be improving, is how I see it.

⁠¶ The Role of Open Models in AI's Evolution

Miko Pawlikowski

01:05:55

Do you see the open models eventually prevailing and taking over? There's a lot of talk. Obviously people are excited about Llama 3. I think a lot of people call it GPT-4 class, a comparable model that's effectively free to use. And obviously Microsoft is doing their own research and releasing Phi-3 in the open as well. Do you see these models eventually becoming the de facto standard?

Amit Bahree

01:06:26

They certainly have a place, for sure. I think there's no question about that. I don't know if they're the de facto standard or not. I think what the challenge would come down to is, at the end of the day, with the current state of technology, training a model is super expensive. There is no shortcut around that.

01:06:44

So even if you have an open source model, in the near term there's only a handful of companies who have the technical know-how, have the muscle, have the compute profile to be able to do that. So more and more, until, again, there's more fundamental breakthroughs to open that up, a lot of the open source, it's back to the ant analogy used with one of the diagrams we have from the paper in the book.

01:07:14

The part I was trying to show there also is that the roots are very few models that others are derived from. So I think while there's a lot happening, at the end of the day there'll be just a handful of people who are publishing and exposing those, that others are deriving from. So until that happens, as in fundamental breakthroughs from a cost point of view, where it becomes cheaper.

01:07:38

It doesn't all need hundreds of thousands of, whatever it is, GPUs, plus I don't know how many billions of tokens of data, to train them.

Miko Pawlikowski

01:07:47

Oh, don't worry. I

Amit Bahree

01:07:47

it won't be

Miko Pawlikowski

01:07:48

my crypto mining farm in my garage.

Amit Bahree

01:07:54

There you go. That is one way to do it. I think open source will still be constrained to a few source models, which they go derive from.

⁠¶ Reflecting on the Rapid Advancements in AI Technology

01:08:03

But the other way: if you just look at the last one year, 12 months, which is nothing in the sense of humanity and technology, just see how much progress and how much improvement the models have made across both dimensions, whether they're open source or closed source or what have you. It is fascinating. And the fact that there's literally new models every day is a good thing, but also not a good thing. So it has to stabilize to some extent. At some point it will.

01:08:28

But I think the open source community is absolutely critical. On the flip side, a lot of research breakthroughs are also coming from the research labs, where there's, at the end of the day, deeper pockets and muscle, in the sense of financial and compute and data as well. It's a fascinating world we are in, which is one of your opening statements, because at least for a geek and somebody in the industry, these are far and few moments that one gets. So it's absolutely fascinating.

Miko Pawlikowski

01:08:59

Yeah, we'll be sitting down with the grandchildren saying, ah, I remember in my day they released the first capable.

Amit Bahree

01:09:10

They're like, what? You use hundreds of GPUs and all this stuff? Why? I can just run it on my phone, or whatever the phone looks like. I don't know.

Miko Pawlikowski

01:09:18

Yeah, exactly. You were so wasteful back in the day. Really very clunky.

Amit Bahree

01:09:24

That's right.

Miko Pawlikowski

01:09:25

Well, we're going to have to wait a little bit until that materializes, but I completely agree. It's a very interesting time to be alive, and I'm certainly grateful that I get to experience that. Amit, it's been a pleasure to host you. Thank you so much for coming.

Amit Bahree

01:09:42

Thank you for having me.

Transcript source: Provided by creator in RSS feed
