HockeyStick #3 - Generative AI with Mark Liu

Apr 15, 2024 · 1 hr 17 min · Season 1 · Ep. 3

Episode description

Exploring the Frontiers of Generative AI with Mark Liu

In this episode of HockeyStick, host Miko Pawlikowski interviews Mark Liu, a finance professor and seasoned coder, to dive into the world of Generative AI. Liu, author of 'Learn Generative AI with PyTorch', shares his journey from finance to AI, emphasizing the importance of coding in modern finance and his transition to Python for teaching analytics. He recounts the creation of his books, starting with Python applications in finance, moving to machine learning and generative AI, and finally to PyTorch for efficient model training. The conversation explores the basics of generative AI, breakthroughs like GANs, transformers, and diffusion models, and Liu's predictions for AI's impact on future jobs and technology development. They discuss the significance of generative AI in various industries, ethical considerations, and the potential directions for Liu's future work, highlighting the critical role of hands-on learning in understanding and leveraging AI technologies.

00:00 Welcome to HockeyStick: The Generative AI Revolution

00:25 Meet Mark Liu: From Finance to AI Pioneer

02:56 The Journey of Learning and Teaching Python

08:00 Exploring Generative AI and PyTorch

10:39 The Magic of Generative AI: From Text to Lifelike Images

14:50 The Impact of AI on Industries and Job Security

21:18 Understanding the Breakthroughs Behind Generative AI

34:16 The Role of Hardware and Data in AI's Rapid Advancement

37:15 Exploring the Versatility of GPUs

38:17 The Future of AI: Predictions and Possibilities

39:49 Navigating the Hype Cycle of AI

42:09 Ethical Considerations and the Future of AI Regulation

45:39 The Impact of AI on Politics and Elections

46:58 The OpenAI Controversy: From Open Source to For-Profit

51:02 PyTorch vs. TensorFlow: Predicting the Future

54:21 A Deep Dive into Generative AI and PyTorch

01:14:22 Future Projects and Closing Thoughts

Transcript

Welcome to HockeyStick: The Generative AI Revolution

I'm Miko Pawlikowski and this is HockeyStick. Generative AI is on everyone's mind. From essays to photorealistic pictures to high quality videos it has changed the way we think about creativity and intelligence forever. If the AI won't steal your job, but somebody using AI will, then the best defense is to learn how this technology works ASAP.

Meet Mark Liu: From Finance to AI Pioneer

Today, I'm bringing you Mark Liu, the author of Learn Generative AI with PyTorch, a tenured finance professor and the founding director of the Master of Science in Finance program at the University of Kentucky and a veteran coder with over 20 years of experience.

In this conversation, we'll talk about learning through doing, how everybody can build generative AI models, the various breakthroughs that allowed for the current AI explosion to take place, and make some wild predictions about the future. Welcome to this episode and please enjoy.

How are you doing today?

Pretty good. Thank you, Miko. Glad to be here.

Yeah, I'm very excited, not only because I'm hoping to learn so many interesting things from your book, but also because I'm very curious: how does somebody who's a founding director of a Master of Science in Finance program and a tenured professor in finance decide to go into AI? Tell us a little bit about your story.

It goes back to, like, five years ago. In 2017, our department wanted to launch a Master of Science in Finance program. At that point, I'd been tenured for about five years.

I was always very adventurous, trying to do new things. I was appointed the founding director to start an academic graduate program from scratch, and I was very much into it. It was a lot of work, but I thoroughly enjoyed it. Our program launched in fall of 2017, and it's a one-year program. At the end of 2017, we started to place our students. The very first year we had 30 students in the program, which is a great number.

I talked to many employers, many companies, trying to place our MS Finance students, and I heard the same thing again and again. They told me that they want somebody who not only knows finance, but also knows coding, programming, and analytics, and the number one programming language in finance is Python. I'd been programming for many years, but mainly with statistical software to run regressions for finance research.

The Journey of Learning and Teaching Python

And then I had to learn Python from scratch in order to teach my students. And it turns out that Python is a very user-friendly programming language, so even if you never programmed before, you can guess what a block of code is trying to accomplish.

I started to run Python workshops for MS Finance students, and gradually I accumulated a lot of teaching notes. I also had to convince my students to use Python, because some of them said, "I can do everything in Excel, why should I learn Python?" I told them that Excel is not exactly a programming language, and you do need a programming language in order to automate things, make work more convenient, build bigger programs, that kind of stuff.

So what I did was I started to create fun projects in finance, using things like speech recognition and text to speech. One example: I added those features to a finance calculator, so that you can actually speak to a computer and ask it to do a finance calculation. You can ask the program in a human voice, "What is the present value of $1000 in five years?" The program will then do the calculation and tell you the answer in a human voice.
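The calculation behind that spoken question is the standard present-value formula, PV = FV / (1 + r)^n. A minimal sketch follows; the function name `present_value` is hypothetical, and the 5% discount rate is an assumption, since the conversation does not state one.

```python
def present_value(future_value: float, annual_rate: float, years: int) -> float:
    """Discount a single future cash flow back to today: PV = FV / (1 + r)^n."""
    return future_value / (1 + annual_rate) ** years


# The 5% discount rate is an assumed value; the episode does not give one.
pv = present_value(1000, 0.05, 5)
print(f"Present value: ${pv:.2f}")
```

With these assumed inputs the result is roughly $783.53. In the project described, the speech-recognition and text-to-speech layers would wrap a function like this one.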

That caught the students' attention, so I started to do those kinds of applications. After a year or so, I had plenty of projects, and some students told me, "You should write a book about it." So I sent the manuscript to No Starch Press to publish the book.

The moment my colleagues, my students, a lot of my friends, even my family members heard that I was writing a programming book in Python about speech recognition and text to speech, their first reaction was, "I thought you were a finance professor." That question came up again and again. So I gave them a famous quote by a chief risk officer from Deutsche Bank: "Banks are essentially technology firms now."

There is a lot of truth in that, because in order to be in the field of finance, you need to know a lot of technology: programming, analytics, and so forth. So that was my first book, in 2020; it was finally published in 2021. I think I signed the contract with them in 2019. After that, I started to teach a course in the MS Finance program called Python Predictive Analytics, which uses Python to build machine learning models for business analytics.

I started to teach students a lot of machine learning models, including deep neural networks, and again I accumulated a lot of notes. Then I came across a video from DeepMind showing how you can train a computer program to play Atari games like Breakout at a superhuman level.

What happened was, not only did the computer program learn to play the game, it actually figured out a way to score very efficiently, a way human beings didn't know before: you dig a tunnel at the side of the wall, and then you send the ball to the back of the wall to score very efficiently. When I saw that video, I was completely amazed. I told myself, "I gotta figure out how this works."

I spent several months experimenting with different kinds of programs, trying to figure out how it works. Eventually I figured it out, and that became my second book, Machine Learning, Animated, published with CRC Press last year.

Exploring Generative AI and PyTorch

And then, recently, once ChatGPT was out, generative AI became very popular. I was very curious. I was trying to figure out how exactly a large language model works, and how a computer program can understand human language. I spent a lot of time trying to figure it out. Before that, I was using TensorFlow. It worked pretty well for me with Atari games and so on. But apparently it's not great in terms of GPU training. You can do GPU training, but there is an overhead.

You have to program everything on the CPU, send it to the GPU, do the calculation, and then send it back. The overhead is just too much, so it ended up not very fast. Then I learned another AI framework called PyTorch, where you can explicitly send a tensor to the GPU to do the calculation. It's a little more complicated than TensorFlow because you do have to send something to the GPU and then get it back.

So in terms of coding, you have to do slightly more work, but in terms of performance, it's amazing. I get to train models 7 to 10 times faster compared to CPU training. All those large language models have billions or hundreds of billions of parameters, right? So the speed is crucial. Right now, I'm training models with millions of parameters, which is fine. The even larger language models are in my third book, which is with Manning Publications.

In this book, I'm doing generative AI with PyTorch. The reason I switched to PyTorch is the dynamic computational graph and the GPU training. I can train most models in a matter of minutes; sometimes, for larger ones, maybe a couple of hours. That's it. I can see the model in action and then I can tune the model. So that's the third book.
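The "send it to the GPU, then get it back" pattern described above looks roughly like this in PyTorch. This is a minimal sketch, not code from the book: the toy `Linear` model and the tensor shapes are made up for illustration, and the code falls back to the CPU when no CUDA device is present.

```python
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default and moved explicitly with .to(device).
x = torch.randn(64, 128).to(device)
model = torch.nn.Linear(128, 10).to(device)  # hypothetical toy model

# The forward pass runs on whichever device holds both the model and the input.
y = model(x)

# Bring the result back to the CPU, for example to plot it or convert it to NumPy.
y_cpu = y.detach().cpu()
print(y_cpu.shape, y_cpu.device)
```

The extra `.to(device)` and `.cpu()` calls are the "slightly more work" mentioned in the conversation; in exchange, it is always explicit where each tensor lives.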

The Magic of Generative AI: From Text to Lifelike Images

So let me conclude by quickly summarizing what I'm doing in the third book. The name, I think you mentioned at the beginning: Learn Generative AI with PyTorch. Readers learn to create generative AI models from scratch to create different kinds of content, like images, shapes, numbers, text, music, sound, and so forth, all with PyTorch and deep learning models. In particular, readers learn how to create a ChatGPT-style transformer from scratch, and I teach readers how to create a GPT-2 XL with 1.5 billion parameters.

Of course, with 1.5 billion parameters, it's very hard to train, right? It's very slow, number one. Number two, GPT-2 was trained with huge amounts of data, and regular readers don't have access to this training data, right?

But I also teach readers how to extract the pretrained weights from OpenAI, load those weights into the GPT-2 model they created from scratch, and start to generate text.
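To see where the "1.5 billion parameters" figure comes from, the sketch below counts the weights and biases of a GPT-2-style decoder. The hyperparameters (48 layers, width 1600, vocabulary of 50,257 tokens, context length 1024) are taken from OpenAI's published GPT-2 XL configuration rather than from this conversation, and the function name `gpt2_param_count` is hypothetical. It assumes the output head shares its weights with the token embedding, as GPT-2 does.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Count weights and biases in a GPT-2-style decoder-only transformer."""
    token_emb = vocab_size * d_model  # token embedding (shared with the output head)
    pos_emb = n_ctx * d_model         # learned positional embedding
    # Attention: fused query/key/value projection plus the output projection.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back to d_model.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)   # two LayerNorms per block (scale and shift)
    per_block = attn + mlp + layer_norms
    final_norm = 2 * d_model          # LayerNorm after the last block
    return token_emb + pos_emb + n_layer * per_block + final_norm


total = gpt2_param_count(n_layer=48, d_model=1600, vocab_size=50257, n_ctx=1024)
print(f"{total:,} parameters")
```

Almost all of the total sits in the 48 transformer blocks, which is why width and depth, rather than vocabulary size, dominate training cost.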

The text you generate is very coherent, without grammar errors. It's amazing. Of course it's not as powerful as ChatGPT or GPT-4, but a normal person without access to supercomputing facilities, without access to large amounts of training data, can create a ChatGPT-style deep neural network from scratch and use it to generate text and lifelike music. It's amazing. And that's the text part. On the image part, you can create, like, a color image.

You can also convert a horse to a zebra. You can convert blonde hair to black hair in images. You can add or remove glasses in images, and so forth. So the whole experience is amazing. It worked better than anticipated. The whole experience reminded me of a famous quote: "technology advanced enough is indistinguishable from magic". The whole thing is really magic. That's my long answer to your question.

Thank you for that.

Just for anybody who's not familiar with Manning, the book is currently available in what's called MEAP, the Manning Early Access Program: you can read the chapters as they are produced by Mark. At the moment there are five chapters available, but I'm being told that 11 will be coming very soon.

The estimated time for the whole book to be available is May 2024, so for anybody who's eager and who might be thinking that the book is not finished yet, you can actually start reading it right now. Speaking of the magic and the building from scratch, I think what I liked the most about your book, and what initially attracted me to actually go and read it, is that 'build from scratch' thing.

And I love that you used Richard Feynman's philosophy, the quote, "What I cannot create, I do not understand". I think that's a very good motto to live by. It's absolutely great that you take us on this journey to build things up, even though I've only read the five chapters so far. All of a sudden with ChatGPT, everybody started talking about this and this explosion.

The Impact of AI on Industries and Job Security

What were some other moments, other than ChatGPT, where you realized, 'Oh man, this is going to blow up. This is going to be massive with generative AI'? I believe you mentioned the Writers Guild of America versus AI story. Can we talk about that for a minute?

Before I answer that question, I encourage you to read my chapter one for free, even if you don't buy my book. Manning has a great feature.

If you go to manning.com and look for my book, Learn Generative AI with PyTorch, you can find it. I have a fairly long chapter one summarizing the state of the art in generative AI and also what I've been doing in the book. Now, what Miko talked about, the Writers Guild of America: a few months ago, they negotiated with big firms about the threat of AI.

As a result, there is a contract to limit how much AI you can use in writing, in production, in order to protect the jobs of the writers. This is just one example of the disruptive power of AI in many different industries. Writers are just one example, and it threatens many other industries. Another example is Chegg, which is an online educational platform.

College students go there to get tutoring services and so forth, and with ChatGPT, their business model is actually threatened, right? I think in the month after the release of ChatGPT, their stock price plunged by almost 40%. That's how serious the competition is. Those are just a couple of examples. The potential of generative AI is huge, but at the same time, if you don't catch up with the trend, there is a risk that your job might be replaced by AI.

There is an interesting quote that says, "AI will not take your job. Somebody using AI will." I think there is a lot of truth in that. So in order to avoid being replaced by AI, I think the best strategy is to get in the game: to learn about generative AI, to protect yourself in terms of future careers. That's a big motivation behind my books. The main motivation, of course, is intellectual curiosity. I'm by nature a very curious person.

So when I saw that ChatGPT works like magic, I really wanted to get to the bottom of it and figure out how it works. That's the main reason. But at the same time, I'm trying to teach my students programming skills, machine learning skills, AI skills, and generative AI skills in order to prepare them for the job market, so that in the future their skill sets will not be outdated. That's my second motivation for writing the books.

Do you buy this comparison that AI is like personal computers? A lot of people were worried about how personal computers were going to just remove jobs, but what ended up happening was that some small portion of jobs was eliminated, while most jobs were modified and became about operating computers. Do you think that's the most apt comparison for what we're likely to experience with AI in the coming years?

The future is hard to predict, but personally, I think most likely that's what's going to happen in the near future. With generative AI, you can actually use it to increase your productivity, to have more job opportunities. On the other hand, if you completely stay away from it, your skill set might become outdated. But at the same time, I think technology will make all this AI stuff more accessible to most people, right?

You don't necessarily have to be a programmer. One example is Midjourney, right? You can just go to a browser and use Midjourney or DALL-E 2, DALL-E 3, or whatever to create very fancy images. You can use a text prompt to create an image of what you meant; you don't have to draw it yourself. In that sense, I'm optimistic. I think for most people, generative AI will be a very valuable tool to increase their productivity, as long as you keep up with the technology.

I'm glad you mentioned Midjourney, because I think for me personally, that was where I realized, 'okay, this is the hockeystick moment', because I remember the little tiny blurry pictures from the GAN paper. And then all of a sudden I saw some pictures that were generated by Midjourney, and I went and tried it myself, and it was more or less able to produce almost everything I threw at it, other than some particular types of dinosaurs that it just didn't recognize. That was the one thing where I knew, 'okay, they didn't train it on that kind of dinosaur'.

But that was definitely one of those moments where I realized, wow. And the other one: I live in London, so one way or another you end up using the Tube a lot, and usually you're annoyed at people who play music on public transport. Then at some point I realized that I was getting annoyed at people talking about generative AI on public transport and making noise.

And that's when you realize, 'okay, so this has now gone mainstream and everybody's talking about it'.

Understanding the Breakthroughs Behind Generative AI

But let's talk a little bit about the actual underlying breakthroughs that brought us to where we are. In particular, I'm thinking about GANs, the generative adversarial networks, and transformers and diffusion. Where should we start? What's the first important breakthrough that everybody should know about?

I think all the generative AI models in my book are deep neural networks. Machine learning is a very wide field.

There are many traditional machine learning models: random forests, linear regressions, this and that. But about 20 years ago, deep neural networks became very powerful. One great thing about neural networks is that you can scale them, and a deep neural network can approximate any relationship, even if we human beings don't know what the exact relationship is, as long as you create a large enough model to capture it. So that's the foundation, 20 years ago.

Then, over the past 20 years or so, there have been many breakthroughs in the deep learning field. Let's talk about ChatGPT. ChatGPT is a huge deep neural network trained on huge amounts of data. Before that, the state-of-the-art natural language processing models were recurrent neural networks. How they work is that the model progresses along a timeline. Let's say you have a sentence like "this is a sentence", right? So you have four words in the sentence.

The model uses the first word, "this", to predict the second word, "is", then it uses the first two words to predict the third word, and so on and so forth. It worked to some degree, but it's very slow because you have to predict one word at a time. Then, in 2017, there was a huge breakthrough: a paper called "Attention Is All You Need" by a group of Google researchers. They used a different mechanism to capture the relationships between different words in a sentence.

It's called the attention mechanism, and it's much more effective. On top of that, it's not sequential, which means one word can pay attention to all other words at the same time. This allows for parallel training, and this has huge implications. Number one, it works better in terms of capturing long-term relationships between different words in a sentence, so that the model can understand the meaning of a long sentence or a long text.

Number two, because of the non-sequential nature of the attention mechanism, you can use parallel training: you can train the same model on many different devices. This makes training much faster, and it also allows you to train the model on more data. That's why ChatGPT became so powerful: you can train these models much faster, and you can train them on more data.
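The idea that every word attends to every other word at once can be sketched with scaled dot-product attention in plain Python. This is a simplified illustration under stated assumptions: the 2-dimensional embeddings are made up, and the embeddings are used directly as queries, keys, and values, whereas a real transformer first passes them through learned projection matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence.

    Each output row depends only on the inputs, never on earlier output
    rows, so all rows could be computed in parallel.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up 2-dimensional embeddings for the four words "this is a sentence".
words = ["this", "is", "a", "sentence"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how much one word attends to each of the four words. Because no row waits on another, the loop over queries is exactly the part that GPUs evaluate as one matrix multiplication, in contrast to the step-by-step recurrence of an RNN.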

On top of that, the mechanism works much better than recurrent neural networks, because it can capture really long-term relationships in a sequence, and a text is a sequence, right? That propelled OpenAI to build all these models, including ChatGPT. Now let's go to the recent development, the text-to-image transformers. This is a new innovation in transformer models called multimodal models.

The original transformer model from "Attention Is All You Need", which powers ChatGPT, only uses text, right? The input is a sequence of text, and the output is also a sequence of text. But in multimodal models, the input and output can be in different formats. In DALL-E 2 and DALL-E 3, the input is text and the output is an image. You can have different inputs and outputs: you can have audio, you can have video. Sora produces videos, that kind of stuff.

But let's talk about the underlying mechanism behind multimodal models like DALL-E 2 and DALL-E 3. It has something to do with diffusion models. I think you mentioned that at first the generated image is very grainy, right? Diffusion models add noise to an image gradually. Let's say there are 1000 time steps.

At each time step, you add a little bit of noise to the image, so gradually you have 1000 different images, each one progressively noisier, and at the end the image becomes completely noise. Then you give those images to a machine learning model and train the model to remove the noise progressively, step by step. That's how DALL-E and all those text-to-image models work.
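The noising half of that process can be sketched in a few lines of plain Python. The conversation only says that a little noise is added at each of roughly 1000 steps; the specific update rule below (scale the image by sqrt(1 - beta) and add Gaussian noise scaled by sqrt(beta), with beta rising linearly) is an assumed, commonly used choice, and the function names and the toy 256-pixel "image" are hypothetical.

```python
import math
import random


def forward_diffusion(pixels, num_steps=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Add a little Gaussian noise at every step; return the image after each step."""
    rng = random.Random(seed)
    current = list(pixels)
    history = [current]
    for t in range(num_steps):
        # Linear noise schedule (an assumed choice): beta rises from beta_start to beta_end.
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        current = [
            math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for x in current
        ]
        history.append(current)
    return history


# A made-up "image" of 256 pixels scaled to the range [-1, 1].
image = [1.0 if i % 2 == 0 else -1.0 for i in range(256)]
steps = forward_diffusion(image)


def similarity_to_original(noisy):
    """Rough signal measure: the average of original * noisy pixel values."""
    return sum(a * b for a, b in zip(image, noisy)) / len(image)


for t in (0, 250, 500, 1000):
    print(t, round(similarity_to_original(steps[t]), 3))
```

The similarity starts at 1.0 and decays toward zero as the image dissolves into noise. Training then runs in the opposite direction: the model sees a noisy image and its time step and learns to predict the noise that was added, so that it can be removed one step at a time.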

The first step is that you use a text prompt to generate a very grainy image, and after that you use a model which is very much like a diffusion model. You progressively refine the image so that you turn a very grainy image into a high-resolution image. That's why, when you enter a short prompt, DALL-E 2 can give you a high-resolution image capturing what you are trying to produce in the text prompt. That's actually chapter 14 of my book.

I'm going to talk about how you can add a little bit of noise to the image, one step at a time, and then use those images to train the model to remove the noise step by step, very much like DALL-E 2 making the image clearer and clearer, step by step.

Generative adversarial networks were an interesting development from Ian Goodfellow. How does that fit into the rest of what you just described?

Generative adversarial networks are great at generating different forms of content. A lot of times when readers learn something, if you give them the end product, it's too complicated, right? So they may get frustrated and just give up.

As an author, my job is to make sure that readers stay engaged throughout the book, never get tired, never get frustrated, and gradually learn, finally, to build state-of-the-art generative AI models like a ChatGPT-style transformer to generate text and audio, right? So what is the idea behind GANs? You have two networks. One is a generator network.

The other one is a discriminator network. The job of the generator is to generate a piece of work similar to that from the training dataset. Let's use grayscale images as an example, right? You have a training dataset of grayscale images of handwritten digits, 0 to 9. Those are the real images. Then you ask the generator to generate something similar, so that it can pass as real in front of the discriminator.

Before you train the model, the generator is terrible. Whatever the generator generates is complete gibberish; it's like snow on a screen, that kind of stuff. But this is where training comes in. You have a training loop, and in each iteration you ask the generator to generate a bunch of fake images.

At the same time, you also have a bunch of real images from the training set. You give all of those to the discriminator and ask it to determine whether each image is real or fake. The generator's job is to create an image that the discriminator would think is real. That's the generator's objective. So you have a loss function, and then you train the model.

You gradually fine-tune the model parameters so that in the next iteration, whatever image the generator generates will have a higher probability of passing as real. You do this again and again, for thousands of iterations. If you do that long enough, eventually the generator will be able to create images that are practically indistinguishable from the images in the training set.

So that's how a GAN works: you have a zero-sum game, two networks competing with each other, trying to outsmart each other, and eventually the generator gets better and better. That's the idea behind GANs. It's a revolutionary idea. In 2014, 2015, Ian Goodfellow and his coauthors proposed the model. A great thing about the model is that it can generate different content: numbers, images, shapes, even music, and so on and so forth.
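The two competing objectives can be written down as binary cross-entropy losses on the discriminator's outputs. The sketch below is plain Python with made-up discriminator scores, not a trainable GAN, and the function names are hypothetical. It uses the "non-saturating" generator loss (reward fakes that are scored as real), which is a common practical choice and an assumption here rather than something stated in the episode.

```python
import math


def bce(prediction: float, target: float) -> float:
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real images scored 1 and fakes scored 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """The generator wants its fakes scored 1, that is, to pass as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Made-up discriminator outputs: the probability that each image is real.
real_scores = [0.90, 0.95, 0.85]
early_fakes = [0.05, 0.10, 0.02]  # untrained generator: fakes are easy to spot
later_fakes = [0.45, 0.55, 0.50]  # trained generator: the discriminator is unsure

print("generator loss, early:", round(generator_loss(early_fakes), 3))
print("generator loss, later:", round(generator_loss(later_fakes), 3))
print("discriminator loss, early:", round(discriminator_loss(real_scores, early_fakes), 3))
print("discriminator loss, later:", round(discriminator_loss(real_scores, later_fakes), 3))
```

As the fakes improve, the generator's loss falls while the discriminator's loss rises, which is the tug-of-war described above. In a real training loop each network takes a gradient step on its own loss in every iteration.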

I love this idea because on top of that, you've got this built-in target point, right? When your discriminator can no longer discriminate between real images and what you're generating, you're finished. It's not arbitrary. And the other reason why I love it is that it's got this anecdote attached to it: legend has it, it was written one evening when Ian was celebrating in a pub. I think someone was graduating, some fellow students.

They were discussing a problem where they wanted to generate some pictures, and he came up with this idea: 'oh, what you're suggesting is too complicated, you should pit two networks against each other'. And they laughed. He went home and, still slightly drunk, wrote a proof of concept of that. And it turned out that it actually worked.

I think in one of the interviews later, he said that if he hadn't been drunk, he probably wouldn't have done it, because it sounded like a silly idea.

Okay. Yeah, that's right.

Yeah, how random some of those things are, how weird and unpredictable. One of the things I wanted to ask you about is what made all of those recent breakthroughs possible. What was missing? Because we've had neural networks since, what, the 80s or something like that.

all of a sudden, it looks like in the last few years, or maybe last decade or so, it was just like one breakthrough after another breakthrough just dropping. And if you try to keep up with currently written papers on AI, there's just so many of them. And it looks like every other day, there's something super interesting that's been developed and it's literally hard to keep up just with other people's ideas. What do you think enabled this kind of explosion in the recent years?

The Role of Hardware and Data in AI's Rapid Advancement

Actually, neural networks were proposed even earlier than the 1980s. I think in the 1960s, researchers proposed artificial neural networks, basically modeled after the human brain. The idea was a great one, but at that point we didn't have the hardware to support it. Then, starting in the 1990s and early 2000s, the hardware became much more powerful, number one.

Number two, there was more research, more breakthroughs in the research field of artificial neural networks. One example is LeCun's convolutional neural networks. Most neural networks are fully connected, dense neural networks, which means a neuron in the previous layer is connected to all the neurons in the next layer. It works great, except that once your model becomes larger, the number of parameters grows very quickly, and then it's very hard to train, right? So that's a problem.

With convolutional neural networks, you localize the weights, okay? You have a filter, and the weights in the filter stay fixed as you move the filter across an image, which greatly reduces the number of parameters. It makes computer vision much more efficient. Because of that, in the early 2000s there were a lot of breakthroughs in computer vision with convolutional neural networks, and I think that's a huge breakthrough. And then after that, you also have GPU training.
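A quick count makes the saving from shared filter weights concrete. The layer sizes below (a 28x28 grayscale image, 32 output channels, 3x3 filters) are assumed for illustration, and the helper names are hypothetical.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Fully connected layer: one weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Convolutional layer: each filter's weights are reused at every image position."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# Assumed example: a 28x28 grayscale image mapped to a 28x28 output with 32 channels.
height, width, out_channels = 28, 28, 32
dense = dense_params(height * width, height * width * out_channels)
conv = conv2d_params(in_channels=1, out_channels=out_channels, kernel_size=3)

print(f"dense layer: {dense:,} parameters")
print(f"conv layer:  {conv:,} parameters")
```

For the same input and output shapes, the dense layer needs about 19.7 million parameters while the convolutional layer needs 320, because the filter count does not depend on the image size.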

GPU training became very popular in the past maybe 10 years or so, and it is a huge game changer, because as deep neural networks became larger and larger, it's very hard to train them without extra help, right? A CPU is a general-purpose processor; you have to do many things on it. But a GPU is specialized, so you can do machine learning jobs much faster. And of course, we also have more and more training data available, and that is also necessary for large language models to work. It takes time, but I think in the past 20 years or so, we suddenly had everything come together to make it work.

So basically, we've got gamers to thank for the breakthroughs in AI, because of the graphics cards, the GPUs that they demanded, right?

Exploring the Versatility of GPUs

You have a very good point. I think the GPU was originally designed for gaming purposes, right? And now suddenly it has a completely different purpose. I have several GPUs at home. They're not very powerful, but I think they're powerful enough for me to experiment with different models. Each costs maybe several hundred dollars to a thousand dollars. I have three of them. Two of them are from my son. My son was playing video games, and now he doesn't use those computers anymore.

So he just gave them to me, and I simply took the GPUs out and use them for my own work. But the cost is not that much.

The cost is not that much unless you go for the top-of-the-line 80 gig ones, which are very hard to come by and also quite expensive. So thank you, gamers. Thank you for enabling the AI revolution in many ways. It goes back to what I was saying about how random some of these things seem to be.

The Future of AI: Predictions and Possibilities

So where do you think we're heading? Like you said, the future is notoriously difficult to predict, obviously. But if you were still going to venture a guess that will probably prove completely wrong a few years down the line, where do you think we're heading with all of this?

If I had to venture a guess, large language models will become even more powerful in the near future, not only in terms of generating cohesive text, but also generating images and videos. Multimodal models will also become very popular: you can generate not only images and text, but also audio, video, sound, and so forth. Other than that, I think it really depends on which breakthrough comes through in the near future.

You never know if one day there is suddenly a huge breakthrough that completely changes the landscape of AI, just like what ChatGPT did a couple of years ago, right? The future is very exciting, but at the same time, like you said, it's very hard to predict. I think right now is a very fortunate time, a very exciting time for tech enthusiasts, for anybody who is passionate about AI and technology.

So two follow-up questions then.

Navigating the Hype Cycle of AI

One: like anything else, there are these fashion waves that come and go, and AI is now the latest, hottest thing. So all the VCs, everybody's throwing money at it. But at some point people will probably move on to the next thing, just like they did with crypto and smartphones and the internet and whatever else before, right?

So I'm wondering, where do you think we are in that hype cycle, and what's going to happen when, all of a sudden, slapping "AI-first" on your startup no longer makes sure that you get funding? So that's follow-up question number one.

And then the second question is: if you were to plot a graph of how you expect large language models to continue developing, I think we can all agree that there was some kind of very exponential growth, where somebody figured out, with ChatGPT or one of those massive models, that if you throw enough data at it and you massage it for long enough, you can create this impression of, 'oh, this is magic, how on earth is that even happening?' But then, at some point, it has to plateau, right?

It's not possible for it to keep going at that kind of speed, into the sky. Feeling. Again, it's hard to predict the sense. Of course, all the usual disclaimers about predictions, but what's your take on what it means about us as humans?

Does it mean that what we cherish as one of the unique capabilities of humans, human intelligence, is not actually all that unique? Because it's hard not to have this feeling when you talk to one of those big large language models. Not the times when it goes haywire and starts behaving weirdly, but the times when it works well, it's really hard not to have the impression that you're talking to somebody with some amount of intelligence. So does it mean that we're all some kind of statistical models, and the intelligence that we demonstrate is also an emergent property?

What's your take on that?

I don't think many people in the world right now have a good answer to that question.

Ethical Considerations and the Future of AI Regulation

That said, I do want to point out that many people right now have concerns about AI because of the potential damage it can do. It's all about the objective function. If you give a task to the model in terms of the loss function, and then you just try it again and again, eventually it will become very good at whatever objective you want the model to achieve. That is good, but at the same time it can be bad, and the AI may not even know it, right?

It's just trying to accomplish a certain goal. It just happens that a human being is standing in the way of that goal. So in that sense, I do think that human beings need to be careful. I think AI needs to be regulated to some degree. We cannot let it do whatever it wants. It may have serious negative consequences for human beings.

I think that a lot of what you just described has been the main concern for everybody making sci-fi movies, from the Terminator and Skynet onwards. And I certainly get that, but I think I'm probably more worried about something else. Going back to what we said about how you won't be losing your job to AI, you'll be losing your job to someone using AI: I think this probably applies here too. As an enabler, it scales up the amount of damage that a nefarious party can actually produce by using it to bad ends. A lot of the security that we rely on is practical, right?

For example, all the encryption keys that we use for everything are secure only because it would be computationally too expensive to actually figure them out.

But then when you've got tools like this, it's easy to be scared about the possibility of them figuring that out and making things possible that previously weren't. So I think I'm more worried about that scenario, where someone uses AI to bad ends and it enables them to do more damage than they would be able to do with traditional methods.

Even at the current stage, if AI falls into the wrong hands, it can do a lot of damage. Not that catastrophic, but it can do a lot of damage to a lot of families, right? I think there were stories about people using generative AI to create a fake phone call to someone's parents and demand ransom money. So it causes financial damage and also a lot of emotional distress. The same goes for fake news, fake video, a lot of deepfake stuff. So even at this stage I think it can do a lot of harm if it falls into the wrong hands.

Yeah, that's a very good example, the call. You can technically go and call people and scam them, and people do that, but there is a limit to how many people you can physically call in a day. If, on the other hand, you have a powerful enough AI, you can scale it up and you can probably call everybody in the United States a certain number of times.

The Impact of AI on Politics and Elections

Are you concerned about AI involvement in the upcoming election?

So we have to be careful, but I think so far the impact is limited. At the same time, I think all the parties and politicians need to pay attention to generative AI because of what it can do: fake news and so forth. Imagine you are running a political campaign, right?

You must get to know analytics and how AI can influence your campaign, either positively or negatively. If your team can utilize AI to strengthen your position legally, you're in a very good position; it can help you. But on the other hand, if you're not careful, your opponents or somebody else can use deepfakes to disrupt your campaign or your cause. That's why I think AI is so powerful and also so widespread. It affects every single industry in the economy, not just a few isolated sectors.

That's very unique about AI.

The OpenAI Controversy: From Open Source to For-Profit

Did you hear about the Elon Musk lawsuit against OpenAI from a few days ago? Obviously OpenAI initially started as an alternative to the big companies and the massive labs like Google, Facebook and so on. Their pitch and the initial mission statement was to release everything open source, hence the name OpenAI. And then somewhere along the way, that turned, and it's currently a for-profit, closed-source company worth, what, under a hundred billion at the moment.

We're recording this on March the 4th. A few days ago, Elon Musk opened this lawsuit, where he alleges that he was basically scammed because they turned the company around and went against the initial mission. And I think the opinions on the internet vary from 'okay, this is jealousy', because he's jealous of the success that OpenAI has seen, to 'okay, this is a nice publicity stunt; he probably has a point, but this is probably not going to stand in court'.

And I'm trying to make sense of how much of that is actually valid, and how much I should be worried about OpenAI, a big closed-source company, being at the forefront of this.

I also heard that, many years ago, when Elon Musk and Sam Altman co-founded OpenAI, their objective was a nonprofit organization. Given the competition from other big players in the industry, I think OpenAI was under pressure to commercialize ChatGPT, and this may go against the original objective. So I can see the argument from both sides.

On the one hand, we have to be careful, like we just discussed, about uses of AI that may lead to the end of humanity as we know it. But at the same time, if we use it properly, it can be a great tool. That's why there is such a great market for generative AI. So I think there is some tension within the company; you have different views. That's why, I think, a few months ago, within several days, Altman was fired and then got hired back, and so on and so forth.

In the background, I think it's really just those two forces at play. One force wants to make sure that AI does not go out of control and harm human beings, and at the same time there is huge pressure from industry peers to commercialize those applications to make profits. Actually, I'm glad that Elon Musk filed the lawsuit, in the sense that it may swing the pendulum to the other side. Eventually, I think, the view that we should commercialize and make money out of it prevailed, right? That's why Sam Altman got hired back. But that can go too far, because in the process of competing and making profits, you may sacrifice security. So I think the lawsuit by Elon Musk can potentially put the original mission in check, so to speak, and maybe force OpenAI and other tech companies to think more about guardrails around AI to make sure it doesn't go out of control and harm human beings.

Time will tell if anything comes out of it other than one billionaire being upset at another, but we'll see.

PyTorch vs. TensorFlow: Predicting the Future

So I'm going to ask you for one more prediction, and this time a little bit more down-to-earth: PyTorch. It appears to be still on the rise, and it appears to be the go-to option for any new papers. TensorFlow seems to be stagnating a little bit. You talked a little bit about the advantages of PyTorch and why you chose it for your book, and I'm wondering, do you see this being the prevailing platform?

Because I think the main breakthroughs for PyTorch were, as you mentioned, the GPU support, obviously, and also the built-in backpropagation, right, the autograd. Now the other frameworks also provide autograd, so I guess they're closing the gap a little bit in that respect. If you were to venture one more crazy prediction, do you see PyTorch leading the way going forward? Or are you going to update your book in a couple of years to port it to some other framework?

I think PyTorch is going to prevail in the near future. I mentioned this in my book. What PyTorch does is use a dynamic computational graph, which means it creates the computational graph on the fly, so it's faster and more flexible. TensorFlow uses a static computational graph, so it's slower. That's the main difference, and it affects the training speed greatly. In TensorFlow, you don't really have to worry about which device you use.

It's all done at the backend automatically by TensorFlow, but at a cost. If you have industry-scale models and a lot of GPUs and you do a huge calculation, maybe the overhead is negligible; it doesn't affect things much. But for a lot of researchers it makes a huge difference, because we are mostly working with toy models, not huge ones. If you use PyTorch, there is a little bit of inconvenience, in the sense that you have to specify whether to move a tensor to the GPU, and then once you are done with it, you have to get it back. But the benefit is huge, because it greatly increases the training speed.

I think, at least for small players, regular readers, and also for researchers around the world, PyTorch is much more convenient and much faster. Certain large corporations may not care that much, but for regular people PyTorch is much more convenient and much faster, and in the near term it may win out.

For anybody listening to this: I know that if I hadn't read your book before, I would probably be on manning.com, looking at it.

A Deep Dive into Generative AI and PyTorch

And then at some point I would reach chapter 4, where you're walking us through building a network that generates anime faces, which I thought was a pretty cool example. Can you give us a taste, for anybody who's going to be doing that? What's the training going to look like? What data are we going to use, and how are we going to implement the network? And in terms of training, what kind of hardware do you need for the training to be quick, and how much time do you need for that?

Give us an idea whether this is something that someone who is comfortable with Python can just pick up on a random weekend and go through, or whether there's any extra prep that's needed.

In order to train a GAN model to produce color images of anime faces, obviously you need the training data, right? The research community has a lot of human-created data for us to experiment on. So you can actually go to a website and download the anime faces.

I think there are tens of thousands of them. Then you need to create two neural networks. One is the generator, one is the discriminator, and the generator is trying to create an image that can pass as real in front of the discriminator. You just train the model for many rounds, and eventually you will see that the generator is able to generate an anime face that looks very much like the ones from the training set.
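
The tug-of-war between the two networks described here can be sketched with the standard GAN losses. This is a minimal, framework-free illustration rather than the code from the book: the function names and the discriminator scores below are hypothetical, and a real model would compute these losses over batches of images inside a deep learning library.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy: the discriminator wants D(real) near 1 and D(fake) near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """The generator wants the discriminator to score its fakes as real (D(fake) near 1)."""
    return -math.log(d_fake)


# Hypothetical discriminator scores: the probability that an image is real.
early = {"d_real": 0.9, "d_fake": 0.1}   # early in training, fakes are easy to spot
late = {"d_real": 0.6, "d_fake": 0.45}   # later, fakes pass as real more often

for name, scores in (("early", early), ("late", late)):
    d_loss = discriminator_loss(scores["d_real"], scores["d_fake"])
    g_loss = generator_loss(scores["d_fake"])
    print(f"{name}: discriminator loss {d_loss:.3f}, generator loss {g_loss:.3f}")
```

As the generator improves, its loss falls while the discriminator's loss rises, which is the sense in which the fake images start to "pass as real".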

I want to mention that in order to generate color images of human faces, you do need to use convolutional neural networks, as we mentioned earlier. If you use fully connected, dense neural networks, there are just too many parameters and the training will be too slow. On the other hand, if you use convolutional neural networks, you localize the weights: the weights stay the same within a filter, and you move the filter around the image.

So that's a way to greatly reduce the number of parameters in the model and make the model training much faster. This is on the software side, on the training side. In terms of hardware, I trained it using a GeForce RTX 2060 GPU. I think right now the cost is three or four hundred bucks. It's not that expensive. You can easily buy one, or if you have an older gaming computer, you can just grab the GPU from it and put it in your computer.
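
To make the earlier point about parameter counts concrete, here is a rough back-of-the-envelope comparison. The image size, layer width, and filter size are assumptions chosen for illustration and are not taken from the book.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of one 2D convolutional layer with square filters."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed sizes, for illustration only: a 64x64 RGB image, a dense layer
# with 1,024 units, and a convolutional layer with 64 filters of size 4x4.
height, width, channels = 64, 64, 3
dense = dense_params(height * width * channels, 1024)
conv = conv_params(channels, 64, 4)

print(f"dense layer: {dense:,} parameters")
print(f"conv layer:  {conv:,} parameters")
print(f"the dense layer is roughly {dense // conv:,} times larger")
```

The convolutional layer's count does not depend on the image size at all, because the same small filter is reused at every position.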

It's very easy to do; you don't really need a lot of knowledge about computer hardware to do it. Nowadays, computers are very user-friendly: you can just pop it open and swap parts very fast, that kind of stuff. It took me 30 minutes to an hour to train the model, so it's very fast.

However, if you don't really want to bother with a GPU, you can train the same model on the CPU. What you can do is simply leave your computer on all night. It may take five, six, or seven hours, but it can easily be done overnight. You just leave the program running, go to sleep, and the next morning you see the result. So in that sense, computationally, it's not that costly.

I think the most complicated model would be in chapter six, where you have to convert a horse image into a zebra image. It's called CycleGAN. You can also convert blond hair to black hair in images, or black hair to blond hair. Those kinds of models are a little bit more time-consuming because, number one, you are using higher resolution, and number two, you are actually training two generators and two discriminators.

Okay, so how CycleGAN works is that you have two generators. Let's use a horse and a zebra as the example: how do you convert a horse image to a zebra image, right? One generator is called the horse generator, the other one is called the zebra generator. What the horse generator does is take in a zebra image and convert it into a horse image. And what the zebra generator does is take a horse image and convert it into a zebra image.

And then you also have two discriminators. The horse discriminator tells whether an image is a horse image or not, and the zebra discriminator tells whether an image is a zebra image or not. And then CycleGAN has another element: the loss function has a component called the cycle loss. So what do you do? I think the idea is really ingenious. That's why I mentioned that with the right loss function you can achieve anything. Originally you have a horse image, right?

And then you give that image to the zebra generator to create a fake zebra image. Okay. Now you use that fake zebra image as input to the horse generator, and ask the horse generator to convert the fake zebra image into a fake horse image. Now here is the key: if both generators do their job right, then the fake horse image you get will be identical to the original horse image. So that's why it's called CycleGAN.

The cycle loss tries to minimize the difference between the original horse image and the fake horse image after a round trip. That's a very powerful tool because it forces both models, the zebra generator and the horse generator, to generate realistic images.

Since your show is called HockeyStick: when I was experimenting with the different models, I think that was pretty much a hockey stick moment.
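
The round trip behind the cycle loss can be sketched in a few lines. This is a toy illustration under loud assumptions: the "images" are short lists of numbers, the "generators" are hand-written functions standing in for trained networks, and the mean absolute difference is used as the distance measure.

```python
from typing import Callable, List

Image = List[float]  # a toy "image": a flat list of pixel intensities


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(horse: Image,
               zebra_generator: Callable[[Image], Image],
               horse_generator: Callable[[Image], Image]) -> float:
    """Horse -> fake zebra -> fake horse, then compare with the original horse."""
    fake_zebra = zebra_generator(horse)
    fake_horse = horse_generator(fake_zebra)
    return l1_distance(horse, fake_horse)


# Hypothetical stand-ins for trained networks: "adding stripes" is a shift.
def zebra_generator(img: Image) -> Image:
    return [p + 0.5 for p in img]


def good_horse_generator(img: Image) -> Image:
    return [p - 0.5 for p in img]   # exactly undoes the shift


def poor_horse_generator(img: Image) -> Image:
    return [p - 0.25 for p in img]  # only partly undoes the shift


horse = [0.0, 0.25, 0.5, 1.0]
print("good pair:", cycle_loss(horse, zebra_generator, good_horse_generator))
print("poor pair:", cycle_loss(horse, zebra_generator, poor_horse_generator))
```

A pair of generators that undo each other's work gets a cycle loss of zero, while a pair that loses information on the round trip is penalized. In a full CycleGAN, this term is added to the usual adversarial losses from the two discriminators.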

When I saw that, I thought, this cycle loss is really ingenious, because that component in the loss function is crucial for successfully converting a horse image into a zebra and a zebra image into a horse. I was completely amazed, not just by how well the model works, but also by the ingenious mechanism devised by the researchers. Again, there are tons of smart people in the profession.

So sometimes I see what they are doing, and once I understand it, I am completely amazed. I say, this method is amazing, the author must be a genius. I think there are tons of geniuses in our profession.

Love that story. And also, FYI, I'm totally stealing the quote from you: 'with the right loss function, you can achieve anything.' I think this should go on a T-shirt.

That's right, yeah. With the right loss function, you can achieve anything.

That's my belief. The concept of the loss function is very powerful. The loss function is another way of saying the objective function, right? You are telling the model what to achieve, what to do. It's very powerful.

Yeah, I think what keeps striking me is that once you go and look into these ideas, they're not actually that complicated. There's not too much magic in them. But to come up with the idea initially, to be the first one to propose it, does require a certain level of genius.
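
The idea that a loss function tells the model what to achieve can be shown with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "model" is a single number `w`, the objective is to make it equal a target of 3.0, and training is plain gradient descent with a hand-written gradient.

```python
TARGET = 3.0  # hypothetical objective: make the model's output equal 3.0


def loss(w: float) -> float:
    """Squared error between the model's single parameter and the target."""
    return (w - TARGET) ** 2


def grad(w: float) -> float:
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - TARGET)


w = 0.0               # start far from the objective
learning_rate = 0.1
initial_loss = loss(w)

for step in range(50):
    w -= learning_rate * grad(w)  # move a little in the direction that lowers the loss

print(f"w after training: {w:.4f}")
print(f"loss went from {initial_loss:.4f} to {loss(w):.8f}")
```

Changing `TARGET`, or swapping in a different `loss`, changes what the same training loop ends up producing, which is the sense in which the loss function defines the goal.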

So I think, probably decades from now, kids will be learning a lot of that stuff in primary school or early in their education. And it just feels like we're really experiencing some kind of breakthrough in this profession, a hockey stick moment.

Absolutely. It's good that a lot of smart researchers are working in the field. And sometimes when you get stuck on a question, you may work on it for years, right?

Without any breakthrough, year after year, and then suddenly there is an aha moment, and you figure out a way to tackle the problem, and it works. And then the method may become revolutionary; it may completely change the field.

You're about to finish your book. Is there anything that you would do differently if you were starting to write it today? Would you make any different choices?

Good question.

I don't think there are many things I would change. The reason is that even though it's a new book, I have actually been working on it for a couple of years now. I had a GitHub repository before I submitted a proposal to Manning; it's my way of working things out. A couple of years ago I started to use PyTorch for machine learning models and I started to get into generative AI. Then I started to use PyTorch to generate shapes and images, and eventually I got into natural language processing and large language models, and I had a lot of projects on my computer. Writing a book is my way of organizing things, of thinking things through to make sure everything works out. But I knew that in order to write a compelling proposal, I needed to first prepare well, right?

Especially since there are not too many good publishers out there, you only have one shot with a good publisher. Manning is one of the great publishers. Over the years I've read many books from Manning and I really enjoyed them, and I knew that I needed to write a good proposal in order for it to work out. I didn't want to lose the chance. So what I did was, in the summer, I spent several months creating a huge GitHub repository.

I laid out all the chapters initially, like a first draft, and it had 17 chapters. For each chapter I used a Jupyter notebook to explain everything to the best of my ability. All the code is there, so it's pretty much like a book. Once I had that, I spent another month converting it into an actual book, a PDF file. A lot of tech people use LaTeX. LaTeX is a document typesetting system, right?

Especially if you have a lot of math, you can generate beautiful equations. My book has some equations, some math, but not a whole lot. But it forced me to go through everything one more time in the process of converting the GitHub repository into a PDF file. I spent a lot of months converting everything. And it also looks beautiful, because it looks exactly like a book.

You have a template, you have a cover, you have a table of contents, you have each chapter with its section numbers, section titles, subsections, and so forth, and you have images. In short, it's pretty much like a book ready to be published. Then I sent that to Manning in the summer: the PDF file along with the proposal file, and a link to the GitHub page. And what Manning did was send the book proposal to more than 10 reviewers in the profession.

The reviewers are all data scientists, people who know AI in the profession, and they give comments on whether the book should be published, and they give a lot of very valuable feedback. The feedback was very positive, partly because it's a hot topic, partly because I spent a lot of time preparing it, right? But I did receive a lot of good feedback. So to answer your question: because I have been through several rounds, now there's not much I would change, because I have already incorporated the great feedback from about 12 reviewers on the proposal.

Fair enough. How many copies have you sold so far?

It's already sold more than a thousand copies. I think the daily high was 58. So it says a lot about the demand for generative AI. And if you look at the top 10 on the Manning website every week, you will see generative AI is hot. There's a lot of demand.

And another trend is PyTorch. I think a lot of people are switching to PyTorch, and there is a book from Manning called "Deep Learning with PyTorch". It's selling very well. And then there's another book called "Large Language Models from Scratch". That book is also using PyTorch, just as I do, but it focuses just on large language models, while my book covers many different kinds of content: large language models, music, images, shapes, numbers.

And then another thing I want to mention is that I did spend a lot of time thinking about how to help readers learn progressively, step by step. Chapter one, of course, is an overview of the book, of the generative AI landscape, and of what the book is trying to accomplish. Chapter two is deep learning with PyTorch. So even if readers have no background using PyTorch, after reading chapter two they will be able to use PyTorch to create deep learning models. From A to Z, you can do the whole thing. Okay? So that's very important. And then in chapter three, we get into GANs. You will use GANs to generate numbers and shapes. The models are very simple: you only have two or three layers of neurons in those models. Therefore, it's very easy to understand, easy to create, and the training takes a matter of minutes.

Readers will not get frustrated, because everything is so simple. And then in chapter four, I kick things up a notch: instead of using fully connected dense layers, I use convolutional layers, which are needed for image processing. If you want to create high-resolution color images, fully connected dense layers won't work. Well, they may work, but it's very slow.

On the other hand, if you use convolutional layers, it's much faster, because you use filters that move around the image, and you just train the weights in the filter itself. So that's much more efficient. So people learn to use convolutional layers in chapter four to generate color images, and then in chapter five I kick things up another level.

Readers learn to select characteristics in images: you can choose to generate an image with eyeglasses or without eyeglasses. You can transition from an image with glasses to an image without glasses. So all that arithmetic kind of stuff. Chapter six is not out yet, but there I do the CycleGAN, which is computationally costly for the reason I just mentioned: it has two generators and two discriminators. And then chapter seven is about variational autoencoders.

That's a different model from GANs. It is important because it has an encoder-decoder architecture, which we see is very common in machine learning models. For example, ChatGPT is a decoder-only model, while the original transformer paper, "Attention Is All You Need", has an encoder part and a decoder part, that kind of stuff.

And then after that, I get into transformers and natural language processing: how to do tokenization, how to create a transformer from scratch, including ChatGPT-style models. You can create a GPT from scratch and train it. I saw that you have several posts on LinkedIn about how to create a GPT from scratch, right? My book does exactly that in chapter 10, how to create a GPT from scratch. And then chapter 11 is how to create a small GPT from scratch and then train it to generate text.

Its focus is not mainly on creating, but on training a GPT from scratch. Of course, it's much smaller. It only has 5 million parameters. But you learn how to train a model from scratch. After that it's music generation and different models, and then how you can use LangChain to chain together different large language models. So that's the whole book.

It's been a real pleasure to talk to you. I'm personally super excited; I can't wait until the rest of the chapters become available.

So, you know, hurry up. Before I let you go:

Future Projects and Closing Thoughts

I'm curious whether you have the idea for your next book already in mind, or whether you're going to take a small break before book number four.

So far I'm very busy with writing the current book. I do get ideas from time to time. One example: I think this text-to-image, multimodal model thing is amazing.

I think there could be another book there, focused purely on diffusion models and multimodal transformers: how to convert text to image, or text to video. I thought about it, but I didn't spend a lot of time on it because I'm busy writing the current book. The other idea I thought about is also related to multimodal models. My first book is called "Make Python Talk", right?

But it actually uses the Google API to do the speech recognition and text-to-speech. I don't do any of the machine learning part; I just use the Google API to do all the heavy lifting. But there are open-source models out there. You can actually train a model to do speech recognition. That's actually a multimodal model, right? Because in speech recognition, the input is audio and the output is text. And then you can also do text-to-speech.

That can be another interesting project. I have some ideas on how they work, but I would have to spend a lot of time experimenting. So I would say in another two or three years, I may venture into one of those ideas and maybe write another book about it.

Awesome. You're going to have one reader already interested in that. So definitely go for it.

Okay, let me ask you then: which idea do you like better, the speech recognition model, or a book about text-to-image multimodal transformers?

I've been meaning to properly read the Whisper paper, so I think speech recognition is actually a pretty good use case, and I would definitely be interested in reading that.

Good to know. I may put more emphasis on that project. Awesome. Thanks for the feedback.

All right. Thank you so much.

It's been a pleasure, and hopefully I'll get you next time with your next book. Thanks a lot. Thank you.

Transcript source: Provided by creator in RSS feed