So, yeah, once again, it gives me great pleasure to welcome Katcher Volkova Volkmer from Roche Basel. And she's going to tell us about how deep learning is used in biomedicine. Good afternoon, everyone. It's a pleasure to present to you today. I did a similar lecture about a year ago, but back then it was still in person and quite a bit more extensive: three hours of lectures and three hours of practice. So I had to condense the material a lot, and I also updated it.
But first, just a little bit about me. I was born in Russia many years ago. I started in actual linguistics and English studies after school, got slightly bored after a couple of years and moved to Tübingen in Germany, where I studied computational linguistics. And while I was doing computational linguistics, I got really interested in neuroscience. So my PhD was in cognitive neuroscience, at the Max Planck Institute for Biological Cybernetics.
After I finished my PhD, I moved to the UK and worked there as a data scientist for a few years at two very nice companies. I really learnt a lot there, but towards 2018 I started missing continental life again and was looking for opportunities in mainland Europe. I was again very, very fortunate to get the role of senior data scientist at Roche, in the Pharma Research and Early Development Informatics department.
All of my work is currently with the digital biomarkers group, and in particular our focus is on Parkinson's disease. I am the so-called data analysis lead [INAUDIBLE] Parkinson's, and I do a lot of analysis of how people behave when they perform so-called active tests on the smartphones that we provide them and that accompany clinical trials. But enough about me. So what's the plan for this lecture? We only have an hour, unfortunately. I wish we had a whole semester.
I could tell you much more. So I want to go through four major points. What is deep learning at all, and what is it good for? We will learn some basics so that we can then go into a bit more depth on two flavours of deep learning: convolutional neural networks and graph convolutional networks. There are many more flavours, but we clearly don't have the time for all that. One note: there are going to be a lot of extra links on almost every slide.
Some of them are just sources for images that I stole from very nice resources. Some of them are links to resources where you can go and learn more in detail. And they are all spelt out in full, because I don't want you to worry: oh, if I click on this link, where is it going to take me? Some of them were already there a year ago, so I checked them all this week. They are all live, and some were even updated.
So I really hope you will find this material helpful. Right. Before we go into deep learning itself, as a machine learning practitioner I want to emphasise some practical advice. First of all, probably many of you have heard that all models are wrong, but some models are useful. When we apply machine learning, or even just statistical models, we are trying to explain the data we observe in the world and to find patterns that are useful.
Any model will always have some bits and pieces of data that are not explained, but we hope that this is just noise and not important. So we care about the models that are useful, and about how we build them. Another very frequent phrase you hear from all machine learning practitioners is garbage in, garbage out. Meaning that we want our model to represent, explain and predict a certain phenomenon.
If the data we're training the model on is not representative of the phenomenon, the model will be useless. So be really, really careful. This is a very unforgiving rule, especially in deep learning. Then again, when we talk about deep learning in particular, many people treat it as some kind of magical tool, partially because it's really hard to interpret. But the usual machine learning pitfalls like overfitting and bias apply to deep learning as well.
It's not a silver bullet, so you have to be very watchful there as well. And since I mentioned that deep learning is harder to interpret: this is part of the so-called no free lunch. If you build a very simple, very easy to interpret model, for example you have three variables and you're predicting some outcome, it's very, very clear. Right? You have your coefficients, you have your confidence intervals.
You say, OK, this feature is doing this, this feature is doing that. Lovely. In deep learning you have thousands, maybe even millions of parameters. That's impossible to interpret. Your model can be much more powerful for very complex data, but you lose this interpretability. There are tools that help us interpret deep learning models as well, and I think it's everybody's responsibility and duty to apply them, because it's very important to understand what the model does.
Right. Another piece of advice I would like to give you: don't be just a number cruncher. Really understand the problem you're trying to solve, and the data. In the early days of data science, I often heard opinions like, oh well, it doesn't matter that you don't know genetics (which I don't, I'm not a biologist), you will just do your number crunching and everything will work out. This is really dangerous, I think.
So if you're not the expert in the field where the data is coming from and in the problem you're trying to solve, at least make sure that you have a very strong connection to somebody who is, and that you can always show them some intermediate results so that they can help you understand the problem better. And then there is another aspect of machine learning. Oftentimes, especially in research but also in industry, models are built to interpret the data, not necessarily to predict.
And that's fine. But if you want your model to live on some server or in some app, that's called machine learning models in production, and it goes into the area of MLOps and productionisation. That calls for model maintenance, which means that even if you're very certain that your model is really great and performs well, please continue testing it over time on new data.
There can be a shift in what the true data for this phenomenon looks like, so make sure to analyse and fix the errors. Otherwise, over time, your model will become useless. And I have a very, very forceful piece of advice here: please watch this course at least twice. It's from Andrew Ng on Coursera. It's really, really great. It is tailored towards deep learning, but the things you can learn there are applicable to other machine learning flavours as well.
OK. This was a long slide, but I think it's really, really important, and I didn't want to skip over it. Now we can finally start our journey with deep learning. So what is deep learning? It's a set of methods that really took off in the last 10 years. It's a subset of machine learning algorithms and tools; machine learning itself is a bit older. It took off thanks to three factors. Actually, the methods themselves and the philosophy behind them had been there for quite a long time.
But there was not enough computing power to make them scalable, and there was not enough data to train these models on. So it was really with the dawn of big data and faster computers that deep learning finally took off. Machine learning itself is a subset of artificial intelligence. Not all artificial intelligence needs machine learning to perform, but many, many modern algorithms do. And many of them are now in the news.
They often rely on deep learning. I think a big distinction between so-called classic, or sometimes called old-school, machine learning and deep learning is that in a typical old-school machine learning project you, as the practitioner, would do feature extraction manually on your own. You receive a data set, you start looking at it, plotting and doing exploratory data analysis, and you think, well,
maybe I'll add a few more features here, encode this data this way, extract extra things, merge new data sets together, and then you perform your classification. Deep learning does feature extraction for you, because for the typical inputs it works on, like images, speech recordings or free text, it's almost impossible to come up with a useful set of features for all cases.
So this is, I think, a very important distinction between classical machine learning and deep learning. When we talk about the people who contributed to the rise of deep learning, there are of course many more than just these three very smart gentlemen. But I wanted to highlight them in particular, because a couple of years ago they got a Turing Award for their contributions to deep neural networks.
They are still very active in the field. For example, I really like that they even challenge their own old ideas. So they still contribute, and I really advise you to learn more about them if you have the time. Medicine is very close to my heart. I fell in love with digital health after I finished my PhD.
And I think I'm really, really fortunate to work in this field now, which is why, when I want to update myself on what deep learning is good for, I first and foremost look at how it can be used in medicine. The uses are plentiful. It can help in diagnosis: for example, you have an image of some cells, like a biopsy, and you want to figure out whether it's cancerous or not. But medical imaging doesn't have to be just about diagnosis.
Maybe you already know that somebody has Parkinson's. You can take brain scans over time, and again deep learning can help to figure out how the disease is developing. Deep learning is also supporting clinical trials. This is somewhat more challenging, because in clinical trials you have to be absolutely clear and very, very comfortable.
And like I mentioned before, deep learning is sometimes hard to interpret. Nevertheless, in some aspects it already helps. And last but not least, especially in the last couple of years, deep learning has been shown to be an amazing tool for drug discovery, which is otherwise a very, very laborious and very expensive process. So it could be that deep learning will really be a tool to solve some huge bottlenecks in this area.
So these were the areas. Now, who are the companies that are very active in deep learning for medicine in particular? Some of my favourites, which are actually local on the island, are Babylon Health. They are helping to scale general health care in the UK. I'm not sure whether their model falls into the class of deep learning, but they have a very clever Bayesian network that has passed the doctors' exam, the GP exam.
You give it symptoms and it keeps asking interactively: do you have this symptom, or what about that symptom? It can diagnose your disease pretty reliably. Then BenevolentAI has shown some breakthroughs in drug discovery recently. DeepMind has its own health department, where they also try to optimise certain aspects of health care. And of course, there are big pharma companies like Roche, where I work, Novartis and AstraZeneca,
which have been expanding their staff in terms of machine learning skill sets. And last but not least, there are the usual tech giants like Google, Apple, Amazon, IBM and Philips. They all have very strong in-house departments in health care. Right. So deep learning has been going out and doing great things. Are there any horizons that it still has to reach?
For me, the most interesting topics, where I hope the next breakthroughs will take place, are the following. First, causal inference and reasoning. Even though deep learning looks like it's super wise, and it can tell you whether there is a cat in the picture or not, and of course many other useful things, it's still just patterns. It's still correlation, essentially.
What is still often missing is the causal link. I really recommend you read this book, The Book of Why. It's very non-technical and very engaging. I think I've read it twice already, because it's just really interesting to think about. It's a very different way of thinking about data. Then we still have lots of problems with algorithmic bias and bias in data.
And again, deep learning being somewhat more obscure, it's harder to catch this bias, but it's very important to check for it. Then, like I said, we still need to increase transparency and interpretability in deep learning. That in particular would help clinical adoption, in my opinion. For example, if you want to submit a new diagnostic algorithm to the FDA, it has to be very, very clear.
I doubt that even something like a random forest would be very welcome there unless it's very clearly explained. Imagine how much trouble a deep learning model would have. And a point added recently is meta-learning. That is a hot topic right now as well, because the resulting models are highly, highly specialised. I'll show you an example later, an interactive example of what that means. So currently the search is for a more general learning approach.
So, an algorithm that was trained on one thing and can then go and complete a different one, or learn it much, much faster. Right. So this was just a very quick tour de force. Where can you learn more? There are online courses. I did plenty of them during my PhD and after, and I still do them if I have the time. There are tons of podcasts. There are, of course, many, many books; I've highlighted just two out of a couple of dozen. There are so many YouTube channels, and some of them are absolutely brilliant.
The only YouTube channels I would not recommend are the ones where the instructor tells you, oh, deep learning is so easy, you just type these two lines of code and you're done. Please don't follow that advice. Please try to understand in depth what is actually going on. There are lots of blogs, of course, and a whole class of them on Medium; I particularly like Towards Data Science. And of course, there are always papers. I have a more detailed list on my blog.
So if you want to check it out, you're more than welcome. Right. So let's imagine that after this talk you're super inspired and you want to learn more about deep learning, and maybe you already have a problem that you would like to solve and you have [INAUDIBLE]. So what do you do? Do you go and program everything from scratch on your own? You do not have to do that. There are at least four platforms, and the number is always growing,
that can help you build your models. They all have slightly different flavours, different advantages and disadvantages. Within my team we tend to use PyTorch, because it's very nice and friendly. But my first models I built in TensorFlow and Keras; Keras is like a very nice, friendly API for TensorFlow. So, all right, we're done with the main introduction. Let's look a bit closer at what kinds of flavours exist in deep learning.
We will start with what I call Grandpa Perceptron. That's the main, simplest building block of deep learning. Next we will go into the multilayer perceptron. And as you can see here, there are kind of four main families. There are convolutional neural nets; they work mostly on images. There are graph neural nets, which work on graphs. And they kind of have a baby together, the graph convolutional networks.
We'll talk about those in more detail, because it's kind of an easy step from convolutional neural nets to graph convolutional nets. If you want to work with text or speech, you might be better off with recurrent neural nets and long short-term memory networks, which are kind of an extension of RNNs. Some people have actually achieved very good results on, for example, speech recognition with CNNs.
But yeah, it's one way to do it. And then a whole different beast is deep reinforcement learning, which I think is a very promising field. And there are autoencoders and GANs, the generative adversarial networks, but we will not have time to go into them; these are out of scope today. There's also not going to be a lot of technical detail today, again due to the lack of time.
But I still want you to understand in quite some detail what a perceptron does. A perceptron is just a linear model. Well, not exactly nothing else, but essentially it's a linear model. So you have one here. Can you see my cursor, can you see my mouse? I haven't used that so much before. Yes? OK, great. So here you have the inputs, your features. Perceptrons, or multilayer perceptrons, can work on tabular data.
In fact, some of my colleagues often use them as a benchmark: you get a new tabular data set, you throw it onto a multilayer perceptron, you get your outcome, and then with your better models you try to improve on it. You try to find overfitting, or you just try to get better accuracy overall. So here you have your inputs. To these inputs you apply certain weights, and then you just sum them up.
That's all you do, and you add your bias, which is the intercept. In statistics, these are the coefficients and this is the intercept. As the outcome you get this weighted sum, essentially, and you look at whether it's above zero or not. From that you judge whether you should output one or zero. That's almost all there is to a perceptron. It's quite old; this [INAUDIBLE] was there even before the 60s.
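As a rough illustration of the perceptron just described (weighted sum of the inputs, plus the bias, then a check against zero), here is a minimal Python sketch. The weights and the bias are hand-picked, hypothetical values chosen for the example, not numbers from the slides.

```python
def perceptron_predict(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is above zero, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


# Hypothetical, hand-picked parameters (not taken from the lecture slides).
weights = [1.0, 1.0]
bias = -1.5

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [perceptron_predict(p, weights, bias) for p in points]
print(predictions)  # only the point (1, 1) ends up above zero
```

With these particular values the line sits between (1, 1) and the other three points, which is the kind of linear separation a single perceptron can express.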
And this is the building block for essentially all deep learning networks. It can do linear separation; it can build a linear model. Now you might be thinking: we have this input, right, this is in our data, but how do we find the weights? Because depending on these weights, you could have a model that is totally off, just not accurate at all. Here we have our ground truth as dots.
We have two classes, and then we have our predictions. And the model says, well, I'm super sure that this point is red, which is not true. Then when we adapt, when we change the weights and the intercept, the model thinks, okay, this is a good fit, and here it's actually 50/50. So this is an improvement, but it could do better. So we iterate on, and I will tell you exactly how we iterate.
And now, if we count how many dots are marked blue and how many red, we see that this model is much more accurate. But it's not perfect. A perfect fit is a model that kind of maximises the distance from the line where it's 50/50 to your training set. So how exactly do we tell this line to rotate? How do we get this line to an optimal fit? In linear regression, but also in the perceptron approach, what we use is gradient descent.
Here on the y axis we have our so-called cost function, or roughly how inaccurate our model is. At the bottom you have no error, or the minimum possible error, and the higher up you go, the worse your model is performing. You compute this basically by looking at the distance to these dots and encoding whether they were categorised correctly or not. Initially, the weights are generated randomly.
They can be set to zero, but they can also be generated randomly. It doesn't matter, because once you have one outcome, or a couple of dots, you can build your gradient. The steepness of the slope tells you how far off you are, because what you want to get to is a completely flat slope. When your gradient is zero, you know that you've got there. It's a bit more tricky than that, there's a bit more detail to it.
But the beauty of gradient descent is that you always know in which direction to go and by how much you're off. So you can adapt your steps over time. You don't have to take tons of tiny, tiny steps, as you would if you only knew the direction; you also know approximately how far you need to go. If you have very few data points and very few parameters, you can afford to do gradient descent as it is.
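To make the idea of following the slope of the cost function concrete, here is a small sketch of plain gradient descent on a one-feature linear model. The mean-squared-error cost, the learning rate, the number of steps and the toy data are all assumptions made for this example; the lecture does not specify them.

```python
def cost_and_gradient(w, b, xs, ys):
    """Mean squared error of y = w*x + b, and its slope w.r.t. w and b."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    cost = sum(e * e for e in errors) / n
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    return cost, grad_w, grad_b


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Start from zero weights and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, grad_w, grad_b = cost_and_gradient(w, b, xs, ys)
        w -= learning_rate * grad_w  # steeper slope -> bigger step
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
final_cost, _, _ = cost_and_gradient(w, b, xs, ys)
print(w, b, final_cost)
```

Because each step is proportional to the gradient, the steps are large while the model is far off and shrink automatically as the slope flattens out near the minimum.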
But I will also talk about the trick that allows you to do it faster. Let's first go into a slightly more complex architecture. We only had one perceptron before. What if we have two, or three, or 16, or 200? We still have our inputs and we still have the weights that we're fitting. And here we have two linear models, and they will have their own weights and their own biases.
So they can encode two different linear models. And by combining these two models, we can apply classification to nonlinear situations. Imagine you had this kind of class dependency for the blue dots. We have here one feature and another feature, and we know that blue dots are typically at half or more of our range on the x axis and at half or more on the y axis: sort of an AND situation.
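The AND situation just described can be sketched with two hidden neurones, each encoding one linear rule, and an output neurone that combines them. The weights below are set by hand purely for illustration (a trained network would find its own), and the features are assumed to be scaled to the range 0 to 1.

```python
def step(z):
    """Threshold activation: fire (1) if the input is above zero."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def two_layer_network(x, y):
    """Blue (1) only if x is above one half AND y is above one half."""
    # Hidden layer: two separate linear models (hand-set, hypothetical weights).
    h1 = neuron([x, y], [1.0, 0.0], -0.5)  # fires when x > 0.5
    h2 = neuron([x, y], [0.0, 1.0], -0.5)  # fires when y > 0.5
    # Output neurone: fires only when both hidden neurones fire.
    return neuron([h1, h2], [1.0, 1.0], -1.5)


for point in [(0.8, 0.9), (0.8, 0.2), (0.1, 0.9), (0.1, 0.1)]:
    print(point, two_layer_network(*point))
```

The resulting decision region is the upper-right corner of the feature space, a shape with a corner in it that no single straight line could carve out.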
And this allows us to encode this nonlinearity. Of course, the more layers we have, and the more neurones (each one of these in a layer is called a neurone), the more detail we can add to this nonlinearity, in as many dimensions as we want. So again, this is the basis of more or less all deep learning approaches. And a very important trick... I see chat messages here. Sorry, okay. Apologies. Let's go back.
It doesn't want to go back. [INAUDIBLE] Yeah, no, it's working now. So another very important aspect of deep learning is backpropagation. Before, we had a very simple architecture with just one hidden layer. Like I said, you can have multiple, and very quickly you get a fairly complex system where you have weights on every edge. So you take your inputs. You generate your weights.
You apply them to these neurones. They activate or not, depending on the outcome. Then they themselves pass their outputs on through the next set of weights, and so on, and eventually you get to the output. For example, if you have a binary outcome, the network says, okay, for this input it looks like I have maybe a probability of 0.6 for one class and 0.4 for the other class. But actually, we know that the right answer is 100 percent for the first class
and zero probability for the other. How do we correct for this? We calculate our cost function and we back-propagate it: well, you need to correct these weights in this direction and these weights in that direction. When you go forward, the way from input to output is called the feed-forward pass. And the correction, the adjustment of these weights, is called backpropagation.
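Here is a small sketch of the two passes for a network with one hidden layer. Several things are assumptions made only for this example: sigmoid activations, a squared-error cost, the starting weights, the learning rate and the toy OR-like data set. One sweep of forward and backward passes over all examples is what is commonly called an epoch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Feed-forward pass: inputs -> hidden layer -> single output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    output = sigmoid(
        sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    )
    return hidden, output


def loss(params, x, target):
    """Squared-error cost for one example (an assumed choice of cost)."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backward(params, x, target):
    """Backpropagation: chain rule from the output back to every weight."""
    hidden, output = forward(params, x)
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [
        delta_out * w * h * (1.0 - h) for w, h in zip(params["w_out"], hidden)
    ]
    return {
        "w_out": [delta_out * h for h in hidden],
        "b_out": delta_out,
        "w_hidden": [[d * xi for xi in x] for d in delta_hidden],
        "b_hidden": delta_hidden,
    }


def train_epoch(params, data, lr):
    """One epoch: a forward and a backward pass for every example."""
    for x, target in data:
        g = backward(params, x, target)
        params["w_out"] = [w - lr * gw for w, gw in zip(params["w_out"], g["w_out"])]
        params["b_out"] -= lr * g["b_out"]
        params["w_hidden"] = [
            [w - lr * gw for w, gw in zip(ws, gs)]
            for ws, gs in zip(params["w_hidden"], g["w_hidden"])
        ]
        params["b_hidden"] = [
            b - lr * gb for b, gb in zip(params["b_hidden"], g["b_hidden"])
        ]


# Hypothetical starting weights and a toy OR-like data set.
params = {
    "w_hidden": [[0.1, -0.2], [0.3, 0.4]],
    "b_hidden": [0.0, 0.1],
    "w_out": [0.5, -0.5],
    "b_out": 0.2,
}
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

initial_loss = sum(loss(params, x, t) for x, t in data)
for _ in range(500):
    train_epoch(params, data, lr=0.5)
final_loss = sum(loss(params, x, t) for x, t in data)
print(initial_loss, final_loss)
```

In practice a framework such as PyTorch or TensorFlow computes these gradients automatically; writing them out by hand is only meant to show what the backward pass does.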
One cycle of those two is called an epoch. Usually when you're training your deep learning models, you have multiple epochs. For very, very simple scenarios just a dozen might suffice, but for very complicated data you might need hundreds. Gradually your model will converge and adjust the weights so that any input will generate correct results most of the time. So you already see that even with very few layers and very few neurones
we already have so many parameters, these weights for example. It's quite natural that with larger data sets and larger architectures the whole training process will slow down. Even with the much faster computers that we have today, it's still going to take too long. The great news is that it doesn't have to take this long. You don't have to do the full gradient descent for every single input and every single feature.
You can take shortcuts. Some of these shortcuts are called stochastic gradient descent and dropout, and they work on two different aspects of the neural net. Stochastic gradient descent does gradient descent, but not on all of the inputs, just on a few of them at a time. It takes batches, selects them randomly, and does the gradient descent on them. Dropout does kind of the opposite: it randomly switches off nodes in the hidden layers.
That means that not all of them are activated at the same time. Not only do these tricks help you to train your network faster, they also make it more robust. Deep learning networks are amazing at overfitting. If you give them the chance, they will just memorise the whole training set by heart, give you perfect results on your training data, and be awful and useless on any new data.
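The two shortcuts can be sketched in a few lines each. The function names here are made up for the example. One detail is an assumption that goes beyond what was said: the surviving activations are scaled up by the keep probability (so-called inverted dropout), which is a common convention so that the expected activation stays the same.

```python
import random


def minibatches(data, batch_size, rng):
    """Shuffle the data and yield it in small random batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


def dropout(activations, drop_prob, rng):
    """Randomly switch off hidden nodes; scale the survivors (assumed convention)."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

examples = list(range(10))
for batch in minibatches(examples, batch_size=4, rng=rng):
    # In real training, one gradient step would be taken per batch here.
    print("batch:", batch)

hidden_activations = [0.2, 0.9, 0.5, 0.7]
print("after dropout:", dropout(hidden_activations, drop_prob=0.5, rng=rng))
```

Dropout is applied only during training; at prediction time all nodes are kept switched on.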
But stochastic gradient descent and dropout help you make these models more generalisable. I once heard a lovely metaphor on a podcast, saying that neural nets are very good at finding local optima, basically the perfect minima in your feature space, which often look like deep, deep wells. Once your network finds one, it will not be able to get out of it. With stochastic gradient descent and dropout,
it's like the model is wearing very big boots, so it cannot fall into these wells. I think it's a nice metaphor. Still, with enough epochs you can always overfit, and you need to know when to stop. You can still apply good old regularisation, which you might know from elastic nets, for example: the L1 norm and the L2 norm.
But also, as a modeller, you need to watch out for a gradual divergence between the training data set and the validation one. You should always split your data into train and validation sets, and you should also hold out the ultimate test set. As you're training your network, you measure the error on the train set to adjust the weights. You should then also apply exactly these weights
to the validation data set. As your model trains, the error first goes down in synchrony in both. But once your model starts memorising your examples, the error on the train set will decrease further, while on validation it will increase. This is it: then you know, OK, you need to stop here. So that's also a very important aspect to keep in mind. How are we doing? Oh, [INAUDIBLE]. Sorry. So this was general information about the perceptron and the MLP.
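Going back to the stopping rule from a moment ago (stop once the validation error starts rising while the training error keeps falling), here is a minimal sketch. The error curves are invented numbers, and the `patience` parameter, which allows a few epochs without improvement before stopping, is a common convention assumed here rather than something from the lecture.

```python
def early_stopping_epoch(validation_errors, patience=2):
    """Return (best_epoch, best_error), stopping after `patience` bad epochs."""
    best_error = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_error = error
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error has diverged from training error
    return best_epoch, best_error


# Invented error curves: training keeps improving, validation turns around.
train_errors = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10, 0.06]
val_errors = [0.92, 0.65, 0.48, 0.41, 0.43, 0.47, 0.52, 0.60]

best_epoch, best_error = early_stopping_epoch(val_errors)
print(best_epoch, best_error)
```

In a real training loop you would also keep a copy of the weights from the best epoch, so that you can roll back to them once you decide to stop.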
Let's look in more detail into CNNs, because I think they are a very fascinating breakthrough in machine learning. They really managed to do something that people were not able to do before. For us as humans it's really easy: we have an amazing visual sensory system, so it's easy for us to recognise objects. We look at this and say, okay, five, five, cat. For a computer, it's not so obvious.
If we took the old-school machine learning approach, we would have to handcraft the feature extraction. We could maybe come up with some filters and say, well, a five, what does it usually look like? There's always this kind of arc going from left to right, and there's a sharp angle just above it. But then you get another five, which is all smooth, and then your filters don't work any more.
And then try coming up with all the rules that tell you that this is a cat in a mask and not a tiger, not a lion and not a puppy. Imagine doing that. It's nigh impossible. People have tried, but they usually failed. CNNs, being part of the deep learning family, do the feature extraction themselves. They also solve a very important problem with images, because images can be really, really large. I mean, these are very small images, the fives here.
But oftentimes, especially in medical imaging, the resolution is really, really high and you get really large images. So how do you extract information from these images so that you condense it and still keep it useful? CNNs have two big tricks here: convolution layers and pooling layers. Convolution is basically when you take a small window of pixels, maybe three by three; it can be seven by seven or eleven by eleven.
You are the architect. You apply a filter to that window. The filter has some predefined numbers, often randomly generated, and you just multiply element by element: this pixel by this number, this pixel by this number. (This is not a pixel, it is just a number.) Then you sum them up and record the result in your next layer, your output layer. Pooling does something even simpler. It takes three by three pixels, for example, or some other patch.
It's usually a square patch, and it just takes the maximum value out of it. So here you see we have six by six as the input, and as the output we get two by two, because this three-by-three patch went one, two, three, four, and each time it extracted the maximum number. It doesn't have to be the maximum; it can be the median, whatever you prefer. It helps to condense the information. So these filters are quite curious. Like I said, they can be randomly generated.
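Both operations are small enough to write out in plain Python. This is only a sketch: the 6 by 6 "image" is just the numbers 0 to 35, the filter of ones is an arbitrary choice, and the convolution slides one pixel at a time without padding, which are assumptions made for the example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def max_pool(image, size):
    """Take the maximum of each non-overlapping size-by-size patch."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


# A toy 6x6 "image" whose pixel values are simply 0..35.
image = [[row * 6 + col for col in range(6)] for row in range(6)]

pooled = max_pool(image, 3)  # four 3x3 patches -> 2x2 output
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
convolved = convolve2d(image, ones_kernel)  # 6x6 input -> 4x4 output

print(pooled)
print(convolved[0])
```

Pooling shrinks the 6 by 6 input to 2 by 2, matching the four patch positions counted out above, while the convolution keeps more spatial detail and produces a 4 by 4 output.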
You can prespecify them if you want. And depending on how they are built, they can highlight certain features. They can detect edges: if you have higher values in the middle, or lower values in the middle, or a vertical arrangement, they will detect edges in the image. They can sharpen the image by kind of suppressing values on the outside and highlighting values in the middle.
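Two prespecified filters of this kind are sketched below. The kernel values are standard textbook choices rather than anything taken from the slides, and the tiny images are made up so that the effect is easy to see.

```python
def apply_filter(image, kernel):
    """Slide a 3x3 kernel over the image; multiply element-wise and sum."""
    size = len(kernel)
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(len(image[0]) - size + 1)
        ]
        for r in range(len(image) - size + 1)
    ]


# Vertical arrangement: negative on the left, positive on the right.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Sharpening: high value in the middle, negative values around it.
sharpen_kernel = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]

# Made-up image: dark left half (0), bright right half (1).
half_and_half = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
edges = apply_filter(half_and_half, vertical_edge_kernel)
print(edges)  # non-zero only where the window straddles the dark/bright border

# A perfectly flat patch is left unchanged by the sharpening kernel.
flat_patch = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
print(apply_filter(flat_patch, sharpen_kernel))
```

The edge filter responds only where brightness changes from left to right, and the sharpening filter leaves uniform regions alone while exaggerating any pixel that differs from its neighbours.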
You can do blurring and all kinds of things. I will show you a typical neural net architecture in a moment, but what's interesting is that by applying very similar filters from layer to layer, you are able to extract more and more complex features. At first they will just detect the edges in the image, then they will detect more complex items,
and then in the third, fourth or fifth layer, say, you will see whole objects. This is a really interesting property of these filters. And this is what a typical CNN would look like. This is not a very large network. Like I said, you are the architect. You decide on many hyperparameters: how many layers to have, how to arrange them, what the size of the filters should be, how many filters you should have.
But the still the kind of the flow of information remains the same that you start with. Your image is the input. And then you do. Convolution usually start with convolution and you extract kind of more. What you highlight important information. We using these filters. So usually you you if you have one input image, maybe it has three channels, three colours, but you usually apply multiple filters.
So you get a stack of outputs, one per filter. And then the information from these layers is passed on to the pooling, and you condense the information, and then you do convolution on this condensed information. Why not? And then you can condense it further. You don't have to condense it to the point that you have a one-by-one output in a pooling or a convolutional layer. At some point you just say, okay, I'm just taking all of what's remaining and I'm turning it into a vector. And this is your last layer, the so-called dense layer. This is the last vector of neurones, and their activation pattern will then be mapped to a certain class. So here we have just two classes, pathology and no pathology. But there were also very successful experiments where you could train the networks to recognise one hundred classes.
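The flow just described (convolution, pooling, convolution, pooling, flatten, dense layer, two classes) can be sketched in plain NumPy. This is only a forward pass under stated assumptions: the filters are random stand-ins for learnt ones, and the image size, the filter counts and the helper names `conv_layer`, `max_pool` and `softmax` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(x, filters):
    """x: (channels, H, W); filters: (n_filters, channels, k, k).
    Valid convolution followed by a ReLU activation."""
    n_f, _, k, _ = filters.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((n_f, out_h, out_w))
    for f in range(n_f):
        for i in range(out_h):
            for j in range(out_w):
                out[f, i, j] = np.sum(x[:, i:i + k, j:j + k] * filters[f])
    return np.maximum(out, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((3, 16, 16))             # one image, three colour channels
filters1 = rng.normal(size=(4, 3, 3, 3))    # 4 filters in the first layer
filters2 = rng.normal(size=(8, 4, 3, 3))    # 8 filters in the second layer

h = conv_layer(image, filters1)   # stack of 4 feature maps, 14x14 each
h = max_pool(h)                   # condensed to 7x7
h = conv_layer(h, filters2)       # 8 feature maps, 5x5 each
h = max_pool(h)                   # condensed to 2x2
flat = h.reshape(-1)              # everything that remains, as one vector

dense_w = rng.normal(size=(2, flat.size))   # dense layer: two classes
probs = softmax(dense_w @ flat)             # e.g. pathology vs. no pathology
```

Because nothing here is trained, the two probabilities are meaningless; the point is only how the shapes shrink from image to vector to class scores.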
And I don't think we have the time for this, but I really, really encourage you to go to this interactive example, because you can actually see how the neural net works. It's really great. You can click on everything and get more detail, and you can really see how the information is flowing. Right. So, as I've pointed out before, filters in higher layers capture more and more general information. Well, we can use this property in a technique that's called guided backpropagation.
And it's just one of the possibilities to help interpret your neural net. Because, yeah, with so many parameters, it's impossible to look at every single weight and say, oh, I know what it means, I know what it's doing. Here you have to find a way to output the parts of the input which contributed to the network's decision. And we're using guided backpropagation and similar techniques. And you can really see: okay, the network said that this picture is a dog.
And when it said so, it was taking this information into consideration. It regarded this as important. There is an urban myth about one early machine learning experiment. The story goes that the researchers were trying to classify tanks, maybe by which country they came from. And on their training set, the model was very accurate. But later on, it just couldn't do it.
It was a complete mess. And once they looked deeper, they realised that the background was contributing more. So somehow all the tanks were always in snow or in a desert, and that's what the model learnt to pay attention to. It turns out that this is very likely just an urban myth. But it actually does happen. There is a real example where X-rays were marked very, very finely by a clinician, according to whether they had pathology or not.
It was just a little pen mark somewhere in the corner that nobody else noticed. Yet the model noticed it. And it had perfect performance. It didn't care what kind of image was on the X-ray. Just a mark means it's pathology. And then, of course, when new images started coming in that didn't have that mark, because they were from a different hospital, from a different clinician, the model was completely helpless. So it's really important, I think, to pay attention to these things.
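The guided propagation idea mentioned above (usually called guided backpropagation) can be sketched on a tiny fully connected network. This is a simplified illustration, not how you would do it on a real CNN, where a framework's automatic differentiation does the work. The weights, the four "pixels" and the function name `guided_backprop` are all made up, and the last pixel is deliberately set up to play the role of the pen mark.

```python
import numpy as np

def guided_backprop(x, w1, w2, target):
    """Return class scores and a guided-backprop saliency for a
    two-layer network: scores = w2 @ relu(w1 @ x)."""
    z1 = w1 @ x                       # pre-activation of the hidden layer
    a1 = np.maximum(z1, 0.0)          # ReLU
    scores = w2 @ a1
    grad_a1 = w2[target]              # d score[target] / d a1
    # Guided ReLU rule: pass a gradient back only where the forward input
    # was positive AND the incoming gradient is positive.
    grad_z1 = grad_a1 * (z1 > 0) * (grad_a1 > 0)
    saliency = w1.T @ grad_z1         # contribution attributed to each input
    return scores, saliency

# Four input "pixels"; the last one stands in for the pen mark in the corner.
x = np.array([1.0, 0.5, 0.0, 1.0])
w1 = np.array([[0.1, 0.1, 0.0, 2.0],
               [0.5, -0.5, 0.2, -1.0]])
w2 = np.array([[1.0, -1.0],           # class 0: pathology
               [-1.0, 1.0]])          # class 1: no pathology

scores, saliency = guided_backprop(x, w1, w2, target=0)
most_important_pixel = int(np.argmax(saliency))
```

With these made-up weights the saliency is largest for the last pixel, which is exactly the kind of warning sign you would want to see before trusting a model with suspiciously perfect performance.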
So we've talked a lot about how you can speed up the training with dropout and stochastic gradient descent. Yet when you have very little data, for example, or you're really pressed for time and compute power, you can have another shortcut. And that's called transfer learning. So the trick is that whatever your dataset is, is it animals, is it cars, is it X-rays, the first layers usually learn to detect edges and then just little properties of the images.
And if you think that they are common enough, that they are also common in your problem and your dataset, then you can actually use an already pre-trained model, like one trained on ImageNet, that you can load using the same frameworks that I've told you about. You can load it into your system and use the weights up to a certain layer. For example, for the first four layers we say, OK, we don't want to train them in our model. We just cut them and transfer them to our model.
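The transfer-learning shortcut can be sketched as follows. This is a toy version under loud assumptions: `pretrained_w1` is a random stand-in for weights that would really be loaded from a pre-trained model, the dataset is made up, and only one layer is transferred. The transferred layer is kept frozen and only a new output layer is trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for first-layer weights loaded from a pre-trained model.
pretrained_w1 = rng.normal(size=(8, 4))

# Small toy dataset for the new task (made up for illustration).
x = rng.normal(size=(40, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

w1 = pretrained_w1.copy()    # transferred layer, kept frozen
w_out = np.zeros(8)          # new output layer, trained from scratch
b_out = 0.0

def forward(x, w1, w_out, b_out):
    hidden = np.maximum(x @ w1.T, 0.0)                      # frozen layer + ReLU
    prob = 1.0 / (1.0 + np.exp(-(hidden @ w_out + b_out)))  # new sigmoid output
    return hidden, prob

def log_loss(prob, y):
    eps = 1e-12
    return -np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps))

initial_loss = log_loss(forward(x, w1, w_out, b_out)[1], y)

learning_rate = 0.05
for _ in range(300):
    hidden, prob = forward(x, w1, w_out, b_out)
    error = prob - y
    # Only the new layer is updated; w1 is never touched.
    w_out -= learning_rate * hidden.T @ error / len(y)
    b_out -= learning_rate * error.mean()

final_loss = log_loss(forward(x, w1, w_out, b_out)[1], y)
```

In practice you would load real weights with whichever framework you use and could later unfreeze some transferred layers to fine-tune them; this sketch only shows the freeze-and-train-the-rest pattern.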
And we use them and then continue the training further on, so that it shortens your time considerably. And the results are still very good, to the point that when I mentioned once that I want to train the model from scratch, people were like, why? Why would you do that? They are working just fine with transfer. All right. So that was a very quick introduction into CNNs. And we still happily have some time to talk about graph neural networks, because this is a very exciting area.
They really took off just a couple of years ago. There are new flavours being born every month. And the applications are really fascinating. But first, let's talk about what a graph is and how graphs are different from other inputs. So we know already that CNNs and RNNs can work on images or text or speech. But what if you have a graph? Right. So here we have a pretty complex and nonsensical graph. It was generated randomly. We have five nodes and they're connected.
The connections are actually directed. So you can go from one to zero, but not back. Right. We don't have an arrow that goes back. Not all graphs have to be so-called directed. It could be just a link. Then it can go both ways. And it can be anything. It could be, for example, that node one is a surgeon and node zero is a patient. Or, I don't know, this is a symptom and this is a disease, or vice versa, or this is a drug.
And this is a side effect. Or these are two proteins interacting directly. So you can encode so many things. But how would you do deep learning on them? Graph theory itself is very, very old. And even without deep learning, we can do amazing stuff there. We can count neighbours. This is the adjacency matrix, so you can see how the nodes are connected.
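A directed five-node graph and its adjacency matrix can be written down in a few lines. The edge list below is hypothetical (the graph on the slide is not reproduced here); it only keeps the property mentioned above that you can go from node one to node zero but not back.

```python
import numpy as np

n_nodes = 5
# Hypothetical directed edges (source, target); 1 -> 0 exists, 0 -> 1 does not.
edges = [(1, 0), (1, 2), (2, 3), (3, 4), (4, 1), (0, 3)]

# Adjacency matrix: adjacency[i, j] == 1 means there is an arrow from i to j.
adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
for source, target in edges:
    adjacency[source, target] = 1

out_degree = adjacency.sum(axis=1)   # how many arrows leave each node
in_degree = adjacency.sum(axis=0)    # how many arrows arrive at each node
neighbours_of_1 = np.flatnonzero(adjacency[1]).tolist()

# Dropping the direction: a link in either direction counts both ways.
undirected = ((adjacency + adjacency.T) > 0).astype(int)
```

For an undirected graph the adjacency matrix is symmetric, which is what the last line constructs.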
You can do community detection. In one of my previous projects, it actually was a very important aspect, on Twitter, for example. But if your graph is really, really complex and you want to do very, very subtle things in it, like maybe you want to assign classes to some nodes, and you have labels for most of them, but not for all, how do you do this with traditional methods? It's pretty tricky.
And for a long time, this was a real problem, because a graph is not structured, right? If you think about it, it's not an image with its pixels that are on a grid. It's not a text where every word is following another word. So how do you do convolution, for example? And you don't have to do convolution. Not all graph neural networks are graph convolutional networks. That's just one flavour.
But, yeah, it was a big challenge. And I think what helps here is to think that, well, an image is a grid. And what do we do in convolution on it? We just take information from these nine pixels, for example, depending on what our kernel is. And then we update it. We turn it into one pixel, one value.
So it turns out you can do the same on graphs, but instead of having a predefined grid and always needing nine or any other square number of nodes, you just say, well, I have this node and I have its neighbours. What you need, though, is that every node has some features. Say it's a patient, for example. Then you have their height, weight, age, blood pressure, temperature, and you need to make sure that every node has these features.
And also in the same order. And what you do when you want to do graph convolution: the first step is to take the node in question and extract the features from each of its neighbours. And then you can apply any kind of mathematical function that you choose, for example, averaging. And then you take the features on that node and, for example, average them with the averaged information from the neighbours. So essentially you're sloshing information about, because you also do it for every node. What would that be good for, though? Why would we need to propagate this information? It's called message passing, and it's useful in case you have labels on these nodes, but not on the one you're interested in. Then you want to classify this node. Is it a patient at risk? Is it a fraudulent account?
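One round of the averaging step described above can be sketched in NumPy. This is a bare-bones illustration with no learnt weights and no non-linearity, which a real graph convolutional layer would add. The four-node graph, the two features per node and the function name `message_passing_step` are assumptions made for the example.

```python
import numpy as np

def message_passing_step(adjacency, features):
    """For every node: average the neighbours' features, then average
    that with the node's own features."""
    degree = adjacency.sum(axis=1, keepdims=True)
    degree = np.maximum(degree, 1)                    # avoid dividing by zero
    neighbour_mean = (adjacency @ features) / degree  # collect and average
    return 0.5 * (features + neighbour_mean)          # combine with own features

# Undirected toy graph with links 0-1, 0-2, 1-2, 2-3.
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)

# Two features per node, the same features in the same order for every node.
features = np.array([[2.0, 0.0],
                     [4.0, 0.0],
                     [0.0, 6.0],
                     [0.0, 2.0]])

updated = message_passing_step(adjacency, features)
twice = message_passing_step(adjacency, updated)      # information travels further
```

After one step node 3 has only heard from node 2; after the second step it also carries information that originated at nodes 0 and 1, which is why stacking such layers lets labels on some nodes inform predictions on others.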
So, of course, what's very important in this case is that your graph, the connections in it, actually make sense. Right. These connections cannot be random. In an image, it's dictated just by the position of the pixels, and then it makes sense. Imagine you just scramble the pixels in the picture. You yourself would not be able to recognise whether it's a cat or a dog. So there the position makes sense. Here it has to make sense as well.
But yeah, essentially, this is one of the things you can do with graph convolution. What are other things that GNNs in general are good for? So, like I said, for example, node classification maybe can help you in disease diagnosis. Protein-protein interaction, drug-protein interaction: that would be called link prediction. So you have your nodes in the graph and you want to figure out whether they are connected or not.
There is also a very interesting technique called node embedding, where you just want to condense the feature space to represent clusters in your graph. Or you can classify whole graphs. Not just a node in the graph, but whole graphs. And that can help you, for example, with molecule class prediction: is it toxic or not? So there is a diversity of applications, and that was just medicine, right?
Graphs are useful in social network analysis, of course, in banking, fraud detection, all kinds of things, because there is a very large diversity of information that you can encode in a graph. So, yeah, a really fascinating field. And when preparing this lecture, I went through maybe 40 different resources, and I want to highlight just a few of them here. They are kind of split into chunks.
So these are very nice, user-friendly talks and blog posts on graph neural networks. Then we have a very interesting and very recent talk by Michael Bronstein. He is in London and he's very, very active in this area. Then number four is a very good review of methods and applications of graph neural networks. They really go through all the possible flavours and all the possible uses of graphs in deep learning. This is a book that is also very fresh, and it's available in print and online. And again, if you want to go out and do things and programme and train your models on graphs, there are already tools to do that. So there is the Deep Graph Library, which is very stable and works even on top of other DL frameworks. And, for example, for life sciences, DeepChem is a very useful collection of tools.
It's not exclusively deep learning on graphs, but that area occupies a large chunk of it. So, you know, I hope you've learnt some new things today, and for those of you who are already deep learning practitioners, I hope it was a good recap. I thank you very much for your attention. And I have one last link here, which is about attention networks, which is another very interesting technique in neural nets that allows them to improve their results even further.
And it looks like we even have some time for questions. I would especially welcome some more practical questions on machine learning in industry and this kind of stuff, because we wouldn't have time to go through all the topics in depth. But if you go through all the links that I've posted in the slides, you will know these topics better than I do. Maybe.
