A Theory of Weak-Supervision and Zero-Shot Learning

Jun 09, 2022 · 1 hr 4 min

Episode description

A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or very costly to obtain. To circumvent this problem, there is growing interest in developing methods that can exploit sources of information other than labeled data, such as weak supervision and zero-shot learning. While these techniques have obtained impressive accuracy in practice, in both vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experiments on various image classification tasks.

Transcript

Thanks for inviting me. Giving a talk online is not really my way, and I have been struggling with how to do it like this for a year. The reason I say this is that I will not be able to watch the chat while I'm giving the talk. So please, it's a small group, just jump in and ask questions. Don't be shy. Interrupt me whenever you want and ask questions. Okay, good.

I monitor the chat as well, so don't worry about that. Okay, thanks. So, some background. The standard complaint about academics used to be that they have amazing theory but it is not really relevant to practice. What is happening today in machine learning is actually the opposite. We have a lot of heuristics that seem to work (we'll talk in a second about what "seem to work" means), but the theory behind them is relatively weak.

And so my group and I, and many other people in theoretical machine learning, are trying to build this theory and understand why what works, works. And of course, once you learn why it works, you also learn when it works and when it doesn't work. And that's an important issue in machine learning these days, because

I think that people, also in industry, push software and claim that it does, you know, whatever, while, in particular when you use it for critical decisions, one has to be very aware of the limitations of the software. And of course, we all know the issues about bias and other social issues related to modern machine learning. But before we even get to those issues, this is basically a technical question.

If you have trained your software on some data and you don't know what it is doing, then you should be very careful about what you promise when you sell the software or publish it. OK, I think this audience doesn't need to hear that.

But when I talk to more practical people, I kind of start by saying: well, if the only thing you know about your algorithm is that it works well on your test set, then basically the only thing you know is that it works on that data, and you don't know anything else. In fact, any generalisation claim, of course, requires a model, and that's the starting point of a theory. If you don't have a model, you don't know what claims you can actually make about your software.

So my background is theoretical computer science. In theoretical computer science, basically, you prove theorems and write papers that don't look at data very much. I was also influenced by students to move into this area of machine learning. But the focus of what we do in our group is trying to build theories about it. The first step in a theory is trying to explain why this works, or why people think it works, and when it does not work.

And then we see that doing it in a more rigorous way is not a limitation. Naturally, when you really understand what you're doing, you can often get better algorithms. And then again, it's very important to quantify the limits of what works. In this particular talk I'm going to talk about a very specific, very hot area now in machine learning, which is called zero-shot and few-shot learning, or weak supervision.

This slide is usually at the end of the talk, but I want to emphasise one thing on it. To do this kind of work, I found out that the best way to do it is in collaboration with a very practical group. Among the names you see above is a rising star in experimental machine learning, and this work came out of a collaboration with him, funded by [INAUDIBLE]. [INAUDIBLE] here is my graduate student, and so is [INAUDIBLE].

[INAUDIBLE] and Andrew are undergraduate students who helped with the experiments, and they work with [INAUDIBLE]. Christina is a postdoc who is basically shared between the two groups. So while you'll see mostly theory in this talk, we'll also have some experimental results. And I just want to give credit: the experiments were not done by my group. The others have all the details and the expertise, and that's why we love working with experimental people. OK, so what's this hot subject?

Zero-shot and few-shot learning. If you open NeurIPS or ICML or ICLR, you see a lot of results on it. It's kind of the subject these days. OK, so we all know that the big bottleneck in machine learning is data. The way we were taught machine learning is that basically you learn from examples and try to generalise. And in almost all applications, particularly when you go to deep learning, you need a lot of data.

And the quality of your results highly depends on the quality and the size of your training data. But data is expensive: often you have to pay to get it, and sometimes you cannot get it at all. So under this big umbrella of zero-shot and few-shot learning, the idea is to learn with no examples, no training data, or with very few. OK. And it seems to have very impressive results in practice.

On the other hand, this whole idea of learning without data sounds, to a theoretician, like complete nonsense. So it's obviously very challenging to try to build a theory around it. OK, so that's what attracted our interest, just because to us theoreticians it looks completely absurd: how can you learn with no examples? And of course the name is actually somewhat misleading. It's true that you learn with no training data,

or with a very small training set, but you learn from some other information. So you have all sorts of other auxiliary information that you rely on in order to do the learning. OK. And since it's a relatively new area, there are a lot of variants that fall under this name. One of the things we did as theoreticians is, first of all, to abstract what's going on. So we are talking about two approaches.

One of them is few-shot learning, where the information comes from other classifiers, classifiers for different targets. We'll see the details of what that means. Then I'll talk a little bit about zero-shot learning, where the idea is that I have some information, in terms of attributes, about my target classification. OK. So let me give you some high-level examples before we get to the technicalities.

Say someone has already trained almost perfect classifiers for tomatoes, ducks, aeroplanes, ladders and so on. OK. So you have these tools in your hand. And now I tell you: oh, I need to classify fire trucks. Well, you seem to be able to learn what's a tomato and what's not, what's a duck and what's not, and so forth. So the idea is that I want to build on having these classifiers in order to classify a new target. Imagine that the classifications I already have classifiers for might be related.

And I want to somehow aggregate them in order to get a classifier for the new target. Another way this is done, which is almost the same but in practice looks very different, is that I have images of various subjects, and I have features associated with these images. [INAUDIBLE]. OK, so now, using these images, I can learn how to detect features like black, white, brown, stripes.

OK, so from these images I can learn classifiers for these features. Now I give you a new target, and I give you the target in terms of features. You already know how to find the features in an image, and the question is whether I give you enough information here so that you can more or less accurately classify with respect to, again, a new, unlearnt target. So this is kind of the high-level idea behind all this zero-shot learning.

And I'm going to go through the two approaches. But since it's a new area, many papers have somewhat different formalisations, and one of our first contributions is that we formulate it in a more general mathematical format. OK, so just presenting it as a picture: I get some unlabelled data X that I want to classify, and then I get some classifiers, but they are not for my target. So there is a target: I want to classify with respect to some target class.

I don't have a classifier for the target, but I get classifiers for related targets. OK, so this can be the colour, or whether it has [INAUDIBLE], or stuff like that, or other features that you can find in X. So now the only information that you have about X is its classification according to these features, or related targets.

And I want to use this information in order to give a labelling of X according to the new target. There are two ways to do it. In the case of few labelled examples, the only extra thing that you know about the target is a few examples, many fewer than what you would need to train a classifier for the target from scratch.

So this is the few-labelled-examples case. The other way is that you don't have examples at all. Then there's some other way in which you get information about the target: you get it in terms of an attribute matrix that gives you conditional probabilities relating the attributes to the target. We'll see the details. OK, so we'll see three results, and let's start with the weakly supervised one. OK, so now let's get a little bit more formal.

So we have a distribution, and we have a target classifier, which we don't know, defined on the domain X. And then we have a set of weak labellers, or classifiers, but they're not classifiers for our target; they are classifiers for something else. OK, now for each of these related labellers, or functions w_i, we use the few examples of the small training set in order to estimate the error of this classifier

with respect to the target. So if you think about it, say the classifier w_i is whether the image has some feature and y is whether the image is a fire truck; then the error of w_i is basically the probability that the image has the feature and is not a fire truck. Okay. So again, the only information that we need to have about X is its classification

according to these weak classifiers. So we are looking for a function that maps this vector (assume that everything is binary here, for simplicity, so we have a binary vector) to the target classification. And the error, of course, is the probability over the domain that the target classification is different from what we get from the function of this binary vector.

OK, so the first thing we observed when we started (and this is the first work that we did in this area) is what is done in practice; for those who know it, for example, there is a system called Snorkel. What they do is basically borrow from what's called crowdsourcing. OK, so what's crowdsourcing? Crowdsourcing is what Amazon is doing for you, and stuff like that. So you give images, say, to a lot of people who label them by hand.

They label the images, and there are many labellers, and somehow you have to handle errors, because some of the labellers are more careful than others, and some images are harder to deal with than others. OK. But what's common in crowdsourcing is that all the sources that you use are working on the same task. We ask them whether the image has a door or not, and they try to answer this question. Some of them are doing a better job than the others,

but basically they have the same goal: they have to classify an image according to a particular target. So a reasonable assumption in crowdsourcing, and all the algorithms in crowdsourcing use it, is that the errors, with respect to the real answer, are independent. The labellers are not coordinating to err in the same way, and why should they? We assume that they make mistakes because they are not careful, but the errors are independent. OK. Can we assume the same thing here?

Well, if I look at the event that the image has one feature and is not a fire truck, is this event independent of the event that the image has another feature and is not a fire truck? Of course they're not independent. OK, so this assumption that was used in so many important works is wrong here, and you can show that the error can be significantly larger. So assume that each individual labeller has error [INAUDIBLE] with respect to the target.

Okay, then if the labellers' errors are independent, then by taking the majority, the error basically goes exponentially down to zero. OK, but if they're not independent, that is not the case, and in fact there is no guarantee better than [INAUDIBLE]. So can we do better? [INAUDIBLE]

If you think about it a little more carefully, you see that independence is actually not necessarily the easiest case for us, or even the one that gives you the most information. So in this picture we have three classifiers, w_1, w_2 and w_3, and assume (of course we don't know that) that each of them has an error region [INAUDIBLE]. OK, so now, if the errors are not independent but disjoint,

then the majority is always, always correct. So that's good: you don't have independence, but the majority has no error. On the other hand, here is the independent case. OK, so each intersection of two error regions has positive probability, and now if you take the majority, then the result will have errors. So in fact, independence is not so great if you want to take majorities. So how do we leverage this phenomenon? OK.

So you can already see where we're going. Let's look at this even worse example. Assume that we also have classifiers w_4 and w_5 whose error regions are subsets of the error region of w_3. OK, so now if we take a majority of all five, then it is going to have errors. But if we eliminate these two classifiers, four and five, that are included in three, then suddenly the majority is perfect again.
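The three cases described above (disjoint errors, independent errors, and extra classifiers whose errors sit inside the errors of a third) can be checked on a toy domain. This is only a sketch: the 12-point uniform domain, the error sets, and the function names are illustrative assumptions, not taken from the talk.

```python
def majority_error(error_sets, n_points):
    """Fraction of points on which more than half of the classifiers are wrong.

    error_sets[i] is the set of points (out of range(n_points), all equally
    likely) on which classifier i disagrees with the binary target.
    """
    k = len(error_sets)
    wrong_points = 0
    for x in range(n_points):
        wrong_votes = sum(1 for errs in error_sets if x in errs)
        if 2 * wrong_votes > k:
            wrong_points += 1
    return wrong_points / n_points


def independent_majority_error(eps):
    """Majority-of-three error when each classifier errs independently with probability eps."""
    return 3 * eps ** 2 * (1 - eps) + eps ** 3


N_POINTS = 12
# w1, w2, w3 each err on 3 of 12 points (error 0.25), on disjoint regions.
disjoint_three = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
# w4 and w5 err only inside the error region of w3 (here: on all of it).
nested_five = disjoint_three + [{6, 7, 8}, {6, 7, 8}]

err_disjoint = majority_error(disjoint_three, N_POINTS)
err_nested = majority_error(nested_five, N_POINTS)
err_independent = independent_majority_error(0.25)
print(err_disjoint, err_independent, err_nested)
```

With these toy numbers the disjoint triple has majority error 0, three independent classifiers with the same individual error have majority error of about 0.156, and adding the two nested classifiers raises the majority error to 0.25.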

OK. But of course we cannot do that, because we don't know where the errors of the classifiers are with respect to the new target. Of course we can do it in pictures, but we don't really know that. So our goal is to start with this set of weak classifiers that we get, and the idea is to find a subset whose errors are almost disjoint. So we want to get to this basic situation: out of a lot of classifiers, we want to pick a few classifiers with almost disjoint errors.

But of course, we don't know the errors. OK. Then, after we find classifiers whose errors are almost disjoint, we just take the majority as before. So the whole idea is that we take a majority of a well-chosen subset instead of a majority of the whole set of classifiers. But of course, we don't know the errors, and we don't know whether they are disjoint or not. What we do know is that these classifiers disagree on more points, more places in the domain, than these classifiers.

OK, these two classifiers basically agree almost everywhere, while these classifiers disagree on a region of [INAUDIBLE] measure. OK. Now, the important thing is that to measure how much disagreement you have between two classifiers, you don't need labelled data. You can do it on unlabelled data alone, and unlabelled data is cheap; the expensive one is labelled data.
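Estimating disagreement from unlabelled data alone can be sketched as follows. For binary labels, two classifiers disagree exactly when one of them is wrong and the other is right, so P(w_i ≠ w_j) = ε_i + ε_j − 2·P(both wrong): for fixed individual errors, a larger disagreement means a smaller overlap of the error regions. The function names and the toy predictions are illustrative assumptions.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabelled points on which two classifiers output different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def disagreement_matrix(all_preds):
    """Pairwise disagreement rates; needs predictions only, no target labels."""
    k = len(all_preds)
    return [[disagreement_rate(all_preds[i], all_preds[j]) for j in range(k)]
            for i in range(k)]


# Toy predictions of three weak classifiers on six unlabelled points.
w1 = [1, 0, 1, 1, 0, 0]
w2 = [1, 0, 0, 1, 0, 1]
w3 = [1, 0, 1, 1, 0, 0]
D = disagreement_matrix([w1, w2, w3])
```

Here `w1` and `w3` never disagree, so keeping both in a majority vote adds no information, while `w1` and `w2` disagree on a third of the points.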

So what happens now is that we're going to leverage unlabelled data to get information that will help us find more or less disjoint classifiers, so that we can take a majority of them and get much better results. OK, so now to get a little bit more formal. Again, we have the weak labellers, we have a lot of unlabelled data, and we have a small set of labelled data. Now for each labeller, we know its error;

that is, we have an estimate of the error of each labeller with respect to the target, because we use the labelled data to estimate it. Now, it's important that here we are estimating only a linear number of quantities: we estimate a linear number of errors. Then we estimate the matrix of pairwise disagreements. That is many more quantities, but we can estimate them using unlabelled data. OK, so now what's the error that we would get? If we choose a subset A out of all the

classifiers and take the majority of this subset, then what is the worst-case error? Well, it's the error of this majority, but maximised over what? Over all possible sets of classifiers that satisfy this vector and this matrix. OK, so we take the max over all possible sets of weak labellers that are consistent with what we know, and the only thing that we know about them is the vector of errors

and the matrix of disagreements. So the worst-case error here is the max, over all sets of labellers that satisfy these epsilons and these disagreements, of the error of the majority on that subset. OK, so far this is just a definition; what can we do with it? I won't go through the details, but basically you can figure out the maximum error using a linear program. In the linear program, the number of constraints you have is a function of the size of A.

OK. So now, for a given subset, we can figure out the maximum error that we can get. And again, I emphasise that we don't really know how the errors are placed with respect to the target or the distribution. So the only thing we can do is take the max over all possible configurations consistent with what we get from the weak classifiers.

OK. So for a particular subset, I know how bad it can be, and I want to find the best subset, so I take the min over all subsets. So now I can optimise the A that I choose, and for each A that I choose I know what the worst-case error for that one would be. OK. Of course, this is not easy, and in fact the solution for this would be exponential: exponential in the size of the subset.
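The min over subsets of the worst-case majority error can be illustrated by brute force on a tiny example. This is not the linear program mentioned in the talk: as a simplifying assumption, the sketch enumerates every binary labelling of a handful of unlabelled points, keeps those on which each weak classifier has error within a tolerance of its estimate, and scores each triple by its worst case over that set. All names, the tolerance, and the toy data are hypothetical.

```python
from itertools import combinations, product


def error_rate(preds, labels):
    """Fraction of points where preds and labels differ."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def majority(preds_subset):
    """Pointwise majority vote of an odd number of binary classifiers."""
    k = len(preds_subset)
    return [int(2 * sum(column) > k) for column in zip(*preds_subset)]


def feasible_labelings(all_preds, eps_hat, tol):
    """Yield every binary labelling consistent with all estimated error rates."""
    n = len(all_preds[0])
    for y in product([0, 1], repeat=n):
        if all(abs(error_rate(p, y) - e) <= tol for p, e in zip(all_preds, eps_hat)):
            yield y


def worst_case_error(subset, all_preds, eps_hat, tol):
    """Largest majority-vote error of `subset` over all feasible labellings."""
    vote = majority([all_preds[i] for i in subset])
    errors = [error_rate(vote, y) for y in feasible_labelings(all_preds, eps_hat, tol)]
    return max(errors) if errors else None


def best_subset(all_preds, eps_hat, tol, size=3):
    """Subset of the given size with the smallest worst-case error (min over max)."""
    scored = []
    for subset in combinations(range(len(all_preds)), size):
        wc = worst_case_error(subset, all_preds, eps_hat, tol)
        if wc is not None:
            scored.append((wc, subset))
    return min(scored) if scored else None


def flip(labels, positions):
    """Copy of labels with the entries at `positions` flipped."""
    return tuple(1 - v if j in positions else v for j, v in enumerate(labels))


# Toy data: 8 unlabelled points and 5 weak classifiers, each wrong on one point.
y_true = (1, 1, 1, 1, 0, 0, 0, 0)
weak_preds = [flip(y_true, {0}), flip(y_true, {1}), flip(y_true, {2}),
              flip(y_true, {0}), flip(y_true, {0})]
eps_hat = [0.125] * 5

best = best_subset(weak_preds, eps_hat, tol=0.01)
redundant = worst_case_error((0, 3, 4), weak_preds, eps_hat, tol=0.01)
```

The enumeration costs 2^n labellings and one pass per candidate subset, so it only serves to make the min-max definition concrete. In this toy case the triple with disjoint errors wins, and the triple of three identical classifiers keeps their shared error.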

OK, so if you want larger subsets, it just becomes a more and more expensive computation. Okay. But if we are willing to settle for subsets of size three... Yes, I think there is a question. Sorry, excuse me for interrupting. Could you go back to the slide where you introduce your variables? Yes, this one.

Yes. First, I have a question about the definition: you define X with distribution D. Is this the overall distribution of all the data sets of the different tasks? This is the distribution of the inputs that we need to classify. If I understood correctly, for your problem you are going to consider different tasks as input. Yes, they are different, but we assume for simplicity (we can also do transfer,

but let's keep it simple) that the tasks are defined on the same distribution. Understood. OK. Think about a huge database of images. Mm-hmm. There are all sorts of things in the images, but the distribution is over all images. OK. And in your analysis, do you consider any sample complexity, I mean, a restriction on the number of, for example, unlabelled data or labelled data? Or is it independent of the number of unlabelled data?

Well, OK. So the labelled data should be large enough so that I get a relatively good estimate for the epsilons. And obviously I need enough unlabelled data in order to get, with high probability, a good estimate for the disagreements. OK, and then [INAUDIBLE]. OK, thank you. OK, so.
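How large "large enough" is can be made concrete with Hoeffding's inequality, which gives one standard way to size the labelled sample. The sketch assumes i.i.d. labelled examples, a loss bounded in [0, 1], and a union bound over the weak classifiers; the function names and the confidence parameter `delta` are illustrative and not taken from the talk.

```python
import math


def hoeffding_margin(m, num_classifiers, delta):
    """Margin gamma such that, with probability at least 1 - delta, every one of
    the num_classifiers error estimates from m labelled examples is within gamma
    of its true value (two-sided Hoeffding bound plus a union bound)."""
    return math.sqrt(math.log(2 * num_classifiers / delta) / (2 * m))


def labelled_samples_needed(gamma, num_classifiers, delta):
    """Smallest m for which hoeffding_margin(m, ...) is at most gamma."""
    return math.ceil(math.log(2 * num_classifiers / delta) / (2 * gamma ** 2))


m_needed = labelled_samples_needed(gamma=0.05, num_classifiers=3, delta=0.05)
margin_at_m = hoeffding_margin(m_needed, num_classifiers=3, delta=0.05)
```

The margin shrinks like 1/sqrt(m), and the dependence on the number of weak classifiers is only logarithmic.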

This solution is not fully satisfying, really, for computer scientists, because if I really want to optimise with respect to large subsets, this might be exponential in the size of the subset that I'm considering. But at least for three labellers we can get a closed-form solution; we don't even have to run the linear program. How good is it? So now we pause the theory for a second and get to the experiments.

This was done on a standard benchmark of images. You have all sorts of animals in this data set that you classify. OK, and what do we have? [INAUDIBLE] is what I explained. The other two are extensions, which are not fully analysed, that go beyond three, where you add more and more weak classifiers.

OK. And what you see here (which, of course, is not a theorem, it's empirical) is that on this particular data it seems like three is as good as many. If you choose the optimal three weak classifiers, you already get as much information as if you use more than that. But again, this is just one data set. Can I just ask: what happens if you only use two labellers? Oh yeah. OK, so I was going to talk about that later, but here you go.

So what we use here is the marginal errors and the pairwise correlations. OK, now you want more than just the pairwise correlations in order to get more. Well, we'll see that when we get to a more sophisticated algorithm. OK? And then the yellow one here is the standard crowdsourcing algorithm, which is also used in this context, and which basically assumes independence.

And if you assume independence, and you take only some of the labellers so that you don't have too many of them, then you get a lot of errors. Somehow, in the example we have (and again, this is not theory), when you go to many more labellers, the majority seems to perform much better. So maybe somehow it gets closer to independence. But still, even with a lot of labellers, and that is a very heavy computation,

it still doesn't beat an algorithm that just uses the best three and takes their majority. OK, so that result is nice, but it's very limited. It was kind of our first paper in that area, but it was only for binary classification, only for the zero-one loss function, and it has a lot of other limitations, for example the complexity. So the next work, which I will discuss now, is a more general framework for building on weak labellers.

Now we are going to use other loss functions, and we are also going to be doing multiclass classification, meaning you classify between more than two classes. OK, so here's the high-level idea, and that's really what I want to emphasise. We have a set of unlabelled data that we want to classify. So let this green region be all the possible labellings: each point here is a vector of labels for all the unlabelled data.

At this point, we have all possibilities. Now we're going to leverage a small amount of labelled data, in two ways. First, I'm going to use the labelled data to get an estimate of the error of each of the weak classifiers with respect to the target, which I can do because the labelled data is labelled with respect to the target. OK, so the loss on X and Y is the expected loss of... oh, there's a mix-up here. This is the expected loss;

this should be here, and this should be here. OK, so the loss on X and Y is the expected loss of w_i with respect to the target. And now comes the second way. Given a weak classifier w_i and a labelling y of all the unlabelled data, since we know w_i, we can estimate the error of w_i with respect to this labelling vector y.

Okay, so again: here we are estimating the error of w_i, the weak labeller, with respect to the target. [INAUDIBLE]. Here we estimate the error of a particular labelling vector y with respect to w_i. Why do we do that? Well, we'll get to a more formal way to say this later, but we can assume that the error of w_i on the correct labelling y should be at about the same level as what we get on the labelled data.

OK, so here we have the error of w_i with respect to the target on the labelled data. Now, if y is the correct labelling, then the error of w_i on y should be about this value. So again: using the labelled data, we got an estimate of the error of w_i with respect to the target. Now, if this generalises, then for the correct labelling y of the unlabelled data, we expect to get the same kind of error for w_i.

OK, if w_i had this error with respect to the target on the labelled data, it should have about the same error on the correct labelling of the unlabelled data. So what happens here is that for each of the labellers, we get a subset of the whole domain of labellings, in which w_i has about the same error as it has on the labelled data. And the true classification has to be in the intersection of all these sets. Questions? I know I'm kind of rushing it.

OK, we'll see it again here. So let Y be the set of possible solutions that we consider. What do we know? To be more formal: it's a set of vectors y, with one entry for each of the n unlabelled inputs, and k is the number of possible classes. We assume that y gives us a vector of probabilities, so we are doing soft classification.

OK, so for any particular input x, y gives a vector of size k with the probabilities that x is in each one of the k classes. Now, if y is a feasible solution, then we expect the error of w_i on y to be the same as, or close to, what we saw as the error of w_i on the labelled data. This value we estimated from the labelled data. So we assume that the real value, on the true labelling of the unlabelled data, is about the estimated value plus or minus some error.

OK, so if we choose the margin correctly, then what we get is that, with very high probability, the true solution has to be inside this set. And this set is basically the intersection, over i, of the sets of y that satisfy this inequality. So you take the same error margin for every w_i? Yeah, that's one possibility. Since, again, we don't have any better information, there's no reason to treat different labellers differently.

If I knew that the errors have different distributions or something, then I could use that, but I don't have the information. This is kind of a worst-case analysis; I'm really analysing using only what I have. OK, so basically now I build a set which hopefully is much smaller than the total set of all possible labellings, and with high probability the correct labelling is inside this set, which is the intersection of all the sets defined for the particular labellers.

Questions so far? Excuse me, n here is the number of unlabelled data, yes? [INAUDIBLE] and m is the number of labelled data. OK? Yeah. And if m is going to infinity and we consider n to be constant, then gamma is constant? Well, if m goes to infinity, then I'm back in standard supervised learning.

Yeah, exactly. I mean, in this formulation, if m is going to infinity, we again have a constant gamma. However, we expect that gamma should go to zero when m is infinite. Yes. So you're right: you can use Hoeffding or a related bound and get a better margin. Yes. OK, thank you. Good, that's a really good point.
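The feasible set discussed above, with the margin gamma raised in the question, can be written down as follows. The notation is an assumption made for this sketch (the slide's exact symbols are not recoverable from the transcript): x_1, ..., x_n are the unlabelled points, Δ_k is the probability simplex over the k classes, w_1, ..., w_r are the weak classifiers, ℓ is a bounded loss, and ε̂_i is the error of w_i estimated on the m labelled examples.

```latex
% Empirical loss of weak classifier w_i against a soft labelling y (assumed notation).
\[
  \hat{L}(w_i, y) \;=\; \frac{1}{n} \sum_{j=1}^{n} \sum_{c=1}^{k}
      y_j(c)\, \ell\bigl(w_i(x_j), c\bigr),
  \qquad y = (y_1, \dots, y_n) \in (\Delta_k)^n .
\]
% One constraint set per weak classifier, and the feasible set as their intersection.
\[
  \mathcal{Y}_i \;=\; \bigl\{\, y \in (\Delta_k)^n :
      \bigl|\hat{L}(w_i, y) - \hat{\epsilon}_i\bigr| \le \gamma \,\bigr\},
  \qquad
  \mathcal{Y} \;=\; \bigcap_{i=1}^{r} \mathcal{Y}_i .
\]
```

Under this notation each constraint is linear in y, so the feasible set is an intersection of half-spaces with a product of simplices and is therefore convex.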

Okay. So we know that the solution will be inside this set. In a perfect world, it would contain one vector. Of course, in reality it's going to be a set, hopefully not too large; we'll quantify that later. So the next thing we do is this: we consider some function with parameter theta as the solution. So we have a set of possible solutions, functions parametrised by theta.

OK. And we want to basically do a minimax optimisation on the loss. Of course, we take the minimum over theta, but then we have to take the maximum over y, going over all the feasible labellings. OK, so first of all, assume that we fix theta. Then finding the maximum among all the feasible y's can again be done with a linear program; that's not the issue. The hard part is the minimum over theta.

That is hard in general, but even if we assume convexity, we still have the problem that the max may not be differentiable, so we can't do gradient descent. I won't go into detail, that's a completely different subject, but there is this technique of subgradient analysis, which basically estimates a gradient when you cannot assume differentiability. Okay. And you can show that in a number of steps that is basically polynomial in the parameters, you are going to get

an estimate, theta tilde here, that is going to be very close to the minimax solution. I don't have time to get into this; this subgradient method is actually a topic of its own. I think you can trace it to Newton, but it's very practical when you cannot assume that everything is differentiable. OK? So in polynomial time, we can get a parameter theta tilde

such that the function with parameter theta tilde is almost a solution to the minimax problem. When can we do it? Okay, so we need the set, of course, to be convex, and we need the loss function to behave smoothly; in particular, we do it for Lipschitz-continuous functions. Again, you can choose other restrictions, but you need something, in particular convexity, to get anywhere.
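The difficulty with a non-differentiable max, and the way a subgradient step handles it, can be seen on a one-dimensional toy problem. This is a generic textbook subgradient method, not the algorithm from the talk: the objective f(theta) = max_j |theta - c_j| stands in for the worst case over feasible labellings, and the names, step-size rule, and data are assumptions.

```python
def subgradient_minimax(centers, steps=2000, step0=1.0):
    """Minimise f(theta) = max_j |theta - c_j| with the subgradient method.

    f is convex and Lipschitz but not differentiable at its minimiser, so plain
    gradient descent does not apply. At each step, a subgradient of the piece
    that attains the inner max is a valid subgradient of f.
    """
    theta = 0.0
    best_theta = theta
    best_val = max(abs(theta - c) for c in centers)
    for t in range(1, steps + 1):
        # Inner maximisation: the worst-case centre for the current theta.
        worst = max(centers, key=lambda c: abs(theta - c))
        # Subgradient of |theta - worst| with respect to theta.
        g = 1.0 if theta > worst else (-1.0 if theta < worst else 0.0)
        # Diminishing step size, the usual choice for subgradient methods.
        theta -= step0 / t ** 0.5 * g
        val = max(abs(theta - c) for c in centers)
        # The iterates are not monotone, so keep the best one seen so far.
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val


theta_tilde, value = subgradient_minimax([1.0, 3.0, 7.0])
```

For these centres the exact minimiser is theta = 4 with value 3, the midpoint of the two extreme centres; the returned pair approaches it as the number of steps grows.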

And which loss functions satisfy this? The popular loss functions that satisfy this condition are softmax and, you know, some other loss functions listed in the supplement. OK. Now, I just want to say one word about derandomisation. So far, and I didn't elaborate too much on it before, we gave a solution which is a soft classification. But you are entitled to ask how far this is from a hard classification.

If I just want the zero-one loss, I cannot optimise it directly, because it is not continuous. But we can actually get a pretty good bound on that. And the bound relates the zero-one error to the error that we get from the guarantee for the subgradient method. The complexity, of course, has to be in there: a covering number or [INAUDIBLE], something like this has to be there, and then you have the whole bound, OK?

And the last thing I want to emphasise is that this kind of analysis gives you related information that is very interesting on its own. This measure is a measure of the size of the feasible set.

And the size of the feasible set basically says how good the information that you gave me was. If you gave me weak labels that are useless, then my feasible set would be the whole space. If you gave me very good related labels, so that together they really pinpoint the target, then the feasible set would get smaller. So this is a way to measure that. OK, and this will take us to the last thing, if you allow me to continue for another few minutes. Can I? Yes, I think that is fine.

And so if someone needs to leave and would like to ask an urgent question, maybe we can just give them the chance to ask one now. Sure, sure. OK, that doesn't seem to be the case. Nobody's showing up, so please do continue. Thank you. OK, thanks. So again, there are simulations; the only thing I want to show is this. OK, so here we actually use this. Again, the method is meant for multiclass classification, but there are very few algorithms to compare with in the multiclass setting.

So first, we look at what happens with this algorithm just for binary classification. And remember that this is essentially what we did before. OK. And what you see here is that we do get somewhat better results than the algorithm that just takes the majority. So even for binary classification, this more detailed analysis gives you better results. And these are the results for non-binary classification.

I don't want to get too much into this. OK. So the last thing I want to say, really briefly, is that we looked at zero-shot learning, which is, you know, really the ultimate hype these days. And we wanted to ask the following question: what information do you need to give me so that I can give you a result that is meaningful? OK. So again, what is zero-shot learning? You get a lot of images, and from the images you learn attributes.

Think of them like classifiers. So, you know, you learn whether it has [INAUDIBLE], or whatever. OK, so from the images, what you learn is how to find these attributes in images. So basically, your input is, first of all, an algorithm, a known technique, for identifying a set of attributes in images. OK, so you can think about it as, you know, the related data that is already classified, and you use it in order to understand the target.

OK, now we don't have any examples, any training data, for the target. So in order to figure out, you know, the target, what we get is a matrix that relates the attributes to the different classes that we want to distinguish between. OK, so a matrix that basically says, well, you know, if it is a zebra, then with this probability you'll see a tail, and with this probability you'll see [INAUDIBLE].

Remember that images are confusing, because a zebra might have a tail, but not every image of a zebra would show the tail. OK. So you get this matrix of, basically, conditional probabilities. OK. Now we want to ask, basically, how well do I do? OK, so the previous work that was done basically looks at it the following way. Here are the targets. Here are the features that we know how to find. And here is the error in finding these features in images.

So here is how well we find the features in the image. And here is the mapping, or the matrix, that relates the features to the classes, the target classification. OK, so basically this is the input, this is a process that extracts features from the input, and this is the process that maps features into the classes, the target classes. OK.
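The second stage of this pipeline, mapping detected attributes to a target class through the class-attribute matrix, can be sketched as below. This is only an illustration under loud assumptions: the attribute detector's output is taken as given, the class names, attribute names, and probabilities are made up, and the attributes are treated as independent given the class, which is a simplification of the sketch and not something the talk assumes.

```python
import math

# Hypothetical class-attribute matrix: P(attribute is present | class).
# Attribute order: ("tail", "stripes"). All numbers are invented for illustration.
CLASS_ATTRIBUTES = {
    "zebra": [0.90, 0.95],
    "horse": [0.90, 0.05],
}

def log_likelihood(attrs, probs, eps=1e-9):
    """Log-probability of a binary attribute vector, assuming independent attributes."""
    total = 0.0
    for a, p in zip(attrs, probs):
        total += math.log(max(eps, p if a else 1.0 - p))
    return total

def classify(attrs, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose row of the matrix best explains the detected attributes."""
    return max(class_attributes,
               key=lambda c: log_likelihood(attrs, class_attributes[c]))
```

With these made-up numbers, `classify([1, 1])` picks `"zebra"` and `classify([1, 0])` picks `"horse"`; the tail attribute carries no information because both rows agree on it.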

Yeah. And the previous work, [INAUDIBLE], this paper, basically says the following. It basically showed, mostly through experiments, that, well, you know, if A is basically an identity matrix, then sure, you can classify, everything would work great. On the other hand, if the attributes are completely orthogonal to the classes you want to classify, then of course there's nothing you can do.

So just, for example, think about this feature and this feature, this attribute and this attribute. Well, you know, you can't get much out of these if you want to distinguish between a zebra and [INAUDIBLE]. OK. So this was the result, and we wanted, you know, a more detailed view. OK, so what we assume here concerns the first phase.

The first phase, the classifier that, given an image, gives you the attributes, [INAUDIBLE]; that's a different problem, but assume that this works fine. What we really want is to understand the error in the mapping between the features, or attributes, and the classes, because this is inside the [INAUDIBLE] of the whole method.

Do I get enough information in the features, or the attributes, that I see in order to actually distinguish between the classes that I need to distinguish between, the targets? OK. And in most of the classical work this is completely ignored, completely ignored: the question of whether the attributes that you see completely define the targets that you want to classify. OK.

So here is the formalisation. Again, we have a domain X with a k-class classification, and again, we have this set of attributes. OK, each attribute takes values in some domain, maybe zero-one for having a tail or not having it, stuff like that. OK. And what we get is this matrix, which is the class-attribute matrix. OK? And basically the entry a i j is the probability that feature j is one, conditioned on the target classification being i.

So in other words, what's the probability that, when I have an image of a zebra, I actually see the tail? Stuff like that. OK, so that's the matrix. So now I look at all the possible distributions. Again, I don't know anything about the input, so I have to look at all the possible distributions over, you know, the features, or the attributes, and the k classes.

OK. And out of all the possible distributions, we restrict ourselves to the distributions that satisfy the conditions defined by the matrix. So what has to be satisfied? Well, the probability of a particular vector with a particular classification has to agree, basically, with the marginal distributions given by the matrix. OK, so it's the sum, over all the vectors in which attribute j is one, of the probability of that vector given class i, and this should be a i j.
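Written out, the marginal constraint reads as follows. The notation is assumed for illustration and is not taken from the slides: there are $n$ binary attributes and $k$ classes, $x \in \{0,1\}^n$ is an attribute vector, $D$ is a joint distribution over pairs $(x, y)$, and $a_{ij}$ is the entry of the class-attribute matrix for class $i$ and attribute $j$.

```latex
\sum_{x \in \{0,1\}^n \,:\, x_j = 1} \Pr_{D}\bigl(x \mid y = i\bigr) \;=\; a_{ij}
\qquad \text{for all } i \in \{1,\dots,k\},\; j \in \{1,\dots,n\}
```

Each constraint fixes one marginal and says nothing about how the attributes are correlated within a class.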

Yeah. OK, so this defines, you know, the distributions that satisfy the matrix. And of course, for a particular classification function, a particular function from the attributes to the classification, we can define its error over the distribution. Now again, as before, what we want to find is the lower bound, and the lower bound, again, is the max over all distributions that satisfy the conditions defined by the class-attribute matrix.

And then we take the min, over all functions, all classifiers, of the error for this function and this distribution. Think about it in probabilistic terms. Think about what it is that we know. We know the marginal distributions. OK, we know the probability of a one for each feature, conditioned on the class. We don't know the correlation between the events. OK, so basically what we look for is the worst-case distribution subject to the marginals that we know. OK, so we only know the marginals.
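In symbols, the quantity described here can be sketched as below. The notation is an assumption for illustration: $\mathcal{D}_A$ denotes the set of joint distributions over attribute vectors and classes whose per-class attribute marginals match the class-attribute matrix $A$, and $f$ ranges over maps from attribute vectors to classes.

```latex
\mathrm{LB}(A) \;=\; \max_{D \in \mathcal{D}_A} \; \min_{f} \; \Pr_{(x, y) \sim D}\bigl[f(x) \neq y\bigr]
```

The inner minimum is the error of the best classifier for one distribution; the outer maximum picks the distribution, consistent with the known marginals, for which even that best classifier does worst.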

So we have to assume the worst case: we look at the maximum error with respect to all the distributions that satisfy these marginals. OK. And then we assume that, for this particular distribution, we actually choose the best classifier. So this is all, again, so far just theory. The real thing here is that we are able to compute this lower bound. And why is that important? Currently, you get a zero-shot algorithm, you run it, and you get the result.
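A toy computation shows why knowing only the marginals leaves the error undetermined. It is a sketch under assumptions that are not from the talk: two equally likely classes, two binary attributes, and invented distributions. The two joint distributions below have identical class-attribute matrices, yet the best classifier has error 0.5 under one and error 0 under the other.

```python
def marginals(cond):
    """P(attribute j = 1 | class) per class, from conditional distributions over vectors."""
    n = len(next(iter(cond[0])))
    return [[sum(p for x, p in dist.items() if x[j] == 1) for j in range(n)]
            for dist in cond]

def best_classifier_error(cond, prior):
    """Error of the best map from attribute vectors to classes: 1 - sum_x max_y P(y) P(x|y)."""
    vectors = set().union(*[dist.keys() for dist in cond])
    correct = sum(max(prior[y] * cond[y].get(x, 0.0) for y in range(len(cond)))
                  for x in vectors)
    return 1.0 - correct

prior = [0.5, 0.5]  # assumed uniform class prior

# Attributes independent within each class: the two classes look identical.
independent = [
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
    {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
]

# Same marginals, but class 0 always has equal attributes and class 1 unequal ones.
correlated = [
    {(0, 0): 0.5, (1, 1): 0.5},
    {(0, 1): 0.5, (1, 0): 0.5},
]

assert marginals(independent) == marginals(correlated)
print(best_classifier_error(independent, prior))  # 0.5
print(best_classifier_error(correlated, prior))   # 0.0
```

Someone who is handed only this matrix cannot rule out the first distribution, so no guarantee better than error 0.5 is possible for it, even though the real data might look like the second.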

You don't have a clue how good you are. You don't have a clue how much you should trust this answer, because there is no error estimate associated with it. So what we do here is give you some handle on how much you should trust it. Well, what is the risk in using this input and this output? Why is it a risk? Well, you might be in a better situation, we don't know. But in the worst case, that's the error; that's at least the error that you can have.

OK, so in other words, it might be that you have a better input. But the worst case over the inputs will give you this. And since you don't know the distribution, you might as well assume that this is the error that you have when you use the method. And we can also show that this is tight. In order to show that it is tight, you have to work with randomised classifiers, in order to get a useful minimax result. OK. So in other words, you give me the standard thing.

You basically give me a way to learn the attributes, and a mapping from the attributes to the target classes. OK, that's the only thing I have. And I ask: if I use what you give me, and if I use it in the optimal way, what can I guarantee about the correctness of the results? And what we show is that if you don't know anything more about your distribution, about your input, then you can't guarantee to do better than this lower bound.

Of course, there may be cases in which you do better, but you can't guarantee it, because you don't have the information. OK. It's easy to solve it also for the [INAUDIBLE] case. And again, to see what happens in practice, I just want to say one word here. So this is the lower bound result for some data sets, [INAUDIBLE] and other data. OK. And these are various algorithms. OK. And the errors of all of these various algorithms in practice are indeed [INAUDIBLE] the lower bound.

OK, and another way we use it, which I didn't talk about in detail, is that we can actually use this analysis to figure out which among the k classes are the ones that are hard to distinguish, because it might be that you have a large error only because two particular classes are not distinguishable, while all the other classes are well classified. And so there is a way to actually quantify which classes are the ones that give you the error.

OK. I think I took too much of your time, so I'm through. Any questions? Well, thank you very much. First of all, it's a very nice talk. Thanks. I will just stop the recording now.

Transcript source: Provided by creator in RSS feed: download file