Strachey Lecture: Doing for our robots what evolution did for us

Mar 29, 2019 · 55 min

Episode description

Professor Leslie Kaelbling (MIT) gives the 2019 Strachey lecture. The Strachey Lectures are generously supported by OxFORD Asset Management. We, as robot engineers, have to think hard about our role in the design of robots and how it interacts with learning, both in 'the factory' (that is, at engineering time) and in 'the wild' (that is, when the robot is delivered to a customer). I will share some general thoughts about the strategies for robot design and then talk in detail about some work I have been involved in, both in the design of an overall architecture for an intelligent robot and in strategies for learning to integrate new skills into the repertoire of an already competent robot.

Transcript

Okay, George. Welcome to the Hilary term 2019 Strachey Lecture, organised by the Oxford Computer Science Department. First of all, I'd like to say a huge thank you to our sponsors, Oxford Asset Management, who make these lectures possible and make it possible for us to invite very distinguished speakers from across the world. This is the fourth year that they're supporting these lectures, so we're very grateful for that support.

And I'd also now like to welcome our speaker, Professor Leslie Pack Kaelbling, who is the Panasonic Professor of Computer Science and Engineering at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. Okay, I hope I spelt that out correctly. You'll be wanting to hear what she has to say.

But just a very brief introduction from me, to say that Kaelbling has made research contributions across a huge range of areas, from decision making under uncertainty to learning and sensing, with applications to robotics. Her research has a particular focus on reinforcement learning and planning in partially observable domains, and she has received the IJCAI Computers and Thought Award. She's been elected a Fellow of AAAI, and she was founder and editor-in-chief of the Journal of Machine Learning Research.

So I'm delighted to be able to welcome her here today to talk about doing for our robots what nature did for us.

All right. Thank you so much. Okay, let me not reverberate. Thank you for inviting me. I'm excited to have the chance to talk to you. I enjoy backtalk and feedback and stuff. So as I go along, if you want to ask a question or complain about something I've said, I'm very happy.

And also, of course, we can discuss at the end. So my goal, my research goal for my whole life really, has been to understand the computational mechanisms that we need in order to make a really general-purpose intelligent robot. So I'm not trying to solve any particular robotics problem; I just want to understand the nature of intelligence and how we can put it inside a physical system that interacts with the world. So the way I think about a robot is that it's a transducer, fundamentally.

So I should also say that I worry about the software part of a robot. So let's assume that there's some hardware spec. Then what I want to think about is what program it is that I need to put in the head of my robot. And fundamentally what it is, is a program that has to take a history of actions and observations, so (o, a)*, some history of what it's observed and what it's done itself, and decide on the next action.

And really that's the entire job of a robot engineer: to figure out what program pi to put in the head of the robot. So that's what I want to talk about today and get you to help me think about it. So how should we frame that problem? How should we think about what pi should go in the head of the robot? So first I want to have you think about the robot factory, right? So I'm going to be a robot factory. I'm a robot engineer. I'm going to pick a program pi to put in the head of the robot.
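As a minimal sketch of the "program pi" idea above: a policy can be written as a function from the history of what the robot did and saw to the next action. The type names, the string-valued actions and observations, and the `avoid_walls` policy are illustrative assumptions, not anything from the lecture.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in types: real robots have far richer actions and observations.
Observation = str
Action = str
Step = Tuple[Action, Observation]        # what the robot did, then what it saw
History = Sequence[Step]                 # the (o, a)* history from the talk
Policy = Callable[[History], Action]     # pi: history -> next action


def avoid_walls(history: History) -> Action:
    """A toy policy: drive forward unless the last observation was a wall."""
    if not history:
        return "forward"
    _, last_observation = history[-1]
    return "turn_left" if last_observation == "wall_ahead" else "forward"


def run(policy: Policy, env_step: Callable[[Action], Observation], n_steps: int) -> List[Step]:
    """Run a policy against an environment function, recording the history."""
    history: List[Step] = []
    for _ in range(n_steps):
        action = policy(history)
        observation = env_step(action)
        history.append((action, observation))
    return history
```

Everything the robot engineer chooses lives inside the one function passed as `policy`; whether that function learns from its history or ignores it is a design decision, not a different kind of object.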

And the question is, what program should I pick? And what I'm going to argue is that I should pick the program that works as well as possible in expectation over the possible situations the robot might find itself in. So if you say to me, "Robot factory, I need a robot to weld a particular model of a particular car in my factory," then I can make a very specific program to do that because I know exactly what it's going to need to do.

If, on the other hand, you say, "Robot factory, make me a robot that can come to my house and do whatever I ask it to do," well, that's a different problem. It's a harder problem, but it's a problem of the same kind. Right. So in expectation over the environments that this robot is supposed to work in, the program should perform well.
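One way to read this criterion is: pick the program pi that maximises E[U(pi, W)], where W is drawn from the distribution of worlds the robot will be sold into. The sketch below estimates that expectation by sampling. The two candidate programs, the scoring rule, and the two environment distributions are all invented for illustration.

```python
import random
from statistics import mean


def fixed_program(env_target, guess=5):
    """Never adapts: scores the negative distance between a built-in guess and the world."""
    return -abs(guess - env_target)


def learning_program(env_target, probe_cost=1.0):
    """Pays a fixed cost to find out what the world is like, then acts perfectly."""
    return -probe_cost


def expected_score(program, env_sampler, n=2000, seed=0):
    """Monte Carlo estimate of a program's expected score over sampled environments."""
    rng = random.Random(seed)
    return mean(program(env_sampler(rng)) for _ in range(n))


def best_program(programs, env_sampler):
    """Return the name of the program with the highest estimated expected score."""
    return max(programs, key=lambda name: expected_score(programs[name], env_sampler))


programs = {"fixed": fixed_program, "learning": learning_program}


def similar_worlds(rng):
    """Environments that hardly vary (the welding-robot case)."""
    return rng.choice([5, 5, 5, 6])


def varied_worlds(rng):
    """Environments that vary a lot (the robot-in-any-house case)."""
    return rng.randint(0, 10)
```

Under these made-up numbers, `best_program(programs, similar_worlds)` picks the fixed program and `best_program(programs, varied_worlds)` picks the learning one: the same selection rule, applied to different environment distributions, decides how much learning is worth paying for.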

And so right now, there's a lot of argumentation about the role of learning and the role of reasoning, and whether programs should learn or not and how much, and I want to say that there's really no point in having that argument. We just want the program that works best in expectation over the environments. If the environments are very different, it's going to have to learn something from the particular environment it finds itself in.

If the environments are very similar, it won't. Okay. So then I take my job to be the job of designing the robot factory. So how is it that I'm going to think about finding good programs to put in the robot so that it can perform well when it goes into the distribution of worlds that it needs to go into? So there's a bunch of different ways to think about it, and one way, which somehow at the moment seems kind of prevalent, is that the robot should just learn everything, right?

So I should put approximately nothing in the head of the robot. The robot should go out and it should learn everything from its experience in the world. Well, okay, so that's actually not even remotely sensible, right? Would you buy a robot that didn't know anything and allow it into your house and have it like break a bunch of stuff and try to figure out how to do things?

Right. Now, another strategy is to say, well, you should just hire really smart engineers and sit them at a desk and they should type in that program and that should do very well. And that's historically maybe been the approach, but that's actually really hard for the engineers. Most engineers don't have good access to what program they should write. In computer vision, we've learned the lesson, right? People used to try to write computer programs that would recognise faces.

Lots of very smart people tried really hard to do that and it completely failed. What succeeded was that they could write programs that could learn to recognise faces. And I'll come back to that. Another strategy could be: we figure out what humans do and then just do that. And I think that that's an interesting and important enterprise. My own personal bet is that that might not be the quickest way to get to where I want to get to.

So people should do that. But that's maybe not what I'm doing. Or maybe we could recapitulate evolution, right? Maybe we could just get robots to somehow evolve or learn. Not in the particular niche that they're born into, but over some longer time. And maybe that would get us somewhere. So I don't know. None of these things is completely appealing, but I just want to kind of explore a little bit more. So let's think about learning or evolution in the factory.

Right. So I said that when humans tried to write programs to recognise faces, that didn't work out very well. But humans actually were really good at writing programs to learn to recognise faces. And so maybe we can do something like that. Right. So we want to come up with some strategy for behaving in the world that works well in lots of different worlds. To somehow do that in the factory, we kind of have to replicate the variability of the domain that the robot's going to have to go into.

We have to replicate that variability in the factory so that we can test our programs, so that we know if they're going to do well when we put them out in the world. So maybe we can formulate that in terms of a search space, an objective function, and some kind of a test distribution. So this is certainly a well-formed approach to the problem, but there's a debate raging right now in the machine learning community.

I've had big arguments with people recently at the NeurIPS conference about this. And people want to say you should do no harm, right? That if you build anything into your system, into your machine learning system or robot system, you risk being wrong. And if you build in a wrong thing, you've doomed your robot, or whatever it is, to being suboptimal. It can't overcome that. And that's true, although I would say that we're going to have to take the risk.

Right. Those of you who raised children probably did some mildly suboptimal things in the process, but probably mostly they came out okay. Right. So I feel that way about robots too. We have to make a decision. We're the engineers, we have to build the system. We might build in some things that are not exactly right, but we don't have time to wait to let things be completely generic.

So you could imagine running a completely generic algorithm that just enumerated programs in order of complexity until you found a good one. But that's crazy, right? For reinforcement learning people, you could do something roughly the same. Okay. So what are we supposed to do? So one strategy, I don't know, it's sort of appealing would be just set up some evolutionary process. It's going to take a really long time. We go to the beach and then eventually the problem gets sorted for us.

Yeah. Okay. So that could be good. But I'm worried that we might not be alive at the time that that thing finishes or even gets anywhere interesting. And so I'm going to talk about a somewhat, actually much more boring strategy, which is to combine aspects of all of these things in a way that might help us get from where we are to really interesting and flexible robots in a short amount of time. Okay.

And so the story that I want to tell here is one where we do meta-learning in the factory. So meta-learning is a fancy name. It means learning to learn. It means in the factory we run something like a machine learning algorithm, a search algorithm, that arrives at an algorithm that has the property that when we put it in the world, it can learn effectively. So I want my robot to be able to come into your house and make tea. That's what I want. It's going to have to learn. Every different house is different.

It's going to have to learn how it's organised, how you like your tea, all these things. In the factory, I would like to be able to meta-learn how to do that. There's all kinds of interesting constraints. I've been speaking a bit with Liz Spelke, who knows a lot about human development, and she says that human babies, and probably mammals in general, are basically born with a lot of fundamental things about the world: that there are other agents, that there are clumps of matter that cohere, that there's space that we move through.

And so I think I feel kind of licensed to build those things into my robot. There are invariants in the worlds that we care about that we could build in there. Maybe I know the kinematics of my robot.

And I think there are also some interesting constraints that we have to respect that don't come from the problem but that come from the fact that humans are actually going to engineer these systems. So if I'm going to build a system, even if I'm going to build the meta-learning system, it has to have some degree of modularity, just because I personally can't understand the whole thing.

I have to understand the pieces and parts and put them together in some kind of a systematic way. So these are some constraints that we can bring to bear on the problem. Okay. So I'll tell you just a little bit of history and then I'll talk about actual technical stuff. So my first job when I graduated as an undergraduate, in philosophy of all things, was to work at a research lab, and they were just building a robot. And nobody there really knew anything about robots.

And my job was to get the robot to drive down the hallway using the sonar sensors. And the sonar sensors were terrible. They didn't get very reliable returns, and I didn't know anything. And so what I did was I wrote a program. The robot would run into the wall and I would fix the program. The robot would run into the wall in a somewhat different way. I would fix the program.

The robot ran into the wall for weeks. I fixed the program, the robot ran into the wall; I fixed the program, the robot ran into the wall. Eventually I learned about how the sensors interacted with the environment and the control system and so on. I learned enough about that so that I could write a program that drove the robot down the hallway without running into anything. And the lesson I took away from that was that I didn't want to be in that loop anymore.

Right. So the robot could darn well learn how to interact with the world, and I didn't want to be in the middle of that. All right. So I'm going to be outside, designing the learning algorithm that lets the robot interact with the world. So then I reinvented reinforcement learning, and I did it kind of badly and wrong, and eventually got introduced to people who knew something better.

So that was good. And I made a little robot that did reinforcement learning, and it actually learned something during my thesis defence, which was kind of cool. But by the mid nineties, which was when neural networks were cool before... So you might realise this is kind of like the third time right now that neural networks are cool. The second time they were cool was kind of in the nineties. And in the nineties there was the same story as now. Everyone said, oh, this is awesome.

We're just going to put neural networks in there and they will figure everything out. But for sample complexity reasons, for a bunch of technical reasons, I don't actually think that that's possible. Okay. So what to do instead? So I've been working over maybe about the last ten years with my colleague Tomás Lozano-Pérez, who knows a lot about robot kinematics and robot planning and that kind of stuff.

I, in the intervening time, learned something about planning under uncertainty and model learning. And we've been working together and we're taking the following approach. So our view is that there are some basic inference algorithms and representations that are justifiable based on regularities in our environment and some kinds of fundamental computational facts. Convolution is a great example. Right.

So the same people who like to say that it's a terrible idea to build things into their networks, do convolutions. But if they're doing convolutions, they're building something into their network, which is an understanding of some translation invariance or some local spatial regularities and stuff like that. So they're building a lot of knowledge. And I think that there is a handful more mechanisms like that.

I am not going to argue that this is the right particular set of mechanisms, but I would argue that there is some set of mechanisms, and I'm hoping it's like six or ten and not 700, because otherwise it's going to be hard. So we're building in some basic principles and mechanisms. And what we were doing until very recently was actually hand-building a system. And so why were learning people hand-building a system?

Well, it was to get an understanding of the whole arc of the system, of how you could go from perception through estimation and reasoning and action and so on, to build a system that was pretty competent. So what I'm going to do is tell you about that, and then I'll tell you about how we're adding learning. So this is a photo I like to show in all my talks lately. It is not my kitchen, I promise.

But imagine what it would take to make breakfast there or clean it up, and think about what makes that so hard. That seems like a hard problem for a robot, and what makes it hard? So there's a bunch of reasons. One of them is that it's, in a sense, a very high-dimensional space. Right. Robot people like to talk about how many degrees of freedom their robot has, you know, six or ten or 20 or something.

But how many degrees of freedom does the kitchen have? You can't count them. It's not just the positions and orientations of all the objects. It's like whether the grapes are mushy and when the people are coming home, right? So with all this stuff, it's very hard to even think about the state space of that kitchen. The horizon is really long. If you imagine how many little linearly interpolated motions the robot might have to make to clean that kitchen, that's really a lot.

And the uncertainty is fundamental. So, again, if you talk at a robotics conference, often people will say, "But don't worry about uncertainty. We'll just make the sensors better and then you won't have to worry about it." And for some kinds of uncertainty, it's true that making the sensors better will cure the problem. But making the sensors better won't let me know what's inside the cupboard or your head. Right. So there's uncertainty that's just very fundamental.

And I can maybe get information about these things, but it requires careful, explicit action to do that. Okay. So we have a kind of an architecture with boxes and arrows, and it's not very surprising or different from other people's architectures with boxes and arrows. But I'll tell you what I think are some of the salient points of how we address this problem and show you a demonstration. Okay. So we have this thing called belief space hierarchical planning in the now.

Fundamental to it is the idea of reasoning in belief space. So everyone who's ever had an automata theory course, or did way back in the day or something, knows that you can basically take any machine apart into these two pieces, one of which remembers something about the history and the other of which decides what to do based on what it's remembered.

So we'll call them state estimation and action selection. And for us, the arc that goes between the boxes is a belief; you can think about it as a probability distribution over the possible states of the external world. So the first box is trying to estimate what's going on, and the second box has the job of taking that belief, that distribution over the state of the world, and deciding what action to do. So that's the space we kind of live in. So the first question is, what goes along that wire?
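The two-box decomposition can be sketched with a discrete Bayes filter as the state estimator and a simple threshold rule as the action selector. This is a textbook construction standing in for the much richer belief the talk goes on to describe; the cup-finding scenario, the state and action names, and the probabilities are assumptions made up for the example.

```python
def bayes_update(belief, action, observation, transition, obs_model):
    """State estimation: one step of a discrete Bayes filter.

    belief:      dict mapping state -> probability
    transition:  function (state, action) -> dict of next_state -> probability
    obs_model:   function (observation, state) -> probability of that observation
    """
    # Prediction: push the belief through the action's transition model.
    predicted = {}
    for state, p in belief.items():
        for next_state, p_next in transition(state, action).items():
            predicted[next_state] = predicted.get(next_state, 0.0) + p * p_next
    # Correction: reweight by how well each state explains the observation.
    unnormalised = {s: p * obs_model(observation, s) for s, p in predicted.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalised.items()}


def select_action(belief, confidence=0.8):
    """Action selection: act on the most likely state if confident, otherwise look."""
    state, p = max(belief.items(), key=lambda item: item[1])
    return f"grasp_from_{state}" if p >= confidence else "look"


def static_transition(state, action):
    """Hypothetical model: looking does not move the cup."""
    return {state: 1.0}


def cup_obs_model(observation, state):
    """Hypothetical sensor: the cup is usually seen only if it is on the table."""
    p_see = 0.9 if state == "table" else 0.1
    return p_see if observation == "see_cup" else 1.0 - p_see


prior = {"table": 0.5, "cupboard": 0.5}
posterior = bayes_update(prior, "look", "see_cup", static_transition, cup_obs_model)
```

The only thing passed between the two functions is the belief dictionary, which is the "wire" in the diagram; `select_action` never sees the true state.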

And if you've read papers about Kalman filters or POMDPs or something, there's been talk about state representation and so on. But if you think about a robot that has to clean the kitchen, its state representation can't be some lovely little vector. It's a very complicated story. So first of all, we don't know in advance how many objects there are or what they are. It's not like we can say, "Oh, I have a ten-dimensional state space." So we have an open world.

And again, I don't want to argue that this is the one true way. I just want to tell you a thing that kind of works. So we keep a kind of a database of objects that we believe exist in the world and some distribution over their properties, like their relative positions and their mass and so on. We keep a representation of what space we believe is free and what space we believe is occupied, because we have to reason about whether it's safe to go somewhere that we haven't looked at yet.

We keep distributions about what kinds of objects tend to occur near other ones so that we might search more efficiently in certain kinds of places. So this is our belief, something complicated. We have also to worry about an integration that very few people worry about. Lots of people worry about robot motion planning. Just how do I move the robot from one pose to another one?

And that's a non-trivial problem, and there's a million algorithms. And AI people worry about high-level symbolic actions: what should I do and in what order, at some very abstract level? What's interesting is that these things can't be isolated from one another, that the geometry can actually completely affect what high-level actions you should do and in what order.

The geometry might tell me whether I can drive my car down a certain alley or not, or whether I can fit in a certain place, or whether I can put these two pans on the stove at the same time. So we spend a lot of time worrying about reasoning about the interaction between discrete things and continuous ones. Okay. So: probability, geometry, discrete stuff. The optimal planning problem in our domain is unthinkably difficult.

So we kind of fall back on ideas from control theory. So in control theory, an important idea is feedback. And the important thing about feedback is that you could do a slightly wrong action, and if you just very quickly look to see what happens, you can decide to do something else instead that might make up for the slightly wrong thing that you did before. So you don't have to pick optimal controls, you just have to pick, like, not terrible controls.

I'm all about being not terrible. Optimal is not in it, right? There's no way. So I've made peace with not being optimal. So here's our strategy. Our strategy is we make a really weak, really very approximate model of the dynamics of our world, and do planning. Right. So here's my belief. I have a distribution over the state of the world and I make some kind of plan. I'm going to take the first step of that plan, execute it in the world, and get an observation to see what happened.

And I'll update my belief and plan again. And an important thing to think about when you see this system is the perspective that you take when you're making decisions. So the perspective that we take (I don't know if you can see that kind of grey box; you can, a little bit) is that from the planner's perspective, it's interacting with an environment, but its environment includes the belief update. So it's a control system that operates in belief space.
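The plan, execute-one-step, observe, replan loop described above can be sketched on a toy problem. To keep it short, this sketch assumes a one-dimensional world where the robot observes its position exactly after each step, so the "belief" collapses to a single number; the planner uses a deliberately wrong deterministic model, while the simulated world sometimes slips. None of these specifics come from the lecture.

```python
import random


def plan(position, goal):
    """Approximate planner: assumes every step works, which the world does not honour."""
    step = 1 if goal > position else -1
    return [step] * abs(goal - position)


def execute(position, action, rng, slip_prob=0.3):
    """Simulated world: with probability slip_prob the action has no effect."""
    if rng.random() < slip_prob:
        return position
    return position + action


def replanning_loop(start, goal, seed=0, max_steps=200):
    """Plan, execute only the first step, observe the result, and plan again."""
    rng = random.Random(seed)
    position = start
    trace = [position]
    for _ in range(max_steps):
        current_plan = plan(position, goal)
        if not current_plan:          # empty plan means the goal is reached
            break
        position = execute(position, current_plan[0], rng)
        trace.append(position)        # the observation feeds the next round of planning
    return trace
```

Even though the planner's model never accounts for slipping, feedback makes up for it: each slip just shows up as an unchanged position in the next observation, and the next plan starts from there.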

We give it its objectives in belief space, right? I tell the robot, I would like you to believe with high probability that the kitchen is clean, or that the green box is on the left-hand side of the table, or something. I can't give the robot goals in state space because the robot doesn't have access to state space. It can't promise me that it can change the world in a certain way, but I can ask it to come to believe something.

Now, it's not allowed to just delude itself and say, "Oh yeah, I believe it, no problem." It has to actually do the work and run the Bayes update and so on, so that it really does believe this thing. So I ask it to come to believe something, and when it chooses actions, it has to think about not just the effects of the actions on the state of the world, but actually the effects of the actions on its own belief. And that's why it looks, right?

You look not to change the world, but to change your belief. And what's nice is that you can treat all your actions (actions that gather information, actions that change the state of the world, actions that do both things at once, which really most do) all in the same framework. So that's a lesson from POMDPs, but it applies here too. So we think about planning in belief space. We think about planning to take actions that will control our own state of information about the world.
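A small sketch of what a goal in belief space might look like: the goal is a threshold on the probability that the kitchen is clean, and both a pure information-gathering action (looking) and a world-changing action (wiping) are modelled by their effect on that probability. The sensor accuracies, the wipe success rate, and the 0.95 threshold are invented numbers for illustration.

```python
def look_update(p_clean, saw_clean, p_see_if_clean=0.9, p_see_if_dirty=0.2):
    """Looking leaves the world alone but changes the belief, via Bayes' rule."""
    like_clean = p_see_if_clean if saw_clean else 1.0 - p_see_if_clean
    like_dirty = p_see_if_dirty if saw_clean else 1.0 - p_see_if_dirty
    numerator = like_clean * p_clean
    return numerator / (numerator + like_dirty * (1.0 - p_clean))


def wipe_update(p_clean, success_prob=0.8):
    """Wiping changes the world: a dirty kitchen becomes clean with success_prob."""
    return p_clean + (1.0 - p_clean) * success_prob


def goal_satisfied(p_clean, threshold=0.95):
    """The goal is stated on the belief, not on the (inaccessible) true state."""
    return p_clean >= threshold


belief = 0.5                                  # initially unsure whether the kitchen is clean
after_one_look = look_update(belief, saw_clean=True)
after_two_looks = look_update(after_one_look, saw_clean=True)
```

With these numbers one clean-looking glance raises the belief to about 0.82, which is not enough, and a second raises it to about 0.95, which is. The robot cannot shortcut this by setting `belief = 1.0`; the only legal ways to change the number are the update functions attached to real actions.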

Okay. One more kind of high-level idea. I promise I'll get even more technical in a minute, but I just kind of want to give you the story here. Another kind of high-level idea here is, okay, so planning is difficult and inefficient and usually exponential in the horizon. And so we can't stand to have a very long horizon. And if we thought about how many actions it takes to clean that kitchen, the horizon is horrible.

So we do something hierarchical. And people have talked about hierarchical planning a lot. Usually when they talk about hierarchical planning, they use the idea of hierarchy to make planning for a completely worked out plan more efficient. But we're going to do something a lot more aggressive than that. So what we're going to do is we start maybe with some high level goal and we make a plan at some level of abstraction.

So, for instance, I made a plan to come to Oxford that involved going to the Boston airport and flying and walking through Heathrow and doing some things like that. But I made it at a pretty high level of abstraction. Now, partly I did that because I'm kind of computationally lazy and I figured I could work out the rest later.

But partly also I did it because I didn't have enough information. I couldn't have planned my trajectory through Heathrow because I didn't know what the map was like or what gate we would come into or any of that stuff. So we make a plan at a high level of abstraction and then we kind of commit to the first step. So when we make a plan, I should say, you can think of the yellow boxes as abstract actions, and you can think of the blue boxes as what we call pre-images.

They're like sets of states that we have to go through, kind of subgoals. So we take the first subgoal and we make a plan for that. So: get to the Boston airport. And then we take the first subgoal of that and we make a plan for that. Like, I don't know, get an Uber. And so finally I get down to some primitive action and I execute it. So I'm being optimistic that I can work out the rest of the stuff later. And right now, we're hand-building these models.

Eventually, I'm going to have to learn, for instance, to predict whether it's reasonable to walk through Heathrow in 20 minutes. I don't know. You can tell me whether that's reasonable or not. Okay. So we have this hierarchical planning thing. Remember I said that whenever we took an action we replanned? The fact is that if we have this structure, we can decide whether we need to replan very efficiently.

We can ask the question: I just took this action. Did it lead to the blue box that I was expecting? If it did, I can do the next one. If it didn't, I can pop that low-level thing off the stack. So let's say I was planning to take an Uber to the Boston airport, but I discovered that there aren't any. Okay, so I pop my bottom plan. I don't give up the idea of going to the airport or the idea of coming to Oxford or my academic career or, you know.

Right. You know people, possibly, who reason too much at the high level, but it's not healthy. A little bit is okay, but too much is not. So you'd like to control your reasoning, and the structure lets you do that. So that's also kind of a nice thing. Okay, we put this together, we get the robot to do some stuff. So I'm just going to show you some movies and kind of talk while it does.
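The "refine only the first step, execute, and pop just the lowest plan on failure" idea can be sketched with a tiny recursive refinement table, using the trip-to-Oxford example from the talk. The table contents, the `take_subway` alternative, and the use of the Python call stack in place of an explicit plan stack are assumptions of this sketch, not the actual planner.

```python
# Hypothetical refinement table: each abstract goal maps to alternative plans,
# and each plan is a list of subgoals or primitive actions.
REFINEMENTS = {
    "at_oxford": [["at_boston_airport", "fly", "walk_through_heathrow"]],
    "at_boston_airport": [["take_uber"], ["take_subway"]],
}


def achieve(goal, failing, log):
    """Achieve a goal by refining and executing its plan one step at a time.

    failing: set of primitive actions that fail in this (simulated) world
    log:     list collecting (primitive, outcome) pairs in execution order
    """
    if goal not in REFINEMENTS:
        # A primitive: execute it now and report whether it worked.
        succeeded = goal not in failing
        log.append((goal, "ok" if succeeded else "failed"))
        return succeeded
    for candidate_plan in REFINEMENTS[goal]:
        # all() stops at the first failed step, so later steps are never
        # refined or executed until the earlier ones have succeeded.
        if all(achieve(step, failing, log) for step in candidate_plan):
            return True
        # This plan failed: pop it and try the next alternative for the same
        # goal, leaving every higher-level goal untouched.
    return False
```

For example, `achieve("at_oxford", {"take_uber"}, log)` records a failed Uber, then the subway, the flight and the walk through Heathrow: only the bottom plan is replaced, and the goal of getting to Oxford is never reconsidered. A real system would also have to deal with the side effects of a partly executed plan, which this sketch ignores.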

What's important about these movies is that the robot is doing different kinds of things subject to different goals, and it's all the same code. In this case, we asked it to put the green block on the corner. The green block is too big to pick up, so it has to push it. It reasoned that it had to take the orange one out of the way. It also knows that pushing is really unreliable, and so every time it pushes it, it looks to see where it went and says, that wasn't so good.

I'd better, you know, replan and push it again. The same code again: we asked the robot to go out of the room. It knows about space and it knows about occlusion and obstacles. And it says, if there's something in my way, I can't go through it. First, it says, if I want to move through some space, I have to look at it and believe that it's free. If I look at it and find something in the way, I have to move it out of the way. So it's reasoning very generally, doing all these same kinds of things.

Before, it was picking up the oil bottle to see if there was oil in it. Okay, so this was good and bad. Oh, this is a crazy robot in Singapore. Okay, well, whatever. A little too much similarity on that one. All in the same row, all the same code. But so that was good: kind of general-purpose reasoning, planning, estimation, and so on. But kind of not so good, because there's no learning or adaptivity in there at all.

There's no learning in the moment and there's no learning in the factory. This is basically me and my writing code. So, not an extensible strategy for making general-purpose robots in a factory. So the question now is, how can we think about making a system like this learn? Okay. So in particular, we want to think about what kinds of things we can learn and how to do it. And so there's kind of an interesting distinction.

I really think that there are two importantly different kinds of learning, which, again, people used to talk about a lot back in the day, but I think they tend to muddle them now a little bit. So a classic kind of learning is to learn about the world, right? So I'm a robot. I don't know what happens if I push this button, so I'm going to try it in your kitchen and see what happens. So there are some things about how the world works that I don't understand.

And I have to do things to gather information to figure out how the world works. I might learn observation models: how do my sensors tell me about the world? Transition models: what happens if I do this? Most of the work right now in robot learning is focussed on these two lower boxes, right? So one thing is object detection, which has been huge, right?

So various kinds of perception have been very important to us, and also primitive policies: strategies for picking things up or reorienting something in my hand or riding a bicycle or walking. Right. So those are all very important; you can think of them as sort of closed sensory-motor loops. So this is one kind of learning, which is really gathering information about the world.

There's another kind of learning which is at least as important, I think, which is learning to reason more efficiently or effectively. And I would argue that learning to play chess or Go or StarCraft is that, right? Especially chess and Go, maybe not the StarCraft thing. Just take Go: you know the rules. Once you've read the rules, information-theoretically you are capable of making an optimal first move. It's just that you're too dumb to compute that, right?

If you just had a better computer, you wouldn't have to learn any new information. So you have to put the information in a different form, but you've got the information. And so there are lots of opportunities within a kind of an estimation and planning and reasoning architecture like the one I told you about to also do that kind of learning.
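The point that the rules alone already determine the optimal move, and that what is missing is computation rather than information, can be illustrated on a game small enough to solve exactly. The sketch uses a single-pile take-away game (remove 1 to 3 stones, whoever takes the last stone wins) as a stand-in for chess or Go; the game choice and the function names are this example's own.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def wins(stones):
    """True if the player to move can force a win, derived purely from the rules."""
    # With zero stones there is no move: the previous player took the last stone.
    return any(not wins(stones - take) for take in (1, 2, 3) if take <= stones)


def best_move(stones):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not wins(stones - take):
            return take
    return None
```

Nothing new is observed about the world while this runs; the cache built by `lru_cache` just stores the consequences of the rules in a form that is fast to look up. For chess or Go the same exhaustive computation is hopeless, which is where learning to reason more efficiently comes in.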

And so what I'm going to do now is talk concretely about one particular thread of work that's going on right now in my group, involving learning transition models, and a little bit about learning sampling for planning, because when you plan in continuous domains, you have to do something to manage the continuum of possible actions that you can take. Okay. So, old story: lots of models in the world.

They're all wrong. Some are useful. And useful for what? So we have to think about, if we're going to learn a model of the actions the robot can take, what kind of a model could we learn? I had a postdoc named George Konidaris, who thought, I think very nicely and usefully, about the continuum of abstraction in learning models for robots. He likes to talk about the swamp. You can imagine there's some system of partial differential equations that really governs how the world works.

And you could imagine that being a very accurate model, but not a model that you could plan with easily. AI people think about these beautiful, totally abstracted, wonderful symbolic rules, symbolic modelling and so on, and they're lovely and you can plan with them efficiently. And then the question is, well, what kind of connection can you make?

Right? So what we need to do is think about how we can abstract chunks of swamp into kind of nice abstract symbols up at the high level. So I think we need both levels. And the view that we're taking right now is that we have local control loops that kind of operate in the swamp. So picking things up and moving things in your hands and walking, that's all kind of swamp-level stuff. It gets you from this blob of the swamp to that blob, so you can do some stuff like that.

But then what we want to do is learn models of how those low-level control loops work. And those models don't have to be perfect. They just have to be kind of good enough. We hope to be able to abstract over objects, right? So I don't want to have to learn a model of the dynamics of this particular prop, but I know that if I let go of it, it will drop. I know what'll happen if I throw it. I know a bunch of things about it, so I know some things abstractly.

And what I hope is that I can get some kind of virtuous category diagram thing going on here, so that at the high level there are arrows that are maybe nearly deterministic that move me from sets of states at the high level to other sets of states, which are really embodied in these kind of swampy control loops. So. Actually, skip that.

So what I'm going to talk about now in detail is learning a model of the preconditions and effects of a low-level controller. So imagine someone has learned an awesome policy for picking something up or stirring something or pouring liquid. And now I want to learn an abstracted model of it so that I can do planning. That's what we're up to here. And what we've learned recently in the world of planning for mixed continuous and discrete systems is that we can get really great leverage on these problems if they're articulated.

If the models that we have are articulated in a certain way. And a really important aspect is that they be factored: that we talk about the state of the world not as an unanalysed thing like State 94, but that we describe it in terms of state variables. And the state variables have values that we can change. It's also true that constraints are a very useful language for describing the effects of actions in the world, especially in geometric kinds of ways.

So we're going to look for models that are articulable in this kind of style. Okay. So how can a competent robot acquire a new ability? Assume that my robot already knows how to pick things up and how to move around. But we've learned to do a new thing, like pouring or stirring, and we want to add it to the robot's repertoire. So that's what we're up to. So here's an example. Here's pouring in two dimensions.

I can describe the situation using a bunch of continuous-valued variables: things like the size of the aperture of the cup that I'm pouring out of, the size of the thing that I'm pouring into, the relative pose of the centres of the two things, the way in which the robot is grasping the cup. You could imagine some gain parameter in the controller for doing the pouring. You could imagine conditioning on the viscosity of the stuff in there.

We're not doing that, but you could imagine that. So there's a bunch of parameters that govern the situation. And what I'm interested in understanding is under what conditions will my pouring operator actually work pretty well, that is to say, get the stuff into the target. That's what I would like to do. So one way to think about it is that we could write a kind of symbolic-ish looking description of this operation of pouring.

But it has a bunch of continuous parameters, right? The grasp and the sizes of things and the relative pose and so on. And what I want to learn is a constraint: a constraint on the values of those continuous variables that has the property that if it's true and I execute the action under these circumstances, then the goal will probably be satisfied. The effect will probably happen. But if it's not true, then probably not.
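As a minimal sketch of the operator description above: the continuous parameters can be grouped into one record, and the learned constraint becomes a test a planner runs before choosing the pour action. All names here (`PourParams`, `toy_score`, `precondition_holds`) are hypothetical, and the hand-written score is only a stand-in for whatever function would actually be learned from experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PourParams:
    """Continuous parameters of one pouring attempt (illustrative names)."""
    source_aperture: float  # width of the cup being poured from
    target_aperture: float  # width of the container being poured into
    rel_x: float            # horizontal offset between the two centres
    rel_y: float            # vertical offset between the two centres
    gain: float             # controller gain


def toy_score(p: PourParams) -> float:
    """Stand-in for a learned score g(theta); positive means 'probably works'.

    This rule is an assumption made only so the sketch runs: the pour is
    scored by how much slack the target opening leaves around the stream.
    """
    margin = (p.target_aperture - p.source_aperture) / 2.0
    return margin - abs(p.rel_x)


def precondition_holds(p: PourParams, score=toy_score, threshold: float = 0.0) -> bool:
    """The constraint a planner would check before using the pour operator."""
    return score(p) > threshold
```

A planner could call `precondition_holds` on candidate parameter values and discard those for which the constraint fails.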

So that's the thing that I want to try to learn: when will this operation have the desired outcome? And because I actually would like to do this with an actual robot, I would like for it to not take too many samples. So I'm going to be serious about that, too. Okay. So one way we can think about this problem is as a regression problem. So instead of saying, well, my constraint is either satisfied or not, I might say it's satisfied or not to some degree.

You could imagine making a score for pouring. For pouring, it's easy. We just measure the number of particles that end up in the target place. And that's kind of a score. And we say, oh, we would like the scoring function to be higher than some value. We'll just call that value zero for now. It doesn't matter, some constant. And so then what I'm going to do is I'm going to do some experiments, and I want to learn the mapping from the values of all those continuous variables into the score.

Right. So if I can learn that, and if I have the scoring function, then I know, for any assignment of values to those variables, the amount of liquid that I expect to fall into the cup. We can formulate this with a bunch of different kinds of regression strategies. We're going to use Gaussian process regression. Probably some of you are experts in this and some of you don't know what it means, so I'll try to talk to everyone.

It's a way of articulating our own uncertainty about this mapping so that we can do experiments effectively. So what we do is we do some set of initial pourings. Each time we try it, we have some assignment of values to those variables, right? We try it with different sized cups and different relative positions and different gains of the controller and so on. And for each one of those we get a score. Okay, so here is the way a Gaussian process works.

Think of the x axis here. I can only draw one dimension; it's really in a lot of dimensions, but I can only do one. Those are the parameters we're varying. And we're interested in knowing when is this g function bigger than zero? That's what we would really like to know. So whenever we do an experiment, so that's one of these little blue x's, we get an observation of that function and we can, using some Bayesian reasoning, compute a function that is sort of the posterior mean.

We have a distribution over the actual function. And every time we get an observation, we can compute a distribution on the function. So the dark red line is the mean of that distribution over functions. And the pink area is, like, a standard deviation, a couple of standard deviations, in the other dimension. Now lots of people use Gaussian processes to do a lot of different things. Often they're interested in finding the optimum of the function.
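The posterior mean (the dark red line) and the standard-deviation band (the pink area) have a closed form for a Gaussian process. The sketch below is a one-dimensional illustration in plain Python, assuming a zero prior mean and a squared-exponential kernel; the kernel choice, its hyperparameters, and the function names are assumptions for illustration, not details given in the lecture.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(xs, ys, x_star, noise=1e-6):
    """Posterior mean and standard deviation at x_star given observations (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)      # (K + noise*I)^{-1} y
    v = solve(K, k_star)      # (K + noise*I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))
```

Near an observed point the returned standard deviation shrinks toward zero, and far from all observations it returns to the prior value, which is the behaviour the pink band in the figure depicts.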

We're interested in something else. We're interested in the level set. We're interested in knowing for what ranges of theta g is above zero, for what ways I can do pouring. Like, I don't want to learn pouring in just one way. I'll explain why in a minute. But for what arrangements of pouring is it going to work out and for what arrangements will it not?

So this block area there that I've drawn in, that particular figure is right now the region of theta space where I believe with high probability pouring will work. Can you be sure that that makes sense here? Right. So for right now, there's just this little region where I'm convinced it's going to work well with high probability greater than 0.95 or something. So but now what I would like to do is some active experimentation to try to understand the boundaries of that level set.
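The "high probability" region can be read off pointwise from the posterior mean and standard deviation: if the belief about g at some theta is Gaussian, the probability that g exceeds the threshold is a normal tail probability. A small sketch, with hypothetical function names and the 0.95 level taken from the talk:

```python
import math


def prob_above(mean, std, threshold=0.0):
    """P(g > threshold) when the belief about g is Normal(mean, std**2)."""
    if std == 0.0:
        return 1.0 if mean > threshold else 0.0
    z = (mean - threshold) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confident_region(thetas, means, stds, confidence=0.95):
    """Candidate thetas where success is believed with at least `confidence`."""
    return [t for t, m, s in zip(thetas, means, stds)
            if prob_above(m, s) >= confidence]
```

Here `thetas`, `means`, and `stds` are assumed to be parallel lists evaluated on some grid of candidate parameter values.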

I would like to know, well, what other configurations will give me good pouring and which ones won't. And so we use an algorithm called the straddle algorithm, which is pretty interesting. I'm not going to go over it in detail, but it has this notion of an acquisition function. So for different thetas, it tells us which values of theta will give us the most information, not about the maximum, but about the boundaries of the level set. I want to know the boundaries of successful pouring.

So this particular acquisition function likes to try experiments in places where the mean is near zero, right? Because that's where we're probably near a boundary of good versus bad, and where the standard deviation is higher, and it combines these in a suitable way. And what that means is that we can take a small number of samples and update our belief about this function. Okay. So what we find is that this kind of active learning is data efficient.
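One common way to combine those two preferences is the straddle heuristic from the level-set estimation literature: score each candidate by 1.96 times its standard deviation minus the distance of its mean from the threshold, and run the next experiment at the highest-scoring candidate. The sketch below uses that form; whether the work described here used exactly this variant or constant is an assumption, and the function names are illustrative.

```python
def straddle(mean, std, threshold=0.0, kappa=1.96):
    """High when the mean is near the threshold and the uncertainty is large."""
    return kappa * std - abs(mean - threshold)


def next_experiment(thetas, means, stds, threshold=0.0):
    """Pick the candidate theta with the largest straddle score."""
    scores = [straddle(m, s, threshold) for m, s in zip(means, stds)]
    best = max(range(len(thetas)), key=scores.__getitem__)
    return thetas[best]
```

A candidate that is confidently good or confidently bad gets a negative score, so the experiments concentrate along the believed boundary of the level set.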

So if we try experiments at random, it takes really a lot of experiments before we get a good idea of that super level set. If we try this other approach... I have to tell you just a tiny story, because the first paper we did about this was, again, just me and Tomás, and we did it using some kind of feedforward neural network because we thought it would sound cool, and it kind of worked, but not very well. But mostly what it did is it enraged our students. So they did a better job.

So students did a better job. The Gaussian process is this red thing. It's awesome. So, with not too many trials. Now, this iteration axis here is how many experiments we had to do. It's not times ten to the third, it's just like ten or 20 or 30. So we learned something from experimentation. There's another piece of the story which I'm actually going to skip, and I'm going to show you.

So first in simulation and then on the real robot, what we do, again, is we take something that has the ability already. It already understands picking up objects and putting them down. It learns preimages for pouring, in this case, and stirring. We asked it to make a cup of coffee. To make a cup of coffee, there has to be cream in there. There has to be sugar. There has to be coffee. It has to be mixed. It has to be on the green thing and it has to be served at the end of the table.

That's the thing. We do not tell it what steps to do or in what order. So it's using a general-purpose planner to do that. And it uses these learned preimages to kind of get new descriptions of these operations of stirring and pouring and scooping, and it puts them together to make these plans. We have to watch it stirring because it's fun. There we go. I like this nice little simulator. More fun is this. Okay. So here's our robot doing basically the same thing.

In this case, we just learned the pouring and the pushing; it already knew picking up. What's interesting about this is that the goal varies. That is to say... Yeah, I know. So you can come and fix my motion planner if you want to, or you could just giggle. But so we give it objectives. We move the objects around. We ask it to do different things at different times. It kind of does it. It's not a thing of beauty, but it's actually reasonably reliable.

I'll show you some outtakes at the very end. This time we told it that we wanted the thing on top of the block and the stuff in it. And I guess I would argue that as we make these kinds of scenarios more complicated and the goals more complicated... Look there, it pushed the ball so that it was in the usable workspace, so that it could pick it up with the other hand. That was, like, mildly clever.

Now, not off the table. As we make these scenarios more and more complicated, it seems, at least to me, with my limited imagination, harder and harder to just straight up learn a policy to do this. And it seems to me that some kind of planning is actually kind of important to the process. Okay. Oh, last one. Okay. Right. Um, let's see. 10 minutes. Okay. This is good. So good.

So what I talked about there was learning, assuming we kind of had the framework for a description of the effects of an action. And now I want to talk about how we can actually learn the framework. Right. So in that case, I said which aspects of the domain were important to making that prediction. I said the sizes and shapes of the cups mattered. I said the gain in the controller mattered. I said all that stuff. And you just had to learn that constraint on a fixed-dimensional kind of problem.

But that's not a reasonable setup, really. If I'm trying to put myself out of a job a little bit as the person who writes all this stuff down. So another thing that seems to be important is deciding which objects and which properties of those objects both affect the success of doing an operation and which might be actually changed by doing that operation. So I had some old work that did this in a kind of a logical framework.

And so what we tried to do recently was recast that in a more hip, new neural network way. I don't know if it's better. I actually think it probably is. But yeah. Okay. So the idea. Let me just skip forward here. Okay. The idea here is. We have a representation of the state of the world. And, uh, but, but it's going to have different size in different instances of the problem.

Right. So most setups for neural network learning and function approximation and so on assume some kind of fixed-dimensional representation. And when they don't, then they feed things in sequentially. Recently there's been work on something called graph neural networks, which makes me laugh because it's kind of like Markov random fields, which is an old idea.

So this is a new name for an old idea, but it's a good idea, which is that you can learn something about local relationships among objects or properties or values in your model and propagate those values. And you can learn those local models in a way that makes them independent of the arity of the problem that you'll face today. Right. So today I have to clear two things off the table. Tomorrow I have to clear 20.

But I hope that the models that I learn about how to do that will transfer automatically, will work independent of the size of my problem. So in any given problem instance, I might have a representation right at the moment of my current belief about the world. In this case, imagine that it's not even uncertain, although it could be. So I right now I know about some objects, and for each object I know about some properties. And what I'm interested in doing is learning a transition model.

That is to say, what will happen if I do this action right. So one way to think about it is that it will depend on some properties of some objects in the current state and it will affect some properties of some objects in the resulting state. And what I'm going to focus on now is just telling you a little bit of a story about how we can find a model that has the sparseness property.

Okay. So there's this, again, kind of old idea from AI, which also comes from natural language: the notion of deictic reference. So deictic, I guess, in Greek means pointing to. So this remote, or that chair, or the water bottle on the lectern. Those are all deictic references. Okay. So what we want to do is decide which objects and which properties of objects are relevant to making the predictions that we want to make.

And so what we're going to do is we're going to start out and say, I know there's at least one object in the world that's important. Let's think about pushing an object. So if I want to learn the model of what happens when I try to push an object, then I know which object I'm pushing. Okay. That's good. But there may be other objects that matter or that are affected by doing this action. And I'm going to refer to those objects using deictic references.

In this work, we have some fixed set of deictic references, which we should increase and learn, but for right now they're fixed. I can talk about an object above this object or below it, the nearest object and so on. So there are some relations which, when applied to this object, refer to some other set of objects. The set might be empty. The set might have one object, the set might have many objects.

But whatever, they're a way of talking about other objects in relation to one object I already know, and then you can apply that kind of recursively. So imagine that I wanted to talk about pushing an object. We'll call that object object one. And the way to read that stuff right there is: let object two be the object, or possibly the set of objects, that's above object one. And let object three be the set of objects that's above object two, and so on.

So if I had a scene like the one on the right, I could abstract it as a graph of relations. And then I could say, well, if A is object one, then these other objects, these particular objects in my particular world, play the roles of object two and object three and object four. All right. So I'm going to use this mechanism to have a flexible representation of an object I'm operating on and the other objects that are relevant to it, in a way that applies no matter how many objects are in my scene.
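The role-binding mechanism just described can be sketched in a few lines. The scene encoding (a dictionary mapping each object to the thing it rests on), the `above` relation, and the `resolve` function are all hypothetical choices made for illustration; the lecture does not specify a data structure.

```python
def above(scene, obj):
    """Objects resting directly on obj; `scene` maps each object to its support."""
    return {o for o, support in scene.items() if support == obj}


def resolve(scene, target, references):
    """Bind roles to concrete objects.

    Role 1 is the object being acted on. Each entry in `references` says
    "let role k be relation(role j)", applied in order, so references can
    chain recursively. A role may bind to zero, one, or many objects.
    """
    roles = {1: {target}}
    for new_role, (relation, source_role) in references.items():
        roles[new_role] = set()
        for obj in roles[source_role]:
            roles[new_role] |= relation(scene, obj)
    return roles


# Usage: a small stack A <- B <- C on the table, plus a lone object D.
scene = {"A": "table", "B": "A", "C": "B", "D": "table"}
references = {2: (above, 1), 3: (above, 2)}
print(resolve(scene, "A", references))
```

The same `references` structure applies unchanged whether the scene holds four objects or forty, which is the point of describing other objects relative to the one being operated on.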

Okay. So if we have a set of these deictic references, which reach out and name some other objects relative to the object I'm operating on, and maybe we figure out which properties of those objects are important, then we have a kind of a straight-up neural network learning problem, right? Which is, now we have a fixed-dimensional input. We have this object and its other objects that are relevant and some properties. And we map into some properties of that object and some other objects.

So that's a plain old numeric regression problem. We know how to train a neural network if we know what data to give it. Okay. And then you can apply it, right? What's important about this thing is that it's supposed to apply anywhere, right? It says, if these guys have these properties on the input, these are the properties that the other guys will have in the output.

Okay. And so then we have an algorithm for learning this, and I am not going to go into detail, but it takes a kind of a normal training set. A state is an arrangement of all the objects in the world. I take an action, then I get a resulting state. And I have an outer loop that kind of greedily operates on the structures of the rules, and an inner loop that does some EM stuff, and an inner inner loop that does neural network training.

Okay. And so what happens is that we can do things like learn the results of pushing objects on a crowded table. Right now, we're not testing this inside a planner. So this is really preliminary work. We're just checking to see if the model that we get predicts the data that we trained on well. I mean, predicts held-out data well, but it's just based on likelihood. And we compared it to two other strategies. Right. So there's our learned rule-based model.

We compared it to just a straight up neural network in which we encode the positions of all the objects on the table. The problem is that you don't know what order to put them in. And so we picked what we thought was the most helpful order.

But it's a very hard thing to do. And we compare it against a graph neural network, which is, again, a kind of a modern, structured neural network that abstracts away from the individuals in a nice way, but doesn't have exactly the right bias for this kind of problem. And what we found, so purple is our thing, blue is the graph neural network, and red is a plain old flat neural network. In this case, with just three objects in all in the scene.

So we're not testing generalisation over multiple objects, but we find again that the sparse rule learning can learn efficiently, very quickly, how to make good predictions in this domain. More important is the fact that it's relatively unaffected by clutter, right? So as you add more objects into the world, the neural network suffers because they're all in some arbitrary order and it doesn't know what matters.

The graph neural network does reasonably well, and the rule learning thing works still more reliably. You might ask, I certainly asked when I first saw these results, why does it get better as there get to be more objects in the world? That seems counterintuitive. The answer is that if there's a bunch of stuff on the table and I'm pushing one object, that might push another object, but almost everything stays the same.

So predicting everything stays the same is not so bad, and you just have to learn the things that don't stay the same. So if you average over the number of objects in the scene when you measure how well you're predicting what happens to them, then the more objects that don't move, actually, the easier it becomes if you have the right bias. But it becomes harder for the flat neural network. Okay.
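The averaging effect can be checked with a line of arithmetic. In this sketch the error values are made-up numbers, and the assumption is that a model with the right bias predicts "no change" perfectly for objects that do not move.

```python
def mean_error(n_objects, n_moved, error_on_moved, error_on_static=0.0):
    """Per-object prediction error averaged over every object in the scene."""
    n_static = n_objects - n_moved
    return (n_moved * error_on_moved + n_static * error_on_static) / n_objects


# Same two moving objects, same error on each, but more clutter around them.
print(mean_error(3, 2, 1.0))
print(mean_error(20, 2, 1.0))
```

With the error on the moved objects held fixed, adding static objects only dilutes the average, so the reported number improves even though the hard part of the prediction is unchanged.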

So this is just like a tiny, tiny tip of an iceberg, but it's very exciting. I feel like we have the tools and the pieces and the parts to figure out how to make generally intelligent robots. I kind of do. We have a bunch of work to do. We have to work on connecting the vision algorithms that exist now, which are awesome but not quite what we need, into state estimation in a useful way.

Right now we have learned policies down at the low level and planning at the high level, but that should be more fluid. We should be able to cache the results of planning in a way that lets us routinise the things that we do very frequently. We're not doing that. I think end-to-end learning is a blessing and a curse. My colleague Tomás likes to call it dead-end learning. I'm not sure. So what's interesting about end-to-end learning, right?

So that's when you say, I have this giant system and I'm not going to try to give it intermediate signals of success, but rather just measure the quality of the whole thing based on the final actions it takes. That's the right thing. You can't argue with it intellectually, right? It is exactly the right thing. You don't want to say my state estimator needs to be awesome according to some criterion that's only about state estimation.

Really, all I care is that the state estimator does a job that helps the planner do a job that causes the controller to emit the right torques. That's all I care about. But the idea that you could backpropagate from errors on the torques all the way through the planner and the state estimator, to me, seems not so clear. So I think we have to figure out ways of combining local reward signals and end-to-end reward signals.

We're talking about interacting with humans, all kinds of stuff. So I brought back one more ancient slide. This is from 1995, but it's still kind of like my view of what's going on. And then we have to think about learning a lot of different kinds of levels of abstraction and how to make them actually not divided into layers.

I think that right now, especially in robotics, a lot of people are working at what I would call the skill level, although, you know, I think in robotics, that's roughly right. What's interesting is that we can kind of do skills and then we can kind of do like fancy stuff up at the top, like play go. But we're terrible at just like basically making breakfast or even walking out of this lecture room.

So that middle ground I think is interesting and important, and I want to recruit more people to work on it and think about it. So there's a bunch of people who helped with this, and I'm grateful to them for what they have done. And with that, I will say thank you and let you watch the robot make mistakes. So thanks.

Transcript source: Provided by creator in RSS feed