Abstracts: NeurIPS 2024 with Dylan Foster - podcast episode cover

Abstracts: NeurIPS 2024 with Dylan Foster

Dec 06, 202411 min
--:--
--:--
Download Metacast podcast app
Listen to this episode in Metacast mobile app
Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

Can existing algorithms designed for simple reinforcement learning problems be used to solve more complex RL problems? Researcher Dylan Foster discusses the modular approach he and his coauthors explored in their 2024 NeurIPS paper on RL under latent dynamics.

Read the paper

Transcript

AMBER TINGLE

Welcome to Abstracts, a Microsoft  Research Podcast that puts the spotlight on   world-class research in brief. I’m Amber Tingle.  In this series, members of the research community   at Microsoft give us a quick snapshot—or a podcast  abstract—of their new and noteworthy papers.

[MUSIC FADES]

AMBER TINGLE

Our guest today is Dylan Foster.  He is a principal researcher at   Microsoft Research and coauthor of a paper called  “Reinforcement Learning Under Latent Dynamics:   Toward Statistical and Algorithmic  Modularity.” The work is among the   oral presentations at this year's Conference  on Neural Information Processing Systems,   or NeurIPS, in Vancouver. Dylan, welcome  and thank you for joining us on the podcast!

DYLAN FOSTER

Thanks for having me.

TINGLE

Let's start with a brief  overview of this paper. Tell us   about the problem this work addresses and why  the research community should know about it.

FOSTER

So this is a, kind of, a theoretical  work on reinforcement learning, or RL. When I   say reinforcement learning, broadly speaking, this  is talking about the question of how can we design   AI agents that are capable of, like, interacting  with unknown environments and learning how to   solve problems through trial and error. So this is  part of some broader agenda we've been doing on,   kind of, theoretical foundations of RL. And the  key questions we're looking at here are what are  

called, like, exploration and sample efficiency.  So this just means we're trying to understand,   like, what are the algorithm design principles  that can allow you to explore an unknown   environment and learn as quickly as possible?  What we're doing in this paper is we're, kind of,   looking at, how can you most efficiently solve  reinforcement learning problems where you're   faced with very high-dimensional observations,  but the underlying dynamics of the system you're  

interacting with are simple? So this is a setting  that occurs in a lot of natural reinforcement   learning and control problems, especially in the  context of, like, say, embodied decision-making.   So if you think about, say, games like Pong, you  know, the state of the game, like, the state of,   like, Pong, is extremely simple. It's just, you  know, what is the position and velocity of the  

ball, and, like, where are the paddles? But what  we'd like to be able to do is learn to, you know,   like, control or, like, solve games like this from  raw pixels or, like, images kind of in the same   way that a human would, like, just solve them from  vision. So if you look at these types of problems,   you know, we call these, like, RL with rich  observations or RL with latent dynamics. You know,   these are interesting because they, kind  of, require you to explore the system,  

but they also require, you know, representation  learning. Like, you want to be able to use   neural nets to learn a mapping from, say,  the images you see to the latent state of   the system. This is a pretty interesting and  nontrivial algorithmic problem. And, kind of,   what we do in this work is we take a first step  towards something like a unified understanding   for how to solve these sorts of, like,  rich-observation, or latent dynamics, RL problems.

TINGLE

So how did you go about  developing this theoretical framework?

FOSTER

Yeah, so if you look at these sort of RL  problems with latent dynamics, this is something   that's actually received a lot of investigation in  theory. And a lot of this goes back to, kind of,   early work from our lab from, like, 2016, 2017 or  so. There's some really interesting results here,   but progress was largely on a, like,  case-by-case basis, meaning, you know,   there are many different ways that you can try  to model the latent dynamics of your problem,  

and, you know, each of these somehow leads  to a different algorithm, right. So, like,   you know, you think very hard about this  modeling assumption. You think about,   what would an optimal algorithm look like? And you  end up, you know, writing an entire paper about   it. And there's nothing wrong with that per se,  but if you want to be able to iterate quickly and,   kind of, try different modeling assumptions  and see what works in practice, you know,  

this is not really tenable. It's just too slow.  And so the starting point for this work was to,   kind of, try to take a different and more  modular approach. So the idea is, you know,   there are many, many different types of, sort of,  systems or modeling assumptions for the dynamics   that have been already studied extensively and  have entire papers about them for the simpler   setting in which you can directly see the state of  the system. And so what we wanted to ask here is,  

is it possible to use these existing results  in more of, like, a modular fashion? Like,   if someone has already written a paper on how  to optimally solve a particular type of MDP,   or Markov decision process, can we just take  their algorithm as is and perhaps plug it into   some kind of meta-algorithm that can directly,  kind of, combine this with representation   learning and use it to solve the corresponding  rich-observation, or latent dynamics, RL problem?

TINGLE

What were your major findings?  What did you learn during this process?

FOSTER

We started by asking the question sort of  exactly the way that I just posed it, right. Like,   can we take existing algorithms and use them  to solve rich-observation RL problems in a   modular fashion? And this turned out to be really  tricky. Like, there's a lot of natural algorithms   you might try that seem promising at first but  don't exactly work out. And what this, kind of,  

led us to and, sort of, the first main result in  this paper is actually a negative result. So what   we actually showed is most, sort of, well-studied  types of systems or, like, MDPs that have been   studied in, like, the prior literature on RL, even  if they're tractable when you're able to directly   see the state of the system, they can become  statistically intractable once you add, sort of,  

high-dimensional observations to the picture. And  statistically tractable here means the amount of   interaction that you need, like the amount of,  sort of, attempts to explore the system that   you need, in order to learn a good decision-making  policy becomes, like, very, very large, like much,   much larger than the corresponding, sort of,  complexity if you were able to directly see the  

states of the system. You know, you could look at  this and say, I guess we're out of luck. You know,   maybe there's just no hope of solving these  sorts of problems. But that's perhaps a little   too pessimistic. You know, really the way you  should interpret this result is just that you   need more assumptions. And that's precisely what  the, sort of, second result we have in this paper  

is. So our second result shows that you can,  sort of, bypass this impossibility result and,   you know, achieve truly modular algorithms under a  couple different types of additional assumptions.

TINGLE

Dylan, I'd like to know—and I'm sure  our audience would, too—what this work means   when it comes to real-world application. What  impact will this have on the research community?

FOSTER

Yeah, so maybe I'll answer that, um, with  two different points. The first one is a broader   point, which is, why is it important to understand  this problem of exploration and sample efficiency   in reinforcement learning? If you look at the,  sort of, setting we study in this paper—you know,   this, like, RL or decision-making with  high-dimensional observations—on the   empirical side, people have made a huge amount  of progress on this problem through deep  

reinforcement learning. This was what kind of led  to these amazing breakthroughs in solving games   like Atari in the last decade. But if you look at  these results, the gains are somehow more coming   from the, like, inductive bias or the, like,  generalization abilities of deep learning and   not necessarily from the specific algorithms. So,  like, current algorithms do not actually explore  

very deliberately, and so their sample efficiency  is very high. Like, it's hard to draw a one-to-one   comparison, but you can argue they need, like, far  more experience than a human would to solve these   sorts of problems. So it's not clear that we're  really anywhere near the ceiling of what can be   achieved in terms of, like, how efficiently can  you have, you know, an agent learn to solve new  

problems from trial and error. And I think better  algorithms here could potentially be, like,   transformative in a lot of different domains.  To get into this specific work, I think there's   a couple of important takeaways for researchers.  One is that by giving this impossibility result   that shows that RL with latent dynamics  is impossible without further assumptions,   we're kind of narrowing down the search space  where other researchers can look for efficient  

algorithms. The second takeaway is, you know, we  are showing that this problem becomes tractable   when you make additional assumptions. But I view  these more as, like, a proof of concept. Like,   we're kind of, showing for the first time that  it is possible to do something nontrivial,   but I think a lot more work and research  will be required in order to like, you know,   build on this and take this to something  that can lead to, like, practical algorithms.

TINGLE

Well, Dylan Foster, thank you for  joining us today to discuss your paper on   reinforcement learning under latent  dynamics. We certainly appreciate it.

FOSTER

Thanks a lot. Thanks for having me.

[MUSIC]

TINGLE

And to our listeners, thank you all for  tuning in. If you'd like to read Dylan's paper,   you may find a link at aka.ms/abstracts.  You can also find the paper on arXiv and   on the NeurIPS conference website. I'm  Amber Tingle from Microsoft Research,   and we hope you'll join us next time on Abstracts!

[MUSIC FADES]

Transcript source: Provided by creator in RSS feed: download file
For the best experience, listen in Metacast app for iOS or Android