Scaling Properties of Diffusion Models for Perceptual Tasks

Daily Paper Cast

Nov 14, 2024 • 25 min • Ep. 73

Episode description

🤗 Paper Upvotes: 7 | cs.CV, cs.AI

Authors:
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik

Title:
Scaling Properties of Diffusion Models for Perceptual Tasks

Arxiv:
http://arxiv.org/abs/2411.08034v2

Abstract:
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve competitive performance to state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io .
