Scaling Properties of Diffusion Models for Perceptual Tasks

Nov 14, 2024 • 25 min • Ep. 73

Episode description

🤗 Paper Upvotes: 7 | cs.CV, cs.AI

Authors:
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik

Title:
Scaling Properties of Diffusion Models for Perceptual Tasks

Arxiv:
http://arxiv.org/abs/2411.08034v2

Abstract:
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io .
