Transformer Models Beyond Scaling, Multilingual Image Synthesis, Advanced Text-to-Image Control

AI Papers Podcast

May 16, 2024 • 9 min • Ep. 28

Episode description

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Compositional Text-to-Image Generation with Dense Blob Representations
