Transformer Models Beyond Scaling, Multilingual Image Synthesis, Advanced Text-to-Image Control
May 16, 2024 • 9 min • Ep. 28
Episode description
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Compositional Text-to-Image Generation with Dense Blob Representations
