Vision-Language Models, Arithmetic Transformers, Next-Gen Video Editing:
May 29, 2024 • 10 min • Ep. 37
Episode description
An Introduction to Vision-Language Modeling
Transformers Can Do Arithmetic with the Right Embeddings
Matryoshka Multimodal Models
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
Zamba: A Compact 7B SSM Hybrid Model
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
