Image and Video Segmentation with SAM 2, Gemma 2 for Efficient Language Models, Boosting Small Models with Contrastive Fine-Tuning, and MM-Vet v2 Challenges Large Multimodal Models
Aug 05, 2024 • 14 min • Ep. 67
Episode description
SAM 2: Segment Anything in Images and Videos
Gemma 2: Improving Open Language Models at a Practical Size
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
OmniParser for Pure Vision Based GUI Agent
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
