OMG-LLaVA: Unifying Vision and Language Understanding, Step-DPO for LLMs Mathematical Reasoning, MUMU's Multimodal Image Generation
Jul 02, 2024•12 min•Ep. 57
Episode description
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Simulating Classroom Education with LLM-Empowered Agents
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
