Beyond Encoders in Vision-Language Models, Revolutionizing Human-LLM Interaction, and Advancing Knowledge Graphs
Jul 10, 2024 • 12 min • Ep. 61
Episode description
Unveiling Encoder-Free Vision-Language Models
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
