AI Papers Podcast

PocketPod · pocketpod.app
A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and link all of them in the description for further reading. This podcast is created entirely with AI by PocketPod. Head over to https://pocketpod.app to learn more.

Episodes

Real-Time Motion Control, Next-Gen Visual Captions, 3D Scene Reconstruction Innovations

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model; Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation; GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting; SAGS: Structure-Aware 3D Gaussian Splatting; Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting...

May 09, 2024 · 12 min · Ep. 25

Model Editing Insights with Llama-3, Rethinking Large Language Models in Math, 3D Rendering and Audio Compression

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3; A Careful Examination of Large Language Model Performance on Grade School Arithmetic; Spectrally Pruned Gaussian Fields with Neural Compensation; SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound; Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge...

May 02, 2024 · 12 min · Ep. 21

Evaluating LLMs with Diverse Models, Novel Robotic Skills Framework, Editing 3D Graphics with VLMs

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models; LEGENT: Open Platform for Embodied Agents; Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations; Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting; BlenderAlchemy: Editing 3D Graphics with Vision-Language Models...

Apr 30, 2024 · 11 min · Ep. 19

Bridging the Gap to GPT-4V, Interactive 3D Generation, Accelerating LLM Inference

AI Papers Podcast for 04/26/2024. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites; Interactive3D: Create What You Want by Interactive 3D Generation; Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding; Tele-FLM Technical Report; SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension...

Apr 26, 2024 · 12 min · Ep. 17

Hyper-SD Breakthrough, MAIA's Neural Understanding, SEED-X Multimodal Innovation

AI Papers Podcast for 04/25/2024. Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis; A Multimodal Automated Interpretability Agent; SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation; MultiBooth: Towards Generating All Your Concepts in an Image from Text; Learning H-Infinity Locomotion Control...

Apr 25, 2024 · 11 min · Ep. 16

Model Efficiency, Instruction Prioritization, and Workflow Automation

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone; The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions; FlowMind: Automatic Workflow Generation with LLMs; Music Consistency Models; How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study...

Apr 23, 2024 · 12 min · Ep. 14

Adapting Diverse Controls: Ctrl-Adapter, HQ-Edit, Tango 2

AI Papers Podcast for 04/21/2024. Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model; HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing; Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization; TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models; On Speculative Decoding for Multimodal Large Language Models...

Apr 21, 2024 · 12 min · Ep. 12

Dynamic Typography, Mesh Reconstruction, and Personalized Image Generation

AI Papers Podcast for 04/20/2024. Dynamic Typography: Bringing Words to Life; Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing; MeshLRM: Large Reconstruction Model for High-Quality Mesh; MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation; EdgeFusion: On-Device Text-to-Image Generation...

Apr 20, 2024 · 11 min · Ep. 11

AI Papers for 04/19/2024: Multimodal Advancements, AI Animation, Speculative Decoding

AI Papers Podcast for 04/19/2024. BLINK: Multimodal Large Language Models Can See but Not Perceive; Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models; AniClipart: Clipart Animation with Text-to-Video Priors; TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding; OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data...

Apr 19, 2024 · 12 min · Ep. 10

AI Papers for 04/17/2024: Efficient Methods for Model Alignment and Compression

AI Papers Podcast for 04/17/2024. Learn Your Reference Model for Real Good Alignment; Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length; TransformerFAM: Feedback attention is working memory; Compression Represents Intelligence Linearly; Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video...

Apr 17, 2024 · 12 min · Ep. 7

AI Papers for 04/16/2024: Advancing Language Models for Multimodal and Long-context Learning

AI Papers Podcast for 04/16/2024. Octopus v2: On-device language model for super agent; Advancing LLM Reasoning Generalists with Preference Trees; Long-context LLMs Struggle with Long In-context Learning; LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model; Bigger is not Always Better: Scaling Properties of Latent Diffusion Models...

Apr 16, 2024 · 12 min · Ep. 6

AI Papers for 04/15/2024: Modernizing Segmentation, Analyzing CLIP, and Probing 3D Awareness in Vision Models

AI Papers Podcast for 04/15/2024. COCONut: Modernizing COCO Segmentation; Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies; On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation; Pre-training Small Base LMs with Fewer Tokens; Probing the 3D Awareness of Visual Foundation Models...

Apr 15, 2024 · 11 min · Ep. 5

AI Papers Podcast for 04/14/2024

AI Papers Podcast for 04/14/2024. Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences: https://arxiv.org/abs/2404.03715; No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance: https://arxiv.org/abs/2404.04125; AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent: https://arxiv.org/abs/2404.03648; Stream of Search (SoS): Learning to Search in Language: https://arxiv.org/ab...

Apr 14, 2024 · 11 min · Ep. 4

AI Papers Podcast for 04/13/2024

AI Papers Podcast for 04/13/2024. OmniFusion Technical Report: https://arxiv.org/abs/2404.06212; LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders: https://arxiv.org/abs/2404.05961; InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD: https://arxiv.org/abs/2404.06512; Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence: https://arxiv.org/abs/2404.05892; MiniCPM: Unveiling the Potential of Small Language M...

Apr 13, 2024 · 13 min · Ep. 3

AI Papers Podcast for 04/12/2024

AI Papers Podcast for 04/12/2024. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models: https://arxiv.org/abs/2404.07839; WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents: https://arxiv.org/abs/2404.05902; Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models: https://arxiv.org/abs/2404.07973; Best Practices and Lessons Learned on Synthetic Data for Language Models: https://arxiv.org/abs/2404.07503; HGRN2: Gated Linear RNNs wi...

Apr 12, 2024 · 11 min · Ep. 2

AI Papers Podcast for 04/11/2024

AI Papers Podcast for 04/12/2024. ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback: https://arxiv.org/abs/2404.07987; OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments: https://arxiv.org/abs/2404.07972; Rho-1: Not All Tokens Are What You Need: https://arxiv.org/abs/2404.07965; JetMoE: Reaching Llama2 Performance with 0.1M Dollars: https://arxiv.org/abs/2404.07413; Transferable and Principled Efficiency for Open-Vocabulary Segmen...

Apr 12, 2024 · 11 min · Ep. 1
Hosted on Transistor