
LLM Inference Speed (Tech Deep Dive)

Oct 06, 2023 · 40 min

Episode description

In this tech talk, we dive deep into the technical specifics of LLM inference.

The big questions: Why are LLMs slow? How can they be made faster? And how might slow inference affect UX in the next generation of AI-powered software?

We jump into:

  • Is fast model inference the real moat for LLM companies?
  • What are the implications of slow model inference on the future of decentralized and edge model inference?
  • As demand rises, what will the latency/throughput tradeoff look like? (a toy sketch follows this list)
  • What innovations on the horizon might massively speed up model inference?
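As background for the latency/throughput question, here is a toy back-of-the-envelope sketch of how batching trades per-request latency for aggregate throughput in a decode-bound LLM server. This is not from the episode; every constant below is a made-up assumption for illustration only.

```python
# Toy latency/throughput model for batched LLM decoding.
# All constants are illustrative assumptions, not measurements.

STEP_OVERHEAD_S = 0.010   # assumed fixed cost per decode step (weight loads, kernel launches)
PER_SEQ_COST_S = 0.002    # assumed marginal cost per extra sequence in the batch
TOKENS_PER_REQUEST = 200  # assumed output length per request

for batch_size in (1, 4, 16, 64):
    step_time = STEP_OVERHEAD_S + PER_SEQ_COST_S * batch_size  # time for one decode step
    latency = step_time * TOKENS_PER_REQUEST                   # wall-clock time per request
    throughput = batch_size / step_time                        # tokens/sec across the batch
    print(f"batch={batch_size:3d}  latency={latency:6.1f}s  throughput={throughput:7.1f} tok/s")
```

Under these assumptions, decoding is dominated by the fixed per-step cost (streaming the weights), so growing the batch raises throughput almost for free at first, while per-request latency climbs roughly linearly; that is the tradeoff providers face as demand rises.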