LLM Inference Speed (Tech Deep Dive)

Oct 06, 2023 · 40 min

Episode description

In this tech talk, we dive deep into the technical specifics of LLM inference.

The big questions are: Why are LLMs slow? How can they be made faster? And might slow inference affect UX in the next generation of AI-powered software?


We jump into:

  • Is fast model inference the real moat for LLM companies?
  • What are the implications of slow model inference for the future of decentralized and edge model inference?
  • As demand rises, what will the latency/throughput tradeoff look like?
  • What innovations on the horizon might massively speed up model inference?

Thinking Machines: AI & Philosophy podcast