LLM Inference Speed (Tech Deep Dive)
Oct 06, 2023 • 40 min
In this tech talk, we dive deep into the technical specifics of LLM inference.
The big questions are: Why is LLM inference slow? How can it be made faster? And how might slow inference affect UX in the next generation of AI-powered software?
We jump into:
- Is fast model inference the real moat for LLM companies?
- What are the implications of slow model inference for the future of decentralized and edge inference?
- As demand rises, what will the latency/throughput tradeoff look like?
- What innovations on the horizon might massively speed up model inference?