GPU Clouds, Aggregators, and the New Economics of AI Compute

AI Engineering Podcast

Jan 27, 2026 • 46 min • Ep. 75

Episode description

Summary
In this episode I sit down with Hugo Shi, co-founder and CTO of Saturn Cloud, to map the strategic realities of sourcing and operating GPUs across clouds. Hugo breaks down today’s provider landscape—from hyperscalers to full-service GPU clouds, bare metal/concierge providers, and emerging GPU aggregators—and how to choose among them based on security posture, managed services, and cost. We explore practical layers of capability (compute, orchestration with Kubernetes/Slurm, storage, networking, and managed services), the trade-offs of portability on “Kubernetes-native” stacks, and the persistent challenge of data gravity. We also discuss current supply dynamics, the growing availability of on-demand capacity as newer chips roll out, and how AMD’s ecosystem is maturing as real competition to NVIDIA. Hugo shares patterns for separating training and inference across providers, why traditional ML is far from dead, and how usage varies wildly across domains like biotech. We close with predictions on consolidation, full‑stack experiences from GPU clouds, financial-style GPU marketplaces, and much-needed advances in reliability for long-running GPU jobs.

Announcements

Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems.
Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
Your host is Tobias Macey, and today I'm interviewing Hugo Shi about the strategic realities of sourcing GPUs in the cloud for your training and inference workloads.

Interview

Introduction
How did you get involved in machine learning?
Can you start by giving a summary of your understanding of the current market for "cloud" GPUs?
How would you characterize the customer base for the "neocloud" providers?
How is the access to the GPU compute typically mediated?
The predominant cloud providers (AWS, GCP, Azure) have gained market share by offering numerous differentiated services and ease-of-use features. What are the types of services that you might expect from a GPU provider?
The "cloud-native" ecosystem was developed with the promise of enabling workload portability, but the realities are often more complicated. What are some of the difficulties that teams encounter when trying to adapt their workloads to these different cloud providers?
What are the toolchains/frameworks/architectures that you are seeing as most effective at adapting to these different compute environments?
One of the major themes in the 2010s that worked against multi-cloud strategies was the idea of "data gravity". What are the strategies that teams are using to mitigate that tax on their workloads?
Data gravity has a more substantial impact on training workloads than on inference compute. How are you seeing teams think about the balance of cost savings vs. operational complexity for those different workloads?
What are the most interesting, innovative, or unexpected ways that you have seen teams capitalize on GPU capacity across these new providers?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on enabling teams to execute workloads on these neoclouds?
When is a "neocloud" or "GPU cloud" provider the wrong choice?
What are your predictions for the future evolutions of GPU-as-a-service as hardware availability improves and model architectures become more efficient?

Contact Info

Parting Question

From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0