
#515: Durable Python Execution with Temporal

Aug 11, 2025 | 1 hr 11 min | Ep. 515

Episode description

See the full show notes for this episode on the website at talkpython.fm/515

Transcript

What if your code was crash-proof? That's the value prop for a framework called Temporal. Temporal is a durable execution platform that enables developers to build scalable applications without sacrificing productivity or reliability. The Temporal server executes units of application logic called workflows in a resilient manner that automatically handles intermittent failures and retries failed operations. We have Mason Egger

from Temporal on to dive into durable execution in Python. This is Talk Python To Me, episode 515, recorded June 19th, 2025. Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Mastodon where I'm @mkennedy and follow the podcast using @talkpython, both accounts over at fosstodon.org and keep up with the show and listen to over nine years of episodes at talkpython.fm. If you want to be part of our live episodes,

you can find the live streams over on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming shows. This episode is sponsored by Posit Connect from the makers of Shiny. Publish, share, and deploy all of your data projects that you're creating using Python: Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, reports, dashboards, and APIs. Posit Connect supports all of them. Try Posit Connect for free by going to

talkpython.fm/posit, P-O-S-I-T. The PyBay Conference is returning to the UCSF Mission Bay Conference Center in San Francisco, California on October 18th, 2025. Get your ticket and pick up a free conference course bundle from Talk Python. Get started at talkpython.fm/PyBay. Big announcement, we have a brand new course for all you data science types out there: Just Enough Python for Data Scientists. Data scientists get things done in notebooks,

but production quality work needs more than ad hoc scripts. Just Enough Python for Data Scientists gives you the essential Python and software engineering habits to level up your analysis without drowning in theory. In a few focused hours, you'll tighten up your core Python, write clean and reasonable functions, organize code into importable modules, track work with Git and GitHub, debug confidently, and make your results reproducible with pinned environments and

Docker. You'll also see how modern agentic AI tools can accelerate data exploration, bug detection, refactoring, and documentation. The outcome is simple. You keep your notebook speed while gaining the reliability, collaboration, and professionalism your projects deserve. Just visit talkpython.fm and click on courses in the nav bar. The link to the course is also in your podcast player show notes. And if you'd rather focus purely on building with LLMs, check out Vincent Warmerdam's

LLM building blocks for Python course we just recently released as well. Now on to that interview. Mason, welcome to Talk Python To Me. Fantastic to have you here. It's great to be here. Long time listener. Oh, wonderful. I'm definitely a fan of stuff I've seen you doing online as well. And it's super cool to get together here and, you know, share it with a couple of people. Yeah, definitely excited. Temporal. Wow, what a cool topic. Durable execution.

What a neat idea that I've seen in other places, but I've seen less of it in Python. So I'm real excited to dive into this. This is something I learned about recently. We'll go into that in a bit, but there's a lot here. Let's just leave it at this: this could be a two-hour show and we'd still be going, easy. Definitely. Yeah. I spend a lot of my time educating people about durable execution. That's what my role is. I'm one of the developer educators at Temporal.

And definitely you could spend hours on this and we could be going forever. So yeah. Yeah. And we definitely could. Well, before we do go on for hours, who are you? Who is Mason? Oh yeah. My name is Mason Egger. I am a developer educator at Temporal, as I mentioned. And I also help run the PyTexas conference. So PyTexas is one of the oldest, actually the oldest regional Python conference in the world that we know of. No one else has come to claim that spot yet.

We started in the late fall of 2007, like right after PyCon US would have started. We'll be experiencing our 20th year this upcoming year, which is exciting. And I also am the president of the PyTexas Foundation. So PyTexas has its own 501(c)(3) organization that is used to basically act as a sheltering organization for all of the Python meetups and events that go on within the state of Texas, which is really nice. So I was elected president of that back in 2022.

I've been helping run and build the Python community there ever since. And it's a lot of work, but it's a lot of fun. And I really enjoy my community work. Yeah, it sounds extremely fun, although also challenging. You say for the entire state of Texas, Texas is a big place. It's like countries as well, if you look at the size of Texas. It really is. Texas has some unique issues with it when it comes to running community stuff.

We founded our own virtual meetup for basically anyone in Texas, and anyone else can join, obviously, too. We don't say, oh, no, no Texans can join it. But we did it because the amount of feedback we would get, it's like, oh, I live in, say, the DFW area, the Dallas-Fort Worth area, and I would love to go to the meetup, but I live two hours away from the meetup because of how large the Texas cities are. Texas cities are sprawling metroplexes.

I grew up in the Houston area, and the joke when you grew up in Houston is you're always three hours away by car from every other part of Houston. And that really is the case. So it does have its own set of unique challenges, trying to coordinate four major cities across, you know, technically two time zones. If we can, if we include El Paso, the very, very edge, the very west part of Texas is in the mountain time zone. So it's an interesting bit of work.

And we've been doing a really good job of expanding and adding more and more offerings that we can offer to the community year over year. And it's been a lot of fun and I really enjoy it. It keeps me busy. That's for sure. Yeah, I could say. Is that why people drive so fast in Texas? They've got so far to go. Exactly. Yes. I mean, the fastest speed limit in the United States is on the highway next to my

house. It's 85 miles an hour and it's on the toll road next to my house because you're right. Like if you're trying to get from Austin to San Antonio, that's, I think it's about a hundred miles down. And like, if you're going 55, you'll get there in, you know, two and a half hours, but I can make it in an hour and a half, or an hour and 15 minutes, if you just let me fly. I had no idea that the speed limits were that high. I knew they were high, but not that high in Texas. Yep. Wild, you know.

You guys in Germany, come on. Yeah, it is the Southern Autobahn, basically. Fantastic. So you've had a bit of a tougher challenge, I suppose, or more challenges than a lot of people with this whole PyTexas thing, which I've always thought was super cool. I'd love to go sometime. It's far from Oregon, but still would be really lovely. That said, you started working on this around COVID time, right? Which for conferences is the kiss of death. You want to talk us through how that went?

Yeah, I could definitely talk about that for a while. So I got involved in conference organizing, as most people do, by first speaking at the conference. So I spoke at the conference in 2019. I had been a PyCon attendee prior to that. My first PyCon attendance was the last Portland one, which I believe was 2017. And then I spoke at PyTexas in 2019 and volunteered to help for the 2020 conference. And that was really interesting because, you know, we had planned to do it in person.

I think everybody was planning for in-person in 2020, you know, because, for those that don't know, when you start planning a conference, you start planning about eight to 10 months prior. So if you're a spring conference, as PyTexas is, we were planning PyTexas 2020 in about summer of 2019. No one knew what was coming. We were so innocent then. Yes. We really were. So we were planning that. And then, you know, the events that we know of happened

and we kept chasing it. We kept pushing it back. I think a lot of conferences at that time kept pushing back thinking, Oh, we'll be out of this in two weeks. We'll be out of this in three months. We ended up going virtual. And then in 2021, we made the conscious decision to just not have a conference. The virtual conference for us in 2020 wasn't overly successful. And we've, at that time, we were feeling there was just a lot of virtual conference fatigue and we didn't have a really

good story on how to like make it work and dedicate the time. Everybody was also struggling to handle the world at that time. So being able to put more resources into it was difficult. So we pushed that off and then we came back in 2022. We made a very conscious decision about it. Like we were like, we're going to come back in the safest way possible. PyCon US had announced they were coming back. We decided we were going to come back. We had, we have a venue that has

custom air filtration. We instilled mask mandates and vaccine mandates and all of that stuff. And we had 76 people return. But we knew that if we didn't come back, if we let this kind of like continue on, that the likelihood of getting the community to come back, that the memory is short, that if we didn't come back, we might lose it forever. And having run this for 18 years, we were concerned about losing it at that point. I think that's totally valid. A lot of this

conference attending is habit. You've gone a bunch of times. You really enjoy it. Like, yeah, of course I'm going to go. You don't even question like, well, when is it? I'm just going to sign up when it comes out. But you know, if you stop going, then I think that does make a big change. So it definitely does. So in '22, we did it and then we kept coming back. And then, you know, every year after that, we continued to grow. It took us three

years to get back to above what I would consider pre-pandemic normals. And the fun fact about that, and the thing that I think really helped us out, is that we didn't really start seeing growth in conference attendance again until we, as the foundation, turned our sights local and started focusing on the

local meetups, because the local meetup scene had not returned. And that was the vast majority of our marketing: we would send out these emails and stuff, or the local meetups would promote us. And these are huge meetups. I mean, some of the Python meetups are some of the oldest meetups that I know of. They started in the late aughts and the early teens. Yeah, I would say they probably were not even called

meetups when they started, right? They were user groups or something. They predate Meetup. Yeah, meetup.com and all that. Yeah, I think the PyHouston meetup group is called, like their tag on meetup.com is Python-14. So I'm assuming it was the 14th Python meetup when it was created in the world on meetup.com. So, you know, large groups, but these meetups had

gone dormant. So what I did as, as foundation president at that time was I was like, okay, I'm going to spend all of my effort while my organizers are helping get the conference going. I'm going to spend all of my effort finding organizers and rebuilding these meetups in these areas. so I would connect with people that I knew in the area. I would reach out to friends. I would put out all calls and after time of rebuilding the ecosystem, then we basically

everything came back to life. And, you know, one of the things that I have learned is that if your meetup ecosystem is not healthy, then your regional Python conference ecosystem will not be healthy, because it has to feed up into it. Yeah. Yeah. I think it's such a challenge to run a conference or podcast or whatever and draw in new people.

I mean, you can do all the amazing mailing lists and announcements and everything, but you're speaking to your existing people that know about you and finding new ones is really interesting and this cultivating like user groups to sort of be the foundation. It's a very interesting idea. Definitely. User groups work. I think we also started the PyTexas meetup, which was a virtual meetup that we do monthly that we allow anyone to join. And we cultivated

amongst our network of friends and social media and word of mouth. And then we all know what happened with the social media situation, which really did not help. I mean, there was a lot of just a lot of things that have happened with like the dissolution of Twitter really did not help the conference and the tech ecosystem. Basically, we've all fractured now. We're all in like six different places. Like the move to Mastodon is not complete. Bluesky had its moment, but I honestly

get less engagement on Bluesky than anything else. LinkedIn has surprisingly been our most successful social media platform. It seems like a lot of people have moved there for some reason or another. But basically, it means that you just have to reapply your marketing strategies. And the fun thing that I've had the benefit of is that as my work as a developer advocate, all the roles that I have done, they tend to sit in marketing. Developer advocacy tends to either sit in

product or marketing. All the roles that I have taken sit in marketing. And I've had the benefit of like, whenever this has started going weird, I can ask all of my marketing friends, hey, what would you do in this situation? How would you approach this? So I've got to learn through osmosis kind of how marketing stuff works and being able to apply that to conference organizing and meetup organizing has actually been substantially beneficial to us.

Yeah. Certainly, I think companies that are turned on enough to have developer advocates have developer intelligent marketing teams, and that's a real resource. Definitely. Yeah. It's been really useful to be able to get other people's opinions. And then, you know, just ask other conference organizers, what are they doing? I think that, you know, finding out what works and telling other people about it.

I mean, we, I haven't had a chance to write this year's blog post yet, unfortunately, for this year's conference. But whenever I became conference chair and then president, I was like, we're going to be as transparent as possible. Every misstep we make, we're going to blog about it. We talk about our budget. I mean, we're also a 501c3, so that's kind of like part of our bylaws. But at the same time, it's like this is everything that we do. You can find it on all of our websites.

And this is what worked. This is what didn't. Because there are so many first-time conference organizers who may want to start a conference who don't know how to achieve these goals or whatever they're trying to do. And we have 20 years of experience doing it. We need to help each other out and make sure that we distill this knowledge outward. I don't have 20 years of experience. I've only been doing it for four, but, you know, institutional knowledge.

There are Google Docs with years of back data that I can go and look at and be like, oh yeah, in 2009, we ordered way too many meals because we didn't charge anything for tickets. PyTexas used to be a free conference. And basically when you don't charge anything for tickets, one of the lessons that we learned is that people will just sign up for a ticket and then not show up. And then your catering gets all kind of out of whack.

So even charging just a little bit of money, like five bucks. I think we charge five dollars for our network event. It's not because I need the five dollars. I mean, we spend, I think, thirty dollars per person on the food. It's to make sure that you have a little bit of skin in the game, to make sure that you show up, so I can get an accurate headcount for the catering. So we don't blow the budget by ten thousand dollars.

Like I'm not envious of that. I certainly it's easy to just check a checkbox. Yeah, I'm interested in food. Why not? Yeah. If I come. Yeah. Exactly. This portion of Talk Python To Me is brought to you by the folks at Posit. Posit has made a huge investment in the Python community lately. Known originally for RStudio, they've been building out a suite of tools and services for Team Python. Today, I want to focus on hosting your Python-based data science workloads.

This includes dashboards, reports, plots, interactive web apps, all the way to custom Flask and Django apps. Their service is Posit Connect. Posit Connect makes it easy for data scientists to share work built with Python code. If you have a Streamlit app, Dash dashboard, Plotly interactive plot, a FastAPI service, or even a Quarto report, just give Posit Connect the code it needs to maintain the asset and Connect automatically does the rest.

Connect will manage your APIs and serve your interactive apps for you. And if you want, you can update your reports and dashboards on a scheduled basis. That's right. No need to explain to the stakeholders why that dashboard or plot stopped updating last week. You get a focus on your data science and leveraging your skill set while Connect makes you look good, keeping your code running and private.

With Connect, you get a private URL on your Connect server, ensuring that your asset is continuously available to your stakeholders. And you can control which users have access to the asset. Let Posit Connect handle the delivery and DevOps involved in sharing your work. You focus on what you do best. So if you work on a data science team, you owe it to you and your org to check out Posit Connect. Visit talkpython.fm/connect today and get a three-month free trial to see if it's a good fit.

That's talkpython.fm/connect. The link is in your podcast player's show notes. Thank you to Posit for supporting Talk Python To Me. Out in the audience, Toon Army says, yay, PyTexas Foundation. That's pretty awesome. Yeah. All right. Well, let's talk Temporal. Now, I came across this a few weeks ago. Ironically, I think this might have been on Bluesky that I came across it. Interesting. No. Okay. It was on X. There's still apparently stuff that happens on there.

I miss Twitter. I do too. And, you know, people post weird comments, like in reviews: oh, you know, they said they don't like Twitter, so they must be whatever. Like, no, it's just that it used to be so active and it's not so active. Put aside everything else: it used to be that you could have great conversations. You still can, but it's so much less than it used to be. It's an easy explanation. The signal-to-noise ratio is completely messed up now. Yes. It used to be such a much better

place because everybody was posting there and there wasn't as much noise and now the signal to noise ratio makes it almost unintelligible to be able to find anything of any use and that's the number one problem with it. And you pay for attention and there's six other places like you said. So anyway, so I found this post from, we'll work it backwards, from Pietro. It says, people aren't talking enough about how most of OpenAI's tech stack runs on Python, which I thought was a super cool post.

It comes from the Pragmatic Engineer newsletter and it talks about how most of the product's code is written in Python, uses FastAPI, C for certain parts. And then, so all that stuff made, I'm like, yep, yep. Temporal, use for asynchronous workflows. I'm like, what is Temporal? And then I think you sent me a message and said, hey, I would love to talk to you about Temporal. And I looked at him, yeah, this is super cool. So that's how I learned about you guys, which I thought was pretty neat.

And yeah, let's talk about it. What is Temporal? Yeah. So Temporal is essentially what we call a durable execution platform. And durable execution is kind of a new term, a new field within the zeitgeist, as you would call it. And we're kind of going with what we're calling crash-proof execution. And the way that we talk about it now is, essentially, it handles the failure state of your applications and ensures that your code

will continue execution regardless of the failures that it encounters. So say, for example, you have an application that's making a couple calls out to a microservice. Okay. And that microservice goes temporarily down. You as the developer would have to traditionally write a whole bunch of logic around handling that failure. Okay. So we have to detect that the failure has happened. What type of error did we receive? Now, what do we do? Do we back off and retry? Do we decide that

this failure is non-retriable and we just don't do it? Like the difference between we're not authenticated versus a 404, those are completely different failure handling modes. And then, So there's all of this kind of logic that you have to build in to handle failure. Temporal basically abstracts this away from you. So whenever you have like a function call or a method call in Python, when implemented with Temporal, you automatically get like retries, for example, by default.

So you get this like declarative policy that happens as a default policy, and it will automatically retry your application or your call until eventually it comes back online. Because let's be honest, in most distributed systems like this, those are most of the time intermittent failures. Like a microservice going offline, are you being rate limited? Those are usually fixable with a little bit of time and some retries.

Now, there are other cases where they're not, but that's like the default policy. That's the default use case or the default for retries alone. And then there's a lot of other cases. So Temporal maintains the application state of your application.
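To make the idea of a declarative retry policy concrete, here is a minimal plain-Python sketch. It is not Temporal's API: the `RetryPolicy` dataclass, its field names and defaults, and the `call_with_retries` helper are all assumptions made up for illustration. It only shows the shape of the behaviour described above: retry with exponential backoff by default, and fail immediately on errors marked as non-retryable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple, Type


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical declarative policy; names and defaults are illustrative."""
    initial_interval: float = 1.0      # seconds to wait before the first retry
    backoff_coefficient: float = 2.0   # delay multiplier after each failure
    maximum_interval: float = 60.0     # upper bound on the delay
    maximum_attempts: int = 0          # 0 means keep retrying without a limit
    non_retryable: Tuple[Type[BaseException], ...] = ()


def call_with_retries(fn: Callable[[], object], policy: RetryPolicy,
                      sleep: Callable[[float], None] = time.sleep):
    """Call fn until it succeeds, following the policy."""
    attempt = 0
    delay = policy.initial_interval
    while True:
        attempt += 1
        try:
            return fn()
        except policy.non_retryable:
            raise  # e.g. "not authenticated": retrying will not help
        except Exception:
            if policy.maximum_attempts and attempt >= policy.maximum_attempts:
                raise
            sleep(delay)
            delay = min(delay * policy.backoff_coefficient,
                        policy.maximum_interval)


class ServiceUnavailable(Exception):
    """Stands in for an intermittent failure such as a rate limit."""


calls = {"n": 0}


def flaky_service():
    # Fails twice, then comes back online.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServiceUnavailable("try again later")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_service, RetryPolicy(), sleep=delays.append)
```

The calling code stays on the happy path; what to retry and how long to wait lives in the policy object rather than in hand-written try/except blocks around every call.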

And so say you have like a Python application that has 10 steps in it and you get through step five and then the Python application were to crash for some reason: out of memory, maybe your Kubernetes pod got descheduled, something like this. In production this happens all the time. As a former SRE, this was the life that I lived for years. Yeah, even, hey, there's a security patch for the Linux server that's running this and we've got to restart

it. We have to reboot. Yeah. It's not wrong, but stuff is going on. Exactly. So now when you have to deal with that, you're like, okay, do we let everything get through and risk the attack surface? Or then you have to make the calculus. Do we just kill all the processes and then restart them? What's the cost on the restart? And then redoing all of that logic. Did they make any writes to the database? Do we have to worry about cleaning all that?

There's all of this conversation. Since Temporal maintains that application state, that becomes a lot simpler. Because, I think we kind of alluded to it, but I'll state it outright: when an application crashes, in a typical application, you have to start back over from the beginning. You basically re-execute everything you did unless you have some advanced sort of event-sourcing-style model where you

are keeping track of it. Temporal does this by default. So it maintains the state in what's called an event history of your application. And what that does is it basically creates checkpoints: every time a function executes, it stores the result of it. And if that application were to crash, it will reschedule it onto another worker. That's what they're called in Temporal.

And we'll get into that in a minute. Within your fleet, it will reconstruct the state of that application up to that point of failure, and then continue onward as if the failure had never happened. And more often than not, this happens without the developer even being aware of it. Like, no alarms will go off unless you personally set them up, because this is the design of the system. It's to ensure that in spite of failure, your application will continue executing.
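The checkpoint-and-replay idea can be sketched in a few lines of plain Python. This is a toy model, not how Temporal is implemented: here the "event history" is just a dictionary keyed by step name, whereas the conversation describes a history kept by the Temporal service. The names `run_workflow`, `WorkerCrash`, and `make_step` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


class WorkerCrash(Exception):
    """Stands in for the worker process dying mid-run."""


def run_workflow(steps: List[Tuple[str, Callable[[], Any]]],
                 history: Dict[str, Any]) -> List[Any]:
    """Run steps in order, recording each result in the history.

    On a rerun, steps that already have a recorded result are not executed
    again; their stored result is reused (the "replay" part).
    """
    results = []
    for name, step in steps:
        if name in history:
            results.append(history[name])  # replay from the recorded result
            continue
        result = step()                    # first real execution of this step
        history[name] = result             # checkpoint before moving on
        results.append(result)
    return results


executions = []               # which steps actually ran their side effects
crash_once = {"armed": True}


def make_step(n: int) -> Callable[[], int]:
    def step() -> int:
        if n == 3 and crash_once["armed"]:
            crash_once["armed"] = False
            raise WorkerCrash("worker lost during step 3")
        executions.append(n)
        return n * 10
    return step


steps = [(f"step-{n}", make_step(n)) for n in range(1, 6)]
history: Dict[str, Any] = {}  # would live outside the worker in a real system

try:
    run_workflow(steps, history)
except WorkerCrash:
    pass  # another worker picks the execution up with the same history

results = run_workflow(steps, history)
```

After the second run, steps 1 and 2 have executed exactly once each: the rerun reads their results from the history and resumes real work at step 3.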

Yeah, that's awesome. And there's a bunch of neat ways in which that happens, which we'll talk about. But I think one mental model people should maybe think about when they consider these is it centralizes a lot of the error handling and the retry code and all of that stuff. So a lot of your application becomes simpler, not more complex. Exactly. Yes. That's actually one of the main feedback things that we get from people is, you know, all of this. There's so many mechanisms.

There's so many actually like patterns that we have now built to handle all of these kind of things. Event sourcing, event-driven architecture, saga pattern, all of these different like CQRS, like all of these different distributed systems patterns that exist to handle all of these things. And Temporal basically abstracts away a lot of them and you get them basically for free out of the box with the project. And it does make your code a lot simpler.

It makes it a lot more contained so you can walk through it straight down. Basically, almost as if it's a workflow. You know, the core primitive of Temporal is called a workflow. We don't tend to like to refer to ourselves as a workflow engine, but some people will say that. And I typically don't correct them on it. If that's what it takes for people to understand what it is, fine. But durable execution is the actual phrase.

It's a meaty topic. It's kind of hard for people to wrap their heads around it. So I'm like, let's get you to understanding it and then we'll correct the little tidbits here and there. Yeah, excellent. So, you know, people are probably familiar with Hynek's stamina or with tenacity, which is interesting, where you might put a decorator onto a function and say, hey, if this fails, instead of just crashing, try it again.

Yeah. Maybe a few times, maybe with exponential back off. And this is kind of a single thread of execution type of form of durability a little bit, right? Where it's like this function now becomes somewhat more reliable based on like maybe the service comes back or maybe your network comes back or whatever it happened to be that caused this particular example. But with temporal, it's a lot broader, right? Like every step, it saves the state. If something goes wrong, it can basically resume.

I guess maybe this resumable idea is like one of the big difference, right? It's like, let's take our reboot the server example. Let's suppose we got some service that we work on that our app depends upon. We had to reboot it. Now what? It happens. Yeah. So, I mean, yeah, you have a service that our app depends on. We had to reboot it. So if the service, so we're assuming our service is on in like an external machine and it's calling out. Yeah, for some reason.

Yeah. Or maybe it's a Docker container and we rebuilt the Docker container. It takes five seconds to start. Something like that. Yeah. So our service is, you know, calling out to it. It will basically retry until that comes back online. Now, if we were to reboot our service, say our service was running and we rebooted the container that contained it, it would basically, if we're in a Kubernetes system, the Kubernetes scheduler would probably reschedule it onto another node within the pod.

And then it would reconstruct it. And what happens is, Temporal exists as a kind of orchestrator-executor model, like a service-worker model. So the service maintains the history. So what will happen is whenever that new service comes online, or whenever the new execution comes online, we've rebooted it, it got rescheduled, it comes online, it will stream the old history from the service and then reconstruct it, basically going step by step through it.

And so like the function A that was executed, the input and output values are stored in that event history. So the output value, it's like, oh, we've successfully completed execution of function A, store that in variable foo, and then continue onward. And then we can continue reconstructing that service up until it gets to the point where it's like, okay, we have no more events in the history about what happened. So now we know we've, you know, we have reconstructed successfully.

Now we continue forward and execute as if nothing had happened. Yeah. So in your non-durable execution code, you might have try, except, do something else. You might have stuff like stamina or tenacity where you're like, okay, we're going to try this again. There's a lot of that kind of code that you write. And in this world, you could just say like top to bottom, write the happy path, which is great. And then what happens is Temporal says, okay, I tried to run this, it failed.

So we'll reschedule it with a back off or something along those lines, right? So you can basically not deal with a lot of this and get visibility into the errors and the state of flow by letting the orchestrator manage that, right? Yes. And I'm covering like the top 10% of all the different features. There are so many interesting other features that Temporal provides as an ecosystem.

So like one of the other really neat ones is that we provide a really simple way for things to do a human in the loop interaction. So you can very easily send in what's called a signal. So basically sending in data into a running execution and then basically doing some processing on it. So you're waiting on a confirmed signal from someone like your application is doing something you're waiting on to confirm. you can send that in directly into the workflow.

And then that will basically be, that's persisted again within the events, within the event history. So if it crashes after that confirmation, that confirmation is stored as well. So you have that, you have the ability to do long running schedules. So there's some cron syntax, like what are called schedules in Temporal. There's so many different features in Temporal that are just really neat and can solve a lot of the problems that you're trying to do. And scaling becomes super easy as well.
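As a rough illustration of the human-in-the-loop pattern, the sketch below uses plain `asyncio` rather than the Temporal SDK. The `OrderWorkflow` class, its `signal_confirm` method, and the in-memory `history` list are assumptions for illustration only. It shows a running execution that pauses until a confirmation arrives, with the signal recorded in the history before the workflow acts on it.

```python
import asyncio
from typing import List, Tuple


class OrderWorkflow:
    """Toy workflow that waits for a human confirmation signal."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, str]] = []  # stand-in for an event history
        self._confirmed = asyncio.Event()

    def signal_confirm(self, who: str) -> None:
        # Record the signal first, so the confirmation is part of the history
        # even if the process were to stop right after this call.
        self.history.append(("signal:confirm", who))
        self._confirmed.set()

    async def run(self) -> str:
        self.history.append(("started", ""))
        await self._confirmed.wait()  # blocks until someone sends the signal
        approver = next(v for k, v in self.history if k == "signal:confirm")
        self.history.append(("completed", approver))
        return f"confirmed by {approver}"


async def main() -> Tuple[str, List[Tuple[str, str]]]:
    wf = OrderWorkflow()
    task = asyncio.create_task(wf.run())
    await asyncio.sleep(0)                     # let the workflow reach its wait
    wf.signal_confirm("reviewer@example.com")  # data sent into the running execution
    outcome = await task
    return outcome, wf.history


outcome, events = asyncio.run(main())
```

The workflow reads the approver back out of the history rather than from a local variable, which mirrors the point made above: once the confirmation is recorded, a later crash does not lose it.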

So like you want to scale, just add more workers to the fleet. That's the easiest thing you can do is just add more of these workers and they basically can be executed across all of your different fleets. So the scaling story is super awesome as well. This portion of Talk Python To Me is brought to you by PyBay. PyBay is an annual conference gathering of Pythonistas put on by the Bay Area Python Association.

This year it's returning to the UCSF Mission Bay Conference Center in San Francisco, California. It's a one-day conference on October 18th, 2025. I've spoken there previously and had a great time attending. And there's a bonus. Talk Python is sponsoring the conference. Every attendee gets a special conference bundle of paid courses for free as a conference gift. Plus, we'll be giving away a complete set of training materials for a dev team of some lucky attendees.

So if you want to connect with the Python people of San Francisco and go home with courses from Talk Python, check out PyBay. Please use our link. It's talkpython.fm/PyBay. The link is in your podcast player show notes. Thanks to PyBay for supporting the show. We'll get into it. There's definitely some examples of large scale use cases, you know, high frequency use cases of these things.

But going back to the timing, you know, what if your user onboarding looks more like, I guess you could be real simple. You could say in order to create a user account, I have to first have them create and enter their username and email or their email and their password. And then I'm going to send them an email and they got to take an action based on that,

right? That would be pretty common, like a long running type of thing you could consider. But for some of these systems, it's like, and prove who you are and upload a document that's a picture of your ID, and somebody will look at it and go, yeah, that looks real enough. And they check a box or, you know, those kinds of much longer onboarding things. Something like that could be modeled with Temporal pretty easily, it sounds like.

Yeah. So long-running workflows are the other, that was the feature I couldn't remember, which was timers. And I have no idea why that left my mind. It's one of my favorite features. Long-running workflows are one of the amazing use cases of Temporal.

So because everything is maintained in that state, and basically crashing doesn't really matter because we can just reconstruct the state, you can have workflows, or you can have your executions, that can last for days, weeks, years. This is kind of what's known as the entity workflow pattern.

So essentially like a user who's going through an onboarding process, you know, like you just said. I think the identity workflow process is actually one of our exact sample applications. So, you know, like you're right, they sign up and they have to upload some forms of ID. Someone has to check it. It has to go through maybe a background check process and all of that. That's a long running workflow. That could take days.

That could take weeks, depending on what kind of background check you're getting done. Temporal can suspend its state for however long it wants, you know, and we can guarantee that it will come back online. So the interesting thing, if you ever see Temporal at a booth at a conference, we were at PyCon this year, for example.

Our code sample on our booth has a really interesting statement that always catches people's eye, and it's a little mini workflow, and it's sending emails. And what it does is it says sleep for 30 days. And nobody in their right mind would actually write a sleep for 30 days in code and expect it to actually function. 100% works exactly the way you would expect it to in Temporal. And we can guarantee that it works because of the way that Temporal is architected.

Those timers basically exist on the server side, on the Temporal service side. And basically your workflow just gets scheduled to resume after the timer has fired. So you can guarantee that long running workflows will complete exactly the way that you expect them to. So, yeah, long running workflows, amazing use case for Temporal. Yeah, it sounds incredible. Now, I do want to dive actually in. It's super interesting how y'all made this happen in Python.
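To make the "sleep for 30 days" point concrete, here is a toy model in plain Python, not Temporal's implementation. It assumes a hypothetical `ToyTimerService` that stores each timer as data (a fire time plus a workflow id) and, when the clock passes that time, puts a resume task on a queue. Nothing sleeps; the names and the in-memory storage are illustrative assumptions.

```python
import heapq
from datetime import datetime, timedelta


class ToyTimerService:
    """Toy model: timers are data held by the service, not a sleeping thread."""

    def __init__(self, now):
        self.now = now
        self._timers = []       # heap of (fire_at, workflow_id)
        self.task_queue = []    # resume tasks waiting for a worker

    def start_timer(self, workflow_id, duration):
        heapq.heappush(self._timers, (self.now + duration, workflow_id))

    def advance(self, delta):
        """Move the clock forward and enqueue a resume task for each fired timer."""
        self.now += delta
        while self._timers and self._timers[0][0] <= self.now:
            _, workflow_id = heapq.heappop(self._timers)
            self.task_queue.append(workflow_id)


service = ToyTimerService(now=datetime(2025, 6, 19))
service.start_timer("send-reminder-email", timedelta(days=30))

service.advance(timedelta(days=29))
pending_after_29_days = list(service.task_queue)  # nothing to resume yet

service.advance(timedelta(days=1))
pending_after_30_days = list(service.task_queue)  # the workflow is due to resume
```

Because the timer is just a stored record, whatever process started it can crash or restart in the meantime without losing the wake-up.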

But I do want to just maybe talk about the scale. Like if I were to run Temporal, now you guys have a cloud. So I guess stepping just a bit back. This is MIT licensed. Yes. But you also have pricing. Yes. Before we go into what I was going to talk about, let's talk about that difference there, that contrast. Yeah. What's the story? So Temporal is 100% open source. MIT licensed. And so there's the temporal service and there's the temporal SDKs.

Every single one of our Temporal SDKs, and the service and everything, is MIT licensed, forever and always. Our founders are big, big fans of that. The only SDK that is not is the Java SDK, which is Apache 2. And if you know anything about open source licensing, there is basically a dependency further up the tree that was Apache 2. And you're not allowed to downgrade licensing if you ever make a derivative. So that's the only reason that one is.

but every one of our other licenses is MIT licensed. So you could run Temporal open source and be fine. The thing that we have found is that at scale, the service itself is challenging and requires large SRE teams to run. Because essentially what we're doing is we're offering distributed systems as a service. We're offering reliability as a service. Just because we have abstracted these problems away from someone does not mean that the price does not have to be paid by someone.

And I'm talking about the metaphorical price, not the dollar price. someone still has to ensure that that database stays up essentially and that your things are still getting scheduled and that the network is up and all of these. So it's super complex to do that. And you can run it. So you can run the temporal service locally. You can run the temporal workers locally. Everything you run is still local. The pricing model for temporal is just the temporal

service part, which is run in Temporal Cloud. So there's a weird misnomer around cloud, which is like cloud always assumes that we run everything for you. Temporal Cloud is different. Temporal Cloud only runs the Temporal service. Your workers, where your code executes, are always run, at least for now, until who knows if another product will ever come out, I don't know,

by you, in your data center, on your machines. So your code, your execution, run by you. That service, that brain, the orchestrator, that's the part you could pay Temporal Cloud for. Right. That's the part that does the orchestration, the part that handles the failure and retries and that kind of stuff, right? Yeah. Well, technically, the failure and retry handling is done by the state

machines that are built in the SDK. It's the part that basically maintains the event history. It maintains the communication mechanisms and it is the orchestrator. Yeah. So, but. So if we use the cloud, I could like reboot my local data center machine or even my local data

center. I mean, my version of the cloud, you know, wherever, DigitalOcean or whatever. And when it comes back, it'll sort of resume along, like you guys will see that it's back and then start running work on it, something like that? That's what I was getting at. Technically, yes. So fun fact about Temporal. Again, the architecture of this is so brilliant, and we could get so deep in the weeds about it. The Temporal service does not actually know about any of the

workers that are running. It's always a call-out model. So your machines would come back online, know that they have things they need to do, and they would basically start listening again. Everything that happens in Temporal listens across task queues. So they would all come back online. They would start listening again. And then they would see that there's still work in the task queues, which the service maintains. But all the service does is go, oh, someone requested that I do something.
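The pull model described here, where the service only maintains task queues and workers come to it, can be sketched in a few lines of plain Python. `ToyService` and `ToyWorker` are hypothetical names for illustration, not Temporal classes, and the in-process `deque` stands in for a network call from the worker to the service.

```python
from collections import deque
from typing import Optional


class ToyService:
    """Toy orchestrator: it only holds task queues and never contacts workers."""

    def __init__(self):
        self._queues = {}

    def schedule(self, queue: str, task: str) -> None:
        self._queues.setdefault(queue, deque()).append(task)

    def poll(self, queue: str) -> Optional[str]:
        """Called by a worker over its own outbound connection."""
        pending = self._queues.get(queue)
        return pending.popleft() if pending else None


class ToyWorker:
    def __init__(self, name: str, service: ToyService, queue: str):
        self.name, self.service, self.queue = name, service, queue
        self.done = []

    def drain(self) -> None:
        # Keep asking the service for work until the queue is empty.
        while (task := self.service.poll(self.queue)) is not None:
            self.done.append(task)


service = ToyService()
service.schedule("onboarding", "verify-id")
service.schedule("onboarding", "send-welcome-email")

# No worker is online yet; the tasks simply wait on the queue.
worker = ToyWorker("worker-after-reboot", service, "onboarding")
worker.drain()
```

In this shape a rebooted machine needs no re-registration: it starts polling again and finds whatever work accumulated while it was away.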

And then it puts it on the task queue. And then the workers handle it. The true magic of Temporal lies within the state machines that are built within the SDKs. But they cannot function without that orchestrator service. I remember when I first started working on the courses. like my primary role is writing courses for Temporal. And if you go on to our Temporal Learn site, you can find a lot of our, I think it's learn.temporal.io. You can find a lot of our courses.

I wrote the Java and the Python ones. I remember when I was first writing them, I kept asking the SDK engineers, like, well, how does the server do this? How does the server do this? And one of the engineers was like, Mason, you're giving the server too much credit. It's not that smart. It's really the state machines within the workers that are doing all the heavy lifting, and the service just maintains it. But, the long way around, what the cloud service does is it maintains that database.

So the history, without that event history, that is the magic piece. The event history is the single source of truth of all things that have happened, and that's what the service provides and maintains. So whether you use the cloud version of that or you self-host that, that is up to you. Now, again, if you self-host it, you are now responsible for maintaining it. You are now responsible for upgrading it, security patches, all of that. And there are multiple case studies.

There are multiple YouTube videos on our channel of people who have gone the self-hosted route and have found that the cloud route is easier once they've reached a certain scale. So yeah, it's pretty neat. - Yeah, yeah, cool. More and more, that's the business side of what works for open source, right? - Yeah. - 'Cause there's something that you wanna do, and then there's another requirement that you're probably not good at, and you guys would be really good at, right?

Like for example, making sure that Temporal itself is reliable and fast and keeps running and that kind of thing, right? - Exactly, yes. And I think it does also come down to, and this is one of the interesting lessons that I've learned throughout my career, is that like, what is your core business? Is your core business to run and maintain a temporal service or is your core business to provide a service to your customers?

And whenever the price of running your own services outweighs what you could have just paid someone else to do it for, then at that point, you have to take a look at something and go, maybe I should just pay the provider who has the expertise, who literally does this all day long. Because usually someone's SRE team doesn't dedicate full time to learning the ins and outs of a single product. Like I was an SRE. I managed a cloud platform for Vrbo.

I knew there was like 12 different applications in that stack. And the way that I learned about each one was the one that went down that day. So you're constantly learning about it as it's on fire. It's a terrible way to learn. And that's not a great way to live your life,

I found. Right, that's why I moved into developer education. Exactly, there are fewer midnight calls in developer education. That is the primary reason why I do this now. There is a point when PagerDuty goes off one too many times and you become a developer advocate, and I'm living proof of that. You just know. Yeah, you take it out Office Space style. You just take it out, put on some gangster music. Exactly right, exactly. If you don't know the reference, you need to make

sure you put watching Office Space on your list. Yeah, the movie, right? I've driven by the area where it was filmed. It was filmed out here in Austin where I live. Was it? Incredible. Yes. Incredible. Okay, so why did I sort of start on that path? What I want to talk about is, so you guys have a cloud version, which obviously does stuff at massive scale. You talk about a fleet of these things. And by the way, at the execution level, we're probably mostly talking Go and Rust,

just so people know, it's kind of like fast, fast innards. But Rakesh asks, like, it seems to be resource intensive. Is there a lightweight option? Like what is a minimal Temporal rather than, you know, what Uber does or what, you know, ChatGPT does? Because those things, you just can't compare them, right? It's like, you're not Microsoft, you're not Google, you're not LinkedIn, you're not Netflix. Don't try to be them, mostly. Yeah, it depends on what you're

trying to accomplish. So I mean, like, I think when it comes to so there's, there's always there's the service and then there's the worker fleet. So let's talk. I'll talk about the service first, and then we'll talk about the workers. So the service comes in a handful of different flavored offerings. There's a local development binary that is meant to be used for development. And that's what I use on my local laptop all the time. It's also more than suitable for home lab stuff.

So if you're wanting to play with this in your home lab, use that single binary. It's got an in-memory data store, but you can have it persist to a SQLite database, which will get you very, very far on home lab stuff. Non-prod use cases, totally fine.

SQLite is underappreciated. People see it as like a toy thing, or like, I can use this while I'm developing a sample, but then I don't really, like, you can put millions of records in

SQLite. You can go a very long way. Oh yeah. Yeah, it is an amazing tool. It's one of my favorite tools. And then, so in reality, the way that the Temporal service is actually built is it's actually just a single binary. But the thing is, the Temporal service is a conglomerate. When I say the service, I mean like all of the things together. It's a conglomerate of multiple microservices that, when you put them

together, they interact with each other. And there's like four back-end services, no, three services and a front-end service. Then there's like a web UI, and that's the service. But then you also have to build in data storage. So when you get to that point, that's when you either need MySQL, Postgres, or Cassandra. You could probably get away with SQLite on that one, but I've never actually tried it. But we recommend MySQL, Postgres, or Cassandra.

And then you can add in other things like Elastic for visibility. You can add Prometheus and Grafana for being able to see. But this is when you're starting to scale up. So if you were doing this on small, itty-bitty scale, you could probably deploy it onto, say, a DigitalOcean droplet and be fine with a single binary. We have tutorials on our learn site on exactly how to do this. And then it scales. So like there's a Docker Compose file.

So like once you want to see what all these microservices are doing, like in multiple areas, like you can see how we've deployed all of them and you can play it. There's multiple different Docker Compose files to be like, oh, I want it with this database and these options. And that allows you to kind of like tune it and tweak it to your liking. And then once you get to prod scale, we have Helm charts that we support and you can deploy it directly into Kubernetes.

Now, if you're self-hosting in production, Kubernetes tends to be what we see in the wild as the most popular deployment. Now, again, these are single binaries. You can deploy it however you want. So it has a, from development to prod, it has a very nice scaling story. Or the other option is just go to Temporal Cloud. The thing is Temporal Cloud has a minimum pricing structure for it, which we're constantly updating.

And that's really useful once you get to actual production use cases and you have people paying you. You use that money to pay for Temporal Cloud. It's not a lot. The Temporal Cloud pricing is, I think, really interestingly well done, because it's based on what we call actions. So like anything that basically causes a write to that durable database, you get billed on, like fractions of a penny. So it's consumption based.

The main thing is making sure you have enough traffic to cover, basically there's a minimum support cost that you have to pay for like a minimum bill requirement and then you get billed on the actions and making sure you have enough traffic and workload to handle that. I know, and then you get a lot of other cool features in cloud that, I have to make sure I say this carefully, that you don't get on the open source version, but those are all deployment options.

So when it comes to a feature-to-feature comparison, as of right now, it's one-to-one. What runs in open source is what is deployed in cloud. Like, I built all of my stuff on the open source version. I deploy to cloud on occasion to test it, but I'm constantly building on the open source version, and all of my stuff runs in cloud. What you get when you go to cloud is you get those production features. You get single sign-on. You get role-based access control.

You get more advanced metrics and things of that nature. You get all of these things that really matter at a large-scale production, things that large enterprises care about. And there are ways you can get those in the open source as well, but they come as first-class citizens in the cloud product.

Maybe you've got to run your own single sign-on identity server or something like that. Yeah. Yeah. So over here, if I go to the Temporal GitHub repo, there's docker, which has got a Docker Compose something. Yeah, there's actually a Docker repo. Is there? Okay. So I think it would be, in the org, I would search for Docker Compose. Yeah, it might be there somewhere.

There we go. You're right. Yeah, I'll put that in the links as well. I'm a huge fan of Docker Compose. I think it opens up a bunch of great options. And if you go in here, you can see that there's like the Docker Compose for MySQL, the Docker Compose for Postgres, and then just a bare bones one, right? So if you wanted to self host it, is this a pretty good way to go? Yeah, I think it's a great place to start. I think it always depends on your scale.

Like if you are good at Docker Compose, and you know, the thing with Docker Compose is, and I believe this is all going to deploy on a single server. So like we always talk about, you know, it depends on the level of reliability you need of the service, because if the database of the service goes away, then your reliability goes away. But that's kind of the truth here, with that database being the single source

of truth, if that database magically disappears, you don't have it. Now what you could do is use like an RDS or like a DigitalOcean managed database and connect that in this. Yeah, just basically set a remote, an off-server database connection string. Yeah, exactly. Right. Of any form. Then it's matching the reliability of that

database. Yeah, exactly. Yeah. Yeah. So another thing, whenever I hear durable, and actually I see a bit about this in the show notes as well, and you say retry, like if something goes wrong, we retry it and it'll probably resume. Like sometimes it won't. Sometimes there's what's called a poison message. Like it's cursed to always fail for whatever reason. Like it's trying to get to a service. The domain is gone or it's not coming back. Right. Yep. One of those

sorts of things. Three months ago, it was there. Then you said sleep for three months and it woke up and said, wait, it's like Planet of the Apes. You're like, what have they done? You wake up, you're like, the world is not what I thought, right? How do you deal with that kind of stuff? Yeah. So the default retry policy, its default is to just retry, on a specific time limit, forever, until it's either canceled or it succeeds. Now, if you expect that

something like that could happen, essentially what you would do is, there's something known as non-retriable error types. So certain HTTP error codes, you know, you might be like, hey, this is just not going to come back. 500 is fine, but maybe 404, like 404 might never come back, but 500 very likely could. Or cannot connect, I don't know, is that

404? I don't think so. I think it does. I don't remember. Yeah. I remember, but there's probably a code that's like, I can't connect to the server, versus like, I got there and it said not here. Yeah. So there's non-retriable error codes. And I mean, what we do with the core Temporal primitives is more like, I often tell people, things that can retry, let them retry. But say I do a divide by zero error, no amount of retries is going to change the laws of physics.

You know, it's like holding your breath until something changes. You're just going to pass out. Same thing with these retries. So you have to have what are called non-retriable errors, and you essentially say, hey, whenever you experience this error, you should just fail. And basically, you would bubble it up, and then you have to let the next layer of execution handle it. So yeah, totally a good story for that. Okay. Yeah. But you kind of got to think a little bit

about it, right? Just yes. Yes. You still have to think it doesn't take away all the thinking for when it comes to like, what potentially could go wrong, but at least like with all of like the weird, like, you know, okay, this service might go down DNS, someone messed with DNS. It's always DNS. I had I had an it's always DNS moment the other day. And I'm like, I need a sticker for my laptop says this because it got me again. I can't remember what took me out.
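The non-retriable error idea from a moment ago can be sketched as a plain Python retry helper. This is a toy, not Temporal's retry policy API: `run_with_retries`, `NotFoundError`, and every parameter name are assumptions made for illustration, and the `max_attempts` cap is added only so the example terminates (the default described in the conversation retries until success or cancellation).

```python
import time


class NotFoundError(Exception):
    """Hypothetical error for a resource that is gone for good."""


def run_with_retries(fn, *, non_retryable=(), max_attempts=5,
                     initial_delay=0.01, backoff=2.0):
    """Retry fn with exponential backoff; fail fast on non-retryable error types."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except non_retryable:
            raise                      # bubble up to the next layer immediately
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= backoff


calls = {"flaky": 0, "missing": 0}


def flaky():
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("service temporarily down")
    return "ok"


def missing():
    calls["missing"] += 1
    raise NotFoundError("resource no longer exists")


flaky_result = run_with_retries(flaky, non_retryable=(NotFoundError,))

try:
    run_with_retries(missing, non_retryable=(NotFoundError,))
    missing_outcome = "succeeded"
except NotFoundError:
    missing_outcome = "failed fast"
```

The transient failure is retried until it succeeds, while the permanent one is attempted once and handed to the caller, which is the "let the next layer handle it" behavior.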

I suffered something with that as well. And it wasn't even directly me, but it was something that I was subjected to. Yeah. It is always DNS. It's always DNS. Except for when it's the database. Yeah. Or except for when you accidentally deploy the walrus operator on Python 3.7 on the server. That also is not good. And it's not doing, which I took down Talk Python for 20 minutes. I'm like, what? Why won't it run? It was running perfect.

Oh, this was like, you know, years ago when those things were like new, right? Yeah. Oh my goodness. So one thing I want to talk about here, let me get back to, that's the right spot. Let's talk about the programming execution model. Let's see, is there a quick start? Let's maybe, let's talk through the quick start. And I think there's just some super interesting modern Python ideas that you got here, right? So maybe talk us through like, how do we get started bringing this in?

Does it have to be a from scratch? Or can I take some part of my application, like a particular API endpoint and go, this thing needs, needs some help. And Temporal can help it. Let's just plug it in on this one bit. That's actually how we recommend a lot of people get started with Temporal is like, we don't, we never tell people like do like a whole ground up rewrite of like everything you have.

I've often told people, find a service that annoys you, that pages you because it's constantly going down because it's unreliable, and maybe, if it's a small service, do a small rewrite of that in Temporal and just see how it makes a difference in your life. So the way that you do it with Temporal is you have to use the Temporal SDKs. So the way that you build Temporal applications is you build them using SDKs.

Typically with other workflow-esque engines or other durable execution engines, some of the more modern durable execution engines have kind of followed suit with us, but some of the earlier ones didn't. We're code-based. We're not DAG based, we're not YAML based, we don't have our own structured DSL. We are 100% code based. And there's a lot of advantages to that. So not XML based. No, no XML. So yeah, you build basically what's called a workflow. A workflow, you can kind

of think of a workflow is like your application. It's it's the it's the blueprint of your entire application. And what we'd say about workflows is that they have to be deterministic, like the code within a workflow must be deterministic, and you execute it line by line going down. And then anything within your workflow that potentially could be non-deterministic or could potentially fail. So calling out to a microservice is both of those. It's non-deterministic

because you don't know what you're going to get back. And it's potentially going to fail because the service could be down. That goes in what's called an activity. An activity is that thing that automatically gets the retries. Now you implement activities as functions or as methods in Python. You can actually do them as both. And as you were saying in the readme, yeah, we use a lot of Python tooling. So that's actually something that I think that our SDK engineers, we have an entire

team whose entire job is to build and maintain our SDKs. They're super proud of it. It's one of the things I love talking about. Temporal is not built off of what you would call an OpenAPI spec, for a good reason. So it's built basically, I mean, Temporal itself is built in Go. So as you can imagine, everything's built in protobufs. There is a spec on what we build off of, but OpenAPI specs generate stub client libraries. And like I've worked with a handful of them. They don't,

they're not very idiomatic to the language. Like it kind of looks like someone bolted a C library on top of Python. Like it works, but it doesn't feel like Python. Our SDK engineers spend months studying the programming language, learning what is idiomatic to the language, what actually makes sense. And then they build into it. So the interesting thing about this is, you can see here, when we define our workflows, we define them by decorating classes. And then we mark

the entry point by decorating it with @workflow.run. So that's how we know where the entry point is. We're using asyncio inside of this to do it. Our activities, when you want to turn a method or a function into an activity, you just decorate it with @activity.defn. And now you have an activity. We've done a lot of those kinds of things. We're using context

managers a lot, or, you know, rigorously, within the code base. The really interesting thing is, and I can only talk a teensy bit about this because it's super complex, but there's a really great talk on it from the PyTexas conference this year. We built a custom async event loop for this. This runs in a durable asynchronous event loop. And the SDK engineer who built it gave a talk at PyTexas this year, and it's on the PyTexas 2025 website. And I can provide a link to

that later. And it's really neat because essentially we had to build a custom event loop to handle all of the way that we expect Temporal to work. And I don't think that building a custom event loop is a very common thing for people to do. So there's a lot of lessons learned there. I think there should be more of it though. I think it's a really interesting story. Yeah, I'll put this in the show notes. Yeah, that's Chad's talk. It's amazing.

So like he talks about all the things that he had to do when he was building out a custom event loop for Temporal's asyncio event loop. Yeah, so for people for whom this is super new, this idea. You need places in your code to execute to say, okay, we can stop, maybe save some state. And then you talked about like replaying behaviors and so on. So you can write code like await sleep a day, you know, asyncio.sleep, one day. And that goes into this, as you said, durable asyncio

execution. So instead of just saying, well, we're going to let the thread do other stuff. It's like, no, save it and then resume it. Right. That's pretty wild. It's pretty great actually. Yeah. And I mean, those, those, and that's one of the great things about the workflows and like that, those you'd have to definitely write as workflows, but yeah. And it takes up no, no energy or it doesn't jam up a CPU thread by sleeping.
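The "save it and then resume it" behavior rests on replaying recorded events. A toy version in plain Python, under the assumption that each activity result is appended to a history and that re-running the workflow code reads results back instead of re-executing them. `ReplayContext`, `order_workflow`, and the two activity functions are hypothetical names; Temporal's real state machines are considerably more involved.

```python
class ReplayContext:
    """Toy event-sourced context: activity results are recorded, then replayed."""

    def __init__(self, history=None):
        self.history = list(history or [])  # recorded (activity_name, result) events
        self._cursor = 0
        self.executed = []                  # activities actually run in this attempt

    def activity(self, name, fn, *args):
        if self._cursor < len(self.history):
            recorded_name, result = self.history[self._cursor]
            # Deterministic workflow code must request the same steps in the same order.
            if recorded_name != name:
                raise RuntimeError("non-deterministic workflow code")
        else:
            result = fn(*args)
            self.history.append((name, result))
            self.executed.append(name)
        self._cursor += 1
        return result


def charge_card(user):
    return f"charged:{user}"


def send_receipt(user):
    return f"emailed:{user}"


def order_workflow(ctx, user, crash_after_charge=False):
    charge = ctx.activity("charge_card", charge_card, user)
    if crash_after_charge:
        raise RuntimeError("worker crashed")
    receipt = ctx.activity("send_receipt", send_receipt, user)
    return [charge, receipt]


first = ReplayContext()
try:
    order_workflow(first, "ada", crash_after_charge=True)
except RuntimeError:
    pass  # in this toy, the history outlives the crashed attempt

second = ReplayContext(history=first.history)
result = order_workflow(second, "ada")
```

On the second attempt the card is not charged again: the recorded result is returned during replay and only the step that never completed is executed. This is also why the workflow code has to be deterministic, since replay depends on it asking for the same steps in the same order.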

Like usually if you sleep something, that thread is kind of sitting there and it's stuck. This, a hundred percent, because it's all event sourced under the hood, essentially. Basically the timer-started event gets recorded into the service, and then it gets descheduled. And then that worker, that executor, can continue performing other tasks. And then once that timer fires, the next task for it to continue execution gets put on the task queue.

The worker consumes it, knows, oh, I need to resume. And it resumes. And that can happen whenever. That can happen, you know, a day from now, three days from now, three months from now. It doesn't matter. Eventually it gets put on the queue and it gets executed as if nothing happened. Yeah, that's wild. So interesting question out of the audience. If people went with your cloud option, where does that run? Is that AWS? Is that DigitalOcean? Currently it's in AWS and GCP.

Okay. So those are the two clouds. Pick different availability zones. Like if I were in Virginia, US East one or whatever, I could say I want to use that one. Exactly. There's different availability zones and we even have multi-region failover. So if you needed multi-region availability for super, super high availability, which we do have definitely have large customers who need that level of availability. And you can totally do that. Cool. And that doesn't mean you got to be in AWS.

No, no. We have people who are running on their own private, they were running on their own infrastructure on their on-prem and they're calling into it. So that's the niftiest thing about temporal cloud is like the security model is super simple because temporal cloud or the temporal service. It doesn't have to be temporal cloud. It's the service, whether you self-host it or whether it's on the cloud. It never connects to you.

The workers always do an outbound connection into the temporal service. So if you're using temporal cloud, the only firewall rule you have to allow is a single outbound connection off of a single port. That's it. So it's really awesome. Yeah, I'm impressed with a lot of this. So let's sort of kind of get into that. And I want to maybe close out with two topics, testing and what's next, or like sort of also maybe multi-language stuff we should touch on as well. But let's talk testing first.

So there's a couple of interesting challenges here. Like first, if I write code like this and I want to have a unit test, no longer am I like, well, this is complicated. I just have to use pytest-asyncio. It's like more than that, right? So what's the testing story? Yeah, so the cool thing about the testing story with this is it is technically still pytest-asyncio.

Because we're code-based, this is one of the things that I always have to harp on, to remind people in my courses, and everyone's always like, oh, that's so cool. Because we're code, you have to give up none of your tooling. Like, how do you like to package it? Are you a Poetry person? Are you a uv person? Are you a Pipenv person? Whatever you want to use, use it. For your testing, do you want to use Poe the Poet? Are you a pytest person? Like, what are you using?

Use it. It doesn't matter. Temporal does provide, obviously, because these are more complex workflows and stuff, they require a little bit more. Every single Temporal SDK does provide a testing framework that is built into it. And these integrate natively with all of your testing frameworks. So you can use, you use

pytest. I use pytest. Now you could use something else, but all of our courses, I think the temporal 102 course, yeah, I wrote that one, has a whole chapter on testing temporal workflows and activities and mocking activities because you have to be able to call them independently because you, Otherwise that's an integration test. It's not a unit test. It is important. I always feel bad about mocking the code. It doesn't seem nice, but you got to. Yeah, it's super easy. It's so nice.

You basically just redefine the activity and put it in. The mocking story in Python is so nice. But you just use our testing framework. And what that does is, the testing framework will automatically spin up a Temporal service and a worker for you. Because as you can imagine, the execution part of this is like, you have to have a service running, you have to have a worker, and then you have to basically use a client

to send a request to execute the workflow and it will be executed. The Temporal testing service does all of that for you. And it also has basically time skipping. So, you know, if I have to test something that sleeps for 30 days, I would actually prefer it to not actually sleep for 30 days so I can test it. Well, CI would take so long. Yeah. So

time skipping is in there and there's a lot of other really neat features in the testing. So every single Temporal Python, or sorry, every single Temporal SDK has a testing framework built into it that really enables testing well. And it all works natively with the tools that you are already used to using as a developer of whatever language you're already using. Yeah, that sounds great. And you can say things like run this and wait for a week and then resume.
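Time skipping can be pictured as a virtual clock that moves forward instantly whenever the code under test asks to sleep. The sketch below uses only asyncio and is not the Temporal test environment's API: `SkippingClock` and `trial_reminder_workflow` are hypothetical names made up for illustration.

```python
import asyncio
from datetime import timedelta
from typing import Tuple


class SkippingClock:
    """Hypothetical virtual clock: sleeping advances time without waiting."""

    def __init__(self) -> None:
        self.now = timedelta(0)

    async def sleep(self, duration: timedelta) -> None:
        self.now += duration    # jump the virtual clock forward
        await asyncio.sleep(0)  # yield to the event loop, but do not block


async def trial_reminder_workflow(clock: SkippingClock) -> str:
    """Workflow that waits 30 days before sending a reminder."""
    await clock.sleep(timedelta(days=30))
    return "reminder sent"


def run_trial_test() -> Tuple[str, int]:
    clock = SkippingClock()
    result = asyncio.run(trial_reminder_workflow(clock))
    # The workflow "slept" for 30 days, but the test returned immediately.
    return result, clock.now.days
```

The test can then assert both on the workflow's result and on how much virtual time elapsed, without CI waiting a month.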

And you can just say, and now a week has passed. Now I see what's happened, right? Like it'll just zip right ahead. Yep. Skips right ahead. Doesn't even worry about it. That's always a fun testing trick. So the other one I want to talk about is when I'm over here in the GitHub repo, I can scroll down. If I look at the repositories, it's Java SDK, Python SDK, Go SDK, .NET SDK, Ruby SDK, et cetera, et cetera. There's a bunch of interesting things here.

Like I can write part of my app in Java, part of the workflow in Java or the workflow items in Java, the queues, and I can do part of it in Python or .NET. And the other part I think is interesting is Python, .NET, Ruby, et cetera, all share a common Rust base. And so it's kind of like they all go in lockstep. Yeah. So yeah, two great topics there. So the first one I'll start off with is the

polyglot stuff, because I think it's one of my favorite things. And I don't ever get to talk about it enough. So I'm glad you asked. Underneath the hood, the way that all of this communication happens is essentially via protobufs across task queues to the Temporal service, back and forth. One of the things that you'll find if you dig into Temporal is that we require the inputs and outputs of your functions to be serializable to basically

protobufs. Now, if you have something that's not serializable, then obviously, because it's code, we provide you a way to extend the serializer. So as long as you can serialize it, again, it's code, you can do whatever you want with it. But because of that, because everything speaks protobuf, all of these languages can natively speak to each other. So you're right, I can write a workflow in Python and have three different activities in that workflow,

one written in TypeScript, one written in Java and one written in .NET. And I can call them seamlessly, basically just by calling execute activity and giving it the name of the function. And then I could actually still pass in a data class of the data that I have as that parameter, because it's still getting serialized down into a protobuf. It will get deserialized into the other language, execute, and pass back data that I can resume with. And then I could call it

technically from a client that's written in Go. So this enables polyglot across all of these languages. And it's amazing. So if you have legacy systems or you have stuff where you really need something to be written in a whole bunch of different languages, it just gives you this out of the box for free. And I think it's one of the neatest features. And one of the other parts that's really neat about this is it's not just the fact that it can call it, but it also

preserves the stack traces as well. So one of the other courses that I developed is our Crafting an Error Handling Strategy course. There's a demo in there where I am showing basically that exact workflow, a Go client that's calling a Java workflow that's calling a Python activity. So three different languages. And then I intentionally throw an error in the Python

activity and I tell it, do not handle it, do not retry it, let it bubble up. And when I get back to the Go client, I can see the different stack traces in the different languages all the way through. So I get basically a Go panic that contains a Java stack trace that contains a Python stack trace. And I can see all of this across the line.
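The reason activities can be called across languages is that only serialized data and a function name cross the wire. The sketch below illustrates that idea with JSON standing in for the protobuf payloads described above; it is an assumption-laden illustration, not Temporal's data converter. `OrderInput`, `encode_call`, `decode_call`, and the activity name `ChargeCard` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass
class OrderInput:
    order_id: str
    amount_cents: int


def encode_call(activity_name: str, arg: OrderInput) -> bytes:
    """Serialize an activity call into language-neutral bytes (JSON here)."""
    message = {"activity": activity_name, "input": asdict(arg)}
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode_call(raw: bytes) -> Tuple[str, Dict[str, Any]]:
    """What a worker in any language would do: parse bytes, dispatch by name."""
    message = json.loads(raw.decode("utf-8"))
    return message["activity"], message["input"]
```

Nothing in the encoded bytes is Python-specific, so a worker written in Java, TypeScript, or .NET could decode the same message into its own types and run the matching function.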

And also, just to do it for fun, because I like showing off, I have all of these workers running on different machines. So I am crossing process boundaries, I'm crossing literally across network IP boundaries, and then I'm crossing language boundaries, and it happens seamlessly and you'd never know that it happened. So the orchestration layer is the wildest thing ever. And then you asked about the Rust SDK. That's a fun one.
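The nested failure described in that demo, a client error containing a workflow error containing an activity error, resembles ordinary exception chaining. Here is a single-process sketch using Python's `raise ... from ...`; the class names and the `non_retryable` flag are hypothetical, and `python_activity` and `java_workflow` are just stand-ins for the layers in the demo, not code from the course.

```python
from typing import List


class ActivityError(Exception):
    """Hypothetical failure raised inside an activity."""

    def __init__(self, message: str, non_retryable: bool = False) -> None:
        super().__init__(message)
        self.non_retryable = non_retryable


class WorkflowError(Exception):
    """Hypothetical failure reported by the workflow layer."""


def python_activity() -> None:
    # Fail on purpose and mark the failure as not worth retrying.
    raise ActivityError("card declined", non_retryable=True)


def java_workflow() -> None:
    try:
        python_activity()
    except ActivityError as err:
        # Do not handle or retry: wrap it and let it bubble up with its cause.
        raise WorkflowError("workflow failed") from err


def failure_chain(exc: BaseException) -> List[str]:
    """Walk the __cause__ links, outermost failure first."""
    chain: List[str] = []
    current = exc
    while current is not None:
        chain.append(f"{type(current).__name__}: {current}")
        current = current.__cause__
    return chain


def run_client() -> List[str]:
    try:
        java_workflow()
    except WorkflowError as err:
        return failure_chain(err)
    return []
```

The caller at the top sees every layer's failure in order, which is the single-language analogue of reading a Go, Java, and Python trace in one place.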

So that kind of goes back into the history a little bit of how Temporal was built. And for a crash course in this within two minutes or less, essentially our founders started at AWS together and built the foundations for SQS and what would become Simple Workflow Service. Then one of the founders left, went to Azure and helped build the Azure Durable Task Framework at Microsoft. They met up back together at Uber and built Uber's Cadence.

Cadence was then basically like battle tested for four years, open sourced, and then they got permission to fork it and build Temporal. So Temporal, the company is six years old, but it was a four year old open source project prior. So it's a 10 year old open source project, essentially. But because of that, what happened at Cadence was I think they wrote the Go and the Java SDKs there. So those are very uniquely themselves because they were written independently.

And then the PHP SDK is its own story. Someone wrote that in the community because they really wanted it. And it kind of follows its own rules. But when they started building the Temporal SDKs, TypeScript was the first one and then Python, if I remember correctly. They wanted a common core, rather than basically re-implementing this. Because in these SDKs, there are very complex state machines that maintain all of this state of what's going on.

And they did not want to keep re-implementing this every single time. So they built a Rust core SDK, or it's not even an SDK. It's just the core. And so the TypeScript, the .NET, the Python, and the Ruby SDKs all have an upper-level SDK that wraps this Rust core and calls into it. So they share a common theme. So there definitely will sometimes be features or things that happen in the Go or the Java SDK that you're like,

that's a little different, because those are not based on Rust core. But yeah, that's how they all call in. So, like, PyO3 is basically being used here. We're calling with PyO3 into a Rust binding. And that's like Pydantic and others. Yeah, exactly. So it's really cool. And that makes adding new SDKs a lot easier, because really and truly the hardest part of building the SDKs was those state machines, which used to take

a long time. And once they got it figured out on the Rust core side, it made adding new languages easier. The Ruby SDK is in public preview and will be going generally available here soon. And there may be one or two more SDKs coming out in the future. If you guessed really hard, you could figure out what it is. There's a core that doesn't have an SDK. There's no secret about that. Yeah, of course.

People have been begging for that for years and it's obvious. So yeah, may involve crates. Okay. So what's, what is the, what's the future? Like anything that's worth giving a shout out that's coming or that kind of stuff? Yeah. I mean, I think that like a lot of times people often ask me like, what is Temporal used for? And I would say Temporal is used for anything that like you don't want your code to fail.

It's really interesting to help educate people and work on a product that really does affect nearly every single part of the software development lifecycle and every single part of the industry. You know, I was used to working on other products that, like, yeah, I worked at a synthetic data company for a little bit. And that had a very niche area.

And then I worked at DigitalOcean, which was cloud products, which is still awesome, but doesn't affect everything. Temporal really does. Like, are you doing finance? Temporal is great for long running transactions. Are you doing food delivery? Are you in the fast food industry? Are you doing groceries? What are you doing? You can benefit from Temporal. So there's a lot of really cool things. You can use it for anything.

And what we're seeing right now, specifically, and this kind of alludes back to your OpenAI thing earlier, is that we're seeing a lot of AI companies get value out of this, because when it becomes time to take your agents to production, there's a handful of decent production stories out there, but it turns out AI agents in production is microservices in production with a fancy label on top of it. These are just distributed

systems. Not only are they microservices, they're very slow, long running microservices, which makes it harder. Yeah. That's exactly what Temporal's model is. Do you have a slow running microservice that can sometimes fail? Great. We have a product that makes all those problems go away. So, you know, I'm working on a lot of content right now around showing the benefit of Temporal in AI. And we have a handful of customers, who I

can't talk about, that are using us for lots of different AI related things. But, I mean, you can look on our blog or anything. You can see tons of people that are using it. And it's a really cool thing. So I would definitely say, if you're trying to take AI to production, you should be looking into Temporal. It's not an AI tool, you know. We're not going to do the thing that every company did;

we're not going to suddenly pivot and become an AI tool, because we're not. We just, yeah, we solve everything. And AI is one of the great things we solve. So that's awesome. Yeah. You're not going to vibe code with Temporal. Maybe you vibe code Temporal code, but not with Temporal.

No, I've actually vibe coded a handful of Temporal things. And it's interesting because I'm super picky about what people's Temporal code looks like, because I've been teaching people best practices for almost three years now, and I'm like, no, Claude, that's wrong. No, no, no, Cursor, that's wrong. You can't do that. And the interesting thing about that is the way that I'm looking at it: it's like, oh, I need to make

more content about this, because the LLMs are not, like, it's actually funny, every now and then the LLMs spit out some of my content, and I can tell when it's my content because I write comments in a very particular way. And I'm like, oh, okay. So what that ends up telling me is, oh, I need to make more content around this because we're still not vibe coding at 100% capacity. Yeah. That's a whole discussion we could go down.

You know, I was totally wrong when we started. I said we could talk for two hours. I think it's three to four. Yeah. We got so much more we could talk about, but there's only so much time we can dedicate to each episode. So let's go ahead and call it. I say, you know, thanks for being here. The more I looked into this, this is a super interesting product. And there's a lot of neat Python integrations like you program it with async and await rather than some funky SDK bolt-on thing.

So people should definitely check it out. Final call to action. They're interested. What do you tell them? Just check out the website. Check out temporal.io or the learn site, learn.temporal.io. It's a great place to get started. You can install Temporal by just running brew install temporal on your Mac, or there's commands for Windows and Linux as well. Curl commands for that. Or docker compose up. Or docker compose up. If you want to do that, totally can do that.

Just try it out, build a workflow. And then what I tell people is try to break it. Like start a workflow, kill the worker, bring it back online. Like I think it's really magical when you first actually try to like actually break the software and stuff. We've had multiple people that have taken jobs here who have said, I started playing with it. I tried to break it. And when I couldn't, I decided to apply for a job here. So try to break it. See what you can do. And you'll be amazed by it.
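Why killing the worker does not lose progress can be illustrated with a toy replay loop. This is a heavily simplified sketch of the general idea of recording activity results and replaying them after a restart, not Temporal's actual implementation; `History`, `WorkerCrash`, and `checkout_workflow` are hypothetical names, and the in-memory list stands in for state that the service would persist.

```python
from typing import Callable, List, Tuple


class WorkerCrash(Exception):
    """Simulates the worker process being killed mid-workflow."""


class History:
    """Hypothetical stand-in for the durable event history kept by the service."""

    def __init__(self) -> None:
        self.results: List[str] = []
        self.executions = 0  # how many times an activity body really ran

    def run_activity(self, step: int, fn: Callable[[], str]) -> str:
        # Replay: if this step was already recorded, reuse the stored result.
        if step < len(self.results):
            return self.results[step]
        self.executions += 1
        result = fn()
        self.results.append(result)
        return result


def checkout_workflow(history: History, crash: bool) -> str:
    cart = history.run_activity(0, lambda: "cart-123")
    if crash:
        raise WorkerCrash()  # worker dies after the first activity completes
    return history.run_activity(1, lambda: f"paid:{cart}")


def demo() -> Tuple[str, int]:
    history = History()
    try:
        checkout_workflow(history, crash=True)
    except WorkerCrash:
        pass  # the history survives even though the worker did not
    result = checkout_workflow(history, crash=False)  # a new worker replays
    return result, history.executions
```

In this toy run the workflow function executes twice, but each activity body runs only once: the second attempt picks up the recorded cart result and continues from there.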

Just a personal anecdote. I remember when I applied here, I was reading through their docs. And I told myself, I was like, if they can do half of the things they claim they can do in the docs, this is revolutionary. I've never seen anything like this. And it turns out we do all the things we say in our docs. It's probably the most interesting tech product I've ever worked with in my career. And I know that I will be working with it probably for the rest of my career.

It fascinates me and I love playing with it. Like I build temporal applications at home for fun just because it's like, oh, look, I don't have to worry about someone's API going down anymore. Yay. Yeah. It's awesome. So I hope you enjoy it as much as I do. I'm pretty impressed. All right. Well, thanks for being on the show. Thanks for coming on and sharing everything. Yeah, it was great. Great to talk with you. Yeah, you as well. Bye bye. This has been another episode of Talk Python To Me.

Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. This episode is sponsored by Posit Connect from the makers of Shiny. Publish, share, and deploy all of your data projects that you're creating using Python. Streamlit, Dash, Shiny, Bokeh, FastAPI, Flask, Quarto, Reports, Dashboards, and APIs. Posit Connect supports all of them. Try Posit Connect for free by going to talkpython.fm/posit, P-O-S-I-T.

The PyBay Conference is returning to the UCSF Mission Bay Conference Center in San Francisco, California on October 18th, 2025. Get your ticket and pick up a free conference course bundle from Talk Python. Get started at talkpython.fm/pybay. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight.

Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days.

If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.

Transcript source: Provided by creator in RSS feed