
#512: Building a JIT Compiler for CPython

Jul 02, 2025 · 1 hr 8 min · Ep. 512

Episode description

Do you like to dive into the details and intricacies of how Python executes and how we can optimize it? Well, do I have an episode for you. We welcome back Brandt Bucher to give us an update on the upcoming JIT compiler for Python and why it differs from JITs for languages such as C# and Java.

Episode sponsors

Posit
Talk Python Courses

Links from the show

Brandt Bucher: github.com/brandtbucher

PyCon Talk: What they don't tell you about building a JIT compiler for CPython: youtube.com
Specializing, Adaptive Interpreter Episode: talkpython.fm
Watch this episode on YouTube: youtube.com
Episode #512 deep-dive: talkpython.fm/512
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Transcript

Do you like to dive into the details and intricacies of how Python executes and how we can optimize it? Well, do I have an episode for you. We welcome back Brandt Bucher to give us an update on the upcoming JIT compiler for Python and why it differs from JITs for languages such as C# and Java. This is Talk Python To Me, episode 512, recorded May 27th, 2025. Welcome

to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both accounts over at fosstodon.org, and keep up with the show and listen to over nine years of episodes at talkpython.fm. If you want to be part of our live episodes, you can find the live streams over on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube and get notified about upcoming shows. This episode is brought to you by Sentry. Don't let those errors go unnoticed. Use Sentry like we do here at Talk Python.

Sign up at talkpython.fm/sentry. Brandt, welcome back to Talk Python To Me. Great to have you here. Thanks for having me again. Yeah, nice to see you. Believe it or not, we're going to talk about Python performance again. That'll be fun. Yeah. You've been doing a lot

with it lately. It's awesome. Yes, quite a bit, quite a bit. You've been working on the faster CPython initiative, but especially you've been working on basically having Python, the interpreter, rewrite instructions to make it faster. And previously you talked about, you were on the show to talk about the specializing adaptive interpreter. And now we're going to talk about a JIT. So maybe before we dive into it, give us a quick introduction on who you are, what you do, and then we can sort of set the stage for that.

Sure, sure. So my name is Brandt Bucher. I am a Python core developer. I've been working on Python in some capacity for the last six years or so. I have a smattering of PEPs from just over that time period. Everything from adding union operators to dictionaries, to changing the grammar for decorators, to adding structural pattern matching, and now kind of building towards JIT compilation, amongst other things.

And so for the last four years, I've been doing that full time at Microsoft, where I've been part of the Faster CPython project over there. And basically for the last two years, as part of that work, I've been mostly focused on landing a JIT compiler, a just-in-time compiler, in CPython, the reference implementation of Python that most people are actually using when they type python on the command line. Yeah. Awesome.

Well, thanks for all that work because Python's gotten a lot faster. Yes. Yeah. Over the last four years, it's gotten something like 50% faster overall. And a lot of that, like our goal is to be totally transparent with that. And so like you don't need to make any code changes for your code to speed up. The idea is you just upgrade your version of Python. Nothing breaks, ideally. And you're just able to run faster and pay less for your cloud bill or do more or whatever.

Buy a smaller server, something like that. Yep, exactly. And so that's been like really, really cool to see that playing out. And I mean, that 50% is just kind of like an average number. But like, for example, and I had mentioned this at PyCon where I just gave a talk, but for example, real world workloads like Pylint, for example, the same version of Pylint runs over 100% faster on newer versions of Python than it did like four years ago, which is really, really cool to see.

Yeah, that's awesome. And that's not necessarily because they changed anything. It's just. Nope. In fact, we pin the exact same version of Pylint in the benchmark just because we want to avoid having them change stuff and then us trying to figure out what went wrong with our benchmark. Yeah, yeah. Did it go slower because they added this feature or whatever? Yeah. Yeah, exactly. So 50% faster on average obviously depends on your workload. But I think that that is a big change.

You're talking CPU. What about memory? Do some of these features come at trading memory for CPU time or things like that? A bit of memory. I'm fuzzy on the exact numbers. But when we first added the specializing adaptive interpreter, which was how we got a nice 25% boost in 3.11, that changed the size of the bytecode slightly. And so we used inline caches and basically some scratch space inside of the bytecode itself, which did increase memory a bit.

But that's more of a fixed cost per function that you have. Basically, every function gets a little bit bigger, but it's not like we're attaching all sorts of metadata to your actual data. So for smaller programs that just run a few functions and don't have much data, maybe you'll see a proportionally larger increase. But if you're actually working with large amounts of data where memory pressure is actually a concern, you probably won't notice anything at all. Yeah, exactly.

A lot of that stuff ends up in C, some C layer anyway, right? Especially on the data science side. But you're not doing something like the PyObject, the C thing itself, right? It's not like now they've got a ref count, now it's got some other flag on every object you create. Well, so, I mean, we do take kind of a holistic approach to how we're improving performance. And so it's not just things in the interpreter itself. A lot of it is changing the representation of objects and stuff.

So if you're asking if we're making breaking changes to C extensions, that's not the case. But we have changed sort of how your Python objects are implemented under the hood pretty substantially. So like, for example, in recent versions of Python, and this was actually the free-threaded team at Meta that made some of these changes, which is pretty cool, but they just do less ref counting in the interpreter.

If we can prove that we don't need to increment the reference count and decrement it all over the place, then that work just doesn't happen. But kind of more substantially, we've also made changes to the way that instance dictionaries work. So in older versions of Python, every object actually had a full dictionary hanging off of it. You find that dunder dict, right? Exactly.

Yep. And so kind of one thing that is fairly obvious to most Python programmers is that that dunder dict isn't actually accessed all that often. In fact, it's very rarely accessed. And so what we do in recent versions of Python is that rather than creating that dictionary, we do create the full dictionary if you ask for it. But in the case where you don't ask for it, we literally just put the values of the dictionary in the object itself. So it's very similar to how slots work.

So most Python objects, if you don't actually ask for the dunder dict, you basically get a slotted instance, which is really neat. It's great for memory and it's great for all sorts of optimizations that we're doing under the hood as well. And, okay, so what happens if for a while you don't access the dunder dict and then all of a sudden you do? Does it get dynamically generated and you pay a little price for that?

Yes, but we still, so, I mean, we could get really into the weeds with this because I love geeking out about this stuff and the implementation details and stuff. Basically, when you do ask for that dictionary, it will create a dictionary, but the dictionary will just be two pointers. One is to the keys of the dictionary, which are on the class and are shared by all instances. And another pointer, which is to the values of the dictionary, which still live on the object in those slots.

And so even when you ask for the dictionary, it's a very lightweight thing. And then it's only once you start actually like messing with the dictionary. Like, for example, if you add a ton of attributes and there's not space on the instance anymore, or if you start adding like non-string keys or weird things like that, then we actually materialize a full dictionary and copy everything over.

But we try very, very hard to avoid actually creating dictionaries, because it's a very heavy thing for something that should be as light as instance attributes. Yeah, it makes a massive difference whether you have slots or not, actually. And if you have a million entries of a class with five attributes, you don't want to have the names of those attributes repeated a million times, just the data. Exactly. Like some of the memory savings are massive.
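To make that slots comparison concrete, here's a minimal sketch you can run on any recent CPython (not code from the show; the exact byte counts are implementation details and vary by version and platform):

```python
import sys

class Plain:
    def __init__(self):
        self.x = 1
        self.y = 2

class Slotted:
    __slots__ = ("x", "y")

    def __init__(self):
        self.x = 1
        self.y = 2

p, s = Plain(), Slotted()
print(p.__dict__)              # materialized lazily when you ask for it: {'x': 1, 'y': 2}
print(hasattr(s, "__dict__"))  # False: slotted instances never get a per-instance dict
# Rough per-instance footprint; exact numbers are implementation details.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__), "vs", sys.getsizeof(s))
```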

And like I said, it just makes things easier for us under the hood because we don't need to chase pointers around just to look up an attribute out of a dictionary, which is a very common thing to do. Yeah, it's very common. Yes. And another thing that, you know, I'm sure some people listening, I know that you know, but one of the other considerations is sort of cache behaviors like L1, L2 cache behaviors of accessing data.

And the more spread out it is, the more you start breaking the cache locality of things. And yeah, it's not great. Yeah, exactly. Before, if we wanted to get something out of the dictionary, we needed to follow a pointer to the object and follow a pointer to its dictionary, then follow a pointer to the values and follow a pointer to the thing that we're getting out.

So now it's just, we already have the object, get an offset into its values and fish out the object and increment its reference count. That's awesome. When did that come out? I think that was 3.12-ish, something like that. Mark Shannon gave a talk at PyCon, I think it was last year, talking about some of these improvements, both memory and, like you were saying, cache locality and all that. Excellent. There's many little roads and paths we could take to go into the weeds and the details.

I could talk about this stuff for 24 hours if you have the time. Maybe we'll do a 24-hour live stream on JIT sometime, but maybe not just today. I know we're going to get into it, but just sort of looking forward. What's the memory story with the JIT stuff? Yeah, we've actually... Significant or is it pretty cheap?

It's pretty cheap because, like I said, a lot of the cost that you're paying is proportional to the amount of code that you're executing, not proportional to the amount of data that you're executing it on. So again, a lot of the situations where you have this sort of memory pressure are going to be when you're working with large amounts of data. And with that, you probably won't notice it too much.

For our benchmark suite, which has a high ratio of code to data, meaning very little data but a lot of code, in our experience the overhead is something in the ballpark of 1% to 2%. It used to be higher, like 5% to 10%-ish, but one of the core devs, Savannah Ostrowski, or Savannah Bailey now, she actually just got married, worked on getting that number down at the last Python core dev sprint.

And so that's something we've been keeping an eye on, but it's not too huge of a concern right now just because it's in that kind of negligible zone. Probably as we start doing smarter things in the JIT where we need to actually keep a lot of metadata around to be able to reconstruct the world when we exit the JIT, that's where kind of a lot of that memory overhead can come from. But I'm not too concerned about it right now and users shouldn't be either, at least at this stage.

Okay, I'm resisting going into the details for that one. But we're going to be back to it. It's hard, yeah. Yeah, it's amazing. It's so interesting. And I am, just before we get into the details, I am so excited for a JIT for Python. I think this is fantastic. It opens up so many optimizations and interesting performance boosts. Yeah, yeah, you can do a lot of cool things. I mean, optimizing a dynamic language like Python is just sort of lying about what you're doing, right?

Like you're just avoiding doing things that the user won't notice. like these dynamic type checks and reference counting and not actually creating dictionaries behind people's backs and stuff. And so I think that JIT compiler gives us even more opportunities to leverage those sorts of cheating when we're executing your code. And so, yeah, it's something that we're excited about too. Plus, it's just really fun to work on. I do this because I love it. It's really interesting to me.

And so the fact that other people get to benefit from our work is really cool, to say the least. Yeah, for sure. We'll come back to it in a second. But I think one thing I just want to ask you about, since I know you were intimately involved with it, worked really closely with the team, is there's been some reorgs and layoffs and certain things at Microsoft that impacted some of the key folks working on faster CPython.

I don't want to necessarily dive into all their details. I'll let those lie, right? That's their situation. And people have talked about it in public, so whoever wants to know can figure it out. But I do want to just ask, like, what is the status or the future of the Faster CPython project? Does that mean it's frozen in time where it was last couple weeks ago? Or what's the story?

Yeah. So just on the layoffs themselves, like, as you probably heard, like, my team was very, very heavily affected by the kind of across the board layoffs that happened at Microsoft on the Tuesday before PyCon. And again, I won't discuss individual names or situations because it probably isn't the right forum. But me speaking personally, I mean, this sucked, like all layoffs sucked, but this one hit particularly hard because it was like the majority of the team that I work

on. And it also just kind of represented something more because what we were working on was really special and giving back to the community and all of that. And we had a lot of momentum, which makes it kind of tough to see that go. Kind of zooming out, though, like part of the reason it sucks so much is because it was a really cool opportunity.

And it's something really, really special that you don't get to see that much, which is large companies funding open source development and funding it in a way where everyone actually gets to benefit from the work. Right. Like Microsoft took a chance. And at one point, I mean, they're paying seven full time engineers to work on this stuff, which is not like a negligible amount of resources. Right. So one thing I want to emphasize is like, yeah, like I don't agree with the decision.

to, you know, impact our team the way they did. And obviously, like, these are all very, very smart, talented people. So if you get the opportunity to hire them, you absolutely should, because they're some of the best Python and C programmers on the planet. But I don't want the takeaway to be that after the dust has settled on this, that Microsoft is some sort of villain, because, like, they're one of the only companies that are doing this sort of work at this scale.

And I want more companies doing that, not fewer. And so I think it's important that we encourage teams like ours to exist and not dwell on kind of the negative things that can happen after a lot of the work is completed. Yeah, I agree with that for sure. I mean, there's tons of other big tech companies and others who didn't fund an initiative at all. And Microsoft did for three years and it's made a huge difference, right? And I don't know, what do you feel about momentum?

You know, one of the challenges with these things is what it takes just to start making progress or to refactor things to make it possible. It's just so much work. I know a lot of people have tried to make some of these changes just in their spare time and hardly gotten anywhere because it's so involved, right? There's been a team working on this for a while and still is to some degree, right? Just a little bit less.

Yeah. I mean, one really important thing about our team was we proved that the model works. Like 50% faster in four years is absolutely something to be proud of. And that's like a real impact that can be immediately felt, right?

And one thing that I think is important about funding these teams and funding full-time open source development, which we've been seeing more of over the past couple of years, whether it's developers-in-residence or the work that Meta is doing on free threading, is that it allows us to take on larger projects that otherwise wouldn't really be feasible for one volunteer or a small team of volunteers, right?

So being able to plan and execute on the timescale of months and years in kind of a coordinated effort is a really great opportunity. And I think we took full advantage of that. And one thing that I think is equally important is not making these big changes in a way where they can only be maintained by that team at that large company.

One thing that I think our team did really, really well is that we took the time to kind of weigh our priorities and make sure that whatever we came up with could be maintained by the community. Maybe it could only be developed and initially executed by a team of full-time core developers with the resources of a large company.

But going forward, everything that we've done is absolutely able to be maintained by both the volunteers that we are collaborating with and new people who are approaching it for the first time.

So we're already having discussions about sort of how to have community stewardship of the FasterCPython project going forward. And I mean, at the Python core developer sprints at PyCon, we're seeing a lot of people who have never worked at Microsoft helping to land new optimizations in the JIT and things like that. So we do have a lot of momentum. Obviously, this hurts it, but I think it is good to make sure that the community can sustain what we've built going forward. And I think we did a good job of that.

Yeah, excellent. Well, that's good to hear. From what I've seen on discuss.python.org or whatever the URL is, it seems like people just sort of figure out how to reorganize and keep going. It's not like, well, that was that. It's just, well, now how do we keep going, but with different structures and supports? Exactly. Yeah. Okay. Good to hear. So looking forward to Python being faster still. This portion of Talk Python To Me is brought to you by Sentry.

Over at Talk Python, Sentry has been incredibly valuable for tracking down errors in our web apps, our mobile apps, and other code that we run. I've told you the story how more than once I've learned that a user was encountering a bug through Sentry and then fixed the bug and let them know it was fixed before they contacted me. That's pretty incredible. Let me walk you through the few simple steps that you need to add error monitoring and distributed tracing to your Python web app.

Let's imagine we have a Flask app with a React front-end, and we want to make sure there are no errors during the checkout process for some e-commerce page. I don't know about you, but anytime money and payments are involved, I always get a little nervous writing code. We start by simply instrumenting the checkout flow. To do that, you enable distributed tracing and error monitoring in both your Flask backend and your React front-end.

Next, we want to make sure that you have enough context that the front-end and back-end actions can be correlated into a single request. So we enrich a Sentry span with data context. In your React checkout.jsx, you'd wrap the submit handler in a Sentry start span call. Then it's time to see the request live in a dashboard. We build a real-time Sentry dashboard.

You spin up one using span metrics to track key attributes like cart size, checkout duration, and so on, giving you one pane for both performance and error data. That's it. When an error happens, you open the error on Sentry and you get end-to-end request data and error tracebacks to easily spot what's going on. If your app and customers matter to you, you definitely want to set up Sentry like we have here at Talk Python.

Visit talkpython.fm/sentry and use the code TALKPYTHON, all caps, just one word. That's talkpython.fm/sentry, code TALKPYTHON. Thank you to Sentry for supporting the show. What do you think? I mean, I know you can't say exactly probably, but the JIT stuff is in, it's in beta format now, right? I mean, it's somewhat baked. Yeah, so basically in 3.13, you had the option of compiling Python with a JIT compiler. And the JIT compiler supports kind of all the most popular platforms.

So Windows, macOS, Linux, on ARM, Intel, kind of. If you're using consumer hardware, then it works for you. And so in 3.13, if you were compiling your own Python, you had the option to actually compile the JIT as well when you're building it. And actually, a couple of different downstream distributors already started building the JIT, but just off by default. So I think that included Fedora for their 3.13 builds, and I think uv was doing it as well for everything except macOS.

And so basically, if you set the PYTHON_JIT environment variable with any of those builds, it will go from disabled by default to enabled and you can try it out. In 3.14, basically, we got the JIT in a place where we felt it was stable enough and kind of ready for wider testing that the official macOS and Windows release binaries now include the JIT, also disabled by default.

So if you go to python.org and you download 3.14 for either of those platforms, then basically you can set the PYTHON_JIT environment variable and test it out for yourself. Again, I wouldn't necessarily use it in production, but we are interested in getting the feedback, whether it speeds up your code dramatically or slightly or no change, or even makes it slower or leads to memory bloat or whatever like that. Okay. Yeah. Very interesting. So off by default, how do I turn it on?

Just the environment variable. So if you set the PYTHON_JIT environment variable, the JIT will be enabled and it'll do its thing. Set it to what? True? One? Oh, yeah. I think it works with anything, but I would just set it to one. Anything that's not a zero, I think. I would just set it to one. I think we check for one. So I would set it to one, but I haven't actually, I forget. Yeah. I'm going to have to look at the code. Yeah. So let's talk about how it works.
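As a concrete illustration of what was just described (not from the episode), here's one way to try it from Python itself, assuming your build includes the experimental JIT, such as the python.org 3.14 installers for Windows and macOS:

```python
# Launch a child interpreter with PYTHON_JIT=1 set in its environment.
import os
import subprocess
import sys

env = dict(os.environ, PYTHON_JIT="1")   # "1" enables the JIT, "0" leaves it off
subprocess.run(
    [sys.executable, "-c", "import os; print('PYTHON_JIT =', os.environ.get('PYTHON_JIT'))"],
    env=env,
    check=True,
)
```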

But before we do, I can tell that the audience is ready to go into the weeds with us. All right. So Black LLM asks, Python integers are relatively slow. Does this affect JIT performance when working with integers or maybe reverse it? Like, how does the JIT affect integer performance and those kinds of things? And was asking, like, how does this work with PyObjects and so on? Yeah. So kind of there were two questions that flashed across the screen there.

One of them was about tagging pointers with reference counts, and then the other was about integers. So the integer question is kind of hinting at this optimizations that a lot of JIT compilers and VMs do called integer unboxing, which is the idea that instead of having a full-size arbitrary precision Python integer object that's reference counted and heap allocated, you instead, in the place of that pointer, you just store a single 64-bit value or 63-bit value.

And so that isn't something that the JIT does currently. It is actually something that in 3.14 we do in limited situations in the interpreter. Is that with a specialized interpreter? No, this is actually completely different. So basically in 3.14, Mark Shannon wanted to basically sort of prepare people, especially people who are poking around the internals, that we want to do integer unboxing in the future.

And so there are some situations where an exception is raised and we have basically a line number integer on the stack. And in those situations, what we're doing now is we're putting an unboxed integer. So it's kind of very low impact, like just making sure that, okay, this works and it won't break too much stuff when we start doing this kind of more widely.

But that is something that we do have planned for 3.15: to start storing those values directly in the pointer itself for integers where they fit, both in the JIT and possibly in the interpreter. We're not quite sure yet. And we may also do something with floating point values as well, since those also fit in 64 bits. The other thing that was kind of coming through in the other question was tagging the pointers to avoid reference counting.

That's something that is already being done in, I believe, 3.14. I don't know if it's in 3.13. The free-threaded build, this is something that we kind of inherited from them. Basically, they wanted to avoid reference count contention on certain objects. And so for certain objects that are being manipulated a lot, we actually just embed the reference count in the pointer itself rather than touching the object.

And that actually led to kind of decent performance boosts on our end when we did that in the non-free-threaded build. Okay. Very interesting. Yeah. One of the challenges of free-threaded Python is that without free threading, you could just read whatever the reference count is. But with free threading, all of a sudden, everybody has to start taking locks, like thread locks,

to get at the reference count, even if you're not doing threading, because who knows when a thread could just come to life and go at it, right? So that can significantly hurt the performance if you're taking a lock every time you interact with something. Yeah, and I've been really impressed with the free-threaded team at Meta, the work that they've done to overcome a lot of the performance gap that we saw in 3.13 with the free-threaded build.

I think it was originally something like 30, 40, 50% slower on average, which is almost a total non-starter for actually landing this thing longer term. But Matt Page did a lot of work to make the specializing adaptive interpreter thread safe, because self-mutating bytecode is a very thread-unsafe thing to do. And so that was a huge part of the performance win. But then it's all just these other little things, like you were saying: locks are expensive and they're tricky to get right. A lot of the time, under the hood, where this kind of free-threaded performance is coming from is avoiding locking in situations where we can. So having two reference counts, one for a thread that's heavily using an object and another reference count for everything else, and all these kinds of clever lock-free algorithms and stuff. Yeah,

for things like appending to lists and resizing lists. That's, I believe, lock free, or it's very, very lightweight. It leads to some, like, as someone who's maintaining Python, it definitely leads to some mental overhead under the hood for something like a list that used to be a very simple data structure. But, you know, I can manage. That's part of being the core dev team, right? We leverage our pain for everyone else's benefit.

It's like all the TypeScript, JavaScript developers making Jupyter work for Python people. Exactly. You guys do it down at the C level. Even more intense, I would say. Yes. We hide all the unsafe code from you so you can have a safe language. Yeah, so when will I not have to set environment variables to get this to happen? For the JIT? For the JIT.

I would say don't set this in any sort of production workload, anything like that. Basically, we're confident in the JIT in the sense that we haven't observed any crashes that are currently happening. We haven't observed any huge memory blowup or anything that would cause significant problems. But part of that is just because it hasn't had very wide use. So while we're confident in sort of what we've seen, we want to see it in wider use so we can know if there are crashes and things that we don't know about.

Another situation is

there are just certain sorts of things that the JIT doesn't handle very well right now. The most pressing example is native profilers and debuggers. Tools like pdb and ipdb, or coverage, or any of these Python-level profilers and debuggers, the JIT handles all of those just fine. Everything will work. The problem is that it starts to explode in complexity once you want to start supporting things like gdb or perf or any of these other tools that are unwinding through C code, at a C level. And so that's something that we kind of need to support if we actually want the JIT to be in wide use, because a lot of these actual production environments are going to rely on this ability to walk through C frames and inspect local variables and things like that. It's just really tricky because there are lots of different tools that we need to support, and all of them have slightly different APIs.

And a lot of the APIs are very heavy to use for JIT compilers. And so, again, it keeps coming back to, like, if this was just something that we were maintaining inside of Microsoft, like, we know what we need to do. We could just land a ton of code that does everything but is hopelessly complex and, you know, just thousands and thousands of lines maintained by a few domain experts. But, again, this is something we want the community to maintain.

So it's kind of finding that balance of, like, what tools do we want to support? Is there a way we can do it that's both fast and also if something breaks in the future or if there's a bug reported that someone other than me could fix it? That's kind of what I'm building towards. And that person can't be Pablo. Someone other than Pablo or I could fix it. Yeah, that's fantastic. Sure, it's all those little edge cases.

But if I use it with this tool and we look inside and we assume that this happens during execution, or this byte means this other thing. Another situation the JIT doesn't currently support is the free-threaded build. This is just sort of, no one's really gotten around to doing the work yet. The JIT was turned off as part of the free-threaded build because it's not entirely thread safe right now, and so that's something that we're hoping to work towards in 3.15.

It's like, take two experimental things and collide them together and see what happens, right? Yeah, the act of JIT compiling code itself, so when the JIT runs and it spits out machine code, that isn't super hard to make thread safe. It's just kind of going through and finding all the little bits and pieces and putting locks around stuff. The tricky part is going to be the optimizations that the JIT performs.

So the JIT currently does lots of optimizations to remove dynamic type checks and avoid certain amounts of kind of overhead and other work. We've kind of made a lot of those optimizations under the assumption that there are no other threads that are mutating stuff behind our back.

Because the way a JIT is fast is by saying, let's assume that no one's mutating the global, or no one's, you know, getting this object's dictionary from another thread or whatever. But as soon as those things are possible, we need to start being very, very conservative about what we're doing, or coming up with, you know, increasingly complex lock-free algorithms or solutions to make that work. So that'll kind of be the long tail. Getting the JIT turned back on will not be very hard, I don't think, for 3.15, but there will be a lot of time spent figuring out what is still safe to do and, if not, how we can do it.

Excellent. So another aspect of the question I was getting at is, when is this going to be the default? When do I need to turn it off if I don't want it, rather than turn it on if I do? Where are we? Where's the shipping story here? Yeah, so on by default, I mean, obviously the soonest that could possibly happen is 3.15. It's really

tricky to say, just given the massive change in resources towards these projects that has taken place recently. I definitely think 3.15 is still possible, because essentially what we need to do is: A, be confident that it's stable; B, be confident that it's maintainable; it needs to support the kind of features that we don't support now, which are things like native profilers and debuggers; and then, last, we need to make sure that it's actually faster, that it's actually worth turning on. So right now, if you turn on the JIT, the results kind of vary. On average it's about one or two percent faster, but at the extremes we've observed on our benchmarks up to 20 or 30 percent faster, or even up to 10 percent slower, depending on the workload. So obviously we want to make sure that we're not slowing down code, especially if we're turning it on by default. But

if we can bump that average up and make the high higher and the low also higher, again, if I had to bet on it, I would say definitely by 3.16. I think 3.15 is doable. It just depends kind of on how we're actually able to maintain our

progress going forward. Excellent. Yeah. Okay. I guess it also, that performance side of things, a lot of the work that you all are doing, it sounds like you're kind of setting the foundation for what could be possible, right? Like you said that you're not really using unboxed math on floats and ints, and that can make a tremendous performance difference, right?

So what you've done is going to maybe make those optimizations possible, but they're not in there yet, right? Exactly. How much do you see it as, there's stuff in the future to work towards, and how much of it is present?

I think that obviously there are always going to be little tweaks that we can make to stuff that we've already landed, and I think that medium-size refactorings are healthy to do going forward, just to make sure that we're not, you know, just bolting stuff on and letting it grow too organically and get hopelessly complex. But again, it comes back to: let's do the engineering work with these full-time teams and then let the community drive it forward. So we first saw this with the specializing adaptive interpreter.

Once we had the hard part figured out of how to specialize bytecode and have these inline caches and stuff, we saw tons of external contributors, well, maybe not tons, but like a dozen or so external contributors actually adding new specializations. Exactly, right? Like people who weren't

us adding specializations, which is super cool to see, and people still add new specializations. And with the JIT compiler, it's no different. We have kind of an optimizer that does a lot of the more interesting optimizations on the machine code before it goes out the door. But it doesn't actually operate on the machine code. It operates on something that looks a lot like our existing bytecode instructions. And that's very much an intentional design decision.

And so what we've found, and I actually have a tracking issue on GitHub where I'm just saying, hey, for people who have maybe a little bit of experience with the optimizer or haven't worked on it at all, let's add support for more and more of these instructions over time. And so what I've been seeing is a lot of people who have not really worked on a compiler before able to make these sorts of optimizations in the JIT code.

So for one example, at the sprints, I was working with Tomas Roun, who's one of our triagers. And he was able to land like three or four different optimizations for the JIT compiler without even touching machine code once. And by kind of the end of the week, we got to a place where if you do an isinstance check, so if I say, if isinstance(x, str), the JIT compiler removes that check completely.

And we can just basically, if we already know the type of x, then you don't even load the name isinstance. You don't even call the function or whatever. And that's just what someone was able to accomplish at the sprint, again, without touching machine code. I mean, I don't know too much about Tomas, but I don't know if he considers himself an expert in JIT compilers.
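Here's a minimal sketch (not from the episode) of the kind of pattern that optimization targets; whether this exact check gets removed depends on the CPython version and what the trace has observed:

```python
def normalize(value):
    text = value.strip()          # the specialized trace has seen value be a str here
    if isinstance(text, str):     # a check the JIT's optimizer may prove always true and drop
        return text.lower()
    return str(value).lower()

# Warm the function so it specializes and, on a JIT-enabled build with
# PYTHON_JIT=1, eventually gets traced and compiled.
for _ in range(10_000):
    normalize("  Hello  ")
```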

Right. And so that's been like a really cool thing to see and something that we absolutely kind of want to maintain going forward is this idea of like, let's build the platform and let other people build on top of it. Okay. And letting as much of the operations and optimizations happen in bytecode means people with Python experience kind of work. They don't have to be JIT people, which is a whole nother level. Yeah. And it's not only just the sorts of optimizations.

Like you've probably heard about template strings or t-strings, which are a new feature of 3.14. The JIT compiler supports those. Not because anyone added JIT compiler support for it, but just because the JIT compiler automatically supports any new bytecodes with a few, you know, kind of exceptions to that rule. And so when the new bytecodes were added to support template strings, they just picked them up and now it works, right?

Like maybe we could optimize them further through work like what Thomas was doing. But like, that's really, really cool to see is when, you know, with a two-line code change, you can actually support template strings, which are not a trivial feature. Yeah, no, they're definitely not. That's fantastic. So basically it's just the new bytecodes that are not optimized, they just pass kind of pass through and do whatever they would have done before and they don't get optimized.

Yep. Yeah. This portion of Talk Python To Me is brought to you by us. I'll keep this short. The best way you can support the podcast is to purchase one or more of our courses at Talk Python. Just visit talkpython.fm and click on courses at the top. We just published two new courses you might be interested in. Polars for Power Users: Transform Your Data Analysis Game. Here you'll learn how to master Polars, the blazing fast DataFrame library for Python.

You'll clean, transform, and analyze big data more efficiently. And LLM building blocks for Python, where you'll integrate LLMs into your Python code with vendor agnostic libraries, reactive Marimo notebooks, and async and cached pipelines. And if you're thinking of getting more than a course or two, you'll save big time with a whole library bundle. Check out all the courses at talkpython.fm/courses. There's some bytecode instructions that the JIT will actually refuse to compile.

So many of the specialized, basically all of the specialized instructions, the JIT will compile them, the specialized form. Many of the unspecialized instructions, the JIT will also compile them. There are some instructions that very rarely occur. So these would be things like, I think, like imports or certain exception handling opcodes and things like that, where they do kind of subtle things.

And so it's actually tricky to handle in the JIT, but we don't find that they're blocking too many of the hot code paths in general. But again, that's the sort of thing where it's just a matter of someone doing the work to rework those bytecode instructions so that the JIT can handle them. Maybe someday we'll have parallel import statements. We never know. Yeah, for when you put your imports in a hot loop, you can JIT your imports. Exactly. Yeah. All right.

I want to dive into the micro ops and all those sorts of things of it. But before I do, since you also worked on the specializing adaptive interpreter, do these things cooperate? Does the JIT supersede the need for the adaptive interpreter? They absolutely build on top of each other. So the whole reason the JIT is able to optimize well is because of specialization.

So specialization, by rewriting the bytecode instructions into the specialized ones, not only are we able to run that bytecode faster, but just by looking at the bytecode, we have profiling info. I can say, oh, over here, this is two integers being added together. I know the result is an int. And I know that both the things going into it were integers from this point forward. And you can kind of see how it goes from there.
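You can actually watch that rewriting happen with the dis module; a minimal sketch (not from the episode, and the exact opcode names vary by version):

```python
import dis

def add(a, b):
    return a + b

# Warm the function so the specializing adaptive interpreter rewrites its
# bytecode (the exact warm-up threshold is an implementation detail).
for _ in range(10_000):
    add(1, 2)

# On CPython 3.11+, adaptive=True shows the specialized instructions the
# interpreter swapped in, e.g. BINARY_OP_ADD_INT instead of a generic BINARY_OP.
dis.dis(add, adaptive=True)
```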

By looking at how we've specialized attribute lookups, as part of that, we guard against the type of the class. So every time you look up an attribute, we now know the class of that object going forward. And so we can remove the checks for that class from all the following attribute loads going forward as part of that JIT code. So we would not be able to do what we're doing without the specializing interpreter.

Everything from knowing where the code is hot to knowing the types to knowing the different operations, like all of that absolutely builds on top of it. And the JIT code itself is actually implemented as a specialization. So basically, we have bytecode instructions that will detect hot code because they increment a counter and eventually that counter will hit some threshold.

We JIT the code and then we replace that bytecode instruction with one that enters the JIT code. And so we literally use specialization to get in and out of the JIT compiler. Yeah, nice. I think this comment by Kyra, may typing help JIT work better? If not, like how could you use typing for optimizations? And you know, a lot of what the JIT does does have

to do with this type information, maybe not in the way that they're proposing it, right? Like, the code has argument: int? Not necessarily that way, but it's more of a, I don't care what you say, I'm going to pay attention to what you do. And then we'll use that type information, right? Like, we'll see what's actually being passed in and maybe compile for that. Yeah. Yeah, exactly. We completely ignore type annotations at runtime.

And the reason is that we have better information available, rather than trying to figure out what you meant by list[int], right? Like, we can instead, that's the nature of JIT compilers, they happen at runtime, versus the annotations, which are really helpful to a static, ahead-of-time compiler, like a more traditional C compiler or something like Cython.

For a JIT compiler, I don't need to figure out what your annotation means or whatever, like what this generic nonsense is. I have the pointer right there. I can look at it and say, this is a list and it looks like it's full of ints. And so we have information that we know is correct, that we know is up to date, and it's much richer too. So like, for example, you may tell me that the argument to this function is of a given class, right? An instance of a class.

But you've just told me the name of the class. What's actually useful to me is to know, okay, what keys do or what attributes do instances of this class usually have? Like, what is the internal version counter of that class that we can use to kind of share optimization information across traces? Has the class been mutated lately? All this stuff are things that you can't express in annotations, but they're extremely useful for even the most basic JIT compiler optimizations.

Another example would be unboxing integers. It's not enough to know that something's an int. We need to know whether it's an int and it fits in 63 bits, because otherwise we can't unbox it. Yeah, yeah, sure. You try to put in a huge number. That always impressed me about Python. Coming from a C++, C# background, I'm like, numbers have sizes. There are consequences if you overdo the sizes. Somehow there's an int that fills the entire screen and I didn't do anything in Python.
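A small sketch of both sides of that (not from the episode; the 63-bit cutoff is just the assumption mentioned above):

```python
# Python ints are arbitrary precision, so this "just works" at the language level:
big = 2 ** 200
print(big.bit_length())          # 201 bits: far too large for a machine register

# The kind of check an unboxing JIT has to make: only values that fit in a tagged
# machine word (a signed 63-bit range here) can be stored unboxed; anything larger
# falls back to the normal heap-allocated int object.
def fits_in_63_bits(n: int) -> bool:
    return -(1 << 62) <= n < (1 << 62)

print(fits_in_63_bits(10**18))   # True: small enough to unbox
print(fits_in_63_bits(big))      # False: stays a boxed object
```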

You know, it just, it can take it. But when you get down to the machine level and registers, it doesn't like it. Yeah, well, and that's what's beautiful about the JIT compiler and about dynamic languages like Python: if we're able to JIT the code and handle small integers or medium-sized integers, then you get to benefit from all that speed.

But the second that we see something that's 65 bits or even larger, some huge number, we don't crash or wrap or raise an exception or anything like that. The code just does the right thing. It's just not happening in the JIT anymore, or it is happening in the JIT, it's just not unboxed anymore. Right, the JIT is just as dynamic as your code is, which is what kind of makes it magical. Yeah, amazing. So the JIT, as you pointed out, is a runtime thing. I

think it's a little bit different than maybe what some people are familiar with from more static language JITs, like C#, Java, those types of things. The way those work is, I try to run some code. It's either JIT compiled or not JIT compiled. And if it's not, it has to compile it before the next step can be taken. This is a little more like this JIT compiler gets brought into the mix.

Once you've seen that there's enough effort being put into a part of a program, like enough loops or enough calls or something, you're like, okay, this could probably benefit. Now we'll kick in the JIT. So this is a little bit more of a transition or a spectrum from traditional CPython to JIT compiled CPython, unlike other languages that are static that literally they just have to be compiled to run, right? Yeah, this is a lot closer to something like you'd see in a JavaScript JIT.

And actually, we've shamelessly stolen many of the ideas that have been proven to work very, very well in JavaScript JITs. And so just to kind of give you an example, we don't compile an entire function. We'll compile parts of a function or several parts of a function. So if there's one path through a function that goes into another function that comes out, rather than compiling both of those entire functions, we'll just compile that hot path where the time is actually being spent.

And doing that allows us to make a lot of really helpful simplifications under the hood that make optimization easier and stuff. But it also means that we're not spending time trying to reason about and compile code that you're never executing. And again, that's just due to the dynamic nature of it: since we're running your code, we can see exactly where it's going, exactly what it's doing, and we can benefit from that information.
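As a toy illustration of that idea (not from the episode), here's the shape of code where tracing pays off: the JIT would trace the loop where the time is actually spent, not the one-off work around it.

```python
def process(records):
    header = records[0]              # setup: runs once, stays interpreted
    total = 0
    for row in records[1:]:          # hot loop: the part worth tracing and compiling
        total += len(row)
    return header, total             # teardown: runs once, stays interpreted

data = ["id,name"] + [f"{i},item{i}" for i in range(100_000)]
print(process(data))
```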

Yeah. Well, people might think, why would I write a bunch of code that's not executing? But pip install a thing, you might do that for one or two functions, and it's a massive library that has a bunch of libraries it depends upon. You can just leave all that stuff alone, except for the little part that they're working with, right? Yeah, or just an example of where you have a function that does some kind of setup work and then has a hot loop and then does some kind of teardown work.

Probably it's only worth compiling just that hot loop, and that's actually what our JIT does. it will ignore the parts where your program isn't actually spending most of its time. Another place this helps is if there are parts of the code that we actually can't JIT compile for one reason or another. For example, another kind of benefit of specialization is that it means we can do much faster profiling and debugging.

So this new feature, it was either 3.12 or 3.13, sys.monitoring: rather than checking on every bytecode instruction whether we need to fire profiling or debugging hooks, we just specialize the bytecode in order to fire those hooks.

And so what that means is that if you're running a coverage tool and we have these kind of line events inserted in the bytecode all over your function, basically once we've hit all those line events for the first time, they can be disabled and we can jit all the code that's actually running. And maybe there's a couple of branches that aren't being covered and those will show up in your coverage report.
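A minimal sketch of that sys.monitoring (PEP 669) mechanism, assuming Python 3.12+ (not from the episode): a coverage-style tool asks for LINE events on one function and returns DISABLE once it has seen a line, so the instrumentation stops firing there.

```python
import sys

def work():
    total = 0
    for i in range(1_000):
        total += i
    return total

TOOL = sys.monitoring.COVERAGE_ID
sys.monitoring.use_tool_id(TOOL, "toy-coverage")

seen = set()

def on_line(code, line_number):
    seen.add((code.co_name, line_number))
    return sys.monitoring.DISABLE   # stop firing this event at this exact location

sys.monitoring.register_callback(TOOL, sys.monitoring.events.LINE, on_line)
sys.monitoring.set_local_events(TOOL, work.__code__, sys.monitoring.events.LINE)

work()
print(sorted(seen))                 # each line of work() is recorded exactly once
sys.monitoring.free_tool_id(TOOL)
```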

But the JIT can completely ignore those instead of saying, oh, I can't touch this function because it has one instrumentation event on this cold branch or something like that. Yeah, that's very interesting. Is this one of the things, when I first saw it, it's called a tracing JIT, not a whole-function JIT? Yeah. There's a tracing JIT that sort of eventually kicks in after enough behavior or enough activity.

What I'm trying to think through is, what if you really want it? How long does it take to kick in? What if I want it to use the JITed version right away? Are there ways to make the JIT more aggressive, like setting some thresholds like you can with the GC, to say collect less often or more often based on those numbers for gen one, gen two, and so on? Or can you say, once it's run, can you save the profile, like into the __pycache__, and then it loads up and it's like, all right, we already saw how this works, we're going to keep going? Yeah.
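For reference, here's what that GC analogy looks like in code (standard library; the JIT itself has no equivalent public knob at the time of this episode):

```python
import gc

print(gc.get_threshold())        # e.g. (700, 10, 10) on many versions
gc.set_threshold(1400, 20, 20)   # collect the youngest generation half as often
print(gc.get_threshold())
```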

It's like, all right, we already saw how this works. We're going to keep going. Yeah. So probably in the future, I imagine that we'll have some way of tuning sort of the aggressiveness of the JIT, just like the GC that you said. But one thing that we want to make sure is that we don't JIT things too early. It may be very tempting to say like, oh, I want everything to be JITed immediately.

But chances are you actually don't want to JIT most stuff that's only running for the first, you know, a quarter second of your program's execution, especially if that's kind of setup code that's never going to run again. Like, for example, module-level code for imports and things like that. Currently, our JIT threshold is pretty high, so you need to run a given section of code at least a few thousand times before we try to JIT it.

But that's more just because we're trying to ease into the JIT rather than trying to JIT compile everything, because what we don't want is these performance cliffs where we JIT compile everything, your code is like 50% slower, and by the time the JIT has actually finished compiling everything, your program is done running. And so it's kind of finding that balance. It might be a bad choice to even mess with it, right?

And if it is a long-running app, then the higher threshold isn't really going to hurt too much. Again, it's finding opportunities where we can speed things up and avoiding opportunities where we would inadvertently slow things down. But I do think that having some sort of tunable parameter would help in the future. I just don't know if we necessarily need to add one right now just because we can. Yeah. Yeah, it makes sense.

You don't want people to mess with it and make it worse and then say, I'm never using the JIT, see how bad it is. Yeah, another reasonable thing too would be having some way of controlling the amount of memory that's being used. So setting some threshold where you say, I don't want more than a meg of JIT code, to use an extreme example, or something like that. What about MicroPython? Could something like this be in MicroPython?

I really doubt it, given how resource-constrained MicroPython is. I think they need every byte of memory that they can get. I certainly think it would be cool if they had a JIT, but I'm not holding my breath that they would have one anytime soon or that it would be super beneficial to them, given just how heavy JIT compilers are. What do you think about WebAssembly? I think WebAssembly is cool. Do you think it would be possible for PyScript and friends and Pyodide?

That wouldn't be the MicroPython variant. That would be the Pyodide, higher-order one. Well, so Python, I mean, I haven't looked at this in a while, but Python does support WebAssembly as sort of a platform that we do support. Yeah. The JIT cannot work on WebAssembly, mostly because WebAssembly is sort of this highly constrained sandbox environment that doesn't allow dynamic code generation.

So the only reason that we're allowed to JIT is because we can allocate data, fill that data with random bytes, ideally not random, but, like, in theory, random. Arbitrary. Whatever you come up with. Exactly. Just say, you know, here are some bytes, put them in this array and then set the array to be executable and jump into it. That goes against everything that WebAssembly stands for. Right.

And I feel like, I haven't been following it too closely, but I feel like there have been a couple of proposals or ideas where people have kind of worked around this limitation by, more or less, instead of JIT compiling into an array, you JIT compile an entire WebAssembly module and then you load that. And so it can be verified and everything the same way that any WebAssembly code is. And then you start running that. The current WebAssembly standard, I don't think, allows for that.

But it is a possible extension in the future. And if we wanted to do JIT for WebAssembly, it wouldn't be a very hard thing to do given the current architecture of our JIT compiler. It makes supporting new platforms very easy. So you almost would have to ahead-of-time compile instead of just-in-time compile. Sort of, yeah. You could put that into WebAssembly, which is really hard if you're using tracing to understand what you're doing.

Yeah, and especially because with tracing, you tend to have lots of small portions of JIT code that all get stitched together rather than these giant whole functions. But I've seen some crazy hacks where people do things like freezing the state of the entire program, basically emitting the new JIT code for WebAssembly, and then recombining everything and running that new program in place of the old one, and just weird things like that.

Like, that's not something... I think that if we're going to JIT, then we should do it in a way that the standard allows, and the standard just doesn't allow that right now. Right. Well, and you also don't want it to become a huge impediment to adding new features, like you said about adding new bytecodes. Yeah, exactly. Interesting. So I feel like in order to stay true to the title a little bit, we should talk about some gotchas. That's what your PyCon talk was about, right?

You want to give a quick shout out to your talk because it just came out on YouTube. Yep. Which at the time of recording, now the time of shipping, remember time travel, all that kind of stuff. But what are we? We're talking May 27th. They actually got the videos out really quick this year. Yeah, I think I gave the talk on the 17th. So it was like 10 days or something like that. It's nice. I think part of it was they didn't have an online option this year for the conference.

And so they were able to put up the videos a little sooner to avoid kind of diluting the value of the online tickets like in past years. But yeah, my talk was called What They Don't Tell You About Building a JIT Compiler for CPython. I gave a similar talk last year called Building a JIT Compiler for CPython, where I went over how the JIT compiler kind of works under the hood.

And kind of the premise for this talk was, well, my talk from last year and a lot of JIT compiler talks that I've seen gloss over a lot of the stuff that I actually spend a lot of my day doing, and a lot of the things that are kind of interesting and not necessarily intuitive about JIT compilers. And so I covered a few of those in my talk. So one of them was, we kind of already touched on, the difference between a whole-function, sometimes called method-at-a-time, JIT and a tracing JIT.

We currently have a tracing JIT architecture. A lot of other JIT compilers are whole-function, especially if you've used something like Numba, where you decorate a function with @jit or something like that. A lot of these kind of DSL-based JITs also do the whole-function thing. And so that's something that, when you think about compilers, you think about like a C compiler, which is ahead of time and compiles entire functions.
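For contrast, here's what that whole-function style looks like with Numba (assumes `pip install numba`; this is Numba's JIT, not CPython's): the entire decorated function is compiled the first time it's called with a given set of argument types, rather than hot paths being traced as they run.

```python
from numba import njit

@njit
def total(n):
    acc = 0
    for i in range(n):
        acc += i * i
    return acc

print(total(1_000_000))   # the first call triggers compilation of the whole function
```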

And so sometimes the switch to tracing can be a little unintuitive for people. And so I walked through some examples to show kind of how that works for us and what the trade-offs are. I also touched on memory management. So how you actually go about allocating executable memory and getting into it, which is mind-bending to think about. It literally is just what I said earlier of allocate some bytes, fill them with some stuff, and then jump into the bytes and cross your fingers.

Yeah, a lot of people who don't do low-level programming might not realize that the OS treats certain parts of memory differently. Here you can read and write, but you can't execute it. Here you can execute, but you can't read and write to it, because you don't want it to go, and here goes the virus, you know what I mean? Or the Trojan or whatever. But you guys have to basically write your own executable code at runtime, which is tricky, right? Yeah, it's really tricky.
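
To make that page-permission dance concrete, here's a minimal sketch in pure Python using mmap and ctypes. It assumes x86-64 Linux (the bytes encode "mov eax, 42; ret") and only illustrates allocating writable memory, filling it, flipping it to executable, and jumping into it; it is not CPython's actual JIT code:

    import ctypes
    import mmap

    # x86-64 machine code for "mov eax, 42; ret"
    CODE = b"\xb8\x2a\x00\x00\x00\xc3"

    # 1. Allocate a page that is readable and writable (but NOT executable).
    buf = mmap.mmap(-1, mmap.PAGESIZE, prot=mmap.PROT_READ | mmap.PROT_WRITE)
    buf.write(CODE)

    # 2. Flip the page to read + execute (never writable and executable at once).
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
    libc.mprotect.restype = ctypes.c_int
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    if libc.mprotect(addr, mmap.PAGESIZE, mmap.PROT_READ | mmap.PROT_EXEC) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")

    # 3. Jump into the bytes and cross your fingers.
    jitted = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    print(jitted())  # prints 42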

And I've kind of said this, but at best, it's a foot gun. And at worst, it's a major security vulnerability to have code that is both executable and writable. Because the foot gun is, oh, you're mutating code while it's being executed.

That's just a recipe for disaster unless you're being very careful. A lot of JIT compilers actually do mutate code while they're executing it, in order to keep information in their caches and to specialize things, but we're relying on the specializing interpreter to do that, so we don't have that need. But it can also be an issue because, if you think about it, what we have is data from an arbitrary user program, possibly operating on arbitrary untrusted input.

But we're taking that data and we're using it to produce machine code at runtime that's being executed. And so if we're not very careful about how we're doing that, that can lead to security vulnerabilities. And if the code that we're jitting is capable of self-modifying, then that's just kind of opening the door to all sorts of trouble if we're not extremely careful. I'm not a security expert.

I know what the best practices are for JIT compilers, and I've read a lot about how to avoid these things and what the issues are with them. And so we're definitely erring on the side of caution. I know I keep repeating this, but in the interest of maintainability, I want to know that I can trust that when people are making bug fixes or whatever in the JIT compiler, they're not accidentally introducing vulnerabilities that can be exploited.

One of the things that over the years has been really impressive to me is how few significant security issues Python's ever had. There are a lot of runtimes, a lot of systems, where it's like, yep, there's another three CVEs patched this month. There's only been a handful of things that I can remember, and most of them are quite minor edge-case sort of things. But the ability to run a bad Python program and get arbitrary machine instruction execution, that would be on the list of something bad.

And that's why we're being incredibly cautious here, because a lot of those vulnerabilities that you're speaking about come from things like JavaScript runtimes or Java runtimes that do have JIT compilation. So JIT compilers are kind of notorious CVE magnets, and we don't want ours to become one as well. And so a big part of that is, well, our JIT compiler is a lot simpler than a lot of other JIT compilers, especially right now. And the more complicated it is, and the more of that kind of cheating that you do, the more opportunities there are to actually miss something and cause some issues.

It's not only a function of our simplicity, though; it's also just a very conscious effort to make sure that what we're doing follows best practices. And I've been actively working with security researchers to fuzz the JIT and to audit it for security vulnerabilities and stuff, just because I want to make sure that if we're doing this, we're going to do it right. Yeah, fuzzing being sending kind of randomly varying input.

And if something like a crash happens, like the thing completely stops, it might be a crash at first, but a carefully constructed one could be, you know, a buffer overrun that executes code. Exactly, that sort of thing. Or even just simpler things, like differences in behavior: if, with the JIT turned on, something different happens than with the JIT turned off.

Like, the last thing that we would want is for you to have an "if user is authenticated, do one thing, otherwise do something else," and then we take the wrong branch, right? That's a nightmare scenario. And that's a very hard thing to catch, because it's not a crash. It's just wrong behavior. Right.
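
As a toy illustration of that JIT-on-versus-JIT-off comparison, a differential check could look something like the sketch below. It assumes a CPython build with the experimental JIT, where the PYTHON_JIT environment variable toggles it, and the snippet under test is arbitrary:

    import os
    import subprocess
    import sys

    # Any self-contained snippet whose output we want to compare.
    SNIPPET = "print(sum(i * i for i in range(10_000)))"

    def run(jit_enabled: bool) -> str:
        env = dict(os.environ, PYTHON_JIT="1" if jit_enabled else "0")
        result = subprocess.run(
            [sys.executable, "-c", SNIPPET],
            env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout

    # Any divergence here is exactly the "wrong behavior" bug class described above.
    print("outputs match" if run(True) == run(False) else "outputs differ!")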

Theoretically, you could optimize away that check, because one of the performance things you all do a lot is: we now know this information, so we can remove these other checks and just index into the type to get its attribute, or not verify that it is this derived class before you assume it's this particular function, and all that kind of stuff, right? Right. I mean, going back to the sort of optimizations that we already perform today.

So, like, removing isinstance checks when we can prove certain qualities about the things that we're checking. If we remove an "if isinstance(user, AuthenticatedUser)" check, or, you know, a spam user or whatever, if we remove that check, we need to be absolutely certain that what we're doing is 100% correct, which to my knowledge it is, but it's also just something that we need to be very careful about doing, right? Yeah. It's one thing to say, I wrote my program and I checked that it works.

It's another to say, I wrote a program that executes all other programs and it still works. And doing all of this in the presence of multiple threads is even more fun. Yeah, and malicious input, and on and on and on. Crazy. All right, what other gotchas or surprises did you find there before we wrap things up? Um, like gotchas and surprises for us? Or, no, like the what-people-didn't-tell-you about building the... Oh,

yeah, kind of the last one was, again, something we already touched on, which is support for profilers and debuggers. This was something that was just kind of a blind spot for me, because I don't use those tools on my Python code that extensively. And in fact, Pablo Galindo, who maintains several of these tools and knows a lot about them, reached out to me and was like, hey, this is something that we need to figure out.

And he actually did a really good write-up in one of the issues about all the tools that we want to support, the different options for supporting them, and what different paths forward we have. And so it's more just a matter of figuring out what makes the most sense for us.

That's just something where, you know, if you're writing a JIT compiler and you're not using something like LLVM to generate all this stuff for you, it's just kind of a pain to have to handwrite, or even generate, this debugging information so that someone can figure out that this variable's in that register and this was my caller. And it's so subtle and it's so easy to get wrong, and multiply that by the number of tools that you want to support, the number of platforms you want to support, and all that. It makes an already complicated piece of software like a JIT compiler even more complicated.

I can imagine. Yeah. Using the tools is tricky, and so is making sure that all the pieces are in the binary so those tools work. There is so much magic that happens under the hood so that you can start a GDB session, do up, up, and then print a local variable. There's a massive amount of engineering and possible bugs just for that most basic of debugger behavior, right? Incredible. Yeah. What about the debuggers in PyCharm and VS Code, the really common ones? Yes,

so if they're not a native debugger, if they're just attaching to the Python process and using sys.monitoring or something like that, or if you're launching the process under sys.monitoring, then all of that works completely fine. Basically, we don't JIT compile any bytecode that has those instrumented instructions firing tracing or profiling events. I see. So if the debugger is effectively attached, you're just like, all right, just leave it alone. Yeah, exactly.
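
For anyone curious what that looks like from the Python side, here's a tiny sys.monitoring example (Python 3.12+). Once a tool subscribes to events on a code object like this, its bytecode carries instrumented instructions, and that's exactly the kind of code the JIT declines to compile. The tool name and demo function are made up for illustration:

    import sys

    TOOL_ID = sys.monitoring.DEBUGGER_ID  # one of the pre-defined tool id slots
    sys.monitoring.use_tool_id(TOOL_ID, "toy-debugger")

    def on_line(code, line_number):
        # Called for every line executed in instrumented code.
        print(f"line event: {code.co_name}:{line_number}")

    def demo():
        x = 1
        y = x + 1
        return y

    sys.monitoring.register_callback(TOOL_ID, sys.monitoring.events.LINE, on_line)
    # Instrument just this one code object; its bytecode now carries
    # instrumented instructions, so the JIT leaves it alone.
    sys.monitoring.set_local_events(TOOL_ID, demo.__code__, sys.monitoring.events.LINE)

    demo()

    # Clean up: stop instrumenting and release the tool id.
    sys.monitoring.set_local_events(TOOL_ID, demo.__code__, sys.monitoring.events.NO_EVENTS)
    sys.monitoring.free_tool_id(TOOL_ID)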

And so it's a matter of we don't JIT compile code that is currently in a debugger. The really tricky thing is, oh, what if we've got some JIT compiled code and then that calls some more Python code that starts a debugger and they start messing with local variables and changing globals and changing the type of our authenticated user to unauthenticated user and all those sorts of things. Like how do we make sure we do the right thing when that debugger continues and the function returns?

I mean, in a debugger, you can even jump from the body of one for loop to another, whatever that means, right? And so making sure that when we return to the JIT code, we are doing the right thing, and that we basically detect that our optimizations are no longer valid and bail out. That's also something that we've had to spend some time figuring out. Basically, we just have...

You keep a copy of the original, and you're like, if things go crazy, we're just going to let the original bytecode and the original interpreter pick that stuff up again. Yep, so the original bytecode is always going to be there. And we do need it, because we're only compiling parts of a function anyway.

And for whatever reason, we may choose to throw away our JIT code, whether because it's not being used very much or, in this case, because someone messed with the world in a way that invalidates our optimizations. We basically keep one bit of state on the JIT code, and we check it whenever it could possibly have been invalidated. And so anytime you could have entered a debugger, basically upon returning to JIT code, we check that bit, which is a very cheap check to do.
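
As a rough mental model of that mechanism (the real thing lives in C inside CPython; the class and function names below are invented for illustration), it amounts to something like this:

    class JitTrace:
        """One compiled path through part of a function."""

        def __init__(self, bytecode):
            self.bytecode = bytecode     # the original bytecode is always kept around
            self.invalidated = False     # the single bit of state described above

    def world_changed(trace):
        # Called when something could have changed the world, e.g. a debugger
        # mutated locals, globals, or types while we were paused.
        trace.invalidated = True

    def resume(trace, run_jit, run_interpreter):
        # The cheap bit check made upon returning to JIT code.
        if trace.invalidated:
            # Bail out: discard this trace and fall back to the interpreter;
            # the code can always be JIT compiled again later.
            return run_interpreter(trace.bytecode)
        return run_jit(trace)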

And if that bit is set, then we basically just leave the JIT code and throw it away, because we can always create more later, I guess. JIT code is cheap. Yeah, you can just-in-time compile it a second time. And that's another thing about tracing JITs too: when you're throwing it away, you're throwing away one path through one part of the function, not the entire function. Sure. Oh, that's very interesting. Yeah, of course. Of course. All right. Let's close it out with a roadmap.

What's coming? What should people expect here? Yeah, so I mean, for 3.15, we've got a lot of things that we want to do. How much of it we'll actually get to, not so sure. But for right now, kind of the obvious things that we already talked about. So like integer and float unboxing are really attractive optimizations. We want to make better use of hardware registers in the JIT compiler.

So currently, we're kind of... it's a little tricky to explain verbally, but basically, when you're operating on two values in a Python program, if they're being used frequently, we sort of want to keep those in registers. Or if they're being used by one bytecode instruction and then they're going to be used by the subsequent bytecode instructions, we want to make sure that's in a machine register somewhere. Right. And that's not necessarily expressed in the bytecode.

Nope. Straight from Python, right? It's just like, load this thing, do something, load it again, right? Yeah. What's tricky is that the bytecode that we're compiling uses a stack, right? But on the actual machine, even though it does have a stack of memory, what's actually happening is in registers. And the registers are where you want to keep all the stuff that you're actually using.

And so getting smarter about how we're using the registers is definitely something that we want to do. And I mean, this compounds, right? If you're unboxing things and you're putting them in registers, then rather than having a Python integer out in memory that's being referenced from some other memory location, you've now got an integer in a register. And, you know, adding two of those together is trivial.
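
You can see the stack-oriented shape of the bytecode he's describing with the dis module; each intermediate value below flows through CPython's evaluation stack, and keeping the hot ones in machine registers is exactly the kind of improvement being discussed:

    import dis

    def add3(a, b, c):
        # Each + pushes operands and pops a result on the evaluation stack.
        return a + b + c

    # Shows LOAD_FAST / BINARY_OP instructions shuffling values through the stack.
    dis.dis(add3)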

Other things that we want to do are, as I said already, support for debugging. Thread safety is another thing that's kind of interesting. So there are a couple of people... I'm not necessarily an expert on all the work to make CPython thread-safe. I've definitely worked with some of the idioms and things before, but there are people who are very familiar with how you make code thread-safe, and they want to learn more about the JIT.

And also people who know a lot about the JIT but want to learn more about how Python is being made thread-safe under the hood. So this is a good opportunity for cross-pollination of those two domains of expertise, and I think it's a good opportunity for other people to start chipping away at the things that are holding the JIT compiler back from being compatible with the free-threaded build.

Yeah, well, I definitely feel like those would be multiplying factors. You could speed up your code a bunch from free threading, and all the code's being sped up by the JIT. So if you employ them both, you'll get a multiplicative boost there. Exactly. If the JIT makes your code 50% faster and you spawn eight threads and they're going six times faster, that's on the order of nine times faster overall, so obviously that's a pretty significant improvement.

Yeah, that's good. Also, there are probably ways in which you could leverage threading. I don't know if it even matters if Python itself is free-threaded, because you can do whatever you want down below. But you could have a thread that asynchronously JIT compiles and keeps things running so you don't block while that's happening, and then at one moment swap it over, or do analysis and further optimization in an idle thread. Yes, a lot of interesting... That's absolutely

something I've been kind of thinking about, where it's like, okay, now that we have... well, and this is all assuming that free threading is going to land, right? Like, it's still experimental, it hasn't been approved and all that. But yeah, I think it would probably make sense to have the JIT compiler run in another thread, or to compile something quickly and then run the kind of heavier optimizations in another thread. Another thing that might make sense to run in another thread is parts of the GC process, right? Like, once you

have this capability, that's something that kind of unlocks a lot of doors to experimentation like that. Yeah, I hadn't even thought about the GC, but absolutely, because a lot of it is scanning and figuring out what can still be referenced. And then you need that one moment where you change the memory and rewrite it, but that analysis period could be concurrent, for sure. It's very hard to get right, because the graph is changing while you're analyzing it.

But if you're careful, I mean, there is prior art. It can be done. I'm not saying it should be done or that it's easy to do, but it's just something interesting that wasn't even an option before, but possibly is now. Or even just something as simple as running __del__ methods or weakref finalizers in another thread. That's something that we couldn't do before, but we can at least try now.

Right. I think the bright spot of running GC concurrently, and I've only had a moment to consider this, is that the GC is looking for stuff that can basically still be found, and it throws away the stuff that isn't found. It would err on the side of finding stuff that really isn't reachable anymore but was just a moment ago, but it's not going to find stuff that was undiscoverable and becomes discoverable.

Because once it's unreachable, there's nothing that points to it anywhere at all. It shouldn't be able to come back into existence. So you might not be as efficient, but from a memory perspective, you shouldn't crash by throwing away something that's still in use. The stuff that's being concurrently mutated by many threads is the part of the large object graph of reachable objects. All those little graphs off to the side of unreachable stuff are actually fairly quiet. They're totally silent, actually.

Yeah, exactly. Oh, there's a whole bunch of things we could just spend tons of time going down. Yeah, yeah. We'll have that 24-hour marathon sometime. Yeah, we'll stream it. I don't know. Don't speak that into existence. We might lose our voices for a week. It'll be worse than PyCon. You won't even be able to talk. You'll just have to lie on the couch. I know. I can't go out to many of those; a lot of the companies, like Astral or Anaconda, host events some of the nights.

And I have to be very careful the night before my talk, because otherwise my voice gets hoarse. Yes, I've done that at conferences as well, and I've just regretted it so much. I'm like, oh no, the whole reason I came here is to give this talk, and I can barely speak. What have I done? All right. Well, I guess with that, let's leave it with this, Brandt. People can check out your talk. There's a bunch of stuff that we didn't go into.

A lot of animations about how the different layers of the JIT work, the specializing adaptive interpreter, and so on. So there's a lot to get from watching your talk in addition to listening to this show, and I encourage people to go do that. And I just want to say thanks for being here. No, thanks for having me. I love coming on and just geeking out about this stuff. Absolutely. Always a good time. See you later. Bye. Bye, everyone. This has been another episode of Talk Python To Me.

Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. This episode is brought to you by Sentry. Don't let those errors go unnoticed. Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async.

And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days.

If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code. Talk Python To Me, and we're ready to roll. Upgrade the code, no fear of getting old. We tapped into that modern vibe, over in each storm. Talk Python To Me, async is the norm.
