#21: PyPy - The JIT Compiled Python Implementation

Aug 18, 2015 · 54 min · Ep. 21

Episode description

See the full show notes for this episode on the website at talkpython.fm/21

Transcript

Is your Python code running a little slow? Did you know the PyPy runtime could make it run up to 10 times faster? Seriously. Maja Falkowski is here to tell us all about it. This is episode number 21, recorded Wednesday, July 8th, 2015. Developers, developers, developers, developers. I'm a developer in many senses of the word because I make these applications, but I also use these verbs to make this music. I construct it line by line, just like when I'm coding another software design.

In both cases, it's about design patterns. Anyone can get the job done. It's the execution that matters. I have many interests. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm. And follow the show on Twitter via @talkpython.

This episode, we'll be talking with Maja Falkowski about the amazing alternative Python implementation, PyPy. This episode is brought to you by Hired and Codeship. Thank them for supporting the show via Twitter, where they're @hired_hq and @codeship. Before we get to Maja, let me share a little news with you. First off, Talk Python To Me has a new domain name, talkpython.fm. I put the idea of a shorter .fm-based domain out on Twitter, and I'd say about 80% of the listeners said they liked it better than the longer .com domain. So here you go.

About a month ago, I moved all the MP3 file traffic out of Amazon S3 and into a dedicated audio file cache server. It's a lightweight Flask Python 3 app running through Nginx and uWSGI. A few listeners expressed interest in seeing the code, so I did a little work to try to generalize this a bit, and I open sourced it. I'm calling the project Cachedier.

And you can find a blog post as well as a link to the GitHub project in the show notes. Next up, we have a new Python podcast. I'm super happy to announce a Python podcast by Brian Okken called Python Test Podcast. You can find it at pythontesting.net/category/podcast. Now, let's get on to the show. Maja, welcome to the show. Thanks for inviting me. Yeah, I'm super excited to talk about our topic today, which is PyPy.

And I think what you guys are doing with PyPy is so incredibly cool to be taking some of these JIT compilation GC sort of semi-compiled languages or concepts and applying them to Python. So really happy to talk about that. The story of compiling dynamic languages is really sort of old and half-forgotten.

Like, we know these days that you can do this with JavaScript, but the original work on Smalltalk dates back to at least the mid-90s, if not earlier, which is what we are all building on top of anyway. So it's nothing new. The new part is just applying this to Python. That's right. That's right. Well, I think it's great. Maybe before we get into the details of what you guys are doing, maybe you could give the listeners who are not familiar with PyPy a little history and introduction to it.

So PyPy is essentially a Python interpreter, which works very, very similarly to the normal thing that you would call Python, that technically is called CPython. It's a Python interpreter written in C. And we have a different Python interpreter, which is implemented slightly differently. And for the most part, glancing over all the details, it should run faster on most of the examples because it can dynamically compile Python down all the way to the assembler level.

So it's like a normal Python interpreter, except sometimes faster, most times faster, in fact. That's it. It sounds very simple, but it's actually quite a big project. It has been around more or less 10 years by now. Wow. It started 10 years ago. And when did you get involved with it? I got involved, I think, 2006 or 2007.

I sort of got interested in Python static analysis, which part of PyPy is doing: it takes a restricted subset of Python, which PyPy is implemented in, and compiles it down to the C level. So I was interested in Python static analysis and I glanced over the PyPy project and sort of started getting involved. And then I got a spot at Google Summer of Code to work on PyPy for the summer. And that's essentially how it all started.

How many people work on PyPy or contribute to PyPy? Depending how you count, it's anything between three and 30. PyPy is a big umbrella project for a vast variety of anything from, as I said, a Python interpreter to very researchy stuff that people at various universities try to experiment with. Like there is a couple of people working on running Python and PHP in the same process. So you run PHP code in the server, but you can still call Python functions in that process.

There are people working on software transactional memory. So it's a big umbrella project that is a research vehicle for a lot of people, additionally to being the Python interpreter. Yeah, I can see how that would work for if you're doing some sort of academic research, especially something with JIT and GC, then it makes a lot of sense.

I think one of the things that people either who are new to Python or have kind of dabbled in it, but are not, you know, deeply working with it and thinking about the internals of it every day, don't realize that there's actually a whole variety of different interpreters out there. There's a bunch. They're all slightly different.

So let's glance over them because I think it's important to know there's like the CPython is the normal Python interpreter that is probably used by 99% of people using Python. Yeah. If I open up Linux or my Mac and I type the word Python and enter that's CPython, right? That's CPython. So that's what most people would use.

The CPython internal that you need to know is the fact that it's implemented in C. And another internal detail that's important to know is that it exposes the C API, which goes quite low. So it's possible to write C extensions in C for Python. So you write a bunch of C code, use a special API for accessing Python objects, and then your C functions can be called from Python code. Then we have Jython, which is quite old, actually.

And it's a Python interpreter written in Java and a similar project called IronPython, which is a Python interpreter written in C#. And those two interpreters, they're quite widely used by people who write Java and want a better language. So their main big advantage is integration with the underlying platform. So Jython is very well integrated with Java and IronPython with C#. So if you're writing C#, but you would really love to write some Python, you can do that these days.

And then there's PyPy, which is another Python interpreter written slightly differently with a just-in-time compiler. So those are the four main interpreters. And there are quite a few projects that try to enter this space, like Pyston, which is another Python interpreter written by Dropbox people. Yeah. I wanted to ask you about Pyston because that seems to me to be somewhat similar to what you guys are doing.

And the fact that it comes from Dropbox, where Guido is, and there's a lot of sort of gravity for the Python world at Dropbox, made it more interesting to me. Do you know anything about it or can you speak to how it compares or the goals or anything like that? So, well, I know that it's very, very similar to the project that once existed at Google called Unladen Swallow.

So the main idea is that it's a Python interpreter that contains a just-in-time compiler that uses LLVM as the underlying assembler platform. Let's call it that way. And this is the main goal. The main goal is to run fast. Now, the current status is that it doesn't run fast. That's for sure. It runs roughly at the same speed as CPython for stuff that I've seen on their website. As for the future, I don't know. I really think the future is really hard.

Especially when you don't have much visibility into it, right? Yeah. Like, I can tell you that PyPy has a bunch of different problems to Pyston. So, for example, we consciously chose to not implement the C API at first because the C API ties you a lot into the CPython model. We chose not to implement it at first. We implemented it later as a compatibility layer. So the first problem is that it's quite slow. It's far, far slower than the one in CPython.

And as far as I know, right now, Dropbox uses the same C API, which gives you a lot of problems, like a lot of constraints of your design. But also, like, gives you a huge, huge benefit, which is being able to use the same C modules, which are a huge part of the Python ecosystem. Yeah, especially some of the really powerful ones that people don't want to live without, things like NumPy and, to a lesser degree, SQLAlchemy, the things that have the C extensions that are really popular as well.

So you guys don't want to miss out on that, right? Right. So you brought two interesting examples. So, for example, NumPy is so tied to the C API that it's very hard to avoid. It's not just NumPy. It's the entire ecosystem. We, in PyPy, we re-implemented most of NumPy, but we are still missing out on the entire ecosystem. And we have some stories how to approach that problem, but it's a hard problem to tackle, that we choose to make harder by not implementing the C API.

However, for example, the SQLAlchemy stuff. SQLAlchemy is Python. It's not C, but it uses the database drivers, which are implemented in C, like a lot of them. So our answer to that is CFFI, which is a very, very simple way to call C from Python. And CFFI took off like crazy. Like, for most things, like database drivers, there's a CFFI-ready replacement that works as well and usually a lot better on PyPy that made it possible to use PyPy in places where you would normally not be able to do that.
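To give a feel for what "a very simple way to call C from Python" can look like, here is a minimal sketch using CFFI's ABI mode. It assumes the third-party `cffi` package is installed and that the platform lets `ffi.dlopen(None)` load the standard C library (this does not work on Windows); the helper name `c_strlen` is made up for illustration and is not something mentioned in the episode.

```python
try:
    from cffi import FFI  # third-party package: pip install cffi
except ImportError:
    FFI = None  # the sketch degrades gracefully when cffi is missing


def c_strlen(data: bytes):
    """Return strlen(data) as computed by the C library, or None if unavailable.

    Hypothetical helper, for illustration only.
    """
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C function we want to call, using plain C syntax.
    ffi.cdef("size_t strlen(const char *s);")
    try:
        libc = ffi.dlopen(None)  # load the standard C library (Unix-like systems)
    except OSError:
        return None
    return libc.strlen(data)


result = c_strlen(b"PyPy")
print(result)
```

The same script should run unchanged on CPython and on PyPy, which is the portability point made in the conversation.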

And CFFI is like really, really popular. It gets like over a million downloads a month, which is quite crazy. And CFFI is not just a PyPy thing. It also works in CPython, right? Yeah, it works in CPython in between like 2.6 and 3.something, I think. 3.whatever is the latest. And it works on both PyPy and PyPy3. And since it's so simple, it will probably work one day in Jython too. You said you have a plan for the NumPy story and these other heavy sort of C-based ones.

Currently, the way you support it, this is a question I don't know, is that you've kind of re-implemented a lot of it in Python? So, to be precise, we re-implemented a lot of it in RPython. RPython is the internal language that we use in PyPy. Right, that's the restricted Python that you guys actually target, right? Yes. Yeah, but we generally don't encourage anybody to use it. Unless you're writing interpreters, then it's great.

But if you're not writing interpreters, it's an awful language. So the problem with NumPy is that NumPy ties so closely that we added special support in the JIT for parts of it and things like that, that we decided are important enough that you want to have them implemented in the core of PyPy. So most of NumPy actually works on PyPy. And this is sometimes not good enough because if you're using NumPy, chances are you're using SciPy, scikit-learn, Matplotlib, and all this stuff.

We have some story how to use it. The simplest thing is just to embed the CPython interpreter inside PyPy and call it using CFFI. It's a great hack. It works for us. Really? You can like fall back to regular CPython within your PyPy app? Yeah, it's called PyMetabiosis. That's awesome. I'm pretty sure there's at least one video online with the author talking about it. It works great for the numeric stack, which is its goal. So this is our story.

We are still raising funds to finish implementing NumPy. It has a very, very long tail of features. And once we are done with NumPy, we'll try to improve the story of calling other numeric libraries on top of PyPy to be able to mostly seamlessly use stuff like SciPy and Matplotlib. It will still take a while. I'm not even willing to give an estimate. Sure. But it's great. And it does look like there's a lot of support there.

We'll talk about that stuff in a little bit because I definitely want to call attention to that and let people know how they can help out. Before we get into those kind of details, though, can we talk just briefly about why would I use PyPy or when and why would I use PyPy over, say, CPython or Jython? Like, what do you guys excel at? When should a person out there is thinking, like, they've just realized, oh, my gosh, there's more than one interpreter? How do I choose?

Like, can you help give some guidance around that? So typically, if you just discovered, oh, there's more than one interpreter, you just want to use CPython. That's like the simplest answer. You want to use CPython, but if you're writing an open source library, you want to support PyPy at least, which is what most people are doing. They're using CPython and the libraries support PyPy for the most part. Our typical user, and this is a very terrible description, but this is our typical user.

This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities. Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive five or more offers in just the first week, and there are no obligations, ever. Sounds pretty awesome, doesn't it?

Well, did I mention there's a signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus, and as Talk Python listeners, it gets way sweeter. Use the link hired.com/talkpythontome, and Hired will double the signing bonus to $4,000. Opportunity's knocking. Visit hired.com/talkpythontome and answer the call.

You have a large Python application that's spanning servers, serving millions of users, and you're running into corners. Like, you can't serve requests quickly enough. You can't serve enough users from one machine. You're running into problems. Now, your application is too big to, say, rewrite it in C or Go, or it's just, like, too scary for whatever reason. So, you look at what it would take to run stuff in PyPy. Your code should run, but it usually takes a bit of effort to, like, see what sort of libraries you use. Do you use any C extensions?

If their C extensions are, like, crucial, can you replace them with something? So, yeah, this is our typical user. And I have people, I run a consulting company that does that. There are people coming and asking, like, okay, I have this set up. It's impossible to do anything with it now. Can I just, like, swap the interpreters, make it run faster, and make the problems go away? This is our typical user. I hear why you described it that way is maybe not the best way, but, you know, you're right.

If you have 100,000, half a million lines of Python, and really you just need to make it a little faster. If switching to a different interpreter like PyPy will solve that, that's great. So, speaking of faster, can you talk about the performance comparisons? I have a little example I'll tell you, but I'll let you go first. So, as usual, performance comparisons are usually very hard to do and flawed. Everybody, yes, absolutely.

Everybody's thing they care about is not exactly what you're measuring, and so it might be totally misleading. But give it a shot. One good estimate is if you don't have benchmarks, you don't care about performance. Like, if you never wrote benchmarks for your applications, then chances are you don't actually care all that much. And you shouldn't really... That's the first step. Like, make sure you know how fast your applications run.

Once you know that, you can measure it on different interpreters. But as far as expectations go, PyPy tends to run heavy computations a lot faster. Like, a lot is anything between 10 and 100 times faster, depending on the workload. For stuff that's more... And again, what is a typical Python program? A typical Python program is probably Hello World. How fast does PyPy run Hello World? Roughly at the same speed as CPython; you won't notice.
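The "write a benchmark first, then compare interpreters" advice can be sketched with the standard library alone. This is only an illustration: `workload` is a hypothetical stand-in for whatever hot path your own application has, and the numbers it produces say nothing about real programs.

```python
import platform
import timeit


def workload(n=20_000):
    """Hypothetical stand-in for the hot path of a real application."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def run_benchmark(repeat=3, number=5):
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(workload, repeat=repeat, number=number)
    return min(timings) / number


best = run_benchmark()
# Label the result with the interpreter so runs can be compared side by side.
print(f"{platform.python_implementation()} {platform.python_version()}: "
      f"{best * 1000:.3f} ms per call")
```

Running the same file under `python` and under `pypy` gives two labelled numbers to compare, which is the measurement the speakers recommend before drawing any conclusion.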

But for a typical web application, the speed up, if you're not heavily relying on C extensions, would be around 2x. So, 2x faster for a lot of people makes a lot of difference. Absolutely. It also depends on where are you waiting. Like you said, you should profile it and figure this out. If your Python web app is slow because 80% of the time you're waiting on the database, well, it doesn't really matter how fast your Python code is. Your database is a problem. Or something like this, right?
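The point about waiting on the database can be made concrete with a small Amdahl-style estimate. The 80% figure and the 2x interpreter speed-up come from the conversation; the function name and the 95% case are assumptions added for illustration.

```python
def overall_speedup(python_fraction, interpreter_speedup):
    """Estimate the end-to-end speed-up when only the Python share gets faster.

    python_fraction: share of request time spent executing Python code (0..1).
    interpreter_speedup: how much faster that share becomes (e.g. 2.0 for 2x).
    """
    waiting_fraction = 1.0 - python_fraction
    return 1.0 / (waiting_fraction + python_fraction / interpreter_speedup)


# 80% of the time is spent waiting on the database, Python code gets 2x faster.
db_bound = overall_speedup(python_fraction=0.2, interpreter_speedup=2.0)
# Hypothetical app that spends 95% of its time in Python code, same 2x speed-up.
cpu_bound = overall_speedup(python_fraction=0.95, interpreter_speedup=2.0)
print(f"database-bound: {db_bound:.2f}x, CPU-bound: {cpu_bound:.2f}x")
```

Under these assumptions the database-bound application gets roughly 1.1x faster overall, while the CPU-bound one gets close to the full 2x.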

Exactly. Exactly. And like, the thing is like, so let's narrow it down to, say, web applications. Like, okay, let me first talk about other stuff and then let's go to web applications. Like, where people found PyPy incredibly useful is things like high-frequency trading. Like, not the very crazy high-frequency where you have to make decisions like multiple times per millisecond. But like the sort of frequency where you want to make decisions within a few milliseconds.

And then those decisions are like tens of milliseconds. Then you want to be able to modify your algorithms fast, which is a lot easier in Python than, say, in C++. And you're running into fewer problems with how to shoot yourself in the foot and segfault all your trading. So, that's when people tend to use PyPy because, in this sort of scenario, it would be like 10 times faster. So, super low latency stuff where 10 milliseconds makes a huge difference to you.

Something like that. Yeah. Okay. Another example is there's, for example, a project called MyHDL, which is the hardware emulation layer. And these tend to emit sort of low-level Python code that just does computations to emulate hardware. And then again, on PyPy, it's like over 10 times faster. So, those are the very good examples. The very bad examples, as you said: if your stuff is waiting on the database, then you're out of luck.

Like, no matter how fast your interpreter responds. But yeah. On the typical web server load, even if there is such a thing, it would be around a two times speed-up. Sometimes more, sometimes less. Depending on the setup, really. But as I said, you should really measure yourself. The places where PyPy is quite bad: if you spend most of the time in C extensions, then it's either not helping or actually slowing you down.

And the second time where it's not that great is when the program is short running. So, because it's just-in-time compilation, it means that each time you run your program, the interpreter has to look what's going on, pick things to compile to Assembler, compile them to Assembler, and that all takes time. Right. There's a little more initial startup when that happens. Yeah, the warm-up time is usually quite bad. Well, I like to think that warm-up time of PyPy is quite bad.

And then I look at Java, where it's absolutely outrageous. It's a relative statement. It's a relative term. Like, compared to CPython, PyPy's warm-up time is really terrible. And compared to LuaJIT, again, the warm-up time is terrible. But compared to Java, it's not that bad. So, yeah, it really depends on your setup. And it's typically important for long-running applications. Then again, this is a typical PyPy user: stuff like server-based applications where your programs run for a long time.
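One rough way to see warm-up for yourself is to time successive batches of the same work inside one process. The sketch below uses only the standard library; `hot_function` and the batch sizes are arbitrary choices for illustration, not anything from the episode.

```python
import time


def hot_function(n):
    """Arbitrary CPU-bound loop, used only as something for a JIT to warm up on."""
    total = 0
    for i in range(n):
        total += (i ^ (i >> 3)) & 0xFF
    return total


def time_batches(batches=5, calls_per_batch=20, n=5_000):
    """Time several identical batches and return the duration of each, in seconds."""
    durations = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(calls_per_batch):
            hot_function(n)
        durations.append(time.perf_counter() - start)
    return durations


for index, seconds in enumerate(time_batches()):
    print(f"batch {index}: {seconds * 1000:.2f} ms")
```

On a JIT-compiling interpreter one would expect the first batch or two to be slower than the later ones, whereas on a plain interpreter the batches should take about the same time. A short-running script never gets past those first batches, which is why warm-up matters less for long-running servers.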

Right. You start it up and it's going to serve a million requests an hour until it gets recycled or something, yeah? Something like that. I mean, these days, even JavaScript is a long-running app. Like, how long do you keep your Gmail open? Usually for longer than a few seconds. Yeah, that's for sure. So, let's talk a little bit about the internals. Could you describe just a little bit of...

So, if I take a Python script and it's got some classes and some functions and they're calling each other and so on. What does it look like in terms of what's happening when that code runs? Okay. So, I'll maybe start from, like, how PyPy is built and then get back to your question directly. Yeah, great. So, PyPy is two things. And it has been very confusing because we've been calling them PyPy and PyPy.

And calling two things which are related but not identical the same name is absolutely terrible. We'll probably fix that at some point. But, like, PyPy is mostly two things. So, one thing is a Python interpreter. And the other thing is a part that I would call RPython, which is a language for writing interpreters. It tends to be similar to Python in a sense that it's a restricted subset of Python. But this is largely irrelevant for the architectural question.

So, you have an interpreter written in RPython that can be PyPy. We have a whole variety. There's Hippie, which is a PHP interpreter. There's a bunch of Scheme interpreters. And there's even a Prolog interpreter and a whole bunch of other interpreters written in RPython. And then... Is RPython a compiled language? Yes. And the other part is essentially the translation toolchain or a compiler for RPython.

So, it contains various things like the garbage collector implementation for RPython, the data types like strings, unicodes, and all the things that RPython supports. It also contains a just-in-time compiler for RPython and for interpreters written in RPython, which is one level of indirection compared to what you usually do.

So, the just-in-time compiler would be sort of generated from your RPython interpreter and not implemented directly, which is very, very important for us because Python, despite looking simple, is actually an incredibly complicated language. If you're trying to encode all the descriptor protocol or how actually functions and parameters are called, chances are you'll make a mistake.

So, if you're implementing an interpreter and a just-in-time compiler, it's very, very hard to get all the details right. So, we implement the Python semantics once in the Python interpreter, and then it gets either directly executed or compiled to assembly. So, if you're coming back to your question, if you have a Python program, first, what it does, it will compile to bytecode, and bytecode is quite high level.

There's a thing called the dis module, where you can just call dis.dis on any sort of Python object, and it will display bytecode. And the basic idea, which is what CPython does, and which is what PyPy does too at first, is to take bytecodes one by one, look at what each one is, and then execute it. Yeah. And is that like what's in the PyCache folders and things like that? Like those PYC files? Yeah. The PYC files are essentially a serialized version of Python bytecode.

Okay. It's just a cache to store to not have to parse Python files each time you import a giant project. Right. Okay. And so then CPython takes those instructions and executes them via an interpreter, but that's not what happens on PyPy, right? That's what happens on PyPy initially.
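The bytecode being described can be inspected with the standard library's `dis` module. A small sketch (the function `add` is just a placeholder; the exact instruction names printed vary between Python versions and interpreters, so no output is shown):

```python
import dis


def add(a, b):
    """Tiny placeholder function whose bytecode we want to look at."""
    return a + b


# Print a human-readable listing of the function's bytecode.
dis.dis(add)

# The same information is available programmatically, one instruction at a time.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each entry in `opnames` is one of the high-level instructions that the interpreter loop fetches and executes in turn.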

So, all your code will be executed like in CPython, except if you hit a magic number of function calls or loop iterations, I think it's 1037 for loop iterations, then you compile this particular loop, in fact, this particular execution of a loop, into assembler code. Then you have a mix of interpreter code and assembler code, and the assembler code is a linear sequence of instructions that contains so-called guards.

So, the guards will be anything from an "if something" in the Python source to a check that the type of this thing stays the same. Then if you happen to fail those guards, then, okay, I failed this guard, I'm going to go and start compiling assembler again. I mean, at first you jump back to the interpreter, but if you, again, hit a magic number, you compile the assembler again from this guard.

And then you end up with like a tree of execution that resembles both your Python code and the type structure that you're passing in, and a few other things that are automatically determined. So, at the end of the day, you end up with a Python function or like multiple Python functions that got compiled to assembler if you warm stuff for long enough. Okay. That is super interesting. I didn't expect that it would have this initial non-assembled version. That's very cool.
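The counter-then-compile-then-guard flow described above can be mimicked with a deliberately tiny toy model. This is not how PyPy is implemented: nothing is compiled here, the class and attribute names are invented, and recompiling from a failed guard is left out. The default threshold of 1037 is the number quoted, with hedging, in the conversation.

```python
class ToyTracingLoop:
    """Toy model of a hot-loop counter with a type guard (illustration only)."""

    def __init__(self, body, hot_threshold=1037):
        self.body = body                  # stands in for the interpreted loop body
        self.hot_threshold = hot_threshold
        self.iterations_seen = 0
        self.trace_type = None            # type the pretend "compiled" trace assumes
        self.fast_path_runs = 0
        self.guard_failures = 0

    def step(self, value):
        if self.trace_type is not None:
            if type(value) is self.trace_type:   # the guard holds
                self.fast_path_runs += 1
                return self.body(value)          # pretend this is machine code
            self.guard_failures += 1             # guard failed: back to the interpreter
            return self.body(value)
        # Still interpreting: count iterations until the loop is considered hot.
        self.iterations_seen += 1
        if self.iterations_seen >= self.hot_threshold:
            self.trace_type = type(value)        # "compile" a trace specialised on this type
        return self.body(value)


loop = ToyTracingLoop(lambda x: x * 2, hot_threshold=5)
for i in range(10):
    loop.step(i)        # ints only: becomes hot, then takes the fast path
loop.step(1.5)          # a float breaks the assumption and fails the guard
print(loop.trace_type, loop.fast_path_runs, loop.guard_failures)
```

In the real system a guard that fails often enough would itself become hot and get its own compiled path, which is how the "tree of execution" mentioned in the conversation grows.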

Do you know what the thinking around that was? Is it just better performance? So, there's a variety of things. Like, one thing is that if you try to compile everything upfront, it would take you forever. But also you can do some optimizations. Like, a lot of optimizations done in PyPy are sort of optimistic. Like, we're going to assume special things like sys.settrace or sys._getframe just do not happen.

And until it doesn't happen, things can run nicely and smoothly. But you're trying to figure out on the fly what's going on. And then you compile pieces that you know about. So, at the moment when you are compiling a Python loop or a function or something like that, you tend to know more about the, the state of execution than, that is just in the source. Like, you tend to know the types, the precise shape of objects. Like, is this an object that's class X and has two attributes A and B?

Or is it an object of class X that has three attributes A, B, and C? And those decisions can lead to better performance, essentially. So, on your website, you say that PyPy may be better in terms of memory usage as well. How does that work? It's a trade-off, right? So, first of all, PyPy does consume memory for the compiled assembler and the associated bookkeeping data. That depends on how much code you actually run.

But the object representation of Python objects is more compact in PyPy. So, the actual amount of memory consumed by your heap tends to be smaller. Like, all PyPy objects are as memory compact as CPython objects using __slots__. Right, okay. So, it's the same optimization except it's transparent. Then, a list of only integers would not allocate the entire objects. It would allocate only small integers.
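For readers who have not used it, this is what the `__slots__` optimization looks like when written by hand in CPython. The class names are invented for the example; the claim that PyPy gets a similar layout automatically is the speaker's, not something this snippet demonstrates.

```python
class PointWithDict:
    """Ordinary class: each instance carries a per-instance __dict__."""

    def __init__(self, a, b):
        self.a = a
        self.b = b


class PointWithSlots:
    """Same attributes, but stored in fixed slots instead of a __dict__."""

    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b


plain = PointWithDict(1, 2)
compact = PointWithSlots(1, 2)
print(hasattr(plain, "__dict__"), hasattr(compact, "__dict__"))

# The price of the compact layout: no attributes beyond the declared ones.
try:
    compact.c = 3
except AttributeError as error:
    print("cannot add attribute:", error)
```

The slotted instance has no per-instance dictionary, which is where the memory saving comes from.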

Then, the objects are smaller themselves because we use a different garbage collection strategy. It's not ref counting. It's a garbage collector. Right, so, let's talk about the garbage collector just for a moment. Is it a mark and sweep garbage collector?

This episode is brought to you by CodeShip. CodeShip has launched organizations: create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with CodeShip's new organizations plan. And, as Talk Python listeners, you can save 20% off any premium plan for the next three months. Just use the code TALKPYTHON, all caps, no spaces. Check them out at CodeShip.com and tell them thanks for supporting the show on Twitter, where they're @codeship.

It's a very convoluted variant of mark and sweep.

Yeah. It has two generations of objects, young objects and old objects, and old objects are mark and sweep, and young objects are pointer bump allocations. So, the net effect is that if you are having a lot of small objects that get allocated all the time and forgotten really quickly, allocation takes, like, on average, around one CPU instruction. It's, on average, one, because it takes, like, slightly more, but then you have pipelining, so sometimes it takes slightly less.

Okay, do you guys do compaction and things like that as well? No, but we do copy old objects from the young generation to the old generation. Then we don't compact the old generation, but it's usually more compact than your normal setup where you have lots of objects that are scattered all over the place, because you only have to deal with objects that survive minor collection. Right, and the majority of objects that we interact with all die right away. Vast majority. Yeah, absolutely.
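The young-generation scheme described here, where allocation is a pointer bump and survivors are copied out at a minor collection, can be modelled in a few lines. This is a toy under loud assumptions: a real collector works on raw memory and finds live objects itself, whereas here the caller passes in the surviving addresses, and every name is invented.

```python
class ToyNursery:
    """Toy young generation with bump-pointer allocation (illustration only)."""

    def __init__(self, size):
        self.size = size
        self.top = 0                # the bump pointer: next free offset
        self.young = {}             # offset -> payload, standing in for raw memory
        self.old_generation = []    # survivors end up here

    def allocate(self, payload, nbytes):
        if self.top + nbytes > self.size:
            raise MemoryError("nursery full: run a minor collection first")
        address = self.top
        self.top += nbytes          # allocation is just an addition
        self.young[address] = payload
        return address

    def minor_collect(self, live_addresses):
        # Copy only the survivors; everything else is dropped in one go.
        for address in live_addresses:
            self.old_generation.append(self.young[address])
        self.young.clear()
        self.top = 0                # the whole nursery is free again


nursery = ToyNursery(size=64)
addresses = [nursery.allocate(name, 16) for name in ("a", "b", "c", "d")]
nursery.minor_collect(live_addresses=[addresses[1]])   # only "b" is still referenced
print(addresses, nursery.old_generation, nursery.top)
```

Because dead young objects are never visited, the cost of a minor collection depends on how many objects survive, not on how many were allocated.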

For the most part. Okay, yeah, that's very cool. One of the things that is not super easy in regular Python is parallelism and asynchronous programming and so on. And you guys have this thing called stackless mode. What's the story with that? It's the same thing as Stackless Python. It gives you an ability to have coroutines that can be swapped out without an explicit yield keyword. So it's not like Python 3 coroutines. It's like normal coroutines where you can swap them randomly.

For example, gevent, I think, uses stackless mode for swapping the coroutines. Okay, so you said that you can get better concurrency. Can you speak to that, or what are your thoughts there? I personally don't use stackless all that much, but the net effect is that you can write code like with Python 3 coroutines without the yield keyword. So you just call a function, then you can swap the functions for other things.

It's a bit like implicit Twisted, where you don't get better concurrency than Twisted but you don't need to write your programs in the style that Twisted requires. I was going to say it's just a little more automatic and you don't have to be so explicit that you're doing threading. Yeah, exactly.

Like, normal threads, especially in Python where you have the global interpreter lock, don't scale all that well, and the solution is usually Twisted, but Twisted requires you to have all the libraries and everything written Twisted-aware, which stackless does not generally require. I don't have any particular feelings towards all of that, to be honest. Sure. Does it also support Twisted running on PyPy? Do you know? Yeah, obviously. Twisted is a Python program.

We had from the very early days we had good contact with twisted people and people who use twisted tend to be from the same category as people who use PyPy. People who have large running code bases that are boring but have problems because they're actually huge. I mean not huge in terms of code base but huge in terms of number of requests they serve and stuff like this. So they tend to be very, very focused on how to make the stuff work both reliably and fast.

So, for example, a typical answer to Python performance problems is, oh, just rewrite pieces in C. Well, that's all cool if you have a few small loops that you can rewrite in C and have everything fast. But most web servers are not like this. If you look at the profile, it's just flat. It's tons of dictionaries and things that are not easy to write in C. And C introduces security problems. Like, suddenly dealing in C with untrusted data is not that much fun. No. So it's definitely not.

Or even reliability, right? Yeah. So all those problems. So Twisted people tend to like Python better than C and they've been very supportive of PyPy from the very first day. So generally PyPy is running Twisted, and it's been running Twisted quite fast for quite a few years now. Yeah, that's excellent. It seems like if you have a problem that Twisted would solve, you also probably want to look into PyPy. Exactly. This is like the same category of problems that you're trying to solve.

Another interesting thing about concurrency, which I guess I'm slightly more excited about, is the software transactional memory that Armin Rigo is working on right now. So this is one of our fundraisers, just like NumPy. Yeah, so this is one of your three sort of major going-forward projects, if you will. Yeah, those are the three publicly funded projects.

Right, and if you go to pypy.org, right there on the right it says donate towards STM, and you guys have quite a bit of money towards this project, so that's excellent. What is software transactional memory, for the listeners? There are two problems; they're related but not identical. The first problem is that Python has the global interpreter lock. The global interpreter lock essentially prevents you from running multiple threads on multiple cores on one machine.

So if you write a Python program and you make it multithreaded, it will only ever consume one CPU, which is not great if you want to compute anything. So that's one problem that STM is solving, and I'm going to explain just now how it's solving it. But another problem is that it's trying to provide a much better model for writing programs with threads. If you start using threads, the Python mutability model makes it so hard to write correct programs.
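A minimal sketch of the first problem, assuming a GIL-based interpreter such as standard CPython. The function names (`count_primes`, `run_serial`, `run_threaded`, `timed`) are illustrative and do not come from the episode. It runs the same CPU-bound work serially and then in threads so the wall-clock times can be compared; with a global interpreter lock the threaded run is typically no faster, though exact timings depend on the machine and interpreter.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(jobs):
    """Run every job one after another on the calling thread."""
    return [count_primes(limit) for limit in jobs]


def run_threaded(jobs):
    """Run every job in its own thread and collect results by index."""
    results = [None] * len(jobs)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, jobs):
    """Return (result, elapsed_seconds) for fn(jobs)."""
    start = time.perf_counter()
    result = fn(jobs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [50000] * 4  # four equal CPU-bound jobs (arbitrary size)
    _, serial_time = timed(run_serial, jobs)
    _, threaded_time = timed(run_threaded, jobs)
    print("serial:   %.2fs" % serial_time)
    print("threaded: %.2fs" % threaded_time)
```

Both versions compute the same answers; only the elapsed time is of interest here.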

You're essentially running into problems like, suddenly, okay, but I have to think about who modified what in what order and consider all the possible combinations. Make sure that every bit of code that's going to work with this segment of data is taking the right locks and all that kind of stuff. That gets really tricky to ensure, right?
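The bookkeeping described above, where every piece of code that touches shared data has to take the right lock, can be sketched with the standard `threading` module. `Counter` and `hammer` are hypothetical names used only for illustration. The point is that correctness depends on every writer remembering to use the lock; a single code path that updates `value` without it can silently lose updates.

```python
import threading


class Counter:
    """Shared mutable state guarded by a single lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write must happen under the lock, otherwise two
        # threads can interleave and one update can be lost.
        with self._lock:
            self.value += 1


def hammer(counter, n_threads, n_increments):
    """Increment `counter` from several threads at once."""

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    counter = Counter()
    hammer(counter, n_threads=4, n_increments=10000)
    print(counter.value)
```

With one object and one lock this is manageable; the difficulty discussed in the conversation comes from scaling the same discipline to many objects and many locks.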

Yeah. So essentially the model is: if you write a program in C, you write the program and it's all fine. Then you switch to threading and you get performance immediately. If you write the threads correctly, your program will run four times faster on four cores or whatever, but it will likely crash, and it will likely keep crashing for the next couple of weeks, months, years, whatever you throw at it, because you need to get 100% correctness back. So STM works slightly differently, where you

essentially write programs in a mode where it looks like you put a gigantic lock around everything that matters in your program. So you write one event loop, and you know, okay, this loop will consume blocks or whatever, some sort of data in an unordered queue, and you can add to the queue in an unordered way. And then you put a giant lock over the whole processing. If you write that sort of program with normal threads and normal locks, it will be correct, but it won't run

fast, because everything will be inside the giant lock, so it will be more or less serial. But you have all the complexity in your code of doing parallelism anyway. Yeah. So STM stands for software transactional memory. It means it works roughly like a database, where you run multiple transactions, and if you don't touch the same memory from two threads at the same time then it's all cool, and if you do, one of those gets aborted and reverted. You can only commit a transaction if the memory

access was right. So if you think again about the model where you have one gigantic lock, it means it will optimistically run in parallel a few copies of the same code on different data. If they tend not to conflict, if they could have been run serially, in the sense that they modify some global data but not in a conflicting manner, then you get parallelism for free. But if they do conflict every now and again, then one of them gets reverted back to the start. So the net effect is that it looks

to the programmer like you're running stuff serially, and you get correctness for free. If you write it in a naive way, then you won't get performance, because your stuff will collide all the time. But then you can use tools to look at where it collides, remove those contention points, and get more and more performance. That's almost the same goal, but the difference is that if you have 100% performance and 99% correctness, your program is still incorrect and you can't run it. If you

have 100% correctness and 99% performance, you're mostly good to go. Yeah, would you rather be fast and wrong or slow and right? It's sort of that. You know, there's a really interesting classification of those types of problems that you only see very, very rarely, from some kind of race condition or timing or threading problem. I've heard people describe those as Heisenbugs, because as you interact with a program trying to see the problem, you might not be able to

observe it. But if you're not looking, all of a sudden, boom, the timing realigns and it's a problem again. They're very frustrating, so it's important to look at. So the usual answer for those problems in Python is to just use multiple processes. Using multiple processes works for a category of applications, and web servers tend to be one of those, because the only data they ever share is either a cache or a database. Usually that's another process anyway, like Redis, or it's in a database like Mongo or

SQL or something like that. Yeah, so you don't care. But there's a whole set of problems where this is not what you have. You have data that's mostly not contentious, but you still have to share it and work on it. You can't afford to serialize and deserialize it and pass it between processes, and yet you want to have a correct result. So this is what STM is trying to address: a set of problems that can't be solved by just splitting stuff into processes. Right, maybe something very computational or scientific

where it's iterative or something would be way harder. Well, essentially anything where you have data that mostly does not conflict, so you can work on it in parallel, but it's a big data set and every now and again you tend to conflict. Graph algorithms are a great example. You have this large, complicated data structure in memory, and most of the time you're walking different parts of the graph, so you don't care. But every now and again you'll find contention

on one graph, because two parts are doing stuff on the same node, and then you're like, that's wrong. Writing this sort of stuff using threads is really hard. Yeah, so that has a lot of promise. Do you know when it might start to show up as a thing people can use? Is it there yet? So it's already there to an extent. You can download the STM demo somewhere, and the STM demo works. It might not scale how you want it to, it might not work how you want it to, but it should generally work and scale. So the

current version only scales to two or three cores, and given that it comes at a quite hefty cost of being 1.5 to 2 times slower on each core, it's not that useful. So the next version will try to reduce the single-core overhead and improve the scalability to more cores, and then we'll see how it goes. It's going along quite well. I mean, there are consecutive prototypes that are usable to some extent. We managed to get some performance improvements running on multiple

cores, but they're in the 20-30% range, which is just not that exciting. On the other hand, they were mostly for free, which is again something. You might say, what if I rewrite? No, no, the point is you don't have to rewrite. It's a very simple change, and then you might get some performance benefit. Yeah, that's fantastic. One of the other projects that you have on your donation list, a major thing you guys are working on, is Py3k in PyPy. What's that? It's the Python 3 implementation of PyPy. So

as I said before, we have various interpreters in PyPy that are all implemented in RPython. One of those interpreters is a Python 2 interpreter, and another, which is less complete, is a Python 3 interpreter that supports 3.2 by now. So we need money, and help too, I guess, to push it forward to 3.3 or 3.4 or even 3.5, to bring it more up to speed. One thing that we don't do in PyPy is debate the Python language choices, and I think it serves us

well. So for example, I don't work much on the Python interpreter itself. I work a lot on the RPython side of things, and most of the improvements help all of the interpreters, not just the Python interpreter. So I personally don't care if it's Python 2 or Python 3; the improvements are all the same to me. Right, that's great. Then you also have a section towards general progress, and the last one is NumPy. What are you trying to accomplish with that sprint, or whatever you call it? So as I said before, the NumPy

stuff is that we want to reimplement NumPy, so the numeric part, the operations on arrays. We have a very exciting project for Summer of Code that does vectorization, so using SSE for NumPy. And then we want to have a way to call more of the whole ecosystem of numeric Python, so SciPy, Matplotlib, all this stuff that's outside of the scope. So we want to have the core of NumPy implemented in PyPy, because those things are too low level to just call an external library, and then we

want to have a way, or multiple ways depending on the setup, to call the rest of the ecosystem. This is essentially what those goals are. Those are three ambitious and very cool goals. Very nice. Well, they've been around for a couple of years, I think. So we are working towards them, and we have people working right now on all three proposals, as far as I can tell. Yeah, that's great. So one thing related to PyPy that you've done individually is the JIT viewer. Can you talk about that briefly? So

JIT viewer is a bit of an internal tool for visualizing the assembler and the intermediate representation of your Python program. So it's very useful if you're really interested in how PyPy compiles your program; you can look into that. One related project that I've been working on quite a lot recently is called VMProf. VMProf is a low-overhead statistical profiler for Python, or for VMs in general, but we're going to start with CPython and PyPy. So those are tools that help developers find their

bottlenecks in the code and find out how to improve performance, because if you can understand it, you can usually improve it. Yeah, that's excellent. We've been talking a lot about how PyPy makes stuff faster, but before you just say, well, we're switching to some new interpreter, maybe it makes sense to think about your algorithms and where they're slow and whether or not that switch would even help. It really depends on the situation. Sometimes you switch without thinking about it, and

sometimes it doesn't make sense and you have to think about it first. It really depends on your program and what you are trying to achieve. Sometimes you want to switch, sometimes you want to look and improve, sometimes you want to do both. Yeah, well, at minimum you probably want to measure and profile your app and try it on... You definitely want to measure. You definitely want to know how fast your application is running before attempting anything. It felt a little faster, let's do it. Exactly. You're laughing, but we've seen

people like that. Like, my application runs faster on my local machine but not on the server. Okay, how did you benchmark? Oh, I looked at the loading time in the Chrome developer tools. That's usually not good enough. Yes, it might be slower because your network is slower. I don't know what your setup is. Maybe the ping time is 100 milliseconds and the request time is 10 milliseconds, so, geez, it's really slow on the server. Right. Awesome. All right, Maja, this is probably a good place to wrap up the

show. This has been such an interesting conversation. I'm really excited about what you are doing and, you know, I hope you keep going. I want to make sure that people know that the source code is on Bitbucket. They can go to bitbucket.org/pypy. That's the main repo. The main way to contact us is usually through either the mailing list or IRC. We hang out on IRC a lot; it's #pypy on Freenode, and we're usually quite approachable when you come with problems. One interesting thing is, if you find

your program running slower on PyPy than on CPython, it's usually considered a bug, unless you're using a lot of C extensions. Right, so if people run into that, maybe they should communicate with you guys. They can definitely file a bug and complain. Excellent. Two quick questions I typically ask people at the end of the show. What's your favorite editor? How do you write code during the day? I have a heavily hacked Emacs, actually, that does all kinds of weird stuff, and I'm way more proficient

with Elisp than I would ever want to be, actually. A skill you didn't really want to earn, but you've done it anyway, huh? Something like that, yeah. And then also, what's a notable or interesting PyPI package that you want to tell people about? That's a tough one for me, because I don't actually write all that much Python code that uses libraries. You can't import too much into the core bits, right? Right. But definitely, and I mean it is self-promotion, but definitely cffi is something

that I would recommend people look at as a way to call C, because this is something very low level that has been very successful as a simple, simple, simple way to call C. That's cool. And if I was writing some program in Python and I had some computational bits I'd written in C, I could wire them together with cffi? You'll be surprised how few people actually do that. Most of the time it's: I have this Python program, and I have this obscure C library that accesses this weird device that nobody has heard about, and I

need to call it somehow, and that's why you call C. The computational bits are actually quite rare, but that would be an option too, yeah. Sure, sure. Okay, awesome. And then finally, you said that you do some consulting. Do you want to maybe talk a little bit about what you do, so people can contact you or anything like that? So the website is baroquesoftware.com, and essentially what we do is make your Python programs run faster, the same thing as we do in open source, except on the

commercial side. So typically, if your open source software is running too slow, just come to IRC, and if your commercial software is running too slow, we can definitely do a contract with you to make it run faster. Yeah, that's awesome. I'm sure people who are having trouble might be interested in checking that out. So great. Maja, this has been super fun. I've learned a lot. Thanks. Thank you, Michael. Have a good day. Yeah, you too. This has been another episode of Talk Python To Me. Today's guest was

Maja Falkowski, and this episode has been sponsored by Hired and Codeship. Thank you guys for supporting the show. Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $4,000. Codeship wants you to always keep shipping. Check them out at codeship.com and thank them on Twitter via @codeship. Don't forget the discount code for listeners. It's easy:

TALKPYTHON, all caps, no spaces. You can find the links from the show at talkpython.fm/episodes/show/21, and be sure to subscribe to the show. Open your favorite podcatcher and search for Python; we should be right at the top. You can also find the iTunes and direct RSS feeds in the footer of the website. Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix. You can hear the entire song on talkpython.fm. This is your host, Michael Kennedy. Thanks for listening. Smix, take us

out of here dating with my voice there's no norm that I can feel within haven't been sleeping I've been using lots of rest I'll pass the mic back to who rocked it best I'm first developers, developers, developers, developers, developers. Developers, developers, developers, developers, developers.

Transcript source: Provided by creator in RSS feed