#47 PyPy now works with way more C-extensions and parking your package safely

Oct 12, 2017 | 17 min | Ep. 47

Episode description

Topics covered in this episode:
See the full show notes for this episode on the website at pythonbytes.fm/47

Transcript

Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 47, recorded October 11th, 2017. I'm Michael Kennedy. And I'm Brian Okken. And we've got a bunch of cool stuff lined up for you. So, hey, Brian, how's it going? It's going really good. Yeah, yeah, great. Hey, before we get to your first item, I want to say thanks to DigitalOcean. They've sponsored a bunch of episodes coming up. They're really supporting the show.

And the thing they want me to tell you about is Spaces, which is like Amazon S3, but like literally three times better and you get a two-month trial. So check it out at do.co/python. And we'll talk more about that later. How about Fast? Fast Python, Brian. What do you think? I'm excited. So PyPy is a fast implementation of Python. And it's good to see that there's still work coming out.

And one of the exciting bits of news just recently is version 5.9, at least on the PyPy 2.7 version of this release, has Pandas and NumPy in it as well, which is super exciting. That's actually a really big deal because they had not been supported. That's one of the things that was a challenge with PyPy. Like it was great. It was much faster. In many ways, it was like five times faster than regular CPython. However, it didn't support any of the C extensions.

You couldn't integrate things like NumPy and stuff. And so it was like you get a subset of Python that's super fast, but there might be things you can't do. And oh, by the way, a lot of those are computational, which is exactly where people care about speed. Yeah. So it's awesome to see that coming on. So NumPy and Pandas are coming on, and I'm sure that eventually they'll come to the 3.5 branch as well. Yeah, for sure. And you also have notes about Cython as well, right?

Yeah. So part of what helps with this is that it includes Cython 0.27.1, which supports a lot more Cython projects on PyPy. I'm not sure what the Cython story was before this release, but that's pretty exciting. Yeah, that's cool. Yeah, I think the biggest news here is that CFFI has been updated and the C API extensions for many, many projects now work with PyPy, whereas previously they did not. And so it's not just Pandas and NumPy. Those are the headline ones.

But there's a bunch of things that previously couldn't work with PyPy because of the C extensions. Well, guess what? Now they can. That's pretty awesome. Yeah. And then another bit of news with this release is the optimized JSON parser for both memory and speed, which should help for people trying to pull in JSON. So that's good. Yeah, that's awesome. I think people use JSON every now and then. Not really sure.

With all the microservices, it's like the network lines are full of those JSON messages. So that's really cool, and that's all pretty straightforward. I want to show you some stuff that is not straightforward. So there's this project on GitHub called WTF Python that has really taken off. There's a ton of people contributing to it. So let me pull up the main page and see. There's 17 contributors who are doing a lot of work on this project, and it has about 3,600 stars.

So have you seen the Wat video about JavaScript and Ruby, which is hilarious? You know, Python is lucky in that there's not that many weird edge cases, but this repository will show you, actually, there's some weird cases. So have you seen this, Brian? No, I haven't. This is pretty funny. Yeah, I pulled out four items, but there's a bunch, and this is super active on GitHub. I'm getting all these notifications from it. That's cool. Like, one is about skipping lines.

You say, like, value equals 11. Value equals 32. What is value? It's 11. Huh? What is going on here? There's another one that's similar in the same section. It says, quote E, equal, equal, quote E, false. Okay. And things like that. And it's about encoding and some interesting stuff. So each one of these has, like, a really simple, you know, like, three or four lines of code and then the explanation. And the explanation, I think, is where this gets interesting.
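A minimal sketch of how both of those puzzles can arise, assuming the trick is a look-alike character: a Cyrillic letter (U+0435) that renders just like the Latin "e". The look-alike is written here as an escape sequence and run through exec so the difference is visible on the page; the namespace name is only for illustration.

```python
# Two characters that look the same on screen but are different code points.
latin_e = "e"
cyrillic_e = "\u0435"  # CYRILLIC SMALL LETTER IE
print(latin_e == cyrillic_e)  # False

# The same trick with identifiers: the second assignment creates a new
# variable whose name ends in the Cyrillic letter, so "value" is untouched.
source = "value = 11\nvalu\u0435 = 32\n"
namespace = {}
exec(source, namespace)

names = [name for name in namespace if name != "__builtins__"]
print(len(names))          # 2 distinct variables
print(namespace["value"])  # 11
```

So no line is actually skipped; the second assignment simply writes to a different name.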

So another one is modifying dictionaries. Like, these are super good ways to trick people. Like, create a dictionary with one item. Go through for each item in it. Delete that item and add a new one. And then print that out. How many times did that loop run, do you think? I have no idea. It's either one or error or something is what I would guess, right? But the answer is eight. Exactly eight. You're like, what? Why does it run eight? Why doesn't it run one, infinite, or zero, or error?

Like, those are the three. Zero, one, or infinity. Eight doesn't make any sense. But if you look at the implementation, the dictionaries are pre-allocated because you're typically adding stuff. They want to grow in, like, a doubling sort of way. Not a every time you add something, it's got to reallocate and copy around things. And so what they do is they pre-allocate a certain number of items. And this trick, like, leverages assigning into those new slots until it runs out. So this is crazy.
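Here is a rough sketch of that dictionary experiment, wrapped in a function so it is safe to run anywhere. The count of eight is an implementation detail of the CPython versions discussed here; other interpreters and more recent CPython releases may give a different count or raise a RuntimeError, so the sketch reports whatever happens instead of assuming a number. The function name and the safety limit are made up for this example.

```python
def count_loop_runs(limit=1000):
    """Mutate a dict while iterating over it and report how far the loop gets."""
    d = {0: None}
    runs = 0
    error = None
    try:
        for key in d:
            del d[key]           # remove the item we are looking at
            d[key + 1] = None    # add a new one, keeping the size at 1
            runs += 1
            if runs >= limit:    # safety net in case an implementation never stops
                break
    except RuntimeError as exc:
        # Some interpreters detect the mutation and refuse to continue.
        error = str(exc)
    return runs, error


runs, error = count_loop_runs()
print("loop body ran", runs, "time(s); error:", error)
```

Whatever the result is on a given interpreter, the practical takeaway is the same: do not add or remove keys while iterating over the same dictionary; iterate over a copy such as list(d) instead.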

I'll give you one more example. Let's go with the is one. Is is not what it is. So if you say A equals 256, B equals 256, A is B is true. However, if you say A equals 257 and B equals 257, A is B is false. Do you know why? It's another crazy one. This is insane. And the reason is, I believe the first 256 numbers, maybe negative as well, I'm not sure, are pre-allocated for performance reasons.

And every time you, like, literally say the number seven, like, that points to this pre-allocated flyweight pattern type thing. But beyond that, these get allocated on demand. So you're basically asking, is the pointer to 257 equal to the other pointer 257? And there's no longer this tracking between them and they get dropped. So there's just, there's tons of this craziness going on here. That's pretty fun. Yeah, that's nice. So I think this is a fun project.
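A small sketch of the is behavior described above, assuming CPython, which keeps a cache of small integer objects (the exact range is an implementation detail). The integers are built with int() from strings so the compiler cannot merge identical literals into a single constant, which would hide the effect.

```python
# Small integers: both names end up pointing at the same cached object.
a = int("256")
b = int("256")
print(a is b)   # True on CPython

# Larger integers: each call allocates a fresh object with the same value.
c = int("257")
d = int("257")
print(c == d)   # True: the values are equal
print(c is d)   # False on CPython: they are two separate objects
```

The takeaway is to compare numbers with == and keep is for identity checks such as "x is None".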

I really commend the people working on it. It's great. And I definitely, I want to do something with this later. I just haven't figured out quite what the details are yet, but there's got to be something fun here. So this makes me feel like I should go practice my Python. Like, maybe I'm not as good as I thought I was because that dictionary thing going eight times kind of like took me for a loop for a bit. Anything in the WTF Python would be evil to try to bring up at a job interview.

But it'd be very evil. Yeah. But if they answered it, think of that. Yeah, that'd be good. I ran across this, it's a recent article called Python Exercises. And I've done this before. So when trying to either brush up on Python skills or find some questions to ask at an interview or something, I'm trying to come up with some decent questions. And a lot of the questions out there seem to be sort of generic questions that apply to any language.

And they just happen to be done in Python. This is a collection of questions where some of them are pretty easy to start off with, like basic syntax stuff. But there are some things that actually check Python itself and some use of the standard library. And I think it's a nice collection. It goes through syntax, of course, and then some text processing and OS integration and decorators, generators. And you can get into quite a few things. But I think it's a nice set. It's not too huge.

It's a good one to look at. Yeah, yeah. And they don't seem too trivial. They're like, given this set of data, parse it into a CSV file, start the subprocess, things like that. It's really, it's pretty nice, actually. Yeah. And then at the end, the last thing they talk about is testing, which I very much appreciate. I think it's important to make sure.

I've started sending out code examples before I bring somebody in for an interview, asking them to solve some coding problem, but also to write a test to prove it works. And I think that's a good thing to add. Absolutely. Yeah, that's really cool. Great that they include that at the end as well. So I've got another thing you should test for. Before I tell you about it, though, I want to tell you about Spaces.

So Spaces is DigitalOcean's new service, which lets you basically store files on the internet and either privately or publicly pass them around, right? So kind of like Amazon S3, but much, much more affordable. So instead of charging you nine cents per gigabyte, they charge you one cent. And you can use exactly the same tools. So, you know, like I use Transmit for my Mac. I love that to manage all my stuff in the cloud.

And when I switched to DigitalOcean Spaces, which I did just because I saw the offer, I'm like, this is so much better before we even talked about this. I just pointed my Transmit at that and it just kept on working. Just said, hey, there's an S3 thing over here and here's the key. So if you are using S3 or some other sort of shared cloud storage for files and things like that, you definitely should check out DigitalOcean Spaces at do.co/python.

There's a two-month free trial and then it's really, really affordable and straightforward. I love it. Nice. The audio you're listening to right now came straight out of there. So beautiful. Have you heard of Pickle? Oh, yeah. Not the gherkins, but the built-in way to serialize stuff. I don't remember why, but I try to avoid it because I've heard there's problems. Yeah. There's two major problems with Pickle. One of them is it stores a binary representation of your objects.

And so if you do things like rename a field or maybe even reorder stuff, right? If you add a field, remove a field, there's all sorts of stuff where like just the versioning of your classes or your data, if that changes, you can no longer properly deserialize these things. It's not great. So that can be a problem. And that's probably reason enough to use JSON or some other format.

However, right in the documentation, it says, warning, the Pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source. All right. So I think people see this like, okay, that looks bad. Let's get out of here. And they just bail as they should. Like, I think even the versioning stuff alone is already an issue. So like, I think there was an issue with somebody caching stuff.

And when they were switching from Python 2 to Python 3, the in-memory representation of like date time or some part of the memory was a different representation and the Pickle and stuff started to conflict with each other. Anyway, this article I want to talk about is called Exploiting Misuse of Python's Pickle. So if you've ever read that warning and gone, huh, that sounds bad. I can kind of imagine what that might look like. I'm going to stay away from it.

This one shows you exactly how to do bad things. And bad things begin with, let's create a remote shell and start executing code. And maybe even let us log in remotely over SSH to this machine by sending a little bit of binary data, like 50 bytes, 100 bytes, something super small, over to this machine. And then we'll just log in and go from there. That sounds bad, right?

Yeah. Jeez. So the idea is when you unpickle something, there's a few hooks where you can run arbitrary Python code. And so they say, well, let's just use subprocess.Popen and create a shell for us. So you just put that command in like your dunder reduce (__reduce__), I think it's called. And then you've got shells and that's bad. So for those of you out there wondering, what is this warning about? Exactly. Why should I be super scared? Here's why. Great little example. Super approachable.
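To make that hook concrete, here is a deliberately harmless sketch. It is not the code from the article: instead of subprocess.Popen it has the payload call eval on a bit of arithmetic, and the class name is made up for this example. The mechanism is the same one being described, though: __reduce__ tells pickle which callable to invoke, and with which arguments, at load time.

```python
import pickle


class HarmlessPayload:
    """Illustrative stand-in for a malicious object."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of the object.
        # An attacker would put something like a shell command here.
        return (eval, ("6 * 7",))


data = pickle.dumps(HarmlessPayload())

# The receiving side only calls loads(), yet the expression is evaluated.
result = pickle.loads(data)
print(result)         # 42
print(type(result))   # <class 'int'>, not HarmlessPayload
```

Note that the receiving side never needs the HarmlessPayload class at all; the bytes alone are enough to trigger the call, which is why the documentation says never to unpickle untrusted data.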

Yeah. Wacky. Yeah. Wacky. So if I was running like a Django website, I probably wouldn't want to like use that as my exchange format on my services, right? No. And there's so many other better formats anyway. So. JSON, JSON. JSON. Yeah. For sure. All right. So what do you got next for us? I've got a complete beginner's guide to Django. Awesome. This is a seven part series and it looks like six parts are done already. And the seventh part is coming up soon.

And it's, it kind of goes through quite a bit of Django. I know there's already a lot of Django tutorials out there, but the interesting thing I think that makes this one stand out is it's kind of, it has an academic feel to it, I think. And if that's kind of your thing, you might like this. Well, it has a chalkboard. It has a beaker and it has a Superman flying. So these are all good signs. Yeah. Well, it has some like comic like drawings in it too and stuff.

Yeah. Yeah. Yeah. Actually, I think this is really nice. The graphics are wonderful. They've got little wireframes to help you design the web pieces, some nice graphics for file structure. It seems super approachable to me. I kind of got lost with some of the UML diagrams and whatnot, but it's well written. People should check it out if they want to learn Django. Yep. Absolutely. And it's based on Python, not legacy Python. So this is all good as well.

Yeah. So if you're looking to pick up Django, that's a good place to do it. All right. So do you remember when we talked about the malicious packages being uploaded? Yes. To PyPI? Yeah. Do you remember what they were targeting? Like how were they getting people to install them? Well, there were a couple of ways. They were naming things after standard library modules on PyPI and then also using misspellings. Exactly. So we have a new GitHub project called pypi-parker.

So this is a cool project by a guy named Matt. And he sent this over and said, Hey, you should check this out. I don't think a lot of people know about it yet, but it's, it's really cool. So the idea is, you know, we had this debate about how do people check and how people verify what gets uploaded to PyPI. Should there be like a committee that reviews it? And all that sounded really bad.

And so he's created this library that says, look, the self-serve ability of people to just upload things to PyPI, this is a good thing. Let's not get rid of it. Let's just try to solve this typosquatting problem. So what he's done is he's created this thing called the PyPI Parker and it's an extension to distutils. So it's a separate command that you can run. So if I was like Kenneth Reitz and I created Requests, I could run setup.py and give it, I think it's park.

And it will actually generate additional packages that I can upload to PyPI. And they'll be the various reasonable misspellings of requests. And when you import them, it'll raise an import error that says, no, no, no. This thing that you pip installed, you misspelled that. Go get the real one over here. So it gives them like a help message and all that kind of stuff. So, one, it gives the ownership of these misspellings to the original package owner.

And then for the people trying to accidentally use those, it will give them the warning to say, you've misspelled this, but here's what you actually should be looking for. I think that's great. Yeah. That's cool. Yeah. So well done, Matt. If you're a package owner, check this out. It might be helpful. Since I'm not writing so much anymore, I'm thinking about writing a couple new open source projects. So I'll probably be in that boat soon.
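The general idea can be sketched in a few lines. This is not pypi-parker's actual code or API; the function names, the typo rules (dropped letters and swapped neighbors), and the error wording are all assumptions made up for illustration.

```python
def candidate_typos(name):
    """Return plausible misspellings of a package name (hypothetical rules)."""
    typos = set()
    # Dropped letter: "requests" -> "reqests"
    for i in range(len(name)):
        typos.add(name[:i] + name[i + 1:])
    # Swapped neighbors: "requests" -> "rqeuests"
    for i in range(len(name) - 1):
        typos.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    typos.discard(name)  # swapping identical letters reproduces the real name
    return sorted(typos)


def parked_import_error(typo, real_name):
    """Build the error a parked placeholder package could raise on import."""
    return ImportError(
        f"'{typo}' is a parked name. Did you mean to install '{real_name}'?"
    )


typos = candidate_typos("requests")
print(len(typos), "names to park, for example:", typos[:3])
print(parked_import_error("reqests", "requests"))
```

A placeholder package for each generated name would then raise that error from its top-level module, so a mistyped install fails loudly and points at the real package.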

Yeah. Nice. So you should use PyPI Parker and then give us a report. Okay. Awesome. That's our six items for the week. So hopefully everyone enjoyed them. Brian, what else is going on? Well, I'm just getting ready for Halloween actually. So. I know. Houses around here getting scary. A lot of creatures and various cobwebs. But I have not been as busy as you have lately. What have you been up to?

I have just released a brand new course and you can find it at freemongodbcourse.com and that should give you pretty much all you need to know about it. So I have this paid course, which is like a seven hour, super in-depth thing. And I wanted to come up with a way for people to get started with Python, get started with MongoDB. And then if you want to learn more, you can like take the paid course or things like that. So just drop over at freemongodbcourse.com and sign up.

There's really no strings attached. You just have to create an account and then you can go take the class. Oh, another thing I wanted to point out, this is maybe not worth a whole item. And this is not my thing. This is just something I saw from Donald Stufft, who runs PyPI and the website and all that kind of stuff. He sent out a tweet that said, Python 3 usage has doubled in the past year according to download stats on PyPI. Oh, that's cool.

Yeah. So legacy Python is definitely on the downward trend, even though it's still the majority of things that get downloaded. Yeah. So way to go, Donald, for putting that out there and nice to see that trend continuing. All right. Well, thank you everyone for listening. Brian, thanks for finding these things and sharing with everyone. Yeah. Thank you. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S.

And get the full show notes at Pythonbytes.fm. If you have a news item you want featured, just visit Pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.

Transcript source: Provided by creator in RSS feed