#59 Instagram disregards Python's GC (again)

Jan 05, 2018 | 26 min | Ep. 59

Episode description

Topics covered in this episode:
See the full show notes for this episode on the website at pythonbytes.fm/59

Transcript

Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 59, recorded January 4th, 2018. I'm Michael Kennedy. And I'm Brian Okken. And we got a bunch of awesome stuff lined up for you in this very first episode of 2018. So let's say thank you and happy new year to DigitalOcean. Yeah, thanks. And definitely happy new year. It's exciting to be back.

It's very exciting to be back. And we, you know, the Python news doesn't stop coming. I think if anything, it's just picking up speed. I'm afraid we might scare people a little bit with some of your picks this time, Brian. What? The stuff near the end. The stuff near the end. So yeah. Okay. Another thing that's kind of scary is turning off garbage collection. Seems like that might be bad, right? Right. Well, I was actually surprised and very interested when I was listening to

the Instagram talk at PyCon about turning off garbage collection. And there's an article that they put out again. They said that they've, they had turned it off last year and then they wanted to sort of, they were having memory problems. So they wanted to try to turn it back on a little bit, but they still have concerns. Yeah. So maybe we should take a moment, just a step back and say, you described the original thing.

So why did they start down this path of turning off garbage collection in the first place? What they found was they were running many instances of the largest Django deployment on Python in the world. So they're running lots of servers with this. And they found that the shared memory across multiple processes running that on a single server was completely falling apart because garbage

collection was shifting stuff around. They said, well, could we turn it off? And it turned out that they could, but then this article you're referring to says they basically were losing those gains again. And we'd talked about this, I guess, a couple of times: if you turn it off, then you eventually will run out of memory. But if you're restarting tasks every once in a while, that completely cleans it up. Yeah, exactly.

They were losing some of those gains, so they wanted to get some of those back. This is really interesting, and I had to read this article about three times, but it's called "Copy-on-write friendly Python garbage collection." And it's a pretty interesting story, but the end punchline is that they've got a new addition to Python that's going to go into Python 3.7, or it's already in there, that is called gc.freeze(), where what happens is they get

their main stuff running with all the shared objects. But before they fork off a bunch of worker processes, they call this gc.freeze(), and all the stuff that's in memory right now at this point doesn't get garbage collected, but everything from this point in time on will be garbage collected, which is pretty interesting. Yeah, that's really, it's really interesting. So Python memory management is a little,

I think it's a little obscure. People don't talk about it very much. And I don't think there's a lot of good write ups. You actually found a really fantastic write up on the intricate details of Python memory management. The short version is most things are cleaned up through reference counting. So number

of things pointing at it, when that goes to zero, it goes away. But the problem with reference counting is cycles: I have one object pointing at another, that object points back at the first, they both have a count of one or higher forever, and they get leaked. And so there's this secondary garbage collection phase that goes through and looks at these items, cleans them up, and so on. So this gc.freeze() says, let's take all the stuff that exists now, and just tell the garbage collector to ignore it,

don't touch it, don't mess with it, leave it alone, right? And so you get like, basically your app into its like normal working state, and then freeze it one time. And then all the new stuff that would make the memory grow and grow and grow over time is going to be continually GCed. But the core essence of your app, Python runtime, and a bunch of things to get started should be kind of fixed, right? Yeah. And I think that's a pretty cool idea, because that's a common model for applications to

get connections up and get your normal like sitting state, idle state running. And then before you get requests in and, and spawning stuff, just at that point, you're like, well, this is all the shared stuff. Let's just, we don't need to move this stuff around. It's always going to be there. Anyway, it's a cool idea. And apparently it saved them. They were at linear, linear memory growth, and they slowed that down quite a bit.
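As a rough illustration of the two ideas discussed here, the sketch below first builds a reference cycle that plain reference counting cannot reclaim, lets the cyclic collector find it, and then calls `gc.freeze()` the way a pre-fork server might. The `Node` class and `make_cycle` helper are made up for the example, and the actual fork step is only indicated in a comment; this is not Instagram's code. It assumes Python 3.7 or later, where `gc.freeze()`, `gc.get_freeze_count()` and `gc.unfreeze()` exist.

```python
import gc


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    # a and b reference each other, so their reference counts never
    # drop to zero on their own, even after this function returns.
    a, b = Node(), Node()
    a.other, b.other = b, a


gc.collect()            # start from a clean slate
make_cycle()
found = gc.collect()    # the cyclic collector reports the unreachable objects

# Warm-up is done: move everything currently tracked into the permanent
# generation so future collections leave it alone.
gc.freeze()
frozen = gc.get_freeze_count()

# A pre-fork server would call os.fork() here (omitted in this sketch);
# objects created after this point are still collected as usual.

gc.unfreeze()           # undo the freeze so the demo leaves no side effects
print(f"cycle objects found: {found}, objects frozen: {frozen}")
```

The exact numbers printed vary by Python version, so the sketch does not claim specific values.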

Yeah, it looks really, really interesting. Instagram is doing amazing stuff, I think, in the Python space and the web space. And if any of those guys are out there listening and they want to come talk about Python and Instagram on Talk Python, they're more than welcome to come over. It'd be fun. And I definitely appreciate that they're very open about this to say, hey, this is what we're trying. It's not like perfect yet, but it's better.

Yeah, it's super cool. Do you know if gc.freeze() is approved or just proposed for 3.7? So we have a link to the pull request, and it looks like it's already in. Oh, it is merged. Yes, it is merged. So this is pretty awesome, right? We have CPython on GitHub with a pull request merged in with its comment history. That's new, right? That's the 2017 bit of magic, that it's on GitHub. Yeah, cool. So nice that we can actually track that. So the next thing that I want to talk about

is a little bit different. I think this will be mostly of interest for data science folks. This is a little bit lower level maybe than it sounds, but this thing's called SpeechPy. So SpeechPy, it's a library for speech processing and recognition. So this is a pretty interesting Python project. You can come along and basically give it some, you know, spoken words and it can pull out various effects and things that are sort of the essence of what you need to do speech recognition.

I think the way this works, you don't just feed it, say, a wave file and out pops the text of what was said, but it gives you what you would need to feed to a machine learning system. It basically takes the spoken words into a representation you can feed to some kind of algorithm to actually get the text. So I think that was pretty cool. And one of the things that I wanted to bring this up

for is they have a really nice citation statement. So if you look at the GitHub repo, like kind of near the top, it says, if you're going to use this package, please cite it as follows. And that's interesting because there's been some talk in the scientific space, more true science, not data science around people want to publish their software and they want to work on advancing software. But in the academic space, you have to publish articles or, you know, the whole publish or

perish type of thing. And the way you get credit for your work is to be cited in other articles. And so this is sort of showing a way to cite this work, which is not a paper, but which is an open source project, so that the people who created it might get the same level of academic credit for their thing being cited. So I think that's pretty cool. Yeah. I don't get the syntax, but it must mean something. I have no idea what it is.

Okay. I thought it's kind of neat. If you're doing machine learning, you need to turn waveforms into something you can process. This is pretty cool. And the other thing that's kind of nice is if you look at it here, and I think it's in the documentation or the tutorial, they actually show you how to process wave files from SciPy, which is also maybe cool and handy at some point. Yeah. It's actually something I need to be doing some wave file processing. Well, SciPy apparently has it.

Nice. How about the next one? Well, next up, we've got our friends at PyBites. Is that what they're called? PyBites. Yeah. PyBites. That's right. They've got a new platform and I suddenly forgot the URL, but there it is. Code challenges, but the "es" is after the dot. So codechalleng.es. Clever, though. But we've covered other things before. I should have looked this up. There's a game one where you're going through a game and doing

code challenges and there's code katas around. This is a similar sort of thing. So you are able to do these little code challenges. It's called Bites of Py, and they're self-contained 20 to 60 minute code challenges. And you can write them and verify them in the browser. And I did two of them this morning and I had kind of a lot of fun with it. It was fun. Nice. And you verify them by writing pytest unit tests, right?

You don't write it. It has pre-written pytest code that checks your answers. I see. So you've got to do some sort of thing and then you check it in and it runs basically the test against your code and says thumbs up, thumbs down. Yeah. Like for instance, on the second challenge, you have to write three different functions to manipulate a list of names. And it has tests for all of these. I went ahead and just solved one at a

time, for instance. So I tried to solve the first one and then ran the tests and noticed that the first one passed, and then just kept doing that. And the test output helped me solve the rest of them. That's really cool. And I also learned something by the transitive property through you. You did? I did. I learned what you learned, in that min takes a key like sort and sorted do. That way you could sort some complex object based on an attribute of it.

I didn't know that. I had just discovered that this morning. So one of the challenges is to find the name with the shortest first name. And I went ahead and sorted the list by the length of the first name and then just picked the first element. Their solution uses min instead of sorting the list. You can just find the min by length, which is pretty cool. Yeah. That's really awesome. That's, that's gotta be quicker than a full on sort.
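The two approaches compared above can be put side by side in a few lines. The list of names and the `first_name_length` helper are invented for illustration and are not taken from the actual challenge.

```python
# Hypothetical data: full names as "First Last" strings.
names = ["Alexandra Smith", "Bo Jackson", "Christopher Lee", "Dana White"]


def first_name_length(full_name):
    """Key function: length of the first whitespace-separated word."""
    return len(full_name.split()[0])


# Approach 1: sort the whole list by the key, then take the first element.
shortest_by_sort = sorted(names, key=first_name_length)[0]

# Approach 2: let min() scan the list once using the same key.
shortest_by_min = min(names, key=first_name_length)

print(shortest_by_sort, shortest_by_min)
```

Both expressions pick the same name; `min` does a single pass over the list, while `sorted` orders every element first.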

One of the things I like about these sorts of quick challenges is you can probably do them like on your lunch break or a couple of lunch breaks to do one of them. And they just take a browser. So you could just do it on your laptop. It's pretty fun. Yep. That's cool. You could maybe even do it on an iPad or something if you really wanted. Yeah. Well, I don't know. I haven't tried that probably. If it runs in the browser, I bet it would. Nice. So yeah, that's, that's really cool.

I do like that you learn these little things like, wait, min takes a key. I didn't know that. You know, that's just, you wouldn't think you'd pick up these little things so quickly, but you know, these little challenges are nice like that. So before we get to the next item, I want to say thank you to DigitalOcean. They're sponsoring this episode and many,

many other episodes. They're really a big supporter of Python Bytes. So as many of you know, many of our bits of code or stuff on the web, and our MP3 files that get sent down to you, all go through DigitalOcean. So Python Bytes is basically delivered in all of its forms to you through DigitalOcean. We have a bunch of servers there. They're super easy to work with, very quick, very reliable. You can create a

new server, a new droplet, they call it in probably 30 seconds. And then you SSH in and you're off to the races. So really, really nice and affordable and check them out at do.co slash Python and let them know that you heard about it on Python bytes. So this end of the year thing, Brian, this is kind of when, I mean, we're sort of on the other side of it, but this is when you get together with your family,

right? People, maybe you didn't even know, like, wait, I have a second cousin from where? Python's like that, right? Yeah. Yeah. You were talking about like, what is the place where you can like do sort of gamified code challenges, and that's CheckiO. So the reason that's relevant, I'm coming back to it, is there's an article by the guys at CheckiO called "How big is the Python family?" So this is really nice.

And you know, some of you I'm sure are aware of it, but many people I don't really think are aware of how varied Python is as a platform. So when you say Python, typically you mean CPython, hopefully you mean modern Python 3.6, not legacy 2.7 Python, but we'll let that slide for now. There's also things like Jython, and Jython will let you write Python code, but execute it on the JVM and interact with Java objects. IronPython is the same thing for .NET.

There's also Python for .NET, which I think is a more up to date, modern variant on the same thing. There's Cython, which is compiled, slightly different Python. There's PyPy, which is a JIT version. MicroPython, which is Python where your app is basically the operating system, Python on microchips. And on Talk Python, you and I talked about Grumpy, right? Yeah. Which is on Go. Yeah. So Grumpy is from the YouTube guys, which is instead of using C to implement CPython,

they said, well, what if we wrote the same thing, but in Go? And that's kind of an interesting thing. So I thought this is just a nice grouping of all of these ideas, a quick paragraph or two on each of them. You know, if you bring in people onto your team and you're like, well, wait a minute, there's actually a lot of types of Python here, check this out. Right. And also maybe a reminder to like, give PyPy a try. Like they just had a big release for both Python 2 and Python 3 versions.

One of the things I like about this writeup that they did is it reminds you why some of these are around. Like if you had to work with .NET, then working with like IronPython or Python for .NET might be a better thing than just trying to do it other ways. Yeah. And one of the advantages there might be, you know, if you're working on a .NET app, but you want to add scripting. Yeah. Like what are your choices? You probably don't want to give them C#. And even if you did,

like it requires full on compilation and like, you know, how do you deal with that? Right. So this could be a really nice way to plug in like scriptability into your enterprise app, which would be pretty cool. And one more thing I wanted to throw in on this conversation is a lot of times I'll say Python runtime. And I know often people say Python interpreter. This is what the Python interpreter does. It does this and that. Well, if you look at how the whole Python family, only some of them

are interpreters. Some of them are compiled execution engines, right? Like the JVM. That's actually not a great example, but say PyPy, for example, or Cython, those two definitely are not interpreted in the traditional sense. PyPy starts out that way, but it converts to a JIT version for the hotspots. I often say Python runtime because I kind of feel like, you know, when you say interpreter, you really

just got the mindset of CPython, which is the most popular, but not always. What do you say? Say interpreter? I don't usually say either. I just say Python. Yeah, there you go. Cool. So anyway, I think this is a nice write up and good to have it all in one place. So I like the one that you have coming up next. One of the problems I often see is I want to do some work, but I don't care if it happens right now. I just want to like start it and let it go

somewhere. I don't usually have a great answer for that. Task processing stuff. And one of the common things people often bring up is Celery. And to be honest, I've tried to get into Celery a couple of times, but the learning curve on it, maybe it's just me, but I had a little bit of trouble getting into it. I was interested when I heard an interview on Podcast.__init__ about a library called Dramatiq. I'm not sure how to pronounce it. It's D-R-A-M-A-T-I-Q. Yeah.

But it's a very, I'm sure since it's task scheduling, it has quite complicated internals, but you just declare an actor on some code and it's pretty easy to get started. I thought I'd point people to it. Yeah, it's quite cool. You basically put a decorator onto a function, and then that function, instead of running locally, you can send work to it. And that send actually kicks it off on, the example they had was RabbitMQ, I think. And there's like a producer of the work.

And then there's another process that just hangs out and consumes anything that lands on the queue. It's pretty cool. Yeah. So it defaults to RabbitMQ, I think. And there's just good defaults that work right out of the box if you don't care. And then you can configure it to use other things if you need to. It apparently is, well, the, the person

that developed this, I forget his name, it's used on quite significant projects. I mean, it isn't a toy project, but it's pretty easy to get started and you can configure it to be all sorts of fancy stuff if you need it to be. But one of the things I liked about the conversation is he brought up that he intentionally kept the documentation fairly terse and small so that when you're looking for something that you think you saw before, it's pretty easy to find

again. So that's cool. Okay. Yeah. That's an interesting point. Yeah. And it looks like you can run it on top of RabbitMQ or Redis. Take your pick. One final thing I want to point out that I thought was interesting is it's licensed under the AGPL, but it also has commercial licenses available upon request, which, you know, people are always looking for ways to basically fund their open source work. And I thought that was an interesting variation that I saw going through it.
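To make the decorator-plus-queue model described above concrete, here is a toy version built only on the standard library. It is not Dramatiq's implementation: Dramatiq's real decorator is `dramatiq.actor`, its actors do expose a `.send()` method, and it talks to a broker such as RabbitMQ or Redis with separate worker processes. Here the `actor` decorator, the in-process `queue.Queue`, and the `worker` thread are stand-ins invented for the sketch.

```python
import queue
import threading

# Stand-in for the message broker: a simple in-process queue.
_work_queue = queue.Queue()


def actor(func):
    """Toy decorator: adds a .send() that enqueues a call instead of running it."""

    def send(*args, **kwargs):
        _work_queue.put((func, args, kwargs))

    func.send = send
    return func


def worker():
    """Consumer loop: run whatever lands on the queue until a None sentinel arrives."""
    while True:
        item = _work_queue.get()
        if item is None:
            break
        func, args, kwargs = item
        func(*args, **kwargs)


results = []


@actor
def count_words(text):
    results.append(len(text.split()))


consumer = threading.Thread(target=worker)
consumer.start()

# Producer side: this returns immediately; the work happens in the consumer.
count_words.send("hello from the producer side")

_work_queue.put(None)   # tell the consumer to stop
consumer.join()
print(results)
```

With a real broker the producer and consumer would usually be different processes, possibly on different machines, which is what makes the "start it and let it go somewhere" model work.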

Really? Okay. So I didn't pay attention to that. So I'm not sure what the AGPL is. Yeah. I actually don't know either, but apparently you might want a commercial license instead. Okay. So the last one I want to talk about is a little bit similar to what you're talking about, running async work, but it's sort of the challenge of taking advantage of async things, but not making that a problem for people trying to consume it who don't want to think of things that way. So this

article is called "Controlling Python Async Creep," from friend of the show Cristian Medina. And he says, basically, if you've got some library that is written in an async way, you're supposed to await it. But anybody who's going to call that and take advantage of that, that caller has to also be async. And then the caller of that has to be async. So maybe way, way down somewhere, you're trying to do something

async and it creates this sort of chain reaction of, well, the callers of this have to be async. Well, the caller of those things have to be async and so on. It becomes, it can become quite a problem. So he wrote this nice article, basically going through three examples of where you can sort of put a stopgap and say, okay, like at this level, we're no longer worried about async, but we're still

taking advantage of it internally. So one way you can do that is you can wait for blocks of async code. So if you've got to contact, you know, a database, two web services, read something from the file system, and you want to do that asynchronously, you could create those pieces of work, but then wait on them as a group. And there's some built-in ways in asyncio to do that, which is really cool.
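A minimal sketch of "create the pieces of work, then wait on them as a group" might look like the following. The `fetch` coroutine and its delays are placeholders for real database or web-service calls, and the function names are made up; the article's own examples may differ. It uses `asyncio.gather` to wait on the group and `asyncio.run` (Python 3.7+) as the point where async stops creeping upward.

```python
import asyncio


async def fetch(name, delay):
    """Placeholder for real async I/O such as a database or HTTP call."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def _gather_all():
    # Start all three pieces of work and wait for them as a group.
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch("database", 0.02),
        fetch("service_a", 0.01),
        fetch("service_b", 0.01),
    )


def load_everything():
    """Synchronous entry point: callers never need async or await."""
    return asyncio.run(_gather_all())


print(load_everything())
```

Callers of `load_everything()` treat it as an ordinary blocking function, while the three operations still overlap inside it.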

It's got some nice examples on that. Or you could just use a thread and then let that thread's main bit of work be the async thing, but you don't have to deal with it. And the most interesting, I think, is mixing async and synchronous calls. And what he does is he actually detects, by looking at the traceback, I think, whether the caller is calling it as an async function or as a regular function, and implements an async behavior or a synchronous behavior accordingly. So you could write

a single library. And if somebody in Python 3.6 wants to use it in a fancy async way, it becomes magically async. But if somebody from 2.7 calls it or something like that, an older version, or they just don't call it in this async way. It just magically is a synchronous call and doesn't use that whole stuff. Okay. This is really an interesting way to make it possible to bring async into your package or your libraries

without having the consumer of your libraries have to care about the fact that it's async. But still make it into something they could take advantage of. Wow, that's great. I'm going to have to read this. This reminds me of the, I guess, the learning hurdle that people go through in the C++, C and C++ world when you go from single-threaded applications to multi-threaded applications. You have to look in all the corners. Yeah. It's definitely a mind shift. Yeah. This is very much like that. Okay.
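One way to get a single function that works for both kinds of callers is sketched below. Note that this swaps in a different detection technique from the one described in the episode: instead of inspecting the call stack, it simply checks whether an event loop is already running in the current thread. The `syncable` decorator and `fetch_answer` function are hypothetical names, and `asyncio.get_running_loop` requires Python 3.7 or later.

```python
import asyncio
import functools


def syncable(coro_func):
    """Hypothetical decorator: awaitable under a running event loop,
    a plain blocking call everywhere else."""

    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        coro = coro_func(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread: behave like a normal function.
            return asyncio.run(coro)
        # A loop is running, so assume the caller is async code and will await.
        return coro

    return wrapper


@syncable
async def fetch_answer():
    await asyncio.sleep(0)   # stand-in for real async I/O
    return 42


def sync_caller():
    return fetch_answer()            # returns the value directly


async def async_caller():
    return await fetch_answer()      # returns a coroutine to await


print(sync_caller(), asyncio.run(async_caller()))
```

A limitation of this simpler check: a synchronous helper that happens to be called from inside async code would get a coroutine back rather than a value, which is presumably part of why the article inspects the caller more closely.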

But yeah, Cristian did a great job on this. And I really like his solution at the end. And actually, he has it done in if statements. I feel like you could create a decorator that would basically wrap that up in just like a magic, like a syncable or awaitable decorator. It's really, really close to having some sort of decorator magic, making this even better. Yeah. Okay. Cool. All right. Well, that's all our news for the week, except for that it's not. Well, yeah.

We have an extra one. Really quick, I just want to let people know that the PyTennessee conference in Nashville is coming up in about a month from now. So if you are in the Nashville area or willing to travel there, February 10th and 11th, they've got their schedule out, the tickets are on sale and things like that. And they even made a special discount code for Python Bytes. If we said, are you going to tell us about it? Then definitely here's the,

here's the code. So if you want to go to PyTennessee, you can use the discount code PythonBytes, no spaces, capital P, capital B, and you get 10% off. Cool. Yeah. Very cool. You have some pretty interesting news. It's not directly Python related, but it very much affects all of us. Yeah. Right. Code's on servers, especially in the cloud. I thought I'd, I don't know what to do about this, but I saw it this morning. I thought we just,

it's important enough to not ignore it. So I thought I'd drop a link. What do you think? Like unplug all of the internet, just go hide in a corner or something like that? It's like one of those things like having the credit services get hacked. You just, I guess, be aware of it and pay attention. It's very much like the Experian. What was that credit service? Equifax maybe? Equifax. I'm not going to say it because I don't want to say the wrong one, but the E credit agency,

I'm totally, for some reason, forgetting. I think you're right. But yeah, like basically you're told your world is crashing down. We're sorry. And this is kind of like that. You quoted a couple of articles. Let me read what they said in the New York Times here. It said that basically there's two problems, called Meltdown and Spectre, that could allow hackers to steal the entire memory contents of computers, including mobile devices, personal computers, and servers running

in so-called cloud computer networks. There's no easy fix for Spectre, which could require a redesign of the processors, according to researchers. As for Meltdown, the software patch needed to fix the issue could slow down computers by as much as 30%. So, you know, your AWS, DigitalOcean, whatever, server may just get 30% slower now. Wonderful.

Yeah. So, most of the places, I think Google, Amazon, and Microsoft have all said that the servers are pretty much changed to deal with Meltdown, but Spectre's still a problem. I don't think there's a ton of concrete details here, at least not that I ran across. It's sort of vague. Apparently, not all the details about the exploit are out, but I'd recommend people check out Risky.biz,

which is my favorite developer security podcast. It's super, super good. And those guys are going to definitely have an insightful conversation on this next time they're on deck. In case we were too vague about it, it was a design flaw found in all microprocessors that allows attackers to read the entire memory of a computer. Yeah. Bummer. I hope you don't do anything on the internet. Carry on now. Okay. So, yeah. So, the last thing,

this is a more positive thing. I think of it that way, at least. I just announced all my courses, not all of them, actually, only a few of them for 2018, but I announced this new deal that I'm having for all the Talk Python courses called the Everything Bundle. So, talkpython.fm slash everything. And it gets you what'll be probably 120 hours of Python course awesomeness, including some new ones: Mastering PyCharm; Python 3, an Illustrated Tour; Introduction to Ansible;

and tons more coming. So, I was just finishing some of the videos for the PyCharm course right before we chatted. So, it's almost done. Cool. So, is that going to be out this month then or soon? That is going to be out probably next week. Okay. Cool. Yeah. Definitely soon. Definitely soon. It's so fun to create these courses and just, you know, keep exploring the different areas and helping

people get better with them. So, lots of fun. Yeah. And you do things like working with companies if they want to, like, get access to these for, like, everybody that works there or a handful of people. I definitely have special programs for, like, site licenses, things like that. I've even talked to some universities about having the courses for, like, all of their students or something like that. That would be wild. Still talking. You'll have to increase the price for them, I guess. Maybe.

I guess. But they're students, you know. Cool. All right. Cool. Well, Brian, thanks for sharing all your news. Yeah. Thank you. Nice to be back together after the whole holiday time off. Yes. All right. Catch you later. Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.

Transcript source: Provided by creator in RSS feed.