#165 Ranges as dictionary keys - oh my! | Python Bytes podcast

00:00

Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 165, recorded January 16th, 2020. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by DigitalOcean. They're a great supporter of the show. Check them out at pythonbytes.fm/digitalocean. Get $100 credit for new users. More on that later. Brian, we've got a lot of stuff to get through and I want to just, let's start iterating through it, man.

00:27

Okay, let's iterate through it. Also, I can't believe that it's halfway through January, whatever. Okay, so first off, let's talk about iterators, iterators, generators, and coroutines. So I'm linking to an article, that's pretty much what it's called, by Mark McDonald. And when I googled this relationship between coroutines and generators, apparently everybody else knows

00:50

this is a thing, but I missed out somehow. But this article is a really good introduction to all of these concepts and how they all work together. So it starts, well, okay, I've got to start out with a beef. It starts out with like talking, trying to do a gentle introduction to the iterator protocol with like the dunder iter and dunder next. I just want people to stop doing that. Okay, muscle through it, but skip that part. It should be an

01:16

appendix, I think, because people don't do that anymore. Okay, next, it goes on and talks about generators, which are the same thing as this iterator protocol, sort of, but using the yield keyword. I know there's differences, but this is how I do it. I use yields for generators. It's so beautiful, because you take the code that's not generator style, and then you just throw in yield instead of like list.append, or set.add, or whatever you're going to do to gather up the results. Just

01:44

replace that with yield, boom, you're done. It's usually less code. I love it. It's great. I'm a big fan. Like, for instance, you just throw things into a loop and put yield in there, or yield the things you have, whatever works. Unbounded generators, it talks about, which means don't convert these to lists because they don't stop. So it is possible to write a for loop that doesn't stop,

02:05

and therefore there's a way to do a generator that doesn't stop. So. Right, if you're working on infinite series, some kind of series that you use a generator for it, it might not stop. Yeah, there's, I mean, there's legitimate reasons to do this, or maybe it does have an end, but it doesn't fit in memory and stuff like that. So beware. Generator expressions, you know, for some reason I just forget about. They're like list comprehensions, but you put parentheses instead

02:30

of brackets, and then it's a generator expression? They're smooth, right? I mean, they don't have those sharp edges of those square braces. Smooth? Oh, wow. That was bad. Okay. The reason why I highlighted this article really isn't for this stuff so far. It's a couple things. It talks about how generators can use other generators, or nesting generators, with yield from, and this is cool. I didn't know this was a thing. So let's say bar and baz are generators.
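The generator points made so far can be sketched in a few lines. This is only an illustration: the names squares_list, squares_gen, and naturals are made up here and do not come from the article. It shows the same loop written eagerly and with yield, an unbounded generator consumed safely with itertools.islice, and a generator expression.

```python
from itertools import islice


def squares_list(numbers):
    """Eager version: gather results in a list and return it."""
    results = []
    for n in numbers:
        results.append(n * n)
    return results


def squares_gen(numbers):
    """Generator version: same loop, but yield replaces results.append."""
    for n in numbers:
        yield n * n


def naturals():
    """Unbounded generator: it never stops, so never call list() on it directly."""
    n = 0
    while True:
        yield n
        n += 1


# Take a finite slice of the unbounded generator instead of converting all of it.
first_five = list(islice(naturals(), 5))

# Generator expression: like a list comprehension, but with parentheses.
lazy_squares = (n * n for n in range(4))
```

Nothing in lazy_squares is computed until something iterates over it, which is the main practical difference from the list comprehension spelling.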

03:00

You can define a new function foo that yields from each of these, and it just goes through one. And then when it's exhausted, it goes through the other. Really slick. Did you know this was a thing? Yeah, this was added after the yield keyword was added. So yield was there for a while. And then what you would have to do before if you wanted one of these, you'd have to write a for loop that goes through every item in the sub generator and then just yield that out. But now you can just

03:27

say yield from that thing. It's been a few versions since it came in. I can't remember exactly when, but yeah, it's a bit of a new feature. Maybe 3.5, maybe 3.4. I can't remember, but yeah, this is great. The place that I've used this most is recursive generators, right? You're writing a generator and it's going through some data structure, but then you get to the point where you're like, well, I need to call it again, but with a different node in a tree or something like

03:51

that. Instead of having to loop over that deal, you just say yield from basically the recursive call. Oh, yield from with a recursive call. Nice. That hurts my head thinking. Yeah, man. Think about, you know how painful it was to learn recursion and how funky it is to learn about generators. You like mash them together and then the brain explodes. Yeah, it's great.
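A small sketch of the yield from discussion above, using the foo, bar, and baz names from the conversation. The tree layout (a tuple of a value and a list of children) and the walk function are assumptions made up for the example. For reference, yield from was added in Python 3.3 by PEP 380.

```python
def bar():
    yield 1
    yield 2


def baz():
    yield 3
    yield 4


def foo():
    # Runs bar() until it is exhausted, then moves on to baz().
    yield from bar()
    yield from baz()


def foo_old_style():
    # The spelling needed before "yield from": loop over each sub-generator by hand.
    for item in bar():
        yield item
    for item in baz():
        yield item


def walk(node):
    """Recursive generator over a tree given as (value, [children])."""
    value, children = node
    yield value
    for child in children:
        # Delegate to the recursive call instead of looping over its items.
        yield from walk(child)


tree = ("root", [("a", [("a1", [])]), ("b", [])])
```

Iterating over walk(tree) visits each node before its children, so the recursion and the laziness combine without any intermediate lists.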

04:11

Okay. The article goes on and talks about the relationship between coroutines and generators, because yield usually just ends up returning a value out of your function, but you can also do an assignment, a variable assignment, from a yield. And that's one of the syntax things that works with coroutines. And I got to admit, I got lost at this point. So this is

04:37

kind of a call to action to everybody. I'd really like to have a coroutine tutorial that could show me how to use coroutines for stuff that I really actually might use that isn't async related. And can we skip the iterator protocol or make it an appendix? Like you said. Yeah. Do you use coroutines? I mean, they look neat. I just don't know how to use them. I use generators all the time and I use async methods, which ultimately are fancy wrappers around coroutines, but I don't use coroutines

05:07

directly. Not knowingly anyway. Okay. Cool. Yeah. I'll have to play with it a little bit. Yeah. Nice. Something that I use a lot is requests. You probably use requests a lot as well. Yeah. Lots of people do. Yeah. And requests is one of these things, you know, last time you spoke about PyPI stats, was it pypistats.org or something like that. And requests was certainly right near the top.
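Circling back to the coroutine question from a moment ago: one commonly shown, non-async use of assigning from a yield is a running calculation that receives values through send(). The sketch below is an illustration only, not taken from the article, and running_average is a made-up name.

```python
def running_average():
    """Generator-based coroutine: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # The value passed to send() becomes the result of this yield expression.
        value = yield average
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)  # prime the coroutine: run it up to the first yield
results = [averager.send(x) for x in (10, 20, 60)]
```

Each send() both pushes a value in and pulls the updated average out, so the coroutine keeps its state between calls without a class.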

05:29

It was not number one on the list of things being used, but it was near the top, which means it can't take too much change, right? There can't be too many features or changes made to it. So it would be nice to have something that makes working with requests nicer that can change more quickly. So

05:45

there's this thing that I came across called requests-toolbelt. Yeah. So requests-toolbelt is a, well, tool belt of useful classes and functions to make working with requests easier. And it really does four things at the moment. But I think if people are out there and they're like, I always have to do this with requests, it's like these five lines. I got to make sure I remember to do this right. It would be awesome to just, you know, extend this. So this is a small project by

06:11

someone I can't remember. I don't think it says like really a meaningful name on it. Yeah. No, it's just under requests. Actually, this is not a small project, I think, you know, but I think it would be cool to like take those ideas. If you see patterns that you're doing with the requests library and

06:24

fold them in here. So let me give you the rundown on the four things it does. First of all, if you're going to do multipart form data encoding, like I have an image file and I want to upload it to the server, to the API, that's annoying, right? It's not super easy, but with this thing, it's really easy to go and just basically say, here's a file stream that is field two. It's, you know, whatever

06:50

it is, right? It's binary image data or it's text. And then you just say, here's my data, this multipart form encoder and boom, it's just uploading files and doing all the stuff it has to do. That's incredible. Just a few lines of code. Yeah. It's really, really nice. And you don't have to think about like, how do I do multipart encoding again? Just give it a file stream. You're good. The next one is the user agent constructor. So you have to set a header, User-Agent,

07:14

but then like, how do you construct that in a meaningful way? There's a class that takes, or a method, I think it's just a method, takes some arguments and it will generate the string that is a, I guess, compliant user agent for like your API app or whatever. So that's cool. User agent constructor. Sometimes you have to, when you're working with other systems, conform to certain SSL protocols, right? We have TLS version one, 1.2. We have two, I think coming along,

07:43

but there's different versions of TLS, which is the foundation of SSL, right? So they have an SSL adapter that lets you explicitly set, I want to use TLS 1.2 or 1.0 or something like that if you need to. Oh, wow. Okay. That's cool. And then one thing that you can do with requests is you can create a

08:00

session and then it'll start talking over it. It probably reuses the connection. I'm not entirely sure of all the things it does, but one of the things the session does is it'll remember cookies and things like that. Well, maybe you want to make a series of requests using a requests session that doesn't actually carry the cookies from request one to two to three and so on. So one of the classes

08:23

in here is a forgetful cookie jar. So if you set the requests session cookies container to the forgetful cookie jar, it will, well, it implements the protocol, but it always forgets its cookies, obviously. So it's a cool way to still use sessions, but clear out cookie persistence across calls. Is there a reason to use sessions without cookies? Well, some websites behave differently if they think they've already seen you or things like that, right? Yeah. Like maybe I

08:52

want to test the login function, both working and not working. And then I want to try it of, I forgot my password, but it, I don't want it to know that I've already actually logged in and that sequence or something like it could be some like series that you're testing for playing with. Okay. So like,

09:08

if you got a login, your session login is still valid, but you have to go. Yeah. Yeah. Or maybe you're going to a place like some sort of paywalled ad place and it's like, well, you can come here three times, but if you come here more than three times this month, we're going to show you the paywall. You know what I mean? You're like, well, you're using cookies for that. And my cookie jar is forgetful.
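The forgetful cookie jar idea (keep the cookie jar interface, store nothing) can be sketched with only the standard library. This is a stand-in, not the requests-toolbelt code: ForgetfulJar and make_cookie are hypothetical names, and the example uses http.cookiejar rather than a requests session. With the real library, the description above amounts to assigning its forgetful jar to a session's cookies attribute.

```python
from http.cookiejar import Cookie, CookieJar


class ForgetfulJar(CookieJar):
    """Hypothetical stand-in: keeps the CookieJar interface, stores nothing."""

    def set_cookie(self, cookie):
        # Drop every cookie instead of saving it.
        return None


def make_cookie(name, value):
    """Build a minimal session cookie for example.com (illustrative values only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="example.com", domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


normal_jar = CookieJar()
forgetful_jar = ForgetfulJar()
for jar in (normal_jar, forgetful_jar):
    # Both jars are asked to remember the same "visits" counter cookie.
    jar.set_cookie(make_cookie("visits", "3"))
```

After this runs, the normal jar holds the cookie and the forgetful one is still empty, which is the behavior that would let a paywall-style visit counter start from zero on every request.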

09:27

I don't know. I don't personally have a reason for it, but I can imagine reasons that people might use that for, like automation and whatnot. I predict that we will hear other people telling us the reasons now. Yeah, absolutely. They definitely might. So people can visit pythonbytes.fm/165 and down at the bottom, they can tell us why they're doing it, in the cool comment section. Yeah. All right. Speaking of cool, let me tell you about DigitalOcean. They're doing all

09:51

sorts of good stuff. They're offering a hundred dollars credit for new users. So it was 50, it's back to a hundred. Yay. That's great. And we, all of our infrastructure and stuff runs on DigitalOcean and it's been just perfect for years. So that's great. One of the things they recently released is

10:07

memory heavy workload droplets. So memory focused droplets. So you can get up to eight gigs of RAM for each dedicated CPU and it goes from two CPUs all the way up to, is that 32, 256 gigs of RAM available on your VM, which is kind of ridiculous if you really need that, but you know, maybe you've got a workload that does. So it's really good for high memory apps, like a high performance SQL or no SQL

10:33

databases and memory caches like Redis, maybe some data analysis of lots of data, stuff like that. So check them out at pythonbytes.fm/DigitalOcean, get a hundred dollars credit for new users and support the show. Speaking of data science, what do you got, Brian? What's next? Yeah. Speaking of data science, pandas is used by lots of folks, not just data science, but I know the data analysis people use

10:54

pandas quite a bit. And in episode 162, you weren't with us for that, but we covered a project called Bulwark. Yeah. I listened into that episode as well and you and Ollie did a great job. That was fun. And we had a listener suggestion about another package called pandas validation. And then I was

11:13

just looking around to see if there's other projects. One of the others I've found was Pandera. So I'll try to briefly talk about these, but pandas validation, Lance tells us that it lets you create a template for your data frame, how it should look, and then it validates your entire data frame against a template.

11:33

So if you have a data frame with the first column being string and second column being dates and then an address, and you can use a mixture of built-in validate types to ensure that your data conforms to that. So that looks pretty cool. Yeah, this is really nice. It's a little bit like tiny bit like JSON schema or something. So you've got these pandas data frames or time series that it's just full of

11:54

whatever. And then you can throw on top of it a cool validation and just it's all at once against the whole collection, right? Yeah. And then Pandera is, I think, a similar sort of project that lets you set up types and properties for different columns of a data frame and perform validation to make sure sort of a schema validation sort of thing also. So they're all kind of solving a similar problem, but I was looking at it and the API and how you use it between Bulwark, Pandas validation,

12:23

Pandera are all very different. Yeah, they are. I'd really like to hear if there is a common approach or if Pandas validation, data frame validation is just not something that's catching on yet or, you know, what people are using. I'd love to hear that. Yeah. And I just noticed at the bottom of Pandera, they have other data validation libraries and others, Panda specific ones like opulent Pandas

12:45

and Panda schema and Pandas validator and table enforcer and so on. So apparently this is like a whole hole you can go down into that I was not even aware of, but I got to say the Pandera API where you basically define a column, a data type, and then a Lambda function that you give it that does the validation. That's super cool. I love that. Yeah. It looks pretty clean. Yeah. It looks incredibly flexible without getting like out of control. Yeah. Speaking of out of control, you know, it's a

13:11

little bit out of control. GUIs for Python. Yeah. And this way, I don't mean it. I'm not actually this time complaining about their absence or something like that. But one of the best libraries for building GUIs

13:27

in Python has got to be Qt, right? Yeah. And I was inspired at the Python meetup that you're running out in West Portland when we saw Ogi Moore give a presentation on how he used fbs, the fman build system, plus PyInstaller, plus Qt to build, you know, nice packaged apps that are GUI apps that he could distribute around. And that was really cool. So one of the things though that drives me crazy is like we've got PyQt 5, we've got PySide 2, PyQt 4, we have PySide, we have Python for Qt. We have all

14:02

these different things, right? I think Python for Qt might be the next version of PyQt 5 and so on. And I just don't know where to start, right? And look at it, that's going, oh my goodness. Like you see different examples doing different things. And so I ran across something called QtPy. QtPy. QtPy. Yeah. Or QtPy.

14:22

I wanted to say QtPy, but I don't know. QtPy. QtPy. So QtPy, actually, you know, one of the things about a lot of these libraries is they're like really cool little proof of concepts, but in practice, how real are they? How supported are they? And so on. One thing that seems real and supported is Anaconda, the Anaconda distribution. And with that comes the Spyder IDE, like the whole Anaconda Continuum data science IDE thing, right? And this QtPy is the foundation of what they're doing to

14:58

write that. Okay. At least it's in their GitHub repo. So it provides a uniform layer to support all those different libraries that I complained about with a single uniform API. So it's like an adaptive layer on top of all those things. And it figures out what version you're actually running against. And then it just adapts. So you write your code once and then you can run it in all these different ways or against these different examples. Yeah, it's nice. Yeah, it's cool, right? So this is

15:22

created by the Spyder development team. And there's not a whole lot to it. Basically, it's like, well, here's a simpler way to work with these different libraries, because maybe you want a different license or you want to go from PyQt4 to PyQt5 or something like that. Because there's all these different examples built with all these different libraries, and they're not exactly compatible. So quite cool, I think. Yeah. And also during the presentation at the meetup,

15:46

Is it Ogi? Yeah. He mentioned that he just uses that. And then if there's a problem with one of these packages, just uninstall it and install one of the other ones. And you don't have to change your code at all. Yeah, it's cool. It just works. One of the other things I thought was neat is at the bottom of the readme, they've got sponsors like, you know, different sponsors at the bottom and become a sponsor. I have not seen an open source project do that before. It's an interesting idea.

16:13

Yeah, it is definitely an interesting idea. I haven't seen that either. Yeah. So maybe I'll try that on my little open source project. Well, they also have the GitHub sponsor at the top. Are you using the GitHub sponsor? No. That's something people can turn on. I think that's really cool that GitHub did that, that people can now sponsor projects like through GitHub instead of negotiating some deals separately with everyone. Yeah. I wonder if they're tied together. Oh, I did look into it. Anyway.

16:39

Yeah. Check it out. All right. So yeah, what's next? Well, I want to shed some light on spreadsheets. They can be a dark place if you get sucked down into VBA or too far down there. Yeah. So actually we got an email from Victor Kis. I think it's Victor Kis. K-I-S. He said he's got his very first open source project, but it looks darn cool. It's called pylightxl and it's an XLS spreadsheet thing that you can read and write spreadsheets with.

17:11

So it's a lightweight, zero dependency, minimal functionality reader/writer. Other than the standard library, there's no outside dependencies, and you can read and write modern XLSX and XLSM files, with a very simple interface for getting access to the different sheets inside there and rows and columns and stuff. Actually, it looks pretty cool. Yeah. It looks totally useful if all you got to do is like get in there and get the data.

17:40

I don't know if it like does things like lets you change say like conditional formatting or other weirdness, but it definitely, if you just get, want to open up an Excel worksheet, not a CSV, but like a full on XLS and get at the data or the rows or whatever it is you're after. It's quite neat. If you go to the link that you're linking to and just scroll down a bit, there's a little animated GIF and I think it tells you pretty much all you need to know.

18:03

You just watch it for a second and it's like, here's the few steps to go work with this Excel file. It's cool. He's already got documentation up with the API, but I found on the docs that the best way to get up to speed really quick is to look at his examples. He's got a handful of examples for how to do different things. And it's like, oh my gosh, if I needed to read Excel from Python, I could get started in a few minutes with this. So yeah, it looks pretty cool.

18:31

Yep. And no dependencies. That's kind of nice as well. So I never really thought about why that would be important, but he lists a few reasons. One is that if you're going to compile it into another installer or something using PyInstaller, not having any DLL or other dependencies makes this easier. And then he even says that the library is just like a few source files.

18:56

So if you don't even want to install this as a package, if you just want to copy this stuff into your own source, that that's an option. Right. Yeah. Just vendor it. And then you, you don't have dependencies either. Yeah. You know, getting updates, but you know. Yeah. Wow. It's a trade-off. I'm going to tell you about this other thing. And at first it might not sound very exciting, but I'm actually pretty excited about it. I think this is, this is quite cool. It's a clever little library.

19:22

And this suggestion comes to us from Aiden Price. And he told us about some project he's working on using something called Python dash ranges. Okay. Okay. So we have range, like the built-in range. You can say, you know, start equals whatever, end equals whatever. And it goes from the start integer wise up to, but not including the upper bound. But you can't use that range in like more meaningful ways.

19:48

So for example, if I had a range of zero to a hundred, I can't easily ask, is X in there, right? If X is a number or if I have two ranges and I want to intersect them, how do I do that? But this library takes that kind of basic idea, sort of like series, but with a lot of set operations, you can ask for the intersection of ranges. You can ask for whether or not they're mutually exclusive, things like that. So all the set operations you can do on it, but then it also extends that.

20:18

So you can have a range set, which is a bunch of different ranges or even a range dictionary. So like, why would you care about that? So what you could do with a range dictionary is you can use ranges as keys. So if the example they give is if you have... That's crazy. I know, but here's the example they give. It's probably abusing the concept of a dictionary, but it's really useful.

20:41

So if you had an if statement that said, if they use tax or something like that, let's just say, if your income is zero to 10,000, you're in bracket A. If you're in 10,001 to 20,000, you're in bracket B and so on. And you had like a huge if, else if, else if, else if, else if to test for that condition. You can create a range dictionary where the key is a range zero to 10,000, 10,001 to 20,000 and so on. And then like some information about it is the value.

21:09

And then you could just take a number like 37,215 and get it from the dictionary. Say, I want to get that from the dictionary and it'll return. So it'll basically do the test, like is this item in this range as part of the key match of a dictionary? That's brilliant. That's cool. Isn't that cool? Isn't that cool? It's got to be abusing the idea of the dictionary, really. But it's pretty cool. Yeah. Yeah. So it's almost like a switch statement in a sense.

21:37

Like you could take those things and those that if, else, and replace it with this just flat statement of these ranges, and then it'll do the comparison kind of in the data structure. Yeah. Sweet. So there's a bunch of stuff that you can do with these ideas. They got some good examples, but that little example I gave you, I think is probably the simplest one to tell you about because it gives you a good sense of like why you might actually use this, right?

22:00

Like a lot of times you look for these blocks or these ranges and it's really cool to figure out, to be able to sort of test in here. Like you could even do really interesting stuff. Like I want to know is this thing in any of these five ranges, you could just create one of these range sets or these range dictionaries and just ask, is this number in this set? If it is, it's in one of the five ranges that are in there. There's really like cool ways to layer these together.
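The tax bracket example can be sketched without any third-party package. To be clear, this is not the python-ranges API: RangeLookup is a hypothetical, minimal stand-in written for this illustration, it uses half-open integer ranges, and the third bracket is made up so that the 37,215 lookup from the conversation has somewhere to land.

```python
from bisect import bisect_right


class RangeLookup:
    """Hypothetical mini range-dictionary: maps half-open [start, end) ranges to values."""

    def __init__(self, items):
        # items: iterable of ((start, end), value); ranges must not overlap.
        self._items = sorted(items, key=lambda item: item[0][0])
        self._starts = [start for (start, _end), _value in self._items]

    def __getitem__(self, number):
        # Find the last range whose start is <= number, then check its end.
        index = bisect_right(self._starts, number) - 1
        if index >= 0:
            (start, end), value = self._items[index]
            if start <= number < end:
                return value
        raise KeyError(number)

    def __contains__(self, number):
        # "Is this number in any of the ranges?" is the range-set style question.
        try:
            self[number]
        except KeyError:
            return False
        return True


brackets = RangeLookup([
    ((0, 10_001), "A"),        # income 0 to 10,000
    ((10_001, 20_001), "B"),   # income 10,001 to 20,000
    ((20_001, 50_001), "C"),   # made-up third bracket for the demo
])
```

A lookup such as brackets[37_215] replaces the long if/elif chain, and the in operator answers the "is it in any of these ranges" question. Keeping the boundaries in one data structure is also what makes the hardware use case below attractive, since the thresholds live in a single place.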

22:25

Yeah. And especially if you've got that all over the place. Like for instance, I'm thinking hardware stuff. Yeah. It's got to be in there. There's a bunch of numbers and frequencies and whatnot, right? Right. So if I've got different power levels, for instance, they'll have different attenuators that'll kick in at different power levels, but I don't want those power level numbers to be hard coded all over my code.

22:44

So having some central place where I put those so that I can just throw in a number and, based on that, I know what the attenuation is or something. That'd be great. It's cool. Yeah. That's cool. It also, it occurs to me, this might be useful for testing, right? Because then your assert statement could have a little bit of ambiguity, right? If there's like, well, as long as it's in this range, it's okay. But if it's not, then it's not.

23:08

And so maybe that's also an interesting way to simplify testing. Yeah. Yeah. Okay. Cool. Cool. Well, anyway, I think that's a much more interesting project than it just sort of sounds like. It's like, well, Python has range built in, whatever. But no, this is cool. Yeah. Yeah, that's it for our main items. So what else do you want to tell folks about? Well, I spent some time last night.

23:26

I think I brought this up, I don't know, last time or the time before, that I have a few open source projects, not many, but one of them was lacking some work because it had a bunch of support requests or whatever you call them, issues. So pytest-check, I went in last night and cleaned all those up and solved a couple of minor problems. But one of the things that I ran into that was interesting, and I just kind of wanted to highlight it: it's a plugin for pytest.

23:55

There are other plugins for pytest. Some of them don't work together very well because of all the ways they abuse and use pytest. I'm definitely abusing pytest hook functions with pytest-check. Intentionally, what it does is it allows you to check certain things within your test, but not fail right away so that you can continue on. And then if any of the checks fail, it actually fails the entire test and tells you all of the failures. It fails them at the end, not as it hits the first one, right?
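The "check now, fail at the end" behavior described here can be illustrated with a tiny standard-library sketch. This is not how pytest-check is implemented (the plugin does its work through pytest hooks), and SoftChecker and run_example are hypothetical names used only for this illustration.

```python
class SoftChecker:
    """Hypothetical helper: record failed checks now, fail once at the end."""

    def __init__(self):
        self.failures = []

    def equal(self, actual, expected, label=""):
        # Record the mismatch instead of raising immediately.
        if actual != expected:
            self.failures.append(f"{label}: {actual!r} != {expected!r}")

    def assert_all(self):
        # One combined failure that lists every check that did not pass.
        if self.failures:
            raise AssertionError("; ".join(self.failures))


def run_example():
    check = SoftChecker()
    check.equal(1 + 1, 2, "sum")             # passes
    check.equal("a".upper(), "B", "upper")   # fails, but execution continues
    check.equal(len([1, 2]), 3, "length")    # also fails and is also recorded
    return check
```

Calling assert_all() on the returned checker raises a single AssertionError naming both failed checks, which mirrors the idea of reporting every failure at the end of the test rather than stopping at the first one.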

24:26

Yes. But to get away with that, the only way I could figure out is to hook into the report function, which happens much later after the test completes. Well, so there's a whole bunch of other plugins that allow you to rerun tests if they fail. There's rerun failures, there's flaky, there's retry, and there's a handful of others. Most of them are not compatible with pytest-check because of the time that they're checking to see if something fails and the time I'm checking.

24:55

So I guess I just want to point out that if you want that to happen, rerun failures works, flaky and retry don't. Nice. Oh, that's really cool. I wonder if you could like monkey patch flaky or retry to like force it to check later or something like that. Maybe. But also I actually commented in the defect report that it doesn't work with flaky. And I said, well, I think it should try.

25:17

And I had a comment from somebody that said, you're just going to kill yourself off if you think that you're going to try to make it compatible with all the plugins out there. So as long as there is a workaround, it's fine to say, if you need this to work with something like this, use this other plugin and not my problem. Yeah. It seems cold, but open source is a side project. So yeah, absolutely. Cool. Well, I've got a couple of short ones here.

25:42

Jeremy Schendel sent in just a quick message that Pandas is now 1.0. It had been living on the zero ver branch for a long time, but it has migrated over to semantic versioning. It has a couple of new cool features. So we're already speaking about Pandas earlier. If you're using Pandas or whatever, you know, hey, Pandas 1.0 is out. That's a big deal. Probably also means a lot for the stability of the API. Yeah, it's good.

26:05

For the PyCharm fans out there, myself included, friend of the show, Anthony Shaw has created a PyCharm plugin called Python Security. So we'll link to that. And basically what it does is it goes through, much like when you're working with PyCharm, it automatically tells you, oh, you're doing a type mismatch. You're passing an int and it expects a string, or you're calling this function and it takes two arguments, but you're giving it three. It does all that checking in real time.

26:31

This one is for security. So it checks for unsafe loading of YAML files, remote code execution in Flask, man in the middle with requests or HTTPX, and debug configs in like Flask and Django stuff. So that's kind of cool. You want that? Get that and install it. Nice. Yeah. And then finally, I have my Python for Decision Makers course that sort of talks about whether or not you should and how to position adopting Python at your organization. So I did a webcast on that and that's already passed.

27:00

That went really well. But the recording of it is out. So I'll link to the recording if people want to sign up. You got to like register for the thing, but then you just watch the recording. Oh, I'll have to check that out. Nice. Yeah. Yeah. It was fun. A lot of fun. A lot of good conversations there. All right. I don't know about this joke, but I'm going to do it anyway. You ready? Yeah. You've heard about optimists and pessimists and a glass, right?

27:21

A glass is either half full or half empty, depending on which side of that divide you land on, right? Yep. Well, there's a third angle here. And for the engineer, you don't see the glass is half full or half empty. No, the glass is twice as large as it needs to be. Exactly. It's all about capacity planning. Come on. Yeah. Okay. So I don't have a joke, but I came up with a little bit of a brain teaser this morning. Okay. Nice. Let's have it. Yeah. When is 90 greater than 100?

27:52

When is 90 greater than 100? Yeah. Well, there's a couple places. One, which I was informed on Twitter, is when you're comparing string literals. True. Yeah. Yeah. If you're going to say quote 90 less than quote 100, it's false. Yeah. Okay. The other one is microwave times. So 100. Nice. Anyway. Very good. That's it. All right. Well, you've left people with something to think about. And yeah, thanks for being here. Thanks. Bye. Yeah. Bye. Thank you for listening to Python Bytes.
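Both answers to the brain teaser can be checked in a few lines. The microwave_seconds helper is a made-up illustration that assumes a keypad where the last two digits are read as seconds and anything before them as minutes, so keying in 100 means one minute.

```python
# Strings compare character by character, so "9" versus "1" decides the result.
string_compare = "90" > "100"

# Integers compare by numeric value.
int_compare = 90 > 100


def microwave_seconds(keyed_in):
    """Interpret keypad digits as MM:SS (assumed keypad behavior for this sketch)."""
    minutes, seconds = divmod(keyed_in, 100)
    return minutes * 60 + seconds
```

Under that assumption, keying in 90 runs for 90 seconds while keying in 100 runs for only 60.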

28:23

Follow the show on Twitter via at Python Bytes. That's Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.

Transcript source: Provided by creator in RSS feed
