#198: Catching up with the Anaconda distribution

00:00

It's time to catch up with the Anaconda crew and see what's new in the Anaconda distribution. This edition of Python was created to solve some of the stickier problems around deployment, especially in the data science space. Their usage gives them deep insight into how Python is being used in the enterprise space as well. And that turns out to be a very interesting part of the conversation. Join me and Peter Wang, CTO at Anaconda Inc., on this episode of Talk Python

00:22

to Me, number 198, recorded January 16th, 2019. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is sponsored by Linode and Rollbar. Please check out what they're offering during

01:02

their segments. It really helps support the show. Peter, welcome to Talk Python. Thank you very much. I'm very happy to be here. I'm happy to have you here. It's been a while since we've talked about Anaconda. I had Travis Oliphant on the show way back when, but it seems like it's time for a catch up on what you all have been up to. Yeah, well, there's been a lot going on. One of the employees has commented that every six months, it feels like a different company. And we do,

01:25

yeah, the space is evolving very quickly. We're trying to just keep up with it. So you would say this data science thing is not a fad. It's probably going to be around for a while? At this point, I think I'm going to go out on a limb and say it's probably going to be around for a little while. Right on. All right, before we get into all that though, let's start with your story. How did you get into programming in Python?

01:40

I actually got into programming when I was a young kid and I've been always programming. I've actually been programming for almost as long as I've been speaking English. I got a PC when I first came here to the United States, so I was very lucky. But I actually majored in physics and out of college, I started going into computer programming as a profession. And I did a bunch of C++, but I discovered this thing called Python on Slashdot.

02:03

And I think they announced version 1.5.2. And I was like, fine, I'll go take a look at it. And I started playing with it and I just fell in love. And so my day job was like getting beat up by C++ templates and out of compliance compilers. And at night, I'd just hack on Python. So finally, after a few years of this, I ended up moving to Austin. I got a job doing Python

02:23

as my day job, which was awesome. In like 2004, I started at Enthought. And I did a lot of work in the scientific community and doing consulting with Python because I knew the science given my math and science background in physics. But I also knew the software principles and software engineering. So it was a really fantastic time. And that's basically the long and short of it. Yeah, that sounds like a great fit. You know, things just came together, right? You have this math and science

02:44

background and you love Python. You found this job and it all just, like all of those things came together to really put you in the right place. They really did. I feel very, very blessed in that way. Now, it was a lot of hard work too, but I got very comfortable. And, you know, there's this great quote from Bruce Lee that you must never, like not, you must never get comfortable, but there will be plateaus and you can't stay there.

03:04

And so I think towards the end of the aughts, the 2000s, around 2010, I was starting to see big data happening. And I started realizing that Python was getting used for business data analysis more than just science and engineering. And that our little cozy SciPy community could actually be something much bigger. And so I started doing some exploratory work. I really wanted to

03:26

do like D3 for Python. I had a few other little itches I wanted to scratch. And so I started Continuum with Travis in order to address some of the technical gaps that we had in the community and the technology stack. And then also to really push a narrative in the technology market that yes, Python is good for business use. Yes, it's production ready. Yes, you should use it. And it can handle big data just fine. And so we really started pushing that narrative in 2012,

03:53

you know, created NumFOCUS, created PyData, did all these things. And I think that the results have spoken for themselves. I definitely think they have. That's great. In 2012, I do think there was a little bit more of a debate of, well, is it safe to use Python for our business critical stuff? But I feel like that battle has been really solidly won, especially on the data science front,

04:15

right? There were debates about R, maybe R was the place to be. That's not really where it's at anymore, is it? No, there was definitely a period of language war sort of stuff going on early on. It's odd, like, you know, even then, the discussion about is data science a fad? Is it a fad term? Isn't it just business intelligence? Or is this just that big data hype cycle all over again? You know, there's a lot of

04:36

doubters and haters on that term. But as I've talked to more users and managers and stuff, at businesses, it's clear that they're thinking about data analysis and data analytics in a very different way than they have for like decades. And data science is definitely, definitely here to stay because of that. Absolutely, absolutely. So maybe give people a sense of what you do day to day so they know where you're coming from.

04:58

Well, my formal role is CTO. I run the community innovation and open source group here at Anaconda. I actually don't run the product engineering teams. And I work with everyone. But my general role is working with the community, helping the various community oriented and open source devs that we have champion their projects and work better with the broader community. I also do a lot of industry facing technical marketing and evangelism. So a lot of customers

05:25

will have me go and speak at internal data science events they do, things like that. There's actually remarkably few people in the Python world that really speak to industry on behalf of Python itself, relative to the usage of it. I mean, you'll find no shortage of industry analysts talking about how great Java is, or how great these like big data projects are, you know, all these like PR type

05:44

things. There's no one doing that for Python. And so that is actually some of my day job. And beyond that, it's just trying to keep up with all the things that are happening in data science, machine learning, data engineering, data visualization, AI, all of it. On top of the advocacy role, it's a pretty much full time learning thing, right? Because there's so much change, right? There's so much in every area. I mean, there's all the cloud stuff too. There's edge learning,

06:09

there's data privacy, you name it. Every single area that touches data science is undergoing massive change right now. That's super exciting, but it's also a bit of a challenge. And I think the Anaconda distribution does help some with that. Before we get into the distribution story, though, let's just talk about Anaconda Inc. So when I had Travis on the show a couple years ago, it was Continuum that was the

06:32

company and Anaconda was the distribution. But now those are not different anymore, right? It's just Anaconda, the company and the distribution. We renamed ourselves really out of pragmatism, because we would go to places and we'd introduce ourselves as Continuum Analytics. And they're like, oh, yes, you guys, like you got some Python stuff. We see that here. Like, who are you guys? And then we say, oh, well, we make Anaconda. And they're like, oh,

06:56

I love Anaconda. I use Anaconda all the time and blah, blah, blah. And so we sort of like, after that started happening to us all the time, we sort of figured like, well, maybe we should just call ourselves Anaconda. And, you know, one of the things that held that up was for a long time, as we were growing the company and growing the distribution, we were afraid that changing the company name would actually spook the community.

07:18

And it's a really, it's been one of these interesting things. Like I have, I have lots to say about open source. Let's just put it that way. But it's very hard to play the game of open source, honestly, and not still get beat up with FUD about it. And so even though we've open sourced our build tools, we've open sourced the recipes, we open source everything from the very beginning, there are still people in the community who distrust us because we're a company trying to make a

07:40

sustainable, build sustainable funding for this open source effort. So that was one of the reasons we actually were reticent to do that name change until it finally just became a no brainer that we basically had to. Yeah. If people keep mistaking you for Anaconda Inc, maybe just say, fine, that is our name. Yeah. And we'll just deal with the haters, you know, on a one-off basis, I guess. I don't know.

08:00

Yeah, exactly. I mean, it's not unprecedented, right? 37 Signals, who made Basecamp and, you know, sort of founded Ruby on Rails, they eventually renamed themselves just to Basecamp. They're like, yep, the one major project, fine, we're just called that, right? I guess it's like Microsoft renaming themselves Windows, which they're probably very happy they didn't. But, you know, in a lot of senses, that makes sense. That's cool. Okay, so there's a broad spectrum

08:25

of folks who listen to the show. Many of them will have experience with data science. Many of them will know what the Anaconda distribution is. But maybe just, you know, for the folks who are new or have been working somewhere else, tell them, what is this distribution? How is it different than the standard CPython? And why did you guys make it?

08:43

I'll try to sum this up for a technical, but not necessarily data science, audience, right? The basic gist of it is that Anaconda arose out of a failure in the Python ecosystem to address the packaging needs for the numerical and computationally heavyweight packages that are in Python. And so for the same

09:03

reason that Linux distributions exist, very few people build Linux from scratch. For actually exactly the same technical reasons, we built the Anaconda distribution, because it's actually really, really hard to correctly build all of the underlying components that you need for doing productive data

09:18

science and machine learning. And so the reason it's a distribution is because all of the libraries you build and the packages, the modules with extension modules that you load up, they need to be compiled together, they need to be compiled in a compatible way. And so you need to agree on compiler definitions,

09:35

you need to agree on code generation targets, optimization levels, things like that. And if you only ever use pure Python packages, so packages whose code only consists of .py files, then you basically never run into a problem. It's only when you start having extension libraries, things that depend on maybe system libraries, God forbid you try to cross platforms between Linux and Mac and Windows across

09:59

architectures between ARM and x86, you're completely hosed. And so we, in service to the scientific Python community, we built this distribution that was a set of packages and a way of building packages that are compatible with each other. So that's what the Anaconda distribution is. It's a bulk distribution with about a

10:16

couple hundred pre made libraries. And we have a package updater in it called Conda that lets you then install thousands more that are built by us and built by a large open community that also uses the same standards. So that's what Conda and Anaconda are in a nutshell. And it's really one of these like packaging war kind of things or packaging, the confusion of Python packaging. We actually tried to approach

10:42

Guido back in the day to help define some standards around this. And he basically gave us a very helpful guidance, which is maybe your packaging needs are so exotic, you need to build your own system. So we took him at his word and we did it. And consequently, when people use Conda, in a lot of cases, things just work. There's still like corner cases and a lot of like little rough spots, especially in terms of pip interop.

11:04

But we're very proud of the work we've done so far. And it's used in production every day by big, big companies that rely on Python for their production workloads. So that's basically Anaconda and Conda in a nutshell. Okay, well, that's a really good summary. Yeah, when I think of it, the main value is that you get precompiled binary versions of the packages that would otherwise have to be compiled from source when you pip install them, right?

11:29

Yes. And the other part is the cross package compatibility, because somebody makes one package, and they have an interest in making them as best they can or whatever, but they don't really care about integrating and testing against all the other open source projects that you may pull into your project that they don't even care or know about, right? So this sort of bigger picture compatibility that you look at is pretty cool as well.
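Editor's sketch: the distinction drawn above between pure Python packages and packages with compiled extensions is visible in wheel filenames, whose last three dash-separated fields are the Python tag, the ABI tag, and the platform tag. The following is a minimal illustration, not anything from the episode; the function names are hypothetical, the example filenames are only illustrative, and treating `none`/`any` as "pure Python" is a convention rather than a guarantee.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str    # e.g. "py3" or "cp37"
    abi: str       # e.g. "none" or "cp37m"
    platform: str  # e.g. "any" or "manylinux1_x86_64"


def parse_wheel_tags(filename: str) -> WheelTags:
    """Return the last three dash-separated fields of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(*parts[-3:])


def is_pure_python(filename: str) -> bool:
    """By convention, ABI tag 'none' plus platform tag 'any' means no compiled code."""
    tags = parse_wheel_tags(filename)
    return tags.abi == "none" and tags.platform == "any"


if __name__ == "__main__":
    for name in (
        "requests-2.21.0-py2.py3-none-any.whl",
        "numpy-1.16.0-cp37-cp37m-manylinux1_x86_64.whl",
    ):
        print(name, "->", "pure Python" if is_pure_python(name) else "compiled")
```

A wheel in the second category only works where the interpreter, ABI, and platform all match, which is the compatibility problem the conversation is describing.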

11:55

It's actually become quite critical. And I think this is one of the areas that the Python community, in the confounding haze of packaging, and half built packaging solutions, that we've not really been good at giving guidance to the user community about is that if all you ever need to do is build one package for yourself, and you fully control the deployment environment, and the development

12:14

environment, then maybe you can go and do that, right? But if you actually have to work on a team with other people, that changes. Web developers, for example, a lot of times, they control the server, they choose the packages they bring, and they write the code, and they can just push it out to their server. And they're good, right?

12:30

Yeah, and they're good to go. And you can do any number of things that you want to. What I would liken it to is this: if you build your own wheel, if you build your own native extensions, it's like getting plastic powder or plastic pellets, and making your own mold

12:44

of Legos or Lego like things and pouring your own little pieces. And so as long as you're the one that controls what they have to plug into, and you're the one that controls all the molds, then you don't need any standard definitions of studs or holes or lengths or anything like that, you're good to go. But if you ever want to work with other people who have their own molds and their own places and

13:03

studs, they want to put these things on, you've got to come up with a standard definition. And so what Anaconda is essentially, it's like a Lego system, we've standardized what the studs are and what the holes are. So lots of people can build different kinds of Legos, and they all can plug together. And that's kind of the long and the short of it.

13:18

Yeah, very interesting. So some other things that are in play there are you talked about Conda and installing the packages that you built, right, the couple hundred or whatever that come with the distribution. But then you also said installing the others through this thing called Conda Forge. What's Conda Forge? Well, Conda Forge is a community of people who I would say out of a masochistic charity to the

13:42

community. They take on the job of maintaining build scripts and recipes that take upstream software and make it so it's actually buildable in a reproducible way and that it works with other things. So it's a community of package builders and they have several hundred contributors and they've built thousands of packages. We ourselves build about a thousand, although only 200 are built into the big Anaconda installer download. But the Conda Forge community goes even beyond that and

14:09

builds several thousand. And that's what Conda Forge is. Yeah. Interesting. So people are like, you know, it's really painful to build this package, but only one of us should ever suffer and feel that once. And we'll do that on behalf of the community. I'll take that on for this one package. Yeah, basically. I mean, you know, the real challenge is it's one of those things in life where it's almost worse that it's easy to do a bad job. I don't know that we have a term for this

14:32

in English. Maybe there's a long German word for it. But it's like the same thing with the coding principles of like, if something is broken, you want it to break loudly and fail loudly, right? You don't want it to make a half effort, where it kind of works sometimes. And building packages is the same thing. Most people can kind of get a build working for most things, but does it work well? Will they ever be able to do it again? Does it work with anything else?

14:57

None of those things, you know, it takes a lot of work to make a good package build. So, well, that speaks to the reproducibility side of things. And I know in data science and scientists using data science tools, that reproducibility is a super important aspect. And I guess the first step is I can run the software, which means I can build the packages and install them. Right. And that is really what we think that providing pre-built binaries and then having

15:22

good provenance of the build system itself. That's really some of the only ways you can really honestly, like not kidding yourself, have reproducibility. I think some people think that Docker somehow saves them, but it really doesn't. So it's kind of a struggle right now, honestly, because there's so many moving pieces. There's a lot of confusion in that space, but I do.

15:42

Yes, I do agree with you that Conda packages used properly can absolutely be a great way to ensure reproducibility for data science. Yeah. Well, it's probably better than saying, well, if you want to install this package, you're going to need to have the Visual Studio 2008 compiler set up correctly on your machine in 2025 or whatever, right? When it's no longer compatible with the Windows or who knows what, right?

16:05

Yeah. One of the reasons I think that our team, the Conda and Anaconda team, are happy to move away from Python 2 is because of the dependency on that compiler. Someday when we finally put Python 2 to rest, I'm probably going to try to eBay a bunch of, like, boxes of those CDs just so we can break them out for, you know, sort of like a cleansing bonfire or something. I don't know. Maybe you shouldn't burn CDs. That's bad, actually.

16:28

Yeah, but you could have some sort of ceremony with them for sure. Yeah. I think the new Python 3.7, it uses MSBuild. Is that right? You know, I'm not sure on the details of that, but I think that there have been significant improvements. And, you know, the Python folks who work at Microsoft have worked really hard to improve the compiler situation there for Python. I think it's much better now with Python 3 and in

16:53

the later releases of Windows. It's just we have, you know, very old Python, very old Windows that still are deployed that we have to keep those users going. So that's where almost all the pain is. I can imagine. Yeah, I just had Steve Dower from Microsoft on the show, and he's in charge of the installer and stuff there. And he's doing some really, really cool stuff to make it more accessible on Windows. And it's easy to go to conferences and forget how important Windows actually is,

17:19

right? You look around, it looks like everyone has a Mac. There's a few people running Linux. That's pretty much what you see at the conferences, right? But that's not what the actual consumption out in the world is, is it? No, that's not at all reflective of even the United States. And then you go to the broader

17:35

world. It's a lot of Windows. It's a lot of Windows, a lot of Linux, too. But yeah, I think this is one of the structural problems that faces the open source community is that when you're small, it's easy to do product management, because it's like you and your buddies. But once you get bigger, you have to actually

17:52

intentionally go and try to pull in information from your users. And that's actually, I think, a structural challenge for the Python community at this point in time. When we're talking about Conda Forge and things like that, something I had not heard of before, but I saw that you're running is something called BioConda. Now, it sounds like it might have to do with biology and data science around biology, but that's all I can discern from it. Tell us about that.

18:16

That's new to me. So BioConda is actually not one of our projects. And oh, I should have said this earlier with Conda Forge. BioConda, Conda Forge, and various other sort of groups, they use our Anaconda Cloud package hosting infrastructure to support their community. Because with the Conda package installer, it's easy to give it a namespace flag, basically a channel name, and then it will go and download packages only from that

18:40

channel on Anaconda Cloud. So these represent, Conda Forge and BioConda represent different communities that are using the Conda packaging tool, but they may have set slightly different standards or included certain other standards in their build system protocols and standards. So all these packages work together. So yes, BioConda is for the biology, genomics sort of community. Yeah. They have very specialized, well, specialized is maybe a euphemism, but there's a lot of specialized

19:05

software needs in the biology community. It's very R-centric. There's a lot of, depending on what you're doing in that domain, there's a lot of Perl sometimes. So... Yeah, interesting. We'll leave that there. Are there other ones? Is there like a ChemConda or things like that? No. So there's actually... Yeah. So I think Bio... I'm going to kick myself later, probably, as I forget some. But there are major research disciplines and communities that do use

19:29

Conda quite a bit. So I think the astronomy research community has taken on Python and embraced Python a lot. They use Conda as a way to get nightly builds and dev builds and just really get easy deployments, right, of their complex software. One of the things that Conda does well, I should have said this earlier, it's not just a Python packaging tool. It's a sort of a userland software packaging tool. So we package up R, Perl, Python, C, C++, Fortran, Java, Scala,

19:54

Ruby, Node, you name it. We really are almost like a portable userland RPM kind of thing. And so that allows for these communities that have a lot of scientific engineering code written in not Python, sometimes not even C or C++. We can package all those things up together, move these collections of packages around. Yeah, that's pretty interesting. That takes the challenge of packaging and sort of magnifies it extremely, right? Multiplies it combinatorially.

20:22

Oh, yeah. Oh, yeah. It definitely gets pretty complex. This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore and check out Linode at talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are or where your

20:50

users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server, or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays, and a seven-day money-back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations,

21:12

and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/Linode. So another thing that looks like it's doing really well is Anaconda Cloud. And so this is a place where like data scientists can share their work and their packages and things like that. Is that right? Yes. So right now, Anaconda Cloud is primarily, I think, used as a package hosting environment. And a lot of developers in the data science ecosystem use it as a way to publish

21:40

nightlies or dev builds. Many of the projects, the key projects, they give us a heads up when they're about to cut a new release so that they can make sure that they can announce the Conda package at the same time they announce the release of the, you know, new version of the software. So it's very nice of them. Yeah. So how does that work alongside, or differently from, just

22:00

putting on PyPI? It gets pretty complex. So number one, there's channel support. So we basically have individual developers can have their own channel and those packages, you know, their users can just download packages from just that channel and not sort of a single global namespace, right? Another really important thing is that there's not just one build. So Conda as a packaging system has much deeper and richer metadata about the build environment and what it expects of the runtime

22:28

environment. So for the same upstream software, I can build different versions that are optimized for different levels of your hardware, like whether or not you have GPUs, whether you have an advanced Intel chip or a relatively basic chip, I can push all of that stuff in. And maybe using this version of a compiler or that version of a compiler, like Clang versus GNU GCC, you know, these things actually make a material difference in whether or not the package

22:53

will work. That level of resolution and that ability to feature flag and select is not available on PyPI as far as I'm aware. And again, it's just, you know, even if one package is available, if you use pip to install from PyPI, pip aggressively goes and tries to build other things from source, right? And it doesn't do an a priori solve of what you need, it sort of grabs things

23:14

as they go. And so you can end up with very much the incorrect packages coming down, you can end up trying to build something from source that maybe builds successfully. But again, that's not what you wanted. You want the prebuilt one, right? Right, with different settings, different compiler. Okay, that's the primary difference. It is frustrating periodically that you can say, here's a bunch of things I need to install

23:32

with pip, you know, pip install these things. And one of them will have a requirement that the version of one part is no larger than such and such. And yet it'll go grab, you know, depending on the order in which you specify them, it may grab the wrong one, you know, and just install that. And then the other package is incompatible. Like there's weird little cases like that you can get into all the time,

23:55

right? Because it's actually, this is one of those areas of software development that for most people, it's not a fun and sexy area to think about. But it's a deeply critical thing when we rely on open source software: to actually understand what the dependency matrix looks like. And there's no free lunch, you know, if you do it in kind of this relatively naive way, like what pip does, then you

24:15

can easily end up in a corner, and things are incompatible. If you try to do it the way we do, which is have very explicit and curated metadata about versions, and you do an a priori solve, well, people complain the solve takes a long time, which it can. So there's really no free lunch on that. I think one of the challenges that we actually have is that the metadata itself can be wrong. And we found that all over the place. So packages will declare they're compatible with this

24:42

version or that version, and they're actually not. And so we have to actually patch what the upstream declarations are. So again, it gets subtle and detailed. There's just a lot of like muck in this area that we have to deal with. Yeah, it sounds a little bit like, these are the problems that you can address and then learn about. If your job is to coordinate a whole bunch of packages that don't interact intentionally with each other, right? They just want to make their project,

25:06

something that you can ship and install and use. And that's fine, right? But this interaction across them is where it gets tricky. There's absolutely a tragedy of the commons. The metaphor I've used in the past is that every developer, you know, open source maintainers, bless their hearts, they're doing a thankless job a lot of times anyway, and they're way burned out and stressed.

25:28

But they're really solving for: does my vehicle work in my driveway? You know, can it get out of my driveway and drive into my other maintainer's driveway down the street? And if that works, they're good to go

25:38

a lot of times. And when every one of the thousand developers in the ecosystem does this, you'll end up with a bunch of cars squashing all over each other on the highways and the freeways, because they're not thinking about that integration problem for their end users. And the end users, a lot of times in data science, they're not sophisticated software developers. They have no ability to solve this problem for themselves.
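Editor's sketch: the contrast drawn earlier in this exchange, between grabbing packages in the order they are requested and doing an a priori solve, can be shown with a toy model. This is neither pip's nor conda's actual algorithm; the package index, the package names, and the "no larger than" constraint format below are all hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical index: package -> {version: {dependency: max_allowed_version}}
INDEX = {
    "app_a": {1: {"lib": 1}},  # app_a needs lib no larger than 1
    "app_b": {1: {"lib": 2}},  # app_b accepts lib up to 2
    "lib": {1: {}, 2: {}},
}


def greedy_install(requested):
    """Install in the given order; the first choice made for a package sticks."""
    installed = {}
    for name in requested:
        if name in installed:
            continue
        version = max(INDEX[name])
        installed[name] = version
        for dep, max_version in INDEX[name][version].items():
            if dep not in installed:
                installed[dep] = max(v for v in INDEX[dep] if v <= max_version)
    return installed


def is_consistent(installed):
    """Check every installed package's constraints against what is installed."""
    for name, version in installed.items():
        for dep, max_version in INDEX[name][version].items():
            if dep not in installed or installed[dep] > max_version:
                return False
    return True


def closure(requested):
    """All package names reachable from the requested ones."""
    needed, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            for deps in INDEX[name].values():
                stack.extend(deps)
    return sorted(needed)


def solve_up_front(requested):
    """Brute-force solve: try every version combination, keep the newest consistent one."""
    names = closure(requested)
    best = None
    for versions in product(*(sorted(INDEX[n]) for n in names)):
        candidate = dict(zip(names, versions))
        # "Newest" is crudely scored as the sum of version numbers.
        if is_consistent(candidate) and (best is None or sum(versions) > sum(best.values())):
            best = candidate
    return best


if __name__ == "__main__":
    order = ["app_b", "app_a"]
    print("greedy:", greedy_install(order), "consistent:", is_consistent(greedy_install(order)))
    print("solved:", solve_up_front(order))
```

In this toy, requesting `app_b` before `app_a` makes the greedy installer pick `lib` version 2, which breaks `app_a`, while the up-front solve finds a consistent set regardless of order. The brute-force search also grows exponentially with the number of packages, which hints at why a full solve can be slow.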

25:58

They're at the very edge of struggling to write a 10 line script, not understanding the complexity of like TensorFlow dependencies or something like that. Exactly. Exactly. So one thing that you all did recently, that seems to be a trend, is you switched from the major minor

26:15

versioning scheme to a calendar based scheme. And I think this is an interesting thing, especially around open source, because we've had, you know, Mahmoud Hashemi created this site called ZeroVer, sort of to make fun of all the projects that have been around for 10, 15 years with, you know, 50 or a hundred releases, but are like 0.1.17, you know,

26:38

like really small versions. And it seems like one of the fixes is to say, well, let's move towards something that has more to do with, I can look at the version and I can tell you without deeply knowing that software, whether that's a new version, an old version, a medium aged version, right? Like if I told you requests was 2.1.4, is that new? Is that out of date? I don't know. Right. But if you use this new style, it's pretty obvious. Like, what was the thinking there?
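Editor's sketch: the point that a calendar based version tells you its own age can be made concrete in a few lines. This assumes a `YYYY.MM` version format, which is an assumption of the sketch rather than something stated in the episode, and the function names are hypothetical.

```python
from datetime import date


def parse_calver(version: str) -> date:
    """Interpret a 'YYYY.MM' version string as the first day of that month."""
    year_text, month_text = version.split(".")[:2]
    return date(int(year_text), int(month_text), 1)


def age_in_months(version: str, today: date) -> int:
    """Whole calendar months between the version's month and today's month."""
    released = parse_calver(version)
    return (today.year - released.year) * 12 + (today.month - released.month)


if __name__ == "__main__":
    recording_day = date(2019, 1, 16)  # the recording date given in the intro
    for version in ("2018.12", "2017.03"):
        print(version, "is about", age_in_months(version, recording_day), "months old")
```

No comparable calculation is possible for a string like 2.1.4 without looking up the project's release history, which is the affordance being described.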

27:07

It's a community convention. It definitely makes it, it's for that user affordance that you can sort of look at it and know. And also, you know, we set this expectation that we will release at a regular cadence and it's for our own internal documentation and everything else. Everyone

27:19

just is able to collaborate more easily around that. But I think the ZeroVer thing, I mean, I love Mahmoud and I think it was a hilarious thing, you know, in a community here where we have SciPy and IPython or, you know, Jupyter and other things, pandas, you know, zero dot, whatever,

27:33

or I guess it's not quite zero dot anymore, but like SciPy for sure. These things, there's actually something we can laugh at all we want to, but there's a thing there that the author is trying to say, or the maintainer is trying to say, which is, it's not quite ready yet. You know, I'll call it 1.0 when I'm good and ready and I'm not ready yet. It might not be for 20 years.

27:53

And so, of course, that's also kind of a silly position to take when literally millions of people and their production code depend on your software. I think they're not saying that it's ready. I think what they're thinking of saying when it goes to 1.0 a lot of times is it's done, and software is rarely done. Well, software is done the instant it's released, at least that version of it, right?

28:13

I think this is where we as an industry actually have to get, we have to up-level our thinking about this. And we got to stop thinking about software as artifacts, hairballs of code that are static. And we actually have to start thinking about this from a flow perspective, that we are looking at flows of projects. And there's a covenant that is established in a relationship between the user of one of these flows and the people who originate those flows.

28:41

And I think, you know, there's a really interesting thing I learned years ago about aerodynamics. And basically that when planes move less than the speed of sound, you can reason about aerodynamics somewhat similarly to water and water flow, right?

28:55

But once you break the sound barrier, the thing that actually causes you the greatest amount of pressure on your airframe and things like that, you actually have to reason about the change in cross-sectional area of the airplane as it moves through the air. So it's almost more like streams of thick rope and you're shoving rope aside. So you move from this particle flow way to looking at actual flows.

29:19

And so similarly with software, I think we've got to stop thinking about this as being just a code drop, right? And maintainers as people who go and dump out a bunch of code and actually look at a relationship with projects. And this gets to like sustainability. This gets to, you know, versioning and what's what, what is the promise in a version number, all of that stuff. It's actually deeply involved. I don't know that the software industry has really started to learn how to consume

29:46

like the enterprise consumers of open source. I don't know that their internal practices have really caught up with thinking about it that way. Yeah. And that's kind of why I was bringing up the versioning a little more deeply because I think the folks that spend their time all day in open source, they know that Flask, even though it had some small version number recently moved to 1.0, but it had some small version number, but it's really

30:10

used a lot and it's been around a lot. So it's fine. Right. But the corporate groups, the enterprise groups, they see that as a flag of like, that's test software. We're not ready to like make our bank run on test software. Is that the feeling that you got by interacting with, because you, you touch both open source and enterprise groups more than a lot of folks, I would suspect. Yes, absolutely. We, we are a B2B software company. That's where the bulk of our revenue comes from.

30:39

And absolutely. We suffered, we suffered mildly for that. You know, we have to basically go and talk to procurement and compliance and IT people that are swimming, you know, they're up to their ears in software. They look at a spreadsheet. We come in with our software, our enterprise software and say, well, you know, here's the open source things that are in the manifest. And they look at this thing and they're like, what is this? This is a pile of garbage. It's all zero dot, whatever.

31:01

Right. And it's like, yeah, but that runs Instagram, you know, like that literally runs Dropbox. So like, what are you complaining? You don't really want to get into that. Once you have that argument with an IT guy, you've already lost. Right. You're, you're a small insurance company with a hundred thousand customers. You're not running, you know, YouTube with a million requests per second. That's using similar software, right? It's, but it's the mentality, right?

31:23

Yeah. And you know, a lot of, a lot of going into any kind of, I would say that over the last, you know, five or six years, I've had to do a lot of adulting. And one of the parts of adulting up from just being a geek, like, you know, code nerd kind of guy to being able to actually have customer conversations is actually having quite a bit of empathy for the customer. Right. And from their perspective, yeah, they are just a regional bank with a few hundred

31:45

thousand customers. They don't have the budget of Alphabet to throw at an SRE team and a whole dev team and all that stuff. So their approach to understanding risk and risk mitigation from the thousands of vendors that want to sell them software, maybe it's the most practical, you know, I'm not, again, I'm not defending it, but I'm just saying one could come to a point of empathy, right? With their approach.

32:04

That's a really good point. I do totally agree. It is exactly because they're small, they can't hire the fresh new hottest software engineers that would rather be in Silicon Valley or Austin or, you know, Portland or wherever, right? Like they just don't even have the ability to determine whether or not what you're saying is true in a lot of, a lot of cases, right? It's like, they just, you know, exactly. We just rather use Microsoft. We know that they give us this SLA and this agreement and

32:32

we're just good, right? There's one way to make websites, use ASP.net. We're good. Just use, you know, something else supported like that, right? And it's, it's a challenge that they obviously want to use these new tools and powerful tools, especially in data science, right? But they've, they've got a different culture and way of describing software being ready. You know, and we can laugh all we want to about like these compliance guys, like beating us up

32:54

for our, you know, SciPy 0.whatever. But on the flip side, you know, how many of our, our credit card reports and our gas bills come from, yeah, basically some like little ASP app or some, you know, Access database, God forbid with a bunch of VBA macros, right? That runs the world. So how elite are we really? That's an interesting point. Yeah. It's definitely worth thinking about. So in a broader sense though,

33:17

I feel like Python is making its way into this enterprise and a major corporation space. I know it's increasingly being used for a lot of work, not just data science, but, you know, other types of software as well. How do you see it? How do you see the world with your inside view you got? Well, I think that's absolutely right. And I think that the Python community may not survive that adoption. Interesting. What do you mean by that?

33:43

Not Python, the language, but the Python community. What I mean by that is that, you know, I've talked to quite a few like maintainers of some popular projects and they've all reflected to me that last couple of years as Python has gone, Python adoption just shot through the roof. I think some of it is our pushes on data science and things like that. Others are, you know, this rapid rise of deep learning. You know, many things have contributed to this, but ultimately Python is now one of the

34:09

most popular languages on the planet. People are getting jobs in Python and they're using Python to do their jobs. And what we're seeing is this transition in the expectation of like, hey, man, this is just my nine to five. Like this is a tool that I'm supposed to use to do my job. And this tool sucks right now. So I'm going to get on your GitHub and I'm going to give you a bunch of grief about it because this is your freaking tool. You know, my, like my employer, I got to feed

34:34

my family. My employer tells me how to use this tool. It's a piece of crap. And so that is, that's what I said. I think the Python community might not survive that adoption transition unless it intentionally really works hard to drive a positive, like to drive some values into the newcomers. So maybe that person that comes and complains because, well, I used to download my stuff from

34:58

Microsoft.com. Now I get it from Python.org, but this thing sucks. So I'm going to go back and just complain about it as if, you know, there's a commercial entity on the other side whose job it is to make the SLA legit. Right. Right. But more likely, more likely, actually, they picked up, they inherited some piece of crap, three-year-old Python code from some guy who didn't know what he was doing. Written in Python 2.5 or something. Yeah.

35:21

Oh, absolutely. It'll be, it'll be 2.5. I think there's a couple of 2.4 things running around that I'm aware of, but a lot of 2.5, there's a lot of 2.5 out there. And yeah, and it's using some old version of Matplotlib or something or some old version of pandas. And they're going to complain, you know, on the tracker or on the, you know, on the issue tracker about that. And part of the cultural change that I think we should try to encourage sounds like, okay,

35:44

you're doing this for your job. You need, it's not so great. We are the maintainers, but you have a company who depends upon this. Can your company contribute some time, a PR, some fit, like it's got to be a two-way street. I think it can't just be, well, you know, one of the things I suspect that you also feel at Anaconda Inc. is there are so many companies out there making millions and billions

36:10

of dollars a year on top of free. There's like people working in their free time on some open source project that company is basically built upon and they make billions of dollars and contribute back nearly zero or zero. Yes. I've frequently quipped that I can fit probably the core NumPy pandas maintainers in my, no, no, my, okay. So we've gotten a few more now, so they don't all fit in my minivan, but at one point in time, certainly core NumPy.

36:39

You're going to need one of those longer, like full vans that holds 15 people. I may need a 15 person van, but I could, I could probably fit them in the 15 person van. You know, Matplotlib, which everybody relies on is like just a few people, maybe part-time. There's not like one whole FTE on it even. Yeah. There's projects like Jupyter that are very large, but also underfunded. And there's projects that are small and underfunded. And it's extreme. Yes. It's exceptionally tragic. Right.

37:06

It's exceptionally tragic. Well, and do you know that I think the part of the tragedy to me is like, if it really took a thousand people to make Matplotlib, 600 people to make Flask, maybe the community can't contribute back enough to pay those thousand engineers full-time. But like you said, it's like a van full of people,

37:26

or it's my small car full of people for Flask, right? And Click and all those things. The people and the companies that use Flask make so much money and depend so heavily upon it that they could easily pay those three, four or five people to be full-time on that and be doing really well. Right. But they don't. Right. It's just, it's not even asking very much of them, which is what's crazy. I'm of two minds on this or not two minds, but I have like two major views on this.

37:53

One of them is that we should look at this as the triumph of software. I mean, to sort of just to sort of restate the point you're making, which is that, holy crap, one or two or 10 people can build something that is fundamental to billions and billions of dollars of global economic activity. That's something to be celebrated, right? Because that should free up. Think of how many more thousands of software developers

38:19

don't have to be working on Flask. They can just go and have free time. Not really, but you know, in theory, that's how. Build something more interesting than just the framework, right? They could build something with this result. So that's one way to look at it and that we should celebrate where we can. But on the other hand, the thing is like, if we can't even somehow come up with the funding for like 10 FTEs for these

38:39

fundamental projects, what's broken? What's broken, right? Because it can't be, it's not, it can't be that hard. And so I think there's two ways to look at this. One is that the open source community as the, essentially the field of software, I think it's essentially commoditizing out and the labor, what open source represents. And this particular thing happening in the Python ecosystem is the very vanguard of this transition. It represents essentially the end of labor economics

39:07

for software. And so that going away, we're at that transition. And so it's very hard to think about it for companies because companies will allocate budget for software development in a very like headcount oriented way, right? And they know what they're getting when they pay for an FTE dev here or there or wherever. Sure. If they just throw money at some open source, what are they getting for it? You know, they know how to,

39:31

they know how to pay money for software. Companies are very good at paying money for software, but paying for stuff that they can already get for free. They literally, that is a null value on a

39:40

spreadsheet. They cannot compute that. It is a NaN, right? So my view on this is actually quite simple, which is that if open source developers, the people like me who care about the open source ecosystem, if we want to sustain the community innovation and that positive abundance mentality that we have in the open source ecology, the human ecology of open source has moved to post scarcity, post labor economics.

40:05

If we want to sustain that, then we need to actually drive a new conversation. We need to actually provide the tooling and the infrastructure for the companies to think about how to consume this. This portion of Talk Python To Me is brought to you by Rollbar. Got a question for you. Have you been outsourcing your bug discovery to your users? Have you been making them send you bug reports? You know,

40:27

there's two problems with that. You can't discover all the bugs this way. And some users don't bother reporting bugs at all. They just leave sometimes forever. The best software teams practice proactive error monitoring. They detect all the errors in their production apps and services in real time and debug important errors in minutes or hours, sometimes before users even notice. Teams from companies like

40:49

Twilio, Instacart and CircleCI use Rollbar to do this. With Rollbar, you get a real time feed of all the errors so you know exactly what's broken in production. And Rollbar automatically collects all the relevant data and metadata you need to debug the errors so you don't have to sift through logs. If you aren't using Rollbar yet, they have a special offer for you. And it's really awesome. Sign up and install Rollbar at

41:12

talkpython.fm/Rollbar. And Rollbar will send you a $100 gift card to use at the Open Collective, where you can donate to any of the 900 plus projects listed under the Open Source Collective or to the Women Who Code organization. Get notified of errors in real time and make a difference in Open Source. Visit talkpython.fm/Rollbar today. What are some of the key elements?

41:35

One way to do it is you can look at it almost like treat each new... Number one, it's something we have to work on ourselves, which is to not make money be a bad word, which is still a mindset that pervades many Open Source communities and developers. Any affiliation with any kind of money-managing, money-changing organization is seen as essentially... It's seen as corrupting sometimes. Yeah, yeah.

41:57

It's corrupting, exactly. So, I mean, we literally had a SciPy mailing list thread, I think a couple of years ago, where someone was arguing that we should only allow steering council members to be part of universities or part of academia, because they don't have their own agendas. And the other people were just like, are you kidding me? Academics don't have agendas anymore?

42:15

So, people like to kid themselves a lot about this kind of stuff. But anyway, so I think that the Open Source community needs to, number one, not be allergic to money and treat it as a corrupting influence, right? There's companies and ways, business models that are trying to help Open Source and trying to be good participants in it. And then there are the corrupting, evil, taking advantage of type

42:37

companies. So, like, it's not black and white, but there are certainly paths forward where companies like you guys and others are putting in lots of effort to try to make things better legitimately. Yeah. And I appreciate that you recognize that. Like, we really have really tried to be good citizens in the Open Source community. But I think companies, for a lot of companies, that

42:59

it's like the mind is willing, but the spreadsheets are weak. You know, like, it's still really hard for people and proponents and advocates, even within those companies, to, at the end of the day, make the budgetary justifications. Because the companies internally don't know how to, they don't know how to reason about it. Yeah.

43:14

You know? So, I think that's where the Open Source community can try to help. Like, number one, one thing we could do is do almost like a Kickstarter style or like, you know, I play Warcraft a little bit. And so, it's like world boss, like, takedown. So, before we can release any new versions of Library XYZ next year, we've got to get this much money in, right? Yeah.

43:34

And people basically just, but they put the money in. But I think that's actually as fun as that would be in the Kickstarter model like that, as cool as that would be and as interesting as that

43:43

would be, I think businesses have a hard time just writing checks for donations. So, the other thing that I think the Open Source community needs to do, I think the one that's more realistic, is to actually form entities that can have a business-to-business conversation with the corporate players and understand how to talk to their procurement, talk to their legal and everyone else, and basically act as a crossover facility to do the product management so the

44:10

businesses know what they're getting for their money. It's not a charity. You know, some things that people may not be aware of is that for a business to write a $10,000 charity check, that comes out of a different part of the business a lot of times. Even if everyone wants to, for budgetary and for finance and compliance reasons, they literally cannot just write a check to some dude, you know, some Open Source hacker in the

44:32

middle of Europe somewhere. So, these are the things that we need to actually put together. I think the allergic to money issue, I think that that can be solved with the right examples of Open Source companies and companies entering Open Source in positive ways. But I feel like there's some kind of structure or something that has to get between the corporations and the Open Source projects, where it's like you say, it's not a charity check. It's you pay into this and there's, you get a little

45:05

bit more of something. And I don't know what that is, but there's something like that. Then the companies can justify it. They say, look, we depend upon this thing. We pay, you know, 0.01% of our revenue to the people that make it work so that our system doesn't go away. And here's what we get for that 0.01%. I don't know what that is. It's actually, we don't have to reinvent the wheel here. It happens all the time

45:26

in every other industry. It's an industry consortium. It's an industry consortium. You pay into it. And what happens is you get votes on various technical councils and technical boards, and they do the product management and the dev management for what the thing should be. In the Python world, we want that to, in all cases for a lot of these projects, we want that to still be subordinate to the vision

45:48

of the open innovation volunteer kind of crew. But there's so much housekeeping. There's so much issue tracking stuff. There's so much like documentation, management, cleanup, just keeping the lights on and the yak shaving. There's so much that goes into a project that these kinds of consortium models can fund. And I think Python itself, and I'll just come out on your podcast and I'll just say it. I think Python itself badly needs this. Yeah.

46:14

Badly needs an actual consortium like this to be operated in a way that can accept dollars easily. That's easy for people to write checks, right? Like we all know this as entrepreneurs, like make yourself easy to do business with. The open source community, I would say, has not made itself easy to do business with. You got to either hire a core dev. And if you do, that core dev then has to, in their own minds, be like, am I wearing my community hat or my employee hat, which is tough on them,

46:37

right? It's very stressful for them. And the open source community, even when we get the dollars, we don't make it clear to the people writing the checks what those dollars are buying for them. Like if they have a couple of issues that are easy to solve, that really can make a difference for them, we don't necessarily prioritize those issues just because they wrote us a check because we don't

46:53

want to feel like we're that, you know, like it's that quid pro quo. So I think that you really need some kind of facility in the middle that acts as a consortium that is able to help businesses steer and guide a lot of these maintenance, pretty basic kinds of maintenance things that need to happen for projects that would make their lives easier. And that can then funnel a ton of money, a ton of margin, into the innovation work and all the forward looking kind of stuff.

47:16

And everyone's happy. Yeah. Do you think the PSF could do it? I think the PSF could do it. I think that the PSF would be, I don't know if it operates as a nonprofit. It does. Yeah. Yeah. So if it's a nonprofit, I think it'd be very hard for it to do it. It might need to actually create like sort of Mozilla Foundation, Mozilla Corporation. I think it would need to create some kind of a traditional C corp or a B Corp, perhaps like a social mission for profit that it

47:42

owns like director seats on and, you know, the chunk of the things. But companies, a lot of times are just prohibited from writing checks to 501c3s unless it comes out of their philanthropy group. So again, this is that making it easy to do business with kind of thing. Yeah. Interesting. Absolutely. I think the PSF should spin up a thing like that. And I've been sort of quietly advocating for this behind the scenes a little bit. And maybe I'll be more vocal about that here this year.

48:05

All right. Well, we can spread a little word on the podcast as we just have. It's really interesting. And I think there's absolutely lots of possibilities for business models in open source. But I feel like there's actually a 98% gap, like 2% of that is captured. 98% of it is not because we have these large, but still not huge, like banks in the Midwest that contribute nothing. They do no PRs. They don't do anything to that effect, right? They just,

48:38

it's just not in their culture. And like you said, there's no real mechanism for them to pay a little and get more and justify that. Yes. Yes. And actually some of the open source business models that are emerging now, they present challenges of their own. Again, my overriding thesis is that the world of software

48:54

is actually commoditizing pretty quickly. And so people, like if you look at the things that have been happening in the last six months, as I would say open source software component vendors, like Mongo and Redis and Timescale and others, as they start getting their business eaten by the cloud vendors, they're realizing that open source, you know, sounded great. Open core sounded great. And then they start losing any future route to revenue. And they've got to actually aggressively

49:23

go to like dual licensing and like deep viral AGPLv3 kind of stuff. I don't know that open source is even the right conversation to have anymore. I think it should be around sustainable community innovation and the freedom to experiment, freedom to innovate, freedom to, you know, there's a lot of like free as in beer and free as in innovation. But like, the traditional ways we have of talking about the

49:47

source code itself, again, is limited in this paradigm of like code drops. And we're beyond that now. Yeah. And you know, you look at the cloud, for example, a lot of these places that they provide you something, and you pay on usage, right? You don't buy any software in the cloud, but you have the subscription model all over the place, right? And that's, that's starting to really shift the way things are working

50:11

as well. And I feel like the cloud vendors actually have this interesting lock in where they're a little bit defended against some of these challenges that are coming up. Well, absolutely. There's only like three major cloud vendors of significance in here in the US, at least. And all of them are absolutely going for lock in. And they're, you know, ultimately, their business model. It's not necessarily I mean, it's a for profit business model, put it that way,

50:36

right? Yeah, the cloud is the new lock in with a lot of those API's. It's interesting. And like this MongoDB AWS thing you talked about, like, that's a little bit of it as well, right? But it's pretty interesting. Yeah, I think we could probably talk for hours and hours on this, because we're both pretty passionate about it. It's awesome. But let me ask you a few more questions before we run out of

50:55

time. Sure. These are all sort of forward looking type things. And one of them is data science. You called out the year 2012 to me, that if you look at the analytics and the graphs and the usage, like there's a huge increase in the derivative of a lot of things around Python at 2012, up till now. So five years further out, what do you think data science looks like? Is it still deeply working with Python? Is it solving different problems? Where is it going?

51:23

We're going to see data science much more integrated. People have a better sense of what it can and can't do by itself rather, right? It's a new discipline that's coming into the business. It's a new swim lane. Everyone's trying to figure out how they stand in relation to it. There's a lot of

51:40

political, you know, fighting and a lot of experimentation within a lot of businesses that I see. But at the end of the day, I think this idea of doing data exploration, doing model development, and revving models that are really critical to the business is the new reality for people. So that's not going away. That's a fundamental dynamic that's going to be here. And if you need to go and explore data, you need to go and

52:01

do model development, then you're going to be doing data science full stop, right? There's no, like, if you need to basically bring in domain expertise, stats, and coding ability to do that well, then you're going to need data scientists intersect. You need all three of those skills, you need all three of those. But data scientists are going to find themselves needing to have a much better, I think the borders between data, the data science world and the others will clarify better.

52:26

So you'll have data scientists interacting with data engineers, and much better, hopefully much better established best practices around how that's supposed to go. And then IT people start accepting that, yes, Python is here to stay, we're going to need to deploy real Python stuff. And we need to know a

52:40

little more something about it, right? And so a lot of these little intersectional areas right now between data science and other concerns, same thing with BI, people right now, there's literally people out there selling point and click visualization tools saying that's data science. And it's like, that's not really data science. But they're going to figure that out probably in the next couple of years. Hopefully, they get the clue. Yeah, I think that's what I think is going to happen.

53:02

Now, the result of that happening is a gigantic, I think that that clue is going to really start hitting home in two years or so. Then the immediate next problem that people have is overall workflow management across all of these things. Because everyone's got their favorite tools. Everyone is producing things that touch and intersect with everyone else's stuff. How do we get all of this stuff managed in one place? And I think that's the challenge doesn't be fit, we're gonna be square in

53:28

the middle of that conversation still. And five years from now, assuming that the Chinese economy hasn't collapsed, we are going to see some really scary stuff coming out of China and the AI innovation happening there. Because they have been, they're completely unapologetic about using their entire national population of a billion people as a sandbox for trying AI surveillance, sort of cybernetic, the computer controls you kind of things.

54:12

Yeah, the whole social ranking, and all that stuff that's... So here's the terrifying thing about that. I'm going to be a little bit of a contrarian on this. What if it turns out that their sesame credit system, Rev2, no, Rev1 is scary and crappy. Rev2, what if it turns out that they give social sesame credits for their businesses and local

54:29

politicians? Yeah. What if they actually start upgrading social sesame credits to being this kind of thing where it becomes almost like a, again, back to Warcraft, but like a Warcraft honor reputation system, right? And becomes multicolored, it becomes vectorized instead of scalar. They might actually innovate a scary, awesome approach that has deep problems because it requires a surveillance state.

54:50

And the Western world might look at that and say, huh, you know, that actually works a lot better than, you know, Ivanka Trump, you know, running our fast food joints. Yeah. Sorry, the White House. So that dates this podcast, by the way. For those who are listening months in the future, in case you forgot, just two days ago, the President of the United States served Big Macs at the White House. That just, that happened. So this is still fresh in our minds.

55:12

To Clemson, who won the national college football championship. Yeah. Yes. It's incredible. Anyway. So the point is that the scary thing about the Chinese AI system is that it might work and work really, really well. Yeah. Not that it's just pure wrong, but actually there's aspects of it that are amazing in its sort of black mirror, electric dreams way.

55:32

Oh yeah. Tell you what, it's going to be pretty amazing. I think the same way that like a lot of the Western world is like, oh, well, we already saw where this goes in Orwell, so we're not going to go there. Western world has that kind of snottiness about it. I think they're underestimating how good it could be and how tempting that goodness can look to technologists, to the capitalists, and to the

55:53

policymakers here. That's really, for me, as someone who fled the communist regime, you know, as a child, like that's the scary thing about it. That is really an interesting analysis. And certainly I was thinking ethics, data ethics, and accountability for data models and AI and ML, right? Like, sorry, you couldn't get the house. The AI said no, right? Like, no, no, no. You have to say why the AI said no. Well, we don't know,

56:17

but it's really good. And it said no, you know, like answering that problem is going to be interesting too. It is. And you know, the thing is that already now you get denied, right? And there's already a model that tells you why you're denied. And the AI can, this kind of gets back to that same thing with the whole Black Mirror thing and the AI in China, like really, really good AI. It doesn't look like that

56:37

AI, you know? So the really, really good systems, quote unquote, good, the really effective systems at partitioning people and spot targeting them, they're going to be dressed up in ways that are palatable. Our robot overlords will look like Cylons. They're going to look really human-like. This is the scary future, man. I'm not trying to like scare you and scare your listeners.

56:56

I'm just telling you though, like, this is what's coming. And as humans, I'm actually a human. I'm not a Cylon. As humans, as, you know, tribe human, I think we've got to get better at being human. And so that's maybe too philosophical hand wavy, but anyway. Yeah. It's really an interesting thing to ponder for sure. All right. So I guess final comment or topic just real quickly is I feel like there's been this Python 2/3 debate, modern Python versus legacy Python,

57:25

as I like to position it. And I feel like the adoption of modern Python in data science is much faster than it has been in the general Python space. One, do you think that's true? And then two, why do you think that is? One, I think it's true. And two, I think it's because a lot of data science stuff is new and legacy data science code tends to age with models. So like a piece of data science code is only as good

57:52

as the model data that it was trained on and models change because the world changes. So there's a built in expiration date on any data science model that you've got. So you're not keeping transaction systems from 20 years ago live. The complexity and the algorithms and the techniques are just not even relevant, right? Like the machine learning of five years ago doesn't compete with the machine learning of today. And it's not like

58:15

you're just going to upgrade. It's a totally different thing. You just retrain it on TensorFlow or Keras or whatever, right? Right. And secondly, this is another sort of important dynamic, which is that the regulatory environment around data science hasn't caught up. So it doesn't require you, you know, I was talking to an engineer

58:32

from a software modeling engineer from an airplane company. And he was saying, yeah, the FAA requires us to be able to reproduce our computational design models for like decades, for decades. Yeah. Wow. So, I mean, yeah, because planes actually, if they're well maintained, they fly for a long time, right? And if there's a structural failure of a part... Right. There's a lot of 737s out there. Yeah.

58:55

Oh, yeah. And so data science just doesn't have that problem yet. And, you know, one of the earliest adopters of Python, this is a really interesting dynamic that people may not be aware of, but in the mid 2000s, there was a significant uptake of Python in the hedge fund and the finance industry. And so that was Python 2, Python 2.5, 2.6 around the time. And so that got into a lot of places.

59:18

And finance is actually a pretty regulated area. And so a lot of that code, especially if it starts running production finance systems, people need to keep it running, not only because they're... Even if you stop using a particular finance model to like score or to do whatever, to price a trade and things like that, oftentimes you'll want to go back and do what's called backtesting. So you want to run new data against those old models, and you'll want to race them against the

59:43

new models, right? You'll want to run new models on old data and new data on old models. And so that kind of backtesting approach, you need to keep that old code running for that purpose as well, just from a risk management perspective. So a lot of the finance industry's, like, running ahead and adopting Python 2 has sort of gotten them stuck on Python 2 a little bit. Okay. Interesting. Yeah. So almost a victim of its own success in a way, but in some of these

01:00:09

industries. All right. I guess we're going to have to leave it there because we're out of time. But like I said, a lot of interesting stuff to talk about. I have to just put it at rest. So before we move on, though, I'm going to ask you the two questions I always ask on the show. If you're going to write some Python code, what editor would you use? My old go-to is still Vim. But for large code bases, I tend to use PyCharm so I can, you know, sort of navigate more easily.

01:00:31

Yeah, sure. Makes sense. And then there's many, many packages on PyPI or available on conda-forge. What do you think is one that people maybe haven't heard of, but they should, or you want to recommend? Is it bad form to pimp? Is it like to pimp your own stuff? No, you do it. No, no, go ahead. So I'm really, really excited about a new project that we created called Intake, which I would encourage people to take a look at. It's pretty new. We just launched it last year.

01:00:58

Yeah, it looks interesting. I was going to ask you more about it, but we just have too many topics already. So tell us about it real quick. So Intake is a data loading abstraction library. So it's basically just load my data, and it abstracts your data loading stuff into a declarative syntax so that the beginning of your data science scripts doesn't have a whole bunch of like embedded and brittle SQL calls or pandas

01:01:19

column transformations or things like that. Intake is a way to make it so that your actual data science or data transformation code is sort of its own code artifact and your data bits are your data bits. It's kind of a nerdy thing, but we think that it actually addresses that data, that model reproducibility and code reproducibility problem that data scientists face.
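As a rough illustration of the separation described here (a declarative description of the data on one side, analysis code on the other), the following is a minimal sketch in plain Python. It is not Intake's API: the CATALOG structure, its keys, the RAW_FILES stand-in, and the load and total_by_region helpers are all hypothetical names invented for this example.

```python
import csv
import io

# Hypothetical in-memory "file" standing in for a real data location.
RAW_FILES = {
    "sales.csv": "region,amount\nwest,10\neast,32\nwest,5\n",
}

# Declarative catalog: says WHERE the data lives and HOW to parse it.
# The keys ("path", "format", "types") are invented for this sketch and
# are not Intake's catalog schema.
CATALOG = {
    "sales": {
        "path": "sales.csv",
        "format": "csv",
        "types": {"amount": int},
    },
}


def load(name, catalog=CATALOG, files=RAW_FILES):
    """Load a named dataset using only what its catalog entry declares."""
    entry = catalog[name]
    if entry["format"] != "csv":
        raise ValueError(f"unsupported format: {entry['format']}")
    reader = csv.DictReader(io.StringIO(files[entry["path"]]))
    rows = []
    for row in reader:
        # Apply the column type conversions declared in the catalog.
        for column, cast in entry["types"].items():
            row[column] = cast(row[column])
        rows.append(row)
    return rows


def total_by_region(rows):
    """Analysis code: knows nothing about paths, formats, or parsing."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_by_region(load("sales")))
```

In this sketch, moving or reformatting the data would mean editing only the catalog entry, while total_by_region stays untouched. That is the kind of reproducibility benefit being described; Intake's own catalogs and loaders have their own format and should be looked up in its documentation.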

01:01:38

Sounds really useful. Thanks. All right. So final call to action. People are excited about the Anaconda distribution or maybe getting, making some progress on this open source business model thing we talked about. What would you say to people? So I would say that we have AnacondaCon coming up. So if you're actually using Python

01:01:55

in a commercial environment, strongly recommend AnacondaCon. We have a, we try to make a really good blend of technology and practitioner kind of stuff and workshops there combined with business perspectives. So it's not like an industry conference like Gartner or Strata. It's not like a pure one of those things. It's also not a pure like tech community conference, like PyData or something like that. So it's, we try to make a mix of those things.

01:02:19

We've gotten really good reviews in the past couple of years. It's our third year doing it. I'm super excited about it. It's here in Austin in April, April 3rd to 5th. So that's AnacondaCon.io. And secondly, if people are using Anaconda and like it and they're using it in a

01:02:32

business environment. I would recommend they check out Anaconda Enterprise. We are very, very proud of the product and we have a lot of problems that we solve for people inside business environments and the business use of Python for deployment, package management. Yeah. Real quickly, like what, what's the, what do you get from, right? You know, I talked about the business model should be, you get a little bit more for your money,

01:02:50

not just pure charity, you know, here's a PayPal donate button. What do people get real quick? So Anaconda Enterprise is, it gives you the ability to have your own managed package repository. It gives you a way to do secured and governed collaborative notebooks and model deployment. It works in the cloud. It works on prem. Many of our customers use it across an air gap and very

01:03:12

strictly governed environments. We basically make it so that data scientists and Python practitioners in business can be as effective with Anaconda as they are at home nights and weekends on their own laptops. All right. Yeah. That sounds cool. We just clear all the IT hurdles. Yeah, that's sweet. All right. Well, thanks for all that you've talked about here, Peter. It's been a super interesting conversation. Thanks for being on the show. Thank you so much for having me. I

01:03:33

really enjoyed it. You bet. Bye. Bye-bye. This has been another episode of Talk Python To Me. Our guest on this episode was Peter Wang. It's been brought to you by Linode and Rollbar. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/Linode. That's L-I-N-O-D-E. Rollbar takes the pain out of errors. They give you the context insight you need to quickly locate and fix errors that might have gone unnoticed until users

01:04:02

complain, of course. Track a ridiculous number of errors for free as Talk Python To Me listeners at talkpython.fm/Rollbar. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our everything

01:04:29

bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.

Transcript source: Provided by creator in RSS feed
