We know that Python and data science are growing in lockstep together, but exactly what's happening in the data science space in 2021? Stan Siebert from Anaconda is here to give us a report on what they found with their latest State of Data Science in 2021 survey. This is Talk Python To Me, episode 333, recorded August 9th, 2021. Welcome to Talk Python To Me, a weekly podcast on Python. This is your host, Michael Kennedy.
Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode. This episode is brought to you by Shortcut, formerly known as Clubhouse.io, and Masterworks.io. And the transcripts are brought to you by Assembly
AI. Please check out what they're offering during their segments. It really helps support the show. Stan, welcome to Talk Python To Me. Hey, nice to be here. Yeah, it's great to have you here. I'm super excited to talk about data science things, Anaconda things, and we'll even squeeze in a little of one of my favorites, the Apple M1 stuff, mixed in with data science. So it should be a fun conversation. I'm also very excited about the M1.
Nice. Yeah, we can geek out about that a little bit. That'll be fun. But before we get there, let's just start with your story. How'd you get into programming in Python? Yeah, programming started as a kid, you know, dating myself here. I learned to program BASIC on the Osborne 1, a suitcase of a computer that we happened to have when I was a kid. And then eventually picked up C and stuff like that. Didn't learn Python until college, mostly because I was frustrated with Perl. I just found that Perl never fit in my brain right. And so I was like, well, what other scripting languages are there? And I found Python. And that was a huge game changer. I didn't really use it professionally or super seriously until grad school, when I had a summer research job and realized that this new thing called NumPy could help me do my analysis. And so that was when I really started to pick up Python seriously. And now here I am, basically.
Yeah, what were you studying in grad school? I was doing physics. So I did particle physics and used Python quite extensively, actually, throughout my research. And C++, unfortunately, for better or worse. So yeah, that's how it went. I always ended up kind of being the software person on experiments. So when I was leaving academia, going into software engineering was kind of a logical step for me.
I was studying math in grad school and did a lot of programming as well. And I sort of trended more and more towards the computer side and decided that that was the path as well. But it's cool. A lot of the sort of logical thinking, problem solving you learn in physics or math or whatever, they translate pretty well to programming. Yeah, yeah. And definitely, you know, working on large experiments, a lot of the sort of soft skills
of software engineering, things like how do you coordinate with people? How do you design software for multiple people to use? That sort of thing. I actually, I inadvertently was learning how to be a software manager as a physicist and then only realized it later when I went into industry. And how about now? You're over at Anaconda, right? Yeah. So, you know, maybe I'm doing the same thing. So now I'm both a developer and a manager at Anaconda.
It's a direct path from like PhD physics, particle physics, to programming, to data science at Anaconda. Is that how it goes? Yeah. I mean, we employ a surprising number of scientists who are now software engineers. And so I manage the team that does a lot of the open source at Anaconda. So we work on stuff like Numba and Dask and various projects like that. We just recently hired the Pyston developers to broaden our scope into more Python JIT optimization kind of stuff. So yeah, I'm doing a mix of actual development on some projects as well as managing strategy, the usual kind of stuff. Well, I suspect most people out there know what Anaconda is, but I have listeners who come from all over. You know, what is Anaconda? It's kind of like a Python you download, but it also has its own special advantages, right?
Yeah. I mean, where we came out of, and still our main focus, is how to get Python and broader data science tools into people's hands. One of the interesting things about data science is it's not just Python. Most people are going to have to combine Python, and maybe they don't realize it, with Fortran and C++ and all the things that underpin all of these amazing libraries. And so a lot of what we do is try to get Python into the hands of data scientists, you know, get them the latest things and make it easy for them to install on whatever platform they're on: Windows, Mac, Linux, that sort of thing. So Anaconda has a free Individual Edition. It's basically a package distribution and installer that lets you get started. And then there are thousands of Conda packages, in Conda's packaging system, that you can install, where we, or the broader community, have done a lot of the hard work to make sure all of those compiled packages are built to run on your system. That's one of the real big challenges of the data science stuff: getting it compiled for your
system. Because if I use Requests, it's, you know, pip install requests. Maybe it runs a setup.py, maybe it just comes down as a wheel, I don't know, but it's just pure Python and there's not a whole lot of magic. If I'm really getting far out there, maybe I'm using SQLAlchemy and it has some C optimizations. It will try to compile, and if it doesn't, well, it'll run some slower Python version, probably. But in the data science world, you've got really heavy dependencies, right? Like, as you said, stuff that requires a Fortran compiler on your computer. I don't know if I have a Fortran compiler on my Mac. I'm pretty sure I don't. Maybe it's in there. Probably not. Right. And same with C++: I probably have a C++ compiler, but maybe not the right one, maybe not the right version, maybe my path is not set up right. And plus it's slow, right? All of these
things are a challenge. So Anaconda tries to basically be, let's rebuild that stuff with a tool chain that we know will work and then deliver you the final binaries, right? The challenge with that for a lot of tooling is it's downloaded and installed to different machines with different architectures, right? So you've gone and built stuff for macOS, you built stuff for Linux, you built stuff for Windows and whatnot. Is that right?
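As a side note for anyone following along at home, you can ask Python itself which platform it expects compiled packages to be built for, using just the standard library. A minimal sketch; the exact strings vary by OS and architecture:

```python
import platform
import sysconfig

# The platform tag for this interpreter, e.g. "macosx-11.0-arm64"
# on an M1 Mac, "win-amd64" on Windows, or "linux-x86_64" on Linux.
# A binary wheel or conda package has to match this to be installable.
print("platform tag:", sysconfig.get_platform())

# The CPU architecture this interpreter was built for.
print("machine:", platform.machine())
```

This is also a handy sanity check when an install fails with "no matching distribution found" for a compiled package.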
Yeah. Yeah. Building software is non-trivial and no matter how much a developer tries to automate it so that things just work, it helps to have someone do a little bit of quality control and a little bit of just deciding how to set all the switches to make sure that you get a thing that works so that you
can just get going quickly. Early on, I remember in the sort of 2014, 2015 era, Anaconda was extremely popular with Windows users, who did not have a lot of good options for how to get this stuff. Right. Like with Linux, you could kind of get it together and get it going if you were motivated. On Windows, it was often very much, I don't know what to do. And so this made it sort of one-stop shopping for all of these packages. And then another thing we wanted to do is make sure that there was a whole community of package building around it. It wasn't just us. So things like Conda Forge, a community of package builders that we are part of and hugely support. Because there's a long tail; there's always going to be stuff that we're never going to get around to packaging.
Right. There's important stuff that you're like, this is essential. So NumPy, Matplotlib, and so on. You all take control of making sure that that one gets out. But there's some, you know, biology library that people don't know about that you're not in charge of. And that's what Conda Forge plus Conda is: sort of like pip and PyPI, but in a slightly more structured way.
Yeah. And that was why Conda was built: to help make it possible for this community to grow up, for people to package things that aren't Python at all that you might need, all kinds of stuff like that. And there's always going to be stuff in your specific scientific discipline. So for example, Bioconda is a really interesting distribution of packages built by the bioinformatics community on top of Conda, and they have all of the
packages that they care about, many of which I've never heard of, aren't in common use, but are really important to that scientific discipline. Out in the live stream, we have a question from Neil Heather. Hey Neil. I mentioned Linux, Windows, macOS. Neil asked, does Anaconda work on Raspberry Pi OS, as in ARM64? Yeah. So the answer to that is: Anaconda, not yet. Conda Forge does have a set of community-built packages for Raspberry Pi OS. The main challenge there is, we just a couple of months ago announced ARM64 support, but it was aimed at the server ARM machines that are running the ARMv8.2 instruction set, and the Raspberry Pi is ARMv8.0. And so the packages we built, which will work great on server ARM, are using some instructions that Raspberry Pis can't support. But if you go look up Conda Forge and Raspberry Pi, you'll find some instructions on how to install for that.
ARM is interesting, right? So let's talk a little bit about that, because I find this whole Apple Silicon move fascinating. You know, they created their M1 processor and they said, you know what, we're dropping Intel, dropping x86, more importantly, and we're going to switch to basically iPad processors, slightly amped-up iPad processors, that turn out to be really, really fast, which actually blew my mind and was unexpected. But I think the success of Apple is actually going to encourage others to do this as well. And it's going to add more platforms that things like Anaconda and Conda Forge are going to have to support, right? So there's a cool article over on Anaconda by you called A Python Data Scientist's Guide to the Apple Silicon Transition. Yeah, this was, you know, I'm a huge chip nerd, just due to background and thinking about
optimization and performance. And so this came out of, you know, some experiments I was doing to just understand, I mean, we got some M1 Mac minis into our data center and started immediately
playing with them. And I realized, after a while, that I should take the stuff I was learning and finding and put it together in a document for other people, because I couldn't find this information organized anywhere in a way that worked for me as a Python developer. I was having a hard time putting it all together.
Right. There was some anecdotal stuff about, yeah, this kind of works for me, or this is kind of fast or this is kind of slow, but this is a little more, here's the whole picture, what the history is, where it's going, what it means, and specifically focused on the Conda side of things. Right. Yeah. And even just the Python side, it's sort of an interesting problem. You know, Python's an interpreted language, so you're like, well, I don't have any machine code to worry about. Right. But the interpreter, of course, is compiled, so you at least need that. And then many, many Python packages also contain compiled bits, and you'll need those too. And this is an interesting, broad problem for the whole Python ecosystem to try and tackle, because it's not too often that a whole new platform just appears. You know, adopting a whole new architecture takes a while.
It absolutely does. I think there's a lot of interesting benefits to come. I do want to point out for people listening. If you jump over to the PSF JetBrains Python developer survey, the most recent one from 2020, and you look around a bit, you'll see that while we don't run production stuff on macOS that much, 29% of the developers are using macOS to develop Python code.
Right. So Apple's pledge that they're going to take a hundred percent of this and move it over to Apple Silicon means almost a third of the people running Python in a couple of years will be under this environment. Right. And even if you have a Windows or Linux machine and you don't care about macOS, you may be maintaining a package for people who do. Yeah. And that means Apple Silicon, right?
Yeah. And it's interesting, there's just other stuff you take for granted, you know, the availability of free continuous integration services, which has been transformative for the open source community. I mean, it's really improved software quality that all these open source projects can automatically run their tests and build packages every time there's a new change. But then something like this comes out, and until you get ARM Macs into these services, until they're freely available, a lot of the infrastructure of these open source projects doesn't have a way to test on an M1 Mac, except manually if they happen to have one, and they don't have a way to automate their builds on an M1 Mac until that sorts out. Yeah. And thinking
about the workflow here, there's two challenges that this presents. One is, you want to do a git push to production, or git push to some branch, or tag it, and that's going to trigger a CI build that might fork off to run a Windows compile, a Linux compile, a Mac compile, generate some platform-specific wheels with, like, Fortran compiled in there or whatever, and then you're going to ship that off. If that CI system doesn't have an Apple Silicon machine, it can't build for Apple Silicon, right?
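To make that concrete, here's a rough sketch of what that kind of CI build matrix often looks like, as a hypothetical GitHub Actions config (the job name and steps here are illustrative, not any particular project's setup). Notice there is no Apple Silicon entry in the matrix, because hosted arm64 macOS runners weren't available:

```yaml
# Hypothetical wheel-building matrix; all three runners are x86_64,
# so no native Apple Silicon wheel ever gets built or tested here.
jobs:
  build-wheels:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - name: Build platform-specific wheels
        run: |
          python -m pip install cibuildwheel
          python -m cibuildwheel --output-dir wheelhouse
```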
Yep. Yep. Well, where do you get an M1 in the cloud, right? I know there's a few hosted places, but on, like, a normal GitHub or Azure, it's not common to just go grab a bunch of those and pile them up. Right. Yeah. And it'll take time. I mean, I was thinking back, go back four or five years ago, there weren't a whole lot of options for Windows CI available. There were a couple of providers, and then there was sort of a huge change, and then pretty much everyone offered a Windows option, and they were faster and all of this stuff. But that took time. And I think that's the thing: the hardware is in people's hands now, and it's just going to get more and more, and it's unclear how quickly we can catch up. That's going to be a challenge for all of us.
It's absolutely going to be a challenge. It's interesting. I hope we get there soon. The other problem in this same workflow is, I was actually just looking at some NumPy issues, specifically issue #18143. I'm sure people have that right off the top of their head. The title is "Please provide universal2 wheels for macOS," and there's a huge, long conversation about it, I mean, many, many messages in the thread. And one of the problems they brought up is, look, we can find a way to compile the binary bits, the C++ bits, for M1, but we can't test it. If we as developers cannot run this output, it's a little sketchy to just compile it and ship it to the world. And to be fair, this was on January 9th of 2021, when it was still hard, you know, these things were still shipping and still arriving. It was not like you could just go to the Apple Store and pick one up. This portion of Talk Python To Me is brought to you by Shortcut, formerly known as Clubhouse.io.
Happy with your project management tool? Most tools are either too simple for a growing engineering team to manage everything, or way too complex for anyone to want to use them without constant prodding. Shortcut is different though, because it's worse. No, wait, no, I mean, it's better. Shortcut is project management built specifically for software teams. It's fast, intuitive, flexible, powerful, and many other nice, positive adjectives. Key features include team-based workflows.
Individual teams can use default workflows or customize them to match the way they work. Org-wide goals and roadmaps. The work in these workflows is automatically tied into larger company goals. It takes one click to move from a roadmap to a team's work to individual updates and back. Tight version control integration. Whether you use GitHub, GitLab, or Bitbucket, Shortcut ties directly into them, so you can update progress from the command line.
Keyboard-friendly interface. The rest of Shortcut is just as friendly as their power bar, allowing you to do virtually anything without touching your mouse. Throw that thing in the trash. Iteration planning. Set weekly priorities and let Shortcut run the schedule for you with accompanying burndown charts and other reporting. Give it a try over at talkpython.fm/shortcut. Again, that's talkpython.fm/shortcut. Choose shortcut because you shouldn't have to project manage your project management.
Yeah, as an interesting example, Conda Forge was able to get Conda packages for Apple Silicon out pretty quickly, but they did it with a clever cross-compilation strategy, where they were building the ARM packages on x86 Macs and pushing them out. But they had enough people manually testing that they had confidence in the process. That's very different than how they build other packages, which are built and tested immediately, automatically, and if they fail tests, they don't get uploaded. So it was a risk, but it helped get the software into people's hands quicker. But yeah, long-term, we need to get these machines onto all these CI systems so that we can use the same techniques we've built up over the years to ensure we have quality software. I think we'll get there, but it's just going to take some time, right? Yep. Yep. Yeah.
Let's see. Neil on the livestream says, speaking of open source, Apple is rumored to be hiring experts in RISC-V, perhaps to move away from having to pay licensing fees to ARM. Yeah, I'm not sure about that, but what's interesting here is, other chip architectures have been around for a long, long time, but until very recently, average users didn't have to think about x86 versus ARM. ARM was for mobile phones, and you never had to worry about PowerPC or anything like that, not for real computers. Yeah. But now, going from one to two is a big step. Now the floodgates are open, and we're thinking about, well, what else is out there? I mean, RISC-V, I'm not sure how you say it, I think "risk five" is what you call it, is an interesting thing, and being a completely open standard, you don't even have to pay licensing fees, as mentioned. I don't know if Apple's going to make this transition again so quickly, but I can guarantee you that everyone probably has somebody in a basement thinking about it, maybe doing some experiments. But yeah, chips move slowly. It's interesting to think about, though.
Yeah. That's not a thing you can change very frequently without dragging developers along. I mean, we're talking about all the challenges that are just down the pipeline from that. Yeah. Very interesting. All right. Well, let's talk a little bit about this. First, you're excited about these as a data scientist. Yeah, really for sort of two reasons. One thing that's interesting is just the power efficiency. There was a talk long ago from the chief scientist at NVIDIA that really made an impression on me, in which he, paraphrasing roughly, basically said that because everything is now power constrained, power efficiency equals performance. Normally you'd just think, well, just put more power in there, but that heat has to go somewhere. We long since hit that wall, and so now you have to get more efficient to get more performance. Right. That's an interesting opportunity. You can get larger power supplies and larger computers. I have a gaming sim computer, and it is so loud if you get it going full power that, if the windows are open, you can hear it outside the house. It's literally that loud. But at the same time, it's not just on your personal computer. In the cloud and places like that, right, what you pay for is not just how much performance you get. There's some combination of how much energy that particular processor takes to run, and if it's one fifth, you might be able to buy more cloud compute per dollar. Yeah. Power and cooling is a huge part of data center expenses. And even just, you can put maybe 100 to 300 watts into a CPU. You're not going to put multiple kilowatts in there or something. So what else can you do? And a lot of that is, you know, Moore's law is driven a lot by the fact that every time you shrink the process, you do get more power efficient. But now it's interesting to think about architectures like ARM that have come into their own in an extremely power-constrained environment. And now we're letting it loose on a laptop, which has way more power available compared to a cell phone.
What could we do if we fed it right from the socket in the wall? Yeah. And what happens when I put it in the data center? Yeah. I think ARM in the data center is going to be really important. Yeah. I'd always expected that to come before the desktop. To be honest, I was surprised, as many people were, by the suddenness of the Apple transition, because I had assumed this would maybe happen long after we all got used to ARM in the data center, where you're probably running Linux and it's easy to recompile, compared to, you know, Mac and stuff like that. Yeah. That's what I thought as well. The payoff is so high, right? They spend so much energy on both direct electricity, as well as cooling from the waste heat from that energy, that the payoff is just completely clear. Right. All right. So let's see, a couple of things you pointed out that make a big difference here: obviously ARM versus x86, the built-in on-chip GPU, the whole system-on-a-chip thing, rather than a bunch of pieces going through a motherboard, is pretty interesting. But maybe the most interesting one has to do with the acceleration, things like the Apple Neural Engine that's built in and whatnot.
It sounds like the data science libraries in general are not targeting the built-in neural engines yet, but maybe they will in the future. I don't know. Yeah, it's something that we're going to have to figure out, because I think it was a bit of chicken and egg: until this happened, you didn't have this kind of hardware just sitting on people's desks, and you weren't going to run data science stuff on your phone. So now that it's here, the question is, okay, what can we do with it? Right now, for example, for the Apple Neural Engine, you can take advantage of it using something called Core ML Tools, which I actually did a webinar on some time back. But that's basically for when you've trained a model and want to run inference on it more efficiently and quickly. That's it. There's an alpha release of TensorFlow that's GPU accelerated, and it would take advantage of the M1 if you're running it there, but that's super early. And there are a lot more opportunities like that, but again, that will take time to adapt. It will. I suspect when there's bigger gains to be had, they'll be more likely to be adopted. Right. So for example, I have my Mac mini here that I just completely love, but it's not that
powerful, say, compared to a GeForce video card or something like that. But if Apple announces something like a huge Mac Pro with many, many cores, you know, 128 cores instead of 16 or whatever, right, then all of a sudden that Neural Engine becomes really interesting, right? And maybe it's worth going to the extra effort of writing specific code for it. Yeah. Well, that's the other thing that's interesting about this: we've only seen one of these chips, and it is, by definition, the slowest one that will ever be made. So we don't even know what it's going to be like to scale up. If you're targeting that big desktop user, how are they going to scale this up? This all fit on one package. Can they still do that? Will they have to split out into multiple packages? There are a lot of engineering challenges that they have to solve, and from the outside we're not sure how they're going to solve them yet. We'll have to see. It's going to be exciting to see that come along. All right. So let's touch on just a couple of things: getting Python packages for M1.
What are some of the options there? Yeah. So the status is still roughly what I have in this article, which is basically: you can use pip to install stuff if wheels have been built, and a number of packages like NumPy have started to catch up and have wheels that will run on the M1. Another option, which works surprisingly well, is to just use an x86 Python packaging distribution. I think that's actually what I'm doing, because it just runs over Rosetta 2.
Yeah. And it just works. It is shocking. I mean, with Rosetta 2, on average I'm finding something like a 20% speed hit, which for an entire architecture switch is amazing. I've never seen that before. Or you can use Conda Forge, which has, as I mentioned earlier, their sort of experimental macOS ARM package distribution, which doesn't have everything, but has a lot of things, and if you're using it, it is all built for ARM.
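If you're curious which situation you're in, you can check from Python itself. This is a best-effort sketch: `platform.machine()` reports the architecture the interpreter was built for, and on macOS the `sysctl.proc_translated` flag indicates whether the current process is being translated by Rosetta 2:

```python
import platform
import subprocess
import sys

def running_under_rosetta():
    """Best-effort check: True if this process appears to be an
    x86_64 binary translated by Rosetta 2 on Apple Silicon."""
    if sys.platform != "darwin":
        return False  # Rosetta only exists on macOS
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True,
        )
        return out.stdout.strip() == "1"
    except OSError:
        return False

# An x86_64 Python under Rosetta still reports "x86_64" here;
# a native build on an M1 reports "arm64".
print(platform.machine(), "| translated:", running_under_rosetta())
```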
There's no translation or anything going on there. Right. And on python.org, if you go and download Python, I believe it's a universal binary now, so that means it'll adapt and just run on ARM or run on x86. You just get one binary. The NumPy conversation was kind of around that as well, I believe. All right. You did some performance analysis on the performance cores
versus efficiency cores. That was pretty interesting, and it was pretty similar to hyperthreading. If you want to run Linux or Windows, you basically have to go with Docker or Parallels. And then I guess the last thing is, let's wrap up this subtopic with pros and cons for the data scientists out there listening. They're like, ah, I can't take hearing about how cool the M1 is anymore. Maybe I'm going to have to get one of these. Should they? What do you think, as a data scientist? Yeah. As a data scientist, my takeaway from all the testing was: you should be really excited about this, but I would wait, unless you are doing what I would describe as a little bit of data science on the side and not a huge amount. Mainly because what they've proven is the architecture has great performance and great battery life. The thing we still have to see is, how are they going to get more RAM in there? How are they going to get more cores in there? And then also, when is the rest of the ecosystem going to catch up on package support? So honestly, if you're interested in the bleeding edge, in knowing what's coming, I would totally jump in. If you want this for your day-to-day, I would probably still wait and see what comes out next, because I think a data scientist especially is going to want more cores and more RAM than what these machines offer. Right. There's always remote desktop or SSH or something like that, right? If you've got an Intel machine sitting around, you can just connect over the network locally. Yeah. Very cool. All right. Excellent.
I just want to give a quick mention that Paul Everitt from JetBrains and I did "A Python Developer Explores Apple's M1" way back on December 11th of 2020, right when this thing came out. So people can check that out; I'll put it in the show notes as well. All right. Let's talk about the State of Data Science 2021. How did you all find out about this? How do you know the state?
Yeah. So this is something we've been doing for a few years now. Since we have a big data scientist audience, a couple of years back we decided, hey, let's ask them about what challenges they're seeing in their jobs, and then publish the results so that the whole industry can learn a little bit more about what data scientists are seeing in their day-to-day jobs: what's going well, what's going poorly, where do they want to see improvements? What are they feeling and thinking? So you got a bunch of people to come fill out the survey and give you some feedback. Yeah, respondents from 140-plus countries, so we have pretty good reach across the world, and more than 4,200 people took the survey. So yeah, we got a lot of responses. It's always amazing to
see. Yeah. Quick side thought here, I guess. In that survey, which I'll link to the PDF results of in the show notes, you've got all the countries highlighted, and obviously North America is basically completely lit up as a popular source of responses. So are Western Europe, Australia, and even Brazil. Africa is pretty light on responses. What else can be done to get more Python, more data science going in Africa? Do you have any thoughts on that?
That's an excellent question. That might actually be a good question for a future survey, to be honest. I can speculate. I don't know if it's access to computing, or bandwidth, or resources available in the local languages. There are all sorts of possibilities. One thing that is really nice about Python and data science is so much of the stuff is free, right? So it's not like, oh, you've got to pay some huge Oracle database license to use it or whatever. Right. So there's a real possibility there. Yeah, I don't really know either. But let's see, there's the standard stuff about education level. I guess one of the areas we could start on is just, people who are doing data science, where do they live in the organization, right? Are they the CEO? Are they vice presidents?
A good portion of them were, 50% is either senior folks or managers. That's kind of interesting, right? Yeah, I can see it sort of coming out of, data science as helping in decision-making and that sort of thing. And so I can, I can see it gravitating towards, the decision makers in an organization. and, and that sort of thing. I mean, one of the interesting things that, maybe as in a later, later one of the pages is, how spread out data science is across the
different departments as well. Obviously IT and R&D show up higher than the others, but you see a long tail across all the departments. My theory on that is that we're seeing data science evolve into a profession and a professional skill, if that makes sense. In the same way that knowledge workers are all expected to do writing and to know how to write, yeah,
but we also hire professional technical writers, I think we're getting into a space where everyone will need some numerical literacy and data science skills, even while we also employ professional data scientists. Is it the new Excel? Like, if I'm a manager and I don't know how to use Excel, people are going to go, what is wrong with you?
How did you get here? Right. You're going to have to know how to use a spreadsheet, it could be Google Sheets or whatever, but something like that to pull in data, sum it up, put it in a graph, and so on. And are you seeing that more formal data science, you know, Jupyter-type stuff, is kind of edging in on that world? Yeah, again, I think we'll have to see how the tools settle out.
One thing I know for sure is that you'll at least have to become familiar with the concepts, so that even if the people doing the data science and reporting to you are using whatever their favorite toolset is, you at least understand their workflow and how data goes through that life cycle: data cleaning and modeling and inference and all of those things. You'll have to understand that at least well enough to interpret what you're being told and to ask the
right questions about it. Right. So if somebody comes to you and says, you asked me this question, so I put together a Jupyter notebook that uses PyTorch forecasting. Maybe you can do none of those things yourself, but you should kind of understand the realm of what that means. Something like that. Yes, you'll have to know at least what steps they had to go through to get to
the answer, so you can ask good questions, because if you're a decision maker, you need to be able to defend your decision, which means you're going to have to at least understand what inputs went into that decision. Well, we bought that company because Jeff over in business analytics said it was a good idea. Turned out he didn't replace the not-a-number section, and that really broke it. So
this portion of Talk Python is brought to you by masterworks.io. If you have an investment portfolio worth more than a hundred thousand dollars, then this message is for you. There's a $6 trillion asset class that's in almost every billionaire's portfolio. In fact, on average, they allocate more than 10% of their overall portfolios to it. It's outperformed the S&P, gold, and real estate by nearly twofold over the last 25 years. And no, it's not cryptocurrency, which many experts don't
believe is a real asset class. We're talking about contemporary art. Thanks to a startup revolutionizing fine art investing, rather than shelling out $20 million to buy an entire Picasso painting yourself, you can now invest in a fraction of it. And it can be lucrative: contemporary art pieces returned 14% on average per year between 1995 and 2020, beating the S&P by
174%. Masterworks was founded by a serial tech entrepreneur and top-100 art collector. After he made millions on art investing personally, he set out to democratize the asset class for everyone, including you. Masterworks has been featured in places like the Wall Street Journal, the New York Times, and Bloomberg. With more than 200,000 members, demand is exploding. But lucky for you, Masterworks has hooked me up with 23 passes to skip their extensive waitlist. Just head over to our
link and secure your spot. Visit talkpython.fm/masterworks or just click the link in your podcast player's show notes. And be sure to check out their important disclosures at masterworks.io/disclaimer. I guess one of the requisite topics we should talk about is probably COVID-19, because that was going
to be over in a few weeks or months, but then it wasn't, so it's still ongoing. One of the things you asked about and studied was basically: did COVID-19, and more specifically the shutdown that resulted from it, lead to more data science or less, increased investment or not so much? What did you find there? Yeah, interestingly, I think we found that different organizations
had every possible answer. About a third decreased investment, but a quarter increased investment and another quarter stayed the same. So there wasn't one definitive answer that everyone had, which I think probably has a lot to do with where data science is at in their organization. On one hand, data science is an activity that's easy to do remotely; there are a lot
of jobs you can't do remotely, and data science is one you can, so that part isn't much of an obstacle. But a lot of it also has to do with risk. Everyone, when they faced this, was thinking with their business hats on: what is the risk to my organization of an unknown economic impact from this pandemic? And a lot of places might have viewed their data science as a risky, still-early kind of thing, and so: let's pull back
a little bit. Let's not spend that money. Is it optional? Okay, we cancel it for a while, we put it on hold. Yeah. But clearly for some organizations it was so important that they put more money in. So a lot of it had to do with where you're at in the journey, I think. On industries, you found out where people were doing data science. Obviously technology, right? Tech companies. I'm guessing this is like Airbnb, Netflix,
those kinds of places. There's a lot of data science happening in those worlds. Academic was number two. Yeah, I mean, data science is still an actively researched thing. As you see, sometimes it's hard to keep up with all of the new advancements and changes, not just in the software but in techniques, so academia is super busy on this. Banking is also a top one, because I kind of think of banking and finance as some of the
original corporate data scientists, in some ways. It was interesting to see automotive actually score so highly. That's the one that surprised me as well. Automotive is 6%, and the highest industry was 10%, so that's really quite high. Yeah. I wonder how much of that is self-driving cars.
You know, I don't know. The other thing is, as we've heard with the chip shortages, supply chain logistics is an interesting use of data science: trying to predict how much supply of all the things you're going to have, where and when, and how you should transport stuff. And I imagine car manufacturing is especially challenging, especially now. Interesting. Yeah, they really shot themselves in the foot, didn't they? When they said,
you know what, all these extra chips, people aren't going to need cars, they're not going to buy cars during this downturn, so let's cancel our order, we'll just pick it back up in six months. And six months later, there are no chips to be had. Yeah. I mean, GM, I think, is even shutting down a significant portion of their production in the US because they're just out of chips, which is crazy. Antonio out in the live stream says he's doing
data science with his team in the energy, oil, and gas industry, and they're not the only ones. Yeah, it's funny that doesn't appear in the list. We don't have an energy category, but they'd be down at 2%. Again, all of the percentages are low because there are so many industries and it was all over the place. Team size is interesting. I think one of the things that's interesting here is that
software developers, as I think of them, kind of cluster together in development team groups, right? They've got the software development department in a company, maybe, or a team building a piece of software or running a website. To me, data scientists feel like they might be more embedded within little groups. There might be a data scientist in the marketing department, a data scientist in the DevOps
department, and so on. Is that maybe correct? Yeah, I think we've seen companies actually do both at the same time. One of the things we have listed is a data science center of excellence, and what that ends up being is, in some sense, a group that is pathfinding for an organization. They're saying, okay, these are the best practices, these are the tools, this is what to do, figuring that out and then rolling it out to all the departments that have
their embedded data scientists, who can take advantage of that. Because I think it's valuable to have a data scientist embedded in the department, since one of the most important things as a data scientist is your understanding of the data you're analyzing and your familiarity with it. I would really prefer the person analyzing, say, car supply chains to understand what goes into that and also know data science, as opposed to a data scientist for whom it's all numbers and they don't
know. Right. If you could trade absolute expertise in Git for a really good understanding of the problem domain, you're probably better off going, you know what, just keep zipping it up, and just really answer these questions. Well, you don't actually have to make that trade-off, but I agree that domain knowledge is more important here. Yeah. So, thinking of the departments where
data scientists live: IT was the highest, then R&D, and then this data science center of excellence you spoke about, then ops, finance, administration, marketing, human resources. It's really spread out, which is sort of what I was getting at before. Yeah. So I think we're seeing a lot of organizations build their data science expertise ground up, department by department, and then maybe they'll coalesce some of it into a single department
at some point. Right. Maybe that department makes the APIs for the rest of the sort of isolated folks, and so on. One thing that was interesting is how you spend your time. I mean, you think about these AI models or these Plotly graphs and all these things that data scientists produce. Then there's the quote that data cleaning is not the grunge work, it is the work, right? And you have this chart
of how you spend your time, and 22% is data preparation, with another 17% on top of that for data cleaning. So that's a pretty significant portion spent just getting ready to ask questions. Yeah, and that's really the piece that requires that domain expertise: knowing what you're looking at, what's relevant, what problems it'll have. No data set is perfect, and
no data set is perfect for all questions. So you can't ever clean the data just once, because what you're doing is preparing it for the questions you're going to ask. You need someone who understands what's going to happen there and can do that; that's really the expertise you want. Yeah. Cool. Another topic
you asked about was barriers to going to production. Some pretty intense graphs, many options across many categories, but basically you asked: what roadblocks do you face when moving your models to a production environment? The graphs are intense because everyone has a slightly different perception of this depending on what seat they're in. Are they the analyst? Are they the data scientist? Are they the DevOps person? Everyone
has a different answer for what the roadblocks are, right? Which makes sense, because you're going to see what is relevant to your job. When you sum everyone up, you see a fairly even split across the categories. Honestly, what I found interesting was that there was both converting your model from Python or R into another language and also converting from
another language into Python and R. Yeah, exactly. So one of the challenges people had was, just like you said, recoding models from Python or R into another language, and then the exact reverse, and they were almost exactly tied. 24% of the people said, oh, I've got to convert these Python models to Java or whatever, and the other people are like, we've got this Java model, I've got to get it into Python so I can put it in FastAPI on the web. Right. Something like that.
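One way around that recoding problem, at least for simple models, is to export the fitted parameters as a language-neutral artifact and reimplement only the scoring step on the other side. Here's a minimal sketch of the idea; the model shape, field names, and coefficients are all made up for illustration:

```python
import json

# Hypothetical linear model "trained" elsewhere; the numbers are invented.
model = {"intercept": 1.5, "coefficients": {"price": -0.8, "rating": 2.1}}

# The artifact is plain JSON, so a Java (or R, or .NET) service can read it
# without anyone recoding the training pipeline.
artifact = json.dumps(model)

def predict(artifact: str, features: dict) -> float:
    """Re-create the linear model from the artifact and score one row."""
    m = json.loads(artifact)
    return m["intercept"] + sum(
        coef * features[name] for name, coef in m["coefficients"].items()
    )

print(round(predict(artifact, {"price": 10.0, "rating": 4.0}), 6))  # 1.9
```

This only works when the scoring logic is simple enough to reimplement by hand; for anything more complex, interchange formats like ONNX or PMML play the same handoff role.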
Yeah, anecdotally, I mean, maybe we'll have to change the phrasing of this question in the future, because putting Python and R together might have conflated a couple of things. I just have anecdotal evidence here: we have talked to customers whose data scientists wrote everything in R, but they didn't want to put R in production and were asking them to recode it into Python, because Python was okay for production.
But I've also had the conversation where people say, we have our data modeling in Python, but Python's not okay for production; Java is okay for production. So it's this weird problem where companies have built up how they do deployments around specific languages, and those aren't
the languages people are doing data science in all the time. Right. And I suspect in the Java case, it's just: we have a bunch of Java APIs and apps running, those people run those apps, and you're going to give us a model that just fits into that world. But if you're already running Python for your web servers, just put it in production; it's already right there, right? Yep. Quite interesting. Okay,
let's see, I'll flip through here and find a couple more. One was interesting: enterprise adoption of open source. You may want to speak to the results there. Yeah, I wish we could have asked this question 10 years ago, because I think it would have been fascinating to compare to now. Yeah, it's the trend that's super interesting.
One of the less surprising things for me was that 87% of organizations said they allow the use of open source inside the organization. I think that's not too surprising. I mean, even just Linux is kind of the baseline; how is your organization functioning without Linux? Yeah. And then, what
programming language could you even choose these days that's not open source, right? You've got Java, you've got .NET. Especially .NET was one that wasn't open source and is pretty popular; too late, that's all open source and installed through package managers now. And then the move to Python. I can hardly think of a language, or a place to run one, where you can't use some level of open source. Yeah. But the second question, which was,
does your employer encourage you to contribute to open source? I was surprised to see 65% say yes. That is a huge fraction, and it's interesting because it has not always been that high. I know we have spoken to people who said, hey, I wish I could contribute, but my employer, we just don't have a policy for this, or we don't have a way to do that. Yeah, I used to hear that a lot, that it's just too complicated.
I might leak something out. Yeah, or bring in some GPL stuff and mess up our commercial product or whatever, right? Yeah. So I don't know how all these companies have solved that internally, but I am excited to see that there's now a huge potential base of open source contributors out there, commercially, that there wasn't before. I do think there's something about creating
a culture for software developers and data scientists where they want to be. People don't want to be in a place where they're forced to use proprietary tools that are old and crusty and they're not allowed to share their work or talk about it. There are people who would put up with that, but I wouldn't love to be in that environment. And,
you know, talent's hard to come by, so you will probably want to create environments that attract the best developers, and the best developers don't want to be locked in a basement and told they can't share or contribute to anything. Yeah, I definitely agree with that. Another thing that's hot these days, hot in the sense that you don't want it, hot-potato style, is supply chain stuff and open
source pipeline issues, right? The survey actually mentioned that one of the reasons people don't want to use open source is they believed it was insecure, because our $20 billion bank is now depending on, you know, this project from Sarah about padding numbers or whatever, right? Like, if somebody takes over a thing, we're going to pip
install a virus into the core trading engine, and that's going to be bad, right? That's an extreme example, but you did ask about what people are doing to secure the code they're acquiring through open source. Yeah, and this is something we're interested in generally, because there's a lot more focus on security and you see more reports about supply chain attacks on software. So we were curious how different organizations are tackling the problem.
Unsurprisingly, the most popular answer, at 45%, was that they use a managed repository, which I interpret to mean basically a private mirror of the packages that are approved in your organization, with everyone pulling from there, not from the internet directly. Which is a smart approach, because it gives you a natural sort of gate: there is a review process to bring new
software in. And obviously, even commercially, we sell a repository for conda packages for precisely this reason, because customers want some governance and are more than happy to pay us. Yeah. Team Edition is our on-premises package repository, and this was an ask from customers, which is why we built the product. They were like, hey, we want your stuff, but we want
it inside our firewall. We don't want to go directly to your public repo. You want to opt in to say, yes, we want the new NumPy, not just, oh, somebody randomly pushed something out, so we're going to grab it and assume it's good. Right. You can apply policies as well. That's
common; a lot of places will say no GPL software, for various reasons. Or, for reported CVEs, those security reports that go through NIST, they might say, I want no packages with a CVE more severe than some level. Every IT department wants some handles to control that kind of policy decision-making. And I think that's why that's the most popular
option: it's the easiest thing to get a handle on. It is, yeah. You can set up a private PyPI server; it's pretty straightforward, and there's a cool article on testdriven.io about it. And then there's the conda version that you all offer, which is pretty cool. 45% is high; I didn't expect that many companies to have a private repository. It's good, but I just expected it to be, I don't know, lower. Yeah, although on the other side, you know,
that means 55% of those were just downloading random stuff from the internet. So it's good; I think the message is getting out that you have to think about these things from a risk perspective. Another one: 33% of organizations do manual checks against a vulnerability database. Yeah, this is what I was describing earlier; the CVE databases are a common
source for vulnerability checks, and manual checks are a lot of labor. It'll be interesting to see how many places move to automating that in some fashion. The hard part is that those databases themselves require, again, data prep and data cleaning. To make use of those public databases, you need to do some amount of curation, because a lot of stuff ends up in there that's mistagged or unclear or whatever. So a lot of this manual checking
is probably also just doing that curation. Yeah. One of the things that's nice is that GitHub will now do automatic PRs for security problems that it knows about, at least. Yeah, that kind of automation is going to be really important in the future, just because you can't manually go through all those things.
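At its core, the automated version of that manual check is just comparing installed package versions against an advisory feed. Here's a toy sketch with an entirely made-up advisory list; real scanners such as pip-audit pull curated data from the public CVE/OSV databases instead:

```python
# Toy advisory feed: package name -> set of versions with known issues.
# These names and versions are invented for illustration only.
ADVISORIES = {
    "examplelib": {"1.0.0", "1.0.1"},
    "otherlib": {"2.3.0"},
}

def find_vulnerable(installed):
    """Return 'name==version' for each installed package matching an advisory."""
    return [
        f"{name}=={version}"
        for name, version in sorted(installed.items())
        if version in ADVISORIES.get(name, set())
    ]

installed = {"examplelib": "1.0.1", "otherlib": "2.4.0", "safelib": "0.9"}
print(find_vulnerable(installed))  # ['examplelib==1.0.1']
```

The curation problem mentioned above lives in that advisory table: real feeds need version-range matching and de-duplication, which is exactly the data cleaning being described.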
What are you seeing around source control? You know, source code and algorithms are really important, and people want to keep them super secure, but if they put them on their own private source code repositories, they lose a lot of benefits, like automatic vulnerability checking and stuff like that. What's the trend there: GitHub or GitLab versus other stuff, maybe GitHub Enterprise? The interesting thing there is, yeah, everyone is using source control at
some point, and oftentimes they want it managed inside their firewall. So things like GitHub Enterprise and GitLab are pretty popular for that. A lot of times, I think what places will do is use the next item here: 30% said they're using a vulnerability scanner, and a lot of those vulnerability scanners you can use on your own internal source
repositories. So while they're not getting GitHub to do it for them automatically, they at least have some solution for looking for stuff. 20% said they have no idea what's being done, and then another 20% said, we're not doing anything; well, at least I'm sure of it. Let's maybe close out this overview of the survey results by talking about Python's popularity. Is it growing? Is it shrinking? Is everyone switching to Julia, or have
they all gone to Go? What are they doing? Yeah, I think Python's advantage here is being pretty good at a lot of things, so it ends up being a natural meeting point for people who are interested in web development and data science, or system
administration and automation, and all of that. So I think Python still has some growth to go. What's interesting is that in our survey, I would say the second most popular language was SQL, which has been around forever and is going nowhere. Those are often used together. Yeah, exactly, and they're often used in parallel, right? Like, I'm going to do a SQL query and then run some Python code against the results, that type of thing.
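That SQL-then-Python workflow can be sketched with nothing but the standard library; the table and numbers below are invented for illustration:

```python
import sqlite3
from statistics import mean

# An in-memory database standing in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("west", 120.0), ("west", 80.0), ("east", 200.0)],
)

# Step 1: SQL does the filtering close to the data.
rows = conn.execute(
    "SELECT amount FROM sales WHERE region = 'west'"
).fetchall()

# Step 2: Python runs the follow-on analysis on the query results.
amounts = [amount for (amount,) in rows]
print(mean(amounts))  # 100.0
```

In practice the connection would point at a real database and the Python step would often be pandas, but the division of labor is the same.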
Yeah, definitely. I'm a big believer that there is no one language for everything, and there never will be, but there are a lot of different options people are looking to. Go makes sense for a lot of network service kinds of things; Kubernetes is built almost entirely out of Go. But I'm not sure I'd want to do any data science in Go at this point. So it's always going to be a mix, and it might not even be that
you're doing one or the other; you might be doing both. For example, maybe you've written some core engine in Rust, but then you wrap it in Python to program against it, right? It could be both, or even more of a combination than that. But yeah, the popularity of Python looks strong; it's still right up near the top. Obviously, the group you polled is somewhat self-selecting, right? But that's still the general trend outside of your space.
Yeah, this is definitely going to be skewed to Python, because otherwise, why are you taking an Anaconda survey? But still, I think it's definitely something you see broadly in the industry as well. Well, speaking of different languages, out in the live stream, Alexander Semenov says: just learned that I can use Rust in JupyterLab with some help from Anaconda. My mind is blown. Good job.
Yeah, the one thing I should mention about Python is that one of its advantages is, if you're using Python, you're probably benefiting from most of the languages in the stack, even if you're not writing them. The ability of Python to connect to anything is, I think, its strength, and why it continues to top these lists. Yeah, absolutely. And then Paul out there has a question about the
commercial license. And I guess there were some changes to it. Can you maybe speak to that? I don't really track the details well enough to say much myself. Yeah, so what we did was, the Anaconda distribution packages have a terms of service that says, if you are in an organization above a certain size, we want you to have a commercial license if you're using them in your business. I forget the exact threshold
of where that's at. The reason was, one, to help support the development of those packages. And I should say, by the way, that terms of service does not apply to conda-forge; obviously, those are community packages. But if you want the assurances that Anaconda is providing on those packages and you are a company of a certain size, we would like you to have a commercial license, which allows us to support you more directly.
It allows us to fund continued work on those packages. It's a sustainability thing, I think. For most people, it's not an issue, because they're either below that size or using it individually. Do you know what that size is, what that cutoff is? I do not recall off the top of my head, and so I'm afraid to quote a number. Yeah, sure, no worries. Cool. All right. Well, thanks for giving us that. I mean,
it seems fair that large companies benefiting from your tools contribute back. I think that statement should apply to open source in general. If your company is built on Python, you should give back to the Python space. If your company is built on Java, well, that's Oracle; I don't know if they need the help. But in general, if you're built on top of something, there's a lot of support you can give
back, right? It's kind of insane to me that banks that are worth many, many billions of dollars do very little in terms of directly supporting the people they're built upon. They could hire or pay for a couple of the people building the core libraries. Like, if you're using Flask, support the Flask Pallets organization, something like that. Yeah. And then we, in turn, take that licensing money, and some fraction of it goes to
NumFOCUS for the broader data science open source community, in addition to us directly funding some open source projects as well. All right. Well, we're about out of time, Stan, but let's talk real quickly about Pyston, because Pyston is not rewriting Python in Rust, it's not replacing it with Cython or just moving to Go; it's about making core Python faster, right? Yeah. I mean, we've been thinking about performance in Python for a
long time. One of the early projects Anaconda created is called Numba. It's a Python compiler focused on numerical use cases, and it does its best work on that kind of numerical, loop-heavy code. But it's not going to optimize your entire program; it optimizes specific functions. So Numba is very good at a very specific
thing, and we've been thinking for a long time about how we could broaden our impact. So I saw that Pyston, among the many Python compiler projects out there, had reemerged in 2020 with a new version written from scratch, based on Python 3.8, as a just-in-time compiler in the interpreter, designed to optimize any Python program. It can't necessarily do any given thing as fast as Numba might for a specific numerical algorithm, but the
breadth is really what's interesting to us. So I saw this project had emerged, Pyston 2.0 came on the scene, I started looking more closely at it, and we started talking with them. We realized there's a lot that Pyston and Anaconda could do together, and so we have hired the Pyston team into our open source group. They are funded to work on Pyston the same way we fund open source developers to work on other
projects. Beyond that, there's other help and resources and infrastructure we can offer this project, and we're really excited to see where it's going to go from here. Yeah, I'm excited as well. All these different things people are doing to make Python faster for everyone, not just, let's try to recompile this loop, but: you run Python and it just goes better. I think that's pretty exciting. You know, we've got
the Cinder project from Facebook. Yeah, this is a really good year for Python optimization projects. I should be careful about typing that into a search engine, but the Cinder project is not something that's really publicly available; it's not a supported improvement so much as a, here's what we did at Instagram, here's a bunch of speedups, maybe you all can bring some of that back into regular Python. But yeah, there are a lot of these types of ideas.
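A simple way to see whether an interpreter-level project like Pyston or Cinder helps your code is to run the same pure-Python hot loop under each interpreter and compare timings. A rough microbenchmark sketch; the loop is just a stand-in for real interpreter-bound code:

```python
import timeit

def hot_loop(n=1000):
    """Stand-in for interpreter-bound work: pure-Python arithmetic in a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Run this same script under stock CPython and under Pyston, then compare.
seconds = timeit.timeit(hot_loop, number=200)
print(f"hot_loop() = {hot_loop()}, {seconds:.4f}s for 200 calls")
```

This is exactly the kind of code a JIT in the interpreter can speed up without any changes on your part, whereas Numba would ask you to decorate the function explicitly.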
And yeah, awesome, looking forward to seeing what you'll do with this. And, you know, the CPython core developers have even announced that they're undertaking a new effort to speed up CPython, and we're looking to collaborate with them. They're going to have to figure out what they can do within the confines of CPython, because they are the Python interpreter for the world. Yeah.
And so they need to be careful, but there's a lot they're going to do, and we're going to try and share ideas as much as we can, because these are both open source projects. Right. A lot of the challenge has been compatibility, right? Like, oh, we could do this, but then C extensions don't work, and those are also important for performance in big ways and other things. So they do have to be careful, but that's great. All right, a final comment,
real quick follow-up from Paul. I'd like my company to do more open source, more to do more to support open source. Any advice on promoting that? Yeah. I think the best, first place to start is identifying what open source does your company absolutely rely on. and especially if you can find an open source project that you absolutely rely on, that doesn't seem to be getting a lot of support, and then go look at those projects and see what are they, you know, do they have an
established way to donate funds? do they have, you know, other needs? that's something I think that is easier to sell as you say, look, our organization absolutely depends on X, whatever this is, as opposed to picking a project at random. it's easier to show a specific business speed. Yeah. Yeah, for sure. You can say, look, this is the core thing that we do and it's built on this rather than, oh, here's some projects I ran across. We should give some of our money away.
Yeah. That's a harder sell to stockholders, I guess. All right. Well, Stan, this has been really fun. Let me ask you the final two questions before we get out of here. If you're going to write some Python code, what editor do you use? So if I'm on a terminal, it's Emacs. If I have an actual GUI desktop, I'm usually using VS Code these days. And then, notable PyPI or conda package that you're like, oh, this thing is awesome, people should know about it?
Yeah. You know, wearing my GPU fan hat, I think a lot more people should know about CuPy, C-U-P-Y. It's a Python package that's basically NumPy, but made to run on the GPU. It's the easiest way I can think of to get started in GPU computing, because it just uses the NumPy calls that you're familiar with. So I would highly recommend, if you are at all curious about GPU computing, go check out CuPy. So over there on that computer I have, it has a GeForce,
but on this one, it obviously doesn't have Nvidia on my Mac. Does that work? The CUDA cores, the CU part of that, is for the Nvidia bits, right? What's my GPU story if I don't have Nvidia on my machine? Not as clear. Yeah, CUDA has kind of come to dominate the space, being sort of first out of the gate, so there are a lot more Python projects for CUDA. There are not really clear choices, I think, for AMD or for, you know, built-in GPUs,
at this point. Although I've definitely been watching the space. You know, Intel is coming out with their own GPUs, sort of this year and starting next year, and they have been collaborating with various open source projects, including the Numba project, to build Python tools to run on Intel GPUs, both integrated and discrete. So, yeah. Okay. So this may change
in the future. It'll be interesting to see. Final call to action: people are excited about digging more into these results and learning more about the state of the industry. What do they do? Go search for "State of Data Science" and Anaconda, and you'll find the results of the survey. There's a lot of detail in there, so I would definitely go through and take a look at all of the charts, because there are all kinds of topics covered in there. Yeah, I think it's 46 pages or something, and we just covered some of the highlights. So absolutely. All right, Stan. Well, thank you for being here. It's been great to chat with you. Thanks. It's been great. You bet. This has been another episode of Talk Python To Me. Our guest on this episode was Stan Siebert, and it's been brought to you by Shortcut, masterworks.io, and the transcripts were brought to you by
Assembly AI. Choose Shortcut, formerly clubhouse.io, for tracking all of your projects' work, because you shouldn't have to project manage your project management. Visit talkpython.fm/shortcut. Make contemporary art your investment portfolio's unfair advantage. With Masterworks, you can invest in fractional works of fine art. Visit talkpython.fm/masterworks. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be
right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
And I'll see you next time.