#538: Python in Digital Humanities

Michael Kennedy

00:00

Digital humanities sounds niche until you realize that it can mean a searchable archive of U.S. amendment proposals, Irish folklore, or pigment science in ancient art. Today I'm talking with David Flood from Harvard's DARTH team about an unglamorous problem: what happens when the grant ends but the website can't? His answer? Static sites, client-side search, and sneaky Python. Let's dive in. This is Talk Python To Me, episode 538, recorded January 22nd, 2026.

00:48

Welcome to Talk Python To Me, the number one Python podcast for developers and data scientists. This is your host, Michael Kennedy. I'm a PSF fellow who's been coding for over 25 years. Let's connect on social media. You'll find me and Talk Python on Mastodon, BlueSky, and X. The social links are all in your show notes. You can find over 10 years of past episodes at talkpython.fm. And if you want to be part of the show, you can join our recording live streams. That's right.

01:14

We live stream the raw uncut version of each episode on YouTube. Just visit talkpython.fm/youtube to see the schedule of upcoming events. Be sure to subscribe there and press the bell so you'll get notified anytime we're recording. This episode is brought to you by Sentry. Don't let those errors go unnoticed. Use Sentry like we do here at Talk Python. Sign up at talkpython.fm/sentry.

01:37

And it's brought to you by CommandBook, a native macOS app that I built that gives long-running terminal commands a permanent home. No more juggling six terminal tabs every morning. Carefully craft a command once, run it forever with auto-restart, URL detection, and a full CLI. Download it for free at talkpython.fm/command book app. Hello, David. Welcome to Talk Python To Me. Amazing to have you here.

David Flood

01:59

I'm glad to be here. Talk Python has been part of my story up to this point.

Michael Kennedy

02:03

Has it? Okay. Well, you are about to write the next chapter in the story. So that's pretty excellent. I have a sense of what's coming. We planned out what we're going to talk about and that sort of thing. And I'm really excited about this topic. So it's going to be a good one. Honestly, I think one of the real powers of the Python community and the reason the language has such staying power is there's such a diversity of use cases, technology, like technology standpoints, right?

02:34

Like I build software for this group or I build these types of apps and it's not just, you know, like Ruby on Rails, which, you know, it's been very popular, but it's, it's for websites, right? You know what I mean?

David Flood

02:45

Yeah, absolutely. I mean, web development has dominated my use of it, but my entry into it, which I suppose I'll mention in a moment, was through all those little tools.

Michael Kennedy

02:57

Let's hear it. Who are you, David Flood? Tell us, introduce yourself real quick and tell us about how you got into it.

David Flood

03:04

So my background is in music and the humanities. I mean, in 2019, I didn't know what Python was or the name of any programming language. And I've been doing textual criticism, which is, you know, there's lots of criticisms in the academy. This is the one where if you have lots and lots of versions of the same text, you are comparing them to work out what the initial text was and how it changed over time.

Michael Kennedy

03:33

Okay, give us an example.

David Flood

03:35

Okay, so one of the famous examples, hope I can remember it off the top of my head, is from Shakespeare. We're all familiar with the line, "To be or not to be, that is the question." Well, there's a variant of it. One of the early copies written by Shakespeare himself has... Somebody's going to be able to type into the chat exactly what it is. They'll know this anecdote. But it's something more like, "To be or not to be, I."

04:04

That's the question. And so, which one is the original one? Why did he change it? That's kind of one example. I work mainly in the New Testament, which is especially complicated because no other corpus from ancient history has as many copies of the same text as that corpus does. So it's quite complicated, and our techniques have grown because of that and perhaps become more advanced.

Michael Kennedy

04:29

Now, I mean, that many variations over that huge span of time, over different groups with different, maybe not intentions, but certainly colored by different worldviews and philosophies and so on. And yeah, I see the trouble.

David Flood

04:47

No, yeah. And they were people of the book. So copying it is something that happened a lot. And the monks, like the medieval monks, copied everything. They copied our Greek classics. So that's what I was interested in. And because of the wealth of data that we have, computer tools are more and more important in that field. So when I started my PhD in 2019, I knew that I wanted to use some of these cutting-edge tools. Some of them may be surprising.

05:19

For example, we've been using phylogenetic software. This is software that evolutionary biologists are using or computational biologists are using to track, for example, how COVID strains mutate over time. Oh, interesting. What they're comparing are the DNA letters. And so you have the sequence of letters and you're comparing how those change over time. Well, you can swap in textual variants for DNA letters.

05:48

And now we can track how texts change over time and group them into families, things like that.
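
To make the idea of swapping textual variants in for DNA letters concrete, here is a minimal sketch. The witness names and readings are invented for illustration, and the simple disagreement count is a stand-in for real phylogenetic software, not the method David describes using in his research.

```python
from itertools import combinations

# Hypothetical data: each witness (manuscript) records which reading it has
# at each variation unit, much like a DNA letter at each aligned site.
WITNESSES = {
    "A": ["a", "a", "b", "a", "a"],
    "B": ["a", "a", "b", "b", "a"],
    "C": ["b", "b", "a", "a", "b"],
    "D": ["b", "b", "a", "b", "b"],
}


def disagreement(x, y):
    """Fraction of variation units where two witnesses differ."""
    if len(x) != len(y):
        raise ValueError("witnesses must cover the same variation units")
    return sum(1 for p, q in zip(x, y) if p != q) / len(x)


def pairwise_distances(witnesses):
    """Distance for every unordered pair of witnesses."""
    return {
        (m, n): disagreement(witnesses[m], witnesses[n])
        for m, n in combinations(sorted(witnesses), 2)
    }


def closest_pair(witnesses):
    """The two witnesses that disagree least: a first hint of a family."""
    distances = pairwise_distances(witnesses)
    return min(distances, key=distances.get)
```

In this toy data, A and B differ at only one of five variation units, as do C and D, so the witnesses fall into two rough families. Dedicated phylogenetic tools build full trees from this kind of aligned data rather than just ranking pairs.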

Michael Kennedy

05:56

It's like a time series, but of words or letters or something.

David Flood

05:59

Yeah, I mean, yeah, there's lots of important algorithms for comparing sequences of things. And so if we can just swap in Greek words and Greek text instead, then we can maybe apply it to textual criticism. So I was pretty interested in those things. That wasn't actually the method that brought me into it, but something like that, kind of computer intensive tools. What I learned is that these tools weren't actually available to me. They

06:27

weren't desktop applications. And for the most part, they weren't public web applications. They were on PyPI or something like that, right? Yeah, exactly. Exactly. Or Java. And I needed to glue them together. So the long story short on that is during the first year of my PhD, I was picking up Python, watching YouTube videos while I was doing the dishes. And then the pandemic hit while I was living in Edinburgh in Scotland, probably not far from Will McGugan.

06:59

And so the pandemic gave me the excuse to spend even a few more hours each day picking up these new technical skills. And so I did it. I was able to use these advanced tools in my work. But what was really important to me was sharing, like making that available to my colleagues. I had to move from writing these bad top-to-bottom Python scripts into things that could be reused by other people. And that led me into the web, because the web is how I can share with anybody.

Michael Kennedy

07:29

It's really wild how much the web is kind of the last bastion of app freedom. It's so bizarre because, you know, I've many times told the stories of the insane battles of just getting our apps that just playback video of content that's already on the web into the app store. I mean, weeks of fighting about the weirdest, most nonsensical things with

07:54

both Google and Apple. But we also now have the Mac platform and the Windows platform very aggressively looking for digital code certificates and all sorts of signing and other kinds of proof. Like, you can't even just send somebody an executable anymore. It won't run. It's crazy. It's down to like, okay, put it on the web, I guess.

David Flood

08:13

That's right. I played the game of distributing desktop apps. That's how I initially distributed things. And at this point I just require people to install Python and then install my desktop app from PyPI, because it's too hard otherwise for me. I mean, I could pay for the code signing from Apple and do all of that, but it's too much work for the time that I have.

Michael Kennedy

08:40

Yeah, I'm about to do another round of it. I'm working on an app and my developer account is still active. So we might have a fresh round of fun. Hopefully it goes through this time. Anyway, I do think it's such a challenge. And are you leveraging? I don't know if the timing was right. Like maybe this was too early, but these days, are you leveraging things like uvx to run, or are you just pip install this thing and then run it?

David Flood

09:04

Yeah, I haven't updated the readme in a while, so I think it just asks for pip. But certainly, if somebody asked me today, I would say, yeah, just install this with uv. Because then they don't even need Python. Exactly.

Michael Kennedy

09:17

And that's brilliant. It is another barrier reduced in distributing these applications, right? Like, if you can get uv installed on a machine, then you don't even have to say install. The way you run it is uvx my thing, and it's all transparent to you, right? Which is beautiful. So what was it like?

09:35

Yeah. So what was it like coming from what sounds like a not super screen focus, super techie aspect and having to dive into this world and someday you're probably like, how is it that I'm publishing stuff to PyPI? What has happened to me?

David Flood

09:51

Yeah. Well, yeah, I remember when I first signed up for GitHub, because whatever YouTube tutorial I was working through at the time said that I needed to do that. I think it all started making a lot of sense. I didn't have any technical background, but the world of open source software just kind of made sense. It felt like it fit really well into my academic circle. I think a lot of the attitudes are similar.

Michael Kennedy

10:23

I agree. I think they are, actually. And I think that's a pretty neat thing. Yeah. Very cool. All right. Well, let's talk about what you're doing with digital humanities. You're actually at a really interesting project or organization, I guess, that does many projects, right?

David Flood

10:40

Yeah. Yeah. So fast forwarding, I finished my PhD in the humanities. Sorry. I had so much fun. No, that's fine. That's fine. I had so much fun writing these tools and then just solving the distribution problem to share them with other scholars. That was so fun that I was open to this kind of opportunity where now I'm doing this full time. And so, yes, I'm on the, we call it affectionately DARTH, which is Digital Arts and Humanities at Harvard.

Michael Kennedy

11:11

There has to be a lot of Star Wars memes and references, I'm sure.

David Flood

11:14

If you can pull up a 404, I think there will be a Darth Vader reference.

Michael Kennedy

11:19

Seriously, I'm here for it. Yes, page not found. I find your lack of nav disturbing. You know what? I think that is beautiful. And I really, I really think that people should embrace the 404, the fun 404 page, you know, more, right? There should really be something going on that like makes it, you know, something hasn't worked out, but you can just, you can make people laugh. Yeah. I appreciate that.

David Flood

11:48

I've heard people push back against it. Like if you're on a, if you're on like your medical website and you're maybe about to get bad news and then you get like a picture of a kitten.

Michael Kennedy

12:00

Dr. Kitten doesn't know where your results went. Like I get that. That's not funny. But I mean, most things are not that serious.

David Flood

12:06

Yeah.

Michael Kennedy

12:07

Mostly. Okay. So what kind of things does DARTH do? You've described this as kind of a web or tech agency within Harvard. Yeah, it is very much.

David Flood

12:18

So, you know, Harvard has a gigantic IT group. I don't know how many hundreds of people work there, but more than 500 people in IT. We are a small team and we operate very much like a small agency. So usually what happens is a faculty member has a funded research project that's going to last for an amount of time. And then we consult with them to build it. And most of the time, I think of these projects as falling into a few different categories.

12:54

I lost in my notes what I call them. But they are there. One is like a virtual research environment. So the focus is that this is a platform that we're building for the research to be done on. The reason the research should be done in a web app would be because you have access to visualization, to Postgres, to pandas. So we can kind of build up this platform to do the actual research on and some of the data entry.

Michael Kennedy

13:23

So like a full on research application. Yeah, exactly. I guess you can also kind of see your work through the different stages of research projects and academic research and so on. And we'll get to maybe end of life in a sense further down in the conversation. But so this would be we have a grant or we just work here and we're going to work on some form of research. What do you give them?

13:50

Right. And I think that's a super interesting challenge because one of the real common answers would be Jupyter, Jupyter Lab, Marimo, whatever. But that's still pretty code heavy for people who are possibly philosophers or something, you know.

David Flood

14:05

Oh, exactly. That's why in digital humanities, I won't even, maybe I won't even attempt to define it in any narrow sense, because I'll get in trouble with somebody. But you have two groups that are interfacing with each other. And one is digital humanities as a field, like as a subfield, all of its own. And these are people who have humanities domain, like knowledge, and technical skills, and they're bringing them together. And in a lot of cases, the audience for

14:36

that kind of work is other people working in the digital humanities. But far more common, and this is what we work with, is people who have humanities domain expertise, and they want to publish or do research or share with other people who have that same humanities domain expertise, and they are now interested in adding a technical component to it. How can we supercharge what they have?

Michael Kennedy

15:03

This portion of Talk Python is brought to you by Sentry. I've been using Sentry personally on almost every application and API that I've built for Talk Python and beyond over the last few years. They're a core building block for keeping my infrastructure solid. They should be for yours as well. Here's why. Sentry doesn't just catch errors.

15:22

It catches all the stuff that makes your app feel broken, the random slowdown, the freeze you can't reproduce, that bug that only shows up once real users hit it. And when something goes wrong, Sentry gives you the whole chain of events in one place. Errors, traces, replays, logs, dots connected. You can see what's led to the issue without digging through five different dashboards.

15:42

SEER, Sentry's AI debugging agent, builds on this data, taking the full context, explaining why the issue happened, pointing to the code responsible, drafts a fix, and even flags if your PR is about to introduce a new problem. The workflow stays simple. Something breaks, Sentry alerts you, the dashboard shows you the full context, Seer helps you fix it and catch new issues before they ship. It's totally reasonable to go from an error occurred to fixed in production in just 10 minutes.

16:12

I truly appreciate the support that Sentry has given me to help solve my bugs and issues in my apps, especially those tricky ones that only appear in production. I know you will too if you try them out. So get started today with Sentry. Just visit talkpython.fm/sentry and get $100 in Sentry credits. Please use that link. It's in your podcast player show notes. If you're signing up some other way, you can use our code talkpython26, all one word, talkpython26, to get $100 in credits.

16:41

Thank you to Sentry for supporting the show. Maybe just take a moment and speak to, maybe, I don't know if this venue will actually speak directly to anybody who I was imagining here, but people who work with folks, what would you tell somebody who works with a group who have some technical skill, who could create some of these things that we're going to talk about, but the people who they've created for don't necessarily

17:02

think they need it or know that they need it. I've gone often on rants about how programming is a superpower, not a replacement for your job, right?

David Flood

17:15

Yeah. That's a problem for a lot of people, especially because you might use some new computer tools to supercharge your research. But the article that you publish or the research output of that, the audience, they may not be interested in hearing about that at all. And so for most people who are working in this space, the tools, you have to use them in such a way that you can talk about the research output without talking about the tool. And we have other venues to talk about the tools themselves, like the Journal of Open

17:43

Source Software, and you can kind of get some of it out there. But that's the significant challenge: convincing people that it could be useful and then convincing the audience that they should be interested in the methods behind how some of the new research comes up.

Michael Kennedy

17:57

Also, I think I'm a big believer that presenting stuff in the right order is really, really important. If you present your research and it's beautiful and powerful and oh, look, we've also, by the way, covered a hundred times more data than any prior research. Surprise, I wonder how I did that. And then people are like, this is amazing. Then after you kind of hook them with the inspiration and what's possible, then you're like, let me tell you about the tool.

18:21

And all of a sudden you're like, that's a cool tool, right? This is not just like geekery, like programmer, you know, Charlie Brown speak, wah, wah, wah, wah, wah. You know, it's like, no, I'm listening. Tell me now.

David Flood

18:30

Yeah, exactly. I mean, one of the things I think that really opens people's eyes is a really powerful search interface. You have all of this research data. Just put it behind Elasticsearch with some really good filtering on it. And all of a sudden you have fast, rapid access to the data in a way you never had before. Like you were never scrolling through the Excel spreadsheets and finding exactly what you wanted, like you were with this new search

18:55

interface. And that by itself is like so simple. We're so used to that in web development that like everything needs to have a fantastic search now. But so many people have their data locked behind, you know, a terrible search interface.

Michael Kennedy

19:07

Yeah, just a few things to sort of expose that. So this, give us a sense of what these data exploration web apps might look like. These are probably kind of mostly stuck to the inside, kind of internal to the research lab research team groups and so on. These are probably not that public facing, right?

David Flood

19:24

Almost everything we work on does end up having a public facing component. So maybe the research itself is done, locked behind a user login. That's just for the researchers. But then they expose that research to the public, usually with a good search interface and different pages for exploring their data and visualizations and things like that. So yeah, everything we do ends up becoming a production public web app in the end.

Michael Kennedy

19:52

And then another one of your categories, you put it was virtual research environments like data entry, publishing, authoring, collaboration. Tell us about that.

David Flood

20:01

Yeah, so a good example of this maybe is one of the projects that... Well, actually, the best example of it is the project I worked on during my PhD. It's called Apatosaurus. The short story behind the name is that it sounds like apparatus. In textual criticism, when you are displaying and visualizing variant readings to

20:24

a base text, that form of visualizing it is a critical apparatus. A critical apparatus is a pretty boring website name, but Apatosaurus, dinosaurs, might make textual criticism sound fun.

Michael Kennedy

20:37

Yeah, I do love dinosaurs. No, that's really cool. So this, this comes out as a web app. And I know you also have some, you talked about some desktop apps as well.

David Flood

20:46

Yep. Yep. That's right. So people upload their collation to this and then they can visualize it. There's a public component of this as well, but really the backend is editing a collation and adding notes to all of the different readings and stuff. So I could show what the backend looks like, but we can also move on.

Michael Kennedy

21:08

Let's move on, just because most people will not totally hear, but just give us a sense of what you create for people so that they're like, yeah, I can use this app, right? Give us a sense of some of the features, I guess is what I'm getting to.

David Flood

21:25

Yeah, so another good example is we have a project at Harvard called Mapping Color in History. And this is a collaboration with a lab. This lab brings in pieces of artwork and they do spectral analysis on the pigments so they can identify what was used to make a particular color of red or what was used to make this color of blue. And then the idea is tracking how people made those pigments over time, specifically in Asian art.

Michael Kennedy

22:02

Is this the Dharmra, Puna, Puna?

David Flood

22:05

No, this is Mapping Color in History. I don't think it's up here. Sorry about that.

Michael Kennedy

22:10

Somewhere. That's all right. I'll find it. Keep talking.

David Flood

22:13

Okay. So the front end is great. You know, like the public end, this is people can explore by pigments and then see the images that contain those pigments. Now in the back end, what the researchers will be able to do is correlate exactly which point of a painting the analysis was done on. So they have this deep zoom image viewer where they'll zoom in and they'll select the point where that was taken from.

22:41

So how else would you do that other than a digital interface to indicate on an image of a painting where that spectral analysis was performed?

Michael Kennedy

22:52

Sounds almost like astronomy in a weird way. Oh, yeah. We zoomed into here and we took a different spectrum of the painting and we realized that it's actually identical to this, you know, something crazy like that, right?

David Flood

23:04

Yeah, yeah, yeah, that's right. Yeah, so it's essentially a pigments, like a pigments database.

Michael Kennedy

23:10

So the third category of these digital humanities projects that you put down was like data extraction, transformation. In data science, they often say, you know, 80% of the work is the data wrangling, which is like cleaning, organization, just getting it so you could possibly start asking questions about it. I'm sure you all do a lot of that.

David Flood

23:31

Absolutely. So often, the very beginning of a project might be an Excel sheet or several spreadsheets. And the first task is to ingest these into, you know, a proper database. Not so much MongoDB for us. We're a Django shop, so it's going into Postgres. And yeah, that is probably the number one challenge of the early stage: figuring out what the right data model is, what the right relationships are to model the data.

24:07

Doing that work is advantageous to everybody because, you know, it helps both the researchers who brought the data to think about it in a more organized way. I mean, they've been trying to do that. And they have the spreadsheets. But now we're modeling out the data so that we can add it to database tables and then to use later. So that works out well for everybody. And yeah, absolutely.

24:31

Cleaning the data, getting dates, working with fuzzy dates, being able to parse July of 2020 or summer of 2020 and handling kind of all of those cases so that we do get dates in the end.
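
As a rough illustration of the fuzzy-date handling mentioned here, the sketch below turns strings like "July of 2020" or "summer of 2020" into a date range. The function name, the season-to-month mapping, and the choice to return a range are all assumptions made for this example, not a description of DARTH's actual ingest code.

```python
import calendar
import re
from datetime import date

MONTH_NAMES = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumption: seasons map to northern-hemisphere meteorological months.
# Winter is left out because it spans two calendar years.
SEASONS = {"spring": (3, 5), "summer": (6, 8), "fall": (9, 11), "autumn": (9, 11)}

# Optional word (month or season), optional "of", then a four-digit year.
PATTERN = re.compile(
    r"^\s*(?:(?P<word>[A-Za-z]+)\s+(?:of\s+)?)?(?P<year>[1-9]\d{3})\s*$"
)


def parse_fuzzy_date(text):
    """Return (earliest, latest) dates covered by a loose date string.

    Returns None when the string is not recognized, so the caller can
    flag the row for review instead of guessing.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    year = int(match.group("year"))
    word = (match.group("word") or "").lower()
    if not word:
        first_month, last_month = 1, 12
    elif word in MONTHS:
        first_month = last_month = MONTHS[word]
    elif word in SEASONS:
        first_month, last_month = SEASONS[word]
    else:
        return None
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)
```

Returning a range rather than a single day keeps the uncertainty of the source visible, and returning None for anything unrecognized gives the ingest tests something explicit to assert on.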

Michael Kennedy

24:45

One of the crazy stories from data parsing history is one of the, I can't remember exactly what it was, you talked about biology tools or genetics tools earlier. One of the groups that names genes had to change the name of a gene because it kept getting parsed by Excel into a date. Yeah, I remember that. I remember that. That's right. Yes. So these are the weird edge cases I'm sure you run into. Like it's not even supposed to be a date. Why is this a date? I don't know.

25:14

Why is it helping out here? The code keeps crashing. Like pandas parsed it as a date and it's not or whatever.

David Flood

25:21

Absolutely. Yeah. Yeah. So yeah, usually lots of test suites around that ingest process until we've got it. Now, once we've got it in, usually the research is ongoing and then we're able to provide them now a new cleaned interface to do the additional data entry as the project is going. And that's usually a win-win for everybody.

Michael Kennedy

25:40

Sure. And so this sort of ETL ingestion side of everything is like, don't worry, DARTH has got it for you. And then we'll provide you like a database connection to start working. Or do you give them the tools and then they kind of iterate on them? And how much is this you and how much is this you providing like CLI tools and stuff or notebooks over to people?

David Flood

26:03

I'd say most of the people that we're working with are aware of the technical tools, but they don't want a database connection. So we are giving them, we're doing the ingest and then building a platform where they can begin interacting with their data.

Michael Kennedy

26:17

Yeah, I'm sure they don't want one. Maybe you give them an app though, right? With like Elasticsearch and other things that they can.

David Flood

26:25

No, absolutely. Yeah, that's what we do. Yeah, okay. Yeah, we give them a web platform to begin exploring, to begin publishing.

Michael Kennedy

26:34

So I was thinking that you said you're a Django shop, which is cool. It sounds, though, to me like describing what you're doing, just imagining how this is. You're probably creating these projects often. How often does one of these projects actually last? Or how many of them do you iterate? I'm trying to get a sense. Do you work on stuff for a year or is it like every two weeks we're on a new project?

David Flood

26:58

It's why I think of us as like an agency. Because we get to work on greenfield projects fairly often, like you're imagining. Which would not be the case normally at a big university IT department. So, you know, maybe two or three projects a year, two or three big ones a year. And then we have to put to bed a few a year as well. Because these things, they're funded with grant money. And then the grant money runs out and it's time. And then we have to figure out what do we do with it now?

27:26

We don't want to lose the data and this way of presenting it. But we can't keep paying for Elasticsearch.

Michael Kennedy

27:33

Yeah, of course. I'm certainly, we're going to dive into that because that is, but let's save that for the end. It seems like that's the arc of the story of these things. But I certainly think it's something that you don't think about that much, right? Like you said, it was only a hundred dollars a month for this. And we got a big grant. There's a bunch of, no big deal.

27:49

But like when the grant's out, who's on the hook for a hundred dollars a month and making sure it survives upgrades and all that kind of business.

David Flood

27:56

No, that's right.

Michael Kennedy

27:57

Yeah. So my original question when I started on this path was thinking like, do you, how do you get started on these? Do you have like a big framework or a cookie cutter sort of thing or something like this is how we do it because it plugs into all this other automation and tools we built for the last 10 projects. You know, that's kind of a unique position. A lot of companies build one website for themselves and that's their app or they're an agency that goes across so many, so much variation.

28:21

They can't do that kind of stuff. Right.

David Flood

28:23

That's right. That's right. That's a good question. We have things that we reuse. Some of them are open source, different search components and things that we maintain that we'll use across projects. And we have tried to do the cookie cutter Django project. The truth is, each project is different enough that really we like to evaluate it from first principles as we're evaluating it and thinking, what is the best technology to use? Yeah. Yeah. So yeah, we don't have a cookie cutter.

28:59

We don't have a kind of a meta framework for bootstrapping them because they're sufficiently different from each other that we...

Michael Kennedy

29:07

I find that too. The idea of how we could just grab this cookie cutter or copier. Are you familiar with copier? People out there might be familiar with that. It's a little bit like cookie cutter with the bonus that you can update it later if you change your mind about something, like actually change this project to use Postgres rather

29:24

than SQLite or something, which is pretty cool. But every time I try to work with one of those projects, even ones that I've created for myself, and I'm not hating on anyone, I'm like, oh, it's like 75% awesome and 25%, I've just got to take this stuff out. You know, I'll just do it from scratch. How hard is this? I'll just create a few folders and put a few things in there and I'll copy the one, like the pyproject.toml or like the one thing

29:48

that's like, how do I do this again? I'll just copy that and we're good to go.

David Flood

29:52

Yeah. I mean, that's what I find. It seems like a really brilliant idea, but in practice, it hasn't saved us time yet.

Michael Kennedy

30:00

No, I mean, maybe it's a case study. Like, okay, let's see what they're doing for this one. Oh, that is interesting how they're integrating this other thing maybe, but as a true foundation, I find it in theory awesome. In practice, I just end up not doing it for various reasons. Don't know why. I'm gonna save this for later. Because the question I'm about to ask you is gonna send us just down a rat hole.

30:21

So instead, before we go down the rat hole, maybe we could, not that one, maybe we could talk about, I mean, you talked about some, but let's maybe just feature some of the projects that are maybe more well-known that you guys have done.

David Flood

30:35

Sure. Yeah, good. So yeah, one of them is called the Amendments Project. And I didn't know this until I started working on this project, that there have been thousands of, I think it's at least 22,000, proposed amendments to the United States Constitution that never went anywhere. And so the goal of this project is to show that there have been lots of attempts to amend the Constitution, but actually the Constitution is frozen.

31:06

I mean, it's not actually amendable anymore, at least not in the politics of any time recently. So this is a database.

Michael Kennedy

31:14

I cannot imagine a situation where the U.S. Constitution gets amended. It has to be unanimous across all the states, right? Is that right? I can't remember. I don't know.

David Flood

31:23

I don't remember off the top of my head if it has to be unanimous, but it certainly has to be across party lines.

Michael Kennedy

31:28

Yeah, it's got to be pretty darn close if it's not all. It's like time travel or travel at the speed of light. Could be theoretically possible. Probably not going to happen.

David Flood

31:40

No, it's hard to see. It's hard to see. Yeah. So this is from a historian at Harvard. And so it's a database of all of these amendments and their full text. And, you know, from the public's point of view, it's a Postgres full text vector search interface for finding and filtering through all of the different amendments that have been proposed. I love it.

Michael Kennedy

32:08

Yeah, this is a nice looking site.

David Flood

32:10

We work with a designer.

Michael Kennedy

32:12

She's very good. Yeah, of course, like an agency would, right? Yep, yep. Nice. So we'll get a really pretty, rich search interface and then off you go. I have no idea even

David Flood

32:22

what I would search for, but yeah. Well, you can always search for something religious, something abortion-related. There's going to be lots of things there. I

Michael Kennedy

32:29

thought all those also, like guns, but I'm not sure I even want to go down there, right? Awesome though, this looks super useful. Maybe someday we'll have a functional government again, we'll see. Let's change it, or maybe we'll go down and it's folklore, like, look at you. So all right, so yeah, so another

David Flood

32:45

really great project, at least from a content point of view, that's interesting in the research that it's doing, is the Fionn Folklore Database. So in Celtic storytelling, you know, moms have been telling stories to daughters, and people have been telling stories for a very long time, hundreds or a thousand years, about Finn McCool, who is a hero from Irish mythology, some of it based in, you know, historical events, but it

33:21

goes back so far. So there are many hundreds or thousands of these stories that have been spread, and versions of these stories that have been told. And so some of them are audio recordings where some researcher has gone out to an island off the coast of Scotland and recorded somebody telling their version of the hero Finn and his band of heroes. You know, they defend Scotland and Ireland from invaders and attackers.

33:53

Very exciting stories and stuff and a team of characters. So there's audio recordings and then there's documents, like written documents that contain these. And so this is a database of kind of all of those all in one place with, on the public side, a nice search interface for discovering them, you know, either using the map view or searching.

Michael Kennedy

34:18

Yeah, that's cool. I got my map view for some random thing I searched about here. Amazing. But this is pretty interesting, all these different tellings and stuff.

David Flood

34:26

Oh, and yeah, one of the big challenges with this project is that it's fully internationalized. So it's available in English. Everything is available in English, Scottish Gaelic, and Irish Gaelic, but that extends into the database. So usually people have multiple names recorded for them. And so, yeah, you may have one person with any number of names in different languages, sometimes more than one Scottish name, that kind of thing. And so the data model on this one is quite messy, but sensible.

35:00

But yeah, it's quite a lot of different kinds of data to wrangle. And then with all of the translations for each thing.
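[Editor's sketch] The "one person, many names in several languages" idea can be illustrated with a small data model. This is only a minimal sketch under stated assumptions: the class names, the language codes (`en`, `gd`, `ga`), the fallback rule, and the sample names are all hypothetical and are not taken from the project's actual schema.

```python
from dataclasses import dataclass, field

# Assumed language codes for this sketch: English, Scottish Gaelic, Irish.
LANGUAGES = ("en", "gd", "ga")


@dataclass(frozen=True)
class Name:
    text: str
    language: str  # expected to be one of LANGUAGES


@dataclass
class Person:
    # A person can carry any number of names, including several in one language.
    names: list[Name] = field(default_factory=list)

    def names_in(self, language: str) -> list[str]:
        """All recorded names for this person in the given language."""
        return [n.text for n in self.names if n.language == language]

    def display_name(self, language: str, fallback: str = "en") -> str:
        """First name in `language`, else first in `fallback`, else any name."""
        for lang in (language, fallback):
            found = self.names_in(lang)
            if found:
                return found[0]
        return self.names[0].text if self.names else ""


# Made-up example record: one English name and two Scottish Gaelic names.
storyteller = Person(
    names=[
        Name("Angus MacLeod", "en"),
        Name("Aonghas MacLeòid", "gd"),
        Name("Aonghas Bàn", "gd"),
    ]
)
```

Here `storyteller.display_name("ga")` falls back to the English name because no Irish name is recorded, which is one possible policy among several.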

Michael Kennedy

35:05

Yeah, that's wild. It's not just, we need the user interface of this thing to translate about. That's way more, right?

David Flood

35:13

Yeah, yeah, it is that. It is that. And then it is also, yes, all the items in the database have a translation or can.

Michael Kennedy

35:22

This portion of Talk Python To Me is brought to you by us. I'm thrilled to announce a brand new app built for developers created by yours truly. It's called Command Book. You know that thing you do every morning? Open up six terminal tabs, CD into this directory, activate that virtual environment, run the server with --reload. Now, CD somewhere else, start the background worker, another tab for Docker, another one to tail production logs. Every tab just says Python, Python, Python, Docker tail.

35:50

and you're clicking through them going, which Python was that again? Where my app is running? Then sometime later, your dev server silently dies because it tried to reload while you're in the middle of a code edit, unmatched brace, a half-written import or something. Now you're hunting through tabs to figure out which process crashed and how to restart it. My app, CommandBook, gives all of these long-running commands a permanent home.

36:13

You save a command once, the working directory, the environment, pre-commands like git pull, and from then on, you just click run. You can even group commands together to start and stop everything for a project with a single click. It also has what I call honey badger mode, auto restart on crash. So when your dev server goes down mid-reload, command book just brings it right back up and does so over and over until the code is fixed.

36:37

It also detects URLs from your output so you're never scrolling through thousands of lines of logs just to figure out how to reopen your web app. And it shows you uptime, memory usage, and all sorts of cool things about your process. The whole thing is a native macOS app. No Electron, no Chromium, just 21 megs. And it comes with a full CLI. So anything you've configured in the UI, you can fire off from your terminal with just a single command.

37:00

Right now it's macOS only, but if there's enough interest, I'll build a Windows version too. So let me know. Please check it out at talkpython.fm slash command book app. Download it for free, level up your developer workflow. The link is in your podcast player show notes. That's talkpython.fm/command book. I really hope you enjoy this new app that I built. You want to work in the native language of the people who did that part of the folklore or whatever, right?

David Flood

37:27

Yeah, well, and people are still speaking those languages. So people who would use this to, you know, like somebody may have heard a story from their mom or dad and are now would like to find other versions of that story. And they live in a part of Scotland where they speak Scottish Gaelic as their first language. They can still access the site.

Michael Kennedy

37:43

And then that mapping color history one, that's another one of the public ones that you said is pretty major.

David Flood

37:49

Yeah, that's right. Yeah. So, yeah, that's a pigments database. You can search by either English color names like blue and find all of these Asian paintings that have blue or a particular kind of pigment of how they made the blue.

Michael Kennedy

38:04

Yeah, nice. So what's the open source story? You're creating all these apps, maybe some of these frameworks. There's got to be some tools. Is there a big desire or already an effort to have a lot of these things open source, or is it too niche, or is it just, like, this is the advantage Harvard has, that other universities don't get this?

David Flood

38:27

No, it's something we talk about quite a bit. Usually these things start, usually they start closed source during development. And then we work with the faculty and we talk about how we can take, you know, like the repo for the web app, how we can take that public. And so we've done that for a number of projects. Not all of them are. But the ideal is that they all make their way into the open, and especially when they become archived.

Michael Kennedy

38:56

Sure. Yeah, that's a good way to help them live on. And they might even go into GitHub's Arctic Vault, which is crazy. I don't know if people know about that out there, but GitHub has, quite a while ago, started taking copies of all of the repos and backing them up and storing them in the Arctic vault. It's kind of cool. I really, really, really hope we never need that, but it's kind of neat.

David Flood

39:18

Yeah, me too. Usually universities have their own archival system, so any important research data is usually part of that system as well.

Michael Kennedy

39:30

I see. Okay. Yeah. Obviously, right? Like I'm just, I can't remember where it was. It was somewhere, I think it was South Korea or Taiwan where like seven years of government data got lost or something like that. It was really, really bad recently. There was a fire and I think they had backups, but maybe just into the building, you know, like we'll put that out. We'll back it up to the hard drive over here. Not good. No, not good. You definitely want this stuff to survive.

39:54

I mean, academia has this history of like tomes that have survived the past and really, really long lived information, right? Besides the Library of Alexandria or something like that, maybe.

David Flood

40:05

That's what we want. That's what we want.

Michael Kennedy

40:07

We want it to, yeah, we want it to last. Absolutely. So maybe that's a good time to sort of talk about the trailing end. I think there's a lot of interesting things going on here. Just like you've run out of money, not because you actually run out of money. The grant is done and you've either spent or given back or whatever with the remaining little bits of money. It's always a weird balance with research. It's like, oh, we got $3,000 left on this research grant. What are we going to do with it?

40:34

It's not like, oh, we're going to give it back. We just didn't need it. It's like, we're going to find a way to like fund a student to do a little more work or whatever. But eventually the grant is over. That's right. You've got some expensive app access to a big database because it needs a big search or a lot of compute or something.

David Flood

40:50

That's right. Everything during, like, I mean, anything, anything that's a, that's a Django app. We deploy to AWS using containers, which isn't the cheapest way to host anything. But that's for the most part the Harvard way. And it is robust and is reliable. And we don't have a DevOps person on call on the weekend to rescue one of these apps. So having them reliable is good. Okay, so it's on AWS and paying for the containers, paying for that Elasticsearch cluster, the RDS Postgres database.

41:36

Okay, well, even if somebody wants to start paying for that out-of-pocket, all of those little services, they add up to enough that we need to do something when the project hits end of life. And so our gold standard that we've developed so far is asking, can this become a static website? Can we bake this out into all HTML files and acknowledge that there will be some trade-offs? We will trade off some searching. You know, it's not gonna have Elasticsearch.

42:06

Doesn't mean that it won't have any search though. So we'll trade out Elasticsearch and it'll be very difficult to add new data, but that's okay because it's being archived. So can we get it into a static site? And that's challenging depending on how you've set it up. So we now have projects where we set them up from the beginning to be archivable like this. And one of them is called Water Stories. And it was a companion to an art installation at the Radcliffe Institute on the Harvard campus.

42:36

And so this was this live site during the duration of the art installation where people could come in and add stories that they had about water onto an iPad. And then those went up to our database. We built that with something called Django Bakery, which, if you opt in and you use all of their class-based views the way that they're meant to be used, then you can bake this out into static files when you're done. Very low effort. That was perfect. That is such a cool idea, and mad props to them for

Michael Kennedy

43:05

ASCII art logos. Come on now. I feel like that should be in the view source if it's not. But this is such a cool idea because you can just take a working site. You guys are a Django shop. So a lot of your sites are written in Django and you just go make it static, right?

David Flood

43:22

Essentially. Yes. And what's really great about it is if they wanted to make a change, and they have, since we made it static, they've asked for a couple of changes. So locally, I just docker compose up this whole application, make the change in the Django admin and rebake the site. And so it can still be updated. Something,

Michael Kennedy

43:42

if you've never tried this, like something like, Hey, can we just add one more menu item? And you're like, no, no, no, we're not adding the menu item because you want that. That means we're changing 7,300 pages because they all bake in the whole HTML. Right?

David Flood

43:56

Exactly. Yeah, exactly. But if that's in my, in my Django database and my SQLite file, then no problem at all because then I just rebake it.
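[Editor's sketch] The "edit the database, then rebake" loop can be illustrated with the standard library alone. This is not Django Bakery's API; it is a minimal stand-in under stated assumptions, with a hypothetical `story` table and a one-line page template, meant only to show why a content change is cheap when every page is regenerated from the data.

```python
import sqlite3
import tempfile
from html import escape
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would share a full layout here.
PAGE = Template("<!doctype html><title>$title</title><h1>$title</h1><p>$body</p>")


def bake(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write one <slug>/index.html per story row and return the paths written."""
    written = []
    rows = conn.execute("SELECT slug, title, body FROM story ORDER BY slug")
    for slug, title, body in rows:
        page_dir = out_dir / slug
        page_dir.mkdir(parents=True, exist_ok=True)
        target = page_dir / "index.html"
        html = PAGE.substitute(title=escape(title), body=escape(body))
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written


# Made-up data standing in for the archived database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO story VALUES ('river', 'The River', 'A story about water.')")

build_dir = Path(tempfile.mkdtemp())
first_bake = bake(conn, build_dir)

# A later edit to the data only needs a rebake, not hand-edited HTML.
conn.execute("UPDATE story SET title = 'The Charles River' WHERE slug = 'river'")
second_bake = bake(conn, build_dir)
```

The resulting folder of HTML files is what would be published; the database and this script stay on a developer machine.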

Michael Kennedy

44:04

Yeah, yeah, exactly. Absolutely. So I think this is super neat. There's also Frozen-Flask. If I could get rid of all the ads, I do not need a Yeti thing, whatever that is. The glass, not the mythical thing. But Frozen-Flask, which does a similar thing for Flask apps. If you're a Flask person, it probably would work with Quart. Don't know for sure, but probably. So that's a pretty interesting idea as well. Throw that in there. But also what else?

44:37

Also you talked about search, right? That can be such a problem. And I'm a huge fan of your recommendation here with PageFind. Tell us about PageFind. So this has been, I think it's been a

David Flood

44:50

bit of a game changer in how functional one of these archived sites can remain. So we're actually in the process of that amendments website that searches across 22,000 full texts of amendments. We are in the process of sunsetting that, and that will become a static site. And for that search, we already have an internal demo that proves that we can replace that Postgres full-text search

Michael Kennedy

45:16

with PageFind. You lose vector search. Yeah. You've kind of got to get really true keyword matching. Yeah. Yeah, that's right. But you still get filtering. I mean,

David Flood

45:27

and really faceting and filtering is when it comes to discovery of things, I mean, I find that's really what's useful. So filtering these amendments by state or by the Congress that was active at the time or by the person who co-wrote it. All of those are totally great in PageFind. And the keyword search is just fine in PageFind. One of the things I really like about it is that it takes your index and it chops it up into lots of little files that can just fly across the

46:00

network. So it's a very fast search. It's not a huge network load, even if your index is initially very large. And it essentially cuts it up somewhat alphabetically. So if your search starts with T, or I should say a better word for audio, if it starts with W, then it will load up the index for words that start with W and fly that over the network instead of the whole thing. So it's pretty slick and it has a great Python API.

46:29

So to do the proof of concept for the amendments search, I just took a database dump and then manually indexed with a Python script into PageFind.
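[Editor's sketch] Indexing database rows straight into PageFind from Python might look roughly like the following. The import path, `IndexConfig`, `PagefindIndex`, and `add_custom_record` follow the documented `pagefind` Python package as best understood here, so they are worth checking against the installed version. The row fields (`id`, `title`, `full_text`, `state`, `congress`, `sponsor`), the URL scheme, and the sample row are hypothetical and not taken from the Amendments Project.

```python
import asyncio


def amendment_to_record(row: dict) -> dict:
    """Map one database row to keyword arguments for a custom index record."""
    return {
        "url": f"/amendments/{row['id']}/",
        "content": row["full_text"],
        "language": "en",
        "meta": {"title": row["title"]},
        # Filters become the facets people can narrow by in the search UI.
        "filters": {
            "state": [row["state"]],
            "congress": [str(row["congress"])],
            "sponsor": [row["sponsor"]],
        },
    }


async def build_index(rows: list[dict], output_path: str) -> None:
    """Build a search index from rows; requires the `pagefind` package."""
    # Imported lazily so the mapping helper above works without the package.
    from pagefind.index import IndexConfig, PagefindIndex

    config = IndexConfig(output_path=output_path)
    async with PagefindIndex(config=config) as index:
        for row in rows:
            await index.add_custom_record(**amendment_to_record(row))


# Made-up row standing in for one line of a database dump.
sample_row = {
    "id": 1,
    "title": "Example proposal (made up)",
    "full_text": "Text of a hypothetical proposed amendment.",
    "state": "Massachusetts",
    "congress": 63,
    "sponsor": "Example Sponsor",
}
record = amendment_to_record(sample_row)

# To build a real index (needs the pagefind package and its binary):
# asyncio.run(build_index([sample_row], "dist/pagefind"))
```

Keeping the row-to-record mapping as a plain function makes it easy to test the facets before any index is written.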

Michael Kennedy

46:40

Wait, there's a Python API for PageFind?

David Flood

46:43

Yeah. So the way PageFind works, I should have said that, the way most people will use it is that PageFind normally consumes HTML. So you give it access to your dist folder.

Michael Kennedy

46:56

Oh, okay.

David Flood

46:57

And then it crawls through all of your HTML files. And you can do great things like adding little HTML tags that are just for PageFind, that give it the filtering ability, or that you want to sort by something. And so that's great. Or you can just call PageFind from Python or from TypeScript and just build that index manually. Well, thanks a lot, David.

Michael Kennedy

47:19

I have another thing I've got to go research. This is awesome. I'm a huge fan of PageFind, as I said. My personal website, mkennedy.codes, is just a pure static site. It starts in Markdown and ends up in HTML. But if you add PageFind in, you get a super rich search. If you want to just know, like, what was said about Docker, it shows you really nice results, pulling out the different parts of the page and sections that talk about it, like the headers and then what is said.

47:45

And it even does, like, sub-word matching. You know, like you just type doc, it finds all the words that match that. And what I really like about it is a couple of things. It's instant. It basically is like nearly instant. If you type a few things, it gets way faster because it's pulling down. And if you go and look in the network console here and you type something, you can see that it's actually pulling in these little tiny fragments, which this one's

48:10

coming off disk cache in three milliseconds, right? But it breaks your index into a bunch of very small PageFind fragments that I think it's like, it starts with anything that starts with the letters DO. These are all the prebuilt results and stuff like that. Right. That's right. That's right. Yeah. That's super cool.
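[Editor's sketch] The chunked-index idea described here can be illustrated with a toy version: split the word index into one small file per prefix, and have a search read only the file for the prefix being typed. This is an assumption-laden illustration of the general technique, not PageFind's actual file format or matching logic. The two-letter prefix, the JSON layout, and the sample pages are hypothetical.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def build_shards(pages: dict[str, str], out_dir: Path, prefix_len: int = 2) -> int:
    """Write one JSON file per word prefix ({word: [urls]}); return shard count."""
    shards = defaultdict(lambda: defaultdict(set))
    for url, text in pages.items():
        # Keep only simple ASCII words so prefixes are safe as file names.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            shards[word[:prefix_len]][word].add(url)
    out_dir.mkdir(parents=True, exist_ok=True)
    for prefix, words in shards.items():
        payload = {word: sorted(urls) for word, urls in words.items()}
        (out_dir / f"{prefix}.json").write_text(json.dumps(payload), encoding="utf-8")
    return len(shards)


def search(query: str, out_dir: Path, prefix_len: int = 2) -> list[str]:
    """Load only the shard matching the query's prefix, then prefix-match words.

    Queries shorter than prefix_len find nothing in this simplified sketch.
    """
    query = query.lower()
    shard = out_dir / f"{query[:prefix_len]}.json"
    if not shard.exists():
        return []
    words = json.loads(shard.read_text(encoding="utf-8"))
    hits = set()
    for word, urls in words.items():
        if word.startswith(query):
            hits.update(urls)
    return sorted(hits)


# Made-up pages standing in for a static site's content.
pages = {
    "/posts/docker-tips/": "Docker containers and Docker Compose",
    "/posts/docs/": "Writing documentation for Django",
    "/posts/water/": "Water stories",
}
index_dir = Path(tempfile.mkdtemp())
shard_count = build_shards(pages, index_dir)
doc_hits = search("doc", index_dir)
```

Typing "doc" touches only the `do.json` shard, which is the property that keeps the network load small even when the whole index is large.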

David Flood

48:27

Yeah. One of our open source projects that we maintain is a Vue.js component library for PageFind so that we can style it and reuse it across different projects. Oh, that's awesome. I love it.

Michael Kennedy

48:42

Yeah. I think this really unlocks it. And I mean, you go to so many sites, like their documentation or just their web app, and the search is so bad. You type something and it's like thinking, spinning, spinning, spinning, spinning. And then like five seconds later, it gives you kind of janky results. And if you just throw PageFind in there, you can't type fast enough to outrun the results. You know what I mean?

David Flood

49:06

No, that's right. Yeah. Too many static site search solutions, they use like a, like a JSON blob that you, that you have to pull down and, and then iterate through.

Michael Kennedy

49:15

You know what's worse, and I see this a lot, would be if you go to google.com and then you would say effectively site colon whatever and then you search Docker, right? They basically pull that. You know, they just say search this and you just get Google results for your site. And obviously it's, I mean, Google's fine, but it's just.

David Flood

49:36

No, I find that unusable, really.

Michael Kennedy

49:38

I do too. It really, you're like, ah, geez. But now I'm super excited to realize I can do that from my dynamic content as well. So with the Python integration. OK, nice. What about something truly static? Have you looked at Hugo and some of the other type of things?

David Flood

49:56

Sure. So when I see you've even got the tab up for the Tsumeb project, which is-- that's essentially a database of many, many specimens taken from the Tsumeb mine. So in the-- Oh, it is. Yeah, yeah, it is. So if you click on Minerals database, you open up that search interface and that's powered by PageFind. Oh, this is? Yes. I forget what I was... I see. You guys even hooked into...

Michael Kennedy

50:26

I was thinking just like pure static, like Hugo, like...

David Flood

50:30

Oh, yes. Yes. Yes. So this is an Astro site. So for this website, we have this as an Astro site so that we have a little... Because with Astro, they make it so easy to pull in, like, Vue components. So like our PageFind is a custom Vue.js component library. With Astro, you can use React components, you can use the Vue components, but what it does is it's just

Michael Kennedy

50:52

a static site generator. Fantastic. So a little bit more designable than like Hugo or something. Here's your markdown file. Good luck with that.

David Flood

51:00

Yeah. I love Hugo though. Yeah. I use Hugo for different personal sites here and there, and it's just so fast and easy to get up and running. But yeah, it's great.

Michael Kennedy

51:08

Great, great when it's a good fit. That's what my website's written in, it's in Hugo. But if I'm integrating with anything else, I used to kind of like split it up, like this part's Hugo and this part's like a Python app. And it's pretty easy to get something that'll take a bunch of markdown files and just turn them into HTML and just put a page template around that. So I've kind of stepped away from mixing and matching that as much as I used to.

51:30

So now if I got a static section of a dynamic site, but that doesn't address, has nothing to do with the archival side of things, right? Because the idea is that the thing that I'm describing is gone on purpose.

David Flood

51:42

That's right.

Michael Kennedy

51:42

So you've got some, we've got Django Bakery. I threw out Frozen Flask, and I'm sure there's a ton more that neither of us are aware of at the moment.

David Flood

51:52

So Django Bakery was really good for that purpose. And we're keeping our eyes open for projects that it's a good fit for. But that was a pretty simple website. It needed a dynamic backend, but it was quite straightforward. And for Django Bakery, you have to opt into inheriting from their class-based views. I see. So if you're doing, for example--

Michael Kennedy

52:13

You've got to dig ahead of it, yeah.

David Flood

52:15

Yeah, yeah, yeah, absolutely. Yeah, hard to add retroactively. Probably impossible. Now, our other websites, like the Fionn example and the Mapping Color example, those are APIs. That's a Django API, Django REST framework for one, GraphQL for the other. One has a Vue front end, one has a React front end. OK, well, Django Bakery just isn't going to work very well for, like, serializing JSON.

Michael Kennedy

52:39

Yeah, it's like awesome. Here's your unrendered JavaScript front end code and it's just going to look empty or something.

David Flood

52:45

Yeah. So it is a good reason to consider using like vanilla Django templates when possible, like for that reason. But those were, those were inherited from the vendors, those two sites. And we've made a lot of progress on those. So, you know, what, what to do in that, like in that situation, Django Bakery isn't an option. And those projects are not end of life yet. So we have some time, but we're, we're, we're, so what we're doing is strategizing, okay,

53:15

how will we rescue them? How will we keep them alive once, once somebody needs to stop paying for hosting? And we have, we have ideas. We have, I think there's, there's clever, interesting

Michael Kennedy

53:26

things out there. We'll have to keep looking into it. There are some pretty interesting ideas. And that ran in a container, you could just have WebAssembly, but still have it go, right? Sort of a local loopback type of thing.

David Flood

53:43

Yeah, I'm really interested in this one because it enables essentially the full functionality of the live site to exist as what is just a static site. So because of Pyodide and projects like PyScript, we can run Python in the browser and we can run SQLite in the browser. And now we can even run Postgres in the browser with PGlite. So if we can run all those things in the browser, then couldn't we have Django hosted right in the browser?

54:15

And you can. So there's a proof of concept that proves it's possible called Django WebAssembly. And if you load this up, it'll let you log in to the Django admin. And you're not logging into anybody's backend, you're logging into your own browser where this is running in a service worker.

Michael Kennedy

54:36

Awesome. Look at that. Oh, hold on. It told me what the password was. Very secure. Matt, password.

David Flood

54:42

Well, it can be entirely insecure because, yeah, you're just, it's running right in your own browser.

Michael Kennedy

54:47

Yeah, that's awesome. And here we are, Django admin. Incredible.

David Flood

54:50

Yeah, so I'm pretty interested in this. You've got to convert an RDS Postgres database into either SQLite or something like PGlite, but I think that's all doable. So I think it's an exciting possibility.

Michael Kennedy

55:02

Yeah, for sure. I do think, so maybe you have a rich query system that you're powering by your database that's really heavy.

David Flood

55:09

Exactly.

Michael Kennedy

55:10

And it's got a bunch of data that's like, here's all of our working data that you might ask questions about. Maybe you just convert that to PageFind to help you find the pieces and then just keep the operational data in maybe, like, even a SQLite file. With, like, the Django ORM, you can just switch the connection and keep talking to it. I mean, there's possibilities to just get something not too terrible. Well, it's not the same, but not that far off.
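[Editor's sketch] "Just switch the connection" could look like a settings helper that picks the database backend based on whether the project is archived. The two `ENGINE` strings are Django's standard backend paths; everything else (the `archived` flag, file name, database name, and host) is a hypothetical placeholder, and moving the data itself from Postgres into the SQLite file is a separate step not shown here.

```python
from pathlib import Path


def database_settings(archived: bool, base_dir: Path) -> dict:
    """Return a Django-style DATABASES dict for live or archived mode."""
    if archived:
        # Archived: a single SQLite file that travels with the repository.
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": str(base_dir / "archive.sqlite3"),
            }
        }
    # Live: a hosted Postgres instance (placeholder connection values).
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "app",
            "HOST": "db.example.invalid",
            "PORT": "5432",
        }
    }


# In settings.py this would be assigned once, based on an environment flag.
DATABASES = database_settings(archived=True, base_dir=Path("."))
```

Because models and queries go through the ORM, most application code should not need to know which backend is active, though any Postgres-only features would need replacing first.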

David Flood

55:31

Yeah, exactly. And then it goes on GitHub pages and it can live hopefully forever. I mean, it feels like GitHub will last forever, but it'll last longer than funding will anyways.

Michael Kennedy

55:41

It's definitely going to last longer than just something that we can't pay for anymore, right? I don't know how long GitHub's going to be around for, I think a while, but you never know, right? It seems like stuff's going to last forever, then it gets changed. We had Subversion. Now it's completely gone, right? Just 20 years, 15 years later, but still, I think 100% there.

David Flood

56:05

Yeah. But if somebody can, if something ever happened, somebody just needs to copy that, that folder of HTML, CSS and JavaScript files and dump it into an S3 bucket or somewhere else. And then it can continue living there. So it's a good option.

Michael Kennedy

56:19

It's a great option. It's a really, really good option. I mean, I guess one of the long-term concerns might be what if the WebAssembly standard changes so much that it's not supported anymore? But you could probably bite-wise convert it if you had to, you know, like somebody would probably be able to create one.

David Flood

56:37

Yeah, that would be unfortunate. So I suppose if that happens, I mean, if that happens, yeah, we're booting up one of these projects is like booting up an emulator for some old DOS game.

Michael Kennedy

56:49

Right, right. Well, I mean, I guess let's think about this for a second. Somebody got, oh gosh, what was the chain? This is the whole JavaScript PyCon talk where they got, like, Firefox compiled into, not WASM, into asm.js or something like that. So it was, like, Chrome was running Firefox, which was running, I think, Doom, which was also asm.js. If we can do that, we could get something that would read old WebAssembly into new WebAssembly if it really mattered to the world.

57:24

Absolutely. Yeah.

David Flood

57:26

Especially if it's in a public repo that people who care about the data can, can rescue it somehow.

Michael Kennedy

57:31

Yeah. What about like a virtual machine? You know, I agree. Yeah, absolutely. Could have saved me some, take a snapshot of Ubuntu LTS, some version, and just what are we going to do?

David Flood

57:44

Everything we do is Dockerized. Everything is in a container. So in the worst case scenario, we could give somebody the image, and they could run it if they have Docker. I think that's a nice peace of mind to know that no matter what, something will be able to run this container. And even in, I don't know if you've used GitHub, what is it called, Codespaces. I archived one project. It was kind of dramatic and sudden that it needed to be archived, so without much time to do anything.

58:13

And it was a Ruby on Rails project. And I'm not a Rails developer, but I was able to get it archived in a way that anybody could, with one command, go to the repo on GitHub and boot it up in Codespaces and then have it live running from their Codespace.

Michael Kennedy

58:30

And so that works too. Very cool. I think as WebAssembly grows, there'll be more possibilities for these types of things. Yeah, amazing. I'm pretty excited about PageFind having a Python API. Didn't realize that. So I'm going to be doing something with that for sure. So what else? Let me ask you one more thing before I kind of let you wrap up with some final thoughts here.

David Flood

58:51

What about AI? Oh, that's a good question. So AI, I mean, there's like, in my story, there's like one interesting part of AI, which is that I got started and self-learned everything I needed to about software development to begin doing this right before ChatGPT really came on

Michael Kennedy

59:10

was able to do real programming. Yeah, you're like four years of legit programming before, right? So I

David Flood

59:17

think, I mean, so when I was thinking about how I got into it, I thought, what if I was four years later starting my PhD and wanting to do these tools? I would have been able to accomplish what I needed to for my research without acquiring the technical skills, and that

Michael Kennedy

59:34

would have been... That's a good thing? I'm not sure if that's good or bad. It could be both. I would

David Flood

59:37

would have thought it was a good thing. I would have thought it's a good thing. But in my hands now, like a software engineer, AI is more powerful in my hands now than it would have been then. So I can make it work for me. Yeah, I can make it work for me in a way that I couldn't have been able to then. So I'm thankful for that, but it's something I think of. I don't want to say it's necessarily a bad thing, but it definitely marks a difference, a difference in time between other

01:00:07

people who are maybe wanting to get into digital humanities, they're humanities researchers. They want to add some digital tools. You know, I think this will kind of, this will probably knock people off of the more technical path because it's not needed. I think it will too. And I think that that

Michael Kennedy

01:00:22

might be a negative. When you were telling me your story originally, I was thinking kind of like, how neat is it that you didn't sign up for, and the people you're working with probably didn't intend to sign you up for learning true software development. But look at this cool and interesting job that you now have that you never would have imagined.

01:00:42

I'm sure when you signed up for your PhD, you're like, you know what I'm going to do when I get my PhD, I'm going to go X, Y, like, I'm going to join the DARTH program. Like, no, probably not. Right. But here you are. And I think that's actually a really interesting knock-on effect for a lot of researchers and people in grad schools, they're kind of put into this programming-adjacent type of thing. You know, and a lot of folks sort of are like, actually, that's pretty interesting.

01:01:04

I'm going to kind of lean into that. And I think AI might knock, like you said, knock people off that path to some degree.

David Flood

01:01:11

Yeah, yeah, definitely. So that's just like one part of the AI story. The other one is that, like how we use it. It's great for data extraction, pulling data out of different, you know, to make these search interfaces more powerful, to extract different data from them. That's just one example where it's been handy. We're looking for ways that it can really empower faculty. We're still very much in the exploration phase of how we can use it and provide it to faculty as a digital humanities tool.

Michael Kennedy

01:01:48

Sure. When I asked the question, I was thinking of pretty much two parts. One, are you guys using it to help take projects from, well, that would have been a month, to, no, actually, it's three days? You know what I mean? And then two, if people are asking, you know, a professor comes along and says, we want our own custom AI thing, or we're using Harvard's internal one that we're allowed to use, but we

David Flood

01:02:13

won't be able to use it once the grant runs out. You know what I mean? Yeah. Yeah. I think one good example of this type of thing is that what we're starting to get is faculty who are vibe coding now, and we are going to teach them. We're going to teach them how to do it. You know, instead of having them. Yeah, it's absolutely a skill. Yeah, no, it is. It is. Instead of copying and pasting from ChatGPT into VS Code, having them learn Copilot, maybe even having them download Cursor.

01:02:43

Download some real dedicated tools to get this done to make them more productive. So, yeah, educating about how to do it is one thing. You asked if we're using it. We have access to Copilot, and that's great. I can't say that we've shipped anything in three days instead of a month yet, but one anecdote is that right now I'm doing some really interesting processing of music audio files, and somebody asked, they have a beatboxer, if I could chop that file up so that all of the individual

01:03:19

sounds that the beatboxer makes are identified in a file. And so I'm using some music libraries, a Python library called Librosa. There's some complicated math in there. It's a little bit too much for me. It's no problem for Claude. Claude knows how to do that math. And then I use my expertise to string it together to get a good output.
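
The kind of splitting David describes can be illustrated with a minimal sketch. It is not his pipeline and it does not use Librosa; it is a plain-Python stand-in that assumes the sounds are separated by near-silence and marks a segment wherever short-frame RMS energy rises above a threshold. The function names, the 10 ms frame size, the 0.05 threshold, and the synthetic test clip are all assumptions made for illustration. Librosa provides more robust onset detection (for example `librosa.onset.onset_detect`), which would likely be the more realistic choice for real beatbox recordings.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def find_segments(samples, sample_rate, frame_ms=10, threshold=0.05):
    """Return (start_sec, end_sec) pairs for runs of frames at or above threshold.

    Assumption: individual sounds are separated by near-silence.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    energies = frame_rms(samples, frame_len)
    segments = []
    start_frame = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start_frame is None:
            start_frame = i  # a sound begins
        elif energy < threshold and start_frame is not None:
            segments.append((start_frame * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            start_frame = None  # the sound ended
    if start_frame is not None:  # sound runs to the end of the clip
        segments.append((start_frame * frame_len / sample_rate,
                         len(energies) * frame_len / sample_rate))
    return segments


def synthetic_clip(sample_rate=8000):
    """Hypothetical test signal: two 0.1 s tone bursts separated by silence."""
    silence = [0.0] * (sample_rate // 5)  # 0.2 s of silence
    burst = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
             for n in range(sample_rate // 10)]  # 0.1 s tone
    return silence + burst + silence + burst + silence


clip = synthetic_clip()
segments = find_segments(clip, 8000)
```

On the synthetic clip, `segments` holds two (start, end) pairs in seconds, one per burst. Slicing the sample list at those boundaries would give one chunk per sound, which is the shape of output David describes.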

Michael Kennedy

01:03:39

Yeah. Awesome. You got time for one more quick question before we wrap things up? For sure. Raymond out there, Raymond Yees, asks: it'd be good to hear how Harvard uses containers on AWS and its reliability. It's reliable, but not the cheapest way to host things. Are you thinking about moving that or is it not that much? Okay, I'll tell you about a failed experiment.

David Flood

01:04:03

We were using ECS and we're still using ECS. So that's AWS's main, you know, it's not Kubernetes, but it's one step down with their horizontal scaling container clusters. And I wanted to move us onto a single EC2 instance because our projects are popular, but they're not so popular that we actually have to worry about horizontal scaling.

Michael Kennedy

01:04:25

Right. It's not like it's front page of the New York Times. I guess it probably could be. But even so, for the static sites, they probably still can take it.

David Flood

01:04:35

Yeah. So I priced it out and I got an example deployed, an example project deployed, and was able to confirm that it would indeed be much cheaper. And it was deployed in a similar way using AWS CDK. So it's all infrastructure as code all the way down. But it turns out there's all kinds of compliance.

01:04:54

When you are in charge of the VM at like a big university, or I'm sure any corporate setting, if you are in charge of the VM and the OS on it, then you have to know that you have the latest patches in. You have to know that you have latest Ubuntu. And then there's other things, different observability things that you have to have in place that are not usually required if you're running in a container cluster like ECS.

01:05:21

So it ends up being a lot less work and much easier to achieve compliance if we run containers or some other serverless thing. I run all my personal projects in a single virtual machine, but here we're running in containers.

Michael Kennedy

01:05:38

Yeah. Yeah. And you've got all the SOC 2 stuff and all those different things, right? Like there's layers. Yeah, that's right.

David Flood

01:05:44

Yeah. I mean, I'll mention that, but what I didn't say is that in 2019, when I started learning Python, I discovered Talk Python almost immediately. And one of the first episodes that I listened to

Michael Kennedy

01:05:55

was the other digital humanities. Cornelis van Lit. He was an awesome guest.

David Flood

01:06:01

That's right. Yeah. And I thought that was great. And that was also a bit about manuscripts, a little bit more on the image side than the text side. And I didn't understand everything that everybody was saying, but I just, I kept tuning in. And I think because of that, because Talk Python was like this, you know, I've been remote working for most of my time. And Talk Python has been kind of like that conversation with the open source community that's been always in my ear.

01:06:28

And I think that made, you know, a difference, making me feel like I understood the software landscape and like the developer culture and what was going on. And then the different Python libraries and what was possible. So to people who are interested in taking things in a more technical direction, I think it's helpful just to find a few things like that, that give you an insight into that world.

01:06:53

And the more you listen to it, the more you start to hear the same acronyms and the same things said enough that you start to feel like, okay, now you're part of the club.

Michael Kennedy

01:07:03

I really appreciate that. That's cool. I've certainly had people reach out to me and say things that at first didn't make any sense to me. Like I've been listening for six weeks now and it's starting to make sense what you're talking about. Like, why have you been listening for six months when it made no sense? That's insane. But a lot of people use listening to the podcast, be it mine or others, as language immersion, right?

01:07:24

Like I could get Duolingo and I could learn Portuguese or I could move to Brazil for a month. You know what I mean? And then I would really learn. Yeah, exactly. Right.

David Flood

01:07:34

Exactly. No, I think there's truth to that. And some of the things I did was, you know, search through, like search for the word "deployment," because I'm trying to get my head around how to deploy for the first time. And I just want to hear people talk about it. Like I could read about it. I could read the tutorial, but I just want to hear people talk about deployment to get a sense of what

Michael Kennedy

01:07:52

actual deployment sounds like. There's something really different when you're learning or trying, even if you're maybe an experienced programmer, but not in this particular area, to hear a human side of it, not just the docs, not a sterile "these are the four steps." I love it. I mean, it's probably why I created the show. It's because I didn't hear those stories. We got to tell those stories. Awesome. I appreciate that. So super cool. All right.

01:08:16

So if other people are listening, maybe one of your pieces of advice is keep listening. You'll get there.

David Flood

01:08:22

Yeah. And if anybody is in the humanities and somehow found their way onto this episode with no technical experience, I just would give the caution of, like, you know, the anecdote that if AI coding had been around the way it is now when I was learning, I wouldn't be doing digital humanities at Harvard. I wouldn't have been able to get into this field. I wouldn't have known about it. So I guess just think about that when you're learning and applying new tools.

Michael Kennedy

01:08:52

I don't really know what the right fix for that is. That's a very challenging problem. I mean, you can say I'm just literally not going to fire it up. But I mean, we used to hunt through Stack Overflow and the web and over and over. And if you're really stuck or you really don't understand, like they're good at explaining stuff too. You just got to really stay in a learner's mindset, not just press the easy button and make this thing and move on. Easier said than done. Easier said than done.

01:09:15

So yeah, I want to leave this with kind of a thought about how much things like Python and these tools and technology can really empower stuff that you wouldn't think is even related, like understanding old manuscripts and how painting is connected or changed over time and stuff, right? Those sound very much disjointed from tech and software, but they really are

01:09:40

superpowers that you can bring to your work, whatever your industry is or field of study. I know there's some sociologists out in the audience and I'm sure others as well. All right. Final thoughts, David, close it out. You said it great. I mean, you know,

David Flood

01:09:55

Just applying these technical tools to old questions, that is the core of digital humanities.

Michael Kennedy

01:10:02

When I first started hearing about this, I thought, I really don't know how this ties together. And after seeing it a few times, I definitely see the power of it. And I thank you for your time coming on. Thank you for sharing your look and the look inside of your team and inside of a small piece of Harvard. I really like these kinds of episodes because it's hard to see this from the outside, right?

01:10:23

Like, you just see the results, but you don't see like the inner workings of the team and the motivation and stuff. So thank you so much for being here. And yeah, bye everyone. This has been another episode of Talk Python To Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Take some stress out of your life. Get notified immediately about errors and performance issues in your web or mobile applications with Sentry.

01:10:48

Just visit talkpython.fm/sentry and get started for free. Be sure to use our code, talkpython26. That's Talk Python, the numbers two, six, all one word. This episode is brought to you by CommandBook, a native macOS app that I built that gives long-running terminal commands a permanent home. No more juggling six terminal tabs every morning. Carefully craft a command once, run it forever with auto-restart, URL detection, and a full CLI. Download it for free at talkpython.fm/commandbook app.

01:11:19

If you or your team needs to learn Python, we have over 270 hours of beginner and advanced courses on topics ranging from complete beginners to async code, Flask, Django, HTML, and even LLMs. Best of all, there's no subscription in sight. Browse the catalog at talkpython.fm. And if you're not already subscribed to the show on your favorite podcast player, what are you waiting for? Just search for Python in your podcast player. We should be right at the top.

01:11:46

If you enjoy that geeky rap song, you can download the full track. The link is actually in your podcast player's show notes. This is your host, Michael Kennedy. Thank you so much for listening. I really appreciate it. I'll see you next time. I'm out.

Transcript source: Provided by creator in RSS feed
