"Building Tools for Research" with Galen Reich

00:03

You're listening to a stage talk titled Building Tools for Research. This week, we were joined by my colleague, Galen Reich. Galen is behind many of the Bellingcat-built open source tools available today and came in to share his advice on how to get started in tool development, whether you're a coder or someone passionate about accessibility. You can find links to all of the resources mentioned

00:27

in the talk in the podcast description. This talk was hosted by me, Charlotte Mar, on Thursday the 28th of August 2025 in the Bellingcat Discord server. So welcome back to our Stage Talk series. This week I'm joined by my brilliant colleague Galen Reich. Galen's name pops up in many conversations here in the server and within Bellingcat. I was actually surprised to hear that this is his first

01:00

stage talk. He's our tech community lead and one of the many amazing people we have behind the scenes creating useful tools and visualizations for our investigations. He's the mind behind tools that you might be using today, including tools that help you find things, like the Shadow Finder tool, a tool that assists with locating a place based on a shadow, and collaboration tools that make working together easier, like the search grid generator and the color highlighter.

01:27

You also may have seen his face in our YouTube tech series, walking you through how to use the command line, or have come across his guidance in our tech community spaces within this very server. Regardless of where you might have encountered Galen, today he is here to discuss the basics of his role, highlighting why you might want to get involved in tool building and what goes

01:47

into a useful tool for research. We'll explore some of Bellingcat's tools and highlight others, explain what ethical and security considerations you need to make when building a resource, and give you direction on where and how you can get involved. As we talk, please make sure to add your questions in the chat box via the message bubble icon in the top right corner of your screen, and please note within your question if you do not want me to read your username out. Again,

02:15

this is being audio recorded. So it's really important that you do that if you're not comfortable with me reading your Discord username. All right, Galen, go ahead. Tell us a little bit about tool building. Yeah, thank you so much, Charlie. It's a very generous introduction for the stage talk on building tools for research. I'm going to take a moment just to introduce myself a little bit more and to give you a bit of a sense of

02:43

my background and where I'm coming at all of this tool development from. I'm an investigative technologist at Bellingcat. I spend a lot of time working with you wonderful folks in the community. My background is in engineering and research. Before my time at Bellingcat, I was at the BBC where I worked in their research

03:11

and development department. And before that, I was at the University of Birmingham where I was completing a PhD in radar, so in electrical engineering, which is a far cry from most of the work that I do now. So I'm definitely not a developer by training, although I do now develop a lot of tools and work with folks to develop

03:34

new investigative approaches. But really, I think, for me, the common thread is development of ideas and finding the ideas from the kind of rich pool of thoughts and things that people tell you that could bubble up into a really valuable tool. So that's a little bit about me. What I want to talk about today then is like, why build a research tool? What makes a tool accessible?

04:06

And finally, what makes a tool usable? We'll talk a little bit about the difference in how I'm using the words accessible and usable there a little bit later on. But yeah, why build a tool? What makes it accessible and what makes it usable? The absolutely most important thing that I cannot overstate, tools must be free in

04:32

the open source research space. We did some research at Bellingcat a few years ago, some of my colleagues did, and one of the huge takeaways was that any tool that requires some sort of paid access is just like a no-go. Don't even think about it because many people don't have the resources to be able to spend on getting access to a tool on the off chance that it solves their problem.

04:58

So tools must be free. And I think that this is kind of a philosophy that fits really nicely in with the open source research philosophy in that with a Bellingcat article, our researchers will try and take you through the investigation and show you how the investigation has been done so that you too can replicate it if you want. And that's a little bit more difficult if they've used a super expensive tool. It doesn't mean that we don't use paid tools from time to time.

05:28

But when thinking about developing a new tool, we have to think that they've got to be free. Now, I have these three key strands that I find useful when thinking about a tool. That's the concept, the utility, and the execution. Is there a good concept? Can it meaningfully advance an investigation? Because if it can't, then the concept isn't very good. If I came up with a tool that increases all the text in a PDF by 5% and turns every copy of the word into Comic

06:06

Sans, that would be a terrible tool. The concept is just bad. It has zero utility, and it doesn't matter how well I execute that tool, the concept just isn't there. Moving on to utility, does the tool have utility for people who might actually use it? Will they actually try to use the tool? Does it seem appealing? Does it seem to solve a need? And then execution,

06:34

is the tool developed well? When people try to use it, like of those people who think it has the utility for them, will they actually be able to use it and actually get the results they need? So these are the three strands, and I'll unpack those in a bit more detail. So firstly, the concept: why build a research tool? In my experience, there are a couple of strands of places that tool ideas originate. One

07:05

is investigative needs. So this is where somebody who is working on an investigation says, hey, there's a problem here. And I think there's an opportunity for solving it through code development or through some other sort of tool. Generally speaking, not always, but generally, these are professionally motivated. It's professionals working in investigative environments where they're encountering similar problems day in, day out.

07:34

And an example of this is with a recent article that went out, not that recent now, but on the 1xBet betting website. Please take a look at that article. I think it's a fantastic investigation. But as part of that, we needed to build a web scraper to go in and pull out a load of videos that were being streamed on this website. And so there, there's a technical need, and we needed to build some sort of research tool that would

08:06

fill that, that would address that need. And so I went away, worked with a colleague, and we built a scraping pipeline that could run away and grab all of the videos this site streamed over the course of the day. So that's motivated by investigative need and those videos were then, you know, used in the article and used in the subsequent investigation. The other angle is interesting ideas. This is a much more accessible route into building a research tool because it's

08:38

like conceptually intriguing. I'm sure many, many folks listening will have ideas or will have encountered things and thought, huh, that's curious. I wonder if I could do something with that. The downside is like, will anyone care? An example of this that is very close to my heart is the shadow finder tool that Charlie mentioned

09:01

in the introduction. So this is a tool that lets you take a photograph of a scene and by measuring the height of an object in the scene and the length of its shadow, if you know things like the time of day and the date, it's able to give you a restricted set of locations where that photo could have been taken. And if you have multiple photos from the same location, you're able to really narrow down the location quite precisely of where that photo could have been

09:36

taken. And this was a tool that came out of the Discord community. And when I built it, I thought, this is a really cool idea. It unites a bunch of really cool things that matter to me, like the kind of physics of the sun and all of the kind of geometry that I spent some time doing in my kind of educational training. It really appealed to me as an idea, but I definitely had that concern, like, will anyone care? And crucially, like, will that tool ever be useful for anything? It

10:11

turns out it was. A couple of months after putting it together and making it available to folks came the 1xBet story. And we actually used it to geolocate a really difficult one of these live streams. We were able to narrow it down to a city so that folks in the Discord could have a look and have a much more focused search on that area. So these are really the two driving factors. And oftentimes, people will

10:45

approach with interesting ideas. I think you really have to develop an understanding over time about which ideas will actually be useful in some future investigation. So you've got to have a good concept if you're planning on building a research tool. Right, utility or what makes a tool accessible. So when I use accessibility here, I'm not referring to accessibility in the sense of a website being colorblind friendly

11:20

or available for use with a screen reader. But rather, I mean how much prerequisite knowledge is required to access the tool. So I think of this as a ladder of development. So at the bottom of this ladder, you have the kind of messy scripts that you might write as part of an investigation or to test an idea out. This is where tools really start life typically, and they are the least accessible

11:55

and the lowest effort end of the ladder. So to use a tool at this stage of development, a user has to have Python installed on their machine, if it's a Python script, which for me it often is. They have to have some technical expertise and the confidence to edit the code that you've written for your own specific example. It's a great way to prototype a tool, but it will only ever find a narrow technical audience if you

12:22

try and publish that as a tool. The next step up, so adding a little bit more effort and making the tool a little bit more accessible, is what's called a library API. This is where you take that script and you give it a small incremental improvement. So it's still only going to really be useful for people who are technically adept and familiar with the language. We tend to write in Python, so I'll refer to tools being written

12:54

in Python primarily. It will still only be useful for folks who are technically proficient and confident. However, what the library API kind of lets you do is hide away all of the logic of your messy script so that if someone wants to tweak what you've written to suit their investigative need, you've already done the hard work for them. They don't need to edit your code. They can just use your code. So this is a really nice kind of step up. The next step on the ladder is a

13:29

command line interface. So it's a significant step up in terms of accessibility, but research from the study that my colleagues conducted a few years ago shows that over 50% of journalists and open source researchers are not comfortable with the command line. So even building a tool out to a level where it's usable on the command line, and for a bit of context, the command line is the part of the computer that you might see in movies where you see a hacker pulling up a

14:00

terminal and typing in code. That's the command line. It looks really scary. In reality, it's just a text-based interface to the computer. So a command line interface gives your tool the ability to be used by anyone who has a little bit of confidence on the command line. They don't have to go through the rigmarole of figuring out your API, figuring out your code. They can just use the tool that you've provided. But like I say, many folks are still not comfortable with

14:31

that. Coming towards the top of the ladder now, we have these kind of couple of user interfaces. So the first that we like to use are Jupyter Notebooks and Colab Notebooks. This is a really nice way to wrap up Python code in a user interface that is maybe slightly more accessible for folks

14:52

with no programming experience. It still takes a little bit of understanding, but in my experience working with people, if you send them a link to a notebook, either a Jupyter notebook or a Colab notebook, after a little bit of explanation, they're very comfortable using that tool from there

15:14

on out. And there's a bunch of techniques for simplifying the interfaces on a Colab notebook that I recommend kind of investigating if you're interested in developing tools, because for me, this is really the level at which a tool becomes practically useful for people outside of a kind of technical bubble. So I really highly rate Jupyter Notebooks and I really highly rate Google

15:39

Colab Notebooks as well. And finally, the kind of the last rung on the ladder, but in reality, it's quite a significant step up in terms of effort, is a full user interface, a full graphical user interface. This is one of the best ways to make a tool accessible to a large number of people, but it can really be a lot of extra development work. And if you're thinking about building a tool with a full interface, ask yourself, is it worth the effort? Is the target audience for

16:14

my tool going to need the added convenience of a user interface, or would they be happy with a notebook? If you answer that they would be happy with a notebook, that's what you should do. Don't go the extra mile. Just deliver that tool that meets the needs of your users. And this is something that we try to think about. So at Bellingcat, we publish several command line interface tools. We also publish a bunch of

16:44

research notebooks. So we have a repository on our GitHub which contains a bunch of research notebooks that let you perform a variety of tasks that we don't think merit full development of standalone tools, standalone user interfaces. So that's the ladder from a messy editable script at the bottom through a library API through a command line interface, through Jupyter or Colab Notebooks, and finally, a full user interface.
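As a rough illustration of the lower rungs of that ladder, the sketch below shows the same small piece of logic exposed first as an importable library function and then as a command line interface. It is not taken from any Bellingcat tool: the function name `sun_elevation_degrees`, its parameters, and the choice of calculation (the sun's elevation angle from an object's height and its shadow length) are assumptions made purely for the example.

```python
import argparse
import math


def sun_elevation_degrees(object_height: float, shadow_length: float) -> float:
    """Library rung: reusable logic that others can import instead of editing a script.

    Hypothetical example function. Both measurements must use the same unit.
    """
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("object_height and shadow_length must both be positive")
    # tan(elevation) = height / shadow length
    return math.degrees(math.atan2(object_height, shadow_length))


def build_parser() -> argparse.ArgumentParser:
    """Command line rung: describe the inputs so users never have to open the code."""
    parser = argparse.ArgumentParser(
        description="Estimate the sun's elevation from an object and its shadow."
    )
    parser.add_argument("object_height", type=float, help="height of the object")
    parser.add_argument("shadow_length", type=float, help="length of its shadow")
    return parser


def main(argv=None) -> int:
    """Parse arguments, call the library function, and report clearly on failure."""
    args = build_parser().parse_args(argv)
    try:
        angle = sun_elevation_degrees(args.object_height, args.shadow_length)
    except ValueError as error:
        # A failure should look clearly different from a normal result.
        print(f"Error: {error}")
        return 1
    print(f"Sun elevation: {angle:.1f} degrees")
    return 0
```

Someone comfortable with Python could import `sun_elevation_degrees` directly, while calling `main()` from a script entry point would let a command line user run the same logic without reading the code. A notebook or a full interface would wrap the same function again rather than reimplement it.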

17:18

The further you go up, the more effort it is, but the more accessible it is for the end user. At Bellingcat, we don't particularly advocate a front-end framework if you're thinking about building a user interface. We think that Streamlit seems like a pretty good option. And personally, I have done quite a lot of development in Vue, which is a JavaScript-based front-end framework that can be really good for building user interfaces.

17:52

Another important thing to consider when you're producing an interface is how are you going to make it secure and how are you going to host it, the whole tool? Because if you've produced a tool, and I've told you earlier on it's important for it to be free, are you going to have to spend a load of money to host that tool so that other people can use it for free? And the answer really should be no. Building a user interface and hosting a user interface can be expensive, but it doesn't

18:24

have to be. Streamlit gives you access to a certain amount of compute per month, which means your users don't have to pay to use your tool. It's really powerful. Same with Google Colab notebooks. They don't have any cost to hosting. And if you build a website, I would encourage you to investigate what it means to build a website that is static. So it's one where you just put the files on the internet and all of the complicated stuff happens

18:57

in the user's browser. It's a really powerful technique for letting users do really complicated things. And it builds security into the process because you're not asking people to upload sensitive files to your server. You just ask people to load it into their own browser. And that's where all of the magic happens. So it's far more secure.

19:19

And ultimately, it's generally free to host the open source projects using tools like GitHub, which if you're developing open source research tools, and you're not familiar with already, I suggest you investigate GitHub and get more familiar with it. So the final section is execution.

19:42

So this doesn't refer to how accessible a tool is going to be and how attractive a tool is going to be to a potential user, but rather this is once somebody's opened your tool up, once they've been convinced that this tool could be useful for them, what makes it actually usable? And really, with the exception of it being free, and by free I really do mean free, I can't hammer this home enough. Freemium isn't sufficient, not a free trial, it really has to be free to

20:17

get people's buy-in. But the three core principles are it should be simple, so the tool should do one thing really well. Of course, there are many amazing tools out in the world that are very complicated and do lots of complicated things. But my experience in the open source research space is that if you're developing a tool, make it simple. Do that one thing really, really well. Users really relate to tools that meet their need. And you're not trying to sell anything

20:54

to people. You're not trying to get an annual subscription. You don't have to promise the world. You can say, if you want a tool that puts a grid on a map, I've got a tool for you. If you want a tool that lets you search local council meetings, I've got a tool for you. If that's not what you need, these aren't the tools for you. And a shout out, I think, is appropriate at this point for the Bellingcat Toolkit, a fantastic resource for finding new tools that might meet your investigative

21:25

needs. So simple, does one thing well. The interface should be clear. You've got to put the user at the heart of the interface. And imagine that when you open up the tool, you have no idea what it's meant to do. And all of the information should be contained within the user interface and within the design of the tool. Coming from an engineering background, this isn't the most natural space for me. But I know that I previously had lots of fantastic colleagues at the BBC who

21:57

worked on user experience and design. And I feel I learned a lot from them and now I'm a strong advocate for really considering the user's experience and designing accordingly. And finally, if possible, the tool should be quick. This might mean you have to spend a little bit more time optimizing your code or designing your processing pipeline differently. But ideally, a user should be able to press the button and get results nearly instantly.

22:33

Because otherwise, if you press a button and see a loading bar or a spinning wheel of doom, it really erodes trust. Folks will start to think, oh, have I done this right? Have I made a mistake? And in the world of these open source research tools, if a user makes a mistake that causes your tool to fail, you don't want that pathway to be very similar to using your tool normally.

23:02

You want them to get a really clear error message and a nice instruction to please contact you as the developer and let you know what they were trying to do when it went wrong. Really inviting feedback and inviting development is such an important part of making an open source research tool usable. So yeah, to recap, I think you need that great concept. You need the idea to be a

23:30

really solid one. Put yourself in the shoes of an investigator if you don't work in that space and think, is what I'm offering valuable to them? Does it solve a problem that they either have or could conceivably have in the course of an investigation? Then think of how much effort is reasonable to put in to make this tool have utility because you don't want to be spending months and months and months developing a tool that nobody uses. Build something, put it into

24:01

the world and see what people think. And finally, when you're building it, when you're executing that tool, that tool development, make sure that you put the user at the heart and really design the experience so that the entire tool comes together for your user. And with all of those things, you'll build a fantastic research tool and people will love using it. Charlie, I think that leaves me to hand back to you. Yes, thank you so much, Galen, for that really clear and

24:36

knowledgeable presentation. It's really amazing to see how you break down the parts of tool development. It's been really helpful actually for a number of people in the chat. There are some common themes emerging in the chat though, so I wanted to ask you first about those. Do you have to have coding know-how to get involved with tool building? And if so, in terms of maybe leading

25:04

tool development? What other ways can you get involved with tool development that might not be so focused on the coding section of development? Absolutely. So the short answer is no, you definitely don't have to be a coder to get involved. I think it definitely helps: the more familiarity with coding you have, the easier time you'll have when you're exposed to these quite technical spaces, but it's definitely not a barrier. There are a huge number of ways that folk can get involved.

25:45

One thing that we always find helpful is when folk who don't have a technical background or technical experience try to use our tools. If you're wanting to get involved with tool development, the easiest way, I would say, is to try out a bunch of tools and see if they work. And if you encounter a problem, reach out to the developer. So we have spaces in Discord here where if you encounter a problem, you can shout about it.

26:26

And really, feel free to test the tools to their breaking point, because as a developer that works on some of these tools and supports other developers in building these tools, you know, you do your best to think about what people are going to do with the tool. But ultimately, it's not about us developing it. It's about you using it. So you don't have to have any coding knowledge to use many of the tools we produce.

26:50

But if you try and use it and we failed in the execution step, then we failed and we need to know about that. So for me, a lot of the joy is in that feedback. We just published a tool today, more or less, and I've had some feedback from folks on Bluesky about tweaks that could make the tool more usable. And what I've done is I've taken those and I've noted those down in the

27:16

tool's repository on GitHub. So if you're comfortable enough to create an account on GitHub, there are ways to get involved with the conversation around tools, even without having to be a developer. And if you head to Bellingcat's GitHub, which is github.com/bellingcat, there's a big section on the landing page there that explains how you can get involved with Bellingcat's tools specifically. So I hope that answers the question. Within the chat, we've got people saying,

27:50

for example, Sarah, our moderator, says, you can be a technical writer as a contributor. People love us. And that is really true. Being able to really describe how to use a tool is part of the reason why our Bellingcat toolkit exists. Because not enough people do this when they build tools: make, you know, really good documentation, so people understand, one, how to use it, and two, the benefits and the disadvantages of each tool as well. So that's in the chat as

28:25

well, I've linked the toolkit. I've also just put in the chat some links to spaces within the server. Galen mentioned the GitHub, but we also have within the server tech contributors, which is a place to find ongoing tool ideas and to pitch those tool ideas and to find people who want to build together. And tools and sites is a place where people share often useful tools

28:53

that they've come across. And tool support is that place within the server where you can ask for help with a particular tool or flag something that's not quite working, particularly with a Bellingcat one if you're not comfortable or aren't familiar with GitHub and the way that that works. Just building on this slightly. Somebody said that they'd be comfortable picking up an idea from a list of ideas people had, but didn't try. Is there anywhere like that? So we've just mentioned

29:25

the tech contributors within the server. Is there anywhere online where people do say, oh, we have this idea for a tool, but don't necessarily have the resources to build it? Is there somewhere like that where you can get involved in those kinds of communities? You mentioned already the tech contributors space. There's a channel in there for working on new ideas. I think it's called welcome and info, which is perhaps not the best name, but it can be used

29:56

for developing ideas. And within tech contributors, it's a forum channel. So there are a bunch of different ideas that are in there. Oh, no, I misspeak. There is a section of tech contributors called tool idea workshop. So that's where people might be discussing new ideas. I'll be honest, it's a little bit of a stagnant channel

30:17

right now. So if you're interested, if you're excited by this, please go dive into tool idea workshop, share your ideas and really use that as a space to collaborate with others. We do have a space with some long-standing challenges on the Bellingcat GitHub. That too is a little bit stale now. I mean, it's something that I've been working on

30:46

a little bit in the background. And I hope that at some point soon, I'll be able to release a kind of list of like a shopping list of ideas that we at Bellingcat think are really valuable and would be really cool enhancements to the open source research world, but they just don't exist yet. That's something that I'm working on in the background and hopefully I'll have something to share publicly on that in the not too distant future, but we'll have to wait and

31:17

see. Yeah, I just also want to thank John in the chat, who's been also putting in a bunch of useful links, which I'll put in the description of the podcast as well for those who do want to learn the command line and coding as well. So thank you so much for sharing those. And it's really valuable, as Galen said, to know coding for these kinds of tasks, but it's not exclusive.

31:41

Johanna in the chat, a good colleague and the brain behind the toolkit, has asked Galen, what has been your favorite tool building experience so far? Why was it so fun? So it's a great question, Johanna. I've enjoyed building lots of different tools for different reasons, I think, and I need to not let recency have an impact. I recently launched the council search tool, which is a UK- and Ireland-focused tool for finding

32:18

verbatim quotes from council meetings. I'm very proud of that and it's very recent, but I think my favorite tool development experience has been the shadow finder tool. Because it came out of an idea in the community that I hadn't thought of before, I was aware of some of the methods that you can use for geolocation and chronolocation with the sun. And I was aware of websites like SunCalc. But the idea of running SunCalc backwards hadn't ever occurred to me.

32:54

And so when Gabor Friesen came into the Discord and shared the idea, I thought, oh, that's really smart. That's really smart. It's probably not going to be useful all the time for every project, but maybe one in a thousand projects is going to hit against a brick wall where the geolocation is just super difficult. But you have this other kind of information. And I think that was the most rewarding for me, both because it kind of

33:25

scratched that kind of intellectual itch, but also because it seemed to be a really valuable idea that came out of the community. So Gabor and I collaborated on building the tool, me doing the coding primarily, but then we would kind of iterate. I would share a version with him and we'd talk about what worked and what didn't work. So I think having a partner in development really made that experience stand out for me. I'm just going to take the opportunity as well

33:57

to shout out about the grid search tool. This is at grid.bellingcat.com. It's such a simple tool. It also came out of conversations with the community, but there was less iteration because it was such a simple need. Plainly, the idea is if you're doing a geolocation with five people trying to work systematically in an area, how are you going to coordinate the workload? What the grid tool lets you do is it lets you download a grid that goes over a map in Google Earth.

34:29

And so you can say, right, I'm taking this section of grid cells. I'm taking this section of grid cells. And you can divide work so that people aren't double checking the same area. And yeah, it can help you keep track of areas you've already checked. So it is useful on a few fronts. But it's such a simple tool. It just creates a KML file that you can download and open in Google Earth. It's really like the tech equivalent of getting a pencil and a ruler

34:59

and drawing a grid on your map. While we're shouting out tools, you mentioned earlier that a new tool came out today. Do you want to share a little bit about what that tool is before we delve into other audience questions? Yeah, absolutely. You must cut me off if I rabbit on about it. Yeah, this is a tool that I've been working on lately and it's just reached the state where it's ready.

35:27

It's a council search tool. This was born out of conversations with local journalists in the UK. One of the classic local journalism tasks is attending council meetings or, in the modern day, watching live streams of council meetings to see if anything scandalous happens or if there are any discussions about new developments or interesting things for a local journalist to cover. And one local journalist I was talking to said, yeah, I spent every evening

36:07

of my life doing this. And one day I got a real big scoop because they forgot to turn the recording off at the end. And I thought, oh, that's really interesting. And that's great if you've got the resources to do that. But I know that in the UK, at least, local journalism is one of those areas that's been consistently underfunded and undersupported over recent years. So anything for local journalists and local democracy researchers to help them cover local council meetings seemed

36:40

like a valuable addition. And what this tool does is it goes away and in the background, it grabs the transcripts associated with those live streams that councils put out there into the world, and it makes those transcripts searchable. So if you go to the search tool now, you would see a big search bar and you can type in things like "off the record" or "that's outrageous". Or you could type in specific company names that

37:11

you're interested in. And you can then start to see all of the meetings in which those terms have been mentioned. And the search tool gives you a little preview of the section of text. And crucially, there's a button that takes you to the source video. And if you click that button, it's going to take you not just to the page where the video is, but it's going to take you to exactly the correct timestamp for the part that you searched.
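The search-then-jump behaviour described here can be sketched in a few lines. This is not the council search tool's actual code: the `Segment` structure, the example URL, and the assumption that a video page accepts a `t` query parameter in seconds are all hypothetical, and a real provider may use a different linking scheme.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Segment:
    """One timed chunk of a meeting transcript (hypothetical structure)."""

    meeting_url: str
    start_seconds: int
    text: str


def search_segments(segments, phrase):
    """Return every segment whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [segment for segment in segments if needle in segment.text.casefold()]


def timestamped_link(segment):
    """Build a link that opens the source video at the matching moment.

    Assumes the page understands a "t" parameter in seconds, which is an
    illustration only and will vary between video providers.
    """
    return f"{segment.meeting_url}?{urlencode({'t': segment.start_seconds})}"


segments = [
    Segment("https://example.org/meetings/42", 10, "Welcome to the meeting."),
    Segment("https://example.org/meetings/42", 754, "I would prefer this stayed off the record."),
]
links = [timestamped_link(hit) for hit in search_segments(segments, "Off the Record")]
```

Keeping the timestamp alongside each piece of text is what makes the jump possible, so a searchable index of this kind would need to preserve timing information when transcripts are collected.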

37:39

I'm really hopeful that this will become a powerful tool for local democracy researchers and local

37:48

journalists. It's all open source, and at the moment I have only written code to support one council multimedia provider. I have an aspiration that other folks who are interested and want to get involved will go out there and look at their local council or, if you're not in the UK or Ireland, you'll look at a council that you've come across as part of an investigation or just the first one that came up on your Google search and you'll see what format they've used for their

38:20

meeting streams and you'll see if you can reverse engineer that in a way that can add support for that council and other councils that use the same provider into this tool. A current outstanding task is to support YouTube, which many councils use. I've not done that yet. And I think it would be really exciting to be able to support those many, many councils that use YouTube to archive their council meetings. One of the audience members has just put, reminds me of Filmot, a transcript

38:56

search for YouTube. So maybe that's something to check out. Sarah says, that's sick. What a time saver. As someone who used to sit in council meetings as part of her job, I 100% agree. They were hell. John asks, is this on GitHub as well? Because I know that most councils in the Netherlands publish this as well. Yes, it is on GitHub. So it's open sourced. We have a hosted backend service for it, so we host the database for this tool. And that does all of the kind of scraping

39:37

on the backend. But the front end is open source. The backend is open source. And if you, did you say it was the Netherlands? In the Netherlands, many councils publish. Yeah. I think adding different country support would be something that I would be excited to do. It's just not been something that's been in the initial scope of the project, but there's absolutely no reason from a technical perspective why that wouldn't work. Or indeed, if you wanted to host your own country-specific

40:15

version of this, that would be amazing too. Please have a little play with that. Let us know if you come across any issues. You know the spaces on GitHub, and the spaces in here, to flag that now. It's really, really important. And yeah, we wanted to really announce that today just so that you had a little tool, a new tool to play with after this talk. I'm just taking it back to building your own tools though, away

40:45

from the current available tools. We had a question earlier about how EU nations are currently moving to free open source software away from Microsoft Office, for example. And they asked about what it meant for cyber security. But I had a wider question on that in terms of actually building open source tools. Are there any security issues that you need to consider when building and opening

41:18

up your code to the world? This is one of the reasons why I'm quite a strong advocate for using those kind of Colab notebooks or producing a website that is static and runs entirely in a user's browser. Because in doing that, you kind of circumvent many of the security challenges of deploying a service like we've done with the council search tool, because then you have to make sure it's secured and you have to have some understanding of how to do that effectively.

41:55

So yeah, there are definitely security concerns in publishing a tool, but if you make it so that it only runs on a user's device and you're not loading malicious code onto people's computers, that's a really good thing. As long as you're not the security problem, is I think what I'm saying. Following those tips circumvents a huge number of those issues. And then on the idea of open source tools more generally, so there's

42:33

a subtle distinction that we encounter really frequently in the Bellingcat tool space: the difference between open source research and open source technology, open source code. So they're very similar, but slightly different. So with open source research, that means we're using open sources of information to conduct our investigations. And there an open source could require payment. So in Bellingcat investigations, we will sometimes use imagery from the satellite company Planet

43:07

as part of our investigation. Now, that's open source in the sense that anyone could pay Planet for that imagery. And we also then publish the imagery so others can kind of check our work. But it's not open source in the free and open source software sense, where all of the code is public and available for people to iterate on and improve. Now, incidentally, I think both sets of ideas are really compatible.

43:37

It's why we publish so many of our tools in an open way, because it just fits the, it's a really good intersection between technical folks and research folks that I think can lead to really kind of fruitful relationships. And I think more and more as institutions move towards kind of open source tools, I think or I hope that the community around open source continues to grow and it helps just everyone build better tools

44:10

in the long run. This is a question that comes up quite a lot when we're talking about tool building. Obviously people tend to think a lot at the minute about AI, and one of the most common AI tools that are kind of emerging at the minute is these verification checkers. I had a conversation with someone very recently who is keen to build a similar one, and they're asking us about the types of disinformation and misinformation that we encounter, because that's something they're

44:43

looking into. But Aries in the chat has asked an interesting question about how tools specifically are preparing, particularly when it comes to detection or counter misinformation tools, how they're preparing to cope with new AI misinformation and disinformation. What kind of architectural or feature-level innovations are on your mind to help investigators detect and counter misinformation while still leveraging AI effectively is the

45:16

question. But I don't know if you've ever come across a tool that is 100% reliable when it comes to identifying AI. I haven't. I know we've recommended some in the past with additional verification techniques, of course. But yeah, I don't know. Do you want to speak a little bit to that space that currently there's a lot of tool builders, developers kind of looking at and what that particular space looks like? Definitely. And I think there's a few strands to it as well.

45:52

So I'll try and speak to each of them as I see them, and hopefully that's useful. So I think, firstly, there's the whole space of AI detection tools. It's going to be a constant battle between AI detection tools and nefarious AI trainers. Of course, there may be cooperative solutions between larger platforms, but AI detection, as an adversarial exercise, will always be a kind

46:29

of shifting ground. And I don't think there will ever be a tool that delivers good results always and is consistent, like is ahead of the curve for a long period of time. It might be that a tool that is very effective one month is going to be obsoleted with the next model release and its developers will have to go back to the drawing board to improve their

46:58

detection model. But I think sometimes we risk focusing on detection tools when actually the kind of ecosystem of misinformation is perhaps a much more fruitful place to investigate. So I think the way to stay ahead of synthetic media and those advanced obfuscation approaches is almost

47:27

to forget about the technical element. If you see an image online that is surprising or shocking, follow your nose and follow your investigative instincts and see, well, if there is a huge fire in this location, why are all of the images this exact one angle? Can I find multiple images on social media posted by different sources? Beginning to build up a picture of the misinformation landscape around a particular event, I think, is perhaps a more useful approach than putting

48:11

an image into an AI image detector. Of course, there's a place for those detectors, but I think as the technology moves on and as things become more and more convincing and it becomes increasingly difficult to separate reality from generation, I think it will become those contextual clues that provide the really kind of richest information. So I apologize, it's kind of a non-technical answer to the question and it's not a space that

48:46

I'm a particular expert in. I know my colleague Kalina would probably have a much more intelligent answer to how to spot misinformation, particularly AI-generated misinformation. But I think it's an honest one. I don't think all problems have a tech solution. And that's probably quite an important lesson for those of you thinking of going into kind of or being involved with like tech development in this space is sometimes there are very human solutions to these problems as

49:23

opposed to technical ones. Yeah, I can't even count on my hand the number of times I've been asked by a journalist when I've been doing talks on disinformation, is there like a tool that fixes this? It's like the answer is no. That's not the answer. The tool isn't the answer, as far as I'm aware. So I agree with you there. I wanted to ask though, in terms of when we're talking about updating tools and making sure that they're advancing enough with the tech.

49:53

We've had some tools in the past, Bellingcat tools, break when social platform policies have changed, right? How do you keep on top of that, particularly when you're building search tools that search across specific platforms? A tremendous amount of effort, frankly. I think there's not

50:13

really a good way around it. If a tool that you're building relies on somebody else's service, then, as part of developing that tool, you have to understand that if you want to keep that tool working in the long run, you have to accept the possibility that that person could change how they're operating. This is particularly true in the world of scraping.

50:41

So if anyone has tried to do any social media scraping in the last five years, frankly, you'll have found a plethora of tools, some of which are very good one week and terrible the next, some of which are kind of mediocre all the time, and some of which are just kind of terrible. But as far as I'm aware, there are no consistently fantastic social media scrapers, because those social media platforms are constantly shifting. And so the amount of work required to keep up

51:12

with them is significant. It doesn't mean it's not worth doing, but it's definitely a consideration when building a tool: thinking, how do I keep this supported in the long run? It's why with a tool like the grid search tool, I'm super happy because that will, I hope, fingers crossed, require zero maintenance long term because it does what it says. It does that one thing really well. And it lets you download a file format that's fairly industry standard, which

51:42

is the KML file format. So there's no shifting ground there. When it comes to tools that rely on third party services, that's a feature of the landscape. And I think you either accept that your tool is going to take a lot more of your time or you accept that at some point it's going to break. And that's just how it is. Yeah. And as you said at the beginning, I think simplicity is often key when you're thinking about applications, particularly. Do you often, within tool documentation,

52:14

do you often have to assess what devices can use your tools and in what regions tools are applicable as well? Is that something that's really useful to add into documentation, for example? The general rule with tool documentation is if you have some insight that you think is useful to someone else, put it in the documentation. On a personal level, I don't think I have a tremendous amount of insight into what is useful to folks in different

52:44

regions. It's something that I have in mind and would like to better my understanding of so that we can build better tools that are more portable and more useful to a larger number of researchers. In terms of devices, it's a really good question because I think more and more journalists are doing things on their phone. More and more researchers will be doing things on the go. You might be on the train doing a little bit of research.

53:16

So finding ways to make your tool suitable for mobile is like a really essential thing in the modern landscape. And that is a downside of the Colab Notebooks, Jupyter Notebooks that I talked about earlier: they are not very accessible on mobile devices. You immediately cut out a significant fraction of your users or at least your users a significant fraction of the time

53:42

when they're on the go. It's one of the things that I really like about using Vue, that's V-U-E, the JavaScript framework for front-end development: kind of implicit in the design is a mobile-first approach, which means that your site will be good on mobile. So the council search tool should work on mobile. And you should have a perfectly good experience there. So thinking of that and testing for if your tool works on mobile is a really good thing to do. And I know

54:23

it's something that we do. This is maybe a slight aside from tools, but it's something that we do when thinking about article content for Bellingcat stories is does the article look good on a tiny mobile phone? I'm just going to pull up my responsive design mode in my browser to give you a sense. So the size of phone screen that we test for if we're doing some interesting kind of novel experience in an article is 320 by 547 pixels. That's the screen size or the functional screen

55:02

size of an iPhone 5, which is tiny. I can't overstate how small that is. So when you're building a tool, think about what's it going to look like on a tiny, tiny screen? What's it going to look like if that person on a tiny, tiny screen turns their phone into landscape mode? How is this tool going to look? Is it even going to be usable? Do we just have to tell them to turn their phone around? But if you can do that, that's such a

55:31

valuable addition to a tool. If you can add those graceful degradations as people's screen sizes get smaller, as it becomes more and more difficult for them to access the functionality of the tool, let it lose its functionality in a graceful way. Tell the user that it would be better if they found a larger screen or turned their phone around. And make sure you do that early on in the tool

56:00

interface planning process. Because there's nothing more frustrating than if you've built this beautiful tool interface and then you test it on mobile last minute and it doesn't fit. That is gutting. So that would be my advice is to test it early. I'll let you know when I've cracked that one. Yeah, as you can with the council search tool, please do test that out for Galen on mobile as well. We are coming up to the end of the hour,

56:27

so I wanted to ask you one last question. You mentioned at the beginning that most of the open source tools are free. If somebody does want to make a business out of this, make money, is that even possible within the open source formats or is that something that is just so anti the ethos that you don't recommend it? I don't recommend it, not because it's anti-the-ethos. So for instance, the Mozilla Foundation, they release the Firefox browser, which is an open source

57:05

web browser. But they are also a commercial entity. And as I understand, part of their business model is providing support to organizations that want to use the open source tool at scale. And that's quite a common model you see in the open source software space: an organization publishing something for free and then selling services to help businesses use it at scale. So that kind of infrastructure support for larger, more complicated tools is an avenue where you can make a business

57:45

out of it. I'm the wrong person to ask about developing proprietary tools because it's a different trust architecture to the one that we like using at Bellingcat, which is one where the method is very clear, the tools are very clear, and open source tools make their method implicitly transparent. If somebody wants to sell a tool that does a thing and you're not going to tell me how it does it, it makes it much harder for me to justify using it in an investigation. It's

58:22

not impossible. Sometimes tools will unlock a level of analysis that is tremendously helpful and unsticks an investigation, but it then becomes a conversation about whether that tool should be used or not. And so I'm the wrong person to advocate for a paid tool ecosystem. Keep it open, keep it free is the message there. Thank you so much, Galen, for your time today. And thank you to everybody who's joined the audience. Please do check out the tech contributor space within

58:57

the Discord. I'll pop it again in the chat now. It is a space for you to chat with Galen and others from our team and each other about possible tools. And you can see some of the ones in there that Galen is already talking to people about. And do check out our Bellingcat GitHub as well that has tons and tons and tons of useful tools on there. Experiment with them. And if you find an issue, please flag it to us so we can fix

59:27

it quickly. Thank you again, Galen. This recording will hopefully be up on podcast platforms, which will be posted in here by this weekend. And we'll be back in two weeks time with another stage talk. But for now, thank you and goodbye. Thank you, Charlie, so much for hosting and moderating and keeping track of all questions. As always, you do a fantastic job. Oh, the flattery. The conversation's blowing. Thank you very much.

59:58

Thank you for listening to the Stage Talk. If you'd like to catch a Stage Talk live where you can ask the guest questions, join the Bellingcat Discord server by visiting www.discord.gg. The music you've heard is titled Dawn by Newer Self and is courtesy of Artlist.

Transcript source: Provided by creator in RSS feed