How Does the Wayback Machine Work?

Speaker 1

00:01

Welcome to BrainStuff, a production of iHeartRadio. Hey BrainStuff, Lauren Vogelbaum here. If a tree falls in a forest, does it really make a sound? And if a website changes overnight, did its previous homepage ever really exist in the first place? Because so much of our world is increasingly digital and ephemeral, it's not just a

00:23

philosophical question, it's also a simple matter of history. That's why the Wayback Machine, which features snapshots of websites as they age and change, is such a fascinating glimpse into the dusty corners of the web. The Wayback Machine is a massive digital archive meant to preserve web pages that would otherwise be permanently lost to time. Without this hoard of data, every time a page was updated or deleted, it would simply vanish, as if it

00:48

had never been there. Mark Graham, the director of the Wayback Machine, noted in an Entrepreneur article that the average life expectancy of a web page is about a hundred days. There are a multitude of reasons why these web pages disappear. Site creators move on to other projects, web hosting companies go bankrupt, or maybe the pages are moved or replaced with new data and content. One place you may have

01:12

seen the Wayback Machine's work: Wikipedia. More than eleven million web pages referenced in Wikipedia articles have gone bad over the years. In other words, they now return a 404 or "page not found" error. Because they've been archived

01:25

in the Wayback Machine, technicians there were able to edit those Wikipedia pages, so the references now point to archived versions of those defunct URLs. The Wayback Machine is the brainchild of Brewster Kahle and Bruce Gilliat, who also founded the Internet Archive, which is a digital library of websites, books, audio and video recordings, and software. Both projects are San Francisco-based nonprofits. Kahle and Gilliat also created Alexa Internet, which analyzes web traffic

01:52

patterns and was sold to Amazon. Project director Graham said via email that they, along with Kahle and Gilliat, had started to archive web pages and in 2001 launched the Wayback Machine to support discovery and playback of those archived web resources. And yes, the name was inspired by the 1960s cartoon series The Rocky and Bullwinkle Show.

02:15

In the cartoon, the Wayback (WABAC) Machine was a plot device used to transport the characters Mr. Peabody and Sherman back in time to visit important events in human history. In a world where there are more than 1.7 billion websites, with the number climbing dramatically by the day, how can anyone possibly hope to

02:35

catalog so many web pages? The Wayback Machine uses what are called crawlers, a type of software that automatically moves through the web, taking snapshots of billions of sites as it goes. Some of the process is automated, but many of the requests are generated manually by a network of librarians who prioritize certain types of sites that they think are important to preserve for posterity and for future generations. The crawlers don't capture every iteration of sites. The frequency

03:04

of snapshots differs by a site's importance. Very significant sites might be recorded every few hours. Others might be logged weeks or months apart. Most aren't logged at all. So don't worry: that embarrassing fan website you made in high

03:17

school is probably long gone by now. The Wayback Machine aims to capture snapshots of important content, say, the breaking news headlines created by major media companies. Furthermore, it doesn't necessarily recreate the entire site, and it doesn't preserve the data in a way that you'd experience it with your browser. It may only capture a few images of a few pages and not preserve content that's linked to

03:41

other sites outside of the domain. But on a more practical level, you've probably had the experience of clicking on a link on a web page and getting a 404 or "page not found" notation, and now you're wondering what was on the page originally. That's where the

03:56

Wayback Machine can help. To use the Wayback Machine, go to archive.org/web. Type the URL of the site you want to investigate in the Browse History search bar, and in the results you'll see a chronological bar graph that shows how many times the site was crawled and saved in a given year. Click the year, and below you'll see a twelve-month calendar with various dates highlighted. Blue highlights mean the site was saved properly; red means it was not. Click one of the highlighted

04:24

dates and the site's snapshots will appear. Click on one of those snapshots, and just like that, you've traveled back in time to that older version of the site. If you want to make sure that a particular site is recorded to the archive, you can do so manually. Use the Save Page Now option to save a specific page once, but realize that doing so only saves that one page, not an entire website, and it doesn't guarantee

04:47

that the site will be crawled in the future. And if content owners want their material excluded from the Wayback Machine, they can submit a request by sending an email to info@archive.org. Graham says that the most amazing thing about the Wayback Machine is that it exists at all, and how much of the public web

05:04

it's able to preserve, given that it has such a small budget and team. They do use volunteers as well. He said, "With more support, we can do an even better job of backing up more of the public web. Funding for the Internet Archive and the Wayback Machine comes from a combination of earned income from our subscription-based web archiving service, archive-it.org, major donors and foundations, as well as contributions from more than a

05:27

hundred thousand individual donors. We love being able to give away our services and don't run ads on our web pages." He's sure that the Wayback Machine will become even more important in the future. Quote: As the nature of how people communicate and share information evolves, so too we will need to build technologies, processes, and partnerships to continue to do the best job we can to preserve as

05:50

much of this public information as possible, all in support of the Wayback Machine's mission to help make the web more useful and reliable, and in particular, to help support journalists, activists, academics, historians, researchers, and the general public. Today's episode was written by Nathan Chandler and produced by Tyler Klang. BrainStuff is a production of iHeartRadio's

06:14

HowStuffWorks. For more on this and lots of other well-archived topics, visit our home planet, howstuffworks.com. And for more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.
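
Editor's sketch (not part of the episode): the lookup and Save Page Now steps described above can also be scripted. The minimal Python below assumes the Internet Archive's public availability endpoint at archive.org/wayback/available and the web.archive.org/save/ URL prefix. The function names are hypothetical, the sample response is illustrative rather than real data, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build a query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is an optional string of digits such as "20060101" (YYYYMMDD).
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def save_page_now_url(url):
    """Build the address that asks the archive to capture one page now."""
    return "https://web.archive.org/save/" + url


# Illustrative response shape only; the values are made up for this sketch.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
})

if __name__ == "__main__":
    print(availability_query("example.com", "20060101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
    print(save_page_now_url("http://example.com/"))
```

To run this against the live service, the query string could be fetched with urllib.request.urlopen and its body passed to closest_snapshot. As the episode notes, saving a page this way captures only that one page, not the whole site.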

Transcript source: Provided by creator in RSS feed
