Scale Cast – A podcast about big data, distributed systems, and scalability

A podcast about big data, distributed systems, and scalability

Episodes

An Introduction to ZooKeeper Video

In 2006 we were building distributed applications that needed a master, aka coordinator, aka controller, to manage the subprocesses of the applications. It was a scenario that we had encountered before and something that we saw repeated over and over again inside and outside of Yahoo!. For example, we have an application that consists of a bunch of processes. Each process needs to be aware of the other processes in the system. The processes need to know how requests are partitioned among the processes....

Apr 26, 2008

More Optimal Bloom Filters

The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives. For example, one might use a Bloom filter to do spell-checking in a space-ef...

Apr 18, 2008
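
The behavior described in the Bloom filter entry above (elements can be added but not removed, and a lookup may return a false positive but never a false negative) can be illustrated with a minimal sketch. This is not taken from the episode; the class name, the default sizes, and the use of salted SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: supports add and membership test, no removal."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # num_bits and num_hashes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit selected by the item's hash positions.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


words = BloomFilter()
for w in ("scale", "cluster", "filter"):
    words.add(w)

print(words.might_contain("cluster"))  # True: an added item is never reported absent
```

A lookup for a word that was never added usually returns False, but it can return True when other words happen to have set the same bits. That chance grows as more elements are added, which is the trade-off the description mentions.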

An Overview of High Performance Computing and Challenges for the Future

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and mem...

Apr 08, 2008

Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing

This talk takes us on a journey through three varied, but interconnected topics. First, our research lab has engaged in a series of disk-based computations extending over five years. Disks have traditionally been used for filesystems, for virtual memory, and for databases. Disk-based computation opens up an important fourth use: an abstraction for multiple disks that allows parallel programs to treat them in a manner similar to RAM. The key observation is that 50 disks have approximately the sam...

Mar 29, 2008

Lecture 1: Cluster Computing and MapReduce

Lecture 1 in a five-part series introducing MapReduce and cluster computing. See http://code.google.com/edu/… for slides and other resources. Link to video

Jan 03, 2008