Disseminate: The Computer Science Research Podcast

Jack Waudby | shows.acast.com

This podcast features interviews with Computer Science researchers. Hosted by Dr. Jack Waudby, it interviews researchers about the problem(s) they tackled, the solutions they developed, and how their findings can be applied in practice. The podcast is for industry practitioners, researchers, and students; it aims to further narrow the gap between research and practice and to make awesome Computer Science research more accessible. We have two types of episode: (i) Cutting Edge (red/blue logo), where we talk to researchers about their latest work, and (ii) High Impact (gold/silver logo), where we talk to researchers about their influential work.


You can support the show through Buy Me a Coffee. A donation of $3 will help us keep making you awesome Computer Science research podcasts.

Hosted on Acast. See acast.com/privacy for more information.

Episodes

Haoran Ma | MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime | #18

Summary: Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprint and improving machines’ resource utilization. In this episode Haoran Ma tells us about the problems with current far-memory techniques: they focus on OS-level optimizations and are agnostic to the managed runtimes and garbage collection (GC) underneath applications written in high-level languages. Owing to differen...

Jan 16, 2023 | 44 min | Season 3 | Ep. 3

Lexiang Huang | Metastable Failures in the Wild | #17

Summary: In this episode Lexiang Huang talks about a framework for understanding a class of failures in distributed systems called metastable failures. Lexiang tells us about his study on the prevalence of such failures in the wild and how he and his colleagues scoured publicly available incident reports from many organizations, ranging from hyperscalers to small companies. Listen to the episode to find out about his main findings and gain a deeper understanding of metastable failures and h...

Jan 09, 2023 | 53 min | Season 3 | Ep. 2

Andrew Quinn | Debugging the OmniTable Way | #16

Summary: Debugging is time-consuming, accounting for roughly 50% of a developer's time. In this episode Andrew Quinn tells us about the OmniTable, an abstraction that captures all execution state as a large queryable data table. In his research Andrew has built a query model around an OmniTable that supports SQL to simplify debugging. An OmniTable decouples debugging logic from the original execution, which SteamDrill, Andrew's prototype, uses to reduce the performance overhead of debugging (Ste...

Jan 02, 2023 | 58 min | Season 3 | Ep. 1

Audrey Cheng | TAOBench: An End-to-End Benchmark for Social Network Workloads | #15

Summary: This episode features Audrey Cheng talking about TAOBench, a new benchmark that captures the social graph workload at Meta. Audrey tells us about the features of the workload, how it compares with other benchmarks, and how it fills a gap in the existing space of benchmarks. We also hear all about the fantastic real-world impact the benchmark has already had across a range of companies.

Dec 12, 2022 | 53 min | Season 2 | Ep. 5

George Konstantinidis | Enabling Personal Consent in Databases | #14

Summary: Users have the right to consent to the use of their data, but current methods are limited to very coarse-grained expressions of consent, such as “opt-in/opt-out” choices for certain uses. In this episode, George talks about how he and his group identified the need for fine-grained consent management and how they formalized the expression and management of user consent and personal contracts of data usage in relational databases. Their approach enables data owners to express the intended data usage ...

Dec 05, 2022 | 56 min | Season 2 | Ep. 4

Per Fuchs | Sortledton: a Universal, Transactional Graph Data Structure | #13

Summary (VLDB abstract): Despite the wide adoption of graph processing across many different application domains, there is no underlying data structure that can serve a variety of graph workloads (analytics, traversals, and pattern matching) on dynamic graphs with transactional updates. In this episode, Per talks about Sortledton, a universal graph data structure that addresses this open problem by carefully optimizing for the most relevant data access patterns used by graph computation ker...

Nov 28, 2022 | 41 min | Season 2 | Ep. 3

George Theodorakis | Scabbard: Single-Node Fault-Tolerant Stream Processing | #12

Summary (VLDB abstract): Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data afte...

Nov 21, 2022 | 46 min | Season 2 | Ep. 2

Kevin Gaffney | SQLite: Past, Present, and Future | #11

Summary: In this episode Kevin Gaffney tells us about SQLite, the most widely deployed database engine in existence. SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), e...

Nov 14, 2022 | 48 min | Season 2 | Ep. 1

Matthias Jasny | P4DB - The Case for In-Network OLTP | #10

Summary: In this episode Matthias Jasny from TU Darmstadt talks about P4DB, a database that uses a programmable switch to accelerate OLTP workloads. The main idea of P4DB is that it implements a transaction processing engine on top of a P4-programmable switch. The switch can thus act as an accelerator in the network, especially when it is used to store and process hot (contended) tuples on the switch. P4DB provides significant benefits compared to traditional DBMS architectures and can achieve a...

Aug 08, 2022 | 27 min | Season 1 | Ep. 10

Tobias Ziegler | ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA | #9

Summary: In this episode Tobias talks about his work on ScaleStore, a distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability. Using low-latency RDMA messages, ScaleStore implements a transparent memory abstraction that provides access to the aggregated DRAM and NVMe storage of all nodes. In contrast to existing distributed RDMA designs such as NAM-DB or FaRM, ScaleStore stores cold data on NVMe...

Aug 01, 2022 | 23 min | Season 1 | Ep. 9

Chuzhe Tang | Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly | #8

Summary: Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. In this episode, Chuzhe tells us about these ad hoc transactions: database operations coordinated by application code. Until Chuzhe’s work, little was known about them. He chats about the first comprehensive study on ad hoc transactions. By studying 91 ad hoc transac...

Jul 25, 2022 | 32 min | Season 1 | Ep. 8

Michael Abebe | Proteus: Autonomous Adaptive Storage for Mixed Workloads | #7

Summary: Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain a complete copy of data in row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in column-oriented storage format optimised for OLAP workloads. Maintaining these data copies consumes significant stora...

Jul 18, 2022 | 28 min | Season 1 | Ep. 7

Hani Al-Sayeh | Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications | #6

Summary: Distributed in-memory processing frameworks accelerate iterative workloads by caching suitable datasets in memory rather than recomputing them in each iteration. Selecting appropriate datasets to cache as well as allocating a suitable cluster configuration for caching these datasets play a crucial role in achieving optimal performance. In practice, both are tedious, time-consuming tasks and are often neglected by end users, who are typically not aware of workload semantics, sizes of int...

Jul 11, 2022 | 32 min | Season 1 | Ep. 6

Thomas Hütter | JEDI: These aren’t the JSON documents you’re looking for | #4

Summary: JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this interview, Thomas talks about how he addressed the problem of JSON similarity lookup queries: given a query document and a distance threshold, retrieve all documents that are within the threshold from the query document, i.e., “get me all similar documents!” Unlike other hierarchical formats such as XML, JSON supports both ordered and unordered ...

Jul 08, 2022 | 12 min | Season 1 | Ep. 4

Sainyam Galhotra | Causal Feature Selection for Algorithmic Fairness | #5

Summary: The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high-quality training data, most of the fairness literature ignores this stage. In this interview Sainyam discusses why he focuses on fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. ...

Jul 08, 2022 | 12 min | Season 1 | Ep. 5

Draco Xu | TSUBASA: Climate Network Construction on Historical and Real-Time Data | #3

Summary: A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied to climate data to study the dynamics of a climate network. The core task and first step to enable interactive network science on climate data is the efficient construction and update of a climate network on user-defined time-windows. In this interview Draco talks about TSUBASA, an algorithm for the efficient construction of climate networks based ...

Jul 04, 2022 | 17 min | Season 1 | Ep. 3

Felix S Campbell | Efficient Answering of Historical What-if Queries | #2

Summary: In this interview Felix discusses "historical what-if queries", a novel type of what-if analysis that determines the effect of a hypothetical change to the transactional history of a database. For example, “how would revenue be affected if we would have charged an additional $6 for shipping?” In his research Felix has developed efficient techniques for answering these historical what-if queries, i.e., determining how a modified history affects the current database state. During the show...

Jul 01, 2022 | 19 min | Season 1 | Ep. 2

Alex Isenko | Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines | #1

Summary: Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a c...

Jun 27, 2022 | 25 min | Season 1 | Ep. 1

Coming Soon | ACM SIGMOD/PODS 2022 | #0

Welcome to Disseminate! The podcast bringing you the cutting edge of Computer Science research in a digestible format. Each series will focus on papers published at a specific Computer Science conference, e.g., SIGMOD, CVPR, so we will cover a wide range of topics from distributed systems to computer vision. Each episode within a series will feature an interview with the author(s) of a paper published at that conference. The podcast aims to be an alternative source of information for industry p...

Jun 03, 2022 | 2 min