Streaming Audio: Apache Kafka® & Real-Time Data

Confluent, founded by the original creators of Apache Kafka® | developer.confluent.io

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, ranging from Kafka connectors and distributed systems to data mesh, data integration, and modern data architectures built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®️, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Episodes

KIP-500: Apache Kafka Without ZooKeeper ft. Colin McCabe and Jason Gustafson

Tim Berglund sits down with Colin McCabe and Jason Gustafson to talk about KIP-500. The pair, who work on the Kafka Core Engineering Team, discuss the history of Kafka, the creation of KIP-500, and what it will do for the community as a whole. They break down ZooKeeper's role in Kafka, the implications of removing the ZooKeeper dependency and replacing it with a self-managed metadata quorum, and how they've been combatting security, stability, and compatibility issues. With pending improvemen...

Sep 18, 2019 | 44 min | Season 1 | Ep. 55

Should You Run Apache Kafka on Kubernetes? ft. Balthazar Rouberol

When it comes to deploying applications at scale without needing to integrate different pieces of infrastructure yourself, the answer nowadays is increasingly Kubernetes. Kubernetes provides all the building blocks that are needed, but a lot of thought is still required to truly create an enterprise-grade Apache Kafka® platform that can be used in production. Before running Kafka on Kubernetes, there are some factors to consider. What are the maturing stages of Kubernetes adoption? How did Datadog...

Sep 16, 2019 | 30 min | Season 1 | Ep. 54

Jay Kreps on the Last 10 Years of Apache Kafka and Event Streaming

As Confluent turns five years old, special guest Jay Kreps (Co-founder and CEO, Confluent) brings us back to his early development days of coding Apache Kafka® over a Christmas holiday while working at LinkedIn. Kafka has become a breakthrough open source distributed streaming platform based on an abstraction of the distributed commit log, and his involvement in the project eventually led him to start Confluent with Jun Rao and Neha Narkhede. In this episode, Jay shares about all the highs and l...

Sep 12, 2019 | 48 min | Season 1 | Ep. 53

Connecting to Apache Kafka with Neo4j

What’s a graph? How does Cypher work? In today's episode of Streaming Audio, Tim Berglund sits down with Michael Hunger (Lead of Neo4j Labs) and David Allen (Partner Solution Architect, Neo4j) to discuss Neo4j basics and get the scoop on major features introduced in Neo4j 3.4 and 3.5. Among these are geospatial and temporal types, but there’s also more to come in 4.0: a multi-database feature, fine-grained security, and reactive drivers/Spring Data Neo4j RX. In addition to sharing a little ...

Sep 09, 2019 | 54 min | Season 1 | Ep. 52

Ask Confluent #15: Attack of the Zombie Controller

Gwen Shapira (Core Kafka Software Engineer, Confluent) sits down to answer the questions you've had about event streaming, Apache Kafka®, Confluent, and everything in between. This includes creating tables in nested JSON topics, how to balance ordering, latency and reliability, building event-based systems, and how to navigate the tricky endOffsets API. She talks about the hardships of fencing Zombie requests, some of the talks given at previous Kafka Summits, and an important question from...

Sep 04, 2019 | 22 min | Season 1 | Ep. 51

Helping Healthcare with Apache Kafka and KSQL ft. Ramesh Sringeri

In today’s episode of Streaming Audio, Tim Berglund sits down with Senior Applications Developer of Mobile Solutions Ramesh Sringeri to discuss Apache Kafka®—specifically two Kafka use cases that Children’s Healthcare of Atlanta is working on. First, they discuss achieving near-real-time streams of data to support meaningful intracranial pressure prediction and managing intracranial pressure (ICP) in a timely manner to help the care team achieve better outcomes with traumatic brain injuries. Chi...

Aug 28, 2019 | 53 min | Season 1 | Ep. 50

Contributing to Open Source with the Kafka Connect MongoDB Sink ft. Hans-Peter Grahsl

Sink and source connectors are important for getting data in and out of Apache Kafka®. Tim Berglund invites Hans-Peter Grahsl (Technical Trainer and Software Engineer, Netconomy Software & Consulting GmbH) to share about his involvement in the Apache Kafka project, spanning from several conference contributions all the way to his open source community sink connector for MongoDB, now part of the official MongoDB Kafka connector code base. Join us in this episode to learn what it’s like to be ...

Aug 21, 2019 | 50 min | Season 1 | Ep. 49

Teaching Apache Kafka Online with Stéphane Maarek

Streaming Audio welcomes Stéphane Maarek (CEO, Datacumulus) on the podcast to discuss how he got started hosting online Apache Kafka® tutorials and teaching on Udemy, the challenges he faces as an instructor, his approach to answering hard questions, and the projects he is currently working on. EPISODE LINKS KSQL Training for Hands-On Learning Join the Confluent Community Slack...

Aug 19, 2019 | 42 min | Season 1 | Ep. 48

Connecting Apache Cassandra to Apache Kafka with Jeff Carpenter from DataStax

Whenever you see an Apache Cassandra™ in the wild, you probably also see an Apache Kafka®️. In this episode, Tim Berglund (Senior Director of Developer Experience, Confluent) and Jeff Carpenter (Director of Developer Advocacy, DataStax) discuss the best way to get those systems talking using the DataStax Apache Kafka Connector and build a real-time data pipeline. EPISODE LINKS About the DataStax Apache Kafka Connector DataStax Academy: DataStax Apache Kafka Connector Course Join the Confluent Co...

Aug 12, 2019 | 48 min | Season 1 | Ep. 47

Transparent GDPR Encryption with David Jacot

The General Data Protection Regulation (GDPR) has challenged many enterprises to rethink how they deal with customer data. Viktor Gamov chats with David Jacot about a unique approach to inter-broker traffic encryption that he implemented for his customer’s sidecar pattern use case. EPISODE LINKS Learn about Istio Learn about Envoy Learn about Linkerd Handling GDPR with Apache Kafka®: How to Comply Without Freaking Out? Join the Confluent Community Slack...

Aug 08, 2019 | 17 min | Season 1 | Ep. 46

Confluent Platform 5.3 | What's New in This Release

A quick summary of the most important features in Confluent Platform 5.3. We discuss improved Kubernetes and Ansible support, improvements to Confluent Control Center that give you better insight into the data in your cluster, and an important new set of security features—Role-Based Access Control—aimed at making complex deployments more secure. EPISODE LINKS Read the docs Read the blog Watch the video version of this podcast (featuring an actual stream) Download Confluent Platform 5.3 Join us i...

Jul 31, 2019 | 13 min | Season 1 | Ep. 45

How to Convert Python Batch Jobs into Kafka Streams Applications with Rishi Dhanaraj

Zenreach is a company that makes tools to help retailers use digital marketing more effectively. If that sounds like a problem that only marketing people would be interested in, that’s because you don’t know what they do! There are all kinds of fascinating technology problems to solve by utilizing event streaming platforms to process data at volume. Rishi Dhanaraj, our guest today, worked at Zenreach as an intern, and took on a big pile of Python batch jobs, turning them into some really interes...

Jul 29, 2019 | 31 min | Season 1 | Ep. 44

Ask Confluent #14: In Control of Kafka with Dan Norwood

Is Apache Kafka® actually a database? Can you install Confluent Control Center on Google Cloud Platform (GCP)? All this, plus some tips from Dan Norwood, the first user of Kafka Streams. EPISODE LINKS Control Center Docker image Control Center Docker configuration Complete Streams example Watch the video version of this podcast Join us in Confluent Community Slack...

Jul 22, 2019 | 24 min | Season 1 | Ep. 43

Kafka in Action with Dylan Scott

Author Dylan Scott tells all about his upcoming Manning title Kafka in Action, which shows how beginners who are just starting their own projects can use Apache Kafka® and dispels common Hadoop-related myths, as Kafka has grown to become a powerful event streaming platform beyond big data ecosystems alone. To get 40% off Manning products, use the following code: podcon19 EPISODE LINKS Join us in Confluent Community Slack...

Jul 15, 2019 | 38 min | Season 1 | Ep. 42

Change Data Capture with Debezium ft. Gunnar Morling

Friends don’t let friends do dual writes! Gunnar Morling (Software Engineer, Red Hat) joins us on the podcast to share a little bit about what Debezium is, how it works, and which databases it supports. In addition to covering the various use cases and benefits from change data capture (CDC) in the context of microservices—touching on the outbox pattern in particular, Gunnar walks us through the advantages of log-based CDC as implemented through Debezium over polling-based approaches, why you’d ...

Jul 10, 2019 | 49 min | Season 1 | Ep. 41

Distributed Systems Engineering with Apache Kafka ft. Jason Gustafson

Ever wonder what it’s like to be a distributed systems engineer at Confluent? Core Kafka Engineer Jason Gustafson dives into the challenges of working on distributed systems, particularly when it comes to a unique system like Apache Kafka®. He also discusses ways in which Confluent is working with the community to solve active problems and what it takes to be a distributed systems engineer. As always, Confluent is looking for engineers who are interested in distributed systems, and you don’t hav...

Jul 02, 2019 | 46 min | Season 1 | Ep. 40

Apache Kafka 2.3 | What's New in This Release + Updates and KIPs

Tim Berglund (Senior Director of Developer Experience, Confluent) explains what’s new in Apache Kafka® 2.3 and highlights some of the most important Kafka Improvement Proposals (KIPs). EPISODE LINKS Read the blog Watch the video version of this podcast

Jun 25, 2019 | 14 min | Season 1 | Ep. 39

Rolling Kafka Upgrades and Confluent Cloud ft. Gwen Shapira

If you operate a Kafka cluster, hopefully you upgrade your brokers occasionally. Each release of Apache Kafka® includes detailed documentation that describes a tested procedure for doing a rolling upgrade of your cluster. Couldn’t be easier, right? Well, what if you have to do it with hundreds or thousands of brokers, such as you’d have to do if you were running Confluent Cloud? Today, Gwen Shapira shares some of the lessons she’s learned doing just that. EPISODE LINKS Fully managed Apache Kafka...

Jun 25, 2019 | 43 min | Season 1 | Ep. 38

Deploying Confluent Platform, from Zero to Hero ft. Mitch Henderson

Mitch Henderson (Technical Account Manager, Confluent) explains how to plan and deploy your first application running on Confluent Platform. He covers critical factors to consider, like the tools and skills you should have on hand, and how to make decisions about deployment solutions. Mitch also walks you through how to go about setting up monitoring and testing, the marks of success, and what to do after your first project launches successfully.

Jun 18, 2019 | 33 min | Season 1 | Ep. 37

Why Kafka Connect? ft. Robin Moffatt

In this episode, Tim talks to Robin Moffatt about what Kafka Connect is and why you should almost certainly use it if you're working with Apache Kafka®️. Whether you're building database offload pipelines to Amazon S3, ingesting events from external datastores to drive your applications or exposing messages from your microservices for audit and analysis, Kafka Connect is for you. Tim and Robin cover the motivating factors for Kafka Connect, why people end up reinventing the wheel when ...

Jun 12, 2019 | 47 min | Season 1 | Ep. 36

Schema Registry Made Simple by Confluent Cloud ft. Magesh Nandakumar

Tim Berglund and Magesh Nandakumar (Software Engineer, Confluent) discuss why schemas matter for building systems on Apache Kafka®, and how Confluent Schema Registry helps with the problem. They talk about how Schema Registry works, how you can collaborate around schema change through `avsc` files, and what it means for this to be available in Confluent Cloud today. EPISODE LINKS Schema Registry 101 Schema Management Migrate Schemas to Confluent Cloud Schemas, Contracts, and Compatibility Fully ...

Jun 03, 2019 | 42 min | Season 1 | Ep. 35

Why is Stream Processing Hard? ft. Michael Drogalis

Tim Berglund and Michael Drogalis (Product Lead for Kafka Streams and KSQL, Confluent) talk about all things stream processing: why it’s complex, how it's evolved, and what’s on the horizon to make it simpler.

May 29, 2019 | 46 min | Season 1 | Ep. 34

Testing Kafka Streams Applications with Viktor Gamov

Tim Berglund is joined by Viktor Gamov (Developer Advocate, Confluent) to discuss various approaches to testing Kafka Streams applications. EPISODE LINKS KafkaEmbedded TopologyTestDriver Mocked Streams (Scala) Mockafka Test containers Kafka containers...

May 20, 2019 | 43 min | Season 1 | Ep. 33

Chris Riccomini on the History of Apache Kafka and Stream Processing

It’s a problem endemic to the tech world that we are so focused on what’s coming next that we often forget to look at where we’ve been. Chris Riccomini, who was there at LinkedIn when Apache Kafka® was born, tells us how Kafka and the stream processing framework Samza came about, and also what he’s doing these days at WePay: building systems that use Kafka as a primary datastore. EPISODE LINKS When It Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka So, You Want to Bu...

May 16, 2019 | 51 min | Season 1 | Ep. 32

Ask Confluent #13: Machine Learning with Kai Waehner

Gwen and Kai chat about machine learning architectures, and whether software engineers and data scientists can learn to get along. EPISODE LINKS Blogs on deploying machine learning workloads: Machine Learning with Python, Jupyter, KSQL and TensorFlow How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka Using Apache Kafka to Drive Cutting-Edge Machine Learning KIP-392: Allow consumers to fetch from closest replica Watch the video version of this podcast...

May 08, 2019 | 33 min | Season 1 | Ep. 31

Diving into Exactly Once Semantics with Guozhang Wang

It has been said that in distributed messaging, there are two hard problems: 2) exactly once delivery, 1) guaranteed order of messages and 2) exactly once delivery. Apache Kafka® has offered exactly once processing since version 0.11, which allows properly configured producers and consumers to make the guarantee that each message will be processed exactly one time. In this episode, Kafka Streams engineer Guozhang Wang walks through the implementation of transactional messaging in Kafka in some d...

Apr 22, 2019 | 48 min | Season 1 | Ep. 29
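
The exactly-once episode above refers to "properly configured producers and consumers." As a rough illustration of what such configuration can look like, the sketch below builds two plain `java.util.Properties` objects using standard Kafka client configuration keys. The class name, the broker address `localhost:9092`, and the `demo-*` identifiers are placeholders invented for this example and do not come from the episode.

```java
import java.util.Properties;

// Illustrative sketch only: class name, broker address, and IDs are assumptions.
class ExactlyOnceConfigSketch {

    // Producer side: idempotence plus a transactional.id lets a producer write transactionally.
    static Properties producerProps(String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("enable.idempotence", "true");          // broker de-duplicates retried sends
        props.setProperty("acks", "all");                         // wait for all in-sync replicas
        props.setProperty("transactional.id", transactionalId);   // stable ID across producer restarts
        return props;
    }

    // Consumer side: read_committed hides records from aborted or still-open transactions.
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", groupId);
        props.setProperty("isolation.level", "read_committed");
        props.setProperty("enable.auto.commit", "false");         // offsets are committed explicitly
        return props;
    }

    public static void main(String[] args) {
        // Print both configurations; iteration order of Properties is not guaranteed.
        producerProps("demo-tx-1").forEach((k, v) -> System.out.println("producer " + k + "=" + v));
        consumerProps("demo-group").forEach((k, v) -> System.out.println("consumer " + k + "=" + v));
    }
}
```

In a real application these properties would be passed, together with serializer settings, to the Kafka client's `KafkaProducer` and `KafkaConsumer`, and the producer would wrap its sends in `initTransactions()`, `beginTransaction()`, and `commitTransaction()`. The sketch stops at configuration so that it runs without a broker or the client library.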

Ask Confluent #12: In Search of the Lost Offsets

Stanislav Kozlovski joins us to discuss common pitfalls when using Kafka consumers and a new KIP that promises to make consumer restarts much smoother. EPISODE LINKS KIP-345: Static consumer membership KIP-211: Documents the current behavior of offset expiration Watch the video version of this podcast...

Apr 17, 2019 | 22 min | Season 1 | Ep. 28

Ben Stopford on Microservices and Event Streaming

Microservices are pretty ubiquitous these days. Really “SOA done right,” they reimagine the services pattern in the context of the world we live in today, nearly two decades since the first big service-oriented systems hit production. But what have we learned in this time? There are plenty of war stories. System designers have explored different architectural patterns—REST, events and databases of all types. In this podcast, Tim Berglund and Ben Stopford explore the event-driven paradigm and how...

Apr 08, 2019 | 58 min | Season 1 | Ep. 27

Magnus Edenhill on librdkafka 1.0

After several years of development, librdkafka has finally reached 1.0! It remains API compatible with older versions of the library, so you won’t need to make any changes to your application. There are, however, several important new features like the idempotent producer, sparse broker connections, support for the vaunted KIP-62 and a complete makeover for the C#/.NET client. EPISODE LINKS librdkafka v1.0.0 release notes...

Apr 03, 2019 | 47 min | Season 1 | Ep. 26

Ask Confluent #11: More Services, More Metrics, More Fun

Do metrics for detecting clients from old versions actually exist? Or is Gwen making features up? This and more useful advice is coming up on today's episode of Ask Confluent. EPISODE LINKS The Java property that will refresh DNS cache frequently: java.security.Security.setProperty("networkaddress.cache.ttl", "60"); Improvements to DNS lookups in Confluent Platform 5.1.2 (Apache Kafka 2.1.1): KAFKA-7755 KAFKA-7890 More reasons to upgrade to Confluent Platform 5.1.2 Monitoring clients ...

Mar 26, 2019 | 14 min | Season 1 | Ep. 25
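
The Ask Confluent #11 notes above quote a Java security property for refreshing the DNS cache. A minimal, self-contained sketch of setting that property and reading it back might look like the following. The class and method names are made up for illustration, and the 60-second value is simply the one quoted in the episode links.

```java
import java.security.Security;

// Illustrative sketch only: class and method names are assumptions.
class DnsCacheTtlSketch {

    // Sets the JVM-wide TTL (in seconds) for successful DNS lookups and returns the stored value.
    static String setDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // Set the TTL early in startup, before any client opens a network connection.
        System.out.println("networkaddress.cache.ttl = " + setDnsCacheTtl(60));
    }
}
```

The property generally needs to be set before the JVM performs its first hostname lookup, so a call like this would typically sit at the very start of `main`, ahead of any Kafka client construction.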