Streaming Audio: Apache Kafka® & Real-Time Data

Confluent, founded by the original creators of Apache Kafka®•developer.confluent.io

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, from Kafka connectors and distributed systems to data integration, modern data architectures, and data mesh built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Episodes

Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole

Data availability, usability, integrity, and security are words that we hear a lot. But what do they actually look like when put into practice? That’s where data governance comes in. This becomes especially tricky when working with real-time data architectures. Tushar Thole (Senior Manager, Engineering, Trust & Security, Confluent) focuses on delivering features for software-defined storage, software-defined networking (SD-WAN), security, and cloud-native domains. In this episode, ...

Mar 22, 2022•43 min•Ep 205•Transcript available on Metacast

Handling 2 Million Apache Kafka Messages Per Second at Honeycomb

How many messages can Apache Kafka® process per second? At Honeycomb, it's easily over one million messages. In this episode, get a taste of how Honeycomb uses Kafka at massive scale. Liz Fong-Jones (Principal Developer Advocate, Honeycomb) explains how Honeycomb manages Kafka-based telemetry ingestion pipelines and scales Kafka clusters. And what is Honeycomb? Honeycomb is an observability platform that helps you visualize, analyze, and improve cloud application quality and performance. Th...

Mar 15, 2022•42 min•Ep 204•Transcript available on Metacast

Why Data Mesh? ft. Ben Stopford

With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh. Unlike standard data architecture, data mesh is about moving data away from a monolithic data warehouse into distributed data systems. Doing so will allow data ...

Mar 10, 2022•45 min•Ep 203•Transcript available on Metacast

Serverless Stream Processing with Apache Kafka ft. Bill Bejeck

What is serverless? Having worked as a software engineer for over 15 years and as a regular contributor to Kafka Streams, Bill Bejeck (Integration Architect, Confluent) is an Apache Kafka® committer and author of “Kafka Streams in Action.” In today’s episode, he explains what serverless is and the architectural concepts behind it. To clarify, serverless doesn’t mean you can run an application without a server; there are still servers in the architecture, but they are abstracted away from your ap...

Mar 03, 2022•42 min•Ep 202•Transcript available on Metacast

The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps

When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload. Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the ...

Feb 24, 2022•47 min•Ep 201•Transcript available on Metacast

What’s Next for the Streaming Audio Podcast ft. Kris Jenkins

Meet your new host of the Streaming Audio podcast: Kris Jenkins (Senior Developer Advocate, Confluent)! In this preview, Kris shares a few highlights from forthcoming episodes to look forward to, spanning topics from data mesh, cloud-native technologies, and serverless Apache Kafka®, to data modeling. As a developer advocate, Kris is endlessly fascinated by software design, functional programming, real-time systems, and electronic music. He is a veteran software developer and engineer, with a...

Feb 16, 2022•3 min•Ep 200•Transcript available on Metacast

On to the Next Chapter ft. Tim Berglund

After nearly 200 podcast episodes of Streaming Audio, Tim Berglund bids farewell in his last episode as host of the show. Tim reflects on the many great memories with guests who have appeared on the segment—and each for its own reasons. He has covered a wide variety of topics, ranging from Apache Kafka® fundamentals, microservices, event stream processing, use cases, to cloud-native Kafka, data mesh, and more. As Tim mentions, the Streaming Audio podcast will continue on to explore all things ab...

Feb 03, 2022•7 min•Ep 199•Transcript available on Metacast

Intro to Event Sourcing with Apache Kafka ft. Anna McDonald

What is event sourcing and how does it work? Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern. Anna is passionate about event-driven architectures and event patterns. She’s a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage wit...

Feb 01, 2022•30 min•Ep 198•Transcript available on Metacast

Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela

In an effort to make Apache Kafka® cloud native, Anna Povzner (Principal Engineer, Confluent) and Anastasia Vela (Software Engineer I, Confluent) have been working to expand multi-tenancy to cloud-native systems with automated capacity planning and scaling in Confluent Cloud. They explain how cloud-native data systems are different from legacy databases and share the technical requirements needed to create multi-tenancy for managed Kafka as a service. As a distributed system, Kafka is designed ...

Jan 27, 2022•31 min•Ep 197•Transcript available on Metacast

Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs

Apache Kafka® 3.1 is here with exciting new features and improvements! On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights that you won’t want to miss, including foreign-key joins in Kafka Streams and improvements that will provide consistency for Kafka latency metrics. KAFKA-13439 deprecates the eager rebalancing protocol in Kafka Streams, where the cooperative protocol has been the default since Kafka 2.4. It’s advised to upgrade your applications to the cooperative protocol as the eager p...

Jan 24, 2022•5 min•Ep 196•Transcript available on Metacast

Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra

Maximizing cloud Apache Kafka® performance isn’t just about running data processes on cloud instances. There is a lot of engineering work required to set and maintain a high-performance standard for speed and availability. Alok Nikhil (Senior Software Engineer, Confluent) and Adithya Chandra (Staff Software Engineer II, Confluent) share their efforts to optimize Kafka on Confluent Cloud and the three guiding principles that they follow whether you are self-managing Kafka or working ...

Jan 20, 2022•31 min•Ep 195•Transcript available on Metacast

From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine

Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) describes the development process and how ksqlDB and Kafka Connect became instrumental to the implementation. By moving away f...

Jan 13, 2022•30 min•Ep 194•Transcript available on Metacast

Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik

Getting data from a database management system (DBMS) into Apache Kafka® in real time is a subject of ongoing innovation. John Neal (Principal Solution Architect, Qlik) and Adam Mayer (Senior Technical Product Marketing Manager, Qlik) explain how leveraging change data capture (CDC) for data ingestion into Kafka enables real-time data-driven insights. It can be challenging to ingest data in real time. It is even more challenging when you have multiple data sources, including both traditional da...

Jan 06, 2022•35 min•Ep 193•Transcript available on Metacast

Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris

It’s been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience working and designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector. Previously, Kafka w...

Dec 28, 2021•35 min•Ep 192•Transcript available on Metacast

Running Hundreds of Stream Processing Applications with Apache Kafka at Wise

What’s it like building a stream processing platform with around 300 stateful stream processing applications based on Kafka Streams? Levani Kokhreidze (Principal Engineer, Wise) shares his experience building such a platform that the business depends on for multi-currency movements across the globe. He explains how his team uses Kafka Streams for real-time money transfers at Wise, a fintech organization that facilitates international currency transfers for 11 million customers. Getting to this p...

Dec 21, 2021•31 min•Ep 191•Transcript available on Metacast

Lessons Learned From Designing Serverless Apache Kafka ft. Prachetaa Raghavan

You might call building and operating Apache Kafka® as a cloud-native data service synonymous with a serverless experience. Prachetaa Raghavan (Staff Software Developer I, Confluent) spends his days focused on this very thing. In this podcast, he shares his learnings from implementing a serverless architecture on Confluent Cloud using Kubernetes Operator. Serverless is a cloud execution model that abstracts away server management, letting you run code on a pay-per-use basis without infrastructur...

Dec 14, 2021•28 min•Ep 190•Transcript available on Metacast

Using Apache Kafka as Cloud-Native Data System ft. Gwen Shapira

What does cloud native mean, and what are some design considerations when implementing cloud-native data services? Gwen Shapira (Apache Kafka® Committer and Principal Engineer II, Confluent) addresses these questions in today’s episode. She shares her learnings by discussing a series of technical papers published by her team, which explain what they’ve done to expand Kafka’s cloud-native capabilities on Confluent Cloud. Gwen leads the Cloud-Native Kafka team, which focuses on developing new fea...

Dec 07, 2021•34 min•Ep 189•Transcript available on Metacast

ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury

What is ksqlDB and how does Simon Aubury (Principal Data Engineer, Thoughtworks) use it to track down the plane that wakes his cat Snowy in the morning? Experienced in building real-time applications with ksqlDB since its genesis, Simon provides an introduction to ksqlDB by sharing some of his projects and use cases. ksqlDB is a database purpose-built for stream processing applications and lets you build real-time data streaming applications with SQL syntax. ksqlDB reduces the complexity of havi...

Dec 01, 2021•31 min•Ep 188•Transcript available on Metacast

Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger

Many of us find ourselves in the position of equipping others to use Apache Kafka® after we’ve gained an understanding of what Kafka is used for. But how do you communicate and teach others event streaming concepts effectively? As a Pluralsight instructor and business intelligence consultant, Eugene Meidinger shares tips for creating consumable training materials for conveying event streaming concepts to developers and IT administrators, who are trying to get on board with Kafka and stream proce...

Nov 23, 2021•29 min•Ep 187•Transcript available on Metacast

Handling Message Errors and Dead Letter Queues in Apache Kafka ft. Jason Bell

If you ever wondered what exactly dead letter queues (DLQs) are and how to use them, Jason Bell (Senior DataOps Engineer, Digitalis) has an answer for you. Dead letter queues are a feature of Kafka Connect that acts as the destination for failed messages due to errors like improper message deserialization and improper message formatting. Lots of Jason’s work is around Kafka Connect and the Kafka Streams API, and in this episode, he explains the fundamentals of dead letter queues, how to use them...

Nov 16, 2021•38 min•Ep 186•Transcript available on Metacast

Confluent Platform 7.0: New Features + Updates

Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and task, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements to the 7.0 release, including the ability to create a real-time bridge from on-premises environments t...

Nov 09, 2021•12 min•Ep 185•Transcript available on Metacast

Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck

Kafka Streams is a native streaming library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic’s messages and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) has found his calling in sharing knowledge and authoring his book, “Kafka Streams in Action.” As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Conflu...

Nov 04, 2021•36 min•Ep 184•Transcript available on Metacast

Automating Infrastructure as Code with Apache Kafka and Confluent ft. Rosemary Wang

Managing infrastructure as code (IaC) instead of using manual processes makes it easy to scale systems and minimize errors. Rosemary Wang (Developer Advocate, HashiCorp, and author of “Essential Infrastructure as Code: Patterns and Practices”) is an infrastructure engineer at heart and an aspiring software developer who is passionate about teaching patterns for infrastructure as code to simplify processes for system admins and software engineers familiar with Python, provisioning tools like Terr...

Oct 26, 2021•30 min•Ep 183•Transcript available on Metacast

Getting Started with Spring for Apache Kafka ft. Viktor Gamov

What’s the distinction between the Spring Framework and Spring Boot? If you are building a car, the Spring Framework is the engine while Spring Boot gives you the vehicle that you ride in. With experience teaching and answering questions on how to use Spring and Apache Kafka® together, Viktor Gamov (Principal Developer Advocate, Kong) designed a free course on Confluent Developer and previews it in this episode. Not only this, but he also explains why the opinionated Spring Framework would be a ...

Oct 19, 2021•33 min•Ep 182•Transcript available on Metacast

Powering Event-Driven Architectures on Microsoft Azure with Confluent

When you order a pizza, what if you knew every step of the process from the moment it goes in the oven to being delivered to your doorstep? Event-Driven Architecture is a modern, data-driven approach that describes “events” (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solution...

Oct 14, 2021•39 min•Ep 181•Transcript available on Metacast

Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes

Autonomy is key in building a sustainable and motivated team, and this core principle also applies to DevOps. Building self-serve Apache Kafka® and Confluent Platform deployments requires a streamlined process with unrestricted tools: a centralized processing tool that allows teams in large or mid-sized organizations to automate infrastructure changes while ensuring shared standards are met. With more than 15 years of engineering and technology consulting experience, Pere Urbón-Bayes (Senior Solut...

Oct 07, 2021•26 min•Ep 180•Transcript available on Metacast

Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt

Kafka Connect is a streaming integration framework between Apache Kafka® and external systems, such as databases and cloud services. With expertise in ksqlDB and Kafka Connect, Robin Moffatt (Staff Developer Advocate, Confluent) helps and supports the developer community in understanding Kafka and its ecosystem. Recently, Robin authored a Kafka Connect 101 course that will help you understand the basic concepts of Kafka Connect, its key features, and how it works. What’s Kafka Connect, and how d...

Sep 28, 2021•31 min•Ep 179•Transcript available on Metacast

Designing a Cluster Rollout Management System for Apache Kafka ft. Twesha Modi

As one of the top coders of her Java coding class in high school, Twesha Modi is continuing to follow her passion for computer science as a senior at Cornell University, where she has proven to be one of the top programmers. During Twesha's summer internship at Confluent, she contributed to designing a new service to automate Apache Kafka® cluster rollout management, a process that releases the latest Kafka versions to customers’ clusters in Confluent Cloud. During Twesha’s internship, she w...

Sep 23, 2021•30 min•Ep 178•Transcript available on Metacast

Apache Kafka 3.0 - Improving KRaft and an Overview of New Features

Apache Kafka® 3.0 is out! To spotlight major enhancements in this release, Tim Berglund (Apache Kafka Developer Advocate) provides a summary of what’s new in the Kafka 3.0 release from Krakow, Poland, including API changes and improvements to the early-access Kafka Raft (KRaft). KRaft is a built-in Kafka consensus mechanism that’s replacing Apache ZooKeeper going forward. It is recommended to try out new KRaft features in a development environment, as KRaft is not advised for production yet. One...

Sep 21, 2021•15 min•Ep 177•Transcript available on Metacast

How to Build a Strong Developer Community with Global Engagement ft. Robin Moffatt and Ale Murray

A developer community brings people with shared interests and purpose together. The fundamental elements of a community are to gather, learn, support, and create opportunities for collaboration. A developer community is also an effective and efficient instrument for exploring and solving problems together. The power of a community is its endless advantages, from knowledge sharing to support, interesting discussions, and much more. Tim Berglund invites Ale Murray (Global Community Manager, Conflu...

Sep 14, 2021•35 min•Ep 176•Transcript available on Metacast