Streaming Audio: Apache Kafka® & Real-Time Data

Confluent, founded by the original creators of Apache Kafka® · developer.confluent.io

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community, from Kafka connectors and distributed systems to data integration, modern data architectures, and data mesh built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Episodes

Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates

Apache Kafka® 3.5 is here with the capability of previewing migrations from ZooKeeper clusters to KRaft mode. Follow along as Danica Fine highlights key release updates. Kafka Core: KIP-833 provides an updated timeline for KRaft. KIP-866 is now in preview and allows migration from an existing ZooKeeper cluster to KRaft mode. KIP-900 introduces a way to bootstrap the KRaft controllers with SCRAM credentials. KIP-903 prevents a data loss scenario by preventing replicas with stale broker epochs fro...

Jun 15, 2023 · 11 min · Ep 265 · Transcript available on Metacast

A Special Announcement from Streaming Audio

After recording 64 episodes and featuring 58 amazing guests, the Streaming Audio podcast series has amassed over 130,000 plays on YouTube in the last year. We're extremely proud of these achievements and feel that it's time to take a well-deserved break. Streaming Audio will be taking a vacation! We want to express our gratitude to you, our valued listeners, for spending 10,000 hours with us on this incredible journey. Rest assured, we will be back with more episodes! In the meantime, ...

Apr 13, 2023 · 1 min · Ep 264 · Transcript available on Metacast

How to use Data Contracts for Long-Term Schema Management

Have you ever struggled with managing data long term, especially as the schema changes over time? In order to manage and leverage data across an organization, it’s essential to have well-defined guidelines and standards in place around data quality, enforcement, and data transfer. To get started, Abraham Leal (Customer Success Technical Architect, Confluent) suggests that organizations associate their Apache Kafka® data with a data contract (schema). A data contract is an agreement between a ser...

Mar 21, 2023 · 57 min · Ep 263 · Transcript available on Metacast

How to use Python with Apache Kafka

Can you use Apache Kafka® and Python together? What’s the current state of Python support? And what are the best options to get started? In this episode, Dave Klein joins Kris to talk about all things Kafka and Python: the libraries, the tools, and the pros & cons. He also talks about the new course he just launched to support Python programmers entering the event-streaming world. Dave has been an active member of the Kafka community for many years and noticed that there were a lot of Kafka ...

Mar 14, 2023 · 32 min · Ep 262 · Transcript available on Metacast

Next-Gen Data Modeling, Integrity, and Governance with YODA

In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale. Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encou...

Mar 07, 2023 · 56 min · Ep 261 · Transcript available on Metacast

Migrate Your Kafka Cluster with Minimal Downtime

Migrating Apache Kafka® clusters can be challenging, especially when moving large amounts of data while minimizing downtime. Michael Dunn (Solutions Architect, Confluent) has worked in the data space for many years, designing and managing systems to support high-volume applications. He has helped many organizations strategize, design, and implement successful Kafka cluster migrations between different environments. In this episode, Michael shares some tips about Kafka cluster migration with Kris...

Mar 01, 2023 · 1 hr 2 min · Ep 260 · Transcript available on Metacast

Real-Time Data Transformation and Analytics with dbt Labs

dbt is known as being part of the Modern Data Stack for ELT processes. Being in the MDS, dbt Labs believes in having the best of breed for every part of the stack. Oftentimes folks are using an EL tool like Fivetran to pull data from the database into the warehouse, then using dbt to manage the transformations in the warehouse. Analysts can then build dashboards on top of that data, or execute tests. It’s possible for an analyst to adapt this process for use with a microservice application using...

Feb 22, 2023 · 44 min · Ep 259 · Transcript available on Metacast

What is the Future of Streaming Data?

What’s the next big thing in the future of streaming data? In this episode, Greg DeMichillie (VP of Product and Solutions Marketing, Confluent) talks to Kris about the future of stream processing in environments where the value of data lies in their ability to intercept and interpret data. Greg explains that organizations typically focus on the infrastructure containers themselves, and not on the thousands of data connections that form within. When they finally realize that they don't have ...

Feb 15, 2023 · 41 min · Ep 258 · Transcript available on Metacast

What can Apache Kafka Developers learn from Online Gaming?

What can online gaming teach us about making large-scale event management more collaborative in real-time? Ben Gamble (Developer Relations Manager, Aiven) has come to the world of real-time event streaming from an unusual source: the video games industry. And if you stop to think about it, modern online games are complex, distributed real-time data systems with decades of innovative techniques to teach us. In this episode, Ben talks with Kris about integrating gaming concepts with Apache Kafka®. U...

Feb 08, 2023 · 56 min · Ep 257 · Transcript available on Metacast

Apache Kafka 3.4 - New Features & Improvements

Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect. In Kafka Core: KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that the consumer was a part of. KIP-830 includes a new configuration setting that allows you to disabl...

Feb 07, 2023 · 5 min · Ep 256 · Transcript available on Metacast

How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems

How can you use OpenTelemetry to gain insight into your Apache Kafka® event systems? Roman Kolesnev, Staff Customer Innovation Engineer at Confluent, is a member of the Customer Solutions & Innovation Division Labs team working to build business-critical OpenTelemetry applications so companies can see what’s happening inside their data pipelines. In this episode, Roman joins Kris to discuss tracing and monitoring in distributed systems using OpenTelemetry. He talks about how monitoring each ...

Feb 01, 2023 · 50 min · Ep 255 · Transcript available on Metacast

What is Data Democratization and Why is it Important?

Data democratization allows everyone in an organization to have access to the data they need, and the necessary tools needed to use this data effectively. In short, data democratization enables better business decisions. In this episode, Rama Ryali, a Senior IT and Data Executive, chats with Kris Jenkins about the importance of data democratization in modern systems. Rama explains that tech has unprecedented control over data and ignores basic business needs. Tech’s influence has largely gone un...

Jan 26, 2023 · 47 min · Ep 254 · Transcript available on Metacast

Git for Data: Managing Data like Code with lakeFS

Is it possible to manage and test data like code? lakeFS is an open-source data version control tool that transforms object storage into Git-like repositories, offering teams a way to use the same workflows for code and data. In this episode, Kris sits down with guest Adi Polak, VP of DevX at Treeverse, to discuss how lakeFS can be used to facilitate better management and testing of data. At its core, lakeFS provides teams with better data management. A theoretical data engineer on a large team ...

Jan 19, 2023 · 31 min · Ep 253 · Transcript available on Metacast

Using Kafka-Leader-Election to Improve Scalability and Performance

How does leader election work in Apache Kafka®? For the past 2 ½ years, Adithya Chandra, Staff Software Engineer at Confluent, has been working on Kafka scalability and performance, specifically partition leader election. In this episode, he gives Kris Jenkins a deep dive into the power of leader election in Kafka replication, why we need it, how it works, what can go wrong, and how it's being improved. Adithya explains that you can configure a certain number of replicas to be distributed a...

Jan 12, 2023 · 51 min · Ep 252 · Transcript available on Metacast

Real-Time Machine Learning and Smarter AI with Data Streaming

Are bad customer experiences really just data integration problems? Can real-time data streaming and machine learning be democratized in order to deliver a better customer experience? Airy, an open-source data-streaming platform, uses Apache Kafka® to help business teams deliver better results to their customers. In this episode, Airy CEO and co-founder Steffen Hoellinger explains how his company is expanding the reach of stream-processing tools and ideas beyond the world of programmers. Airy or...

Jan 05, 2023 · 39 min · Ep 251 · Transcript available on Metacast

The Present and Future of Stream Processing

The past year saw new trends emerge in the world of data streaming technologies, as well as some unexpected and novel use cases for Apache Kafka®. New reflections on the future of stream processing and when companies should adopt microservice architecture inspired several talks at this year’s industry conferences. In this episode, Kris is joined by his colleagues Danica Fine, Senior Developer Advocate, and Robin Moffatt, Principal Developer Advocate, for an end-of-year roundtable on this year’s ...

Dec 28, 2022 · 31 min · Ep 250 · Transcript available on Metacast

Top 6 Worst Apache Kafka JIRA Bugs

Entomophiliac Anna McDonald (Principal Customer Success Technical Architect, Confluent) has seen her fair share of Apache Kafka® bugs. For her annual holiday roundup of the most noteworthy Kafka bugs, Anna tells Kris Jenkins about some of the scariest, most surprising, and most enlightening corner cases that make you ask, “Ah, so that’s how it really works?” She shares a lot of interesting details about how batching works, the replication protocol, how Kafka’s networking stack dances with Linux...

Dec 21, 2022 · 1 hr 11 min · Ep 249 · Transcript available on Metacast

Learn How Stream-Processing Works The Simplest Way Possible

Could you explain Apache Kafka® in ways that a small child could understand? When Mitch Seymour, author of Mastering Kafka Streams and ksqlDB, wanted a way to communicate the basics of Kafka and event-based stream processing, he decided to author a children’s book on the subject, but it turned into something with a far broader appeal. Mitch conceived the idea while writing a traditional manuscript for engineers and technicians interested in building stream processing applications. He wished he ...

Dec 20, 2022 · 31 min · Ep 248 · Transcript available on Metacast

Building and Designing Events and Event Streams with Apache Kafka

What are the key factors to consider when developing event-driven architecture? When properly designed, events can connect existing systems with a common language and allow data exchange in near real time. They also help reduce complexity by providing a single source of truth that eliminates the need to synchronize data between different services or applications. They enable dynamic behavior, allowing each service or application to respond quickly to changes in its environment. Using events, dev...

Dec 15, 2022 · 53 min · Ep 247 · Transcript available on Metacast

Rethinking Apache Kafka Security and Account Management

Is there a better way to manage access to resources without compromising security? New employees need access to a variety of resources within a company's tech stack. But manually granting access can be error-prone. And when employees leave, their access must be revoked, thus potentially introducing security risks if an admin misses one. In this podcast, Kris Jenkins talks to Anuj Sawani (Security Product Manager, Confluent) about the centralized identity management system he helped build to...

Dec 08, 2022 · 41 min · Ep 246 · Transcript available on Metacast

Real-time Threat Detection Using Machine Learning and Apache Kafka

Can we use machine learning to detect security threats in real-time? As organizations increasingly rely on distributed systems, it is becoming more important to analyze the traffic that passes through those systems quickly. Confluent Hackathon ’22 finalist, Géraud Dugé de Bernonville (Data Consultant, Zenika Bordeaux), shares how his team used TensorFlow (machine learning) and Neo4j (graph database) to analyze and detect network traffic data in real-time. What started as a research and developme...

Nov 29, 2022 · 29 min · Ep 245 · Transcript available on Metacast

Improving Apache Kafka Scalability and Elasticity with Tiered Storage

What happens when you need to store more than a few petabytes of data? Rittika Adhikari (Software Engineer, Confluent) discusses how her team implemented tiered storage, a method for improving the scalability and elasticity of data storage in Apache Kafka®. She also explores the motivating factors for building it in the first place: cost, performance, and manageability. Before Tiered Storage, there was no real way to retain Kafka data indefinitely. Because of the tight coupling between compute a...

Nov 22, 2022 · 30 min · Ep 244 · Transcript available on Metacast

Decoupling with Event-Driven Architecture

In principle, data mesh architecture should liberate teams to build their systems and gather data in a distributed way, without having to explicitly coordinate. Data is the thing that can and should decouple teams, but proper implementation has its challenges. In this episode, Kris talks to Florian Albrecht (Solution Architect, Hermes Germany) about Galapagos, an open-source DevOps software tool for Apache Kafka® that Albrecht created with his team at Hermes, a German parcel delivery company. Af...

Nov 15, 2022 · 39 min · Ep 243 · Transcript available on Metacast

If Streaming Is the Answer, Why Are We Still Doing Batch?

Is real-time data streaming the future, or will batch processing always be with us? Interest in streaming data architecture is booming, but just as many teams are still happily batching away. Batch processing is still simpler to implement than stream processing, and successfully moving from batch to streaming requires a significant change to a team’s habits and processes, as well as a meaningful upfront investment. Some are even running dbt in micro batches to simulate an effect similar to strea...

Nov 09, 2022 · 44 min · Ep 242 · Transcript available on Metacast

Security for Real-Time Data Stream Processing with Confluent Cloud

Streaming real-time data at scale and processing it efficiently is critical to cybersecurity organizations like SecurityScorecard. Jared Smith, Senior Director of Threat Intelligence, and Brandon Brown, Senior Staff Software Engineer, Data Platform at SecurityScorecard, discuss their journey from using RabbitMQ to open-source Apache Kafka® for stream processing, as well as why turning to fully managed Kafka on Confluent Cloud is the right choice for building real-time data pipelines at scale. Se...

Nov 03, 2022 · 49 min · Ep 241 · Transcript available on Metacast

Running Apache Kafka in Production

What are some recommendations to consider when running Apache Kafka® in production? Jun Rao, one of the original Kafka creators, as well as an ongoing committer and PMC member, shares the essential wisdom he's gained from developing Kafka and dealing with a large number of Kafka use cases. Here are 6 recommendations for maximizing Kafka in production: 1. Nail Down the Operational Part: When setting up your cluster, in addition to dealing with the usual architectural issues, make sure to also...

Oct 27, 2022 · 59 min · Ep 240 · Transcript available on Metacast

Build a Real Time AI Data Platform with Apache Kafka

Is it possible to build a real-time data platform without using stateful stream processing? Forecasty.ai is an artificial intelligence platform for forecasting commodity prices, imparting insights into the future valuations of raw materials for users. Nearly all AI models are batch-trained once, but precious commodities are linked to ever-fluctuating global financial markets, which require real-time insights. In this episode, Ralph Debusmann (CTO, Forecasty.ai) shares their journey of migrating ...

Oct 20, 2022 · 37 min · Ep 239 · Transcript available on Metacast

Optimizing Apache JVMs for Apache Kafka

Java Virtual Machines (JVMs) impact Apache Kafka® performance in production. How can you optimize your event-streaming architectures so they process more Kafka messages using the same number of JVMs? Gil Tene (CTO and Co-Founder, Azul) delves into JVM internals and how developers and architects can use Java and optimized JVMs to make real-time data pipelines more performant and more cost effective, with use cases. Gil has deep roots in Java optimization, having started out building large data ce...

Oct 13, 2022 · 1 hr 12 min · Ep 238 · Transcript available on Metacast

Apache Kafka 3.3 - KRaft, Kafka Core, Streams, & Connect Updates

Apache Kafka® 3.3 is released! With over two years of development, KIP-833 marks KRaft as production ready for new AK 3.3 clusters only. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of this release, with KIPs from Kafka Core, Kafka Streams, and Kafka Connect. To reduce request overhead and simplify client-side code, KIP-709 extends the OffsetFetch API requests to accept multiple consumer group IDs. This update has three changes, including...

Oct 03, 2022 · 7 min · Ep 237 · Transcript available on Metacast

Application Data Streaming with Apache Kafka and Swim

How do you set data applications in motion by running stateful business logic on streaming data? Capturing key stream processing events and cumulative statistics that necessitate real-time data assessment, migration, and visualization remains a gap for event-driven systems and stream processing frameworks, according to Fred Patton (Developer Evangelist, Swim Inc.). In this episode, Fred explains streaming applications and how they contrast with stream processing applications. Fred and Kris also ...

Oct 03, 2022 · 39 min · Ep 236 · Transcript available on Metacast