Streaming Audio: Apache Kafka® & Real-Time Data

‌

Episodes

‌

Confluent, founded by the original creators of Apache Kafka®•developer.confluent.io

Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community—from Kafka connectors and distributed systems, to data mesh, data integration, modern data architectures, and data mesh built with Confluent and cloud Kafka as a service. Join our hosts as they stream through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®️, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Episodes

What Is Data Mesh, and How Does it Work? ft. Zhamak Dehghani

The data mesh architectural paradigm shift is all about moving analytical data away from a monolithic data warehouse or data lake into a distributed architecture—allowing data to be shared for analytical purposes in real time, right at the point of origin. The idea of data mesh was introduced by Zhamak Dehghani (Director of Emerging Technologies, Thoughtworks) in 2019. Here, she provides an introduction to data mesh and the fundamental problems that it’s trying to solve. Zhamak describes that th...

Sep 09, 2021•35 min•Ep 175•Transcript available on Metacast

Multi-Cluster Apache Kafka with Cluster Linking ft. Nikhil Bhatia

Note: This episode was recorded when Cluster Linking was in preview mode. It’s now generally available as part of the Confluent Q3 ‘21 release on August 17, 2021. Infrastructure needs to react in real time to support globally distributed events, such as cloud migration, IoT, edge data collection, and disaster recovery. To provide a seamless yet cloud-native, cross-cluster topic replication experience, Nikhil Bhatia (Principal Engineer I, Product Infrastructure, Confluent) and the team engineered...

Aug 31, 2021•31 min•Ep 174•Transcript available on Metacast

Using Apache Kafka and ksqlDB for Data Replication at Bolt

What does a ride-hailing app that offers micromobility and food delivery services have to do with data in motion? In this episode, Ruslan Gibaiev (Data Architect, Bolt) shares about Bolt’s road to adopting Apache Kafka® and ksqlDB for stream processing to replicate data from transactional databases to analytical warehouses. Rome wasn't built overnight, nor was the adoption of Kafka and ksqlDB at Bolt. Initially, Bolt noticed the need for system standardization and replacing the unreliable q...

Aug 26, 2021•29 min•Ep 173•Transcript available on Metacast

Placing Apache Kafka at the Heart of a Data Revolution at Saxo Bank

Monolithic applications present challenges for organizations like Saxo Bank, including difficulties when it comes to transitioning to cloud, data efficiency, and performing data management in a regulated environment. Graham Stirling, the head of data platforms at Saxo Bank and also a self-proclaimed recovering architect on the pathway to delivery, shares his experience over the last 2.5 years as Saxo Bank placed Apache Kafka® at the heart of their company—something they call a data revolution. B...

Aug 19, 2021•29 min•Ep 172•Transcript available on Metacast

Advanced Stream Processing with ksqlDB ft. Michael Drogalis

ksqlDB makes it easy to read, write, process, and transform data on Apache Kafka®, the de facto event streaming platform. With simple SQL syntax, pre-built connectors, and materialized views, ksqlDB’s powerful stream processing capabilities enable you to quickly start processing real-time data at scale. But how does ksqlDB work? In this episode, Michael Drogalis (Principal Product Manager, Product Management, Confluent) previews an all-new Confluent Developer course: Inside ksqlDB , where he pro...

Aug 11, 2021•28 min•Ep 171•Transcript available on Metacast

Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour

Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. Joining us today in this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company’s largest source of streaming data. Almost like a post office, except instead of sending physical parcels, Mailchimp sends billions of emails per day. Monitoring the state of each email can provide visibility int...

Aug 05, 2021•32 min•Ep 170•Transcript available on Metacast

Collecting Data with a Custom SIEM System Built on Apache Kafka and Kafka Connect ft. Vitalii Rudenskyi

The best-informed business insights that support better decision-making begin with data collection, ahead of data processing and analytics. Enterprises nowadays are engulfed by data floods, with data sources ranging from cloud services, applications, to thousands of internal servers. The massive volume of data that organizations must process presents data ingestion challenges for many large companies. In this episode, data security engineer, Vitalli Rudenskyi, discusses the decision to replace a...

Jul 27, 2021•25 min•Ep 169•Transcript available on Metacast

Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in streaming processing in Ap...

Jul 22, 2021•29 min•Ep 168•Transcript available on Metacast

Powering Real-Time Analytics with Apache Kafka and Rockset

Using large amounts of streaming data increasingly requires interactive, real-time analytics and dashboards—and this applies to any industry, including tech. CTO and Co-Founder of Rockset Dhruba Borthakur shares how his company uses Apache Kafka® to perform complex joins, search, and aggregations on streaming data with low latencies. The Kafka database integrations allow his team to make a cloud-native analytics database that is a fundamental piece of enterprise infrastructure. Especially in e-c...

Jul 15, 2021•26 min•Ep 167•Transcript available on Metacast

Automated Event-Driven Architectures and Microservices with Apache Kafka and SmartBear

Is it possible to have automated adoption of your event-driven architectures and microservices? The answer is yes! Alianna Inzana, product leader for API testing and virtualization at SmartBear, uses this evolutionary model to make event services reusable, functional, and strategic for both in-house needs and clients. SmartBear relies on Apache Kafka® to drive its automated microservices solutions forward through scaled architecture and adaptive workflows. Although the path to adoption may be di...

Jul 08, 2021•30 min•Ep 166•Transcript available on Metacast

Data-Driven Digitalization with Apache Kafka in the Food Industry at BAADER

Coming out of university, Patrick Neff (Data Scientist, BAADER) was used to “perfect” examples of datasets. However, he soon realized that in the real world, data is often either unavailable or unstructured. This compelled him to learn more about collecting data, analyzing it in a smart and automatic way, and exploring Apache Kafka® as a core ecosystem while at BAADER, a global provider of food processing machines. After Patrick began working with Apache Kafka in 2019, he developed several micro...

Jun 29, 2021•28 min•Ep 165•Transcript available on Metacast

Chaos Engineering with Apache Kafka and Gremlin

The most secure clusters aren’t built on the hopes that they’ll never break. They are the clusters that are broken on purpose and with a specific goal. When organizations want to avoid systematic weaknesses, chaos engineering with Apache Kafka® is the route to go. Your system is only as reliable as its highest point of vulnerability. Patrick Brennan (Principal Architect) and Tammy Butow (Principal SRE) from Gremlin discuss how they do their own chaos engineering to manage and resolve high-severi...

Jun 22, 2021•36 min•Ep 164•Transcript available on Metacast

Boosting Security for Apache Kafka with Confluent Cloud Private Link ft. Dan LaMotte

Confluent Cloud isn’t just for public access anymore. As the requirement for security across sectors increases, so does the need for virtual private cloud (VPC) connections. It is becoming more common today to come across Apache Kafka® implementations with the latest private link connectivity option. In the past, most Confluent Cloud users were satisfied with public connectivity paths and VPC peering. However, enabling private links on the cloud is increasingly important for security across netw...

Jun 15, 2021•26 min•Ep 163•Transcript available on Metacast

Confluent Platform 6.2 | What’s New in This Release + Updates

Based on Apache Kafka® 2.8, Confluent Platform 6.2 introduces Health+, which offers intelligent alerting, cloud-based monitoring tools, and accelerated support so that you can get notified of potential issues before they manifest as critical problems that lead to downtime and business disruption. Health+ provides ongoing, real-time analysis of performance and cluster metadata for your Confluent Platform deployment, collecting only metadata so that you can continue managing your deployment, as yo...

Jun 10, 2021•9 min•Ep 162•Transcript available on Metacast

Adopting OpenTelemetry in Confluent and Beyond ft. Xavier Léauté

Collecting internal, operational telemetry from Confluent Cloud services and thousands of clusters is no small feat. Stakeholders need to rely on the same data to make operational decisions. Whether it be metrics from clusters in Confluent Cloud or traces from our internal service, they all provide valuable insights not only to engineering teams but also to customers for their own operations and for business reporting needs. Traditionally, this data needs to be collected in multiple ways to sati...

Jun 08, 2021•33 min•Ep 161•Transcript available on Metacast

Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra

Focused on optimizing Apache Kafka® performance with maximized efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, which they have tested and now safely rolled out across Confluent Cloud. After spending seven years at Amazon Web Services (AWS) working on search services and Amazon Aurora as a software engineer, Adithy...

May 25, 2021•39 min•Ep 160•Transcript available on Metacast

Engaging Database Partials with Apache Kafka for Distributed System Consistency ft. Pat Helland

When compiling database reports using a variety of data from different systems, obtaining the right data when you need it in real time can be difficult. With cloud connectivity and distributed data pipelines, Pat Helland (Principal Architect, Salesforce) explains how to make educated partial answers when you need to use the Apache Kafka® platform. After all, you can’t get guarantees across a distance, making it critical to consider partial results. Despite best efforts, managing systems from a d...

May 20, 2021•42 min•Ep 159•Transcript available on Metacast

The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka ft. Jason Gustafson and Colin McCabe

Jason Gustafson and Colin McCabe, Apache Kafka® developers, discuss the project to remove ZooKeeper—now known as the KRaft (Kafka on Raft) project. A previous episode of Streaming Audio featured both developers on the podcast before the release of Apache Kafka 2.8. Now they’re back to share their progress. The KRraft code has been merged (and continues to be merged) in phases. Both developers talk about the foundational Kafka Improvement Proposals (KIPs), such as KIP-595: a Raft protocol for Kaf...

May 13, 2021•32 min•Ep 158•Transcript available on Metacast

Resilient Edge Infrastructure for IoT Using Apache Kafka ft. Kai Waehner

What is the internet of things (IoT), and how does it relate to event streaming and Apache Kafka®? The deployment of Kafka outside the datacenter creates many new possibilities for processing data in motion and building new business cases. In this episode, Kai Waehner, field CTO and global technology advisor at Confluent, discusses the intersection of edge data infrastructure, IoT, and cloud services for Kafka. He also details how businesses get into the sticky situation of not accounting for so...

May 04, 2021•27 min•Ep 157•Transcript available on Metacast

Data Management and Digital Transformation with Apache Kafka at Van Oord

Imagine if you could create a better world for future generations simply by delivering marine ingenuity. Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure. Real-time insights into costs spent, the progress of projects, and the performance tracking of vessels and equipment are essential for surviving as a business. Becomi...

Apr 29, 2021•28 min•Ep 156•Transcript available on Metacast

Powering Microservices Using Apache Kafka on Node.js with KafkaJS at Klarna ft. Tommy Brunn

At Klarna, Lead Engineer Tommy Brunn is building a runtime platform for developers. But outside of his professional role, he is also one of the authors of the JavaScript client for Apache Kafka® called KafkaJS, which has grown from being a niche open source project to the most downloaded Kafka client for Node.js since 2018. Using Kafka in Node.js has previously meant relying on community-contributed bindings to librdkafka, which required you to spend more of your time debugging failed builds tha...

Apr 22, 2021•31 min•Ep 155•Transcript available on Metacast

Apache Kafka 2.8 - ZooKeeper Removal Update (KIP-500) and Overview of Latest Features

Apache Kafka 2.8 is out! This release includes early access to the long-anticipated ZooKeeper removal encapsulated in KIP-500, as well as other key updates, including the addition of a Describe Cluster API, support for mutual TLS authentication on SASL_SSL listeners, exposed task configurations in the Kafka Connect REST API, the removal of a properties argument for the TopologyTestDriver, the introduction of a Kafka Streams specific uncaught exception handler, improved handling of window size in...

Apr 19, 2021•11 min•Ep 154•Transcript available on Metacast

Connecting Azure Cosmos DB with Apache Kafka - Better Together ft. Ryan CrawCour

When building solutions for customers in Microsoft Azure, it is not uncommon to come across customers who are deeply entrenched in the Apache Kafka® ecosystem and want to continue expanding within it. Thus, figuring out how to connect Azure first-party services to this ecosystem is of the utmost importance. Ryan CrawCour is a Microsoft engineer who has been working on all things data and analytics for the past 10+ years, including building out services like Azure Cosmos DB, which is used by mill...

Apr 14, 2021•32 min•Ep 153•Transcript available on Metacast

Automated Cluster Operations in the Cloud ft. Rashmi Prabhu

If you’ve heard the term “clusters,” then you might know it refers to Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, databalancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, has the opportunity to help govern the data plane that comprises all these clusters and enables API-driven operations on these c...

Apr 12, 2021•25 min•Ep 152•Transcript available on Metacast

Resurrecting In-Sync Replicas with Automatic Observer Promotion ft. Anna McDonald

As most developers and architects know, data always needs to be accessible no matter what happens outside of the system. This week, Tim Berglund virtually sits down with Anna McDonald (Principal Customer Success Technical Architect, Confluent) to discuss how Automatic Observer Promotion (AOP) can help solve the Apache Kafka® 2.5 datacenter dilemma as a feature now available in Confluent Platform 6.1 and above. Many industries must have a backup plan not only to do the right thing by the data tha...

Apr 07, 2021•25 min•Ep 151•Transcript available on Metacast

Building Real-Time Data Pipelines with Microsoft Azure, Databricks, and Confluent

Processing data in real time is a process , as some might say. Angela Chu (Solution Architect, Databricks) and Caio Moreno (Senior Cloud Solution Architect, Microsoft) explain how to integrate Azure, Databricks, and Confluent to build real-time data pipelines that enable you to ingest data, perform analytics, and extract insights from data at hand. They share about where to start within the Apache Kafka® ecosystem and how to maximize the tools and components that it offers using fully managed se...

Mar 31, 2021•31 min•Ep 150•Transcript available on Metacast

Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman

Availability in Kafka Streams is hard, especially in the face of any changes. Any change to topic metadata or group membership triggers a rebalance. But Kafka Streams struggles even after this stop-the-world rebalance has finished. According to Apache Kafka® Committer and Confluent Software Engineer Sophie Blee-Goldman, this is because a Streams app will generally have some state associated with a given partition, and to move this state from one consumer instance to another requires rebuilding t...

Mar 24, 2021•51 min•Ep 149•Transcript available on Metacast

Event-Driven Architecture - Common Mistakes and Valuable Lessons ft. Simon Aubury

Event-driven architecture has taken on numerous meanings over the years—from event notification to event-carried state transfer, to event sourcing, and CQRS. Why has event-driven programming become so popular, and why is it such a topic of interest? For the first time, Simon Aubury (Principal Data Engineer, ThoughtWorks) joins Tim Berglund on the Streaming Audio podcast to tell all, including his own experiences adopting event-driven technologies and common blunders when working in this area. Si...

Mar 17, 2021•43 min•Ep 148•Transcript available on Metacast

The Human Side of Apache Kafka and Microservices ft. SPOUD

Many industries depend on real-time data, requiring a range of solutions that Apache Kafka® can help solve. Samuel Benz (CTO) and Patrick Bönzli (Product Owner) explain how their company, SPOUD, has fully embraced Kafka for data delivery, which has proven to be successful for SPOUD since 2016 across various industries and use cases. The four Kafka use cases that Sam and Patrick see most often are microservices, event processing, event sourcing/the data lake, and integration architecture. But imp...

Mar 08, 2021•45 min•Ep 147•Transcript available on Metacast

Gamified Fitness at Synthesis Software Technologies Using Apache Kafka and IoT

Synthesis Software Technologies, a Confluent partner, is migrating an existing behavioral IoT framework into Kafka to streamline and normalize vendor information. The legacy messaging technology that they currently use has altered the behavioral IoT data space, and now Apache Kafka® will allow them to take that to the next level. New ways of normalizing the data will allow for increased efficiency for vendors, users, and manufacturers. It will also enable the scaling IoT technology going forward...

Mar 03, 2021•34 min•Ep 146•Transcript available on Metacast