Tim Berglund sits down with Colin McCabe and Jason Gustafson to talk about KIP-500. The pair, who work on the Kafka Core Engineering Team, discuss the history of Kafka, the creation of KIP-500, and what it will do for the community as a whole. They break down ZooKeeper's role in Kafka, the implications of removing the ZooKeeper dependency and replacing it with a self-managed metadata quorum, and how they've been combatting security, stability, and compatibility issues. With pending improvemen...
Sep 18, 2019•44 min•Ep 55•Transcript available on Metacast When it comes to deploying applications at scale without needing to integrate different pieces of infrastructure yourself, the answer nowadays is increasingly Kubernetes. Kubernetes provides all the building blocks that are needed, but a lot of thought is still required to create a truly enterprise-grade Apache Kafka® platform that can be used in production. Before running Kafka on Kubernetes, there are some factors to consider. What are the maturing stages of Kubernetes adoption? How did Datadog...
Sep 16, 2019•30 min•Ep 54•Transcript available on Metacast As Confluent turns five years old, special guest Jay Kreps (Co-founder and CEO, Confluent) brings us back to his early development days of coding Apache Kafka® over a Christmas holiday while working at LinkedIn. Kafka has become a breakthrough open source distributed streaming platform based on an abstraction of the distributed commit log, and his involvement in the project eventually led him to start Confluent with Jun Rao and Neha Narkhede. In this episode, Jay shares about all the highs and l...
Sep 12, 2019•48 min•Ep 53•Transcript available on Metacast What’s a graph? How does Cypher work? In today's episode of Streaming Audio, Tim Berglund sits down with Michael Hunger (Lead of Neo4j Labs) and David Allen (Partner Solution Architect, Neo4j) to discuss Neo4j basics and get the scoop on major features introduced in Neo4j 3.4 and 3.5. Among these are geospatial and temporal types, but there’s also more to come in 4.0: a multi-database feature, fine-grained security, and reactive drivers/Spring Data Neo4j RX. In addition to sharing a little ...
Sep 09, 2019•54 min•Ep 52•Transcript available on Metacast Gwen Shapira (Core Kafka Software Engineer, Confluent) sits down to answer the questions you've had about event streaming, Apache Kafka®, Confluent, and everything in between. This includes creating tables in nested JSON topics, how to balance ordering, latency and reliability, building event-based systems, and how to navigate the tricky endOffsets API. She talks about the hardships of fencing Zombie requests, some of the talks given at previous Kafka Summits, and an important question from...
Sep 04, 2019•22 min•Ep 51•Transcript available on Metacast In today’s episode of Streaming Audio, Tim Berglund sits down with Senior Applications Developer of Mobile Solutions Ramesh Sringeri to discuss Apache Kafka®—specifically two Kafka use cases that Children’s Healthcare of Atlanta is working on. First, they discuss achieving near-real-time streams of data to support meaningful intracranial pressure prediction and managing intracranial pressure (ICP) in a timely manner to help the care team achieve better outcomes with traumatic brain injuries. Chi...
Aug 28, 2019•53 min•Ep 50•Transcript available on Metacast Sink and source connectors are important for getting data in and out of Apache Kafka®. Tim Berglund invites Hans-Peter Grahsl (Technical Trainer and Software Engineer, Netconomy Software & Consulting GmbH) to talk about his involvement in the Apache Kafka project, ranging from several conference contributions to his open source community sink connector for MongoDB, now part of the official MongoDB Kafka connector code base. Join us in this episode to learn what it’s like to be ...
Aug 21, 2019•50 min•Ep 49•Transcript available on Metacast Streaming Audio welcomes Stéphane Maarek (CEO, Datacumulus) on the podcast to discuss how he got started hosting online Apache Kafka® tutorials and teaching on Udemy, the challenges he faces as an instructor, his approach to answering hard questions, and the projects he is currently working on. EPISODE LINKS KSQL Training for Hands-On Learning Join the Confluent Community Slack...
Aug 19, 2019•42 min•Ep 48•Transcript available on Metacast Whenever you see an Apache Cassandra™ in the wild, you probably also see an Apache Kafka®. In this episode, Tim Berglund (Senior Director of Developer Experience, Confluent) and Jeff Carpenter (Director of Developer Advocacy, DataStax) discuss the best way to get those systems talking using the DataStax Apache Kafka Connector and build a real-time data pipeline. EPISODE LINKS About the DataStax Apache Kafka Connector DataStax Academy: DataStax Apache Kafka Connector Course Join the Confluent Co...
Aug 12, 2019•48 min•Ep 47•Transcript available on Metacast The General Data Protection Regulation (GDPR) has challenged many enterprises to rethink how they deal with customer data. Viktor Gamov chats with David Jacot about a unique approach to inter-broker traffic encryption that he implemented for his customer’s sidecar pattern use case. EPISODE LINKS Learn about Istio Learn about Envoy Learn about Linkerd Handling GDPR with Apache Kafka®: How to Comply Without Freaking Out? Join the Confluent Community Slack...
Aug 08, 2019•17 min•Ep 46•Transcript available on Metacast A quick summary of the most important features in Confluent Platform 5.3. We discuss improved Kubernetes and Ansible support, improvements to Confluent Control Center that give you better insight into the data in your cluster, and an important new set of security features—Role-Based Access Control—aimed at making complex deployments more secure. EPISODE LINKS Read the docs Read the blog Watch the video version of this podcast (featuring an actual stream) Download Confluent Platform 5.3 Join us i...
Jul 31, 2019•13 min•Ep 45•Transcript available on Metacast Zenreach is a company that makes tools to help retailers use digital marketing more effectively. If that sounds like a problem that only marketing people would be interested in, that’s because you don’t know what they do! There are all kinds of fascinating technology problems to solve by utilizing event streaming platforms to process data at volume. Rishi Dhanaraj, our guest today, worked at Zenreach as an intern, and took on a big pile of Python batch jobs, turning them into some really interes...
Jul 29, 2019•31 min•Ep 44•Transcript available on Metacast Is Apache Kafka® actually a database? Can you install Confluent Control Center on Google Cloud Platform (GCP)? All this, plus some tips from Dan Norwood, the first user of Kafka Streams. EPISODE LINKS Control Center Docker image Control Center Docker configuration Complete Streams example Watch the video version of this podcast Join us in Confluent Community Slack...
Jul 22, 2019•24 min•Ep 43•Transcript available on Metacast Author Dylan Scott tells all about his upcoming Manning title Kafka in Action, which shares how Apache Kafka® can be used by beginners who are just starting out on their own projects and dispels common Hadoop-related myths, as Kafka has grown to become a powerful event streaming platform beyond big data ecosystems alone. To get 40% off Manning products, use the following code: podcon19 EPISODE LINKS Join us in Confluent Community Slack...
Jul 15, 2019•38 min•Ep 42•Transcript available on Metacast Friends don’t let friends do dual writes! Gunnar Morling (Software Engineer, Red Hat) joins us on the podcast to share a little bit about what Debezium is, how it works, and which databases it supports. In addition to covering the various use cases and benefits from change data capture (CDC) in the context of microservices—touching on the outbox pattern in particular, Gunnar walks us through the advantages of log-based CDC as implemented through Debezium over polling-based approaches, why you’d ...
Jul 10, 2019•49 min•Ep 41•Transcript available on Metacast Ever wonder what it’s like to be a distributed systems engineer at Confluent? Core Kafka Engineer Jason Gustafson dives into the challenges of working on distributed systems, particularly when it comes to a unique system like Apache Kafka®. He also discusses ways in which Confluent is working with the community to solve active problems and what it takes to be a distributed systems engineer. As always, Confluent is looking for engineers who are interested in distributed systems, and you don’t hav...
Jul 02, 2019•46 min•Ep 40•Transcript available on Metacast Tim Berglund (Senior Director of Developer Experience, Confluent) explains what’s new in Apache Kafka® 2.3 and highlights some of the most important Kafka Improvement Proposals (KIPs). EPISODE LINKS Read the blog Watch the video version of this podcast
Jun 25, 2019•14 min•Ep 39•Transcript available on Metacast If you operate a Kafka cluster, hopefully you upgrade your brokers occasionally. Each release of Apache Kafka® includes detailed documentation that describes a tested procedure for doing a rolling upgrade of your cluster. Couldn’t be easier, right? Well, what if you have to do it with hundreds or thousands of brokers, such as you’d have to do if you were running Confluent Cloud? Today, Gwen Shapira shares some of the lessons she’s learned doing just that. EPISODE LINKS Fully managed Apache Kafka...
Jun 25, 2019•43 min•Ep 38•Transcript available on Metacast Mitch Henderson (Technical Account Manager, Confluent) explains how to plan and deploy your first application running on Confluent Platform. He covers critical factors to consider, like the tools and skills you should have on hand, and how to make decisions about deployment solutions. Mitch also walks you through how to go about setting up monitoring and testing, the marks of success, and what to do after your first project launches successfully.
Jun 18, 2019•33 min•Ep 37•Transcript available on Metacast In this episode, Tim talks to Robin Moffatt about what Kafka Connect is and why you should almost certainly use it if you're working with Apache Kafka®. Whether you're building database offload pipelines to Amazon S3, ingesting events from external datastores to drive your applications or exposing messages from your microservices for audit and analysis, Kafka Connect is for you. Tim and Robin cover the motivating factors for Kafka Connect, why people end up reinventing the wheel when ...
Jun 12, 2019•47 min•Ep 36•Transcript available on Metacast Tim Berglund and Magesh Nandakumar (Software Engineer, Confluent) discuss why schemas matter for building systems on Apache Kafka®, and how Confluent Schema Registry helps with the problem. They talk about how Schema Registry works, how you can collaborate around schema change through `avsc` files, and what it means for this to be available in Confluent Cloud today. EPISODE LINKS Schema Registry 101 Schema Management Migrate Schemas to Confluent Cloud Schemas, Contracts, and Compatibility Fully ...
Jun 03, 2019•42 min•Ep 35•Transcript available on Metacast Tim Berglund and Michael Drogalis (Product Lead for Kafka Streams and KSQL, Confluent) talk about all things stream processing: why it’s complex, how it's evolved, and what’s on the horizon to make it simpler.
May 29, 2019•46 min•Ep 34•Transcript available on Metacast Tim Berglund is joined by Viktor Gamov (Developer Advocate, Confluent) to discuss various approaches to testing Kafka Streams applications. EPISODE LINKS KafkaEmbedded TopologyTestDriver Mocked Streams (Scala) Mockafka Test containers Kafka containers...
May 20, 2019•43 min•Ep 33•Transcript available on Metacast It’s a problem endemic to the tech world that we are always so focused on what’s coming next that we often forget to look at where we’ve been. Chris Riccomini, who was there at LinkedIn when Apache Kafka® was born, tells us how Kafka and the stream processing framework Samza came about, and also what he’s doing these days at WePay—building systems that use Kafka as a primary datastore. EPISODE LINKS When It Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka So, You Want to Bu...
May 16, 2019•51 min•Ep 32•Transcript available on Metacast Gwen and Kai chat about machine learning architectures, and whether software engineers and data scientists can learn to get along. EPISODE LINKS Blogs on deploying machine learning workloads: Machine Learning with Python, Jupyter, KSQL and TensorFlow How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka Using Apache Kafka to Drive Cutting-Edge Machine Learning KIP-392: Allow consumers to fetch from closest replica Watch the video version of this podcast...
May 08, 2019•33 min•Ep 31•Transcript available on Metacast It has been said that in distributed messaging, there are two hard problems: 2) exactly once delivery, 1) guaranteed order of messages and 2) exactly once delivery. Apache Kafka® has offered exactly once processing since version 0.11, which allows properly configured producers and consumers to make the guarantee that each message will be processed exactly one time. In this episode, Kafka Streams engineer Guozhang Wang walks through the implementation of transactional messaging in Kafka in some d...
Apr 22, 2019•48 min•Ep 29•Transcript available on Metacast Stanislav Kozlovski joins us to discuss common pitfalls when using Kafka consumers and a new KIP that promises to make consumer restarts much smoother. EPISODE LINKS KIP-345: Static consumer membership KIP-211: Documents the current behavior of offset expiration Watch the video version of this podcast...
Apr 17, 2019•22 min•Ep 28•Transcript available on Metacast Microservices are pretty ubiquitous these days. Really “SOA done right,” they reimagine the services pattern in the context of the world we live in today, nearly two decades since the first big service-oriented systems hit production. But what have we learned in this time? There are plenty of war stories. System designers have explored different architectural patterns—REST, events and databases of all types. In this podcast, Tim Berglund and Ben Stopford explore the event-driven paradigm and how...
Apr 08, 2019•58 min•Ep 27•Transcript available on Metacast After several years of development, librdkafka has finally reached 1.0! It remains API compatible with older versions of the library, so you won’t need to make any changes to your application. There are, however, several important new features like the idempotent producer, sparse broker connections, support for the vaunted KIP-62 and a complete makeover for the C#/.NET client. EPISODE LINKS librdkafka v1.0.0 release notes...
Apr 03, 2019•47 min•Ep 26•Transcript available on Metacast Do metrics for detecting clients from old versions actually exist? Or is Gwen making features up? This and more useful advice is coming up on today's episode of Ask Confluent. EPISODE LINKS The Java property that will refresh DNS cache frequently: java.security.Security.setProperty("networkaddress.cache.ttl", "60"); Improvements to DNS lookups in Confluent Platform 5.1.2 (Apache Kafka 2.1.1): KAFKA-7755 KAFKA-7890 More reasons to upgrade to Confluent Platform 5.1.2 Monitoring clients ...
Mar 26, 2019•14 min•Ep 25•Transcript available on Metacast