Welcome to Streaming Audio, a podcast about Kafka, Confluent and the cloud. I'm Tim Berglund. I'm joined by my cohost Gwen Shapira, and today we're going to take listener questions. Now you might ask how in the first episode of a podcast, can you have listener questions? Well, we figured out a way. We're going to take some questions that we've gotten on Twitter. Mostly the people have asked Gwen, and I'm going to tee them up. She's going to knock them down.
We're gonna talk about some deployment topics and we'll get to some stream processing and KSQL things as well. Glad you're here. Let's get into it.
OK. Welcome. I'm Gwen Shapira.
I'm Tim Berglund.
And this is Ask Confluent, where we give real answers to pressing questions from Twitter, YouTube and elsewhere. Two weeks back we announced the Confluent Operator for Kubernetes, and boy did this generate a lot of questions. Yeah, it was kind of crazy, right? It kind of reminded me of the time when we announced exactly once and the entire world went, what, is that even possible?
They announced it was not possible. So same, same thing, yeah.
Yes, yes, exactly. So here we are again, people looking at Confluent Operator for Kubernetes and saying, is that even a thing?
Yeah.
So let's look at some questions from Twitter and see what they say.
Let's see what we have here. So, uh, John Hug, your friend and mine. So he's responding to a tweet of Jay's. Jay said a lot of people question whether you can run stateful systems like Kafka on Kubernetes. The answer is that you absolutely can, but there is really significant infrastructure needed to do this well, just as there is for running on bare metal. That's Jay Kreps' tweet. John says, next question. Replace can with should. A lot of people question whether you can run stateful systems.
John is asking whether you should, you should. Yeah. What do you think?
Yeah. So obviously it kind of depends on what we're doing and we're not really here to sell anyone on Kubernetes specifically, but a lot of people are running a lot of stuff in Kubernetes. All their microservices, a lot of their stateless systems, a lot of stuff on Kubernetes already. And it makes sense since you have all this infrastructure, why not use the same infrastructure that you already set up to also run Kafka. And Confluent is not just about Kafka, right?
We also have Confluent connectors and Kafka Connect workers and Kafka Streams and KSQL. So it's really about running a big ecosystem on a cluster management system that is built to run big ecosystems.
Right. Like you're already there, you've got the investment, you're doing Kubernetes. Go all in. There's kind of a move for people to just go all in.
In the sense they want to. We want to make it easy on them. So it's up to them to decide if they should, but we want to help the people who decided they should. We want to help them out.
You don't have to, right, like you could have data infrastructure that's outside of Kubernetes and all of your microservices in Kubernetes or whatever. But that's not what a lot of people want. They want to. So yeah. Uh, should you, I don't know if we say should, but want to. We want to make it easy. Alright. What's next?
Um, so as a follow up... John Hug says he doesn't really know many people who went stateful and are happy with it. Yeah. And it's interesting because I think, even though I didn't find the exact tweet, prominent Kubernetes advocates say they're not very sure whether stateful sets are still a good idea.
And that's what Kelsey Hightower, he's, he's famous for that.
You'd think he knows what he's talking about.
He knows what he's talking about. Yeah. So the, just the discussion's happening.
Yes. But then again, you talk to the other half of the world and they are running stateful sets quite happily, and the thing exists. It's supported by not even just one company. And Kafka is not the only stateful set out there. It seems to work. He talked, you know, he talks about my machine containers. So that means something too.
All right, what's next here? What do we got? A Stefan, Stefan, a Dejocio. A Stefan Dejocio, let's go with that. Uh, Stefan asks, is it very useful to put Kafka clusters on Kubernetes? Kind of the same question. Is this not just some overhead to get technical perfection in quotes? That is, everything in the same place (in parentheses). Kafka is inherently stateful, Stefan says, and needs good resources and monitoring, etc. Why bother?
That's a very good question. Why do more work if we don't have to? So, and someone actually asked me this morning a similar question, and he basically asked, do you see smaller companies running Kafka on Kubernetes, or is it larger companies? Because if it's smaller companies, it's kind of suspicious. Maybe people being cool just to be cool, and if it's like a huge bank doing it, obviously it's a different question.
And it's funny because most of the people that I've seen use Kafka on Kubernetes are actually the big players because they have the scale. If you have three microservices, you don't need Kubernetes, and you're right. It is pure overhead, technical perfection kind of game. But if you have 50,000 microservices and you want to treat Kafka as just another service, it's this big cluster with the monitoring and everything that you want to manage it as one set of infrastructure, then it's not overhead.
Then you start seeing the benefit because you already put in this investment for the first 99,000 services that went in there.
And you amortize that over many containers and many, many services and many infrastructure components.
Our cloud team, like, they run on Kubernetes. I mean they don't have that much spare time. I don't think they would do it if it were overhead. I think they are seeing real professional benefits from running everything on Kubernetes, and everything includes Kafka.
Alright. Alright. Let's see what's next. Okay. Let's see. Um, respond. You were responding to Stefan's tweet. You said it's good to be mindful in picking up new technologies. Kubernetes is most useful when you think of Kafka and these stream processing slash event-driven microservices as a single ecosystem. Using Kubernetes to deploy the entire platform is actually simpler than other options. You kind of just said that. You said it on Twitter. You said it here.
I stay consistent.
Yeah. And that's good. So it's nice when you know you're not even thinking of the things you said before. You say it again. So anyway, Derek Troy West replies to you saying we orchestrate Kafka Streams with Kubernetes but weren't confident we could get the stateful broker bits right on a short project. Next time perhaps. What do you think?
Can I say yay to everyone running Kafka Streams on Kubernetes? Like, that's the way it should be done, because it also allows them to get a lot of the benefits of scaling in and scaling out. But it's also funny because I think of Kafka Streams: isn't it stateful?
Exactly. That's what I thought when I saw this tweet. It's, I mean it doesn't have to be, but like almost certainly that's a stateful workload.
Definitely. So we actually had someone on our internal Slack ask about it, and basically we think it's easier to run Kafka Streams as a stateful service because then you get the benefits of RocksDB being managed as your state versus having to recover it every single time. So you could actually go both ways, and I can see someone preferring to start out stateless and move to stateful. But you still get the benefits from actually treating Kafka Streams as stateful because it has state in it.
Yeah. So you do stateful on Kubernetes. I mean, like the old Palmolive commercial said, you're soaking in it, you know, you're already doing it. All right. Let's see. And then, wow, this big long conversation you said that it is. And then um, Michael Gash says for me it's about scaling the engineering org and forcing human operations to be codified and hardened, parentheses, increased robustness by breaking it.
I guess he means by breaking things in the system to make it, to make it antifragile or whatever, reducing the human error factor, which should not be underestimated. Certainly not my case. What do you think?
First of all, I love bringing the human element into it because at the end of the day, operations engineers—we talk about the technology—but at the end of the day, most of it is about the people running it.
The human element will bring itself into the future.
Exactly. Absolutely. Yes. And then the other thing, I love the bit where he says increasing robustness by codifying a lot of the operations, because this is really the big value add of an operator: you take stuff that was maybe inside a person's head and you turn it into code that can now be tested, validated, source controlled, evolved, all those good bits.
And if you think about it, until recently, if you had to configure monitoring or Jenkins or something, someone would go into a UI and just point and click to do it, and if something broke or someone's checked engine, there'd be no source control. There would be no way to say, oh no, we lost our monitoring, let's just do two clicks and redeploy everything. You'd have to painstakingly recreate the dashboards. And now everything has an API.
So there is absolutely zero excuse not to have everything very repeatable, and Kubernetes is kind of, is a kind of a forcing function to do that.
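To make the "codified operations" point concrete, here is a minimal Python sketch of the general idea behind an operator: the desired state lives in code, and a function works out what has to change to get there. This is only an illustration and not how the Confluent Operator is implemented. The names ClusterSpec and reconcile, the fields, and the version strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterSpec:
    """Desired or observed cluster state (hypothetical fields)."""
    brokers: int
    version: str


def reconcile(desired: ClusterSpec, actual: ClusterSpec) -> List[str]:
    """Return the ordered actions needed to move `actual` toward `desired`."""
    actions: List[str] = []
    if actual.version != desired.version:
        actions.append(f"rolling-upgrade {actual.version} -> {desired.version}")
    if actual.brokers < desired.brokers:
        actions.append(f"add {desired.brokers - actual.brokers} broker(s)")
    elif actual.brokers > desired.brokers:
        actions.append(f"remove {actual.brokers - desired.brokers} broker(s)")
    return actions


# The desired spec is plain data, so it can be reviewed and source controlled.
desired = ClusterSpec(brokers=5, version="2.0")
actual = ClusterSpec(brokers=3, version="1.1")
plan = reconcile(desired, actual)
print(plan)
```

Because the plan is computed by a function, it can be unit tested and rerun after a failure, which is the repeatability being described here.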
Yeah. Yeah. It's like the only way to get things done. Now they have to be coded. So there you go. Force. It makes you better. Yeah, it does. It does. Okay. Craig Hooper asks, he says, "I'm constantly like, uh, we're just waiting for this new feature that's in 1.8. Should be out in the next month or so," and then he says, "Wait it's 1.10 already?"
Kubernetes just moves so fast. It's kind of like, you know, while I was kind of (I'm still in the process), but we were writing this big reference architecture for Kubernetes. It'll be out in. Just wait a few seconds. Soon. Yes, and one of the things is that you start reading a lot of resources because you have those questions, right? How do I do X? And you started reading blogs and documentation and it's so easy to.
If you read documentation of two versions ago, hopelessly and completely out of date, you'll have to pay very close attention because this thing just goes. So I'm guessing I, I follow very closely so I may not feel it, but I think after maybe kind of the same way.
Probably feels that way from the outside.
I've, I've heard that from people like they want to keep, keep skipping, gutting, running by and who knows what's going on. It's, oh, it's version 1.1 already. Oh my God. I'm only running 0.9 in production. So basically responding to the entire thread and summarizing it very nicely.
Yeah, she said agree. There aren't many who have gone stateful on Kubernetes successfully. Kafka is no different. It would take significant expertise in both systems (that's Kafka and Kubernetes) to do it right, but there's surely a lot of interest in deploying Kafka on Kubernetes if someone built the right thing, she says, smile. Should have been a winky smile, I think, because yeah, obviously we...
And that kind of sums things up. There is a lot of interest. Yes. It's not trivial. There are a lot of difficult problems there, but we're here to help.
Exactly. All right. Onto some YouTube comments. Uh, let's see. This was a comment on Reading Kafka Data from KSQL. That's a KSQL tutorial video. Uh, Laura asks, is there a way to include all the properties slash columns in a topic into a stream without having to list them out individually? Now I'll take this one because that was me talking in that video, and it was recorded here.
And you are the KSQL expert.
I'm one of them at least. It was recorded here in this very studio, um, and the examples all showed JSON. So if you happen to know that video and refer back to it, um, all of the topic data is using JSON format. And so when creating a stream or a table, we have to be explicit about the metadata. You have to provide field names and types for the columns that you want to be a part of the stream that you're creating, uh, and please go watch the video.
If this is nonsense to you, it's not too hard to pick up, but if it's JSON data, that JSON data has fields, uh, but there's no typing to it. So in KSQL, you have to say, well, the thing called movie ID is a LONG and the thing called release year is an INT or whatever. If you don't like that, and who could blame you, then you can just use Avro and the Confluent Schema Registry. And so if that topic is an Avro topic, then you don't have to do any of that metadata.
KSQL just pulls it out of the Schema Registry for you. Um, and there's a pretty straightforward way to convert JSON to Avro in KSQL, which I can't really read out. KSQL doesn't read well on camera if you're just speaking it, but go watch the video. Watch the video. Uh, I suspect there is a URL that you can type in or click on and you can go see that video.
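As a rough illustration of the difference just described, here is a small Python sketch. It is not KSQL and it does not use real Avro encoding. It only shows where the field names and types come from: in the first case the reader declares them by hand, in the second they are looked up by topic name. The field names, the sample record, and FAKE_REGISTRY (a plain dict standing in for a schema registry) are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

Schema = Dict[str, Callable[[Any], Any]]

# Hand-declared schema: field name -> type to coerce to (hypothetical fields).
DECLARED_SCHEMA: Schema = {
    "movie_id": int,
    "title": str,
    "release_year": int,
}

# Stand-in for a schema registry: topic name -> schema (hypothetical).
FAKE_REGISTRY: Dict[str, Schema] = {"movies-avro": DECLARED_SCHEMA}


def read_with_declared_schema(raw: str, schema: Schema) -> Dict[str, Any]:
    """Keep only the declared fields and coerce each one to its declared type."""
    record = json.loads(raw)
    return {name: cast(record[name]) for name, cast in schema.items()}


def read_with_registry(raw: str, topic: str) -> Dict[str, Any]:
    """Same result, but the caller never spells out the fields or types."""
    return read_with_declared_schema(raw, FAKE_REGISTRY[topic])


raw = '{"movie_id": "294", "title": "Die Hard", "release_year": "1988", "extra": true}'
row = read_with_declared_schema(raw, DECLARED_SCHEMA)
same_row = read_with_registry(raw, "movies-avro")
print(row)
```

In the first call the caller has to know and list every column. In the second call that knowledge lives in one shared place, which is the convenience being described for Avro topics.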
So, wait, I wanted to use this excuse to redirect a question I was asked yesterday. I realized that you are the perfect person to answer it. One of my customers that I talked to yesterday said, you know, you keep recommending Avro and Schema Registry. And there are a lot of benefits to that and we get it, but all the examples are in JSON. Why are all the examples in JSON and not Avro?
JSON because it's easier. Um, so the demos that we make are, my team makes, the idea is, you know, clone this repo, here's this stuff and ingest this into a topic. Um, it's easy to do that, like here's a file, a text file that has JSON in it, and you can dump that into a topic and boom, there you are.
Um, and so for the text readability, yeah, the cloneable package of here's your stuff and type this and this and this, and you get a cool result and you feel good about yourself, which is, you know, the goal of those demos: for you to feel good about yourself and also to learn. But yeah, that's easier. Um, but it's again pretty straightforward, if you have a JSON topic, to create a stream whose value format is Avro, and then KSQL does the conversion for you.
A good idea for a tutorial video. Thank you. All right. Um, that, um, YouTube username is a combination of characters. That's not a name, so I can't read it. Um, but can someone help with this please? Is there a video that explains why each microservice should have its own data store, parentheses, what Mr. Fowler talks about at the nine minute mark? So this is going to be a question on Martin Fowler's keynote from the Kafka Summit London 2018.
So if you're watching this hundreds and hundreds of years from now, find that in the archive, so you know what we're talking about.
You should see it. It's like one of the more enlightening keynotes I've seen and he goes through a real-world example on microservices, and it's really amazing. And the DJ was pretty good too. Yeah, yeah. The, the person who introduced him.
Oh yeah, yeah, no, he was great. The MC, he was solid. You know, you'd have to contact his representation if you want to book him for your event, but no, seriously, I'm glad. I didn't know you liked it. It's good to hear. I loved it. So anyway, yeah, it was a super great keynote. Watch it, check it out.
Uh, what this is getting at is that at a certain point, Martin and, uh, Martin's co-presenter, they're talking about microservices and Kafka as a messaging substrate for the microservices to exchange events and data. And, uh, he made the point that the mutable data store, like an indexable, we'll just call it a relational database, you don't have all these microservices sharing a database because you die that way, right? You don't want that.
They communicate through immutable distributed logs, which is Kafka. And the database, now, they don't get away from needing a database. You know, Kafka is not indexable. You still need some indexable data store, and that can be a relational database, and that gets sucked up into the service. Now, um, there are also some domain-driven design reasons that we won't belabor here. But those have to do with the bounded context.
You know, each service is, ideally, what the DDD folks call a bounded context. There's a particular business thing that that service is doing. And if it shares a database with other services, that context leaks. And so you don't have that conceptual isolation between services. So you still need a database, you just can't share a database between services. So pull that sucker up into the service. It's fine. Maybe it's not even a relational database, right?
Like maybe you need a graph database or you need elastic or who knows what you need. That lives inside the service.
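Here is a minimal Python sketch of that arrangement, under some loudly stated assumptions: a plain list stands in for the Kafka topic, the order events are made up, and the two service classes (OrderLookupService and SpendingService) are hypothetical. The point it tries to show is that both services read the same immutable log, but each builds and owns its own store in whatever shape suits it.

```python
import sqlite3
from typing import Dict, List, Optional

# Shared, append-only log standing in for a Kafka topic (hypothetical events).
event_log: List[Dict] = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]


class OrderLookupService:
    """Owns a private relational store, indexed by order id."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
        )
        self._offset = 0  # position in the log this service has read up to

    def consume(self, log: List[Dict]) -> None:
        for event in log[self._offset:]:
            self._db.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (event["order_id"], event["customer"], event["amount"]),
            )
        self._offset = len(log)

    def customer_of(self, order_id: int) -> Optional[str]:
        row = self._db.execute(
            "SELECT customer FROM orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


class SpendingService:
    """Owns a different private store: a running total per customer."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}
        self._offset = 0

    def consume(self, log: List[Dict]) -> None:
        for event in log[self._offset:]:
            name = event["customer"]
            self._totals[name] = self._totals.get(name, 0) + event["amount"]
        self._offset = len(log)

    def total_for(self, customer: str) -> int:
        return self._totals.get(customer, 0)


lookup = OrderLookupService()
spending = SpendingService()
lookup.consume(event_log)
spending.consume(event_log)
print(lookup.customer_of(2), spending.total_for("alice"))
```

Neither service can reach into the other's store. If the spending team wants a different structure or a new index, they change their own class and replay the log, without asking anyone else.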
Exactly. So in addition to the domain-driven design reasons, there are DevOps reasons to have the engineering team that owns the service own the entire vertical, so they own the infrastructure. If they need a slightly different data structure, if they need to make changes, you know, maybe a new index,
they control their own fate and own their own timelines, versus going to a central database team that owns the whole thing, where you have to, you know, go and beg for an index, and they'll only have time to do it in the next quarter. And then your entire project is delayed. This is not very agile, right? This is not what agile microservices are all about, which is letting small teams own their fate and kind of move fast.
So even though I'm a huge fan of big, complex relational databases that own the entire organization, kind of where you come from, yes. You have to admit that this is not conducive to a small, fast-moving, agile organization.
Fair point. All right. Very good point. So there you go. Hope that's helpful. Absolutely key point. And that's, you know, it's a big part of the story about this, of the broader value of Kafka.
That's basically how Kafka came to be.
Yeah, really is. Okay. Uh, Metcus says, "I wonder if the example," and this is in KSQL streams and tables video, another YouTube question here. "I wonder if the example stream which is Alice giving $100 to Bob can be translated into the corresponding table directly via KSQL, given the fact that the transaction info is embedded in the $100 to Bob value, got some, got some here, $100 to Bob value, or if some in-the-middle application is implied to perform such translation."
So that may not have been clear in the video. And if so I apologize because I was the one speaking there. KSQL is the one doing that translation. So if there's a stream of, of events and these events are things like, you know, Bob is giving $100 to Alice, it is precisely KSQL that's turning that into a table, or specifically it's Kafka Streams machinery underneath KSQL. But um, at the, at this level of abstraction, you can think of it as KSQL. So could there be, could that happen directly?
Yes, in fact, it's so possible that it's actual. So that's actually how it works. So sorry if I wasn't clear on that.
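For anyone who wants to see the stream-to-table idea outside of KSQL, here is a small Python sketch. It is not what KSQL or Kafka Streams runs internally. It only shows the general principle that a table is what you get when you fold a stream of events into current state. The Transfer type, the to_table function, the opening balances, and the second transfer involving Carol are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Transfer(NamedTuple):
    """One event in the stream: sender gives `amount` to receiver."""
    sender: str
    receiver: str
    amount: int


def to_table(stream: Iterable[Transfer], opening: Dict[str, int]) -> Dict[str, int]:
    """Fold the stream of transfers into a table of current balances."""
    balances = defaultdict(int, opening)
    for transfer in stream:
        balances[transfer.sender] -= transfer.amount
        balances[transfer.receiver] += transfer.amount
    return dict(balances)


# Alice gives $100 to Bob, as in the example from the video; the rest is assumed.
stream = [Transfer("alice", "bob", 100), Transfer("bob", "carol", 40)]
table = to_table(stream, {"alice": 200, "bob": 50})
print(table)
```

Each new event updates the rows it touches, so replaying the same stream from the same opening balances always yields the same table.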
So possible that this is actual.
I love it. Okay. Uh, this is a comment on Neha's Kafka Summit London 2018 keynote. So watch it, please, like I can't make you, that's probably good because I would.
But this is very convincing. He says it's a super duper talk. So it's not us saying this. The guys on YouTube, which are clearly more reliable.
Speak. Yes. YouTube comments, definitely more reliable than the two of us. And absolutely nailing the American dialect for our international audiences. "Super duper," you're killing it there. So whatever you're doing, keep doing it. "I'm amazed at the clarity of old world issues and how it can be addressed for the new world focused on stream processing. I liked the whole talk and the KSQL promise. Apart from community, is there some enterprise grade support?" Yes.
You know, we call it Ask Confluent because Tim and I both happen to work for a company called Confluent. And Confluent happens to provide enterprise grade support for not just Apache Kafka, but a very nice ecosystem around it, including a lot of the things you heard. And a lot of the old world to new world digital transformations is exactly what our team specializes in, so we'd love to talk to you.
So yeah, that's the thing. We're mostly about just answering questions here, but this is where you'll get the questions answered for free. If you're interested in support. Honestly, it's super easy. You just go to confluent.io and this little thing will pop up down in the corner. There'll be this friendly face that'll say something like, can I answer any questions for you? Just click on the face anywhere on the face. It doesn't matter and the rest will be history.
All right, so that's all we had for today. We answered a bunch of questions from Twitter, a bunch of questions from YouTube. Thank you so much for watching us on our very first Ask Confluent show. We hope this was really helpful, and if you have more questions about Apache Kafka, about KSQL, about enterprise grade support, about who we are, we're looking forward to getting your questions, and we hope to see you next time.
Thanks for having me on the show. Bye! There you have it. I hope that was helpful to you. If you've got questions you can ask Gwen at @gwenshap on Twitter. You can ask me at @tlberglund, or you can leave a comment on any of our YouTube videos. We will pick these things up and your question might be featured on the next episode of Streaming Audio. And feel free to subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold.
If you subscribe through iTunes, be sure to leave us a comment that helps other people discover the podcast, helps us get the word out, and we appreciate your support. See you next time.