¶ episode and guest intro
Hello, everyone, and welcome to Open Observability Talks. I'm your host, Dotan Horovits, and here at Open Observability Talks, we talk about anything DevOps, observability, and open source, so may the open source be with you. And today we have a special episode. In a recent episode, we covered the rise of ClickHouse as an analytics database. We discussed its design choices,
which actually enable it to be so outstanding in its performance, as you might have heard, and we covered use cases from finance, e-commerce, and marketing to, of course, observability. And my guest Robert shared back then how prominent the observability use case is among ClickHouse deployments. So now ClickHouse is actually doubling down on observability.
Hot news: earlier this year ClickHouse Inc., the company behind the project, acquired HyperDX, a startup that implements observability on top of ClickHouse. And essentially they built, based on OpenTelemetry and obviously ClickHouse itself as the database, a full observability stack named ClickStack, offering a more complete open observability solution. So Open Observability Talks is the perfect stage, and I invited Mike Shi, the co-founder of HyperDX and co-creator of
ClickStack, to tell us all about this new project. Mike is head of observability at ClickHouse, and he brings prior observability experience with Elasticsearch and more. So it's definitely going to be a fascinating discussion. Hey Mike, thanks for joining me. Hello, hello, so great to be here, really excited. Where are you joining us from? I'm in my home down here in Mountain View, California, so right in the Bay Area.
Sounds good. So, a really exciting story. I gave the audience the gist of it, but it's really interesting to hear why you even started HyperDX and maybe why you chose ClickHouse for your startup. Yeah, absolutely. Oh, that's a really fun one. So why did we start HyperDX? Fun fact: this actually wasn't our initial idea. Like many other startups, we were working on
something tangential, also in developer tools, also kind of observability as well, actually. And we were facing this issue where every time we deployed our software for a new customer, we ran into technical issues that we had to solve for them before they'd start, you know, paying us. And this happened over and over again,
and we were using all sorts of observability tools. I come from an observability tool background, so I know a lot of the vendors that exist in the space. We were using a ton of them, and we couldn't get the insight we needed. It still took so long to get to the root cause of any specific issue.
And at some point, we just started building it for ourselves. HyperDX was initially just an internal tool for us. We wanted to build something that worked really well for ourselves. And then over time we kept building. We showed it to a few other folks in the YC community, Y Combinator, which was the accelerator we were a part of. They were really excited by the idea as well. It helped them solve problems a lot faster.
And then we kept going and developing. Eventually we did our open source launch, and a lot of people, I think, were excited about the idea and the concept behind it. And to talk about the Elasticsearch versus ClickHouse part, maybe: I come from a product that was Elasticsearch-based in the past. And then, when building HyperDX, I had a choice of, like, what's the backend storage?
It's a really important question for any observability product. And ClickHouse just seemed like it had all the right knobs for a builder to tune, to build the perfect product, whatever perfect means to you as a builder, right? You can choose your compression, you can choose the storage layout, you can tune all the different indices, the storage versus compute trade-offs, and all sorts of really great tools.
And the underlying technology was just really well suited for observability because it was built for clickstream data, right? That's observability data, almost, effectively. So that's kind of why I went down that path and how we even built the product to begin with. Interesting.
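As a rough illustration of the knobs mentioned here (compression, storage layout, indices), the sketch below assembles a ClickHouse table definition for a hypothetical logs table. The table name, column names, codec choices, and index parameters are all assumptions made for illustration; this is not the schema that HyperDX or ClickStack actually uses.

```python
def build_logs_table_ddl(table: str = "otel_logs_example") -> str:
    """Return a ClickHouse CREATE TABLE statement for a hypothetical logs table.

    Every name and parameter below is illustrative, not taken from a real deployment.
    """
    return f"""
CREATE TABLE {table}
(
    Timestamp   DateTime64(9) CODEC(Delta, ZSTD(1)),  -- per-column compression choice
    ServiceName LowCardinality(String),               -- dictionary-encodes repeated values
    Severity    LowCardinality(String),
    TraceId     String CODEC(ZSTD(1)),
    Body        String CODEC(ZSTD(1)),
    -- data-skipping index so lookups by trace ID can skip most granules
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, Timestamp)  -- storage layout: the on-disk sort key
""".strip()


if __name__ == "__main__":
    print(build_logs_table_ddl())
```

The sort key, the codecs, and the skip index are the kind of storage versus compute trade-offs a builder would tune per workload; different query patterns would likely call for different choices.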
¶ taking the open source path as an entrepreneur
You mentioned that from the get-go, obviously once you pivoted and decided this is your main business, you decided to make it open source, right? This is part of your strategy. I'm curious, as a founder, as an entrepreneur, why you chose the open source path for this. Yeah, absolutely. I mean, I think one of the joys of building your company is that you get to stamp your own opinions on how things are done.
And me personally, I've always been a huge fan of open source software, right? I think as a developer you kind of have to be: when you get started developing, all the tools, everything you use, is open source. And in fact we chose a license, the MIT license, which is maybe even a stronger open source license than some of the other ones. Not by a lot, but definitely a nice open source license. And it's largely because, you know,
me and my co-founder Warren, we have such strong roots in using open source software so successfully, and everything I love to use is open source as well. That's one of the personal reasons. As for our product reasons, open source allows
users to use the product without having to talk to a salesperson, right? I'm not a professional salesperson. I come from a technical background, a product background. As much as I love talking to users, I'm not a professional salesperson. But open source lets people use a product and see if it works for them. And if there's a world where it's like, hey, I want,
you know, part of this product, but maybe commercially supported in some way, it's a really easy conversation. I don't have to go and sell them on why they should be using ClickStack or HyperDX or whatnot. They've already used it and loved it.
And we just show that our product is the best because it's open source, and then hopefully that transitions into a commercial relationship in the future if it works out for them. So I thought that was a really awesome fit for our skills, backgrounds, and interests as well. And
worked out nicely with the acquisition afterwards, because the company that acquired you, ClickHouse Inc., obviously also has its roots very much centered within open source, so the synergy in that regard was very evident. But I'm just curious, before we move on to the technicalities, because I have this conversation day in and day out with young entrepreneurs about whether to go down the open source path or not. With your,
you know, thought process, ideation cycles, maybe even talking to other entrepreneurs in YC and whatnot, what would be your key considerations? Because obviously there is no clear-cut answer, each one needs to do their own math, but what would be your top three ask-yourself-these-questions to figure out what's right for you? Yeah, so I'm assuming it's like, you know, you're thinking, hey, should I open source or not, right? And I would say, for sure,
I think there's definitely a lot of nuance to this, right? I think open source works extremely well for a specific type of product, which tends to be a technical product, right? For example,
if your buyer is completely non-technical, it might not make a huge difference to them whether it's open source or not, because they don't get to take advantage of contributing back or self-hosting it. It's a lot harder for them to take advantage of the open source nature of your product at the end of the day.
So I think that's one big one: hey, is your audience largely technical or not? It doesn't have to be engineers, but folks who are comfortable with the idea of self-hosting, right? That's a huge advantage, as is the idea of maybe being able to contribute or run the software on their own. That's a big one. I think the other one really lies around:
are you also comfortable with the idea of having competition, right? At the other extreme, when you create something as open source, the biggest fear of course is that someone can take your code and run with it, right? I think that's happened a bit in the industry, unfortunately. And are you comfortable with that idea as a founder? I think for us it's like, hey, we believe in our ability to execute
as a product, and that even if it's hard-forked somewhere else, we'll continue to build a product that does really well in that regard. And, you know, having that self-belief. And I think the third thing is what I mentioned earlier: hey, do you think that
you want to be selling to these people in a more traditional sales manner, right? Open source is a whole different way of having people try your software, and that introduces a whole different commercial aspect to your startup.
And maybe if you think you're great at sales, maybe you don't need open source, you don't want to do open source, right? You're probably not comfortable with it as well. But if maybe you come from a technical background or a product background, and you want your product to be the thing that sells the user, I think open source is a great channel for that. Yeah.
Yeah, it's a sort of product-led growth, which obviously exists also outside of open source, but open source definitely lends itself very well to PLG, to product-led growth. So it's a very, very good point to think about what your skill sets are and where you want to grow them. So yeah, it's a good tip. So, yeah, go ahead. I'm sorry, I was going to say, one thing I actually totally forgot that I think is super important as well
is that another benefit of open source is the community. A big part of what makes us successful is that we get all this really great feedback and even contributions
from folks who have different use cases, all of which we could not possibly be building in-house. And I think that's, again, another one where it's like, hey, you have a really complex product with a lot of use cases you need to satisfy, and again open source helps, because there is a community, which you have to invest in, of course, but they will help you back as well. And I think that's a really great part of building in the open.
Yeah, for me that's like the number one, because you really get the force multiplier on the innovation around the product that you can never get with your headcount. Even if you're much, much bigger than a startup, it's still true when you get the community. Think about Prometheus, or also ClickHouse, I think, which is today very, very mature in terms of that. So you get this diversity and innovation that comes from directions that you wouldn't even expect. So that's amazing. Exactly.
So let's go back to what you've done. So you talked about the origins of HyperDX. And just to summarize, essentially, you took the backend based on...
¶ the HyperDX observability user experience
You started sharing about the decision to use ClickHouse as the foundation, the backend, and you essentially created the frontend experience, the visualization experience, right? Yeah, I think that's a great, like, maybe simplification of the product, for sure. Yeah.
So, no, actually I want you to guide us. Oh, yeah, it has a lot of aspects, from searching to charting to alerting and whatnot. So give us this experience that you were trying to create, and also start detailing these gaps that you couldn't actually find answers for,
as you said, out there with the existing toolset. Yeah, absolutely. So I think for us it's all about how we go from a problem an engineer is facing to the clues they need to find that root cause, right, at the end of the day. And that's really the journey that we've thought about the most from the beginning. I think this is actually really different from
a lot of the other tools that we've used in the past, because they all think, I think, a bit along the lines of, hey, it's a logging product, or hey, it's a tracing product, or hey, it's a metrics product. It's really signal-centric. Versus for us, it's very intentional that you don't see a logging part of the app. There's no logs and metrics and APM or traces part of the application, and that's really intentional, because
we want you to think about, hey, are you searching for something, right? So we have a search experience. You go in, you type, and you can easily search across logs and traces in the same simple syntax at the end of the day.
Once you find one of those data points, it'll be correlated automatically, via the power of OTel, of course, right? If you have your logs correlated with traces, you'll have this wonderful waterfall that includes both your traces and your logs embedded in the same UI. And then it's also
correlatable with your session replay as well, right? Usually when you're building a web-based application, you will have a user, they'll have some user identifier, and they might say, hey, I'm running into some issue, can you help me out? It's really easy to go find that specific user identifier, view what they were doing to repro the steps, and be able to find, like, oh, it was that specific API call, that 500. You click into that API call, that 500,
go into the backend, and see the backend traces and the logs associated with it. And then you're able to say, ah, that's the part of the application that failed for this specific user. I can now go back and solve for that. And that's really been the kernel of the problem that we've been solving.
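The correlation described here, where logs and spans that share a trace ID end up in one waterfall, can be sketched roughly as follows. This is a minimal in-memory illustration, not how HyperDX implements it: the `Event` type, its field names, and the `build_waterfalls` helper are hypothetical, and a real system would do this grouping in the database query.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified telemetry record (hypothetical shape for this sketch)."""
    kind: str          # "span" or "log"
    trace_id: str      # shared trace context that ties the signals together
    timestamp_ms: int  # event time in milliseconds
    summary: str


def build_waterfalls(events):
    """Group events by trace ID and order each group by time."""
    by_trace = defaultdict(list)
    for event in events:
        by_trace[event.trace_id].append(event)
    return {
        trace_id: sorted(group, key=lambda e: e.timestamp_ms)
        for trace_id, group in by_trace.items()
    }


if __name__ == "__main__":
    sample = [
        Event("log", "t1", 120, "payment declined"),
        Event("span", "t1", 100, "POST /checkout -> 500"),
        Event("span", "t2", 50, "GET /cart -> 200"),
    ]
    for trace_id, timeline in build_waterfalls(sample).items():
        print(trace_id, [f"{e.kind}: {e.summary}" for e in timeline])
```

The point of the sketch is only that a shared trace context lets spans and logs interleave in one timeline, so an engineer can read the failing API call and its log lines together.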
And of course, over time, we're building more and more things like, hey, how do we connect to the infrastructure? How do we make sure the pod is healthy, that the pod hasn't been rapidly restarting or running into some sort of memory pressure or whatnot? And we're solving more and more problems over time. But that's been the kernel of the experience we built from day one. So maybe just to make sure that I understand: we talked about a unified experience across
different signals, and you also already mentioned things that are more, let's say, client-side. Starting from the client-side interaction, session replay is a very good example of observability on the client side, all the way to the backend and so on. So if you try to detail that, what elements of this observability are there,
maybe other signals that you incorporate, for example profiles or others? Give us the full gist of what goes into this unified observability. Yeah, no, that's a really good question. I think, at the end of the day, the signals will be... you know, is RUM a signal? I'm not too sure. Client-side tracing and logging, what do you call that, RUM? You have your sessions, so that's kind of like
DOM snapshots and deltas, right? You have your traditional backend logs, your traditional spans, your traditional metrics as well. And do you count exceptions as their own signal, or are they really just span events? Admittedly, I think there's a lot of blur between these. I understand from one perspective why they have different names, but on the other hand it's also kind of like, hey, they're all
events. I definitely subscribe to that train of thought as well. Yeah, which I guess also lends itself well to the nature of the underlying storage, which is ClickHouse, obviously, because it's a columnar data store in essence. So, you know, the wide event
is, by the way, something that I've been discussing here on the show already in the past. We've heard that also from Honeycomb, with Charity Majors here with me on the episode concluding 2024, and one of the things that we talked about is obviously this topic as well. So
it is something that I see keep coming up in discussions. It obviously has its benefits, being able to uniformly put all the types of signals in one, I guess, wide event, in a sense. Obviously it also has its drawbacks, but I definitely see where it's coming from. And there is also the tie-back to mapping that to the backend storage. But I'm curious.
¶ challenges in implementing observability directly on ClickHouse
HyperDX brought this value add, and we're now leading up to ClickStack, which is essentially the new stack of ClickHouse combined with HyperDX and OpenTelemetry and so on. But let me ask about a step before that, because people have been using ClickHouse for observability for quite some time now. We talked about it in the previous episode that I already mentioned, with some names that I've actually had on my show here.
A gentleman from Shopify, Elijah McPherson, the engineering director of real-time storage systems, shared how Shopify rebuilt their observability from the ground up
in-house. They actually tore out vendors and built it homegrown, and one of the pieces of this very, very pretty stack that they brought up is ClickHouse. And I'll share the link with the listeners in the show notes, by the way, so you should listen to that fascinating episode. So the question is:
Shopify took ClickHouse and built their own observability. Why can't every organization essentially do that? Why not build directly on top of ClickHouse? Yeah, that's an excellent question. You know, there are a lot of teams that definitely do that, right? In fact, I think it was a week or two ago that we were at the Netflix office, and they were talking about their logging platform built on top of ClickHouse as well, which is
always fantastic to hear about. I think the challenge there is that it's certainly a lot of engineering effort to turn a database into an observability platform. Certainly ClickHouse has made it a lot easier, and, you know, I'm biased obviously, but I think it is currently the best data store for the problem. There are obviously edge cases and all that, but I think it is a really great platform to build on top of.
But the reality is that there's still some difficulty going from a data store to a whole platform. There are things like tuning your indices, tuning your schema, and building something that also optimizes the searches as much as possible and creates an experience that users are used to, right? Even simple things like, hey, how do we make sure we have good autocomplete?
Those don't come out of the box with ClickHouse. ClickHouse is a really great database, but it doesn't have, you know, field value autocomplete out of the box. So you have to create your queries and think about how you optimize them to create a really low-latency, fast experience for your users.
And then there's also piecing together the workflows and all that. So there's a bit of investment that goes into it. There are certainly teams that do go down that route, and that's absolutely fantastic. And there are other teams who, you know, have engineers who might not be able to put in the
Shopify spent a lot of time building their platform, I think. They talked about that themselves as well. So I think not every company has the ability to invest in that. Some people just need, hey, a product that'll just work out of the box. Maybe we'll have some stuff on our end, but it isn't, you know, a huge investment of custom engineering on top.
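To give a feel for one of the platform pieces mentioned above, field value autocomplete, here is a minimal sketch of prefix completion over a sorted list of known values. It is only an in-memory illustration under stated assumptions: the `autocomplete` helper and the sample service names are hypothetical, and in a real platform the candidate values would come from a query against the data store, which is where the optimization work described above comes in.

```python
from bisect import bisect_left


def autocomplete(sorted_values, prefix, limit=5):
    """Return up to `limit` values starting with `prefix`.

    `sorted_values` must already be sorted; binary search finds the first
    candidate, then we scan forward while the prefix still matches.
    """
    start = bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if len(matches) == limit or not value.startswith(prefix):
            break
        matches.append(value)
    return matches


if __name__ == "__main__":
    # Hypothetical distinct values of a "service name" field.
    service_names = sorted({"cart", "catalog", "checkout", "payments"})
    print(autocomplete(service_names, "ca"))
```

Keeping the values sorted makes each lookup a binary search plus a short scan, which is one simple way to keep the typing experience responsive.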
Yeah, makes sense. And I think in a previous discussion that we had, you emphasized the smaller companies, small and even medium companies, that don't have the luxury of putting in this amount of headcount and expertise to really maintain their own, and they need something simpler and readily available. So I think this is... although, you know, even some large corporates, in terms of their IT skills,
I know they're big as a company, but they prefer to have something that is more readily available and not put the investment there. So I think it makes perfect sense. Which leads us now to the current thing. So give us the elevator pitch on ClickStack, as you present it, as you view it today. Yeah, I mean, ClickStack, I mean, just
¶ intro to ClickStack and incorporating OpenTelemetry
The short pitch is that it's an open source, high-performance observability platform, and it's built on top of ClickHouse, right? And that's the short story of it. I think if you go deep into it, it's a lot about how
we're building a really engineering-focused experience, a really focused experience around, as I was talking about earlier, how do we present the clues, how do we help you surface the data that you need to go from that incident or that bug to that root cause at the end of the day. And a lot of love and effort goes into making sure that's a smooth path, as opposed to maybe a signals-focused path, as part of the pitch.
And I think if we go a step further, and you know, our audience is very familiar with the observability stack, so if we break it down, the pieces are obviously the HyperDX frontend, the very, very intelligent frontend that you've put together as part of your startup HyperDX; ClickHouse, which we've been talking about, the performant data store; and then there's OpenTelemetry,
right, as the third piece of this puzzle. Exactly. So can you tell us a bit about this part? And you'd also already made the choice for OpenTelemetry in HyperDX, right? So this is something that you've done early on. So maybe by now, and we're talking in 2025 when we record this, OTel is already fairly established and it would be less of a question. But I'm actually curious, going back then,
what made you choose OTel? Maybe what kinds of dilemmas or debates did you have in going down this path? If you can share with us about OTel back in the day, rewind to that decision point. Yeah, absolutely. I mean, I think OpenTelemetry is such a game changer in the observability space, right? And there's a reason why all the folks that we talk to, whether it's users or vendors, we're all talking about OpenTelemetry
at the end of the day. Now, when I made the decision, many years ago at this point, to invest heavily in OpenTelemetry, the project was definitely a bit younger. Not as young as when I first remember the merge between the projects, but
still quite early. OpenTracing and OpenCensus, for those who don't know which merger you're referring to. So essentially OpenTelemetry, just so our audience knows, started as a merger of two other open source projects, or standards, the specifications that preceded it. So that's back to the old days. Yeah, exactly, exactly, right. And, you know, it's quite a new project in that regard. But of course OpenCensus and OpenTracing had done great work before as well. Now,
when we made that decision, I think there were a number of reasons to go down the path. One is that open source and standards really go hand in hand, right? The more we can take advantage of standards that everyone is moving towards, the stronger the ecosystem becomes. And certainly we wanted to be a great player in the ecosystem. And of course, the reason we built this tooling is also supporting, providing another way to do observability
in the OpenTelemetry ecosystem, right? And as a builder as well, it creates a lot of standardization that, frankly, we don't want to be in the business of thinking about, right? We don't want to create our own semantic conventions. We don't want to think about, hey, like,
how do we name these different properties? To some extent, the community being able to coalesce around this set of standards makes it really easy for us to know, how do I know what's the pod ID? Like, hey, it's under the k8s pod ID, right? So those semantic conventions make it really powerful, because when we talk about the story of unifying telemetry, OpenTelemetry helps unlock a lot of that, right, starting from the semantic conventions, so that we know,
hey, when this log references this k8s pod ID, versus this metric, versus this trace, we know that those are all referring to the same exact ID. And the other part is, of course, the API, the SDK API, right? It allows you to merge between your logs and your metrics and your traces, because it's all being emitted through the same SDK and it has the same context. There's a lot of power behind what they've built in OpenTelemetry.
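The value of a shared attribute name across signals can be sketched as below. The sketch assumes the pod identifier being referred to is the `k8s.pod.uid` resource attribute from the OpenTelemetry semantic conventions; the record shape and the `signals_for_pod` helper are hypothetical and only illustrate why one agreed key makes cross-signal lookups trivial.

```python
# Resource attribute key from the OpenTelemetry semantic conventions.
# Assumption: this is the pod identifier the conversation refers to.
POD_KEY = "k8s.pod.uid"


def signals_for_pod(records, pod_uid):
    """Return every record, of any signal type, emitted from the given pod.

    Each record is a plain dict with a "signal" name and a "resource"
    attribute map (a simplified, hypothetical shape for this sketch).
    """
    return [r for r in records if r["resource"].get(POD_KEY) == pod_uid]


if __name__ == "__main__":
    records = [
        {"signal": "log", "resource": {POD_KEY: "pod-a"}, "body": "OOM warning"},
        {"signal": "metric", "resource": {POD_KEY: "pod-a"}, "name": "memory.usage"},
        {"signal": "span", "resource": {POD_KEY: "pod-b"}, "name": "GET /cart"},
    ]
    print([r["signal"] for r in signals_for_pod(records, "pod-a")])
```

Because logs, metrics, and traces all carry the same key, a single filter works across signals without per-signal field mappings.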
Of course, I think there have been detractors over the years as well. I think there are definitely areas where the project continues to improve, but it's honestly amazing in terms of what it's unlocked. And I think that's why everyone across the industry is adopting it. It's just a net gain when everyone builds into this ecosystem.
Yeah, and to what you said I would add even more: I think the real power of open source starts with the open specifications and the open standards. I've been working for teams and groups and even vendors that had open source tools, but I think really having this common language, this common specification, is something very, very powerful that brings us together. It also simplifies things for the vendors, as you said, because as a vendor,
think about the number of integrations that the old-school APMs needed to maintain, the Dynatraces of the world, the AppDynamics and the others, these armies of engineers just to maintain this whole variety: okay, when I collect the telemetry from, I don't know, Kafka, and when I collect from Redis and from Apache and from this and that, each one has its own spec, and you need to support everything because you want to be universal,
especially in observability. That's the point: you're the nexus point where you show everything. And when you have the standard, it simplifies. So I definitely see it, and actually just today I posted on LinkedIn, and I'm
a CNCF Ambassador and a passionate ambassador of OpenTelemetry as well, about OpenTelemetry announcing the RPC semantic conventions stabilization project. And I shared how happy I am that finally, you know, whether it's gRPC or JSON-RPC or Apache Dubbo or whatnot, the goal is to actually have one unified set of semantic conventions, one lingua franca across all these things. So I'm really happy to see that, and great work, by the way, by the semantic conventions SIG, the special
interest group. So this is definitely something that I look into day in and day out. I myself am a co-lead of the semantic conventions for the CI/CD side of the house, which is, you know, also talking about the release pipelines, not just the production monitoring. So there's so much to do, so much to tackle.
We're now looking into tagging resources, like how do you tag the resources to say that this is an environment, this is the prod environment or the dev environment, and so on. Today you have the cloud-vendor-specific terminology that they put in, but then if you work multi-cloud, you need to start translating, and it doesn't necessarily translate well. So there is actually another group, a TAG, a technical advisory group, that is looking to propose ways
to tackle this one as well. So there's so much to be done, but I think this is really the power of things. And maybe another anecdote, just from Open Observability Summit, the first Open Observability Summit that we had as part of the CNCF, which just took place in June in Denver: one of the keynotes was actually by a keynote speaker from Datadog, a
well-established vendor in the monitoring and observability space. That was a sponsored keynote, and the topic they chose was the power of semantic conventions. And I was really, really happy to see that. I said, okay, this is what they chose to spend their time on. This just goes to show how
universal it is today, also amongst vendors that used to make a lot of money out of exactly that, these agents and all the intelligence there, and invested a lot of engineering effort around that, realizing that. And that was a director of engineering, someone who definitely has weight to what he is saying. It was really amazing to see that, and it echoed well with the audience there, of course. So,
you said something very important here. I believe that you've taken the right step, but maybe fast-forwarding to today: where do you see OpenTelemetry from your perspective, as someone building on top of it and integrating it into your product, including also the gaps that you still see, where you would like to see the project improving or taking additional directions? I'm curious to have your expert view on OpenTelemetry.
Yeah, absolutely. I mean, I think another advantage of OpenTelemetry we didn't talk about yet is that it gives users the option of choosing vendors, right? We've moved to a world where users instrument first with OpenTelemetry,
and then they get to pick. You know, like, hey, I'm going to go bake off your product against three other vendors, right? And at the end of the day, that just means that the user gets, hopefully, the best option, because they get to try everything on equal footing, right? With OpenTelemetry there's no excuse like, oh, you know, it's a different agent or whatnot. It's the same data, it's the same semantic conventions, it's the same payloads
being sent out to these different vendors. And it's just about, hey, how well does that work for their use case, and it gives them a lot of options. On our end, when it comes to the work with OpenTelemetry, a big part of it is, of course, us maintaining the OpenTelemetry exporter into ClickHouse. There are certainly always thoughts in our heads like, hey, how do we make this process more efficient, right? On the ingestion side alone,
writing bytes into a database isn't free, and processing and filtering isn't free, right? So there's definitely a bit of work that we want to continue improving on there. And then it also ties into some of the changes at the data store level. We haven't talked about JSON yet, maybe we'll do that later in the segment, but there are a lot of changes needed to support a new data type, right? And I think
there's a lot of interesting work in the ecosystem around columnar storage formats, or columnar serialization formats, for telemetry data in the OpenTelemetry ecosystem as well, which is also a very interesting part that we want to keep on top of. I think, as a vendor building in this space, one of the areas where we want to help contribute to the ecosystem is continuing to improve the ease of use and the ease of adoption of OpenTelemetry.
I think that's probably one of the trickiest parts today: going from not knowing much about OpenTelemetry to having it fully instrumented still takes a decent amount of knowledge and learning. And it's not as easy as some folks might want,
right? I think that's been reflected in the surveys: hey, documentation can always be improved. That's true of every tool that's complex, of course. But I think, maybe objectively at this point, we've seen that lots of folks find issues with the documentation.
And of course, every month when I go back, it has gotten better. And that's really fantastic. And we're looking forward to hopefully be able to contribute to that as well. And then also, of course, on the installation side, it's also gotten better every single year. There's always a new, better way to auto instrument.
um or to bring in your data or to simplify the the sdk installation process but of course there's still steps to go as well so we want to make it sure that's even easier to adopt open telemetry as an organization and then give some the flexibility to then choose Hey, what's the best product for them? And hopefully, you know, it's what we're building or what we've been spending our time on as well. Yep.
Definitely resonates. A lot of work is being done, and compliments to the folks who are doing that, including localization and translation. There's such a big audience out there that isn't native English speakers, so making it accessible to that audience as well. There's so much, but yeah, it's such a massive project that covering it all, and also moving fast enough with the pace of innovation, that's a challenge. Calling on our listeners: if you are
passionate about one of the areas of OTel and you see gaps and you feel that you can contribute, even by fixing documentation or even opening issues on things like that, that'd be amazing. So contribution doesn't have to be code, it could even be documentation. It's actually a good way
for folks who are newer to open source to dip their toes in. So that's a good starting point. Another thing that is interesting: you told me about the shift that you made. In the original, let's say v1, of the product, the
¶ balancing simplicity and flexibility
product approach was more about abstracting the underlying ClickHouse to give sort of the most sterile experience, whereas now you've reached a different approach, or conclusion: that going a bit deeper, with the underlying database being more accessible, has its own benefits. Do you want to talk about this a bit?
Yeah, absolutely. So I think one of the things with an observability product, as I mentioned earlier, is that observability is a field where there are all sorts of different use cases, right? Some folks aren't even observing software, they're observing hardware, right? Things that run on the streets or in a factory. And all those different use cases end up having
specialized needs, specialized questions, and specialized workflows that one tool probably can't be the best fit for from the very beginning.
And that's one of the things that we ran into. As we talked to more and more people about more and more use cases, eventually we had, like, hey, there are all these different features we have to build. Maybe one big one would be, hey, how do we do on-the-fly parsing, right? Like, I wrote an unstructured event, maybe it's a log, and now I need to regex out some information and do some charting and aggregation on top of that.
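As a rough illustration of the on-the-fly parsing problem described here, the sketch below pulls a duration out of free-form log text with a regular expression and averages it per operation. It is plain Python rather than anything from ClickStack, and the log format, the pattern, and the function name are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical unstructured log lines; the format is invented for illustration.
LOG_LINES = [
    "2024-05-01T10:00:01Z checkout completed in 120ms for user=alice",
    "2024-05-01T10:00:02Z checkout completed in 340ms for user=bob",
    "2024-05-01T10:00:03Z search completed in 45ms for user=alice",
    "2024-05-01T10:00:04Z cache warmed",  # no duration, so it is skipped
]

# Pull the operation name and the duration out of the free-form text.
PATTERN = re.compile(r"(?P<op>\w+) completed in (?P<ms>\d+)ms")


def average_duration_by_operation(lines):
    """Extract (operation, duration) pairs and average the durations per operation."""
    totals = defaultdict(lambda: [0, 0])  # op -> [sum_ms, count]
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        bucket = totals[match.group("op")]
        bucket[0] += int(match.group("ms"))
        bucket[1] += 1
    return {op: total / count for op, (total, count) in totals.items()}


if __name__ == "__main__":
    print(average_duration_by_operation(LOG_LINES))
```

The extraction and the aggregation happen at read time, so nothing about the log format has to be known when the event is written.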
That's something where, if we were to continue abstracting the database, it's another feature we'd have to build, and quite a big, complex one at that. Versus when you're building on top of a powerful data store: ClickHouse supports SQL. A lot of folks already know SQL, it's really well documented, and it's extremely flexible. You could write whole programs. In fact, one of our colleagues on our team wrote an entire RISC-V emulator
in ClickHouse SQL. So it really shows you how complete the language is. And that goes to show, like, why are we spending so much work abstracting away the database when in fact one of the superpowers is that you can query the database directly? If you need to go down to SQL, we should let users do that, and be able to do that safely on top of ClickHouse.
And that's one of the big revelations that we've had: we don't need to build a super complex product with a specialized feature for every single use case if we give users the flexibility of SQL. Like, hey, when you have a difficult problem, you can drop down to SQL,
use that language that you're probably already familiar with, and then get the data you need, as opposed to waiting for us, or having to implement your own feature in open source and submit a PR that way. So that was the big shift that led us in that direction. Which I think also relates, maybe, and, you know, I used to be a product manager in a past life, so it really resonates with me, because there's always this debate whether it should be yet another
UI experience, like visualization, something to do with point and click and things like that, or whether it should be more, in this case, querying: just write your query and investigate your data in that manner. So visualization versus querying the data, what's the best way? And actually, there is no one best way, as you said. Exactly. It depends also on the persona. Someone who's more versed in SQL may feel more natural actually querying the data, and it's
more cumbersome for them to do the visualization. Whereas for someone who's not familiar, actually, the point and click will be the easier experience. So you need to cater for the novice and the versed. It's very difficult to get one platform to serve them all, as you said. So that's an interesting one. And also you mentioned SQL, and that brings up another interesting question, because ClickHouse is known for this,
¶ SQL vs. Lucene query languages
I guess, strategic choice of aligning with SQL, which is a whole different debate out there among the storage engines. What's the right thing?
But zooming into ClickHouse, one of the novelties that you brought with you from HyperDX is actually the Lucene side of the house. So now users can have the Lucene search experience. Lucene, for those who don't know, is the native query language from the Apache Lucene project, and people also know it from Solr and Elasticsearch and OpenSearch and other projects that are built on it.
So today you provide both experiences, right? There's the SQL analytics experience and the Lucene search experience. Can you tell us a bit about
why Lucene, why SQL, maybe how you see them, and what's the best practice that you'd recommend and guide users on in that regard? Yeah, absolutely. So you're totally right. And that's one of the things you mentioned earlier: there's no right answer between having a nice, easy-to-use visual experience versus a power user, kind of more of a bare-metal experience, maybe. In this case that'd be SQL.
But talking about our choice of using Lucene, that's one of the things we wanted to do because, admittedly, there are times where Lucene is just going to be a lot faster, right? And it's also a higher-level language than SQL, realistically speaking.
So, for example, in Lucene, you can just type in the word error, hit enter, and you'll get your error logs. In SQL, you can't type in the word error, hit enter, and get your error logs. It's a little bit more verbose than that, right? But then
also, one of the benefits is that when you type in this high-level language, we can actually make some optimizations under the hood. When you type in the word error, it's like, hey, we're going to do a token search by default. We're not going to do a substring search, right?
Most people who are using SQL are probably used to typing LIKE or ILIKE and then doing the percent, word, percent. And that's a substring search. That's a bit more expensive. That's actually quite a bit more expensive, depending on how you build your database, compared to a token search, where you can
tokenize and build some indices on top of that ahead of time. So it gives users a higher-level language to get started really easily, and it also gives us the ability to help optimize for the user, specifically for the observability use case, right? The other thing, too, is that I think everyone uses Lucene syntax. If you're using Google, you're more or less using a similar syntax, not an exact one, right?
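The difference between a token search and a substring search can be sketched in a few lines of Python. This is only an illustration of the idea, not how HyperDX or ClickHouse implement it: the tokenizer rule, the sample logs, and the function names are assumptions. The token search looks a whole word up in an index built ahead of time, while the substring search, roughly what `ILIKE '%error%'` does, scans every line.

```python
import re
from collections import defaultdict

# Hypothetical log lines, invented for illustration.
LOGS = [
    "error connecting to database",
    "request finished without errors",
    "user terror_fan logged in",
    "Error: disk full",
]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_token_index(lines):
    """Inverted index built ahead of time: token -> set of line numbers."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for tok in tokenize(line):
            index[tok].add(line_no)
    return index


def token_search(index, word):
    """Look the whole word up in the index; no scan over the raw text."""
    return sorted(index.get(word.lower(), set()))


def substring_search(lines, word):
    """Rough equivalent of ILIKE '%word%': scan every line for the substring."""
    needle = word.lower()
    return [i for i, line in enumerate(lines) if needle in line.lower()]
```

With these sample lines, the token search for `error` returns only the two lines where it appears as a whole word, while the substring search also matches `errors` and `terror_fan`.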
And admittedly, ours is also a Lucene-like syntax. It's not completely true to the language. But if you're using Google, it's a very similar experience to what you're using already. So basically everyone who's used Google is probably going to be comfortable using Lucene syntax, and therefore hopefully comfortable using what we built with HyperDX as well. Nice, nice. Another element that is very
¶ performance, cardinality and the new JSON type
relevant, and very prominent in ClickHouse, and again, we talked about it a lot in the previous episode, is performance. Performance is central to ClickHouse. It looks like it's, you know, design choice number zero. Yes. Feature number zero. And it's very, very clear, and obviously this is why most people who hear about ClickHouse first hear about it in this context, which is great. And the question is:
in the context of observability, you've done significant work as well, focusing squarely on performance on the observability side. Can you share a bit about that? Yeah, absolutely. So I would say, maybe zooming out, why is ClickHouse so good at performance for observability in general, right?
And why is ClickHouse now this kind of big name that a lot of vendors are using? I'd say it's because observability has shifted its problem statement over time, right? Well, I think we've also shifted the word over time: you go from log management to monitoring and then observability and all that. I won't get into that right now, but basically,
as systems got more complex and as we're collecting more data, it turns out that being able to understand trends, right, and to be able to have a high-level view of a complex system,
is a lot more important than shelling into one VM and then tailing the log from that specific VM, right? So we've changed how we view the problem of reliability on these kinds of systems. And therefore, as you're trying to get a higher-level view, it turns out that going for individual events is a little less relevant, and being able to aggregate across millions or billions of events becomes a lot more pertinent to the problem.
And ClickHouse was born from that mindset, right? How do we analyze billions of events and build aggregations on top of them for analytics purposes? And observability has kind of converged onto that problem set. ClickHouse itself has already been really powerful because it's a columnar database to start with. So instead of having to scan everything, well, usually if you emit a wide event, you're not looking for every single property of that event, right? Maybe it's like, hey, I'm filtering by
the user ID, by this specific region, maybe even by this pod identifier, right? So you're only pulling out specific subsets of the data, and the columnar layout allows you to query that really efficiently.
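To make the columnar point concrete, here is a toy Python model that counts how many values have to be read to filter wide events by region under a row layout versus a column layout. It is a deliberately simplified sketch: the event fields are invented, and "values read" is a stand-in for I/O, not a measurement of any real database.

```python
# Hypothetical wide events; field names are invented for illustration.
EVENTS = [
    {"user_id": "u1", "region": "us-east", "pod": "api-1", "status": 200, "duration_ms": 12},
    {"user_id": "u2", "region": "eu-west", "pod": "api-2", "status": 500, "duration_ms": 340},
    {"user_id": "u1", "region": "us-east", "pod": "api-1", "status": 200, "duration_ms": 15},
    {"user_id": "u3", "region": "us-east", "pod": "api-3", "status": 200, "duration_ms": 9},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per field (a columnar layout)."""
    return {field: [row[field] for row in rows] for field in rows[0]}


def count_in_region_row_store(rows, region):
    """Row layout: every field of every event is read to check one of them."""
    values_read = sum(len(row) for row in rows)
    matches = sum(1 for row in rows if row["region"] == region)
    return matches, values_read


def count_in_region_column_store(columns, region):
    """Columnar layout: only the 'region' column is read."""
    region_column = columns["region"]
    matches = sum(1 for value in region_column if value == region)
    return matches, len(region_column)
```

Both functions find the same matches, but in this model the row layout touches every field of every event while the column layout touches one value per event. The wider the event, the bigger that gap.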
You're querying only the subset of data that you need. And then ClickHouse, of course, under the hood, as you mentioned, performance has been the hallmark trait of it. And, I'm not a database expert, I'm from the observability side, right, but there are a ton of
optimizations being made to be able to leverage specific CPU architectures, or the vectorization and the SIMD abilities of those CPUs, to be able to process batches of data at once, right, in parallel, as opposed to processing event by event. And that, of course, leads to speedups as well. So the data store is really well built for it. And then it's about how we make sure it works really well for the observability use case,
right? And that's about, how do we tune the right schemas? How do we make sure that the right data is cached at the right layers? It has great things like, you know, you can separate data that sits on NVMe from data that sits on S3, which,
you know, it's really popular to store data on S3 these days, or the object storage of choice from your cloud vendor, right? You can split that with ClickHouse. You can choose what indices you want to build, and how much storage you want to spend on a specific index.
Take the tokens that we talked about before. You want to build a token index, but you only want it to be a certain size, with a certain number of hash functions for the Bloom filter. There are a lot of different things that we tune, and there are still more improvements actually coming along the pipeline. There are still more optimizations and more clever ways we're thinking about how to make it more and more efficient, specifically for observability, that we're excited to share.
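The two tuning knobs mentioned here, the size of the filter and the number of hash functions, are easiest to see in a small Bloom filter. The Python below is a minimal sketch, not ClickHouse's implementation: the class, the SHA-256 based hashing, the whitespace tokenizer, and the default sizes are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter; size_bits and num_hashes are the tuning knobs."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, token):
        # Derive num_hashes bit positions by salting one hash with a counter.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(token))


def build_block_filter(lines, size_bits=1024, num_hashes=3):
    """Index every whitespace-separated token of a block of log lines."""
    bloom = BloomFilter(size_bits, num_hashes)
    for line in lines:
        for token in line.lower().split():
            bloom.add(token)
    return bloom
```

A filter like this never misses a token that was added, so a negative answer lets a reader skip the block entirely. A smaller filter or a poorly chosen hash count saves storage at the price of more false positives, which is the trade-off being tuned.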
I'm happy to talk about that all day as well, but maybe that's a good enough high-level overview. No, no, you gave us a very good one. And you talked about the things that are coming up. One of the things that caught my eye was the JSON type that you alluded to earlier on. And I think, first, this is a classic move, and I think a very clever move in terms of performance, and also one that answers, or hopefully answers, I haven't tried it myself,
that's right, but at least the mission statement does, a concern I had with the popularization of the columnar wide event approach that I mentioned. The advantages are clear with everything you stated: everything is together, and then you query only the relevant subset. But I've also seen implications in performance, in typing, and so on. And
one of the things that I was curious about back in the day, when I saw different vendors and different projects going down this path, is how do you address that? And I think one of the smart moves you've made, with the JSON type, addresses that. So I'm actually curious for you to share with the
audience a bit more about this. Yeah, absolutely. So the JSON type, for folks who don't know, is basically trying to solve the problem that your logs, your traces, your telemetry data is effectively, even if
you're structuring it in JSON or whatnot, right, going to be semi-structured. It's going to have new properties pop up, and properties won't exist in every single event. And in a columnar database, in a rigid, maybe traditional, columnar database, you'd have to define a column per property,
more or less, right? So if you have properties that show up in one in a billion events, you're going to have a column that's not used very often, which isn't fantastic from a performance perspective. Now, in ClickHouse, JSON wasn't the only way to store these kinds of dynamic properties. We had something called a Map type, but the performance was more or less linear. As in, the more properties you add to it, the longer it takes for us to scan through that column.
The JSON type basically adds some more intelligence in the database about sparse properties. So basically, say you have maybe 5,000 properties, but most of them don't exist for every single event. What we'll do is, hey, for the most common properties, and you can choose as a setting either a thousand or a hundred or however many properties, those get turned into actual columns. And
ClickHouse will actually try to figure out which properties make the most sense to turn into a column by looking at, hey, does this property exist a lot? If it exists a lot, it should be turned into a column. If it doesn't, it will actually be stored in one shared column.
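The idea of promoting frequent properties to their own columns and parking the long tail in one shared column can be sketched as follows. This Python only mirrors the description above and is not ClickHouse's actual algorithm: the function names, the `max_dedicated_columns` parameter, and the frequency-based ranking are assumptions for illustration.

```python
from collections import Counter


def plan_columns(events, max_dedicated_columns):
    """Pick the most frequent properties as dedicated columns; the rest are shared."""
    frequency = Counter(key for event in events for key in event)
    # Sort by count (descending), then by name so ties are resolved predictably.
    ranked = sorted(frequency, key=lambda key: (-frequency[key], key))
    dedicated = ranked[:max_dedicated_columns]
    shared = ranked[max_dedicated_columns:]
    return dedicated, shared


def store(events, max_dedicated_columns):
    """Lay events out as dedicated columns plus one shared column of leftovers."""
    dedicated, _ = plan_columns(events, max_dedicated_columns)
    columns = {name: [event.get(name) for event in events] for name in dedicated}
    shared_column = [
        {key: value for key, value in event.items() if key not in dedicated}
        for event in events
    ]
    return columns, shared_column


# Hypothetical events: two common properties and two rare ones.
EVENTS = [
    {"service": "api", "status": 200},
    {"service": "api", "status": 500, "retry_count": 2},
    {"service": "web", "status": 200},
    {"service": "web", "status": 200, "feature_flag": "beta"},
]
```

With a limit of two dedicated columns, `service` and `status` each get a column, while `retry_count` and `feature_flag` stay together in the shared column. A query on `status` then reads one dense column and never touches the rare properties.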
So you aren't creating millions of columns if you have millions of properties. All those different long-tail properties get put into their own little column. It's still performant. It still has some performance benefits to it, but it's not going to be as performant as a dedicated column, of course. But if it is a very common property, it'll be put in its own column and take advantage of the columnar nature, parallel processing, higher compression rates, et cetera. And I think I did a demo,
it was a week back, where I think it was maybe 10x faster for basic searches and maybe 100x less data scanned, because you're only scanning that one property as opposed to the whole batch of properties that you had to scan before. And also in terms of the typing, right? Because essentially, it's not that everything is a string, where you then pay extra in both performance and accuracy with the conversions rather than having the native type.
Exactly, exactly. So if you submitted, for example, a UInt64, it will be stored as a UInt64. It won't overflow an Int64 or be stored as a string that you cast back. It is stored natively as a UInt64, and that will, of course, take advantage
of the fact that with a UInt64, it's a lot easier for a CPU to process that as a native data type than to deserialize a string and all that. So there are a ton of different benefits. And also property collision: you don't have to worry about property collision. If you have two properties
with different types, you don't have to worry about them colliding at the schema level. They'll be kept in their own subtypes, and all sorts of stuff like that. Folks who've been running, like you and me, with Elasticsearch and the like have probably seen it, yeah. Suddenly comes this one document that has the same field, but a different type. Okay, go figure out how to handle this one. Yeah, exactly. Exactly.
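The UInt64 point made a moment ago can be checked with a few lines of Python. The sketch below takes the largest unsigned 64-bit value and shows what happens when it is forced through a signed 64-bit slot, a 64-bit float, or a string, compared with a native unsigned slot. The helper names are invented for the example, and nothing here touches ClickHouse itself.

```python
import ctypes

BIG_VALUE = 2**64 - 1  # largest value an unsigned 64-bit integer can hold


def as_signed_int64(value):
    """Reinterpret the low 64 bits as a signed integer (ctypes does not range-check)."""
    return ctypes.c_int64(value).value


def through_float64(value):
    """Round-trip through a 64-bit float, as some JSON parsers do for all numbers."""
    return int(float(value))


def through_string(value):
    """Store as text and parse on every read: exact, but it costs a parse each time."""
    return int(str(value))


def through_uint64(value):
    """Keep the value in a native unsigned 64-bit slot."""
    return ctypes.c_uint64(value).value
```

The signed slot wraps around to a negative number and the float rounds up to the next power of two, so both silently change the value. The string round trip is exact but pays for a parse on every read, which is the cost a native type avoids.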
Yeah. Another challenge that we've all been facing, as end users, as project maintainers, as vendors, is with time series, with cardinality. You know, we see that today with all the levels of virtualization, and Kubernetes and microservices and pods and whatnot. Each one has its own dimensions and so on. And
again, each tool has tackled that differently. Do you want to share how you address that within ClickStack? Yeah, absolutely. So cardinality is certainly a problem for, you know,
maybe the traditional time series database, where you create one time series per label set, right? And then if you have high cardinality, that means you have a lot of time series. That becomes a bit of an issue. Within ClickHouse, the storage format really takes advantage of the MergeTree structure that ClickHouse has. So instead of creating a separate
time series per label set, which creates complexity in the database in tracking different time series, it's one giant table, but the keys are separated by the label set, effectively. So basically you have a really long table, as opposed to, maybe as a crude analogy, other types of database having a lot of different tables,
right? Not perfect, those who know databases will probably cringe a bit when I say that, but that's maybe the biggest difference. And the reason we can still be performant in a really large table is, one, ClickHouse is built for really large tables to begin with, and second, we can index into the part of the table that's relevant. So even if you have millions or billions of time series,
all that will do is basically just expand the size of the table, as opposed to managing a ton of overhead with different time series that need to be tracked at the database level. So that's maybe the TL;DR of how that's all laid out. It makes sense. And going back to the SQL nature again: SQL is a very, I guess, easy choice in terms of how widely spread it is and how easy it is, but
originally it came from relational databases, where it made sense with the way the data is stored and normalized and so on. But then when you try to apply SQL to other sorts of data stores, you come up against challenges. I guess the classic is the join operation, which can be a pain. So maybe you want to say a word about how you tackle that in your area?
Yeah, absolutely. I mean, I think the reality is that data that isn't being joined is always going to be more performant than data that is joined. So especially in observability, where we talk about wide events, one of the advantages is that
it is denormalized, right? The labels are going to be a bit repetitive, but because of that, you get that performance back. And I think most data stores adopt that principle. Again, joins today are still a bit rare in observability. That being said, of course, ClickHouse is a SQL database and does have the ability to do joins, and there's a ton of progress that's been made by the core database team
in improving that. And I think anyone who looks at the project has seen the progress being made there: yeah, you can actually join that data. And I'm actually really excited for the future where people start taking advantage of this fact. Like, hey,
I have some interesting business data, like maybe I have my customer IDs mapped to some business-domain-specific information that I want to query alongside our observability data. And I'm really excited that now we have a data store that can do both
pretty well, and you can do the joins across those without having them denormalized and stored in the same place from day one. We're going to unlock a lot of really exciting use cases and answer a lot more questions that we weren't able to before. I think that's some exciting stuff ahead.
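As a small illustration of querying business data alongside telemetry, the Python below joins events to a customer table on customer ID and counts server errors per pricing plan. It is a sketch of the join idea only: the tables, field names, and the plan attribute are hypothetical, and in practice this would be a SQL join inside the database rather than application code.

```python
from collections import defaultdict

# Hypothetical business table: customer ID -> plan. Not part of any telemetry.
CUSTOMER_PLANS = {"c1": "enterprise", "c2": "free", "c3": "enterprise"}

# Hypothetical observability events that only carry the customer ID.
EVENTS = [
    {"customer_id": "c1", "status": 500},
    {"customer_id": "c2", "status": 200},
    {"customer_id": "c3", "status": 500},
    {"customer_id": "c2", "status": 500},
    {"customer_id": "c9", "status": 500},  # unknown customer, dropped by the inner join
]


def errors_by_plan(events, customer_plans):
    """Inner-join events to plans on customer_id, then count 5xx responses per plan."""
    counts = defaultdict(int)
    for event in events:
        plan = customer_plans.get(event["customer_id"])
        if plan is None:
            continue  # no matching row on the business side
        if event["status"] >= 500:
            counts[plan] += 1
    return dict(counts)
```

The events never had to carry the plan themselves. The business attribute is attached at query time, which is the alternative to denormalizing it into every event from day one.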
Yeah, it's exciting that the tooling allows us to do that, because I've been preaching the fact that observability is far bigger and greater than just the IT side of observability. And on the product side, again, being a product manager, I found observability very insightful for me as a
product manager to understand how my launch is performing, and for entrepreneurs that I've been working with, to understand how their business is going, how to structure their pricing model, and so much more. So I'm definitely with you on that. I want to
¶ use cases in production by OpenAI, Anthropic, Tesla and more
spend a bit of the time that is left talking about how it's actually being put to use, the open source ClickStack, out there, and there are big names. You actually just had the marvelous event for your community, the Open House event, the first such event. And by the way,
it looked very impressive from afar, having followed the feedback from that. So it sounded exciting, and I know that some very, very interesting use cases came up, with companies such as OpenAI, Anthropic, Tesla, and whatnot. So really impressive companies that needed to solve their observability
challenges and chose ClickStack as their go-to path, and the underlying ClickHouse for that. So can you share with us a bit of the experiences they shared, putting it in production and running it at scale? Yeah, absolutely. I mean, I think the big underlying point of all the talks, and as you mentioned, we were really fortunate to have OpenAI, Anthropic, and Tesla all come live to our event and talk about how they're using
ClickHouse and leveraging it for the observability use case. And I think the underlying point for all of them was scale, right? I think Tesla had the most interesting way of saying it. I think there were one quadrillion rows, I believe, stored in their ClickHouse instance. That's a result of going at 1 billion events per second,
if I remember correctly. So it's a testament to the amount of scale they're talking about. And I think OpenAI talked about how, again with the open source nature, they actually had a story about how they contributed back to ClickHouse and
greatly reduced the CPU usage of inserting the petabytes that they were ingesting. As you can imagine, ChatGPT generates a lot of events that they need to keep track of, especially as, every time they do a new product launch, their usage grows exponentially.
So again, there was a lot of talk about how it works at scale, and not only that it scales really well, but also, of course, that it's cost-effective. Any system, if you throw enough money at it, will probably continue scaling. But the big part, I think,
with Anthropic was basically talking about how, you know, money is no longer on fire since they moved to ClickHouse, whereas before, it was effectively like they were just shoveling money into a coal pit and lighting it on fire.
I think that's one of the big key takeaways: it scales really well, and it scales really cost-effectively, and it lets these companies collect the telemetry they need to solve the problems that they need to solve. I think Anthropic talked about how their latest
versions of their models were made possible because they had ClickHouse powering their observability, so that they could, of course, diagnose and improve their systems based on the telemetry that they were collecting on top of it. So that was all really exciting to see, and also really exciting that the public gets to hear about all the hard work that these teams have put into building these systems.
That's amazing, and I can definitely recommend that our listeners check it out. I think all the recordings are on YouTube, so check them out. You can hear it straight from the engineers who implemented it. As you heard, OpenAI, Anthropic, Tesla, these are impressive companies, so learn from their experiences and hear the other talks from the event. I think it's very impressive and interesting. And also, on this opportunity, can you tell the listeners
¶ episode outro
where they can follow you, where they can follow the ClickStack project, and how they can get involved? Yeah, absolutely. So for me, I think my Twitter handle is like Mike G42, or LinkedIn as well. I think I'm pretty active on there. Overall, though, you can go check out both the projects on GitHub: hyperdxio/hyperdx, go search on there.
And then also we have clickhouse.com/o11y, so short for observability. There you can read all about ClickStack: documentation, getting started, and links to the repos and contributing and all of that. So really excited to have more folks try it out, run it in production, help give feedback, help contribute. We've been blessed with so many contributors after our launch as well, and all this excitement from the open source community.
Yeah, really looking forward to all that. Amazing, amazing. Mike, thank you so much for joining me on this episode, and congratulations again on both the acquisition and the launch of ClickStack as a holistic stack with ClickHouse. That looks very, very exciting. And as someone who's already marked ClickHouse's potential for observability, seeing that potential fulfilled, or being taken to the next level as a holistic stack, that's
really exciting to see for the open observability community. So keep up the good work, to all the team. And yeah, follow them, obviously. All the references, Mike's links, LinkedIn and Twitter and whatnot, will be in the show notes, so check them out on your favorite podcast app. And also the links that I mentioned to the event recordings, so you can check out the Open House event recordings and the project GitHub repo, all of that. Again, check out the show notes.
With that, I'd like to thank our guest again for being with us today. As always, all the episodes are on YouTube or on your favorite podcast app,
so check them out. Some of the episodes are also live streamed, so do follow us on Twitter or Bluesky or LinkedIn to get the live stream times and also chime in with follow-up questions and with your comments and insights on the topics. In terms of news, last episode was about Open Source Summit North America, and we're heading straight to Open Source Summit Europe, which is taking place the last week of August in Amsterdam this year. So
you'll get that hot off the press, and I hope to see you there. I'll be there. Mike, are you going to be at Open Source Summit Europe? Unfortunately not, but I will be at KubeCon India, I believe, next month as well. So that'll be exciting. That's another one. And will there be someone from ClickHouse that people can look up? Yeah, we'll be there. It'll be me, one of my teammates, and some folks from ClickHouse as well. So we'll be there. Really excited to see everyone who will be there.
Yeah, so follow up on this and check out the team there as well. And until next month's episode, thank you very much for listening, and may the open source be with you.
