Welcome to DevOps and Docker Talk, I'm your host, Bret Fisher. In this episode, I'm joined by the CEO and co-founder of Groundcover to dig into their new observability platform with a twist. It's built from the ground up using eBPF in the agents, and they have a platform architecture that allows you to store all those metrics, logs, and traces in your own cluster to avoid the premium that traditional monitoring platforms charge. CEO Shahar Azulay joins me to discuss their design decisions, how the Groundcover team provides multiple options for deploying their solution, and the benefits of using an on-server agent that's entirely eBPF-based, which means it can detect a lot more of what's going on in your apps without the unusual permissions or lengthy configurations. Unlike with the typical monitoring solution, I didn't go deep into the day-to-day features with Shahar, because I found their unique architecture and design to be one of the most interesting choices in their solution, and I dig it. So we talked about it a lot. But without all the great features of using the product day-to-day, you wouldn't care about me obsessing over their cool architecture approach, right? Well, I was also surprised here, as Groundcover is showing off what it's like to create a fresh observability platform in the 2020s, because it's built on so many new standards that didn't exist a decade ago. And I'm thinking that's going to accelerate your implementation of it. For their dashboards, they support the Grafana dashboard catalog, which means a lot of things you can just plug and play with. They support Prometheus-style metrics collection. They support the PromQL query language that's growing in popularity thanks to Prometheus. They support the OpenTelemetry endpoint standards, plus cloud integrations like CloudWatch and Kubernetes. So if you asked me to deploy this solution today, I feel like their support of modern standards is going to keep me happy. I'm calling Groundcover's system design a hybrid observability architecture from a systems point of view, because it's not quite full SaaS like similar tools in the space, but it's also not totally on-prem unless you want it to be. It's got a flexible enough design that you're able to pick which things you want to run yourself and which things you'd rather they run. We go into all that and
more in this episode. So, please enjoy our chat and demos of Groundcover. It's that time again, bringing in my special guest, Shahar Azulay. I am so excited to have you. Tel Aviv, by the way — I think we have someone on from Tel Aviv every other month. It's like my other sister city. Here I am in Virginia, talking to people in Tel Aviv or that area. It's just a hotbed. It's crazy. Nice to meet you. Welcome to the show. Thank you. Thank you for having me. Happy to be here. Yeah. You're the CEO and co-founder of Groundcover, which is — I'm going to call it a startup. Maybe, I don't know, you might consider yourself graduating out of the startup world. You're always a startup. You're always a startup. That's a good attitude. And I'm 10 years into this Docker thing and I still feel like I'm a Docker startup. Yeah. So, tell me about your background, because I was reading — you've had time at Apple, I believe. You've had time at other companies. And then, in the middle of the pandemic, you decide: there is a problem in the monitoring and observability market, I need to solve it. So, how did we get here? What's this about?
Yeah. That doesn't sound like too smart an idea, right? No, no, no. But you know, it happened. My background is similar to my co-founder Chez's background — we worked together for many years. It's all basically revolving around R&D and R&D management in different aspects. Both of us spent a lot of years in cybersecurity, which I think contributes in some part to why Groundcover is built the way it's built — why we chose to go for kind of the deep-tech aspect of observability, I would say. We'll probably talk about it a bit later. But basically cybersecurity, machine learning, different aspects over the years. And using these systems, right — using Grafana, using Datadog, building observability solutions for the teams I worked with, just because we had to. And, you know, among many reasons, that's probably the main one: just using these tools and knowing what works and maybe what doesn't. Yeah, in fact, when you and I talked earlier this week, when we were planning out the show, it kind of struck me — there are lots of ways to talk about all these tools in the cloud-native ecosystem nowadays, and we love to tell stories, those of us that are trying to help others. It struck me that we've got lots of options in just about all the areas of tools. If you just go look at the landscape on the CNCF website, right? It's a little overwhelming. Everybody talks about how many players are in there. It's funny, because I feel like we've always had, in a lot of places, a gluttony of choice for decades now. But now I feel like we're in this gluttony of choice of open source. What I think is challenging is that it's hard to figure out what you want to use. And a lot of times you're focusing on: do I do it myself for free? Or do I outsource it completely to a SaaS, and it's going to cost me money, but it's going to save me time? So I've always felt like it's a choice of time or money. And we were talking about how observability — or just the monitoring and logging world this all came out of — always felt like a binary choice of: I have to run all this myself.
I have to figure out how to configure everything myself. There might be open source documentation I can follow, but I'm probably going to end up on Stack Overflow and in a bunch of Slack servers trying to find help. And then I've got to run my own storage — I've got to be a storage expert, a redundancy expert, and all that. You know, you're taking on a part-time job with monitoring. And I often tell my clients and students: just try to avoid running your own monitoring — even though Prometheus and Loki and all these things are fantastic — unless you have a bunch of free time on your hands. It's a lot of work, especially if you have more than a few servers. So I feel like we've had this choice for a very long time, decades even. I either have to run it myself or have to pay someone else to run it, and there's no real middle or hybrid option. I'm going to keep using that word, hybrid, even though I feel like it's overloaded right now. Where did you come up with this idea? We're going to get more into it, but where did you come up with this idea of a middle ground, where maybe part of it is self-hosted and part of it is in the cloud? How did that come about?
So I think it first comes from the understanding that — still sticking with open source, which is incredible, and we should adopt it for various reasons, mostly the standards and the openness of the domain itself; we shouldn't be locking people into specific things due to the vendor, and so on, so there are definitely advantages to open source — but regardless of that, first understanding that observability is a profession, right? Even if you're using Prometheus, there's still the question of how you chose your cardinality, right? What exactly are you measuring? What queries are you using on top of that Prometheus? People think it kind of ends at deploying and maintaining the solution, but it doesn't. So what happens with most vendors is they go SaaS, right? Because they say: then I have to do the hosting for my clients, I have to support them; they don't know how to query the data, they don't know how to choose what to store or how to store it. And therefore, I'm going to take that part off their hands and bring the practice of observability into play. The main driver to go in a different direction is basically that the cost models of these solutions fail to be sustainable. And they're not wrong in the basic approach — should every company just run Prometheus and an Elastic stack? It can work if you're a specific kind of company, if you're willing to build up specific expertise. But it's a profession, so they're not wrong in going there. Eventually, though, the cost models are pricing people for volumes of data they can't control, or won't control, or won't take on the profession of controlling. That's actually created a situation where it doesn't make sense anymore. And the open source route itself also doesn't make sense for many companies. And therefore, Groundcover kind of found what we think is the best of both: we believe in open source — we believe it should be transparent and accessible and in your control — but we also believe in the advantage of not marking up costs on hosting on our side. So maybe let's help you maintain and host it on your side. Basically, that's exactly what we've been doing for the last three years, which is pretty unique. Yeah. It's funny that you're talking about this as a profession. I love that. That's
a great tagline, because there are a few areas — build engineering as well — that I feel are underappreciated, like understanding the full breadth of observability and the platform you're monitoring from, and just the techniques and the standard processes. I feel like the CNCF — which has been on a pretty good clip lately of making new certifications — should make some sort of observability certification, so that you understand, even from the developer side, adding your own metrics into your actual software: not bolting it on after the fact, but actually thinking of it from that point of view. The people at the tip of the spear — the people working at Netflix, people working at these very large tech companies — that is all old news to them. But there's still a major part of the market that is coming at this from a very traditional mindset of: we will think about monitoring last, we will think about observability last, once it's in production; we will install a product and maybe get our existing engineers to do it without necessarily being professional operations people. I was that person 20 years ago that came up and lived in operations. There was no DevOps word yet. And I was in the operations group, the sysadmin group, trying to help the developers go faster and do things more securely and better and quicker, and recover faster. We didn't really have a lot of those things back in the mid-2000s. But I feel like not a lot has changed in terms of expecting everyone to know a little bit about observability — what that even entails — and then educating people in different parts of your pipeline, whether it's developers or the pre-production people making sure: did your observability metric break in testing? Not a lot of people are testing their metrics. So there's just a ton of stuff here that I think you could dive into. But I feel like that's almost a separate podcast — let's talk about the certifications of OpenTelemetry later. But maybe we can break it down. Okay, so we're talking about observability. Clearly Groundcover is an option for people that want to manage the metrics, the logging, the tracing — the traditional monitoring ideas — on Kubernetes. Let's talk a little bit about what's unique about the architecture. I'm going to use the
word hybrid. I made it up; I don't know if it's even the right word — tell me if you have a better one. But I feel like a lot of the people watching and taking my courses in the DevOps world are saddled with this part of your deployment options. They may not be running all the metrics or creating the Prometheus queries, but they are probably the person responsible for deploying it and architecting it out. And one of the biggest challenges, usually, with Prometheus and Grafana and that whole stack is that it's a huge stack. It actually is not one tool or two tools or even three tools — it's a dozen tools, if you want the complete thing you picture when you think of SaaS monitoring tools. And I think that really discourages people, but it also surprises people a little bit when they come into it, and they're like: I can't just look at one of these tools and actually make anything of it; they all work together; there's lots of these different open source tools. So how does your deployment model really shift that, even though it's still partly in-cluster? So in a sense, from the deployment perspective — I mean, Groundcover is kind of two startups in one. One is how we collect the data, which is based on eBPF. Yeah, maybe not another great idea, but we love the hard ground, as you understood already. So the one startup is building that sensor that collects the data with eBPF — we can talk about it later, but basically trying to get as much data, as much accessibility and coverage, as fast and responsibly as possible into that inCloud backend. And the other part of Groundcover is basically building the architecture that says: how can we store data on your cloud premises, manage it for you, and so on. One of the things that we're trying to do is introduce customers to the latest and best technologies that we believe in, right, as part of the backend.
And we're very proud of that. You know, if you go to Datadog right now, or to New Relic, for example, you're not supposed to know — you're certainly not exposed to — the technologies they're using behind the scenes to store your data, right? It's not part of the deal. Groundcover is very proud of the fact that we're using VictoriaMetrics and ClickHouse as the engines behind inCloud, behind the scenes.
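As an aside for listeners: because VictoriaMetrics is largely PromQL-compatible, the queries you'd run against a managed backend like this look the same as against Prometheus itself. A hedged sketch — the metric and label names here are hypothetical, just to show the shape:

```promql
# Per-pod rate of 5xx responses over the last 5 minutes
# (http_requests_total is a stand-in counter name, not a Groundcover-specific metric)
sum by (pod) (
  rate(http_requests_total{namespace="checkout", status=~"5.."}[5m])
)
```

The same query works inside Grafana panels, which is part of why supporting the Prometheus/PromQL standard makes a new platform feel familiar on day one.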
And we're trying to say: we know how to bring these best-of-breed technologies into your cloud while still managing them for you. That's interesting for a few reasons. One of them is that people use Prometheus because they know how to use Prometheus — or think they know how to use it, right? Right. Prometheus. The internet tells them so. Yeah, exactly. Prometheus is at that part of the adoption curve where you're expected to know how to use it, even though for a lot of people it's a long way off. VictoriaMetrics, for example, isn't, right? The median DevOps engineer wouldn't just go and try their luck with VictoriaMetrics; they would probably go for a kube-prometheus stack or something like that. And basically what Groundcover is trying to say is: you might not be familiar with these technologies yet, or you might not have even used them yet, but we know how to make them transparent to you, so you might get exposed to them and might even fall in love with them for your other use cases. But on the other hand, we deploy them in your cloud premises, cloud-native over Kubernetes; we manage them for you, scale them for you, back them up for you, and make sure that you basically don't have to worry about it if you don't want to.
So we talked before about that profession thing — should you be a full-time observability engineer? Probably not, in most cases. But if you do want to get into it, you definitely can, which is kind of what we bring to the table. So, inCloud is highly managed. Some of our customers don't care, don't want to know what's going on there — it just works. And for others, it's much more than that: it's a rich backend that they want to explore, utilize, query deeply with different strategies, and kind of learn about these technologies through. So we see both ends of that spectrum, which again is unique, right, because you don't get that from your SaaS vendor. Right. So the risk there is sort of the lack of turnkey — when you have to deploy stuff yourself, there's always that additional education, instead of saying: just install our agent, and then the rest is magic. So you have this concept of inCloud — you mentioned it before, and I just wanted to talk about that for a second. Around your docs, you have a couple of different architecture designs,
which I thought was pretty cool for how flexible the product is. And we were talking about inCloud — that feels to me like the default option. Is that how you're looking at it? Like, this is the model that most of your customers may consider? So, I mean, the architecture choice that Groundcover took is separating the data plane and control plane for any flavor of Groundcover. So basically there is no situation where Groundcover stores your data on Groundcover premises. We work with companies that are privacy-sensitive, and it allows us to break the pricing model that the market offers and price differently. So we're heavily invested in separating the control plane and data plane experiences. The product is still SaaS, right? You go to groundcover.com, you see dashboards and so on. But eventually, behind the scenes, it's always an inCloud architecture. Now, it has multiple flavors we can touch on in a second, but the one thing I would say describes what Groundcover stands for is what we call frictionless observability. It's the fact that whenever we do something, we try to make it as out-of-the-box as possible, because we believe that users suffer from moving between solutions — the turnaround time isn't great, but there is a trigger point: because of cost, because of coverage, because OpenTelemetry isn't so easy to adopt, and other reasons. Yeah. Right. So we try to make that move as frictionless as possible. On one hand, eBPF — it's super easy to install our agent and everything is magic, as they say. And on the other hand, inCloud is also built to be frictionless. Most people, when they see inCloud, think: oh, so I should bring my ClickHouse and VictoriaMetrics databases, and then you're done? No — basically, people just provision a cross-account role for us.
And that's it. We spin up inCloud for you from the ground up and then manage it for you. So in a sense, beside a few clicks in your AWS, GCP or, you know, Azure console, you don't have to do anything to spin up Groundcover, even the backend itself. You don't have to bring any database, you don't have to spin up any cluster — you don't have to do anything, basically, to get started. And then there's also that kind of magical experience combined with the sensor, which is basically a DaemonSet you can just install and remove. That's pretty much our full stack, right — covering sometimes dozens of Kubernetes clusters at once under one single envelope. Yeah. So you've got the advantage of all of the monitoring and logging and tracing data being in the client's — the customer's — infrastructure. And we talked earlier about the idea that there are multiple options for sort of the enterprise use case, of selecting more and more of the components to be on-prem. So it feels pretty flexible in that way. When I first looked at it, the deployment model had all these different options for deploying. There were Argo deployment options — I love Argo CD, so that was exciting. There's Helm deployment options. And so at first, I thought: is this just a Kubernetes monitoring solution? Because it was deploying in Kubernetes. But clearly it's not, right? There's lots of other things that your clients probably have to monitor besides pods inside of Kubernetes. So talk a little bit about that. Yeah. So, I mean, Groundcover is definitely Kubernetes-centric. First of all, we believe in Kubernetes. We think it's a very flexible system; like anything, it has its flaws, but in general, we at Groundcover really believe in it. And therefore, with inCloud, for example, we run on top of Kubernetes — we basically use Kubernetes-native technologies and manage our inCloud in Kubernetes. But from the sensor perspective, basically, eBPF is an operating-system-level technology, right? It can
run anywhere — on your laptops, if you want to. Our choice to be Kubernetes-centric is more oriented around product, because basically, once you run the sensor in Kubernetes, using the Kubernetes API, the Kubernetes ecosystem can provide a lot of context without any user interaction, right? You know the hierarchy, you know the names of applications — a lot of stuff that Kubernetes provides which, otherwise, the user would have to help the vendor — help Groundcover — understand: in this VM, what is running, right? And how to basically tag the data for us. So we started out that way, and then support a lot of integrations — CloudWatch, Prometheus, the OpenTelemetry integrations, a lot of different stuff that we support to basically bring data from other cloud resources into Groundcover. And we're about to release very simple support for EC2, which will basically be our first step in running the sensor outside Kubernetes — because although we're Kubernetes-centric, and will probably be there for a while, we're definitely starting to run outside Kubernetes, in deployment types like EC2, ECS, things like that, to basically allow ourselves to run where there are only machines but the orchestration isn't Kubernetes. So we're definitely thinking about it, but right now it's definitely focused around Kubernetes plus outside integrations, I would say — the sensor inside Kubernetes itself. Right. And you mentioned it several times, so I just want to highlight it. For the last two, maybe three KubeCons, whenever we do a live show from there, we talk about: what's on the floor? What's the energy? What's everybody talking about? And for the last couple of years, eBPF has sort of — I don't want to say peaked in hype, but that doesn't mean it's not a significant, influential technology that's impacting, it seems like,
almost everything these days in terms of security, networking, and observability. But since you're a relatively new offering, you were able to start with the first release on eBPF. Is that right? Like, basically, you went eBPF from the ground up. Yeah, and I think it affected a lot of different stuff. I mean, eventually, when you don't start with eBPF, I think the main difference in the company DNA is that you trust the customer to provide the data for you, right? So you go to customers and say: you have log collectors, you have OpenTelemetry — just ship it over to us and we'll blend this experience into something. Right. The thing is, this is exactly what Datadog, New Relic and Dynatrace did 15 years ago, right? They built the entire stack to provide what is actually kind of an end-to-end experience: I have the sensor, it collects the data as I want it, ships it and stores it as I want it, and presents it as I want it. And over time, you know, we started losing that, into a thing where companies are basically saying: everything is a resource now, so I'm not building the agent.
I'm just, you know, ingesting all your data, and I know how to correlate and visualize it. And I think that since Groundcover went for eBPF from the beginning, we were never dependent on the data being provided by the customer. We definitely integrate with that — if you have OpenTelemetry data, we want to bring it in, right? But even if you install Groundcover on a blank, non-instrumented system, you will get tons of value. And therefore, we can also provide the experience that we believe in. For example: create metrics at the cardinality, control, and depth that we believe make sense; sample traces in a way that we believe makes sense; correlate between different signals — like Kubernetes events and logs and spans — in a way that we believe makes sense. We take ownership of the data from an early stage; it's actually being collected at the node level. And that's a great advantage. I think that's what made Dynatrace, back then, what it is — and later on, Datadog and New Relic. And it's kind of come back around, right? Before eBPF, things had moved more to the backend side. I feel like with eBPF, we're kind of returning to the origins of the agent, where a company like Groundcover collects the data and then presents it — the full-stack approach. Yeah, that's a good point. I have — you can call them nightmares, but I definitely have lots of
memories from a decade ago of lots of Datadog config files, right? And keeping track of those. And then realizing — this is, I think, actually pre-Docker days that I'm remembering, right when we were all starting to toy around with ephemeral architecture. We had EC2 instances that were auto-scaling, and so we needed agents to install automatically. And then we were like: well, these machines over here have a bunch of different logs in different file paths, so we have to make a config file for that; but this one over here is different. And we would end up finding out that, oh, those logs weren't even captured, because we didn't configure that server's config file correctly. And of course, containers made that so much easier. But you make a really good point there. Obviously, eBPF can be a very technical deep dive, but are you able to detect things like logs and tracing inside of this agent? I mean, where are the edges of what this agent can do magically? Because I'm assuming it does some things better, but other things must be harder to do in eBPF than maybe a traditional config file with log directories and endpoints. Yeah, that's a great question. I think that one of the things eBPF allows you to do is basically break the tiers in how observability products are built. If you think about it — why do I have to pay for log management separately, and infrastructure monitoring separately, and APM separately? Some of the reason behind that is the fact that these things are collected with different technologies, right? You have log collectors basically reading files, right, and sending them over. And you have an infrastructure agent, basically scraping metrics from exposed metric sources or collecting metrics somewhere else. And you have an APM SDK, sometimes instrumented in the client code, right? So these are very different types of technology stacks collecting the data, and therefore different product lines, right?
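To make that third tier concrete: an APM SDK means instrumentation living inside the application process itself. Here's a toy sketch of that idea in Python — hypothetical code, not any vendor's actual SDK — a decorator that records a span per call, which is exactly the kind of in-process work that eBPF-based collection avoids by observing from the kernel instead:

```python
import functools
import time

# Collected spans; a real SDK would batch these and ship them to a backend.
SPANS = []

def traced(func):
    """Record a (name, duration) span for each call -- in-process
    instrumentation, the traditional APM-SDK approach."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            SPANS.append({"name": func.__name__,
                          "duration_s": time.perf_counter() - start})
    return wrapper

@traced
def handle_request(path):
    return f"200 OK {path}"

handle_request("/checkout")
print(SPANS[0]["name"])  # handle_request
```

The point of the contrast: this style requires touching application code and redeploying, while a kernel-level sensor sees the same request activity with zero code changes.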
Different teams probably building those agents. Yeah, exactly. So in a sense, once you go with eBPF, you collect the traces at the level of the process itself, with no instrumentation of the code. And suddenly you can do everything in the same sensor: you can collect logs, you can collect infrastructure monitoring, and you can also do APM — in the same piece of code, in the same integration and correlation you're doing in the sensor. The only thing that we're currently not doing with eBPF, classically, is logs, which basically still reside in files being read somewhere, with collectors knowing how to read them and collect them. Our sensor basically uses eBPF for everything that requires application metrics and tracing and so on, and uses classic log collection — from stdout, and containerd specifically — to actually read logs being emitted into files. One of the cool things we're doing right now — I mean, that works, that's great, but Groundcover is also built for very high-scale performance; using eBPF kind of puts us there, monitoring Redis, Kafka, these huge-scale protocols that eventually require a lot of CPU and memory to digest. And logs are one of those data types that require a lot of resources to digest and filter and so on. So we're actually moving our log ingestion pipeline from classic files into eBPF for logs as well. We actually gave a KubeCon talk about that last KubeCon. It's a very interesting take, mostly related to performance — it doesn't provide a new feature. Logs can still be written to and read from files, but we pick them up with eBPF as they're being written, which is very cool and allows us to cut down on resource usage dramatically, sometimes up to 40x. That can be interesting in heavy-duty environments, with a lot of applications and containers dumping huge amounts of logs — you know how it goes when it comes to log management. So we're definitely headed there, but right now it's still classical. Yeah, that's what I was wondering, because I was trying to imagine all the things that eBPF could do — but how difficult would it be to find the log output for a particular process, to identify what container that's in?
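For context on how today's file-based collectors do that mapping: the kubelet symlinks each container's log under /var/log/containers/ with the pod, namespace, container name, and container ID encoded right in the filename, so a collector can recover Kubernetes context from the path alone. A minimal sketch of parsing that naming convention (illustrative only, not Groundcover's code):

```python
import re

# Kubelet symlinks container logs under /var/log/containers/ using the
# pattern <pod>_<namespace>_<container>-<container-id>.log, so a file-based
# collector can identify the container without asking the Kubernetes API.
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)

def parse_container_log_name(filename: str) -> dict:
    """Extract pod, namespace, container, and container ID from a log filename."""
    m = LOG_NAME.match(filename)
    if not m:
        raise ValueError(f"not a kubelet container log name: {filename}")
    return m.groupdict()

meta = parse_container_log_name("checkout-5d4f_shop_app-" + "a" * 64 + ".log")
print(meta["namespace"])  # shop
```

An eBPF-based approach skips the filename convention entirely: it can attribute a write to a process and cgroup at the kernel level, as the log line is emitted.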
I mean, we have all of that today, but those are usually tools that talk to the Kubernetes API. They're not really thinking about an individual kernel; they're talking to the Kubernetes API, getting data about a container, finding the log endpoint for that container, and then streaming it out. And I was curious about eBPF's abilities — it's one of those things where I feel like it's infinite power, but it also could potentially be infinite complexity. You know, everybody's raving about the performance advantages and security advantages of eBPF tools, but a lot of us still aren't using eBPF features day to day across our whole stack. So I've wondered if it's a lack of understanding among a lot of engineers, or if it's actually pretty hard to take what you maybe were writing before and turn it into a kernel-oriented architecture like that.
So I think one of the misdirections with eBPF is that, since this technology is open source and pretty new, what happened is that people were led to believe that they need to learn it, right? They need to learn how to code it properly and eventually use it for their own kind of usage and write eBPF. I think it's kind of similar to, say, OpenTelemetry, right? You definitely want to be a user of it, but do you actually want to code the OpenTelemetry ingestion pipeline yourself? Not necessarily your skill set, right? So I think eBPF went into this area of people being so interested in it — we saw it at the conferences, right? We've seen years of lectures about coding eBPF, getting into your first eBPF program, and so on.
To be bluntly honest, I think that's a mistake for most developers, because this is not something in your day-to-day skill set, and eventually you will be consuming eBPF-based tools. We've already seen that in your cloud provider stack, using eBPF for networking, and we will also see it in your security tools, using eBPF to collect security events, or your observability tools using it to collect traces or whatever. I don't think people should expect to be the person converting their code to eBPF, but they should definitely seek out, or be aware of, off-the-shelf eBPF tools they can integrate into their stack to get richer features or reduce performance issues and stuff like that. But I think we're moving there pretty quickly, because the complexity of solutions like Groundcover isn't something that a person can easily build on their home desktop in their free hours, right? Yeah, that's an excellent point. We talked about earlier that, for the last couple of years at any KubeCon, if you're looking at all these cloud-native solution booths, there's two
trends, right? Everybody's got this little bee logo somewhere on their screen or their booth, and then something about GenAI or LLMs or something like that. And I could draw parallels between both of these: most of us, if we're just consumers of a particular tool, won't actually have to know how LLMs work, or how to write an eBPF module — not module, but program. And both of those terms are a signal to me that this company is building on sort of the latest generation of technology that we're all agreeing is the thing to use. I think my first eBPF tool was actually Falco. For those unfamiliar, Falco is a CNCF open source security tool; it sits there and watches, through the kernel, for bad behavior on your host. And one of my favorite things is that it came, out of the box, with all these rule sets. One of them was: somebody's doing an exec into a pod and they shouldn't be. A lot of us know that if an attacker is eventually going to be successful, they're going to try to start a shell, or if they get access to the Kubernetes API, they're eventually going to run an exec and try to get into a pod's container. So I loved that I didn't have to tell Falco anything other than: here's the kernel, go watch for these bad behaviors. And I feel like that's just one of a thousand examples, honestly. For those interested in more — if you don't necessarily want to learn eBPF itself, which is fine, a lot of us don't know it, and I have no plans to ever learn it — there's actually a great documentary, I think Honeypot created it, on YouTube. If you just search YouTube for "eBPF documentary," it's fascinating how it describes the way the worldwide open source movement started changing the Linux kernel, proposing changes, and how it all snowballed. Nobody knew it was going to be this big of a deal, and now suddenly everybody wants to use it and all the products are trying to figure out how to take advantage of it. So I feel like it was the less popular story of LLM
— you know, like the nerd version of the LLM story. So getting back to this, one of the questions I wanted to ask you: when I look at the website, you're very open about your stack and your compatibility. Even though a lot of this is provided as a SaaS-like or COTS-like product — where you're saying, here are all the things, they're all pre-configured, everything just works, just go deploy it — you talk about, you know, storing things in VictoriaMetrics as part of your stack, which, if it were mine, would be running in my Kubernetes cluster, not in yours. You're also using ClickHouse, and on the front end you have Grafana dashboard compatibility. There are lots of fans of Grafana out there, especially for getting the coolest-looking dashboards to put on that monitor on the wall that you convinced your boss to buy — because, you know, we've got to have that giant 4K monitor to show our ops, because everybody needs an ops screen up on the wall, even though we all have phones in front of us. So you have this compatibility with Grafana where people can import dashboards and stuff. I'm just assuming part of this is that you didn't want to have to hire a thousand engineers to build all these things yourself, so you started using all these off-the-shelf open source components. But can you talk a little bit about what else is in this stack? We talked about eBPF; it sounds like somehow you've got some sort of Grafana compatibility. It just seems like a really cool stack that you've put together of all
these different open source, cloud-native tools. So, as I mentioned before, I think it's part of the value we provide our customers — our expertise in what we think is a better stack to store lots of logs, metrics, traces, and events. Once you build a stack that wide, right, using eBPF, you're basically building a sensor — like we talked about — that ingests data at very high scale and also does a lot of stream processing, like sampling traces, creating metrics on the fly, and things like that, and then doing stuff like cardinality and volume reduction, cross-cluster monitoring, things like that. You definitely don't want to be a database company as well, right? I don't think it's a wise choice to say, you know what, I think we can build a better event database to store what we're collecting. It's about being the wise engineer who builds, on top of these great technologies, something that is deeply related to observability. Because, as we mentioned, I don't believe that throwing logs into a database, or some kind of data lake, and then querying it somehow is enough, right? Right. Customers want log pipelines, the ability to create metrics from logs, and visualizations and alerts and different stuff that eventually needs to be intrinsic to what an observability solution really is. So we're using VictoriaMetrics and ClickHouse just because they're phenomenal, right? We've benchmarked against a lot of technologies, and as much as we like Grafana, for example, we think the Grafana stack isn't built for the scale Groundcover works at, from the database perspective. VictoriaMetrics is Prometheus-compatible, which was always a feature requirement, right — basically not reinventing another query language for customers, allowing them to query their metrics with PromQL, which is very prominent, and to be as flexible with it as you want. And ClickHouse, being an SQL engine behind the scenes, is one of the most amazing log-analytics databases you'll ever encounter. We still have tons of brains on top of that, right — from schemas, to materialized views that allow faster queries, to the actual queries you run on top, which are sophisticated and built for the scale of our customers. But behind the scenes, we're definitely proud of using these two great products, we're in touch with the relevant teams, and we definitely see it as a long-term collaboration. And I think in most cases where Groundcover makes an impact with a customer, all the way through the inCloud, they get another look at these technologies as well — you might suddenly find them using VictoriaMetrics somewhere else, regardless of Groundcover. So we want to be as open and standards-based as possible — it's just part of the approach. Yeah, well, I like that. And this is a PSA, stand-on-my-soapbox moment, to tell you all: you may not need or want to run Prometheus. It's a very popular tool; it's been out since cloud native sort of dawned, and it's designed for cloud-native containers.
But it obviously works with lots of other things too. Even if you don't want to run Prometheus, learn enough of it that you pick up a little PromQL. Now, the cool thing is that most of these tools now have query helpers in them — type-ahead, autocomplete, whatnot. But when it first started, there was none of that, so we all had to stare at cheat sheets and manuals and try to figure out our queries. Nope, that didn't work; let me try... oh, I forgot a parenthesis. But that's my PSA moment: just learn a little PromQL so that you feel comfortable even in a moment when you're troubleshooting and you're like, well, we don't have the right dashboard, we don't have the right thing to look at, I need to compare this versus that and look at the trend. PromQL is the way to do that dynamically, in real time. It's kind of like how everybody needs to learn a little SQL — select statements and stuff like that. Even if you aren't a SQL database engineer, you're probably going to need some of that just in DevOps. It's always a handy thing to have, kind of like knowing how routers and switches and subnets work. So yeah, PromQL: soapbox over, go learn it, it's great. And that was cool — I was also looking through the documentation, trying to figure out some stuff, and I noticed the PromQL compatibility and stuff like that.
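[Editor's note: for listeners who've never touched PromQL, the workhorse query being described is usually something like rate(http_requests_total[5m]) — the per-second growth of a counter over a time window. As a rough, hedged sketch of the idea (the metric name and numbers are invented, and real PromQL also handles counter resets and extrapolation), here's what that computes, in Python:]

```python
def prom_rate(samples, step_seconds):
    """Rough sketch of what PromQL's rate() computes: the per-second
    increase of a monotonically increasing counter over a window of
    evenly spaced samples. (Real rate() also handles counter resets.)"""
    if len(samples) < 2:
        return 0.0
    increase = samples[-1] - samples[0]          # total growth over the window
    window = step_seconds * (len(samples) - 1)   # window length in seconds
    return increase / window

# a counter scraped every 15s that grew from 100 to 160 requests over 60s
print(prom_rate([100, 115, 130, 145, 160], 15))  # -> 1.0 requests/sec
```

That per-second normalization is why you can compare rates across windows of different lengths, which is most of what you need for ad-hoc troubleshooting.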
I want to maybe come back to one of the things we talked about a second ago and say that one of the advantages of inCloud — which I think is interesting to a specific segment of the market, but I think that segment is pretty diversified — is the fact that Groundcover supports a solution that can also be considered on-prem. The reason I wanted to touch on that for a second is that most vendors that offer a SaaS solution don't support on-prem, since from the beginning they basically say: we've got to store your data, that's part of what we're doing. And in an age where logs and traces include so much sensitive information about your customers — because you're a fintech company, an HR tech company; the list is endless... But however hard you try to keep stuff out of those logs, something's always creeping in. Exactly — it's always creeping in. And we definitely see a lot of the more concerned type of companies already turning to build, right? Turning to storing and maintaining their own Elastic and Prometheus and so on. It's kind of an underserved market: they don't get these solutions from a Datadog — there's no on-prem option — besides maybe the more heavy-duty solutions in the market. And therefore Groundcover, as a young startup with a more modern solution, is able to also serve these on-prem-oriented companies by just running the Groundcover UI on the customer's side. inCloud is already running there, right? Their entire data plane is already running and maintained over there as well. So for us, it's a very small leap, and we work with these different types of industries. I think it's also an interesting take on what inCloud represents: besides, you know, cost reduction and taking ownership of your data, it also represents the ability to support an on-prem, privacy- and security-aware architecture, which I think the market has shifted heavily towards in the last few years. So that's one take on that. Yeah — I live on the east coast of the US, and that basically means I'm living in the government and military industrial complex. So we go to a lot of conferences where there's a lot of Java and a lot of .NET, and I know it's a very common request — obviously not just on-prem, but air-gapped modes and stuff like that. And honestly, it's pretty cool to see a young startup that actually has that level of options, because the on-prem option always feels like something people do in year seven or year ten; it's their next big wave after they've had enough growth and they're like, okay, now we can go even bigger — let's go on-prem. So it's pretty cool to have those options in the earlier days of the product. And — obviously there are minimum requirements, right, things that scale with the architecture — you talked a little bit about the architecture of running the data storage in my own infrastructure. One of the things I'm not sure we mentioned was that I can choose to run that in my own cluster; you have a free option for running it in the cluster with the apps itself. And then there's another option where
you sort of partner with the companies — is that how you'd describe it? — where you're going to run in infrastructure that they're managing, but you're going to manage the actual applications. How does that work? Yeah. So basically, as we mentioned before: when you just install Groundcover and deploy it, you'll end up in what we call the inCluster architecture. It basically means that the sensor itself, and the backend as well, will be running on your application cluster. Now, that is great, right? The UI is still served as a SaaS, but all the other components are running on your premises. But it's not a great idea to run the observability database on your application cluster — you might be using spot instances, and we don't have any access to basically help you maintain that. So it's great for a free tier, but it doesn't feel like a sustainable architecture for growth. Another reason it doesn't work for high-scale companies is that you don't have one cluster, right? You have fifty. And you definitely want to share those inCloud resources between multiple clusters, for different reasons: for the ability to cross-section data from different clusters in the same backend, but also to share resources and be more cost-effective. So when we work with actual companies that, you know, buy Groundcover and don't just use the free tier, we operate on what we call the managed inCloud architecture. Now, it's still the same, right? We're still using VictoriaMetrics, for example, and ClickHouse, and running them on the customer's side. The only difference is that the customer doesn't deploy inCloud on their application cluster. They only deploy the agent, like they would with Datadog — just the actual agent that collects the data on the application cluster — and it ships data into the inCloud from any cluster in their environment. Groundcover has control-plane access to basically monitor and scale their inCloud. That's why we call it managed inCloud: it allows us to do all the variety of actions we talked about at the beginning of the session — monitoring it to make sure it's healthy, scaling up, disaster recovery — everything that basically needs to happen in an observability backend, but still doing it in the customer's environment, like we would with the classic free tier. For us, it's the sliding scale between trying it out yourself, like you would by just deploying Prometheus, and actually getting a vendor-grade experience where you don't have to worry at all about what happens with this database. That's how we see it. So, I hope I've answered your question. Yeah — so, in other words, if I'm a paying customer, I can still call you when the monitoring doesn't work, even though it's technically running on my infrastructure. And I just want to point out to everyone: there are lots of monitoring tools out there, and we've talked about monitoring and observability
for years now — it's been sort of a core part of this show; once a quarter or once every six months, someone's coming on and we're talking about a little piece of that puzzle. And what I feel is one of the best things that companies like yours have figured out is this: instead of what I used to do 15, even just 10 years ago — when switching to or adding a new monitoring solution was a complete rewrite of my brain, because there was nothing in common other than the general experience of "run this command or this curl to install the agent," which was where the similarities ended — in less than 10 years, we've gotten to where I can bring along all the things I already know. I know a little PromQL. I know a little Kubernetes architecture. What was the other thing? Oh, I already know a little about Grafana dashboards; I have my favorites, I have some GitHub repos that I like to track, you know, the different stuff that goes on there. And — we'll ignore the eBPF stuff for a minute, because that doesn't affect me as a user so much — there are so many of these different technology stacks now that I can come to tools like yours and say: I already know how I'm going to query it, I already know what my dashboards are going to look like, and I'm probably going to add my own in there. And I already know how it's going to deploy, because I understand what a DaemonSet is, what a pod is going to look like, and how you're going to set up the services. It's not a completely foreign deployment for me. And I don't think you could have made Groundcover 10 years ago, because if you weren't doing it on Kubernetes, there'd be a lot of stuff to set up manually. You might have some wizard-automated install script, right? But it would still be a lot of servers and a lot of services where I don't really know how they all fit together or how they even talk to each other. I agree. I think the deployment of the sensor itself, and how much data can be collected about the ecosystem so quickly from Kubernetes, is one thing. And basically, the inCloud on the other end
is another. The reason vendors turned to SaaS so quickly is that, you know, a proprietary management system in my backend as a vendor made sense back then, because there weren't many tools to work with, right? Now I can run inCloud on a remote cluster somewhere in the Philippines, right — on the customer's premises or wherever — and manage it for them, scale it up and down, make sure it fits the actual load of data being pushed in. That couldn't have been done even five or six years ago, when we were just starting this journey with Kubernetes. And it's profoundly different from how these observability solutions worked 15 years ago, right? Back then they would basically build these databases themselves — they didn't have solutions like ClickHouse or VictoriaMetrics, where you can just spin up these databases and scale them up in cluster mode on Kubernetes.
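[Editor's note: the "log pipelines that create metrics from logs" idea Shahar mentioned earlier is easy to picture with a toy sketch. This is purely illustrative — the log format, label, and metric name are invented, and it is not how Groundcover's pipeline is actually implemented — but it shows the shape of turning raw log lines into a Prometheus-style counter that a backend like VictoriaMetrics could ingest.]

```python
import re
from collections import Counter

def logs_to_metrics(lines):
    """Toy log-to-metrics pipeline: count log lines per severity level
    and render the result in Prometheus exposition format."""
    counts = Counter()
    for line in lines:
        match = re.search(r"level=(\w+)", line)
        if match:
            counts[match.group(1)] += 1
    return [f'log_lines_total{{level="{lvl}"}} {n}'
            for lvl, n in sorted(counts.items())]

sample = [
    'ts=1 level=info msg="started"',
    'ts=2 level=error msg="boom"',
    'ts=3 level=error msg="boom again"',
]
for metric in logs_to_metrics(sample):
    print(metric)
# log_lines_total{level="error"} 2
# log_lines_total{level="info"} 1
```

Once logs are distilled into a counter like this, everything from the PromQL discussion above — rates, alerts, dashboards — applies to them too, which is the point of doing it inside the pipeline.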
I think that's a major shift on the backend as well. Yeah, for sure. I feel like this is sort of a new standard — and it's obviously a trend; I'm not pretending you're the only ones who do some of these things, particularly when it comes to PromQL and Grafana dashboards — but it's still amazing how many players in the market don't do that, mostly because they're legacy vendors trying to catch up, having invested a ton of time and resources in their own solutions. But to me as an end user, talking to other end users watching this show: this is the new standard. If you're looking at products, if you're going to outsource this — if you don't have the one or two dedicated full-time people to manage your observability platform — then you should be looking at tools that complement that. One of the things I've seen in the past is being able to run some of your own little stuff for free, or at low cost, on your non-production infrastructure, in a way that's compatible with moving whatever you're experimenting with — whether it's the PromQL statements or the dashboards I've customized and laid out the way I like — into the paid solution. So it's not a complete 180 to go from those kinds of tools, like open source Prometheus and that stack,
to something else. This is also a big deal because, as I was just talking about with some people last week, we're rotating engineers nowadays. Back when I started, you stayed in the same job five or ten years — that was very normal. Now we're down to like two years or less. And with that constant rotation of engineers, you can't keep saying, well, our monitoring solution is completely different from everywhere else you've been, so you're going to have to learn it all from scratch, right? We learned that with languages and frameworks; we learned that with standards, and Docker and Kubernetes and YAML and all that stuff. My two interests are observability — and this kind of thing is finally happening in observability — and the CI/CD deployment stuff, where we're starting to standardize with GitOps and containers as an artifact. And I'm very excited on both sides, because it really means that, as engineers, we can use Groundcover when we're able to, and if we can't, we don't have to completely throw out everything we've ever learned. We can build on it, and that's honestly cool stuff. I'm excited about that because my knowledge — you know, the Prometheus book I think I got free from a KubeCon and read at home six or seven years ago, whatever — that knowledge is still valuable to me, and I can still use it with new products coming out: understanding metrics and endpoints and what Prometheus data scraping looks like and all that. You've got to start learning PromQL — sorry, sorry. But hey, look, the great thing is once you learn a little bit of it, there are tons of cheat sheets out there, tons of helper tools. It gets easier, and the nice thing is you can use it everywhere, on all these different tools. So, not horrible. Hey listeners, in this edited version of the show we skipped the demo, so if you want to check that out, look for the video version
in the show notes. Now back to the show. We could have spent another hour giving you a demo, a walkthrough there. But the nice thing is, on the website — this is what I was doing before the show, because I didn't get a chance to deploy it and I wanted to know the basics — they have a playground, which gives you an interface with some data already in it. I was just poking around, because I was planning on deploying a bunch of stuff on my own Kubernetes, and then I realized: oh, this is so much easier; I don't have to do all this myself. So for those of you who want to poke around real quick, that's an option. You did mention a free tier. Yeah. We have a free tier of up to one Kubernetes cluster, no matter the size, fully featured — that's a lot of cluster. So you can just install Groundcover from the UI immediately, get started, and basically check it out on your own data, which is sometimes more meaningful to you than our demo, right? Yeah — especially if you've already got something in place. I feel like sometimes I don't know what I'm looking for until I've actually run something. I have whatever legacy solution my team has been using for a while, and we've figured out some dashboard of what's really important — our problem areas, the parts of the code that seem to be trouble, that we're really focusing on. And then when I want to look at a similar or competing product, I'll often try — maybe not in the production cluster — to deploy both. Hopefully they work side by side and don't mess with each other; I think that's a common thing now, and with eBPF it's probably completely fine. And that allows me to very quickly go: okay, I know what I care about; let me look in this one over here and see if I can emulate that, or if it has it built in, or if I can figure out how to make it work. I feel like for people who are just getting started and learning, you don't know what to consider, what to look at. You see all these stats and they're very interesting, but which of them are actionable and which are vanity metrics is always a real challenge. So if you've got something, you know, throw this on there and compare it to what you've got. Otherwise, hey, they've got the playground and the free account, so if you don't have any of your own data, you can always use the playground. And I should point out, lastly, that you have wide support for a lot of different distributions of Kubernetes — all the cloud distributions, I saw. Even MicroK8s was an option, which is one of my favorites
for running locally and learning Kubernetes on. Yeah — basically we're native to Kubernetes. We can run on almost any flavor of Kubernetes, from the heavy-duty enterprise stuff like OpenShift all the way through the lightweight bare-metal versions of Kubernetes — we support all of them. Nice. That's the wonder of the standard Kubernetes API. So, Groundcover has a YouTube channel. And — by the way, we didn't even mention this; this is the next show we'll have to have you back for — you have a Groundcover... Caretta, is that how you say it? Caretta? Yeah, Caretta. Caretta is kind of a sneak peek into what we're doing with the sensor. It lets you see a dependency map at the network level — not necessarily the application level — but it's a cool example for people who want to say, show me the power of eBPF, and also throw it on a Grafana dashboard, because that's how I know how to operate. So we piped that into a dependency map on a Grafana dashboard, basically to show people what you can do with eBPF. I saw that, yeah — number two product of the week in developer tools. That's pretty awesome. It's a pretty cool tool — kind of an intro to what Groundcover is doing — but if you want to check out what you can do with eBPF, go check it out. We have a support channel that handles questions on our community Slack, and we're always open to suggestions. Nice. And I'm guessing they can find the Discord and Slack communities on the front page. You have the stuff on GitHub, you have the YouTube channel, you can find them on LinkedIn — all the socials; I mean, they know what they're doing. They've got the standard
social stack — that's where to get a hold of them. Yeah. Well, thank you so much for being on the show — this has been great; I was excited to have you on. I loved your approach, because I lean toward the infrastructure side: this idea of allowing me to have some of the SaaS conveniences and, you know, the security of — I think it's OpenID or whatever you're doing on the backend there — for allowing me access to the portal. Because that can be some of the hardest stuff if you're trying to deploy things yourself. Well, it's gotten a little easier nowadays, but just trying to get everything integrated with authentication, making sure my portal is secure and that I'm adding the users I need to add — that's not trivial. A lot of things — even Argo CD: when you're deploying Argo CD in-cluster, everybody wants access to it, and then you have to figure out authentication, users, databases, all that stuff. So it's actually a pretty cool idea that some of that hard stuff, you guys keep running. And like you said, we can run it ourselves if we want to, but to simplify things I can let you run it. And like you said, for security — especially with those logs. I don't remember the last time I was with a company where we didn't have some sort of log incident. It's not always AWS keys; it's maybe something a little less traumatic, but there's always something that gets in the logs where we're like, that shouldn't be there — we need to use Fluent Bit or something to keep that out of the data stack. So the fact that I can keep that on my clusters will probably keep my security team happy. It's a pretty cool design as well. Well, thank you for being here. Thank you for having me — it's been great. Yeah — I'm excited to see what's next for you all. I don't know if you have any quick hot takes on what's next; you have the inside track, and you're a non-public company, so I get to ask. Yeah — we're going to release some exciting features around an experience we call monitors. It basically means Groundcover learning your systems and turning what it finds into something you can define and control through the Groundcover experience. That's one of the things we'll release very soon, along with a lot of other features coming up quickly. Well, thanks for being here, Shahar — I appreciate it. Groundcover: just Google it and you'll find all the things. You're either going to see plant stuff, or — if you just type in "groundcover Kubernetes" — you'll get all the things. Thank you. Thanks for listening; I'll see you in the next one.