Practical Observability: Logging, Tracing, and Metrics for Better Debugging - RUBY 656

Oct 16, 2024 · 1 hr 21 min

Episode description

Today, they dive deep into the world of observability in programming, particularly within Rails applications, with special guest John Gallagher. John openly shares his struggles with engineering challenges and the frustration of recurring issues in his company's customer account app. They explore a five-step process John has developed to tackle these problems and emphasize the critical role of defining use cases and focusing on relevant data for effective observability.
In this episode, they talk about the emotional journey of dealing with bugs, the importance of capturing every event within an app, and why metrics, logs, and tracing each play a unique role in debugging. They also touch on tools like Datadog, New Relic, and OpenTelemetry, discussing their practical applications and limitations. Valentino and John shed light on how structured logging, tracing, and the concept of high cardinality attributes can transform debugging experiences, ultimately aiming for a more intuitive and robust approach to observability.
Join them as they delve into the nexus of frustration, learning, and technological solutions, offering valuable insights for every developer striving to improve their application's resilience and performance.


Become a supporter of this podcast: https://www.spreaker.com/podcast/ruby-rogues--6102073/support.

Transcript

Hey everybody, welcome to another episode of the Ruby Rogues podcast. I am your host today, Valentino Stoll, and we are joined by a very special guest today, John Gallagher. John, can you introduce yourself and tell everybody a little bit about yourself and why we had you on today? Sure. Thanks for having me on. My name is John Gallagher and I am a senior engineer at a company

called BiggerPockets, and we teach how to invest in real estate, based in the US. And I also run my own business on the side called Joyful Programming, to introduce more joy to the world of programming. And I'm on today to talk a bit about observability, which is one of my many passions. I'm a bit of a polymath, and this is one of the things that is really, really important to me and that I'm passionate about, particularly passionate about introducing into Rails apps. So thanks for having me on.

Yeah, and thank you for all the joy you're bringing to people, I hope. You've definitely picked the right language. If you're not familiar with this podcast, Ruby is a very joyful experience, personally. So it's very cool. I've loved digging into all of the observability talk that you have on Joyful Programming. And it's a very important topic that I

feel is definitely overlooked if you're starting up. Maybe you get some bug alerting or something like that in place as a standard, but anything performance-monitoring-wise is kind of a, oh no, something happened, let's look into it now. I feel like that's the typical flow of things as people start up. Do you want to just give us a high level: what is observability, and why should we care? We can drill into

the details of it after. Right. Well, I don't actually think anybody should care about observability. And I don't care about observability as a thing, because it's just a means to an end. And what's the actual goal? It doesn't matter how you get there, but the goal is being able to, number one, understand your Rails app in production. And number two, be able to ask unusual questions. Not questions that you thought of a day, two days, three weeks ago,

because that's not really very useful or interesting. If we knew exactly the questions to ask in the future of our apps, everything would be easy. It'd just be like, how many 200s have we had in the last week? Kind of boring questions to ask. Maybe a bit useful. I find the more obvious the question, the less useful it is. So observability is the practice of making a black-box system more transparent. So I like to think of it like this: imagine your entire Rails app,

all the hosting, everything to do with that app, is wrapped up in an opaque black box. And somebody says, how does it work? And why is this thing going wrong? You would have no hope of understanding it. Now imagine the box is completely transparent and you can see everything, which of course is completely impossible in software. But in theory, you'd have this completely transparent box, and you can ask all these questions and get instant answers. That's like 100% observability. And of course,

that is absolutely impossible. And so what we're trying to do with observability is understand what is going on. Not just when it goes wrong, although that's the obvious use case. We have an incident: the most critical point where observability comes into play. And that's the exact scenario that I landed in two weeks into a new role I had. So it was two weeks in, the site had gone down, I was in the UK, the rest of my team were in the US, and there were two other engineers in my

time zone. And all of us had been at the company for a total of five weeks. So we've got this app, it's down, it's on fire, and we need to put the fire out. And the three of us just looked at each other, like, what, should we just restart the dynos? Yeah. So we restarted the dynos, we crossed our fingers, and it was pure luck that the app came back up. That is the exact opposite of what we want. And we've now moved to a situation where we can ask our app a whole load of very

unusual questions, and we will get an answer. Why is there a peak of 404s on iOS at 3am? Looks like a lot of them are coming from this IP address. Okay, what's that IP address doing on the site? Okay, interesting. How many users are using that IP address? Five? So only five people are using it, and you keep going. So that's the point of observability to me: to be able to ask unusual questions that you haven't thought of already, dynamically, and explore the space,

and come to some conclusions. Yeah, I think that's a great overview. And your debugging story reminds me that I had the lucky experience of running Rails with Ruby 1.8.7, and every once in a while you just had to give the server a little kick, because it started to grow in memory size, and just giving it a quick little flush reset things. And you're just like, oh, I guess that's how we're going to do it, until we can get some insight into what's happening. And I think that

definitely underlines the importance of observability in general. How do you get those insights to begin with? And maybe that's a great starting point: where do you start looking at adding these insights? Right, like, is there a modular approach you could take, or is it more like you should look at doing everything all at once kind of thing? You should definitely not look at doing everything all at once. As I think we can all agree in software, doing everything all at once is a recipe for

disaster, no matter what you're doing. There's no vendor you can just pay money to and get 100% observability. There are vendors that tell you that you can do that, and whether you actually can is a different matter. Spoiler alert: you can't. So I just want to back up a little bit and talk about the feelings, because I think the feelings are

where all of this starts for me, probably. So I got into observability, and it's funny, because for the first year or so of my journey doing this, I didn't even realize I was doing observability. I'd heard about this observability thing, and it was out there in the universe. Maybe I should learn that. And I kept using that 'should': I should learn this. But I had

loads of other stuff to do. I've got loads of other things. I don't know what it is. I know it comes from control theory, and there's a Wikipedia page that's really complex and really confusing, whatever. I've got real work to do. But what I know is that I kept coming across these bugs, and Sentry, Airbrake, choose your error-reporting tool: they all help you to a degree, but they're not observability. And I kept coming across these defects over and over, and the story

was exactly the same. Coming across a defect, I'd see the stack trace in the error-reporting tool, and I would look at it, and the first emotion right out of the gate: complete confusion. What is going on here? No idea. So I dig a little bit into the code, I dig a little bit into the stack trace. So it's coming from here, and this thing is nil. Classic, right? This thing is nil. Where was it being passed in as nil? So now, I'm like, well, I can't just fix this.

So I now have to, well, do what? Exactly. I don't have any information to go off. Well, I guess we'll do that bug later. Let's look at the next one. And this just kept happening, and I would find myself going through all the bugs in the backlog, and I couldn't fix anything, and I'd just wasted four hours asking questions I couldn't answer, looking at things I didn't understand. And for years, I thought the problem was with me.

I honestly thought, I'm just not smart enough, I'm not a good engineer, blah, blah, blah. Bug fixing just isn't really my thing, I'm just not really good at it. And then, after many, many years of this, I was in a company, and I just got really sick of it. We had just released a brand new app,

and it was a customer account app. And we were getting all these weird bug reports. People saying they can't log in, they can't reset their password. And every time, we would add a little bit of ad-hoc logging, and then put the bug back in the backlog, and then it would come up again, and come up again. And after a while, I was just like, this is ridiculous. We're highly paid engineers. There has to be a better way. So then I started looking into it. We were using Kibana at the time,

or rather, I should say, we were not using Kibana at the time. Kibana was there, we were paying for it, and I was like, I've heard this is something to do with logging. So where do we do our logging? People were like, Kibana. I have no idea what this even is. Let's open it up. And there's just all this trash, all this rubbish. I was like, what's this? How is this supposed to be useful? People were like, oh, we don't really look at that. It's not very useful. So how do you figure out bugs?

They're like, well, we just figure it out. Well, yes, but we're not figuring it out. So all of this was born through frustration. And so what I did back then is what I recommend everybody does now, to answer your question, coming back to the point, which is: take a question that you wish you knew the answer to. A very specific question. Not 'why is our app not performing as we want?' A very, very specific question. So take your big question,

and at the time this was: why are people being locked out of the app? Why can they not reset their password? They're clicking on this password link and they're saying it's expired, or it goes nowhere, or it doesn't work. Okay, why is that happening? So that's quite a general question, and you want to break it down into some hypotheses. So that's the first thing. I have a five-step process, and this is step one. I'll go through the five-step process

in a minute. So step one is: think of a specific question. A specific question, in this case, might be: okay, I've got one customer here. There are many, many different types of defects. So this one customer here is saying it was expired. I went to the webpage and the link said it had expired. Okay. When did they click on that link? What response did the app give to them? And when did the token

time out? Right, so those are three questions. Now, they're not going to get us to the answer directly, but they are three very specific questions that we can add instrumentation for. So I take one of those questions: when did the token time out? Great question. In order to answer that, we need to know when the token was created and what the expiry of the token was. This is just a random example off the top of my head. So you're like, okay, well, we need to know the customer ID.

We need to know the token. Well, we don't actually need to know the exact token, but we need to know the customer ID, the time that the token was created, and the expiry time of that token. Is it 15 minutes, is it two hours, whatever. So I would then look into the code. That's the next step. So we've done step two. Step two is: define the data you want to collect. User ID, token expiry, and an event saying the token has been created now for this user ID.
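In a Rails app, the data from step two might be captured as a structured log event like this (a minimal sketch, assuming the semantic_logger gem John recommends later; the event name and fields are illustrative):

```ruby
# Somewhere in the password-reset flow, right after the token is created:
logger.info("password_reset.token_created",
  customer_id:      customer.id,
  token_created_at: token.created_at.iso8601,
  token_expires_at: token.expires_at.iso8601 # answers "when did the token time out?"
)
```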

Okay, so that's the second step. The third step is: build the instrumentation to do that. So whatever you have to do. Maybe you need to actually add structured logging to your entire app, I don't know. Maybe you've got the structured logging fine, but there's nothing listening to it. Maybe the tool just can't actually measure what you want it to measure, so maybe you need to invest in a new tool, whatever it is. And then you build some code to instrument

just that very small piece of functionality. And once you've done that, you wait for it to deploy. And then you look at the graphs, you look at the logs, you look at the charts, whatever output you've got. And what normally happens, for me, is I look at the charts and I say: that is not what I wanted at all. Actually, I've misunderstood the problem. I've misunderstood the data I want, now that I see it. Just like you would with agility, true agility, not Agile, because Agile

means something else now. But true agility is: you do a little bit of work, you develop a feature, you show the customer, and they say, not quite right. Go back, adjust it. Closer, but still not quite right. Whereas if you ask them to describe it exactly right from the beginning, it doesn't align with what they want at all. You need to show them, and it's only by showing them that you get feedback. And the same is true for ourselves. It's only by looking at the

graphs and the logs that I realize that actually isn't what I wanted to begin with. Or it is, or I'm onto something there. So I've used the graph. Maybe it was unusable. Maybe I couldn't query the parameter. Maybe there's all sorts of things that might be happening there. So the last stage is: improve. And from improve, you can go back to the very beginning and ask a different question. Or maybe you just want to iterate on the instrumentation of it,

deploy it again, add some more to it. Okay. So now we know the token expiry. What's the next question you want to ask? Well, when did the user actually hit the site? Was it after the token expiry or before? Hmm. Okay. Sounds like an obvious question, but maybe it's after, which would indicate the token really had expired. Oh, it's before. Huh. How could it be expired when it was before? Oh, hang on. What's the time zone of the token?

Now we're getting into it, right? So you log the time zone. Holy cow, the time zone of the token is out of sync with the time zone of the user. That's what it is.
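One common Ruby incarnation of that bug, purely as an illustration (this exact scenario isn't from the episode): a timestamp serialized without zone information gets re-parsed in a different local zone and silently shifts by the offset.

```ruby
require "time"

stored = "2024-10-16 09:00:00"  # written by a process running in UTC
Time.parse(stored)              # parsed as *local* time on this server
Time.parse(stored + " UTC")     # the instant that was actually meant

# In Rails, preferring Time.zone.parse and Time.current keeps
# comparisons like token.expires_at < Time.current in one zone.
```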

Yeah, I love that. I love that analogy of identifying the use case in order to expose what to observe and where to instrument all of these pieces that are missing. Not just instrument, but identify them. I think that's very important. I think in general it's about trying to identify the actual use cases in order to know what you even want to capture to begin with, right? Like, yeah, we throw a wall of logs at a resource like Kibana, and it's not very useful. But once you start to abstract the ideas and use cases and how people are actually using the thing that you've built, you know, you can definitely isolate what it is that you

actually care about. And I think you're right: that is kind of the whole importance of observability, identifying that use case and exposing what you actually care about out of all these things that are out there. Because, I mean, there's HTTP logs, there's all kinds of logs and information available that's just emitting all the time. How do you know and identify which are really important? And I think it just depends, right? Like,

what are you trying to capture? So it's a great stepwise way to just start to figure that out, right? Because, yeah, depending on your role and depending on what your responsibilities are, that could change and that could be different, and your observability needs will change with that. So identifying that is probably most important, I think. But as with everything else, I would say, if you're really not feeling any pain, don't bother,

just don't bother. I'm not really interested in telling people what they should be doing or could be doing. I mean, goodness me, we hear enough of that in engineering, don't we? You should really learn a language every year. You should do this. You should do that. I'm absolutely sick of it, absolutely sick of all these gurus telling me what to do and what I should

be learning, and very few of them talk about what's the benefit to me. And in order for me to do anything, in order for me to change as a human being in any way, learn anything, I have to feel the pain of it. If you're not feeling the pain, don't bother. But if you are feeling the pain, if deploys are really glitchy, if you keep asking... for me, the kicker is: if I keep asking questions I don't have the answer to, that's a concern. And if they're just minor, oh, like, why did I wake up

10 minutes late today? Who cares? It's not important. But if the site's gone down for the fourth time this month, and every time the site goes down, we lose at least five grand, 10 grand, maybe even more. And even worse, every single time the site does go down, we just kind of get it back up more by luck than good judgment. This kind of feeling of, oh, we kind of got away with it that time. That's okay. And oh, there was this weird thing, and we still haven't really figured that one out,

but that's okay, we'll just put it in the backlog. It's operational risk. You've got to decide: are you comfortable with that operational risk or not? Is it big enough? And in my experience, you've kind of got to hit rock bottom with this stuff, as I did. There were loads and loads of bugs that I could have investigated and added logging for and fixed, but it's pushing a boulder up a hill. It's not actually worth it. And it was only when it reached my threshold of pain that I was like, you

know what, I have to do something about this now. This is just ridiculous. We're professional people, we're being paid a lot of money, and it's not working. The app that we've delivered is not working, and what's more, we don't know why. But also, I do just want to add, and this may broaden out the conversation a little bit, we may want to keep it narrow on Rails apps, but I've realized that observability principles go way beyond 'how does our web app work?' It applies to

any black box. So as an example, a few years ago, I was working at a company and their SEO wasn't great. And they just kind of were like, well, we'll try and fix it. And they had several attempts to fix it. None of them really worked. And every attempt was the same. They would get some expert in. The expert would give us a list of 100 things to do. We would do 80 of the 100. And then nothing would really improve. And then they'd be like, well, we did everything you said.

And then they'd move on to another expert and repeat, keep doing that. And then one day, within four weeks, 20% of the site traffic disappeared. And nobody could tell us why. Nobody understood why. Observability. Now, Google is a black box. So, you know, you're not going to be able to instrument Google. But there are lots of tools that allow you to peer into the inner workings of Google: Semrush, Screaming Frog, all these kinds of tools. They are, in my opinion,

actually to some degree in the observability space. Everybody thinks of them as marketing tools, search engine optimization tools, whatever. They're allowing you to make reasoned guesses about why your searches aren't performing the way they are. And then you can actually take action on that, because now you have some data. Oh, this keyword dropped from place four to place 100. Why is that? Okay, let's try hypothesis A, put that live,

and see if Google will respond to that. Oh, and now we're up to, you know, position 80, whatever it is. So the idea of observability goes way, way beyond Datadog and New Relic and all of those people in the observability space. I see it as a much, much wider, much more applicable topic. Yeah, I hear you there. And I'm also a, you know, let's not just add New Relic to every app that we deploy. Or, is Bugsnag even needed

for every app? These are questions that I ask myself too. What value are you getting from all these auxiliary services that give you observability into just blanket things? Right? Like, at what point do you stop that kind of mentality and be like, every Rails app should at least be able to get insight into the logs, so that you can see what the application is doing? And how long do you capture that? What kind of time frame?

Do you have any default standards where you're like, well, I know that I'm going to need to look at this at some point in the application cycle? Like, what are your defaults? Great question. It's thresholds, I kind of think. If you're making a small app with very little traffic... I have a client at the moment I'm consulting for, and I've made them an app, and it has maybe

flipping 20 visits a day or something, 20 hits a day. So I installed Rollbar, the free version of Rollbar. Anything goes wrong, I get a notification. It's fine. The further up the stack you move, the more the defaults change. For a Rails app that's mission critical, I'm not even going to say mission critical, just serving a decent number of hits a month, 10,000, 20,000... I don't know, I've tried a lot of observability tools, and there's

not one yet that I can unreservedly recommend. They've all got their pros and cons. Datadog is a good option if money is no object. I kind of don't want to get into the tooling debate, because it's a bit of a red herring, I think, in many ways. There are various cost-benefit trade-offs there. But in terms of the defaults, in terms of what you observe, requests have got to be in there. So for every app that I have in my care of any significant size,

I would always say install Semantic Logger. Semantic Logger is the best logger I've found. It outputs JSON out of the box, and it's quite extensible. There are many problems with it, but it's the best option that we've got. So that's number one. Rails already logs every request for you; Semantic Logger will format that as JSON for you.
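A minimal setup along those lines, assuming the rails_semantic_logger gem (the options shown follow its documentation; the tag choices are illustrative):

```ruby
# Gemfile
gem "rails_semantic_logger"

# config/environments/production.rb
Rails.application.configure do
  # One structured JSON line per log event, instead of Rails' default text.
  config.rails_semantic_logger.format = :json

  # Named tags attached to every request log, pulled from the Rack request.
  config.log_tags = {
    request_id: :request_id,
    ip:         :remote_ip
  }
end
```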

There are some notable missing defaults in Semantic Logger, though. I'm working on a gem at the moment that will add some even more sensible defaults to it. So, for example, I believe that request headers do not get logged out of the box. Certainly the request body does not get logged out of the box; request headers might be. The user agent doesn't get logged out of the box. I mean, this is pretty basic stuff. So I have a setup that I use that logs a whole load of things about the request

out of the box. I like to add in the user ID out of the box. It depends what kind of setup you have for authentication, but at the very, very least, if somebody is logged in, their ID should be logged on every single request. That is absolutely basic stuff. A request ID is also a really, really useful one. I have a complex relationship with logs and tracing, because tracing is essentially the pinnacle of observability. I hear a lot of people say that logging

is the be-all and end-all. Logging is a great place to start, but tracing is really where it's at. I can go into why that is in a bit. But logging is a great default, a good place to start. Start with Semantic Logger. Basically, every single thing that's important in any request should be logged. So that's every header. Obviously, you need to be careful with sensitive data in headers.
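The filtering John reaches for next is Rails' built-in parameter filtering; a minimal sketch of pointing it at sensitive values (the filter list here is illustrative):

```ruby
# config/initializers/filter_parameter_logging.rb
Rails.application.config.filter_parameters += [:password, :secret, :token]

# The same machinery can filter arbitrary hashes, e.g. headers, before logging:
filter = ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters)
filter.filter("token" => "abc123", "user-agent" => "Mozilla/5.0")
# => {"token" => "[FILTERED]", "user-agent" => "Mozilla/5.0"}
```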

Do your Rails... I can't remember what it's called, but there's the filtering module that you can add in. And sometimes Semantic Logger doesn't give you that by default; you need to be a bit careful. A good default as well is logging all background jobs. Background jobs are one of the most painful areas of observability that I've experienced. We still haven't really cracked it. We have some very,

very basic logging out of the box in Semantic Logger. I believe it logs the job class, the job ID, and a few other things. But it doesn't log the latency, which is a huge, huge missed opportunity. And I don't believe it logs the request ID from when the job was enqueued. So when a job is enqueued, Semantic Logger will by default emit a little entry in the logs: this job was enqueued, and it will tell you what request it came from.

But on the other side, when the job is picked up and performed, that request ID is missing. So you need to go to the request ID, find the enqueued job, find the job ID, and then take that next leap. So it's a bit clunky, but it's manageable. So, in short, Semantic Logger gives you some okay defaults out of the box, but there are some real basics that it still misses. And so: background jobs and requests. Those are the two really, really big ones to start out with, but as you can imagine, the list goes on.
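Carrying the request ID across the enqueue boundary by hand might look something like this (an illustrative sketch, not John's setup; it leans on Semantic Logger's named tags, and the job and argument names are made up):

```ruby
class SendResetEmailJob < ApplicationJob
  def perform(user_id, originating_request_id)
    # Every log line inside this block carries the request ID of the
    # web request that enqueued the job.
    SemanticLogger.tagged(request_id: originating_request_id) do
      logger.info("job.started", user_id: user_id)
      # ... do the work ...
    end
  end
end

# At enqueue time, pass the current request ID along explicitly:
SendResetEmailJob.perform_later(user.id, request.request_id)
```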

Yeah, you mentioned some key pieces I always think of with observability in general, which is separating the pieces into their own puzzle, right? Like, we have logs, which are kind of just our data. And then we have individual metrics, where we're snapshotting the logs for particular segments, like traffic or the number of people using it,

like the number of jobs that are running. And then there are traces, which we could dig into next, because I have a lot of love for all of the standards that are coming out of this, with OpenTracing and things like that. I'd love to dig in there. But also alerting: like, you know, how does anybody ever know that there's a problem?

So yeah, I love thinking about it in these separate groups and categories, because I think it also helps to think about the overarching theme, which is getting insight, but also getting meaningful insight. And really, the only reason anybody ever cares about any of it is when something goes wrong, or something is problematic that causes something to go wrong, and you want to either

catch it early or try and remediate it. And so, where do you find... I mean, background jobs are, I feel like, the first instance where people realize, oh, we need to start looking at what it's doing, right? You start throwing stuff in the background, you're like, okay, great, it's doing the work. And then you maybe don't realize, if you're on the same node, that those slow jobs can block the web requests. Right? And then,

okay, well, you split those up, and finally you've got that resolved. But then, okay, well, one problematic job can back up the queue that it's on, you know? Like, where do you... To me, the background processing aspect is why we have tracing to begin with, because it's concurrency, right? That's where everybody ends up hitting their pitfalls: as soon as you start doing things all at once, thinking, oh, we just throw it

in the background and process things as they come. And as things start to scale, it causes more problems as you try and figure out timing and stuff like that. Where do you find the most important pieces of making sure that you're capturing the right segments and the right flows in that process? Yeah. There are so many things you touched on there that I want to come back to, to answer your question. First of all, it's the five steps that I

walked through. Yeah. That's the short answer: if you have a specific question that you cannot answer, what we're really talking about is the implementation details of how you answer that question. So which question you pick determines a whole load of stuff. I can't just give you a bog-standard answer, because it just... it depends. I hate saying that, but it does. And so I think the first thing is to ask the question, figure out what data is missing, and then choose the right

piece to add into your logs. I feel like I've maybe not understood your question, maybe. Yeah, I mean, it's more of an open question. I guess, when trying to think about it, one of my biggest debugging pitfalls is trying to reconstruct the state of what happened when something went wrong. I feel like that's one of the most typical things. It's like, okay, something happened. Well, the data has changed since something

had happened. Maybe the change resolved the issue, but, you know, trying to figure out what that is and running through those questions, right? Like, how do you think about reconstructing data, or reconstructing the state of an issue? Is that not the right way to go about it, or do you try and do something else? Fantastic question. So, this gets to the root of why the three pillars are complete nonsense. Okay, so you'll have heard a lot about the three pillars:

metrics, traces, and logs. Okay: nonsense. They're not three pillars. And the analogy I like to use is this: saying that observability is three pillars, traces, logs, and metrics, is a bit like saying programming is three pillars: arrays, integers, and strings. It's the same kind of deal. No, it's nothing to do with those things. Well, it is, because you use those every day.

Yes, but you're kind of missing the point. So thanks to some amazing work by the people at Honeycomb, and Charity Majors, and reading their stuff and their incredible work, I've realized that metrics, tracing, and logs miss the point. The point is we want to see events that happened at some point in time. And that neatly answers your question about how you reconstruct the state of the app. I mean, the short answer is, of course, you can't.

If you're not in an event-driven system, if you're in a CRUD app, if you're storing state to the database, there is no way you can go back in time and accurately recreate it. But we can give it a reasonably good stab. And we can do this by capturing the state of each event when it happened. Forget about observability tools and logging and structured logging and tracing just now. Imagine if, when that incident happened... let's say my expired token would potentially be a

good example. There are several points in that timeline that we want to understand. Number one, when the token was created. Number two, when the user hit the website. And maybe there's a third one: when the account was created, let's say. So imagine if, at each of those three points, we had a rich event with everything related to that event in it. So when the account was created, we had the account ID, the status of the account, whether it's pending or not, the creation date,

the customer ID, blah, blah, blah. And then when the user visited the site: what was the request? What was the request ID? What was the user ID? What was the anonymous user ID, etc., etc. And then when the token was created: what was the expiry, what was this, what was that, what was the user ID? Okay. So if we have those three events, and we have enough rich data gathered with each of the events, we can answer your question. Does that make sense so far? There's

a whole load more blah, blah, blah, but does that make sense so far? I think you're making some great points about capturing the transactional user information, or the user's actions. Yes. And also other events that happen in the system. So there's: user did something, computer did something, computer enqueued a background job, performed a job, etc., etc. So the way I think about it is, everything that happens in your app, whether it's initiated by the computer,

an external data source, or a user doing stuff, whatever it is: that creates an event. And if you don't capture enough data about that event, that is it. The data is lost forever, assuming you're not doing event sourcing and assuming you're not in an event-driven system. So the way I think about it, at the most core, fundamental level, is: whether it's logs, traces, metrics, whatever it is, we need a way of capturing those events. And more importantly,

ideally, we need to link the events together. And this is really, really, really important. So let's say somebody hits our app and it creates the token. Well, there are two parts: they hit the app, there was a request to our app. And then, in the call stack somewhere, the token is created. Those two things are two separate events, but they're nested. We want to capture that causal relationship: one caused the other. One is a subset of the other, one is a

parent, one a child, however you want to put it. Without that causal link, we're lost again. We don't know what caused what. So there are three or four ideas here. Number one: events. Number two: contextual data with each of those events. And number three: nested events, or, if you like, causal relationships between events. And with those three things, you can debug any

problem that you would like. That's my claim. And so if you just keep that model in mind, let's examine traces, logs, and metrics, and see where they fall short, see which one meets those criteria. So: tracing gives us all three. For those of you who don't know, I should explain what tracing is, because I was confused about what tracing even was for absolutely years. So tracing allows you to... when somebody hits your app, a trace is started. There are two concepts in tracing: there are traces

and there are spans. And then there's the data associated with the span, but let's leave that to one side. So when somebody hits your app with a request, a trace is started. And the trace will be like: okay, I've started, here I am. You can append any data that you want to me while I'm open. It's like opening a cupboard door: you keep putting stuff in the cupboard, and once the cupboard door is closed, you can't put any more stuff in it. Very simple analogy. So we open the door,

we start the trace. And it goes down to the controller level. And the controller says: oh, I'm going to glom some data onto whatever the existing trace is, about the method, the post body, the request, the headers, whatever it is. I'm going to glom that onto the current trace. And then we get down into... maybe you've got a service object. I know some people hate them, I love them, that's not what this podcast is about. And so you get

into a service object. And the service object says: oh, whatever is in the current trace, I want you to know you hit me, and you hit me with these arguments. Cool, I'm going to append that to the trace as well. And then we enqueue a background job. That event gets added onto the trace. And then, even more excitingly, there's a setting in OpenTelemetry where, when the job is picked up and performed, the trace is kept open. And there's a whole load of debate about whether this is a good idea

or not, but you can do it. You can keep the trace open until that job is started. And so the job says: ah, I've kicked off now. It gloms a whole load more stuff onto the trace. Maybe you make an API request in the job: it gloms a whole load more stuff onto the trace. And then it comes all the way back up the stack, and you have this trace with all this nested context. And each time something says, I'm going to glom this data onto the trace, that's called a span. And spans are nested, so you can have spans nested inside

spans inside spans. So essentially, it's this big tree structure. And you might have seen this before: it's the flame graph that you get in Datadog and New Relic and all these kinds of things. Everybody looks at these things and thinks they're really pretty, and indeed they are. So that is the pinnacle of observability, in my head. Traces give us all three. And we can do, as you can in any of these observability tools that support tracing, some really

cool stuff. Show me all the requests that were a 200 that enqueued a job where the job lasted for more than three seconds. Holy cow, now we're cooking with gas. We've got everything that we need. Show me all the spans that indicated anything to do with a background job, where it was a 500 response, but the user was logged in, and, and, and. And so we can start to not only query the spans, but query the parents of the spans. So you've got all these nested causal relationships, and it gets ridiculously powerful. So that's traces.
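As a concrete sketch of those ideas, here is roughly what nested spans with attached attributes look like with the OpenTelemetry Ruby SDK (the span and attribute names are illustrative):

```ruby
require "opentelemetry/sdk"

OpenTelemetry::SDK.configure do |c|
  c.service_name = "customer-account-app" # illustrative
end

tracer = OpenTelemetry.tracer_provider.tracer("password_reset")

# The outer span plays the role of the request; the inner one is the service
# object creating the token. The nesting records the causal relationship.
tracer.in_span("POST /password_resets") do |request_span|
  request_span.set_attribute("user.id", 42)

  tracer.in_span("create_reset_token") do |span|
    span.set_attribute("token.expires_at", Time.now.to_i + 900)
    # ... create the token, enqueue the email job ...
  end
end
```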

Cool. Let's look at logs. What do logs give us? Well, they give us events. That's all logs are, really: a series of events that happened. Do they give us the ability to nest events inside one another? Nope. Sorry, you're out of luck. You can log causation IDs and you can link them together, and obviously you can log request IDs and filter everything by the request ID, but there's no concept in the

logs of this log being nested inside this other log. So that information: goodbye, it's gone, you don't have it. But you do have the rich data in every event. Let's look at metrics. What do metrics give you? They don't give you the events. They don't give you the nesting. They just give you some aggregated numbers. So I don't think of them as three pillars. They're three rungs of a ladder. The very top

rung is tracing. Awesome. The next one down is logs. Pretty good. And metrics are useless. Now, when I say metrics are useless, people get upset with me, and they're like, oh, well, I look at metrics all the time to understand my app. Yeah, okay. But if you derive metrics from the higher rungs, that's totally cool, totally fine. What's a really bad idea is to directly say, I'm going to send this metric right now to my backend. And people do this all the time. People think this is a good

idea. And it's okay. I mean, it's better than nothing, right? It just depends on the fidelity of information you want. But there are two problems, actually. The main one is: you've sent that data. Okay, you've sent it to Prometheus, Datadog, whatever. You sent that one data point. So then you look at the metrics and you say, holy cow, we're getting all these 500s. Why is that? I'll sit here and wait as long as you want. You're not going to be able to tell me the answer to

the question, unless it's blindingly obvious. And yes, you can say, oh, well, this other bit of data over here correlates with it time-wise, and maybe it might be that. Yeah, okay, it might be that. How do you know it's that? Well, we're having to guess. Guessing is not a strategy. Hope is not a strategy. I don't want to debug by just flipping guessing. I want to know. And the only way of knowing is having traces. So the way I like

to think of it is: tracing is the pinnacle. Logs can be derived from traces, which is why they're three rungs of a ladder. And everything can be derived as a metric from the two rungs above. So if you've got only logs, you don't have any nested context, but you can get metrics from logs. Fine. If you just have metrics, I would say you're not in great shape, because you can't understand why without pure guessing. And it amazes me how many people push back on this idea and think just having

some metrics is enough. It's nowhere near enough, not in my experience. If somebody wants to refute me and come on this podcast or have a chat with me after, I would love to listen to how metrics allow you to debug very, very deliberately and get the exact data that you need. You can send off dimensions with your metrics, and then your metrics bill explodes within about five seconds, especially

if it's high-cardinality data like IP addresses. I've made that mistake before: we're going to send a dimension of IP with our metrics so that we can understand what's going on. Within a week, usually less, my manager messages me saying, can you turn that off? We just got a Datadog bill of, like, five grand. Oops. Yeah, I guess I do have maybe some specific instances where metrics alone can help

identify things. And that's more where the granular metrics are the things that you actually care about, right? Like, let's say, for example, back to the Sidekiq background jobs example: if you notice your queues piling up, and you happen to have your dashboard of metrics just looking at queue size and looking at throughput, you can easily say, oh, there's something blocking it, and it gives you kind of a point of where to look in that specific instance.
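Those queue-level numbers are cheap to get at. Sidekiq exposes them directly through its API (a sketch; the queue name is illustrative):

```ruby
require "sidekiq/api"

queue = Sidekiq::Queue.new("default")
queue.size    # jobs currently waiting in the queue
queue.latency # seconds the oldest waiting job has been sitting there

# Pushed to a dashboard on a schedule, these two numbers give exactly
# the "queues piling up" signal described above.
```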

Or, as another example, you can notice there's a memory leak by monitoring the memory consumption of the app, just looking at the metrics for that and getting an alert, and asking, why does the memory not stop growing after a certain amount of time? I mean, these are very specific

examples that I'm giving. But I agree: it's not going to tell you, back to your token expiration example, whether people are having a problem with the application that we've made. Like, we keep getting these client emails coming in: oh, I can't sign into your app, what's happening? You can't just take that and be like,

oh yeah, it's obviously the token's expiration, right? Your customers' emails aren't going to translate directly to that, and you're not going to know right away without having your tracing in place. So I'm going to say a few things to that. Number one, you bring up a really good exception that I'd conveniently forgotten to mention. If it's infrastructure stuff, if it's memory, hard disk space, all that kind

of stuff: fair game for metrics. The second thing is: I'm quite hyperbolic, I'm quite an extreme person. So when I say they're useless, I don't mean they're literally completely useless. I think of metrics as a hint: hey, there's something going on over here. Cool, that's not useless. Obviously it's useful. But then the next question is why? And if you've got a super simple system, then it's probably like three things, and you know, well,

there are only three jobs in the system, so cool. And maybe you've segregated your metrics by background job, which is fair. It gives you a place to look and it gives you a starting point. So yeah, they're useful in the aggregate, and they're useful at giving you a hint. And they're useful in terms of making sure the infrastructure is still running. But I see a lot of people depending on them. And, you know, there's a guy I really respect,

I used to work with him, called Lewis Jones. And he and I have gone back and forth on this over LinkedIn, and he is convinced I'm wrong about this. He's like, we run everything through metrics, metrics are awesome; you're on cloud nine if you think you can trace everything. And there is also a significant weakness with tracing, which is that you can't trace everything. Unless you've got relatively low throughput. Or even medium throughput, you can make it work.

If you trace every single request and you're doing millions of requests a day, I dread to think what your bill is going to be. So that's where head sampling and tail sampling come into it. We can get into that if you would like.
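For reference, head sampling can usually be switched on without code changes; the OpenTelemetry SDKs read the spec-defined sampling environment variables at startup (the 10% ratio here is an arbitrary example):

```
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1   # keep roughly 1 trace in 10, decided at trace start
```

Tail sampling (e.g. keep every trace that contains an error) decides after the trace is complete, so it typically lives in an OpenTelemetry Collector in front of the backend rather than in the app's SDK configuration.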

I mean, I would love to dig more into tracing in general, and maybe more into the distributed aspect of it. Because I think what you're talking about is very important. If we're just talking about tracing through a single request in a Rails app, it's not as useful. Where tracing really comes into play is when there are multiple things that start happening: once you start having more than one application, and the data starts trickling

from one application to the other. Even in the Sidekiq example, right? If you're throwing stuff into the background, how does that data snapshot transition through the background jobs, especially if you have ones that start depending on each other? How do you then manage the queue, making sure that you know where it started and where it's going? Because sometimes you can catch a problem before it starts by having the traces in play and

knowing where it's heading, right? And so, I would love to dig into those aspects. What tooling, or maybe we shouldn't talk about tooling specifically, but what aspects of tracing are most important for holistically looking at your system? Outside of running through your questions, like we said; I think at this point we're beyond that, and you already know what those questions are.

So where do you start setting up tracing? Because I know at Doximity we use OpenTracing as an open standard for tracing and observability across platforms, languages, and things like that. Do you find that the industry standards are heading in the right direction, or where are the pitfalls there? I know it just introduces

a lot of dependencies once you start to adopt a lot of these things. Totally. So I should say, I am singing the praises of tracing, but it's a slightly utopian vision that I'm painting, because 90% of the work I've done is with logging, purely because it's simpler to get going; it's more of a known quantity. And in a lot of my talks, that's why I'm not talking a lot about tracing and I'm talking about structured logging: because I think structured logging gives you this kind of event-based mindset

that you can then start extending to tracing, and the reverse is not true. You can't take that event-based mindset into metrics, because metrics is just that aggregation, right? But recently I've been doing a lot of queries in our Rails app, and I've been going to... we use New Relic, sorry, we use Datadog at work, and I've been going to Datadog's tracing interface and really trying to answer my questions there instead of in logging. So we have both

tracing and logging. Tracing is hobbled a little bit, purely for cost reasons, and our logging is not so hobbled. So, are the standards heading in the right direction? Yes, but it's going to take a really long time to get there. That's my short answer. There are a lot of different ways of going about tracing. The most promising, as we all know, is OpenTelemetry, but I've read some pretty harsh critiques of OpenTelemetry. It's kind of

a topic that generally divides people. If you don't know anything about OpenTelemetry, it sounds absolutely utopian. And I got really excited when I started researching it. The more you dig into it, the more you realize how much complexity there is to resolve and how many challenges that project faces in order to resolve it. And I mean, what it's trying to resolve is 30, maybe 40 years, possibly even more, of legacy software, right? Because that's how long logging has been

around. And they're trying to aggregate all of that into one single standard. Good luck. It's a very, very difficult problem to solve. And they're doing an incredible job, but it's very, very difficult. So: OpenTelemetry is where I'd start with the answer to your question. OpenTelemetry is 100% the future. I've not seen anything that rivals it. And OpenTracing, I believe, came first and then evolved into OpenTelemetry. That's my understanding; apologies if I've got that slightly wrong.

And so, yeah, I think there are a few options if you're in Ruby, none of which are ideal. So, the OpenTelemetry client in Ruby is not ready for prime time. It's quite behind the current standards in OpenTelemetry; it doesn't obey any of the latest semantic conventions, for example. I have played around with it in an example project, and when it's working, it's absolutely incredible. It's next-level brilliant. But there are a few problems with it. It's extremely slow. So I tried to use

tracing on our test suite at work using this OpenTelemetry tracing, and, I can't remember the numbers, but it really slowed down our test suite, to the point where it just wasn't practical to use. Because we were trying to measure the performance of the test suite. So, you know, I could be doing something stupid there. It's very possible that I

wasn't using it the right way. And sorry, I'm having to hedge here: there's a lady, I think called Kayla, who is from New Relic, and, I'm so sorry, the names escape me, but there's a whole bunch of people in the Ruby space who are working really hard on OpenTelemetry. It's just that the OpenTelemetry project is moving so fast; that's the problem. So that's option number one: OpenTelemetry. You could maybe fork it

and tweak it yourself. And the second option, and what we use at work because we're on Datadog, is Datadog's tracing tool, which is pretty good. But even with tracing or logging, I feel like we're maybe 20 years behind where everybody else is in programming, in terms of observability. Because when I look at this stuff, and even think about tracing, I maybe have five, six, seven questions that even I can't resolve. Which is: what do I trace?

How much detail do I trace in? How much is this going to cost me? We're still in the stone age with a lot of this stuff. So I don't have any good answers for you in that regard. We use the vendor tooling for tracing. I'm sure New Relic has its own version of that. In fact, I know they do. I know Sentry does. There are certain other providers that don't have any tracing capabilities at all. So I would say, for now, the best option we have is relying on the vendor tracing tools.

Yeah, it's funny you mention Datadog. We've had Ivo on before from Datadog to talk about, I think, memory profiling. He works on a lot of granular Ruby performance tooling. Really interesting stuff. Yeah, I would love to see maybe some more, I don't know, higher-level examples of making use of OpenTelemetry in the Ruby space in general.

Because I think, especially with all of the Solid Queue, the solid trifecta or whatever, stuff that's coming around, it would be nice to see something like tracing specifically introduced to Rails that would make more sense in that ecosystem. Yeah, I mean, where do you start? Profiling stuff is kind of like an intro to tracing, right? Like, if you wanted to see the request... it reminds me of, was it the rack-mini-profiler tool,

right? Where you can just see a little tiny tab that says, oh, it took this number of seconds to load this particular page you wanted to get. And you can click on it and expand it and see, oh, well, what did your application do at each step of the way, and how long did each step take, right? And I think of that as a trace a lot of the time, right? And it's very useful,

even when you're just starting out, to see that, right? And it helps you visualize it. And so I feel like maybe that's what's missing: a lot of the visualization aspects of all this tracing stuff, because that's something that you look at or find useful when you're starting to dig into structuring the traces and things like that. Definitely. That's leading me up to one of my big rants, passions, whatever, within the observability space.

And I don't see anybody talking about this. I feel like either I'm onto a really great idea, or it's an unbelievably idiotic idea for some reason that I don't know. It's usually the latter, as a spoiler. Okay, so: when I'm looking at traces, there's almost never enough information. Almost never enough information. And this is why Charity Majors and the team at Honeycomb and

Liz Fong-Jones always talk about having wide, context-aware events. That's their mantra: wide, context-aware events. Events we've already talked about; context we've already talked about. We haven't talked much about the wide. So wide means lots of attributes. Their take on it is: add as many attributes as you can to every event, and make them high-cardinality attributes. What does that mean? It took me about three months to wrap my head around

what high cardinality means. It means anything ending in an ID. There you go, that's an easy explanation. So a request ID and... oops, sorry, that was me hitting my microphone. Anything that is a unique identifier for anything. So that's user ID, request ID, but also anything that is a domain object. And this is the real missed opportunity, I think, that we have in the Rails community, and the observability community potentially in general.

When something goes wrong, even when something goes right, let's take the token as an example. When that token is created, the token is a domain object. Now, okay, it's to do with authentication, so it's not really a domain object in a way. But let's say the customer is signing up for an account. The account definitely is a domain object. And if you want to understand what I mean by a domain object, I just mean an object that belongs to the business domain in which

you're operating. It's a business object, a domain object, call it what you will. But when the CTO, or even better, the CEO or somebody in marketing, talks about this customer account, they talk about people creating accounts. They use that word, account. And that's your first clue that it's a really important concept in the domain. So that's what I mean by domain objects: words that non-technical people use to describe your app.

Why are we not adding every relevant domain object to every event? We don't do it. And so what you'll see is people do this kind of half-hearted, oh well, we'll add the ID to the current spam or the current trace or even the current log. We'll add the ID. And that's okay. That'll be enough. But you're not capturing the state of the object. Why not just take the object in this case the account, convert it into a hash and attach it to the event. Why can't we do that? Now there's a

number of reasons why we actually can't do that in some cases. If you're build in terms of the size of your event, so if you're build on data, obviously that's going to get expensive fast. But if you're build on pure events, as in your observability provider, your observability tooling is, is saying for every X number of events or X number of logs per month, we will charge you

this much. But the size doesn't matter. Then this is a perfect use case to be taking those rich domain objects, converting them into a structured format and dumping them in the log or the trace. And so I've kind of thought about this quite a lot. I've come up with a few quite simple ideas that people can use starting tomorrow in their rail times. Not without their problems, but the first of which is, I don't know if anybody's worked with Formatted. So two Formatted S

for date-time strings. And we have this idea in Ruby, don't we, of duck typing. We have an object, and really good OO design says that you shouldn't have to understand anything about that object. You just know it's got four methods on it, and it can be an account, it can be an invoice, it can be many different things. So my approach, and I'm testing this approach at work at the moment, is instead of having to_formatted_s, have to_formatted_h. What does that mean? It means you're going to format

the domain object as a hash. And so to_formatted_s allows you to pass in a symbol to define the kind of format that you want. It can be :short, :ordinal, :long, :humanized, and it will output a stringified version of that date in these different formats. So my idea is: why can't we have a method on every single domain object in our Rails app called to_formatted_h, where you pass it in a format? That format could be, say, OpenTelemetry. It could be any one of a number:

short, compact. And so for every trace, the way I like to think of it is, I want to add to that trace every object that's related to it. And you could format those in OpenTelemetry format, for example, or you could have a full format, or a long format, whatever you want. And that way you can say, I want a representation of the account that is short, and it's just got the ID, and that's a totally minimal skeleton, and that's enough for me.

But actually here, the work I'm doing is a bit more involved, so I want to call to_formatted_h with :full, and that will give the full account, like the updated_at, the created_at, everything about it. And then that will be sent to my logs and traces, and I now have a standardized way of observing what's going on, with all the rich data of my app state at that point, with all the relevant domain objects in it. So that's my dream that I'm headed towards with this gem.
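
As a rough illustration, a minimal version of that duck type could look something like the sketch below. The Account model and the exact format names are hypothetical, since the gem is still just an idea being tested:

    class Account < ApplicationRecord
      # Convert the domain object to a hash, at varying levels of detail.
      # :short is a minimal skeleton; :full includes every attribute.
      def to_formatted_h(format = :short)
        case format
        when :short
          { id: id }
        when :full
          attributes.symbolize_keys
        when :otel
          # Flatten for OpenTelemetry-style attributes: dotted string keys,
          # primitive values only.
          attributes.transform_keys { |k| "account.#{k}" }
                    .transform_values(&:to_s)
        else
          raise ArgumentError, "unknown format: #{format.inspect}"
        end
      end
    end

Attaching it to an event then becomes one call, for example span.add_attributes(account.to_formatted_h(:otel)), which is the convention-over-configuration move: every account, everywhere, serializes the same way.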

So that's kind of the way I think about structuring it. And I see people doing all this ad hoc stuff: well, this is an ID, so we'll call the job ID job_id, I suppose, and what's the account? We can call that account_id. And I'd just like to think of it as: imagine your domain object, so an account has a customer,

a customer has some bank details, well, bank details is a bad idea, but an address maybe. And so we could have these different formats that load nested relationships or not, and obviously you've got to be careful about the performance problems with that. And so you'll have the exact structure of your domain object in your logs, in your traces. That, for me, is the dream. And then every single time an

account is logged, it's in the same structure. Awesome. So I know that an account is always going to have an ID, it's always going to have whatever other attributes, a pending status, whatever it is. And therefore I can say: show me every trace where the account was pending.

Yeah, I love that idea. And it reminds me a little of the introduction of the new Rails tagged logger, which was kind of a start towards this idea: okay, capture all of these pieces with this tag. It's almost a pseudo-trace, I call it, but it does go along that formatting aspect of, okay, format all the things like this in a specific way.
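
For anyone who hasn't used it, Rails' tagged logging looks roughly like this; the tag values here are just illustrative:

    require 'logger'
    require 'active_support'
    require 'active_support/tagged_logging'

    # ActiveSupport::TaggedLogging prefixes every line in the block with
    # its tags, a bit like a poor man's trace context.
    logger = ActiveSupport::TaggedLogging.new(Logger.new($stdout))
    logger.tagged("account-123", "req-abc") do
      logger.info "Creating token"  # => [account-123] [req-abc] Creating token
    end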

And I agree that there's definitely a lot to unwind there. We'll have to have you on again when you put this together as a gem or something, because I would love to dig into that. Cool. Yeah, I mean, I love the idea of the domain objects and extracting those out into a formatted way that you can then trace and follow through, because

that design decision is definitely missed a lot. And seeing things like Packwerk as an example was a great step in the right direction, I thought, and I'd like to see more of that kind of thing evolve in the Rails ecosystem: abstracting the domains into their own kind of segments, and then being able to format them for traceability and things like that. I think you're onto the right track. You're onto a lot here. And then, I mean, the thing that I think is unbelievably ironic is,

all I'm talking about is convention over configuration. And is that not why we all got into Rails? I know Ruby is a different thing, but Rails is all about convention over configuration. And the entire area of observability, it strikes me, could do with a massive dollop of convention over configuration. And that's what OpenTelemetry are trying to do. The one last thing, I know the time is getting on, but the one last thing I want to just say on that is,

the other huge opportunity is adding context to errors. So we have these exception objects in Ruby, and people store strings in them. It's like, what? How am I supposed to understand anything from a string? And then people try and put IDs in the strings, and you're like, no. So at work I've made this extremely simple thing, basically a subclass of StandardError, where you can attach context. So when you create the error, you pass in structured context.
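
A minimal sketch of that kind of error class, with hypothetical names since the actual implementation wasn't shown, might be:

    # A StandardError subclass that carries structured context alongside
    # the message, instead of stuffing IDs into the string.
    class ContextualError < StandardError
      attr_reader :context

      def initialize(message = nil, context: {})
        @context = context
        super(message)
      end
    end

    # Raising it: attach the relevant domain objects as structured data.
    # (`account` and `user` are assumed to be in scope; to_formatted_h is
    # the duck type sketched above.)
    raise ContextualError.new(
      "Token creation failed",
      context: { account: account.to_formatted_h(:short), user_id: user.id }
    )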

So if our logs are structured, surely our errors should be structured as well. Makes sense, right? So you can say, this error happened, and here was the account associated with it when that error happened, and here's the user, and here's this, so it all gets attached into the error. And then you use Rails' new error handling, Rails.error.handle. If you've not used it before, look it up. It's absolutely awesome. It's one of my favorite things that they've added to Rails recently, well, relatively recently,

in the last few years. And you can basically have listeners to these events, to these errors, I should say. It will catch the errors, and then the context is encapsulated in the error. So you can pass these errors around, and then you can do interesting stuff with that context. And all I do is pull out all the context and send it straight into the logs. And that has absolutely changed the way I debug.
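
Wiring that together might look roughly like this. The subscriber shape follows Rails' error reporter API; pulling the context off the custom error class from the sketch above is an assumption about how such a setup could work, not John's exact code:

    # A subscriber that receives every reported error and logs its context.
    class ErrorLogSubscriber
      def report(error, handled:, severity:, context:, source: nil)
        payload = context.dup
        # Merge in the structured context carried by our custom error class.
        payload.merge!(error.context) if error.respond_to?(:context)
        Rails.logger.error("error.reported error=#{error.class} #{payload.to_json}")
      end
    end

    Rails.error.subscribe(ErrorLogSubscriber.new)

    # Anywhere in the app: matching errors are reported, then swallowed.
    Rails.error.handle(ContextualError) do
      create_token!(account)  # hypothetical operation that may raise
    end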

When there's an error and it has all that rich data, you just look in the rich data, and you're like, oh, that was the account, that was the Shopify ID, that was the product ID. I've got it. And then you just look up the ID in your external system: all right, okay, it's out of sync, whatever it is. It makes life so much easier. So that's something I'm really passionate about as well, having domain objects encapsulated within errors. So we've got structured errors,

not just structured logs. Yeah, I mean, that's definitely one thing that I look for when I'm installing dependencies, right: does the gem have its own base error class that can give metadata about whatever it's raising the errors about, more than just a string of some error that you then have to figure out.

Having that extra metadata, because you could just add attributes to a class, right, and say, this error has these attributes, it has meaning associated with the error. I think more people doing that is definitely going to make it easier to do, first of all, and then also get more people to take on that convention.

I completely agree with you there. Yeah, I mean, we're getting on in time here. Are there any last pieces you wanted to quickly highlight or mention before we move into picks? I think the main thing is, if you're listening to this and anything that I'm saying is resonating, forget about the domain object stuff. That's getting really into the nitty gritty. But coming back to the beginning, if you're frustrated by your debugging experience, if you're thinking,

why am I not smart enough to understand this? Chances are the problem is not with you. It's with the tools. So if you improve the tools, not only do you make your life easier and better, you level up everybody around you, because all the engineers can use the same tools. And that's what we've experienced at BiggerPockets. That observability mindset has really worked its way into our culture, so that now anybody is equipped to go into the logs and ask any question that they

want. So it is a long road, but it all starts with a single step. And if you are feeling that pain, feel free to reach out to me. I'll go through all my socials in a minute, but feel free to reach out, ask me any questions. I'm happy to jump on a Zoom call for half an hour and help you for free. But basically, it all starts by taking very small steps towards a very specific question. Don't try and "add observability", because you'll still be here next Christmas. So take

heed, there is hope. And if anything that I say resonates, please feel free to reach out to me, and I'll help you figure it out. That's awesome. Yeah, I also echo that sentiment that tooling is so important. And OpenTelemetry definitely is a great framework, and if we can improve that in the Ruby space, we will definitely be reaping the rewards as well. So let's move into picks. John, do you have anything that you want to share first, or do you want me to go? Am I limited to one

pick? Because I have many. Okay, go ahead. So the first one is a new language. And I already thoroughly trounced the idea that we should be learning one programming language a year, or rather, I just dismissed it without actually giving much justification. So I'm going to go back on what I just said and say that this language has changed the way I think pretty much forever. And it's changed the way I see Ruby and Rails and just programming in general. And the language is called

Unison. Now, it's a very, very strange, unusual language. It's maybe not that readable in places. And it's also extremely new. I mean, it's been going for five or six years, but what they're trying to do is incredibly ambitious. But look it up. It's an incredibly interesting language and it will expand your mind. That's certainly what it's done for me. It's a language that's targeted at creating programs that are just much, much simpler, but

actually more difficult to get your head around. It's a completely new paradigm for distributed computing, basically, and it's absolutely fascinating. So I would highly suggest checking that out. I know that Dave Thomas, when I spoke at Euruko recently, was on stage championing Unison. And he called it the future of programming, and I could not agree more. It's an incredible language made by some incredibly smart people. So that's number one.

Number two, there is a static site builder. I've used pretty much all the static site builders on planet Earth, and this is my favorite. It's called Eleventy. It's a really odd name. But I am embarking upon this project at work that really is exciting me, which is: how do you serve UI components from a dynamic app, so Rails, and meld them into a static site builder

without having a pile of JavaScript that you have to wade through. So I want to author my UI components in Rails, and I want to deliver them extremely fast through a static site that's just a blog, without having to run that blog on Rails. So Eleventy is my go-to tool for doing all that stuff. It also encompasses this thing called WebC, which is my new favorite templating language. Yes, I know, another templating language. I promise, I promise it's really good. It's not another

retread of all these other templating languages that are very, very niche. WebC is compatible with Web Components, and it's a fantastic way of making HTML-like components that are server-side rendered. And I would love to see a plug-in for that come to Rails, because it is

absolutely phenomenal. So those are my two favorite things at the moment. If anybody is wrestling with UI components in Rails and trying to extract them out of Rails, I would love to chat through that with anybody who's interested in that kind of area, because I think, yeah, there's a potential to really break new ground. How about you? Yeah, thanks. I'll definitely be digging into some of those. Yeah, I was in New York City the

other day for the Ruby AI happy hour that they've been doing every couple of months. This time they did demos, and I demoed this real-time podcast buddy that I've made. It's called Podcast Buddy. It just kind of listens in the background and, in real time, keeps track of the topics and the discussions, and suggests some example questions worth mentioning or maybe some topics to transition to. And it's a lot of fun. I just did it for fun. But I recently refactored it to use the Async framework,

and shout out to Samuel Williams, just phenomenal, like, so well put together. The documentation is coming along, it is lacking in some areas, but I was able to just completely refactor the code so that it works with Async and runs things as they come in: it's streaming the Whisper transcripts, it performs actions in the background, all in the same thread, all managed with Async. I love it. So check out Podcast Buddy and check out Async. You can't go wrong.
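
For a flavor of what that looks like, a minimal Async sketch might be something like this; the task method names are hypothetical, not from Podcast Buddy itself:

    require 'async'

    # Async runs fibers cooperatively on a single thread: each task yields
    # whenever it waits on IO, so the others keep making progress.
    Async do |task|
      transcriber = task.async { stream_transcripts } # hypothetical
      summarizer  = task.async { track_topics }       # hypothetical

      transcriber.wait
      summarizer.wait
    end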

And with async-websocket, you can handle even WebSockets asynchronously, just completely seamless, HTTP/2 and HTTP/1 compatible. Love it. So check those out. And John, if people want to reach out to you on the web, or just in general, how can they reach you? Thank you. Yeah. So I'm on LinkedIn. That's the platform I'm most active on. And my LinkedIn handle is synapticmishap, which is, yeah, I really regret that. Sorry, everybody.

But yeah, if you just search for John Gallagher, G-A-L-L-A-G-H-E-R, and maybe Rails or observability, you should be able to find me. I've got quite a cheesy photo, a black and white photo of me in a suit. It's a horrible photo. And I blog at joyfulprogramming.com. It's a Substack, so is it still a blog anymore? I have no idea. But that's where I write. I'm on Twitter at synapticmishap, and my GitHub handle is johngallagher, all one word. So yeah, Joyful Programming is the

main source of goodies for me. I've also got a fairly minimal YouTube channel called Joyful Programming. So feel free to reach out to me, send me a connection request, ask me any question. I would love to engage with folks about observability. Tell me your problems and I'll try and help you wherever I can. Awesome. I love it. Keep up the great work, and keep shouting from the mountaintop about observability, pulling those pillars down and just focusing on

the important stuff. I love it. So until next time, everybody, I'm Valentino. Thanks, John, for coming on, and we'll be back with more next time. Thanks for having me, Valentino. It's been amazing. Awesome.

This transcript was generated by Metacast using AI and may contain inaccuracies.