Crawling smarter, not harder

Aug 08, 2024 · 40 min · Season 1 · Ep. 79

Episode description

In this episode of SOTR, John Mueller, Lizzi Sassman, and Gary Illyes talk about misconceptions around crawl frequency and site quality, what's challenging about crawling the web nowadays, and how search engines could crawl more efficiently. 

Resources:
Episode transcript → https://goo.gle/sotr079-transcript
Gary's post on LinkedIn  → https://goo.gle/3YAT55q 
Crawling episode with Dave Smart → https://goo.gle/3WShUsf 
If-Modified-Since  → https://goo.gle/3ywXvja 
About the IETF → https://goo.gle/3SGVVlo 
Robots Exclusion Protocol → https://goo.gle/4dgmBSg 
Proposal for new kind of chunked transfer  → https://goo.gle/3AgMF1c 

Listen to more Search Off the Record → https://goo.gle/sotr-yt
Subscribe to Google Search Channel → https://goo.gle/SearchCentral

Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team.

#SOTRpodcast

Transcript

- John Hello, and welcome to another episode of Search Off the Record, a podcast coming to you from the Google Search team. My name is John, and today we have Lizzi and Gary. Say hi. - Lizzi Don't tell us what to do. - Gary Yeah. Hi. - John Thank you. Thank you. So nice to have you here. Last time we talked with Dave Smart, and apparently we also talked about crawling, but I was not here.

- Gary For the listeners, John is trying to figure out Lizzi's notes because Lizzi started reading this, or wanted to read this, and then John was like, "No, I'll do it." - Lizzi He would not let me do the intro so that we are left with this intro, which is very confusing. - John Okay, go for it, Lizzi.

- Lizzi Okay, so this is supposed to be a part two for people who are not following along, I guess. We had episode one with Dave Smart to talk about what is crawling, and we sort of did a background, I don't know, set-the-stage episode. Since then, Gary has posted too many times about crawling on LinkedIn, so we thought maybe we could talk about that. What?

- Gary What? What do you mean? A) Why was I not told that Dave was part one? 2) What does it mean I'm posting too much or too many things about crawling? What does two mean? - Lizzi 2222. T-W-O. - Gary Your English construction is weird. - Lizzi I heard that you posted about crawling, but I actually didn't-- - Gary You heard? - Lizzi Yes, I heard. You told me that you posted about crawling on LinkedIn and you got some surprising responses from people. Surprising in more senses than one.

- Gary Are you sure? - Lizzi I'm pretty sure it was you. - Gary Oh. - Lizzi I also heard that this year you were going to work on crawling. - Gary What? - Lizzi Is that a true statement? - Gary Yeah. - Lizzi At the beginning of the year, you thought maybe you would do something with crawling?

- Gary Well, yeah. I mean, we've already done some things, I think. But, in general, yes, I think we should do more on crawling in the sense that we should make it more... Well, we should crawl somehow less, which would mean that we crawl more.

- Lizzi I think you did post about that on LinkedIn, and then Barry cross-posted that "Google wants to crawl less." And then the internet broke because they were like, "What?" - Gary Ah, Barry from-- - Lizzi This is like Barry from Search Engine Roundtable. - Gary Yeah. Right. - Lizzi Yes. Barry Schwartz. - John Oh, cool. I mean, it's something I hear from SEOs a lot where they think, well, Google usually crawls more when he thinks my site is good. - Lizzi He? - John Google, the Googlebot.

- Gary They/them. - Lizzi Googlebot accepts all pronouns. - John Okay, then that was fine. I'm sorry. - Gary Are you a spokesperson for Googlebot? - Lizzi Yes. - John Okay, so. So, people thought that Googlebot usually crawls more when Googlebot thinks that something is good. The assumption is that you can turn it around as well and be like, "Well, I will push Googlebot to crawl more and then Googlebot will think my site is actually good." Which... - Gary No.

- Lizzi I mean, is that like a chicken and an egg thing though? - John What? - Lizzi Like, does your site have to be good first for Google to then crawl it more? Or does Google crawling more then mean your site is good? - John I don't know, Gary. What do you think? - Gary Why me? - John If I can make Googlebot crawl my site more because of my fancy robots.txt file, does that mean that my site will be better in Search? - Gary No. I mean, why would it?

- Lizzi I mean, it sounds like people are using this as a proxy. Like, if Google is interested in my site more often, then that means my stuff is good. - Gary It could also mean that there's an infinite space on the site. - John Oh, that's a cool hack. I'll put a calendar script on my site. - Gary No, no. No. Sit down, please. - Lizzi Has this always been a thing, that people think that more crawling equals good?

- Gary Yes, I think so. I mean, one of the presentations that we keep doing at Search Central Live events is actually about myth busting. And it has at least one or two questions about crawling. And then it's like, "Oh, Google is crawling my site a lot, so my site must be very good." And it's like, "Nah, not really." Like it can mean many things, but generally if the content of a site is of high quality and it's helpful and people like it in general, then Googlebot--well, Google--tends to crawl more from that site, but it can also mean that, I don't know, the site was hacked. And then there's a bunch of new URLs that Googlebot gets excited about, and then it goes out and then it's crawling like crazy. Or we discover John's calendar script, and then we try to crawl every single URL for every day until 2077. It can mean other things besides quality. But then, on the flip side, if we are not crawling much or we are gradually slowing down with crawling, that might be a sign of low-quality content or that we rethought the quality of the site.

- Lizzi But what if it's not changing, like the content? We go and crawl it, and they haven't made a change. Why would we need to go crawl that often again if they're not making a lot of changes? - Gary I mean, we have to go back and see if it changed, right? - Lizzi But if we notice that it's not changing, do we then slow it down? - Gary Well, we still have to go back. - Lizzi But would that result in, over time, less?

- Gary Probably, but I don't know. John has a site that he hasn't updated in like 72 years. I'm looking at the logs here. - John Sure. - Gary And he could say: - John It still gets crawled. Yeah. I think it's challenging with those kinds of sites because maybe it didn't get updated in the last couple of months, but maybe it gets updated in five minutes. - Lizzi Okay, so Google still wants to check, just in case.
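Lizzi's question, whether Google slows down for pages that never change, can be illustrated with a toy revisit scheduler. This is a hypothetical sketch, not how Google actually schedules recrawls: the name `next_revisit_interval`, the doubling and halving, and the one-hour and 30-day bounds are all assumptions made for illustration. The upper bound reflects John's point that an unchanged page still needs an occasional check, because it might change in five minutes.

```python
from datetime import timedelta

# Hypothetical bounds: never wait less than an hour or more than 30 days.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=30)


def next_revisit_interval(current: timedelta, changed: bool) -> timedelta:
    """Return how long to wait before fetching the page again.

    Assumed policy: halve the interval when the page changed since the
    last fetch, double it when it did not, and clamp to the bounds so an
    unchanged page is still checked periodically.
    """
    proposed = current / 2 if changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


# A page that never changes backs off, but only up to MAX_INTERVAL.
interval = timedelta(days=1)
for _ in range(10):
    interval = next_revisit_interval(interval, changed=False)
print(interval)  # 30 days, 0:00:00
```

Doubling and halving is only one possible choice; the point is that "less often" never becomes "never".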

- John That's my understanding at least. Yeah. I think with regards to the amount of crawling and the external perception, there's also the aspect of like a lot of sites have a lot of different pages, and then it's not so much that Google crawls one page very often. It's sometimes just like, well, if you have all of these pages and Google has never crawled them, then Google wouldn't be able to know what to do with it. So some of that perception of like, well, if only Google could crawl more, then it would see that I actually have some good content. I can kind of understand that.

- Lizzi Is it more about crawling more often? - John My assumption is that a lot of people just look at the Crawl Stats report in Search Console or server logs and just look at the number of requests over time. And then you don't necessarily see it's like, "Oh, it's looking at my home page every day," but more like, "It's looking at 500 pages every day." But which ones? - Lizzi Are they hoping to see that just increasing over time? What's the ideal state from a site owner's perspective?

- Gary I think so. - Lizzi Because that also seems like it might be bad. - Gary You know that form that we linked to on onesie, on developers.google.com/search, where you can report issues with Googlebot? - Lizzi Yeah.

- Gary Those reports end up in our inboxes. There we see sometimes that people are like, "Increase our crawl over time." And it doesn't work. We are not going to increase anyone's crawling if they write in through that form. Like, if there's some crawling emergency, then we would decrease the crawl volume for that site. But it's obvious that they want increased crawling over time. Some people do.

- Lizzi Okay, so you're saying that the form is there and you're supposed to use it only to report like too much, like your servers are being overloaded. - Gary Yeah, but, I mean, it's a form. - Lizzi But people are filling it out anyway and they're like, "Give me more!"

- Gary Yeah, but it's a form. We are quite explicit about what you should use that form for. But then it's a form. So it's like people are going to people anyway. We get other requests as well, which we cannot satisfy, but we still get them. - Lizzi How would that work? Or have we ever considered a method like that, where people can ask? - John Automatically? - Lizzi Yeah. - John We had the setting in Search Console, but that was about limiting, so reducing the amount of crawl.

- Lizzi Still about a limit. - Gary But it's always about limiting, because the upper limit has to be determined by what the server tells us about how much it can handle. - Lizzi What if it says, "I can handle everything?" - Gary Well, it would not be able to. At one point, we would crash the server and we wouldn't be able to connect to it. That would be a very clear signal that we have to slow down.

- Lizzi Okay, so is it more a case of site owners not understanding that dynamic, like what it means to request more, that the effect will then be that their servers crash?

- Gary I think the confusing part is that there are two parts to this. One is what the server can handle, and then there's the quality aspect to it. The content on the site has to be of high quality and useful for users or helpful for users. And then the Search demand for crawling would increase and then we would crawl more potentially. And then the technical part comes into play, like how much can we actually crawl without harming the server? - Lizzi Okay.
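Gary's two ingredients, search demand on the quality side and what the server can handle on the technical side, can be modeled as two separate limits, with the crawl rate being whichever is lower. The sketch below is a hypothetical illustration, not Googlebot's algorithm: the class name `HostCrawlRate`, the halving on errors or slow responses, and the two-second threshold are assumptions. It shows why asking for more crawling cannot push the rate past what the server's own responses support.

```python
from dataclasses import dataclass


@dataclass
class HostCrawlRate:
    """Toy model of Gary's two limits (hypothetical, not Googlebot).

    demand:   fetches per minute that search demand would like (quality side)
    capacity: fetches per minute the server seems to handle (technical side)
    """

    demand: float
    capacity: float
    min_capacity: float = 1.0

    def effective_rate(self) -> float:
        # Never crawl faster than either limit allows.
        return min(self.demand, self.capacity)

    def record_fetch(self, status: int, seconds: float) -> None:
        # Assumed rule: halve capacity on overload signals (5xx, 429, or a
        # response slower than two seconds); otherwise probe upward slowly.
        if status >= 500 or status == 429 or seconds > 2.0:
            self.capacity = max(self.min_capacity, self.capacity / 2)
        else:
            self.capacity += 1.0


rate = HostCrawlRate(demand=60, capacity=100)
print(rate.effective_rate())  # 60: demand is the binding limit
rate.record_fetch(status=503, seconds=0.2)
print(rate.effective_rate())  # 50.0: the struggling server is now the limit
```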

- Gary But it's not infinite. There has to be a limit because the server doesn't have infinite resources. - Lizzi Right. But this year you thought we can optimize there, that there's something that we can do?

- Gary I mean, we were thinking about this for a long time. There were always crawl optimizations going around. If you look at the early blog posts on onesie, on the blog, even in the early days, 2006, 2007, they--Vanessa Fox, former product manager for the old Webmaster Tools, and the team--were already thinking about how to optimize crawling more.

- Lizzi Is it usually the same sort of approach, like we want to be more efficient about what we're doing, or is it like a timing thing? Is there something new that we could be doing that we haven't thought of before?

- Gary It's a combination, I guess. Sitemaps, I don't know. John was involved with sitemaps early on, but sitemaps was one of those optimizations. And, on our side, I don't know, like 304 and If-Modified-Since, that was something that had to be implemented on our side, the support for it, I mean. - John Cool. And, with If-Modified-Since, is that something that you see people are doing correctly or is that something others should be doing differently?

- Gary Wait. If-Modified-Since, that's a request header, so it's us doing it correctly. - John Well, it could be that the site says, "Oh, yes, everything changed today." - Gary Oh, I see. - John It's like we asked, "Has it changed since yesterday?" and the site said, "Yes, yes." It's like, "You must take a look."

- Lizzi I see. Because it could be something that's automatically in place, like, "Yes, I updated a link," but then my CMS says, "Okay, today is the new date that I published content." And so therefore it gets interpreted that I made a change, therefore come look at it. - Gary The response to an If-Modified-Since would be a 304, right? - John I think a 304 is not modified. I don't know offhand. I would have to ask my friend Gemini. - Lizzi 304. Not modified. HTTP server response code.

- John Okay, so 304 would be like, "No, Google. Nothing has changed here." And a 200, I think would be the response then if it's like, "Okay, here is actually the new version."

- Gary Right. I think there are also caching directives that you can respond with. There's, I don't remember the name of the Apache server module, but there are other caching directives as well that you can respond with. I think, on our side, it's implemented. Externally, it doesn't seem to be used enough, I think. Basically, even if we send out the If-Modified-Since request header, servers are responding with just 200, basically just ignoring it. I don't think that's necessarily a good thing. But then, at least at Google, there are a few products that probably prefer that. Probably.

- Lizzi How so? - Gary Like, for example, news. I would imagine that they don't want, especially for live news, live blog stuff. - Lizzi Like really time sensitive things that are happening, like as a cricket match is happening or something.

- Gary We don't want to cache those, I guess. I don't know, but this is exactly what I want to analyze: how much 304 is used by external sites, how many If-Modified-Since headers are we sending out with our fetches, and then try to encourage people to use it more because it can save quite a bit of bandwidth and by definition, also resources for the servers. - Lizzi I see. - Gary Like, on our side, we don't particularly care about the resources for crawling.

- Lizzi How does it save resources? Is it because we can just do a little quick check and then we don't have to fully look at everything? - Gary Yeah, exactly. With a 304 response, if I remember correctly, the RFC, the standard, says that you don't put the HTTP response body in it. There should not be a response body. It's just the headers. So, basically, you send back, what, like 1,000 bytes instead of like 100,000 bytes or whatever it is.

- Lizzi It's a lot smaller back, and therefore not taking up as much space from our side. - John Yeah, and I guess the server doesn't need to compile the full page. - Gary Yeah. - John The server can just do the lookup in a database and you're like, "Oh, nothing new. Move along." without having to actually compile the whole thing. It makes it more efficient, I imagine, for both sides.

- Gary If you are thinking about our CMS that we are using for onesie, there are lots of moving parts on onesie. Like, for example, if you go to, I don't know, the blog home page, then you have the TOC on the left, or whatever we call it, the book on the left, you have the title, you have the metadata that we have in the HTML. We have the metadata from DevSite, the CMS that we use, and then you have the content. And then, for all of those, you have to make these weird calls to pull in and to compile. And all those calls cost resources. But then, if you can just make that one call that John mentioned, the one that just checks whether anything changed. Just one call. Just one call.
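The conditional request John and Gary describe can be sketched from the server side. This is a minimal sketch using only the Python standard library, assuming the page's last-change time is available from a cheap lookup (for example a timestamp in the CMS database) and that `render_page` stands in for the expensive step of pulling in navigation, metadata, and content; `conditional_get` is a hypothetical name, not a real framework API. When nothing changed since the crawler's date, it answers 304 with an empty body and never renders; otherwise it renders the full page with a 200.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable, Optional


def conditional_get(
    if_modified_since: Optional[str],
    last_modified: datetime,
    render_page: Callable[[], bytes],
) -> tuple[int, dict[str, str], bytes]:
    """Answer a GET request that may carry an If-Modified-Since header.

    last_modified must be a UTC-aware datetime from a cheap lookup;
    render_page is the expensive step that assembles the full page.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an invalid date means the header is ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                # 304 Not Modified: headers only, no body, no rendering.
                return 304, headers, b""
    return 200, headers, render_page()


updated = datetime(2024, 8, 1, 12, 0, tzinfo=timezone.utc)
status, _, body = conditional_get(
    "Thu, 01 Aug 2024 12:00:00 GMT", updated, lambda: b"<html>full page</html>"
)
print(status, body)  # 304 b''
```

The saving comes from ordering: the cheap timestamp comparison happens first, so the expensive calls only run when the page really changed.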

- Lizzi And it doesn't matter, like that's step number two, to figure out whether or not something actually did change. We're just checking anyway. It doesn't matter if the change is big or not. I assume, in the next step, it would be to see like, "Okay, well what changed?" - John Well that's, I think on the server side, the server basically just says, "Something changed. Here's everything." It's not like, "Here's a part of the page that has changed." - Gary Which would be interesting.

- Lizzi Is that a theoretical space that we could look at? Like, if we could say like, "Hey, actually it was just this one paragraph. That's where I made the change. You don't need to look at everything. Just this one thing changed." Would that be helpful if that could be compartmentalized somehow? - John From my point of view probably, but implementing it sounds like a nightmare. I don't know, maybe Gary wants to do it anyway. - Gary What?

- Lizzi I mean, is this something that you would be thinking about or is this like, nope, crazy? - Gary No, it's not. I mean, it's crazy, but it's the kind of crazy that we actually like. What? - John Good. Okay.

- Gary It's a challenging task that can save lots of resources for the internet. Not on our side because, again, I wouldn't say that we have infinite resources, but especially with crawling, it's a tiny, tiny, tiny fraction of our resource usage. I ran out of air. Crawling is a tiny fraction of our resource usage, but from an external perspective, where they have to render the pages and make all those calls to make one page, just sending back the part that actually changed, that sounds like a cool thing, especially since even in older HTTP versions, like I think starting from 1.1, there was chunked transfer. Basically, you could just say that, "From this segment to this segment, this is the part." And then you could just give that to the client from the server. But it was more complicated then. I think it was slightly broken. Every now and then, the chunks would get messed up. But then, someone pointed out on LinkedIn that someone at the IETF, the Internet Engineering Task Force, which is the standards body where the Robots Exclusion Protocol also lives, submitted a proposal for a new kind of chunked transfer. I'm watching that closely to see where it's going.
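For reference, this is roughly what HTTP/1.1 chunked transfer coding looks like on the wire. Each chunk is its length in hexadecimal, a CRLF, the bytes, and another CRLF, and a zero-length chunk ends the body. The function names `encode_chunked` and `decode_chunked` are illustrative, and the decoder is a simplified sketch that ignores chunk extensions and trailers. Note that standard chunking only splits a body into consecutive pieces for streaming. It carries no notion of which part of a page changed, which is why a new proposal would be needed for that.

```python
def encode_chunked(parts: list[bytes]) -> bytes:
    """Frame a body with HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, then the empty line ending the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked framing (no extensions or trailers)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)  # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


framed = encode_chunked([b"<header>", b"<main>new price</main>"])
print(framed)  # b'8\r\n<header>\r\n16\r\n<main>new price</main>\r\n0\r\n\r\n'
print(decode_chunked(framed))  # b'<header><main>new price</main>'
```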

- Lizzi How are they currently thinking about it? Is it like navigation, up here, and then the middle of the page is here? Or is it something more like, "This stuff changes really frequently." - Gary That's my naive thinking. I think it's more complex than that. I would need to check the current draft to tell you how it actually works. But my naive thinking was that. Like, "Here's the header. Here's the sidebar." I'm fairly certain it's not that simple.

- John I imagine that's tricky because you almost have to render the page to understand the DOM if you're saying like, "Oh, the header changed." whereas, from a technical point of view, if you can say, "Oh, bytes 500 to 700 are now this thing," then that's easier. - Lizzi But people don't reliably put it in that same spot because it's free.

- Gary It's more interesting, and more reliable most likely, because it's not up to the person. It's down to the server. And, of course, you can hack around with the server, like both John and I did stupid things with our servers to fool people. Okay. - Lizzi Interesting. - Gary Apparently John didn't. Okay. I take it back. - John Never. Never.

- Gary You can make the server do stupid things, but you need quite a bit of knowledge about, well, in my case, I was on Apache about server modules, like Apache modules, and especially C to be able to modify modules enough to make them do something stupid. - John I think it's also challenging because it mixes the content with the infrastructure. It's almost like different levels of interaction. But I think it would be cool if people could say, "Oh, actually, only this news item changed."

- Lizzi Yeah, or like, on a product page, my pricing, "This little area is the thing that is changing all the time, but the description of this pair of shoes is the same." - John Exactly. Yeah, I don't know, from a personal point of view, I think that would be cool. Yeah, and the chunked transfer, I think is pretty common. It's also done for videos, I think, or large files where you have to-- - Gary For large files, for sure. - John Yeah. - Gary Also I think POSTs, like POST methods.
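John's byte-range idea ("bytes 500 to 700 are now this thing") is something a server can compute without understanding the page's DOM. This is a hypothetical sketch, not the IETF proposal Gary mentions: `changed_span` finds the single span that differs between two versions of a page by trimming their common prefix and suffix. A real scheme would also need to handle several separate edits and a way for the client to say which version it already has.

```python
def changed_span(old: bytes, new: bytes) -> tuple[int, int, bytes]:
    """Return (start, old_end, replacement) for the single byte span that
    differs between two versions, so that
    new == old[:start] + replacement + old[old_end:].
    """
    limit = min(len(old), len(new))
    # Length of the common prefix.
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - start
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return start, len(old) - suffix, new[start:len(new) - suffix]


old = b"<nav>menu</nav><p>Price: 10</p>"
new = b"<nav>menu</nav><p>Price: 12</p>"
print(changed_span(old, new))  # (26, 27, b'2')
```

As Gary notes, working on bytes keeps this down to the server rather than to how a person structured the page.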

- John Yeah. I don't know that sounds pretty cool. What other kinds of optimizations do you see happening with regards to crawling? - Gary Maybe better URL parameter handling. - Lizzi What? - John Oh, okay. - Lizzi Like hashtags. - Gary Oh, hashtags. Hashtags are complicated, and we have a very complicated relationship with them, I think. - John Do you mean hashtags or like, what is it, anchors, like the pound symbol? - Lizzi Oh, sorry. The pound symbol. The hash symbol?

- Gary Yeah. I just assumed that you meant that. - Lizzi Sorry, I did mean that. - Gary The problem with them is that they only live on the client side. - Lizzi Okay. And why is it a problem? - John Oh, this is because you hate JavaScript, right? - Gary What? I mean, yeah, but what? - John They're used for JavaScript sites, right? - Lizzi For the whole client side / server side, why is it a problem that it's on the client side? It's harder for us to get there? - Gary Pretty much.

- Lizzi Okay. It's further away from us. - Gary Well, technically Googlebot cannot get there. - John Without rendering. - Gary Without rendering. - Lizzi I see. Okay. - John And the URL parameters that you mentioned, that would be something like the URL Parameter handling tool that we used to have more in a protocol format where you say, "This parameter is optional"? - Gary Oh, that's a good idea. - Lizzi Can you give me like a real example? Like, what do we mean by URL Parameter handler?

- Gary Like hl=en and whatever parameters that we have on onesie and on support.google.com. - Lizzi Okay, but what would make it hard, I guess, the fact that we're using those?

- Gary Because, technically, you can add an almost infinite--well, de facto infinite--number of parameters to any URL, and the server will just ignore those that don't alter the response. Basically, it will just discard them. But that also means, that for every single URL that's on the internet, you have an infinite number of versions. - Lizzi Because all of this stuff is tacked on? - Gary Because you can just add URL parameters to it. - Lizzi Okay.

- Gary The server is instructed to ignore them. It will not alter the content that it returns. But it also means that when you are crawling, and crawling in the proper sense like "following links," and I'm air quoting here, then everything-- Why are you laughing? Like, we are not following links properly. It's just like we are collecting links and then we are going back. - Lizzi Well, you imply that there's an improper use of crawling or an improper way to crawl.

- Gary Well, yeah, it's my pet peeve. On onesie, we keep saying Googlebot is following links, like, no, it's not following links. It's collecting links, and then it goes back to those links. It's not like properly following links. The picture that we are painting is that Googlebot is like hopping from-- - Lizzi Is it because it's going into the anthropomorphic territory where Googlebot thinks, Googlebot sees, Googlebot-- - Gary Understands.

- Lizzi Understands, follows, walking around on all eight legs. - Gary Wait! - Lizzi Six legs. How many legs? - John Eight? Don't judge. - Lizzi What do you mean? There's got to be a correct answer for this for spiders. - Gary Nine.

- Lizzi No, spiders have an even number of legs. URL parameters, why is this a problem in terms of crawling efficiently? It sounds like it's because we're maybe wasting time looking at parameter versions of the links when it could be the same thing, but sometimes it is different. - Gary Exactly. Sometimes it is different, and that's the problem. - Lizzi We don't know based off of the URL.

- Gary We basically have to crawl first to know that something is different, and we have to have a large sample of URLs to make the decision that, "Oh, these parameters are useless." - Lizzi Okay, and there's no way for site owners to tell us how they're grouped now? - Gary Do you know how we like to remove features from Search Console? - Lizzi Yes, I remember that we took it away because it was not used, I think. - Gary I mean, it was not used.
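The URL-parameter problem comes down to many URLs mapping to the same content. If the set of parameters that never change the response were known, variants could be collapsed before fetching. This is a minimal sketch with Python's `urllib.parse`. The `IGNORED_PARAMS` set is hypothetical (Gary's point is precisely that a crawler usually has to learn it from a large sample of fetched URLs), and sorting the remaining parameters assumes their order doesn't matter on this site. The #fragment is dropped too, since it stays on the client side and is never sent to the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed never to change the returned content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_key(url: str) -> str:
    """Collapse URL variants that differ only by ignored parameters."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Empty fragment: "#..." is client-side only and never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_key("https://example.com/shoes?color=red&utm_source=news#reviews"))
# https://example.com/shoes?color=red
```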

- Lizzi Yes. And now it seems like there's a need to be able to control this. But they weren't using the tool, so maybe there needs to be some other kind of solution that would be-- - Gary Right. But like, if someone is complaining that we are over crawling them because they have one of these weird URL spaces with an infinite number of URL parameters, then we could just tell them that, "Okay, use this method to block that URL space." - Lizzi What kind of method?

- Gary Even robots.txt could be used. It doesn't have to be like-- - Lizzi Oh, like, "Anything that is after this symbol, don't look at it"? - Gary Or this combination or something like that. - Lizzi Interesting. - Gary Because, with robots.txt, it's surprisingly flexible what you can do with it. - Lizzi And that's something that we could do now? - Gary Yeah, we just have to figure out what to say. - Lizzi Oh, interesting. - Gary And I don't have brains to think about it.
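Gary's point that robots.txt can block a parameter-driven URL space relies on the two special characters in the Robots Exclusion Protocol (RFC 9309): `*` matches any sequence of characters, and a trailing `$` anchors the end of the path. The rules in the comment are a hypothetical example (including a block for John's calendar script), and `is_allowed` is a simplified matcher for illustration only: it skips user-agent grouping and percent-encoding normalization. It follows RFC 9309 precedence: the longest matching rule wins, and on a tie, Allow wins.

```python
import re

# Hypothetical robots.txt group:
#   User-agent: *
#   Disallow: /*?*sessionid=
#   Disallow: /calendar/
#   Allow: /calendar/$
RULES = [
    ("disallow", "/*?*sessionid="),
    ("disallow", "/calendar/"),
    ("allow", "/calendar/$"),
]


def _to_regex(pattern: str) -> re.Pattern:
    # '*' matches any sequence of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path_and_query: str, rules: list[tuple[str, str]] = RULES) -> bool:
    """Apply the longest matching rule; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if _to_regex(pattern).match(path_and_query):  # match from the start
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


print(is_allowed("/shoes?color=red&sessionid=42"))  # False
print(is_allowed("/calendar/2077-01-01"))  # False
```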

- Lizzi Okay. - John Oh, so the solution to crawling is more documentation. - Lizzi Oh. - Gary Job security. - Lizzi Darn. Wait wait, wait. We haven't asked John enough questions about what his ideas are. - Gary Yeah. John, what are your ideas? - Lizzi You keep asking Gary. - Gary Tell us your ideas. - Lizzi Have you had any harebrained ideas? - John Hair-brained ideas? - Lizzi It's top of mind for me. - John Top of mind? - Lizzi I'm so sorry. Oh my God. What's top of mind for you?

- John I think it's challenging because I like sitemaps, for example, and apparently people also like sitemaps and they submit them in lots of really weird and broken ways. So that makes me a little bit jaded, almost, in the sense that it's like, "We will come up with a new method to make crawling more optimal for you." And then everyone's like, "Well, I will just use it incorrectly." - Gary Yeah.

- John So that's kind of the challenge. On the other hand, I also would like to make it so that Google or other search engines don't have to guess how to crawl optimally. - Lizzi Like it should be more clear and easy for other search engines to follow. Why do we need to go reinvent the wheel?

- John Maybe. Maybe, I don't know. But I think also just the awareness of everything around crawling, I think that makes a big difference. I noticed that, for example, when I launched my first crawler back in the year 1822, it ran on this obscure operating system called Windows. When I initially launched that, I noticed that almost every site that you put in there to try to crawl, it goes crazy, like finds all of this crazy stuff. And it essentially shows how complicated the web is, like all of these weird links, and they go in all different places and some of them are broken, some of them are infinitely long. I think just generally the awareness of how crawling works has gotten a lot better over that time. People use common content management systems, like WordPress, now, which make crawling a lot easier. And maybe some of that awareness just has to go a little bit further to make it so that more people understand potential pitfalls and then think about like, "Oh, this parameter that I want to add for tracking, maybe I shouldn't or maybe I should do it in a different way so that it doesn't affect crawling."

- Lizzi Like, what could be the consequence of my actions of implementing this thing? It could cause a domino effect somewhere else. - John Yeah, I think for smaller sites, you can do a lot of things wrong and you have a thousand URLs instead of ten, that doesn't change anything. But, if you're a giant e-commerce site and suddenly you have 100 billion URLs instead of one million, then that's kind of a big difference. So some amount of awareness from both sides I think is important.
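John's numbers show how quickly optional parameters multiply a site's URL count: each independent optional parameter that can be absent or take one of k values multiplies the number of crawlable URLs by k + 1. This is a hypothetical back-of-the-envelope sketch; `url_variants` is an illustrative name, and the parameter counts are chosen only to reproduce John's figures.

```python
from math import prod


def url_variants(base_pages: int, optional_param_values: list[int]) -> int:
    """Count crawlable URLs when each optional parameter is either absent or
    set to one of k values (assumes parameters are independent and always
    appear in a fixed order)."""
    return base_pages * prod(k + 1 for k in optional_param_values)


# Small site: one parameter with 99 values turns 10 pages into 1,000 URLs.
print(url_variants(10, [99]))  # 1000
# Large shop: five parameters with 9 values each turn 1 million into 100 billion.
print(url_variants(1_000_000, [9] * 5))  # 100000000000
```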

- Gary Also the thing about, "Okay, but I have enough resources, so just go ahead and crawl them anyway." But then it's like we could spend that time on URLs that will actually help your site, because, sure, I don't like when people think about crawl budget, but we are still spending time on crawling. - Lizzi And you could apply it in a productive way. It's not just an exponential, just everything, firehose, and you will catch also the garbage stuff that doesn't matter. It's not helping anyone.

- Gary Yeah. - Lizzi If you had to say one thing that you wish people wouldn't do or your pet peeve, what would it be? John, you can go first.

- John My pet peeve is, at the moment, and I guess at the moment means I recently received some messages from folks about this, is people who don't look at the server stats in Search Console, the Crawl Stats in Search Console, because there's a lot of information in there if you just look at it. For example, response time is in there, average response time. - Lizzi Are they just coming to your inbox and saying, "John, what is my average response time?" - John No.

- Lizzi And you're like, "Hello, you can just go look it up," or what kind of question? - Gary Oh no, he actually answers like, "792 milliseconds."

- John No. Well, the problem, for me, is when it's not milliseconds anymore, they're like, "Oh, why are you not crawling my site enough?" And then I look at the stats and it's like, "Oh, it takes, on average, three seconds to get a page from your server. That's actually a very long time." We don't really tell people like what they should be aiming for there.

- Lizzi I see. Is it an on and off thing? Like, it's either working or it's not. And, if it takes two seconds versus ten seconds, we're not showing it as broken.

- John Well, I mean several seconds is actually fairly long. If you want us to crawl a million URLs from your website and, instead of 100 milliseconds, it takes like ten times as much or 20 times as much, that's a big difference. And that's something where, if you looked at those stats, then you could go to whoever's running your server and be like, "Look at these numbers. These numbers are objectively bad. You can improve them." And then they have something that they can work on, which is very different from a lot of other SEO things where it's like, "Oh, my relevance is not great." And then someone else on the server side is like, "Well, okay, I can't change that."

- Lizzi This is more like a clear, like it's a black-and-white sort of number that you can take back and say like, "Things are bad, please fix it." - John Exactly. And you can multiply the number of pages on your site by the response time. You're like, "This is a lot of time that is being wasted." - Lizzi Okay, so open the Crawl Stats report. - John So look at Search Console. Yeah. - Lizzi And Gary. - John What do you think, Gary? - Gary What?
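John's suggestion to multiply the number of pages by the average response time is simple arithmetic. This sketch assumes fetches happen one after another; real crawlers fetch in parallel, but every slow response still ties up server capacity, so the ratio between the two cases is what matters. The one-million-URL site and the 100 ms and three-second figures come from John's examples; `total_fetch_hours` is an illustrative name.

```python
def total_fetch_hours(num_urls: int, avg_response_ms: float) -> float:
    """Server time spent on one pass over the site, fetching sequentially."""
    return num_urls * avg_response_ms / 1000 / 3600


print(f"{total_fetch_hours(1_000_000, 100):.1f} h")   # 27.8 h
print(f"{total_fetch_hours(1_000_000, 3000):.1f} h")  # 833.3 h
```

At three seconds per page, the same million URLs take 30 times as long as at 100 ms, which is the kind of objectively bad number John suggests taking to whoever runs the server.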

- John You mentioned your pet peeve was people anthropomorphizing. That's your pet peeve that I do maybe. - Gary Yes. - Lizzi But, for the rest of the people, or in general, a pet peeve that you have about crawling that you wish that people either knew or a misconception that you see, like, "What the heck? If people would just do this, or stop doing this." - Gary I don't know if I have a pet peeve really. - Lizzi Or a hill you will die on.

- Gary I kind of want hosting companies to help their customers more when things go wrong. Because I wouldn't say very often, but every now and then, we see sites complaining to us that Googlebot is not crawling them. And then we look at what's happening and it's their DNS server is blocking us, or their server is blocking us, or their network is blocking us, and then we are like, "We have no idea where it's blocking, but it's blocking and it's on your side." And they are like, "No, because the hosting company was like, 'It must be you,' but it cannot be you. We see that we cannot connect to your server. Why would we not want to connect to your server or your DNS or whatever?" And it's like, "No, but the hosting company was like, 'It's on your side.' " I understand that because of how hosting companies are set up nowadays, that they are behind the CDN, that also eats up some of the trace information, or they are on elastic clusters that grow and shrink and, again, some of the traces are lost. But, still, if we could just spend more time on telling people, we, as those who worked on networking or whatever, or server management, how connections are made and then help people understand and also debug their problems, that would be fantastic. Because, if you know how a connection is made between a client and a server, then saying that it's on the client side, the problem, when a client cannot connect to a server, that's like a stretch.
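Gary's point about knowing how a connection is made can be turned into a quick diagnostic: resolve the hostname through DNS first, then open a TCP connection, and report which step fails. This is a minimal standard-library sketch; `diagnose` is a hypothetical helper, it stops before TLS and HTTP, and running it from your own machine doesn't prove that Googlebot can connect, since a firewall, CDN, or hosting network may treat Googlebot's IP addresses differently.

```python
import socket


def diagnose(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Report whether the DNS step or the TCP connect step fails."""
    # Step 1: DNS resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        return f"dns: cannot resolve {host!r} ({err})"
    # Step 2: TCP connection to each resolved address in turn.
    last_error = None
    for family, socktype, proto, _, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
            return f"ok: connected to {addr[0]} port {addr[1]}"
        except OSError as err:  # refused, unreachable, or timed out
            last_error = err  # try the next address, e.g. IPv4 after IPv6
    return f"connect: resolved {host!r} but no address accepted ({last_error})"
```

For example, `diagnose("www.example.com")` returns a string starting with `dns:`, `connect:`, or `ok:`, which tells you whether to look at DNS, at the network path, or further up the stack.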

- John So you're saying more Search Console? - Gary What's a Search Console? - John More features in Search Console. - Lizzi I was hearing education ideas. - John Tell you when you're doing something wrong, or tell the site owner or the hoster. - Gary We should send more messages, but we should send all the messages on a single day. - John On a single day. - Gary Yeah, pile them up, and then on, I don't know, first day of the month, just send out all the messages.

- John I have a better idea. We post the messages on social media and then anyone can fix any site's problem. - Gary I know. And then we tag. - John We tag people. - Gary People. Yeah. "Hey, this is your site." - John This is your site, and we tag all the hosting companies. - Lizzi Oh, to come fix it, like hello, we can @ them directly, like the companies. No, that's too much.

- John I mean, sometimes the crawling problem is also on our side. We kind of have to accept that they will do the same thing. - Lizzi Maybe it's a last resort. We were not able to contact you via this message, so we are now broadcasting. - Gary Oh, we did that before. - John We've done that before. We've also sent faxes before. - Lizzi Really? - John Faxes? Yes. - Lizzi Is this like a setting? This would be great actually. - John A great setting in Search Console?

- Lizzi In Search Console, so instead of email notification, what method would you like to be notified, a fax option. - John A fax. A fax number. - Lizzi Yes. Handwritten from John. - John Handwritten from John. - Gary Wait. We want people to be able to read that. - Lizzi You have bad handwriting. I don't think I've ever seen your handwriting. I can't confirm. - John See.

- Lizzi I've never seen you write. Maybe it's only speech to text. All right. I think we are way over time, potentially. My timekeeper didn't gesture anything, so I'm not sure. - John We gestured a little bit. - Lizzi A little bit, and I missed it because I can't see. - John That's fine. - Lizzi Okay. - John It was fun. It was a good discussion. - Gary Oh, it was? - John Yeah. - Gary Oh. - Lizzi Well, it was supposed to be painful. This was supposed to be-- - Gary Well, it was painful to me.

- Lizzi Good. Okay, well, that's it for this episode. Next time on Search Off the Record, we'll be talking with Mihai, another product expert, about working with the Search Console API. Thank you folks for listening, and goodbye. - John Goodbye. - Gary Buh-bye.

Transcript source: Provided by creator in RSS feed.