¶ Breaking News
Hello, it's Ed here. Welcome to this breaking news edition of SEO Is Not That Hard. This is an out-of-sequence podcast today because there's been some breaking news, and that is that a whole load of internal Google Search API documentation has been leaked onto the internet. Now, this is rather interesting because Google has always been a black box.
We've never been able to see anything on the inside of it. We can only ever work things out from the information that Google let out themselves, and then from experience of our own SEO efforts as to what does and doesn't work. And, you know, we've always had to trust what Google tell us, and plenty of times there's been some misinformation.
We all know that. Now, this document leak was put out on Twitter today by Rand Fishkin, and he also published a write-up of what he found in it, as did a guy called Mike King from iPullRank. Rand had shared these documents with Mike over the weekend, and Mike has also put out a write-up and also links to the actual API documents themselves.
Now, this is really early days. I've only spent maybe a couple of hours looking at this myself, so there's still loads more to learn from this, but I wanted to break the news that these documents are now available. I will put links in the show notes to Rand's write-up, to Mike's write-up, and also to where you can actually find the docs.
There are some headline takeaways, though, which have come out of it and which are quite interesting. The first one is click data.
There are lots of references to how Google uses data on click interactions from the search results, for example whether people are pogo-sticking, that is, jumping from the search results to a web page and then back to the search results, and how that can have an effect on rankings.
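To make the pogo-sticking idea concrete, here is a minimal sketch in Python. It is not how Google measures it: the `Visit` record, the `QUICK_RETURN_SECONDS` cut-off and the `pogo_rate` helper are all hypothetical names invented for illustration, and the idea of a time threshold is an assumption that does not come from the leaked docs.

```python
from dataclasses import dataclass

# Hypothetical cut-off: a return to the results within this many seconds
# counts as a "pogo stick". The leaked docs give no such number.
QUICK_RETURN_SECONDS = 10.0


@dataclass
class Visit:
    """One click from a search results page to a web page (illustrative only)."""
    result_url: str
    seconds_on_page: float
    returned_to_results: bool


def is_pogo_stick(visit: Visit, max_seconds: float = QUICK_RETURN_SECONDS) -> bool:
    """True if the searcher bounced back to the results page quickly."""
    return visit.returned_to_results and visit.seconds_on_page < max_seconds


def pogo_rate(visits: list[Visit]) -> float:
    """Share of visits that were pogo sticks; 0.0 for an empty list."""
    if not visits:
        return 0.0
    return sum(is_pogo_stick(v) for v in visits) / len(visits)


sample_visits = [
    Visit("https://example.com/a", 4.0, True),     # quick bounce back
    Visit("https://example.com/a", 95.0, True),    # came back, but after reading
    Visit("https://example.com/a", 3.0, True),     # quick bounce back
    Visit("https://example.com/a", 120.0, False),  # never returned to results
]
sample_rate = pogo_rate(sample_visits)
```

A page with a high `pogo_rate` in this toy model is one that searchers keep abandoning quickly, which is the kind of behaviour the click data is described as picking up.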
There's lots of data about Chrome clickstreams, and that's how Google takes actual, real-time user data from how you use Chrome. Now, billions of people worldwide use Chrome to surf the web, and it covers what they're actually clicking on, how long they're spending on pages, and things like that. There's lots in there about that. That's very interesting.
Then there's how Google is using click data for link weighting. Now, this is a really, really interesting one. Let me just check here.
Yeah, so apparently they classify their link indexes into three buckets, three tiers, low, medium and high quality, and they use click data to work out which tier of the link graph a link should be placed within.
So, for example, if a link is getting no clicks in the clickstream data it's picking up from places like Chrome, then Google's going to consider that a low-quality link and ignore it.
But if there's a high volume of clicks from, you know, verifiable devices, so, like I said, let's call it Chrome data, then that would go into the high-quality index and it's going to pass ranking signals like PageRank.
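As a rough illustration of that three-tier idea, here is a small Python sketch. The tier names come from the description above, but everything else is an assumption: the `Link` record, the `HIGH_CLICK_THRESHOLD` value and the function names are hypothetical, and the leaked docs do not say where the boundaries between tiers sit or how the medium tier is treated.

```python
from dataclasses import dataclass

# Hypothetical threshold for "a high volume of clicks". No real value is known.
HIGH_CLICK_THRESHOLD = 1000


@dataclass
class Link:
    """A link plus the clicks seen for it in clickstream data (illustrative)."""
    source_url: str
    target_url: str
    clicks: int


def link_tier(link: Link, high_threshold: int = HIGH_CLICK_THRESHOLD) -> str:
    """Place a link into the low, medium or high quality tier by click volume."""
    if link.clicks <= 0:
        return "low"      # no clicks at all: treated as low quality
    if link.clicks >= high_threshold:
        return "high"     # lots of clicks: treated as high quality
    return "medium"       # anything in between (assumed)


def is_ignored(link: Link) -> bool:
    """Low-tier links are described as being ignored for ranking purposes."""
    return link_tier(link) == "low"


unseen = Link("https://example.com/hidden-page", "https://example.org/", clicks=0)
popular = Link("https://example.com/popular-page", "https://example.org/", clicks=25_000)
```

In this toy model `unseen` lands in the low tier and is ignored, while `popular` lands in the high tier, which matches the two cases described above.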
So there's lots to think about there in terms of the kind of links that people build, especially if you're buying links. Obviously I've made my thoughts clear in previous episodes on whether that's a good idea or not, but if you are doing things like that, links from sites and pages that are never going to get seen or clicked by anybody are clearly going to get ignored. So that's some really interesting stuff there. Some other interesting attributes that I came across: first, a site authority attribute. Google for years have said there's no such thing as a site-wide weighting, so domain rating, that kind of thing.
There is mention of a site authority attribute within the documentation, which can be applied to web pages and goes site-wide. Next, we've got an attribute called host age, and that is used, as the document specifically says, to sandbox fresh spam in serving time.
So this is an attribute where they obviously look at how old a website is, the age of the host, and then they can apply this attribute to it. Essentially, it's the sandbox. I know it's one of those topics that people have argued for and against a lot over time, but clearly there is some kind of sandboxing in there.
Now, whether it applies, for what length of time it applies, how they decide if they're going to apply it, we don't know. I mean, there might be ways out of that. For example, if a site gets a good number of links from authoritative sources when it's brand new, it might bypass this sandbox filter.
But there definitely is something in there called host age. Now, there are two that I found particularly interesting, as I've got a big interest in topical authority.
I mean, building topical authority is what I've tried to do for years with websites, I think reasonably successfully, and there are two attributes in there. The first one is called site focus score, and this is how much a site is focused on one topic.
So obviously Google is looking at sites and trying to work out: is this a generalist site? Is it on one topic? What range of topics is it on? So they've got that site focus score, and to go with that they've also got a site radius attribute, which is a measure of how far away from the main topic of a site a page strays.
So, for example, you put a new page up, and if it's not on topic with the current site, then the radius score is going to be high because you've gone further away from the core topic of the site. So quite an interesting one here.
There's a lot more research to do on this one, because it might show some interesting ways we can work out how tightly a site's pages sit around its core topic, its topical authority area.
And when trying to decide which new pages to add to a site, it might be interesting to try and work out with embeddings how close to or how far from the center of that topic area pages are, and it might be helpful for identifying which core pages on a site need to be worked on. So there's something really interesting to think about there.
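Here is one way that embeddings idea could be sketched in Python. To be clear, this is a guess at an approach, not Google's calculation: the leaked docs name the attributes but not the maths. The sketch assumes each page already has an embedding vector (the tiny three-number vectors below are made up by hand), takes the average of all page vectors as the site's topic center, and uses cosine distance from that center as a stand-in for a page's radius. The function names and URLs are hypothetical.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average of the page vectors, used here as the site's topic center."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity: 0 means same direction, larger means further apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def page_radii(pages: dict[str, list[float]]) -> dict[str, float]:
    """Distance of every page from the site's topic center (assumed 'radius')."""
    center = centroid(list(pages.values()))
    return {url: cosine_distance(vector, center) for url, vector in pages.items()}


# Hand-made toy embeddings: three SEO pages and one off-topic page.
pages = {
    "/seo-basics": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.0],
    "/keyword-research": [0.85, 0.15, 0.0],
    "/banana-bread-recipe": [0.0, 0.1, 0.9],
}

radii = page_radii(pages)
furthest_page = max(radii, key=radii.get)  # the page that strays most from the core topic
```

With real embeddings from whatever model you prefer, sorting pages by this distance would give a quick list of which pages sit closest to the core topic and which ones stray from it.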
This leak covers two and a half thousand modules with over 14,000 attributes across these modules. There's a huge amount of data here. The caveats are, obviously, that we don't know which, if any, of these modules and attributes might be deprecated, as in no longer being used by Google. Date-wise, it looks like these docs are somewhere between a month or so old and up to a year old, so there is maybe a time gap in there where some things might be out of date.
In terms of its authenticity, Rand specifically mentions in his post how he's approached various ex-Googlers to see if they think it looks genuine, and the kind of feedback he's getting is that yes, it does look genuine. I mean, if it is a hoax, someone spent an awful lot of time creating this, even if it was done with AI.
I think if someone had used AI to try and create this, the two guys who have looked at it so far, Rand and Mike, probably would have picked up on the AI and the inconsistencies there. So I think it looks genuine, and ex-Googlers think it looks genuine.
But, yeah, obviously the caveats are that we don't know what isn't there, and we don't know what's deprecated. Again, with all of these things, there is no further detail on the relative importance of different things.
¶ Opinions on Authenticity of Post
So, while it definitely clears up some questions that people have had for many years over what is or isn't, has or hasn't been included, and it definitely gives lots of information around that, I suspect there's going to be a lot of research and analysis done by lots of people on this. There's going to be loads more coming out.
So there's going to be lots to watch here, but I just wanted to get it out quickly and get it shared so you can take a look yourself. I'd love to hear everyone's thoughts and any interesting things you find. Just get in touch in the usual ways and, yeah, I hope it's useful, and good luck reading into it.
