Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov

Jan 29, 2026 | 9 min | Ep. 279

Episode description

James Dooley is joined by Sergey Lucktinov to explain how large language models retrieve information during AI searches. They break down the full retrieval pipeline, from metadata-only filtering to light skimming and deep page parsing. The discussion clarifies when LLMs rely on meta titles and descriptions, when pages are never opened, how schema markup is interpreted, and how knowledge vault answers bypass search entirely. This episode gives SEOs and marketers a clear framework for optimising content to survive each LLM retrieval stage.

Transcript

James Dooley: Hi. Today I’m joined with Sergey, and we’re covering a really interesting topic. When people search using AI tools like ChatGPT, Gemini, Claude, or Perplexity, there’s a big debate in the SEO community. Do large language models only use metadata like the meta title and meta description from the search engine results page, or do they actually open pages and parse the content?

Sergey Lucktinov: It actually does both. There are different stages of retrieval. The first stage is when query fan-out happens and the LLM fetches results from a search engine, usually Google or Bing. At this stage, it only sees metadata. That includes the website name, page URL, meta title, and meta description. If those elements are irrelevant to the fan-out query it is trying to answer, the result gets removed immediately. So at the first stage, meta titles and meta descriptions are extremely important. This is why leaving meta descriptions blank, which used to work years ago, is no longer effective. You should always include a proper meta description that covers the likely fan-out intent you want to be relevant for.

James Dooley: So once it moves past metadata and decides it wants more information, does it just pull a central snippet from the page, or does it read the full content?

Sergey Lucktinov: After the metadata stage, it moves into light skimming. This is where it checks the DOM structure, page stability, and whether the page is clean and properly structured. If the page passes that stage, then it moves into full parsing. At that point, every section of the page is analysed, chunk by chunk, to see whether it makes sense and matches the intent. The second stage is critical. If you do not lead with meaning, you fail early. If you want to be relevant for a specific fan-out query, the answer needs to appear in roughly the first 50 to 70 words. If the LLM does not see relevance there, the page gets discarded before full analysis happens.
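
Editor's sketch: the three stages Sergey describes (metadata filtering, light skimming, full parsing) can be pictured as a chain of filters. The Python below is a toy illustration only, not how any real LLM retrieval system is implemented. The `Page` class, the function names, the keyword-overlap score, and the 0.5 threshold are all assumptions made for the example; the DOM-structure and page-stability checks mentioned in the interview are left out, and only the "answer in roughly the first 50 to 70 words" check is modelled.

```python
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> float:
    """Fraction of query terms that also appear in the text."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


@dataclass
class Page:  # hypothetical container, not a real crawler type
    url: str
    title: str
    description: str
    body: str


def passes_metadata(query: str, page: Page, threshold: float = 0.5) -> bool:
    """Stage 1: only the URL, meta title, and meta description are visible."""
    meta = f"{page.url} {page.title} {page.description}"
    return overlap(query, meta) >= threshold


def passes_skim(query: str, page: Page, lead_words: int = 70,
                threshold: float = 0.5) -> bool:
    """Stage 2 (simplified): is the query covered in the opening words?"""
    lead = " ".join(page.body.split()[:lead_words])
    return overlap(query, lead) >= threshold


def relevant_chunks(query: str, page: Page, threshold: float = 0.5) -> list[str]:
    """Stage 3: score every blank-line-separated chunk of the page."""
    chunks = [c.strip() for c in page.body.split("\n\n") if c.strip()]
    return [c for c in chunks if overlap(query, c) >= threshold]


def retrieve(query: str, pages: list[Page]) -> dict[str, list[str]]:
    """Run the three stages in order; a page is dropped at the first failure."""
    results: dict[str, list[str]] = {}
    for page in pages:
        if not passes_metadata(query, page):
            continue  # never opened
        if not passes_skim(query, page):
            continue  # opened, but discarded before full parsing
        chunks = relevant_chunks(query, page)
        if chunks:
            results[page.url] = chunks
    return results
```

In this sketch a page with a blank or off-topic meta description is dropped at stage 1 even if its body holds the answer, and a page that buries the answer below a long introduction is dropped at stage 2, which mirrors the two failure modes described above.
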
James Dooley: And does it read schema markup as part of this process?

Sergey Lucktinov: Yes, absolutely. Schema markup is important, but it is not a silver bullet. It works more like a handshake. You are making the LLM’s job easier by clearly explaining the context and content of the page. However, if schema is implemented incorrectly or mismatched with the actual content, it can hurt you. If someone does not understand schema properly, it is often better not to use it at all. When done correctly, schema reduces computational effort for the model and helps it understand your content faster and more cheaply.

James Dooley: What about meta keywords and meta descriptions? Many SEOs stopped using both. Are either of them used by LLMs today?

Sergey Lucktinov: Meta keywords are completely irrelevant. They are a relic from the past and have no value in the LLM era. Meta descriptions matter because of what is shown on the search engine results page. That text is used during the first retrieval stage. Even though Google sometimes rewrites meta descriptions dynamically, you should not rely on that. You want to control the message and clearly explain the intent of the page.

James Dooley: A lot of people argue that leaving meta descriptions blank is fine because Google will pull a relevant snippet dynamically. Are you saying that is now a bad approach?

Sergey Lucktinov: Yes. You should explain clearly what the page is about. Sometimes Google does a good job, but you cannot rely on it 100 percent of the time. If you hardcode the intent into the meta description, the LLM immediately understands that the page may be relevant and allows it to pass the first stage. That is a much safer approach.

James Dooley: So if Gemini performs a Google search and the top results all say roughly the same thing in their metadata, does the LLM sometimes skip opening the pages entirely and just answer from the search results?

Sergey Lucktinov: Yes, depending on the query. If the topic is not subjective or controversial, it may not open any pages at all. It will simply rely on metadata consensus.

James Dooley: And if the answer is already known, like the capital of France, it will not even search?

Sergey Lucktinov: Exactly. That information already exists in the pre-trained knowledge. There are three main retrieval paths. The first is from pre-trained data. You cannot optimise for that. The second is metadata-only retrieval, which is enough for simple or non-controversial queries. The third is full fan-out and deep analysis, which happens when the topic is subjective, ambiguous, or complex.

James Dooley: That makes complete sense. Hopefully this clears things up for people wondering whether LLMs open pages, rely on metadata, or use internal knowledge. Thanks very much for joining us, Sergey.

Sergey Lucktinov: Thank you.
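
Editor's sketch: the three retrieval paths named at the end of the conversation amount to a small decision rule. The Python below is an illustrative assumption, not a documented behaviour of any AI search product. The `Path` enum, the `choose_path` function, and its three boolean inputs are hypothetical names; a real system would have to estimate those signals itself.

```python
from enum import Enum


class Path(Enum):
    PRETRAINED = "answer from pre-trained knowledge, no search"
    METADATA_ONLY = "answer from search-result metadata consensus"
    DEEP_ANALYSIS = "full fan-out and page parsing"


def choose_path(known_from_training: bool, contested: bool,
                snippets_agree: bool) -> Path:
    """Pick a retrieval path from three hypothetical signals.

    known_from_training: the fact is already in the model (capital of France).
    contested: the topic is subjective, ambiguous, or complex.
    snippets_agree: the top results say roughly the same thing in their metadata.
    """
    if known_from_training:
        return Path.PRETRAINED  # no search at all
    if not contested and snippets_agree:
        return Path.METADATA_ONLY  # pages are never opened
    return Path.DEEP_ANALYSIS  # pages are opened and parsed
```

For example, `choose_path(False, False, True)` returns `Path.METADATA_ONLY`, the case where meta titles and descriptions are the only part of a page that is ever seen.
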
Transcript source: Provided by creator in RSS feed.