AI Overviews Are Not Magic
AI Overviews look mystical if you stare at the interface. It feels like a smart friend summarising the internet for you.
Under the hood it is boring. Pattern matching. Ranking. Extraction. Aggregation. All the usual search plumbing, just with a language model bolted on the front.
Once you accept that, a nice thing happens. You can reason about it. You can push it. You can get your content cited on purpose instead of praying to the GEO gods.
I will walk through how I think AI Overviews work right now, then the three levers that actually move the needle for me:
- Entity consistency
- FAQ schema
- llms.txt
This is not theory. This is what I am running on my own projects and client sites. Including the stuff that did not work at first.
How AI Overviews Probably Work (In Plain Language)
I am not inside Google, but you can get pretty close by watching behavior across lots of queries and sites.
Here is the simplified mental model I use.
1. Classic retrieval still matters
First, the system runs a fairly standard retrieval pass. Think: query expansion, embeddings, BM25 style keyword matching, whatever flavor of vector search they are using this quarter.
It fetches a batch of candidate documents. Still looks like search. Scores, filters, reranks. This is the pool your content has to live in if you want any chance to be cited.
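To make that retrieval pass concrete, here is a toy sketch of hybrid scoring: a crude keyword overlap blended with a cosine score over embeddings. The function names, the 50/50 weighting, and the toy vectors are all my own illustration, not anything Google has published.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Crude BM25-flavoured overlap: query terms found in the doc,
    with diminishing returns on repeated terms."""
    q_terms = query.lower().split()
    d_counts = Counter(doc.lower().split())
    return sum(d_counts[t] / (d_counts[t] + 1.5) for t in q_terms if t in d_counts)

def cosine(a, b):
    """Cosine similarity between two toy embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(query, query_vec, docs):
    """Blend keyword and vector scores, then sort descending.
    The top of this list is the pool an Overview samples from."""
    scored = [
        (0.5 * keyword_score(query, text) + 0.5 * cosine(query_vec, vec), url)
        for url, text, vec in docs
    ]
    return [url for score, url in sorted(scored, reverse=True)]
```

The point is not the exact formula. It is that your page has to score well on boring lexical and semantic signals before any language model ever sees it.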
2. Entity-centric understanding on top
Then comes entity extraction. This is where things get interesting.
The system tries to anchor the query to known entities. Brands, people, products, locations, concepts. It leans heavily on its internal knowledge graph and external ones like Wikidata.
Good content helps here by being boringly consistent. Same entity names. Same relationships. Structured data that lines up with what the graph already believes.
3. The LLM is a summariser, not a god
After that, a language model pulls from the retrieved, entity-aligned sources. It writes the Overview. It is not browsing the entire public web live. It is sampling from a filtered, ranked set of candidates.
Sources get cited when they are:
- Trusted enough to show up in the candidate set
- Clear enough for the model to extract atomic facts from
- Structured enough that the system can line up answers with sub-questions in the query
Notice what this means: you are not “optimising for the LLM” directly. You are optimising for the pipeline that feeds the LLM.
4. GEO is still experimental and jumpy
AI Overviews (GEO) roll out, roll back, and mutate. I have watched citations appear, disappear, and reshuffle week to week with no on-page changes.
So I do not chase micro-changes. I focus on stable surfaces: entities, structure, and explicit LLM instructions. Those hold up even when the GEO UI changes.
The Three Levers That Actually Move The Needle
I have tried a lot of SEO tricks on this stuff. Most of it is noise. Three things consistently correlate with getting cited:
- Entity consistency
- FAQ schema
- llms.txt
I will go through how I use each of them, with specific patterns that worked.
Lever 1: Entity Consistency
Entity work sounds abstract until you get your hands dirty. Then it is almost mechanical.
When I say “entity consistency”, I mean:
- The same names used the same way across your site
- Your schema markup matches your copy
- Your entities line up with external sources like Wikidata, GMB, social profiles
GEO loves sources that are boringly predictable. If the model can snap you into its internal graph without guessing, you win.
My basic entity checklist
Here is what I actually do on builds.
1. Lock in canonical names
For each important entity on the site:
- Brand or organisation
- Products or services
- People (authors, founders, experts)
- Locations
I nail down a canonical string. For example, not “RL” in one place and “Richard Lemon” somewhere else and “Rich Lemon” in a byline.
Then I use exactly that string in:
- Titles and H1s
- Meta titles and descriptions
- Author bios
- Internal links and anchor text
I am aggressive about killing cute variations. The model does not need your brand personality. It needs clean mapping.
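Killing variations is easier when a script does the hunting. This is a sketch of the kind of sweep I mean; the ENTITY_SPEC structure and the variant list are hypothetical examples, not a real tool.

```python
import re

# Hypothetical canonical spec: one approved string per entity,
# plus the variants you want to hunt down and replace.
ENTITY_SPEC = {
    "Richard Lemon": ["RL", "Rich Lemon", "R. Lemon"],
}

def find_off_canonical(text):
    """Return (variant, position) pairs for every non-canonical
    entity mention in a page's text."""
    hits = []
    for canonical, variants in ENTITY_SPEC.items():
        for variant in variants:
            for m in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
                hits.append((variant, m.start()))
    return hits
```

Run it over every template, bio, and anchor text export. An empty result list is the goal.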
2. Schema as the source of truth
I add structured data that mirrors those entities. Not bloated, just precise.
For example, a typical article gets:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How AI Overviews Actually Work (And How I Get Content Cited)",
  "author": {
    "@type": "Person",
    "name": "Richard Lemon",
    "sameAs": [
      "https://richardlemon.com",
      "https://github.com/richardlemon",
      "https://www.linkedin.com/in/richardlemon"
    ]
  },
  "mainEntityOfPage": "https://richardlemon.com/ai-overviews-entity-consistency-faq-schema-llms-txt"
}
I keep schema fields aligned with the visible page as much as possible. No fantasy data. If the model sees conflicting names or URLs, you lose trust.
3. External entity alignment
Internal consistency is not enough. GEO leans on the broader graph.
For serious projects I:
- Make sure the brand entity exists in places like Google Business Profile, Crunchbase, or Wikidata where relevant
- Use the same logo, name, and description snippets across major profiles
- Link back to the main site from official accounts
I think of it as closing loops. Every profile or listing should point back to the same canonical domain and entity name. The model can then connect the dots without writing fan fiction.
Where this showed up in GEO
The first time I saw entity work pay off was on a B2B SaaS client. Originally GEO surfaced competitor docs more often, even when we outranked them in classic organic.
After we cleaned up entity naming, fixed schema across about 40 docs, and tightened author profiles, the AI Overview started citing our pages as the primary source on feature-specific queries.
No big content rewrite. Just entity discipline. That sold me.
Lever 2: FAQ Schema That Feeds Sub-Questions
AI Overviews love answering compound queries. Stuff like:
“how does x work + pros and cons + cost + implementation steps”
A single GEO result will often fan out into little sub-answers. That is where FAQ-style content shines.
Old-school FAQ schema was used to get collapsible questions in the SERP. GEO repurposes the same structure in a more interesting way.
How I structure FAQ content for GEO
The mistake I see: dumping a random list of questions at the bottom of the page just to satisfy a plugin.
What works better for me:
- One topic per page, but several narrow questions around it
- Each FAQ answer is 2-4 sentences, self-contained, and fact-dense
- Questions match natural-language queries, not marketing slogans
Example pattern for this post could be:
- “How do AI Overviews choose which sources to cite?”
- “What is entity consistency for GEO?”
- “Does FAQ schema help with AI Overviews?”
- “What is an llms.txt file?”
I then mirror those questions exactly in FAQ schema.
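Mirroring “exactly” is easiest when one helper generates the schema from the same question-answer pairs that render on the page. This is a sketch of how I might do it; the function name and structure are my own, not a standard API.

```python
import json

def faq_schema(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs.
    Feed it the same strings that render in the visible FAQ block,
    so the HTML and the schema cannot drift apart."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)
```

One source of truth, two outputs: visible copy and structured data.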
Minimal FAQ schema pattern
I keep it simple:
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is entity consistency for GEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Entity consistency means using the same names, schema, and external references for your key entities across your site and profiles so AI systems can map you cleanly into their knowledge graph."
      }
    },
    {
      "@type": "Question",
      "name": "Does FAQ schema help with AI Overviews?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "FAQ schema helps AI Overviews because it gives the system clearly bounded questions and concise answers that can be lifted as sub-answers for complex, multi-part queries."
      }
    }
  ]
}
Important detail: the FAQ questions and answers are visible on the page. I do not hide them. GEO works better when the visible HTML, schema, and underlying content all line up.
Why this seems to help
My hunch: GEO breaks complex queries into sub-intents, then looks for tight spans of text that answer each one.
FAQ blocks hand it perfectly-delimited QA pairs. Easy to score, easy to attribute, easy to cite. It is extraction-friendly text instead of a wall of narrative.
On one technical tutorial site I run, pages with good FAQ blocks get cited more often in GEO for long, fussy queries than equally strong articles without them. Correlation is not causation, but it is consistent enough that I keep doing it.
Lever 3: llms.txt As A Content Contract
robots.txt tells crawlers what they can fetch. llms.txt is the rough equivalent for LLM-style usage. It is not set in stone as a standard yet, but some big players are already reading it.
You can fight it or you can use it. I prefer using it.
What I put in llms.txt
On projects where GEO and AI assistant traffic matters, I add an llms.txt file at the root. Something like:
# llms.txt for richardlemon.com
# Allow major, reputable models to train and cite
User-agent: openai
Allow: /
User-agent: google-extended
Allow: /
# Disallow shady scrapers if they respect this (many will not)
User-agent: *
Disallow: /private/
Disallow: /admin/
I keep it boring and explicit. You can get fancier with custom directives, but right now I care about two things:
- Making it clear that I am okay with training and citation for public pages
- Keeping private or sensitive areas off-limits
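When I template this across several sites, a tiny generator keeps the file boring and consistent. This is a sketch using the robots-style convention from the example above, which, again, is my convention here rather than a finalized standard.

```python
def build_llms_txt(site, policies):
    """Render a robots-style llms.txt from (user_agent, directive, path)
    tuples. Groups consecutive rules for the same agent under one
    User-agent line."""
    lines = [f"# llms.txt for {site}"]
    current_agent = None
    for agent, directive, path in policies:
        if agent != current_agent:
            lines.append(f"User-agent: {agent}")
            current_agent = agent
        lines.append(f"{directive}: {path}")
    return "\n".join(lines) + "\n"
```

Check the output into version control next to robots.txt so policy changes get reviewed like code.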
Is this legally binding? Probably not. But it is a strong signal. LLM builders are under pressure to respect explicit instructions, so they are motivated to comply in public-facing products.
Why llms.txt matters for GEO
For Google specifically, google-extended is their flag for AI usage beyond basic indexing.
If you block that completely, do not be shocked if your content plays a smaller role in AI Overviews. They will not say it directly, but if the lawyers tell them they cannot reuse your text for generative products, your pages become a legal minefield.
So on most public sites I do the opposite. I tell Google it can use the content, but I fence off private areas.
I treat llms.txt as a contract: you can use this stuff, but cite me and do not leak the rest.
Putting It All Together As A Workflow
Nice theory. How does this look when I actually build or refactor a site with GEO in mind?
1. Entity pass first
I start by identifying the key entities:
- Brand
- Owner or main faces
- Top products or services
- Primary location(s)
I write a short internal “entity spec” doc: exact names, canonical URLs, reference profiles.
Then I sweep the site:
- Fix inconsistent names in headings and copy
- Update schema to mirror the spec
- Make sure author bios and about pages tell the same story
2. FAQ retrofits on high-value pages
I pick a handful of high-intent pages that already rank or convert.
For each page I map:
- What 3 to 6 questions would a human actually type before or after reading this?
- Which of those are compound enough that GEO might break them into sub-answers?
I then add a small FAQ section near the bottom or between sections. Clean questions. Tight answers. No fluff.
Finally I wire it up with FAQ schema and test it with structured data testing tools.
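Alongside the official validators, a quick stdlib-only check can pull the FAQPage questions back out of rendered HTML so you can diff them against the visible copy. A rough sketch, not a replacement for proper structured data testing tools.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect every <script type="application/ld+json"> payload on a page."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            # Parse the accumulated script body as one JSON document.
            self.blocks.append(json.loads("".join(self.buffer)))
            self.buffer = []
            self.in_jsonld = False

def faq_questions(html):
    """Return the FAQPage question names found in a page's JSON-LD."""
    parser = JSONLDExtractor()
    parser.feed(html)
    return [
        entity["name"]
        for block in parser.blocks
        if block.get("@type") == "FAQPage"
        for entity in block.get("mainEntity", [])
    ]
```

If the list this returns does not match the questions a reader can see on the page, fix one side or the other before shipping.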
3. Add llms.txt and clean robots.txt
At the root of the site I create or review:
- robots.txt for crawling basics
- llms.txt for AI usage
I keep both readable. Future humans will inherit this.
4. Wait, watch, iterate
Then I wait. GEO adjustments are not instant. I usually watch for 4 to 8 weeks.
What I monitor:
- Queries where the site already had impressions and GEO shows
- Whether my pages are cited at all
- Which snippets GEO is lifting
If I see the wrong snippet quoted, I adjust that section of the page. Shorten it. Make the claim clearer. Add a FAQ variant that answers in a single, precise paragraph.
This is closer to prompt engineering than classic SEO. You are writing for an extractor, not a human skimmer.
What I Do Not Bother With
Since everyone asks: no, I do not chase “AI Overview optimisation hacks” on Slack every week.
Things I mostly ignore:
- Stuffing “according to Google” or similar phrases into copy
- Weird HTML tricks to force citations
- Endless A/B tests on intro paragraph style for GEO
The system is too noisy and too early for micro-optimisation. I would rather invest in entity clarity and structured content that will still make sense when GEO v3 ships.
AI Overviews Are Just Another Interface
AI Overviews feel new, but from a builder’s perspective they are just another interface to the same old thing: text, structure, and trust.
If your content:
- Maps cleanly to entities the model already trusts
- Exposes compact, structured answers through FAQ-style blocks
- Lives behind clear robots.txt and llms.txt rules
Then you have a realistic shot at being the site that GEO leans on, rather than the one that gets paraphrased anonymously.
I like that trade. It rewards builders who care about structure instead of just spamming more words. And it is not magic. Just plumbing you can actually work with.