AI Overviews Are Not Magic
AI Overviews look mystical if you stare at the interface. It feels like a smart friend summarising the internet for you.
Under the hood it is boring. Pattern matching. Ranking. Extraction. Aggregation. All the usual search plumbing, just with a language model bolted on the front.
Once you accept that, a nice thing happens. You can reason about it. You can push it. You can get your content cited on purpose instead of praying to the GEO gods.
I will walk through how I think AI Overviews work right now, then the three levers that actually move the needle for me:
- Entity consistency
- FAQ schema
- llms.txt
This is not theory. This is what I am running on my own projects and client sites. Including the stuff that did not work at first.
How AI Overviews Probably Work (In Plain Language)
I am not inside Google, but you can get pretty close by watching behavior across lots of queries and sites.
Here is the simplified mental model I use.
1. Classic retrieval still matters
First, the system runs a fairly standard retrieval pass. Think: query expansion, embeddings, BM25 style keyword matching, whatever flavor of vector search they are using this quarter.
It fetches a batch of candidate documents. Still looks like search. Scores, filters, reranks. This is the pool your content has to live in if you want any chance to be cited.
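To make that retrieval pass concrete, here is a toy sketch of hybrid scoring: a crude keyword overlap blended with a cosine score over embeddings. The function names, the 50/50 weighting, and the toy vectors are all my own illustration, not anything Google has published.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Crude BM25-flavoured overlap: query terms found in the doc,
    with diminishing returns on repeated terms."""
    q_terms = query.lower().split()
    d_counts = Counter(doc.lower().split())
    return sum(d_counts[t] / (d_counts[t] + 1.5) for t in q_terms if t in d_counts)

def cosine(a, b):
    """Cosine similarity between two toy embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_candidates(query, query_vec, docs):
    """Blend keyword and vector scores, then sort descending.
    The top of this list is the pool an Overview samples from."""
    scored = [
        (0.5 * keyword_score(query, text) + 0.5 * cosine(query_vec, vec), url)
        for url, text, vec in docs
    ]
    return [url for score, url in sorted(scored, reverse=True)]
```

The point is not the exact formula. It is that your page has to score well on boring lexical and semantic signals before any language model ever sees it.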
2. Entity-centric understanding on top
Then comes entity extraction. This is where things get interesting.
The system tries to anchor the query to known entities. Brands, people, products, locations, concepts. It leans heavily on its internal knowledge graph and external ones like Wikidata.
Good content helps here by being boringly consistent. Same entity names. Same relationships. Structured data that lines up with what the graph already believes.
3. The LLM is a summariser, not a god
After that, a language model pulls from the retrieved, entity-aligned sources. It writes the Overview. It is not browsing the entire public web live. It is sampling from a filtered, ranked set of candidates.
Sources get cited when they are:
- Trusted enough to show up in the candidate set
- Clear enough for the model to extract atomic facts from
- Structured enough that the system can line up answers with sub-questions in the query
Notice what this means: you are not “optimising for the LLM” directly. You are optimising for the pipeline that feeds the LLM.
4. GEO is still experimental and jumpy
AI Overviews (GEO) roll out, roll back, and mutate. I have watched citations appear, disappear, and reshuffle week to week with no on-page changes.
So I do not chase micro-changes. I focus on stable surfaces: entities, structure, and explicit LLM instructions. Those hold up even when the GEO UI changes.
The Three Levers That Actually Move The Needle
I have tried a lot of SEO tricks on this stuff. Most of it is noise. Three things consistently correlate with getting cited:
- Entity consistency
- FAQ schema
- llms.txt
I will go through how I use each of them, with specific patterns that worked.
Lever 1: Entity Consistency
Entity work sounds abstract until you get your hands dirty. Then it is almost mechanical.
When I say “entity consistency”, I mean:
- The same names used the same way across your site
- Your schema markup matches your copy
- Your entities line up with external sources like Wikidata, GMB, social profiles
GEO loves sources that are boringly predictable. If the model can snap you into its internal graph without guessing, you win.
My basic entity checklist
Here is what I actually do on builds.
1. Lock in canonical names
For each important entity on the site:
- Brand or organisation
- Products or services
- People (authors, founders, experts)
- Locations
I nail down a canonical string. For example, not “RL” in one place and “Richard Lemon” somewhere else and “Rich Lemon” in a byline.
Then I use exactly that string in:
- Titles and H1s
- Meta titles and descriptions
- Author bios
- Internal links and anchor text
I am aggressive about killing cute variations. The model does not need your brand personality. It needs clean mapping.
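Killing variations is easier when a script does the hunting. This is a sketch of the kind of sweep I mean; the ENTITY_SPEC structure and the variant list are hypothetical examples, not a real tool.

```python
import re

# Hypothetical canonical spec: one approved string per entity,
# plus the variants you want to hunt down and replace.
ENTITY_SPEC = {
    "Richard Lemon": ["RL", "Rich Lemon", "R. Lemon"],
}

def find_off_canonical(text):
    """Return (variant, position) pairs for every non-canonical
    entity mention in a page's text."""
    hits = []
    for canonical, variants in ENTITY_SPEC.items():
        for variant in variants:
            for m in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
                hits.append((variant, m.start()))
    return hits
```

Run it over every template, bio, and anchor text export. An empty result list is the goal.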
2. Schema as the source of truth
I add structured data that mirrors those entities. Not bloated, just precise.
For example, a typical article gets:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How AI Overviews Actually Work (And How I Get Content Cited)",
  "author": {
    "@type": "Person",
    "name": "Richard Lemon",
    "sameAs": [
      "https://richardlemon.com",
      "https://github.com/richardlemon",
      "https://www.linkedin.com/in/richardlemon"
    ]
  },
  "mainEntityOfPage": "https://richardlemon.com/ai-overviews-entity-consistency-faq-schema-llms-txt"
}
I keep schema fields aligned with the visible page as much as possible. No fantasy data. If the model sees conflicting names or URLs, you lose trust.
3. External entity alignment
Internal consistency is not enough. GEO leans on the broader graph.
For serious projects I:
- Make sure the brand entity exists in places like Google Business Profile, Crunchbase, or Wikidata where relevant
- Use the same logo, name, and description snippets across major profiles
- Link back to the main site from official accounts
I think of it as closing loops. Every profile or listing should point back to the same canonical domain and entity name. The model can then connect the dots without writing fan fiction.
Where this showed up in GEO
The first time I saw entity work pay off was on a B2B SaaS client. Originally GEO surfaced competitor docs more often, even when we outranked them in classic organic.
After we cleaned up entity naming, fixed schema across about 40 docs, and tightened author profiles, the AI Overview started citing our pages as the primary source on feature-specific queries.
No big content rewrite. Just entity discipline. That sold me.
Lever 2: FAQ Schema That Feeds Sub-Questions
AI Overviews love answering compound queries. Stuff like:
“how does x work + pros and cons + cost + implementation steps”
A single GEO result will often fan out into little sub-answers. That is where FAQ-style content shines.
Old-school FAQ schema was used to get collapsible questions in the SERP. GEO repurposes the same structure in a more interesting way.
How I structure FAQ content for GEO
The mistake I see: dumping a random list of questions at the bottom of the page just to satisfy a plugin.
What works better for me:
- One topic per page, but several narrow questions around it
- Each FAQ answer is 2-4 sentences, self-contained, and fact-dense
- Questions match natural-language queries, not marketing slogans
Example pattern for this post could be:
- “How do AI Overviews choose which sources to cite?”
- “What is entity consistency for GEO?”
- “Does FAQ schema help with AI Overviews?”
- “What is an llms.txt file?”
I then mirror those questions exactly in FAQ schema.
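Mirroring “exactly” is easiest when one helper generates the schema from the same question-answer pairs that render on the page. This is a sketch of how I might do it; the function name and structure are my own, not a standard API.

```python
import json

def faq_schema(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs.
    Feed it the same strings that render in the visible FAQ block,
    so the HTML and the schema cannot drift apart."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)
```

One source of truth, two outputs: visible copy and structured data.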
Minimal FAQ schema pattern
I keep it simple:
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is entity consistency for GEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Entity consistency means using the same names, schema, and external references for your key entities across your site and profiles so AI systems can map you cleanly into their knowledge graph."
      }
    },
    {
      "@type": "Question",
      "name": "Does FAQ schema help with AI Overviews?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "FAQ schema helps AI Overviews because it gives the system clearly bounded questions and concise answers that can be lifted as sub-answers for complex, multi-part queries."
      }
    }
  ]
}
Important detail: the FAQ questions and answers are visible on the page. I do not hide them. GEO works better when the visible HTML, schema, and underlying content all line up.
Why this seems to help
My hunch: GEO breaks complex queries into sub-intents, then looks for tight spans of text that answer each one.
FAQ blocks hand it perfectly-delimited QA pairs. Easy to score, easy to attribute, easy to cite. It is extraction-friendly text instead of a wall of narrative.
On one technical tutorial site I run, pages with good FAQ blocks get cited more often in GEO for long, fussy queries than equally strong articles without them. Correlation is not causation, but it is consistent enough that I keep doing it.
Lever 3: llms.txt As A Content Contract
robots.txt tells crawlers what they can fetch. llms.txt is the rough equivalent for LLM-style usage. It is not set in stone as a standard yet, but some big players are already reading it.
You can fight it or you can use it. I prefer using it.
What I put in llms.txt
On projects where GEO and AI assistant traffic matters, I add an llms.txt file at the root. Something like:
# llms.txt for richardlemon.com
# Allow major, reputable models to train and cite
User-agent: openai
Allow: /
User-agent: google-extended
Allow: /
# Disallow shady scrapers if they respect this (many will not)
User-agent: *
Disallow: /private/
Disallow: /admin/
I keep it boring and explicit. You can get fancier with custom directives, but right now I care about two things:
- Making it clear that I am okay with training and citation for public pages
- Keeping private or sensitive areas off-limits
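When I template this across several sites, a tiny generator keeps the file boring and consistent. This is a sketch using the robots-style convention from the example above, which, again, is my convention here rather than a finalized standard.

```python
def build_llms_txt(site, policies):
    """Render a robots-style llms.txt from (user_agent, directive, path)
    tuples. Groups consecutive rules for the same agent under one
    User-agent line."""
    lines = [f"# llms.txt for {site}"]
    current_agent = None
    for agent, directive, path in policies:
        if agent != current_agent:
            lines.append(f"User-agent: {agent}")
            current_agent = agent
        lines.append(f"{directive}: {path}")
    return "\n".join(lines) + "\n"
```

Check the output into version control next to robots.txt so policy changes get reviewed like code.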
Is this legally binding? Probably not. But it is a strong signal. LLM builders are under pressure to respect explicit instructions, so they are motivated to comply in public-facing products.
Why llms.txt matters for GEO
For Google specifically, google-extended is their flag for AI usage beyond basic indexing.
If you block that completely, do not be shocked if your content plays a smaller role in AI Overviews. They will not say it directly, but if the lawyers tell them they cannot reuse your text for generative products, your pages become a legal minefield.
So on most public sites I do the opposite. I tell Google it can use the content, but I fence off private areas.
I treat llms.txt as a contract: you can use this stuff, but cite me and do not leak the rest.
Putting It All Together As A Workflow
Nice theory. How does this look when I actually build or refactor a site with GEO in mind?
1. Entity pass first
I start by identifying the key entities:
- Brand
- Owner or main faces
- Top products or services
- Primary location(s)
I write a short internal “entity spec” doc: exact names, canonical URLs, reference profiles.
Then I sweep the site:
- Fix inconsistent names in headings and copy
- Update schema to mirror the spec
- Make sure author bios and about pages tell the same story
2. FAQ retrofits on high-value pages
I pick a handful of high-intent pages that already rank or convert.
For each page I map:
- What 3 to 6 questions would a human actually type before or after reading this?
- Which of those are compound enough that GEO might break them into sub-answers?
I then add a small FAQ section near the bottom or between sections. Clean questions. Tight answers. No fluff.
Finally I wire it up with FAQ schema and test it with structured data testing tools.
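Alongside the official validators, a quick stdlib-only check can pull the FAQPage questions back out of rendered HTML so you can diff them against the visible copy. A rough sketch, not a replacement for proper structured data testing tools.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect every <script type="application/ld+json"> payload on a page."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            # Parse the accumulated script body as one JSON document.
            self.blocks.append(json.loads("".join(self.buffer)))
            self.buffer = []
            self.in_jsonld = False

def faq_questions(html):
    """Return the FAQPage question names found in a page's JSON-LD."""
    parser = JSONLDExtractor()
    parser.feed(html)
    return [
        entity["name"]
        for block in parser.blocks
        if block.get("@type") == "FAQPage"
        for entity in block.get("mainEntity", [])
    ]
```

If the list this returns does not match the questions a reader can see on the page, fix one side or the other before shipping.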
3. Add llms.txt and clean robots.txt
At the root of the site I create or review:
- robots.txt for crawling basics
- llms.txt for AI usage
I keep both readable. Future humans will inherit this.
4. Wait, watch, iterate
Then I wait. GEO adjustments are not instant. I usually watch for 4 to 8 weeks.
What I monitor:
- Queries where the site already had impressions and GEO shows
- Whether my pages are cited at all
- Which snippets GEO is lifting
If I see the wrong snippet quoted, I adjust that section of the page. Shorten it. Make the claim clearer. Add a FAQ variant that answers in a single, precise paragraph.
This is closer to prompt engineering than classic SEO. You are writing for an extractor, not a human skimmer.
What I Do Not Bother With
Since everyone asks: no, I do not chase “AI Overview optimisation hacks” on Slack every week.
Things I mostly ignore:
- Stuffing “according to Google” or similar phrases into copy
- Weird HTML tricks to force citations
- Endless A/B tests on intro paragraph style for GEO
The system is too noisy and too early for micro-optimisation. I would rather invest in entity clarity and structured content that will still make sense when GEO v3 ships.
AI Overviews Are Just Another Interface
AI Overviews feel new, but from a builder’s perspective they are just another interface to the same old thing: text, structure, and trust.
If your content:
- Maps cleanly to entities the model already trusts
- Exposes compact, structured answers through FAQ-style blocks
- Lives behind clear robots.txt and llms.txt rules
Then you have a realistic shot at being the site that GEO leans on, rather than the one that gets paraphrased anonymously.
I like that trade. It rewards builders who care about structure instead of just spamming more words. And it is not magic. Just plumbing you can actually work with.