
How AI Overviews choose which brands to cite.

Inside the mechanics of AI Overviews — how Google assembles a generated answer, and the signals that decide which brands get cited in it.


When Google shows an AI Overview, it answers the question directly on the results page and cites a small set of sources beside the answer. For a brand, the stakes are simple: the cited sources are part of the answer, and everything else sits below it. So the practical question for any marketing leader is the mechanical one — how does the system decide which brands to cite?

Google does not publish the criteria. But the pipeline that produces an AI Overview is partly documented and partly observable, and it rewards specific, controllable things. Understanding the mechanism is more useful than chasing any single tactic, because the mechanism explains why the tactics work.

How an AI Overview is assembled

An AI Overview is not a ranking with sentences attached. It is the output of a pipeline, and each step of the pipeline is a filter a brand can pass or fail.

Query fan-out. Google has said publicly that its generative search features run multiple related searches behind a single question — a process it calls query fan-out. A question like “should I use a mortgage broker or go direct to a bank” may spawn sub-queries about broker commissions, lender panels, approval odds and regulation. The overview is assembled from the results of all of them, not just the visible query.

Retrieval from the index. Candidate material comes from the same index that powers ordinary search. A page that is not crawlable, not indexed, or excluded from snippets is not in the room. This is why technical SEO remains the entry ticket: the generative layer retrieves; it does not discover.

Passage selection. The system works with passages, not pages. From the retrieved documents it pulls sections that appear to answer a specific sub-question — a clean definition, a direct comparison, a stated fact. A page is not cited as a whole; a passage from it is.

Synthesis. A language model composes the answer, grounded in the retrieved passages. Grounding is the operative constraint: the model is steered towards statements its retrieved sources support, because unsupported generation is the failure mode Google is most visibly trying to avoid.

Citation selection. Finally, links are attached to support parts of the generated answer. The sources cited are the ones whose passages the answer actually leaned on — which is why citation behaves less like a rankings table and more like a bibliography.

Each step filters the field. A brand can fail at retrieval, fail at passage selection, or survive both and still lose at citation because a competitor’s passage supported the claim more cleanly.
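The filtering logic can be made concrete with a toy sketch. It is not Google's implementation, which is not public. The `Passage` type, the word-overlap score and the 0.5 threshold are all assumptions chosen for illustration, the sub-queries are written by hand rather than generated, and the synthesis step is left out. What it shows is only the shape of the contest: a source must be indexed, must hold a passage that matches a sub-query, and must match it better than the alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str    # hypothetical site name
    text: str      # one self-contained passage from a page
    indexed: bool  # whether the page is crawlable and indexed


def overlap(query: str, text: str) -> float:
    """Share of the query's words found in the text (a crude relevance stand-in)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def cite(sub_queries: list[str], passages: list[Passage], threshold: float = 0.5) -> dict[str, str]:
    """Return, for each sub-query, the source whose passage supports it best."""
    # Retrieval: only indexed pages are candidates at all.
    candidates = [p for p in passages if p.indexed]
    citations: dict[str, str] = {}
    for query in sub_queries:
        # Passage selection: keep passages that clear the (assumed) threshold.
        scored = [(overlap(query, p.text), p) for p in candidates]
        scored = [(score, p) for score, p in scored if score >= threshold]
        if scored:
            # Citation: the best-supporting passage wins, whoever published it.
            best_score, best_passage = max(scored, key=lambda pair: pair[0])
            citations[query] = best_passage.source
    return citations


# Hand-written stand-ins for the sub-queries that fan-out might produce.
sub_queries = ["mortgage broker commission", "lender panel size"]
passages = [
    Passage("brand-a.example", "A mortgage broker commission is paid by the lender", True),
    Passage("brand-b.example", "Our story began in a small office", True),
    Passage("brand-c.example", "Lender panel size varies by broker", False),
]

result = cite(sub_queries, passages)
print(result)  # {'mortgage broker commission': 'brand-a.example'}
```

In this example brand-c holds the best passage for the second sub-query but is never considered because its page is not indexed, and brand-b is indexed but has nothing liftable.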

The signals that correlate with citation

Nobody outside Google can enumerate the weightings, and anyone claiming otherwise is selling something. But the sources that keep being cited — across AI Overviews and across ChatGPT, Perplexity and Copilot, which face the same grounding problem — share observable traits.

Passage-level answers. Content that answers one question per section, in complete declarative sentences, gives the system something to lift. A definition stated plainly in two sentences is quotable; the same information dissolved across a narrative is not. The unit of competition is the passage, so the passage has to stand on its own.

Entity clarity. The systems reason over entities — who the brand is, what it does, where it operates. Schema markup, consistent naming and unambiguous organisational facts make a brand a thing the model recognises rather than a page it once crawled. Ambiguity is expensive: a model unsure which entity it is describing tends to describe a different one.
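As an illustration of the schema markup mentioned above, the following sketch builds a schema.org `Organization` description and wraps it in a JSON-LD script tag. The business name, URLs and description are hypothetical placeholders. The property names (`name`, `url`, `description`, `areaServed`, `sameAs`) are standard schema.org properties, but which ones a given search engine actually uses is not something this sketch can confirm.

```python
import json

# Hypothetical organisation facts. Use the same wording everywhere they appear.
organisation = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Advisory",
    "url": "https://www.example.com",
    "description": "Fractional CMO and AI search strategy for mid-market brands.",
    "areaServed": "Australia",
    # sameAs links the entity to its other official profiles.
    "sameAs": [
        "https://www.linkedin.com/company/example-advisory",
    ],
}

json_ld = json.dumps(organisation, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Generating the markup from one source of truth, rather than hand-editing it per page, is one way to keep the names and descriptions identical across the site.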

Corroboration across sources. A generated answer repeats claims at scale, so the safest claim to repeat is one that multiple independent sources agree on. Facts that appear only on a brand’s own site are weaker candidates than facts echoed by directories, press coverage and third-party commentary. Corroboration is the closest thing the pipeline has to trust.

Freshness. Retrieval favours current material, particularly where the question implies recency — pricing, regulation, anything with a year in it. A page that was authoritative when written and untouched since keeps its ranking longer than it keeps its citations.

None of this is published as a recipe. It is the pattern that falls out of how a grounded generation system has to behave: retrieve candidates, prefer extractable passages, prefer claims that are corroborated, prefer claims that are current.

Why ranking first does not guarantee citation

The most common misreading of AI search is the assumption that citation is a reward for rank. It is not, for four mechanical reasons.

Ranking is a page-level judgement about relevance to the visible query. Citation is a passage-level judgement about usefulness to a specific sub-question the user never typed. A page can be the best overall result and still contain no passage that cleanly supports any sentence of the generated answer.

Fan-out widens the field. The overview draws on queries adjacent to the visible one, so sources that rank for the sub-queries — not the headline term — enter the candidate pool. A brand watching only its primary keyword cannot see most of the contest it is in.

Extractability beats position. When two candidate passages support the same claim, the plainly stated one is easier for the system to use and attribute. A top-ranked page with its answer buried loses to a mid-ranked page that states the answer in a sentence.

And corroboration filters late. A claim the model cannot see supported elsewhere is a risk to repeat, however well the page hosting it ranks. Rankings measure the authority of individual pages. Citations lean on agreement between them.

What brands can control

The model is not controllable. The material it has available to retrieve is. In practice, that means a defined body of work.

Structure content so each commercially important question is answered in its own section, in sentences that survive being quoted out of context. State facts declaratively — what the service is, who it is for, where it operates — rather than implying them. Mark up the organisation, its people and its services with schema, and use the same names and descriptions everywhere. Reconcile the facts across the site, directories, profiles and press, because every contradiction is a reason for the system to hedge or to cite someone else. Earn third-party corroboration for the claims that matter most, since a claim echoed independently is a safer claim to repeat. Keep the pages that answer buying questions current, and visibly so.

Then measure the thing itself: run the buying questions through the engines on a schedule and record who gets cited, how the brand is described, and which competitors appear instead. Rankings are no longer a proxy for this. The only reliable read on citation is asking the engines.
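One way to keep that record is a simple log of observations, sketched below. The `Observation` fields, the domain names and the two summary functions are assumptions for illustration. The sketch says nothing about how the answers are collected, which could be by hand, and it calls no engine's API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    checked_on: date
    engine: str             # e.g. "AI Overviews", "ChatGPT"
    question: str           # the buying question that was asked
    cited: tuple[str, ...]  # domains cited in the answer


def citation_share(observations: list[Observation], brand: str) -> float:
    """Fraction of recorded answers in which `brand` was cited."""
    if not observations:
        return 0.0
    hits = sum(1 for o in observations if brand in o.cited)
    return hits / len(observations)


def cited_instead(observations: list[Observation], brand: str) -> Counter:
    """Count the domains cited in answers that did not cite `brand`."""
    counts: Counter = Counter()
    for o in observations:
        if brand not in o.cited:
            counts.update(o.cited)
    return counts


# Hypothetical results for one question on one day.
question = "should I use a mortgage broker or go direct to a bank"
log = [
    Observation(date(2025, 1, 6), "AI Overviews", question, ("brand-a.example", "rival.example")),
    Observation(date(2025, 1, 6), "ChatGPT", question, ("rival.example",)),
    Observation(date(2025, 1, 6), "Perplexity", question, ("rival.example", "directory.example")),
    Observation(date(2025, 1, 6), "Copilot", question, ("brand-a.example",)),
]

share = citation_share(log, "brand-a.example")
rivals = cited_instead(log, "brand-a.example")
print(share)                  # 0.5
print(rivals.most_common(1))  # [('rival.example', 2)]
```

Recording the date and engine with every observation is what turns a one-off check into a trend that can be compared month to month.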

A practical checklist

  • List the questions buyers ask before choosing a provider in your category — not just the keywords they search.
  • Check whether each question has a page, and whether that page answers it in a liftable passage near the top.
  • Validate schema for the organisation, services and FAQs, and fix naming inconsistencies across the web.
  • Compare how your site, your directories and your press describe the business; reconcile the conflicts.
  • Identify your most commercially important claims, and find or build independent corroboration for them.
  • Date and refresh the pages that answer buying questions, so retrieval sees them as current.
  • Query AI Overviews, ChatGPT, Perplexity and Copilot with your buying questions monthly, and record the citations.
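The comparison step in the checklist (site versus directories versus press) can be partly automated. The sketch below assumes the facts have already been gathered into a dictionary per source. The source names, field names and values are hypothetical, and the normalisation is deliberately minimal, so it only catches differences that survive lower-casing and whitespace clean-up.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def find_conflicts(profiles: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return each field whose normalised value differs between sources."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for source, facts in profiles.items():
        for field, value in facts.items():
            seen[field][normalise(value)].append(source)
    # A field is in conflict when sources disagree on its value.
    return {field: dict(values) for field, values in seen.items() if len(values) > 1}


# Hypothetical descriptions of the same business from three places.
profiles = {
    "website": {"name": "Example Advisory", "location": "Brisbane", "phone": "07 5550 0100"},
    "directory": {"name": "Example  advisory", "location": "Sydney", "phone": "07 5550 0100"},
    "press": {"name": "Example Advisory Pty Ltd", "location": "Brisbane", "phone": "07 5550 0100"},
}

conflicts = find_conflicts(profiles)
for field, values in conflicts.items():
    print(field, values)
```

Here the phone number agrees everywhere and is not reported, while the name and location each appear in two versions that need reconciling.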

Done once, this is an audit. Done continuously, it is the discipline of AI search optimisation — making sure that when the engines assemble an answer in your category, your brand is part of it.
