Most executives can pull up their search rankings in seconds. Almost none can say whether their brand appears when a buyer asks ChatGPT, Perplexity or Google’s AI Overviews who to use. That gap is measurable, and you do not need a tool subscription or an agency to take the first measurement. You need a defined query set, a spreadsheet, and a few disciplined hours a month.
What follows is the complete method, the same logic a professional audit runs, laid out so you can execute it yourself. An AI search visibility audit measures three things: whether a brand is cited in AI-generated answers, how it is described when it appears, and which competitors are cited when it is not. It ends in a fourth question: why. Everything below serves those four questions.
Choose the queries that matter
The audit is only as good as its query set, and the most common mistake is auditing the wrong questions. Do not start from your keyword rankings. Start from the decisions buyers make on the way to choosing a provider in your category, because those are the questions they now put to AI engines.
Build the query set from four intent types:
- Category questions. How the buyer frames the problem before they know the solution — “how do I get a home loan approved with irregular income”, “what does a marketing audit involve”. These reveal whether you are visible early, when preferences form.
- Comparison and shortlist questions. “Best mortgage broker in Brisbane”, “top providers of X”, “A vs B”. These are the money queries — the ones where an engine names two or three brands and effectively writes the shortlist.
- Evaluation questions. What buyers ask once a shortlist exists — pricing questions, “is X worth it”, “what should I look for in a provider of Y”. Presence here shapes the final decision.
- Brand questions. Your own name, plainly: “who is [brand]”, “is [brand] reputable”, “[brand] reviews”. You will almost always appear for these — the audit question is whether the description is accurate and current.
Fifteen to thirty queries is the practical range for a self-run audit. Fewer than fifteen and the sample is too thin to read patterns from; more than thirty and the monthly time cost makes the discipline collapse. Weight the set towards comparison and evaluation queries, because that is where citation converts to revenue, but keep at least a few from each intent type. Write the queries as a buyer would phrase them — full questions in natural language, not keyword fragments — and then freeze the set. The value of the audit comes from running the same queries over time.
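If you would rather keep the frozen query set in a small script than in a spreadsheet tab, the rules above can be sketched roughly as follows. The names (`Query`, `validate_query_set`) and the threshold of two queries per intent type are illustrative assumptions, not part of the method itself; "a few from each intent type" is a judgement call.

```python
from collections import Counter
from dataclasses import dataclass

# The four intent types described above.
INTENT_TYPES = ("category", "comparison", "evaluation", "brand")


@dataclass(frozen=True)
class Query:
    text: str    # the full question, phrased as a buyer would ask it
    intent: str  # one of INTENT_TYPES


def validate_query_set(queries, min_size=15, max_size=30, min_per_intent=2):
    """Return a list of problems; an empty list means the set passes.

    min_per_intent=2 is an assumed reading of "at least a few".
    """
    problems = []
    if not min_size <= len(queries) <= max_size:
        problems.append(
            f"set has {len(queries)} queries; expected {min_size}-{max_size}"
        )
    counts = Counter(q.intent for q in queries)
    for intent in INTENT_TYPES:
        if counts[intent] < min_per_intent:
            problems.append(f"only {counts[intent]} {intent} queries")
    for intent in sorted(set(counts) - set(INTENT_TYPES)):
        problems.append(f"unknown intent type: {intent}")
    return problems


# A tuple rather than a list, as a reminder that the set is frozen between runs.
QUERY_SET = (
    Query("how do I get a home loan approved with irregular income", "category"),
    Query("best mortgage broker in Brisbane", "comparison"),
    Query("what should I look for in a mortgage broker", "evaluation"),
    Query("is [brand] reputable", "brand"),
)

for problem in validate_query_set(QUERY_SET):
    print(problem)
```

The four example queries are deliberately too few, so the check reports what is missing. Extend the tuple until the list of problems comes back empty, then leave it alone.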
Run them across the four engines
Run every query through each of the four engines that currently matter for Australian buyers: Google AI Overviews, ChatGPT, Perplexity and Microsoft Copilot. Use the same query set across all four so the results are directly comparable.
A few rules keep the data honest:
- Reduce personalisation where you can. Use a private or logged-out browser session for Google, and be aware that ChatGPT and Copilot results still vary with account history and location. You cannot eliminate personalisation entirely — note it as a limitation rather than pretending it away.
- Use the query verbatim. Resist the urge to rephrase when an engine misunderstands. A buyer would not rephrase around your brand’s blind spot, and neither should the audit.
- Capture evidence as you go. Screenshot or export each answer. Memory is not a record, and when a citation appears or disappears next month you will want to see exactly what changed.
- Record non-answers too. If a query produces no AI Overview, or an engine declines to name providers, that is a finding — it tells you where the generated-answer contest has not started yet in your category.
Expect the full pass to take a working half-day the first time, and less once the routine is set.
Record it in a fixed template
You do not need software for this. A single spreadsheet, one row per query per engine per run, does the job, provided the columns are fixed before you start. Record, for every row:
- the date of the run
- the engine
- the exact query
- whether an AI-generated answer appeared at all
- whether your brand was cited or named in it
- how prominently (named in the answer text, cited as a linked source, or both)
- a verbatim note of how the brand was described
- every competitor named or cited
- the sources the citations pointed to
The sources column matters because the engines often cite publishers and directories rather than brands directly, and those intermediary sources are where visibility is frequently won or lost.
Two of those columns deserve discipline. The description note should be copied word for word, not paraphrased, because drift in how an engine describes you — an old service line, a wrong location, a stale positioning — is one of the most actionable findings an audit produces. And the competitor column should include everyone named, not just the rivals you expected, because the engines routinely surface competitors a brand has never benchmarked against.
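A spreadsheet is enough, but if you prefer a plain CSV file, a minimal sketch of the same fixed template might look like this. The field names, the semicolon separator for lists and the `append_rows` helper are assumptions made for illustration; what matters is that the columns never change between runs.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class AuditRow:
    run_date: str          # date of the run, e.g. "2025-01-06"
    engine: str            # e.g. "Perplexity"
    query: str             # the exact query, verbatim
    answer_appeared: bool  # did an AI-generated answer appear at all?
    brand_cited: bool      # was the brand cited or named in it?
    prominence: str        # "named", "linked", "both", or "" if not cited
    description: str       # word-for-word copy of how the brand was described
    competitors: str       # everyone named or cited, separated by ";"
    sources: str           # sources the citations pointed to, separated by ";"


# The column order is taken from the dataclass, so it is fixed in one place.
COLUMNS = [f.name for f in fields(AuditRow)]


def append_rows(path, rows):
    """Append rows to the CSV at `path`, writing the header only once."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        for row in rows:
            writer.writerow(asdict(row))


# A hypothetical row: an answer appeared, but the brand was not in it.
example = AuditRow(
    run_date="2025-01-06",
    engine="Perplexity",
    query="best mortgage broker in Brisbane",
    answer_appeared=True,
    brand_cited=False,
    prominence="",
    description="",
    competitors="Rival A;Rival B",
    sources="example-directory.com",
)
```

Calling `append_rows("audit.csv", [example])` after each session keeps every run in one file, which is what makes the later month-on-month comparison possible.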
Score it and set the baseline
Raw rows become useful when they are compressed into a small set of numbers you can track. Three are enough.
Citation share. The percentage of query-engine combinations where your brand appears in the generated answer. Citation share is the AI search equivalent of a rankings report — one number, tracked over time, that tells an executive whether visibility is improving. Calculate it overall and per engine, because the engines behave differently and a brand can be strong in Perplexity while invisible in AI Overviews.
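As a rough sketch, citation share can be computed from the recorded rows like this. It assumes each row is a dictionary with an `engine` name and a `brand_cited` flag (illustrative field names), and it counts every query-engine combination in the denominator, including those where no AI answer appeared.

```python
from collections import defaultdict


def citation_share(rows):
    """Return (overall_share, per_engine_share) as fractions between 0 and 1.

    rows: iterable of dicts with 'engine' (str) and 'brand_cited' (bool).
    """
    totals = defaultdict(int)
    cited = defaultdict(int)
    for row in rows:
        totals[row["engine"]] += 1
        cited[row["engine"]] += bool(row["brand_cited"])
    per_engine = {engine: cited[engine] / totals[engine] for engine in totals}
    n = sum(totals.values())
    overall = sum(cited.values()) / n if n else 0.0
    return overall, per_engine


# Hypothetical data: strong in Perplexity, invisible in AI Overviews.
rows = [
    {"engine": "Perplexity", "brand_cited": True},
    {"engine": "Perplexity", "brand_cited": True},
    {"engine": "Google AI Overviews", "brand_cited": False},
    {"engine": "Google AI Overviews", "brand_cited": False},
]
overall, per_engine = citation_share(rows)
print(f"overall: {overall:.0%}")
for engine, share in per_engine.items():
    print(f"{engine}: {share:.0%}")
```

In this made-up case the overall figure is 50%, which hides the fact that one engine never cites the brand at all. That is why the per-engine breakdown belongs next to the headline number.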
Description accuracy. Of the answers where you do appear, the share where the description is accurate and current. A simple three-level grade — accurate, partially accurate, wrong — is sufficient. An engine citing you with outdated or incorrect information can be worse than not appearing at all.
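The three-level grade can be tallied in a few lines. This sketch assumes you have already graded each answer where the brand appeared and stored the grade as one of three labels; the label names are an illustrative choice.

```python
from collections import Counter

GRADES = ("accurate", "partial", "wrong")


def description_accuracy(grades):
    """Return the share of each grade among answers where the brand appeared.

    grades: iterable of strings, each one of GRADES.
    """
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {grade: 0.0 for grade in GRADES}
    return {grade: counts[grade] / total for grade in GRADES}


# Hypothetical grades for four answers that mentioned the brand.
shares = description_accuracy(["accurate", "accurate", "partial", "wrong"])
for grade, share in shares.items():
    print(f"{grade}: {share:.0%}")
```

Only the rows where the brand actually appears go into this calculation, so the denominator here is smaller than the one used for citation share.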
Competitor citation frequency. A count of how often each competitor appears across the set. Rank it. The top of that list is the competitive map for AI search in your category, and for most organisations it is the first time anyone has seen it.
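Counting and ranking competitors is a natural fit for a counter. The sketch below assumes the competitor column holds names separated by semicolons (an illustrative convention) and counts each competitor at most once per row.

```python
from collections import Counter


def competitor_frequency(rows, sep=";"):
    """Return [(competitor, count), ...] ranked from most to least cited.

    rows: iterable of dicts with a 'competitors' string such as "A;B".
    Each competitor is counted once per row, however often it is repeated.
    """
    counts = Counter()
    for row in rows:
        names = {name.strip() for name in row["competitors"].split(sep)}
        names.discard("")  # rows with no competitors contribute nothing
        counts.update(names)
    return counts.most_common()


# Hypothetical rows from three query-engine combinations.
rows = [
    {"competitors": "Rival A;Rival B"},
    {"competitors": "Rival A"},
    {"competitors": ""},
]
for name, count in competitor_frequency(rows):
    print(name, count)
```

Names are matched exactly, so "Rival A" and "rival a" would be counted separately. Settle on one spelling per competitor when recording, or normalise the names before counting.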
The first run is the baseline. It will probably be uncomfortable, and that is the point — every later run is measured against it, which turns AI visibility from an anxiety into a tracked metric.
Repeat it on a cadence
One run is a snapshot; the value is in the series. Monthly is the right cadence for most organisations — frequent enough to catch movement, infrequent enough to sustain. Keep the query set frozen between runs, and when you do add queries, add them alongside the original set rather than replacing it, so the baseline remains comparable.
Cadence matters for a second reason: AI answers are non-deterministic. The same query can produce different citations on different days, so visibility is properly measured as a rate over repeated runs, not a single snapshot. A citation that appears in one run and vanishes in the next is not a data error — it is what an unstable position looks like, and only a series reveals it.
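Measuring visibility as a rate can be sketched as follows: for each query-engine pair, take the share of runs in which the brand was cited, then flag the pairs that were cited in some runs but not in others. The field names and the `unstable_positions` helper are assumptions for illustration.

```python
from collections import defaultdict


def citation_rates(rows):
    """Map (query, engine) to the share of runs in which the brand was cited.

    rows: iterable of dicts with 'query', 'engine' and 'brand_cited',
    one dict per query per engine per run.
    """
    runs = defaultdict(list)
    for row in rows:
        runs[(row["query"], row["engine"])].append(bool(row["brand_cited"]))
    return {key: sum(hits) / len(hits) for key, hits in runs.items()}


def unstable_positions(rates):
    """Return the (query, engine) pairs cited in some runs but not all."""
    return sorted(key for key, rate in rates.items() if 0.0 < rate < 1.0)


# Hypothetical results from three monthly runs of one query on two engines.
rows = [
    {"query": "q1", "engine": "ChatGPT", "brand_cited": True},
    {"query": "q1", "engine": "ChatGPT", "brand_cited": False},
    {"query": "q1", "engine": "ChatGPT", "brand_cited": True},
    {"query": "q1", "engine": "Perplexity", "brand_cited": True},
    {"query": "q1", "engine": "Perplexity", "brand_cited": True},
    {"query": "q1", "engine": "Perplexity", "brand_cited": True},
]
rates = citation_rates(rows)
print(unstable_positions(rates))
```

With only a handful of runs, a rate such as two in three is a weak estimate, so treat the flag as a prompt to keep watching that query rather than as a verdict.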
Read the results as a diagnosis
The numbers tell you where you stand. Reading why requires a model of how the engines choose, and four factors do most of the explanatory work: content structure, entity clarity, authority signals and consistency of facts.
If competitors are cited for questions you have never answered in a dedicated, liftable passage, the gap is content structure — the engines extract passages, not pages, and they cannot cite an answer you have not written. If the engines describe you vaguely, confuse you with another organisation, or hedge on basic facts, the gap is entity clarity — schema, naming and organisational facts that let a model treat you as a known thing. If your content answers the question well but the citations keep going to publishers and directories, the gap is authority — the engines are choosing sources they already trust, and your version of the facts lacks third-party corroboration. And if your description varies engine to engine, check what the web says about you — conflicting facts across your site, directories and coverage give a model reason to hedge or cite someone whose story holds together.
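The tracing described above is essentially a lookup from symptom to factor. A minimal sketch, in which the symptom labels are hypothetical tags you might add as an extra spreadsheet column, could look like this:

```python
# Hypothetical symptom tags mapped to the four factors named above.
FACTOR_BY_SYMPTOM = {
    "competitor_cited_for_unanswered_question": "content structure",
    "vague_or_confused_description": "entity clarity",
    "citations_go_to_publishers_and_directories": "authority signals",
    "description_varies_between_engines": "consistency of facts",
}


def diagnose(symptoms):
    """Return the distinct factors implied by a list of symptom tags.

    Order follows first appearance; an unrecognised tag raises KeyError.
    """
    factors = []
    for symptom in symptoms:
        factor = FACTOR_BY_SYMPTOM[symptom]
        if factor not in factors:
            factors.append(factor)
    return factors


print(diagnose([
    "vague_or_confused_description",
    "description_varies_between_engines",
    "vague_or_confused_description",
]))
```

A real gap often shows more than one symptom, which is why the function returns a list. Deciding which factor to work on first remains a judgement call that no lookup table makes for you.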
The mechanics behind these factors (why models select the sources they do) are covered in more depth in a separate piece on what makes a model cite one brand over another. For the audit itself, the four factors are enough: every gap the spreadsheet surfaces will trace back to at least one of them, and that tracing is what converts a measurement exercise into a work program.
The honest limits of doing it yourself
This method is real and the results are usable. It also has limits worth stating plainly, because they are the difference between a self-run check and a professional audit.
Query sampling. Fifteen to thirty queries is a sample of a much larger space. The engines fan a single question out into many related searches, and buyers phrase the same intent dozens of ways. A small sample can miss the phrasings where the real contest is happening, and choosing a genuinely representative set is harder than it looks from inside the brand.
Personalisation and variance. Logged-out sessions reduce personalisation; they do not remove it, and they do not remove run-to-run variance. Reading signal from noise across four engines takes either statistical care or enough repeated runs that patterns settle — both of which cost time.
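One modest form of statistical care is to put an uncertainty range around citation share before reading anything into a change. The sketch below uses the Wilson score interval, a standard interval for a proportion. It is offered as one reasonable choice, not as part of the method above, and it treats every query-engine combination as independent, which is only approximately true.

```python
import math


def wilson_interval(cited, total, z=1.96):
    """Approximate 95% Wilson score interval for a citation share.

    cited: combinations where the brand appeared; total: all combinations.
    z=1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = cited / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical baseline: cited in 20 of 80 query-engine combinations.
low, high = wilson_interval(20, 80)
print(f"citation share 25%, plausible range {low:.0%} to {high:.0%}")
```

For 20 citations in 80 combinations the range runs from roughly 17% to 35%, so a move from 25% to 28% in the following month sits well inside the noise. Larger query sets and more repeated runs narrow the range.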
Time cost and interpretation. The recording is mechanical; the reading is not. Tracing a citation gap to its cause, ranking the fixes by commercial impact, and knowing which gaps are worth closing at all is judgement work, and it is where self-run audits most often stall — a spreadsheet full of findings and no sequenced plan.
None of that is a reason not to run the method. Run it. A brand that measures its own AI visibility monthly is doing something very few organisations in its category have started. But if the baseline is confronting, or the findings need to carry weight with a board, the fixed-scope AI search visibility audit is the senior-led version of exactly this method — the same questions, run with a defensible query set and documented evidence, ending in a competitive read and a roadmap ranked by commercial impact rather than a spreadsheet awaiting interpretation.