Technical SEO checklist for the AI era

The technical SEO checklist has been growing for twenty years. Every algorithm update added items; almost nothing was ever removed. The result, in most organisations, is a checklist of a couple of hundred items in which a handful genuinely matter, a few dozen mattered once, and the rest exist because a tool flags them and nobody has the authority to say they are noise.

AI search is a good moment to say it. The engines that now answer buying questions — AI Overviews, ChatGPT, Perplexity, Copilot — did not add a new technical discipline. They raised the price of failing the old one, and they made most of the rest irrelevant. Technical SEO in the AI era is a shorter list held to a higher standard.

This article makes the split honestly: what still matters and why, mechanically; what stopped mattering or never did; and the small set of new items the old checklists do not contain.

Why the split exists at all

The mechanics decide the list, so they are worth stating once. AI answer engines do not maintain a private map of the web. They retrieve from a search index, a crawl store, or a live fetch, and then generate an answer grounded in what came back. Google’s AI Overviews draw on Google’s index. ChatGPT’s search features depend on OpenAI’s own crawling alongside third-party search providers. Perplexity retrieves and reads pages in close to real time.

That retrieval step is the entire reason technical SEO survives. A page that cannot be crawled cannot be retrieved; a page that cannot be retrieved cannot be cited. The infrastructure questions — can a machine reach this content, read it, and understand what it is — matter exactly as much as they did in 2010, and for the same reason. What changed is everything downstream of retrieval. Ranking signals that decided position four versus position seven decide very little in an answer that names three brands and ignores the rest — how engines choose those brands is its own subject. The technical layer is now pass/fail: it cannot win you the citation, but it can silently disqualify you from it.

Everything below follows from that.

Still matters: crawlability and indexation

This is the item that carries the list. If crawlers cannot reach the page (because of robots.txt rules written years ago for a site that no longer exists, authentication walls, orphaned pages with no internal links, or redirect chains that exhaust the crawler’s patience), the page does not exist to any engine, traditional or generative. Indexation is the same gate one step later: a page crawled but not indexed, because it is duplicated, thin, or accidentally noindexed, is equally absent from the answer.
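An accidental noindex is one of the easier gates to test mechanically. The sketch below is illustrative rather than a complete audit: it assumes the page HTML and response headers have already been fetched, and the names RobotsMetaParser and is_noindexed are hypothetical. It looks only at the generic robots meta tag and the X-Robots-Tag header, not at crawler-specific variants.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def is_noindexed(html, headers=None):
    """Return True if the meta robots tag or X-Robots-Tag header says noindex.

    `headers` is an optional dict of response headers (an assumption of
    this sketch; adapt it to whatever HTTP client is in use).
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if any("noindex" in directive for directive in parser.directives):
        return True
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False
```

Run against the handful of pages that answer buying questions, a check like this turns a silent failure into a visible one.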

The reason this deserves standing attention is not that it is complicated. It is that it regresses. Every replatform, every staging deployment, every well-meaning developer copying a robots.txt across environments can quietly de-index commercially important pages, and nothing visible breaks. The site works; the traffic erodes; the citations never appear. Crawl and indexation status of the pages that answer buying questions is one of the few things worth checking on a schedule, because failure is silent and total.
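A scheduled check for the robots.txt regression described above can be small. This is a minimal sketch using Python's standard urllib.robotparser; the function name blocked_urls, the example URLs and the choice of "Googlebot" as the user agent are assumptions for illustration. Note that the standard-library parser does simple prefix matching and does not implement every wildcard extension that individual crawlers support.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that `user_agent` may not fetch under `robots_txt`.

    `robots_txt` is the file's text, already downloaded by the caller.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # A staging file copied to production by mistake blocks everything.
    staging_rules = "User-agent: *\nDisallow: /\n"
    important = [
        "https://example.com/pricing",
        "https://example.com/services/audit",
    ]
    for url in blocked_urls(staging_rules, important):
        print("BLOCKED:", url)
```

The list of important URLs is the part worth maintaining by hand: it should be the pages that answer buying questions, not the whole sitemap.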

Still matters: rendering

AI-era retrieval sharpened an old problem. Content that exists only after JavaScript executes — client-rendered product details, specifications loaded into tabs, answers assembled in the browser — depends on the crawler being willing to render it. In practice Google largely will, at a delay. Many AI crawlers and live-fetch systems largely will not: they take the server’s HTML response and move on.

The practical rule is blunt. Any content a brand wants quoted in an answer should be present in the server-rendered HTML — visible in view-source, not just in the browser. Server-side rendering or static generation for commercially important pages is no longer a performance preference; it is the difference between content that every retrieval system can read and content that only the most patient one can. Sites built as JavaScript applications with content injected at runtime are structurally invisible to a meaningful share of the systems now answering their buyers’ questions, and no amount of content quality compensates for markup the reader never receives.
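The view-source test can be automated. The sketch below is one possible approach, not a full rendering audit: it takes the raw HTML a server returned (fetching is left to the caller), ignores anything inside script, style and template elements, and reports which key phrases are absent. The names VisibleTextParser and missing_from_server_html are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping script, style and template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_server_html(html, phrases):
    """Return the phrases that do not appear in the un-rendered HTML text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and lowercase so matching ignores layout and case.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in phrases if " ".join(p.split()).lower() not in text]
```

A phrase that appears in the browser but is returned by this function exists only after JavaScript runs, which is exactly the content a non-rendering crawler never receives.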

Still matters: speed — but at the extremes

Site speed remains real, and the honest version is unfashionable: it matters at the extremes and barely anywhere else. A page that takes eight seconds to respond wastes crawl budget, fails live fetches, and loses human readers before the first paragraph. Fixing that is high-value work.

Shaving the last hundred milliseconds off an already-fast page is not. The industry built a cottage economy around chasing perfect Core Web Vitals scores, and the returns diminish sharply once a site is simply, ordinarily fast. A slow site is a technical finding worth budget. A fast site made marginally faster is an invoice.

Still matters: structured data, architecture, canonical hygiene

Three items share a mechanism, which is disambiguation — helping a machine be certain what a page is and which version of it counts.

Structured data earns its place because generative systems reason over entities, not just text. Schema that declares the organisation, its services, its people and its location in machine-readable terms gives every retrieval system a version of the facts that cannot be misread — the implementation detail sits in a companion article on the machine-readable brand layer, and the broader discipline of being a clearly-defined entity is covered under entity SEO. The point here is simply that schema moved from rich-snippet decoration to entity infrastructure.
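As a rough illustration of what "machine-readable terms" means in practice, the sketch below builds a schema.org Organization block as JSON-LD. The function name and every value passed to it are hypothetical placeholders; a real implementation would carry more properties (services, people) and should be validated against the schema.org vocabulary.

```python
import json


def organization_jsonld(name, url, same_as, street, city, country):
    """Return a JSON-LD <script> element describing an organisation."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles elsewhere that refer to the same entity.
        "sameAs": list(same_as),
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressCountry": country,
        },
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


if __name__ == "__main__":
    print(organization_jsonld(
        name="Example Consulting Ltd",
        url="https://example.com",
        same_as=["https://www.linkedin.com/company/example"],
        street="1 Example Street",
        city="London",
        country="GB",
    ))
```

Per the rendering rule earlier in the article, the resulting element belongs in the server-rendered HTML rather than being injected at runtime.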

Clean information architecture matters because retrieval is passage-level and context-hungry. A site where URL structure, internal links and headings agree about what each page is for gives an engine confidence about which page answers which question. A site where five pages half-answer the same question gives it a reason to cite none of them.

Canonical hygiene is the quiet one. Parameter duplicates, http/https splits, trailing-slash variants and syndicated copies without canonicals fragment one page’s standing across several URLs. Engines that cross-check sources treat unresolved duplication the way they treat any inconsistency — as a reason to prefer a cleaner competitor.
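The duplicate variants named above can be surfaced from a crawl export by normalising each URL and grouping the collisions. This is a minimal sketch under stated assumptions: the tracking-parameter list is illustrative rather than exhaustive, https is treated as the preferred scheme, trailing slashes are treated as insignificant, and the function names are hypothetical. A site where any of those assumptions is false needs different rules.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative only: extend with whatever parameters the site actually emits.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid",
}


def normalise(url):
    """Collapse scheme, host case, trailing slash and tracking parameters."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    return urlunsplit(("https", parts.netloc.lower(), path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Map each normalised URL to its crawled variants when there are several."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group returned is a set of URLs that should resolve to one address through redirects or a shared canonical tag.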

What stopped mattering — or never did

Now the other half of the ledger, which is longer.

Keyword-density rituals are dead, and generative retrieval killed them twice over — the systems reading pages today are language models, and they read meaning, not term frequency. The meta keywords tag has been ignored by Google for well over a decade and still appears in audits as a finding. Chasing 100/100 scores in auditing tools is a category error: the tools measure what is measurable, not what is commercial, and a 96 on a fast, crawlable, well-structured site is a number, not a problem.

The larger, less discussed category is the one-time item billed as ongoing work. Most entries on a typical technical SEO checklist — XML sitemaps, canonical tags, redirect maps, robots.txt, hreflang, HTTPS — are hygiene: configured correctly once, they stay correct until the site materially changes. They justify a thorough initial fix and a periodic check. They do not justify a permanent monthly line item, and a large share of technical SEO retainers are precisely that — one-time hygiene wearing the costume of an ongoing program, re-verified monthly and re-billed monthly because nobody asked the difference between monitoring and doing.

None of this means the items are worthless. It means they are finished, and finished work should stop being paid for.

The new entrants

Three items belong on the list that the old checklists do not contain.

The first is the AI crawler access decision. Sites now receive distinct crawlers with distinct purposes, and the controls are independent of each other. OpenAI’s crawler documentation distinguishes, among others: OAI-SearchBot, which surfaces sites in ChatGPT’s search answers; GPTBot, which gathers training data; and ChatGPT-User, which fetches pages on a user’s behalf. Search inclusion is controlled via OAI-SearchBot, not GPTBot — OpenAI directs publishers to use OAI-SearchBot in robots.txt for search opt-outs. Google’s equivalent control, Google-Extended, governs whether content trains or grounds Gemini models and, per Google’s documentation, does not affect inclusion or ranking in Google Search. These are distribution decisions with commercial consequences — who may read the content, who may answer with it — currently being made by default in robots.txt files nobody has reviewed since the crawlers appeared. Deciding them deliberately is new technical work, and it is strategy, not configuration.
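To make the independence of these controls concrete, the sketch below writes one possible policy and reads it back with Python's standard robots.txt parser. The policy itself is purely illustrative, not a recommendation: it happens to allow search and user-initiated fetches while opting out of training, and a different business could reasonably decide the opposite. The user-agent tokens are the ones named above; EXAMPLE_POLICY, access_matrix and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One illustrative policy; each crawler is decided separately.
EXAMPLE_POLICY = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

AI_AGENTS = ["OAI-SearchBot", "ChatGPT-User", "GPTBot", "Google-Extended"]


def access_matrix(robots_txt, url, agents=AI_AGENTS):
    """Return {agent: may_fetch} for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    matrix = access_matrix(EXAMPLE_POLICY, "https://example.com/pricing")
    for agent, allowed in matrix.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same function against the live robots.txt shows what the site currently decides by default, which is the review the paragraph above argues for.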

The second is extractable content structure: whether the pages that answer buying questions contain passages that survive being quoted alone — the answer stated declaratively, near the top, under a heading that names the question. This sits on the border of technical and editorial, and it is examined properly in the citation framework; it earns a mention here because templates decide it as much as writers do.

The third is entity markup treated as core infrastructure rather than an enhancement — already covered above, listed again only because its priority changed more than any other item on the list.

A shorter list, and what to do with it

Put together, the AI-era technical position is this: crawlability and indexation of the pages that matter, server-rendered content, absence of genuine speed failures, accurate structured data, coherent architecture, resolved duplication, and deliberate AI-crawler decisions. That is the whole list. Everything on it is verifiable, most of it is fixable once, and very little of it justifies recurring spend once fixed.

Which is why the right frame is governance, not activity. The commercial questions for a marketing leader are simple: is the short list actually right on our site, and are we still paying for the long list? Both are audit findings. The first is checked against the site itself in days; the second is checked against the invoices. Technical health on exactly these terms — what is broken, what is finished, what is being billed as ongoing that is not — is part of what an independent marketing audit verifies, alongside the paid accounts and the measurement the technical layer feeds.

The era did not complicate technical SEO. It clarified it. A machine either reaches, reads and understands your pages or it does not — and for the first time, almost everything else on the checklist can be safely crossed off.
