AI & SEO · 14 min read

Topical Authority — Building Topic Clusters and Internal Links That Google and AI Trust

Paweł Wiszniewski
SEO & GEO Specialist · AI Engineer

Topical authority is the degree to which a search engine — and, increasingly, an AI model — treats your domain as a credible, complete source of knowledge in a specific field. For years it was a "practitioners' folklore" concept, but since the Google Content Warehouse documentation leak (May 2024) we know topical concentration is measured directly: the siteFocusScore attribute (how strongly a domain concentrates on a topic) and siteRadius (how far a given page deviates from the site's topical core) are computed on domain-level embeddings. The practical conclusion of this article fits in one sentence: build complete content clusters around a narrow topic, bind them with internal links and descriptive anchors — and avoid topical sprawl and cannibalization.

The Google Content Warehouse leak confirmed that a domain's topical concentration is measured mathematically (siteFocusScore, siteRadius), and a 23-million-link study showed 4× more clicks for well-linked pages. The complete guide: the hub-and-spoke model, designing a topical map, data-driven internal linking and defending against cannibalization.

This is the third missing piece of the puzzle I've been building on this blog: E-E-A-T answers "who to trust", entity SEO — "who/what is this content about", and topical authority — "does this source know the topic exhaustively". In this guide: how we know topical authority exists algorithmically, how to design a hub-and-spoke cluster, what the data says about internal linking (Zyppy's 23-million-link study) and how not to kill the effect with cannibalization.

How we know topical authority exists — evidence, not folklore

The concept's history has three turning points:

  1. Hummingbird (2013): Google moves from matching phrases to understanding meanings and intent. From then on, the "topic" is the unit of understanding, not the keyword.
  2. Topic Layer (2018): Google officially announces a topic layer in the Knowledge Graph: hundreds of millions of topics connected by relationships, with an assessment of how a site's content covers a topic over time.
  3. The Content Warehouse leak (May 2024): the exposed API documentation includes, among others, the siteFocusScore, siteRadius and siteEmbeddings attributes. Topical authority stops being a hypothesis and becomes a computable signal.

/// GOOGLE CONTENT WAREHOUSE LEAK (2024) — TOPICAL ATTRIBUTES

Topical authority is not folklore — it is a computable, embedding-based signal

  • siteFocusScore: how strongly the domain concentrates on one topic. A higher value means a sharper topical profile of the site.
  • siteRadius: how far a given page deviates from the domain's topical core. Content far from the core may be scored worse.
  • siteEmbeddings: a vector representation of the whole domain, a mathematical "summary" of what the site is about, computed from all pages.

How does it work technically? Google builds a vector representation (embedding) of the whole domain — a mathematical "summary" of what the site is about. Every new page gets its own embedding, and the system can compute (e.g. via cosine similarity) how close it sits to the topical core. A page far from the core raises the siteRadius — and according to analyses of the leak, may be scored worse because it "doesn't fit" what the domain knows best.
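To make the geometry concrete, here is a minimal sketch of the idea, not Google's implementation: the leak names the attributes but does not publish formulas. I assume the site core is the plain average of page embeddings and that "distance from the core" is 1 minus cosine similarity. The URLs and the 3-dimensional vectors are made up for illustration; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean: an assumed stand-in for a site-level embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical toy embeddings (assumption: 3 dimensions, hand-picked values).
page_embeddings = {
    "/topical-authority": [0.9, 0.1, 0.0],
    "/internal-linking": [0.8, 0.2, 0.1],
    "/entity-seo": [0.85, 0.15, 0.05],
    "/office-christmas-party": [0.1, 0.2, 0.9],
}

site_core = centroid(list(page_embeddings.values()))

# Assumed distance measure: 1 - cosine similarity to the site core.
distances = {
    url: 1.0 - cosine_similarity(vector, site_core)
    for url, vector in page_embeddings.items()
}

for url, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{url}: {distance:.3f}")
```

The three SEO pages land close to the core, while the off-topic post sits far away and also pulls the average toward itself, which is the dilution effect in miniature.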

For a practitioner this means two things. First, a narrow, deep domain has a structural advantage over a broad portal writing "about everything" — a smaller site with a sharp profile can beat a giant with a blurry one. Second, every off-topic publication has a cost: it dilutes the domain embedding and weakens the concentration signal.

Why topical authority counts double in the AI era

Generative search engines amplified the weight of topic coverage for a very specific reason: query fan-out. Google AI Mode splits the user's question into a series of sub-queries and hunts for sources for each one separately — and a domain that covers the topic completely can answer many sub-queries at once. I break down the fan-out mechanics in the Google AI Overviews & AI Mode guide.

Add RAG mechanics: models pull semantically matching fragments from the index. A site with ten deep articles around one topic exposes dozens of "citable" fragments; a site with one shallow post — a few. Google's official guide to optimizing for AI features (May 2026) says nothing about tricks — it talks about unique value and complete intent coverage. That is precisely the definition of topical authority.
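A rough way to think about fan-out coverage is a simple share: of the sub-queries a question could be split into, how many does your cluster answer with at least one URL? The sketch below assumes you collected the sub-queries and the query-to-URL mapping by hand; both are hypothetical data, and the function name `fan_out_coverage` is mine, not part of any Google tool.

```python
def fan_out_coverage(sub_queries, answers):
    """Return (share of sub-queries with at least one answering URL, uncovered sub-queries)."""
    gaps = [query for query in sub_queries if not answers.get(query)]
    share = (len(sub_queries) - len(gaps)) / len(sub_queries)
    return share, gaps

# Hypothetical sub-queries for one broad question about topical authority.
sub_queries = [
    "what is topical authority",
    "how to build a topic cluster",
    "internal linking best practices",
    "how to fix keyword cannibalization",
]

# Hypothetical mapping: which of your URLs answers which sub-query.
answers = {
    "what is topical authority": ["/topical-authority"],
    "how to build a topic cluster": ["/topical-authority", "/hub-and-spoke"],
    "internal linking best practices": ["/internal-linking"],
}

share, gaps = fan_out_coverage(sub_queries, answers)
print(f"covered: {share:.0%}, gaps: {gaps}")
```

The gaps list doubles as a content backlog: every uncovered sub-query is a spoke the cluster is still missing.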

The hub-and-spoke model: a pillar page plus its cluster

The standard topical architecture is the hub-and-spoke model:

/// THE HUB-AND-SPOKE MODEL — CLUSTER ANATOMY

Bi-directional linking turns a set of articles into a cluster

PILLAR PAGE (HUB): the complete guide to the main topic (2,500+ words), targets the head phrase, links to every spoke
  ↑↓ bi-directional links
SPOKES (each links to the pillar + 2–3 neighboring spokes):
  • Subtopic 1 (long-tail)
  • Comparison ("X vs Y")
  • Case study (Experience)
  • Service page (conversion)

  • The pillar page (hub) — a broad, complete guide to the main topic (2,500+ words), targeting the widest phrase. It answers all the key questions at an overview level and links to every spoke.
  • Spokes (cluster content) — narrower articles deepening single subtopics, each targeting a specific long-tail intent. Every spoke links back to the pillar and to 2–3 neighboring spokes.
  • Bi-directional linking — this is what turns a set of articles into a cluster: it signals to the search engine that the pages form a whole and distributes authority from the pillar down and back up.
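The linking rules above are easy to check mechanically once you have a map of which page links to which. A minimal sketch, assuming the link map comes from your own crawl or CMS export; the URLs and the `audit_cluster` helper are hypothetical, and the threshold of 2 neighbors is the lower end of the 2–3 rule.

```python
def audit_cluster(pillar, spokes, links, min_neighbors=2):
    """List violations of the hub-and-spoke linking rules.

    links maps each URL to the set of URLs it links to.
    """
    problems = []
    spoke_set = set(spokes)
    for spoke in spokes:
        if spoke not in links.get(pillar, set()):
            problems.append(f"pillar does not link to {spoke}")
        outgoing = links.get(spoke, set())
        if pillar not in outgoing:
            problems.append(f"{spoke} does not link back to the pillar")
        neighbors = outgoing & (spoke_set - {spoke})
        if len(neighbors) < min_neighbors:
            problems.append(f"{spoke} links to only {len(neighbors)} neighboring spoke(s)")
    return problems

# Hypothetical cluster with three deliberate mistakes.
pillar = "/invoice-automation"
spokes = ["/ai-invoice-ocr", "/n8n-vs-make-vs-zapier", "/case-study", "/ai-automation-service"]
links = {
    "/invoice-automation": {"/ai-invoice-ocr", "/n8n-vs-make-vs-zapier", "/case-study"},
    "/ai-invoice-ocr": {"/invoice-automation", "/n8n-vs-make-vs-zapier", "/case-study"},
    "/n8n-vs-make-vs-zapier": {"/invoice-automation", "/ai-invoice-ocr", "/case-study"},
    "/case-study": {"/ai-invoice-ocr", "/ai-automation-service"},
    "/ai-automation-service": {"/invoice-automation", "/case-study"},
}

problems = audit_cluster(pillar, spokes, links)
for problem in problems:
    print(problem)
```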

The practical architecture rule: every important page at most 3 clicks from the homepage. Content buried deeper gets less crawl budget and fewer signals — I describe how that works in the crawl budget article.
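Click depth is a shortest-path problem, so a breadth-first search from the homepage answers it. A minimal sketch, assuming a link map from your own crawl; the URLs are hypothetical. Pages that never appear in the result are unreachable by links at all.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from the homepage to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site structure.
links = {
    "/": ["/blog", "/services"],
    "/blog": ["/blog/topical-authority"],
    "/blog/topical-authority": ["/blog/internal-linking"],
    "/blog/internal-linking": ["/blog/old-case-study"],
}
all_pages = ["/", "/blog", "/services", "/blog/topical-authority",
             "/blog/internal-linking", "/blog/old-case-study", "/blog/forgotten-post"]

depths = click_depths(links)
too_deep = [page for page in all_pages if depths.get(page, 0) > 3]
unreachable = [page for page in all_pages if page not in depths]

print("deeper than 3 clicks:", too_deep)
print("unreachable by links:", unreachable)
```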

| Cluster element | Role | Example (an "invoice automation" cluster) |
|---|---|---|
| Pillar page | The complete topic guide, targets the head phrase | "Invoice workflow automation: the complete guide" |
| Informational spoke | Deepens a subtopic, targets long-tail | "How AI reads invoices from email into your ERP" |
| Comparison spoke | Supports decisions, targets "vs" and "which" phrases | "n8n vs Make vs Zapier: a comparison" |
| Commercial spoke | Converts, targets service phrases | The "AI Automation" service page linked from the cluster |
| Experience content | E-E-A-T proof, AI citability | A case study with implementation numbers |

How to design a topical map

You design a cluster before writing, not after. The proven order:

  1. Define the central entity. One sentence: "my domain is the expert on X." If you can't finish that sentence, you have a strategy problem, not a content problem.
  2. Derive subtopics from intent, not keywords. Four buckets: what the user wants to *know* (informational), *compare* (commercial research), *do* (instructional), *buy* (transactional). The phrases will follow.
  3. Collect real questions. "People Also Ask" sections, customer questions from emails and sales calls, industry threads. In the fan-out era, every question is a potential sub-query your cluster should have an answer for.
  4. Map existing content onto the structure. What you already have, what needs updating, what's missing, and which old posts do NOT fit the topic (candidates for removal or rewriting; remember siteRadius).
  5. Set the publication order. The pillar first (a "version 1.0" you'll expand is fine), then spokes by intent priority. A cluster left incomplete for a year is a cluster that doesn't work.

The largest public study of internal linking (Zyppy: 23M links, 1,800 sites, ~520K URLs matched with Search Console data) produced numbers worth memorizing:

/// INTERNAL LINKING IN NUMBERS

~4×
more Google clicks for pages with 40–44 internal links vs 0–4 links
Zyppy, 23M links
~5×
more traffic for pages with at least one descriptive (exact-match) anchor
Zyppy, 23M links
45–50
inbound links: above this threshold the effect reverses and the signal dilutes
Zyppy, 23M links
≤3 clicks
from the homepage to every important page (the cluster architecture rule)
Hub-and-spoke practice

  • Pages with 40–44 internal links earned ~4× more Google clicks than pages with 0–4 links. Internal links aren't cosmetics — they're one of the cheapest traffic levers you have.
  • Above ~45–50 inbound internal links the effect reverses — the signal dilutes. More isn't better; distribution to the right pages is.
  • Pages with at least one exact-match anchor had ~5× more traffic than pages without one. A "see more" anchor is a wasted link — a descriptive anchor tells the search engine and the AI model what the target page is about.

An honesty caveat: this is a correlational study, not an experiment — treat the numbers as strong directional guidance, not laws of physics. The practical rules that follow from it (and from practice):

  • Link contextually from body copy, not only from navigation and the footer — in-text links carry semantic context.
  • Vary descriptive anchors around the same intent ("AI visibility audit", "how to check your brand's AI visibility") instead of repeating one pattern.
  • Breadcrumbs + a logical URL hierarchy — a cheap, systemic linking layer.
  • Zero orphans: every published page needs at least 2–3 inbound links from its cluster. A page without internal links is a page that doesn't exist.
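These rules can be turned into a quick audit over a list of internal links. A minimal sketch under my own assumptions: links arrive as (source, target, anchor) tuples from a crawl, inbound links are counted as distinct linking pages, the generic-anchor list is a hand-picked sample, and the thresholds (at least 2 inbound, at most 50) mirror the numbers quoted above rather than any official limit.

```python
from collections import defaultdict

# Assumed sample of non-descriptive anchors; extend it for your own language and site.
GENERIC_ANCHORS = {"click here", "see more", "read more", "here"}

def audit_links(pages, links, min_inbound=2, max_inbound=50):
    """Return (under-linked pages, over-linked pages, links with generic anchors)."""
    inbound = defaultdict(set)
    generic = []
    for source, target, anchor in links:
        if source != target:  # self-links do not count as inbound
            inbound[target].add(source)
        if anchor.strip().lower() in GENERIC_ANCHORS:
            generic.append((source, target, anchor))
    under_linked = sorted(p for p in pages if len(inbound[p]) < min_inbound)
    over_linked = sorted(p for p in pages if len(inbound[p]) > max_inbound)
    return under_linked, over_linked, generic

# Hypothetical crawl output.
pages = ["/pillar", "/spoke-a", "/spoke-b", "/orphan"]
links = [
    ("/pillar", "/spoke-a", "how AI reads invoices from email"),
    ("/pillar", "/spoke-b", "see more"),
    ("/spoke-a", "/pillar", "invoice workflow automation guide"),
    ("/spoke-b", "/pillar", "invoice automation"),
    ("/spoke-a", "/spoke-b", "n8n vs Make vs Zapier comparison"),
    ("/spoke-b", "/spoke-a", "reading invoices with AI"),
]

under_linked, over_linked, generic = audit_links(pages, links)
print("under-linked:", under_linked)
print("over-linked:", over_linked)
print("generic anchors:", generic)
```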

Cannibalization — the shadow of badly built clusters

A cluster builds authority only when one intent has one URL. When two articles target the same question, Google (and AI models) must choose between them — the signals split, and often both lose. Symptoms: two pages "flip-flopping" in results for the same phrase, neither holding a stable position, CTR lower than the position would suggest.

How to diagnose and fix:

  1. Detection: open the Search Console performance report and filter by query. If two URLs collect impressions for the same query, you have a candidate. Auxiliary check: site:yourdomain.com phrase.
  2. Decision: which URL is stronger (age, inbound links, slug-to-intent match)?
  3. Fix: merge the content into the winner (move over the loser's unique value) and 301 the losing URL. For partial overlap, sharpening titles/metas and separating intents is enough.
  4. Close the loop: update internal links so they don't route through the redirect.
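The detection step can be automated over exported performance data. A minimal sketch, assuming each row is a dict with `query`, `page` and `impressions` keys (my own simplified shape, so rename the fields to match your export) and an arbitrary noise threshold of 50 impressions. The URLs and numbers are invented.

```python
from collections import defaultdict

def cannibalization_candidates(rows, min_impressions=50):
    """Map each query to the URLs competing for it, strongest first.

    A query is a candidate when at least two URLs each reach min_impressions.
    """
    by_query = defaultdict(dict)
    for row in rows:
        pages = by_query[row["query"]]
        pages[row["page"]] = pages.get(row["page"], 0) + row["impressions"]

    candidates = {}
    for query, pages in by_query.items():
        strong = {page: n for page, n in pages.items() if n >= min_impressions}
        if len(strong) >= 2:
            candidates[query] = sorted(strong, key=strong.get, reverse=True)
    return candidates

# Hypothetical export rows.
rows = [
    {"query": "generative engine optimization", "page": "/geo-guide", "impressions": 900},
    {"query": "generative engine optimization", "page": "/geo-basics", "impressions": 400},
    {"query": "topical authority", "page": "/topical-authority", "impressions": 700},
    {"query": "topical authority", "page": "/entity-seo", "impressions": 12},
]

candidates = cannibalization_candidates(rows)
for query, pages in candidates.items():
    print(f"{query}: {pages}")
```

The output is only a list of candidates: two URLs sharing a query can be legitimate when they serve different intents, so the decision step still needs a human.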

First-hand: I ran exactly this operation on this blog — two broad GEO guides competed for the same head phrase, so I merged them into one Generative Engine Optimization pillar with a 301 on the weaker URL. A cluster of 60+ articles needs this review every few months — cannibalization emerges naturally as a topic accretes content.

The step-by-step rollout plan

  1. Audit your topical concentration. List all publications and assign them to topics. What share of content sits outside the core? That's your approximate "siteRadius to fix".
  2. Pick 1–3 strategic clusters where competence, demand and business value intersect. Better to close one cluster than open five.
  3. Design the topical map (process above) with a list of pillars and spokes and each page's intent.
  4. Write or rebuild the pillar in answer-first format: direct answers at the top of sections, tables, data, an FAQ.
  5. Publish spokes in series and link bi-directionally from day one (pillar ↔ spoke, spoke ↔ 2–3 neighbors).
  6. Link backwards. After every new publication, add 2–3 links from existing cluster content with descriptive anchors.
  7. Clean up off-topic content. Update and fold into clusters, rewrite closer to the core, and remove no-value, no-traffic content with a redirect to the topically nearest URL.
  8. Close the entity and trust layer. Consistent structured data, an author bio on every piece, sources next to claims. A cluster without E-E-A-T is a skeleton without muscles.
  9. A quarterly cannibalization review (process above) plus pillar updates with a visible date.
  10. Measure like an analyst: positions and clicks for the whole cluster (not single pages), the share of topic queries you answer in the top 10, and your Share of Voice in AI answers.
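Steps 1 and 10 both come down to simple arithmetic over a content inventory. A minimal sketch, assuming you have labeled each URL with a topic by hand and pulled clicks per URL from your analytics; the inventory, topic labels and click numbers are hypothetical, and the off-core share is only a crude proxy, not the siteRadius value itself.

```python
from collections import Counter

def off_core_share(inventory, core_topics):
    """Share of URLs whose topic is outside the core, plus the list of those URLs."""
    off_core = [url for url, topic in inventory.items() if topic not in core_topics]
    return len(off_core) / len(inventory), off_core

def clicks_per_cluster(inventory, clicks):
    """Sum clicks by topic so results are read at cluster level, not page level."""
    totals = Counter()
    for url, topic in inventory.items():
        totals[topic] += clicks.get(url, 0)
    return totals

# Hypothetical inventory: URL -> hand-assigned topic.
inventory = {
    "/topical-authority": "seo",
    "/internal-linking": "seo",
    "/geo-guide": "geo",
    "/office-party": "company news",
}
clicks = {"/topical-authority": 120, "/internal-linking": 80, "/geo-guide": 200}

share, off_core = off_core_share(inventory, core_topics={"seo", "geo"})
totals = clicks_per_cluster(inventory, clicks)

print(f"off-core share: {share:.0%} ({off_core})")
print(dict(totals))
```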

The most common mistakes

  • Publishing for volume instead of coverage. Fifty shallow posts won't build the authority twelve deep ones covering every intent will.
  • A cluster without linking. Articles on a shared topic without mutual links aren't a cluster — they're an archive.
  • "Click here" anchors. They waste the cheapest semantic signal you have.
  • Ignoring the cost of off-topic content. An "about everything" post on an expert domain dilutes the site's embedding (siteRadius grows).
  • Opening clusters without closing them. Five half-done topics lose to one complete one.
  • Confusing topical authority with word count. Intent coverage and unique value matter, not volume — a long empty text loses in RAG to a short dense one.

Summary

Topical authority has stopped being SEO folklore: the Content Warehouse leak showed Google computes a domain's topical concentration mathematically, and the AI era raised the stakes — query fan-out and RAG reward domains that cover a topic completely. The recipe is known and measurable: a sharply defined central entity, hub-and-spoke clusters with bi-directional linking, descriptive anchors, a quarterly cannibalization review, and patience measured in months.

Strategically: start with one cluster at the intersection of competence and business value, close it before opening the next — and measure results at the cluster level, in AI answers too, not just in rankings.

---

I design topical maps and build content clusters that Google and AI models recognize — as part of technical SEO and AI optimization (GEO). I teach it in the SEO & GEO course. Get in touch — I'll start with an audit of your domain's topical concentration and a map of the missing content.

About the author: Paweł Wiszniewski

SEO & GEO specialist and AI engineer from Białystok. 10 years building search visibility for recognized brands and 3 years delivering AI — agents, automation and LLM integrations (Next.js, React, Node.js).
