Topical Authority — Building Topic Clusters and Internal Links That Google and AI Trust
Topical authority is the degree to which a search engine (and, increasingly, an AI model) treats your domain as a credible, complete source of knowledge in a specific field. For years it was practitioners' folklore, but the Google Content Warehouse documentation leak (May 2024) showed that topical concentration is tracked as explicit attributes: siteFocusScore (how strongly a domain concentrates on one topic) and siteRadius (how far the site's page embeddings deviate from the site-level embedding). The practical conclusion of this article fits in one sentence: build complete content clusters around a narrow topic, bind them with internal links and descriptive anchors, and avoid topical sprawl and cannibalization.
The Google Content Warehouse leak confirmed that a domain's topical concentration is measured mathematically (siteFocusScore, siteRadius), and a 23-million-link study showed 4× more clicks for well-linked pages. The complete guide: the hub-and-spoke model, designing a topical map, data-driven internal linking and defending against cannibalization.
This is the third missing piece of the puzzle I've been building on this blog: E-E-A-T answers "who to trust", entity SEO — "who/what is this content about", and topical authority — "does this source know the topic exhaustively". In this guide: how we know topical authority exists algorithmically, how to design a hub-and-spoke cluster, what the data says about internal linking (Zyppy's 23-million-link study) and how not to kill the effect with cannibalization.
How we know topical authority exists — evidence, not folklore
The concept's history has three turning points:
1. Hummingbird (2013): Google moves from matching phrases to understanding meanings and intent. From then on, the "topic" is the unit of understanding, not the keyword.
2. Topic Layer (2018): Google officially announces a Topic Layer in the Knowledge Graph. It is built by analyzing web content on a given topic, breaks that topic into subtopics, and identifies the most relevant evergreen and fresh content for each of them.
3. The Content Warehouse leak (May 2024): the exposed API documentation includes, among others, the siteFocusScore, siteRadius and siteEmbeddings attributes. Topical authority stops being a hypothesis and becomes a computable signal.
[Figure: Google Content Warehouse leak (2024), topical attributes. Caption: topical authority is not folklore; it is a computable, embedding-based signal.]
How does it work technically? Google builds a vector representation (embedding) of the whole domain — a mathematical "summary" of what the site is about. Every new page gets its own embedding, and the system can compute (e.g. via cosine similarity) how close it sits to the topical core. A page far from the core raises the siteRadius — and according to analyses of the leak, may be scored worse because it "doesn't fit" what the domain knows best.
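To make the geometry concrete, here is a minimal Python sketch. It rests on two illustrative assumptions that the leaked documentation does not confirm: that the site embedding is the plain mean of the page embeddings, and that a "radius" can be approximated as the average cosine distance of pages from that mean. The function names and the toy three-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def site_centroid(page_embeddings):
    """ASSUMPTION: the site embedding is the mean of its page embeddings."""
    n = len(page_embeddings)
    return [sum(column) / n for column in zip(*page_embeddings)]

def page_distances(page_embeddings):
    """Cosine distance (1 - similarity) of every page from the centroid."""
    centroid = site_centroid(page_embeddings)
    return [1.0 - cosine_similarity(p, centroid) for p in page_embeddings]

def illustrative_radius(page_embeddings):
    """ASSUMPTION: 'radius' is the mean distance from the centroid.
    This is a stand-in, not Google's formula for siteRadius."""
    distances = page_distances(page_embeddings)
    return sum(distances) / len(distances)

# Toy vectors; real embeddings have hundreds of dimensions.
focused_site = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.15, 0.0]]
with_off_topic = focused_site + [[0.0, 0.1, 0.9]]

if __name__ == "__main__":
    print(illustrative_radius(focused_site))
    print(illustrative_radius(with_off_topic))
    print(page_distances(with_off_topic))
```

In this toy setup the single off-topic vector both sits farthest from the centroid and pulls the centroid toward itself, so the on-topic pages also end up slightly farther from the core than before. That is the dilution effect described in the text, shown under the stated assumptions only.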
For a practitioner this means two things. First, a narrow, deep domain has a structural advantage over a broad portal writing "about everything" — a smaller site with a sharp profile can beat a giant with a blurry one. Second, every off-topic publication has a cost: it dilutes the domain embedding and weakens the concentration signal.
Why topical authority counts double in the AI era
Generative search engines amplified the weight of topic coverage for a very specific reason: query fan-out. Google AI Mode splits the user's question into a series of sub-queries and hunts for sources for each one separately — and a domain that covers the topic completely can answer many sub-queries at once. I break down the fan-out mechanics in the Google AI Overviews & AI Mode guide.
Add RAG mechanics: models pull semantically matching fragments from the index. A site with ten deep articles around one topic exposes dozens of "citable" fragments; a site with one shallow post — a few. Google's official guide to optimizing for AI features (May 2026) says nothing about tricks — it talks about unique value and complete intent coverage. That is precisely the definition of topical authority.
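A simple way to reason about fan-out is as bookkeeping: list the sub-queries you expect for a head question and note which URL in your cluster answers each one. The sketch below does only that. It does not model how AI Mode actually generates or scores sub-queries; the sub-query list, the mapping and the URL paths are hand-made, hypothetical inputs.

```python
def fan_out_coverage(sub_queries, answered_by):
    """Split expected sub-queries into covered ones and gaps.

    answered_by maps a sub-query to the URL on your site that answers it.
    It is a hand-made mapping, not something a search engine exposes.
    """
    covered = {q: answered_by[q] for q in sub_queries if q in answered_by}
    gaps = [q for q in sub_queries if q not in answered_by]
    share = len(covered) / len(sub_queries) if sub_queries else 0.0
    return covered, gaps, share

# Hypothetical sub-queries for the "invoice automation" cluster.
sub_queries = [
    "what is invoice automation",
    "invoice automation tools compared",
    "how to connect invoices to an ERP",
    "invoice automation pricing",
]
answered_by = {
    "what is invoice automation": "/invoice-automation",
    "invoice automation tools compared": "/n8n-vs-make-vs-zapier",
    "how to connect invoices to an ERP": "/ai-reads-invoices",
}

if __name__ == "__main__":
    covered, gaps, share = fan_out_coverage(sub_queries, answered_by)
    print(f"coverage: {share:.0%}, gaps: {gaps}")
```

The gaps list doubles as a content backlog: every uncovered sub-query is a candidate spoke.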
The hub-and-spoke model: a pillar page plus its cluster
The standard topical architecture is the hub-and-spoke model:
[Figure: The hub-and-spoke model, cluster anatomy. Caption: bi-directional linking turns a set of articles into a cluster.]
- The pillar page (hub) — a broad, complete guide to the main topic (2,500+ words), targeting the widest phrase. It answers all the key questions at an overview level and links to every spoke.
- Spokes (cluster content) — narrower articles deepening single subtopics, each targeting a specific long-tail intent. Every spoke links back to the pillar and to 2–3 neighboring spokes.
- Bi-directional linking — this is what turns a set of articles into a cluster: it signals to the search engine that the pages form a whole and distributes authority from the pillar down and back up.
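The linking rules in the list above are easy to check mechanically once you have a crawl export. The sketch below assumes the link graph is available as a dictionary that maps each URL to the set of URLs it links to. The function names and URL paths are hypothetical.

```python
def missing_cluster_links(pillar, spokes, links):
    """Return (source, target) pairs that hub-and-spoke requires but the site lacks.

    links maps each URL to the set of URLs it links to.
    """
    missing = []
    for spoke in spokes:
        if spoke not in links.get(pillar, set()):
            missing.append((pillar, spoke))   # pillar should link to every spoke
        if pillar not in links.get(spoke, set()):
            missing.append((spoke, pillar))   # every spoke should link back
    return missing

def spokes_with_few_neighbors(spokes, links, minimum=2):
    """Spokes that link to fewer than `minimum` other spokes in the cluster."""
    spoke_set = set(spokes)
    return [
        s for s in spokes
        if len((links.get(s, set()) & spoke_set) - {s}) < minimum
    ]

# Hypothetical crawl of a small cluster.
pillar = "/invoice-automation"
spokes = ["/ai-reads-invoices", "/n8n-vs-make-vs-zapier", "/case-study"]
links = {
    "/invoice-automation": {"/ai-reads-invoices", "/n8n-vs-make-vs-zapier"},
    "/ai-reads-invoices": {"/invoice-automation", "/n8n-vs-make-vs-zapier", "/case-study"},
    "/n8n-vs-make-vs-zapier": {"/ai-reads-invoices"},
    "/case-study": {"/invoice-automation", "/ai-reads-invoices", "/n8n-vs-make-vs-zapier"},
}

if __name__ == "__main__":
    print(missing_cluster_links(pillar, spokes, links))
    print(spokes_with_few_neighbors(spokes, links))
```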
The practical architecture rule: every important page at most 3 clicks from the homepage. Content buried deeper gets less crawl budget and fewer signals — I describe how that works in the crawl budget article.
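Click depth is the shortest path from the homepage in the internal link graph, so a breadth-first search is enough to audit the three-click rule. This is a minimal sketch that assumes the same kind of crawl export (URL to outgoing links); the paths are hypothetical.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from `home` to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:          # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def too_deep(links, home="/", max_clicks=3):
    """Reachable pages that sit more than `max_clicks` from the homepage."""
    depths = click_depths(links, home)
    return sorted(url for url, depth in depths.items() if depth > max_clicks)

# Hypothetical site where an old post is only reachable through pagination.
site = {
    "/": ["/blog"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/old-post"],
}

if __name__ == "__main__":
    print(click_depths(site))
    print(too_deep(site))
```

Pages that never appear in the result of `click_depths` are unreachable through internal links at all, which makes them orphans rather than merely deep pages.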
| Cluster element | Role | Example (an "invoice automation" cluster) |
|---|---|---|
| Pillar page | The complete topic guide, targets the head phrase | "Invoice workflow automation — the complete guide" |
| Informational spoke | Deepens a subtopic, targets long-tail | "How AI reads invoices from email into your ERP" |
| Comparison spoke | Supports decisions, targets "vs" and "which" phrases | "n8n vs Make vs Zapier — a comparison" |
| Commercial spoke | Converts, targets service phrases | The "AI Automation" service page linked from the cluster |
| Experience content | E-E-A-T proof, AI citability | A case study with implementation numbers |
How to design a topical map
You design a cluster before writing, not after. The proven order:
1. Define the central entity. One sentence: "my domain is the expert on X." If you can't finish that sentence, you have a strategy problem, not a content problem.
2. Derive subtopics from intent, not keywords. Four buckets: what the user wants to *know* (informational), *compare* (commercial research), *do* (instructional), *buy* (transactional). The phrases will follow.
3. Collect real questions. "People Also Ask" sections, customer questions from emails and sales calls, industry threads. In the fan-out era, every question is a potential sub-query your cluster should have an answer for.
4. Map existing content onto the structure. What you already have, what needs updating, what's missing, and which old posts do NOT fit the topic (candidates for removal or rewriting; remember siteRadius).
5. Set the publication order. The pillar first (a "version 1.0" you'll expand is fine), then spokes by intent priority. A cluster left incomplete for a year is a cluster that doesn't work.
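A topical map can live in a spreadsheet, but it helps to see it as a data structure. The sketch below is one possible shape, not a standard: each planned page carries one of the four intent buckets from step 2, a role and a status from step 4. The class, field names and status labels are hypothetical.

```python
from dataclasses import dataclass

INTENTS = ("know", "compare", "do", "buy")

@dataclass
class PlannedPage:
    title: str
    intent: str   # one of INTENTS
    role: str     # "pillar" or "spoke"
    status: str   # "published", "needs_update" or "missing"

def uncovered_intents(pages):
    """Intent buckets that have no existing page at all."""
    covered = {p.intent for p in pages if p.status != "missing"}
    return [intent for intent in INTENTS if intent not in covered]

def backlog(pages):
    """Pages still to write or update, pillar first (step 5)."""
    todo = [p for p in pages if p.status != "published"]
    return sorted(todo, key=lambda p: p.role != "pillar")

# Hypothetical map for the "invoice automation" cluster.
topical_map = [
    PlannedPage("Invoice workflow automation: the complete guide", "know", "pillar", "published"),
    PlannedPage("How AI reads invoices from email into your ERP", "know", "spoke", "published"),
    PlannedPage("n8n vs Make vs Zapier: a comparison", "compare", "spoke", "needs_update"),
    PlannedPage("AI Automation service page", "buy", "spoke", "missing"),
]

if __name__ == "__main__":
    print(uncovered_intents(topical_map))
    print([p.title for p in backlog(topical_map)])
```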
Internal linking — what 23 million links say
The largest public study of internal linking (Zyppy: 23M links, 1,800 sites, ~520K URLs matched with Search Console data) produced numbers worth memorizing:
/// INTERNAL LINKING IN NUMBERS
- Pages with 40–44 internal links earned ~4× more Google clicks than pages with 0–4 links. Internal links aren't cosmetics — they're one of the cheapest traffic levers you have.
- Above ~45–50 inbound internal links the effect reverses — the signal dilutes. More isn't better; distribution to the right pages is.
- Pages with at least one exact-match anchor had ~5× more traffic than pages without one. A "see more" anchor is a wasted link — a descriptive anchor tells the search engine and the AI model what the target page is about.
An honesty caveat: this is a correlational study, not an experiment — treat the numbers as strong directional guidance, not laws of physics. The practical rules that follow from it (and from practice):
- Link contextually from body copy, not only from navigation and the footer — in-text links carry semantic context.
- Vary descriptive anchors around the same intent ("AI visibility audit", "how to check your brand's AI visibility") instead of repeating one pattern.
- Breadcrumbs + a logical URL hierarchy — a cheap, systemic linking layer.
- Zero orphans: every published page needs at least 2–3 inbound links from its cluster. A page without internal links is a page that doesn't exist.
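Two of the rules above, zero orphans and no wasted anchors, can be audited from a plain list of internal links. The sketch assumes each link is a `(source, target, anchor_text)` tuple taken from a crawl. The set of generic anchors is a hypothetical starting list to extend with your own patterns, and the paths are made up.

```python
GENERIC_ANCHORS = {"click here", "see more", "read more", "here", "more"}

def inbound_counts(pages, links):
    """Number of distinct pages linking to each page (self-links ignored)."""
    sources = {page: set() for page in pages}
    for source, target, _anchor in links:
        if target in sources and source != target:
            sources[target].add(source)
    return {page: len(found) for page, found in sources.items()}

def under_linked(pages, links, minimum=2):
    """Pages with fewer than `minimum` inbound internal links, orphans included."""
    counts = inbound_counts(pages, links)
    return sorted(page for page in pages if counts[page] < minimum)

def generic_anchor_links(links):
    """Links whose anchor text says nothing about the target page."""
    return [
        (source, target, anchor)
        for source, target, anchor in links
        if anchor.strip().lower() in GENERIC_ANCHORS
    ]

# Hypothetical crawl export.
pages = ["/pillar", "/spoke-a", "/spoke-b", "/spoke-c"]
links = [
    ("/pillar", "/spoke-a", "how AI reads invoices"),
    ("/spoke-b", "/spoke-a", "see more"),
    ("/spoke-a", "/pillar", "invoice automation guide"),
    ("/spoke-b", "/pillar", "invoice workflow automation"),
    ("/pillar", "/spoke-b", "Click here"),
]

if __name__ == "__main__":
    print(inbound_counts(pages, links))
    print(under_linked(pages, links))
    print(generic_anchor_links(links))
```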
Cannibalization — the shadow of badly built clusters
A cluster builds authority only when one intent has one URL. When two articles target the same question, Google (and AI models) must choose between them — the signals split, and often both lose. Symptoms: two pages "flip-flopping" in results for the same phrase, neither holding a stable position, CTR lower than the position would suggest.
How to diagnose and fix:
1. Detection: the Search Console performance report filtered by query. If two URLs collect impressions for the same query, you have a candidate. Auxiliary check: site:yourdomain.com phrase.
2. Decision: which URL is stronger (age, inbound links, slug-to-intent match)?
3. Fix: merge the content into the winner (move over the loser's unique value) and 301 the losing URL. For partial overlap, sharpening titles/metas and separating intents is enough.
4. Close the loop: update internal links so they don't route through the redirect.
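Steps 1 and 4 lend themselves to a short script. The sketch below assumes you have exported performance rows with `query`, `page` and `impressions` fields (hypothetical field names; adapt them to your export) and that you keep a map of 301 redirects. The impression threshold is an arbitrary filter against noise, not a value taken from any study.

```python
from collections import defaultdict

def cannibalization_candidates(rows, min_impressions=50):
    """Queries for which two or more URLs each collect meaningful impressions."""
    by_query = defaultdict(dict)
    for row in rows:
        pages = by_query[row["query"]]
        pages[row["page"]] = pages.get(row["page"], 0) + row["impressions"]
    candidates = {}
    for query, pages in by_query.items():
        significant = {p: i for p, i in pages.items() if i >= min_impressions}
        if len(significant) >= 2:
            # Strongest URL first: the likely merge target.
            candidates[query] = sorted(significant, key=significant.get, reverse=True)
    return candidates

def final_target(url, redirects):
    """Follow a redirect map to its end so internal links can point there directly."""
    seen = set()
    while url in redirects and url not in seen:   # `seen` guards against loops
        seen.add(url)
        url = redirects[url]
    return url

# Hypothetical export and redirect map.
rows = [
    {"query": "generative engine optimization", "page": "/geo-guide", "impressions": 900},
    {"query": "generative engine optimization", "page": "/geo-strategy", "impressions": 400},
    {"query": "llms.txt", "page": "/llms-txt", "impressions": 700},
    {"query": "llms.txt", "page": "/geo-guide", "impressions": 10},
]
redirects = {"/geo-strategy": "/geo-guide"}

if __name__ == "__main__":
    print(cannibalization_candidates(rows))
    print(final_target("/geo-strategy", redirects))
```

A candidate is only a prompt for the decision in step 2. Two URLs can legitimately share a query when they serve different intents.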
First-hand: I ran exactly this operation on this blog — two broad GEO guides competed for the same head phrase, so I merged them into one Generative Engine Optimization pillar with a 301 on the weaker URL. A cluster of 60+ articles needs this review every few months — cannibalization emerges naturally as a topic accretes content.
The step-by-step rollout plan
1. Audit your topical concentration. List all publications and assign them to topics. What share of content sits outside the core? That's your approximate "siteRadius to fix".
2. Pick 1–3 strategic clusters where competence, demand and business value intersect. Better to close one cluster than open five.
3. Design the topical map (process above) with a list of pillars and spokes and each page's intent.
4. Write or rebuild the pillar in answer-first format: direct answers at the top of sections, tables, data, an FAQ.
5. Publish spokes in series and link bi-directionally from day one (pillar ↔ spoke, spoke ↔ 2–3 neighbors).
6. Link backwards. After every new publication, add 2–3 links from existing cluster content with descriptive anchors.
7. Clean up off-topic content. Update and fold into clusters, rewrite closer to the core, and remove no-value, no-traffic content with a redirect to the topically nearest URL.
8. Close the entity and trust layer. Consistent structured data, an author bio on every piece, sources next to claims. A cluster without E-E-A-T is a skeleton without muscles.
9. A quarterly cannibalization review (process above) plus pillar updates with a visible date.
10. Measure like an analyst: positions and clicks for the whole cluster (not single pages), the share of topic queries you answer in the top 10, and your Share of Voice in AI answers.
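The audit in step 1 and the measurement in step 10 both come down to grouping pages by topic or cluster before counting. The sketch below assumes a hand-made page-to-topic assignment and performance rows with `page`, `query`, `clicks`, `impressions` and `position` fields. All names are hypothetical, and the off-core share is a rough proxy rather than a reconstruction of siteRadius.

```python
from collections import defaultdict

def off_core_share(page_topics, core_topics):
    """Step 1: share of pages whose assigned topic is outside the core."""
    off_core = [url for url, topic in page_topics.items() if topic not in core_topics]
    return len(off_core) / len(page_topics)

def cluster_totals(rows, page_cluster):
    """Step 10: clicks and impressions summed per cluster, not per page."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for row in rows:
        cluster = page_cluster.get(row["page"], "unassigned")
        totals[cluster]["clicks"] += row["clicks"]
        totals[cluster]["impressions"] += row["impressions"]
    return dict(totals)

def top10_query_share(rows, cluster_queries):
    """Step 10: share of target queries where some URL averages position 10 or better."""
    ranked = {
        row["query"]
        for row in rows
        if row["query"] in cluster_queries and row["position"] <= 10
    }
    return len(ranked) / len(cluster_queries)

# Hypothetical inputs.
page_topics = {"/a": "seo", "/b": "seo", "/c": "travel", "/d": "geo"}
page_cluster = {"/a": "invoice", "/b": "invoice"}
rows = [
    {"page": "/a", "query": "q1", "clicks": 10, "impressions": 100, "position": 4.0},
    {"page": "/b", "query": "q2", "clicks": 5, "impressions": 50, "position": 12.0},
    {"page": "/c", "query": "q3", "clicks": 1, "impressions": 20, "position": 8.0},
]

if __name__ == "__main__":
    print(off_core_share(page_topics, {"seo", "geo"}))
    print(cluster_totals(rows, page_cluster))
    print(top10_query_share(rows, {"q1", "q2"}))
```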
The most common mistakes
- Publishing for volume instead of coverage. Fifty shallow posts won't build the authority that twelve deep ones covering every intent will.
- A cluster without linking. Articles on a shared topic without mutual links aren't a cluster — they're an archive.
- "Click here" anchors. They waste the cheapest semantic signal you have.
- Ignoring the cost of off-topic content. An "about everything" post on an expert domain dilutes the site's embedding (siteRadius grows).
- Opening clusters without closing them. Five half-done topics lose to one complete one.
- Confusing topical authority with word count. Intent coverage and unique value matter, not volume — a long empty text loses in RAG to a short dense one.
Summary
Topical authority has stopped being SEO folklore: the Content Warehouse leak showed Google computes a domain's topical concentration mathematically, and the AI era raised the stakes — query fan-out and RAG reward domains that cover a topic completely. The recipe is known and measurable: a sharply defined central entity, hub-and-spoke clusters with bi-directional linking, descriptive anchors, a quarterly cannibalization review, and patience measured in months.
Strategically: start with one cluster at the intersection of competence and business value, close it before opening the next — and measure results at the cluster level, in AI answers too, not just in rankings.
---
I design topical maps and build content clusters that Google and AI models recognize — as part of technical SEO and AI optimization (GEO). I teach it in the SEO & GEO course. Get in touch — I'll start with an audit of your domain's topical concentration and a map of the missing content.
Related articles
- E-E-A-T in 2026 — building trust for Google and AI
- Entity SEO and the Knowledge Graph — semantic optimization
- Google AI Overviews & AI Mode — optimization
- GEO — Generative Engine Optimization: guide & strategy
- Link Building 2026 — how to earn links that build authority
- Crawl Budget — how Google indexes a large site

SEO & GEO specialist and AI engineer from Białystok. 10 years building search visibility for recognized brands and 3 years delivering AI — agents, automation and LLM integrations (Next.js, React, Node.js).
/// SOURCES
- 01 Hobo – Topical Authority: Site Radius & Site Focus Score from the Google Leak
- 02 Szymon Słowik – SiteFocus, siteRadius and topical authority in SEO
- 03 Zyppy – 23 Million Internal Links: SEO Case Study
- 04 Search Engine Land – The complete guide to topic clusters and pillar pages
- 05 Google – Optimizing your website for generative AI features
