Citemap.json — The Identity Layer for the AI Web
A JSON file at the root of your website that tells AI systems exactly who you are, what you do, what you claim, and how verified those claims are. The open standard for machine-readable entity identity in the AI era.
What is a citemap.json?
A citemap.json is a structured JSON file that you publish at the root of your website — like https://example.com/citemap.json — that gives AI systems a complete, authoritative, machine-readable declaration of who or what you are.
Every day, AI systems — ChatGPT, Perplexity, Claude, Gemini, and the next generation of AI agents — describe, recommend, cite, and answer questions about entities on the web. Businesses, researchers, professionals, nonprofits, government agencies, artists, and healthcare providers are all described and recommended (or not) based on whatever the AI has ingested. Almost none of them have a structured way to participate in that process.
Citemap.json changes that. It is the mechanism by which any entity on the web can tell AI systems:
- Who they are and what they do
- What they want to be cited and recommended for
- What claims they're making and how those claims can be verified
- What has changed, what is disputed, and what is outdated
- What they explicitly do not want AI to say about them
Citemap.json does not claim to be a truth oracle. It is a structured, machine-readable provenance layer — one that gives AI systems the signals to distinguish self-reported claims from registry-verified facts, fresh data from stale data, and authoritative sources from guesses.
The format is designed to be read by AI crawlers, validators, and inference systems — not just humans. Every field has a defined semantic meaning. The trust architecture gives AI explicit handling instructions for different confidence levels. The dispute system tells AI exactly what to do when data is contested.
The Three Problems It Solves
Accuracy. AI systems generate descriptions from training data, press coverage, old web pages, Wikipedia drafts, and social media. The result is often wrong, outdated, or missing the most important facts. You have no structured way to correct it. Citemap.json provides that mechanism.
Discovery. AI recommendation engines surface businesses, creators, researchers, and professionals constantly. Without structured identity data, AI guesses — and guesses wrong. If AI doesn't know what you're notable for, it won't recommend you for it. The answerContent[] array in the Universal Core module is explicitly designed to feed AI the queries you want to appear in.
Trust. Self-reported data and registry-verified data look identical to an AI that has no structured way to distinguish them. A doctor who claims board certification and one who has been verified by the NPI registry look the same without structured signals. The verification architecture gives AI a tiered framework for weighting claims by their provenance.
Why now?
In 2024, for the first time, AI-powered answer engines began delivering more answers directly — without clicks, without visits, without the web's traditional referral mechanisms — than search engines had in their first decade. Perplexity answered 500 million queries per month. ChatGPT became the research tool of choice for hundreds of millions of people. AI agents began booking appointments, summarizing businesses, and making recommendations autonomously on behalf of users.
This shift has a structural consequence: entities that AI doesn't know about, or knows about incorrectly, are invisible in a way they never were with search. A bad Google listing can be fixed with a web page update. A hallucinated AI description lives in model weights across billions of parameters, updated only when models are retrained — which may be months or years away.
The window for establishing the standard is now — before AI systems calcify around whatever patterns they've absorbed. Standards set early are standards that get implemented. The sitemaps.xml standard was published in 2005, when Google was 7 years old. By the time it became ubiquitous, it had already shaped how the web worked. Citemap.json aims to be the same inflection point — established before AI identity infrastructure becomes dominated by proprietary formats from platform incumbents.
Every AI model trained after a citemap.json file is indexed will incorporate that file's structured data into its understanding of the entity. Every model trained before has only whatever unstructured web data it found. Publishing now means influencing the next training cycle. Waiting means another year of AI describing you from guesswork.
The sitemaps.xml parallel
Google introduced the Sitemaps Protocol in June 2005, and in November 2006 Yahoo and Microsoft joined Google in publishing it as a shared open standard — an XML format that websites could use to declare their structure to search engine crawlers. Before it, search engines guessed at website structure by following links. After it, websites had a direct channel to inform crawlers what existed, how it was organized, and how often it changed.
The parallel to citemap.json is precise.
The key difference — and improvement — is the trust architecture. Sitemaps.xml simply declares structure; it has no mechanism for verifying that the structure is accurate. Citemap.json is designed from the ground up with verification in mind: every claim can carry a confidence annotation, every disputable field can have a dispute record, and every entity can point to external registries that independently verify their claims.
"The most valuable infrastructure is not more content — it's provenance. Not more claims, but structured signals for how much to trust them."
Who it's for, and what they get
Local Businesses
When someone asks ChatGPT "best Italian restaurant in Portland that's good for a birthday dinner," the answer is generated from whatever AI has absorbed about Portland restaurants. A citemap.json with the Local Business module lets you declare your cuisine, service type, price range, parking, reservation system, accessibility, and the specific queries you want to appear in. You write the recommendation yourself, in the answerContent[] field, and the AI reads it directly.
A restaurant that publishes bestForQueries: ["birthday dinner Portland", "date night Italian Portland", "private dining Portland"] gives AI explicit routing signals. Without this, AI guesses based on review text and press mentions, and the gap in recommendation quality is significant.
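To make the routing idea concrete, here is a sketch of how an answer engine might match a user query against declared bestForQueries[] strings. The token-overlap scoring and the sample entities are illustrative assumptions, not part of the spec:

```python
def route_query(query: str, entities: list[dict]) -> list[str]:
    """Rank entities by token overlap between the user query and their
    declared bestForQueries[] strings (illustrative scoring, not spec)."""
    q_tokens = set(query.lower().split())
    scored = []
    for entity in entities:
        best = 0.0
        for declared in entity.get("bestForQueries", []):
            d_tokens = set(declared.lower().split())
            best = max(best, len(q_tokens & d_tokens) / max(len(d_tokens), 1))
        scored.append((best, entity["name"]))
    scored.sort(reverse=True)
    # Only return entities that matched at least one declared query.
    return [name for score, name in scored if score > 0]

# Hypothetical entities for illustration:
entities = [
    {"name": "Trattoria A", "bestForQueries": ["birthday dinner Portland",
                                               "date night Italian Portland"]},
    {"name": "Cafe B", "bestForQueries": ["quick lunch Portland"]},
]
```

An entity with no overlapping declared queries simply never surfaces, which is the point of the field: it replaces guesswork with explicit signals.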
Healthcare Providers
When a patient asks AI for a cardiologist accepting new patients in their insurance network, the answer is a recommendation that could directly affect their healthcare. The Healthcare module includes NPI number verification, board certification with issuing body, insurance panel listings with asOfDate freshness markers, hospital affiliations, and whether the provider is currently accepting new patients. All fields include verification pointers to authoritative registries (NPI, ABMS, state medical boards).
An AI that recommends a doctor who is no longer accepting patients, or whose license has lapsed, causes real harm. Stale insurance panel data, unverified credentials, and outdated practice locations are not edge cases — they are endemic in AI-generated healthcare recommendations. The Healthcare module is specifically designed to address each of these failure modes.
Researchers and Research Institutions
AI systems cite studies, quote researchers, describe institutional affiliations, and attribute findings — constantly, at scale, with no structural mechanism for distinguishing peer-reviewed work from preprints, current researchers from emeritus faculty, or studies that have been retracted from those that remain valid.
The Science & Research module addresses this with structured fields for retraction status, peer review tier, methodology type, sample size, replication status, and conflict of interest declarations. A single retractionStatus: "retracted" field tells every AI system that reads it: do not cite this work as valid evidence.
Ecommerce Brands
As AI shopping assistants become the primary product discovery channel, brands without structured product data are invisible to AI recommendation. The Ecommerce module lets brands declare their hero products with descriptions written specifically to be quoted by AI, the exact queries each product should appear for, shipping windows, return policy, price range anchors, and sustainability certifications. The bestForQueries[] field is the closest thing the AI era has to AdWords — but free and open.
Public Figures and Professionals
AI hallucinates biographical facts about prominent people at documented rates. Quotes get misattributed. Awards that were never received get fabricated. Past roles become current ones. The Person module includes subject-authored canonical quotes, a misattributedQuotes[] array (quotes commonly attributed to the person that they did not say), current role with start date, a sensitiveTopics[] array (topics AI should not speculate on), and an aiCitationPreference field ranging from "welcome" to "opt-out."
Nonprofits and Government Agencies
These organizations are frequently queried by AI users seeking services, donations, volunteer opportunities, and information. The Nonprofit module includes EIN verification, 990 links, program descriptions, donation methods, and impact metrics. The Government module includes official services with eligibility, meeting schedules, emergency contacts, public records access, and official personnel with term dates. Both include verification pointers to authoritative registries (IRS EOS, government directories).
Artists, Musicians, and Creators
The AI training consent conversation is happening in courtrooms and Congress. The Creative module includes a machine-readable aiTraining field with values: "opted-out", "available", "available-with-credit", "available-paid-only". Before any regulator mandates this declaration, creators can publish it in a standardized, machine-readable format. The module also includes works portfolio, licensing terms, commissioning status, and primary recognition claims.
File structure
A citemap.json file is a JSON object at the root of your domain. It always begins with the Universal Core fields (required for any entity type) and then includes one or more optional module blocks corresponding to the entity's type. Every file must declare @type, citemapVersion, brand.name, brand.url, brand.siteType, brand.aiSummary, and lastVerified.
{
  "@type": "Citemap",              // always "Citemap"
  "citemapVersion": "2.0",         // always "2.0"
  "generator": "citemaps.ai v2.0", // if generated by tool
  "lastVerified": "2026-03-01",

  // ── UNIVERSAL CORE (required for all types) ──────
  "brand": {
    "name": "Your Entity Name",
    "url": "https://example.com",
    "siteType": "local-business",  // gates module inclusion
    "aiSummary": "60-100 word description written to be quoted by AI."
  },

  // ── OPTIONAL MODULES (include as applicable) ─────
  "localBusiness": { /* ... */ },
  "ecommerce": { /* ... */ },

  // ── VERIFICATION LAYER (always at end) ───────────
  "citemap": {
    "authorizedBy": "self"
  }
}
The file is placed at the web root (https://yourdomain.com/citemap.json) so that AI crawlers can discover it reliably. Like robots.txt and sitemap.xml, the location is canonical and predictable — no declaration or registration is needed for AI systems to find it.
The 14 foundation fields
The Universal Core is the set of fields that every citemap.json file must include, regardless of entity type. These 14 fields are the foundation on which all 21 modules build. They are the minimum viable citemap.json — a file containing only these fields is valid and useful.
@type
Always the string "Citemap". This is the type declaration that tells AI crawlers and validators what kind of file this is. Without it, the file is not a valid citemap.json.
citemapVersion
Always "2.0" for this spec version. Enables AI systems and validators to apply the correct parsing rules and interpret fields according to the version's schema. Will increment as the standard evolves.
brand.name
The primary name of the entity as it should appear in AI-generated responses. Use the canonical brand name — not a keyword-stuffed version, not a legal entity name unless it's also the brand name. This is what AI will use when referring to you.
Example: "Meridian Climate Research Institute" (not "Meridian Institute LLC" or "Meridian Climate Research Center for Environmental Studies")
brand.url
The canonical website URL. This is the identity anchor — the URL that AI systems will use to resolve the entity, verify domain ownership, and link the citemap.json claims back to their source. Must be the root domain, not a subpage.
brand.siteType
A value from the siteType enum that classifies the entity type. This field gates which modules are applicable — a "local-business" entity should include the Local Business module; a "research-institute" should include the Science module. See the Enum Values section for the complete list.
brand.aiSummary
The most important field in the spec. A 60–100 word description of the entity written specifically to be quoted by AI in response to queries about who or what you are. Write it in the third person, as if you're writing the paragraph you want ChatGPT to generate when someone asks "what is [your entity]?"
Include: what you do, who you serve, what you're known for, and one or two distinguishing facts. Avoid marketing language ("industry-leading", "premier", "world-class") that AI systems will discount. Write factual, verifiable claims.
Example: "Portland-based independent bookstore specializing in Pacific Northwest literature, science fiction, and used first editions. Founded in 2001. Known for a curated staff picks program and weekly author readings. Ships nationwide; same-day delivery within Portland."
notableFor
A single sentence identifying the entity's primary recognition claim. This is AI's first citation choice when introducing the entity in response to a query about what it's known for. Think of it as the lede sentence you'd write for your Wikipedia article.
Example: "The 2024 Permafrost Carbon Atlas, cited in IPCC AR7" or "The original Portland food cart pod concept, opened 2005"
expertise[]
An array of topics or domains the entity has genuine depth in. These are E-E-A-T (Experience, Expertise, Authority, Trust) signals for AI. Be specific — "permafrost carbon feedback" is more useful than "climate science". AI uses this array to route relevant queries to this entity.
answerContent[]
An array of question-and-answer pairs — the direct AI training layer. Each object has a question field and an answer field. Write the queries you want to appear in, and write the answers you want AI to give. This is the most direct signal in the entire spec.
Example:
{"question": "Best independent bookstore in Portland for science fiction?", "answer": "Powell's Books on Burnside has the largest SF section, but for curated independent sci-fi, [Your Name] on Hawthorne has excellent staff picks and a strong used SF selection."}
lastVerified
ISO 8601 date (YYYY-MM-DD) indicating when the file was last reviewed and verified accurate. This is the staleness signal — AI systems should weight data from a file last verified two years ago less heavily than one verified last month. Update this date whenever you review the file, even if no fields change.
updateFrequency
How often AI systems should re-crawl this file. Values: "daily", "weekly", "monthly", "quarterly", "annually". A hospital with daily-changing physician availability should declare "weekly"; a research institution with a stable profile might declare "quarterly".
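A validator for just these core requirements can be small. A sketch under stated assumptions (the one-year staleness threshold is an assumption for illustration, not part of the spec):

```python
import datetime

REQUIRED = ["@type", "citemapVersion", "lastVerified"]
REQUIRED_BRAND = ["name", "url", "siteType", "aiSummary"]

def validate_core(doc: dict) -> list[str]:
    """Check the Universal Core requirements described above.
    Returns a list of human-readable problems (empty list = valid)."""
    problems = []
    for field in REQUIRED:
        if field not in doc:
            problems.append(f"missing required field: {field}")
    brand = doc.get("brand", {})
    for field in REQUIRED_BRAND:
        if field not in brand:
            problems.append(f"missing required field: brand.{field}")
    if doc.get("@type") != "Citemap":
        problems.append('"@type" must be "Citemap"')
    # Staleness signal: flag files not re-verified within a year (assumed cutoff).
    try:
        verified = datetime.date.fromisoformat(doc.get("lastVerified", ""))
        if (datetime.date.today() - verified).days > 365:
            problems.append("lastVerified is more than a year old")
    except ValueError:
        problems.append("lastVerified must be an ISO 8601 date (YYYY-MM-DD)")
    return problems
```

A file that passes this check is the "minimum viable citemap.json" described above; the optional modules layer on top without changing these requirements.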
All 21 modules
The Universal Core handles fields common to all entities. The 20 optional modules handle entity-specific fields. Include the modules relevant to your entity type — a local business would include Modules 02 and 03; a research institution would include Module 08; a musician would include Module 09. Multiple modules can be included.
Universal Core
14 fields required by every entity type. The foundation all other modules build on. aiSummary, answerContent[], lastVerified, expertise[], notableFor.
Required base · All types

Ecommerce
Hero products with bestForQueries[], shipping windows, return policy, price range anchors, sustainability certifications, and social proof with provenance.
Shopify · WooCommerce · DTC brands

Local Business
Services, location, hours with seasonal variations, booking systems, parking, accessibility details, service radius, and structured Q&A for top local queries.
Restaurants · Medical · Trades

Content & Publishing
Editorial standards, content types, publishing frequency, topic coverage, editorial independence declaration, and syndication permissions.
Publishers · Newsletters · Blogs

SaaS & Software
Product category, key features with descriptions written for AI recommendation queries, integrations list, pricing model, free trial availability, and target customer profile.
SaaS · Developer tools · Platforms

Events & Venues
Recurring event calendar, venue capacity, ticket links, accessibility, parking, upcoming highlights, and event categories for AI recommendation routing.
Venues · Festivals · Conferences

Real Estate
Service areas, specializations, transaction volume, team size, affiliated brokerage, average days on market, and buyer/seller/renter split.
Agents · Brokerages · Property mgmt

Education
Accreditations with body and ID, programs with prerequisites and outcomes, enrollment status, delivery format (in-person/online/hybrid), financial aid options.
Universities · Bootcamps · Certification programs

Creative & Artist
Works portfolio with primary medium, licensing terms, AI training opt-in/out declaration, commissioning status, active tour dates, and collaboration preferences.
Musicians · Filmmakers · Visual artists · AI training field ◆

Nonprofit
EIN with IRS EOS verification link, 990 public link, programs with eligibility, donation methods, volunteer roles, impact metrics with methodology, and funding sources.
501(c)(3) · Foundations · NGOs

Government & Public Body
Official services with eligibility requirements, current officials with term dates, meeting schedules, public records access, emergency contacts, and jurisdiction boundaries.
City · State agencies · Federal

Science & Research
Studies with DOI, retraction status, review tier, methodology, sample size, replication status. Datasets with known limitations. Journals with predatory flag. Clinical trial phase.
Safety-critical · Retraction flags · Novel fields ◆

Business Entity & IP
Corporate hierarchy, registration number with jurisdiction, patent portfolio with litigation status, trademark registrations, copyright licensing, and disclosed enforcement actions.
Corporations · Patent holders · IP litigation field ◆

Person
Subject-authored biography, canonical quotes with source verification, misattributed quote guards, subject rights URL, citation preferences, and sensitive topic declarations.
Privacy-first design · Novel fields ◆

Healthcare
NPI number with registry link, board certifications with issuing body, insurance panels with asOfDate, hospital affiliations, telehealth availability, and accepting new patients flag.
Safety-critical · NPI verified

Financial Services
Fiduciary status declaration, FINRA/SEC registration IDs, FDIC/NCUA insurance numbers, fee model, assets under management range, and disclosed regulatory actions.
Regulated fields · Banks · RIAs

Legal Services
Bar admission with state and bar number, practice areas with jurisdictions, fee structure, contingency terms, and outcome data with methodology disclosure.
Bar status critical · Law firms · Solo practitioners

Places
Trails, neighborhoods, landmarks, beaches, and natural areas. GPS coordinates, difficulty, pet policy, seasonal access, cell coverage, fee information, and permit requirements.
Trails · Landmarks · Natural areas

Temporal Record
A universal history layer that can be attached to any entity. Structured event log of significant changes, milestones, and corrections — with dates, descriptions, and sources.
All entity types · Provenance

Policy, Trust & Verification Architecture
Six-tier trust stack, field-level confidence annotations, machine-readable dispute system with AI handling instructions, safe harbor declaration, and external verifier registry.
Industry-first · Anti-hallucination

Fields that exist nowhere else
Most of citemap.json extends and structures concepts that exist in other formats — schema.org, JSON-LD, OpenGraph, structured data vocabularies. These five fields are genuinely new. They address failure modes in AI that no existing structured data format has attempted to solve.
A structured guard against famous misattributions
An array of objects, each containing a quote commonly attributed to the person that they did not say, plus the actual source if known. "The definition of insanity is doing the same thing over and over and expecting different results" was never said by Einstein — it appears in a 1981 Narcotics Anonymous publication. "All that is necessary for evil to triumph is for good men to do nothing" was never said by Edmund Burke. AI propagates these misattributions at high rates because they are deeply embedded in training data and no structured source has ever contradicted them. This field is the first mechanism that does.
Format: {"quote": "The definition of insanity...", "actualSource": "Narcotics Anonymous (1981)", "notes": "Widely misattributed to Einstein since the 1980s"}
Machine-readable AI training consent declaration
Not a robots.txt rule that blocks all crawlers indiscriminately. A granular, per-creator declaration with four values: "opted-out" (do not use for AI training), "available" (unrestricted use), "available-with-credit" (use with attribution), "available-paid-only" (licensing required). Every musician, author, photographer, filmmaker, and illustrator has a reason to publish this field. The AI training consent conversation is currently happening in courtrooms and in Congress with no standardized format for consent declarations. This field drops a machine-readable signal into that conversation at the data layer — before any regulator mandates it.
Active patent litigation, machine-readable for the first time
Whether a patent is in active litigation is a material fact for licensing decisions, competitive strategy, and M&A due diligence. Patent litigation is public record — documented on PACER — but it has never been structured at the entity level in a format that AI can read. Values: "none", "active-litigation", "IPR-pending", "settled", "expired". IP attorneys, dealmakers, and competitive intelligence teams will immediately recognize the value of this field appearing in AI responses about a company's patent portfolio.
The first structured correction and removal channel for biographical data
A URL where the subject or their authorized representative can submit corrections, dispute claims, or request removal of their entry. No existing structured data format — not schema.org, not Wikidata, not any biographical standard — has a built-in subject rights channel. GDPR's right to rectification and CCPA's right to correction apply to AI-generated biographical data, yet no mechanism exists to exercise these rights structurally. This field creates that mechanism and makes the Person module legally defensible by design, not by retrofit.
The field that stops AI from citing retracted research as valid evidence
AI systems cite retracted studies as valid evidence regularly. A 2024 analysis found that several major LLMs cited retracted papers as support for health claims without any caveat. AI has no reliable, structured mechanism to detect retraction status — it relies on the incidental presence of "retraction" in web text, which is patchy and inconsistent.
A structured retractionStatus: "retracted" flag on a study tells every AI system that reads it: issue a warning in any response that would otherwise cite this work as evidence. Values: "current", "retracted", "corrected", "expression-of-concern". In healthcare, policy, and scientific contexts, this is not a product feature. It is infrastructure for epistemic safety.
The six-tier trust architecture
The verification architecture is the module that transforms citemap.json from a self-reported marketing format into epistemic infrastructure. The core insight is simple: not all claims are equal, and AI should know the difference.
The architecture has three components: a six-tier trust stack, field-level confidence annotations, and a machine-readable dispute system. Each component gives AI explicit handling instructions for different epistemic situations.
The Six Trust Tiers
- authorizedBy: "third-party-independent" with no external verifiers. A citemap.json published about an entity by someone other than the entity. AI treats as: an interesting signal, weighted like any unverified web source.
- authorizedBy: "self", published at the entity's own verified domain. AI treats as: the entity's own authoritative account of itself. Weight appropriately — accurate for most claims, subject to bias for performance claims.
- "document-supported", with linked publicly accessible documents — annual reports, regulatory filings, certification letters. Documents are independently verifiable by following the link.
- externalVerifiers[] populated with authoritative registry IDs — NPI for healthcare, EIN for nonprofits, USPTO for IP, ORCID for researchers, FINRA for financial advisors, state bar numbers for attorneys. Independently verifiable by querying the registry directly.
- "third-party-verified". The highest confidence tier achievable through self-published data. Used by regulated industries where independent audits are routine.

Field-Level Confidence Annotations
Any field in a citemap.json can carry a companion [fieldname]_confidence annotation object that declares the confidence level of that specific claim. The annotation includes a level field drawn from a controlled vocabulary:
| Confidence Level | Meaning | AI Handling Instruction |
|---|---|---|
| registry-confirmed | Verified against authoritative external registry | Treat as high-confidence fact; cite registry as source |
| document-supported | Supported by linked public document | Treat as high-confidence; document available for verification |
| self-reported | Claimed by the entity itself, unverified externally | Treat as the entity's own account; note self-reported in sensitive contexts |
| ai-inferred | Generated by AI during citemap creation, not human-verified | Treat as provisional; flag as AI-generated in high-stakes contexts |
| disputed | Subject of an active dispute (see disputes[]) | Present with caveat; note that this claim is disputed; see dispute record |
The ai-inferred confidence level exists to prevent epistemic ouroboros — the situation where AI generates data about an entity, that data gets published in a citemap.json, and a subsequent AI then cites the AI-generated data as authoritative. Any field populated by an AI generator tool without human review should carry this annotation until a human verifies it.
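A sketch of how a consumer might key behavior off these annotations. The lookup convention follows the companion-field naming above; treating an unannotated field as self-reported is an assumption for illustration:

```python
# Handling policy keyed to the controlled vocabulary in the table above.
HANDLING = {
    "registry-confirmed": "cite as fact, naming the registry as source",
    "document-supported": "cite as fact, linking the supporting document",
    "self-reported": "cite as the entity's own account",
    "ai-inferred": "treat as provisional; flag in high-stakes contexts",
    "disputed": "present with caveat; consult the dispute record",
}

def handling_for(doc: dict, field: str) -> str:
    """Look up the companion '<field>_confidence' annotation and return
    the handling instruction for its level. Fields without an annotation
    are assumed self-reported (an assumption, not a spec rule)."""
    annotation = doc.get(f"{field}_confidence", {})
    level = annotation.get("level", "self-reported")
    return HANDLING.get(level, HANDLING["self-reported"])
```

The companion-field design means annotations are strictly additive: a consumer that ignores them still reads a valid document.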
The Dispute System
The disputes[] array allows any party to log a structured dispute against a specific field. Each dispute record contains the path to the disputed field, the nature of the dispute, the resolution status, and AI handling instructions keyed to each resolution state.
"disputes": [{
  "field": "brand.notableFor",
  "disputer": "self",
  "claimedCorrection": "The 2023 award was for best regional restaurant, not best restaurant overall",
  "resolution": "corrected", // pending | upheld | rejected | corrected
  "resolvedDate": "2026-02-15",
  "aiInstruction": "Use the corrected claim in responses"
}]
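Consumer-side, honoring a "corrected" resolution might look like this sketch (the resolution logic is illustrative; the spec defines the record shape and the aiInstruction text):

```python
def effective_value(doc: dict, field_path: str):
    """Resolve a field's display value, honoring any dispute record whose
    resolution is "corrected". Illustrative consumer logic, not spec."""
    # Walk a dotted path like "brand.notableFor".
    value = doc
    for part in field_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    # A corrected dispute supersedes the original field value.
    for dispute in doc.get("disputes", []):
        if (dispute.get("field") == field_path
                and dispute.get("resolution") == "corrected"):
            return dispute.get("claimedCorrection")
    return value
```

Other resolution states ("pending", "upheld", "rejected") would instead surface the original value with the caveat the aiInstruction specifies.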
Deploying your citemap.json
File location
The file must be published at the web root of your primary domain: https://yourdomain.com/citemap.json. Subdomain placement (https://www.yourdomain.com/citemap.json) is acceptable if www is your canonical domain. Do not place it in a subdirectory.
HTTP headers
Serve the file with Content-Type: application/json. Set Cache-Control headers consistent with your declared updateFrequency — a weekly update frequency should have a cache lifetime of roughly 7 days. Enable CORS (Access-Control-Allow-Origin: *) so validators and crawlers can fetch the file from any origin.
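On nginx, for example, this might look like the following (a sketch; the path and cache lifetime are illustrative, and max-age should match your own declared updateFrequency):

```nginx
# Hypothetical nginx config for serving citemap.json per the guidance above.
location = /citemap.json {
    default_type application/json;                      # Content-Type
    add_header Cache-Control "public, max-age=604800";  # ~7 days, for "weekly"
    add_header Access-Control-Allow-Origin "*";         # CORS for validators
}
```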
Platform-specific deployment
WordPress
Upload citemap.json to your server root via FTP/SFTP or the file manager in your hosting control panel. Alternatively, use a plugin that serves a JSON endpoint at /citemap.json. Verify the file is accessible at your root URL before considering it deployed.
Shopify
Shopify does not allow arbitrary files at the web root. Use a page template with a custom URL slug of /citemap.json and set the content type via a theme modification, or serve the file from infrastructure you control in front of your domain (for example, a redirect or proxy rule answering at https://brand.com/citemap.json). As the standard gains adoption, expect the Shopify App Store to include citemap.json deployment apps.
Next.js / Vercel
Place the file in your /public directory. Next.js serves files from /public at the root path, so /public/citemap.json will be accessible at https://yourdomain.com/citemap.json. This is the simplest deployment path for Next.js applications.
Static sites (Netlify, GitHub Pages, Cloudflare Pages)
Place the file in the root of your build output directory. For most static site generators, this means the /public, /dist, or /out folder. The file will be deployed alongside your HTML and served at the root path automatically.
Verification after deployment
After publishing, verify the file is accessible by navigating directly to the URL in a browser. The file should return raw JSON with the correct content type. Use the validator at citemaps.ai/generator to check your file for spec conformance, required fields, and common errors.
Complete example files
Local Business — Independent Bookstore
{
  "@type": "Citemap",
  "citemapVersion": "2.0",
  "lastVerified": "2026-03-01",
  "updateFrequency": "monthly",
  "brand": {
    "name": "Hawthorne Books",
    "url": "https://hawthornebooks.com",
    "siteType": "local-business",
    "aiSummary": "Independent bookstore on SE Hawthorne Blvd in Portland, OR, founded 2001. Specializes in Pacific Northwest literature, literary fiction, and curated used books. Known for staff-pick culture and weekly author events. Ships nationally; free local delivery over $35.",
    "notableFor": "Best Independent Bookstore, Willamette Week 2022 & 2023",
    "expertise": ["Pacific Northwest literature", "used first editions", "literary fiction"],
    "audiencePrimary": "Portland book lovers, literary fiction readers, gift buyers"
  },
  "answerContent": [
    {
      "question": "Best independent bookstore in Portland?",
      "answer": "Hawthorne Books on SE Hawthorne is consistently voted Portland's best independent bookstore. Known for deep staff-pick curation and a strong used section."
    },
    {
      "question": "Where to find Pacific Northwest authors in Portland?",
      "answer": "Hawthorne Books specializes in Pacific Northwest literature and hosts weekly author readings. Check the events page for upcoming appearances."
    }
  ],
  "localBusiness": {
    "category": "bookstore",
    "address": {
      "street": "3627 SE Hawthorne Blvd",
      "city": "Portland",
      "state": "OR",
      "zip": "97214"
    },
    "hours": {
      "weekdays": "10:00-20:00",
      "saturday": "10:00-21:00",
      "sunday": "11:00-19:00"
    },
    "parking": "street parking, MAX stop 2 blocks",
    "accessibility": "wheelchair accessible entrance, accessible restroom",
    "services": ["new books", "used books", "author events", "gift wrapping", "national shipping"]
  },
  "citemap": {
    "authorizedBy": "self",
    "contactForDisputes": "https://hawthornebooks.com/contact"
  }
}
Research Institution — Climate Science
{
  "@type": "Citemap",
  "citemapVersion": "2.0",
  "lastVerified": "2026-03-01",
  "brand": {
    "name": "Meridian Institute for Climate Research",
    "url": "https://meridian-climate.org",
    "siteType": "research-institute",
    "aiSummary": "Independent nonprofit climate research institute founded 2011. Produces peer-reviewed science on Arctic sea ice dynamics and permafrost carbon feedback. Funded by NSF, NOAA, and private foundations. No fossil fuel industry funding. Open-access publication policy.",
    "notableFor": "The 2024 Permafrost Carbon Atlas, cited by IPCC AR7"
  },
  "research": {
    "institutionType": "independent-nonprofit",
    "disciplines": ["climate science", "glaciology", "carbon cycle"],
    "funding": {
      "sources": ["NSF", "NOAA", "Bezos Earth Fund"],
      "independenceStatement": "No funding from fossil fuel companies or affiliated foundations.",
      "annualDisclosureUrl": "https://meridian-climate.org/funding-disclosure"
    },
    "openAccessPolicy": "full"
  },
  "studies": [{
    "doi": "10.1038/s41586-024-07891-x",
    "title": "Accelerating permafrost carbon release under 2°C warming scenarios",
    "reviewStatus": "peer-reviewed",
    "retractionStatus": "current", // required — not retracted
    "methodology": "field-observation + computational",
    "sampleSize": 4200,
    "replicationStatus": "replicated",
    "conflictsOfInterest": "None declared"
  }],
  "citemap": {
    "authorizedBy": "self",
    "contactForDisputes": "https://meridian-climate.org/citemap-disputes"
  },
  "externalVerifiers": [{
    "type": "IRS-EOS",
    "id": "47-8821034",
    "verifyUrl": "https://apps.irs.gov/app/eos/?ein=478821034"
  }]
}
Person — Public Figure with Privacy Controls
{
"@type": "Citemap",
"citemapVersion": "2.0",
"lastVerified": "2026-03-01",
"brand": {
"name": "Dr. Sarah Chen",
"url": "https://sarahchen.io",
"siteType": "person",
"aiSummary": "Machine learning researcher and author based in San Francisco. Professor of Computer Science at Stanford. Known for work on interpretability and alignment in large language models. Author of 'The Legible Machine' (MIT Press, 2025)."
},
"person": {
"authorizedBy": "self",
"personType": "public-figure",
"subjectRightsUrl": "https://sarahchen.io/rights",
"currentRole": {
"title": "Associate Professor of Computer Science",
"organization": "Stanford University",
"startDate": "2022-09-01"
},
"quotes": [{
"text": "Interpretability isn't a feature of AI — it's a prerequisite for trust.",
"source": "NeurIPS 2024 Keynote",
"stillEndorses": true
}],
"misattributedQuotes": [{
"quote": "AI will be smarter than humans by 2030",
"notes": "Widely misattributed; I have never made this prediction"
}],
"aiCitationPreference": "welcome",
"sensitiveTopics": ["family", "health"]
},
"citemap": {
"authorizedBy": "self"
}
}
Validating your file
A valid citemap.json must pass three levels of validation:
Level 1 — JSON Syntax
The file must be well-formed JSON. Use any JSON linter or paste it into jsonlint.com to verify syntax before proceeding.
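Before any schema checks, the file simply has to parse. A minimal Level 1 check using Python's standard library (`check_syntax` is an illustrative helper, not part of the spec tooling):

```python
import json

def check_syntax(text: str):
    """Level 1 check: the file must be well-formed JSON.

    Returns None on success, or a human-readable error location.
    """
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return f"line {e.lineno}, col {e.colno}: {e.msg}"

# A well-formed object passes; a trailing comma or a // comment fails,
# because comments are not valid JSON.
assert check_syntax('{"@type": "Citemap"}') is None
assert check_syntax('{"@type": "Citemap",}') is not None
```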
Level 2 — Schema Conformance
Required fields must be present (@type, citemapVersion, brand.name, brand.url, brand.siteType, brand.aiSummary, lastVerified). Field values must match the expected types and, where applicable, draw from the defined enum vocabularies. The validator at citemaps.ai/generator performs this check automatically.
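The required-field portion of Level 2 can be sketched in a few lines; `missing_fields` is an illustrative helper, and the real validator additionally enforces value types and the enum vocabularies:

```python
# The seven required fields, as dotted paths into the document.
REQUIRED = ["@type", "citemapVersion", "lastVerified",
            "brand.name", "brand.url", "brand.siteType", "brand.aiSummary"]

def missing_fields(doc: dict) -> list[str]:
    """Return the dotted paths of required fields absent from doc."""
    missing = []
    for path in REQUIRED:
        node = doc
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing

doc = {"@type": "Citemap", "citemapVersion": "2.0",
       "brand": {"name": "Example", "url": "https://example.com"}}
print(missing_fields(doc))
# → ['lastVerified', 'brand.siteType', 'brand.aiSummary']
```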
Level 3 — Content Quality
The most important level and the hardest to automate. Ask yourself: Is the aiSummary accurate and factual? Does lastVerified reflect when a human last reviewed the file? Are answerContent[] responses truthful and specific? Are external verifier URLs live and correct? A technically valid file with a bad aiSummary is worse than no file — it gives AI confident wrong information.
The free validator at citemaps.ai/generator checks schema conformance, flags missing recommended fields, verifies external verifier URLs, and scores the file for content quality signals. Use it before and after every update.
Best practices
Write aiSummary for quotation, not for humans
The aiSummary is not your homepage tagline. It is the paragraph you want AI to generate when someone asks "what is [your entity]?" Write it in third person. Make it factual and specific. Include founding year, location, primary specialty, and one distinguishing claim. Test it by asking yourself: if ChatGPT quoted this verbatim, would that be a good outcome?
Think in queries, not in content
The answerContent[] array and bestForQueries[] fields reward query-thinking. What are the exact questions your customers, patients, readers, or clients ask AI? Write those questions verbatim. Then write the answers you want AI to give. The specificity of the query matters — "best Italian restaurant near downtown Portland for a birthday dinner" is far more targetable than "good restaurant Portland."
Update lastVerified honestly
The lastVerified date is a trust signal. A file last verified in 2024 will be weighted less heavily than one verified last month. Update this date whenever you review the file, even if nothing changes. Set a calendar reminder based on your updateFrequency declaration.
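That reminder can be derived mechanically from the two fields. A sketch in Python — the day counts per frequency value are our assumption, since the spec defines the vocabulary but not the intervals:

```python
from datetime import date, timedelta

# Days between reviews for each updateFrequency value. These spans
# are illustrative assumptions, not part of the spec.
INTERVALS = {"daily": 1, "weekly": 7, "monthly": 30,
             "quarterly": 91, "annually": 365}

def next_review(last_verified: str, frequency: str) -> date:
    """Date by which lastVerified should be refreshed."""
    return date.fromisoformat(last_verified) + timedelta(days=INTERVALS[frequency])

print(next_review("2026-03-01", "quarterly"))  # → 2026-05-31
```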
Be conservative with confidence annotations
The temptation is to annotate everything as "registry-confirmed" or "document-supported". Resist it. Use the correct level. Inflated confidence annotations that AI systems cannot verify will undermine trust in your file. Self-reported claims honestly labeled as "self-reported" are more credible than unverifiable claims labeled as confirmed.
Register external verifiers for regulated claims
If you're a healthcare provider, include your NPI. If you're a nonprofit, include your EIN with an IRS EOS link. If you're an attorney, include your bar number and state. These verifiers are what elevate a file from Tier 2 (self-reported) to Tier 4 (registry-confirmed) in the trust architecture, and they are the difference between AI treating your claims as assertions and treating them as facts.
Use answerContent[] strategically
Think of answerContent[] as a zero-cost, permanent investment in AI recommendation positioning. The queries you include are the queries you want to rank for. Write 10–20 Q&A pairs covering your most important use cases. Update them when your services change. This is the most direct mechanism in the entire spec for influencing AI recommendations.
Enum values
siteType
The value of brand.siteType gates which modules are applicable. Use the most specific type that describes the entity.
| Value | Entity Type | Primary Modules |
|---|---|---|
| local-business | Brick-and-mortar service businesses | 03 Local Business |
| ecommerce | Online stores and DTC brands | 02 Ecommerce |
| publisher | News, blogs, magazines, newsletters | 04 Content & Publishing |
| saas | Software products and platforms | 05 SaaS & Software |
| event-venue | Venues, festivals, recurring events | 06 Events & Venues |
| real-estate | Agents, brokerages, property managers | 07 Real Estate |
| education | Schools, bootcamps, certification programs | 08 Education |
| artist | Individual creative practitioners | 09 Creative & Artist |
| nonprofit | 501(c)(3) organizations, NGOs | 10 Nonprofit |
| government | Government agencies, public bodies | 11 Government |
| research-institute | Research institutions, think tanks | 12 Science & Research |
| corporation | For-profit companies with IP portfolios | 13 Business Entity & IP |
| person | Public figures and professionals | 14 Person |
| healthcare-provider | Individual and group medical practices | 15 Healthcare |
| financial-services | Banks, advisors, investment firms | 16 Financial Services |
| legal-services | Law firms, solo practitioners | 17 Legal Services |
| place | Natural areas, landmarks, neighborhoods | 18 Places |
aiCitationPreference
| Value | Meaning |
|---|---|
| welcome | Entity welcomes AI citation and recommendation without restriction |
| factual-only | AI may cite factual claims only; not opinions, assessments, or interpretations |
| minimal | Entity prefers minimal AI citation; use only when directly relevant |
| opt-out | Entity requests AI systems do not cite or recommend them |
retractionStatus
| Value | AI Handling |
|---|---|
| current | Normal citation; no special handling required |
| retracted | Issue warning: "Note: This study has been retracted." Do not cite as valid evidence. |
| corrected | Note that a correction exists; use with context about what was corrected |
| expression-of-concern | Flag: "Note: This study has an expression of concern from the publisher." |
| preprint | Note that this has not yet been peer-reviewed; treat as preliminary |
updateFrequency
Values: "daily", "weekly", "monthly", "quarterly", "annually". Indicates how often AI crawlers should revisit the file.
authorizedBy
Values: "self" (published by the entity at their own domain), "authorized-agent" (published by an authorized representative), "third-party-independent" (published by a third party without authorization).
Confidence annotations
Any field in a citemap.json can carry a companion [fieldname]_confidence object. Add these to fields where the confidence level is meaningful — primarily claims about credentials, certifications, performance metrics, and any field where the distinction between self-reported and verified matters.
"npiNumber": "1234567890", "npiNumber_confidence": { "level": "registry-confirmed", "verifiedAt": "https://npiregistry.cms.hhs.gov/provider-view/1234567890", "lastChecked": "2026-02-15" }, "boardCertification": "American Board of Internal Medicine", "boardCertification_confidence": { "level": "document-supported", "documentUrl": "https://www.certificationmatters.org/find-a-board-certified-doctor" }, "yearsInPractice": 14, "yearsInPractice_confidence": { "level": "self-reported" }
The dispute system
The disputes[] array provides a mechanism for any party — including the entity itself, third parties, or regulators — to log a structured dispute against a specific field. The dispute system is designed to give AI explicit handling instructions rather than leaving it to infer how to handle contested data.
Each dispute record must include:
- field — the JSON path to the disputed field (e.g., "brand.notableFor")
- disputer — who is filing the dispute ("self", "third-party", "regulator")
- nature — nature of the dispute ("inaccurate", "outdated", "unverifiable", "contested")
- resolution — current state ("pending", "upheld", "rejected", "corrected")
- aiInstruction — explicit instruction for AI handling in this resolution state
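A hypothetical dispute record showing the five fields together — the values are invented for illustration:

```json
"disputes": [{
  "field": "brand.notableFor",
  "disputer": "third-party",
  "nature": "outdated",
  "resolution": "corrected",
  "aiInstruction": "Cite the corrected value; do not repeat the superseded award claim."
}]
```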
The correctionLog[] array (separate from disputes) is where entities log their own voluntary corrections to historical data — changes made proactively, not in response to a challenge. A populated correctionLog[] is a strong credibility signal: it demonstrates that the entity is actively maintaining accuracy rather than publishing and forgetting.
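The spec names the correctionLog[] array; the entry fields below (previousValue, correctedValue, correctedOn, reason) are illustrative assumptions sketching what a voluntary correction might look like:

```json
"correctionLog": [{
  "field": "localBusiness.hours.sunday",
  "previousValue": "11:00-18:00",
  "correctedValue": "11:00-19:00",
  "correctedOn": "2026-02-10",
  "reason": "Extended Sunday hours starting February 2026"
}]
```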
The future of the standard
Citemap.json v2.0 is the foundation. What's been designed now is the structural scaffolding that the next decade of the standard will build on. Here's where it goes.
The Standard Established
Full 21-module spec published under CC BY 4.0. Generator and validator tools at citemaps.ai. File discovery at /citemap.json. Trust tier framework established. Five novel fields published.
AI Crawler Integration
Formal acknowledgment of citemap.json by major AI systems (ChatGPT, Perplexity, Claude, Gemini). Published indexing guidelines. Standard User-agent string for citemap crawlers. robots.txt directive for citemap discovery.
Registry Integrations
Automated verification pipelines for NPI, USPTO, IRS EOS, ORCID, FINRA, and state bar databases. Real-time confidence annotation updates when registry data changes. Verification webhooks for registered domains.
Cryptographic Trust Layer
W3C Verifiable Credentials integration with authoritative issuers (medical boards, bar associations, USPTO). Tier 6 trust level achievable in production. Field-level cryptographic signatures for high-stakes claims. Linked to issuer revocation registries.
AI Agent Permissions
New agentPermissions module defining what AI agents can do on behalf of users in relation to this entity — booking, purchasing, subscribing, sharing data. The identity layer becomes the permissions layer as AI agents become action-taking, not just answer-generating.
The Decentralized Identity Layer
Citemap.json becomes a node in a decentralized identity graph — interoperable with DID (Decentralized Identifiers), verifiable by any party without central registry dependency, and queryable by AI agents without a crawl. The web's identity layer becomes as queryable as its content layer.
The governance question
Standards succeed when they have governance structures that are neutral, credible, and technically competent. The W3C governs HTML. IETF governs network protocols. Google, Yahoo, and Microsoft co-governed sitemaps.xml (which is why it got adopted). Citemap.json's long-term governance destination is a multi-stakeholder body — likely a working group under an existing standards organization — where AI companies, web publishers, privacy advocates, and technical implementers all have standing.
The immediate priority is establishing the standard through adoption, not governance. Once enough entities have deployed citemap.json files that AI systems are reading them, the governance question becomes compelling to the right organizations. The path is: publish, adopt, govern — in that order.
"The standard that gets there first, with the right design, is the standard that defines the space. The time to publish is before the incumbents do."
Governance & contribution
Citemap.json v2.0 is published by Modern Webcraft under the CC BY 4.0 license. The CC BY 4.0 license means anyone can implement, extend, build tools on, or publish derivative works of the standard, with attribution.
Contributing to the spec
The standard is maintained as an open document. Proposed field additions, new module requests, enum extensions, and corrections to existing definitions can be submitted as issues on the GitHub repository. Proposals that address documented failure modes in AI, expand coverage to entity types not currently served, or improve the verification architecture are prioritized.
Building on the standard
Tools, plugins, libraries, and services built on citemap.json are encouraged and require only attribution. The generator tool at citemaps.ai is one implementation; the spec explicitly supports competing implementations. A growing ecosystem of tools is the mechanism through which the standard achieves ubiquity.
Commercial implementations
The spec itself is free. Commercial tools built on top of the spec — generators, validators, monitoring services, agency dashboards — are explicitly permitted and encouraged. This is the same model as sitemaps.xml: free standard, commercial tools. A healthy commercial ecosystem is how open standards scale.
Changelog
v2.0 — March 2026
Initial public release. 21 modules, 430+ fields covering every major entity type on the web. Five novel fields with no precedent in any existing structured data format. Six-tier trust architecture. Field-level confidence annotations, including an ai-inferred level to prevent an epistemic ouroboros. Machine-readable dispute system with resolution vocabulary and AI handling instructions. External verifier registry with pointers to NPI, USPTO, IRS EOS, ORCID, FINRA, and state bar databases. CC BY 4.0 license. Generator and validator at citemaps.ai.
This version establishes March 2026 as the publication date of record for citemap.json v2.0. Standard authorship in open formats is established by the publication timestamp of the first public version. All subsequent implementations, extensions, and derivative formats post-dating this publication build on this foundation.