Google Warns Against Markdown Versions For AI SEO

Search Engine Journal and Digital Marketing Desk report that on a recent episode of Google's Search Off the Record podcast, engineers John Mueller and Martin Splitt pushed back on proposals to publish parallel, stripped-down Markdown versions of websites as an "AI SEO" optimisation. According to Search Engine Journal, Mueller and Splitt argued that Markdown can remove the structural signals search systems use for discovery and ranking (navigation, internal links, and accessibility markup), and that the proposed LLMs.txt approach does not address discovery. Search Engine Journal quotes Splitt: "And I think that's also why people think it's good for LLMs, because you have less stuff, less tokens." Editorial analysis: For practitioners, the exchange highlights a tradeoff between token-efficient formats and preserving the site structure that aids discovery, indexing, and accessibility.
What happened
Search Engine Journal and Digital Marketing Desk report that on a recent episode of Google's "Search Off the Record" podcast, Google engineers John Mueller and Martin Splitt pushed back on ideas circulating in AI-SEO circles to publish parallel, stripped-down Markdown versions of pages to make content more consumable for large language models. Search Engine Journal reports that Mueller and Splitt argued these Markdown-first proposals tend to remove non-visible or structural elements that search systems rely on for discovery and ranking. Search Engine Journal also reports that Mueller discussed the original intent of LLMs.txt and noted that the proposed standard, as described in public discussion, does not include a discovery mechanism.
Technical details
Search Engine Journal quotes Martin Splitt explaining the appeal of Markdown as token-efficient for LLMs: "And I think that's also why people think it's good for LLMs, because you have less stuff, less tokens," and adds that raw HTML can look like "cruft" when seen without browser rendering. Digital Marketing Desk and other coverage summarise Mueller's point that HTML parsing and extraction are well understood by crawlers and accessibility tools, and that conversion from HTML to clean text is a routine part of indexing. Search Engine Journal's reporting highlights the structural elements at risk when publishers serve separate Markdown-only endpoints: navigation, internal linking, hierarchical cues, and accessibility markup.
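To make the point about routine HTML-to-text conversion concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It is not how Google or any particular crawler works; the class name `StructureAwareExtractor`, the `extract` helper, and the sample markup are all hypothetical, chosen for illustration. It pulls clean text out of a page while also recording the links, headings, and ARIA labels that a text-only export would discard.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class StructureAwareExtractor(HTMLParser):
    """Hypothetical helper: collects visible text plus structural signals."""

    def __init__(self):
        super().__init__()
        self.text_parts = []    # visible text fragments, in document order
        self.links = []         # href values of <a> elements
        self.headings = []      # (tag, text) pairs such as ("h1", "Title")
        self.aria_labels = []   # aria-label values (accessibility markup)
        self._heading_tag = None
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag in HEADING_TAGS:
            self._heading_tag = tag
        if attrs.get("aria-label"):
            self.aria_labels.append(attrs["aria-label"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        if tag == self._heading_tag:
            self._heading_tag = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        self.text_parts.append(text)
        if self._heading_tag:
            self.headings.append((self._heading_tag, text))


def extract(html):
    """Return clean text alongside the signals a text-only export would lose."""
    parser = StructureAwareExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "links": parser.links,
        "headings": parser.headings,
        "aria_labels": parser.aria_labels,
    }


# Illustrative markup only; not taken from any real site.
SAMPLE_HTML = """
<nav aria-label="Main"><a href="/guides">Guides</a> <a href="/pricing">Pricing</a></nav>
<main><h1>Crawl budgets</h1><p>See the <a href="/guides/sitemaps">sitemap guide</a>.</p></main>
"""

result = extract(SAMPLE_HTML)
print(result["text"])
print(result["links"])
```

The `text` field is roughly what a stripped-down export keeps. The `links`, `headings`, and `aria_labels` fields correspond to the internal linking, hierarchical cues, and accessibility markup that the coverage describes as being at risk.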
Editorial analysis - technical context
Industry observers note that the tradeoff being debated is between two technical goals that are not identical: reducing token count for cost-efficient LLM consumption, and preserving the metadata and graph signals search systems use for discovery and contextual ranking. For practitioners building content pipelines, removing internal links and navigation hides the site's link graph from crawlers and complicates canonicalisation and crawl scheduling. For retrieval systems that rely on dense passage retrieval or embeddings, stripping structural metadata can also reduce the effectiveness of context-aware reranking and passage selection.
Context and significance
The discussion matters because it brings a mainstream search-authority perspective to emerging AI-SEO tactics. Public reporting frames the conversation as a corrective to proposals that treat LLMs as a reason to publish parallel text-only endpoints. While Markdown can simplify tokenisation and human-readable editing, industry experience shows that converting HTML to plain text is already automated at scale and that structural signals often improve discovery, accessibility, and the quality of indexed content. The coverage of LLMs.txt in Search Engine Journal also raises a practical point: standards that expose content to AI systems without handling discovery are incomplete from a search-architecture perspective.
Editorial analysis
For AI/ML practitioners, the episode reframes the AI-SEO tradeoff: token efficiency versus preserving structured signals that retrieval and indexing systems need. The guidance is directly applicable to anyone building RAG pipelines, content ingestion systems, or LLM-focused publishing strategies.
What to watch
Observers should follow whether publishers experiment with hybrid approaches that retain navigational and link structure while exposing cleaned content to downstream AI consumers, and whether tooling emerges that extracts enriched payloads (content plus structured metadata) for LLM consumption without creating separate, crawlable Markdown endpoints. They should also watch for additional guidance or examples from Google engineers, or public search documentation, clarifying how discovery and structured signals interact with AI-focused content feeds.
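As a rough illustration of what an "enriched payload" might look like, the Python sketch below bundles cleaned text with link metadata in a single JSON object. No such format has been standardised; the `EnrichedPayload` dataclass, its field names, the `build_payload` helper, and the example URLs are assumptions made for illustration. It only shows one way to keep internal and external links attached to the cleaned text.

```python
import json
from dataclasses import asdict, dataclass, field
from urllib.parse import urljoin, urlparse


@dataclass
class EnrichedPayload:
    """Hypothetical payload: cleaned content plus link metadata."""
    url: str
    title: str
    text: str
    internal_links: list = field(default_factory=list)
    external_links: list = field(default_factory=list)


def build_payload(page_url, title, text, hrefs):
    """Resolve hrefs against the page URL and split them by host."""
    site_host = urlparse(page_url).netloc
    payload = EnrichedPayload(url=page_url, title=title, text=text)
    for href in hrefs:
        absolute = urljoin(page_url, href)
        same_site = urlparse(absolute).netloc == site_host
        bucket = payload.internal_links if same_site else payload.external_links
        if absolute not in bucket:  # drop duplicates, keep first-seen order
            bucket.append(absolute)
    return payload


# Example inputs are made up; example.com is a reserved documentation domain.
payload = build_payload(
    "https://example.com/guides/crawling",
    "Crawl budgets",
    "See the sitemap guide.",
    ["/guides/sitemaps", "../pricing", "https://other.example.org/ref", "/guides/sitemaps"],
)
payload_json = json.dumps(asdict(payload), indent=2)
print(payload_json)
```

Because the payload is derived from the canonical HTML page on request, there is no second crawlable copy of the content to keep in sync, which is the property the hybrid approaches described above would need.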
Scoring Rationale
A practitioner-relevant clarification from Google engineers on why HTML remains the standard for search and AI discoverability, and why parallel Markdown endpoints for LLMs create structural signal loss. Useful guidance but scoped to a podcast discussion of best practices rather than a new policy or tool launch, placing it in the solid-but-not-notable tier.
