docs(agentic): generate llms.txt and Markdown twins for AI agents #1276

Open
codyjlandstrom wants to merge 1 commit into main from docs/llms-txt-for-agents

@codyjlandstrom

Contributor

What

Adds @signalwire/docusaurus-plugin-llms-txt so the docs build emits AI-agent-friendly Markdown alongside the normal HTML:

  • llms.txt — index of all current-version pages (llmstxt.org standard)
  • llms-full.txt — the full corpus in one file
  • a per-page Markdown twin — append .md to any docs URL (e.g. /docs/get-started/install-okteto-cli.md)
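
As a rough illustration of the "append .md" convention above, an agent-side helper could derive the twin path from a site-relative page path. This is a sketch only: `markdownTwinPath` is a hypothetical name, and the handling of trailing slashes, query strings, and fragments is an assumption rather than documented plugin behavior.

```typescript
// Hypothetical helper (not part of the plugin): map a site-relative docs path
// to the path of its Markdown twin by appending ".md".
function markdownTwinPath(pagePath: string): string {
  // Split off any query string or fragment so ".md" lands on the path itself.
  const cut = pagePath.search(/[?#]/);
  const path = cut === -1 ? pagePath : pagePath.slice(0, cut);
  const suffix = cut === -1 ? "" : pagePath.slice(cut);

  // Assumption: a trailing slash is dropped, so "/docs/page/" maps to "/docs/page.md".
  const trimmed = path.replace(/\/+$/, "");

  // Leave paths that already point at a Markdown twin unchanged.
  const twin = trimmed.slice(-3) === ".md" ? trimmed : trimmed + ".md";
  return twin + suffix;
}
```

For example, `markdownTwinPath("/docs/get-started/install-okteto-cli")` yields the path quoted in the list above.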

Why

Agents currently scrape rendered HTML (nav, scripts, styling) to read our docs. Serving clean Markdown gives them the content directly. This fits the ongoing agentic-docs direction.

Why this plugin

I spiked the plugin from the EkLine guide (docusaurus-plugin-llms) first. Because it reads raw .mdx source, it produced degraded output on our MDX-heavy docs:

| Metric (≈155 current-version pages) | EkLine plugin | This PR (SignalWire) |
| --- | --- | --- |
| Broken internal links (point to .mdx source) | 119 | 0 |
| Raw JSX leaked (<CodeBlock>, <Tabs>) | 67 | 0 |
| Unresolved ${variables.*} | 11 | 0 |
| Heading-anchor noise | n/a | 0 |

SignalWire post-processes the rendered HTML, so links resolve to URLs, variables.json substitutions are applied, and MDX components are flattened — none of which a source-reading generator handles.
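
For reviewers who want to spot-check generated Markdown for the three kinds of degradation compared above, a heuristic audit could look like the sketch below. It is not the method used to produce the numbers in this PR: `auditMarkdown`, `MarkdownAudit`, and the regular expressions are illustrative assumptions. The JSX check in particular would also flag capitalized tags that legitimately appear inside code samples.

```typescript
// Hypothetical audit result: one counter per kind of degradation.
interface MarkdownAudit {
  sourceLinks: number;         // Markdown links that still target .mdx source files
  leakedJsx: number;           // opening JSX tags such as <Tabs> or <CodeBlock ...>
  unresolvedVariables: number; // ${variables.*} placeholders left unsubstituted
}

function auditMarkdown(markdown: string): MarkdownAudit {
  // Count all non-overlapping matches of a global pattern.
  const count = (pattern: RegExp): number => (markdown.match(pattern) || []).length;

  return {
    // "](...something.mdx)" with an optional "#anchor" before the closing paren.
    sourceLinks: count(/\]\([^)\s]*\.mdx(?:#[^)\s]*)?\)/g),
    // "<" followed by a capitalized name; closing tags ("</Tabs>") are not counted.
    leakedJsx: count(/<[A-Z][A-Za-z0-9]*[\s/>]/g),
    unresolvedVariables: count(/\$\{variables\.[^}]*\}/g),
  };
}
```

Running this over each emitted `.md` file and summing the counters would give a rough regression signal; any threshold on those sums is a judgment call.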

Notes

  • No user-facing change. The plugin runs in postBuild and only emits extra files; the served HTML pages are untouched.
  • Scoped to the current version (includeVersionedDocs: false) to keep the corpus current and skip frozen snapshots.
  • A small inline rehype plugin strips Docusaurus heading-anchor links (.hash-link) that would otherwise render as noisy "Direct link to …" fragments in the Markdown.
  • Admonitions (:::tip) flatten to plain text in the Markdown twins only — content preserved, styling lost. Acceptable for agent consumption.
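
The heading-anchor stripping mentioned in the notes above can be sketched as a dependency-free rehype-style transformer. This is an approximation, not the code in this PR: `stripHashLinks` and the local `HastNode` type are assumptions, and a real implementation would more likely use the `hast` types and a tree-walking utility.

```typescript
// Minimal hast-like node shape, declared locally so the sketch has no dependencies.
interface HastNode {
  type: string;
  tagName?: string;
  value?: string;
  properties?: { className?: string[] | string };
  children?: HastNode[];
}

// True for <a class="hash-link"> elements, the anchors Docusaurus adds to headings.
function isHashLink(node: HastNode): boolean {
  if (node.type !== "element" || node.tagName !== "a") return false;
  const cls = node.properties ? node.properties.className : undefined;
  const classes = Array.isArray(cls) ? cls : typeof cls === "string" ? cls.split(/\s+/) : [];
  return classes.indexOf("hash-link") !== -1;
}

// rehype plugins are functions that return a tree transformer.
function stripHashLinks() {
  return (tree: HastNode): void => {
    const walk = (node: HastNode): void => {
      if (!node.children) return;
      // Drop hash-link anchors at this level, then recurse into what remains.
      node.children = node.children.filter((child) => !isHashLink(child));
      node.children.forEach(walk);
    };
    walk(tree);
  };
}
```

The transformer would be registered through whichever option the plugin exposes for custom rehype plugins. That option name is not shown in this PR, so it is left out here.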

Test plan

  • yarn build succeeds; plugin reports 155 documents processed
  • yarn serve: /docs/llms.txt, /docs/llms-full.txt, and <any-page>.md all resolve
  • Verified served HTML still renders admonitions and heading anchors unchanged

🤖 Generated with Claude Code

Add @signalwire/docusaurus-plugin-llms-txt to emit an llms.txt index,
an llms-full.txt corpus, and a per-page Markdown twin (append .md to any
docs URL) so AI agents consume clean Markdown instead of scraping HTML.

The plugin post-processes rendered HTML, so internal links resolve to
URLs, variables.json substitutions are applied, and MDX components are
flattened — none of which a source-reading generator handles. Scoped to
the current version (includeVersionedDocs: false) to keep the corpus
current and skip frozen snapshots.

A small rehype plugin strips Docusaurus heading-anchor links (.hash-link)
that would otherwise render as noisy "Direct link to" fragments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Cody Landstrom <cody@okteto.com>
@netlify

netlify Bot commented Jun 24, 2026

Deploy Preview for okteto-docs ready!

Name Link
🔨 Latest commit 2e3aaf3
🔍 Latest deploy log https://app.netlify.com/projects/okteto-docs/deploys/6a3b73ce72ef5d0008558917
😎 Deploy Preview https://deploy-preview-1276--okteto-docs.netlify.app