DOC-2262: Add OAuth authentication to the docs MCP server (Redpanda Cloud IdP)#181

Open
JakeSCahill wants to merge 43 commits into main from feature/mcp-email-auth

@JakeSCahill JakeSCahill commented Jun 15, 2026

Contributor

Jira: DOC-2262

Goal

Add authentication to the docs MCP server (docs.redpanda.com/mcp) so AI tools (ChatGPT, Claude, Cursor, VS Code) have users sign in with their Redpanda Cloud account, letting us capture verified work emails and attribute docs usage to organizations.

Architecture (decided with Cloud Identity)

The docs service runs its own OAuth 2.1 Authorization Server (AS), with the Cloud IdP (Auth0) as the upstream identity provider. AI tools register and authenticate against our AS; we federate the human login to Auth0 and issue our own tokens.

Why this shape:

  • ChatGPT only supports spec OAuth (no static tokens), so OAuth is required.
  • The Cloud Auth0 tenant has DCR and CIMD disabled (tested — see comments), so AI tools can't register with it directly. Brokering means the Cloud IdP only ever sees one client (ours), and client self-registration happens on our side where we control it.
  • We get the user's verified email/org from Auth0 to capture + attribute.

Division of responsibilities

  • Cloud / Identity: one Auth0 public client (client_id, Authorization Code + PKCE, no secret), our /callback redirect URIs allow-listed, ID token returns email/email_verified + org. One app covers MCP now and docs-site login later.
  • Us: the AS — /authorize, /callback, /token (+ refresh w/ rotation), client registration (DCR + CIMD), JWKS, consent/login UI; federate login to Auth0; issue/validate our own tokens. State in Netlify.

Future (phase 2)

The same Auth0 federation core also powers human login to the docs site — it just sets a browser session instead of issuing tokens. One Auth0 app, two consumers; MCP ships first.

Testing

2026-06-18_19-50-10.mp4

The deploy preview is wired to the integration Auth0 tenant, so you can test the full flow there.

1. Add the preview server to Claude Code:

claude mcp add --scope local --transport http redpanda-preview https://deploy-preview-181--redpanda-documentation.netlify.app/mcp

2. Authenticate and use it:

  • Run /mcp, select redpanda-preview, choose Authenticate
  • A browser opens → "Continue with Redpanda Cloud" → sign in on the integration tenant → pick an org at the prompt → it connects
  • Ask something that hits a tool, e.g. "Using redpanda-preview, list the Redpanda API reference pages."
  • Clean up with claude mcp remove redpanda-preview

Notes:

  • You need an account on the integration tenant (integration-cloudv2.us.auth0.com), not prod.
  • Unit tests: npm run test:mcp plus the tests/mcp-oauth-*.test.ts suites.

Kapa now has your conversation along with your email and org

2026-06-18_20-09-48

History

Initially explored a pure OAuth resource server pointing at the Cloud IdP (blocked: DCR/CIMD disabled on that tenant), then landed on the AS-broker design above after the call with Cloud.

@netlify

netlify Bot commented Jun 15, 2026


Deploy Preview for redpanda-documentation ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 82cac4f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/redpanda-documentation/deploys/6a34469c9b51050008a22bcf |
| 😎 Deploy Preview | https://deploy-preview-181--redpanda-documentation.netlify.app |
Lighthouse
1 paths audited
Performance: 87 (🟢 up 3 from production)
Accessibility: 92 (🔴 down 2 from production)
Best Practices: 92 (no change from production)
SEO: 83 (no change from production)
PWA: -
View the detailed breakdown and full score reports


Add a lightweight email->token authentication gate to the docs MCP server
to capture users' work email addresses for lead capture and usage attribution.

- New /mcp/register endpoint: users submit a work email and the bearer token
  is delivered ONLY by email (never in the HTTP response), so possession of a
  working token proves the address is real and owned.
- Mandatory 4-layer validation: format, work-domain filter (reject free/
  disposable providers), MX-record check, email delivery.
- Tokens stored hashed in Netlify Blobs; auth middleware in mcp.mjs threads the
  authenticated email/domain to Kapa via _meta.user for attribution.
- Bearer header and ?token= query fallback (for clients that can't set headers).
- Gated behind REQUIRE_AUTH (grace period -> enforce); per-token rate limiting.
- Captured emails -> Netlify Blobs + logs + optional CRM_WEBHOOK_URL forward.
- Docs: registration + per-client setup + privacy/consent note; server-card
  and server.json advertise the token requirement.
- 17 unit tests (tests/mcp-auth.test.ts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@JakeSCahill JakeSCahill force-pushed the feature/mcp-email-auth branch from 9c6f267 to 3a89e42 Compare June 15, 2026 10:56
@JakeSCahill JakeSCahill changed the title Add email-capture auth to the docs MCP server DOC-2262: Add email-capture auth to the docs MCP server Jun 15, 2026
github-actions Bot and others added 5 commits June 15, 2026 11:06
Netlify Functions don't reliably set NODE_ENV=production at runtime, so the
previous NODE_ENV-based dev bypass could fire in deployed environments —
silently logging tokens instead of emailing them and not failing when
RESEND_API_KEY is missing. Gate the bypass on NETLIFY_DEV (set only by
`netlify dev`/`functions:serve`) so any deployed env without a key errors loudly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the custom email->token gate with a standard MCP OAuth 2.1 resource
server delegating to the Redpanda Cloud IdP (auth.prd.cloud.redpanda.com). This
is required so ChatGPT can authenticate (ChatGPT only supports spec OAuth, not
static tokens), while still capturing users' verified work emails.

Verified the Cloud IdP supports everything needed (open Dynamic Client
Registration, CIMD, PKCE S256, public clients, email scope, userinfo).

- /.well-known/oauth-protected-resource (RFC 9728) edge function advertises the
  Cloud IdP as the authorization server; clients self-register via DCR/CIMD.
- mcp.mjs auth middleware validates the bearer token against the IdP /userinfo
  endpoint, extracts the verified email/org, captures it (Blobs + log + optional
  CRM_WEBHOOK_URL), and threads it to Kapa via _meta.user.
- Optional work-email enforcement (REQUIRE_WORK_EMAIL, default on) returns 403
  for personal providers; REQUIRE_AUTH keeps the grace->enforce rollout.
- Remove the email->token registration endpoint and email-sending module.
- Docs updated: clients prompt for Redpanda Cloud sign-in (no token to paste).
- Unit tests rewritten for the OAuth logic (16 tests).

Production hardening (needs identity team): register an Auth0 API for the MCP
resource so tokens are audience-bound JWTs, and add email as an access-token
claim. Until then we validate via /userinfo (no audience binding).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@JakeSCahill JakeSCahill changed the title DOC-2262: Add email-capture auth to the docs MCP server DOC-2262: Add OAuth authentication to the docs MCP server (Redpanda Cloud IdP) Jun 15, 2026
@JakeSCahill

Contributor Author

⛔ Blocked: identity team — Dynamic Client Registration is disabled on the Cloud IdP

The OAuth resource server, discovery, and token validation are implemented and verified, but end-to-end auth cannot work yet because MCP clients can't register with the Redpanda Cloud IdP.

What works (verified against prd auth.prd.cloud.redpanda.com)

  • Protected-resource metadata (/.well-known/oauth-protected-resource) + authorization-server discovery
  • PKCE (S256), email / email_verified scopes
  • Token validation via /userinfo
  • Our server's 401 WWW-Authenticate → metadata chain (confirmed on the preview)
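
The 401 → metadata chain in the last bullet can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: `unauthorized` is a hypothetical helper name, and it assumes the challenge uses the `resource_metadata` parameter from RFC 9728 to point clients at the protected-resource metadata document.

```javascript
// Hypothetical helper: build the 401 response that starts the discovery chain.
// The resource_metadata parameter comes from RFC 9728; the helper name and the
// origin argument are assumptions made for illustration.
function unauthorized(origin) {
  const metadataUrl = `${origin}/.well-known/oauth-protected-resource`;
  return new Response(JSON.stringify({ error: "unauthorized" }), {
    status: 401,
    headers: {
      "Content-Type": "application/json",
      // Clients read this header, fetch the metadata, then find the authorization server.
      "WWW-Authenticate": `Bearer resource_metadata="${metadataUrl}"`,
    },
  });
}
```

A client that receives this response fetches the metadata URL and reads `authorization_servers` from it to continue discovery.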

The blocker

MCP clients (ChatGPT, Claude, Cursor, VS Code) have no pre-registered client_id — they self-register at runtime via Dynamic Client Registration. The Cloud IdP rejects this:

POST https://auth.prd.cloud.redpanda.com/oidc/register
→ 400 {"error":"Bad Request","message":"dynamic client registration is disabled"}

Confirmed with a valid registration body. Reproduced in Claude Code, which fails at exactly this step:

No client info found
SDK auth error: L7H
Error during auth completion: SDK auth failed

(Failure happens before the browser opens — i.e. at client registration, not at user login.)

Needed from the identity team (DOC-2262)

  1. Enable Dynamic Client Registration for public clients (authorization_code + PKCE, token_endpoint_auth_method: none, client-supplied loopback/https redirect URIs) — and/or confirm CIMD (Client ID Metadata Documents) is actually enabled. The discovery doc advertises client_id_metadata_document_supported: true, but DCR was advertised too and is disabled, so advertised ≠ enabled. CIMD is what ChatGPT prefers.
  2. (Hardening) Register an Auth0 API/resource with audience https://docs.redpanda.com/mcp and add email, email_verified, and org (org_id/org_name) as access-token claims → lets us validate audience-bound JWTs instead of /userinfo.
  3. Confirm whether to enable on prd or a staging tenant first.

Status

  • Code is complete and verified up to the registration step; keeping this PR in draft until the IdP is configured.
  • Tracking the identity-team request in DOC-2262.
  • Recommend leaving REQUIRE_AUTH in grace (unenforced) on deployed environments until DCR/CIMD is enabled, so the server isn't gated for clients that can't yet authenticate.

Comment thread data-platform/modules/ROOT/pages/how-to-use-these-docs.adoc Outdated
JakeSCahill and others added 8 commits June 15, 2026 14:20
Drop the unused OAUTH_ISSUER export from idp.mjs and de-export the
FREE_EMAIL_DOMAINS / DISPOSABLE_DOMAINS sets (used only internally).
No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The probe confirmed CIMD is not enabled on the Cloud IdP (a valid client
metadata document used as client_id still returns 'Unknown client').

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@JakeSCahill

Contributor Author

Update — call with Cloud Identity (Santi): implementation direction decided

Architecture: We unblock MCP auth by having the docs service run its own OAuth 2.1 Authorization Server (AS), with the Cloud IdP (Auth0) as the upstream identity provider. This sidesteps the earlier blocker (the Cloud Auth0 tenant has both DCR and CIMD disabled — see prior comments): the AI tools register and authenticate against our AS, not the Cloud IdP directly. The Cloud IdP only ever sees one client — ours.

Division of responsibilities

  • Cloud / Identity (Santi): provide one Auth0 public client (client_id, Authorization Code + PKCE/S256, token_endpoint_auth_method=none, no secret), with our /callback redirect URIs allow-listed, and the ID token returning email, email_verified, and org (org_id/org_name). One app covers MCP now and docs-site login later.
  • Docs (us): implement the OAuth 2.1 AS — /authorize, /callback, /token (+ refresh with rotation), client registration (DCR + CIMD) for the AI tools, JWKS, and the consent/login UI; federate the human login to Auth0; issue our own tokens to MCP clients. State in Netlify DB (Neon Postgres).

Flow: AI client → our AS (/authorize, PKCE) → redirect to Auth0 login → our /callback (exchange code, read verified email/org) → we mint our own token → client calls /mcp with it → we validate + attribute usage to the user/org.

Open asks to Santi (in flight):

  • Confirm public client + exact /callback allow-listing + email/org claims.
  • Which tenant for dev vs prod (auth.prd.cloud.redpanda.com is prod) + a test user or two.

Future (phase 2): the same Auth0 federation core also powers human login to the docs site — it just sets a browser session instead of issuing tokens. One Auth0 app, two consumers; MCP ships first.

Next steps:

  1. Spike the AS slice (/authorize → Auth0 → /callback → issue + validate a JWT) to confirm Netlify DB + serverless fit and that the Auth0 client works end-to-end.
  2. Phased build: AS core → client registration (DCR/CIMD) → refresh + hardening → swap the resource-server to validate our own tokens → docs + rollout (REQUIRE_AUTH grace → enforce).
  3. Rescope PR #181 from "resource server pointing at the Cloud IdP" to "we run the AS, Auth0 upstream."

… (M1)

Replace the superseded resource-server-pointing-at-Cloud approach with the
agreed broker architecture: our service is the OAuth 2.1 Authorization Server,
federating the human login upstream to Auth0 and issuing/validating its own
tokens. Ports the validated spike to production shape.

Added (Milestone 1 — AS core):
- lib/oauth/keys.mjs  — jose RS256 sign/verify + JWKS; key from env
  (MCP_OAUTH_SIGNING_JWK) or dev-generated + persisted in Blobs (the spike
  proved an in-memory key breaks the flow)
- lib/oauth/store.mjs — auth requests + auth codes on Netlify Blobs (interface
  is the seam for a Netlify DB/Neon backend when relational queries are needed)
- lib/oauth/pkce.mjs, config.mjs, upstream.mjs (Auth0 + dev mock federation,
  id_token validated against Auth0 JWKS)
- mcp-oauth.mjs — AS endpoints: discovery (RFC 8414), JWKS, /authorize,
  /mcp/callback, /token (authorization_code + PKCE)

Changed:
- mcp.mjs resource server now validates OUR OWN access tokens (jose) instead of
  calling the upstream /userinfo
- protected-resource metadata + server card point authorization_servers at us
- removed lib/idp.mjs (superseded /userinfo validation)

Deferred (clearly marked): DCR/CIMD client registration (M2), refresh_token
grant + rotation (M3), consent UI, revocation. Neon backend is a documented
swap behind the store interface (needs Netlify DB provisioning). Auth0 mode
needs Santi's client_id; defaults to a dev mock until then.

Tests: 22 pass (PKCE incl. RFC 7636 vector; JWT issue/verify; JWKS leaks no
private key; wrong-audience/tampered rejected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@JakeSCahill

Contributor Author

Milestone 1 landed — production AS scaffold (jose + storage), federating to Auth0

Pivoted the branch from the superseded resource-server-pointing-at-Cloud approach to the agreed broker architecture: our service is the OAuth 2.1 Authorization Server; it federates the human login upstream to Auth0 and issues/validates its own tokens. The validated spike is now ported to production shape.

In this milestone

  • lib/oauth/keys.mjs: jose RS256 sign/verify + JWKS. Key from env (MCP_OAUTH_SIGNING_JWK) or dev-generated + persisted in Blobs (the spike proved an in-memory key breaks the flow).
  • lib/oauth/store.mjs — auth-requests + auth-codes on Netlify Blobs; the interface is the seam for a Netlify DB / Neon backend when we want relational queries (swap behind the interface; needs DB provisioning).
  • lib/oauth/{pkce,config,upstream}.mjs — PKCE (S256), config, and Auth0 federation (id_token validated against Auth0 JWKS) with a dev-mock fallback.
  • mcp-oauth.mjs — AS endpoints: discovery (RFC 8414), JWKS, /authorize, /mcp/callback, /token (authorization_code + PKCE).
  • mcp.mjs resource server now validates our own access tokens (jose) instead of calling the upstream /userinfo; protected-resource metadata + server card point authorization_servers at us; removed lib/idp.mjs.

Tests: 22 pass (PKCE incl. the RFC 7636 vector; JWT issue/verify; JWKS leaks no private key; wrong-audience/tampered rejected).

Deferred (clearly marked in code): DCR/CIMD client registration (M2), refresh-token grant + rotation (M3), consent UI, revocation.

Still gated on:

  • Santi: the Auth0 client_id (public client + /mcp/callback allow-listed + email/org claims). Until then upstream defaults to a dev mock; flip with REDPANDA_OAUTH_CLIENT_ID + MCP_OAUTH_UPSTREAM=auth0.
  • Netlify DB provisioning if/when we move auth state off Blobs.

The flow itself was already validated end-to-end on Netlify Functions in the spike branch (spike/mcp-oauth-as).

JakeSCahill and others added 3 commits June 16, 2026 17:18
The dev mock issues canned identities, so it must never be reachable by accident
in a deployed environment. Resolve the upstream mode fail-closed: mock is only
allowed under an explicit dev signal (NETLIFY_DEV or MCP_OAUTH_ALLOW_MOCK=true).
Anything that would otherwise silently fall back to mock (e.g. a prod deploy
missing REDPANDA_OAUTH_CLIENT_ID) resolves to null, and the AS returns 503 on
the flow endpoints instead of handing out mock tokens. Discovery + JWKS stay up.

- config.mjs: resolveUpstreamMode() (pure, tested) + UPSTREAM_MISCONFIGURED
- upstream.mjs: throw if neither auth0 nor mock is active
- mcp-oauth.mjs: 503 on /authorize, /callback, /token, mock-idp when misconfigured
- tests: 6 cases covering the resolution matrix (28 total pass)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
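
The fail-closed resolution described in the commit above might look something like this. It is a sketch under assumptions, not the actual `resolveUpstreamMode()`: in particular, treating any non-empty `NETLIFY_DEV` as a dev signal and returning `null` when Auth0 is requested without a client ID are guesses at the matrix the tests cover.

```javascript
// Sketch only. The env var names come from the PR; the exact precedence is assumed.
function resolveUpstreamMode(env) {
  // Mock identities are only allowed under an explicit dev signal.
  const devSignal = Boolean(env.NETLIFY_DEV) || env.MCP_OAUTH_ALLOW_MOCK === "true";
  const requested = env.MCP_OAUTH_UPSTREAM;

  if (requested === "auth0") {
    // Auth0 mode without a client ID is a misconfiguration, not a reason to fall back.
    return env.REDPANDA_OAUTH_CLIENT_ID ? "auth0" : null;
  }
  if (requested === undefined || requested === "mock") {
    return devSignal ? "mock" : null;
  }
  return null; // unknown mode: fail closed
}
```

A caller would treat `null` as misconfigured and answer 503 on the flow endpoints while leaving discovery and JWKS available.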
Netlify statically analyzes config.path at bundle time, so it can't be an array
of imported constants (PATHS.*) — that failed bundling (and the PR preview
build) with 'path: Must be a string or array of strings'. Use literal paths.

Verified the full M1 flow live (functions:serve, mock upstream): authorize ->
mock-idp -> /mcp/callback -> /token -> AS-issued JWT, then /mcp accepts that
token (200) and rejects no-token / garbage (401). Confirms cross-function token
validation via the Blobs-shared signing key.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Netlify Blobs defaults to eventual consistency (deletes/updates propagate up to
60s). For one-time-use auth codes and refresh-token rotation/reuse-detection
that window would let a consumed code/token be replayed, so the auth store now
uses { consistency: 'strong' }. The dev signing-key store does too, so the
resource server reads the key the AS just wrote rather than regenerating.

Verified live (functions:serve): full flow issues a token, /mcp accepts it
(200), and replaying a consumed auth code is rejected (400).

Note: Blobs still has no atomic CAS, so a sub-second concurrent replay remains
theoretically possible — negligible at our volume; a relational DB is the only
full fix (documented as the future swap behind the store interface).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
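
The one-time-use behaviour this commit relies on can be illustrated with a read-then-delete consume step. The sketch below uses a hypothetical in-memory `MemoryStore` as a stand-in for the strongly consistent Blobs store, and `consumeAuthCode` is an illustrative name; as the note above says, this pattern still leaves a small window for concurrent replay because there is no atomic compare-and-swap.

```javascript
// In-memory stand-in with the async get/set/delete shape assumed for the auth store.
class MemoryStore {
  constructor() { this.items = new Map(); }
  async get(key) { return this.items.has(key) ? this.items.get(key) : null; }
  async set(key, value) { this.items.set(key, value); }
  async delete(key) { this.items.delete(key); }
}

// Hypothetical consume step: returns the stored record once, then null on replay.
async function consumeAuthCode(store, code, nowMs = Date.now()) {
  const record = await store.get(code);
  if (!record) return null; // unknown or already consumed
  // Delete before using the record so a sequential replay finds nothing.
  await store.delete(code);
  if (record.expiresAt <= nowMs) return null; // expired codes are discarded
  return record;
}
```

With a relational backend, the read and delete would typically become a single transactional statement, which closes the remaining race.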
JakeSCahill and others added 2 commits June 17, 2026 09:59
MCP clients can now identify themselves to our AS, so real clients (ChatGPT,
Claude, …) can connect:
- lib/oauth/clients.mjs: DCR (RFC 7591) registerClient; CIMD getClient that
  fetches+validates a URL client_id's metadata document (https-only + loopback/
  private-host SSRF guard, timeout, size cap); redirect_uri matching (exact, with
  loopback port-flexibility for native clients like Claude Code)
- store.mjs: putClient/getStoredClient (Blobs, strong consistency; resilient to
  store errors)
- mcp-oauth.mjs: POST /oauth/register; /authorize now resolves the client and
  validates redirect_uri BEFORE any redirect (open-redirect guard) and rejects
  unknown clients; /token binds the code to its client_id; metadata advertises
  registration_endpoint + client_id_metadata_document_supported
- config.mjs: register path + registration_endpoint

Verified live (functions:serve): register -> authorize -> token -> /mcp (200);
unknown client -> 400 invalid_client; bad redirect_uri -> 400 invalid_request.
37 unit tests pass (incl. CIMD URL detection, doc validation, SSRF guard,
loopback redirect matching). CIMD is wired + unit-tested but not yet
live-exercised (needs a hosted client doc).

Deferred: refresh-token grant + rotation (M3), consent UI, /register rate-limit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
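
The redirect_uri rule described above (exact match, with port flexibility for loopback redirects) could be sketched like this. The function name and the set of loopback hosts are assumptions for illustration; the PR's clients.mjs may draw the line differently, for example on whether `localhost` counts as loopback.

```javascript
// Hosts treated as loopback in this sketch (an assumption, not taken from the PR).
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "[::1]", "localhost"]);

// Returns true when `requested` matches one of the client's registered redirect URIs.
function redirectUriAllowed(requested, registered) {
  let req;
  try { req = new URL(requested); } catch { return false; }

  return registered.some((candidate) => {
    if (candidate === requested) return true; // exact string match
    let reg;
    try { reg = new URL(candidate); } catch { return false; }
    // Native clients bind an ephemeral port, so ignore only the port on http loopback URIs.
    const loopback =
      req.protocol === "http:" &&
      reg.protocol === "http:" &&
      LOOPBACK_HOSTS.has(req.hostname) &&
      req.hostname === reg.hostname;
    return loopback && req.pathname === reg.pathname && req.search === reg.search;
  });
}
```

Running this check before issuing any redirect is what makes it usable as an open-redirect guard.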
The code grant now also issues a refresh token, and /token handles
grant_type=refresh_token so clients renew access tokens without re-login.

- lib/oauth/refresh.mjs: token gen (hashed at rest), newFamilyId, and a pure
  decideRefresh() (rotate / reuse / invalid) — unit-tested
- store.mjs: refresh-token + family ops (Blobs, strong consistency)
- mcp-oauth.mjs: issue first refresh in a new family on the code grant; on
  refresh, rotate (supersede old, issue new) and detect reuse — replaying a
  superseded token revokes the whole family (theft signal -> forces re-auth);
  client-id binding enforced; metadata advertises refresh_token grant
- config.mjs: REFRESH_TOKEN_TTL_SEC (default 30d)

Verified live (functions:serve): code grant -> refresh; rotate -> new tokens,
new access works at /mcp (200); replaying the old token -> 400 reuse + family
revoked; the latest token then also fails (family_revoked). 44 unit tests pass.

Note: refresh re-issues from stored claims (doesn't re-check Auth0 each time) —
standard; periodic re-validation can be added later.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
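
The rotate / reuse / invalid decision described above can be sketched as a pure function. The record shapes (`supersededAt`, `expiresAt`, `revoked`) are hypothetical and chosen for illustration; only the three outcomes come from the commit message.

```javascript
// Sketch of a pure refresh decision. Field names are assumptions, not the PR's schema.
// record: the stored refresh token (looked up by its hash), or null if unknown.
// family: the token family the record belongs to, or null if missing.
function decideRefresh(record, family, nowMs) {
  if (!record || !family) return "invalid";      // unknown token
  if (family.revoked) return "invalid";          // family already revoked
  if (record.supersededAt != null) return "reuse"; // an old token was replayed: theft signal
  if (record.expiresAt <= nowMs) return "invalid"; // expired
  return "rotate";                               // supersede this token, issue a new one
}
```

On `"reuse"` the caller would revoke the whole family, so that the latest token also stops working and the user has to sign in again.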
@JakeSCahill

Contributor Author

M2 + M3 landed — client registration and refresh tokens (core AS feature set complete)

Building on M1 (AS core), the authorization server now supports the full client lifecycle:

M2 — client registration

  • DCR (POST /oauth/register, RFC 7591) — clients self-register, get a public client_id.
  • CIMD — a URL client_id is fetched + validated (https-only, loopback/private-host SSRF guard, timeout, size cap).
  • /authorize now resolves the client and validates redirect_uri before any redirect (open-redirect guard) and rejects unknown clients; /token binds the code to its client. Metadata advertises registration_endpoint + client_id_metadata_document_supported.

M3 — refresh tokens (rotation + reuse detection)

  • Code grant issues a refresh token; grant_type=refresh_token renews access without re-login.
  • Rotation (each refresh supersedes the old) + reuse detection via token families: replaying a superseded token revokes the whole family (theft signal → forces re-auth).

Verified live (functions:serve, mock upstream): register → authorize → token → /mcp (200); unknown client/bad redirect rejected; refresh rotates and the new access works; replaying an old refresh → reuse-revoked, and the latest token then also fails (family revoked). 44 unit tests pass.

Status

| Milestone | State |
|---|---|
| M1: AS core (jose, Blobs strong-consistency, PKCE, fail-closed) | ✅ live-verified, preview-green |
| M2: DCR + CIMD registration | ✅ DCR live-verified; CIMD unit-tested |
| M3: refresh + rotation + reuse-detection | ✅ live-verified |

Remaining before real use

  • Santi's Auth0 client_id — the only thing gating real logins (dev-mock until then; fail-closed in prod without it).
  • CIMD live-exercise (needs a hosted client doc; logic is unit-tested).
  • Polish: consent screen, /register rate-limit.
  • Rollout: set env (REDPANDA_OAUTH_CLIENT_ID, MCP_OAUTH_UPSTREAM=auth0, REQUIRE_AUTH=true), then enforce.

All on Netlify Blobs (strong consistency) — no DB/credits needed at current volume.

JakeSCahill and others added 2 commits June 17, 2026 12:45
…n B)

Per the product call: docs MCP auth is login-only for now (no inline account
creation / org provisioning). To onboard prospects without a Redpanda Cloud
account, /authorize now renders a small interstitial — "Continue with Redpanda
Cloud" + "Sign up at cloud.redpanda.com" — before redirecting to the IdP.

- lib/oauth/pages.mjs: loginInterstitialHtml() (pure, attribute-escaped — unit-tested)
- config.mjs: SIGNUP_URL (default https://cloud.redpanda.com) + LOGIN_INTERSTITIAL
  (set MCP_OAUTH_INTERSTITIAL=off to redirect straight through, e.g. if the
  signup link later lives on the Auth0 login page)
- mcp-oauth.mjs: /authorize returns the interstitial (200 HTML) instead of an
  immediate 302; the Continue link carries the upstream authorize URL

Verified live (functions:serve): /authorize returns the interstitial; following
Continue completes mock-idp -> /callback -> client redirect with a code.
82 tests pass (incl. HTML escaping / attribute-breakout guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
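
The attribute-escaping guard mentioned above can be illustrated with a small sketch. `escapeAttr` and `continueLink` are hypothetical names, not the PR's `loginInterstitialHtml()`; the point is only that the upstream authorize URL is escaped before it is placed inside an `href` attribute.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical fragment of the interstitial: the Continue link carries the upstream URL.
function continueLink(upstreamAuthorizeUrl) {
  return `<a href="${escapeAttr(upstreamAuthorizeUrl)}">Continue with Redpanda Cloud</a>`;
}
```

Escaping only prevents attribute breakout; checking that the URL uses an expected scheme and host is a separate concern that this sketch does not cover.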
- lib/oauth/ratelimit.mjs: per-IP fixed-window limiter for /oauth/register,
  backed by Netlify Blobs (strong consistency, since in-memory doesn't survive
  serverless invocations). Defaults: 20/hour/IP (MCP_OAUTH_REGISTER_LIMIT /
  _WINDOW_SEC). Fails open if the store is unavailable. Returns 429 when exceeded.
- how-to-use-these-docs.adoc: note the free signup path (cloud.redpanda.com,
  surfaced on the sign-in screen) and soften the org wording (org captured when
  available; otherwise attribution is by email domain).

Verified live (functions:serve, limit=2): register -> 201, 201, 429.
47 unit tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
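
A fixed-window limiter of the kind described above can be reduced to a pure step function. This is a sketch under assumptions: the entry shape and function name are hypothetical, and persistence (one entry per IP in the store) is left to the caller. The defaults mirror the 20/hour/IP figure from the commit.

```javascript
// Pure fixed-window step: given the stored entry for an IP, decide and return the next entry.
// entry: { windowStart: number (ms), count: number } or null/undefined when none exists.
function hitFixedWindow(entry, nowMs, limit = 20, windowSec = 3600) {
  const windowMs = windowSec * 1000;
  const expired = !entry || nowMs - entry.windowStart >= windowMs;
  const next = expired
    ? { windowStart: nowMs, count: 1 } // start a new window
    : { windowStart: entry.windowStart, count: entry.count + 1 };
  return { allowed: next.count <= limit, next };
}
```

With `limit = 2`, three consecutive calls yield allowed, allowed, denied, which corresponds to the 201, 201, 429 sequence in the live check. A caller that fails open would treat a store error as `allowed: true`.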
@JakeSCahill

Contributor Author

Tested the whole flow against the real integration Auth0 tenant today (integration-cloudv2.us.auth0.com), driven through Claude Code, and it works end to end.

What happened: Claude Code registered itself via DCR, hit our /authorize, got the "Continue with Redpanda Cloud" interstitial, bounced to the real Auth0 login (which went through the Okta SSO connection), came back to /mcp/callback, we exchanged the code and validated the ID token, pulled the verified email, issued our own token, and Claude Code then made authenticated calls to /mcp. The user record we captured shows domain: redpanda.com with a real Auth0 sub, so email extraction is working.

Request trace from the run: POST /oauth/register → GET /oauth/authorize → GET /mcp/callback → POST /oauth/token → several authenticated POST /mcp.

One log line that looks alarming but isn't an auth issue: Missing env var: KAPA_API_KEY. My local server didn't have the Kapa key, so ask_redpanda_question couldn't reach Kapa — but the authenticated /mcp calls still went through, so auth and transport are fine. Prod already has the Kapa key.

Also confirmed the organization_usage = require prompt works for existing users (I have an org on integration). The only case it would block is a brand-new signup before their Serverless org exists, which is acceptable since the signup link routes through the Cloud console and that provisions one.

So at this point everything is proven against real Auth0, not just the mock: DCR + CIMD, PKCE, the interstitial + signup link, federated login, email capture, token issuance, refresh with rotation/reuse-detection, and resource-server validation.

@JakeSCahill JakeSCahill marked this pull request as ready for review June 18, 2026 15:41
@JakeSCahill JakeSCahill requested a review from a team as a code owner June 18, 2026 15:41
JakeSCahill and others added 5 commits June 18, 2026 16:42
With Cloud login the email is already verified by Auth0, so blocking
personal-domain Cloud accounts (gmail etc.) is just friction with little
benefit. Flip the default off: we accept and capture every verified Cloud
login, and still record the email domain for attribution. Set
REQUIRE_WORK_EMAIL=true to re-enable the free/disposable 403 rejection.

- config.mjs + auth.mjs: default false (only true when explicitly 'true')
- test updated; docs 403 note reworded (personal emails accepted by default)
- 47 unit tests pass
…endpoints

CodeRabbit flagged the protected-resource metadata edge function for missing
OPTIONS/CORS preflight. Fixed there, and applied the same to the sibling
endpoints browser-based OAuth clients fetch cross-origin (authorization-server
metadata, JWKS, /token, /register): OPTIONS now returns 204 with the CORS
headers, and the JSON responses carry Access-Control-Allow-Origin. /authorize
and /callback are top-level navigations, so they don't need it.

Verified: OPTIONS -> 204 + CORS, GET discovery -> 200 + ACAO, OPTIONS /oauth/token -> 204. 47 tests pass.
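
The preflight handling described above can be sketched as follows. The header values are assumptions for illustration (the commit does not say which origin, methods, or headers are allowed), and `respond` is a hypothetical helper rather than the PR's code.

```javascript
// Assumed CORS policy for the JSON endpoints; the real values may be narrower.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization",
};

// Answer preflights with 204 and attach the CORS headers to JSON responses.
function respond(method, payload) {
  if (method === "OPTIONS") {
    return new Response(null, { status: 204, headers: CORS_HEADERS });
  }
  return new Response(JSON.stringify(payload), {
    status: 200,
    headers: { "Content-Type": "application/json", ...CORS_HEADERS },
  });
}
```

Top-level navigations such as /authorize and /callback never trigger a preflight, which is why they are left out.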
@JakeSCahill JakeSCahill marked this pull request as draft June 18, 2026 17:04
JakeSCahill and others added 2 commits June 18, 2026 19:32
… test email_verified

- clients.mjs: harden the CIMD SSRF guard against IPv6 — strip brackets and block
  ::1/::, IPv4-mapped (::ffff:), ULA (fc00::/7) and link-local (fe80::/10), in
  addition to the IPv4 private/loopback/link-local ranges. Extracted isBlockedHost
  and unit-tested it (incl. bracketed forms).
- mcp-oauth.mjs: require client_id in token requests for both grants (public
  clients don't authenticate, so RFC 6749 requires it) — missing -> 400
  invalid_request, mismatch -> 400 invalid_grant.
- tests: cover email_verified=false / absent (still allowed since SSO logins
  often omit it, but flagged unverified in the captured context).

Verified live: token without client_id -> 400, with -> 200; 52 unit tests pass.
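
A host check covering the ranges listed above might be sketched like this. It is not the PR's `isBlockedHost`: the parsing here is deliberately simple, it inspects only literal hostnames, and a DNS name that resolves to a private address is outside what this sketch can catch.

```javascript
// Sketch of a literal-host SSRF guard for the ranges named in the commit.
function isBlockedHost(hostname) {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  if (host.includes(":")) {
    // IPv6 literal
    if (host === "::" || host === "::1") return true; // unspecified, loopback
    if (host.startsWith("::ffff:")) return true;      // IPv4-mapped
    const first = parseInt(host.split(":")[0] || "0", 16);
    if ((first & 0xfe00) === 0xfc00) return true;     // ULA fc00::/7
    if ((first & 0xffc0) === 0xfe80) return true;     // link-local fe80::/10
    return false;
  }

  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // a DNS name; resolution checks are out of scope here
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

Passing `new URL(clientId).hostname` works for bracketed IPv6 literals because the brackets are stripped first.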
@JakeSCahill JakeSCahill marked this pull request as ready for review June 18, 2026 18:54
JakeSCahill and others added 10 commits June 18, 2026 20:13
- CIMD fetch: refuse redirects (redirect: 'error') and cap the response
  size with a streaming reader, so a malicious or huge client doc can't
  drive an SSRF or memory blowup
- Rate-limit CIMD client resolution per IP in /authorize (allowCimd),
  alongside the existing /register limit
- Pin RS256 when verifying our signing key and the upstream Auth0 ID token
- Capture email_verified on the user record, log line, and CRM payload
  (we record it rather than block, since SSO often omits it)
- Note the read-then-delete race for one-time codes/tokens (no Blobs CAS)
  and that the Postgres backend fixes it transactionally
- Clarify the MCP server is an OAuth resource server (not authless) and
  why GET/SSE isn't gated
- Fix the docs rate-limit sample (40 -> 60) to match the real limit
- Tests for the size cap, blocked redirect, and IPv6 SSRF guard
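
The redirect refusal and streaming size cap from the first bullet of the commit above can be sketched as below. The function names, the 64 KiB cap, and the 5-second timeout are assumptions for illustration; only `redirect: 'error'` and the idea of a streaming reader come from the commit.

```javascript
// Read a response body incrementally and abort once it exceeds maxBytes.
async function readCapped(response, maxBytes) {
  const reader = response.body.getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // stop downloading as soon as the cap is crossed
      throw new Error("client metadata document exceeds size cap");
    }
    chunks.push(value);
  }
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return new TextDecoder().decode(bytes);
}

// Hypothetical fetch wrapper: refuse redirects, bound the time and the size.
async function fetchClientDoc(url, { maxBytes = 64 * 1024, timeoutMs = 5000 } = {}) {
  const res = await fetch(url, {
    redirect: "error", // a redirect rejects the fetch instead of being followed
    signal: AbortSignal.timeout(timeoutMs),
    headers: { Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`unexpected status ${res.status}`);
  return JSON.parse(await readCapped(res, maxBytes));
}
```

Counting bytes as they arrive means the cap holds even when the server omits or misreports `Content-Length`.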
Add a privacy-policy link to the login interstitial (and the docs auth
section) so users know we collect their work email and track MCP usage
before they sign in. The URL is configurable via MCP_OAUTH_PRIVACY_URL
and defaults to redpanda.com/legal/privacy-policy.
Remove the affirmative query-handling promise from the login interstitial
and docs note. Queries are proxied to Kapa, so the claim is hard to stand
behind end-to-end; data handling belongs in the linked Privacy Policy where
it can be properly qualified. The notice keeps to what we need consent for:
collecting the work email and tracking usage.
Update the default SIGNUP_URL and the two sign-up references in the docs
to the dedicated sign-up page rather than the Cloud landing page.
Add a 'Staying signed in' subsection explaining that the MCP client
handles token renewal automatically: sign in once, the client refreshes
in the background, active users stay signed in, and 30 days idle triggers
a fresh (usually silent) sign-in. No manual token handling.
@JakeSCahill

Contributor Author

After thinking through the storage options for the OAuth layer, I’m leaning toward using Neon Postgres (via Netlify’s integration) as the system of record for auth state, rather than Netlify Blobs.
Even though this MCP is currently lightweight (public docs + agent access), the direction we’re heading in includes a shared authentication system for both the docs site and a chatbot with saved conversations. That introduces more durable identity state (sessions, refresh tokens, conversation links), which benefits from strong consistency and transactional guarantees.

Postgres gives us:

  • safe single-use semantics for authorization codes (transactional “check + consume”)
  • clean modeling for users, sessions, orgs, and conversations
  • a natural path to unify docs + chatbot identity under one system

Netlify Blobs feels fine for simple metadata or low-risk storage, but it doesn’t provide strong enough guarantees for OAuth flows where race conditions or replay could become an issue.
Given our current scale (~1k users/week), Neon is more than sufficient and keeps the system simple while leaving room for future expansion into persistent user identity and chat history.
