Skip to content

[HDX-4120] feat(api): support heatmap tiles in external dashboards API#2200

Merged
kodiakhq[bot] merged 14 commits into
mainfrom
alex/HDX-4120-external-api-heatmap
May 12, 2026
Merged

[HDX-4120] feat(api): support heatmap tiles in external dashboards API#2200
kodiakhq[bot] merged 14 commits into
mainfrom
alex/HDX-4120-external-api-heatmap

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

@alex-fedotyev alex-fedotyev commented May 5, 2026

Summary

Heatmap was the only builder-mode display type that did not round-trip through the external dashboards API. The serializer dropped it into the "unsupported" fall-through, so creating, fetching, and updating heatmap tiles via /api/v2/dashboards lost the config.

This wires up heatmap end-to-end on the external API: a dedicated select-item schema, an explicit case in both serialization directions, OpenAPI JSDoc, and tests.

Follow-up to #2107 (review feedback from @pulpdrew, who asked whether we had a follow-up ticket to update the external API for the new visualization type).

What's in the diff

  • Zod schemas (packages/api/src/utils/zod.ts): a heatmap select-item schema that exposes the literal aggFn: "heatmap" plus valueExpression, optional countExpression, alias, heatmapScaleType; and a heatmap chart-config schema with optional groupBy and numberFormat. Heatmap is added to the builder discriminated union only.
  • Conversion utilities (packages/api/src/routers/external-api/v2/utils/dashboards.ts):
    • Builder serializer: replaces the old case DisplayType.Heatmap: fall-through with an explicit case that reads heatmap-specific fields off config.select[0] and emits aggFn: "heatmap" on the external surface.
    • Builder deserializer: new case 'heatmap': mirroring the Pie pattern; maps the external aggFn: "heatmap" back to the internal aggFn: "count" that the editor form persists, while preserving countExpression, alias, and heatmapScaleType on the select item.
    • The raw-SQL switch is intentionally left untouched: heatmap stays in the unsupported fall-through there because heatmap rendering requires isBuilderChartConfig.
  • OpenAPI JSDoc (packages/api/src/routers/external-api/v2/dashboards.ts): HeatmapSelectItem and HeatmapBuilderChartConfig components, and heatmap added to the TileConfig oneOf and discriminator mapping.
  • Tests (packages/api/src/routers/external-api/__tests__/dashboards.test.ts): heatmap added to the existing "round-trip all supported chart types" tests for both POST and PUT, plus an explicit rejection test confirming raw-SQL heatmap tiles return 400.
  • packages/api/openapi.json: regenerated.

Notes for review

  • No raw-SQL heatmap variant. PR feat: Wire heatmap chart into dashboard editor and tile rendering #2107 made heatmap builder-only and DBDashboardPage.tsx requires isBuilderChartConfig for heatmap rendering, so the raw-SQL fall-through stays.
  • heatmapScaleType and countExpression are persisted on the per-select-item level (via DerivedColumnSchema in packages/common-utils/src/types.ts), not on the chart config root. The form binds them as series.0.heatmapScaleType / series.0.countExpression. The schema and conversion utilities follow that.
  • The aggFn translation (external "heatmap" ↔ internal "count") keeps the saved Mongo document identical to what the editor form produces, so heatmap tiles created via the API render the same way as ones created via the UI.

Tier

Lands as review/tier-4 because anything under packages/api/src/routers/external-api/ is on the critical-path list. Diff is ~250 prod lines (most of it OpenAPI JSDoc and Zod boilerplate); no schema migrations or auth changes.

Test plan

  • make ci-lint (yarn lint, tsc --noEmit, OpenAPI lint)
  • make ci-unit (common-utils + app)
  • CI runs the full integration suite (heatmap round-trip POST + PUT + raw-SQL rejection); local make dev-int requires Docker BuildKit which isn't available on this host.

Deep-review carryover (2026-05-07)

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 5, 2026

🦋 Changeset detected

Latest commit: d6f27b3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Minor
@hyperdx/app Minor
@hyperdx/otel-collector Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 12, 2026 4:45pm

Request Review

@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label May 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (2):
    • packages/api/src/routers/external-api/v2/dashboards.ts
    • packages/api/src/routers/external-api/v2/utils/dashboards.ts
  • Cross-layer change: touches frontend (packages/app) + backend (packages/api) + shared utils (packages/common-utils)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 9
  • Production lines changed: 746 (+ 493 in test files, excluded from tier calculation)
  • Branch: alex/HDX-4120-external-api-heatmap
  • Author: alex-fedotyev

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

E2E Test Results

All tests passed • 175 passed • 3 skipped • 1250s

Status Count
✅ Passed 175
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

<!-- claude-code-review -->

PR Review

✅ No critical issues found.

Implementation is clean and well-scoped:

  • Explicit DisplayType.Heatmap case added in both serializer and deserializer (packages/api/src/routers/external-api/v2/utils/dashboards.ts), replacing the previous unsupported fall-through.
  • Dedicated Zod schemas (externalDashboardHeatmapSelectItemSchema, externalDashboardHeatmapChartConfigSchema) with valueExpression.min(1) and select.length(1) matching the editor's validateChartForm rule.
  • Heatmap added only to the builder discriminated union (raw-SQL stays rejected at the zod boundary), matching DBDashboardPage.tsx's isBuilderChartConfig requirement.
  • Trace-source enforcement via getHeatmapTilesWithNonTraceSources mirrors the UI's ChartEditorControls source-picker gating.
  • Test coverage is thorough: full round-trip (POST + PUT), minimal-fields round-trip, raw-SQL rejection, non-trace-source rejection, empty-valueExpression rejection, multi-select rejection.
  • OpenAPI JSDoc and openapi.json regeneration are in sync.

Minor (non-blocking) observations:

  • getSources() is now called 3× in parallel inside Promise.all (was 2×) since getHeatmapTilesWithNonTraceSources does its own fetch rather than receiving a shared existingSources map. Consistent with the pre-existing pattern (getMissingSources and getSourceConnectionMismatches already each fetch independently), so no regression — but a future refactor could deduplicate.
  • The PR description mentions an optional groupBy on the heatmap chart config schema, but the actual Zod/OpenAPI schema correctly omits it (matching HeatmapSeriesEditor). The code is right; the description copy is slightly out of date.
  • In convertToExternalTileChartConfig, numberFormat: config.numberFormat is emitted unconditionally where elsewhere in the file the conditional-spread pattern (...(x !== undefined ? { x } : {})) is used for optional fields. JSON serialization strips the undefined so behavior is correct, just stylistically inconsistent.

@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Thanks for the review.

  • Optional field consistency (point 1): Good catch. Fixed in 9e09229convertToExternalHeatmapSelectItem now uses !== undefined to match convertToInternalTileConfig, so empty-string round-trips don't silently drop fields.

  • numberFormat: config.numberFormat (point 2): This is actually the prevailing pattern in the file (10+ uses across Line, StackedBar, Table, Number, Pie, and the raw-SQL variants). Leaving it for consistency with the surrounding code; happy to swap to the spread form here AND across the file in a follow-up if we want a sweep.

  • knip: previously addressed in c232564 (extracted convertToExternalHeatmapSelectItem helper that types its return as ExternalDashboardHeatmapSelectItem).

alex-fedotyev added a commit that referenced this pull request May 6, 2026
The external API surface in PR #2200 exposes where/whereLanguage at the
chart-config level for heatmaps (UI HeatmapSeriesEditor doesn't render
per-series filters; the persisted shape stores them once via
SelectSQLStatementSchema). The MCP heatmap schema was dropping them, so
once #2200 lands a save through hyperdx_save_dashboard would silently
discard the filter on read-back.

Add `where` and `whereLanguage` to mcpHeatmapTileSchema.config with the
same defaults the search tile uses (where: '', whereLanguage: 'lucene').
Extend the round-trip test with a Lucene filter, add an explicit
whereLanguage="sql" round-trip, and a negative test for an invalid
whereLanguage. Update the example string and changeset to document the
new fields.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR Review

Heatmap round-trip support looks solid: schemas, serializer, deserializer, OpenAPI, source-kind gate (REST + MCP), and tests for create/update/round-trip/rejection/legacy-corrupted GET all line up. Promise.all hoisting is a nice cleanup. A couple of doc-rot items only — no functional blockers.

  • ⚠️ Stale comment in heatmap deserializer (packages/api/src/routers/external-api/v2/utils/dashboards.ts:627-628): says "aggFn is the literal 'heatmap' on the external surface, mapped to the internal 'count' aggFn" — but commit 476fa998 dropped aggFn from externalDashboardHeatmapSelectItemSchema. The deserializer just hardcodes aggFn: 'count'; there is no external→internal aggFn translation any more. → Update the comment to reflect the current contract (no aggFn on external; displayType is the sole discriminator).
  • ⚠️ PR description out of sync with last commit: the "What's in the diff" section still describes "emits aggFn: 'heatmap' on the external surface" and the "external aggFn: \"heatmap\" ↔ internal aggFn: \"count\"" translation. After 476fa998 the external item has no aggFn at all. → Rewrite that paragraph so future reviewers/release notes aren't misled.
  • ℹ️ GET placeholder violates the published OpenAPI: the "corrupted heatmap" fallback returns valueExpression: '', which violates the schema's minLength: 1 / z.string().min(1). The trade-off is documented in-code and is preferable to silently downgrading to line, but clients that validate responses against the OpenAPI spec will reject these. Worth a one-liner in the OpenAPI HeatmapSelectItem description (or release notes) so external consumers know GET may return an empty valueExpression for legacy/corrupt docs.
  • ℹ️ Error ordering on PUT (minor): the heatmap source-kind 400 now fires before the "dashboard not found" 404 check, because existingDashboard is awaited in the same Promise.all and the check uses existingDashboard?.tiles ?? []. For a PUT against a non-existent dashboard ID whose payload contains a heatmap on a non-Trace source, the response is 400 ("Heatmap tiles require a Trace source"), not 404. Not security-relevant — just a minor UX regression. Consider gating the heatmap check behind if (existingDashboard) or moving the 404 short-circuit earlier.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Compound Engineering Review

✅ No critical issues found. Security audit returned no findings (tenant scoping, IDOR, info-leak, auth all sound). Type safety, test parity, and adherence to sibling patterns (Pie/Search/Markdown) are clean.

P2 — Important

  • P2 packages/app/src/components/SourceSelect.tsx:85 + ChartEditorControls.tsx:108allowedSourceKinds prop typed as mutable SourceKind[] forces [...HEATMAP_ALLOWED_SOURCE_KINDS] spread (no other call site spreads); allocates a new array each render → widen prop to ReadonlyArray<SourceKind> and drop the spread.
  • P2 packages/api/src/routers/external-api/v2/utils/dashboards.ts:103-109, 530-536 — conditional-spread ...(item.x !== undefined ? { x: item.x } : {}) repeated 6× for countExpression/alias/heatmapScaleType → replace with pick(item, ['countExpression','alias','heatmapScaleType']) (lodash already imported); preserves empty-string round-trip and removes the comment that motivated the verbose form.
  • P2 packages/api/src/routers/external-api/v2/dashboards.ts:1683-1722, 1918-1957 — POST/PUT each run 4 sibling validators in Promise.all, several re-querying sources; create+update branches are now byte-identical → extract validateDashboardReferences(teamId, tiles, filters) that fetches sources/connections once and returns a structured result, plus a respondIfInvalidReferences(res, result) helper to dedupe the 4 error blocks.
  • P2 packages/api/src/routers/external-api/v2/utils/dashboards.ts:514-528 — internal aggFn: 'count' + aggCondition: '' are encoding applyHeatmapDefaults's UI-form quirk in the API converter; if the editor default ever shifts (e.g. histogram), API-built and UI-built tiles diverge silently → share the default-builder via common-utils (buildInternalHeatmapSelectItem) imported from both the form and the converter, or at minimum add a unit test asserting parity so the next editor change fails loudly.
  • P2 packages/common-utils/src/guards.ts:13-35HEATMAP_ALLOWED_SOURCE_KINDS is a 1-element array wrapped in a ReadonlyArray<SourceKind> const + 4-line predicate + 9-line JSDoc with two consumers using different shapes (UI spreads array, API calls predicate); when metric-heatmaps need a richer "kind + metricType" rule, the kinds-array contract breaks → simplify to a single as const constant and let the API call .includes(...) inline, or expose only isHeatmapCompatibleSource(source) + a getHeatmapAllowedSourceKinds() accessor so the predicate can grow without the UI ossifying around the array.
  • P2 packages/api/src/routers/external-api/v2/utils/dashboards.ts:95-110convertToExternalHeatmapSelectItem param typed as Exclude<BuilderSavedChartConfig['select'][number], string> reads heatmap-only fields (countExpression, heatmapScaleType) that only exist on DerivedColumn → tighten the param to DerivedColumn (or a Pick<DerivedColumn, …>) so the contract reflects what the body requires.
  • P2 packages/api/src/routers/external-api/v2/utils/dashboards.ts:291-319, 524-538 — aggFn round-trip is silently lossy: serializer always emits 'heatmap', deserializer always writes 'count', regardless of the internal value; legacy/manually-written docs with aggFn: 'avg' would be relabeled without notice → either log a warning when item.aggFn !== 'count' && item.aggFn !== 'heatmap' in the serializer, or document the lossiness explicitly in the existing comment.
  • P2 packages/api/src/routers/external-api/v2/utils/dashboards.ts:303, 644-651 — two dead defensive guards: (a) typeof item === 'string' on a select element where the schema guarantees objects (string case is already caught one level up by Array.isArray), and (b) && tile.config.sourceId truthiness check on a field that objectIdSchema already validates as a non-empty string → drop both for clarity (or leave a one-line "belt-and-suspenders" comment).

alex-fedotyev pushed a commit that referenced this pull request May 6, 2026
Compound-review feedback on #2200, all P2:

- Drop redundant `whereLanguage.optional()`; the underlying
  `SearchConditionLanguageSchema` (alias `whereLanguageSchema`) is
  already `.optional()` and the sibling
  `externalDashboardSearchChartConfigSchema` writes it without the
  outer wrap.
- Drop the `where: externalConfig.where ?? ''` redundancy in the
  deserializer; the Zod schema declares `.default('')` so the field is
  always a string post-parse.
- Rename schema `HeatmapBuilderChartConfig` -> `HeatmapChartConfig`
  to match the sibling `SearchChartConfig` / `MarkdownChartConfig`
  naming. The `Builder` suffix is reserved for cases that disambiguate
  from a sibling `RawSqlChartConfig`; heatmap has no raw-SQL variant.
- `convertToExternalHeatmapSelectItem` now takes a non-optional item.
  The case-arm caller checks for missing/string/empty-valueExpression
  items and falls through to `defaultTileConfig` with a logger.warn
  instead of silently emitting an out-of-contract payload that
  violates the external schema's `min(1)` rule.
- Extract `HEATMAP_ALLOWED_SOURCE_KINDS` and
  `isHeatmapCompatibleSource` into common-utils. Both
  `ChartEditorControls.tsx` (UI) and the API's
  `getHeatmapTilesWithIncompatibleSources` (renamed from
  `...WithNonTraceSources`) reference the same set so UI and API
  gates move together.
- Add a parity comment on the deserializer pointing to
  `applyHeatmapDefaults` so the `aggCondition`/`aggConditionLanguage`
  defaults stay coupled to the editor's behaviour.
- Consolidate three pure-Zod schema-rejection tests
  (raw-SQL-heatmap absent, empty valueExpression, multiple select
  items) into a single parametrized `it.each` block; the runtime
  non-Trace-source rejection stays as its own test because it
  exercises the new `getHeatmapTilesWithIncompatibleSources` path.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

P2 cleanups addressed in e89989d:

  • Dropped redundant whereLanguage.optional() and where: ?? '' to match sibling chart-config arms
  • Renamed HeatmapBuilderChartConfigHeatmapChartConfig (the Builder suffix is reserved for cases that disambiguate from a sibling RawSql type; heatmap has no raw-SQL variant)
  • convertToExternalHeatmapSelectItem is now non-optional; the case-arm caller falls through to defaultTileConfig with logger.warn for legacy/corrupted Mongo docs that lack valueExpression
  • Extracted HEATMAP_ALLOWED_SOURCE_KINDS and isHeatmapCompatibleSource to common-utils so ChartEditorControls.tsx and getHeatmapTilesWithIncompatibleSources (renamed) reference the same set; UI and API gates now move together
  • Parity comment on the deserializer pointing at applyHeatmapDefaults for aggCondition/aggConditionLanguage
  • Three pure-Zod rejection tests consolidated into a single parametrized it.each; the runtime non-Trace-source rejection stays as its own test because it exercises the new code path

alex-fedotyev and others added 6 commits May 8, 2026 00:19
The HeatmapSeriesEditor in the UI binds a single SearchWhereInput to the
top-level `where` / `whereLanguage` and does not render any groupBy
input. The previous schema had it backwards: it carried groupBy (not
exposed in the UI, never set by the form) and was missing
where / whereLanguage (the only filter heatmap actually uses).

- zod: drop groupBy, add where + whereLanguage to
  externalDashboardHeatmapChartConfigSchema
- conversion utils: serializer emits chart-level where + whereLanguage;
  deserializer maps them onto BuilderSavedChartConfig.where /
  whereLanguage instead of stuffing them into select-item aggCondition
- OpenAPI: HeatmapBuilderChartConfig has where + whereLanguage instead
  of groupBy; description notes the row-level filter is at chart level
- tests: POST round-trip exercises sql where ("ServiceName = 'api'")
  with heatmapScaleType: 'log'; PUT round-trip exercises lucene where
  ("service:api") with heatmapScaleType: 'linear'. Both languages and
  both scale types are now covered end-to-end.
- regenerated openapi.json
…ression

Two more gaps found while auditing the schema against the heatmap UI surface:

1. The heatmap UI's source picker is restricted to SourceKind.Trace
   (ChartEditorControls.tsx:103-107: allowedSourceKinds={[SourceKind.Trace]}).
   The external API previously accepted any source kind, so a metric or log
   source would round-trip cleanly through /api/v2/dashboards and produce a
   tile that does not render in the UI. Adds getHeatmapTilesWithNonTraceSources
   and wires it into both POST and PUT alongside the existing source /
   connection validators, returning 400 with a descriptive message.

2. validateChartForm in the editor rejects empty valueExpression on heatmap
   ("Value expression is required for heatmap charts"). The Zod schema
   accepted empty string. Tightens to z.string().min(1).max(10000) and
   updates the OpenAPI minLength to match.

Tests added: rejection on non-Trace source (with assertion on the error
message), rejection on empty valueExpression, rejection on multi-item
select array.

Regenerated openapi.json.
The existing realistic-payload round-trip exercises every optional field
populated. The minimal path (no countExpression / alias /
heatmapScaleType / where / whereLanguage / numberFormat) is what the
deserializer's !== undefined guards in v2/utils/dashboards.ts exist to
protect, and was uncovered until now.

The Zod schema applies where: z.string().optional().default(''), so the
round-trip surfaces an empty string for where while the other optional
fields stay undefined. The expected response keeps explicit on the
applied default rather than asserting strict equality with the request.
When the external API request omits whereLanguage on a heatmap tile,
the deserializer at v2/utils/dashboards.ts:514 fills it with 'lucene'
so the Mongo doc stays consistent across heatmap and non-heatmap chart
types. The minimal-fields round-trip needs to expect whereLanguage on
the response, not undefined.

The schema-level constraint stays optional (matching the OpenAPI
shape); the default lives at the persistence boundary, not the
contract boundary.
Compound-review feedback on #2200, all P2:

- Drop redundant `whereLanguage.optional()`; the underlying
  `SearchConditionLanguageSchema` (alias `whereLanguageSchema`) is
  already `.optional()` and the sibling
  `externalDashboardSearchChartConfigSchema` writes it without the
  outer wrap.
- Drop the `where: externalConfig.where ?? ''` redundancy in the
  deserializer; the Zod schema declares `.default('')` so the field is
  always a string post-parse.
- Rename schema `HeatmapBuilderChartConfig` -> `HeatmapChartConfig`
  to match the sibling `SearchChartConfig` / `MarkdownChartConfig`
  naming. The `Builder` suffix is reserved for cases that disambiguate
  from a sibling `RawSqlChartConfig`; heatmap has no raw-SQL variant.
- `convertToExternalHeatmapSelectItem` now takes a non-optional item.
  The case-arm caller checks for missing/string/empty-valueExpression
  items and falls through to `defaultTileConfig` with a logger.warn
  instead of silently emitting an out-of-contract payload that
  violates the external schema's `min(1)` rule.
- Extract `HEATMAP_ALLOWED_SOURCE_KINDS` and
  `isHeatmapCompatibleSource` into common-utils. Both
  `ChartEditorControls.tsx` (UI) and the API's
  `getHeatmapTilesWithIncompatibleSources` (renamed from
  `...WithNonTraceSources`) reference the same set so UI and API
  gates move together.
- Add a parity comment on the deserializer pointing to
  `applyHeatmapDefaults` so the `aggCondition`/`aggConditionLanguage`
  defaults stay coupled to the editor's behaviour.
- Consolidate three pure-Zod schema-rejection tests
  (raw-SQL-heatmap absent, empty valueExpression, multiple select
  items) into a single parametrized `it.each` block; the runtime
  non-Trace-source rejection stays as its own test because it
  exercises the new `getHeatmapTilesWithIncompatibleSources` path.
Deep-review on #2200 flagged the previous comment as overstating
parity. The editor's `applyHeatmapDefaults` writes `numberFormat:
{ output: 'duration', factor: 0.001 }`, `series.0.countExpression:
'count()'`, and `aggConditionLanguage: getStoredLanguage() ?? 'lucene'`,
while this converter hardcodes `'lucene'` for the language and passes
`numberFormat`/`countExpression` through verbatim from the external
payload (or leaves them absent).

Comment now enumerates the divergences and the rendering implications:
the renderer does not read `aggConditionLanguage` for heatmap tiles
(heatmap has no per-select where), and an API-built tile renders
without duration formatting unless the caller specifies `numberFormat`.
No behavior change.
Six findings from the bot review on #2200:

P0/P1
- MCP schema rejected heatmap tiles. Adds `mcpHeatmapTileSchema`
  (mirroring `externalDashboardHeatmapChartConfigSchema`) and includes
  it in the `mcpTilesParam` union so `hyperdx_save_dashboard` accepts
  the same heatmap shape the REST endpoint accepts.
- MCP create/update did not run the heatmap source-kind gate. Both
  paths now call `getHeatmapTilesWithIncompatibleSources` after schema
  validation so an MCP-issued create cannot persist a heatmap that
  the REST PUT would reject with 400 on the next round-trip. The
  update path scopes the check to heatmap tiles whose `sourceId` or
  `displayType` changed in this request, matching the REST PUT.
- `buildQueryGuidePrompt` did not document heatmap. Adds entries to
  TILE TYPE GUIDE, PER-TILE TYPE CONSTRAINTS, AGGREGATION FUNCTIONS,
  and COMMON MISTAKES so the LLM cannot skip the Trace-source and
  non-empty-`valueExpression` rules when generating tiles.

P2
- `convertTileToExternalChart` returned `displayType: 'line'` via
  `defaultTileConfig` for any heatmap tile whose internal doc lacked a
  non-empty `valueExpression`. A GET / mutate / PUT round-trip would
  then silently overwrite the heatmap with a line chart in Mongo
  (data loss). The heatmap branch now emits a heatmap-shaped
  placeholder with empty `valueExpression`, so the response preserves
  `displayType` and a re-PUT surfaces the breakage as a clear schema
  error from the input schema's `min(1)` rule rather than dropping
  the heatmap.
- The PUT-time `getHeatmapTilesWithIncompatibleSources` ran on every
  request regardless of whether heatmap tiles changed, so a source
  whose `kind` was changed to non-Trace after the dashboard was
  originally accepted wedged every PUT. The check now scopes to
  heatmap tiles whose `sourceId` or `displayType` changed compared
  to the existing dashboard via a new `filterChangedHeatmapTiles`
  helper.
- POST and PUT each ran three separate `getSources(team)` queries
  (one per validation helper). The helpers now accept a pre-fetched
  `SourceForValidation[]`, hoisted into a single
  `fetchSourcesForValidation` call shared across
  `getMissingSources`, `getSourceConnectionMismatches`, and
  `getHeatmapTilesWithIncompatibleSources`. Same change applied to
  the MCP create/update paths. Drops two redundant DB round-trips
  per request.

Tests cover the MCP schema accept/reject paths, the heatmap
source-kind gate at the MCP layer, the GET placeholder behavior, the
PUT scoping (no 400 on unrelated edits when the underlying source's
`kind` was changed later), and that the prompt documents heatmap in
all four sections.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Pushed a fix-pack addressing the deep-review findings in commit a217900. Mapping:

P0/P1

  1. schemas.ts:312 MCP heatmap schema gap. Adds mcpHeatmapTileSchema mirroring externalDashboardHeatmapChartConfigSchema and includes it in the mcpTilesParam union.
  2. saveDashboard.ts:115 MCP source-kind gate. Both the create and update paths now run getHeatmapTilesWithIncompatibleSources. The update path scopes the check to heatmap tiles whose sourceId or displayType changed in the request, matching the REST PUT.
  3. content.ts:27 heatmap missing from prompt. Added entries to TILE TYPE GUIDE, PER-TILE TYPE CONSTRAINTS, AGGREGATION FUNCTIONS, and COMMON MISTAKES (the bot called out three sections; AGGREGATION FUNCTIONS was added too because aggFn: "heatmap" is heatmap-only).

P2

  1. utils/dashboards.ts:309 silent line downgrade. convertTileToExternalChart now emits a heatmap-shaped placeholder with empty valueExpression instead of falling through to defaultTileConfig. A re-PUT surfaces the breakage as a clear schema error rather than persisting displayType: 'line'.
  2. dashboards.ts:1921 PUT wedging on unrelated edits. New filterChangedHeatmapTiles helper restricts the source-kind check to heatmap tiles whose sourceId or displayType changed. Same scoping applied to the MCP update path.
  3. dashboards.ts:1691 triple getSources(team). Hoisted into a single fetchSourcesForValidation call shared across all three helpers (REST POST, REST PUT, MCP create, MCP update). Drops two redundant DB round-trips per request.

Tests cover the MCP schema accept/reject paths, the heatmap source-kind gate at the MCP layer, the GET placeholder behavior, the PUT scoping (no 400 on unrelated edits when the underlying source's kind was changed later), and the prompt content.

Skipping the P2/P3 items the task spec marked as deferred (renames, cosmetic TS, aggFn translation tracked separately, duplicate POST/PUT validation ladder, /api/v2/charts/series heatmap gap, P3 nitpicks).

Copy link
Copy Markdown
Contributor

@pulpdrew pulpdrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small thoughts on the API spec, otherwise LGTM

Comment thread packages/api/src/utils/zod.ts Outdated
aggFn: z.literal('heatmap'),
valueExpression: z.string().min(1).max(10000),
countExpression: z.string().max(10000).optional(),
alias: z.string().max(10000).optional(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alias isn't relevant to heatmaps, is it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Dropped from the external schema, the OpenAPI JSDoc, the MCP companion, and the converter (both directions). The per-feature code map at notes/repo-conventions/hyperdx/heatmap.md already flagged alias as "not in heatmap UI" and I kept it anyway for a hypothetical round-trip preservation case that never fires, since the UI never sets it for heatmap. Fixed in 476fa99.

// asks for it.
select: [
{
aggFn: 'count',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're just always saving count here, could we just remove the aggFn property from the OpenAPI spec / zod schema for the heatmap select item? Why make the user provide it if we ignore it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Dropped aggFn from the zod schema, the OpenAPI JSDoc, the MCP companion schema, the MCP prompt examples, and the placeholder-on-read path. The chart-level displayType: "heatmap" is the discriminator; the per-select-item literal was pure noise. Fixed in 476fa99.

kodiakhq Bot pushed a commit that referenced this pull request May 11, 2026
…ontainer/tab refs (#2201)

## Summary

PR #2015 added a dashboard organization layer (containers with optional tabs, plus per-tile `containerId` and `tabId`) but the v2 external API was not updated to round-trip the new fields. External integrations that build dashboards programmatically had no way to use the new layer.

This wires the full set of fields through `CREATE` / `GET` / `LIST` / `UPDATE` on `/api/v2/dashboards`. Dashboards saved without containers round-trip unchanged.

Closes #2150. Follow-up to #2015 (commit 7665fbe).

## What's in scope

- Dashboard body Zod schema gains `containers: DashboardContainer[]?` (imported from `@hyperdx/common-utils`) and the tile schema gains `containerId?` and `tabId?`.
- `convertToExternalDashboard` now emits `containers` (only when at least one is present, so dashboards without the layer round-trip with the field absent).
- `convertTileToExternalChart` and `convertToInternalTileConfig` propagate `containerId` and `tabId`. The legacy `series`-format translator in `externalApi.ts` also propagates them so both code paths preserve the fields.
- The `containers: 1` projection is added to the Mongoose `find` and `findOne` calls.
- New cross-field validation on the body schema:
  - container ids unique within a dashboard
  - tab ids unique within a container
  - tile `containerId` resolves to a real container
  - tile `tabId` resolves to a tab inside that container
  - tile `tabId` requires `containerId` to be set
- OpenAPI JSDoc additions for `DashboardContainer`, `DashboardContainerTab`, the new tile fields, and the new dashboard field on `Dashboard` / `CreateDashboardRequest` / `UpdateDashboardRequest`. `openapi.json` regenerated.
- A changeset entry.

## Out of scope

Each item below has a tracking issue so the gap is visible after merge.

- The legacy `type: 'section'` discriminator removed in #2015 (commit 7665fbe). Not emitted, not validated against. Legacy documents parse cleanly via Zod's strip-unknown. No follow-up needed; documenting for clarity.
- Container-level `repeat` field (future work per #2015 review notes). Tracked in #2213.
- The MCP `hyperdx_save_dashboard` tool is unchanged in this PR. Its inline `inputSchema` does not yet expose `containers`, and adding it there is its own follow-up; doing both at once would tangle two surfaces. Tracked in #2212.
- Heatmap chart-type round-trip lands separately in #2200.
- Per-tile `onClick` drill-down round-trip is tracked in #2214.

## Tier

The triage classifier marks `packages/api/src/routers/external-api/v2/*` as critical-path, so this lands as **Tier 4** by directory rule, even though the diff is small (~284 prod lines) and additive. Splitting further would separate the body schema, the conversion utilities, and the route wiring from each other and not actually reduce review burden. Happy to break this up if there's a preferred way to slice it.

## Test plan

- [x] `yarn ci:lint` (lint + tsc + spectral) on `@hyperdx/common-utils`, `@hyperdx/api`, `@hyperdx/app`
- [x] `yarn knip` (no new unused exports)
- [x] Integration: `yarn jest dashboards.test.ts -t "Containers and tabs"`, all 8 new tests pass
- [x] Integration: full `yarn jest dashboards.test.ts`, 86/86 tests pass (no regressions in old or new format suites)
- [x] Integration: `yarn jest src/mcp/__tests__/dashboards.test.ts`, 19/19 MCP dashboard tests pass (the MCP body schema shares with the external API body schema, so this confirms the new validations don't break the MCP path)
- [x] `openapi.json` regenerated and committed; spectral lint passes
@kodiakhq kodiakhq Bot removed the automerge label May 11, 2026
@kodiakhq
Copy link
Copy Markdown
Contributor

kodiakhq Bot commented May 11, 2026

This PR currently has a merge conflict. Please resolve this and then re-add the automerge label.

Review feedback from pulpdrew on #2200:

- "Alias isn't relevant to heatmaps, is it?" (zod.ts:310)
- "If we're just always saving `count` here, could we just remove the
  `aggFn` property from the OpenAPI spec / zod schema for the heatmap
  select item? Why make the user provide it if we ignore it?"
  (utils/dashboards.ts:561)

Both call out the same root failure: the external heatmap select item
schema carried fields the heatmap UI and the converter do not honor.

- The chart-level discriminator is `displayType: "heatmap"`. The
  per-select-item `aggFn: "heatmap"` literal was pure noise: the
  converter always wrote `aggFn: "count"` internally regardless. Drop
  it from the zod schema, the OpenAPI JSDoc, the MCP companion
  schema, the MCP prompt examples, and the placeholder-on-read path.
- `HeatmapSeriesEditor.tsx` does not render an alias input
  (per-feature code map at
  notes/repo-conventions/hyperdx/heatmap.md line 70 says: "not in
  heatmap UI"). The previous schema kept alias for a hypothetical
  "preserved through round-trip" case that never triggers because the
  UI never sets it. Drop it from the schema, the OpenAPI JSDoc, the
  converter (both directions), the MCP companion schema, and the
  tests that pass it.

Tests updated to use the new minimal shape `{ valueExpression: "..." }`.
The minimal-round-trip test, the rejection-cases parametrized test, the
non-Trace-source test, the multi-tile round-trip test, the PUT
round-trip test, and the PUT-scoping test all drop `aggFn` and `alias`
from their fixtures.

openapi.json regenerated.

No behaviour change for callers who weren't sending these fields
(serialization path strips them on read so existing dashboards with
internal `alias` set will not surface it through the external API; the
UI never sets it for heatmap, so this is a no-op for UI-created tiles).
External-API callers that were previously sending `aggFn: "heatmap"` or
`alias: "..."` will now see those fields ignored by the schema's
default unknown-key-pass-through behavior; the converter no longer
reads them.
Resolves conflicts with the containers/tabs PR #2201 (41395ca) which
landed while this branch was iterating on deep-review feedback.

Conflicts and resolution:

- packages/api/src/routers/external-api/v2/dashboards.ts
  - Imports: keep IDashboard (used by Partial<IDashboard> setPayload),
    drop getSources (replaced by fetchSourcesForValidation helper from
    the deep-review refactor).
  - getSourceConnectionMismatches signature: keep the sync (sources,
    tiles) shape from this branch; the async (team, tiles) shape on
    main re-fetched sources, which the deep-review explicitly flagged
    as redundant.
  - POST endpoint: combine main's tileRefIssues container/tab check
    AHEAD of this branch's hoisted-fetch + 4-validator block. Both
    additions are orthogonal; the tile-ref check runs first because
    it short-circuits on a bad request body and avoids any DB calls.
  - PUT endpoint: drop main's standalone Dashboard.findOne; the
    hoisted Promise.all in this branch already fetches existingDashboard
    once and reuses it. Bumped the projection from { tiles: 1, filters:
    1 } to { tiles: 1, filters: 1, containers: 1 } so the new
    tile-ref check sees the existing container set.

- packages/api/src/routers/external-api/v2/utils/dashboards.ts
  - Imports: combine main's validateDashboardContainersStructure /
    validateDashboardTileContainerRefs from dashboardValidation with
    this branch's isHeatmapCompatibleSource from guards.

Verified yarn workspace @hyperdx/api ci:lint clean after rebuilding
common-utils; the 19 unit tests in utils/__tests__/dashboards.test.ts
pass. Integration tests (__tests__/dashboards.test.ts, mcp/__tests__)
need the make dev-int harness.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Rebased onto main to clear the kodiakhq conflict.

The merge commit (d2795e02) resolves the overlap with the containers/tabs PR #2201:

  • Imports: kept IDashboard (used by Partial<IDashboard> setPayload), dropped getSources since the deep-review refactor replaced it with fetchSourcesForValidation.
  • getSourceConnectionMismatches: kept the sync (sources, tiles) signature from this branch; the async (team, tiles) version on main re-fetched sources, which the deep-review flagged as redundant.
  • POST endpoint: kept main's tileRefIssues container/tab check ahead of this branch's hoisted-fetch block, so a bad request body short-circuits before any DB calls.
  • PUT endpoint: dropped main's standalone Dashboard.findOne since existingDashboard is already fetched in the hoisted Promise.all here; bumped that projection to include containers: 1 so the new tile-ref check sees the existing container set.

Your two API-spec thoughts are addressed in 476fa998:

  1. zod.ts:310 alias on heatmap: dropped from the external schema, the OpenAPI JSDoc, the MCP companion, and both directions of the converter.
  2. dashboards.ts:558 always-count aggFn: dropped aggFn from the heatmap select-item zod schema, OpenAPI JSDoc, MCP companion schema, MCP prompt examples, and the placeholder-on-read path. The chart-level displayType: "heatmap" is the discriminator; the per-select-item literal was noise.

yarn workspace @hyperdx/api ci:lint is clean after rebuilding common-utils; the 19 unit tests in utils/__tests__/dashboards.test.ts pass.

Mind converting the COMMENTED review to an approval if it still reads LGTM with the thoughts addressed?

…l-api-heatmap

# Conflicts:
#	packages/api/src/mcp/__tests__/dashboards.test.ts
#	packages/api/src/mcp/tools/dashboards/saveDashboard.ts
#	packages/api/src/mcp/tools/dashboards/schemas.ts
#	packages/api/src/routers/external-api/v2/dashboards.ts
#	packages/api/src/routers/external-api/v2/utils/dashboards.ts
alex-fedotyev added a commit that referenced this pull request May 12, 2026
Brandon flagged on #2199 that the schema unit tests only assert that
zod accepts the shape and don't prove that an agent-issued
hyperdx_save_dashboard call actually persists a heatmap with every
MCP-specific field. The bare-minimum heatmap test that lands with the
#2200 heatmap external-API work covers only the required
valueExpression; this test drives the full round-trip with
heatmapScaleType, countExpression, numberFormat, where, and
whereLanguage all set, then mutates them on PUT and re-reads to
confirm the new values stick.

Drive-by: drops an em-dash from a sibling test comment to satisfy the
prose-lint hook on this file.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Rebased on top of the recent batch of merges (#2156, #2174, #2201, #2250, #2260, #2262) in merge commit d6f27b3e. Locally clean on make ci-lint and the common-utils + app unit suites. CI is green now.

Net change vs. your last look (commit 476fa998):

  • The hoisted-sources optimization in the POST/PUT handlers now also threads getMissingOnClickDashboards and getInvalidOnClickSearchSources from feat: Add custom onClick field to external dashboards API #2156 through the same Promise.all, so the request-time fanout stays the same.
  • existingDashboard projection on the MCP update path picks up containers: 1 so the containers/tabs ref check from feat(external-api): round-trip dashboard containers, tabs, and tile container/tab refs #2201 can fall back to the persisted set when the PUT body omits containers.
  • The heatmap-source-kind check runs after sourceConnectionMismatches and before the onClick checks. Order matches the v2 router.
  • MCP integration test in packages/api/src/mcp/__tests__/dashboards.test.ts keeps the heatmap accept/reject + non-Trace-source coverage from this branch alongside the new containers/tabs and numberFormat round-trip tests from main.

Drew's two nits from the 2026-05-09 review (drop alias, drop aggFn from the heatmap select schema) landed in 476fa998 and survive the merge unchanged.

@kodiakhq kodiakhq Bot merged commit 4d22d4b into main May 12, 2026
20 checks passed
alex-fedotyev added a commit that referenced this pull request May 12, 2026
Brandon flagged on #2199 that the schema unit tests only assert that
zod accepts the shape and don't prove that an agent-issued
hyperdx_save_dashboard call actually persists a heatmap with every
MCP-specific field. The bare-minimum heatmap test that lands with the
#2200 heatmap external-API work covers only the required
valueExpression; this test drives the full round-trip with
heatmapScaleType, countExpression, numberFormat, where, and
whereLanguage all set, then mutates them on PUT and re-reads to
confirm the new values stick.

Drive-by: drops an em-dash from a sibling test comment to satisfy the
prose-lint hook on this file.
kodiakhq Bot pushed a commit that referenced this pull request May 12, 2026
## Summary

Adds a thorough save → get → update → get round-trip test for the
heatmap MCP path. Brandon flagged on this PR that the schema unit
tests at `mcp/tools/dashboards/__tests__/schemas.test.ts` only assert
that zod accepts the shape and don't prove that an agent-issued
`hyperdx_save_dashboard` call actually persists a heatmap with every
MCP-specific field. The bare-minimum heatmap test that landed with
#2200 covers only the required `valueExpression`; this test
exercises `heatmapScaleType`, `countExpression`, `numberFormat`,
`where`, and `whereLanguage` all at once, then mutates them on PUT
and re-reads to confirm the new values stick.

## Backstory

The MCP heatmap schema work originally drafted on this branch
(`mcpHeatmapSelectItemSchema`, `mcpHeatmapTileSchema`, the few-shot
example, the prompt guides) was absorbed into #2200 during the
convergence on the deep-review feedback. #2200 has merged on `main`,
so the only delta this PR adds on top is the integration test.

## What's in this PR

- One new test in `packages/api/src/mcp/__tests__/dashboards.test.ts`:
  `should round-trip a fully-specified heatmap tile through save, get, and update`
- Drive-by: drops one em-dash from a sibling test comment in the
  same file so the prose-lint hook passes when this file is touched

## Test plan

- [x] `yarn workspace @hyperdx/api lint` clean
- [x] `yarn workspace @hyperdx/api tsc --noEmit` clean
- [x] `yarn workspace @hyperdx/api docgen` produces no openapi.json diff
- [ ] CI runs the integration suite. Local `make dev-int` needs
      Docker BuildKit which isn't available on this host.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants