fix(experiments): flatten propagated metadata#1641
Conversation
|
@claude review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c94c5a69fc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
c94c5a6 to
70892a3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70892a36f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Deferring to a human — per your own PR notes this should stay draft until the server accepts flattened langfuse.experiment.metadata.<key> attributes, and the rollout/legacy-compat angle raised by another reviewer is a cross-system call worth a human sign-off.
Extended reasoning...
Overview
The PR switches experiment metadata (experiment_metadata / experiment_item_metadata) from a single JSON-serialized attribute to per-key flattened attributes (langfuse.experiment.metadata.<key>), so each value is independently validated against the 200-char propagation limit. The change spans langfuse/_client/attributes.py (new _flatten_and_serialize_metadata_values helper), client.py (call-site swap in _process_experiment_item), and propagation.py (dict routing in _propagate_attributes, fixed span-key lookup in _get_propagated_attributes_from_context, and multi-namespace baggage decoding in _get_span_key_from_baggage_key). Tests are updated and a new e2e guard is added.
Security risks
No meaningful new security surface. Metadata keys/values are still length-validated per entry; baggage propagation (as_baggage=True) continues to be opt-in, and the new baggage-decode loop only adds additional known metadata namespaces — no user-controlled prefix expansion.
Level of scrutiny
Higher than a typical bug fix. This is a wire-format change in trace/experiment propagation that interacts with server-side ingestion. You explicitly noted the PR should stay draft until the server parses the flattened attribute, and a peer reviewer raised a P1 about mixed-version deployments dropping legacy langfuse.experiment.metadata. Those are exactly the kinds of cross-system decisions (dual-write vs. gate on server rollout) that warrant a human call rather than a shadow approval.
Other factors
The per-file logic looks internally consistent: _get_propagated_span_key now includes "metadata" (fixing the previous fallback path), and _set_propagated_attribute uses the resolved span_key instead of hardcoded TRACE_METADATA. The one nit worth flagging separately from the ingestion question: in tests/unit/test_propagate_attributes.py the substring "experiment_metadata' value is over 200 characters" can no longer appear under the new per-key warning format (experiment_metadata.<subkey>' value is over 200 characters), so that assertion is effectively a no-op — greptile already flagged this. Outstanding P1/P2 comments from other reviewers have not yet been addressed, which is another reason to hold off on auto-approval.
Summary
Root Cause
run_experiment()serialized the entire experiment metadata dict into onelangfuse.experiment.metadatastring before propagation. The propagation layer enforces a 200-character limit per propagated string, so moderately sized metadata dicts were dropped even when each individual metadata value was short enough.Impact
Experiment metadata now propagates as attributes like
langfuse.experiment.metadata.job_name, matching the per-key behavior used for trace metadata. This avoids dropping valid metadata just because the combined serialized object is larger than 200 characters.Validation
uv run --frozen ruff check langfuse/_client/attributes.py langfuse/_client/client.py langfuse/_client/propagation.py tests/unit/test_propagate_attributes.py tests/e2e/test_experiments.pyuv run --frozen mypy langfuse --no-error-summaryuv run --frozen pytest tests/unit/test_propagate_attributes.py -qruff-check,ruff-format, andmypysuccessfully.Notes
The new e2e test is intentionally a server-ingestion guard. In this local environment I could not run it end-to-end because no Langfuse server or
LANGFUSE_*e2e credentials were available. The adjacent server checkout currently appears to parse only the legacy JSONlangfuse.experiment.metadataattribute, so this SDK PR should stay draft until matching server ingestion support forlangfuse.experiment.metadata.<key>is available in the CI server image.Disclaimer: Experimental PR review
Greptile Summary
Replaces the single-blob serialization of
experiment_metadata/experiment_item_metadatawith per-key flattening (langfuse.experiment.metadata.<key>), matching the pattern already used for trace metadata and allowing each value to be validated independently against the 200-char propagation limit. The propagation, context-read, and baggage-decode paths are all updated consistently, and the_get_propagated_span_keymap now correctly includes"metadata"so it is no longer misresolved through the fallback.Confidence Score: 4/5
Safe to merge; only P2 findings in tests (redundant import, stale assertion pattern).
Core logic changes across attributes.py, propagation.py, and client.py are coherent and well-tested. The two P2 issues are confined to the test file and do not affect production behavior.
tests/unit/test_propagate_attributes.py — redundant import and stale warning-text assertion.
Important Files Changed
_flatten_and_serialize_metadata_valuesto recursively flatten nested dicts into dot-notation string pairs; logic is correct and handles None/empty inputs safely.experiment_metadataandexperiment_item_metadatafrom a single serialized JSON string to a flattened dict;item_metadatacorrectly guarded withisinstance(..., dict)check._get_propagated_attributes_from_context, and adds"metadata"to_get_propagated_span_key; baggage decoding handles multiple metadata namespaces without ambiguity.Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "fix(experiments): flatten propagated met..." | Re-trigger Greptile
Context used:
Learned From
langfuse/langfuse-python#1387