
fix(optimization): handle None metric scores in LocalEvalSampler #5415

Open

JesserHamdaoui wants to merge 3 commits into google:main from JesserHamdaoui:fix/5403-LocalEvalSampler-TypeError

Conversation

@JesserHamdaoui

Fixes #5403


Summary

When running adk optimize, if a metric evaluation fails (e.g., due to a transient API error, missing rubrics, or a malformed LLM-judge response that raises a JSONDecodeError), local_eval_service.py gracefully catches the exception and returns an EvaluationResult with a None score and NOT_EVALUATED status.

However, LocalEvalSampler._extract_eval_data then unconditionally attempts to round this value, raising TypeError: type NoneType doesn't define __round__ method and crashing the entire optimization loop instead of safely skipping or reporting the failed case.

Changes

  • google/adk/optimization/local_eval_sampler.py: Guarded the metric score rounding step in _extract_eval_data.
    • Before: "score": round(eval_metric_result.score, 2)
    • After: "score": round(eval_metric_result.score, 2) if eval_metric_result.score is not None else None
    • This correctly maintains the None value in the diagnostic trace data for failed evals.
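The guard can be illustrated in isolation (a minimal sketch; `round_score` is a hypothetical standalone helper for demonstration only, while the actual change lives inside `LocalEvalSampler._extract_eval_data`):

```python
from typing import Optional

def round_score(score: Optional[float]) -> Optional[float]:
    # Preserve None for failed (NOT_EVALUATED) metrics instead of letting
    # round(None, 2) raise TypeError and crash the optimization loop.
    return round(score, 2) if score is not None else None

print(round_score(0.876))  # 0.88
print(round_score(None))   # None
```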

Huge shoutout to the issue author @msteiner-google for the detailed bug report, root cause analysis, and for suggesting the fix!


Motivation

Optimization loops can run for a long time and make dozens of LLM calls. If a single evaluation case fails due to an intermittent network issue or a temporary rate limit, the NOT_EVALUATED status is the correct fallback. Crashing the entire adk optimize run because of a missing None check wastes compute, time, and API quotas. By preserving None, the optimizer can safely continue and log that the metric did not produce a score.


Test plan

Unit Tests:

  • Added test_extract_eval_data_preserves_none_metric_score in tests/unittests/optimization/local_eval_sampler_test.py to verify that _extract_eval_data preserves "score": None and retains the proper NOT_EVALUATED status without throwing a TypeError.
  • Ran targeted test with uv run pytest tests/unittests/optimization/local_eval_sampler_test.py::test_extract_eval_data_preserves_none_metric_score -q (Result: 1 passed).
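The shape of the new test can be sketched with a lightweight stand-in (the dataclass name and its fields here are assumptions for illustration; the real test exercises `_extract_eval_data` directly):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeEvalMetricResult:
    # Stand-in for the real eval metric result object.
    score: Optional[float]
    eval_status: str

def extract_score(result: FakeEvalMetricResult) -> Optional[float]:
    # Mirrors the guarded rounding added in this PR.
    return round(result.score, 2) if result.score is not None else None

def test_extract_eval_data_preserves_none_metric_score():
    result = FakeEvalMetricResult(score=None, eval_status="NOT_EVALUATED")
    assert extract_score(result) is None
    assert result.eval_status == "NOT_EVALUATED"

test_extract_eval_data_preserves_none_metric_score()
print("1 passed")
```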

Manual Reproduction & Verification:

  • Simulated the failure: created a local script to intentionally trigger the bug by forcing a None score during the evaluation step.
  • Verified the fix: ran the simulation against the updated code. Before the fix, the script consistently crashed with TypeError: type NoneType doesn't define __round__ method. After applying this PR, the optimizer safely handled the None scores and ran to completion without crashing.
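The pre-fix failure mode is reproducible in two lines, independent of ADK, since calling round() on None raises exactly the TypeError from the report:

```python
# round() delegates to __round__, which NoneType does not define.
try:
    round(None, 2)
except TypeError as e:
    print(e)  # type NoneType doesn't define __round__ method
```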

I used the hello_world example from the provided samples and followed the optimization documentation, then added a patch_and_run.py file in my local environment to force the eval failure:

# Simulated the issue by triggering an eval failure to force None scores
# and verifying the optimizer handles it gracefully.
import asyncio
import os
# (The ADK-specific imports -- the eval/sampler/optimizer classes used below
# and the local `agent` module -- are omitted here for brevity.)

sampler_config = LocalEvalSamplerConfig(
    eval_config=EvalConfig(
        criteria={"rubric_based_tool_use_quality_v1": 0.75}  # or a metric missing rubrics
    ),
    app_name="hello_world",
    train_eval_set="train_eval_set",
)
sampler = LocalEvalSampler(
    sampler_config,
    LocalEvalSetsManager(agents_dir=os.path.dirname(os.getcwd())),
)

opt_config = GEPARootAgentPromptOptimizerConfig(max_metric_calls=5)
optimizer = GEPARootAgentPromptOptimizer(config=opt_config)

# Before PR: crashes with TypeError on None. After PR: runs successfully.
result = asyncio.run(optimizer.optimize(agent.root_agent, sampler))

@JesserHamdaoui changed the title from "Fix/5403 local eval sampler type error" to "fix(optimization): handle None metric scores in LocalEvalSampler" on Apr 20, 2026
@adk-bot adk-bot added the eval [Component] This issue is related to evaluation label Apr 20, 2026
@rohityan rohityan self-assigned this Apr 20, 2026

Development

Successfully merging this pull request may close these issues.

TypeError in LocalEvalSampler when metric evaluation fails

3 participants