Pr/rm/reasoning trace #64

Merged

raghavm243512 merged 21 commits into main from pr/rm/reasoning_trace on Apr 23, 2026

Conversation

@raghavm243512
Collaborator

@raghavm243512 raghavm243512 commented Apr 17, 2026

Enables reasoning traces for Anthropic, Gemini, and OpenAI models.

Reasoning from previous turns and tool calls is now sent to subsequent turns, which should reduce token use (and latency) and improve accuracy. It also uses these providers' reasoning systems the way they are intended to be used.

Added reasoning tracking for these models (recorded in the audit log and agent perf stats), along with a reasoning token count (in agent perf stats).

@raghavm243512 raghavm243512 marked this pull request as ready for review April 21, 2026 17:58
Collaborator

@tara-servicenow tara-servicenow left a comment


Summary [Claude]

I reviewed this PR against the LiteLLM, OpenAI Responses API, Anthropic, and Gemini docs. Overall, the implementation is well-aligned with the reference patterns — the three tricky provider-specific behaviors (OpenAI encrypted reasoning threading, Anthropic thinking_blocks, Gemini provider_specific_fields) are each handled per docs, and pure helpers have good unit coverage.

What's correct

  • store=False + include=["reasoning.encrypted_content"] + echoing response.output back as input items matches the OpenAI cookbook stateless pattern exactly.
  • reasoning={"effort": ...} parameter shape is correct for the Responses API.
  • Attaching thinking_blocks to the assistant message in history is exactly what LiteLLM docs prescribe for Anthropic extended-thinking + tool-calling.
  • Switching to tc.model_dump(exclude_none=True) to preserve provider_specific_fields.thought_signature matches LiteLLM's documented auto-preservation behavior for Gemini.
  • use_responses_api at the top-level of the deployment rather than inside litellm_params is the right call — it's an EVA routing decision, not a LiteLLM param.
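The stateless threading pattern in the first bullet can be illustrated with a small request builder. This is a minimal sketch, assuming dict-shaped Responses API items; the helper name and item shapes are illustrative, not EVA's actual code:

```python
def build_responses_request(
    model: str,
    prior_output_items: list[dict],
    user_message: str,
    effort: str = "medium",
) -> dict:
    """Build a stateless Responses API request that threads the previous
    turn's output (including encrypted reasoning items) back in as input.

    Field names follow the OpenAI Responses API; this helper itself is
    hypothetical and only sketches the pattern described above.
    """
    # Echo the prior turn's output items verbatim so the model can reuse
    # its own (encrypted) reasoning and tool-call context.
    input_items = list(prior_output_items)
    input_items.append({"role": "user", "content": user_message})
    return {
        "model": model,
        "input": input_items,
        "store": False,                              # keep the API stateless
        "include": ["reasoning.encrypted_content"],  # get reasoning back each turn
        "reasoning": {"effort": effort},
    }
```

With `store=False` the server retains nothing between calls, so the encrypted reasoning items must round-trip through the client exactly as shown.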

Issues to address

  1. Storing encrypted reasoning blob as human-readable reasoning text (inline comment on llm.py) — semantic bug that will pollute the perf CSV.
  2. Inconsistent router error handling between _lookup_use_responses_api_from_router and _get_router_litellm_params.
  3. Historical assistant message ordering: content placed after tool outputs in the Responses API input conversion.
  4. Redundant re-derivation of reasoning_content from thinking_blocks (LiteLLM already populates the combined string).
  5. LiteLLM version bump from >=1.30.0 to >=1.82.6: necessary for aresponses() and the new reasoning fields, but given recent supply-chain attacks on LiteLLM, the pinned version should be explicitly confirmed.
  6. Test coverage gap: no integration test exercises _complete_via_responses_api end-to-end through AgenticSystem.process_query; the new TestResponsesOutputItemsThreading only mocks llm_client.complete. Also no evidence in the diff that a debug-mode benchmark run was performed.
  7. Minor: raise last_exception # type: ignore[misc] at end of _complete_via_responses_api papers over dead code — the chat-completions branch above has the same structure without the ignore.

See inline comments for specifics.
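One way to address issue 1 is to log only the human-readable thinking text and skip encrypted or redacted blocks before anything reaches the perf CSV. A minimal sketch with illustrative field names (modeled on Anthropic-style thinking blocks, not EVA's actual schema):

```python
def extract_reasoning_text(message: dict) -> str:
    """Pull only human-readable reasoning out of a model message.

    Encrypted/redacted blocks carry opaque blobs, not prose, so they are
    skipped rather than written to the perf CSV. Field names here are
    illustrative.
    """
    parts = []
    for block in message.get("thinking_blocks", []):
        # Only plain "thinking" blocks contain readable text; redacted
        # blocks expose a "data" blob that should never be logged as prose.
        if block.get("type") == "thinking" and block.get("thinking"):
            parts.append(block["thinking"])
    return "\n".join(parts)
```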

Comment thread src/eva/assistant/services/llm.py
Comment thread src/eva/assistant/services/llm.py Outdated
Comment thread src/eva/assistant/services/llm.py
Comment thread src/eva/assistant/services/llm.py Outdated
Comment thread src/eva/assistant/services/llm.py Outdated
Comment thread pyproject.toml Outdated
Comment thread tests/unit/assistant/test_litellm_client.py
@tara-servicenow
Collaborator

Should we be switching to the Responses API? I get an error when I try to run gpt-5.4 with reasoning:

2026-04-22 22:14:07,299 | ERROR | eva.assistant.services.llm:179 | LiteLLM completion failed: litellm.BadRequestError: OpenAIException - Function tools with reasoning_effort are not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead.. Received Model Group=gpt-5.4

@raghavm243512
Collaborator Author

@tara-servicenow that error is actually why the switch is needed.

Chat completions doesn't support reasoning + tool calls; the Responses API does. You need to add "use_responses_api": true (see the .env.example changes in this PR).
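Based on the review note that use_responses_api belongs at the deployment's top level rather than inside litellm_params, a hedged sketch of what such a router deployment entry might look like (model names and params here are illustrative, not the PR's actual config):

```python
# Hypothetical router deployment entry. "use_responses_api" sits at the
# top level as an EVA routing decision, not inside litellm_params, which
# holds only parameters LiteLLM itself understands.
deployment = {
    "model_name": "gpt-5.4",
    "use_responses_api": True,  # route this model through /v1/responses
    "litellm_params": {
        "model": "openai/gpt-5.4",
        "reasoning_effort": "medium",
    },
}
```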

@tara-servicenow
Collaborator

Ahh got it, makes sense. Sorry I missed that earlier!

Comment thread src/eva/assistant/services/llm.py Outdated
Comment thread src/eva/assistant/services/llm.py Outdated
Comment thread pyproject.toml Outdated
Comment thread src/eva/assistant/services/llm.py Outdated
@raghavm243512 raghavm243512 enabled auto-merge April 23, 2026 18:47
Collaborator

@JosephMarinier JosephMarinier left a comment


Cool! Thank you!

@raghavm243512 raghavm243512 added this pull request to the merge queue Apr 23, 2026
Merged via the queue into main with commit dfaecb5 Apr 23, 2026
1 check passed
@raghavm243512 raghavm243512 deleted the pr/rm/reasoning_trace branch April 23, 2026 18:50
3 participants