Pr/rm/reasoning trace#64
Conversation
This reverts commit a20dee9.
Summary [Claude]
I reviewed this PR against the LiteLLM, OpenAI Responses API, Anthropic, and Gemini docs. Overall, the implementation is well-aligned with the reference patterns — the three tricky provider-specific behaviors (OpenAI encrypted reasoning threading, Anthropic thinking_blocks, Gemini provider_specific_fields) are each handled per docs, and pure helpers have good unit coverage.
What's correct
- `store=False` + `include=["reasoning.encrypted_content"]` + echoing `response.output` back as input items matches the OpenAI cookbook stateless pattern exactly. The `reasoning={"effort": ...}` parameter shape is correct for the Responses API.
- Attaching `thinking_blocks` to the assistant message in history is exactly what the LiteLLM docs prescribe for Anthropic extended-thinking + tool-calling.
- Switching to `tc.model_dump(exclude_none=True)` to preserve `provider_specific_fields.thought_signature` matches LiteLLM's documented auto-preservation behavior for Gemini.
- `use_responses_api` at the top level of the deployment rather than inside `litellm_params` is the right call: it's an EVA routing decision, not a LiteLLM param.
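The stateless Responses API pattern endorsed above can be sketched as a pure request builder. This is an illustrative helper, not EVA's actual code; the function and parameter names are assumptions, but the `store`/`include`/`reasoning` kwargs follow the documented OpenAI shape:

```python
def build_responses_request(model, new_user_text, prior_output_items=None, effort="medium"):
    """Build kwargs for a stateless Responses API call that threads
    encrypted reasoning items back in, per the OpenAI cookbook pattern."""
    # Echo the previous turn's response.output items back as input,
    # then append the new user message.
    input_items = list(prior_output_items or [])
    input_items.append({"role": "user", "content": new_user_text})
    return {
        "model": model,
        "input": input_items,
        "store": False,  # nothing persisted server-side
        "include": ["reasoning.encrypted_content"],  # return reasoning in encrypted form
        "reasoning": {"effort": effort},
    }
```

A caller would pass the previous turn's `response.output` as `prior_output_items`; with `store=False`, the encrypted reasoning items are the only way reasoning context survives across turns.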
Issues to address
- Storing the encrypted reasoning blob as human-readable reasoning text (inline comment on `llm.py`): a semantic bug that will pollute the perf CSV.
- Inconsistent router error handling between `_lookup_use_responses_api_from_router` and `_get_router_litellm_params`.
- Historical assistant message ordering: content placed after tool outputs in the Responses API input conversion.
- Redundant re-derivation of `reasoning_content` from `thinking_blocks` (LiteLLM already populates the combined string).
- LiteLLM version bump `>=1.30.0` → `>=1.82.6`: necessary for `aresponses()` and the new reasoning fields, but given recent supply-chain attacks on LiteLLM, it should be explicitly confirmed.
- Test coverage gap: no integration test exercises `_complete_via_responses_api` end-to-end through `AgenticSystem.process_query`; the new `TestResponsesOutputItemsThreading` only mocks `llm_client.complete`. There is also no evidence in the diff that a debug-mode benchmark run was performed.
- Minor: `raise last_exception  # type: ignore[misc]` at the end of `_complete_via_responses_api` papers over dead code; the chat-completions branch above has the same structure without the ignore.
See inline comments for specifics.
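For the Anthropic path, the history attachment the review endorses can be sketched as a small pure helper. Names here are illustrative (the real conversion lives in the PR), but the idea of echoing `thinking_blocks` back verbatim on the assistant history message is the documented LiteLLM pattern:

```python
def assistant_history_message(message_dict):
    """Convert a LiteLLM assistant response message (as a dict) into a
    history entry, preserving Anthropic thinking_blocks so extended
    thinking survives the tool-calling round trip."""
    entry = {"role": "assistant", "content": message_dict.get("content")}
    if message_dict.get("tool_calls"):
        entry["tool_calls"] = message_dict["tool_calls"]
    # thinking_blocks (including their signatures) must be sent back
    # unchanged on the next request for the tool-result turn to validate.
    if message_dict.get("thinking_blocks"):
        entry["thinking_blocks"] = message_dict["thinking_blocks"]
    return entry
```

Note that this deliberately does not re-derive `reasoning_content` from the blocks, matching the review's point that LiteLLM already populates the combined string.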
should we be switching to the Responses API? i get an error when i try to run gpt-5.4 with reasoning:
2026-04-22 22:14:07,299 | ERROR | eva.assistant.services.llm:179 | LiteLLM completion failed: litellm.BadRequestError: OpenAIException - Function tools with reasoning_effort are not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead. Received Model Group=gpt-5.4
@tara-servicenow that error is actually why the switch is needed: chat completions doesn't support reasoning + tool calls, and this does. You need to add
ahh got it, makes sense, sorry i missed that earlier!
to avoid different tests impacting each other.
JosephMarinier
left a comment
Cool! Thank you!
Enables reasoning traces for Anthropic, Gemini, and OpenAI models.
Reasoning from previous turns and tool calls is sent to subsequent turns, theoretically reducing token use (and latency) and improving accuracy. This also uses these providers' reasoning features in their intended way.
Added reasoning tracking for these models (tracked in the audit log and agent perf stats), as well as the reasoning token count (in agent perf stats).
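The reasoning token count mentioned above is typically read from the usage payload's completion token details. A hedged sketch of such an extractor, assuming the OpenAI/LiteLLM usage shape (the function name is illustrative, and the fallback to `None` covers providers that don't report reasoning tokens):

```python
def reasoning_token_count(usage):
    """Pull the reasoning token count out of a usage payload, if present.

    Accepts a dict shaped like OpenAI/LiteLLM usage, e.g.
    {"completion_tokens_details": {"reasoning_tokens": 128}}.
    Returns None when the provider does not report reasoning tokens.
    """
    details = (usage or {}).get("completion_tokens_details") or {}
    return details.get("reasoning_tokens")
```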