Pr/rm/reasoning trace#64
Conversation
This reverts commit a20dee9.
Summary [Claude]
I reviewed this PR against the LiteLLM, OpenAI Responses API, Anthropic, and Gemini docs. Overall, the implementation is well-aligned with the reference patterns — the three tricky provider-specific behaviors (OpenAI encrypted reasoning threading, Anthropic thinking_blocks, Gemini provider_specific_fields) are each handled per docs, and pure helpers have good unit coverage.
What's correct
- `store=False` + `include=["reasoning.encrypted_content"]` + echoing `response.output` back as input items matches the OpenAI cookbook stateless pattern exactly. The `reasoning={"effort": ...}` parameter shape is correct for the Responses API.
- Attaching `thinking_blocks` to the assistant message in history is exactly what the LiteLLM docs prescribe for Anthropic extended-thinking + tool-calling.
- Switching to `tc.model_dump(exclude_none=True)` to preserve `provider_specific_fields.thought_signature` matches LiteLLM's documented auto-preservation behavior for Gemini.
- `use_responses_api` at the top level of the deployment rather than inside `litellm_params` is the right call: it's an EVA routing decision, not a LiteLLM param.
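The stateless Responses API pattern endorsed above can be sketched as a pure request builder. This is an illustrative helper, not EVA's actual code; the function and parameter names are assumptions, but the `store`/`include`/`reasoning` kwargs follow the documented OpenAI shape:

```python
def build_responses_request(model, new_user_text, prior_output_items=None, effort="medium"):
    """Build kwargs for a stateless Responses API call that threads
    encrypted reasoning items back in, per the OpenAI cookbook pattern."""
    # Echo the previous turn's response.output items back as input,
    # then append the new user message.
    input_items = list(prior_output_items or [])
    input_items.append({"role": "user", "content": new_user_text})
    return {
        "model": model,
        "input": input_items,
        "store": False,  # nothing persisted server-side
        "include": ["reasoning.encrypted_content"],  # return reasoning in encrypted form
        "reasoning": {"effort": effort},
    }
```

A caller would pass the previous turn's `response.output` as `prior_output_items`; with `store=False`, the encrypted reasoning items are the only way reasoning context survives across turns.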
Issues to address
- Storing the encrypted reasoning blob as human-readable reasoning text (inline comment on `llm.py`): a semantic bug that will pollute the perf CSV.
- Inconsistent router error handling between `_lookup_use_responses_api_from_router` and `_get_router_litellm_params`.
- Historical assistant message ordering: content placed after tool outputs in the Responses API input conversion.
- Redundant re-derivation of `reasoning_content` from `thinking_blocks` (LiteLLM already populates the combined string).
- LiteLLM version bump `>=1.30.0` → `>=1.82.6`: necessary for `aresponses()` and the new reasoning fields, but given recent supply-chain attacks on LiteLLM, it should be explicitly confirmed.
- Test coverage gap: no integration test exercises `_complete_via_responses_api` end-to-end through `AgenticSystem.process_query`; the new `TestResponsesOutputItemsThreading` only mocks `llm_client.complete`. There is also no evidence in the diff that a debug-mode benchmark run was performed.
- Minor: `raise last_exception  # type: ignore[misc]` at the end of `_complete_via_responses_api` papers over dead code; the chat-completions branch above has the same structure without the ignore.
See inline comments for specifics.
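For the Anthropic path, the history attachment the review endorses can be sketched as a small pure helper. Names here are illustrative (the real conversion lives in the PR), but the idea of echoing `thinking_blocks` back verbatim on the assistant history message is the documented LiteLLM pattern:

```python
def assistant_history_message(message_dict):
    """Convert a LiteLLM assistant response message (as a dict) into a
    history entry, preserving Anthropic thinking_blocks so extended
    thinking survives the tool-calling round trip."""
    entry = {"role": "assistant", "content": message_dict.get("content")}
    if message_dict.get("tool_calls"):
        entry["tool_calls"] = message_dict["tool_calls"]
    # thinking_blocks (including their signatures) must be sent back
    # unchanged on the next request for the tool-result turn to validate.
    if message_dict.get("thinking_blocks"):
        entry["thinking_blocks"] = message_dict["thinking_blocks"]
    return entry
```

Note that this deliberately does not re-derive `reasoning_content` from the blocks, matching the review's point that LiteLLM already populates the combined string.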
should we be switching to the Responses API? i get an error when i try to run gpt-5.4 with reasoning:
2026-04-22 22:14:07,299 | ERROR | eva.assistant.services.llm:179 | LiteLLM completion failed: litellm.BadRequestError: OpenAIException - Function tools with reasoning_effort are not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead. Received Model Group=gpt-5.4
@tara-servicenow that error is actually why the switch is needed: chat completions doesn't support reasoning + tool calls, and this does. You need to add
ahh got it, makes sense, sorry i missed that earlier!
to avoid different tests impacting each other.
JosephMarinier
left a comment
Cool! Thank you!
Enables reasoning traces for Anthropic, Gemini, and OpenAI models.
Reasoning from previous turns and tool calls is sent to subsequent turns, theoretically reducing token use (and latency) and improving accuracy. This also uses these providers' reasoning features in their intended way.
Added reasoning tracking for these models (tracked in the audit log and agent perf stats), as well as the reasoning token count (in agent perf stats).
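The reasoning token count mentioned above is typically read from the usage payload's completion token details. A hedged sketch of such an extractor, assuming the OpenAI/LiteLLM usage shape (the function name is illustrative, and the fallback to `None` covers providers that don't report reasoning tokens):

```python
def reasoning_token_count(usage):
    """Pull the reasoning token count out of a usage payload, if present.

    Accepts a dict shaped like OpenAI/LiteLLM usage, e.g.
    {"completion_tokens_details": {"reasoning_tokens": 128}}.
    Returns None when the provider does not report reasoning tokens.
    """
    details = (usage or {}).get("completion_tokens_details") or {}
    return details.get("reasoning_tokens")
```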