feat(sample-app): add FastAPI + LiteLLM tracing example #4227
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
This PR introduces a new FastAPI example application in the sample app package that demonstrates tracing LLM calls using Traceloop and LiteLLM. The app exposes a
Changes
FastAPI LiteLLM Example Application
Sequence Diagram
sequenceDiagram
participant Client
participant ChatEndpoint as POST /chat
participant ChatWorkflow
participant CallLLM as call_llm task
participant LiteLLM
Client->>ChatEndpoint: POST ChatRequest(message)
ChatEndpoint->>ChatWorkflow: chat_workflow(message)
ChatWorkflow->>CallLLM: call_llm(message)
CallLLM->>LiteLLM: completion(model, api_base, messages)
LiteLLM-->>CallLLM: response content
CallLLM-->>ChatWorkflow: reply text
ChatWorkflow-->>ChatEndpoint: reply text
ChatEndpoint-->>Client: {reply}
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/sample-app/sample_app/fastapi_litellm_example.py (1)
33-36: ⚡ Quick win
Avoid blocking the event loop with synchronous I/O in async endpoint.
The async endpoint handler calls the synchronous chat_workflow(), which performs blocking I/O (the LLM API call). Because the handler is declared with async def, FastAPI runs it directly on the event loop, so this call blocks the loop and prevents the server from handling other requests concurrently. For a sample application that users will reference, it's important to demonstrate the correct async pattern.
⚡ Proposed fix using asyncio.to_thread for non-blocking execution
+import asyncio
+
 @app.post("/chat")
 async def chat(request: ChatRequest):
-    reply = chat_workflow(request.message)
+    reply = await asyncio.to_thread(chat_workflow, request.message)
     return {"reply": reply}
Alternatively, if LiteLLM supports async (check with litellm.acompletion), refactor the functions to be fully async:
 @task(name="call_llm")
-def call_llm(message: str) -> str:
-    response = litellm.completion(
+async def call_llm(message: str) -> str:
+    response = await litellm.acompletion(
         model=os.environ.get("LLM_MODEL", "openai/gpt-4o-mini"),
         messages=[{"role": "user", "content": message}],
         api_base=os.environ.get("LLM_API_BASE", None),
     )
     return response.choices[0].message.content

 @workflow(name="chat_workflow")
-def chat_workflow(message: str) -> str:
-    return call_llm(message)
+async def chat_workflow(message: str) -> str:
+    return await call_llm(message)

 @app.post("/chat")
 async def chat(request: ChatRequest):
-    reply = chat_workflow(request.message)
+    reply = await chat_workflow(request.message)
     return {"reply": reply}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sample-app/sample_app/fastapi_litellm_example.py` around lines 33 - 36, The async FastAPI handler chat currently calls the synchronous, blocking function chat_workflow which performs LLM I/O and will block the event loop; change chat to offload blocking work to a worker thread (e.g., use asyncio.to_thread to call chat_workflow) or refactor chat_workflow and downstream calls to async (e.g., use litellm async API like acompletion if available) so the endpoint does not perform synchronous I/O on the event loop; update references to chat_workflow and any LLM call sites accordingly to ensure non-blocking behavior.
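To see why the offload matters, here is a minimal, standard-library-only sketch. It assumes a hypothetical `blocking_llm_call` that sleeps instead of calling an LLM, plus a heartbeat coroutine that records a timestamp every 50 ms; while the event loop is blocked, the heartbeat cannot run. `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
import time


def blocking_llm_call(message: str) -> str:
    """Hypothetical stand-in for a synchronous LLM client call: sleeps instead of doing network I/O."""
    time.sleep(0.3)
    return f"echo: {message}"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Record a timestamp every 50 ms until stopped; missing ticks mean the loop was blocked."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def run(offload: bool) -> tuple[str, int]:
    """Run the blocking call either inline or via asyncio.to_thread and count heartbeat ticks."""
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # yield once so the heartbeat records its first tick

    if offload:
        # Runs in a worker thread; the event loop stays free to schedule the heartbeat.
        reply = await asyncio.to_thread(blocking_llm_call, "hi")
    else:
        # Runs on the event loop thread; nothing else can be scheduled until it returns.
        reply = blocking_llm_call("hi")

    stop.set()
    await hb
    return reply, len(ticks)


if __name__ == "__main__":
    print("inline:   ", asyncio.run(run(offload=False)))
    print("to_thread:", asyncio.run(run(offload=True)))
```

Calling the blocking function inline leaves the heartbeat with only its first tick, while `asyncio.to_thread` lets it keep ticking for the duration of the call. `asyncio.to_thread` also copies the current `contextvars` context into the worker thread, which is where OpenTelemetry keeps the active span, so spans created in the thread should still nest under the caller's span. Another option is to declare the endpoint with plain `def`; FastAPI then runs it in a threadpool rather than on the event loop.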
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/sample-app/pyproject.toml`:
- Around line 40-41: Update the FastAPI dependency constraint in
packages/sample-app pyproject.toml to exclude vulnerable releases: replace the
current fastapi spec (the string "fastapi>=0.115.0,<1") with a range that sets
the minimum to 0.65.2 (e.g. "fastapi>=0.65.2,<1" or an equivalent exclusion of
<0.65.2) so the project no longer allows versions affected by the CSRF advisory;
leave the uvicorn constraint unchanged.
In `@packages/sample-app/sample_app/fastapi_litellm_example.py`:
- Around line 20-27: call_llm currently calls litellm.completion without error
handling or validating the response; wrap the litellm.completion call in a
try/except that catches runtime/network/auth errors (e.g., Exception) and
logs/raises a controlled error, and validate the returned object before using it
by checking response.choices exists and is non-empty and that
response.choices[0].message.content is not None (return a clear error or
fallback if validation fails); update the return path to safely extract and
return the content only after these checks and ensure any caught exceptions
produce a meaningful error message for the caller.
- Line 13: The Traceloop.init call currently sets disable_batch=True for
debugging but does not install a ConsoleSpanExporter; add a ConsoleSpanExporter
from opentelemetry.sdk.trace.export and attach it via a SimpleSpanProcessor to
the active TracerProvider so spans are emitted to the console. Locate the
Traceloop.init usage (Traceloop.init(app_name="fastapi_litellm_example",
disable_batch=True)) and before or immediately after it, import
ConsoleSpanExporter and SimpleSpanProcessor, create a ConsoleSpanExporter
instance, wrap it in a SimpleSpanProcessor, and add that processor to the global
tracer provider (or the provider used by Traceloop) so tracing output is visible
locally.
---
Nitpick comments:
In `@packages/sample-app/sample_app/fastapi_litellm_example.py`:
- Around line 33-36: The async FastAPI handler chat currently calls the
synchronous, blocking function chat_workflow which performs LLM I/O and will
block the event loop; change chat to offload blocking work to a worker thread
(e.g., use asyncio.to_thread to call chat_workflow) or refactor chat_workflow
and downstream calls to async (e.g., use litellm async API like acompletion if
available) so the endpoint does not perform synchronous I/O on the event loop;
update references to chat_workflow and any LLM call sites accordingly to ensure
non-blocking behavior.
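The error-handling suggestion for `call_llm` could look roughly like the sketch below. It is an illustration, not the PR's code: `LLMCallError`, `extract_reply`, `fake_completion`, and the injected `completion` parameter are hypothetical names introduced so the sketch runs offline. In the sample app, `completion` would be `litellm.completion`, and the `@task` decorator is omitted here.

```python
import os
from types import SimpleNamespace
from typing import Any, Callable


class LLMCallError(RuntimeError):
    """Hypothetical error raised when the LLM call fails or returns nothing usable."""


def extract_reply(response: Any) -> str:
    """Validate an OpenAI-style response object and return the first choice's text."""
    choices = getattr(response, "choices", None)
    if not choices:
        raise LLMCallError("LLM response contained no choices")
    message = getattr(choices[0], "message", None)
    content = getattr(message, "content", None)
    if content is None:
        raise LLMCallError("LLM response message had no content")
    return content


def call_llm(message: str, completion: Callable[..., Any]) -> str:
    """Call the injected completion function, wrapping transport/auth failures in LLMCallError."""
    try:
        response = completion(
            model=os.environ.get("LLM_MODEL", "openai/gpt-4o-mini"),
            messages=[{"role": "user", "content": message}],
            api_base=os.environ.get("LLM_API_BASE"),
        )
    except Exception as exc:  # network, auth, rate-limit errors, etc.
        raise LLMCallError(f"LLM call failed: {exc}") from exc
    # Validation happens outside the try so its errors are not re-wrapped.
    return extract_reply(response)


def fake_completion(**kwargs: Any) -> SimpleNamespace:
    """Offline stand-in for litellm.completion that echoes the prompt (local testing only)."""
    prompt = kwargs["messages"][0]["content"]
    return SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content=f"echo: {prompt}"))]
    )


if __name__ == "__main__":
    print(call_llm("hi", completion=fake_completion))
```

In the FastAPI handler, an `LLMCallError` could then be translated into `HTTPException(status_code=502, detail=str(exc))`, so clients receive a gateway error rather than an unhandled 500.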
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 65c1248a-3f18-4528-aa42-b2d23e0b0d56
⛔ Files ignored due to path filters (1)
packages/sample-app/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
packages/sample-app/pyproject.toml
packages/sample-app/sample_app/fastapi_litellm_example.py
"fastapi>=0.115.0,<1",
"uvicorn>=0.32.0,<1",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify FastAPI and Uvicorn versions and check for security advisories
echo "=== Checking FastAPI latest version ==="
curl -s https://pypi.org/pypi/fastapi/json | jq -r '.info.version'
echo -e "\n=== Checking Uvicorn latest version ==="
curl -s https://pypi.org/pypi/uvicorn/json | jq -r '.info.version'
echo -e "\n=== Checking for FastAPI security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "fastapi") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
echo -e "\n=== Checking for Uvicorn security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "uvicorn") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Repository: traceloop/openllmetry
Length of output: 1175
Bump FastAPI minimum version (current range includes HIGH-severity CSRF advisory)
In packages/sample-app/pyproject.toml (lines 40-41), fastapi>=0.115.0,<1 includes versions <0.65.2, which are affected by a HIGH CSRF advisory (patched in 0.65.2). Set the lower bound to >=0.65.2 (or otherwise exclude <0.65.2).
The listed Uvicorn HIGH advisories affect <0.11.7, so uvicorn>=0.32.0,<1 is not impacted by those.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/sample-app/pyproject.toml` around lines 40 - 41, Update the FastAPI
dependency constraint in packages/sample-app pyproject.toml to exclude
vulnerable releases: replace the current fastapi spec (the string
"fastapi>=0.115.0,<1") with a range that sets the minimum to 0.65.2 (e.g.
"fastapi>=0.65.2,<1" or an equivalent exclusion of <0.65.2) so the project no
longer allows versions affected by the CSRF advisory; leave the uvicorn
constraint unchanged.
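Whether a version satisfies a `>=lower,<upper` constraint comes down to comparing release numbers component by component. A simplified, standard-library-only sketch follows; it assumes plain numeric releases and ignores pre-, post-, dev- and local-version segments, and the helper names are hypothetical.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain numeric release such as '0.115.0' into (0, 115, 0).

    Simplification: pre-, post-, dev- and local-version segments are not supported.
    """
    return tuple(int(part) for part in version.split("."))


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing releases numerically and padding with zeros ('1' == '1.0.0')."""
    x, y = parse_release(a), parse_release(b)
    width = max(len(x), len(y))
    x += (0,) * (width - len(x))
    y += (0,) * (width - len(y))
    return (x > y) - (x < y)


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, i.e. the constraint '>=lower,<upper'."""
    return compare(version, lower) >= 0 and compare(version, upper) < 0


if __name__ == "__main__":
    for candidate in ("0.65.1", "0.65.2", "0.100.0", "0.115.0"):
        print(
            candidate,
            satisfies(candidate, "0.115.0", "1"),  # fastapi>=0.115.0,<1
            satisfies(candidate, "0.65.2", "1"),   # fastapi>=0.65.2,<1
        )
```

Under this comparison 0.115.0 sorts above 0.65.2 (115 > 65), so a release below 0.65.2 such as 0.65.1 is rejected by the current `fastapi>=0.115.0,<1` constraint as well as by `fastapi>=0.65.2,<1`; the only difference is that the latter additionally admits 0.65.2 through 0.114.x. For full PEP 440 semantics, `packaging.specifiers.SpecifierSet` is the standard tool.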
Summary
Adds a minimal FastAPI example (fastapi_litellm_example.py) to the sample-app package
demonstrating how to trace an HTTP LLM endpoint using OpenLLMetry.
All existing examples are run-once scripts. This is the first example showing
tracing inside a running HTTP service, which is how most production LLM
applications are structured.
What's new
packages/sample-app/sample_app/fastapi_litellm_example.py
- `POST /chat` endpoint
- `completion()` call wrapped with `@task` + `@workflow` decorators
- `Traceloop.init(disable_batch=True)` for easy local debugging
- `LLM_MODEL` and `LLM_API_BASE` env vars allow routing to any OpenAI-compatible backend (OpenAI, vLLM, Ollama, Groq, etc.)
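The `LLM_MODEL` / `LLM_API_BASE` routing amounts to reading two environment variables with fallbacks. A minimal sketch, assuming the same defaults as the review diff above (`openai/gpt-4o-mini`, no `api_base`); `resolve_llm_settings` and `LLMSettings` are hypothetical names, and the Ollama values in the comment are example values only.

```python
import os
from typing import Mapping, Optional, TypedDict

DEFAULT_MODEL = "openai/gpt-4o-mini"  # default used in the example's completion call


class LLMSettings(TypedDict):
    model: str
    api_base: Optional[str]


def resolve_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read the model name and optional API base URL from the environment.

    api_base stays None when LLM_API_BASE is unset, presumably leaving LiteLLM
    to pick the provider's default endpoint from the model prefix.
    """
    return {
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
        "api_base": env.get("LLM_API_BASE"),
    }


if __name__ == "__main__":
    # e.g. LLM_MODEL=ollama/llama3 LLM_API_BASE=http://localhost:11434
    print(resolve_llm_settings())
```

The resulting dict can be unpacked into the call, e.g. `litellm.completion(**resolve_llm_settings(), messages=[...])`, which keeps backend selection entirely in the environment.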
packages/sample-app/pyproject.toml
- `fastapi>=0.115.0,<1` and `uvicorn>=0.32.0,<1`
How to test