fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686
fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686DvirDukhan wants to merge 2 commits into
Conversation
The Python analyzer hardcoded `environment_path={path}/venv` when starting
jedi-language-server via multilspy. When the repo had no venv (the common
case for cloned codebases like sphinx, sympy, anything from SWE-bench),
jedi raised `InvalidPythonEnvironment` on every `request_definition()`
call. analyzer.resolve() then swallowed the exception silently and the
indexer produced a graph with DEFINES edges only — zero CALLS, zero
EXTENDS. Benchmark validation showed sphinx (5K functions) and sympy
(41K functions) had no resolved cross-references at all.
Fix:
- source_analyzer.py: prefer {repo}/venv, then {repo}/.venv, then fall
back to the host interpreter's environment (sys.executable's prefix)
so jedi always has a valid Python to introspect.
- analyzer.py: log resolve() failures at WARN with file/line context
instead of swallowing them silently, so the next regression is loud.
Verified: re-indexed sphinx-doc/sphinx-9230 with the fix:
DEFINES: 5640, CALLS: 4931, EXTENDS: 484 (was DEFINES-only).
Fixes #685.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 WalkthroughWalkthroughSymbol resolution in the Python analyzer now logs exceptions with file and node context instead of silently failing. Second-pass analysis detects local Python virtual environments and safely handles missing files by skipping them with a warning rather than crashing. ChangesRobust Symbol Resolution with Error Visibility
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Smoke benchmark with fix (n=3, Sonnet 4.5, step-limit 50)After re-indexing all 3 SWE-bench worktrees with the fix and re-running the smoke:
Per-instance, code_graph wins everywhere (was losing badly on sympy before):
Sympy went from +78.7% (regression) to −19.3% (savings) — straight-line attributable to this fix turning a half-empty graph into a usable one. The 98 pp swing is direct evidence #685 was the dominant noise source in earlier runs. Edge counts before → after:
|
In source_analyzer.second_pass, the list of files we iterate can include paths that first_pass did not add to self.files (e.g. parse errors, LSP-induced timeouts, or rare edge cases where a candidate file is present in the input list but never makes it into the files map). Previously this raised KeyError and aborted the entire index. Hit on sympy/polys/distributedmodules.py during bench calibration of sympy-12481. Skip with a WARN log instead so a single bad file no longer takes down the whole index. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
api/analyzers/source_analyzer.py (1)
142-142: ⚡ Quick winMove sys import to module level.
The
sysimport should be at the top of the file with other imports, not inside the method. This follows PEP 8 conventions and improves code readability.♻️ Proposed fix
Add the import at the module level (after line 21):
import logging +import sys # Configure loggingRemove the import from the method:
if any(path.rglob('*.py')): - import sys py_venv = path / "venv"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/analyzers/source_analyzer.py` at line 142, Remove the inline "import sys" from inside the method in source_analyzer.py and add a single "import sys" at the top of the module alongside the other imports (module-level). Ensure you only have one import statement for sys and delete the in-method import so PEP8-compliant, readable code is preserved; no other code changes required.api/analyzers/analyzer.py (1)
60-60: ⚡ Quick winMove logging import to module level.
Importing
logginginside the exception handler is inefficient. Although Python caches module imports, the import statement executes on every exception. Move the import to the top of the file with the other imports.♻️ Proposed fix
Add the import at the module level (after line 8):
from multilspy import SyncLanguageServer +import loggingRemove the import from the except block:
except Exception as e: - import logging logging.getLogger(__name__).warning(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/analyzers/analyzer.py` at line 60, The import of logging currently inside the exception handler should be moved to the module level: add "import logging" with the other top-level imports near the top of api/analyzers/analyzer.py, and remove the inline "import logging" from the except block (the exception handler in the analyzer code where logging is currently imported). Ensure subsequent uses of logging in that function/class (e.g., in the except block) refer to the module-level logging object.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@api/analyzers/analyzer.py`:
- Line 60: The import of logging currently inside the exception handler should
be moved to the module level: add "import logging" with the other top-level
imports near the top of api/analyzers/analyzer.py, and remove the inline "import
logging" from the except block (the exception handler in the analyzer code where
logging is currently imported). Ensure subsequent uses of logging in that
function/class (e.g., in the except block) refer to the module-level logging
object.
In `@api/analyzers/source_analyzer.py`:
- Line 142: Remove the inline "import sys" from inside the method in
source_analyzer.py and add a single "import sys" at the top of the module
alongside the other imports (module-level). Ensure you only have one import
statement for sys and delete the in-method import so PEP8-compliant, readable
code is preserved; no other code changes required.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 69f73010-8608-4864-8c63-c30c8ce94c0e
📒 Files selected for processing (2)
api/analyzers/analyzer.pyapi/analyzers/source_analyzer.py
Fixes #685.
Problem
The Python analyzer hardcoded
environment_path="{path}/venv"whenstarting jedi-language-server via multilspy. When the repo had no venv
(the common case for cloned codebases like sphinx, sympy, and every
SWE-bench worktree), jedi raised
InvalidPythonEnvironmenton everyrequest_definition()call.analyzer.resolve()then swallowed theexception silently and the indexer produced a graph with DEFINES edges
only — zero CALLS, zero EXTENDS.
Discovered during benchmark validation: sphinx (5K functions) and sympy
(41K functions) had no resolved cross-references at all, while pytest
worked because… (still unexplained, see #685).
Fix
api/analyzers/source_analyzer.py— prefer{repo}/venv, then{repo}/.venv, then fall back to the host interpreter's environment(
sys.executable's prefix) so jedi always has a valid Python tointrospect.
api/analyzers/analyzer.py— logresolve()failures at WARN withfile/line context instead of swallowing them silently, so the next
regression is loud.
Verification
Re-indexed sphinx-doc/sphinx-9230 with the fix:
(DEFINES count differs because the previous run included
tests/andthis re-run excluded it; the headline is CALLS going from 0 → 4931.)
The benchmark harness will now show code_graph's true value on medium /
large repos, where it was previously running against a half-empty graph
and producing artificial regressions.
Summary by CodeRabbit
Bug Fixes
New Features