fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685) by DvirDukhan · Pull Request #686 · FalkorDB/code-graph

DvirDukhan · 2026-05-27T13:52:04Z

Fixes #685.

Problem

The Python analyzer hardcoded environment_path="{path}/venv" when
starting jedi-language-server via multilspy. When the repo had no venv
(the common case for cloned codebases like sphinx, sympy, and every
SWE-bench worktree), jedi raised InvalidPythonEnvironment on every
request_definition() call. analyzer.resolve() then swallowed the
exception silently and the indexer produced a graph with DEFINES edges
only — zero CALLS, zero EXTENDS.

Discovered during benchmark validation: sphinx (5K functions) and sympy
(41K functions) had no resolved cross-references at all, while pytest
worked because… (still unexplained, see #685).

Fix

api/analyzers/source_analyzer.py — prefer {repo}/venv, then
{repo}/.venv, then fall back to the host interpreter's environment
(sys.executable's prefix) so jedi always has a valid Python to
introspect.
api/analyzers/analyzer.py — log resolve() failures at WARN with
file/line context instead of swallowing them silently, so the next
regression is loud.

Verification

Re-indexed sphinx-doc/sphinx-9230 with the fix:

Edge type	Before	After
DEFINES	7064	5640
CALLS	0	4931
EXTENDS	0	484
PARAMETERS	0	552
RETURNS	0	143

(DEFINES count differs because the previous run included tests/ and
this re-run excluded it; the headline is CALLS going from 0 → 4931.)

The benchmark harness will now show code_graph's true value on medium /
large repos, where it was previously running against a half-empty graph
and producing artificial regressions.

Summary by CodeRabbit

Bug Fixes
- Improved error diagnostics: Analysis now logs detailed debugging information including file paths, line/column positions, and exception details when errors occur
- Enhanced file handling: Better error handling for missing files during symbol resolution processing
New Features
- Virtual environment support: Automatically detects and uses local Python virtual environments

The Python analyzer hardcoded `environment_path={path}/venv` when starting jedi-language-server via multilspy. When the repo had no venv (the common case for cloned codebases like sphinx, sympy, anything from SWE-bench), jedi raised `InvalidPythonEnvironment` on every `request_definition()` call. analyzer.resolve() then swallowed the exception silently and the indexer produced a graph with DEFINES edges only — zero CALLS, zero EXTENDS. Benchmark validation showed sphinx (5K functions) and sympy (41K functions) had no resolved cross-references at all. Fix: - source_analyzer.py: prefer {repo}/venv, then {repo}/.venv, then fall back to the host interpreter's environment (sys.executable's prefix) so jedi always has a valid Python to introspect. - analyzer.py: log resolve() failures at WARN with file/line context instead of swallowing them silently, so the next regression is loud. Verified: re-indexed sphinx-doc/sphinx-9230 with the fix: DEFINES: 5640, CALLS: 4931, EXTENDS: 484 (was DEFINES-only). Fixes #685. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-27T13:52:15Z

📝 Walkthrough

Walkthrough

Symbol resolution in the Python analyzer now logs exceptions with file and node context instead of silently failing. Second-pass analysis detects local Python virtual environments and safely handles missing files by skipping them with a warning rather than crashing.

Changes

Robust Symbol Resolution with Error Visibility

Layer / File(s)	Summary
Exception logging in resolve() `api/analyzers/analyzer.py`	`AbstractAnalyzer.resolve()` catches `Exception as e` and logs a structured warning with `file_path`, node start position (row/column), and exception details, instead of returning an empty list silently.
Python environment detection and safe file iteration in second_pass() `api/analyzers/source_analyzer.py`	`SourceAnalyzer.second_pass()` detects project-local `venv` or `.venv` with `bin/python` and passes the interpreter path to `MultilspyConfig`; fallback to `sys.executable` when absent. File iteration uses `self.files.get()` instead of direct indexing, logging a warning and continuing when files are missing instead of raising `KeyError`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Index second_pass: parallelize per-file LSP symbol resolution to cut wall-time on large repos #687: Related changes to the same code paths (AbstractAnalyzer.resolve and SourceAnalyzer.second_pass) affecting LSP resolve behavior, error handling/logging, and second-pass symbol resolution flow.

Poem

🐰 From silent shadows, the errors now sing,
With logs that whisper each exception's sting,
A venv found, or Python declared,
Missing files handled—now nothing's scared,
The graph builds stronger, no ghosts in the night! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly references fixing LSP CALLS edge resolution on repos without venv, which directly matches the main changes in analyzer.py and source_analyzer.py.
Linked Issues check	✅ Passed	The PR directly addresses issue `#685` by logging resolve() failures with context and implementing virtualenv detection (venv → .venv → sys.executable fallback) to enable CALLS/EXTENDS edge resolution.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing `#685`: exception logging in analyzer.py and virtualenv path resolution in source_analyzer.py with defensive file tracking.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dvirdukhan/fix-analyzer-silent-failures

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

DvirDukhan · 2026-05-27T15:01:11Z

Smoke benchmark with fix (n=3, Sonnet 4.5, step-limit 50)

After re-indexing all 3 SWE-bench worktrees with the fix and re-running the smoke:

config	total tokens	vs baseline
baseline	2,143,778	—
lsp	1,844,540	−14.0%
code_graph	1,368,152	−36.2%

Per-instance, code_graph wins everywhere (was losing badly on sympy before):

instance	baseline	code_graph	Δ
pytest-dev/pytest-6202	630,303	142,795	−77.3%
sphinx-doc/sphinx-9230	894,514	725,862	−18.9%
sympy/sympy-20154	618,961	499,495	−19.3%

Sympy went from +78.7% (regression) to −19.3% (savings) — straight-line attributable to this fix turning a half-empty graph into a usable one. The 98 pp swing is direct evidence #685 was the dominant noise source in earlier runs.

Edge counts before → after:

sphinx: DEFINES 7064, CALLS 0 → DEFINES 5640, CALLS 4931
sympy: DEFINES 41525, CALLS 0 → DEFINES 30002, CALLS 22803

In source_analyzer.second_pass, the list of files we iterate can include paths that first_pass did not add to self.files (e.g. parse errors, LSP-induced timeouts, or rare edge cases where a candidate file is present in the input list but never makes it into the files map). Previously this raised KeyError and aborted the entire index. Hit on sympy/polys/distributedmodules.py during bench calibration of sympy-12481. Skip with a WARN log instead so a single bad file no longer takes down the whole index. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai

🧹 Nitpick comments (2)

api/analyzers/source_analyzer.py (1)
142-142: ⚡ Quick win

Move sys import to module level.

The sys import should be at the top of the file with other imports, not inside the method. This follows PEP 8 conventions and improves code readability.
♻️ Proposed fix

Add the import at the module level (after line 21):
 import logging
+import sys
 # Configure logging
Remove the import from the method:
     if any(path.rglob('*.py')):
-        import sys
         py_venv = path / "venv"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/analyzers/source_analyzer.py` at line 142, Remove the inline "import sys"
from inside the method in source_analyzer.py and add a single "import sys" at
the top of the module alongside the other imports (module-level). Ensure you
only have one import statement for sys and delete the in-method import so
PEP8-compliant, readable code is preserved; no other code changes required.
api/analyzers/analyzer.py (1)
60-60: ⚡ Quick win

Move logging import to module level.

Importing logging inside the exception handler is inefficient. Although Python caches module imports, the import statement executes on every exception. Move the import to the top of the file with the other imports.
♻️ Proposed fix

Add the import at the module level (after line 8):
 from multilspy import SyncLanguageServer
+import logging
Remove the import from the except block:
     except Exception as e:
-        import logging
         logging.getLogger(__name__).warning(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/analyzers/analyzer.py` at line 60, The import of logging currently inside
the exception handler should be moved to the module level: add "import logging"
with the other top-level imports near the top of api/analyzers/analyzer.py, and
remove the inline "import logging" from the except block (the exception handler
in the analyzer code where logging is currently imported). Ensure subsequent
uses of logging in that function/class (e.g., in the except block) refer to the
module-level logging object.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@api/analyzers/analyzer.py`:
- Line 60: The import of logging currently inside the exception handler should
be moved to the module level: add "import logging" with the other top-level
imports near the top of api/analyzers/analyzer.py, and remove the inline "import
logging" from the except block (the exception handler in the analyzer code where
logging is currently imported). Ensure subsequent uses of logging in that
function/class (e.g., in the except block) refer to the module-level logging
object.

In `@api/analyzers/source_analyzer.py`:
- Line 142: Remove the inline "import sys" from inside the method in
source_analyzer.py and add a single "import sys" at the top of the module
alongside the other imports (module-level). Ensure you only have one import
statement for sys and delete the in-method import so PEP8-compliant, readable
code is preserved; no other code changes required.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69f73010-8608-4864-8c63-c30c8ce94c0e

📥 Commits

Reviewing files that changed from the base of the PR and between b066064 and a112c6d.

📒 Files selected for processing (2)

api/analyzers/analyzer.py
api/analyzers/source_analyzer.py

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686

fix(analyzer): resolve LSP CALLS edges on repos without a venv (#685)#686
DvirDukhan wants to merge 2 commits into
stagingfrom
dvirdukhan/fix-analyzer-silent-failures

DvirDukhan commented May 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

DvirDukhan commented May 27, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DvirDukhan commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (1 warning)

Uh oh!

DvirDukhan commented May 27, 2026

Smoke benchmark with fix (n=3, Sonnet 4.5, step-limit 50)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DvirDukhan commented May 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading