Skip to content

Add corgea deps offline inventory (scan/graph/explain/diff/sbom/policy)#93

Open
juangaitanv wants to merge 1 commit into
chunk1/harnessfrom
chunk2/deps-offline
Open

Add corgea deps offline inventory (scan/graph/explain/diff/sbom/policy)#93
juangaitanv wants to merge 1 commit into
chunk1/harnessfrom
chunk2/deps-offline

Conversation

@juangaitanv
Copy link
Copy Markdown

Summary

Chunk 2 of the dependency tooling split (#89). Stacked on #92 (chunk1/harness) — review/merge that first; base will retarget to main once #92 lands.

Adds the fully offline corgea deps inventory surface for npm, Python, and Java:

  • Detects manifests + lockfiles, builds the resolved dependency graph.
  • Evaluates a pinning policy (DEP rules) with table / JSON / SARIF / CycloneDX output.
  • Subcommands: scan, graph, explain, diff, sbom, policy.
  • No token, no config, no network — wired into main.rs with no auth/token check.

Out of scope (chunk 3)

The network surface stays deferred: deps verify, registry freshness, --check-cve / vuln-api client, the vuln-api-stub binary, and the npm/pip/etc. install wrappers.

Coverage

83 unit + integration tests. Overall line coverage 13% → 36% (well above the 13% floor from chunk 1).

Test plan

  • CI green once Add ./harness quality contract + CI coverage gate #92 merges and this retargets to main
  • Local: ./harness ci passes (clippy strict, fmt, tests + coverage)
  • corgea deps scan / graph / explain / diff / sbom / policy run with no token or config present
  • Offline contract holds: no network calls in any deps subcommand

Offline dependency inventory for npm, Python, and Java: detects manifests
and lockfiles, builds the resolved graph, and evaluates a pinning policy
(DEP rules) with table/JSON/SARIF/CycloneDX output. Fully offline — no
token or network.

Carved out of #89 as chunk2 (stacks on the chunk1 harness branch). Excludes
the network surface deferred to chunk3: `deps verify`, registry freshness,
--check-cve / vuln-api, the vuln-api-stub binary, and the npm/pip/etc.
install wrappers.

Wires `corgea deps <subcommand>` into main.rs with no auth/token check.
Adds 83 unit + integration tests; overall line coverage 13% -> 36%.
@juangaitanv juangaitanv reopened this May 28, 2026
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Comment thread src/deps/run.rs
inv.findings.iter().any(|f| f.severity.at_least(sev))
}

fn scan_base_ref(_path: &str, _base: &str) -> Result<crate::deps::Inventory, DepsError> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot ship as the implementation for corgea deps diff: both _path and _base are ignored and the base inventory is always empty. Evidence: the diff command scans HEAD, then calls this stub, so even corgea deps diff --base HEAD reports every current dependency as added and never reports real removals/changes. Impact: CI/policy checks based on dependency diffs are unusable and --fail-on-new will be evaluated against the wrong baseline. Fix by materializing/scanning the requested git ref for the requested path (for example via a temporary worktree or reading files from the git object database), then add CLI tests where comparing against the same ref yields an empty diff and a real base/head change yields only the actual additions/removals/updates.

Comment thread src/deps/run.rs
out_format,
out_file,
} => {
let inv = scan(Path::new(&path), &Policy::default())?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI never loads the policy file that corgea deps policy init creates. Evidence: every command path calls scan(..., &Policy::default()), so a user can set .corgea/deps.yml to disable fail_on_wildcard or adjust other policy settings and corgea deps scan will still use the hard-coded defaults. Impact: the advertised policy feature produces false positives/false negatives in CI and users cannot rely on the generated config. Fix by loading .corgea/deps.yml from the scan root when present, failing closed on invalid YAML, and passing that Policy through scan/graph/explain/diff/sbom; add CLI coverage proving a generated/edited policy changes scan findings.

Comment thread src/deps/run.rs
let output = match format {
"json" => to_json(&inv).to_string(),
"sarif" => to_sarif(&inv).to_string(),
_ => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown output formats are silently accepted as table output. Evidence: out_format is an unrestricted Option<String>, and this wildcard arm handles every value other than json/sarif; with --out-format typo --out-file out, the code prints the table to stdout, writes the empty output string to the file, and exits successfully. Impact: automation expecting JSON/SARIF can receive empty or human-formatted output without any failure signal. Fix by making the match exhaustive for table, json, and sarif, returning an error for anything else (or use clap value_parser), and add negative CLI tests for invalid --out-format.

Comment thread src/deps/run.rs
}
}

fn should_fail(inv: &crate::deps::Inventory, threshold: &str) -> bool {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid severity thresholds disable the failure gate instead of failing the command. Evidence: Severity::parse(threshold) returning None makes should_fail return false, so corgea deps scan --fail-on hihg exits 0 even when high findings exist. Impact: a typo in CI silently bypasses the intended dependency policy gate. Fix by validating --fail-on during argument handling (or returning Result<bool, DepsError> here) and exiting non-zero with a clear error for unsupported severities; add a CLI test for an invalid threshold.

Comment thread src/deps/run.rs
for c in &diff.changed {
println!(" ~ {} {} -> {}", c.name, c.from, c.to);
}
if fail_on_new.is_some() && !head.findings.is_empty() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--fail-on-new ignores both "new" and the supplied threshold. Evidence: the value is only checked with is_some(), and the exit decision uses !head.findings.is_empty() instead of findings introduced by the diff and at/above the requested severity. Impact: once the base diff is implemented, this will still fail builds for pre-existing findings below the requested threshold and will also accept invalid threshold strings. Fix by parsing fail_on_new as a severity, comparing base vs head findings to identify newly introduced findings, and failing only for new findings meeting that threshold.

}
}
}
if let Some(express) = packages.get("node_modules/express") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The npm graph/explain implementation is fixture-specific: it only reads dependency declarations from the root package and node_modules/express. Evidence: dependencies of any other package in package-lock.json never get declared or parent set, and the transitive edge builder only adds edges when lp.parent is present. Impact: corgea deps graph and corgea deps explain are wrong for almost every npm project whose transitive dependencies are not under Express; affected transitive packages appear as disconnected or as fallback root paths. Fix by iterating every lockfile package entry, resolving each entry's dependencies map to node_modules/<dep> keys, and adding edges/declared constraints generically; add a fixture where a non-Express direct dependency has a transitive dependency and assert explain shows the real path.

},
lock_integrity: None,
});
if name == "urllib3" {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Poetry graph has the same hard-coded fixture behavior: only urllib3 is connected as a transitive dependency of requests. Evidence: parse_poetry_lock returns only package names/versions, and this branch creates an edge only when name == "urllib3"; every other [package.dependencies] relationship in a real poetry.lock is ignored. Impact: corgea deps graph/explain cannot accurately explain Python dependency paths, and policy/path reporting will be misleading outside the one test fixture. Fix by parsing dependency tables for every package in poetry.lock and building edges from each package to its declared dependencies; add tests with at least two different parent packages and transitive dependencies.

Comment thread src/deps/run.rs
out_format,
out_file,
} => {
let inv = scan(Path::new(&path), &Policy::default())?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: scans never load the policy file that policy init creates. This path passes Policy::default() directly, and the same pattern is used for graph, explain, diff, and sbom, while Policy::from_yaml is never called by the CLI.

Impact: edits to .corgea/deps.yml have no effect, so users cannot disable or tune DEP rules and CI can fail on findings that policy explicitly allowed.

Fix: load <scan path>/.corgea/deps.yml when present, pass that Policy to every scan command, return an error for invalid YAML, and add a CLI test where disabling DEP004 changes the scan output/exit code.

Comment thread src/deps/run.rs
for c in &diff.changed {
println!(" ~ {} {} -> {}", c.name, c.from, c.to);
}
if fail_on_new.is_some() && !head.findings.is_empty() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: --fail-on-new does not compare against the base inventory and ignores the supplied severity threshold. As written, any non-empty head.findings exits 1, including pre-existing findings and findings below the requested threshold.

Impact: this cannot be used as a CI gate for “new high findings only”; existing low/medium findings in HEAD will break builds, while the option value is effectively unused.

Fix: parse the option into Severity, diff stable finding keys against base_inv.findings, fail only for new findings at or above the threshold, and reject invalid threshold values.

Comment thread src/deps/run.rs
}

fn should_fail(inv: &crate::deps::Inventory, threshold: &str) -> bool {
let Some(sev) = Severity::parse(threshold) else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: an invalid --fail-on value silently disables the gate. If Severity::parse returns None, should_fail returns false, so corgea deps scan --fail-on hihg exits 0 even when high/critical findings exist.

Impact: a typo in CI configuration bypasses the dependency policy gate.

Fix: validate --fail-on in run_inner and return an error/non-zero exit for unknown severities; add a CLI test for an invalid threshold.

Comment thread src/deps/run.rs
inv.findings.iter().any(|f| f.severity.at_least(sev))
}

fn scan_base_ref(_path: &str, _base: &str) -> Result<crate::deps::Inventory, DepsError> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: deps diff --base never scans the requested base ref. scan_base_ref ignores both arguments and returns an empty Inventory, so the diff always compares HEAD against an empty graph.

Impact: every current dependency is reported as added, removals and version changes from the real base are missed, and any CI behavior built on this diff is incorrect.

Fix: materialize the requested git ref in a temporary tree/worktree and scan that path, returning an error if the ref cannot be loaded; add a CLI test where a dependency changes or is removed between base and HEAD.

rel_manifest: &str,
) -> Result<(), DepsError> {
let pkg = read_json(manifest_path)?;
let lock_path = file_in_dir(ctx.detected, dir, DepFileKind::NpmLockfile);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: detected Yarn and pnpm lockfiles are ignored. detect.rs classifies yarn.lock and pnpm-lock.yaml, but npm scanning only looks for DepFileKind::NpmLockfile here.

Impact: projects using Yarn or pnpm are scanned as if they have no usable lockfile: resolved versions/integrity are lost and direct dependencies can be falsely reported as stale/missing.

Fix: either parse YarnLockfile/PnpmLockfile as lock candidates or treat them as explicitly unsupported with a clear finding/error instead of silently ignoring them; add fixtures for both formats.

if !key.starts_with("node_modules/") {
continue;
}
let name = key
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: scoped npm package names are corrupted for transitive lockfile entries. For a key like node_modules/@types/node, this extraction keeps only node, which then creates pkg:npm/node@... and collides with the unscoped node package.

Impact: the inventory, SBOM, findings lookup, and graph can all point at the wrong package for scoped dependencies.

Fix: when stripping node_modules/, preserve two path segments for scoped packages (@scope/name), and mirror that behavior in PackageId::name(); add a lockfile fixture containing a scoped transitive package.

}
}
}
if let Some(express) = packages.get("node_modules/express") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: npm dependency edges are hardcoded to express. parse_npm_lock only copies dependency declarations from the root package and node_modules/express; every other package’s dependencies object is ignored, leaving lp.parent unset and no edge emitted.

Impact: deps explain paths and CycloneDX dependency relationships are wrong for almost all real npm lockfiles.

Fix: iterate every packages entry, read its dependencies, resolve each child package key, and emit parent-to-child edges generically; remove the express special case and test with a non-Express transitive dependency.

let dir = parent_dir(&f.path);
let has_lock = ctx.detected.iter().any(|x| {
parent_dir(&x.path) == dir
&& matches!(x.kind, DepFileKind::PoetryLock | DepFileKind::UvLock)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: uv.lock suppresses requirements scanning but is never parsed. This branch treats UvLock as a lockfile, so requirements.txt is skipped, and there is no scanner elsewhere that consumes DepFileKind::UvLock.

Impact: Python projects with uv.lock plus requirements can produce an empty or severely incomplete inventory and miss DEP findings entirely.

Fix: implement uv.lock parsing before treating it as handled, or stop including UvLock in this has_lock check until support exists; add a fixture for pyproject.toml + uv.lock + requirements.txt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant