Add corgea deps offline inventory (scan/graph/explain/diff/sbom/policy)#93
Add corgea deps offline inventory (scan/graph/explain/diff/sbom/policy)#93juangaitanv wants to merge 1 commit into
Conversation
Offline dependency inventory for npm, Python, and Java: detects manifests and lockfiles, builds the resolved graph, and evaluates a pinning policy (DEP rules) with table/JSON/SARIF/CycloneDX output. Fully offline — no token or network. Carved out of #89 as chunk2 (stacks on the chunk1 harness branch). Excludes the network surface deferred to chunk3: `deps verify`, registry freshness, --check-cve / vuln-api, the vuln-api-stub binary, and the npm/pip/etc. install wrappers. Wires `corgea deps <subcommand>` into main.rs with no auth/token check. Adds 83 unit + integration tests; overall line coverage 13% -> 36%.
| inv.findings.iter().any(|f| f.severity.at_least(sev)) | ||
| } | ||
|
|
||
| fn scan_base_ref(_path: &str, _base: &str) -> Result<crate::deps::Inventory, DepsError> { |
There was a problem hiding this comment.
This cannot ship as the implementation for corgea deps diff: both _path and _base are ignored and the base inventory is always empty. Evidence: the diff command scans HEAD, then calls this stub, so even corgea deps diff --base HEAD reports every current dependency as added and never reports real removals/changes. Impact: CI/policy checks based on dependency diffs are unusable and --fail-on-new will be evaluated against the wrong baseline. Fix by materializing/scanning the requested git ref for the requested path (for example via a temporary worktree or reading files from the git object database), then add CLI tests where comparing against the same ref yields an empty diff and a real base/head change yields only the actual additions/removals/updates.
| out_format, | ||
| out_file, | ||
| } => { | ||
| let inv = scan(Path::new(&path), &Policy::default())?; |
There was a problem hiding this comment.
The CLI never loads the policy file that corgea deps policy init creates. Evidence: every command path calls scan(..., &Policy::default()), so a user can set .corgea/deps.yml to disable fail_on_wildcard or adjust other policy settings and corgea deps scan will still use the hard-coded defaults. Impact: the advertised policy feature produces false positives/false negatives in CI and users cannot rely on the generated config. Fix by loading .corgea/deps.yml from the scan root when present, failing closed on invalid YAML, and passing that Policy through scan/graph/explain/diff/sbom; add CLI coverage proving a generated/edited policy changes scan findings.
| let output = match format { | ||
| "json" => to_json(&inv).to_string(), | ||
| "sarif" => to_sarif(&inv).to_string(), | ||
| _ => { |
There was a problem hiding this comment.
Unknown output formats are silently accepted as table output. Evidence: out_format is an unrestricted Option<String>, and this wildcard arm handles every value other than json/sarif; with --out-format typo --out-file out, the code prints the table to stdout, writes the empty output string to the file, and exits successfully. Impact: automation expecting JSON/SARIF can receive empty or human-formatted output without any failure signal. Fix by making the match exhaustive for table, json, and sarif, returning an error for anything else (or use clap value_parser), and add negative CLI tests for invalid --out-format.
| } | ||
| } | ||
|
|
||
| fn should_fail(inv: &crate::deps::Inventory, threshold: &str) -> bool { |
There was a problem hiding this comment.
Invalid severity thresholds disable the failure gate instead of failing the command. Evidence: Severity::parse(threshold) returning None makes should_fail return false, so corgea deps scan --fail-on hihg exits 0 even when high findings exist. Impact: a typo in CI silently bypasses the intended dependency policy gate. Fix by validating --fail-on during argument handling (or returning Result<bool, DepsError> here) and exiting non-zero with a clear error for unsupported severities; add a CLI test for an invalid threshold.
| for c in &diff.changed { | ||
| println!(" ~ {} {} -> {}", c.name, c.from, c.to); | ||
| } | ||
| if fail_on_new.is_some() && !head.findings.is_empty() { |
There was a problem hiding this comment.
--fail-on-new ignores both "new" and the supplied threshold. Evidence: the value is only checked with is_some(), and the exit decision uses !head.findings.is_empty() instead of findings introduced by the diff and at/above the requested severity. Impact: once the base diff is implemented, this will still fail builds for pre-existing findings below the requested threshold and will also accept invalid threshold strings. Fix by parsing fail_on_new as a severity, comparing base vs head findings to identify newly introduced findings, and failing only for new findings meeting that threshold.
| } | ||
| } | ||
| } | ||
| if let Some(express) = packages.get("node_modules/express") { |
There was a problem hiding this comment.
The npm graph/explain implementation is fixture-specific: it only reads dependency declarations from the root package and node_modules/express. Evidence: dependencies of any other package in package-lock.json never get declared or parent set, and the transitive edge builder only adds edges when lp.parent is present. Impact: corgea deps graph and corgea deps explain are wrong for almost every npm project whose transitive dependencies are not under Express; affected transitive packages appear as disconnected or as fallback root paths. Fix by iterating every lockfile package entry, resolving each entry's dependencies map to node_modules/<dep> keys, and adding edges/declared constraints generically; add a fixture where a non-Express direct dependency has a transitive dependency and assert explain shows the real path.
| }, | ||
| lock_integrity: None, | ||
| }); | ||
| if name == "urllib3" { |
There was a problem hiding this comment.
The Poetry graph has the same hard-coded fixture behavior: only urllib3 is connected as a transitive dependency of requests. Evidence: parse_poetry_lock returns only package names/versions, and this branch creates an edge only when name == "urllib3"; every other [package.dependencies] relationship in a real poetry.lock is ignored. Impact: corgea deps graph/explain cannot accurately explain Python dependency paths, and policy/path reporting will be misleading outside the one test fixture. Fix by parsing dependency tables for every package in poetry.lock and building edges from each package to its declared dependencies; add tests with at least two different parent packages and transitive dependencies.
| out_format, | ||
| out_file, | ||
| } => { | ||
| let inv = scan(Path::new(&path), &Policy::default())?; |
There was a problem hiding this comment.
Blocking: scans never load the policy file that policy init creates. This path passes Policy::default() directly, and the same pattern is used for graph, explain, diff, and sbom, while Policy::from_yaml is never called by the CLI.
Impact: edits to .corgea/deps.yml have no effect, so users cannot disable or tune DEP rules and CI can fail on findings that policy explicitly allowed.
Fix: load <scan path>/.corgea/deps.yml when present, pass that Policy to every scan command, return an error for invalid YAML, and add a CLI test where disabling DEP004 changes the scan output/exit code.
| for c in &diff.changed { | ||
| println!(" ~ {} {} -> {}", c.name, c.from, c.to); | ||
| } | ||
| if fail_on_new.is_some() && !head.findings.is_empty() { |
There was a problem hiding this comment.
Blocking: --fail-on-new does not compare against the base inventory and ignores the supplied severity threshold. As written, any non-empty head.findings exits 1, including pre-existing findings and findings below the requested threshold.
Impact: this cannot be used as a CI gate for “new high findings only”; existing low/medium findings in HEAD will break builds, while the option value is effectively unused.
Fix: parse the option into Severity, diff stable finding keys against base_inv.findings, fail only for new findings at or above the threshold, and reject invalid threshold values.
| } | ||
|
|
||
| fn should_fail(inv: &crate::deps::Inventory, threshold: &str) -> bool { | ||
| let Some(sev) = Severity::parse(threshold) else { |
There was a problem hiding this comment.
Blocking: an invalid --fail-on value silently disables the gate. If Severity::parse returns None, should_fail returns false, so corgea deps scan --fail-on hihg exits 0 even when high/critical findings exist.
Impact: a typo in CI configuration bypasses the dependency policy gate.
Fix: validate --fail-on in run_inner and return an error/non-zero exit for unknown severities; add a CLI test for an invalid threshold.
| inv.findings.iter().any(|f| f.severity.at_least(sev)) | ||
| } | ||
|
|
||
| fn scan_base_ref(_path: &str, _base: &str) -> Result<crate::deps::Inventory, DepsError> { |
There was a problem hiding this comment.
Blocking: deps diff --base never scans the requested base ref. scan_base_ref ignores both arguments and returns an empty Inventory, so the diff always compares HEAD against an empty graph.
Impact: every current dependency is reported as added, removals and version changes from the real base are missed, and any CI behavior built on this diff is incorrect.
Fix: materialize the requested git ref in a temporary tree/worktree and scan that path, returning an error if the ref cannot be loaded; add a CLI test where a dependency changes or is removed between base and HEAD.
| rel_manifest: &str, | ||
| ) -> Result<(), DepsError> { | ||
| let pkg = read_json(manifest_path)?; | ||
| let lock_path = file_in_dir(ctx.detected, dir, DepFileKind::NpmLockfile); |
There was a problem hiding this comment.
Blocking: detected Yarn and pnpm lockfiles are ignored. detect.rs classifies yarn.lock and pnpm-lock.yaml, but npm scanning only looks for DepFileKind::NpmLockfile here.
Impact: projects using Yarn or pnpm are scanned as if they have no usable lockfile: resolved versions/integrity are lost and direct dependencies can be falsely reported as stale/missing.
Fix: either parse YarnLockfile/PnpmLockfile as lock candidates or treat them as explicitly unsupported with a clear finding/error instead of silently ignoring them; add fixtures for both formats.
| if !key.starts_with("node_modules/") { | ||
| continue; | ||
| } | ||
| let name = key |
There was a problem hiding this comment.
Blocking: scoped npm package names are corrupted for transitive lockfile entries. For a key like node_modules/@types/node, this extraction keeps only node, which then creates pkg:npm/node@... and collides with the unscoped node package.
Impact: the inventory, SBOM, findings lookup, and graph can all point at the wrong package for scoped dependencies.
Fix: when stripping node_modules/, preserve two path segments for scoped packages (@scope/name), and mirror that behavior in PackageId::name(); add a lockfile fixture containing a scoped transitive package.
| } | ||
| } | ||
| } | ||
| if let Some(express) = packages.get("node_modules/express") { |
There was a problem hiding this comment.
Blocking: npm dependency edges are hardcoded to express. parse_npm_lock only copies dependency declarations from the root package and node_modules/express; every other package’s dependencies object is ignored, leaving lp.parent unset and no edge emitted.
Impact: deps explain paths and CycloneDX dependency relationships are wrong for almost all real npm lockfiles.
Fix: iterate every packages entry, read its dependencies, resolve each child package key, and emit parent-to-child edges generically; remove the express special case and test with a non-Express transitive dependency.
| let dir = parent_dir(&f.path); | ||
| let has_lock = ctx.detected.iter().any(|x| { | ||
| parent_dir(&x.path) == dir | ||
| && matches!(x.kind, DepFileKind::PoetryLock | DepFileKind::UvLock) |
There was a problem hiding this comment.
Blocking: uv.lock suppresses requirements scanning but is never parsed. This branch treats UvLock as a lockfile, so requirements.txt is skipped, and there is no scanner elsewhere that consumes DepFileKind::UvLock.
Impact: Python projects with uv.lock plus requirements can produce an empty or severely incomplete inventory and miss DEP findings entirely.
Fix: implement uv.lock parsing before treating it as handled, or stop including UvLock in this has_lock check until support exists; add a fixture for pyproject.toml + uv.lock + requirements.txt.
Summary
Chunk 2 of the dependency tooling split (#89). Stacked on #92 (chunk1/harness) — review/merge that first; base will retarget to
mainonce #92 lands.Adds the fully offline
corgea depsinventory surface for npm, Python, and Java:scan,graph,explain,diff,sbom,policy.main.rswith no auth/token check.Out of scope (chunk 3)
The network surface stays deferred:
deps verify, registry freshness,--check-cve/ vuln-api client, the vuln-api-stub binary, and the npm/pip/etc. install wrappers.Coverage
83 unit + integration tests. Overall line coverage 13% → 36% (well above the 13% floor from chunk 1).
Test plan
main./harness cipasses (clippy strict, fmt, tests + coverage)corgea deps scan/graph/explain/diff/sbom/policyrun with no token or config presentdepssubcommand