Feature sparse linalg solvers#2841
Conversation
…oneMKL hooks
- _interface.py: add full operator algebra (.H, .T, +, *, **, neg),
_AdjointLinearOperator, _TransposedLinearOperator, _SumLinearOperator,
_ProductLinearOperator, _ScaledLinearOperator, _PowerLinearOperator,
IdentityOperator, MatrixLinearOperator, _AdjointMatrixOperator,
_CustomLinearOperator factory dispatch; extend aslinearoperator
to handle dpnp sparse and dense arrays
- _iterative.py: add _make_system (dtype validation, preconditioner
wiring, working dtype selection); add _make_fast_matvec CSR/oneMKL
SpMV hook; fix GMRES Arnoldi inner product to single oneMKL BLAS
gemv (dpnp.dot) instead of slow Python vdot loop; offload
Hessenberg lstsq to numpy.linalg.lstsq (CPU, matches CuPy);
fix SciPy host-fallback tol->rtol deprecation via _scipy_tol_kwarg;
add preconditioner support to CG; keep MINRES as SciPy-backed stub
Refs: CuPy v14.0.1 cupyx/scipy/sparse/linalg/_interface.py,
cupyx/scipy/sparse/linalg/_iterative.py"
…gmres, minres
Modeled after CuPy's cupyx_tests/scipy_tests/sparse_tests/test_linalg.py.
Covers:
- LinearOperator: shape, dtype inference, matvec/rmatvec/matmat,
subclassing, __matmul__, __call__, edge cases
- aslinearoperator: dense array, duck-type, identity passthrough,
rmatvec from dense, invalid inputs
- cg: SPD convergence, scipy reference match, x0 warm start, b_ndim=2,
callback, atol, LinearOperator path, invalid inputs,
non-convergence info check
- gmres: diag-dominant convergence, scipy reference match, restart
variants, x0, b_ndim=2, callbacks, complex systems, atol,
non-convergence info check, Hilbert-matrix stress test
- minres: SPD, symmetric-indefinite, scipy reference, shift parameter,
non-square guard, LinearOperator path, callback
- Integration: parametric (n, dtype) cross-solver tests via LinearOperator
- Import smoke tests: __all__ completeness
- Use dpnp.tests.helper: assert_dtype_allclose, generate_random_numpy_array, get_all_dtypes, get_float_complex_dtypes, has_support_aspect64 - Use dpnp.tests.third_party.cupy testing harness (with_requires, etc.) - Use numpy.testing assert_allclose / assert_array_equal / assert_raises - Use dpnp.asnumpy() instead of numpy.asarray() - Use pytest parametrize ids matching existing test conventions - Use is_scipy_available() helper from tests/helper.py - Strict class-per-solver organisation matching TestCholesky / TestDet etc.
…or dtype Two bugs fixed: 1. _init_dtype() was calling dpnp.zeros(n) which defaults to float64, so a float32 matvec would upcast and return float64, making the inferred dtype wrong. Fix: use dpnp.zeros(n, dtype=dpnp.int8) as SciPy/CuPy do — any numeric matvec will promote int8 to its own dtype. 2. _CustomLinearOperator.__init__ called _init_dtype() even when an explicit dtype was already supplied, overwriting the caller's value. Fix: _init_dtype() now short-circuits when self.dtype is already set.
…ption handling Align gemv.cpp with the conventions established in blas/gemm.cpp: Headers added: - ext/common.hpp (dpctl_td_ns, consistent with other extensions) - utils/memory_overlap.hpp (MemoryOverlap guard on x vs y) - utils/output_validation.hpp (CheckWritable + AmpleMemory on y) - utils/type_utils.hpp (validate_type_for_device<T> in impl) - <sstream> (needed for stringstream error_msg) Exception handling added in sparse_gemv_impl(): - try/catch(oneapi::mkl::exception) around all oneMKL sparse calls - try/catch(sycl::exception) around all oneMKL sparse calls - release_matrix_handle cleanup in the exception error path - throw std::runtime_error with descriptive message on catch Input validation added in sparse_gemv(): - ndim checks: x and y must be 1-D - queues_are_compatible() across all 5 USM arrays - MemoryOverlap()(x, y) aliasing guard - CheckWritable::throw_if_not_writable(y) - AmpleMemory::throw_if_not_ample(y, num_rows) - keep_args_alive() at function exit (was missing, returning empty event)
… table
Modeled after blas/gemm.cpp (2-D table: value type x index type) and
blas/gemv.cpp (dispatch vector pattern with ContigFactory + init_dispatch_table).
Changes:
- Add sparse/types_matrix.hpp with SparseGemvTypePairSupportFactory<Tv, Ti>
encoding the 4 supported combinations: {float32,float64} x {int32,int64}
- Rewrite sparse_gemv_impl() to take typeless char* pointers (matching
the blas gemv_impl signature style) — type info flows through template
params only, no runtime branching inside the impl
- Replace the 60-line if/else val_typenum/idx_typenum chain in sparse_gemv()
with a 2-D dispatch table lookup (gemv_dispatch_table[val_id][idx_id])
- Rename init_sparse_gemv_dispatch_vector -> init_sparse_gemv_dispatch_table
and implement it via init_dispatch_table<> from ext/common.hpp
- All validation guards and exception handling from prior commit are preserved
…se_gemv_dispatch_table Follows the rename made in gemv.cpp when the dispatch mechanism was changed from a 1-D vector to a 2-D table (value type x index type). All other declarations (sparse_gemv signature, parameters) are unchanged.
The oneMKL 2025-2 sparse BLAS API deprecated the old 8-argument
set_csr_data(queue, handle, nrows, ncols, index_base, row_ptr, col_ind,
values, deps) overload in favour of a new signature that takes the
sparse matrix handle as `spmat` and adds an explicit `nnz` argument:
set_csr_data(queue, spmat, nrows, ncols, nnz, index_base,
row_ptr, col_ind, values, deps)
Fixes:
- Replace old set_csr_data call with the new nnz-aware signature
- Silences the resulting -Wunused-parameter warning on `nnz` (now used)
- No functional change; all other logic is unchanged
…tring Line 477: `hasattr(A, "rmatmat\")` had a Markdown-escaped backslash leaked into the Python source, causing an unterminated string literal. Fixed to `hasattr(A, "rmatmat")`.
dpnp.ndarray blocks implicit NumPy conversion via __array__ to prevent silent dtype=object arrays. All test assertions must use .asnumpy() to materialize device arrays onto the host explicitly. Also replaces numpy.asarray(x_dp) in _rel_residual helper.
…dation order - _iterative.py: raise NotImplementedError for M != None *before* the _HOST_N_THRESHOLD SciPy fast-path in cg() and gmres(), so the contract is enforced regardless of system size (fixes test_cg_preconditioner_unsupported_raises, test_gmres_preconditioner_unsupported_raises). - _iterative.py: validate callback_type and raise NotImplementedError for 'pr_norm' *before* the _HOST_N_THRESHOLD branch in gmres(), so small-n systems also see the error (fixes test_gmres_callback_type_pr_norm_raises). - _iterative.py: pass callback_type='legacy' to scipy.sparse.linalg.gmres when delegating on the fast path to suppress SciPy DeprecationWarning. - test_scipy_sparse_linalg.py: add dtype=numpy.float64 to expected arange() calls in test_identity_operator and test_gmres_happy_breakdown so strict NumPy 2.0 dtype-equality checks pass (float64 result vs int64 expected).
… port SciPy corner cases
- Replace .asnumpy() method calls with dpnp.asnumpy() module fn (asnumpy is not an ndarray method in dpnp; it is a top-level fn) - Fix dpnp.any(x) ambiguous truth value in x0 zero-check; replace with explicit `x0 is not None` guard for r0 initialisation - Fix V_mat.T.conj() -> dpnp.conj(V_mat.T) in GMRES Arnoldi step - Guard minres beta sqrt against tiny negative floats: sqrt(abs(...)) - Unify GMRES Hessenberg h_np assignment to avoid .real stripping producing wrong dtype for complex systems - Fix float() cast on dpnp scalar norm inside GMRES inner h_j1 line
…failures) The committed code used hypot(gbar, oldb) as delta_k which is the gamma (norm) from the PREVIOUS rotation step, not the correct diagonal entry from applying the previous Givens rotation to the current column. The correct Paige-Saunders (1975) two-rotation recurrence is: oldeps = epsln delta = cs * dbar + sn * alpha # apply previous rotation gbar_k = sn * dbar - cs * alpha # residual -> new rotation input epsln = sn * beta dbar = -cs * beta gamma = hypot(gbar_k, beta) # NEW rotation eliminates beta cs = gbar_k / gamma sn = beta / gamma w_new = (v - oldeps*w - delta*w2) / gamma # three-term update This matches scipy.sparse.linalg.minres and Choi (2006) eq. 6.11. The buggy recurrence produced solutions ~1.08x away from the true solution (rel_err ~1e0) instead of the expected ~1e-13. Co-authored-by: fix-minres-recurrence
…feature-sparse-linalg-solvers
…agusetty/dpnp into feature-sparse-linalg-solvers
|
@antonwolfy Apologies for the delay. I got side tracked for a bit. The current workflow is failing with a Is this a known limitation and being triaged by compiler team |
|
Seems like it is: #2887 |
|
@abagusetty, yep, it's expected for the coverage due to known compiler bug, we will implement the w/a in #2887 |
…agusetty/dpnp into feature-sparse-linalg-solvers
|
|
||
| pybind11_add_module(${python_module_name} MODULE ${_module_src}) | ||
| add_sycl_to_target(TARGET ${python_module_name} SOURCES ${_module_src}) | ||
|
|
There was a problem hiding this comment.
| # Ensure Cython modules build first so _usmarray.h exists | |
| add_dependencies(${python_module_name} _usmarray) | |
| // dpctl namespace alias for type dispatch utilities | ||
| namespace dpctl_td_ns = dpnp::tensor::type_dispatch; |
There was a problem hiding this comment.
| // dpctl namespace alias for type dispatch utilities | |
| namespace dpctl_td_ns = dpnp::tensor::type_dispatch; | |
| // namespace for operations with types | |
| namespace dpnp_td_ns = dpnp::tensor::type_dispatch; |
| * Added implementation of `dpnp.scipy.linalg.lu` (SciPy-compatible) [#2787](https://github.com/IntelPython/dpnp/pull/2787) | ||
| * Added support for ndarray subclassing via `dpnp.ndarray.view` method with `type` parameter [#2815](https://github.com/IntelPython/dpnp/pull/2815) | ||
| * Migrated tensor implementation from `dpctl.tensor` into `dpnp.tensor`, making `dpnp` the primary owner of the Array API-compliant tensor layer [#2856](https://github.com/IntelPython/dpnp/pull/2856) | ||
| * Added implementation of `dpnp.scipy.sparse.linalg import LinearOperator, cg, gmres, minres` [#2841](https://github.com/IntelPython/dpnp/pull/2841) |
There was a problem hiding this comment.
need to move to 0.21.0 release
| # Lazy import: dpnp.scipy.sparse may import this module during | ||
| # package initialisation, so we cannot import it at module scope. | ||
| # pylint: disable=import-outside-toplevel | ||
| from dpnp.scipy.sparse import issparse |
There was a problem hiding this comment.
It seems issparse is missed and not implemented currently
| # Helpers for constructing SPD, diagonally dominant, and symmetric | ||
| # indefinite test matrices. Kept small and local, matching the style of | ||
| # vvsort() at the top of test_linalg.py. | ||
| def _spd_matrix(n, dtype): |
There was a problem hiding this comment.
Can we add CuPy tests to dpnp/tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py to check the compatibly?
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>
…agusetty/dpnp into feature-sparse-linalg-solvers
Closes the performance gap where csr_matrix.dot densified the matrix
via toarray() before every dpnp.dot, defeating the point of CSR
outside the solver fast path. cupyx routes csr_matrix.dot directly
through cuSPARSE SpMV; we now do the equivalent with oneMKL.
_csr.py
- Add _ensure_spmv_handle: lazily allocates a oneMKL matrix_handle
(set_csr_data + optimize_gemv, once) and caches it on the instance.
Skipped when the compiled backend extension is absent or the
(value, index) dtype pair is not in the dispatch table -- the
instance silently uses the dense fallback in that case.
- Add _spmv_supported predicate; tighter than the old check in
_make_fast_matvec because it now validates both value AND index
dtype against the C++ dispatch table, preventing a runtime
ValueError from the backend for unsupported index types.
- Rewrite dot(x) to dispatch sparse::gemv for 1-D x of matching
dtype; preserve the dense fallback for 2-D x (batched SpMM is a
separate oneMKL entry point not wired up yet).
- Add __del__ to release the cached handle. The handle attrs are
initialised at the very top of __init__ so __del__ never sees a
partially constructed object even if an _init_* helper raises.
- Add __matmul__ = dot so 'A @ x' takes the same fast path.
linalg/_iterative.py
- _CachedSpMVPair.matvec now drives the csr's own cached handle
instead of building a second forward handle, so a user-issued
A.dot(x) and a solver-issued matvec(x) share a single handle.
The adjoint handle is still owned by _CachedSpMV because csr
only caches the forward direction.
- _make_fast_matvec collapses to a probe of A._ensure_spmv_handle:
one None-check replaces three (extension import, dtype guard,
handle init try/except). The downstream behaviour is identical
but the dtype check now covers indices, not just values.
linalg/_interface.py
- Rewrite aslinearoperator in the canonical SciPy/CuPy dispatch
order: LinearOperator -> sparse -> dense ndarray -> duck-type.
Remove the unreachable second isinstance(dpnp.ndarray) block and
the conflicting ndim policy (was both <=2 and ==2 in different
branches). Promote 1-D dpnp arrays via atleast_2d to match CuPy.
- Explicitly reject numpy.ndarray with a directed message instead
of falling through to the generic 'type not understood'. Silent
host->device upload would mask real device/queue selection bugs.
- Drop the try/except (ImportError, AttributeError) wrapping the
sparse import. dpnp.scipy.sparse is always present in this build;
swallowing those exceptions hid real failures.
_base.py
- Docstring only: document why the legacy isspmatrix / isspmatrix_csr
aliases are intentionally omitted (dpnp has no spmatrix vs sparray
discriminator, and modern SciPy/CuPy code is steered toward
issparse + A.format == 'csr').
…; add cupyx tests
Five bug fixes informed by a full audit of LinearOperator, cg, gmres,
and minres against the upstream SciPy and cupyx reference
implementations, plus the cupyx-style compatibility tests requested in
the PR review.
_interface.py
- _init_dtype: replace the float64 trial vector with int8. The
float64 path silently upcast every dtype-inferred operator to
float64, breaking float32 and complex64 workflows and contradicting
the commit-message claim of an earlier fix that did not actually
land. Matches scipy/cupyx semantics: int8 is the lowest-precedence
numeric dtype and lets the matvec promote to its natural output
dtype.
- matvec / rmatvec: turn the second line of the dimension-mismatch
error message into an f-string. Previously reported the literal
text '{x.shape}' to users.
_iterative.py (cg)
- Fix the info contract: previously returned -1 for an rz/pAp
breakdown. SciPy and cupyx reserve info < 0 for illegal-input
errors only and use info > 0 for any non-convergence (maxiter
exhausted, breakdown, etc.). User code that branched on
'if info > 0' was silently broken. Now report the iteration index
at which the solver gave up.
- Document the info contract in the docstring so the regression
cannot resurface unnoticed.
_iterative.py (gmres)
- Cast b_norm to host once via float(...) instead of leaving it as a
0-D device tensor. The 'if b_norm == 0.0' guard and the per-iter
'if r_norm <= atol' comparison previously triggered an implicit
__bool__ sync each time they evaluated -- exactly the sync pattern
the module docstring claims to avoid. Track r_norm_host explicitly
so the loop condition and the post-loop info check operate on
host scalars; the residual norm is still computed device-side via
dpnp.linalg.norm.
- Final 'not bool(r_norm <= atol)' replaced with the equivalent host
comparison 'r_norm_host > atol'.
_iterative.py (minres)
- Cast beta1 = float(dpnp.inner(r1, y)) before the < 0 and == 0
guards. Previously the guards compared a 0-D device tensor to a
Python int, forcing an implicit sync inside each branch.
- Use numpy.sqrt on the already-host beta1 instead of dpnp.sqrt +
float() round-trip.
tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py
- New file (with sibling __init__.py for the package) requested in
the PR review thread. Modelled on
cupy/tests/cupyx_tests/scipy_tests/sparse_tests/test_linalg.py and
fitted to dpnp's in-tree testing harness (dpnp as cupy, the
dpnp.tests.third_party.cupy.testing helpers, scipy as the reference
via testing.with_requires('scipy')).
- Coverage:
* LinearOperator: explicit-dtype preservation, int8 dtype
inference, dimension-mismatch raise, __matmul__ dispatch, H
property.
* aslinearoperator: pass-through, dense dpnp.ndarray wrap,
csr_matrix wrap, explicit numpy.ndarray rejection.
* cg: dense-SPD convergence vs scipy, x0 warm-start, info > 0
when maxiter is hit (guards the breakdown-contract fix above),
zero-rhs returns zero.
* gmres: diag-dominant convergence vs scipy, restart parameter,
info > 0 when maxiter is hit.
* minres: symmetric-indefinite convergence vs scipy, shift
parameter, zero-rhs returns zero.
* Module surface: import smoke test for the five PR-promised
names (LinearOperator, aslinearoperator, cg, gmres, minres).
…ost Hessenberg lstsq Closes audit items IntelPython#9, IntelPython#15, IntelPython#16, IntelPython#17, IntelPython#19 from the prior solver review. All four touch the iterative-solver inner loop and were left out of the previous correctness-only commit because they require a new C++ binding. backend/extensions/blas/gemv.{cpp,hpp}, blas_py.cpp - Extend the typed gemv_impl signature with alpha and beta as doubles; the per-T impl casts them to the matrix value type at dispatch time. dpnp.dot and other legacy callers are unaffected -- the existing gemv() public function now forwards (1.0, 0.0) to the shared gemv_dispatch helper. - Add gemv_alpha_beta() entry point and a _gemv_alpha_beta pybind method computing y = alpha * op(A) * x + beta * y for caller- supplied scalars. Required by the GMRES Arnoldi fast path which fires gemv with (alpha=1, beta=0) writing into a Hessenberg column slice, then (alpha=-1, beta=1) fusing u -= V @ h into one kernel. For complex matrices the scalars are always one of {-1, 0, 1} and so survive the cast exactly; the impl docstring flags the silent imag-loss caveat for any other complex caller. - Refactor: hoist the existing validation / strides / dispatch plumbing into a static gemv_dispatch helper so both entry points share identical behaviour without duplication. scipy/sparse/linalg/_iterative.py - _make_compute_hu now takes both V and H. The closure writes the projection coefficients h = V[:, :j+1]^H @ u directly into the Hessenberg column slice H[:j+1, j] via a single gemv with the output pointing at that slice -- no intermediate h buffer, no slice-assign copy (audit item IntelPython#16). Pass 2 fuses the AXPY-style update u = -V @ h + 1 * u into one gemv with alpha=-1, beta=1 -- no tmp buffer, one kernel instead of gemv-plus-subtract (audit item IntelPython#19). For complex V the (j+1)-element h slice is conjugated in-place between the two passes (V^T -> V^H), negligible cost next to the n*(j+1) gemv. - Switch the Hessenberg least-squares H y = e from a device dpnp.linalg.lstsq (which dispatches an SVD kernel for a tiny 21x20 problem per restart) to numpy.linalg.lstsq on the host (audit item IntelPython#17). Matches CuPy's choice and removes a device- side SVD launch that on Intel GPU dominates the restart cost for the default restart=20. RHS e is now maintained as a numpy array; H is copied via dpnp.asnumpy once per restart and the resulting y is shipped back as a dpnp array for the V @ y update. - V[:, j+1] = v retained as a single contiguous USM slice store (audit item IntelPython#15 closes as no-change-required: the assignment is already one dpnp op on an F-order buffer and there is no fused 'normalise-then-store' path without further binding work). - cg per-iter syncs collapsed from 3-4 down to 1 (audit item IntelPython#9). The pAp and rz_new breakdown checks are no longer transferred to the host on every iteration; instead the loop relies on IEEE-754 inf / NaN propagation through alpha = rz / pAp. When pAp underflows the resulting alpha is inf or NaN, poisons the next residual via x += alpha * p and r -= alpha * Ap, and the single norm sync at the top of the next iteration detects the breakdown via numpy.isfinite(rnorm_host) and exits with info > 0. Mirrors the cuBLAS-style CG inner loop (nrm2 + scalar test, one host barrier per iter); the initial rz breakdown guard remains so a zero preconditioned residual still short- circuits correctly. tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py - test_gmres_complex_arnoldi_fast_path: complex-dtype regression guard for the conjugate-in-place branch of _make_compute_hu -- a silent miss of the conjugate would lose orthogonality and misconverge. - test_cg_inf_breakdown_returns_positive_info: regression guard for the per-iter-sync collapse. Runs cg on a deliberately singular SPD operator and asserts info > 0 (not zero, not -1) so a future re-introduction of explicit breakdown syncs would still pass but a regression to the old info contract would not.
… __del__ against shutdown races Closes audit items IntelPython#5 and IntelPython#29 from the prior solver review. Item IntelPython#4 (_matmat default uses a per-column matvec loop) is closed as wontfix: SciPy's scipy.sparse.linalg.LinearOperator and cupyx's analogue both ship the same hstack-of-matvecs default, so dpnp matches the reference exactly and there is no portable improvement to make without subclass-level _matmat overrides (which _CustomLinearOperator already exposes via its matmat= constructor argument). scipy/sparse/linalg/_interface.py - Set __array_ufunc__ = None on the LinearOperator base class. This is the SciPy contract: a host numpy.ndarray on the left of np_array * linop or np_array @ linop previously triggered NumPy's ufunc dispatch first, which would attempt to broadcast the operator element-wise before falling back to its reflected operator method -- producing either an opaque error or a wrong- typed result. With __array_ufunc__ = None NumPy returns NotImplemented from the ufunc protocol and Python's operator dispatch falls through cleanly to LinearOperator.__rmul__ / __rmatmul__. dpnp.ndarray itself sets __array_ufunc__ = None (see dpnp/dpnp_array.py:222) for the same reason, so the two dispatch systems now agree. scipy/sparse/_csr.py, scipy/sparse/linalg/_iterative.py - Harden __del__ in csr_matrix and in _CachedSpMV against the interpreter-shutdown race where the compiled _sparse_impl extension is garbage-collected before the matrix instance whose oneMKL handle it owns. Previous code used a single except Exception: pass which silenced two qualitatively different failure modes: 1. shutdown race -- extension gone, si._sparse_gemv_release evaluates to None or AttributeError; the handle is unrecoverable and leaving the OS to reclaim it at process exit is the only sane option; 2. genuine backend error while the interpreter is healthy -- a real bug we want to surface eventually, but raising from __del__ produces only an 'Exception ignored in:' warning and the handle is gone either way. The new code probes getattr(si, '_sparse_gemv_release', None) explicitly so case (1) takes the fast non-call path, and then splits the except into (AttributeError, TypeError) for case (1)- style residuals (queue / handle attribute access racing the shutdown) versus a final broad except for case (2). Both still return silently from __del__ -- raising is never valid here -- but the intent is now documented and a real backend regression is no longer indistinguishable from the GC race in code review. tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py - test_array_ufunc_opt_out: asserts the __array_ufunc__ = None marker is present on LinearOperator. Mirrors SciPy's own test suite test_interface.py::test_array_ufunc_opt_out. - test_numpy_scalar_times_linop_dispatches_to_rmul: the concrete runtime consequence -- numpy.float64(2.0) * linop must produce a scaled LinearOperator, not raise or yield an array.
Adds support for
from dpnp.scipy.sparse.linalg import LinearOperator, cg, gmres, minresFixes: #2831