docs: DSPy vs dsprrr feature parity report #110
Open
JamesHWade wants to merge 1 commit into
Multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature areas, with per-area parity scores, file:symbol evidence, and a prioritized list of gaps worth closing.

Filed as beads issues:
- dsprrr-pcd (rollout_id diversity)
- dsprrr-v18 (trace-aware metrics)
- dsprrr-e7g (GEPA feedback)
- dsprrr-4bu (adapter abstraction)
- dsprrr-7nu (ReAct trajectory)
- dsprrr-ebq (signature manipulation API)

Corroborates existing dsprrr-deh (Embedder) and dsprrr-a3z (Parallel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a new documentation artifact (DSPY_PARITY.md) that audits feature parity between DSPy 3.x (Python) and the dsprrr R package across nine feature areas, including parity scoring and concrete R/ file references to support the assessment.
Changes:
- Introduces `DSPY_PARITY.md` with an executive summary, parity matrix, and per-area analysis (coverage, gaps, and dsprrr-specific extras).
- Documents a prioritized “Gaps Worth Closing” list to guide follow-up implementation work.
## Executive Summary
dsprrr is a faithful R port of DSPy's authoring surface and a partial port of its machinery. It ports the parts an R user touches first (string signatures, the core prediction modules (Predict, ChainOfThought, ProgramOfThought, ReAct, RAG, ensemble/refine wrappers), a two-tier cache, scoped LM configuration, and ten teleprompters sharing a clean S7 `compile()` architecture), and it ports them as real, tested implementations rather than stubs. The gaps are deeper in the stack and consistent across areas: there is no Adapter abstraction (formatting and parsing are delegated wholesale to ellmer's `chat_structured()`), no predictor/parameter introspection or composable `Module` subclassing, no trace-aware metric protocol, no `dspy.LM` wrapper, no weight-optimization family (BootstrapFinetune, GRPO), and no callback/MLflow/serving story. Where dsprrr diverges, it usually leans on the R ecosystem (ellmer, ragnar, vitals, pins, tidymodels-flavored grid search), which is a reasonable trade rather than a deficiency. Net: strong on what you write, weaker on what optimizes and observes it.
**What DSPy has.** A string-signature surface (`"inputs -> outputs"` with inline typing), class-based signatures with docstring instructions, `InputField`/`OutputField` factories over Pydantic constraints, a signature-manipulation API (`with_instructions`, `with_updated_fields`, `append`/`prepend`/`delete`, `equals`, `dump_state`/`load_state`), and a full Adapter layer: ChatAdapter with `[[ ## field ## ]]` markers, JSONAdapter with native `response_format` tiering and `json_repair`, XMLAdapter, TwoStepAdapter, BAMLAdapter, plus process-wide and scoped adapter configuration and a `dspy.Type` hook system (Image/Audio/Document/Citations/Reasoning, `adapt_to_native_lm_feature`).
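To make the `[[ ## field ## ]]` marker convention concrete, here is a minimal Python sketch of how a marker-delimited completion could be split back into named output fields. It is an illustration only, not DSPy's ChatAdapter code: the function name `parse_marked_completion`, the sample completion text, and the treatment of a trailing `completed` marker as an end-of-output sentinel are all assumptions made for this example.

```python
import re

# Matches a header line such as "[[ ## answer ## ]]" (hypothetical simplification).
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]$")


def parse_marked_completion(text: str, expected: list[str]) -> dict[str, str]:
    """Collect the lines under each marker and keep only the expected fields."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = FIELD_HEADER.match(line.strip())
        if match:
            # A new marker starts a new section.
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    parsed = {
        name: "\n".join(lines).strip()
        for name, lines in sections.items()
        if name in expected
    }
    missing = [name for name in expected if name not in parsed]
    if missing:
        raise ValueError(f"completion is missing fields: {missing}")
    return parsed


# Hand-written sample completion, not real model output.
completion = """[[ ## reasoning ## ]]
The capital is well known.

[[ ## answer ## ]]
Paris

[[ ## completed ## ]]
"""

parsed = parse_marked_completion(completion, ["reasoning", "answer"])
```

A parser of this kind is what an Adapter layer owns; when structured output is delegated to a provider's native JSON schema support, there is no such text format to parse or repair.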
**dsprrr's coverage.** The string-signature half is well done. `parse_signature` (R/signature-parser.R) splits on `->` with nesting-aware comma/colon handling (`split_respecting_nesting`), and `parse_type_string` maps the full inline-type vocabulary: `string`/`int`/`float`/`bool`/`list[...]`/`enum(...)`/`Literal[...]` plus bounds like `number[0,100]`. Outputs are native ellmer types, so structured output uses provider-native JSON schema directly via `chat_structured` (R/run.R:call_llm_request). `signature_to_json_schema()` (R/signature-schema.R) exports the contract. Reasoning is handled by composable transforms `with_reasoning()`/`without_reasoning()`/`chain_of_thought()` (R/signature-transforms.R).
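The nesting-aware split described above can be sketched in a few lines of Python. This is an illustration of the technique, not a transcription of dsprrr's R code: the names `split_top_level` and `parse_signature_sketch` are hypothetical, and defaulting an untyped field to `string` is an assumption of this example.

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split on `sep`, ignoring separators nested inside (), [] or {}."""
    openers, closers = "([{", ")]}"
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in openers:
            depth += 1
        elif ch in closers:
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            # Only a separator at depth 0 ends a field.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def parse_signature_sketch(signature: str) -> dict[str, dict[str, str]]:
    """Parse "a, b: type -> c: type" into input and output field maps."""
    if "->" not in signature:
        raise ValueError("signature must contain '->'")
    lhs, rhs = signature.split("->", 1)

    def fields(side: str) -> dict[str, str]:
        result = {}
        for item in split_top_level(side, ","):
            # The first colon separates the field name from its type string.
            name, _, type_str = item.partition(":")
            result[name.strip()] = type_str.strip() or "string"  # assumed default
        return result

    return {"inputs": fields(lhs), "outputs": fields(rhs)}


parsed = parse_signature_sketch(
    "question, context: list[string] -> score: number[0,100], label: enum(pos, neg)"
)
```

The depth counter is what keeps the commas inside `number[0,100]` and `enum(pos, neg)` from being treated as field separators; a plain split on `,` would break both types in two.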
**What DSPy has.** A unified `dspy.Embedder` (hosted-via-LiteLLM or custom callable, `batch_size`, caching, async `acall`), a built-in in-memory `retrievers.Embeddings` (brute-force↔FAISS auto-switch at 20k, returns `Prediction{passages, indices, scores}`), `ColBERTv2`, a standalone `dspy.KNN`, the legacy `Retrieve`/global `rm` config, `KNNFewShot`, and the canonical RAG pattern (retrieve as a plain callable composed with a separately-optimizable generation module).
**dsprrr's coverage.** This is the weakest area, by design: embedding and vector search are delegated to ragnar. RAGModule (R/module-rag.R) implements retrieve-then-generate: `extract_query` → `retrieve_context` (via `ragnar::ragnar_retrieve` or a custom `retriever(query, k)` closure) → inject into the context field → `chat_structured`. KNNFewShot is fully implemented as an S7 teleprompter (R/teleprompter-knn.R) plus a runtime KNNFewShotModule (R/module-knn.R) that embeds each query, finds k neighbors via pure-R `cosine_similarity`, and injects them as demos. `ragnar_tool()` (R/ragnar.R) exposes a ragnar store as an ellmer search tool for ReAct.
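The KNN few-shot step (embed the query, rank training examples by cosine similarity, inject the top k as demos) can be sketched as follows. This is a minimal Python illustration, not the dsprrr or DSPy implementation: `select_demos` is a hypothetical name, and the three-dimensional embeddings are hand-written toy vectors standing in for real embedder output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demos(query_vec: list[float], trainset: list[dict], k: int = 2) -> list[dict]:
    """Return the k training examples most similar to the query embedding."""
    ranked = sorted(
        trainset,
        key=lambda ex: cosine_similarity(query_vec, ex["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy training set with made-up embeddings (assumption for illustration).
trainset = [
    {"question": "What is 2 + 2?", "answer": "4", "embedding": [1.0, 0.0, 0.0]},
    {"question": "Capital of France?", "answer": "Paris", "embedding": [0.0, 1.0, 0.0]},
    {"question": "What is 3 * 3?", "answer": "9", "embedding": [0.9, 0.1, 0.0]},
]

# Pretend this vector is the embedding of a new arithmetic question.
demos = select_demos([1.0, 0.05, 0.0], trainset, k=2)
```

Here the two arithmetic examples are selected ahead of the geography one. A brute-force scan like this is linear in the size of the training set, which is usually acceptable for few-shot pools.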
Summary
Adds `DSPY_PARITY.md`, a multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature areas, with per-area parity scores, `file:symbol` evidence from `R/`, and a prioritized list of gaps worth closing.

Headline assessment

dsprrr is a faithful port of DSPy's authoring surface and a partial port of its machinery: strong on what you write, weaker on what optimizes and observes it.
Follow-up work filed in beads

- `dsprrr-pcd` (P1, bug): BestOfN/Refine per-attempt diversity via `rollout_id` + temperature override
- `dsprrr-v18` (P1): Trace-aware metric protocol (continuous in eval, binary in optimization)
- `dsprrr-e7g` (P2): GEPA feedback-metric channel (depends on `dsprrr-v18`)
- `dsprrr-4bu` (P2): Adapter abstraction (ChatAdapter + JSONAdapter fallback)
- `dsprrr-7nu` (P2): ReAct `max_iterations` enforcement + inspectable trajectory
- `dsprrr-ebq` (P2): Signature-manipulation API
- Corroborates existing `dsprrr-deh` (Embedder) and `dsprrr-a3z` (Parallel)

All linked under the `dsprrr-u7z` parity epic and labeled `dspy-parity-2026`.

🤖 Generated with Claude Code