docs: DSPy vs dsprrr feature parity report #110

Open: JamesHWade wants to merge 1 commit into main from docs/dspy-parity-report

@JamesHWade

Owner

Summary

Adds DSPY_PARITY.md — a multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature areas, with per-area parity scores, file:symbol evidence from R/, and a prioritized list of gaps worth closing.

Headline assessment

dsprrr is a faithful port of DSPy's authoring surface and a partial port of its machinery — strong on what you write, weaker on what optimizes and observes it.

| Area | Parity | Score |
|---|---|---|
| Signatures & Adapters | partial | 42 |
| Core Prediction Modules | strong | 68 |
| Agentic / Tool Modules | partial | 52 |
| Ensemble / Refine / Robustness | strong | 78 |
| Optimizers / Teleprompters | partial | 58 |
| Evaluation & Metrics | partial | 55 |
| Retrieval / RAG / Embeddings | partial | 38 |
| LM Config, Caching, Async/Parallel | partial | 52 |
| Persistence, Observability & Deployment | partial | 42 |

Follow-up work filed in beads

  • dsprrr-pcd (P1, bug) — BestOfN/Refine per-attempt diversity via rollout_id + temperature override
  • dsprrr-v18 (P1) — Trace-aware metric protocol (continuous in eval, binary in optimization)
  • dsprrr-e7g (P2) — GEPA feedback-metric channel (depends on dsprrr-v18)
  • dsprrr-4bu (P2) — Adapter abstraction (ChatAdapter + JSONAdapter fallback)
  • dsprrr-7nu (P2) — ReAct: enforce max_iterations + inspectable trajectory
  • dsprrr-ebq (P2) — Signature-manipulation API
  • Corroborates existing dsprrr-deh (Embedder) and dsprrr-a3z (Parallel)

All linked under the dsprrr-u7z parity epic and labeled dspy-parity-2026.
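
For readers unfamiliar with the pattern behind dsprrr-v18, here is a minimal Python sketch of a trace-aware metric in the DSPy style: the same function returns a continuous score during evaluation (`trace is None`) and a strict pass/fail during optimization. The token-overlap scoring, the `threshold` parameter, and the use of plain dicts in place of DSPy `Example`/`Prediction` objects are assumptions made for illustration. This is not the design proposed in the issue.

```python
def token_overlap_metric(example, pred, trace=None, threshold=0.8):
    """Score a prediction against a gold answer.

    Returns a float in [0, 1] when called for evaluation (trace is None)
    and a bool when called during optimization (trace is not None).
    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    gold = set(example["answer"].lower().split())
    guess = set(pred["answer"].lower().split())
    # Fraction of gold tokens that also appear in the prediction.
    score = len(gold & guess) / max(len(gold), 1)
    if trace is None:
        return score            # continuous signal for evaluation reports
    return score >= threshold   # binary gate for selecting demonstrations


example = {"answer": "Paris France"}
partial = {"answer": "paris"}
print(token_overlap_metric(example, partial))            # evaluation mode
print(token_overlap_metric(example, partial, trace=[]))  # optimization mode
```

The binary branch matters during bootstrapping, where only traces that clear the bar should be kept as few-shot demonstrations.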

🤖 Generated with Claude Code

Multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature
areas, with per-area parity scores, file:symbol evidence, and a
prioritized list of gaps worth closing.

Filed as beads issues: dsprrr-pcd (rollout_id diversity), dsprrr-v18
(trace-aware metrics), dsprrr-e7g (GEPA feedback), dsprrr-4bu (adapter
abstraction), dsprrr-7nu (ReAct trajectory), dsprrr-ebq (signature
manipulation API). Corroborates existing dsprrr-deh (Embedder) and
dsprrr-a3z (Parallel).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 01:24

Copilot AI left a comment

Pull request overview

Adds a new documentation artifact (DSPY_PARITY.md) that audits feature parity between DSPy 3.x (Python) and the dsprrr R package across nine feature areas, including parity scoring and concrete R/ file references to support the assessment.

Changes:

  • Introduces DSPY_PARITY.md with an executive summary, parity matrix, and per-area analysis (coverage, gaps, and dsprrr-specific extras).
  • Documents a prioritized “Gaps Worth Closing” list to guide follow-up implementation work.

Comment thread DSPY_PARITY.md

## Executive Summary

dsprrr is a faithful R port of DSPy's authoring surface and a partial port of its machinery. It ports the parts an R user touches first — string signatures, the core prediction modules (Predict, ChainOfThought, ProgramOfThought, ReAct, RAG, ensemble/refine wrappers), a two-tier cache, scoped LM configuration, and ten teleprompters sharing a clean S7 `compile()` architecture — and it ports them as real, tested implementations rather than stubs. The gaps are deeper in the stack and consistent across areas: there is no Adapter abstraction (formatting and parsing are delegated wholesale to ellmer's `chat_structured()`), no predictor/parameter introspection or composable `Module` subclassing, no trace-aware metric protocol, no `dspy.LM` wrapper, no weight-optimization family (BootstrapFinetune, GRPO), and no callback/MLflow/serving story. Where dsprrr diverges, it usually leans on the R ecosystem (ellmer, ragnar, vitals, pins, tidymodels-flavored grid search), which is a reasonable trade rather than a deficiency. Net: strong on what you write, weaker on what optimizes and observes it.
Comment thread DSPY_PARITY.md

**What DSPy has.** A string-signature surface (`"inputs -> outputs"` with inline typing), class-based signatures with docstring instructions, `InputField`/`OutputField` factories over Pydantic constraints, a signature-manipulation API (`with_instructions`, `with_updated_fields`, `append`/`prepend`/`delete`, `equals`, `dump_state`/`load_state`), and a full Adapter layer: ChatAdapter with `[[ ## field ## ]]` markers, JSONAdapter with native `response_format` tiering and `json_repair`, XMLAdapter, TwoStepAdapter, BAMLAdapter, plus process-wide and scoped adapter configuration and a `dspy.Type` hook system (Image/Audio/Document/Citations/Reasoning, `adapt_to_native_lm_feature`).
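
To make the ChatAdapter marker format concrete, the following Python sketch parses a completion that uses `[[ ## field ## ]]` headers into a dict of field values. It is an illustration of the parsing idea only, not DSPy's implementation; the `parse_marked_fields` name and the treatment of a trailing `completed` marker as a terminator are assumptions.

```python
import re

# A field header occupies its own line: [[ ## name ## ]]
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)


def parse_marked_fields(completion: str) -> dict[str, str]:
    """Split a marker-delimited completion into {field_name: text}."""
    matches = list(FIELD_HEADER.finditer(completion))
    fields = {}
    for i, match in enumerate(matches):
        name = match.group(1)
        # A field's text runs until the next header or the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(completion)
        if name == "completed":  # assumed end-of-output marker, carries no value
            continue
        fields[name] = completion[match.end():end].strip()
    return fields


raw = (
    "[[ ## reasoning ## ]]\nTwo plus two.\n\n"
    "[[ ## answer ## ]]\n4\n\n"
    "[[ ## completed ## ]]\n"
)
print(parse_marked_fields(raw))
```

An adapter layer owns exactly this kind of format-and-parse step, which is what dsprrr currently hands to `chat_structured()` instead.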

**dsprrr's coverage.** The string-signature half is well done. `parse_signature` (R/signature-parser.R) splits on `->` with nesting-aware comma/colon handling (`split_respecting_nesting`), and `parse_type_string` maps the full inline-type vocabulary — `string`/`int`/`float`/`bool`/`list[...]`/`enum(...)`/`Literal[...]` plus bounds like `number[0,100]`. Outputs are native ellmer types, so structured output uses provider-native JSON schema directly via `chat_structured` (R/run.R:call_llm_request). `signature_to_json_schema()` (R/signature-schema.R) exports the contract. Reasoning is handled by composable transforms `with_reasoning()`/`without_reasoning()`/`chain_of_thought()` (R/signature-transforms.R).
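
The nesting-aware split described above can be sketched in a few lines. The Python below is an illustration of the technique, not a transcription of the R code in R/signature-parser.R: it reuses the names `split_respecting_nesting` and `parse_signature` for readability, but the bodies, the dict return shape, and the default type of `string` for untyped fields are assumptions.

```python
def split_respecting_nesting(text: str, sep: str = ",") -> list[str]:
    """Split on `sep`, ignoring separators inside (), [] or {}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_signature(signature: str) -> dict[str, dict[str, str]]:
    """Parse 'a, b: type -> c: type' into input and output field maps."""
    inputs, arrow, outputs = signature.partition("->")
    if not arrow:
        raise ValueError("signature must contain '->'")

    def fields(side: str) -> dict[str, str]:
        result = {}
        for item in split_respecting_nesting(side):
            name, colon, type_str = item.partition(":")
            # Assumption for this sketch: untyped fields default to "string".
            result[name.strip()] = type_str.strip() if colon else "string"
        return result

    return {"inputs": fields(inputs), "outputs": fields(outputs)}


sig = "question, context: list[string] -> score: number[0,100], label: enum(a, b)"
print(parse_signature(sig))
```

Tracking bracket depth is what keeps the comma in `number[0,100]` and the one in `enum(a, b)` from being read as field separators.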
Comment thread DSPY_PARITY.md

**What DSPy has.** A unified `dspy.Embedder` (hosted via LiteLLM or a custom callable, with `batch_size`, caching, and async `acall`), a built-in in-memory `retrievers.Embeddings` (brute-force search that switches automatically to a FAISS index at 20k passages, and returns `Prediction{passages, indices, scores}`), `ColBERTv2`, a standalone `dspy.KNN`, the legacy `Retrieve`/global `rm` config, `KNNFewShot`, and the canonical RAG pattern (retrieve as a plain callable composed with a separately-optimizable generation module).

**dsprrr's coverage.** This is the weakest area, by design — embedding and vector search are delegated to ragnar. RAGModule (R/module-rag.R) implements retrieve-then-generate: `extract_query` → `retrieve_context` (via `ragnar::ragnar_retrieve` or a custom `retriever(query, k)` closure) → inject into the context field → `chat_structured`. KNNFewShot is fully implemented as an S7 teleprompter (R/teleprompter-knn.R) plus a runtime KNNFewShotModule (R/module-knn.R) that embeds each query, finds k neighbors via pure-R `cosine_similarity`, and injects them as demos. `ragnar_tool()` (R/ragnar.R) exposes a ragnar store as an ellmer search tool for ReAct.
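
The KNN demo selection described above comes down to ranking training examples by cosine similarity to the query embedding and keeping the top k. The Python sketch below illustrates that step with hand-written toy vectors. It is not the code in R/module-knn.R; the `nearest_demos` helper and the dict layout of the training examples are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_demos(query_vec: list[float], trainset: list[dict], k: int = 3) -> list[dict]:
    """Return the k training examples most similar to the query embedding."""
    ranked = sorted(
        trainset,
        key=lambda ex: cosine_similarity(query_vec, ex["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; a real pipeline would obtain these from an embedding model.
trainset = [
    {"question": "a", "embedding": [1.0, 0.0]},
    {"question": "b", "embedding": [0.0, 1.0]},
    {"question": "c", "embedding": [0.9, 0.1]},
]
demos = nearest_demos([1.0, 0.0], trainset, k=2)
print([d["question"] for d in demos])
```

A brute-force scan like this is linear in the size of the training set per query, which is usually acceptable for few-shot pools but is the reason larger corpora move to an index.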