docs: DSPy vs dsprrr feature parity report by JamesHWade · Pull Request #110 · JamesHWade/dsprrr

JamesHWade · 2026-06-03T01:24:03Z

Summary

Adds DSPY_PARITY.md — a multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature areas, with per-area parity scores, file:symbol evidence from R/, and a prioritized list of gaps worth closing.

Headline assessment

dsprrr is a faithful port of DSPy's authoring surface and a partial port of its machinery — strong on what you write, weaker on what optimizes and observes it.

| Area | Parity | Score |
| --- | --- | --- |
| Signatures & Adapters | partial | 42 |
| Core Prediction Modules | strong | 68 |
| Agentic / Tool Modules | partial | 52 |
| Ensemble / Refine / Robustness | strong | 78 |
| Optimizers / Teleprompters | partial | 58 |
| Evaluation & Metrics | partial | 55 |
| Retrieval / RAG / Embeddings | partial | 38 |
| LM Config, Caching, Async/Parallel | partial | 52 |
| Persistence, Observability & Deployment | partial | 42 |

Follow-up work filed in beads

dsprrr-pcd (P1, bug) — BestOfN/Refine per-attempt diversity via rollout_id + temperature override
dsprrr-v18 (P1) — Trace-aware metric protocol (continuous in eval, binary in optimization)
dsprrr-e7g (P2) — GEPA feedback-metric channel (depends on dsprrr-v18)
dsprrr-4bu (P2) — Adapter abstraction (ChatAdapter + JSONAdapter fallback)
dsprrr-7nu (P2) — ReAct: enforce max_iterations + inspectable trajectory
dsprrr-ebq (P2) — Signature-manipulation API
Corroborates existing dsprrr-deh (Embedder) and dsprrr-a3z (Parallel)

All linked under the dsprrr-u7z parity epic and labeled dspy-parity-2026.
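
For readers unfamiliar with the trace-aware metric protocol named in dsprrr-v18, the following is a minimal Python sketch of the DSPy-side convention: a metric returns a continuous score when no trace is passed (evaluation) and a boolean when one is (optimization). The token-overlap scoring, the `THRESHOLD` value, and the function name are assumptions made for illustration. DSPy metrics receive example and prediction objects rather than plain strings, and nothing here reflects how dsprrr would implement it in R.

```python
from typing import Optional, Union

# Hypothetical cut-off used only when the metric is called during optimization.
THRESHOLD = 0.8


def answer_overlap_metric(
    gold: str, predicted: str, trace: Optional[list] = None
) -> Union[float, bool]:
    """Fraction of gold tokens found in the prediction, trace-aware."""
    gold_tokens = set(gold.lower().split())
    pred_tokens = set(predicted.lower().split())
    score = len(gold_tokens & pred_tokens) / len(gold_tokens) if gold_tokens else 0.0

    if trace is None:
        # Evaluation: report the continuous score.
        return score
    # Optimization: a trace is present, so return a strict pass/fail signal
    # that an optimizer can use to accept or reject a candidate demo.
    return score >= THRESHOLD
```

The same function can then serve both an evaluation loop, which averages the floats, and a bootstrapping optimizer, which keeps only the traces that pass.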

🤖 Generated with Claude Code

Multi-agent audit comparing DSPy 3.x against dsprrr across 9 feature areas, with per-area parity scores, file:symbol evidence, and a prioritized list of gaps worth closing. Filed as beads issues: dsprrr-pcd (rollout_id diversity), dsprrr-v18 (trace-aware metrics), dsprrr-e7g (GEPA feedback), dsprrr-4bu (adapter abstraction), dsprrr-7nu (ReAct trajectory), dsprrr-ebq (signature manipulation API). Corroborates existing dsprrr-deh (Embedder) and dsprrr-a3z (Parallel). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new documentation artifact (DSPY_PARITY.md) that audits feature parity between DSPy 3.x (Python) and the dsprrr R package across nine feature areas, including parity scoring and concrete R/ file references to support the assessment.

Changes:

Introduces DSPY_PARITY.md with an executive summary, parity matrix, and per-area analysis (coverage, gaps, and dsprrr-specific extras).
Documents a prioritized “Gaps Worth Closing” list to guide follow-up implementation work.


+
+## Executive Summary
+
+dsprrr is a faithful R port of DSPy's authoring surface and a partial port of its machinery. It ports the parts an R user touches first — string signatures, the core prediction modules (Predict, ChainOfThought, ProgramOfThought, ReAct, RAG, ensemble/refine wrappers), a two-tier cache, scoped LM configuration, and ten teleprompters sharing a clean S7 `compile()` architecture — and it ports them as real, tested implementations rather than stubs. The gaps are deeper in the stack and consistent across areas: there is no Adapter abstraction (formatting and parsing are delegated wholesale to ellmer's `chat_structured()`), no predictor/parameter introspection or composable `Module` subclassing, no trace-aware metric protocol, no `dspy.LM` wrapper, no weight-optimization family (BootstrapFinetune, GRPO), and no callback/MLflow/serving story. Where dsprrr diverges, it usually leans on the R ecosystem (ellmer, ragnar, vitals, pins, tidymodels-flavored grid search), which is a reasonable trade rather than a deficiency. Net: strong on what you write, weaker on what optimizes and observes it.


+
+**What DSPy has.** A string-signature surface (`"inputs -> outputs"` with inline typing), class-based signatures with docstring instructions, `InputField`/`OutputField` factories over Pydantic constraints, a signature-manipulation API (`with_instructions`, `with_updated_fields`, `append`/`prepend`/`delete`, `equals`, `dump_state`/`load_state`), and a full Adapter layer: ChatAdapter with `[[ ## field ## ]]` markers, JSONAdapter with native `response_format` tiering and `json_repair`, XMLAdapter, TwoStepAdapter, BAMLAdapter, plus process-wide and scoped adapter configuration and a `dspy.Type` hook system (Image/Audio/Document/Citations/Reasoning, `adapt_to_native_lm_feature`).
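
To make the `[[ ## field ## ]]` marker format concrete, here is a small Python sketch of how a completion laid out with such markers could be split back into named fields. It is an illustration of the idea only, not DSPy's ChatAdapter code. The function name is hypothetical, and treating a trailing `completed` marker as an end-of-output sentinel is an assumption.

```python
import re

# A header line such as "[[ ## answer ## ]]" on a line of its own.
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]$")


def parse_marked_completion(text: str) -> dict[str, str]:
    """Split a marker-delimited completion into {field_name: value}."""
    fields: dict[str, str] = {}
    current = None
    buffer: list[str] = []

    for line in text.splitlines():
        match = FIELD_HEADER.match(line.strip())
        if match:
            # A new header closes the previous field, if any.
            if current is not None:
                fields[current] = "\n".join(buffer).strip()
            current = match.group(1)
            buffer = []
        elif current is not None:
            buffer.append(line)

    if current is not None:
        fields[current] = "\n".join(buffer).strip()

    # Assumption: "completed" is a sentinel marker, not a real output field.
    fields.pop("completed", None)
    return fields
```

A parser like this is the piece dsprrr skips by relying on structured output from the provider, which is why the absence of an Adapter layer matters mainly for models without native JSON-schema support.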
+
+**dsprrr's coverage.** The string-signature half is well done. `parse_signature` (R/signature-parser.R) splits on `->` with nesting-aware comma/colon handling (`split_respecting_nesting`), and `parse_type_string` maps the full inline-type vocabulary — `string`/`int`/`float`/`bool`/`list[...]`/`enum(...)`/`Literal[...]` plus bounds like `number[0,100]`. Outputs are native ellmer types, so structured output uses provider-native JSON schema directly via `chat_structured` (R/run.R:call_llm_request). `signature_to_json_schema()` (R/signature-schema.R) exports the contract. Reasoning is handled by composable transforms `with_reasoning()`/`without_reasoning()`/`chain_of_thought()` (R/signature-transforms.R).
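
The nesting-aware split described above can be sketched as follows. This is a Python illustration of the technique, not a transcription of dsprrr's R code: `split_top_level` and `parse_signature_sketch` are hypothetical names, and defaulting an untyped field to `string` is an assumption.

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split on `sep`, ignoring separators nested inside [...] or (...)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return [p for p in parts if p]


def parse_signature_sketch(sig: str) -> dict[str, list[tuple[str, str]]]:
    """Parse "a, b: type -> c: type" into named, typed input/output fields."""
    if "->" not in sig:
        raise ValueError("signature must contain '->'")
    inputs, outputs = sig.split("->", 1)

    def fields(side: str) -> list[tuple[str, str]]:
        result = []
        for item in split_top_level(side):
            # The first colon separates the field name from its type string.
            name, _, type_str = item.partition(":")
            # Assumption: fields without an explicit type default to "string".
            result.append((name.strip(), type_str.strip() or "string"))
        return result

    return {"inputs": fields(inputs), "outputs": fields(outputs)}
```

Tracking bracket depth is what keeps the commas inside `enum(yes, no)` or `number[0,100]` from being mistaken for field separators.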


+
+**What DSPy has.** A unified `dspy.Embedder` (hosted-via-LiteLLM or custom callable, `batch_size`, caching, async `acall`), a built-in in-memory `retrievers.Embeddings` (brute-force↔FAISS auto-switch at a 20k-document corpus threshold, returns `Prediction{passages, indices, scores}`), `ColBERTv2`, a standalone `dspy.KNN`, the legacy `Retrieve`/global `rm` config, `KNNFewShot`, and the canonical RAG pattern (retrieve as a plain callable composed with a separately-optimizable generation module).
+
+**dsprrr's coverage.** This is the weakest area, by design — embedding and vector search are delegated to ragnar. RAGModule (R/module-rag.R) implements retrieve-then-generate: `extract_query` → `retrieve_context` (via `ragnar::ragnar_retrieve` or a custom `retriever(query, k)` closure) → inject into the context field → `chat_structured`. KNNFewShot is fully implemented as an S7 teleprompter (R/teleprompter-knn.R) plus a runtime KNNFewShotModule (R/module-knn.R) that embeds each query, finds k neighbors via pure-R `cosine_similarity`, and injects them as demos. `ragnar_tool()` (R/ragnar.R) exposes a ragnar store as an ellmer search tool for ReAct.
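
The runtime demo selection described for KNNFewShotModule (embed the query, rank training examples by cosine similarity, keep the top k as demos) can be sketched in a few lines. This is a Python illustration under stated assumptions: embeddings are plain lists of floats computed elsewhere, and `select_demos` is a hypothetical helper, not a dsprrr or DSPy function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demos(query_vec: list[float], trainset: list[tuple], k: int = 2) -> list:
    """Return the k training examples whose embeddings are closest to the query.

    `trainset` is a list of (embedding, example) pairs.
    """
    ranked = sorted(
        trainset,
        key=lambda pair: cosine_similarity(query_vec, pair[0]),
        reverse=True,
    )
    return [example for _, example in ranked[:k]]
```

A brute-force scan like this is linear in the size of the training set per query, which is usually acceptable for few-shot pools but is the reason larger corpora are handed to a dedicated vector store.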


Copilot AI review requested due to automatic review settings June 3, 2026 01:24

Copilot started reviewing on behalf of JamesHWade June 3, 2026 01:24

Copilot AI reviewed Jun 3, 2026

JamesHWade wants to merge 1 commit into main from docs/dspy-parity-report


