Conversation
Walkthrough

Adds a semaphore-backed ConcurrencyLimiter, tightens and restructures search parameter validation, and replaces keyword query generation for Elasticsearch.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 inconclusive
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/test/scala/dpla/api/helpers/FileReader.scala (1)
8-10: Resource handling is correct; `Using` could simplify (if Scala 2.13+).

The explicit UTF-8 codec and `try`/`finally` with `buffered.close()` look correct and fix the resource-leak risk. If the project is on Scala 2.13+, you could optionally rewrite this with `scala.util.Using` for a more idiomatic one-liner, but it's not required.

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
184-194: Consider removing explicit cast.

The `.asInstanceOf[JsArray]` cast is safe but could be avoided by using the type system.

```diff
- private def filterQuery(filters: Seq[Filter]): JsArray =
-   filters
-     .map(filter => {
-       JsObject(
-         "term" -> JsObject(
-           filter.fieldName -> filter.value.toJson
-         )
-       )
-     })
-     .toJson
-     .asInstanceOf[JsArray]
+ private def filterQuery(filters: Seq[Filter]): JsArray =
+   JsArray(filters.map { filter =>
+     JsObject(
+       "term" -> JsObject(
+         filter.fieldName -> filter.value.toJson
+       )
+     )
+   }: _*)
```

src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1)
55-72: Narrow exception handling when evaluating `f` and consider documenting interruption semantics

Two small robustness points in `apply`:

- When evaluating `f`, you currently catch `Throwable`. In Scala it's more idiomatic to catch `NonFatal` so that truly fatal errors (e.g., `VirtualMachineError`, `ThreadDeath`) are not transformed into failed `Future`s:

```diff
+import scala.util.control.NonFatal
 ...
-    } else {
-      try {
-        val future = f
-        future.andThen { case _ => semaphore.release() }(ec)
-      } catch {
-        case e: Throwable =>
-          semaphore.release()
-          Future.failed(e)
-      }
-    }
+    } else {
+      try {
+        val future = f
+        future.andThen { case _ => semaphore.release() }(ec)
+      } catch {
+        case NonFatal(e) =>
+          semaphore.release()
+          Future.failed(e)
+      }
+    }
```

- If the thread is interrupted while waiting in `semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS)`, an `InterruptedException` will currently bubble out of `apply` instead of becoming a `Future` failure (and the interrupt flag isn't restored). It might be worth deciding and documenting whether you want to:
  - Let that propagate as-is (and mention it in the Scaladoc), or
  - Catch it and convert to a failed `Future` (possibly re-interrupting the thread).
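For readers following along without the diff, the overall pattern under discussion can be condensed into a short sketch. This is hypothetical illustration code, not the PR's actual `ConcurrencyLimiter`: the names (`maxConcurrent`, `timeoutSeconds`, `ConcurrencyLimitExceeded`) mirror the review comments, but the real class differs in detail (Scaladoc, `availablePermits`, error messages).

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Failure returned when no permit becomes available within the timeout.
final case class ConcurrencyLimitExceeded(maxConcurrent: Int, timeoutSeconds: Long)
  extends RuntimeException(
    s"Concurrency limit of $maxConcurrent not acquired within $timeoutSeconds seconds"
  )

class Limiter(maxConcurrent: Int, timeoutSeconds: Long) {
  require(maxConcurrent > 0, "maxConcurrent must be positive")
  require(timeoutSeconds > 0, "timeoutSeconds must be positive")

  private val semaphore = new Semaphore(maxConcurrent)

  // Blocks the calling thread for up to timeoutSeconds waiting for a permit,
  // then releases the permit when the wrapped Future completes.
  def apply[A](f: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(ConcurrencyLimitExceeded(maxConcurrent, timeoutSeconds))
    else
      try f.andThen { case _ => semaphore.release() }
      catch {
        // NonFatal per the suggestion above: fatal errors still propagate.
        case NonFatal(e) =>
          semaphore.release()
          Future.failed(e)
      }
}
```

A call like `limiter(Future(expensiveSearch()))` then either runs while holding a permit or fails fast with `ConcurrencyLimitExceeded`.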
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1 hunks)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
- src/test/scala/dpla/api/helpers/FileReader.scala (1 hunks)
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (8 hunks)
- src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (4 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1 hunks)
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (27 hunks)
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (23 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (2)
src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (2)
- availablePermits (78-78)
- ConcurrencyLimitExceeded (84-90)
src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (1)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (5)
- search (19-33), search (35-37), search (39-41), search (43-46), search (48-51)

src/main/scala/dpla/api/v2/search/mappings/JsonFieldReader.scala (6)
- readObject (16-29), readString (34-37), readObjectArray (31-34), readInt (40-43), readStringArray (37-40), readBoolean (43-49)

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
- query (128-161), filterQuery (184-205)
src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (6)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)

src/test/scala/dpla/api/v2/registry/MockSmrRegistry.scala (1)
- MockSmrRegistry (10-29)

src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/smr/MockSmrRequestHandler.scala (1)
- MockSmrRequestHandler (8-42)

src/main/scala/dpla/api/v2/smr/SmrProtocol.scala (1)
- SmrProtocol (6-44)

src/main/scala/dpla/api/v2/smr/SmrParamValidator.scala (1)
- validService (85-90)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (2)
src/main/scala/dpla/api/v2/search/SearchProtocol.scala (14)
- search (61-64), search (66-70), search (72-75), search (77-80), search (82-86), search (88-91), search (93-97), search (99-104), search (106-109), search (111-115), search (117-121), search (123-127), search (129-132), search (134-138)

src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (5)
- TextField (7-7), IntField (9-9), URLField (11-11), DateField (13-13), WildcardField (17-17)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (2)
src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (3)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)

src/test/scala/dpla/api/v2/registry/MockPssRegistry.scala (1)
- MockPssRegistry (11-30)

src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (26)
src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (1)
17-18: Good addition of test timeout.

The explicit 3-second timeout prevents tests from hanging indefinitely and is appropriate for mock-based end-to-end tests.
Also applies to: 33-34
src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (1)
17-18: LGTM! Route test timeout added for test stability.

The `RouteTestTimeout(3.seconds)` addition helps prevent test flakiness by providing an explicit timeout. This is a good practice for Akka HTTP route tests.

Also applies to: 33-34
src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (1)
17-18: LGTM! Consistent timeout configuration with other test files.

The route test timeout addition follows the same pattern established in other test files in this PR.
Also applies to: 33-34
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (3)
110-110: Verify `track_total_hits` value aligns with pagination depth.

The value `10000` is a reasonable optimization to avoid counting all documents. However, given that `ParamValidator` now limits pagination to 1000 items (`fromValue + pageSize > 1000`), you could consider lowering this to `1000` or keeping it at `10000` for a safety margin. The current value works correctly but is 10x the maximum accessible records.
163-178: Good migration from `query_string` to `multi_match`.

Using `multi_match` with `best_fields` and `lenient: true` is a safer and more predictable approach than `query_string`. This avoids potential Lucene query syntax injection issues while maintaining effective full-text search capability.
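For context, the shape of such a `multi_match` keyword clause can be sketched with spray-json (which the query builder's diffs use elsewhere). The query text is made up, and the field list and boosts are illustrative examples drawn from this review thread, not the PR's exact configuration:

```scala
import spray.json._

// Illustrative multi_match clause: best_fields across boosted keyword fields,
// AND semantics, and lenient handling of field-type mismatches.
val keywordClause: JsObject = JsObject(
  "multi_match" -> JsObject(
    "query"    -> JsString("new england"),
    "fields"   -> JsArray(
      JsString("sourceResource.title^2"),
      JsString("sourceResource.description"),
      JsString("sourceResource.subject.name")
    ),
    "type"     -> JsString("best_fields"),
    "operator" -> JsString("AND"),
    "lenient"  -> JsBoolean(true)
  )
)

println(keywordClause.prettyPrint)
```

Unlike `query_string`, user input here is plain text to Elasticsearch, so Lucene operators in the query have no special meaning.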
113-123: Verify reduced field set meets search requirements.The field set has been curated to high-value fields with reasonable boost weights. Ensure that removing previously included fields doesn't negatively impact search recall for your use cases.
src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (1)
20-21: LGTM! Consistent test infrastructure updates.

Route test timeout added following the same pattern as other test files in this PR.
Also applies to: 38-39
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (3)
125-132: Breaking change: Stricter pagination and facet limits.

The limits have been significantly reduced:

- `maxFacetSize`: 2000 → 200
- `maxPage`: 100 → 10
- `maxPageSize`: 500 → 100

Ensure API consumers are aware of these changes, as existing integrations using higher values will now receive capped results or errors.
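A minimal sketch of how these limits compose with the deep-pagination guard discussed in this thread. This is hypothetical illustration code; the actual `ParamValidator` returns typed `InvalidSearchParams` errors through its own validation machinery rather than `Either`:

```scala
// Constants mirror the review comments; names are illustrative.
val maxPage = 10
val maxPageSize = 100
val maxResultWindow = 1000 // from + size may not exceed this

def validatePagination(page: Int, pageSize: Int): Either[String, (Int, Int)] = {
  val from = (page - 1) * pageSize // Elasticsearch offset
  if (page < 1 || page > maxPage)
    Left(s"page must be an integer between 1 and $maxPage")
  else if (pageSize < 1 || pageSize > maxPageSize)
    Left(s"page_size must be an integer between 1 and $maxPageSize")
  else if (from + pageSize > maxResultWindow)
    Left(s"page * page_size may not exceed $maxResultWindow results")
  else
    Right((from, pageSize))
}
```

Note that with these constants the window check can never actually fire (10 × 100 = 1000), so it serves as a safety net against future limit changes.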
249-256: Good addition: Requiring query or filter with facets.

This validation prevents expensive facet-only queries that scan the entire index. This is a sensible performance safeguard.
431-439: Good improvement: Empty string handling.

Treating empty/whitespace-only values as absent parameters (`None`) is a sensible approach that prevents unnecessary validation errors.

src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
273-277: LGTM: Updated track_total_hits expectation from Boolean to Int.

The test now expects `track_total_hits` to be `10000` instead of a boolean value, aligning with the `QueryBuilder.scala` changes that limit total hit tracking for performance optimization.
287-311: LGTM: Tests updated to use `multi_match` instead of `query_string`.

The keyword query tests now correctly traverse the `multi_match` path instead of `query_string`, reflecting the query builder's switch to `multi_match` queries. The assertions for `query`, `fields`, `operator`, and `lenient` are properly adjusted.
930-937: LGTM: Filter query path updated to match new query structure.

The filter query test now reads from the `"query", "bool", "filter"` path directly, consistent with the simplified filter structure in the updated `QueryBuilder.scala`.

src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (1)
35-40: LGTM: Typed actor system integration and route timeout added.

The addition of `typedSystem` and the `createActorSystem()` override enables proper typed/classic actor system interop, and the 3-second `RouteTestTimeout` provides reasonable test timing. This pattern is consistent with other test files updated in this PR.

src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala (1)
36-40: LGTM: Test updated to include required query parameter.

The addition of `"q" -> "test"` aligns with the new validation rule that requires a query or filter when facets are specified (as tested in `ParamValidatorTest.scala` at lines 322-326). The test still correctly verifies that non-applicable DPLA Map fields are ignored.

src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (4)
321-326: LGTM: New test validates facets require query or filter.

This test ensures that requests with facets but no accompanying query (`q`) or filter are rejected. This prevents expensive facet-only aggregation queries, which is a sensible performance guard.
353-360: LGTM: Updated max facet_size expectation to 200.

The expected maximum `facet_size` is now 200 (reduced from 2000), reflecting the stricter limits in the updated `ParamValidator.scala`.
581-592: LGTM: Pagination validation tests updated for stricter limits.

The tests now verify:

- Too-large page values (600) result in `InvalidSearchParams` rather than capping to max
- Deep pagination beyond 1000 results (page=50 × page_size=50 = 2500) is rejected

These stricter limits protect against expensive deep-pagination queries on Elasticsearch.
619-624: LGTM: Updated page_size validation to reject rather than cap.

The test now expects `InvalidSearchParams` for excessively large `page_size` values instead of silently capping to the maximum. This explicit rejection provides clearer feedback to API consumers.

src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (4)
1-22: LGTM: Well-structured test setup with appropriate async configuration.

The test class properly mixes in `ScalaFutures` and `Eventually` for async testing, with a reasonable `PatienceConfig` (5s timeout, 50ms polling interval).
82-119: LGTM: Thorough concurrency limit enforcement test.

This test properly validates:

- Two permits are acquired when two Futures are in-flight
- A third request fails with `ConcurrencyLimitExceeded` after timeout
- Permits are released when the original Futures complete

Using `Promise` to control Future completion timing is the correct approach for testing concurrency limits.
146-162: LGTM: Constructor validation tests for edge cases.

Good coverage of invalid constructor arguments (zero and negative values for both `maxConcurrent` and `timeoutSeconds`), ensuring the limiter fails fast with clear errors.
63-80: Permit release on Future construction throw is synchronous and safe.

The implementation at lines 64-71 of `ConcurrencyLimiter.scala` confirms the permit is released synchronously: when `val future = f` throws (line 65), the catch block at lines 68-70 immediately calls `semaphore.release()` before returning the failed Future. The test assertion at line 79 is correct and not flaky.

src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (3)
6-24: Docs correctly call out blocking behavior and operational guidance

The Scaladoc explicitly flags the blocking `tryAcquire` semantics and suggests using a dedicated dispatcher / tuning parameters, which matches the implementation below and should help prevent misuse in Akka/HTTP paths.
25-38: Constructor validation and semaphore setup look correct

The `require` guards on `maxConcurrent` and `timeoutSeconds` and the corresponding `Semaphore(maxConcurrent)` initialization are consistent with the intended concurrency cap and error reporting; no changes needed here.
81-90: Exception type is informative and matches limiter configuration

`ConcurrencyLimitExceeded` carries both `maxConcurrent` and `timeoutSeconds` and renders a clear message, which should be helpful in logs and client error handling.
Pull request overview
This PR implements Phase 2 of API improvements focused on search performance optimization and request limiting. Key changes include migrating from Elasticsearch's query_string to multi_match queries, enforcing stricter pagination limits to prevent deep pagination, requiring queries or filters for facet requests, and introducing a new ConcurrencyLimiter class for managing concurrent request loads.
Key Changes
- Search query optimization: Replaced `query_string` with `multi_match` for keyword queries, reduced searchable fields from 27 to 8 high-value fields, and simplified the filter query structure
- Pagination limits: Reduced max page from 100 to 10, max page_size from 500 to 100, max facet_size from 2000 to 200, and added validation to reject requests exceeding 1000 total results (from + size)
- Concurrency control: Added new `ConcurrencyLimiter` class with comprehensive test coverage to manage concurrent request loads using semaphores
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala | Migrated from query_string to multi_match queries, reduced searchable fields, simplified filter structure, and changed track_total_hits from boolean to 10000 |
| src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala | Added deep pagination validation, facet requirement checks, reduced max pagination limits, and added empty string handling in parameter validation |
| src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala | New class implementing semaphore-based concurrency limiting for Futures with timeout support |
| src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala | Comprehensive test coverage for ConcurrencyLimiter including edge cases and failure scenarios |
| src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala | Updated tests to validate multi_match queries, new filter structure, and track_total_hits value |
| src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala | Updated tests to validate new pagination limits, facet requirements, and deep pagination rejection |
| src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala | Updated test to require query parameter with facets |
| src/test/scala/dpla/api/v2/httpHeadersAndMethods/*.scala | Added route test timeouts and reformatted for consistency |
| src/test/scala/dpla/api/v2/endToEnd/*.scala | Added route test timeouts and reformatted for consistency |
| src/test/scala/dpla/api/helpers/FileReader.scala | Improved resource handling with try-finally and explicit UTF-8 encoding |
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
196-274: Add defensive error handling to basic field query branch for consistency

The basic field query branch (else block, lines 268-274) silently falls back to `keywordQueryFields` when `getElasticSearchField()` returns `None`, whereas the `.before`/`.after` and exact-match branches throw a `RuntimeException` for unrecognized field names. Although upstream validation in `ParamValidator.scala` ensures that only valid, searchable field names reach `singleFieldQuery()` (via `searchableDataFields.flatMap(getValidFieldQuery(...))`), the inconsistent error handling makes the code less defensive and harder to debug if upstream validation changes or if this method is called directly in the future.

For consistency and defensive programming, adopt the same pattern as other branches:

```diff
- } else {
-   // Basic field query
-   val fields: Seq[String] =
-     Seq(getElasticSearchField(fieldQuery.fieldName)).flatten
-   val obj: JsObject = keywordQuery(fieldQuery.value, fields)
-   Seq(obj)
- }
+ } else {
+   // Basic field query
+   val field: String = getElasticSearchField(fieldQuery.fieldName)
+     .getOrElse(
+       throw new RuntimeException(
+         "Unrecognized field name: " + fieldQuery.fieldName
+       )
+     )
+   val obj: JsObject = keywordQuery(fieldQuery.value, Seq(field))
+   Seq(obj)
+ }
```

This aligns all branches to fail fast on invalid field names rather than silently changing query semantics.
🧹 Nitpick comments (2)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
180-195: Avoid unsafe cast in `filterQuery`

`filterQuery` currently relies on `.toJson.asInstanceOf[JsArray]` for the sequence of term filters, which is brittle and ties correctness to the implementation detail that `Seq` always serializes to `JsArray`.

You can build the `JsArray` directly and drop the cast:

```diff
- private def filterQuery(filters: Seq[Filter]): JsArray =
-   filters
-     .map(filter => {
-       JsObject(
-         "term" -> JsObject(
-           filter.fieldName -> filter.value.toJson
-         )
-       )
-     })
-     .toJson
-     .asInstanceOf[JsArray]
+ private def filterQuery(filters: Seq[Filter]): JsArray =
+   JsArray(
+     filters.map { filter =>
+       JsObject(
+         "term" -> JsObject(
+           filter.fieldName -> filter.value.toJson
+         )
+       )
+     }: _*
+   )
```

This is safer and keeps the return type explicit.
276-283: Minor naming nit: `stripLeadingAndTrainingQuotationMarks`

The method name still says `Training` while the doc comment (lines 276-278) correctly says "leading and trailing quotation marks." Consider renaming the method to `stripLeadingAndTrailingQuotationMarks` for clarity the next time you touch this area.
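As a reference point for the rename, the helper's documented behavior amounts to something like the following sketch. This is hypothetical illustration code, not the PR's implementation:

```scala
// Strips one pair of surrounding double quotes, and only when both a
// leading and a trailing quote are present; otherwise returns the input.
def stripLeadingAndTrailingQuotationMarks(s: String): String =
  if (s.length >= 2 && s.startsWith("\"") && s.endsWith("\""))
    s.substring(1, s.length - 1)
  else
    s
```

The length guard ensures a lone quote character is not treated as both leading and trailing, and strings with only one of the two quotes pass through unchanged.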
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Prepare
🔇 Additional comments (6)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (6)
16-43: Message routing in `apply` is clear and side-effect usage is confined

The actor behavior cleanly delegates to the next phase for search, fetch (single vs multi), and random queries, and uses `Behaviors.same`/`Behaviors.unhandled` appropriately. No issues spotted here.
67-93: Random query composition with optional filters looks correct

The new `composeRandomQuery` implementation correctly wraps optional filters in a `bool.filter` and nests that under `function_score`, while keeping the unfiltered case minimal. The use of `size = 1` matches the intent of a single random hit. No functional problems here.
96-111: Bool query assembly and top-level search structure are sound

The refactored `composeSearchQuery`/`query` pair correctly handle:

- Pagination (`from`/`size`),
- Combining keyword and field queries into a bool `must`/`should` depending on `op`,
- Attaching filters as a separate `filter` clause,
- Falling back to `match_all` when both terms and filters are absent,
- Adding `track_total_hits = 10000` to cap expensive hit tracking.

The structure matches typical ES expectations and looks consistent with the rest of the builder.
Also applies to: 128-161
113-123: Keyword multi-match configuration and curated field list look reasonable

The reduced `keywordQueryFields` set with boosts, combined with `multi_match` using `best_fields`, `operator = "AND"`, and `lenient = true`, forms a coherent default keyword query strategy. The `targetFields` fallback ensures calls without explicit fields still use this curated set.

Also applies to: 163-178
285-381: Facet (aggs) composition for geo and date fields looks consistent

The `aggs` builder correctly distinguishes:

- Spatial facets using `geo_distance` buckets with a generated `ranges` array (lines 302-307),
- Date facets using a filtered `date_histogram` with month/year granularity and corresponding formats and `gte` windows (lines 327-340),
- Regular term facets via `terms` on exact-match fields.

The control flow and error handling for unknown facet names are consistent with the rest of the file. No issues stood out here.
383-447: Sort and field retrieval defaults are reasonable

- The `sort` function cleanly handles geo sort, field-based sort, and falls back to `_doc` when there is no query/sort, which is appropriate for "no-order" browsing.
- `fieldRetrieval` returning `Seq("*").toJson` when no fields are specified gives a sensible default of retrieving the full source.

No functional concerns in these sections.
Also applies to: 449-453
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
62-75: Consider verifying exact field mappings.

The test only checks that the count of mapped fields matches the count of keyword fields. If one field fails to map and is silently dropped, the test wouldn't catch it. Consider using `contain allElementsOf` to verify that all expected fields are present.

Apply this diff to make the test more precise:

```diff
   "have searchable keywordQuery fields defined in ES mapping" in {
     val keywordFields = Seq(
       "sourceResource.title",
       "sourceResource.subject.name",
       "sourceResource.description",
       "sourceResource.creator",
       "sourceResource.collection.title",
       "dataProvider.name",
       "provider.name"
     )
     val mapped = keywordFields.flatMap(tester.getElasticSearchField)
-    mapped.size shouldBe keywordFields.size
+    mapped should have size keywordFields.size
+    // Optionally verify all fields mapped successfully
+    keywordFields.foreach { field =>
+      tester.getElasticSearchField(field) should not be empty
+    }
   }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1 hunks)
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (23 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (2)
- getElasticSearchField (64-67)
- getElasticSearchExactMatchField (75-81)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (5)
- search (19-33), search (35-37), search (39-41), search (43-46), search (48-51)

src/main/scala/dpla/api/v2/search/mappings/JsonFieldReader.scala (6)
- readObject (16-29), readString (34-37), readObjectArray (31-34), readInt (40-43), readStringArray (37-40), readBoolean (43-49)

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
- query (127-160), filterQuery (183-205)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (11)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
77-91: LGTM!

The test correctly verifies that exact match fields use not_analyzed subfields. The use of `foreach` ensures each field is tested individually, which provides clear failure messages.

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (7)
113-122: LGTM!

The reduced set of keyword query fields with appropriate boost weights aligns well with cultural heritage search patterns. Prioritizing title (`^2`) over other fields is a sensible default.
162-177: LGTM!

The migration from `query_string` to `multi_match` simplifies the keyword query behavior and makes it more user-friendly by removing advanced Lucene query syntax. The `lenient` flag appropriately handles field type mismatches.
183-193: LGTM!

Changing the return type to `JsArray` correctly reflects Elasticsearch's bool filter syntax, where filters are an array of query clauses.
195-204: LGTM!

The updated comment correctly describes the multi_match query behavior and clearly distinguishes it from the exact-match "term" query.
205-275: LGTM!

The improved error messages and the fix for the function name typo (`stripLeadingAndTrailingQuotationMarks`) enhance code clarity. The logic for handling range queries, exact matches, and basic field queries is correct.
277-284: LGTM!

The function name typo has been corrected (`stripLeadingAndTrailingQuotationMarks`), and the comment accurately describes the behavior.
110-110: Pagination depth is already limited by existing parameter validation.

The `ParamValidator` enforces `maxPage = 10` and `maxPageSize = 100`, which limits the maximum offset to 900 and total accessible results to approximately 1,000. This is more restrictive than the `track_total_hits = 10000` setting, so users cannot request results beyond what is already enforced by the parameter validation layer. No additional validation is needed.

src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
273-277: LGTM!

The test correctly updated to expect `track_total_hits` as an integer value of 10000, matching the QueryBuilder implementation change.
287-320: LGTM!

The tests correctly updated to read `multi_match` query structures instead of `query_string`, and the operator field name change from `default_operator` to `operator` matches Elasticsearch's multi_match syntax.
404-509: LGTM!

Excellent comprehensive test coverage for the quote stripping behavior. The tests cover all relevant edge cases: properly quoted strings, strings with internal quotes, strings with only leading or trailing quotes, empty quoted strings, and unquoted strings.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (3)
125-133: Pagination limits and depth guard interplay

The combination of `maxPage = 10`, `maxPageSize = 100`, and the `fromValue + requestedPageSize > 1000` guard means that, with current constants, no valid combination can actually exceed the 1000 window; the guard is effectively a future-proof safety net.

To avoid drift if either the ES `index.max_result_window` or these limits change later, consider either:

- deriving the `1000` constant from shared configuration, or
- adding a short comment near `maxPage`/`maxPageSize` explaining that they're chosen so `from + size <= 1000`, and that the guard is an extra safety check.

Also applies to: 238-247
152-158: Fetch ID count capped by `maxPageSize` may deserve its own constant

Reusing `maxPageSize` to limit the number of `id`s in a fetch (`ids.size > maxPageSize`) is functionally fine, but it couples two concerns: search page size and fetch batch size. If you ever want to change one without the other, this will be surprising.

Consider introducing a dedicated constant (e.g., `maxFetchIds`) or at least a short comment explaining that the fetch ID limit is intentionally tied to `maxPageSize`.

Also applies to: 167-170
490-499: `validIntWithRange` error message hard-codes lower bound 0

`validIntWithRange` computes `(min, max)` per param (e.g., `page` uses `(minPage, maxPage)` where `minPage = 1`), but the `rule` string always says:

```scala
val rule = s"$param must be an integer between 0 and $max"
```

So for `page`, an input of `0` or a negative value will yield an error message that mentions `0` as the lower bound, even though the actual check is `int < min`.

Consider interpolating `min` instead of `0`:

```diff
- val rule = s"$param must be an integer between 0 and $max"
+ val rule = s"$param must be an integer between $min and $max"
```

This keeps behavior the same while making validation errors clearer to API consumers.
Also applies to: 502-507
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (5)
- TextField (7-7), IntField (9-9), URLField (11-11), DateField (13-13), WildcardField (17-17)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (2)
297-304: Facet tests correctly enforce q/filter requirement; consider adding filter-only coverage

The facet tests now (a) require `facets` to be paired with a primary search param (via `q`) and (b) explicitly reject `facets` with no query or filter, which matches the tightened validation behavior. To fully exercise the new rule, it would be useful to add a case where `facets` is combined with `filter` but no `q`, to lock in that this combination remains accepted.

Example sketch, placed alongside the existing facet tests:

      "facet validator" should {
    @@
        "accept valid param" in {
    @@
        }
    @@
    +   "accept facets with filter but no query" in {
    +     val given = "provider.@id"
    +     val filter = "provider.@id:http://dp.la/api/contributor/lc"
    +     val params = Map("facets" -> given, "filter" -> filter)
    +     ebookParamValidator ! RawSearchParams(params, replyProbe.ref)
    +     val msg = interProbe.expectMessageType[ValidSearchParams]
    +     msg.params.facets shouldEqual Some(Seq("provider.@id"))
    +   }
        "reject unfacetable field" in {

Also applies to: 313-320, 322-326
353-358: Pagination guard tests align with new behavior; rename specs for clarity

The facet-size and pagination tests now correctly treat oversize values as invalid, and the new "reject deep pagination beyond 1000 results" case exercises the depth guard nicely. The only mismatch is that the spec descriptions at lines 579 and 617 still say "default to max if param is too large" even though the expectations now assert invalid input; renaming them would keep the tests self-documenting and consistent with the facet-size test.

Suggested rename:

    - "default to max if param is too large" in {
    + "reject param if too large" in {
    @@
    - "default to max if param is too large" in {
    + "reject param if too large" in {

Also applies to: 556-563, 579-584, 586-590, 617-622
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala(27 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (1)
src/main/scala/dpla/api/v2/search/SearchProtocol.scala (1)
Random(24-27)
🔇 Additional comments (3)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (3)
13-16: Readability-only reflow looks goodRewrapping the class header, long strings, and map literals over multiple lines improves readability without changing behavior; no issues here.
Also applies to: 52-53, 74-79, 434-438, 768-768, 823-824, 834-835, 850-851, 859-860, 869-870
123-124: Consistent field-value extraction across validatorsThese assertions now all use the same pattern to pull out the stored value for each field, which keeps the tests uniform and easier to scan; behavior is unchanged and looks correct.
Also applies to: 135-136, 147-148, 177-178, 214-215, 244-245, 403-404, 443-444, 473-474, 510-511, 642-643, 761-762, 798-799
875-887: Filter validator assertions are precise and robustThese filter tests now assert both the parsed field name and value for the first clause, which tightens the contract without over-coupling to internal representation. This is a clear improvement with no downsides.
Also applies to: 889-900
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (1)
167-170: Decouple fetch ID limits from search page-size limits.

Lines 167-170 now inherit the tighter `maxPageSize` bound, so reducing search `page_size` from 500 to 100 also reduces the fetch batch limit to 100. That couples two separate API contracts and makes future pagination changes a silent fetch breaking change. A dedicated fetch-specific bound would be safer.

Suggested refactor:

    + protected val maxFetchIds: Int = 100
    @@
    - if (ids.size > maxPageSize)
    + if (ids.size > maxFetchIds)
        throw ValidationException(
    -     s"The number of ids cannot exceed $maxPageSize"
    +     s"The number of ids cannot exceed $maxFetchIds"
        )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala` around lines 167 - 170, The validator currently enforces fetch ID list size using the search pagination constant maxPageSize (the block checking ids.size > maxPageSize and throwing ValidationException), which couples fetch-batch limits to page_size; introduce a separate constant (e.g., maxFetchIds or fetchIdLimit) and replace the ids.size check to compare against that new fetch-specific bound, update any related error message to reference the fetch limit, and ensure the new constant is used only for the fetch-ID validation so pagination changes do not affect fetch behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 200-202: The current retrieval using getValid(rawParams, "facets",
validFields) can produce Some(Seq()) when validFields filtered out
ignoredFields; update the logic so empty facet sequences are normalized to None
before further checks or passed into SearchParams. Specifically, replace the
bare use of facets with a normalized value (e.g., map(_.filter(...)) then
.filter(_.nonEmpty) or pattern-match to convert Some(Nil) to None) so downstream
logic (the facets nonEmpty check and the SearchParams construction) only treats
truly non-empty facet lists as facet requests; reference getValid, rawParams,
validFields, facets, ignoredFields, and SearchParams when making the change.
- Around line 489-503: The error message in validIntWithRange currently
hardcodes the lower bound as 0 which is wrong for params like "page" that use
minPage = 1; update the rule string to use the computed min and max variables
(e.g., s"$param must be an integer between $min and $max") so the
ValidationException thrown by validIntWithRange reflects the real allowed range
for params such as facet_size (minFacetSize/maxFacetSize), page
(minPage/maxPage) and page_size (minPageSize/maxPageSize).
---
Nitpick comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 167-170: The validator currently enforces fetch ID list size using
the search pagination constant maxPageSize (the block checking ids.size >
maxPageSize and throwing ValidationException), which couples fetch-batch limits
to page_size; introduce a separate constant (e.g., maxFetchIds or fetchIdLimit)
and replace the ids.size check to compare against that new fetch-specific bound,
update any related error message to reference the fetch limit, and ensure the
new constant is used only for the fetch-ID validation so pagination changes do
not affect fetch behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6592d3da-e02b-46b4-8fbf-30ffae5c91f5
📒 Files selected for processing (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
@coderabbitai full review

✅ Actions performed
Full review triggered.
570dc53 to f5f699c (Compare)
@coderabbitai full review

✅ Actions performed
Full review triggered.
@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1)
25-79: Unify concurrency limiting to a single implementation.

This utility duplicates the behavior already present in src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala (lines 95-115). Keeping both paths increases drift risk (timeouts, exception type/message, metrics semantics). Consider routing ES calls through `ConcurrencyLimiter` so there is one source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala` around lines 25 - 79, The ConcurrencyLimiter implementation duplicates semaphore/timeout behavior in ElasticSearchClient; refactor ElasticSearchClient to delegate to ConcurrencyLimiter instead of keeping its own semaphore logic: remove the bespoke tryAcquire/release and timeout handling in ElasticSearchClient and call ConcurrencyLimiter.apply[T](...) when running ES futures (inject or construct a ConcurrencyLimiter instance in ElasticSearchClient), preserve timeoutSeconds and maxConcurrent semantics and the ConcurrencyLimitExceeded failure, and use ConcurrencyLimiter.availablePermits for any monitoring/metrics formerly reading the ES semaphore so there is a single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala`:
- Around line 55-71: The apply method currently calls
semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS) and catches all
Throwables around evaluating f; change it so tryAcquire is wrapped in a
try/catch that catches InterruptedException, restores the thread interrupt
status (Thread.currentThread().interrupt()) and returns Future.failed with an
appropriate exception (e.g., ConcurrencyLimitExceeded or a new
InterruptedException-wrapped failure) instead of letting it escape
synchronously, and replace the broad catch { case e: Throwable => ... } around
evaluating f with a catch for scala.util.control.NonFatal to avoid swallowing
fatal errors; keep the existing semaphore.release() logic in both the failure
and successful future branches (using future.andThen { case _ =>
semaphore.release() }(ec)) so the semaphore is always released.
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 167-170: The check inside ParamValidator (currently using
maxPageSize) should use a new constant for fetch batching so fetch size isn't
coupled to search pagination; add a new constant (e.g., maxFetchBatchSize)
alongside maxPageSize, update the ID validation in getFetchIds (the block that
throws ValidationException when ids.size > maxPageSize) to compare against
maxFetchBatchSize, and keep the ValidationException message updated to reference
the fetch limit; ensure other references to maxPageSize remain unchanged so only
getFetchIds uses the new constant.
- Around line 345-355: The validator currently accepts filters lacking a ':'
because it uses headOption/lastOption on parts from split(":", 2); change the
logic in ParamValidator.scala so you explicitly require exactly two parts (e.g.,
check parts.length == 2 or pattern-match parts.toList as field :: rest :: Nil)
before assigning fieldName and values, and throw ValidationException(s"$filter
is not a valid filter") if the separator is missing or the second part is empty;
keep the existing trimming/splitting on "AND" for values and retain the final
empty-values check.
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala`:
- Around line 135-137: The bug: the outer bool uses op ("AND"/"OR") but keyword
and basic field queries always build a multi_match with "operator": "AND", so
op="OR" does nothing. Fix by propagating the op into the multi_match/operator
parameter when building keyword and basic field queries: update the
calls/definitions that produce the multi_match (e.g., keywordQuery,
keywordQueryFields, filterQuery and the functions that build basic field queries
around q and single-field queries) so they accept the op (or compute operator =
if (op == "OR") "or" else "and") and set "operator" accordingly; apply the same
change in the other occurrences noted (blocks around lines 165-174, 205-208,
269-274) so multi_match operator reflects op in all call sites.
In `@src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala`:
- Around line 586-590: The test never exercises the depth-check because
validIntWithRange (bounded by maxPage) rejects page=50 first; update the test to
only fail the combined-depth guard by either constructing a test-specific
ebookParamValidator configured with larger maxPage/maxPageSize or choose params
that respect validIntWithRange but violate the from+size check (e.g., set page
to the current maxPage value used by validIntWithRange and set page_size high
enough so (page-1)*page_size + page_size > 1000), and then send RawSearchParams
to ebookParamValidator and assert
replyProbe.expectMessageType[InvalidSearchParams].
---
Nitpick comments:
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala`:
- Around line 25-79: The ConcurrencyLimiter implementation duplicates
semaphore/timeout behavior in ElasticSearchClient; refactor ElasticSearchClient
to delegate to ConcurrencyLimiter instead of keeping its own semaphore logic:
remove the bespoke tryAcquire/release and timeout handling in
ElasticSearchClient and call ConcurrencyLimiter.apply[T](...) when running ES
futures (inject or construct a ConcurrencyLimiter instance in
ElasticSearchClient), preserve timeoutSeconds and maxConcurrent semantics and
the ConcurrencyLimitExceeded failure, and use
ConcurrencyLimiter.availablePermits for any monitoring/metrics formerly reading
the ES semaphore so there is a single source of truth.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 53b6a041-f18a-4462-8634-ef45595d77a3
📒 Files selected for processing (14)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
@coderabbitai resume

88c693c to 689fd58 (Compare)

@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
82-119: Add an explicit lazy-evaluation assertion for the 3rd request.

This test proves timeout behavior, but it doesn't assert the call-by-name contract (`f` is not evaluated when a permit isn't acquired). Consider passing an expression with a side effect and asserting the side effect did not occur when the limiter rejects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala` around lines 82 - 119, The test should also assert that the limiter uses call-by-name and does not evaluate the passed expression when a permit is not acquired: modify the third invocation of ConcurrencyLimiter.apply (the call written as limiter(promises(2).future) / limiter(...)) to pass an expression with a visible side effect (for example incrementing a mutable counter or setting a flag) and after the failed future assertion verify that the side effect did NOT occur; keep using the existing assertions on the thrown ConcurrencyLimitExceeded (maxConcurrent and timeoutSeconds) and the same promises for the first two futures so you only add the side-effect check around the third request.src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (1)
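For reference, the call-by-name contract under discussion can be illustrated with a minimal limiter sketch (an assumed shape; the real ConcurrencyLimiter in this PR may differ in names and error types):

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Sketch only: class name and error type are assumptions.
class LimiterSketch(maxConcurrent: Int, timeoutSeconds: Long)(implicit ec: ExecutionContext) {
  private val semaphore = new Semaphore(maxConcurrent)

  // f is call-by-name: it is never evaluated unless a permit is acquired.
  def apply[T](f: => Future[T]): Future[T] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(new RuntimeException(s"Concurrency limit $maxConcurrent exceeded"))
    else
      try f.andThen { case _ => semaphore.release() }
      catch { case NonFatal(e) => semaphore.release(); Future.failed(e) }
}
```

With `maxConcurrent = 1` and a held permit, a second call with a side-effecting `f` should fail without the side effect firing, which is exactly the assertion the comment above asks for.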
33-33: Consider centralizing route test timeout configuration.
RouteTestTimeout(3.seconds)is now repeated across multiple suites. Moving this into a shared test trait/object would reduce duplication and keep timeout tuning consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala` at line 33, Multiple test suites duplicate the implicit val routeTestTimeout: RouteTestTimeout = RouteTestTimeout(3.seconds); extract this into a single shared test trait or object (e.g., TestTimeouts or RouteTestSupport) that defines the implicit val routeTestTimeout: RouteTestTimeout = RouteTestTimeout(3.seconds), then have HeaderAuthorizationTest and other suites mix in or extend that trait/object and remove the local implicit definitions so all tests use the centralized RouteTestTimeout configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 240-247: The new validation unconditionally rejects any facet-only
request (using hasFacets/hasQuery/hasFilter and throwing ValidationException),
which breaks existing facet-only usage; modify the guard in ParamValidator
(around hasFacets/hasQuery/hasFilter) so it only rejects unsupported/unbounded
facet types instead of all facet-only requests — implement a helper (e.g.,
isUnboundedFacet or facetRequiresQuery) that inspects each facet in facets and
change the condition to throw only when facets.exists(isUnboundedFacet) &&
!hasQuery && !hasFilter; keep supported facet-only cases (including the
coordinates/other tested facets) passing as before.
---
Nitpick comments:
In
`@src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala`:
- Line 33: Multiple test suites duplicate the implicit val routeTestTimeout:
RouteTestTimeout = RouteTestTimeout(3.seconds); extract this into a single
shared test trait or object (e.g., TestTimeouts or RouteTestSupport) that
defines the implicit val routeTestTimeout: RouteTestTimeout =
RouteTestTimeout(3.seconds), then have HeaderAuthorizationTest and other suites
mix in or extend that trait/object and remove the local implicit definitions so
all tests use the centralized RouteTestTimeout configuration.
In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala`:
- Around line 82-119: The test should also assert that the limiter uses
call-by-name and does not evaluate the passed expression when a permit is not
acquired: modify the third invocation of ConcurrencyLimiter.apply (the call
written as limiter(promises(2).future) / limiter(...)) to pass an expression
with a visible side effect (for example incrementing a mutable counter or
setting a flag) and after the failed future assertion verify that the side
effect did NOT occur; keep using the existing assertions on the thrown
ConcurrencyLimitExceeded (maxConcurrent and timeoutSeconds) and the same
promises for the first two futures so you only add the side-effect check around
the third request.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 695eef19-d0a8-4303-a162-4e3a008e77a4
📒 Files selected for processing (14)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
✅ Files skipped from review due to trivial changes (2)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
🚧 Files skipped from review as they are similar to previous changes (8)
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala
- src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala
|
@coderabbitai review

✅ Actions performed
Review triggered.

This branch has merge conflicts with
…cala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…or.scala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ing (#94) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: moltude <1419766+moltude@users.noreply.github.com>
filter=type: (no value after colon) produced Some(Seq.empty) because getValid silently drops empty strings. hasFilter.isDefined was then true with no filters applied, allowing high-cardinality facet requests past the facet guard. Split ":" once into parts, and use .collect to trim+filter in one pass. Empty values now throw ValidationException before reaching the guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bounds - facets and fields: add .filter(_.nonEmpty) so Some(Seq.empty) from validFields (which drops ignoredFields entries) normalizes to None, preventing hasFacets/fields.isDefined from being misleadingly true - validIntWithRange: make rule a lazy val so the error string is only allocated on failure paths, not on every successful validation - validIntWithRange: use $min instead of hardcoded 0 in error message Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r for fields - facets: revert .filter(_.nonEmpty); Some(Seq.empty) is the intended return when all requested facet fields are dropped (e.g. valid DPLA Map fields not facetable for items). Fix the guard instead: facets.nonEmpty checks the Option (always true for Some), but facets.exists(_.nonEmpty) correctly checks the inner seq. - fields: keep .filter(_.nonEmpty) — fieldRetrieval(Some(Nil)) sends "_source":[] to ES which suppresses all source fields; None returns ["*"] (all fields) which is the correct fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ter, op propagation
- ConcurrencyLimiter: handle InterruptedException from tryAcquire (restore
interrupt status, return Future.failed); narrow Throwable catch to NonFatal
to avoid swallowing fatal JVM errors
- ParamValidator: add maxFetchBatchSize=500 constant so fetch ID batch limit
is decoupled from search page_size (which was reduced to 100 in this PR)
- ParamValidator: reject filter values without a field:value separator —
filter=provider.name now throws ValidationException instead of being
silently parsed as Filter("provider.name", "provider.name")
- QueryBuilder: propagate op parameter into multi_match operator in
keywordQuery and singleFieldQuery — op="OR" was previously a no-op for
plain keyword and basic field queries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
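The filter parsing change described in this commit can be sketched as follows (`Filter` and `ValidationException` are stand-ins for the real types in ParamValidator):

```scala
final case class Filter(fieldName: String, value: String)
final case class ValidationException(msg: String) extends RuntimeException(msg)

// Split once on ":" and require both a field and a non-empty value;
// "filter=type:" and "filter=provider.name" both fail fast.
def parseFilter(filter: String): Seq[Filter] =
  filter.split(":", 2) match {
    case Array(field, rest) if field.nonEmpty && rest.trim.nonEmpty =>
      val values = rest.split("AND").toSeq.map(_.trim).filter(_.nonEmpty)
      if (values.isEmpty) throw ValidationException(s"$filter is not a valid filter")
      values.map(Filter(field, _))
    case _ =>
      throw ValidationException(s"$filter is not a valid filter")
  }
```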
- Revert track_total_hits from 10000 to true: the 10000 cap causes ES to return relation="gte" with value=10000 for large result sets, but the parsers in DPLAMAPJsonFormats and PssJsonFormats only read hits.total.value without checking relation, silently surfacing 10000 as an exact count - Replace asInstanceOf[JsArray] in filterQuery with JsArray(...toVector) to avoid the unsafe runtime cast - Differentiate sort_by_pin error messages: "sort_by_pin is only allowed when sort_by is the coordinates field" vs "The sort_by parameter is required." so callers know whether sort_by is missing or just wrong Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test - Remove the from+size>1000 guard in ParamValidator: with maxPage=10 and maxPageSize=100, the maximum reachable value is exactly 1000 and the strict >1000 check can never fire; it was a no-op safety net from when the limits were higher - Add test verifying that op="OR" propagates to multi_match.operator field, not just to the outer bool clause Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
requestedPage, requestedPageSize, and fromValue were left behind after removing the pagination depth guard — dead code with no references. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…int compatibility The guard rejecting facets without a query/filter was applied to all validators, breaking three integration tests that exercise facet-only requests on /v2/items (coordinates, dataProvider with size limit, basic dataProvider). Item facet-only requests are a documented and supported part of the public API contract. Introduces requiresQueryForFacets (default false) as an overridable protected val in ParamValidator. EbookParamValidator overrides it to true, keeping the restriction exactly where the unit test already specified it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
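The override pattern this commit describes can be sketched as below (names follow the commit message; the real trait is larger):

```scala
trait FacetGuard {
  // Default: facet-only requests are allowed (items endpoint).
  protected val requiresQueryForFacets: Boolean = false

  def facetsAllowed(hasFacets: Boolean, hasQuery: Boolean, hasFilter: Boolean): Boolean =
    !(requiresQueryForFacets && hasFacets && !hasQuery && !hasFilter)
}

object ItemGuard extends FacetGuard
object EbookGuard extends FacetGuard {
  override protected val requiresQueryForFacets: Boolean = true
}
```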
8cb077c to 815dc55 (Compare)

@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala`:
- Around line 162-177: The keywordQuery currently silently falls back to
keywordQueryFields when fields is empty; change this to fail fast by enforcing
require(fields.nonEmpty, "keywordQuery requires at least one target field")
inside keywordQuery, and update call sites that pass
Seq(getElasticSearchField(fieldQuery.fieldName)).flatten to check for an empty
sequence and throw a clear exception (matching the range-query/facets pattern)
when a field has no ES mapping; ensure callers either pass an explicit fallback
(keywordQueryFields) or propagate the error so missing field mappings are not
silently broadened.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3547985f-528d-471d-9050-bbe443b4b30e
📒 Files selected for processing (10)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
✅ Files skipped from review due to trivial changes (2)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala
🚧 Files skipped from review as they are similar to previous changes (5)
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
      /** A general keyword query on the given fields. Uses multi_match for
       * best_fields with lenient parsing.
       */
      private def keywordQuery(q: String, fields: Seq[String], op: String = "AND"): JsObject = {
        val targetFields = if (fields.nonEmpty) fields else keywordQueryFields

        JsObject(
    -     "query_string" -> JsObject(
    -       "fields" -> fields.toJson,
    +     "multi_match" -> JsObject(
            "query" -> q.toJson,
            "analyze_wildcard" -> true.toJson,
    -       "default_operator" -> "AND".toJson,
    +       "fields" -> targetFields.toJson,
    +       "type" -> "best_fields".toJson,
    +       "operator" -> op.toJson,
            "lenient" -> true.toJson
          )
        )
      }
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether any field in searchableDataFields can return None from getElasticSearchField.
ast-grep --pattern $'def getElasticSearchField($_) = $$$'
rg -nP --type=scala -C3 '\bsearchableDataFields\b|\bgetElasticSearchField\b'

Repository: dpla/api
Length of output: 10480
Silent fallback to default keyword fields when field mapping is missing.
At line 166, keywordQuery falls back to keywordQueryFields when fields is empty. This silent fallback is inconsistent with how range queries (lines 212–215, 230–233) and facets (line 324–326) handle missing field mappings—they explicitly throw.
In the basic field-query path (line 272), Seq(getElasticSearchField(fieldQuery.fieldName)).flatten produces an empty Seq if the field has no ES mapping. When passed to keywordQuery, this empty Seq triggers the fallback, silently broadening a user's field-scoped query into a general keyword search. While ParamValidator gates on searchableDataFields so this shouldn't occur in practice, the silent fallback masks potential definition mismatches rather than failing fast.
Either add require(fields.nonEmpty, "keywordQuery requires at least one target field") and update call sites to check before calling (or explicitly pass keywordQueryFields for fallback cases), or restructure line 272–273 to throw when the field has no mapping, matching the range-query pattern.
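A fail-fast sketch of the first suggestion, using plain Maps in place of spray-json objects (illustrative only; the real builder returns a JsObject):

```scala
// Illustrative only: real code uses spray-json JsObject values.
def keywordQuerySketch(q: String, fields: Seq[String], op: String = "AND"): Map[String, Any] = {
  require(fields.nonEmpty, "keywordQuery requires at least one target field")
  Map(
    "multi_match" -> Map(
      "query" -> q,
      "fields" -> fields,
      "operator" -> op,
      "lenient" -> true
    )
  )
}
```

Callers resolving an unmapped field to an empty `Seq` would then fail loudly instead of silently broadening the query.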
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` around
lines 162 - 177, The keywordQuery currently silently falls back to
keywordQueryFields when fields is empty; change this to fail fast by enforcing
require(fields.nonEmpty, "keywordQuery requires at least one target field")
inside keywordQuery, and update call sites that pass
Seq(getElasticSearchField(fieldQuery.fieldName)).flatten to check for an empty
sequence and throw a clear exception (matching the range-query/facets pattern)
when a field has no ES mapping; ensure callers either pass an explicit fallback
(keywordQueryFields) or propagate the error so missing field mappings are not
silently broadened.
|
@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
255-267: ⚠️ Potential issue | 🟡 Minor
Splitting on bare "AND"/"OR" can fragment values containing those substrings.
.split("AND") / .split("OR") operate on the raw (post-strip) value with no word boundary. Values like "BRAND NEW", "WORLD HISTORY", "LANDSCAPES", or "LABOR UNIONS" will be split on the embedded AND/OR, producing bogus term subqueries (e.g., "BR" + " NEW"), which silently return zero/irrelevant matches on the .not_analyzed subfield.
This is pre-existing behavior, but given this PR expands the exact-match test surface (quotes, empty, leading/trailing) it'd be a good moment to either:
- split on word-boundaried tokens (e.g., split("\\s+AND\\s+") then split("\\s+OR\\s+")), or
- require the quoted-form contract ("\"X\" AND \"Y\"") and split only on the quote-delimited boundary.
At minimum, consider adding a regression test covering a single value containing AND/OR as a substring to lock in the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` around lines 255 - 267, The current splitting in QueryBuilder.scala (in the block that builds `values` from `fieldQuery.value`) naively uses .split("AND")/.split("OR") and therefore breaks tokens that contain those letter sequences; change the splitting to use word-boundary-aware regexes (e.g., split on "\\s+AND\\s+" and "\\s+OR\\s+" or use "\\bAND\\b"/"\\bOR\\b") or alternately only split when the input matches the quoted-pair contract (quote-delimited tokens), then trim and re-strip quotes as before; update the `values` construction logic in the same function/method that creates the JsObject `term` entries to use the new splitting approach and add a regression test that asserts a single value containing "AND" or "OR" (e.g., "LANDSCAPES", "BRAND NEW") is not split into multiple terms.
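A word-boundary-aware split along the lines of the first option might look like this. It is a sketch, not the actual QueryBuilder code: quote-stripping and trimming are simplified, and only the splitting behavior is demonstrated.

```scala
object BooleanSplitSketch {

  // Split only on AND/OR surrounded by whitespace, so the operator must
  // stand alone as a token. "\\s+AND\\s+" will not match inside values
  // like LANDSCAPES or BRAND NEW, because the embedded letters are not
  // whitespace-delimited.
  def splitBoolean(value: String): Seq[String] =
    value
      .split("\\s+AND\\s+|\\s+OR\\s+")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toSeq

  def main(args: Array[String]): Unit = {
    println(splitBoolean("\"X\" AND \"Y\"")) // splits into two terms
    println(splitBoolean("BRAND NEW"))       // stays a single term
    println(splitBoolean("LANDSCAPES"))      // stays a single term
  }
}
```

A regression test asserting that splitBoolean("BRAND NEW") yields exactly one element would lock in this behavior against future refactors.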
🧹 Nitpick comments (2)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
165-165: Minor: default op = "AND" is dead on both signatures.
Both callers (query at line 136 and singleFieldQuery at line 273) now pass op explicitly, so the default values on keywordQuery and singleFieldQuery are never exercised. If the intent is "callers must thread op," drop the defaults so a future caller can't silently get "AND" when the request was op=OR.
Proposed diff
- private def keywordQuery(q: String, fields: Seq[String], op: String = "AND"): JsObject = {
+ private def keywordQuery(q: String, fields: Seq[String], op: String): JsObject = {
@@
- private def singleFieldQuery(
-   fieldQuery: FieldQuery,
-   exactFieldMatch: Boolean,
-   op: String = "AND"
- ): Seq[JsObject] =
+ private def singleFieldQuery(
+   fieldQuery: FieldQuery,
+   exactFieldMatch: Boolean,
+   op: String
+ ): Seq[JsObject] =

Also applies to: 204-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` at line 165, Remove the dead default "AND" for the op parameter so callers must supply op explicitly: update the method signatures for keywordQuery (private def keywordQuery(q: String, fields: Seq[String], op: String = "AND")) and both singleFieldQuery overloads to remove the default op="AND" (i.e., make op: String a required parameter), and adjust any internal references if needed; this prevents silently using "AND" when callers explicitly pass OR and matches existing callsites (query and singleFieldQuery) that already provide op.src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
98-106: Assert rejected work is not evaluated.
This test proves the third request times out, but promises(2).future is harmless if evaluated early. Add a side-effect assertion so the call-by-name contract is covered: when no permit is acquired, f must not be constructed.
Suggested test hardening
- val future3 = limiter(promises(2).future)
+ var thirdRequestEvaluated = false
+ val future3 = limiter {
+   thirdRequestEvaluated = true
+   promises(2).future
+ }
  whenReady(future3.failed) { ex =>
    ex shouldBe a[ConcurrencyLimitExceeded]
    val cle = ex.asInstanceOf[ConcurrencyLimitExceeded]
    cle.maxConcurrent shouldBe 2
    cle.timeoutSeconds shouldBe 1
  }
+
+ thirdRequestEvaluated shouldBe false
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala` around lines 98 - 106, The test currently passes promises(2).future which can be constructed eagerly; change the invocation to pass a by-name thunk that has an observable side effect (e.g. incrementing a var, setting an AtomicBoolean, or throwing if evaluated) so you can assert the thunk was not executed when the limiter rejects; update the third request call to use that thunk (referencing limiter and the third promise/promise index previously used, e.g. promises(2)) and after asserting the failure (ConcurrencyLimitExceeded) assert the side-effect flag remains unchanged to prove the work was not evaluated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0c9fac11-4b19-45c2-9053-b6ac4dea4b36
📒 Files selected for processing (10)
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
class ConcurrencyLimiter(
  val maxConcurrent: Int,
  val timeoutSeconds: Long
) {
  require(
    maxConcurrent > 0,
    s"maxConcurrent must be positive, got: $maxConcurrent"
  )
  require(
    timeoutSeconds > 0,
    s"timeoutSeconds must be positive, got: $timeoutSeconds"
  )

  private val semaphore = new Semaphore(maxConcurrent)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Expectation:
# - Production search/ES code should instantiate or receive ConcurrencyLimiter.
# - Legacy inline tryAcquire wrappers should be removed or intentionally documented.
printf '\n== ConcurrencyLimiter references ==\n'
rg -n -C4 --type scala '\bnew\s+ConcurrencyLimiter\b|\bConcurrencyLimiter\s*\('
printf '\n== Remaining inline semaphore limiter patterns ==\n'
rg -n -C6 --type scala '\bwithConcurrencyLimit\s*\(|\.tryAcquire\s*\(|case\s+e:\s*Throwable|case\s+NonFatal'

Repository: dpla/api
Length of output: 12060
Wire ConcurrencyLimiter into the ES request path to eliminate duplication.
ConcurrencyLimiter is defined and tested, but production code still bypasses it. ElasticSearchClient.withConcurrencyLimit (lines 95–115) implements its own semaphore wrapper with identical logic. Either integrate ConcurrencyLimiter into the ES request path or remove the inline implementation to prevent maintenance divergence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala` around lines 26 -
39, The project has duplicate semaphore logic: ConcurrencyLimiter is implemented
but ElasticSearchClient.withConcurrencyLimit contains an inline semaphore
wrapper; replace that inline logic by instantiating and using ConcurrencyLimiter
(or delegating to its methods) inside ElasticSearchClient.withConcurrencyLimit
so all ES requests go through the shared ConcurrencyLimiter instance (or remove
the inline wrapper and call ConcurrencyLimiter directly), ensuring you reuse the
ConcurrencyLimiter constructor/timeout behavior and its acquire/release
semantics rather than duplicating semaphore code.
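A minimal sketch of that delegation could look like the following. The limiter class below is a compressed stand-in for the reviewed ConcurrencyLimiter (its apply signature, timeout behavior, and exception type are assumptions inferred from the review, not the real class), included only so the example is self-contained.

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

// Stand-in for dpla.api.v2.search.ConcurrencyLimiter; signature assumed.
class LimiterSketch(maxConcurrent: Int, timeoutSeconds: Long) {
  private val semaphore = new Semaphore(maxConcurrent)

  // Takes the work by name so nothing runs unless a permit is acquired.
  def apply[T](f: => Future[T])(implicit ec: ExecutionContext): Future[T] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(new RuntimeException("concurrency limit exceeded"))
    else
      try f.andThen { case _ => semaphore.release() }
      catch {
        case e: Throwable =>
          semaphore.release()
          Future.failed(e)
      }
}

// The ES client then delegates instead of re-implementing the semaphore:
class EsClientSketch(limiter: LimiterSketch)(implicit ec: ExecutionContext) {
  def withConcurrencyLimit[T](request: => Future[T]): Future[T] =
    limiter(request)
}
```

Delegating keeps a single acquire/release implementation, so a future fix (e.g., narrowing the Throwable catch to NonFatal) lands in one place.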
import dpla.api.v2.registry.{
  MockEbookRegistry,
  MockSmrRegistry,
  SearchRegistryCommand,
  SmrRegistryCommand
}
import dpla.api.v2.search.MockEbookSearch
import dpla.api.v2.search.SearchProtocol.SearchCommand
Resolve the ebook route/test wiring before merging.
This currently blocks compilation: CI reports missing MockEbookRegistry, missing MockEbookSearch, missing ebookAnalyticsClient, and Routes still accepts 4 constructor arguments, not the 5 passed on Lines 60-66. Either restore/add the ebook route support and test mocks end-to-end, or remove/revert the /v2/ebooks coverage from this suite if ebooks were intentionally removed.
Also applies to: 42-51, 60-66, 101-139
🧰 Tools
🪛 GitHub Actions: Scala CI
[error] 11-11: Scala compilation error: object MockEbookRegistry is not a member of package dpla.api.v2.registry
[error] 17-17: Scala compilation error: object MockEbookSearch is not a member of package dpla.api.v2.search
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala` around lines 11
- 18, The test suite references removed ebook wiring causing compile failures:
missing MockEbookRegistry, MockEbookSearch, ebookAnalyticsClient, and a Routes
constructor mismatch; either restore full ebook route support (add back
MockEbookRegistry and MockEbookSearch test doubles, provide a mock
ebookAnalyticsClient, and update the Routes class or its companion to accept the
ebook dependency so the call site using the 5-arg Routes constructor compiles)
or remove the ebook-specific imports/usages and tests (/v2/ebooks) and revert
the Routes instantiation to the existing 4-arg constructor; update imports
(remove MockEbookRegistry/MockEbookSearch) and adjust the test code that
constructs Routes and any SearchCommand/RegistryCommand usages accordingly.
  "still pass a well-formed key through to the registry" in {
-   val request = Get(s"/v2/items?api_key=$fakeApiKey")
+   val request = Get(s"/v2/ebooks?api_key=$fakeApiKey")
    request ~> Route.seal(routes) ~> check {
      status should not be StatusCodes.Forbidden
+     status should not be StatusCodes.NotFound
    }
  }
Guard the pass-through test against a missing route.
After switching to /v2/ebooks, this test would still pass on 404 NotFound because it only excludes Forbidden. Keep the negative assertion style, but also reject NotFound.
Proposed test assertion update
  request ~> Route.seal(routes) ~> check {
    status should not be StatusCodes.Forbidden
+   status should not be StatusCodes.NotFound
  }

Based on learnings, this test intentionally uses negative assertions and should not assert HTTP 200 because CI may not have backend availability.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"still pass a well-formed key through to the registry" in {
  val request = Get(s"/v2/ebooks?api_key=$fakeApiKey")
  request ~> Route.seal(routes) ~> check {
    status should not be StatusCodes.Forbidden
    status should not be StatusCodes.NotFound
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala` around lines 93
- 98, The test "still pass a well-formed key through to the registry" currently
only asserts status should not be StatusCodes.Forbidden and therefore would
incorrectly pass on 404 after switching to /v2/ebooks; update the assertion
inside the check block (the block that follows
Get(s"/v2/ebooks?api_key=$fakeApiKey") ~> Route.seal(routes) ~> check) to also
reject StatusCodes.NotFound (e.g., assert status is neither
StatusCodes.Forbidden nor StatusCodes.NotFound) while keeping the
negative-assertion style and not requiring a 200 OK.
  val given = "sourceResource.subtitle"
  val expected = Some(Seq())
- val params = Map("facets" -> given)
+ val params = Map("facets" -> given, "q" -> "test")
  itemParamValidator ! RawSearchParams(params, replyProbe.ref)
  val msg = interProbe.expectMessageType[ValidSearchParams]
  msg.params.facets shouldEqual expected
Keep this item test facet-only.
Adding q here weakens coverage of the item contract: item facet-only requests should remain valid, and this case specifically verifies facets = Some(Seq.empty) after ignored-field filtering.
Proposed test adjustment
- val params = Map("facets" -> given, "q" -> "test")
+ val params = Map("facets" -> given)

Based on learnings: facets should preserve Some(Seq.empty), and the facet query/filter requirement must not falsely treat that state as a real facet request.
📝 Committable suggestion
val given = "sourceResource.subtitle"
val expected = Some(Seq())
val params = Map("facets" -> given)
itemParamValidator ! RawSearchParams(params, replyProbe.ref)
val msg = interProbe.expectMessageType[ValidSearchParams]
msg.params.facets shouldEqual expected
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala`
around lines 34 - 39, The test in ItemParamValidatorTest is no longer facet-only
because it adds a "q" param which weakens the contract; update the test to only
include the facets param (remove "q" from the params map) so that invoking
itemParamValidator ! RawSearchParams(params, replyProbe.ref) verifies that
ValidSearchParams.msg.params.facets remains Some(Seq()) after ignored-field
filtering; keep the existing assertions (expectMessageType[ValidSearchParams]
and shouldEqual expected) and references to itemParamValidator, RawSearchParams,
replyProbe, and interProbe unchanged.
val ebookParamValidator: ActorRef[IntermediateSearchResult] =
  testKit.spawn(EbookParamValidator(interProbe.ref))
Restore or replace the missing EbookParamValidator.
The Scala CI failure is definitive: Line 28 references EbookParamValidator, but that symbol is not defined, so the test suite cannot compile. If ebook validation still exists, add/restore the validator object; if ebooks were removed, switch these tests to an existing validator plus a test-only fixture for the requiresQueryForFacets = true behavior.
🧰 Tools
🪛 GitHub Actions: Scala CI
[error] 28-28: Scala compilation error: not found value EbookParamValidator
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala`
around lines 27 - 28, The test references a missing validator: restore or
replace the EbookParamValidator referenced by ebookParamValidator (and used with
IntermediateSearchResult and interProbe.ref) so the test compiles; either
reintroduce the EbookParamValidator object with the same API used in
ParamValidatorTest (including its constructor/behavior used by testKit.spawn and
the requiresQueryForFacets flag) or change the test to instantiate an existing
validator and add a test-only fixture that sets requiresQueryForFacets = true to
exercise the same behavior currently expected from EbookParamValidator.
Phase 2 of API improvements
This PR refactors validation and query composition and adds a concurrency-limiting utility. Below is a summary of the substantive changes, client impact, deployment/infra considerations, and reviewer guidance:
Key changes
Client- and API-impact summary
Security implications
Deployment notes
Reviewer / maintainer guidance
Files of interest (high-level)
If you want, I can: