Conversation
Walkthrough

Adds a semaphore-backed ConcurrencyLimiter, tightens and restructures search parameter validation, and replaces keyword query generation for Elasticsearch.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 inconclusive
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/test/scala/dpla/api/helpers/FileReader.scala (1)
8-10: Resource handling is correct; `Using` could simplify (if Scala 2.13+).

The explicit UTF-8 codec and `try`/`finally` with `buffered.close()` look correct and fix the resource-leak risk. If the project is on Scala 2.13+, you could optionally rewrite this with `scala.util.Using` for a more idiomatic one-liner, but it's not required.

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
184-194: Consider removing explicit cast.

The `.asInstanceOf[JsArray]` cast is safe but could be avoided by using the type system.

```diff
- private def filterQuery(filters: Seq[Filter]): JsArray =
-   filters
-     .map(filter => {
-       JsObject(
-         "term" -> JsObject(
-           filter.fieldName -> filter.value.toJson
-         )
-       )
-     })
-     .toJson
-     .asInstanceOf[JsArray]
+ private def filterQuery(filters: Seq[Filter]): JsArray =
+   JsArray(filters.map { filter =>
+     JsObject(
+       "term" -> JsObject(
+         filter.fieldName -> filter.value.toJson
+       )
+     )
+   }: _*)
```

src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1)
55-72: Narrow exception handling when evaluating `f` and consider documenting interruption semantics

Two small robustness points in `apply`:

- When evaluating `f`, you currently catch `Throwable`. In Scala it's more idiomatic to catch `NonFatal` so that truly fatal errors (e.g., `VirtualMachineError`, `ThreadDeath`) are not transformed into failed `Future`s:

```diff
+import scala.util.control.NonFatal
 ...
-    } else {
-      try {
-        val future = f
-        future.andThen { case _ => semaphore.release() }(ec)
-      } catch {
-        case e: Throwable =>
-          semaphore.release()
-          Future.failed(e)
-      }
-    }
+    } else {
+      try {
+        val future = f
+        future.andThen { case _ => semaphore.release() }(ec)
+      } catch {
+        case NonFatal(e) =>
+          semaphore.release()
+          Future.failed(e)
+      }
+    }
```

- If the thread is interrupted while waiting in `semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS)`, an `InterruptedException` will currently bubble out of `apply` instead of becoming a `Future` failure (and the interrupt flag isn't restored). It might be worth deciding and documenting whether you want to:
  - Let that propagate as-is (and mention it in the Scaladoc), or
  - Catch it and convert to a failed `Future` (possibly re-interrupting the thread).
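For readers following along without the diff, the overall pattern under discussion can be condensed into a short sketch. This is hypothetical illustration code, not the PR's actual `ConcurrencyLimiter`: the names (`maxConcurrent`, `timeoutSeconds`, `ConcurrencyLimitExceeded`) mirror the review comments, but the real class differs in detail (Scaladoc, `availablePermits`, error messages).

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Failure returned when no permit becomes available within the timeout.
final case class ConcurrencyLimitExceeded(maxConcurrent: Int, timeoutSeconds: Long)
  extends RuntimeException(
    s"Concurrency limit of $maxConcurrent not acquired within $timeoutSeconds seconds"
  )

class Limiter(maxConcurrent: Int, timeoutSeconds: Long) {
  require(maxConcurrent > 0, "maxConcurrent must be positive")
  require(timeoutSeconds > 0, "timeoutSeconds must be positive")

  private val semaphore = new Semaphore(maxConcurrent)

  // Blocks the calling thread for up to timeoutSeconds waiting for a permit,
  // then releases the permit when the wrapped Future completes.
  def apply[A](f: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(ConcurrencyLimitExceeded(maxConcurrent, timeoutSeconds))
    else
      try f.andThen { case _ => semaphore.release() }
      catch {
        // NonFatal per the suggestion above: fatal errors still propagate.
        case NonFatal(e) =>
          semaphore.release()
          Future.failed(e)
      }
}
```

A call like `limiter(Future(expensiveSearch()))` then either runs while holding a permit or fails fast with `ConcurrencyLimitExceeded`.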
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1 hunks)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
- src/test/scala/dpla/api/helpers/FileReader.scala (1 hunks)
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (8 hunks)
- src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (4 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1 hunks)
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala (2 hunks)
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (27 hunks)
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (23 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (2)
src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (2)
- availablePermits (78-78)
- ConcurrencyLimitExceeded (84-90)
src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (1)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (5)
- search (19-33), search (35-37), search (39-41), search (43-46), search (48-51)

src/main/scala/dpla/api/v2/search/mappings/JsonFieldReader.scala (6)
- readObject (16-29), readString (34-37), readObjectArray (31-34), readInt (40-43), readStringArray (37-40), readBoolean (43-49)

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
- query (128-161), filterQuery (184-205)
src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (6)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)

src/test/scala/dpla/api/v2/registry/MockSmrRegistry.scala (1)
- MockSmrRegistry (10-29)

src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/smr/MockSmrRequestHandler.scala (1)
- MockSmrRequestHandler (8-42)

src/main/scala/dpla/api/v2/smr/SmrProtocol.scala (1)
- SmrProtocol (6-44)

src/main/scala/dpla/api/v2/smr/SmrParamValidator.scala (1)
- validService (85-90)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (2)
src/main/scala/dpla/api/v2/search/SearchProtocol.scala (14)
- search (61-64), search (66-70), search (72-75), search (77-80), search (82-86), search (88-91), search (93-97), search (99-104), search (106-109), search (111-115), search (117-121), search (123-127), search (129-132), search (134-138)

src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (5)
- TextField (7-7), IntField (9-9), URLField (11-11), DateField (13-13), WildcardField (17-17)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (2)
src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)

src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)
src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (3)
src/test/scala/dpla/api/v2/registry/MockEbookRegistry.scala (1)
- MockEbookRegistry (11-30)

src/test/scala/dpla/api/v2/registry/MockPssRegistry.scala (1)
- MockPssRegistry (11-30)

src/test/scala/dpla/api/v2/search/MockEbookSearch.scala (1)
- MockEbookSearch (11-57)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (26)
src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala (1)
17-18: Good addition of test timeout.

The explicit 3-second timeout prevents tests from hanging indefinitely and is appropriate for mock-based end-to-end tests.
Also applies to: 33-34
src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (1)
17-18: LGTM! Route test timeout added for test stability.

The `RouteTestTimeout(3.seconds)` addition helps prevent test flakiness by providing an explicit timeout. This is a good practice for Akka HTTP route tests.

Also applies to: 33-34
src/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scala (1)
17-18: LGTM! Consistent timeout configuration with other test files.

The route test timeout addition follows the same pattern established in other test files in this PR.
Also applies to: 33-34
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (3)
110-110: Verify `track_total_hits` value aligns with pagination depth.

The value `10000` is a reasonable optimization to avoid counting all documents. However, given that `ParamValidator` now limits pagination to 1000 items (`fromValue + pageSize > 1000`), you could consider lowering this to `1000` or keeping it at `10000` for a safety margin. The current value works correctly but is 10x the maximum accessible records.
163-178: Good migration from `query_string` to `multi_match`.

Using `multi_match` with `best_fields` and `lenient: true` is a safer and more predictable approach than `query_string`. This avoids potential Lucene query syntax injection issues while maintaining effective full-text search capability.
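For context, the shape of such a `multi_match` keyword clause can be sketched with spray-json (which the query builder's diffs use elsewhere). The query text is made up, and the field list and boosts are illustrative examples drawn from this review thread, not the PR's exact configuration:

```scala
import spray.json._

// Illustrative multi_match clause: best_fields across boosted keyword fields,
// AND semantics, and lenient handling of field-type mismatches.
val keywordClause: JsObject = JsObject(
  "multi_match" -> JsObject(
    "query"    -> JsString("new england"),
    "fields"   -> JsArray(
      JsString("sourceResource.title^2"),
      JsString("sourceResource.description"),
      JsString("sourceResource.subject.name")
    ),
    "type"     -> JsString("best_fields"),
    "operator" -> JsString("AND"),
    "lenient"  -> JsBoolean(true)
  )
)

println(keywordClause.prettyPrint)
```

Unlike `query_string`, user input here is plain text to Elasticsearch, so Lucene operators in the query have no special meaning.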
113-123: Verify reduced field set meets search requirements.The field set has been curated to high-value fields with reasonable boost weights. Ensure that removing previously included fields doesn't negatively impact search recall for your use cases.
src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala (1)
20-21: LGTM! Consistent test infrastructure updates.

Route test timeout added following the same pattern as other test files in this PR.
Also applies to: 38-39
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (3)
125-132: Breaking change: Stricter pagination and facet limits.

The limits have been significantly reduced:

- `maxFacetSize`: 2000 → 200
- `maxPage`: 100 → 10
- `maxPageSize`: 500 → 100

Ensure API consumers are aware of these changes, as existing integrations using higher values will now receive capped results or errors.
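A minimal sketch of how these limits compose with the deep-pagination guard discussed in this thread. This is hypothetical illustration code; the actual `ParamValidator` returns typed `InvalidSearchParams` errors through its own validation machinery rather than `Either`:

```scala
// Constants mirror the review comments; names are illustrative.
val maxPage = 10
val maxPageSize = 100
val maxResultWindow = 1000 // from + size may not exceed this

def validatePagination(page: Int, pageSize: Int): Either[String, (Int, Int)] = {
  val from = (page - 1) * pageSize // Elasticsearch offset
  if (page < 1 || page > maxPage)
    Left(s"page must be an integer between 1 and $maxPage")
  else if (pageSize < 1 || pageSize > maxPageSize)
    Left(s"page_size must be an integer between 1 and $maxPageSize")
  else if (from + pageSize > maxResultWindow)
    Left(s"page * page_size may not exceed $maxResultWindow results")
  else
    Right((from, pageSize))
}
```

Note that with these constants the window check can never actually fire (10 × 100 = 1000), so it serves as a safety net against future limit changes.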
249-256: Good addition: Requiring query or filter with facets.

This validation prevents expensive facet-only queries that scan the entire index. This is a sensible performance safeguard.
431-439: Good improvement: Empty string handling.

Treating empty/whitespace-only values as absent parameters (`None`) is a sensible approach that prevents unnecessary validation errors.

src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
273-277: LGTM: Updated track_total_hits expectation from Boolean to Int.

The test now expects `track_total_hits` to be `10000` instead of a boolean value, aligning with the `QueryBuilder.scala` changes that limit total hit tracking for performance optimization.
287-311: LGTM: Tests updated to use `multi_match` instead of `query_string`.

The keyword query tests now correctly traverse the `multi_match` path instead of `query_string`, reflecting the query builder's switch to `multi_match` queries. The assertions for `query`, `fields`, `operator`, and `lenient` are properly adjusted.
930-937: LGTM: Filter query path updated to match new query structure.

The filter query test now reads from the `"query", "bool", "filter"` path directly, consistent with the simplified filter structure in the updated `QueryBuilder.scala`.

src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala (1)
35-40: LGTM: Typed actor system integration and route timeout added.

The addition of `typedSystem` and the `createActorSystem()` override enables proper typed/classic actor system interop, and the 3-second `RouteTestTimeout` provides reasonable test timing. This pattern is consistent with other test files updated in this PR.

src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala (1)
36-40: LGTM: Test updated to include required query parameter.

The addition of `"q" -> "test"` aligns with the new validation rule that requires a query or filter when facets are specified (as tested in `ParamValidatorTest.scala` at lines 322-326). The test still correctly verifies that non-applicable DPLA Map fields are ignored.

src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (4)
321-326: LGTM: New test validates facets require query or filter.

This test ensures that requests with facets but no accompanying query (`q`) or filter are rejected. This prevents expensive facet-only aggregation queries, which is a sensible performance guard.
353-360: LGTM: Updated max facet_size expectation to 200.

The expected maximum `facet_size` is now 200 (reduced from 2000), reflecting the stricter limits in the updated `ParamValidator.scala`.
581-592: LGTM: Pagination validation tests updated for stricter limits.

The tests now verify:

- Too-large page values (600) result in `InvalidSearchParams` rather than capping to max
- Deep pagination beyond 1000 results (page=50 × page_size=50 = 2500) is rejected

These stricter limits protect against expensive deep-pagination queries on Elasticsearch.
619-624: LGTM: Updated page_size validation to reject rather than cap.

The test now expects `InvalidSearchParams` for excessively large `page_size` values instead of silently capping to the maximum. This explicit rejection provides clearer feedback to API consumers.

src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (4)
1-22: LGTM: Well-structured test setup with appropriate async configuration.

The test class properly mixes in `ScalaFutures` and `Eventually` for async testing, with a reasonable `PatienceConfig` (5s timeout, 50ms polling interval).
82-119: LGTM: Thorough concurrency limit enforcement test.

This test properly validates:

- Two permits are acquired when two Futures are in-flight
- A third request fails with `ConcurrencyLimitExceeded` after timeout
- Permits are released when the original Futures complete

Using `Promise` to control Future completion timing is the correct approach for testing concurrency limits.
146-162: LGTM: Constructor validation tests for edge cases.

Good coverage of invalid constructor arguments (zero and negative values for both `maxConcurrent` and `timeoutSeconds`), ensuring the limiter fails fast with clear errors.
63-80: Permit release on Future construction throw is synchronous and safe.

The implementation at lines 64-71 of `ConcurrencyLimiter.scala` confirms the permit is released synchronously: when `val future = f` throws (line 65), the catch block at lines 68-70 immediately calls `semaphore.release()` before returning the failed Future. The test assertion at line 79 is correct and not flaky.

src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (3)
6-24: Docs correctly call out blocking behavior and operational guidance

The Scaladoc explicitly flags the blocking `tryAcquire` semantics and suggests using a dedicated dispatcher / tuning parameters, which matches the implementation below and should help prevent misuse in Akka/HTTP paths.
25-38: Constructor validation and semaphore setup look correct

The `require` guards on `maxConcurrent` and `timeoutSeconds` and the corresponding `Semaphore(maxConcurrent)` initialization are consistent with the intended concurrency cap and error reporting; no changes needed here.
81-90: Exception type is informative and matches limiter configuration

`ConcurrencyLimitExceeded` carries both `maxConcurrent` and `timeoutSeconds` and renders a clear message, which should be helpful in logs and client error handling.
Pull request overview
This PR implements Phase 2 of API improvements focused on search performance optimization and request limiting. Key changes include migrating from Elasticsearch's query_string to multi_match queries, enforcing stricter pagination limits to prevent deep pagination, requiring queries or filters for facet requests, and introducing a new ConcurrencyLimiter class for managing concurrent request loads.
Key Changes
- Search query optimization: Replaced `query_string` with `multi_match` for keyword queries, reduced searchable fields from 27 to 8 high-value fields, and simplified the filter query structure
- Pagination limits: Reduced max page from 100 to 10, max page_size from 500 to 100, max facet_size from 2000 to 200, and added validation to reject requests exceeding 1000 total results (from + size)
- Concurrency control: Added new `ConcurrencyLimiter` class with comprehensive test coverage to manage concurrent request loads using semaphores
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala | Migrated from query_string to multi_match queries, reduced searchable fields, simplified filter structure, and changed track_total_hits from boolean to 10000 |
| src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala | Added deep pagination validation, facet requirement checks, reduced max pagination limits, and added empty string handling in parameter validation |
| src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala | New class implementing semaphore-based concurrency limiting for Futures with timeout support |
| src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala | Comprehensive test coverage for ConcurrencyLimiter including edge cases and failure scenarios |
| src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala | Updated tests to validate multi_match queries, new filter structure, and track_total_hits value |
| src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala | Updated tests to validate new pagination limits, facet requirements, and deep pagination rejection |
| src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala | Updated test to require query parameter with facets |
| src/test/scala/dpla/api/v2/httpHeadersAndMethods/*.scala | Added route test timeouts and reformatted for consistency |
| src/test/scala/dpla/api/v2/endToEnd/*.scala | Added route test timeouts and reformatted for consistency |
| src/test/scala/dpla/api/helpers/FileReader.scala | Improved resource handling with try-finally and explicit UTF-8 encoding |
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
196-274: Add defensive error handling to basic field query branch for consistency

The basic field query branch (else block, lines 268-274) silently falls back to `keywordQueryFields` when `getElasticSearchField()` returns `None`, whereas the `.before`/`.after` and exact-match branches throw a `RuntimeException` for unrecognized field names. Although upstream validation in `ParamValidator.scala` ensures that only valid, searchable field names reach `singleFieldQuery()` (via `searchableDataFields.flatMap(getValidFieldQuery(...))`), the inconsistent error handling makes the code less defensive and harder to debug if upstream validation changes or if this method is called directly in the future.

For consistency and defensive programming, adopt the same pattern as other branches:

```diff
- } else {
-   // Basic field query
-   val fields: Seq[String] =
-     Seq(getElasticSearchField(fieldQuery.fieldName)).flatten
-   val obj: JsObject = keywordQuery(fieldQuery.value, fields)
-   Seq(obj)
- }
+ } else {
+   // Basic field query
+   val field: String = getElasticSearchField(fieldQuery.fieldName)
+     .getOrElse(
+       throw new RuntimeException(
+         "Unrecognized field name: " + fieldQuery.fieldName
+       )
+     )
+   val obj: JsObject = keywordQuery(fieldQuery.value, Seq(field))
+   Seq(obj)
+ }
```

This aligns all branches to fail fast on invalid field names rather than silently changing query semantics.
🧹 Nitpick comments (2)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
180-195: Avoid unsafe cast in `filterQuery`

`filterQuery` currently relies on `.toJson.asInstanceOf[JsArray]` for the sequence of term filters, which is brittle and ties correctness to the implementation detail that `Seq` always serializes to `JsArray`.

You can build the `JsArray` directly and drop the cast:

```diff
- private def filterQuery(filters: Seq[Filter]): JsArray =
-   filters
-     .map(filter => {
-       JsObject(
-         "term" -> JsObject(
-           filter.fieldName -> filter.value.toJson
-         )
-       )
-     })
-     .toJson
-     .asInstanceOf[JsArray]
+ private def filterQuery(filters: Seq[Filter]): JsArray =
+   JsArray(
+     filters.map { filter =>
+       JsObject(
+         "term" -> JsObject(
+           filter.fieldName -> filter.value.toJson
+         )
+       )
+     }: _*
+   )
```

This is safer and keeps the return type explicit.
276-283: Minor naming nit: `stripLeadingAndTrainingQuotationMarks`

The method name still says `Training` while the doc comment (lines 276-278) correctly says "leading and trailing quotation marks." Consider renaming the method to `stripLeadingAndTrailingQuotationMarks` for clarity the next time you touch this area.
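As a reference point for the rename, the helper's documented behavior amounts to something like the following sketch. This is hypothetical illustration code, not the PR's implementation:

```scala
// Strips one pair of surrounding double quotes, and only when both a
// leading and a trailing quote are present; otherwise returns the input.
def stripLeadingAndTrailingQuotationMarks(s: String): String =
  if (s.length >= 2 && s.startsWith("\"") && s.endsWith("\""))
    s.substring(1, s.length - 1)
  else
    s
```

The length guard ensures a lone quote character is not treated as both leading and trailing, and strings with only one of the two quotes pass through unchanged.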
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Prepare
🔇 Additional comments (6)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (6)
16-43: Message routing in `apply` is clear and side-effect usage is confined

The actor behavior cleanly delegates to the next phase for search, fetch (single vs multi), and random queries, and uses `Behaviors.same`/`Behaviors.unhandled` appropriately. No issues spotted here.
67-93: Random query composition with optional filters looks correct

The new `composeRandomQuery` implementation correctly wraps optional filters in a `bool.filter` and nests that under `function_score`, while keeping the unfiltered case minimal. The use of `size = 1` matches the intent of a single random hit. No functional problems here.
96-111: Bool query assembly and top-level search structure are sound

The refactored `composeSearchQuery`/`query` pair correctly handle:

- Pagination (`from`/`size`),
- Combining keyword and field queries into a bool `must`/`should` depending on `op`,
- Attaching filters as a separate `filter` clause,
- Falling back to `match_all` when both terms and filters are absent,
- Adding `track_total_hits = 10000` to cap expensive hit tracking.

The structure matches typical ES expectations and looks consistent with the rest of the builder.
Also applies to: 128-161
113-123: Keyword multi-match configuration and curated field list look reasonable

The reduced `keywordQueryFields` set with boosts, combined with `multi_match` using `best_fields`, `operator = "AND"`, and `lenient = true`, forms a coherent default keyword query strategy. The `targetFields` fallback ensures calls without explicit fields still use this curated set.

Also applies to: 163-178
285-381: Facet (aggs) composition for geo and date fields looks consistent

The `aggs` builder correctly distinguishes:

- Spatial facets using `geo_distance` buckets with a generated `ranges` array (lines 302-307),
- Date facets using a filtered `date_histogram` with month/year granularity and corresponding formats and `gte` windows (lines 327-340),
- Regular term facets via `terms` on exact-match fields.

The control flow and error handling for unknown facet names are consistent with the rest of the file. No issues stood out here.
383-447: Sort and field retrieval defaults are reasonable

- The `sort` function cleanly handles geo sort, field-based sort, and falls back to `_doc` when there is no query/sort, which is appropriate for "no-order" browsing.
- `fieldRetrieval` returning `Seq("*").toJson` when no fields are specified gives a sensible default of retrieving the full source.

No functional concerns in these sections.
Also applies to: 449-453
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
62-75: Consider verifying exact field mappings.

The test only checks that the count of mapped fields matches the count of keyword fields. If one field fails to map and is silently dropped, the test wouldn't catch it. Consider using `contain allElementsOf` to verify that all expected fields are present.

Apply this diff to make the test more precise:

```diff
   "have searchable keywordQuery fields defined in ES mapping" in {
     val keywordFields = Seq(
       "sourceResource.title",
       "sourceResource.subject.name",
       "sourceResource.description",
       "sourceResource.creator",
       "sourceResource.collection.title",
       "dataProvider.name",
       "provider.name"
     )
     val mapped = keywordFields.flatMap(tester.getElasticSearchField)
-    mapped.size shouldBe keywordFields.size
+    mapped should have size keywordFields.size
+    // Optionally verify all fields mapped successfully
+    keywordFields.foreach { field =>
+      tester.getElasticSearchField(field) should not be empty
+    }
   }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (11 hunks)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1 hunks)
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (23 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (2)
- getElasticSearchField (64-67)
- getElasticSearchExactMatchField (75-81)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (1)
- getElasticSearchField (64-67)
src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (5)
- search (19-33), search (35-37), search (39-41), search (43-46), search (48-51)

src/main/scala/dpla/api/v2/search/mappings/JsonFieldReader.scala (6)
- readObject (16-29), readString (34-37), readObjectArray (31-34), readInt (40-43), readStringArray (37-40), readBoolean (43-49)

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (2)
- query (127-160), filterQuery (183-205)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (11)
src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala (1)
77-91: LGTM!

The test correctly verifies that exact match fields use not_analyzed subfields. The use of `foreach` ensures each field is tested individually, which provides clear failure messages.

src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (7)
113-122: LGTM!

The reduced set of keyword query fields with appropriate boost weights aligns well with cultural heritage search patterns. Prioritizing title (`^2`) over other fields is a sensible default.
162-177: LGTM!

The migration from `query_string` to `multi_match` simplifies the keyword query behavior and makes it more user-friendly by removing advanced Lucene query syntax. The `lenient` flag appropriately handles field type mismatches.
183-193: LGTM!

Changing the return type to `JsArray` correctly reflects Elasticsearch's bool filter syntax, where filters are an array of query clauses.
195-204: LGTM!

The updated comment correctly describes the multi_match query behavior and clearly distinguishes it from the exact-match "term" query.
205-275: LGTM!

The improved error messages and the fix for the function name typo (`stripLeadingAndTrailingQuotationMarks`) enhance code clarity. The logic for handling range queries, exact matches, and basic field queries is correct.
277-284: LGTM!

The function name typo has been corrected (`stripLeadingAndTrailingQuotationMarks`), and the comment accurately describes the behavior.
110-110: Pagination depth is already limited by existing parameter validation.

The `ParamValidator` enforces `maxPage = 10` and `maxPageSize = 100`, which limits the maximum offset to 900 and total accessible results to approximately 1,000. This is more restrictive than the `track_total_hits = 10000` setting, so users cannot request results beyond what is already enforced by the parameter validation layer. No additional validation is needed.

src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala (3)
273-277: LGTM!

The test correctly updated to expect `track_total_hits` as an integer value of 10000, matching the QueryBuilder implementation change.
287-320: LGTM!

The tests correctly updated to read `multi_match` query structures instead of `query_string`, and the operator field name change from `default_operator` to `operator` matches Elasticsearch's multi_match syntax.
404-509: LGTM!

Excellent comprehensive test coverage for the quote stripping behavior. The tests cover all relevant edge cases: properly quoted strings, strings with internal quotes, strings with only leading or trailing quotes, empty quoted strings, and unquoted strings.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (3)
125-133: Pagination limits and depth guard interplay

The combination of `maxPage = 10`, `maxPageSize = 100`, and the `fromValue + requestedPageSize > 1000` guard means that, with current constants, no valid combination can actually exceed the 1000 window; the guard is effectively a future-proof safety net.

To avoid drift if either the ES `index.max_result_window` or these limits change later, consider either:

- deriving the `1000` constant from shared configuration, or
- adding a short comment near `maxPage`/`maxPageSize` explaining that they're chosen so `from + size <= 1000`, and that the guard is an extra safety check.

Also applies to: 238-247
152-158: Fetch ID count capped by `maxPageSize` may deserve its own constant

Reusing `maxPageSize` to limit the number of `id`s in a fetch (`ids.size > maxPageSize`) is functionally fine, but it couples two concerns: search page size and fetch batch size. If you ever want to change one without the other, this will be surprising.

Consider introducing a dedicated constant (e.g., `maxFetchIds`) or at least a short comment explaining that the fetch ID limit is intentionally tied to `maxPageSize`.

Also applies to: 167-170
490-499: `validIntWithRange` error message hard-codes lower bound 0

`validIntWithRange` computes `(min, max)` per param (e.g., `page` uses `(minPage, maxPage)` where `minPage = 1`), but the `rule` string always says:

```scala
val rule = s"$param must be an integer between 0 and $max"
```

So for `page`, an input of `0` or a negative value will yield an error message that mentions `0` as the lower bound, even though the actual check is `int < min`.

Consider interpolating `min` instead of `0`:

```diff
- val rule = s"$param must be an integer between 0 and $max"
+ val rule = s"$param must be an integer between $min and $max"
```

This keeps behavior the same while making validation errors clearer to API consumers.
Also applies to: 502-507
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (1)
src/main/scala/dpla/api/v2/search/models/FieldDefinitions.scala (5)
- TextField (7-7), IntField (9-9), URLField (11-11), DateField (13-13), WildcardField (17-17)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (2)
297-304: Facet tests correctly enforce q/filter requirement; consider adding filter-only coverage

The facet tests now (a) require `facets` to be paired with a primary search param (via `q`) and (b) explicitly reject `facets` with no query or filter, which matches the tightened validation behavior. To fully exercise the new rule, it would be useful to add a case where `facets` is combined with `filter` but no `q`, to lock in that this combination remains accepted.

Example sketch, placed alongside the existing facet tests:

      "facet validator" should {
    @@
        "accept valid param" in {
    @@
        }
    @@
    +   "accept facets with filter but no query" in {
    +     val given = "provider.@id"
    +     val filter = "provider.@id:http://dp.la/api/contributor/lc"
    +     val params = Map("facets" -> given, "filter" -> filter)
    +     ebookParamValidator ! RawSearchParams(params, replyProbe.ref)
    +     val msg = interProbe.expectMessageType[ValidSearchParams]
    +     msg.params.facets shouldEqual Some(Seq("provider.@id"))
    +   }
        "reject unfacetable field" in {

Also applies to: 313-320, 322-326
353-358: Pagination guard tests align with new behavior; rename specs for clarity

The facet-size and pagination tests now correctly treat oversize values as invalid, and the new "reject deep pagination beyond 1000 results" case exercises the depth guard nicely. The only mismatch is that the spec descriptions at lines 579 and 617 still say "default to max if param is too large" even though the expectations now assert invalid input; renaming them would keep the tests self-documenting and consistent with the facet-size test.

Suggested rename:

    - "default to max if param is too large" in {
    + "reject param if too large" in {
    @@
    - "default to max if param is too large" in {
    + "reject param if too large" in {

Also applies to: 556-563, 579-584, 586-590, 617-622
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala(27 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (1)
src/main/scala/dpla/api/v2/search/SearchProtocol.scala (1)
Random(24-27)
🔇 Additional comments (3)
src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala (3)
13-16: Readability-only reflow looks goodRewrapping the class header, long strings, and map literals over multiple lines improves readability without changing behavior; no issues here.
Also applies to: 52-53, 74-79, 434-438, 768-768, 823-824, 834-835, 850-851, 859-860, 869-870
123-124: Consistent field-value extraction across validatorsThese assertions now all use the same pattern to pull out the stored value for each field, which keeps the tests uniform and easier to scan; behavior is unchanged and looks correct.
Also applies to: 135-136, 147-148, 177-178, 214-215, 244-245, 403-404, 443-444, 473-474, 510-511, 642-643, 761-762, 798-799
875-887: Filter validator assertions are precise and robustThese filter tests now assert both the parsed field name and value for the first clause, which tightens the contract without over-coupling to internal representation. This is a clear improvement with no downsides.
Also applies to: 889-900
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala (1)
167-170: Decouple fetch ID limits from search page-size limits.

Lines 167-170 now inherit the tighter `maxPageSize` bound, so reducing search `page_size` from 500 to 100 also reduces the fetch batch limit to 100. That couples two separate API contracts and makes future pagination changes a silent fetch breaking change. A dedicated fetch-specific bound would be safer.

Suggested refactor:

    + protected val maxFetchIds: Int = 100
    @@
    - if (ids.size > maxPageSize)
    + if (ids.size > maxFetchIds)
        throw ValidationException(
    -     s"The number of ids cannot exceed $maxPageSize"
    +     s"The number of ids cannot exceed $maxFetchIds"
        )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala` around lines 167 - 170, The validator currently enforces fetch ID list size using the search pagination constant maxPageSize (the block checking ids.size > maxPageSize and throwing ValidationException), which couples fetch-batch limits to page_size; introduce a separate constant (e.g., maxFetchIds or fetchIdLimit) and replace the ids.size check to compare against that new fetch-specific bound, update any related error message to reference the fetch limit, and ensure the new constant is used only for the fetch-ID validation so pagination changes do not affect fetch behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 200-202: The current retrieval using getValid(rawParams, "facets",
validFields) can produce Some(Seq()) when validFields filtered out
ignoredFields; update the logic so empty facet sequences are normalized to None
before further checks or passed into SearchParams. Specifically, replace the
bare use of facets with a normalized value (e.g., map(_.filter(...)) then
.filter(_.nonEmpty) or pattern-match to convert Some(Nil) to None) so downstream
logic (the facets nonEmpty check and the SearchParams construction) only treats
truly non-empty facet lists as facet requests; reference getValid, rawParams,
validFields, facets, ignoredFields, and SearchParams when making the change.
- Around line 489-503: The error message in validIntWithRange currently
hardcodes the lower bound as 0 which is wrong for params like "page" that use
minPage = 1; update the rule string to use the computed min and max variables
(e.g., s"$param must be an integer between $min and $max") so the
ValidationException thrown by validIntWithRange reflects the real allowed range
for params such as facet_size (minFacetSize/maxFacetSize), page
(minPage/maxPage) and page_size (minPageSize/maxPageSize).
---
Nitpick comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 167-170: The validator currently enforces fetch ID list size using
the search pagination constant maxPageSize (the block checking ids.size >
maxPageSize and throwing ValidationException), which couples fetch-batch limits
to page_size; introduce a separate constant (e.g., maxFetchIds or fetchIdLimit)
and replace the ids.size check to compare against that new fetch-specific bound,
update any related error message to reference the fetch limit, and ensure the
new constant is used only for the fetch-ID validation so pagination changes do
not affect fetch behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6592d3da-e02b-46b4-8fbf-30ffae5c91f5
📒 Files selected for processing (1)
src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
@coderabbitai full review

✅ Actions performed
Full review triggered.
570dc53 to f5f699c (Compare)
@coderabbitai full review

✅ Actions performed
Full review triggered.
@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala (1)
25-79: Unify concurrency limiting to a single implementation.

This utility duplicates the behavior already present in src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala (lines 95-115). Keeping both paths increases drift risk (timeouts, exception type/message, metrics semantics). Consider routing ES calls through `ConcurrencyLimiter` so there is one source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala` around lines 25 - 79, The ConcurrencyLimiter implementation duplicates semaphore/timeout behavior in ElasticSearchClient; refactor ElasticSearchClient to delegate to ConcurrencyLimiter instead of keeping its own semaphore logic: remove the bespoke tryAcquire/release and timeout handling in ElasticSearchClient and call ConcurrencyLimiter.apply[T](...) when running ES futures (inject or construct a ConcurrencyLimiter instance in ElasticSearchClient), preserve timeoutSeconds and maxConcurrent semantics and the ConcurrencyLimitExceeded failure, and use ConcurrencyLimiter.availablePermits for any monitoring/metrics formerly reading the ES semaphore so there is a single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala`:
- Around line 55-71: The apply method currently calls
semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS) and catches all
Throwables around evaluating f; change it so tryAcquire is wrapped in a
try/catch that catches InterruptedException, restores the thread interrupt
status (Thread.currentThread().interrupt()) and returns Future.failed with an
appropriate exception (e.g., ConcurrencyLimitExceeded or a new
InterruptedException-wrapped failure) instead of letting it escape
synchronously, and replace the broad catch { case e: Throwable => ... } around
evaluating f with a catch for scala.util.control.NonFatal to avoid swallowing
fatal errors; keep the existing semaphore.release() logic in both the failure
and successful future branches (using future.andThen { case _ =>
semaphore.release() }(ec)) so the semaphore is always released.
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 167-170: The check inside ParamValidator (currently using
maxPageSize) should use a new constant for fetch batching so fetch size isn't
coupled to search pagination; add a new constant (e.g., maxFetchBatchSize)
alongside maxPageSize, update the ID validation in getFetchIds (the block that
throws ValidationException when ids.size > maxPageSize) to compare against
maxFetchBatchSize, and keep the ValidationException message updated to reference
the fetch limit; ensure other references to maxPageSize remain unchanged so only
getFetchIds uses the new constant.
- Around line 345-355: The validator currently accepts filters lacking a ':'
because it uses headOption/lastOption on parts from split(":", 2); change the
logic in ParamValidator.scala so you explicitly require exactly two parts (e.g.,
check parts.length == 2 or pattern-match parts.toList as field :: rest :: Nil)
before assigning fieldName and values, and throw ValidationException(s"$filter
is not a valid filter") if the separator is missing or the second part is empty;
keep the existing trimming/splitting on "AND" for values and retain the final
empty-values check.
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala`:
- Around line 135-137: The bug: the outer bool uses op ("AND"/"OR") but keyword
and basic field queries always build a multi_match with "operator": "AND", so
op="OR" does nothing. Fix by propagating the op into the multi_match/operator
parameter when building keyword and basic field queries: update the
calls/definitions that produce the multi_match (e.g., keywordQuery,
keywordQueryFields, filterQuery and the functions that build basic field queries
around q and single-field queries) so they accept the op (or compute operator =
if (op == "OR") "or" else "and") and set "operator" accordingly; apply the same
change in the other occurrences noted (blocks around lines 165-174, 205-208,
269-274) so multi_match operator reflects op in all call sites.
In `@src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala`:
- Around line 586-590: The test never exercises the depth-check because
validIntWithRange (bounded by maxPage) rejects page=50 first; update the test to
only fail the combined-depth guard by either constructing a test-specific
ebookParamValidator configured with larger maxPage/maxPageSize or choose params
that respect validIntWithRange but violate the from+size check (e.g., set page
to the current maxPage value used by validIntWithRange and set page_size high
enough so (page-1)*page_size + page_size > 1000), and then send RawSearchParams
to ebookParamValidator and assert
replyProbe.expectMessageType[InvalidSearchParams].
---
Nitpick comments:
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala`:
- Around line 25-79: The ConcurrencyLimiter implementation duplicates
semaphore/timeout behavior in ElasticSearchClient; refactor ElasticSearchClient
to delegate to ConcurrencyLimiter instead of keeping its own semaphore logic:
remove the bespoke tryAcquire/release and timeout handling in
ElasticSearchClient and call ConcurrencyLimiter.apply[T](...) when running ES
futures (inject or construct a ConcurrencyLimiter instance in
ElasticSearchClient), preserve timeoutSeconds and maxConcurrent semantics and
the ConcurrencyLimitExceeded failure, and use
ConcurrencyLimiter.availablePermits for any monitoring/metrics formerly reading
the ES semaphore so there is a single source of truth.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 53b6a041-f18a-4462-8634-ef45595d77a3
📒 Files selected for processing (14)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
@coderabbitai resume

88c693c to 689fd58 (Compare)

@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
82-119: Add an explicit lazy-evaluation assertion for the 3rd request.

This test proves timeout behavior, but it doesn't assert the call-by-name contract (`f` is not evaluated when a permit isn't acquired). Consider passing an expression with a side effect and asserting the side effect did not occur when the limiter rejects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala` around lines 82 - 119, The test should also assert that the limiter uses call-by-name and does not evaluate the passed expression when a permit is not acquired: modify the third invocation of ConcurrencyLimiter.apply (the call written as limiter(promises(2).future) / limiter(...)) to pass an expression with a visible side effect (for example incrementing a mutable counter or setting a flag) and after the failed future assertion verify that the side effect did NOT occur; keep using the existing assertions on the thrown ConcurrencyLimitExceeded (maxConcurrent and timeoutSeconds) and the same promises for the first two futures so you only add the side-effect check around the third request.src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala (1)
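For reference, the call-by-name contract under discussion can be illustrated with a minimal limiter sketch (an assumed shape; the real ConcurrencyLimiter in this PR may differ in names and error types):

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Sketch only: class name and error type are assumptions.
class LimiterSketch(maxConcurrent: Int, timeoutSeconds: Long)(implicit ec: ExecutionContext) {
  private val semaphore = new Semaphore(maxConcurrent)

  // f is call-by-name: it is never evaluated unless a permit is acquired.
  def apply[T](f: => Future[T]): Future[T] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(new RuntimeException(s"Concurrency limit $maxConcurrent exceeded"))
    else
      try f.andThen { case _ => semaphore.release() }
      catch { case NonFatal(e) => semaphore.release(); Future.failed(e) }
}
```

With `maxConcurrent = 1` and a held permit, a second call with a side-effecting `f` should fail without the side effect firing, which is exactly the assertion the comment above asks for.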
33-33: Consider centralizing route test timeout configuration.
RouteTestTimeout(3.seconds)is now repeated across multiple suites. Moving this into a shared test trait/object would reduce duplication and keep timeout tuning consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala` at line 33, Multiple test suites duplicate the implicit val routeTestTimeout: RouteTestTimeout = RouteTestTimeout(3.seconds); extract this into a single shared test trait or object (e.g., TestTimeouts or RouteTestSupport) that defines the implicit val routeTestTimeout: RouteTestTimeout = RouteTestTimeout(3.seconds), then have HeaderAuthorizationTest and other suites mix in or extend that trait/object and remove the local implicit definitions so all tests use the centralized RouteTestTimeout configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala`:
- Around line 240-247: The new validation unconditionally rejects any facet-only
request (using hasFacets/hasQuery/hasFilter and throwing ValidationException),
which breaks existing facet-only usage; modify the guard in ParamValidator
(around hasFacets/hasQuery/hasFilter) so it only rejects unsupported/unbounded
facet types instead of all facet-only requests — implement a helper (e.g.,
isUnboundedFacet or facetRequiresQuery) that inspects each facet in facets and
change the condition to throw only when facets.exists(isUnboundedFacet) &&
!hasQuery && !hasFilter; keep supported facet-only cases (including the
coordinates/other tested facets) passing as before.
---
Nitpick comments:
In
`@src/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scala`:
- Line 33: Multiple test suites duplicate the implicit val routeTestTimeout:
RouteTestTimeout = RouteTestTimeout(3.seconds); extract this into a single
shared test trait or object (e.g., TestTimeouts or RouteTestSupport) that
defines the implicit val routeTestTimeout: RouteTestTimeout =
RouteTestTimeout(3.seconds), then have HeaderAuthorizationTest and other suites
mix in or extend that trait/object and remove the local implicit definitions so
all tests use the centralized RouteTestTimeout configuration.
In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala`:
- Around line 82-119: The test should also assert that the limiter uses
call-by-name and does not evaluate the passed expression when a permit is not
acquired: modify the third invocation of ConcurrencyLimiter.apply (the call
written as limiter(promises(2).future) / limiter(...)) to pass an expression
with a visible side effect (for example incrementing a mutable counter or
setting a flag) and after the failed future assertion verify that the side
effect did NOT occur; keep using the existing assertions on the thrown
ConcurrencyLimitExceeded (maxConcurrent and timeoutSeconds) and the same
promises for the first two futures so you only add the side-effect check around
the third request.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 695eef19-d0a8-4303-a162-4e3a008e77a4
📒 Files selected for processing (14)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/HeaderAuthorizationTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/PermittedMediaTypesTest.scalasrc/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
✅ Files skipped from review due to trivial changes (2)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
🚧 Files skipped from review as they are similar to previous changes (8)
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
- src/test/scala/dpla/api/v2/httpHeadersAndMethods/ResponseHeadersTest.scala
- src/test/scala/dpla/api/v2/endToEnd/MapperFailureTest.scala
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala
|
@coderabbitai review

✅ Actions performed
Review triggered.

This branch has merge conflicts with
…cala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…or.scala Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ing (#94) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: moltude <1419766+moltude@users.noreply.github.com>
filter=type: (no value after colon) produced Some(Seq.empty) because getValid silently drops empty strings. hasFilter.isDefined was then true with no filters applied, allowing high-cardinality facet requests past the facet guard. Split ":" once into parts, and use .collect to trim+filter in one pass. Empty values now throw ValidationException before reaching the guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bounds - facets and fields: add .filter(_.nonEmpty) so Some(Seq.empty) from validFields (which drops ignoredFields entries) normalizes to None, preventing hasFacets/fields.isDefined from being misleadingly true - validIntWithRange: make rule a lazy val so the error string is only allocated on failure paths, not on every successful validation - validIntWithRange: use $min instead of hardcoded 0 in error message Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r for fields - facets: revert .filter(_.nonEmpty); Some(Seq.empty) is the intended return when all requested facet fields are dropped (e.g. valid DPLA Map fields not facetable for items). Fix the guard instead: facets.nonEmpty checks the Option (always true for Some), but facets.exists(_.nonEmpty) correctly checks the inner seq. - fields: keep .filter(_.nonEmpty) — fieldRetrieval(Some(Nil)) sends "_source":[] to ES which suppresses all source fields; None returns ["*"] (all fields) which is the correct fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ter, op propagation
- ConcurrencyLimiter: handle InterruptedException from tryAcquire (restore
interrupt status, return Future.failed); narrow Throwable catch to NonFatal
to avoid swallowing fatal JVM errors
- ParamValidator: add maxFetchBatchSize=500 constant so fetch ID batch limit
is decoupled from search page_size (which was reduced to 100 in this PR)
- ParamValidator: reject filter values without a field:value separator —
filter=provider.name now throws ValidationException instead of being
silently parsed as Filter("provider.name", "provider.name")
- QueryBuilder: propagate op parameter into multi_match operator in
keywordQuery and singleFieldQuery — op="OR" was previously a no-op for
plain keyword and basic field queries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
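The filter parsing change described in this commit can be sketched as follows (`Filter` and `ValidationException` are stand-ins for the real types in ParamValidator):

```scala
final case class Filter(fieldName: String, value: String)
final case class ValidationException(msg: String) extends RuntimeException(msg)

// Split once on ":" and require both a field and a non-empty value;
// "filter=type:" and "filter=provider.name" both fail fast.
def parseFilter(filter: String): Seq[Filter] =
  filter.split(":", 2) match {
    case Array(field, rest) if field.nonEmpty && rest.trim.nonEmpty =>
      val values = rest.split("AND").toSeq.map(_.trim).filter(_.nonEmpty)
      if (values.isEmpty) throw ValidationException(s"$filter is not a valid filter")
      values.map(Filter(field, _))
    case _ =>
      throw ValidationException(s"$filter is not a valid filter")
  }
```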
- Revert track_total_hits from 10000 to true: the 10000 cap causes ES to return relation="gte" with value=10000 for large result sets, but the parsers in DPLAMAPJsonFormats and PssJsonFormats only read hits.total.value without checking relation, silently surfacing 10000 as an exact count - Replace asInstanceOf[JsArray] in filterQuery with JsArray(...toVector) to avoid the unsafe runtime cast - Differentiate sort_by_pin error messages: "sort_by_pin is only allowed when sort_by is the coordinates field" vs "The sort_by parameter is required." so callers know whether sort_by is missing or just wrong Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test - Remove the from+size>1000 guard in ParamValidator: with maxPage=10 and maxPageSize=100, the maximum reachable value is exactly 1000 and the strict >1000 check can never fire; it was a no-op safety net from when the limits were higher - Add test verifying that op="OR" propagates to multi_match.operator field, not just to the outer bool clause Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
requestedPage, requestedPageSize, and fromValue were left behind after removing the pagination depth guard — dead code with no references. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…int compatibility The guard rejecting facets without a query/filter was applied to all validators, breaking three integration tests that exercise facet-only requests on /v2/items (coordinates, dataProvider with size limit, basic dataProvider). Item facet-only requests are a documented and supported part of the public API contract. Introduces requiresQueryForFacets (default false) as an overridable protected val in ParamValidator. EbookParamValidator overrides it to true, keeping the restriction exactly where the unit test already specified it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
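The override pattern this commit describes can be sketched as below (names follow the commit message; the real trait is larger):

```scala
trait FacetGuard {
  // Default: facet-only requests are allowed (items endpoint).
  protected val requiresQueryForFacets: Boolean = false

  def facetsAllowed(hasFacets: Boolean, hasQuery: Boolean, hasFilter: Boolean): Boolean =
    !(requiresQueryForFacets && hasFacets && !hasQuery && !hasFilter)
}

object ItemGuard extends FacetGuard
object EbookGuard extends FacetGuard {
  override protected val requiresQueryForFacets: Boolean = true
}
```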
8cb077c to 815dc55 (Compare)

@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala`:
- Around line 162-177: The keywordQuery currently silently falls back to
keywordQueryFields when fields is empty; change this to fail fast by enforcing
require(fields.nonEmpty, "keywordQuery requires at least one target field")
inside keywordQuery, and update call sites that pass
Seq(getElasticSearchField(fieldQuery.fieldName)).flatten to check for an empty
sequence and throw a clear exception (matching the range-query/facets pattern)
when a field has no ES mapping; ensure callers either pass an explicit fallback
(keywordQueryFields) or propagate the error so missing field mappings are not
silently broadened.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3547985f-528d-471d-9050-bbe443b4b30e
📒 Files selected for processing (10)
src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scalasrc/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scalasrc/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scalasrc/test/scala/dpla/api/helpers/FileReader.scalasrc/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scalasrc/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scalasrc/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scalasrc/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
✅ Files skipped from review due to trivial changes (2)
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala
🚧 Files skipped from review as they are similar to previous changes (5)
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
      /** A general keyword query on the given fields. Uses multi_match for
       * best_fields with lenient parsing.
       */
      private def keywordQuery(q: String, fields: Seq[String], op: String = "AND"): JsObject = {
        val targetFields = if (fields.nonEmpty) fields else keywordQueryFields

        JsObject(
    -     "query_string" -> JsObject(
    -       "fields" -> fields.toJson,
    +     "multi_match" -> JsObject(
            "query" -> q.toJson,
            "analyze_wildcard" -> true.toJson,
    -       "default_operator" -> "AND".toJson,
    +       "fields" -> targetFields.toJson,
    +       "type" -> "best_fields".toJson,
    +       "operator" -> op.toJson,
            "lenient" -> true.toJson
          )
        )
      }
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether any field in searchableDataFields can return None from getElasticSearchField.
ast-grep --pattern $'def getElasticSearchField($_) = $$$'
rg -nP --type=scala -C3 '\bsearchableDataFields\b|\bgetElasticSearchField\b'

Repository: dpla/api
Length of output: 10480
Silent fallback to default keyword fields when field mapping is missing.
At line 166, keywordQuery falls back to keywordQueryFields when fields is empty. This silent fallback is inconsistent with how range queries (lines 212–215, 230–233) and facets (line 324–326) handle missing field mappings—they explicitly throw.
In the basic field-query path (line 272), Seq(getElasticSearchField(fieldQuery.fieldName)).flatten produces an empty Seq if the field has no ES mapping. When passed to keywordQuery, this empty Seq triggers the fallback, silently broadening a user's field-scoped query into a general keyword search. While ParamValidator gates on searchableDataFields so this shouldn't occur in practice, the silent fallback masks potential definition mismatches rather than failing fast.
Either add require(fields.nonEmpty, "keywordQuery requires at least one target field") and update call sites to check before calling (or explicitly pass keywordQueryFields for fallback cases), or restructure line 272–273 to throw when the field has no mapping, matching the range-query pattern.
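A fail-fast sketch of the first suggestion, using plain Maps in place of spray-json objects (illustrative only; the real builder returns a JsObject):

```scala
// Illustrative only: real code uses spray-json JsObject values.
def keywordQuerySketch(q: String, fields: Seq[String], op: String = "AND"): Map[String, Any] = {
  require(fields.nonEmpty, "keywordQuery requires at least one target field")
  Map(
    "multi_match" -> Map(
      "query" -> q,
      "fields" -> fields,
      "operator" -> op,
      "lenient" -> true
    )
  )
}
```

Callers resolving an unmapped field to an empty `Seq` would then fail loudly instead of silently broadening the query.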
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` around
lines 162 - 177, The keywordQuery currently silently falls back to
keywordQueryFields when fields is empty; change this to fail fast by enforcing
require(fields.nonEmpty, "keywordQuery requires at least one target field")
inside keywordQuery, and update call sites that pass
Seq(getElasticSearchField(fieldQuery.fieldName)).flatten to check for an empty
sequence and throw a clear exception (matching the range-query/facets pattern)
when a field has no ES mapping; ensure callers either pass an explicit fallback
(keywordQueryFields) or propagate the error so missing field mappings are not
silently broadened.
|
@coderabbitai full review

✅ Actions performed
Full review triggered.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
255-267: ⚠️ Potential issue | 🟡 Minor
Splitting on bare "AND"/"OR" can fragment values containing those substrings.
.split("AND") / .split("OR") operate on the raw (post-strip) value with no word boundary. Values like "BRAND NEW", "WORLD HISTORY", "LANDSCAPES", or "LABOR UNIONS" will be split on the embedded AND/OR, producing bogus term subqueries (e.g., "BR" + " NEW"), which silently return zero/irrelevant matches on the .not_analyzed subfield.
This is pre-existing behavior, but given this PR expands the exact-match test surface (quotes, empty, leading/trailing) it'd be a good moment to either:
- split on word-boundaried tokens (e.g., split("\\s+AND\\s+") then split("\\s+OR\\s+")), or
- require the quoted-form contract ("\"X\" AND \"Y\"") and split only on the quote-delimited boundary.
At minimum, consider adding a regression test covering a single value containing AND/OR as a substring to lock in the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` around lines 255 - 267, The current splitting in QueryBuilder.scala (in the block that builds `values` from `fieldQuery.value`) naively uses .split("AND")/.split("OR") and therefore breaks tokens that contain those letter sequences; change the splitting to use word-boundary-aware regexes (e.g., split on "\\s+AND\\s+" and "\\s+OR\\s+" or use "\\bAND\\b"/"\\bOR\\b") or alternately only split when the input matches the quoted-pair contract (quote-delimited tokens), then trim and re-strip quotes as before; update the `values` construction logic in the same function/method that creates the JsObject `term` entries to use the new splitting approach and add a regression test that asserts a single value containing "AND" or "OR" (e.g., "LANDSCAPES", "BRAND NEW") is not split into multiple terms.
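A word-boundary-aware split along the lines of the first option might look like this. It is a sketch, not the actual QueryBuilder code: quote-stripping and trimming are simplified, and only the splitting behavior is demonstrated.

```scala
object BooleanSplitSketch {

  // Split only on AND/OR surrounded by whitespace, so the operator must
  // stand alone as a token. "\\s+AND\\s+" will not match inside values
  // like LANDSCAPES or BRAND NEW, because the embedded letters are not
  // whitespace-delimited.
  def splitBoolean(value: String): Seq[String] =
    value
      .split("\\s+AND\\s+|\\s+OR\\s+")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toSeq

  def main(args: Array[String]): Unit = {
    println(splitBoolean("\"X\" AND \"Y\"")) // splits into two terms
    println(splitBoolean("BRAND NEW"))       // stays a single term
    println(splitBoolean("LANDSCAPES"))      // stays a single term
  }
}
```

A regression test asserting that splitBoolean("BRAND NEW") yields exactly one element would lock in this behavior against future refactors.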
🧹 Nitpick comments (2)
src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala (1)
165-165: Minor: default op = "AND" is dead on both signatures.
Both callers (query at line 136 and singleFieldQuery at line 273) now pass op explicitly, so the default values on keywordQuery and singleFieldQuery are never exercised. If the intent is "callers must thread op," drop the defaults so a future caller can't silently get "AND" when the request was op=OR.
Proposed diff
- private def keywordQuery(q: String, fields: Seq[String], op: String = "AND"): JsObject = {
+ private def keywordQuery(q: String, fields: Seq[String], op: String): JsObject = {
@@
- private def singleFieldQuery(
-   fieldQuery: FieldQuery,
-   exactFieldMatch: Boolean,
-   op: String = "AND"
- ): Seq[JsObject] =
+ private def singleFieldQuery(
+   fieldQuery: FieldQuery,
+   exactFieldMatch: Boolean,
+   op: String
+ ): Seq[JsObject] =

Also applies to: 204-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala` at line 165, Remove the dead default "AND" for the op parameter so callers must supply op explicitly: update the method signatures for keywordQuery (private def keywordQuery(q: String, fields: Seq[String], op: String = "AND")) and both singleFieldQuery overloads to remove the default op="AND" (i.e., make op: String a required parameter), and adjust any internal references if needed; this prevents silently using "AND" when callers explicitly pass OR and matches existing callsites (query and singleFieldQuery) that already provide op.src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala (1)
98-106: Assert rejected work is not evaluated.
This test proves the third request times out, but promises(2).future is harmless if evaluated early. Add a side-effect assertion so the call-by-name contract is covered: when no permit is acquired, f must not be constructed.
Suggested test hardening
- val future3 = limiter(promises(2).future)
+ var thirdRequestEvaluated = false
+ val future3 = limiter {
+   thirdRequestEvaluated = true
+   promises(2).future
+ }
  whenReady(future3.failed) { ex =>
    ex shouldBe a[ConcurrencyLimitExceeded]
    val cle = ex.asInstanceOf[ConcurrencyLimitExceeded]
    cle.maxConcurrent shouldBe 2
    cle.timeoutSeconds shouldBe 1
  }
+
+ thirdRequestEvaluated shouldBe false
Verify each finding against the current code and only fix it if needed. In `@src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala` around lines 98 - 106, The test currently passes promises(2).future which can be constructed eagerly; change the invocation to pass a by-name thunk that has an observable side effect (e.g. incrementing a var, setting an AtomicBoolean, or throwing if evaluated) so you can assert the thunk was not executed when the limiter rejects; update the third request call to use that thunk (referencing limiter and the third promise/promise index previously used, e.g. promises(2)) and after asserting the failure (ConcurrencyLimitExceeded) assert the side-effect flag remains unchanged to prove the work was not evaluated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0c9fac11-4b19-45c2-9053-b6ac4dea4b36
📒 Files selected for processing (10)
- src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala
- src/main/scala/dpla/api/v2/search/paramValidators/ParamValidator.scala
- src/main/scala/dpla/api/v2/search/queryBuilders/QueryBuilder.scala
- src/test/scala/dpla/api/helpers/FileReader.scala
- src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
- src/test/scala/dpla/api/v2/search/ConcurrencyLimiterTest.scala
- src/test/scala/dpla/api/v2/search/models/DPLAMAPFieldsTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala
- src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala
- src/test/scala/dpla/api/v2/search/queryBuilders/QueryBuilderTest.scala
class ConcurrencyLimiter(
  val maxConcurrent: Int,
  val timeoutSeconds: Long
) {
  require(
    maxConcurrent > 0,
    s"maxConcurrent must be positive, got: $maxConcurrent"
  )
  require(
    timeoutSeconds > 0,
    s"timeoutSeconds must be positive, got: $timeoutSeconds"
  )

  private val semaphore = new Semaphore(maxConcurrent)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Expectation:
# - Production search/ES code should instantiate or receive ConcurrencyLimiter.
# - Legacy inline tryAcquire wrappers should be removed or intentionally documented.
printf '\n== ConcurrencyLimiter references ==\n'
rg -n -C4 --type scala '\bnew\s+ConcurrencyLimiter\b|\bConcurrencyLimiter\s*\('
printf '\n== Remaining inline semaphore limiter patterns ==\n'
rg -n -C6 --type scala '\bwithConcurrencyLimit\s*\(|\.tryAcquire\s*\(|case\s+e:\s*Throwable|case\s+NonFatal'

Repository: dpla/api
Length of output: 12060
Wire ConcurrencyLimiter into the ES request path to eliminate duplication.
ConcurrencyLimiter is defined and tested, but production code still bypasses it. ElasticSearchClient.withConcurrencyLimit (lines 95–115) implements its own semaphore wrapper with identical logic. Either integrate ConcurrencyLimiter into the ES request path or remove the inline implementation to prevent maintenance divergence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/scala/dpla/api/v2/search/ConcurrencyLimiter.scala` around lines 26 -
39, The project has duplicate semaphore logic: ConcurrencyLimiter is implemented
but ElasticSearchClient.withConcurrencyLimit contains an inline semaphore
wrapper; replace that inline logic by instantiating and using ConcurrencyLimiter
(or delegating to its methods) inside ElasticSearchClient.withConcurrencyLimit
so all ES requests go through the shared ConcurrencyLimiter instance (or remove
the inline wrapper and call ConcurrencyLimiter directly), ensuring you reuse the
ConcurrencyLimiter constructor/timeout behavior and its acquire/release
semantics rather than duplicating semaphore code.
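A minimal sketch of that delegation could look like the following. The limiter class below is a compressed stand-in for the reviewed ConcurrencyLimiter (its apply signature, timeout behavior, and exception type are assumptions inferred from the review, not the real class), included only so the example is self-contained.

```scala
import java.util.concurrent.{Semaphore, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

// Stand-in for dpla.api.v2.search.ConcurrencyLimiter; signature assumed.
class LimiterSketch(maxConcurrent: Int, timeoutSeconds: Long) {
  private val semaphore = new Semaphore(maxConcurrent)

  // Takes the work by name so nothing runs unless a permit is acquired.
  def apply[T](f: => Future[T])(implicit ec: ExecutionContext): Future[T] =
    if (!semaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS))
      Future.failed(new RuntimeException("concurrency limit exceeded"))
    else
      try f.andThen { case _ => semaphore.release() }
      catch {
        case e: Throwable =>
          semaphore.release()
          Future.failed(e)
      }
}

// The ES client then delegates instead of re-implementing the semaphore:
class EsClientSketch(limiter: LimiterSketch)(implicit ec: ExecutionContext) {
  def withConcurrencyLimit[T](request: => Future[T]): Future[T] =
    limiter(request)
}
```

Delegating keeps a single acquire/release implementation, so a future fix (e.g., narrowing the Throwable catch to NonFatal) lands in one place.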
import dpla.api.v2.registry.{
  MockEbookRegistry,
  MockSmrRegistry,
  SearchRegistryCommand,
  SmrRegistryCommand
}
import dpla.api.v2.search.MockEbookSearch
import dpla.api.v2.search.SearchProtocol.SearchCommand
Resolve the ebook route/test wiring before merging.
This currently blocks compilation: CI reports missing MockEbookRegistry, missing MockEbookSearch, missing ebookAnalyticsClient, and Routes still accepts 4 constructor arguments, not the 5 passed on Lines 60-66. Either restore/add the ebook route support and test mocks end-to-end, or remove/revert the /v2/ebooks coverage from this suite if ebooks were intentionally removed.
Also applies to: 42-51, 60-66, 101-139
🧰 Tools
🪛 GitHub Actions: Scala CI
[error] 11-11: Scala compilation error: object MockEbookRegistry is not a member of package dpla.api.v2.registry
[error] 17-17: Scala compilation error: object MockEbookSearch is not a member of package dpla.api.v2.search
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala` around lines 11
- 18, The test suite references removed ebook wiring causing compile failures:
missing MockEbookRegistry, MockEbookSearch, ebookAnalyticsClient, and a Routes
constructor mismatch; either restore full ebook route support (add back
MockEbookRegistry and MockEbookSearch test doubles, provide a mock
ebookAnalyticsClient, and update the Routes class or its companion to accept the
ebook dependency so the call site using the 5-arg Routes constructor compiles)
or remove the ebook-specific imports/usages and tests (/v2/ebooks) and revert
the Routes instantiation to the existing 4-arg constructor; update imports
(remove MockEbookRegistry/MockEbookSearch) and adjust the test code that
constructs Routes and any SearchCommand/RegistryCommand usages accordingly.
  "still pass a well-formed key through to the registry" in {
-   val request = Get(s"/v2/items?api_key=$fakeApiKey")
+   val request = Get(s"/v2/ebooks?api_key=$fakeApiKey")
    request ~> Route.seal(routes) ~> check {
      status should not be StatusCodes.Forbidden
+     status should not be StatusCodes.NotFound
    }
  }
Guard the pass-through test against a missing route.
After switching to /v2/ebooks, this test would still pass on 404 NotFound because it only excludes Forbidden. Keep the negative assertion style, but also reject NotFound.
Proposed test assertion update
  request ~> Route.seal(routes) ~> check {
    status should not be StatusCodes.Forbidden
+   status should not be StatusCodes.NotFound
  }

Based on learnings, this test intentionally uses negative assertions and should not assert HTTP 200 because CI may not have backend availability.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"still pass a well-formed key through to the registry" in {
  val request = Get(s"/v2/ebooks?api_key=$fakeApiKey")
  request ~> Route.seal(routes) ~> check {
    status should not be StatusCodes.Forbidden
    status should not be StatusCodes.NotFound
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala` around lines 93
- 98, The test "still pass a well-formed key through to the registry" currently
only asserts status should not be StatusCodes.Forbidden and therefore would
incorrectly pass on 404 after switching to /v2/ebooks; update the assertion
inside the check block (the block that follows
Get(s"/v2/ebooks?api_key=$fakeApiKey") ~> Route.seal(routes) ~> check) to also
reject StatusCodes.NotFound (e.g., assert status is neither
StatusCodes.Forbidden nor StatusCodes.NotFound) while keeping the
negative-assertion style and not requiring a 200 OK.
  val given = "sourceResource.subtitle"
  val expected = Some(Seq())
- val params = Map("facets" -> given)
+ val params = Map("facets" -> given, "q" -> "test")
  itemParamValidator ! RawSearchParams(params, replyProbe.ref)
  val msg = interProbe.expectMessageType[ValidSearchParams]
  msg.params.facets shouldEqual expected
Keep this item test facet-only.
Adding q here weakens coverage of the item contract: item facet-only requests should remain valid, and this case specifically verifies facets = Some(Seq.empty) after ignored-field filtering.
Proposed test adjustment
- val params = Map("facets" -> given, "q" -> "test")
+ val params = Map("facets" -> given)

Based on learnings: facets should preserve Some(Seq.empty), and the facet query/filter requirement must not falsely treat that state as a real facet request.
📝 Committable suggestion
val given = "sourceResource.subtitle"
val expected = Some(Seq())
val params = Map("facets" -> given)
itemParamValidator ! RawSearchParams(params, replyProbe.ref)
val msg = interProbe.expectMessageType[ValidSearchParams]
msg.params.facets shouldEqual expected
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/test/scala/dpla/api/v2/search/paramValidators/ItemParamValidatorTest.scala`
around lines 34 - 39, The test in ItemParamValidatorTest is no longer facet-only
because it adds a "q" param which weakens the contract; update the test to only
include the facets param (remove "q" from the params map) so that invoking
itemParamValidator ! RawSearchParams(params, replyProbe.ref) verifies that
ValidSearchParams.msg.params.facets remains Some(Seq()) after ignored-field
filtering; keep the existing assertions (expectMessageType[ValidSearchParams]
and shouldEqual expected) and references to itemParamValidator, RawSearchParams,
replyProbe, and interProbe unchanged.
val ebookParamValidator: ActorRef[IntermediateSearchResult] =
  testKit.spawn(EbookParamValidator(interProbe.ref))
Restore or replace the missing EbookParamValidator.
The Scala CI failure is definitive: Line 28 references EbookParamValidator, but that symbol is not defined, so the test suite cannot compile. If ebook validation still exists, add/restore the validator object; if ebooks were removed, switch these tests to an existing validator plus a test-only fixture for the requiresQueryForFacets = true behavior.
🧰 Tools
🪛 GitHub Actions: Scala CI
[error] 28-28: Scala compilation error: not found value EbookParamValidator
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/test/scala/dpla/api/v2/search/paramValidators/ParamValidatorTest.scala`
around lines 27 - 28, The test references a missing validator: restore or
replace the EbookParamValidator referenced by ebookParamValidator (and used with
IntermediateSearchResult and interProbe.ref) so the test compiles; either
reintroduce the EbookParamValidator object with the same API used in
ParamValidatorTest (including its constructor/behavior used by testKit.spawn and
the requiresQueryForFacets flag) or change the test to instantiate an existing
validator and add a test-only fixture that sets requiresQueryForFacets = true to
exercise the same behavior currently expected from EbookParamValidator.
Phase 2 of API improvements
This PR refactors validation and query composition and adds a concurrency-limiting utility. Below is a summary of the substantive changes, client impact, deployment/infra considerations, and reviewer guidance:
Key changes
Client- and API-impact summary
Security implications
Deployment notes
Reviewer / maintainer guidance
Files of interest (high-level)
If you want, I can: