Validate monitoring_location_id format in waterdata functions by thodson-usgs · Pull Request #229 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-04-13T22:53:40Z

Summary

Closes #188.

Passing an integer or a bare string (without the required AGENCY-ID prefix) to a waterdata function silently wasted an API call and returned confusing empty results. Now all 10 public functions that accept monitoring_location_id validate the input immediately, normalize iterables to a list, and raise before any network call when the input is wrong.

Functions covered: get_daily, get_continuous, get_monitoring_locations, get_time_series_metadata, get_latest_continuous, get_latest_daily, get_field_measurements, get_stats_por, get_stats_date_range, get_channel

Accepted shapes (monitoring_location_id=...):

None — optional parameter, passes through
A single string in AGENCY-ID format (e.g. "USGS-01646500")
Any non-string iterable of strings — list, tuple, pandas.Series, pandas.Index, numpy.ndarray, generators, etc. Iterables are materialized to a list internally so the downstream comma-join / POST-CQL2 logic sees a uniform shape.

Rejected shapes:

int, float, etc. (not iterable) → TypeError
A string missing the hyphen (e.g. "01646500") → ValueError ("AGENCY-ID format")
Iterable whose elements aren't strings (e.g. pd.Series([1, 2, 3]), np.array([1, 2, 3])) → TypeError
dict (and other Mapping types) → TypeError. Iterating a dict yields keys, which would be a footgun; if you really want the keys, pass list(d) explicitly.

The validation rule mirrors the OGC API spec, which describes monitoring_location_id as "the agency code … and the ID number of the monitoring location, separated by a hyphen (e.g. USGS-02238500)". The spec doesn't carry a machine-readable pattern constraint — the server returns an empty result for malformed input rather than a 400 — so this is a client-side safety net.

Live reproduction

Both calls below hit api.waterdata.usgs.gov pre-fix and silently return an empty DataFrame with no indication that anything went wrong. Post-fix, both raise locally before any network call.

from dataretrieval.waterdata import get_daily

# 1. int instead of "AGENCY-ID" string
df, _ = get_daily(monitoring_location_id=5129115, parameter_code="00060")
# Pre-fix:  empty DataFrame (silent — API was hit)
# Post-fix: TypeError: monitoring_location_id must be a string or iterable of strings, not int.
#                      Expected format: 'AGENCY-ID', e.g., 'USGS-5129115'.

# 2. bare string without "AGENCY-" prefix
df, _ = get_daily(monitoring_location_id="dog", parameter_code="00060")
# Pre-fix:  empty DataFrame (silent — API was hit)
# Post-fix: ValueError: Invalid monitoring_location_id: 'dog'.
#                       Expected 'AGENCY-ID' format, e.g., 'USGS-01646500'.

Iterable shapes also live-verified — these all succeed post-fix (3 rows each, against api.waterdata.usgs.gov):

import pandas as pd
import numpy as np
from dataretrieval.waterdata import get_daily

for value in [
    ("USGS-01646500",),                   # tuple
    pd.Series(["USGS-01646500"]),
    pd.Index(["USGS-01646500"]),
    np.array(["USGS-01646500"]),
]:
    df, _ = get_daily(
        monitoring_location_id=value,
        parameter_code="00060",
        time="2024-06-01/2024-06-03",
    )
    assert len(df) == 3

Pre-fix, tuple would have round-tripped (downstream isinstance(v, (list, tuple)) accepted it); the pandas/numpy shapes would have been str-serialized by requests into a broken URL and silently returned empty.

Test plan

16 unit tests in TestCheckMonitoringLocationId covering: valid string / list / tuple / pd.Series / pd.Index / np.ndarray; None; bare integer; integer in list; pd.Series / np.ndarray of ints; missing agency prefix; bare site number; dict rejection; end-to-end get_daily checks for both error types
Full test suite passes (260/260, 2 skipped)
Live verification against api.waterdata.usgs.gov for both the rejection cases (above) and the iterable-acceptance cases (3 rows each for tuple / Series / Index / ndarray)

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds client-side validation/normalization for monitoring_location_id across the WaterData public API to fail fast on malformed IDs and avoid wasting network calls (closes #188).

Changes:

Introduces _check_monitoring_location_id helper in dataretrieval.waterdata.utils to validate AGENCY-ID format and normalize supported iterables to list.
Calls the new validator early in the 10 WaterData public functions that accept monitoring_location_id.
Adds unit tests covering valid/invalid scalar and iterable inputs, plus regression checks via get_daily.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`dataretrieval/waterdata/utils.py`	Adds the validation/normalization helper used by the public API.
`dataretrieval/waterdata/api.py`	Wires validation into the WaterData public entry points before request construction.
`tests/waterdata_test.py`	Adds regression/unit tests for accepted and rejected `monitoring_location_id` shapes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

1. TypeError messages now use a fixed example ("USGS-01646500") instead of interpolating the user's value into the suggestion template. The offending value is still surfaced via "(got {value!r})" so the user can see what they actually passed — but mappings, large objects, etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}". 2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`. Same effective behavior, but `fullmatch` makes the intent explicit and removes the redundant anchor characters. 3. Widened the type annotations on all 10 public waterdata functions that accept `monitoring_location_id` from `str | list[str] | None` to `str | Iterable[str] | None`, matching the runtime contract that accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc. Added `from collections.abc import Iterable` to api.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

The second example block in `get_reference_table`'s docstring was unrunnable: `collection="parameter-codes"` and `query={...}` were on separate lines with no comma between them, so the doctest text doesn't parse as a valid call. Surfaced from PR DOI-USGS#229's verification pass where I executed each public function's docstring example against the live USGS OGC API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The second example block in `get_reference_table`'s docstring was unrunnable: `collection="parameter-codes"` and `query={...}` were on separate lines with no comma between them, so the doctest text doesn't parse as a valid call. Surfaced from PR #229's verification pass where I executed each public function's docstring example against the live USGS OGC API. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1. TypeError messages now use a fixed example ("USGS-01646500") instead of interpolating the user's value into the suggestion template. The offending value is still surfaced via "(got {value!r})" so the user can see what they actually passed — but mappings, large objects, etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}". 2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`. Same effective behavior, but `fullmatch` makes the intent explicit and removes the redundant anchor characters. 3. Widened the type annotations on all 10 public waterdata functions that accept `monitoring_location_id` from `str | list[str] | None` to `str | Iterable[str] | None`, matching the runtime contract that accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc. Added `from collections.abc import Iterable` to api.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Passing an integer (e.g. 5129115) or a bare string without an agency prefix (e.g. "dog") to any waterdata function silently wasted an API call and returned empty data. Now all ten public functions that accept monitoring_location_id raise before touching the network: - TypeError if the value is not a string or list of strings - ValueError if any string doesn't match the 'AGENCY-ID' format (e.g. 'USGS-01646500') Closes DOI-USGS#188. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous `(str, list)` isinstance check rejected legitimate inputs: tuple, pandas.Series, pandas.Index, numpy.ndarray, generators. Pre-fix these would round-trip through requests and either work (tuple) or silently break the URL (numpy/pandas). Post-fix the function: - Accepts any non-string iterable whose elements are strings - Materializes the iterable to a list so downstream comma-join / POST-CQL2 logic in _construct_api_requests keeps working uniformly - Returns the (possibly-normalized) value, so callers reassign: monitoring_location_id = _check_monitoring_location_id(monitoring_location_id) - Rejects Mapping (e.g. dict) explicitly — iterating a dict yields keys, which is a footgun Live-verified against api.waterdata.usgs.gov: passing tuple, pd.Series, pd.Index, and np.ndarray of "USGS-01646500" all return 3 rows for the 2024-06-01/2024-06-03 window. Passing pd.Series([1646500]) raises TypeError("monitoring_location_id elements must be strings, not int...") before any network call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nts) Internal refactor; behavior unchanged. - Add full type hints: ``str | Iterable[str] | None`` in, ``str | list[str] | None`` out, mirroring the runtime contract. - Split into explicit fast paths instead of wrapping-then-unwrapping: the string case validates and returns directly, eliminating the throwaway ``[monitoring_location_id]`` list and the final ``isinstance(str)`` re-check at return. - Invert the Mapping/Iterable check: reject ``Mapping`` (and non-iterables) up front, then ``list()`` the iterable. Reads more linearly than the compound ``isinstance(Iterable) and not isinstance(Mapping)``. - Extract ``_check_id_format(value)`` for the regex/ValueError pair so the format rule lives in one place and the call sites are one-liners. Tests unchanged; 16 validator tests + full suite still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1. TypeError messages now use a fixed example ("USGS-01646500") instead of interpolating the user's value into the suggestion template. The offending value is still surfaced via "(got {value!r})" so the user can see what they actually passed — but mappings, large objects, etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}". 2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`. Same effective behavior, but `fullmatch` makes the intent explicit and removes the redundant anchor characters. 3. Widened the type annotations on all 10 public waterdata functions that accept `monitoring_location_id` from `str | list[str] | None` to `str | Iterable[str] | None`, matching the runtime contract that accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc. Added `from collections.abc import Iterable` to api.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extracts the type-and-iterable normalization out of _check_monitoring_location_id into a reusable helper, then wires it into every public waterdata getter at every multi-value string parameter. Closes the gap noted in PR review: previously only `monitoring_location_id` accepted pd.Series / np.ndarray / tuple; other params (parameter_code, statistic_id, state_name, etc.) still silently str-serialized non-list iterables into the request. Changes: - dataretrieval/waterdata/utils.py: new `_normalize_str_iterable(value, param_name)` does the None / str / non-Mapping-Iterable dispatch and per-element type check. `_check_monitoring_location_id` now wraps it and adds the AGENCY-ID hyphen check, which remains monitoring_location_id-specific. - dataretrieval/waterdata/api.py: 153 normalization calls inserted across 11 public functions (get_daily, get_continuous, get_monitoring_locations, get_time_series_metadata, get_latest_continuous, get_latest_daily, get_field_measurements, get_samples, get_stats_por, get_stats_date_range, get_channel). 170 type annotations widened from `str | list[str] | None` / `list[str] | None` to `str | Iterable[str] | None`. Added Iterable import and `_normalize_str_iterable` to imports. Excluded (kept as-is): time-range params (time, last_modified, begin, end, begin_utc, end_utc, datetime) — these have special semantics in _format_api_dates (single-string or two-element range). Out of scope; users who need iterable support there can `.tolist()`. - tests/waterdata_test.py: new TestNormalizeStrIterable class with 10 tests covering str / list / tuple / pd.Series / pd.Index / np.ndarray acceptance, plus int / dict rejection, plus an integration check (mock.patch on get_ogc_data) confirming that passing pd.Series for parameter_code arrives at the inner call as a list, not a stringified Series. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three small follow-ups to c206c0c addressing review findings: 1. Restore the AGENCY-ID hint in _check_monitoring_location_id's TypeError. The refactor through _normalize_str_iterable accidentally dropped the trailing "Expected 'AGENCY-ID' format, e.g., 'USGS-01646500'." that the original wrapper carried. Catch+re-raise so monitoring_location_id keeps its helpful error while the generic helper stays generic. Pinned with a new assertion in test_integer_raises_type_error. 2. Document the date-range exclusion in _normalize_str_iterable's docstring. `time`, `last_modified`, `begin`, `end`, `datetime` bypass this helper because _format_api_dates handles their single-string-or-range semantics inside _construct_api_requests; the exclusion lived only in the prior commit message. 3. Single-pass validation. The helper previously did `values = list(value)` followed by a separate `for v in values` isinstance loop. Folded into one loop that builds the list while validating per-element. Plus tests: hoist `from unittest import mock` to module level (matches the rest of the repo's test files) instead of importing it inside the test body. No behavior change for valid inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses Copilot's review DOI-USGS#5 (collapse mechanical per-param calls) and DOI-USGS#1/DOI-USGS#2 along the way: - `_get_args` now normalizes every multi-value string param it sees, using: * a tiny ``_NO_NORMALIZE_PARAMS`` set for the few names that need explicit bypass (``monitoring_location_id``, validated separately; ``time``/``last_modified``/``begin``/``end``/``datetime``, which can contain ``pd.NaT``/None and are parsed by ``_format_api_dates``); * runtime type detection for the rest: scalar non-string knobs (``limit``, ``ssl_check``, ``convert_type``, ``skip_geometry``, ...) and ``list[float]`` params (``bbox``, ``boundingBox``) pass through automatically without listing them. - Strip 153 per-function ``_normalize_str_iterable(...)`` assignments from ``dataretrieval/waterdata/api.py`` and drop the now-unused import. Net: -240 LOC in api.py. - Tighten ``_MONITORING_LOCATION_ID_RE`` from ``.+-.+`` to ``[^-\s]+-[^-\s]+`` (Copilot DOI-USGS#2). Now rejects values with leading/ trailing whitespace or multiple hyphens, which used to pass and then silently return 0 rows from the API. - Fix the ``_normalize_str_iterable`` docstring claim "Used by every public waterdata getter" — now accurate ("from ``_get_args`` for every multi-value string parameter on every waterdata getter that uses ``_get_args``"; ``get_nearest_continuous`` and a few others don't). 26 normalizer/validator tests still pass; 267 + 2 skipped + 4 deselected in the full suite (the 4 deselected are flaky live-API tests that 502 intermittently — unrelated). Ruff lint + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously `time`/`datetime`/`last_modified`/`begin`/`end` were typed as `str | Iterable[str] | None`, but the implementation used `len(...)` and subscripting — generators and other non-Sequence iterables would have raised at runtime, contradicting the annotation. Add a single `list(...)` materialization line right after the str wrap, so any iterable (pandas.Series, numpy.ndarray, generators, sets) flows through cleanly. The half-bounded NaT/None range form is preserved. Verified by passing list / tuple / Series / generator / `[pd.NaT, ...]` through and getting the expected formatted output in each case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three small follow-ups to the centralization: 1. Extract `_DATE_RANGE_PARAMS = frozenset({"datetime", "last_modified", "begin", "end", "time"})` once at module level. `_construct_api_requests` previously defined the same set twice (`single_params` for POST/GET routing, `time_periods` for the `_format_api_dates` call); the new `_NO_NORMALIZE_PARAMS` overlapped on the same five names. All three now reuse `_DATE_RANGE_PARAMS`. A future date param means one edit, not three. `_NO_NORMALIZE_PARAMS = _DATE_RANGE_PARAMS | {"monitoring_location_id"}`. 2. Trim `_normalize_str_iterable` docstring from ~37 lines to ~20: drop the over-narration of callers and downstream branching; keep the contract (accepted shapes, return shape, raises). 3. Tighten the `_NO_NORMALIZE_PARAMS` comment to one short paragraph (was 11 lines) and inline the four-branch pass-through cascade in `_get_args` into a single boolean `if ... or ... or ... or ...:` so the per-branch noise comments drop away. Behavior unchanged. 26 normalizer/validator + 22 waterdata_utils tests pass; full suite 267 passed + 2 skipped + 4 deselected (flaky live-API 502s); ruff lint + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adopt named type aliases so the 180 signatures across api.py read as semantic intent ("a StringFilter / StringList / DateRange") instead of the mechanical "str | Iterable[str] | None" busy-work. Plus four correctness fixes from the latest Copilot pass. types.py: StringFilter = Optional[Union[str, Iterable[str]]] Multi-value filter: str OR iterable of str. The runtime normalizes iterables to list inside _get_args. StringList = Optional[Iterable[str]] Comma-joined list of property names. Excludes single str at the type level because ",".join(str) would iterate characters. api.py: - 172 `str | Iterable[str] | None` -> `StringFilter` - 8 `properties: ... | None` -> `properties: StringList` - Drop the now-unused `from collections.abc import Iterable` Correctness fixes: 1. _format_api_dates: handle `None` up front. The new `list(...)` materialization for Series/ndarray/generator support crashed on None even though the signature/docstring promised acceptance. (Copilot DOI-USGS#1) 2. _get_args: add `_LIST_ONLY_STR_PARAMS = {"properties"}` and wrap stray single-string input into a one-element list, so `",".join(properties)` downstream stays safe. (Copilot DOI-USGS#3) 3. _construct_api_requests: `if bbox:` -> `if bbox is not None and len(bbox) > 0`. The truthy check raised ValueError when bbox was a numpy.ndarray with >1 element. Use len() instead of truthy. (Copilot DOI-USGS#2) 4. _NO_NORMALIZE_PARAMS: add `bbox`, `boundingBox`. These are list[float] params; the previous runtime-type heuristic in _get_args handled list/tuple of floats but a numpy.ndarray of floats would have been routed through `_normalize_str_iterable` and rejected as "elements must be strings". (Copilot DOI-USGS#2) Full suite 267 passed + 2 skipped + 4 deselected (flaky live-API 502s); ruff lint + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per code-review feedback: domain-specific type aliases hide what the annotations actually mean. PEP 604 unions are the pythonic idiom and show the caller exactly what's accepted at the call site. - `StringFilter` (172 sites) -> `str | Iterable[str] | None` - `StringList` (8 sites) -> `list[str] | None` Runtime behavior is unchanged: `_get_args` still routes filter params through `_normalize_str_iterable` (accepts str, list, tuple, Series, ndarray, generator) and `properties` through `_LIST_ONLY_STR_PARAMS` (wraps stray single-string input into a one-element list).

…rough Live stress test found that `parameter_code=[60, 65]` (ints) was silently passed to the OGC API, surfacing as a confusing JSONDecodeError when the server returned an error page. The "list-of-non-str pass-through" clause in `_get_args` was a defensive shortcut intended for `bbox`/`boundingBox` (which are `list[float]`), but those params are already covered by `_NO_NORMALIZE_PARAMS`, making the clause redundant AND bug-silencing. Now `_normalize_str_iterable` runs for every non-listed list-shaped param, raising TypeError with the offending element type — same path that already handles `monitoring_location_id=[..., 12345]`.

…day, peak_since) `get_peaks` (merged in fca3d6c) introduced five `int | list[int] | None` filters. The prior commit removed `_get_args`'s "list-of-non-str pass-through" clause to catch user errors like `parameter_code=[60, 65]` client-side — but the same clause was the only thing letting `water_year=[2020]` through. CI surfaced this via `test_get_peaks_water_year_filter`. Add the five known int-list filter names to `_NO_NORMALIZE_PARAMS` so they bypass string-iterable normalization. Existing string-list params still validate, and `parameter_code=[60, 65]` still raises TypeError client-side as intended. If future int-list params are added, they must be opted in here — this is intentional: the cost of one line per new param is a fair price for not silently passing user errors to the API.

…ation to all OGC functions Four issues from Copilot's latest review: 1. `_check_monitoring_location_id` was missing from `get_combined_metadata`, `get_field_measurements_metadata`, and `get_peaks`. Bad mloc inputs to those callers reached the API. 2. `get_channel` built its `args` dict from a raw `locals()` comprehension instead of `_get_args`, so non-string iterables (`pd.Series`, `np.ndarray`, generators) were never materialized before request construction. 3. `get_combined_metadata`'s `thresholds: float | list[float]` filter was being routed through `_normalize_str_iterable`, which would reject `[1.0, 2.0]` as non-string. Added to `_NO_NORMALIZE_PARAMS`. 4. The `properties` annotation was `list[str] | None` in 8 functions, but `_get_args` wraps a single-string input into a list at runtime (via `_LIST_ONLY_STR_PARAMS`). Widened to `str | list[str] | None` so type checkers don't reject the supported `properties="time_series_id"` call shape.

/simplify findings: 1. Move the per-function `monitoring_location_id = _check_monitoring_location_id(...)` into `_get_args` itself. Eliminates 13 copy-paste call sites in api.py and closes the bug class Copilot found twice (four functions had been missed when added piecemeal). New `get_*` functions inherit validation automatically. 2. Drop `_LIST_ONLY_STR_PARAMS` (a one-element `frozenset`). Inline the `properties` special case with a literal `elif k == "properties":` plus a one-line WHY comment. 3. Compress the three-paragraph narration in `_construct_api_requests` to a single line explaining the only non-obvious bit (`len()` instead of truthiness, because of numpy ndarray). 4. Add `begin_utc`/`end_utc` to `_DATE_RANGE_PARAMS`. `get_time_series_metadata` exposes both as range filters but the constant was missing them, so a two-element list would have been treated as a multi-value POST/CQL2 filter instead of being formatted as an ISO-8601 interval. 5. Drop the now-unused `_check_monitoring_location_id` import from api.py.

…measurements_metadata / get_peaks These three metadata functions were added to main while PR 229 was in flight, so their string-filter parameters used the narrower `str | list[str] | None`. PR 229's centralized `_get_args` materializes any non-string Iterable (pd.Series, np.ndarray, generators) before sending the request, so the runtime accepts more than `list[str]`. Bring the type annotations in line with the other 11 OGC getters (`str | Iterable[str] | None`). `properties` is intentionally left as `str | list[str] | None` because the join-by-comma site downstream of `_get_args` only handles `list` correctly after the single-string wrap; mirroring the other getters here.

Two issues: 1. CI failure on `test_get_args_basic` / `test_get_args_with_exclude`. The pre-existing tests used `monitoring_location_id="123"` purely as a placeholder to exercise `_get_args`'s filter/exclude logic. After the previous commit centralized the AGENCY-ID format check into `_get_args`, the bare "123" now raises ValueError before the dict-shaping assertions are reached. Update the placeholder to a well-formed "USGS-123" so the test's actual intent — filter + exclude wiring — still runs. 2. Copilot review: `_format_api_dates` silently accepts a Mapping by materializing its keys (`time={"2024-01-01": "x"}` → `["2024-01-01"]`), the same footgun `_normalize_str_iterable` already rejects elsewhere. Raise TypeError before the `list(...)` call. Adds a unit test.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Copilot noted the public docstrings still described multi-value string filters as "string or list of strings" while the annotations were widened to `str | Iterable[str] | None`. The user-facing docs now lie about the contract — pd.Series / np.ndarray / generators / tuples all work at runtime but read as unsupported. Replaced 165 occurrences across api.py. The 11 `properties` parameter docstrings remain "string or list of strings" because that parameter is intentionally typed `str | list[str] | None` (the comma-join site requires a list after the single-string wrap).

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Previous commit kept `properties: str | list[str] | None` based on a wrong premise — that `",".join(...)` downstream "requires a list." It doesn't: `_get_args` runs `_normalize_str_iterable` for properties exactly like every other multi-value string filter, materializing pd.Series, np.ndarray, generators, and tuples into a list before any downstream code sees them. The stress test already proved all five iterable shapes work at runtime. Bring the annotation and docstring in line with the others: 11 signatures (`str | list[str] | None` -> `str | Iterable[str] | None`) and 11 docstring lines ("string or list of strings" -> "string or iterable of strings").

…in nearest Copilot found that `get_ratings` accepts `monitoring_location_id` and documents the same AGENCY-ID contract, but builds its STAC filter directly without routing through `_get_args` — so the centralized validation never ran. Call `_check_monitoring_location_id` at the top of `get_ratings` and widen the annotation/docstring to `Iterable[str]` for consistency. `get_nearest_continuous` inherits validation via its forwarded call to `get_continuous`, so its behavior is already correct — but its annotation and docstring still advertised `list[str]`. Widen both for parity.

…rim _check branching Audit finds: 1. `monitoring_location_id` was listed in `_NO_NORMALIZE_PARAMS` from when the AGENCY-ID check lived in each public function. After the centralization commit (42852a8), `_get_args` dispatches it via its own `if k == "monitoring_location_id":` branch BEFORE the `_NO_NORMALIZE_PARAMS` check, so the entry is dead code. Remove it and tighten the comment to list only the params that actually need the bypass (date-range, bbox/boundingBox, get_peaks's int filters, get_combined_metadata's thresholds). 2. `_check_monitoring_location_id` had a two-arm if/else that called `_check_id_format` once for the str case and in a loop for the iterable case. Replaced with a one-element-tuple wrap so a single loop covers both shapes — same logic, less code. Confirmed coverage: 15 public functions accept `monitoring_location_id` (12 in api.py, get_ratings in ratings.py, get_nearest_continuous in nearest.py); each rejects bad input client-side. 5 `list[int]` filters (`water_year`, `year`, `month`, `day`, `peak_since`), `bbox`, `boundingBox`, `thresholds`, and 7 date-range params all bypass string-iterable normalization correctly.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

thodson-usgs mentioned this pull request May 6, 2026

Release staging: PRs targeted for v1.1.4 #270

Closed

thodson-usgs requested a review from Copilot May 13, 2026 05:11

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 05:11 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py Outdated

Comment thread dataretrieval/waterdata/utils.py

Comment thread dataretrieval/waterdata/api.py Outdated

thodson-usgs requested a review from Copilot May 13, 2026 14:08

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 14:09 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py Outdated

Comment thread dataretrieval/waterdata/utils.py Outdated

Comment thread dataretrieval/waterdata/api.py

Comment thread dataretrieval/waterdata/api.py

Comment thread dataretrieval/waterdata/api.py Outdated

thodson-usgs requested a review from Copilot May 13, 2026 14:41

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 14:41 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py

Comment thread dataretrieval/waterdata/utils.py Outdated

Comment thread dataretrieval/waterdata/api.py

Comment thread dataretrieval/waterdata/api.py

thodson-usgs mentioned this pull request May 13, 2026

Fix missing comma in get_reference_table doctest example #275

Merged

thodson-usgs force-pushed the validate-monitoring-location-id branch from 4787951 to 4dbb5f9 Compare May 13, 2026 16:14

thodson-usgs requested a review from Copilot May 13, 2026 16:24

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 16:24 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py Outdated

Comment thread dataretrieval/waterdata/api.py

Comment thread dataretrieval/waterdata/api.py Outdated

Comment thread dataretrieval/waterdata/utils.py

thodson-usgs requested a review from Copilot May 13, 2026 16:37

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 16:38 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py

Comment thread dataretrieval/waterdata/utils.py Outdated

thodson-usgs requested a review from Copilot May 13, 2026 18:07

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 18:08 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py

thodson-usgs and others added 5 commits May 13, 2026 13:34

thodson-usgs and others added 12 commits May 13, 2026 13:34

thodson-usgs force-pushed the validate-monitoring-location-id branch from d679375 to 5260a8f Compare May 13, 2026 18:36

thodson-usgs requested a review from Copilot May 13, 2026 18:39

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 18:39 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/api.py Outdated

thodson-usgs requested a review from Copilot May 13, 2026 19:54

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 19:54 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

Comment thread dataretrieval/waterdata/utils.py

thodson-usgs added 3 commits May 13, 2026 14:59

thodson-usgs requested a review from Copilot May 13, 2026 20:11

Copilot started reviewing on behalf of thodson-usgs May 13, 2026 20:12 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

thodson-usgs marked this pull request as ready for review May 13, 2026 21:27

thodson-usgs merged commit c755f6b into DOI-USGS:main May 13, 2026
12 checks passed

Conversation

thodson-usgs commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Live reproduction

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thodson-usgs commented Apr 13, 2026 •

edited

Loading