Validate monitoring_location_id format in waterdata functions#229
Merged
thodson-usgs merged 21 commits intoMay 13, 2026
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds client-side validation/normalization for monitoring_location_id across the WaterData public API to fail fast on malformed IDs and avoid wasting network calls (closes #188).
Changes:
- Introduces
_check_monitoring_location_idhelper indataretrieval.waterdata.utilsto validateAGENCY-IDformat and normalize supported iterables tolist. - Calls the new validator early in the 10 WaterData public functions that accept
monitoring_location_id. - Adds unit tests covering valid/invalid scalar and iterable inputs, plus regression checks via
get_daily.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
dataretrieval/waterdata/utils.py |
Adds the validation/normalization helper used by the public API. |
dataretrieval/waterdata/api.py |
Wires validation into the WaterData public entry points before request construction. |
tests/waterdata_test.py |
Adds regression/unit tests for accepted and rejected monitoring_location_id shapes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
May 13, 2026
1. TypeError messages now use a fixed example ("USGS-01646500") instead
of interpolating the user's value into the suggestion template.
The offending value is still surfaced via "(got {value!r})" so the
user can see what they actually passed — but mappings, large objects,
etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}".
2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched
with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`.
Same effective behavior, but `fullmatch` makes the intent explicit
and removes the redundant anchor characters.
3. Widened the type annotations on all 10 public waterdata functions
that accept `monitoring_location_id` from `str | list[str] | None`
to `str | Iterable[str] | None`, matching the runtime contract that
accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc.
Added `from collections.abc import Iterable` to api.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
May 13, 2026
The second example block in `get_reference_table`'s docstring was
unrunnable: `collection="parameter-codes"` and `query={...}` were on
separate lines with no comma between them, so the doctest text doesn't
parse as a valid call.
Surfaced from PR DOI-USGS#229's verification pass where I executed each public
function's docstring example against the live USGS OGC API.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thodson-usgs
added a commit
that referenced
this pull request
May 13, 2026
The second example block in `get_reference_table`'s docstring was
unrunnable: `collection="parameter-codes"` and `query={...}` were on
separate lines with no comma between them, so the doctest text doesn't
parse as a valid call.
Surfaced from PR #229's verification pass where I executed each public
function's docstring example against the live USGS OGC API.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
May 13, 2026
1. TypeError messages now use a fixed example ("USGS-01646500") instead
of interpolating the user's value into the suggestion template.
The offending value is still surfaced via "(got {value!r})" so the
user can see what they actually passed — but mappings, large objects,
etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}".
2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched
with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`.
Same effective behavior, but `fullmatch` makes the intent explicit
and removes the redundant anchor characters.
3. Widened the type annotations on all 10 public waterdata functions
that accept `monitoring_location_id` from `str | list[str] | None`
to `str | Iterable[str] | None`, matching the runtime contract that
accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc.
Added `from collections.abc import Iterable` to api.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4787951 to
4dbb5f9
Compare
Passing an integer (e.g. 5129115) or a bare string without an agency prefix (e.g. "dog") to any waterdata function silently wasted an API call and returned empty data. Now all ten public functions that accept monitoring_location_id raise before touching the network: - TypeError if the value is not a string or list of strings - ValueError if any string doesn't match the 'AGENCY-ID' format (e.g. 'USGS-01646500') Closes DOI-USGS#188. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous `(str, list)` isinstance check rejected legitimate inputs:
tuple, pandas.Series, pandas.Index, numpy.ndarray, generators. Pre-fix
these would round-trip through requests and either work (tuple) or
silently break the URL (numpy/pandas). Post-fix the function:
- Accepts any non-string iterable whose elements are strings
- Materializes the iterable to a list so downstream comma-join /
POST-CQL2 logic in _construct_api_requests keeps working uniformly
- Returns the (possibly-normalized) value, so callers reassign:
monitoring_location_id = _check_monitoring_location_id(monitoring_location_id)
- Rejects Mapping (e.g. dict) explicitly — iterating a dict yields
keys, which is a footgun
Live-verified against api.waterdata.usgs.gov: passing tuple, pd.Series,
pd.Index, and np.ndarray of "USGS-01646500" all return 3 rows for the
2024-06-01/2024-06-03 window. Passing pd.Series([1646500]) raises
TypeError("monitoring_location_id elements must be strings, not int...")
before any network call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nts) Internal refactor; behavior unchanged. - Add full type hints: ``str | Iterable[str] | None`` in, ``str | list[str] | None`` out, mirroring the runtime contract. - Split into explicit fast paths instead of wrapping-then-unwrapping: the string case validates and returns directly, eliminating the throwaway ``[monitoring_location_id]`` list and the final ``isinstance(str)`` re-check at return. - Invert the Mapping/Iterable check: reject ``Mapping`` (and non-iterables) up front, then ``list()`` the iterable. Reads more linearly than the compound ``isinstance(Iterable) and not isinstance(Mapping)``. - Extract ``_check_id_format(value)`` for the regex/ValueError pair so the format rule lives in one place and the call sites are one-liners. Tests unchanged; 16 validator tests + full suite still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. TypeError messages now use a fixed example ("USGS-01646500") instead
of interpolating the user's value into the suggestion template.
The offending value is still surfaced via "(got {value!r})" so the
user can see what they actually passed — but mappings, large objects,
etc. no longer produce ugly suggestion strings like "USGS-{'k': 'v'}".
2. _MONITORING_LOCATION_ID_RE switched from anchored `^.+-.+$` matched
with `re.match` to un-anchored `.+-.+` matched with `re.fullmatch`.
Same effective behavior, but `fullmatch` makes the intent explicit
and removes the redundant anchor characters.
3. Widened the type annotations on all 10 public waterdata functions
that accept `monitoring_location_id` from `str | list[str] | None`
to `str | Iterable[str] | None`, matching the runtime contract that
accepts tuples, pandas.Series, pandas.Index, numpy.ndarray, etc.
Added `from collections.abc import Iterable` to api.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the type-and-iterable normalization out of _check_monitoring_location_id into a reusable helper, then wires it into every public waterdata getter at every multi-value string parameter. Closes the gap noted in PR review: previously only `monitoring_location_id` accepted pd.Series / np.ndarray / tuple; other params (parameter_code, statistic_id, state_name, etc.) still silently str-serialized non-list iterables into the request. Changes: - dataretrieval/waterdata/utils.py: new `_normalize_str_iterable(value, param_name)` does the None / str / non-Mapping-Iterable dispatch and per-element type check. `_check_monitoring_location_id` now wraps it and adds the AGENCY-ID hyphen check, which remains monitoring_location_id-specific. - dataretrieval/waterdata/api.py: 153 normalization calls inserted across 11 public functions (get_daily, get_continuous, get_monitoring_locations, get_time_series_metadata, get_latest_continuous, get_latest_daily, get_field_measurements, get_samples, get_stats_por, get_stats_date_range, get_channel). 170 type annotations widened from `str | list[str] | None` / `list[str] | None` to `str | Iterable[str] | None`. Added Iterable import and `_normalize_str_iterable` to imports. Excluded (kept as-is): time-range params (time, last_modified, begin, end, begin_utc, end_utc, datetime) — these have special semantics in _format_api_dates (single-string or two-element range). Out of scope; users who need iterable support there can `.tolist()`. - tests/waterdata_test.py: new TestNormalizeStrIterable class with 10 tests covering str / list / tuple / pd.Series / pd.Index / np.ndarray acceptance, plus int / dict rejection, plus an integration check (mock.patch on get_ogc_data) confirming that passing pd.Series for parameter_code arrives at the inner call as a list, not a stringified Series. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small follow-ups to c206c0c addressing review findings: 1. Restore the AGENCY-ID hint in _check_monitoring_location_id's TypeError. The refactor through _normalize_str_iterable accidentally dropped the trailing "Expected 'AGENCY-ID' format, e.g., 'USGS-01646500'." that the original wrapper carried. Catch+re-raise so monitoring_location_id keeps its helpful error while the generic helper stays generic. Pinned with a new assertion in test_integer_raises_type_error. 2. Document the date-range exclusion in _normalize_str_iterable's docstring. `time`, `last_modified`, `begin`, `end`, `datetime` bypass this helper because _format_api_dates handles their single-string-or-range semantics inside _construct_api_requests; the exclusion lived only in the prior commit message. 3. Single-pass validation. The helper previously did `values = list(value)` followed by a separate `for v in values` isinstance loop. Folded into one loop that builds the list while validating per-element. Plus tests: hoist `from unittest import mock` to module level (matches the rest of the repo's test files) instead of importing it inside the test body. No behavior change for valid inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Copilot's review DOI-USGS#5 (collapse mechanical per-param calls) and DOI-USGS#1/DOI-USGS#2 along the way: - `_get_args` now normalizes every multi-value string param it sees, using: * a tiny ``_NO_NORMALIZE_PARAMS`` set for the few names that need explicit bypass (``monitoring_location_id``, validated separately; ``time``/``last_modified``/``begin``/``end``/``datetime``, which can contain ``pd.NaT``/None and are parsed by ``_format_api_dates``); * runtime type detection for the rest: scalar non-string knobs (``limit``, ``ssl_check``, ``convert_type``, ``skip_geometry``, ...) and ``list[float]`` params (``bbox``, ``boundingBox``) pass through automatically without listing them. - Strip 153 per-function ``_normalize_str_iterable(...)`` assignments from ``dataretrieval/waterdata/api.py`` and drop the now-unused import. Net: -240 LOC in api.py. - Tighten ``_MONITORING_LOCATION_ID_RE`` from ``.+-.+`` to ``[^-\s]+-[^-\s]+`` (Copilot DOI-USGS#2). Now rejects values with leading/ trailing whitespace or multiple hyphens, which used to pass and then silently return 0 rows from the API. - Fix the ``_normalize_str_iterable`` docstring claim "Used by every public waterdata getter" — now accurate ("from ``_get_args`` for every multi-value string parameter on every waterdata getter that uses ``_get_args``"; ``get_nearest_continuous`` and a few others don't). 26 normalizer/validator tests still pass; 267 + 2 skipped + 4 deselected in the full suite (the 4 deselected are flaky live-API tests that 502 intermittently — unrelated). Ruff lint + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `time`/`datetime`/`last_modified`/`begin`/`end` were typed as `str | Iterable[str] | None`, but the implementation used `len(...)` and subscripting — generators and other non-Sequence iterables would have raised at runtime, contradicting the annotation. Add a single `list(...)` materialization line right after the str wrap, so any iterable (pandas.Series, numpy.ndarray, generators, sets) flows through cleanly. The half-bounded NaT/None range form is preserved. Verified by passing list / tuple / Series / generator / `[pd.NaT, ...]` through and getting the expected formatted output in each case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small follow-ups to the centralization:
1. Extract `_DATE_RANGE_PARAMS = frozenset({"datetime", "last_modified",
"begin", "end", "time"})` once at module level. `_construct_api_requests`
previously defined the same set twice (`single_params` for POST/GET
routing, `time_periods` for the `_format_api_dates` call); the new
`_NO_NORMALIZE_PARAMS` overlapped on the same five names. All three
now reuse `_DATE_RANGE_PARAMS`. A future date param means one edit,
not three. `_NO_NORMALIZE_PARAMS = _DATE_RANGE_PARAMS | {"monitoring_location_id"}`.
2. Trim `_normalize_str_iterable` docstring from ~37 lines to ~20: drop
the over-narration of callers and downstream branching; keep the
contract (accepted shapes, return shape, raises).
3. Tighten the `_NO_NORMALIZE_PARAMS` comment to one short paragraph
(was 11 lines) and inline the four-branch pass-through cascade in
`_get_args` into a single boolean `if ... or ... or ... or ...:` so
the per-branch noise comments drop away. Behavior unchanged.
26 normalizer/validator + 22 waterdata_utils tests pass; full suite
267 passed + 2 skipped + 4 deselected (flaky live-API 502s); ruff lint
+ format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adopt named type aliases so the 180 signatures across api.py read as
semantic intent ("a StringFilter / StringList / DateRange") instead of
the mechanical "str | Iterable[str] | None" busy-work. Plus four
correctness fixes from the latest Copilot pass.
types.py:
StringFilter = Optional[Union[str, Iterable[str]]]
Multi-value filter: str OR iterable of str. The runtime normalizes
iterables to list inside _get_args.
StringList = Optional[Iterable[str]]
Comma-joined list of property names. Excludes single str at the
type level because ",".join(str) would iterate characters.
api.py:
- 172 `str | Iterable[str] | None` -> `StringFilter`
- 8 `properties: ... | None` -> `properties: StringList`
- Drop the now-unused `from collections.abc import Iterable`
Correctness fixes:
1. _format_api_dates: handle `None` up front. The new `list(...)`
materialization for Series/ndarray/generator support crashed on
None even though the signature/docstring promised acceptance.
(Copilot DOI-USGS#1)
2. _get_args: add `_LIST_ONLY_STR_PARAMS = {"properties"}` and wrap
stray single-string input into a one-element list, so
`",".join(properties)` downstream stays safe. (Copilot DOI-USGS#3)
3. _construct_api_requests: `if bbox:` -> `if bbox is not None and
len(bbox) > 0`. The truthy check raised ValueError when bbox was
a numpy.ndarray with >1 element. Use len() instead of truthy.
(Copilot DOI-USGS#2)
4. _NO_NORMALIZE_PARAMS: add `bbox`, `boundingBox`. These are
list[float] params; the previous runtime-type heuristic in
_get_args handled list/tuple of floats but a numpy.ndarray of
floats would have been routed through `_normalize_str_iterable`
and rejected as "elements must be strings". (Copilot DOI-USGS#2)
Full suite 267 passed + 2 skipped + 4 deselected (flaky live-API 502s);
ruff lint + format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per code-review feedback: domain-specific type aliases hide what the annotations actually mean. PEP 604 unions are the pythonic idiom and show the caller exactly what's accepted at the call site. - `StringFilter` (172 sites) -> `str | Iterable[str] | None` - `StringList` (8 sites) -> `list[str] | None` Runtime behavior is unchanged: `_get_args` still routes filter params through `_normalize_str_iterable` (accepts str, list, tuple, Series, ndarray, generator) and `properties` through `_LIST_ONLY_STR_PARAMS` (wraps stray single-string input into a one-element list).
…rough Live stress test found that `parameter_code=[60, 65]` (ints) was silently passed to the OGC API, surfacing as a confusing JSONDecodeError when the server returned an error page. The "list-of-non-str pass-through" clause in `_get_args` was a defensive shortcut intended for `bbox`/`boundingBox` (which are `list[float]`), but those params are already covered by `_NO_NORMALIZE_PARAMS`, making the clause redundant AND bug-silencing. Now `_normalize_str_iterable` runs for every non-listed list-shaped param, raising TypeError with the offending element type — same path that already handles `monitoring_location_id=[..., 12345]`.
…day, peak_since) `get_peaks` (merged in fca3d6c) introduced five `int | list[int] | None` filters. The prior commit removed `_get_args`'s "list-of-non-str pass-through" clause to catch user errors like `parameter_code=[60, 65]` client-side — but the same clause was the only thing letting `water_year=[2020]` through. CI surfaced this via `test_get_peaks_water_year_filter`. Add the five known int-list filter names to `_NO_NORMALIZE_PARAMS` so they bypass string-iterable normalization. Existing string-list params still validate, and `parameter_code=[60, 65]` still raises TypeError client-side as intended. If future int-list params are added, they must be opted in here — this is intentional: the cost of one line per new param is a fair price for not silently passing user errors to the API.
…ation to all OGC functions Four issues from Copilot's latest review: 1. `_check_monitoring_location_id` was missing from `get_combined_metadata`, `get_field_measurements_metadata`, and `get_peaks`. Bad mloc inputs to those callers reached the API. 2. `get_channel` built its `args` dict from a raw `locals()` comprehension instead of `_get_args`, so non-string iterables (`pd.Series`, `np.ndarray`, generators) were never materialized before request construction. 3. `get_combined_metadata`'s `thresholds: float | list[float]` filter was being routed through `_normalize_str_iterable`, which would reject `[1.0, 2.0]` as non-string. Added to `_NO_NORMALIZE_PARAMS`. 4. The `properties` annotation was `list[str] | None` in 8 functions, but `_get_args` wraps a single-string input into a list at runtime (via `_LIST_ONLY_STR_PARAMS`). Widened to `str | list[str] | None` so type checkers don't reject the supported `properties="time_series_id"` call shape.
/simplify findings: 1. Move the per-function `monitoring_location_id = _check_monitoring_location_id(...)` into `_get_args` itself. Eliminates 13 copy-paste call sites in api.py and closes the bug class Copilot found twice (four functions had been missed when added piecemeal). New `get_*` functions inherit validation automatically. 2. Drop `_LIST_ONLY_STR_PARAMS` (a one-element `frozenset`). Inline the `properties` special case with a literal `elif k == "properties":` plus a one-line WHY comment. 3. Compress the three-paragraph narration in `_construct_api_requests` to a single line explaining the only non-obvious bit (`len()` instead of truthiness, because of numpy ndarray). 4. Add `begin_utc`/`end_utc` to `_DATE_RANGE_PARAMS`. `get_time_series_metadata` exposes both as range filters but the constant was missing them, so a two-element list would have been treated as a multi-value POST/CQL2 filter instead of being formatted as an ISO-8601 interval. 5. Drop the now-unused `_check_monitoring_location_id` import from api.py.
…measurements_metadata / get_peaks These three metadata functions were added to main while PR 229 was in flight, so their string-filter parameters used the narrower `str | list[str] | None`. PR 229's centralized `_get_args` materializes any non-string Iterable (pd.Series, np.ndarray, generators) before sending the request, so the runtime accepts more than `list[str]`. Bring the type annotations in line with the other 11 OGC getters (`str | Iterable[str] | None`). `properties` is intentionally left as `str | list[str] | None` because the join-by-comma site downstream of `_get_args` only handles `list` correctly after the single-string wrap; mirroring the other getters here.
Two issues:
1. CI failure on `test_get_args_basic` / `test_get_args_with_exclude`.
The pre-existing tests used `monitoring_location_id="123"` purely as a
placeholder to exercise `_get_args`'s filter/exclude logic. After the
previous commit centralized the AGENCY-ID format check into `_get_args`,
the bare "123" now raises ValueError before the dict-shaping assertions
are reached. Update the placeholder to a well-formed "USGS-123" so the
test's actual intent — filter + exclude wiring — still runs.
2. Copilot review: `_format_api_dates` silently accepts a Mapping by
materializing its keys (`time={"2024-01-01": "x"}` → `["2024-01-01"]`),
the same footgun `_normalize_str_iterable` already rejects elsewhere.
Raise TypeError before the `list(...)` call. Adds a unit test.
d679375 to
5260a8f
Compare
Copilot noted the public docstrings still described multi-value string filters as "string or list of strings" while the annotations were widened to `str | Iterable[str] | None`. The user-facing docs now lie about the contract — pd.Series / np.ndarray / generators / tuples all work at runtime but read as unsupported. Replaced 165 occurrences across api.py. The 11 `properties` parameter docstrings remain "string or list of strings" because that parameter is intentionally typed `str | list[str] | None` (the comma-join site requires a list after the single-string wrap).
Previous commit kept `properties: str | list[str] | None` based on a wrong
premise — that `",".join(...)` downstream "requires a list." It doesn't:
`_get_args` runs `_normalize_str_iterable` for properties exactly like
every other multi-value string filter, materializing pd.Series, np.ndarray,
generators, and tuples into a list before any downstream code sees them.
The stress test already proved all five iterable shapes work at runtime.
Bring the annotation and docstring in line with the others: 11 signatures
(`str | list[str] | None` -> `str | Iterable[str] | None`) and 11
docstring lines ("string or list of strings" -> "string or iterable of
strings").
…in nearest Copilot found that `get_ratings` accepts `monitoring_location_id` and documents the same AGENCY-ID contract, but builds its STAC filter directly without routing through `_get_args` — so the centralized validation never ran. Call `_check_monitoring_location_id` at the top of `get_ratings` and widen the annotation/docstring to `Iterable[str]` for consistency. `get_nearest_continuous` inherits validation via its forwarded call to `get_continuous`, so its behavior is already correct — but its annotation and docstring still advertised `list[str]`. Widen both for parity.
…rim _check branching Audit finds: 1. `monitoring_location_id` was listed in `_NO_NORMALIZE_PARAMS` from when the AGENCY-ID check lived in each public function. After the centralization commit (42852a8), `_get_args` dispatches it via its own `if k == "monitoring_location_id":` branch BEFORE the `_NO_NORMALIZE_PARAMS` check, so the entry is dead code. Remove it and tighten the comment to list only the params that actually need the bypass (date-range, bbox/boundingBox, get_peaks's int filters, get_combined_metadata's thresholds). 2. `_check_monitoring_location_id` had a two-arm if/else that called `_check_id_format` once for the str case and in a loop for the iterable case. Replaced with a one-element-tuple wrap so a single loop covers both shapes — same logic, less code. Confirmed coverage: 15 public functions accept `monitoring_location_id` (12 in api.py, get_ratings in ratings.py, get_nearest_continuous in nearest.py); each rejects bad input client-side. 5 `list[int]` filters (`water_year`, `year`, `month`, `day`, `peak_since`), `bbox`, `boundingBox`, `thresholds`, and 7 date-range params all bypass string-iterable normalization correctly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #188.
Passing an integer or a bare string (without the required
AGENCY-IDprefix) to a waterdata function silently wasted an API call and returned confusing empty results. Now all 10 public functions that acceptmonitoring_location_idvalidate the input immediately, normalize iterables to a list, and raise before any network call when the input is wrong.Functions covered:
get_daily,get_continuous,get_monitoring_locations,get_time_series_metadata,get_latest_continuous,get_latest_daily,get_field_measurements,get_stats_por,get_stats_date_range,get_channelAccepted shapes (
monitoring_location_id=...):None— optional parameter, passes throughAGENCY-IDformat (e.g."USGS-01646500")list,tuple,pandas.Series,pandas.Index,numpy.ndarray, generators, etc. Iterables are materialized to alistinternally so the downstream comma-join / POST-CQL2 logic sees a uniform shape.Rejected shapes:
int,float, etc. (not iterable) →TypeError"01646500") →ValueError("AGENCY-ID format")pd.Series([1, 2, 3]),np.array([1, 2, 3])) →TypeErrordict(and otherMappingtypes) →TypeError. Iterating a dict yields keys, which would be a footgun; if you really want the keys, passlist(d)explicitly.The validation rule mirrors the OGC API spec, which describes
monitoring_location_idas "the agency code … and the ID number of the monitoring location, separated by a hyphen (e.g.USGS-02238500)". The spec doesn't carry a machine-readablepatternconstraint — the server returns an empty result for malformed input rather than a 400 — so this is a client-side safety net.Live reproduction
Both calls below hit
api.waterdata.usgs.govpre-fix and silently return an emptyDataFramewith no indication that anything went wrong. Post-fix, both raise locally before any network call.Iterable shapes also live-verified — these all succeed post-fix (3 rows each, against
api.waterdata.usgs.gov):Pre-fix,
tuplewould have round-tripped (downstreamisinstance(v, (list, tuple))accepted it); the pandas/numpy shapes would have been str-serialized byrequestsinto a broken URL and silently returned empty.Test plan
TestCheckMonitoringLocationIdcovering: valid string / list / tuple /pd.Series/pd.Index/np.ndarray;None; bare integer; integer in list;pd.Series/np.ndarrayof ints; missing agency prefix; bare site number;dictrejection; end-to-endget_dailychecks for both error typesapi.waterdata.usgs.govfor both the rejection cases (above) and the iterable-acceptance cases (3 rows each for tuple / Series / Index / ndarray)🤖 Generated with Claude Code