Parse Date/Time/TimeZone triplets in samples and WQP by thodson-usgs · Pull Request #272 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-05-07T12:04:20Z

Summary

Closes #266.

Adds dataretrieval.utils._attach_datetime_columns(df) — a private helper that scans a CSV-derived DataFrame for <prefix>Date / <prefix>Time / <prefix>TimeZone triplets and appends a derived <prefix>DateTime UTC column for each one. Recognizes both the WQX3 / Samples shape (Activity_StartDate, Activity_StartTime, Activity_StartTimeZone) and the legacy WQP shape (ActivityStartDate, ActivityStartTime/Time, ActivityStartTime/TimeZoneCode).
Wires the helper into waterdata.get_samples() and wqp.get_results() (per the issue title's "samples/WQP" scope). Users do not call the helper directly; the <prefix>DateTime columns simply appear on the returned DataFrame.
Original triplet columns are preserved; an existing <prefix>DateTime column on the response is never overwritten. Unknown timezone abbreviations resolve to NaT.
Timezone abbreviations are resolved via the existing dataretrieval.codes.tz mapping (EST/EDT/CST/CDT/MST/MDT/PST/PDT/AKST/AKDT/HST/UTC/GMT and many more), the same source the existing format_datetime() helper uses. Mirrors R dataRetrieval's create_dateTime in importWQP.R.
Also bumps requires-python from >=3.8 to >=3.9 (and the ruff target similarly) to match what CI was already testing. Called out in NEWS.

The worked example in the issue (no API change — the new column just appears):

>>> df, md = waterdata.get_samples(
...     monitoringLocationIdentifier="USGS-11455508",
...     usgsPCode="00631",
...     activityStartDateLower="2024-01-01",
...     activityStartDateUpper="2024-06-01",
...     profile="basicphyschem",
... )
>>> df["Activity_StartDate"][0],   df["Activity_StartTime"][0], df["Activity_StartTimeZone"][0]
('2024-01-09', '10:00:00', 'PST')
>>> df["Activity_StartDateTime"][0]
Timestamp('2024-01-09 18:00:00+0000', tz='UTC')

Test plan

New unit tests in tests/utils_test.py::Test_attach_datetime_columns cover: WQX3 triplet → UTC, legacy WQP triplet → UTC, unknown TZ → NaT, partial-row missing time/tz → NaT, existing target column not overwritten, multiple triplets in one frame, lone *Date column without time/tz left alone.
tests/waterdata_test.py::test_mock_get_samples updated to assert the new Activity_StartDateTime column appears with the correct UTC timestamp on the samples_results.txt fixture.
tests/wqp_test.py::test_get_results (legacy) and test_get_results_WQX3 updated to assert the appropriate *DateTime column appears and parses on each fixture.
tests/waterdata_test.py::test_samples_activity (live) column-count assertion bumped from 95 → 97 to account for Activity_StartDateTime and Activity_EndDateTime.
Full mocked test suite: 283 passed locally (no regressions). Full CI matrix (Ubuntu + Windows × 3.9, 3.13, 3.14) passes.

🤖 Generated with Claude Code

Add a shared utils.attach_datetime_columns helper that scans a CSV-derived DataFrame for <prefix>Date / <prefix>Time / <prefix>TimeZone triplets and appends a derived <prefix>DateTime UTC column for each one, leaving the original triplet columns intact. Recognizes both the WQX3 / Samples naming (Activity_StartDate, Activity_StartTime, Activity_StartTimeZone) and the legacy WQP naming (ActivityStartDate, ActivityStartTime/Time, ActivityStartTime/TimeZoneCode). Mirrors R dataRetrieval's create_dateTime. Wired into waterdata.get_samples and wqp.get_results. Closes DOI-USGS#266. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Replace the lambda-laden _DATETIME_TRIPLET_PATTERNS dict with a flat _TIME_TZ_SUFFIXES tuple of (time_suffix, tz_suffix) pairs; the unused date_suffix field is gone. - Use str.removesuffix("Date") for the prefix swap and resolve the target column name once before iterating patterns, hoisting the existence check out of the inner loop. - Drop the redundant <NA>-mask in _build_utc_datetime; errors="coerce" already turns rows with missing inputs into NaT. - Switch pd.to_datetime from format="mixed" to a fixed "%Y-%m-%d %H:%M:%S %z" so pandas doesn't probe formats per row. - Trim WHAT-comments and per-test docstrings so the new tests match the noise level of the surrounding Test_to_str class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a utility to derive UTC Timestamp columns from <prefix>Date / <prefix>Time / <prefix>TimeZone triplets returned by Samples and WQP CSV endpoints, and wires it into waterdata.get_samples() and wqp.get_results() to address #266.

Changes:

Added dataretrieval.utils.attach_datetime_columns(df) to detect WQX3 and legacy WQP triplet naming patterns and append <prefix>DateTime (UTC) columns without overwriting existing ones.
Integrated the helper into dataretrieval.waterdata.api.get_samples() and dataretrieval.wqp.get_results().
Updated unit/integration tests and NEWS entry to reflect the new derived columns.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`dataretrieval/utils.py`	Introduces `attach_datetime_columns()` and supporting parsing helpers for Date/Time/TimeZone triplets.
`dataretrieval/waterdata/api.py`	Applies `attach_datetime_columns()` to Samples CSV responses returned by `get_samples()`.
`dataretrieval/wqp.py`	Applies `attach_datetime_columns()` to WQP CSV responses returned by `get_results()` and documents the behavior.
`tests/utils_test.py`	Adds unit tests covering triplet detection, UTC conversion, and edge cases (unknown TZ, missing values, no overwrite).
`tests/waterdata_test.py`	Updates Samples mocked/live tests to account for new derived datetime columns and validate a fixture timestamp.
`tests/wqp_test.py`	Updates WQP mocked tests to account for new columns and asserts derived datetime presence.
`NEWS.md`	Documents the new derived datetime behavior for Samples and WQP.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The /simplify pass introduced col.removesuffix("Date"), but the project declares requires-python = ">=3.8" (and the ruff target is py38), and removesuffix was added in Python 3.9 — so the helper would AttributeError at first call on a 3.8 interpreter. Revert to the slice form. Reported by Copilot review on PR DOI-USGS#272. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI's matrix already tests only Python 3.9 / 3.13 / 3.14 (and the waterdata test module skips itself on <3.10), but pyproject.toml still declared requires-python = ">=3.8" and ruff was targeting py38. Bring the manifest in line with reality: - requires-python = ">=3.9" - [tool.ruff] target-version = "py39" That unblocks col.removesuffix("Date") in attach_datetime_columns (restored), and surfaces two pre-existing pyupgrade fixes that ruff now applies under the py39 target: - dataretrieval/waterdata/ratings.py: import Iterable from collections.abc instead of typing. - dataretrieval/waterdata/utils.py: zoneinfo is stdlib on 3.9+, so the ZoneInfo import moves into the stdlib group. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumping ruff target-version from py38 to py39 made the formatter prefer parenthesized context managers (a 3.9-PEG-parser feature). The CI lint job picked up the resulting drift in tests/waterdata_filters_test.py; apply the formatter to bring it back in line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

The previous commit (3a6cad9) raised requires-python from >=3.8 to >=3.9 to align pyproject with what CI actually tested. That is a breaking change for any downstream user still on 3.8, so call it out in the changelog. Reported by Copilot review on PR DOI-USGS#272. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Copilot autofix in c635619 reverted one of four parenthesized-with blocks back to the chained form, leaving the file inconsistent under the project's ruff target (py39 prefers the parenthesized form per the 3.9 PEG parser). Re-running ruff format restores all four blocks to the canonical form so ruff format --check passes again. The "parenthesized with is 3.10+ only" concern is technically incorrect on this codebase: the 3.9 PEG parser accepts it, and the last CI run on 3a6cad9 / 2f7cf10 passed test (ubuntu-latest, 3.9) and test (windows-latest, 3.9) with this syntax in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tests/waterdata_test.py:9

This module is still skipped entirely on Python < 3.10, but the project now declares requires-python >= 3.9 and CI includes a 3.9 job. As a result, waterdata behavior (including the new Activity_*DateTime derivation asserted below) is not tested on one of the declared supported versions. Consider either removing/relaxing the module-level skip or updating the declared minimum Python version to match the versions that can run the waterdata test suite.

import pandas as pd
import pytest
from pandas import DataFrame

if sys.version_info < (3, 10):
    pytest.skip("Skip entire module on Python < 3.10", allow_module_level=True)

The helper is purely an internal post-processing step inside get_samples / get_results — users have no reason to call it directly, and dataretrieval/__init__.py's `from dataretrieval.utils import *` was leaking it into the public API surface as `dataretrieval.attach_datetime_columns`. Underscore-prefix it and update the two call sites plus the unit tests. Also annotate _attach_datetime_columns and _build_utc_datetime with pd.DataFrame / pd.Series / pd.Series → pd.Series signatures, matching the typing style already used in dataretrieval/waterdata/utils.py. Addresses self-review of PR DOI-USGS#272. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The two-tuple constant identifying the WQP/Samples Date/Time/TimeZone naming patterns was previously labeled "WQX3 / Samples" and "legacy WQP" — accurate for someone steeped in USGS jargon, opaque for a maintainer reading the file cold. Spell out an example column-name triplet next to each entry so the constant is self-explanatory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

thodson-usgs · 2026-05-08T17:55:08Z

@ldecicco-USGS, this modifies get_samples and wqp to return an additional datetime We also bump the minimum Python version to 3.9. The CI already tested 3.9+, but I still had 3.8+ in the package config.

ldecicco-USGS

Looks good. The only thing I see that is still different between R and Python for this is that the returned table in R sorts by either Activity_StartDateTime (which comes back from WQX3), ActivityStartDateTime (from legacy), or the first set of triple Time/Date/Zone columns (just in case...).

    if ("Activity_StartDateTime" %in% names(retval)) {
      #WQX 3
      retval <- retval[order(retval$Activity_StartDateTime), ]
    } else if ("ActivityStartDateTime" %in% names(retval)) {
      #legacy
      retval <- retval[order(retval$ActivityStartDateTime), ]
    } else {
      retval <- retval[order(retval[[dateCols[1]]]), ]
    }

Otherwise the order coming back from both samples and WQP can get pretty funky.

Not sure if you want to consider that in another PR or add it to this one.

Per ldecicco-USGS's review of DOI-USGS#272: R dataRetrieval ends its WQP/Samples pipeline by sorting the returned table on the activity-start datetime, because the API's natural order is unstable. Mirror that here. _attach_datetime_columns now picks a sort key with the same precedence R uses: 1. Activity_StartDateTime (WQX3 / Samples) 2. ActivityStartDateTime (legacy WQP) 3. first detected *Date column (fallback) The sort runs in addition to (and after) the DateTime-column derivation, and uses ignore_index=True to match the convention in dataretrieval/waterdata/utils.py::_sort_rows. Three new unit tests cover each branch of the precedence; the existing mock samples test was updated to assert the new monotonic-increasing iloc[0] row from the fixture (2023-06-20 09:25 CDT, the earliest sample in the file). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop 5 unit tests and a handful of sort-specific integration assertions: - test_missing_time_or_tz_is_NaT: redundant with test_unknown_timezone_is_NaT — both exercise the same NaT coercion path through pd.to_datetime. - test_multiple_triplets_handled: the samples_results.txt mock fixture already has 6 triplets, so the integration test exercises this. - test_lone_date_column_left_alone: trivially obvious from the loop body. - test_rows_sorted_by_wqx3_activity_start, _legacy_*, _first_date_column_*: per maintainer feedback, the sort behavior doesn't need dedicated test coverage on a private helper. In test_mock_get_samples, drop the iloc[0]/is_monotonic_increasing assertions and replace with a minimal "DateTime column has at least one parsed timestamp" check. The shape assertion (67, 187) still proves all 6 derived DateTime columns were appended. Also drop the now-unused `import pandas as pd` from tests/waterdata_test.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thodson-usgs and others added 2 commits May 7, 2026 07:03

thodson-usgs requested a review from Copilot May 7, 2026 12:29

Copilot started reviewing on behalf of thodson-usgs May 7, 2026 12:29 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread dataretrieval/utils.py

thodson-usgs and others added 3 commits May 7, 2026 07:42

thodson-usgs requested a review from Copilot May 7, 2026 13:25

Copilot started reviewing on behalf of thodson-usgs May 7, 2026 13:26 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread tests/waterdata_filters_test.py

Comment thread tests/waterdata_filters_test.py

Comment thread tests/waterdata_filters_test.py

Comment thread tests/waterdata_filters_test.py

Comment thread pyproject.toml

thodson-usgs and others added 3 commits May 7, 2026 08:45

Potential fix for pull request finding

c635619

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

thodson-usgs requested a review from Copilot May 7, 2026 14:01

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread pyproject.toml

thodson-usgs and others added 2 commits May 7, 2026 10:47

thodson-usgs requested a review from Copilot May 7, 2026 17:49

Copilot started reviewing on behalf of thodson-usgs May 7, 2026 17:49 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

Comment thread dataretrieval/utils.py

thodson-usgs requested a review from Copilot May 8, 2026 14:37

Copilot started reviewing on behalf of thodson-usgs May 8, 2026 14:38 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread dataretrieval/utils.py

thodson-usgs requested a review from ldecicco-USGS May 8, 2026 17:51

ldecicco-USGS marked this pull request as ready for review May 8, 2026 19:48

ldecicco-USGS approved these changes May 8, 2026

View reviewed changes

thodson-usgs and others added 2 commits May 8, 2026 16:17

thodson-usgs merged commit 0f90912 into DOI-USGS:main May 8, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse Date/Time/TimeZone triplets in samples and WQP#272

Parse Date/Time/TimeZone triplets in samples and WQP#272
thodson-usgs merged 12 commits intoDOI-USGS:mainfrom
thodson-usgs:parse-samples-wqp-datetime

thodson-usgs commented May 7, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

thodson-usgs commented May 8, 2026

Uh oh!

ldecicco-USGS left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

thodson-usgs commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

thodson-usgs commented May 8, 2026

Uh oh!

ldecicco-USGS left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

thodson-usgs commented May 7, 2026 •

edited

Loading