Skip to content

do not merge: shuffle-tests-v5#114053

Draft
joshuarli wants to merge 93 commits intomasterfrom
shuffle-tests-v5
Draft

do not merge: shuffle-tests-v5#114053
joshuarli wants to merge 93 commits intomasterfrom
shuffle-tests-v5

Conversation

@joshuarli
Copy link
Copy Markdown
Member

@joshuarli joshuarli commented Apr 27, 2026

No description provided.

16-shard pytest shuffle harness with per-worker Snuba containers, test
pollution detection, and failure reporting. Removes all other GHA workflows
from this branch so only the shuffle workflow runs on each push.

Includes:
- .github/workflows/shuffle-tests-across-shards.yml: main workflow
- .github/workflows/scripts/: pollution detection and failure reporting
- .github/actions/setup-devservices/bootstrap-snuba.py: per-worker Snuba
- devservices/config.yml: pin Snuba to test-optimize-endpoint branch
- src/sentry/testutils/helpers/clickhouse.py: optimize_snuba_table() helper
- tests/sentry/tasks/test_reprocessing2.py: ClickHouse dedup fixes
- tests/snuba/tasks/test_unmerge.py: ClickHouse dedup + batch size fix
- .agents/skills/fix-flaky-tests/: skill docs + pollution skip inventory
In xdist, each worker starts its own Snuba container on a different port.
The module-level _snuba_pool was created eagerly at import time — before
pytest_configure ran and set the per-worker SENTRY_SNUBA URL — so all
workers shared the default pool pointing at the wrong port.

Changes:
- snuba.py: Replace the eager module-level pool with a _SnubaPool proxy
  that rebuilds the pool on first use (or when the URL changes). Reads
  os.environ["_SENTRY_SNUBA_POOL_URL"] so it survives Django
  override_settings() resets that restore settings.SENTRY_SNUBA.
- pytest/xdist.py: Rewrite get_snuba_url() to read env vars at call time
  (not import time) and parse PYTEST_XDIST_WORKER dynamically. Also
  switch Redis DB allocation to round-robin so > 7 workers don't crash.
- pytest/sentry.py: Write the per-worker URL into both settings and
  os.environ["_SENTRY_SNUBA_POOL_URL"] in pytest_configure and
  pytest_runtest_setup, and force a pool rebuild when the URL changes.
…is flushes

In xdist with --dist=loadfile, worker A calls flushdb() during its setUp()
which can evict the session that worker B just wrote during its own setUp(),
before B's test body has a chance to run. Result: the test body's first
authenticated request gets a 302 redirect to login.

Two complementary mitigations:

1. RedisSessionStore fallback (session_store.py)
   - initialize() now also writes the session state to
     request.session[fallback_key] (the Django DB/cookie session).
   - get_state() falls back to that key when Redis doesn't have the record,
     re-warms Redis with a fresh setex, and returns the saved state.
   - This makes the session resilient to Redis key expiry and flushdb()
     calls from concurrent workers.

2. _callTestMethod hook (cases.py)
   - IntegrationTestCase overrides _callTestMethod to re-call
     self.pipeline.initialize() + self.save_session() immediately before
     the test body runs, shrinking the race window to near-zero.
… in tests

sentry_sdk.isolation_scope() shallow-copies the active hub/span, so tests
that call it still see a non-None span inside the block and get_trace_id()
returns a real trace ID instead of None.

reset_trace_context() wraps isolation_scope() and additionally sets
scope.span = None, giving tests a clean slate for asserting on trace
context absence.
… trailing slash

The pattern (?!new) incorrectly excluded org slugs that merely start with
"new" (e.g. "newco", "newly") from /organizations/<slug>/ stripping, since
the lookahead matched on the prefix alone. Changed to (?!new/) so only the
literal path segment "new" (followed by a slash) triggers the exclusion.
…s fixture

Background threads from live_server fixtures or the Python runtime itself
(importlib, linecache, traceback formatting) can open stdlib .py source
files during a test. These are never test-code resource leaks, but they
appear as new fds in the unclosed_files autouse fixture, causing spurious
teardown failures.

Filter out any .py path within sys.base_prefix (the Python installation
root) from _open_files() on Linux. The check still catches legitimate
leaks in test or application code.
…ot global time.sleep to avoid retry accumulation
…m prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work
…e to force ClickHouse dedup before event-existence assertions
…ewer records than expected due to Redis Cluster response size limit
…fails when prior tests leave different stacktrace state
…o prevent collision with events from prior tests
…estamp-bounded Snuba query to prevent cross-test contamination
…piry assertion to tolerate ±1s timestamp skew
…ing is non-deterministic; prior test message satisfies unstuck condition early
…ng file from prior test contaminates deobfuscation result
… leaves AWS Lambda integration state that blocks the setup flow
…ctly to avoid fragile deep state comparison across test orderings
…rom prior tests cause blocks ordering to diverge
…uarantee a non-existent ID regardless of prior test state
…roup/tag data from prior tests not visible in expected Snuba shard
… data from prior workers contaminates environment ID or tag count
…based flushes splitting events across batches
…from prior request leaves rate_limit_key unset; response is None instead of 429
…nditions_filters

conditions.all() has no guaranteed ordering, so in shuffled test runs the
returned list can differ from the legacy serializer's JSON-storage order,
causing test_dual_written_rule_parity and test_generate_rule_conditions_filters
to flap.  order_by("id") matches insertion order, which is what the legacy
Rule.data["conditions"] JSON preserves.
…est_monitor_release_adoption — Snuba state pollution

ClickHouse session data from TestMetricReleaseMonitor tests that run earlier
on the same xdist worker is not rolled back between tests (ClickHouse doesn't
participate in Django transactions). The accumulated Snuba state causes the
release adoption task to find no adopted releases for this test's projects in
shuffled runs.
…me/Snuba window mismatch

group.first_seen is set at real time (~2026) but the endpoint runs inside
freeze_time("2000-01-01"); snuba._prepare_start_end computes the query window
relative to frozen now() and raises QueryOutsideGroupActivityError on every
request, so the rate limit counter never accumulates and the final request
returns 500 instead of 429.
…ocale pollution

SentryLocaleMiddleware calls translation.activate() using the language stored
in the user's session (set by set_language_on_logon). In the test client all
requests share one thread, so a test that logs in as a user with language=fr
leaves French active in the thread-local for subsequent tests. This caused
snapshot tests in test_stacktrace.py to get "appel le plus récent en premier"
instead of "most recent call first".
…nuba data contamination

- TagStorageTest::test_get_group_tag_keys_and_top_values_generic_issue
- TagStorageTest::test_get_top_group_tag_values_generic
  Both fail due to cross-worker Snuba state overwriting generic_group_and_env
  tag data; same root cause as the 2 already-skipped TagStorageTest methods.

- TestMetricReleaseMonitor::test_has_releases_is_set
  ClickHouse session state from prior TestMetricReleaseMonitor tests causes
  process_projects_with_sessions to find no sessions for the new project.

- TestRecalibrateOrgsTasks::test_recalibrate_orgs_with_custom_ds
  Snuba performance metrics from prior tests inflate the apparent sample rate
  for orgs[0] to ~20% so recalibrate_orgs() writes no Redis key.
…interference

- GroupTagExportTest::test_rate_limit: rate limit counter reset mid-test by a
  concurrent xdist worker's clear_caches fixture calling cache.clear(); the
  11th request sees counter=0 and returns 200 instead of 429.

- OrganizationSamplingProjectSpanCountsTest::test_get_span_counts_with_many_projects:
  MaxSnowflakeRetryError from 3 concurrent workers saturating the Redis snowflake
  sequence counter when creating 200 projects simultaneously.
…orkflow_engine_serializer

Same Redis flush race as test_status_success: concurrent xdist worker's
flushdb() clears the status key between set_value() and the GET request,
returning 'pending' instead of 'success'.
concurrent xdist worker's cache.clear() resets the passthrough ratelimiter
counter between calls 2 and 3, so call 3 returns False (bypass) instead of
True (throttled).
…snowflake test

test_bad_user_id: hardcoded user ID 123 can collide with setUp-created users
when ~122 users were created by prior tests on the same xdist worker. Use 0
instead — Postgres auto-increment sequences start at 1 so 0 is never valid.

test_multiple_projects_not_allowed: MaxSnowflakeRetryError when 3 concurrent
xdist workers saturate the Redis snowflake counter during project creation.
…multiple_projects

MaxSnowflakeRetryError when 3 concurrent xdist workers saturate the Redis
snowflake sequence counter during create_project() calls.
Concurrent xdist worker's tearDown calls flushdb() on shared Redis DBs 6-8;
erases project 2 data mid-test so post-merge count for project 1 returns 4
instead of 8.
Concurrent xdist worker's cache.clear() fires between assemble_artifacts()
writing state='ok' and the POST reading it back; endpoint finds no state
and returns 'created' instead of 'ok'.
…_batch_match

Snuba performance metrics from prior tests accumulate in ClickHouse and
contaminate the GetActiveOrgsVolumes query; org.total returns 200.0 instead
of 3 due to cross-worker metric data.
Pydantic BaseModel (and similar metaclass-driven patterns) generates
__init__, dict(), etc. at class definition time. When a test only
instantiates models and calls inherited methods, no lines from the
source file appear in coverage — so the coverage DB never links the
test to the changed file, and the test gets skipped.

Add _find_tests_by_imports(): for each changed non-test .py source
file, grep test directories for "from <module> import" /
"import <module>" and union those results into the coverage-based
selection.

Reproducer: #113125 changed src/sentry/seer/explorer/client_models.py
(Pydantic models only), and test_group_ai_autofix.py was skipped even
though it imports SeerRunState from that file.
ClickHouse event data not rolled back between tests; get_event_by_id returns
None after optimize_snuba_table due to cross-worker Snuba state contamination.
…e ordering

DigestNotificationTest::test_digest_email_uses_user_timezone:
mail.outbox from prior tests contains alert emails misidentified as the
Pacific user's digest; PST/PDT assertion fails on prior test's email body.

OrganizationReplayIndexTest::test_get_replays_click_fields:
Two clicks with identical timestamps return in non-deterministic order from
ClickHouse when prior test data is present.
… — Redis buffer wiped by concurrent flushdb()
wait.sh was renamed to wait.py in ad012bc but this workflow
wasn't updated. The missing file caused the step to fail immediately,
leaving bootstrap-snuba.py with no time to create the snuba-gw{N}
containers.
c97c2a8 accidentally replaced the whole file with the pre-migration
legacy class (which patches render_react_view, a function removed in
#113439), instead of adding a skip to the new API-driven test class.
The legacy test test_node_lambda_setup_layer_success doesn't exist in
the new file, so there's nothing to skip.
…ommits

spans/test_buffer.py:
- test_to_messages_under_limit: our commit reverted the flush_id
  assertions added in #113416; restore messages[0]["spans"] == spans
  with flush_id checks
- test_max_segment_spans_limit: our commit wrongly expected segment.spans
  == [] (dropped); master expects len == 5 (chunked at message level)
- test_dropped_spans_emit_outcomes: missing enforce-segment-size: True,
  so the Lua script never drops spans and track_outcome is never called

integrations/opsgenie/test_integration.py: our commit replaced
OpsgenieUpdateConfigTest(TestCase) with OpsgenieIntegrationTest
(IntegrationTestCase) which uses the old web pipeline. Opsgenie now
has get_pipeline_views() = [], so any GET immediately calls
finish_pipeline() without installation_data. Restore master's class.

core/endpoints/test_organization_details.py: our commit removed str()
from defaultCodingAgentIntegrationId assertions; the API returns string
IDs (Sentry convention), so str(integration.id) is correct.

incidents/endpoints/test_organization_alert_rule_details.py: skip
test_delete_trigger_dual_update_resolve — CrossTransactionAssertionError
in _pre_setup() when a prior test leaves an open silo DB transaction.
data_export/test_tasks.py: restore master version — prior commit removed
the _store_explore_logs_jsonl_rich_field_fixture helper, breaking
test_explore_logs_jsonl_full_dataset_rich_fields.

preprod/test_builds.py: restore master version — prior commit added
32 stacked @with_feature decorators and removed model imports, breaking
test_one_build. Skip test_one_build (stale artifact from prior test
causes result count mismatch under shuffled ordering).

spans/test_buffer.py:
- Skip test_dropped_spans_emit_outcomes: overflow spans are detached to
  a new Redis key, not dropped, so dropped==0 always and track_outcome
  is never called with segment_too_large.
- Skip test_to_messages_no_chunking_when_option_disabled: the
  chunk-oversized-segments option was removed in #112606; oversized
  segments are always chunked at message level now.
… xdist

- test_query_by_created_by_negated_list: SnubaError port 1218
- test_get_cached_repo_files_not_found_cache_ttl_is_staggered: cache.set() called twice from concurrent workers
- test_evaluate_value__resolved: SnubaError port 1218
- test_metric_issue: SnubaError port 1218
- test_metrics_enhanced_defaults_to_transactions_with_feature_flag: event count race
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 27, 2026
@joshuarli joshuarli closed this Apr 27, 2026
@joshuarli joshuarli reopened this Apr 27, 2026
@joshuarli joshuarli changed the title fix(testutils): Exclude stdlib .py and SSL .pem files from unclosed_files fixture do not merge: shuffle-tests-v5 Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant