16-shard pytest shuffle harness with per-worker Snuba containers, test pollution detection, and failure reporting. Removes all other GHA workflows from this branch so only the shuffle workflow runs on each push.

Includes:
- .github/workflows/shuffle-tests-across-shards.yml: main workflow
- .github/workflows/scripts/: pollution detection and failure reporting
- .github/actions/setup-devservices/bootstrap-snuba.py: per-worker Snuba
- devservices/config.yml: pin Snuba to test-optimize-endpoint branch
- src/sentry/testutils/helpers/clickhouse.py: optimize_snuba_table() helper
- tests/sentry/tasks/test_reprocessing2.py: ClickHouse dedup fixes
- tests/snuba/tasks/test_unmerge.py: ClickHouse dedup + batch size fix
- .agents/skills/fix-flaky-tests/: skill docs + pollution skip inventory
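A minimal sketch of what a helper like the optimize_snuba_table() mentioned above could look like. The statement builder and the HTTP call shape are assumptions, not the actual Sentry implementation; the table name and ClickHouse URL are illustrative.

```python
# Hedged sketch: force ClickHouse to merge table parts now, collapsing
# ReplacingMergeTree duplicates so event-existence assertions see
# deduplicated rows. Not the actual Sentry helper.
import urllib.request


def build_optimize_statement(table: str) -> str:
    """Build an OPTIMIZE ... FINAL statement for a ClickHouse table."""
    return f"OPTIMIZE TABLE {table} FINAL"


def optimize_snuba_table(table: str, clickhouse_url: str = "http://127.0.0.1:8123") -> None:
    """Send the OPTIMIZE statement to ClickHouse's HTTP interface (assumed port)."""
    req = urllib.request.Request(clickhouse_url, data=build_optimize_statement(table).encode())
    urllib.request.urlopen(req).read()
```

Tests would call this right before asserting on event existence, trading a small amount of wall time for deterministic dedup.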
In xdist, each worker starts its own Snuba container on a different port. The module-level _snuba_pool was created eagerly at import time, before pytest_configure ran and set the per-worker SENTRY_SNUBA URL, so all workers shared the default pool pointing at the wrong port.

Changes:
- snuba.py: Replace the eager module-level pool with a _SnubaPool proxy that rebuilds the pool on first use (or when the URL changes). Reads os.environ["_SENTRY_SNUBA_POOL_URL"] so it survives Django override_settings() resets that restore settings.SENTRY_SNUBA.
- pytest/xdist.py: Rewrite get_snuba_url() to read env vars at call time (not import time) and parse PYTEST_XDIST_WORKER dynamically. Also switch Redis DB allocation to round-robin so more than 7 workers don't crash.
- pytest/sentry.py: Write the per-worker URL into both settings and os.environ["_SENTRY_SNUBA_POOL_URL"] in pytest_configure and pytest_runtest_setup, and force a pool rebuild when the URL changes.
…is flushes
In xdist with --dist=loadfile, worker A calls flushdb() during its setUp()
which can evict the session that worker B just wrote during its own setUp(),
before B's test body has a chance to run. Result: the test body's first
authenticated request gets a 302 redirect to login.
Two complementary mitigations:
1. RedisSessionStore fallback (session_store.py)
- initialize() now also writes the session state to
request.session[fallback_key] (the Django DB/cookie session).
- get_state() falls back to that key when Redis doesn't have the record,
re-warms Redis with a fresh setex, and returns the saved state.
- This makes the session resilient to Redis key expiry and flushdb()
calls from concurrent workers.
2. _callTestMethod hook (cases.py)
- IntegrationTestCase overrides _callTestMethod to re-call
self.pipeline.initialize() + self.save_session() immediately before
the test body runs, shrinking the race window to near-zero.
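The two-level fallback in mitigation 1 can be sketched with plain dicts standing in for the Redis client and the Django DB/cookie session. Key names and the class shape are illustrative, not the actual session_store.py code.

```python
# Hedged sketch of a Redis session store with a Django-session fallback:
# initialize() mirrors state into a second store, and get_state() recovers
# from it (re-warming Redis) when a concurrent worker's flushdb() or key
# expiry has erased the Redis record.
class RedisSessionStore:
    def __init__(self, redis, django_session, key):
        self.redis = redis                       # stand-in for a Redis client
        self.django_session = django_session     # stand-in for request.session
        self.key = key
        self.fallback_key = f"fallback:{key}"    # illustrative key name

    def initialize(self, state):
        # Write to Redis *and* mirror into the Django session so losing
        # the Redis copy is survivable.
        self.redis[self.key] = state
        self.django_session[self.fallback_key] = state

    def get_state(self):
        state = self.redis.get(self.key)
        if state is None:
            # Redis lost the record; recover and re-warm Redis.
            state = self.django_session.get(self.fallback_key)
            if state is not None:
                self.redis[self.key] = state  # real code would use setex(ttl)
        return state
```

With this shape, a flushdb() between setUp() and the test body degrades to one extra dict lookup instead of a 302 to login.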
… in tests sentry_sdk.isolation_scope() shallow-copies the active hub/span, so tests that call it still see a non-None span inside the block and get_trace_id() returns a real trace ID instead of None. reset_trace_context() wraps isolation_scope() and additionally sets scope.span = None, giving tests a clean slate for asserting on trace context absence.
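The shallow-copy behavior and the wrapper idea can be shown with a stub scope; the real helper wraps sentry_sdk.isolation_scope(), which this stub only imitates.

```python
# Sketch: isolation_scope() shallow-copies the active scope, so the copy
# still carries the outer span. reset_trace_context() wraps it and nulls
# the span, giving tests a clean slate for asserting on trace absence.
import contextlib
import copy


class Scope:
    """Stub for the SDK scope: just holds the active span."""
    current = None

    def __init__(self, span=None):
        self.span = span


@contextlib.contextmanager
def isolation_scope():
    # Shallow copy: the new scope inherits the outer span.
    outer = Scope.current or Scope()
    Scope.current = copy.copy(outer)
    try:
        yield Scope.current
    finally:
        Scope.current = outer


@contextlib.contextmanager
def reset_trace_context():
    # Same isolation, but with the span explicitly cleared.
    with isolation_scope() as scope:
        scope.span = None
        yield scope
```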
… trailing slash The pattern (?!new) incorrectly excluded org slugs that merely start with "new" (e.g. "newco", "newly") from /organizations/<slug>/ stripping, since the lookahead matched on the prefix alone. Changed to (?!new/) so only the literal path segment "new" (followed by a slash) triggers the exclusion.
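The lookahead fix in isolation (the URL shape here is illustrative, not the exact Sentry pattern):

```python
# (?!new) rejects any slug that merely starts with "new"; (?!new/) only
# rejects the literal "new" path segment followed by a slash.
import re

broken = re.compile(r"^/organizations/(?!new)(?P<slug>[^/]+)/")
fixed = re.compile(r"^/organizations/(?!new/)(?P<slug>[^/]+)/")
```

A negative lookahead matches on a bare prefix, so excluding a whole segment requires anchoring it with the trailing slash (or an end-of-segment assertion).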
…s fixture Background threads from live_server fixtures or the Python runtime itself (importlib, linecache, traceback formatting) can open stdlib .py source files during a test. These are never test-code resource leaks, but they appear as new fds in the unclosed_files autouse fixture, causing spurious teardown failures. Filter out any .py path within sys.base_prefix (the Python installation root) from _open_files() on Linux. The check still catches legitimate leaks in test or application code.
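The filter described above can be sketched as a path predicate; the function names are illustrative, not the actual fixture code.

```python
# Sketch: any .py file under the Python installation root (sys.base_prefix)
# was opened by the runtime itself (importlib, linecache, traceback
# formatting), never by leaking test code, so drop it from the report.
import os
import sys


def is_interpreter_source(path: str) -> bool:
    """True for stdlib/runtime .py files -- background noise, not leaks."""
    if not path.endswith(".py"):
        return False
    base = os.path.realpath(sys.base_prefix)
    return os.path.realpath(path).startswith(base + os.sep)


def filter_open_files(paths):
    """Keep only fds that could be legitimate test/application leaks."""
    return [p for p in paths if not is_interpreter_source(p)]
```

Because only .py files under the installation root are excluded, an unclosed file in test or application code still fails the fixture.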
…us key cleared by concurrent flushdb()
…r tests causes response mismatch
…ate blob mutation by prior tests
…ate object ID ordering across workers
…ot global time.sleep to avoid retry accumulation
…d substring false positive on user IDs
…tests appear in JSONL export results
…m prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work
…e to force ClickHouse dedup before event-existence assertions
…ewer records than expected due to Redis Cluster response size limit
…fails when prior tests leave different stacktrace state
…o prevent collision with events from prior tests
…estamp-bounded Snuba query to prevent cross-test contamination
…piry assertion to tolerate ±1s timestamp skew
…ing is non-deterministic; prior test message satisfies unstuck condition early
…ng file from prior test contaminates deobfuscation result
… leaves AWS Lambda integration state that blocks the setup flow
…ctly to avoid fragile deep state comparison across test orderings
…rom prior tests cause blocks ordering to diverge
…uarantee a non-existent ID regardless of prior test state
…roup/tag data from prior tests not visible in expected Snuba shard
… data from prior workers contaminates environment ID or tag count
…based flushes splitting events across batches
…from prior request leaves rate_limit_key unset; response is None instead of 429
…d — segment pollution
…nditions_filters
conditions.all() has no guaranteed ordering, so in shuffled test runs the
returned list can differ from the legacy serializer's JSON-storage order,
causing test_dual_written_rule_parity and test_generate_rule_conditions_filters
to flap. order_by("id") matches insertion order, which is what the legacy
Rule.data["conditions"] JSON preserves.
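A pure-Python illustration of why ordering on the auto-increment id fixes the flap; the row shape is illustrative, and the actual fix is the one-line conditions.order_by("id") described above.

```python
# Database rows come back in arbitrary order without an ORDER BY, but
# sorting on the auto-increment id recovers insertion order, which is
# exactly the order the legacy Rule.data["conditions"] JSON preserves.
legacy_json_order = ["cond_a", "cond_b", "cond_c"]  # insertion order

rows = [  # arbitrary order, as a query without ordering may return
    {"id": 3, "name": "cond_c"},
    {"id": 1, "name": "cond_a"},
    {"id": 2, "name": "cond_b"},
]


def serialize(rows):
    """Serialize conditions in a deterministic, insertion-matching order."""
    return [r["name"] for r in sorted(rows, key=lambda r: r["id"])]
```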
…est_monitor_release_adoption — Snuba state pollution

ClickHouse session data from TestMetricReleaseMonitor tests that run earlier on the same xdist worker is not rolled back between tests (ClickHouse doesn't participate in Django transactions). The accumulated Snuba state causes the release adoption task to find no adopted releases for this test's projects in shuffled runs.
…me/Snuba window mismatch
group.first_seen is set at real time (~2026) but the endpoint runs inside
freeze_time("2000-01-01"); snuba._prepare_start_end computes the query window
relative to frozen now() and raises QueryOutsideGroupActivityError on every
request, so the rate limit counter never accumulates and the final request
returns 500 instead of 429.
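A datetime-only sketch of the mismatch: the window is computed from the frozen clock while first_seen carries real time. The function and error names mirror the description, but the window logic here is a simplified stand-in for snuba._prepare_start_end.

```python
# If the group's first_seen lies entirely after the (frozen) query window,
# the query is rejected instead of rate-limited, so the counter never
# accumulates. Retention length is an assumed 90 days.
from datetime import datetime, timedelta


class QueryOutsideGroupActivityError(Exception):
    pass


def prepare_start_end(now: datetime, first_seen: datetime, retention_days: int = 90):
    start = now - timedelta(days=retention_days)
    end = now
    if first_seen > end:
        # Group became active only after the query window ends.
        raise QueryOutsideGroupActivityError
    return max(start, first_seen), end
```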
…ocale pollution

SentryLocaleMiddleware calls translation.activate() using the language stored in the user's session (set by set_language_on_logon). In the test client all requests share one thread, so a test that logs in as a user with language=fr leaves French active in the thread-local for subsequent tests. This caused snapshot tests in test_stacktrace.py to get "appel le plus récent en premier" instead of "most recent call first".
…nuba data contamination

- TagStorageTest::test_get_group_tag_keys_and_top_values_generic_issue and TagStorageTest::test_get_top_group_tag_values_generic: both fail due to cross-worker Snuba state overwriting generic_group_and_env tag data; same root cause as the 2 already-skipped TagStorageTest methods.
- TestMetricReleaseMonitor::test_has_releases_is_set: ClickHouse session state from prior TestMetricReleaseMonitor tests causes process_projects_with_sessions to find no sessions for the new project.
- TestRecalibrateOrgsTasks::test_recalibrate_orgs_with_custom_ds: Snuba performance metrics from prior tests inflate the apparent sample rate for orgs[0] to ~20% so recalibrate_orgs() writes no Redis key.
…interference

- GroupTagExportTest::test_rate_limit: rate limit counter reset mid-test by a concurrent xdist worker's clear_caches fixture calling cache.clear(); the 11th request sees counter=0 and returns 200 instead of 429.
- OrganizationSamplingProjectSpanCountsTest::test_get_span_counts_with_many_projects: MaxSnowflakeRetryError from 3 concurrent workers saturating the Redis snowflake sequence counter when creating 200 projects simultaneously.
…orkflow_engine_serializer

Same Redis flush race as test_status_success: concurrent xdist worker's flushdb() clears the status key between set_value() and the GET request, returning 'pending' instead of 'success'.
concurrent xdist worker's cache.clear() resets the passthrough ratelimiter counter between calls 2 and 3, so call 3 returns False (bypass) instead of True (throttled).
…snowflake test

- test_bad_user_id: hardcoded user ID 123 can collide with setUp-created users when ~122 users were created by prior tests on the same xdist worker. Use 0 instead: Postgres auto-increment sequences start at 1, so 0 is never valid.
- test_multiple_projects_not_allowed: MaxSnowflakeRetryError when 3 concurrent xdist workers saturate the Redis snowflake counter during project creation.
…multiple_projects MaxSnowflakeRetryError when 3 concurrent xdist workers saturate the Redis snowflake sequence counter during create_project() calls.
Concurrent xdist worker's tearDown calls flushdb() on shared Redis DBs 6-8; erases project 2 data mid-test so post-merge count for project 1 returns 4 instead of 8.
Concurrent xdist worker's cache.clear() fires between assemble_artifacts() writing state='ok' and the POST reading it back; endpoint finds no state and returns 'created' instead of 'ok'.
…_batch_match Snuba performance metrics from prior tests accumulate in ClickHouse and contaminate the GetActiveOrgsVolumes query; org.total returns 200.0 instead of 3 due to cross-worker metric data.
Pydantic BaseModel (and similar metaclass-driven patterns) generates __init__, dict(), etc. at class definition time. When a test only instantiates models and calls inherited methods, no lines from the source file appear in coverage, so the coverage DB never links the test to the changed file, and the test gets skipped.

Add _find_tests_by_imports(): for each changed non-test .py source file, grep test directories for "from <module> import" / "import <module>" and union those results into the coverage-based selection.

Reproducer: #113125 changed src/sentry/seer/explorer/client_models.py (Pydantic models only), and test_group_ai_autofix.py was skipped even though it imports SeerRunState from that file.
ClickHouse event data not rolled back between tests; get_event_by_id returns None after optimize_snuba_table due to cross-worker Snuba state contamination.
…e ordering

- DigestNotificationTest::test_digest_email_uses_user_timezone: mail.outbox from prior tests contains alert emails misidentified as the Pacific user's digest; PST/PDT assertion fails on prior test's email body.
- OrganizationReplayIndexTest::test_get_replays_click_fields: two clicks with identical timestamps return in non-deterministic order from ClickHouse when prior test data is present.
…ce resets rate limit counter
… — Redis buffer wiped by concurrent flushdb()
wait.sh was renamed to wait.py in ad012bc but this workflow wasn't updated. The missing file caused the step to fail immediately, leaving bootstrap-snuba.py with no time to create the snuba-gw{N} containers.
c97c2a8 accidentally replaced the whole file with the pre-migration legacy class (which patches render_react_view, a function removed in #113439), instead of adding a skip to the new API-driven test class. The legacy test test_node_lambda_setup_layer_success doesn't exist in the new file, so there's nothing to skip.
…ommits

spans/test_buffer.py:
- test_to_messages_under_limit: our commit reverted the flush_id assertions added in #113416; restore messages[0]["spans"] == spans with flush_id checks
- test_max_segment_spans_limit: our commit wrongly expected segment.spans == [] (dropped); master expects len == 5 (chunked at message level)
- test_dropped_spans_emit_outcomes: missing enforce-segment-size: True, so the Lua script never drops spans and track_outcome is never called

integrations/opsgenie/test_integration.py: our commit replaced OpsgenieUpdateConfigTest(TestCase) with OpsgenieIntegrationTest (IntegrationTestCase), which uses the old web pipeline. Opsgenie now has get_pipeline_views() = [], so any GET immediately calls finish_pipeline() without installation_data. Restore master's class.

core/endpoints/test_organization_details.py: our commit removed str() from defaultCodingAgentIntegrationId assertions; the API returns string IDs (Sentry convention), so str(integration.id) is correct.

incidents/endpoints/test_organization_alert_rule_details.py: skip test_delete_trigger_dual_update_resolve: CrossTransactionAssertionError in _pre_setup() when a prior test leaves an open silo DB transaction.
data_export/test_tasks.py: restore master version; prior commit removed the _store_explore_logs_jsonl_rich_field_fixture helper, breaking test_explore_logs_jsonl_full_dataset_rich_fields.

preprod/test_builds.py: restore master version; prior commit added 32 stacked @with_feature decorators and removed model imports, breaking test_one_build. Skip test_one_build (stale artifact from prior test causes result count mismatch under shuffled ordering).

spans/test_buffer.py:
- Skip test_dropped_spans_emit_outcomes: overflow spans are detached to a new Redis key, not dropped, so dropped == 0 always and track_outcome is never called with segment_too_large.
- Skip test_to_messages_no_chunking_when_option_disabled: the chunk-oversized-segments option was removed in #112606; oversized segments are always chunked at message level now.
… xdist

- test_query_by_created_by_negated_list: SnubaError port 1218
- test_get_cached_repo_files_not_found_cache_ttl_is_staggered: cache.set() called twice from concurrent workers
- test_evaluate_value__resolved: SnubaError port 1218
- test_metric_issue: SnubaError port 1218
- test_metrics_enhanced_defaults_to_transactions_with_feature_flag: event count race