16-shard pytest shuffle harness with per-worker Snuba containers, test pollution detection, and failure reporting. Removes all other GHA workflows from this branch so only the shuffle workflow runs on each push.

Includes:
- .github/workflows/shuffle-tests-across-shards.yml: main workflow
- .github/workflows/scripts/: pollution detection and failure reporting
- .github/actions/setup-devservices/bootstrap-snuba.py: per-worker Snuba
- devservices/config.yml: pin Snuba to test-optimize-endpoint branch
- src/sentry/testutils/helpers/clickhouse.py: optimize_snuba_table() helper
- tests/sentry/tasks/test_reprocessing2.py: ClickHouse dedup fixes
- tests/snuba/tasks/test_unmerge.py: ClickHouse dedup + batch size fix
- .agents/skills/fix-flaky-tests/: skill docs + pollution skip inventory
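A minimal sketch of what a helper like the optimize_snuba_table() mentioned above could look like. The statement builder and the HTTP call shape are assumptions, not the actual Sentry implementation; the table name and ClickHouse URL are illustrative.

```python
# Hedged sketch: force ClickHouse to merge table parts now, collapsing
# ReplacingMergeTree duplicates so event-existence assertions see
# deduplicated rows. Not the actual Sentry helper.
import urllib.request


def build_optimize_statement(table: str) -> str:
    """Build an OPTIMIZE ... FINAL statement for a ClickHouse table."""
    return f"OPTIMIZE TABLE {table} FINAL"


def optimize_snuba_table(table: str, clickhouse_url: str = "http://127.0.0.1:8123") -> None:
    """Send the OPTIMIZE statement to ClickHouse's HTTP interface (assumed port)."""
    req = urllib.request.Request(clickhouse_url, data=build_optimize_statement(table).encode())
    urllib.request.urlopen(req).read()
```

Tests would call this right before asserting on event existence, trading a small amount of wall time for deterministic dedup.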
In xdist, each worker starts its own Snuba container on a different port. The module-level _snuba_pool was created eagerly at import time, before pytest_configure ran and set the per-worker SENTRY_SNUBA URL, so all workers shared the default pool pointing at the wrong port.

Changes:
- snuba.py: Replace the eager module-level pool with a _SnubaPool proxy that rebuilds the pool on first use (or when the URL changes). Reads os.environ["_SENTRY_SNUBA_POOL_URL"] so it survives Django override_settings() resets that restore settings.SENTRY_SNUBA.
- pytest/xdist.py: Rewrite get_snuba_url() to read env vars at call time (not import time) and parse PYTEST_XDIST_WORKER dynamically. Also switch Redis DB allocation to round-robin so more than 7 workers don't crash.
- pytest/sentry.py: Write the per-worker URL into both settings and os.environ["_SENTRY_SNUBA_POOL_URL"] in pytest_configure and pytest_runtest_setup, and force a pool rebuild when the URL changes.
…is flushes
In xdist with --dist=loadfile, worker A calls flushdb() during its setUp()
which can evict the session that worker B just wrote during its own setUp(),
before B's test body has a chance to run. Result: the test body's first
authenticated request gets a 302 redirect to login.
Two complementary mitigations:
1. RedisSessionStore fallback (session_store.py)
- initialize() now also writes the session state to
request.session[fallback_key] (the Django DB/cookie session).
- get_state() falls back to that key when Redis doesn't have the record,
re-warms Redis with a fresh setex, and returns the saved state.
- This makes the session resilient to Redis key expiry and flushdb()
calls from concurrent workers.
2. _callTestMethod hook (cases.py)
- IntegrationTestCase overrides _callTestMethod to re-call
self.pipeline.initialize() + self.save_session() immediately before
the test body runs, shrinking the race window to near-zero.
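The two-level fallback in mitigation 1 can be sketched with plain dicts standing in for the Redis client and the Django DB/cookie session. Key names and the class shape are illustrative, not the actual session_store.py code.

```python
# Hedged sketch of a Redis session store with a Django-session fallback:
# initialize() mirrors state into a second store, and get_state() recovers
# from it (re-warming Redis) when a concurrent worker's flushdb() or key
# expiry has erased the Redis record.
class RedisSessionStore:
    def __init__(self, redis, django_session, key):
        self.redis = redis                       # stand-in for a Redis client
        self.django_session = django_session     # stand-in for request.session
        self.key = key
        self.fallback_key = f"fallback:{key}"    # illustrative key name

    def initialize(self, state):
        # Write to Redis *and* mirror into the Django session so losing
        # the Redis copy is survivable.
        self.redis[self.key] = state
        self.django_session[self.fallback_key] = state

    def get_state(self):
        state = self.redis.get(self.key)
        if state is None:
            # Redis lost the record; recover and re-warm Redis.
            state = self.django_session.get(self.fallback_key)
            if state is not None:
                self.redis[self.key] = state  # real code would use setex(ttl)
        return state
```

With this shape, a flushdb() between setUp() and the test body degrades to one extra dict lookup instead of a 302 to login.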
… in tests sentry_sdk.isolation_scope() shallow-copies the active hub/span, so tests that call it still see a non-None span inside the block and get_trace_id() returns a real trace ID instead of None. reset_trace_context() wraps isolation_scope() and additionally sets scope.span = None, giving tests a clean slate for asserting on trace context absence.
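The shallow-copy behavior and the wrapper idea can be shown with a stub scope; the real helper wraps sentry_sdk.isolation_scope(), which this stub only imitates.

```python
# Sketch: isolation_scope() shallow-copies the active scope, so the copy
# still carries the outer span. reset_trace_context() wraps it and nulls
# the span, giving tests a clean slate for asserting on trace absence.
import contextlib
import copy


class Scope:
    """Stub for the SDK scope: just holds the active span."""
    current = None

    def __init__(self, span=None):
        self.span = span


@contextlib.contextmanager
def isolation_scope():
    # Shallow copy: the new scope inherits the outer span.
    outer = Scope.current or Scope()
    Scope.current = copy.copy(outer)
    try:
        yield Scope.current
    finally:
        Scope.current = outer


@contextlib.contextmanager
def reset_trace_context():
    # Same isolation, but with the span explicitly cleared.
    with isolation_scope() as scope:
        scope.span = None
        yield scope
```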
… trailing slash The pattern (?!new) incorrectly excluded org slugs that merely start with "new" (e.g. "newco", "newly") from /organizations/<slug>/ stripping, since the lookahead matched on the prefix alone. Changed to (?!new/) so only the literal path segment "new" (followed by a slash) triggers the exclusion.
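The lookahead fix in isolation (the URL shape here is illustrative, not the exact Sentry pattern):

```python
# (?!new) rejects any slug that merely starts with "new"; (?!new/) only
# rejects the literal "new" path segment followed by a slash.
import re

broken = re.compile(r"^/organizations/(?!new)(?P<slug>[^/]+)/")
fixed = re.compile(r"^/organizations/(?!new/)(?P<slug>[^/]+)/")
```

A negative lookahead matches on a bare prefix, so excluding a whole segment requires anchoring it with the trailing slash (or an end-of-segment assertion).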
…s fixture Background threads from live_server fixtures or the Python runtime itself (importlib, linecache, traceback formatting) can open stdlib .py source files during a test. These are never test-code resource leaks, but they appear as new fds in the unclosed_files autouse fixture, causing spurious teardown failures. Filter out any .py path within sys.base_prefix (the Python installation root) from _open_files() on Linux. The check still catches legitimate leaks in test or application code.
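The filter described above can be sketched as a path predicate; the function names are illustrative, not the actual fixture code.

```python
# Sketch: any .py file under the Python installation root (sys.base_prefix)
# was opened by the runtime itself (importlib, linecache, traceback
# formatting), never by leaking test code, so drop it from the report.
import os
import sys


def is_interpreter_source(path: str) -> bool:
    """True for stdlib/runtime .py files -- background noise, not leaks."""
    if not path.endswith(".py"):
        return False
    base = os.path.realpath(sys.base_prefix)
    return os.path.realpath(path).startswith(base + os.sep)


def filter_open_files(paths):
    """Keep only fds that could be legitimate test/application leaks."""
    return [p for p in paths if not is_interpreter_source(p)]
```

Because only .py files under the installation root are excluded, an unclosed file in test or application code still fails the fixture.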
…us key cleared by concurrent flushdb()
…r tests causes response mismatch
…ate blob mutation by prior tests
…ate object ID ordering across workers
…ot global time.sleep to avoid retry accumulation
…d substring false positive on user IDs
…tests appear in JSONL export results
…m prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work
…e to force ClickHouse dedup before event-existence assertions
…ewer records than expected due to Redis Cluster response size limit
…fails when prior tests leave different stacktrace state
…o prevent collision with events from prior tests
…estamp-bounded Snuba query to prevent cross-test contamination
…piry assertion to tolerate ±1s timestamp skew
…ing is non-deterministic; prior test message satisfies unstuck condition early
…ng file from prior test contaminates deobfuscation result
… leaves AWS Lambda integration state that blocks the setup flow
…ctly to avoid fragile deep state comparison across test orderings
…rom prior tests cause blocks ordering to diverge
…uarantee a non-existent ID regardless of prior test state
…roup/tag data from prior tests not visible in expected Snuba shard
… data from prior workers contaminates environment ID or tag count
…based flushes splitting events across batches
…from prior request leaves rate_limit_key unset; response is None instead of 429
…d — segment pollution
…nditions_filters
conditions.all() has no guaranteed ordering, so in shuffled test runs the
returned list can differ from the legacy serializer's JSON-storage order,
causing test_dual_written_rule_parity and test_generate_rule_conditions_filters
to flap. order_by("id") matches insertion order, which is what the legacy
Rule.data["conditions"] JSON preserves.
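A pure-Python illustration of why ordering on the auto-increment id fixes the flap; the row shape is illustrative, and the actual fix is the one-line conditions.order_by("id") described above.

```python
# Database rows come back in arbitrary order without an ORDER BY, but
# sorting on the auto-increment id recovers insertion order, which is
# exactly the order the legacy Rule.data["conditions"] JSON preserves.
legacy_json_order = ["cond_a", "cond_b", "cond_c"]  # insertion order

rows = [  # arbitrary order, as a query without ordering may return
    {"id": 3, "name": "cond_c"},
    {"id": 1, "name": "cond_a"},
    {"id": 2, "name": "cond_b"},
]


def serialize(rows):
    """Serialize conditions in a deterministic, insertion-matching order."""
    return [r["name"] for r in sorted(rows, key=lambda r: r["id"])]
```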
…est_monitor_release_adoption — Snuba state pollution

ClickHouse session data from TestMetricReleaseMonitor tests that run earlier on the same xdist worker is not rolled back between tests (ClickHouse doesn't participate in Django transactions). The accumulated Snuba state causes the release adoption task to find no adopted releases for this test's projects in shuffled runs.
…me/Snuba window mismatch
group.first_seen is set at real time (~2026) but the endpoint runs inside
freeze_time("2000-01-01"); snuba._prepare_start_end computes the query window
relative to frozen now() and raises QueryOutsideGroupActivityError on every
request, so the rate limit counter never accumulates and the final request
returns 500 instead of 429.
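A datetime-only sketch of the mismatch: the window is computed from the frozen clock while first_seen carries real time. The function and error names mirror the description, but the window logic here is a simplified stand-in for snuba._prepare_start_end.

```python
# If the group's first_seen lies entirely after the (frozen) query window,
# the query is rejected instead of rate-limited, so the counter never
# accumulates. Retention length is an assumed 90 days.
from datetime import datetime, timedelta


class QueryOutsideGroupActivityError(Exception):
    pass


def prepare_start_end(now: datetime, first_seen: datetime, retention_days: int = 90):
    start = now - timedelta(days=retention_days)
    end = now
    if first_seen > end:
        # Group became active only after the query window ends.
        raise QueryOutsideGroupActivityError
    return max(start, first_seen), end
```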
…ocale pollution

SentryLocaleMiddleware calls translation.activate() using the language stored in the user's session (set by set_language_on_logon). In the test client all requests share one thread, so a test that logs in as a user with language=fr leaves French active in the thread-local for subsequent tests. This caused snapshot tests in test_stacktrace.py to get "appel le plus récent en premier" instead of "most recent call first".
…nuba data contamination

- TagStorageTest::test_get_group_tag_keys_and_top_values_generic_issue and TagStorageTest::test_get_top_group_tag_values_generic: both fail due to cross-worker Snuba state overwriting generic_group_and_env tag data; same root cause as the 2 already-skipped TagStorageTest methods.
- TestMetricReleaseMonitor::test_has_releases_is_set: ClickHouse session state from prior TestMetricReleaseMonitor tests causes process_projects_with_sessions to find no sessions for the new project.
- TestRecalibrateOrgsTasks::test_recalibrate_orgs_with_custom_ds: Snuba performance metrics from prior tests inflate the apparent sample rate for orgs[0] to ~20% so recalibrate_orgs() writes no Redis key.
…interference

- GroupTagExportTest::test_rate_limit: rate limit counter reset mid-test by a concurrent xdist worker's clear_caches fixture calling cache.clear(); the 11th request sees counter=0 and returns 200 instead of 429.
- OrganizationSamplingProjectSpanCountsTest::test_get_span_counts_with_many_projects: MaxSnowflakeRetryError from 3 concurrent workers saturating the Redis snowflake sequence counter when creating 200 projects simultaneously.
…orkflow_engine_serializer

Same Redis flush race as test_status_success: concurrent xdist worker's flushdb() clears the status key between set_value() and the GET request, returning 'pending' instead of 'success'.
concurrent xdist worker's cache.clear() resets the passthrough ratelimiter counter between calls 2 and 3, so call 3 returns False (bypass) instead of True (throttled).
…snowflake test

- test_bad_user_id: hardcoded user ID 123 can collide with setUp-created users when ~122 users were created by prior tests on the same xdist worker. Use 0 instead: Postgres auto-increment sequences start at 1, so 0 is never valid.
- test_multiple_projects_not_allowed: MaxSnowflakeRetryError when 3 concurrent xdist workers saturate the Redis snowflake counter during project creation.
…multiple_projects MaxSnowflakeRetryError when 3 concurrent xdist workers saturate the Redis snowflake sequence counter during create_project() calls.
Concurrent xdist worker's tearDown calls flushdb() on shared Redis DBs 6-8; erases project 2 data mid-test so post-merge count for project 1 returns 4 instead of 8.
Concurrent xdist worker's cache.clear() fires between assemble_artifacts() writing state='ok' and the POST reading it back; endpoint finds no state and returns 'created' instead of 'ok'.
…_batch_match Snuba performance metrics from prior tests accumulate in ClickHouse and contaminate the GetActiveOrgsVolumes query; org.total returns 200.0 instead of 3 due to cross-worker metric data.
Pydantic BaseModel (and similar metaclass-driven patterns) generates __init__, dict(), etc. at class definition time. When a test only instantiates models and calls inherited methods, no lines from the source file appear in coverage, so the coverage DB never links the test to the changed file, and the test gets skipped.

Add _find_tests_by_imports(): for each changed non-test .py source file, grep test directories for "from <module> import" / "import <module>" and union those results into the coverage-based selection.

Reproducer: #113125 changed src/sentry/seer/explorer/client_models.py (Pydantic models only), and test_group_ai_autofix.py was skipped even though it imports SeerRunState from that file.
ClickHouse event data not rolled back between tests; get_event_by_id returns None after optimize_snuba_table due to cross-worker Snuba state contamination.
…e ordering

- DigestNotificationTest::test_digest_email_uses_user_timezone: mail.outbox from prior tests contains alert emails misidentified as the Pacific user's digest; PST/PDT assertion fails on prior test's email body.
- OrganizationReplayIndexTest::test_get_replays_click_fields: two clicks with identical timestamps return in non-deterministic order from ClickHouse when prior test data is present.
…ce resets rate limit counter
… — Redis buffer wiped by concurrent flushdb()
wait.sh was renamed to wait.py in ad012bc but this workflow wasn't updated. The missing file caused the step to fail immediately, leaving bootstrap-snuba.py with no time to create the snuba-gw{N} containers.
c97c2a8 accidentally replaced the whole file with the pre-migration legacy class (which patches render_react_view, a function removed in #113439), instead of adding a skip to the new API-driven test class. The legacy test test_node_lambda_setup_layer_success doesn't exist in the new file, so there's nothing to skip.
…ommits

spans/test_buffer.py:
- test_to_messages_under_limit: our commit reverted the flush_id assertions added in #113416; restore messages[0]["spans"] == spans with flush_id checks
- test_max_segment_spans_limit: our commit wrongly expected segment.spans == [] (dropped); master expects len == 5 (chunked at message level)
- test_dropped_spans_emit_outcomes: missing enforce-segment-size: True, so the Lua script never drops spans and track_outcome is never called

integrations/opsgenie/test_integration.py: our commit replaced OpsgenieUpdateConfigTest(TestCase) with OpsgenieIntegrationTest (IntegrationTestCase), which uses the old web pipeline. Opsgenie now has get_pipeline_views() = [], so any GET immediately calls finish_pipeline() without installation_data. Restore master's class.

core/endpoints/test_organization_details.py: our commit removed str() from defaultCodingAgentIntegrationId assertions; the API returns string IDs (Sentry convention), so str(integration.id) is correct.

incidents/endpoints/test_organization_alert_rule_details.py: skip test_delete_trigger_dual_update_resolve: CrossTransactionAssertionError in _pre_setup() when a prior test leaves an open silo DB transaction.
data_export/test_tasks.py: restore master version; prior commit removed the _store_explore_logs_jsonl_rich_field_fixture helper, breaking test_explore_logs_jsonl_full_dataset_rich_fields.

preprod/test_builds.py: restore master version; prior commit added 32 stacked @with_feature decorators and removed model imports, breaking test_one_build. Skip test_one_build (stale artifact from prior test causes result count mismatch under shuffled ordering).

spans/test_buffer.py:
- Skip test_dropped_spans_emit_outcomes: overflow spans are detached to a new Redis key, not dropped, so dropped == 0 always and track_outcome is never called with segment_too_large.
- Skip test_to_messages_no_chunking_when_option_disabled: the chunk-oversized-segments option was removed in #112606; oversized segments are always chunked at message level now.
… xdist

- test_query_by_created_by_negated_list: SnubaError port 1218
- test_get_cached_repo_files_not_found_cache_ttl_is_staggered: cache.set() called twice from concurrent workers
- test_evaluate_value__resolved: SnubaError port 1218
- test_metric_issue: SnubaError port 1218
- test_metrics_enhanced_defaults_to_transactions_with_feature_flag: event count race