chore(ci): run benchmark sanity checks on size-xl-x64 runner by danceratopz · Pull Request #3012 · ethereum/execution-specs

danceratopz · 2026-06-18T12:01:11Z

🗒️ Description

Run the benchmark sanity-checks jobs (Benchmark Gas Values, Fixed Opcode Count CLI, Fixed Opcode Count Config) on the size-xl-x64 self-hosted runner instead of ubuntu-latest.

The standard ubuntu-latest runner only exposes 2 cores, so fill/pytest with -n auto --maxprocesses 10 was capped at 2 workers, making Benchmark Gas Values take roughly 28 minutes. The XL runner has enough cores for -n auto to scale toward the existing --maxprocesses 10 cap, matching the runner already used by test.yaml.
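The worker cap described above can be sketched as a simple minimum. This is a rough model, not pytest-xdist's actual code: it assumes `-n auto` resolves to the number of available cores and that `--maxprocesses` only sets an upper bound. The function name `effective_workers` is hypothetical, and the 16-core figure is the one quoted later in this PR for the `size-xl-x64` runner.

```python
from typing import Optional


def effective_workers(cpu_count: int, maxprocesses: Optional[int] = None) -> int:
    """Approximate the xdist worker count for `-n auto` (illustrative model only)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if maxprocesses is None:
        # No cap given: `-n auto` is assumed to use every available core.
        return cpu_count
    # With a cap, the smaller of the two values wins.
    return min(cpu_count, maxprocesses)


print(effective_workers(2, 10))   # ubuntu-latest: the core count binds -> 2
print(effective_workers(16, 10))  # XL runner: the --maxprocesses cap binds -> 10
```

On the 2-core runner the cap never comes into play, which is why raising or keeping `--maxprocesses 10` made no difference there.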

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
just static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).

The standard `ubuntu-latest` runner exposes only 2 cores, so `-n auto --maxprocesses 10` was capped at 2 workers. Use the `size-xl-x64` self-hosted runner so the existing `--maxprocesses 10` cap can bind, matching `test.yaml`.

codecov · 2026-06-18T12:09:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.20%. Comparing base (5f8c109) to head (d14ee17).
⚠️ Report is 18 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #3012   +/-   ##
================================================
  Coverage            93.20%   93.20%           
================================================
  Files                  620      620           
  Lines                38777    38777           
  Branches              3342     3342           
================================================
  Hits                 36144    36144           
  Misses                1773     1773           
  Partials               860      860

Flag	Coverage Δ
unittests	`93.20% <ø> (ø)`


Remove the `--maxprocesses 10` cap so `-n auto` can use all 16 cores on the `size-xl-x64` runner, now that ethereum#2751 cut per-worker peak RSS to roughly 3.4 GB. Add `--durations=100` to the fill calls to surface per-test timings and find the serial bottleneck.
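The memory reasoning behind dropping the cap can be checked with a quick back-of-the-envelope calculation. This is a sketch under stated assumptions: the 16 cores and the roughly 3.4 GB per-worker peak RSS come from the commit message above, while the function names and the `total_gb` values used in the usage lines are hypothetical, since the runner's actual RAM is not stated here.

```python
def aggregate_peak_rss_gb(workers: int, per_worker_gb: float) -> float:
    """Worst-case memory if every worker hits its peak RSS at the same time."""
    return workers * per_worker_gb


def max_workers_for_memory(total_gb: float, per_worker_gb: float, cpu_count: int) -> int:
    """Largest worker count that fits in memory, never exceeding the core count."""
    fits_in_memory = int(total_gb // per_worker_gb)
    return max(1, min(cpu_count, fits_in_memory))


# Figures from the commit message: 16 workers at roughly 3.4 GB each.
print(f"{aggregate_peak_rss_gb(16, 3.4):.1f} GB")

# Hypothetical RAM sizes, for illustration only.
print(max_workers_for_memory(total_gb=64, per_worker_gb=3.4, cpu_count=16))
print(max_workers_for_memory(total_gb=32, per_worker_gb=3.4, cpu_count=16))
```

Under these numbers, running all 16 workers needs headroom for about 54 GB in the worst case, so whether the uncapped run is safe depends on how much memory the runner actually has.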

danceratopz · 2026-06-19T07:28:31Z

Thanks for jumping in with a review @LouisTsai-Csie. It didn't lead to an improvement (which is curious). Will revisit when I get the chance; I think we need #2693.

danceratopz · 2026-06-29T11:47:32Z

Thanks for jumping in with a review @LouisTsai-Csie. It didn't lead to an improvement (which is curious). Will revisit when I get the chance; I think we need #2693.

My friend told me that:

... the Phase 2 "slowest durations" has a cliff:
465.28s  test_keccak_max_permutations[...blockchain_test...]@t8n-cache-bc5796d4
464.94s  test_keccak_max_permutations[...blockchain_test_engine_x...]@t8n-cache-bc5796d4
408.27s  test_keccak_max_permutations[...blockchain_test_engine...]@t8n-cache-bc5796d4
 12.29s  test_point_evaluation_uncachable   ← next slowest
One test eats ~465s per format; everything else is ≤12s. And note the shared @t8n-cache-bc5796d4 suffix: that's the --dist=loadgroup key, so all three format-variants are pinned to a single xdist worker and run serially → ~1338s (~22 min) on one core. The other 15 cores finish the entire rest of the suite and then sit idle.
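The arithmetic above can be reproduced with a small sketch. It assumes that tests sharing a `--dist=loadgroup` key run serially on one worker, so the wall time can never be shorter than the largest per-group total. The durations are the ones quoted above; the function names and the `other` group label for `test_point_evaluation_uncachable` are hypothetical.

```python
from collections import defaultdict


def group_totals(durations):
    """Sum test durations per loadgroup key (each group runs on one worker)."""
    totals = defaultdict(float)
    for group, seconds in durations:
        totals[group] += seconds
    return dict(totals)


def wall_time_lower_bound(durations, workers):
    """Lower bound on wall time: longest serial group vs. a perfect even split."""
    totals = group_totals(durations)
    longest_group = max(totals.values())
    perfect_split = sum(totals.values()) / workers
    return max(longest_group, perfect_split)


durations = [
    ("t8n-cache-bc5796d4", 465.28),  # blockchain_test
    ("t8n-cache-bc5796d4", 464.94),  # blockchain_test_engine_x
    ("t8n-cache-bc5796d4", 408.27),  # blockchain_test_engine
    ("other", 12.29),                # hypothetical group for the next slowest test
]

bound = wall_time_lower_bound(durations, workers=16)
print(f"{bound:.2f}s (~{bound / 60:.0f} min)")  # 1338.49s (~22 min)
```

Because the bound is set by the single pinned group rather than by the even split, adding cores cannot shorten the run; only splitting or shrinking that group can.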

We only ignore these test cases so far:

An easy quick win would be to mark this test slow:

chore(tests-benchmark, ci): mark test_keccak_max_permutations as slow #3058

But tbh, I think there's an easy optimization to be had here that would help releases; will work on that. Perhaps it will even supersede #3058.

danceratopz · 2026-07-01T02:57:57Z

#3057 and #3060 have brought the run times down to ~5 min (for now); let's leave this on the GitHub runners. We can always bump later if need be.

fix(ci): add setup-env to benchmark sanity checks for coincurve build

eeec93e

The self-hosted `size-xl-x64` image lacks `build-essential`, `pkg-config`, and `libsecp256k1-dev`, so `coincurve` failed to build from source. Add the `setup-env` action, matching `test.yaml`.

LouisTsai-Csie approved these changes Jun 18, 2026

danceratopz marked this pull request as draft June 18, 2026 12:36

danceratopz added the C-chore (Category: chore), A-test-benchmark (Area: execution_testing.benchmark and tests/benchmark), and A-ci (Area: Continuous Integration) labels Jun 18, 2026

danceratopz added 2 commits June 18, 2026 15:04

chore(ci): enable uv cache for benchmark jobs

928293b

Drop `enable-cache: "false"` so the benchmark `setup-uv` steps use the action default, caching uv's resolved dependencies and the from-source `coincurve` build, matching `test.yaml`.

danceratopz mentioned this pull request Jun 29, 2026

chore(ci): speed up bench-gas by filling only the blockchain_test format #3057

Merged

4 tasks

danceratopz closed this Jul 1, 2026

danceratopz wants to merge 4 commits into ethereum:forks/amsterdam from danceratopz:speed-up-gas-bench
