chore(ci): speed up bench-gas by filling only the blockchain_test format#3057
chore(ci): speed up bench-gas by filling only the blockchain_test format#3057danceratopz wants to merge 1 commit into
blockchain_test format#3057Conversation
The bench-gas smoke check filled all fixture formats via `--generate-all-formats`, but `json_loader` only replays `blockchain_test` against EELS, so the `engine` and `engine_x` fixtures were filled but never verified, spending roughly two-thirds of the fill for no coverage. Restructure the recipe into three steps: Generate the pre-alloc groups (which keeps the `BlockchainEngineX` code path smoke-tested), fill only the `blockchain_test` format, then verify against EELS. Add `--no-dev --group test` to the PyPy verify so its environment builds without the dev Rust crates, matching the `fill-pypy` recipe. Addresses ethereum#2673.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #3057 +/- ##
================================================
Coverage 93.22% 93.22%
================================================
Files 624 624
Lines 36926 36926
Branches 3377 3377
================================================
Hits 34424 34424
Misses 1708 1708
Partials 794 794
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
This cuts CI run time from ~25-26 min to ~12 min, based on the run here (recent run without this change: https://github.com/ethereum/execution-specs/actions/runs/28234305627/job/83645530740). Not filling the |
blockchain_test format
spencer-tb
left a comment
There was a problem hiding this comment.
Happy with this overall! Thanks for the quick wins.
| bench-gas *args: | ||
| @mkdir -p "{{ output_dir }}/bench-gas/tmp" "{{ output_dir }}/bench-gas/logs" | ||
| @echo "==> Step 1/2: Filling benchmark fixtures with configured EVM (EVM_BIN={{ evm_bin }})" | ||
| @echo "==> Step 1/3: Generating pre-alloc groups (smoke-tests the BlockchainEngineX path)" |
There was a problem hiding this comment.
How essential is this phase for CI?
There was a problem hiding this comment.
I believe this checks the most fragile part of engine_x filling by checking the prealloc groups are good. For example, that no hard-coded addresses were inadvertently added to the tests (that could lead to clashes).
| @echo "==> Step 2/3: Filling blockchain_test fixtures with configured EVM (EVM_BIN={{ evm_bin }})" | ||
| uv run fill \ | ||
| --evm-bin="{{ evm_bin }}" \ | ||
| --gas-benchmark-values 1 \ |
There was a problem hiding this comment.
If we can use 0.9 here then I think we might as well?
There was a problem hiding this comment.
It's a small gain and it's not entirely for free (requires PR, but the changeset is not huge):
I was leaning towards skipping this, will re-check it.
🗒️ Description
bench-gas(theBenchmark Gas Valuesleg of theBenchmarkingworkflow) is the longest job in CI. It is a correctness smoke check: It fills thetests/benchmark/computesuite with the configured EVM (geth) and replays the fixtures through EELS, confirming the two implementations agree. It is not the released benchmark artifact, which is built separately from the release matrix in.github/configs/feature.yamland is left untouched here.This PR restructures the recipe into three steps:
fill --generate-pre-alloc-groups: Generate the pre-allocation groups. This keeps theBlockchainEngineXcode path smoke-tested, since pre-alloc generation is the part unique to the all-formats run and where its risk is concentrated.fill -m "blockchain_test and (not derived_test) and (not slow)": Fill only theblockchain_testformat, the one EELS consumes.json_loader. Added--no-dev --group testso the PyPy environment builds without the dev Rust crates, matching thefill-pypyrecipe.🔗 Related Issues or PRs
Addresses #2673. Alternative to #2693 (accepting float
--gas-benchmark-values): Lowering the gas value cannot help, since it is floored at roughly 0.9M by the most expensive single benchmark op, and fill time is dominated by per-format generation rather than gas magnitude. Also an alternative to #3012 (running on a larger runner), which did not improve the time because the two-phase fill does not scale with cores.✅ Checklist
just statictype(scope):.