
[ISSUE #10521] Use madvise(MADV_RANDOM) to disable kernel read-ahead during correctMinOffset binary search #10523

Merged
lollipopjin merged 3 commits into apache:develop from lizhimins:fix/cq-read-pulse-madvise
Jun 17, 2026

Conversation

@lizhimins

Member

What is the purpose of the change

close #10521

ConsumeQueue.correctMinOffset performs a binary search over mmap'd files, which is a random access pattern. The Linux kernel's default read-ahead window (read_ahead_kb) on NVMe devices is aggressively large, so each page fault during the binary search pulls in far more data than is actually needed, producing periodic disk read pulses.

On cloud disks where read and write bandwidth share a single quota, these read pulses eat into the bandwidth available to CommitLog writes and cause periodic send-RT spikes.
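To make the read-ahead cost concrete, the sketch below is a simplified model, not a measurement. It assumes a cold page cache, 4 KiB pages, and that a fault on an uncached mmap page reads a window of pages roughly centred on the faulting page (the kernel's mmap "read-around"), while MADV_RANDOM reduces that to the single faulting page. The file size (300,000 units of 20 bytes) and the 128 KiB window are illustrative assumptions; substitute your device's /sys/block/<dev>/queue/read_ahead_kb. The class and method names are hypothetical, and the model ignores asynchronous read-ahead and the kernel's other fault heuristics.

```java
import java.util.HashSet;
import java.util.Set;

public class ReadAroundModel {
    static final int PAGE_SIZE = 4096; // assumed page size
    static final int UNIT_SIZE = 20;   // ConsumeQueue unit: 8-byte offset + 4-byte size + 8-byte tags code

    /**
     * Bytes read from disk while binary-searching for targetIndex among `entries` units.
     * Each fault on an uncached page reads `windowPages` pages starting half a window
     * before the faulting page (clipped to the file); windowPages = 1 models MADV_RANDOM.
     * Only the page holding the start of each probed unit is counted.
     */
    static long bytesRead(int entries, int targetIndex, int windowPages) {
        long filePages = ((long) entries * UNIT_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
        Set<Long> cached = new HashSet<>();
        long bytes = 0;
        int lo = 0;
        int hi = entries - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long page = (long) mid * UNIT_SIZE / PAGE_SIZE;
            if (!cached.contains(page)) {
                // Page fault: pull in the whole read-around window.
                long start = Math.max(0, page - windowPages / 2);
                long end = Math.min(filePages, start + windowPages);
                for (long p = start; p < end; p++) {
                    if (cached.add(p)) {
                        bytes += PAGE_SIZE;
                    }
                }
            }
            if (mid < targetIndex) {
                lo = mid + 1;
            } else if (mid > targetIndex) {
                hi = mid - 1;
            } else {
                break;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        int entries = 300_000;                           // assumed: one full ConsumeQueue file
        int target = 123_456;                            // arbitrary position being searched for
        int readAroundPages = 128 * 1024 / PAGE_SIZE;    // assumed read_ahead_kb = 128
        System.out.println("MADV_RANDOM: " + bytesRead(entries, target, 1) + " bytes");
        System.out.println("read-around: " + bytesRead(entries, target, readAroundPages) + " bytes");
    }
}
```

A binary search over 300,000 units makes at most 19 probes, so with MADV_RANDOM the model never reads more than 19 pages, whereas a single read-around fault already reads the full window; a larger read_ahead_kb scales that gap up further.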

Brief changelog

  • Call madvise(MADV_RANDOM) before the binary search to disable read-ahead, and restore madvise(MADV_NORMAL) in the finally block
  • Add the config switch correctMinOffsetMadviseEnable (default: off)
  • Skip the call on Windows, where madvise is not available
  • Cache the platform check in a static final field to avoid re-evaluating it on every call
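The changelog items combine into a small guard pattern. The sketch below is a minimal illustration under stated assumptions, not the PR's actual code: `withRandomAccess` is a hypothetical helper, the `enabled` argument stands in for correctMinOffsetMadviseEnable, `isLinux` stands in for the cached IS_LINUX flag, and the `madvise` function is a hypothetical stand-in for the native call on the mapped region (the review mentions LibC's `madvise(Pointer, NativeLong, int)`), assumed to return 0 on success. The advice values are the Linux constants from madvise(2).

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

public class MadviseGuard {
    // Linux advice values from madvise(2).
    static final int MADV_NORMAL = 0;
    static final int MADV_RANDOM = 1;

    /**
     * Runs `search` with read-ahead disabled when the switch is on and the platform
     * supports madvise. `madvise` is a hypothetical stand-in for the native call on the
     * mapped region; it returns 0 on success and nonzero on failure.
     */
    static <T> T withRandomAccess(boolean enabled, boolean isLinux,
                                  IntUnaryOperator madvise, Supplier<T> search) {
        boolean applied = false;
        if (enabled && isLinux) {
            // Applied before the try block: if this fails, there is nothing to restore.
            applied = madvise.applyAsInt(MADV_RANDOM) == 0;
        }
        try {
            return search.get();
        } finally {
            if (applied) {
                // Restore the default access pattern even if the search throws.
                madvise.applyAsInt(MADV_NORMAL);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder calls = new StringBuilder();
        IntUnaryOperator recorder = advice -> {
            calls.append(advice).append(' ');
            return 0;
        };
        int result = withRandomAccess(true, true, recorder, () -> 42);
        System.out.println("result=" + result + " advice calls=" + calls.toString().trim());
    }
}
```

Running `main` prints `result=42 advice calls=1 0`: MADV_RANDOM is applied once before the search and MADV_NORMAL once afterwards. Tracking `applied` means a failed setup call is never followed by a spurious restore.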

Verifying this change

  • Added unit tests covering a large dataset (5000 entries), a small dataset (10 entries), and an empty queue
  • Production validation: send p99 latency stabilized at ~4 ms with zero pulses over 60 minutes (previously 7 pulses per hour, with p99 spikes up to 26 ms)
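The search whose page faults this change targets can be pictured as a lower-bound binary search over fixed-size ConsumeQueue units. The exact correctMinOffset logic is not shown on this page, so the sketch below is an assumption: it finds the first 20-byte unit (8-byte CommitLog offset, 4-byte size, 8-byte tags code) whose CommitLog offset is at least a given minimum. A heap ByteBuffer stands in for the mmap'd file, where each probe could fault in a page. The method names are hypothetical, and the dataset sizes simply mirror the scenarios listed above rather than reproducing the PR's tests.

```java
import java.nio.ByteBuffer;

public class CqBinarySearchSketch {
    static final int UNIT_SIZE = 20; // 8-byte CommitLog offset + 4-byte size + 8-byte tags code

    /** Index of the first unit whose CommitLog offset is >= minPhyOffset, or `count` if none. */
    static int firstUnitAtOrAfter(ByteBuffer units, int count, long minPhyOffset) {
        int lo = 0;
        int hi = count; // half-open search range [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            // Absolute read of the unit's first 8 bytes; on an mmap'd file this may fault a page.
            long phyOffset = units.getLong(mid * UNIT_SIZE);
            if (phyOffset < minPhyOffset) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Builds `count` units whose CommitLog offsets are 0, step, 2 * step, ... */
    static ByteBuffer buildUnits(int count, long step) {
        ByteBuffer buf = ByteBuffer.allocate(count * UNIT_SIZE);
        for (int i = 0; i < count; i++) {
            buf.putLong(i * step); // CommitLog offset
            buf.putInt(100);       // message size (arbitrary)
            buf.putLong(0L);       // tags code (arbitrary)
        }
        return buf;
    }

    public static void main(String[] args) {
        for (int count : new int[] {5000, 10, 0}) {
            ByteBuffer units = buildUnits(count, 100);
            System.out.println(count + " units -> first valid index " + firstUnitAtOrAfter(units, count, 250));
        }
    }
}
```

Each iteration touches one unit far from the previous one until the range narrows, which is why kernel read-around around every fault is mostly wasted I/O for this access pattern.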

…ahead during correctMinOffset binary search

correctMinOffset performs binary search on mmap'd ConsumeQueue files (random access).
The kernel's default read-ahead window is aggressively large on NVMe devices, causing
each page fault to pull in far more data than needed. On cloud disks where read/write
bandwidth share a single quota, these read pulses squeeze CommitLog writes and cause
periodic send-RT spikes.

Use madvise(MADV_RANDOM) before the binary search to disable read-ahead, and
restore MADV_NORMAL in the finally block afterwards. Controlled by the config switch
correctMinOffsetMadviseEnable (default: off). Skipped on Windows, where madvise
is not available.

@codecov-commenter

codecov-commenter commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 50.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.08%. Comparing base (226e24f) to head (ac66627).
⚠️ Report is 4 commits behind head on develop.

Files with missing lines | Patch % | Lines
...n/java/org/apache/rocketmq/store/ConsumeQueue.java | 35.71% | 4 Missing and 5 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10523      +/-   ##
=============================================
- Coverage      48.18%   48.08%   -0.10%     
+ Complexity     13394    13369      -25     
=============================================
  Files           1377     1377              
  Lines         100730   100753      +23     
  Branches       13012    13019       +7     
=============================================
- Hits           48536    48449      -87     
- Misses         46264    46345      +81     
- Partials        5930     5959      +29     


@RockteMQ-AI RockteMQ-AI left a comment

Contributor


Review by github-manager-bot

Summary

Uses madvise(MADV_RANDOM) to disable kernel read-ahead during ConsumeQueue.correctMinOffset binary search on mmap'd files, addressing periodic disk read pulses that squeeze CommitLog writes on cloud disks.

Findings

  • [Info] ConsumeQueue.java:619–640 — The madvise(MADV_RANDOM) setup is placed before the try block, and the restore (MADV_NORMAL) is in the finally block. This is correct: if the setup itself throws, MADV_RANDOM was never applied, so there is nothing to restore. There is also no gap between a successful madvise call and entering the try block.

  • [Info] ConsumeQueue.java:46 — IS_LINUX = !MixAll.isWindows() is cached as static final — good, avoids repeated platform evaluation on every correctMinOffset call.

  • [Info] MessageStoreConfig.java — New correctMinOffsetMadviseEnable defaults to false. Safe rollout path; operators opt in explicitly.

  • [Info] ConsumeQueueTest.java — Three test scenarios (5000 entries, 10 entries, empty queue) with sequential correction calls verify that MADV_NORMAL is properly restored between invocations. Good coverage.

  • [Info] LibC.java — MADV_RANDOM = 1 and madvise(Pointer, NativeLong, int) already exist in the codebase; no new native bindings needed.

Suggestions

  • Minor: Consider adding a log.debug when madvise(MADV_RANDOM) is successfully applied (not just on failure), to aid production diagnostics when the feature is enabled. Optional.

Verdict

Well-scoped optimization with production validation data (p99 stabilized at ~4ms, zero pulses vs 7/hour before). Config-gated, properly guarded, and well-tested. LGTM.


Automated review by github-manager-bot

Outdated comment thread on store/src/main/java/org/apache/rocketmq/store/ConsumeQueue.java

@lollipopjin lollipopjin left a comment

Contributor


LGTM

@lollipopjin lollipopjin merged commit f941dce into apache:develop Jun 17, 2026
10 checks passed


Development

Successfully merging this pull request may close these issues.

[Enhancement] Use madvise(MADV_RANDOM) to disable kernel read-ahead during correctMinOffset binary search

5 participants