I ran into this issue with tokio::task::spawn_blocking, which executes the closure in a thread pool, but simply spawning a new std thread seems to reproduce it.
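fn main() {
    // Run the hot loop on a spawned thread rather than on the main thread.
    std::thread::spawn(|| {
        busy_func();
    })
    .join()
    .unwrap()
}

fn busy_func() {
    // black_box keeps the optimizer from removing the loop.
    for i in 0..u32::MAX {
        std::hint::black_box(i);
    }
}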
I expected busy_func to show up as the main time cost, but cargo flamegraph shows nothing relevant and misleadingly reports that _start accounts for most of the time.
Using the captured perf.data, running perf report shows the hot function correctly:
Samples: 984 of event 'cpu/cycles/Pu', Event count (approx.): 4306731901
Children Self Command Shared Object Symbol
100.00% 100.00% main main [.] std::sys::backtrace::__rust_begin_short_backtrace::<main::main::{closure#0}, ()>
0.00% 0.00% main ld-linux-x86-64.so.2 [.] _start
0.00% 0.00% main ld-linux-x86-64.so.2 [.] _dl_start
0.00% 0.00% main ld-linux-x86-64.so.2 [.] _dl_lookup_symbol_x
0.00% 0.00% main ld-linux-x86-64.so.2 [.] _dl_sysdep_start
0.00% 0.00% main ld-linux-x86-64.so.2 [.] dl_main
The first function is the thread entry point with busy_func inlined. Clicking into it shows:
Samples: 984 of event 'cpu/cycles/Pu', 997 Hz, Event count (approx.): 4306731901
std::sys::backtrace::__rust_begin_short_backtrace::<main::main::{closure#0}, ()> [redacted]/target/release/main [Percent: local period]
Percent │ xor %eax,%eax
│ lea -0x4(%rsp),%rcx
│ nop
│10: mov %eax,-0x4(%rsp)
73.76 │ inc %eax
│ cmp $0xffffffff,%eax
26.24 │ ↑ jne 10
│ ← ret
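Because busy_func is inlined into the thread entry point, its name never appears as a separate symbol in the output above. For anyone reproducing this, here is a sketch of a variant that should keep it as its own function. The #[inline(never)] attribute, the iterations parameter, the return value, and the run_on_thread helper are my additions for illustration and are not part of the original reproducer. I have not checked whether this changes what cargo flamegraph shows; it should only make the symbol easier to spot in perf report.

```rust
use std::thread;

// Assumption: #[inline(never)] is added only so that busy_func stays a
// separate symbol; the original reproducer does not have it.
#[inline(never)]
fn busy_func(iterations: u32) -> u32 {
    let mut last = 0;
    for i in 0..iterations {
        // black_box keeps the optimizer from removing the loop.
        last = std::hint::black_box(i);
    }
    last
}

// Hypothetical helper: runs busy_func on a freshly spawned std thread,
// as in the original reproducer, and returns its result.
fn run_on_thread(iterations: u32) -> u32 {
    thread::spawn(move || busy_func(iterations))
        .join()
        .unwrap()
}

fn main() {
    // Same iteration count as the original loop.
    let last = run_on_thread(u32::MAX);
    println!("last value: {last}");
}
```

Passing the iteration count as a parameter also makes it easy to shorten the run while checking that samples are attributed to the spawned thread.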
More information:
Linux 6.18.24
perf 7.0.1
rustc 1.95.0 (59807616e 2026-04-14)
flamegraph-flamegraph 0.6.12
I'm using LLD and already have the -Clink-arg=-Wl,--no-rosegment argument set.