Linux Perf Notes
2024-08-31
First, RTM.
Brendan Gregg’s perf one-liners. Reread these every time. What you want is probably here. You should browse the rest of his site as well.
Please, read the manpages.
The perf man pages could be more thorough and some commands are not well-documented (looking at you, perf diff), but they are invaluable resources.
Visibility
The key to performance is understanding what your application is doing.
Think about this in two ways: first from the top down, then from the bottom up.
Top-Down
Bottom-Up
perf is ideal for this.
This script watches most of the events I care about and generates all the reports in one place.
set -x
events="cycles:u,instructions,user_time,cache-misses,branch-misses,task-clock"
freq=99 # sampling frequency
app=$PWD/a.out
config="$*"
name=$(echo "$*" | sed -e 's/ /_/g')
ulimit -Ss unlimited
mkdir -p "$name"   # quoted: flags may contain characters the shell would expand
pushd "$name"
perf record \
--output perf.data \
--call-graph fp \
-F $freq -e "$events" \
-- taskset 0x2 $app >/dev/null
perf report \
--stdio -G \
--inline --itrace=i \
> perf.report
perf stat record \
-o perf-stat.data \
-e "$events" \
-- taskset 0x2 $app >/dev/null
# --stdio much preferred to --stdio2
perf annotate -i perf.data --stdio > perf.annotate
popd
I like to create separate directories for all the data on a per-flag basis because I’m trying lots of different flags when investigating a performance change. This way, each time I want to try another combination of flags, my history is preserved in its own directory and I don’t have to wait to look at any reports:
# whatever directory was created by the above script
d="flags"
perf report -i $d/perf.data
perf stat report -i $d/perf-stat.data
$PAGER $d/perf.annotate
$PAGER $d/perf.report
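Because the directory name is derived from the flags, it helps to know exactly which name a given flag set produces. A small sketch, assuming a POSIX-style shell: flags_to_dir is a hypothetical helper (not part of the script above) that mirrors its sed substitution. It uses printf instead of echo on the assumption that a lone flag such as -n or -e might show up one day, which echo would swallow in bash.

```shell
# flags_to_dir is a hypothetical helper mirroring the script's
#   name=$(echo "$*" | sed -e 's/ /_/g')
# line: join all arguments with spaces, then turn spaces into underscores.
flags_to_dir() {
    printf '%s\n' "$*" | sed -e 's/ /_/g'
}

# Show the report directory each flag set would end up in.
# $flags is left unquoted on purpose so it splits into separate arguments,
# the same way the flags would arrive on the script's command line.
for flags in "-O0" "-O3" "-O3 -march=native"; do
    printf '%s -> %s\n' "$flags" "$(flags_to_dir $flags)"
done
```

The result can be assigned straight to d, for example d=$(flags_to_dir -O3 -march=native), before running the report commands.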
Build with -fno-omit-frame-pointer so perf can give you reasonable traces.
Debug info (perf record --call-graph=dwarf) works okay, but you’ll end up with massive perf output files that take forever to load into perf report and other tools.
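The two unwinding modes differ only in the --call-graph argument, so it is cheap to record the same run both ways and compare file sizes before committing to one. A minimal sketch: record_cmd is a hypothetical helper that only prints the command line (pipe its output to sh to actually run it), and the perf.fp.data and perf.dwarf.data names are illustrative, not from the script above. The 99 Hz frequency is the one the script uses.

```shell
# record_cmd is a hypothetical helper: it prints, but does not run, a
# perf record command line for the given unwinding mode (fp or dwarf).
record_cmd() {
    mode=$1
    shift
    echo "perf record --call-graph $mode -F 99 -o perf.$mode.data -- $*"
}

record_cmd fp ./a.out
record_cmd dwarf ./a.out

# After running both for real, compare how large the outputs are:
#   ls -lh perf.fp.data perf.dwarf.data
```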
Why is my app slower when I X?
I’ve seen next to no one mention perf diff for looking at differences between two profiles.
I’ve found it invaluable when comparing performance of the same app built differently or with different compilers.
make FLAGS="-O0"
perf record ...
make FLAGS="-O3"
perf record ...
perf diff
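The bare perf diff works here because perf record renames an existing perf.data to perf.data.old, and perf diff compares perf.data.old against perf.data when given no arguments. Naming the files explicitly makes the comparison harder to get backwards. A sketch under assumptions: the run wrapper, the profile_build helper, the perf.O0.data and perf.O3.data file names, and the ./a.out target are all hypothetical. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = print and execute

# run echoes a command and, unless in dry-run mode, executes it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# profile_build: $1 = build flags, $2 = perf output file (both illustrative).
profile_build() {
    run make FLAGS="$1"
    run perf record -o "$2" -- ./a.out
}

profile_build "-O0" perf.O0.data
profile_build "-O3" perf.O3.data

# The first file is the baseline, the second is compared against it.
run perf diff perf.O0.data perf.O3.data
```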