Linux Perf Notes

2024-08-31

First, RTM.

Brendan Gregg’s perf one-liners. Reread these every time. What you want is probably here. You should browse the rest of his site as well.

Please, read the manpages. The perf man pages could be more thorough and some commands are not well-documented (looking at you, perf diff), but they are invaluable resources.

Visibility

The key to performance is understanding what your application is doing.

Think about this in two ways: first from the top-down, then from the bottom up.

Top-Down

Todo

unfinished

Bottom-Up

perf is ideal for this.

This script watches most of the events I care about and generates all the reports in one place.

set -x

events="cycles:u,instructions,user_time,cache-misses,branch-misses,task-clock"
freq=99 # sampling frequency
app=$PWD/a.out
config="$*"
name=$(echo "$*" | sed -e 's/ /_/g')

ulimit -Ss unlimited
# keep each flag combination's data in its own directory
test -d "$name" || mkdir "$name"
pushd "$name"

perf record \
    --output perf.data \
    --call-graph fp \
    -F $freq -e "$events" \
    -- taskset 0x2 $app >/dev/null

perf report \
    --stdio -G \
    --inline --itrace=i \
    > perf.report

perf stat record \
    -o perf-stat.data \
    -e "$events" \
    -- taskset 0x2 $app >/dev/null

# --stdio much preferred to --stdio2
perf annotate -i perf.data --stdio > perf.annotate

popd

I like to create separate directories for all the data on a per-flag basis because I’m trying lots of different flags when investigating a performance change. This way, each time I want to try another combination of flags, my history is preserved in its own directory and I don’t have to wait to look at any reports:

# whatever directory was created by the above script
d="flags"
perf report -i $d/perf.data
perf stat report -i $d/perf-stat.data
$PAGER $d/perf.annotate
$PAGER $d/perf.report
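The directory naming used by the script can be pulled into a small helper so the same name is easy to recompute when revisiting a run. This is only a sketch: `flags_to_dir` is a hypothetical name, not something perf provides, and it assumes the flags are passed exactly as they were to the script.

```shell
# Hypothetical helper (not part of perf): mirror the script's naming rule,
# i.e. join all arguments with spaces, then turn every space into an underscore.
# printf is used instead of echo so flags such as "-n" are not swallowed.
flags_to_dir() {
    printf '%s\n' "$*" | sed -e 's/ /_/g'
}

# Usage sketch (commented out; assumes the run already exists):
#   d="./$(flags_to_dir -O3 -march=native)"
#   perf report -i "$d/perf.data"
```

Prefixing the result with `./` keeps a directory name that starts with a dash from being read as an option.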

Tip

Build with -fno-omit-frame-pointer so perf can give you reasonable traces. Debug info (perf record --call-graph=dwarf) works okay, but you’ll end up with massive perf output files that take forever to load into perf report and other tools.
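One way to make sure the frame-pointer flag never gets forgotten while cycling through optimisation flags is to append it in one place. A minimal sketch, assuming the build takes its flags through a single variable (the `FLAGS` make variable used elsewhere in these notes); `profile_flags` is a made-up helper name.

```shell
# Hypothetical helper: append -fno-omit-frame-pointer to whatever flags
# are under test, so every profiled build keeps usable frame pointers.
profile_flags() {
    flags="-fno-omit-frame-pointer"
    if [ "$#" -gt 0 ]; then
        # "$*" joins the caller's flags with single spaces
        flags="$* $flags"
    fi
    printf '%s\n' "$flags"
}

# Usage sketch (commented out; assumes a Makefile that honours FLAGS):
#   make FLAGS="$(profile_flags -O3)"
#   perf record --call-graph fp -- ./a.out
```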

Why is my app slower when I X?

I’ve seen next to no one mention perf diff for looking at differences between two profiles. I’ve found it invaluable when comparing performance of the same app built differently or with different compilers.

make FLAGS="-O0"
perf record ...
make FLAGS="-O3"
perf record ...
# with no arguments, compares perf.data.old (first run) against perf.data (second run)
perf diff
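When each run lives in its own directory, as with the script earlier in these notes, perf diff can be pointed at the two files explicitly instead of relying on perf.data.old. A rough sketch: `diff_runs` is a hypothetical wrapper, and the directory names in the usage line are placeholders.

```shell
# Hypothetical wrapper: compare the profiles stored in two run directories.
# Usage: diff_runs <baseline-dir> <new-dir>
diff_runs() {
    old="$1/perf.data"
    new="$2/perf.data"
    # Stop at the first missing profile, before perf is invoked at all.
    for f in "$old" "$new"; do
        if [ ! -f "$f" ]; then
            echo "missing profile: $f" >&2
            return 1
        fi
    done
    # perf diff takes the baseline file first, then the file to compare against it
    perf diff "$old" "$new"
}

# Usage sketch (commented out; O0 and O3 are placeholder directory names):
#   diff_runs ./O0 ./O3
```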

Heads up!

perf diff is less useful when name mangling differs between the two builds (e.g. with Fortran apps) because perf can’t match the symbols up.