Performance-Profiling Swift on Linux: Getting Started
Learn how to profile Server-Side Swift with perf on Linux. You’ll discover the basic principles of profiling, how to view events and call graph traces, and how to perform basic analysis. By kelvin ma.
Contents
Performance-Profiling Swift on Linux: Getting Started
35 mins
- Getting Started
- The Data File
- The Trading Application
- The Supporting Code
- Running the Application
- Measuring Performance
- Profiling With Perf
- Loading Samples With Perf Script
- Getting Task Information
- Getting Timeline Information
- Getting Landmark Information
- Working With Perf Events
- Collecting Call Graph Traces
- Tracing With Frame Pointers
- Viewing Call Graph Traces
- Working With Call Graph Traces
- Configuring Perf for Swift
- Enabling DWARF Metadata
- Demangling Swift Function Names
- Putting It All Together
- Counting Function Names
- Observing Changes in Performance
- Where to Go From Here?
Configuring Perf for Swift
If you scroll through enough of the crypto-bot samples in perf script, you might notice the call graphs are detailed for the C-language portions of the call chain, but relatively sparse for the Swift-language portions. You might see a lot of [unknown] symbols, short call chains and call chains where the only resolved Swift symbols are general-purpose entrypoints such as swift_retain and swift_release.
Enabling DWARF Metadata
You might not see a lot of call chain information because the Swift compiler performs a lot of transformations on the Swift code you write. The machine code coming out of the compiler ends up looking significantly different from the source code that went in. C compilers are comparatively transparent in how they transform C code into machine code, which makes it easier to map points in a call graph to meaningful landmarks in source code.
To address that problem, Swift provides the missing binary-to-source mappings as DWARF metadata. As you have seen, perf record supports loading DWARF metadata through its --call-graph option. Set it to dwarf mode:
perf record --call-graph dwarf -o profile.perf .build/release/crypto-bot
On ARM environments, run:
perf record --call-graph fp -F 999 -o profile.perf .build/release/crypto-bot
If using DWARF mode, you’ll see perf record much more data to profile.perf than it did in frame pointer mode. For long-running applications, perf in DWARF mode might record more data than you can manageably store, in which case you might need to reduce the sampling frequency.
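To get a feel for how quickly DWARF-mode recordings grow, here is a rough back-of-the-envelope sketch. It assumes each sample carries a full stack dump of 8192 bytes, which is the default size documented for perf record's dwarf mode, and it ignores per-sample headers, registers and idle time. The 60-second run time and the estimate_dwarf_bytes helper are hypothetical, chosen only for illustration.

```shell
# Rough estimate of stack-dump bytes recorded for one continuously busy thread.
# Arguments: sampling frequency (Hz), run time (seconds), stack dump size (bytes).
estimate_dwarf_bytes() {
  echo $(( $1 * $2 * $3 ))
}

# 8192 is the documented default stack dump size for --call-graph dwarf.
# The 60-second run time is a made-up value for illustration.
estimate_dwarf_bytes 999 60 8192   # prints 491028480 (roughly 490 MB)
estimate_dwarf_bytes 99 60 8192    # prints 48660480 (roughly 49 MB)
```

Under these assumptions, dropping the frequency by a factor of ten shrinks the recording by about the same factor. You set the frequency with the same -F option used in the frame pointer command, for example -F 99.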
Open profile.perf in perf script again, and scroll down to the crypto-bot samples:
perf script -i profile.perf -F comm,pid,tid,time,event,period,ip,sym,symoff,dso
You’ll see far more detailed information about crypto-bot’s Swift call chain:
crypto-bot 86/86 4678.431018: 2273989 cycles:ppp:
    39eb03 $sSe4fromxs7Decoder_p_tKcfCTj+0xffff006bdedf2003 (/usr/lib/swift...
    27288 $s4JSONAAO10DictionaryV6decode_6forKeyqd__qd__m_xtKSeRd+0xffff54...
    28118 $s4JSONAAO10DictionaryVy_xGs30KeyedDecodingContainerProtocolAAsA...
    27f10 $s4JSONAAO10DictionaryVy_xGs30KeyedDecodingContainerProtocolAAsA...
    ...
    3aa9e $s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF+0xfff...
    3f6b6 $s10crypto_bot4MainO4mainyyKFZTf4d_n+0xffff5446a31ba0a6 (/crypto...
Demangling Swift Function Names
You might recognize some function names in the call graph symbols. They look strange because the Swift compiler mangled them. Demangle them with the swift demangle tool, which is part of the Swift toolchain. For example, to demangle “s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF”, run the following in a terminal:
swift demangle s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF
crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
It’s also possible to demangle symbols programmatically by calling the "swift_demangle" symbol from the Swift runtime. However, that’s out of the scope of this tutorial.
You can pipe the output of another tool through swift demangle, in which case it will automatically recognize all the mangled Swift symbols in the input and replace them with human-readable descriptions. To mass-demangle all the symbols from perf script, run the following command in a terminal:
perf script -i profile.perf -F comm,pid,tid,time,event,period,ip,sym,symoff,dso | swift demangle > profile.txt
The final shell redirection (>) in that command saves the demangled call graphs to a file named profile.txt.
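One quick sanity check on demangled output is to look for tokens that still carry the $s prefix used by mangled Swift symbols. The sketch below is self-contained: the frames function is a hypothetical stand-in for a file such as profile.txt, and list_mangled is an illustrative helper, not part of perf or the Swift toolchain. The pattern is deliberately loose, so treat any hits as candidates to inspect rather than a definitive list.

```shell
# Hypothetical two-line excerpt: one frame still mangled, one already demangled.
frames() {
cat <<'EOF'
3aa9e $s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF
3f6b6 crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
EOF
}

# Print each distinct token that still looks like a mangled Swift symbol.
list_mangled() {
  grep -oE '\$s[0-9A-Za-z_]+' | sort -u
}

frames | list_mangled
```

Against a real file, you would feed list_mangled from profile.txt instead of the frames function.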
Putting It All Together
Sifting through profiling data is a science of its own, and this tutorial can only scratch the surface of it. However, one fast way to get actionable statistics about your application is to use the wc tool to count occurrences of function names you are interested in.
Counting Function Names
First, regenerate a performance profile for crypto-bot, and save it to a file named unoptimized.perf for comparison:
perf record --call-graph dwarf -o unoptimized.perf .build/release/crypto-bot
On ARM and Apple Silicon, run:
perf record --call-graph fp -F 999 -o unoptimized.perf .build/release/crypto-bot
Recall that the sample code contains an @inline(never) function called decode(message:) that wraps its JSON decoding implementation. The @inline(never) attribute prevents the compiler from inlining the function into its callers, so you can be reasonably confident the number of samples containing crypto_bot.decode(message:) in their call graph traces reflects the amount of time spent executing that function.
This isn’t an entirely sound method of measuring the performance of an application. Notably, it doesn’t account for the variability of the sampling period, which can skew the results. But in a pinch, it can be a useful proxy metric.
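One way to reduce that skew is to add up each sample's period instead of counting samples. The following is a minimal sketch of that idea, not the method this tutorial uses. It assumes the sample header has the shape shown earlier (command, pid/tid, time, period, event), that the command name contains no spaces, and that samples are separated by blank lines. The sample_profile data and the weigh_samples helper are hypothetical.

```shell
# Hypothetical, hand-written stand-in for demangled `perf script` output.
sample_profile() {
cat <<'EOF'
crypto-bot 86/86 4678.431018: 2000 cycles:ppp:
  3aa9e crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
  3f6b6 static crypto_bot.Main.main() throws -> ()

crypto-bot 86/86 4678.431519: 500 cycles:ppp:
  3f6b6 static crypto_bot.Main.main() throws -> ()

crypto-bot 86/86 4678.432020: 1500 cycles:ppp:
  3aa9e crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
  3f6b6 static crypto_bot.Main.main() throws -> ()
EOF
}

# Sum the period (4th field of each blank-line-separated sample), both overall
# and for samples whose call chain contains the literal search string.
weigh_samples() {
  awk -v needle="$1" 'BEGIN { RS = "" }
    { total += $4 }
    index($0, needle) { hits += $4 }
    END { printf "%d of %d\n", hits, total }'
}

sample_profile | weigh_samples 'crypto_bot.decode(message:'   # prints "3500 of 4000"
```

Here two of three samples match, but they account for 3500 of the 4000 recorded cycles, so the weighted share (87.5%) differs from the raw sample share (about 67%).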
To count occurrences of crypto_bot.decode(message:, run the following in the terminal:
perf script -i unoptimized.perf -F ip,sym | swift demangle | grep crypto_bot\.decode\(message\: | wc -l
This command contains four piped subcommands:
- perf script: This deserializes the binary unoptimized.perf file. It only loads the ip and sym fields because the others are unnecessary here.
- swift demangle: This allows us to search for demangled function names instead of mangled names. If we knew the mangled name of crypto_bot.decode(message:) ahead of time, this step wouldn’t be necessary.
- grep: This searches for the string crypto_bot.decode(message:, which is distinct enough to not return any false positives.
- wc: This counts the number of lines piped to its input. Because grep prints each match on a separate line, this gives the number of matches grep returned.
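The filter-then-count stages can be tried in isolation on a few lines of text. The sketch below uses hypothetical, hand-written frames in place of real perf script output. It also shows two standard grep options as an alternative: -F treats the search string as literal text, so no backslash escaping is needed, and -c makes grep count matching lines itself.

```shell
# Hypothetical demangled frames standing in for `perf script | swift demangle`.
demangled_frames() {
cat <<'EOF'
3aa9e crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
3f6b6 static crypto_bot.Main.main() throws -> ()
3aa9e crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?
EOF
}

# Same shape as the pipeline above: filter lines, then count them.
# tr strips the padding some wc implementations add around the number.
piped_count=$(demangled_frames | grep -F 'crypto_bot.decode(message:' | wc -l | tr -d '[:space:]')

# Equivalent: let grep count the matching lines directly.
direct_count=$(demangled_frames | grep -cF 'crypto_bot.decode(message:')

echo "$piped_count $direct_count"   # prints "2 2"
```

Both forms count matching lines, not individual occurrences, which is fine here because each call graph frame sits on its own line.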
The unoptimized profile should contain 2,000 to 3,000 instances of crypto_bot.decode(message:. If using frame pointers, you’ll see around 1,000 instances.
Observing Changes in Performance
To demonstrate what successful optimizations might look like when using this method, return to the while loop in Main.main in Sources/crypto-bot/example.swift. Replace the call to decode(message:) with a call to the more-efficient decodeFast(message:) function:
guard let message = decodeFast(message: json) else {
    continue
}
Recompile the application:
swift build -c release
Then, generate a new performance profile named optimized.perf:
perf record --call-graph dwarf -o optimized.perf .build/release/crypto-bot
If running on Apple Silicon or ARM, run:
perf record --call-graph fp -F 999 -o optimized.perf .build/release/crypto-bot
Rerun the command from the last section, this time using the optimized.perf data file and the search string decodeFast(message::
perf script -i optimized.perf -F ip,sym | swift demangle | grep crypto_bot\.decodeFast\(message\: | wc -l
This time, you should see only a few hundred occurrences of decodeFast(message: on x86 machines or tens on ARM, which is compelling evidence that the decodeFast(message:) implementation is faster.
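To put a number on the improvement, you can compare the two counts directly. This is a small arithmetic sketch using hypothetical counts picked from the ranges quoted above, not measured results. The percent_reduction helper is illustrative, and the comparison is only meaningful if both runs processed the same workload with the same sampling settings.

```shell
# Percentage drop from a baseline count to a new count (integer arithmetic).
percent_reduction() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Hypothetical counts: 2500 samples in decode(message:) before,
# 300 samples in decodeFast(message:) after.
percent_reduction 2500 300   # prints 88
```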
Where to Go From Here?
You can download the Dockerfile and sources for crypto-bot by clicking the Download Materials button below this section.
In this tutorial, you’ve learned how to record an application with perf, how to view and interpret the recordings, how to generate call graph traces, and how to configure perf to produce detailed profiles specifically for Swift binaries. You’ve also learned some basic techniques for post-processing and sifting through this data programmatically.
This tutorial should give you enough of an understanding of the fundamentals of performance sampling for you to start applying these techniques to your own projects. The data science of analyzing a performance profile is a huge topic, and much more remains to discover on your own!
To learn more about profiling Swift on Linux, check the Server-Side Swift performance guide. If you’re more interested in profiling Swift on macOS, check our Instruments Tutorial with Swift: Getting Started. I’ve also written about Low-level Swift Optimization Tips on my own blog.
If you have any suggestions, questions or performance-profiling tips you’d like to share, join the discussion below.