ShazamKit Tutorial for iOS: Getting Started
Learn how to use ShazamKit to find information about specific audio recordings by matching a segment of that audio against a reference catalog of audio signatures. By Saleh Albuga.
Contents
- Getting Started
- Understanding Shazam’s Matching Mechanism
- Shazam Signatures
- Shazam Catalogs
- Matching Music Against Shazam’s Catalog
- Exploring ShazamKit Sessions
- Displaying the Matched Song
- Testing The App
- Working With Custom Catalogs
- Shazam Signature Files
- Creating a Custom Catalog
- Matching Audio Against a Custom Catalog
- Synchronizing App Content With Audio
- Implementing the Annotations
- Displaying the Synchronized Annotations
- Testing the App
- Where to Go From Here?
You’ve probably heard a song you liked in a restaurant and wanted to know its name and artist. In this situation, the first thing that comes to mind is Shazam.
You simply open Shazam, tap recognize and voilà! The song info is right on your phone.
Apple acquired Shazam in 2018. With the release of Xcode 13 and iOS 15, Apple introduced ShazamKit, a framework you can use to add audio recognition experiences to your apps. Whether you want to show users what song is playing or match audio from a track or video you created, ShazamKit has you covered.
In this tutorial, you’ll:
- Understand Shazam’s recognition mechanism.
- Create DevCompanion, a simple Shazam clone that matches popular, published music and songs.
- Match custom audio from a video.
- Change the app content depending on the video playing position.
For this tutorial, you should be familiar with the Shazam app or matching music with Siri. Don’t worry if you’re not. Just play a song on your laptop and ask Siri, “What’s this song?” or download the Shazam app.
Getting Started
Download the starter project by clicking Download Materials at the top or bottom of the tutorial. Open the project, then build and run.
DevCompanion has two views:
- What’s playing?: Where users can match popular music, just like Shazam.
- Video Content: Where users can see annotations and additional content while watching a SwiftUI video course here on raywenderlich.com.
Open MatchingHelper.swift and take a look at the code. It’s an empty helper class where you’ll write ShazamKit recognition code.
Don’t worry about the rest of the files for now. You’ll see them later in the tutorial when you create a custom audio experience. For now, you’ll learn more about how Shazam recognizes and matches audio.
You’ll also need an Apple Developer account in order to configure an App ID with the ShazamKit App Service.
Understanding Shazam’s Matching Mechanism
Before writing code and using the ShazamKit API, it’s essential to understand how Shazam works behind the scenes. This technology is exciting!
When you use Shazam, you tap the big recognition button, Tap to Shazam, while a song is playing. The app listens for a couple of seconds and then displays the song information if it finds a match. You can match any part of a song.
Here’s what happens under the hood:
- The app starts using the microphone to record a stream with a predefined buffer size.
- The Shazam library, now called ShazamKit, generates a signature from the audio buffer the app just recorded.
- Then, ShazamKit sends a query request with this audio signature to the Shazam API. The Shazam service matches the signature against reference signatures of popular music in the Shazam Catalog.
- If there’s a match, the API returns the metadata of the track to ShazamKit.
- ShazamKit calls the appropriate delegate method, passing the metadata (see the sketch after this list).
- Beyond this point, it’s up to the app logic to display the result with the track information.
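To make steps 5 and 6 concrete, here’s a minimal sketch of what the delegate side of that flow looks like. The class name MatchListener is purely illustrative and isn’t part of the starter project; later in this tutorial you’ll implement these callbacks on MatchingHelper itself.
import ShazamKit

// Illustrative sketch of the delegate callbacks ShazamKit invokes after it
// queries the Shazam service. The class name is hypothetical; the tutorial
// implements these methods on MatchingHelper later.
class MatchListener: NSObject, SHSessionDelegate {
  func session(_ session: SHSession, didFind match: SHMatch) {
    // A match was found: the first media item carries the track's metadata.
    print("Matched:", match.mediaItems.first?.title ?? "Unknown title")
  }

  func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
    // No match in the catalog, or the request failed.
    print("No match:", error?.localizedDescription ?? "not in catalog")
  }
}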
Next, you’ll learn more about Shazam signatures and catalogs.
Shazam Signatures
Signatures are a fundamental part of the identification process. A signature is a lossy or simplified version of the song that’s easier to process and analyze. Shazam starts creating signatures by generating the spectrogram of the recorded part, then extracting and identifying the highs or the loudest parts.
A signature can’t be reversed back into the original audio, which protects the privacy of the recording.
During the identification process, Shazam matches query signatures sent by apps against reference signatures. A reference signature is a signature generated from the whole song or track.
Using signatures for identification, rather than comparing the recorded audio as-is, has many benefits. For example, Shazam signatures prevent most background noise from affecting the matching process, ensuring matches even in noisy conditions.
Signatures are also easier to share, store and index as they have a much smaller footprint than the original audio.
You can learn more about Shazam’s algorithm in this research paper by the founder of Shazam, Avery Wang.
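For a concrete sense of how a reference signature is produced from a whole track, here’s a rough sketch using SHSignatureGenerator. It assumes you have a local audio file URL; the function name and error handling are illustrative and not part of the starter project.
import AVFAudio
import ShazamKit

// Rough sketch: build a reference signature from an entire audio file.
// The function name is illustrative only.
func makeReferenceSignature(from url: URL) throws -> SHSignature {
  let file = try AVAudioFile(forReading: url)
  guard let buffer = AVAudioPCMBuffer(
    pcmFormat: file.processingFormat,
    frameCapacity: AVAudioFrameCount(file.length)
  ) else {
    throw CocoaError(.fileReadUnknown)
  }
  // Read the whole track into memory and feed it to the signature generator.
  try file.read(into: buffer)
  let generator = SHSignatureGenerator()
  try generator.append(buffer, at: nil)
  return generator.signature()
}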
Next, you’ll explore Shazam catalogs.
Shazam Catalogs
As mentioned earlier, Shazam matches signatures against reference signatures. It stores reference signatures and their metadata in catalogs. A signature’s metadata has information about the song, like its name, artist and artwork.
The Shazam Catalog has almost all popular songs’ reference signatures and metadata. You can also create a custom catalog locally in an app and store reference signatures and metadata for your audio tracks. You’ll create custom catalogs later in this tutorial.
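As a preview of what’s coming, here’s a minimal sketch of building a custom catalog in code. It assumes you already have a reference SHSignature, for instance one produced as in the earlier sketch; the metadata values are placeholders, not from the tutorial’s materials.
import ShazamKit

// Minimal sketch: wrap a reference signature and its metadata in a custom catalog.
// The title and artist below are placeholder values.
func makeCustomCatalog(with referenceSignature: SHSignature) throws -> SHCustomCatalog {
  let catalog = SHCustomCatalog()
  // The media item is the metadata ShazamKit returns when this signature matches.
  let mediaItem = SHMediaItem(properties: [
    .title: "My Video Course Episode",
    .artist: "My Channel"
  ])
  try catalog.addReferenceSignature(referenceSignature, representing: [mediaItem])
  return catalog
}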
Enough theory for now. Next, you’ll learn how to make the app identify popular music.
Matching Music Against Shazam’s Catalog
Time to implement the app’s first feature, a simplified Shazam clone. Open MatchingHelper.swift and look at the code:
import AVFAudio
import Foundation
import ShazamKit
class MatchingHelper: NSObject {
private var session: SHSession?
private let audioEngine = AVAudioEngine()
private var matchHandler: ((SHMatchedMediaItem?, Error?) -> Void)?
init(matchHandler handler: ((SHMatchedMediaItem?, Error?) -> Void)?) {
matchHandler = handler
}
}
It’s a helper class that controls the microphone and uses ShazamKit to identify audio. At the top, you can see the code imports ShazamKit along with AVFAudio. You’ll need AVFAudio to use the microphone and capture audio.
MatchingHelper also subclasses NSObject, since that’s required by any class that conforms to SHSessionDelegate.
Take a look at MatchingHelper’s properties:
- session: The ShazamKit session you’ll use to communicate with the Shazam service.
- audioEngine: An AVAudioEngine instance you’ll use to capture audio from the microphone.
- matchHandler: A handler block the app views will implement. It’s called when the identification process finishes.
The initializer makes sure matchHandler is set when you create an instance of the class.
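To see how a view might use this initializer, here’s a hypothetical usage sketch. The matcher constant and the printed strings are illustrative; the tutorial wires up the real handler in its SwiftUI views later.
// Hypothetical usage sketch: a view creates a MatchingHelper and passes a
// closure that runs when the identification process finishes.
let matcher = MatchingHelper { mediaItem, error in
  if let error = error {
    print("Match failed: \(error.localizedDescription)")
    return
  }
  // SHMatchedMediaItem exposes metadata such as title and artist.
  print("Matched \(mediaItem?.title ?? "Unknown") by \(mediaItem?.artist ?? "Unknown")")
}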
Add the following method below the initializer:
func match(catalog: SHCustomCatalog? = nil) throws {
// 1. Instantiate SHSession
if let catalog = catalog {
session = SHSession(catalog: catalog)
} else {
session = SHSession()
}
// 2. Set SHSession delegate
session?.delegate = self
// 3. Prepare to capture audio
let audioFormat = AVAudioFormat(
standardFormatWithSampleRate:
audioEngine.inputNode.outputFormat(forBus: 0).sampleRate,
channels: 1)
audioEngine.inputNode.installTap(
onBus: 0,
bufferSize: 2048,
format: audioFormat
) { [weak session] buffer, audioTime in
// callback with the captured audio buffer
session?.matchStreamingBuffer(buffer, at: audioTime)
}
// 4. Start capturing audio using AVAudioEngine
try AVAudioSession.sharedInstance().setCategory(.record)
AVAudioSession.sharedInstance()
.requestRecordPermission { [weak self] success in
guard
success,
let self = self
else { return }
try? self.audioEngine.start()
}
}
match(catalog:) is the method the rest of the app’s code will use to identify audio with ShazamKit. It takes one optional parameter of type SHCustomCatalog if you want to match against a custom catalog.
Take a look at each step:
- First, you create an SHSession and pass a catalog to it if you use a custom catalog. SHSession defaults to the Shazam Catalog if you don’t provide a catalog, which will work for the first part of the app.
- You set the SHSession delegate, which you’ll implement in a moment.
- You call AVAudioEngine’s AVAudioNode.installTap(onBus:bufferSize:format:block:), a method that prepares the audio input node. In the callback, which is passed the captured audio buffer, you call SHSession.matchStreamingBuffer(_:at:). This converts the audio in the buffer to a Shazam signature and matches it against the reference signatures in the selected catalog.
- You set the AVAudioSession category, or mode, to recording. Then, you request microphone recording permission by calling AVAudioSession’s requestRecordPermission(_:) to ask the user for microphone permission the first time the app runs. Finally, you start recording by calling AVAudioEngine.start().
matchStreamingBuffer(_:at:) handles capturing audio and passing it to ShazamKit. Alternatively, you can use SHSignatureGenerator to generate a signature object and pass it to SHSession’s match(_:) method. However, matchStreamingBuffer(_:at:) is suitable for contiguous audio and therefore fits your use case.
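If you ever need that non-streaming route, here’s a rough sketch of the alternative, assuming you already have a prerecorded AVAudioPCMBuffer; it isn’t part of this tutorial’s flow, and the function name is illustrative.
import AVFAudio
import ShazamKit

// Rough sketch of the SHSignatureGenerator alternative: build a signature
// from a prerecorded buffer and hand it to the session for matching.
func match(prerecorded buffer: AVAudioPCMBuffer, using session: SHSession) throws {
  let generator = SHSignatureGenerator()
  try generator.append(buffer, at: nil)
  // The delegate receives the result, just as with matchStreamingBuffer(_:at:).
  session.match(generator.signature())
}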
Next, you’ll implement the Shazam Session delegate.