Vision Tutorial for iOS: Detect Body and Hand Pose
Learn how to detect the number of fingers shown to the camera with help from the Vision framework. By Saeed Taheri.
Contents
Vision Tutorial for iOS: Detect Body and Hand Pose
25 mins
- Getting Started
- Getting Ready for Detection
- Creating the Camera Session
- CameraPreview
- CameraViewController
- CameraView
- Connecting to the Camera Session
- Preparing the Camera
- Detecting Hands
- Request
- Handler and Observation
- Anatomy to the Rescue!
- Detecting Fingertips
- Displaying Fingertips
- Adding Game Logic
- Adding a Success Badge
- Final Step
- More Use Cases
- Where to Go From Here?
Machine learning is everywhere, so it came as no surprise when Apple announced its Core ML frameworks in 2017. Core ML comes with many tools, including Vision, an image analysis framework. Vision analyzes still images to detect faces, read barcodes, track objects and more. Over the years, Apple has added many cool features to this framework, including the Hand and Body Detection APIs introduced in 2020.
In this tutorial, you’ll use these Hand and Body Detection APIs from the Vision framework to bring a touch of magic to a game called StarCount. You’ll count the number of stars falling from the sky using your hands and fingers.
StarCount needs a device with a front-facing camera to function, so you can’t follow along with a simulator.
Finally, it would help if you could prop up your device somewhere; you’ll need both hands to match those high numbers!
Getting Started
Download the starter project using the Download Materials button at the top or bottom of this page. Then, open the starter project in Xcode.
Build and run. Tap Rain in the top left corner and enjoy the scene. Don’t forget to wish on those stars!
The magic of raining stars is in StarAnimatorView.swift. It uses UIKit Dynamics APIs. Feel free to take a look if you’re interested.
The app looks nice, but imagine how much better it would look if it showed live video of you in the background! Vision can’t count your fingers if the phone can’t see them.
Getting Ready for Detection
Vision uses still images for detection. Believe it or not, what you see in the camera viewfinder is essentially a stream of still images. Before you can detect anything, you need to integrate a camera session into the game.
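To make that concrete, here’s a minimal sketch (not part of StarCount) of Vision analyzing a single still image with a hand pose request. The function name and the stillImage parameter are placeholders for illustration only:
import CoreGraphics
import Vision
// A sketch, not StarCount code: run a hand pose request on one still image.
func detectHands(in stillImage: CGImage) throws -> Int {
  let request = VNDetectHumanHandPoseRequest()
  request.maximumHandCount = 2
  let handler = VNImageRequestHandler(cgImage: stillImage, options: [:])
  try handler.perform([request])
  // Each observation corresponds to one detected hand.
  return request.results?.count ?? 0
}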
Creating the Camera Session
To show a camera preview in an app, you use AVCaptureVideoPreviewLayer, a subclass of CALayer. You use this preview layer in conjunction with a capture session.
Since CALayer isn’t a SwiftUI type, you need to create a wrapper to use it in SwiftUI. Fortunately, Apple provides an easy way to do this using UIViewRepresentable and UIViewControllerRepresentable.
As a matter of fact, StarAnimator is a UIViewRepresentable, so you can use StarAnimatorView, a subclass of UIView, in SwiftUI.
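If you haven’t used UIViewRepresentable before, the pattern looks roughly like this. This is a generic, hedged illustration with placeholder names, not the actual StarAnimator source:
import SwiftUI
import UIKit
// Illustration only: wrapping a plain UIKit view for use in SwiftUI.
struct SomeUIKitView: UIViewRepresentable {
  func makeUIView(context: Context) -> UILabel {
    let label = UILabel()
    label.text = "Hello from UIKit"
    return label
  }
  func updateUIView(_ uiView: UILabel, context: Context) {
    // Respond to SwiftUI state changes here if needed.
  }
}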
You’ll create three files in the following section: CameraPreview.swift, CameraViewController.swift and CameraView.swift. Start with CameraPreview.swift.
Create a new file named CameraPreview.swift in the StarCount group and add:
// 1
import UIKit
import AVFoundation

final class CameraPreview: UIView {
  // 2
  override class var layerClass: AnyClass {
    AVCaptureVideoPreviewLayer.self
  }

  // 3
  var previewLayer: AVCaptureVideoPreviewLayer {
    layer as! AVCaptureVideoPreviewLayer
  }
}
Here, you:
- Import UIKit since CameraPreview is a subclass of UIView. You also import AVFoundation since AVCaptureVideoPreviewLayer is part of this module.
- Next, you override the static layerClass. This makes the root layer of this view of type AVCaptureVideoPreviewLayer.
- Then you create a computed property called previewLayer and force cast the root layer of this view to the type you defined in step two. Now you can use this property to access the layer directly when you need to work with it later, as the short check after this list shows.
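To see what the layerClass override buys you, here’s a quick hedged check you could run in a playground once CameraPreview exists. It isn’t part of the app; it just confirms the backing layer’s type and shows the typed access the previewLayer property gives you:
import AVFoundation
import UIKit
// Because layerClass returns AVCaptureVideoPreviewLayer, UIKit backs
// this view with that layer type automatically.
let preview = CameraPreview()
assert(preview.layer is AVCaptureVideoPreviewLayer)
// Typed access without casting at the call site.
preview.previewLayer.videoGravity = .resizeAspectFill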
Next, you’ll create a view controller to manage your CameraPreview.
The camera capture code from AVFoundation is designed to work with UIKit, so to get it working nicely in your SwiftUI app you need to make a view controller and wrap it in UIViewControllerRepresentable.
Create CameraViewController.swift in the StarCount group and add:
import UIKit

final class CameraViewController: UIViewController {
  // 1
  override func loadView() {
    view = CameraPreview()
  }

  // 2
  private var cameraView: CameraPreview { view as! CameraPreview }
}
Here you:
- Override loadView to make the view controller use CameraPreview as its root view.
- Create a computed property called cameraView to access the root view as a CameraPreview. You can safely force cast here because you just assigned an instance of CameraPreview to view in step one.
Now, you’ll make a SwiftUI view to wrap your new view controller, so you can use it in StarCount.
Create CameraView.swift in the StarCount group and add:
import SwiftUI

// 1
struct CameraView: UIViewControllerRepresentable {
  // 2
  func makeUIViewController(context: Context) -> CameraViewController {
    let cvc = CameraViewController()
    return cvc
  }

  // 3
  func updateUIViewController(
    _ uiViewController: CameraViewController,
    context: Context
  ) {
  }
}
This is what’s happening in the code above:
- You create a struct called CameraView, which conforms to UIViewControllerRepresentable. This is a protocol for making SwiftUI View types that wrap UIKit view controllers.
- You implement the first protocol method, makeUIViewController. Here you initialize an instance of CameraViewController and perform any one-time setup.
- updateUIViewController(_:context:) is the other required method of this protocol, where you would make any updates to the view controller based on changes to the SwiftUI data or hierarchy. For this app, you don’t need to do anything here; the sketch after this list shows what such an update might look like.
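Purely as a hedged illustration of that last point, imagine a hypothetical variant of CameraView that carried some SwiftUI state. None of this exists in StarCount; it only shows where SwiftUI-driven updates would go:
import SwiftUI
// Hypothetical variant, not part of StarCount.
struct PausableCameraView: UIViewControllerRepresentable {
  var isPaused: Bool // driven by SwiftUI state

  func makeUIViewController(context: Context) -> CameraViewController {
    CameraViewController()
  }

  func updateUIViewController(
    _ uiViewController: CameraViewController,
    context: Context
  ) {
    // SwiftUI calls this whenever isPaused changes; a real implementation
    // might start or stop the capture session here.
  }
}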
After all this work, it’s time to use CameraView in ContentView.
Open ContentView.swift. Insert CameraView at the beginning of the ZStack in body:
CameraView()
  .edgesIgnoringSafeArea(.all)
Phew! That was a long section. Build and run to see your camera preview.
Huh! All that work and nothing changed! Why? There’s another piece of the puzzle to add before camera previewing works: an AVCaptureSession. You’ll add that next.
Connecting to the Camera Session
The changes you’ll make here seem long but don’t be afraid. They’re mostly boilerplate code.
Open CameraViewController.swift. Add the following after import UIKit:
import AVFoundation
Then, add an instance property of type AVCaptureSession inside the class:
private var cameraFeedSession: AVCaptureSession?
It’s good practice to run the capture session when this view controller appears on screen and stop the session when the view is no longer visible, so add the following:
override func viewDidAppear(_ animated: Bool) {
  super.viewDidAppear(animated)

  do {
    // 1
    if cameraFeedSession == nil {
      // 2
      try setupAVSession()
      // 3
      cameraView.previewLayer.session = cameraFeedSession
      cameraView.previewLayer.videoGravity = .resizeAspectFill
    }

    // 4
    cameraFeedSession?.startRunning()
  } catch {
    print(error.localizedDescription)
  }
}

// 5
override func viewWillDisappear(_ animated: Bool) {
  cameraFeedSession?.stopRunning()
  super.viewWillDisappear(animated)
}

func setupAVSession() throws {
}
Here’s a code breakdown:
- In viewDidAppear(_:), you check to see if you’ve already initialized cameraFeedSession.
- You call setupAVSession(), which is empty for now, but you’ll implement it shortly.
- Then, you assign the session to the session property of the previewLayer of cameraView and set the resize mode of the video.
- Next, you start running the session. This makes the camera feed visible.
- In viewWillDisappear(_:), you turn off the camera feed to preserve battery life and be a good citizen.
Now, you’ll add the missing code to prepare the camera.
Add a new property for the dispatch queue on which Vision will process the camera samples:
private let videoDataOutputQueue = DispatchQueue(
  label: "CameraFeedOutput",
  qos: .userInteractive
)
Add an extension to make the view controller conform to AVCaptureVideoDataOutputSampleBufferDelegate:
extension CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
}
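The extension body stays empty for now; the real delegate code comes in the Detecting Hands section. As a hedged preview only, a sample buffer delegate method generally looks something like the sketch below. It assumes import Vision at the top of the file and isn’t the code you’ll add later:
// Sketch only: the tutorial builds its own version in the Detecting Hands section.
func captureOutput(
  _ output: AVCaptureOutput,
  didOutput sampleBuffer: CMSampleBuffer,
  from connection: AVCaptureConnection
) {
  // Vision can consume the sample buffer from the camera directly.
  let handler = VNImageRequestHandler(
    cmSampleBuffer: sampleBuffer,
    orientation: .up,
    options: [:]
  )
  let request = VNDetectHumanHandPoseRequest()

  do {
    try handler.perform([request])
    // request.results now holds VNHumanHandPoseObservation values, if any.
  } catch {
    print(error.localizedDescription)
  }
}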
With those two things in place, you can now replace the empty setupAVSession():
func setupAVSession() throws {
  // 1
  guard let videoDevice = AVCaptureDevice.default(
    .builtInWideAngleCamera,
    for: .video,
    position: .front)
  else {
    throw AppError.captureSessionSetup(
      reason: "Could not find a front facing camera."
    )
  }

  // 2
  guard
    let deviceInput = try? AVCaptureDeviceInput(device: videoDevice)
  else {
    throw AppError.captureSessionSetup(
      reason: "Could not create video device input."
    )
  }

  // 3
  let session = AVCaptureSession()
  session.beginConfiguration()
  session.sessionPreset = AVCaptureSession.Preset.high

  // 4
  guard session.canAddInput(deviceInput) else {
    throw AppError.captureSessionSetup(
      reason: "Could not add video device input to the session"
    )
  }
  session.addInput(deviceInput)

  // 5
  let dataOutput = AVCaptureVideoDataOutput()
  if session.canAddOutput(dataOutput) {
    session.addOutput(dataOutput)
    dataOutput.alwaysDiscardsLateVideoFrames = true
    dataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
  } else {
    throw AppError.captureSessionSetup(
      reason: "Could not add video data output to the session"
    )
  }

  // 6
  session.commitConfiguration()
  cameraFeedSession = session
}
In the code above you:
- Check if the device has a front-facing camera. If it doesn’t, you throw an error.
- Next, check if you can use the camera to create a capture device input.
- Create a capture session and start configuring it using the high quality preset.
- Then check if the session can integrate the capture device input. If yes, add the input you created in step two to the session. You need an input and an output for your session to work.
- Next, create a data output and add it to the session. The data output will take image samples from the camera feed and deliver them to its sample buffer delegate on the dispatch queue you defined earlier.
- Finally, finish configuring the session and assign it to the property you created before.
Build and run. Now you can see yourself behind the raining stars.
The first time the session runs, iOS asks the user for permission to access the camera. A key-value pair in Info.plist (NSCameraUsageDescription) stores the reason shown in that prompt. It’s already there in the starter project.
With that in place, it’s time to move on to Vision.