Face Detection Tutorial Using the Vision Framework for iOS
In this tutorial, you’ll learn how to use the Vision framework to detect faces and facial features and overlay the results on the camera feed in real time. By Yono Mittlefehldt.
Precise gene editing technology has been around since about 2012. So why don’t we all have super powers yet?!?
And what’s the greatest super power? No. Not flying. That’s far too dangerous.
The correct answer is laser heat vision!
Imagine what you could do with laser heat vision! You could save money on a microwave, easily light any candle in sight and don’t forget the ability to burn your initials into your woodworking projects. How cool would that be?
Well, apparently real life superpowers aren’t here yet, so you’ll have to deal with the next best thing. You’ll have to use your iPhone to give you pretend laser heat vision.
Fortunately, Apple has a framework that can help you out with this plan B.
In this tutorial, you’ll learn how to use the Vision framework to:
- Create requests for face detection and face landmark detection.
- Process these requests.
- Overlay the results on the camera feed to get real-time, visual feedback.
Get ready to super power your brain and your eyes!
Getting Started
Click the Download Materials button at the top or bottom of this tutorial. Open the starter project and explore to your heart’s content.
Currently, the Face Lasers app doesn’t do a whole lot. Well, it does show you your beautiful mug!
There’s also a label at the bottom that reads Face. You may have noticed that if you tap the screen, this label changes to read Lasers.
That’s exciting! Except that there don’t seem to be any lasers. That’s less exciting. Don’t worry — by the end of this tutorial, you’ll be shooting lasers out of your eyes like Super(wo)man!
You’ll also notice some useful Core Graphics extensions. You’ll make use of these throughout the tutorial to simplify your code.
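If you’re curious what those extensions might look like, here’s a minimal sketch based on how they’re used later in this tutorial. The property names cgPoint and cgSize are inferred from that later usage; the starter project’s actual implementations may differ.
import CoreGraphics

// Treat a CGSize as a CGPoint so it can be passed to point-based APIs.
extension CGSize {
  var cgPoint: CGPoint {
    return CGPoint(x: width, y: height)
  }
}

// And back again: treat a CGPoint as a CGSize.
extension CGPoint {
  var cgSize: CGSize {
    return CGSize(width: x, height: y)
  }
}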
Vision Framework Usage Patterns
All Vision framework APIs use three constructs:
- Request: The request defines the type of thing you want to detect and a completion handler that will process the results. This is a subclass of VNRequest.
- Request handler: The request handler performs the request on the provided pixel buffer (think: image). This will be either a VNImageRequestHandler for single, one-off detections or a VNSequenceRequestHandler to process a series of images.
- Results: The results will be attached to the original request and passed to the completion handler defined when creating the request. They are subclasses of VNObservation.
Simple, right?
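To make the pattern concrete, here’s a minimal, hypothetical example of a one-off detection on a single image using a VNImageRequestHandler. The rest of this tutorial uses a VNSequenceRequestHandler on the live camera feed instead, but the request and results work the same way:
import UIKit
import Vision

func detectFaces(in image: UIImage) {
  guard let cgImage = image.cgImage else { return }

  // Request: what to detect, plus a completion handler for the results.
  let request = VNDetectFaceRectanglesRequest { request, error in
    // Results: VNFaceObservation instances attached to the original request.
    guard let observations = request.results as? [VNFaceObservation] else { return }
    print("Found \(observations.count) face(s)")
  }

  // Request handler: performs the request on a single, static image.
  let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
  do {
    try handler.perform([request])
  } catch {
    print(error.localizedDescription)
  }
}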
Writing Your First Face Detector
Open FaceDetectionViewController.swift and add the following property at the top of the class:
var sequenceHandler = VNSequenceRequestHandler()
This defines the request handler you’ll be feeding images to from the camera feed. You’re using a VNSequenceRequestHandler because you’ll perform face detection requests on a series of images, instead of on a single static one.
Now scroll to the bottom of the file where you’ll find an empty captureOutput(_:didOutput:from:) delegate method. Fill it in with the following code:
// 1
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
  return
}

// 2
let detectFaceRequest = VNDetectFaceRectanglesRequest(completionHandler: detectedFace)

// 3
do {
  try sequenceHandler.perform(
    [detectFaceRequest],
    on: imageBuffer,
    orientation: .leftMirrored)
} catch {
  print(error.localizedDescription)
}
With this code you:

- Get the image buffer from the passed in sample buffer.
- Create a face detection request to detect face bounding boxes and pass the results to a completion handler.
- Use your previously defined sequence request handler to perform your face detection request on the image. The orientation parameter tells the request handler what the orientation of the input image is.
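If you’re unsure exactly where that snippet lives, here’s the completed delegate method for reference. The signature is the standard AVCaptureVideoDataOutputSampleBufferDelegate callback that the starter project already stubs out for you:
func captureOutput(
  _ output: AVCaptureOutput,
  didOutput sampleBuffer: CMSampleBuffer,
  from connection: AVCaptureConnection
) {
  // 1: Grab the pixel buffer for this frame.
  guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
    return
  }

  // 2: Build the face detection request.
  let detectFaceRequest = VNDetectFaceRectanglesRequest(completionHandler: detectedFace)

  // 3: Run the request on this frame.
  do {
    try sequenceHandler.perform(
      [detectFaceRequest],
      on: imageBuffer,
      orientation: .leftMirrored)
  } catch {
    print(error.localizedDescription)
  }
}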
Now you may be wondering: But what about detectedFace(request:error:)? In fact, Xcode is probably wondering the same thing.
You’ll define that now.
Add the following code for detectedFace(request:error:) to the FaceDetectionViewController class, wherever you like:
func detectedFace(request: VNRequest, error: Error?) {
  // 1
  guard
    let results = request.results as? [VNFaceObservation],
    let result = results.first
    else {
      // 2
      faceView.clear()
      return
  }

  // 3
  let box = result.boundingBox
  faceView.boundingBox = convert(rect: box)

  // 4
  DispatchQueue.main.async {
    self.faceView.setNeedsDisplay()
  }
}
In this method you:
- Extract the first result from the array of face observation results.
- Clear the FaceView if something goes wrong or no face is detected.
- Set the bounding box to draw in the FaceView after converting it from the coordinates in the VNFaceObservation.
- Call setNeedsDisplay() to make sure the FaceView is redrawn.
The result’s bounding box coordinates are normalized to the input image, with values between 0.0 and 1.0 and the origin at the bottom-left corner. That’s why you need to convert them to something useful.
Unfortunately, this function doesn’t exist. Fortunately, you’re a talented programmer!
Right above where you placed the method definition for detectedFace(request:error:), add the following method definition:
func convert(rect: CGRect) -> CGRect {
  // 1
  let origin = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.origin)

  // 2
  let size = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.size.cgPoint)

  // 3
  return CGRect(origin: origin, size: size.cgSize)
}
Here you:
- Use a handy method from AVCaptureVideoPreviewLayer to convert a normalized origin to the preview layer’s coordinate system.
- Then use the same handy method along with some nifty Core Graphics extensions to convert the normalized size to the preview layer’s coordinate system.
- Create a CGRect using the new origin and size.
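As an aside, if you ever need pixel coordinates in the captured image itself rather than in the preview layer, Vision ships a helper for that. Here’s a minimal sketch, assuming an example buffer size of 1920×1080:
import CoreGraphics
import Vision

// VNImageRectForNormalizedRect scales a normalized rect up to pixel
// coordinates for an image of a known size. Note that the result still
// uses Vision's bottom-left origin, so you'd flip it yourself before
// drawing with UIKit's top-left origin.
let normalizedBox = CGRect(x: 0.3, y: 0.4, width: 0.2, height: 0.25)
let pixelBox = VNImageRectForNormalizedRect(normalizedBox, 1920, 1080)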
You’re probably tempted to build and run this. And if you did, you would be disappointed to see nothing on the screen except your own face, sadly free of lasers.
Currently, FaceView has an empty draw(_:) method. You need to fill that in if you want to see something on screen!
Switch to FaceView.swift and add the following code to draw(_:):
// 1
guard let context = UIGraphicsGetCurrentContext() else {
  return
}

// 2
context.saveGState()

// 3
defer {
  context.restoreGState()
}

// 4
context.addRect(boundingBox)

// 5
UIColor.red.setStroke()

// 6
context.strokePath()
With this code, you:
- Get the current graphics context.
- Push the current graphics state onto the stack.
- Restore the graphics state when this method exits.
- Add a path describing the bounding box to the context.
- Set the stroke color to red.
- Draw the actual path described in step four.
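For context, here’s a minimal sketch of how FaceView might fit together around the pieces this tutorial touches: the boundingBox property, clear() and draw(_:). The starter project’s actual class includes more than this, so treat it as an outline rather than the real implementation:
import UIKit

// A sketch of FaceView, assuming it's a plain UIView that the view
// controller updates with the converted bounding box. Property and
// method names match how they're used earlier in this tutorial.
class FaceView: UIView {
  var boundingBox = CGRect.zero

  // Forget the last detection and trigger a redraw.
  func clear() {
    boundingBox = .zero
    DispatchQueue.main.async {
      self.setNeedsDisplay()
    }
  }

  override func draw(_ rect: CGRect) {
    // The drawing code you just added.
    guard let context = UIGraphicsGetCurrentContext() else {
      return
    }

    context.saveGState()
    defer {
      context.restoreGState()
    }

    context.addRect(boundingBox)
    UIColor.red.setStroke()
    context.strokePath()
  }
}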
Phew! You’ve been coding for a while now. It’s finally time!
Go ahead and build and run your app.
What a good looking detected face!