Vision Tutorial for iOS: What’s New With Face Detection?
Learn what’s new with Face Detection and how the latest additions to Vision framework can help you achieve better results in image segmentation and analysis. By Tom Elliott.
Contents
- Getting Started
- A Tour of the App
- Reviewing the Vision Framework
- Looking Forward
- Processing Faces
- Debug those Faces
- Selecting a Size
- Detecting Differences
- Masking Mayhem
- Assuring Quality
- Handling Quality Result
- Detecting Quality
- Offering Helpful Hints
- Segmenting Sapiens
- Using Metal
- Building Better Backgrounds
- Handling the Segmentation Request Result
- Removing the Background
- Saving the Picture
- Saving to Camera Roll
- Where to Go From Here?
Assuring Quality
Before requesting a quality score, your app needs a place to store the quality of the current frame. First, update the model to hold information about face quality.
Open CameraViewModel.swift. Underneath the FaceGeometryModel struct, add the following to store the quality state:
struct FaceQualityModel {
  let quality: Float
}
This struct contains a single Float property that stores the most recently detected quality.
Under the declaration of faceGeometryState, add a property to publish face quality state:
// 1
@Published private(set) var faceQualityState: FaceObservation<FaceQualityModel> {
  didSet {
    // 2
    processUpdatedFaceQuality()
  }
}
1. This follows the same pattern as the faceGeometryState property above. A FaceObservation enum wraps the underlying model value. FaceObservation is a generic wrapper providing type safety (sketched below). It contains three states: face found, face not found and error.
2. Updates to faceQualityState call processUpdatedFaceQuality().
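For reference, such a wrapper has roughly the shape of the sketch below. The project already defines FaceObservation, so this is for illustration only:

// A rough sketch of the generic wrapper described above; the project's
// actual definition may differ in detail.
enum FaceObservation<T> {
  case faceFound(T)
  case faceNotFound
  case errored(Error)
}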
Don't forget to initialize faceQualityState in init():

faceQualityState = .faceNotFound

This sets the initial value of faceQualityState to .faceNotFound.
Next, add a new published property for acceptable quality:
@Published private(set) var isAcceptableQuality: Bool {
  didSet {
    calculateDetectedFaceValidity()
  }
}
As with the other properties, initialize it in the init() method:
isAcceptableQuality = false
Now, you can write the implementation for processUpdatedFaceQuality():
switch faceQualityState {
case .faceNotFound:
  isAcceptableQuality = false
case .errored(let error):
  print(error.localizedDescription)
  isAcceptableQuality = false
case .faceFound(let faceQualityModel):
  if faceQualityModel.quality < 0.2 {
    isAcceptableQuality = false
    return
  }
  isAcceptableQuality = true
}
Here, you switch over the different states of FaceObservation. An acceptable face has a quality score of 0.2 or higher; anything lower marks the frame as unacceptable.
Update calculateDetectedFaceValidity() to account for acceptable quality by replacing the last line with:
isAcceptableYaw && isAcceptableQuality
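For orientation, the whole function might now read something like the sketch below. Everything except the trailing && isAcceptableQuality mirrors the project's existing implementation, and the bounds case name shown here is illustrative:

// A sketch only: the earlier checks come from the project's existing
// implementation, and the bounds case name is illustrative.
func calculateDetectedFaceValidity() {
  hasDetectedValidFace =
    isAcceptableBounds == .detectedFaceAppropriateSizeAndPosition &&
    isAcceptableRoll &&
    isAcceptablePitch &&
    isAcceptableYaw &&
    isAcceptableQuality
}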
Handling Quality Result
The faceQualityState property is now set up to store detected face quality. But there isn't yet a way for anything to update that state. Time to fix that.
In the CameraViewModelAction enum, add a new action after faceObservationDetected:
case faceQualityObservationDetected(FaceQualityModel)
And update the switch in the perform(action:) method to handle the new action:
case .faceQualityObservationDetected(let faceQualityObservation):
  publishFaceQualityObservation(faceQualityObservation)
Here, you're calling publishFaceQualityObservation() whenever the model performs the faceQualityObservationDetected action. Replace the function definition and empty implementation of publishFaceQualityObservation() with:
// 1
private func publishFaceQualityObservation(_ faceQualityModel: FaceQualityModel) {
  // 2
  DispatchQueue.main.async { [self] in
    // 3
    faceDetectedState = .faceDetected
    faceQualityState = .faceFound(faceQualityModel)
  }
}
Here, you're:
1. Updating the function definition to pass in a FaceQualityModel.
2. Dispatching to the main thread for safety.
3. Updating faceDetectedState and faceQualityState to record a face detection. The quality state stores the quality model.
Detecting Quality
Now the view model is all set up, and it's time to do some detecting. Open FaceDetector.swift.
Add a new request in captureOutput(_:didOutput:from:), after setting the revision for detectFaceRectanglesRequest:
let detectCaptureQualityRequest =
  VNDetectFaceCaptureQualityRequest(completionHandler: detectedFaceQualityRequest)
detectCaptureQualityRequest.revision =
  VNDetectFaceCaptureQualityRequestRevision2
Here, you create a new face quality request with a completion handler that calls detectedFaceQualityRequest. Then, you set it to use revision 2.
Add the request to the array passed to sequenceHandler a few lines below:
[detectFaceRectanglesRequest, detectCaptureQualityRequest],
Finally, write the implementation for the completion handler, detectedFaceQualityRequest(request:error:):
// 1
guard let model = model else {
  return
}

// 2
guard
  let results = request.results as? [VNFaceObservation],
  let result = results.first
else {
  model.perform(action: .noFaceDetected)
  return
}

// 3
let faceQualityModel = FaceQualityModel(
  quality: result.faceCaptureQuality ?? 0
)

// 4
model.perform(action: .faceQualityObservationDetected(faceQualityModel))
This implementation follows the pattern of the face rectangles completion handler above.
Here, you:
1. Make sure the view model isn't nil; otherwise, return early.
2. Check that the request contains valid VNFaceObservation results and extract the first one.
3. Pull the faceCaptureQuality out of the result (or default to 0 if it doesn't exist) and use it to initialize a FaceQualityModel.
4. Finally, perform the faceQualityObservationDetected action you created, passing through the new faceQualityModel.
Open DebugView.swift. After the roll/pitch/yaw DebugSection, at the end of the VStack, add a section to output the current quality:
DebugSection(observation: model.faceQualityState) { qualityModel in
  DebugText("Q: \(qualityModel.quality)")
    .debugTextStatus(status: model.isAcceptableQuality ? .passing : .failing)
}
Build and run. The debug text now shows the quality of the detected face, and the shutter is only enabled when the quality score is 0.2 or higher.
Offering Helpful Hints
The app always displays the same message if one of the acceptability criteria fails. Because the model has state for each, you can make the app more helpful.
Open UserInstructionsView.swift and find faceDetectionStateLabel(). Replace the entire faceDetected case with the following:
if model.hasDetectedValidFace {
  return "Please take your photo :]"
} else if model.isAcceptableBounds == .detectedFaceTooSmall {
  return "Please bring your face closer to the camera"
} else if model.isAcceptableBounds == .detectedFaceTooLarge {
  return "Please hold the camera further from your face"
} else if model.isAcceptableBounds == .detectedFaceOffCentre {
  return "Please move your face to the centre of the frame"
} else if !model.isAcceptableRoll || !model.isAcceptablePitch || !model.isAcceptableYaw {
  return "Please look straight at the camera"
} else if !model.isAcceptableQuality {
  return "Image quality too low"
} else {
  return "We cannot take your photo right now"
}
This code picks a specific instruction depending on which criterion failed. Build and run the app, and play with moving your face into and out of the acceptable region.
Segmenting Sapiens
New in iOS 15, the Vision framework supports person segmentation. Segmentation simply means separating a subject from everything else in the image. For example, you might replace the background of an image while keeping the foreground intact, a technique you've certainly seen on a video call in the last year!
In the Vision framework, person segmentation is available through VNGeneratePersonSegmentationRequest. The request analyzes a single frame at a time and offers three quality levels, so segmenting a video stream means analyzing the video frame by frame.
The results of the person segmentation request include a pixelBuffer. This contains a mask of the original image: white pixels represent a person in the original image and black pixels represent the background.
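To make the API concrete, here's a minimal sketch of issuing the request against a single frame. The segmentPerson(in:) function name is hypothetical; the app's own request gets wired up later in the tutorial:

import CoreVideo
import Vision

// A minimal sketch, not the app's implementation. The function name is
// hypothetical; pass in any CVPixelBuffer, such as a camera frame.
func segmentPerson(in framePixelBuffer: CVPixelBuffer) {
  let request = VNGeneratePersonSegmentationRequest()
  request.qualityLevel = .balanced  // .fast, .balanced or .accurate
  request.outputPixelFormat = kCVPixelFormatType_OneComponent8

  let handler = VNImageRequestHandler(cvPixelBuffer: framePixelBuffer)
  do {
    try handler.perform([request])
    if let mask = request.results?.first?.pixelBuffer {
      // White pixels in the mask mark the person, black pixels the background.
      print("Received a segmentation mask: \(mask)")
    }
  } catch {
    print("Person segmentation failed: \(error)")
  }
}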
Passport photos need the person photographed against a pure white background. Person segmentation is a great way to replace the background but leave the person intact.
Using Metal
Before replacing the background in the image, you need to know a bit about Metal.
Metal is a powerful API provided by Apple. It performs graphics-intensive operations on the GPU for high-performance image processing and is fast enough to process each frame of a video in real time. This sounds pretty useful!
Open CameraViewController.swift and look at the bottom of configureCaptureSession(). The camera view controller displays the preview layer from the AVCaptureSession.
The class supports two modes: one that uses Metal and one that doesn't. Currently, it's set up not to use Metal. You'll change that now.
In viewDidLoad(), add the following code before the call to configureCaptureSession():
configureMetal()
This configures the app to use Metal. The view controller now draws the result from Metal instead of the AVCaptureSession. This isn't a tutorial on Metal, though, so the setup code is already written. Feel free to read the implementation of configureMetal() if you're curious.
With Metal configured to draw the view, you have complete control over what the view displays.