Person Segmentation in the Vision Framework
Learn how to use person segmentation via the Vision framework. By Vidhur Voora.
Contents
- Getting Started
- Introducing Image Segmentation
- Creating Photo Greeting
- Blending All the Images
- Displaying the Photo Greeting
- Quality and Performance Options
- Creating Video Greeting
- Alternatives for Generating Person Segmentation
- Using Core Image to Generate Person Segmentation Mask
- Understanding Best Practices
- Where to Go From Here?
Computer vision has gained more prominence than ever before. Its applications include cancer detection, cell classification, traffic flow analysis, real-time sports analysis and many more. Apple introduced the Vision framework in iOS 11 to perform tasks such as face tracking, barcode detection and image registration. In iOS 15, Apple added a person segmentation API to the Vision framework, the same technology that powers Portrait mode.
In this tutorial, you’ll learn:
- What image segmentation is and the different types of segmentation.
- How to create a person segmentation mask for a photo.
- The different quality levels and performance tradeoffs.
- How to create person segmentation for live video capture.
- Which other frameworks provide person segmentation.
- Best practices for person segmentation.
Getting Started
Download the project by clicking Download Materials at the top or bottom of this page. Open RayGreetings in the starter folder. Build and run on a physical device.
You’ll see two tabs: Photo Greeting and Video Greeting. The Photo Greeting tab will show you a nice background image and a family picture. In this tutorial, you’ll use person segmentation to overlay family members on the greeting background. Tap the Video Greeting tab and grant the camera permissions. You’ll see the camera feed displayed. The starter project is set up to capture and display the camera frames. You’ll update the live frames to generate a video greeting!
Before you dive into implementing these, you need to understand what person segmentation is. Get ready for a fun ride.
Introducing Image Segmentation
Image segmentation divides an image into segments so each can be processed separately, giving a more granular understanding of the image. Object detection provides a bounding box around a desired object in an image, whereas image segmentation provides a pixel-level mask for the object.
There are two types of image segmentation: semantic segmentation and instance segmentation.
Semantic segmentation is the process of detecting and grouping together similar parts of the image that belong to the same class. Instance segmentation is the process of detecting a specific instance of the object. When you apply semantic segmentation to an image with people, it generates one mask that contains all the people. Instance segmentation generates an individual mask for each person in the image.
The person segmentation API provided in Apple’s Vision framework is a single-frame API. It uses semantic segmentation to provide a single mask for all people in a frame. You can use it for both streaming and offline processing.
The process of person segmentation has four steps, sketched in the snippet after this list:
- Creating a person segmentation request.
- Creating a request handler for that request.
- Processing the request.
- Handling the result.
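Here’s a minimal sketch of those four steps in isolation. The function personMask(from:) and its input image are hypothetical placeholders for illustration, not part of the sample project:
import Vision
import UIKit

func personMask(from image: UIImage) -> CVPixelBuffer? {
  guard let cgImage = image.cgImage else { return nil }
  // 1. Create a person segmentation request.
  let request = VNGeneratePersonSegmentationRequest()
  // 2. Create a request handler for that request.
  let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
  // 3. Process the request.
  do {
    try handler.perform([request])
  } catch {
    print("Segmentation failed: \(error)")
    return nil
  }
  // 4. Handle the result: the mask arrives as a pixel buffer.
  return request.results?.first?.pixelBuffer
}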
Next, you’ll use the API and these steps to create a photo greeting!
Creating Photo Greeting
You have an image of a family and an image with a festive background. Your goal is to overlay the people in the family picture over the festive background to generate a fun greeting.
Open RayGreetings and open GreetingProcessor.swift. Add the following below import Combine:
import Vision
This imports the Vision framework. Next, add the following to GreetingProcessor below @Published var photoOutput = UIImage():
let request = VNGeneratePersonSegmentationRequest()
Here, you create an instance of the person segmentation request. The request is stateful, so you can reuse it across an entire sequence of frames, which is especially useful when processing video offline or handling live camera capture.
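The request also exposes configuration properties you’ll revisit in Quality and Performance Options. As a quick preview, a sketch of the two knobs (the defaults are fine for now, so you don’t need to add this):
import CoreVideo

request.qualityLevel = .balanced // .fast, .balanced or .accurate
request.outputPixelFormat = kCVPixelFormatType_OneComponent8 // 8-bit grayscale mask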
Next, add the following to GreetingProcessor:
func generatePhotoGreeting(greeting: Greeting) {
  // 1
  guard
    let backgroundImage = greeting.backgroundImage.cgImage,
    let foregroundImage = greeting.foregroundImage.cgImage else {
    print("Missing required images")
    return
  }
  // 2
  // Create request handler
  let requestHandler = VNImageRequestHandler(
    cgImage: foregroundImage,
    options: [:])
  // TODO
}
Here’s what the code above is doing:
- Accesses cgImage from backgroundImage and foregroundImage, then ensures both images are valid. You’ll use them soon to blend the images with Core Image.
- Creates requestHandler as an instance of VNImageRequestHandler. It takes in an image along with an optional dictionary that specifies how to process the image.
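This initializer assumes the image is already upright. If your source image carries orientation metadata, Vision also offers an initializer that accepts a CGImagePropertyOrientation. A minimal sketch, assuming .up is correct for your image:
let requestHandler = VNImageRequestHandler(
  cgImage: foregroundImage,
  orientation: .up, // pass the image's actual orientation here
  options: [:])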
Next, replace // TODO with the following:
do {
  // 1
  try requestHandler.perform([request])
  // 2
  guard let mask = request.results?.first else {
    print("Error generating person segmentation mask")
    return
  }
  // 3
  let foreground = CIImage(cgImage: foregroundImage)
  let maskImage = CIImage(cvPixelBuffer: mask.pixelBuffer)
  let background = CIImage(cgImage: backgroundImage)
  // TODO: Blend images
} catch {
  print("Error processing person segmentation request")
}
Here’s a breakdown of the code above:
- requestHandler processes the person segmentation request using perform(_:). If multiple requests are present, it returns after all the requests have either completed or failed. perform(_:) can throw an error while processing the request, so you handle it by enclosing it in a do-catch.
- You then retrieve the mask from the results. Because you submitted only one request, you retrieve the first object from the results.
- The pixelBuffer property of the returned result holds the mask. You then create CIImage versions of the foreground, background and mask. A CIImage is the representation of an image that Core Image filters process, and you’ll need it to blend the images.
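The mask’s resolution typically differs from the input image’s, which is why the next section scales it before blending. If you’re curious, you can verify this inside the do block with a couple of CoreVideo calls (purely diagnostic, not required):
let maskWidth = CVPixelBufferGetWidth(mask.pixelBuffer)
let maskHeight = CVPixelBufferGetHeight(mask.pixelBuffer)
print("Mask size: \(maskWidth) x \(maskHeight)")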
Blending All the Images
Add the following in GreetingProcessor.swift below import Vision:
import CoreImage.CIFilterBuiltins
Core Image provides methods that return type-safe instances of CIFilter. Here, you import CIFilterBuiltins to access those type-safe APIs.
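To see why this matters, here’s a quick illustration of the same filter created both ways, assuming maskImage is a CIImage like the one you created earlier:
// Without CIFilterBuiltins: string-based filter names and keys.
let untyped = CIFilter(name: "CIBlendWithMask")
untyped?.setValue(maskImage, forKey: kCIInputMaskImageKey)
// With CIFilterBuiltins: compiler-checked filter properties.
let typed = CIFilter.blendWithMask()
typed.maskImage = maskImage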
Next, add the following to GreetingProcessor:
func blendImages(
  background: CIImage,
  foreground: CIImage,
  mask: CIImage
) -> CIImage? {
  // 1
  let maskScaleX = foreground.extent.width / mask.extent.width
  let maskScaleY = foreground.extent.height / mask.extent.height
  let maskScaled = mask.transformed(
    by: CGAffineTransform(scaleX: maskScaleX, y: maskScaleY))
  // 2
  let backgroundScaleX = foreground.extent.width / background.extent.width
  let backgroundScaleY = foreground.extent.height / background.extent.height
  let backgroundScaled = background.transformed(
    by: CGAffineTransform(scaleX: backgroundScaleX, y: backgroundScaleY))
  // 3
  let blendFilter = CIFilter.blendWithMask()
  blendFilter.inputImage = foreground
  blendFilter.backgroundImage = backgroundScaled
  blendFilter.maskImage = maskScaled
  // 4
  return blendFilter.outputImage
}
The code above:
- Calculates the X and Y scales of the mask with respect to the foreground image, then uses a CGAffineTransform to scale the mask to the size of the foreground image.
- Like the scaling of mask, calculates the X and Y scales of background and then scales background to the size of foreground.
- Creates blendFilter, a Core Image filter. It then sets the filter's inputImage to the foreground, and sets its backgroundImage and maskImage to the scaled versions of those images.
- outputImage contains the result of the blend.
The returned result is of type CIImage. You’ll need to convert it to a UIImage to display it in the UI.
In GreetingProcessor, add the following at the top, below let request = VNGeneratePersonSegmentationRequest():
let context = CIContext()
Here, you create an instance of CIContext, which you’ll use to create a Quartz 2D image from a CIImage object. Contexts are expensive to create, so you create this one once and reuse it.
Add the following to GreetingProcessor:
private func renderAsUIImage(_ image: CIImage) -> UIImage? {
  guard let cgImage = context.createCGImage(image, from: image.extent) else {
    return nil
  }
  return UIImage(cgImage: cgImage)
}
Here, you use context to create an instance of CGImage from the CIImage. Using cgImage, you then create a UIImage, which is what the user will see.
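To connect the pieces, you’d replace the remaining // TODO: Blend images inside generatePhotoGreeting with something like the sketch below. The exact wiring comes in Displaying the Photo Greeting, so treat this as a preview:
guard let output = blendImages(
  background: background,
  foreground: foreground,
  mask: maskImage) else {
  print("Error blending images")
  return
}
// photoOutput is @Published, so update it on the main thread.
if let photoResult = renderAsUIImage(output) {
  DispatchQueue.main.async {
    self.photoOutput = photoResult
  }
}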