Instruction 02

Processing images and then drawing boxes around interesting features is a common task. For example, funny-face filters that draw googly eyes need to know where the eyes are. The general workflow is to get the bounding boxes from the observations and then use those to draw an overlay on the image or draw on the image itself.

When an observation returns a bounding box or a point, it's usually in what Apple calls a "normalized" format, where every value is between 0.0 and 1.0. This way, regardless of the size you display your image at, you can locate and size the bounding box correctly. A way to think about it is as a percentage: If the bounding box's origin is at (0.5, 0.5), it's 50 percent of the way across the image. So regardless of the display size, the bounding box must be drawn halfway along both the x and y axes, which puts its origin point at the center of the image. A point of (0, 0) is at the origin point of the image, and (1.0, 1.0) is at the corner opposite the origin.

To save every developer who works with the Vision Framework the tedium of writing code to convert these normalized values into drawable ones, Apple provides functions that convert between normalized values and the proper pixel values for an image.
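
To make the percentage idea concrete, here's a quick sketch of the math the conversion functions do for you. The image dimensions are made up for illustration:

import CoreGraphics

// A normalized origin of (0.5, 0.5) lands at the center of the
// image, whatever its pixel size.
let normalizedOrigin = CGPoint(x: 0.5, y: 0.5)
let imageWidth: CGFloat = 1000
let imageHeight: CGFloat = 800

let pixelOrigin = CGPoint(
  x: normalizedOrigin.x * imageWidth,  // 500.0
  y: normalizedOrigin.y * imageHeight  // 400.0
)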

Functions for Converting From Normalized Space to Pixel Space

VNImageRectForNormalizedRect

Converts a normalized bounding box (CGRect with values between 0.0 and 1.0) into a CGRect in the pixel coordinate space of a specific image. Use this when you need to draw a bounding box on the image.
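
For example, here's a minimal sketch; the function name is made up, and the observation is assumed to come from a Vision request you've already run:

import Vision

// A sketch: convert an observation's normalized bounding box into
// the pixel coordinate space of an image with the given dimensions.
func boundingBoxInPixels(
  for observation: VNDetectedObjectObservation,
  imageWidth: Int,
  imageHeight: Int
) -> CGRect {
  VNImageRectForNormalizedRect(
    observation.boundingBox, imageWidth, imageHeight)
}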

VNImagePointForNormalizedPoint

Converts a normalized CGPoint (with values between 0.0 and 1.0) into a CGPoint in the pixel coordinate space of a specific image. This is useful for translating facial landmark points or other keypoints onto the image.
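
A sketch of the same idea for points; the point and image dimensions here are assumed values for illustration:

import Vision

// A sketch: project a point from normalized space into pixel space.
let imageWidth = 1920
let imageHeight = 1080
let normalizedPoint = CGPoint(x: 0.25, y: 0.75) // e.g. from an observation

let pixelPoint = VNImagePointForNormalizedPoint(
  normalizedPoint, imageWidth, imageHeight)
// pixelPoint is (480.0, 810.0) in the image's pixel space.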

VNImageSizeForNormalizedSize

Converts a normalized CGSize (with values between 0.0 and 1.0) into a CGSize in the pixel coordinate space of a specific image. This can be used when scaling elements relative to the image size.
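
And a sketch using the size function described above, again with made-up numbers:

import Vision

// A sketch: scale a normalized size up to pixel dimensions,
// e.g. to size an overlay relative to the image.
let overlaySize = VNImageSizeForNormalizedSize(
  CGSize(width: 0.1, height: 0.1), 1920, 1080)
// 10 percent of each dimension: 192.0 × 108.0 pixels.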

Origin Points

One of the difficulties when you work with the Vision Framework is knowing where the origin, or (0, 0) point, of an image or rectangle is. All Vision observations that return coordinates (rectangles, points and so on) assume that (0, 0) is at the bottom-left of the space. When you work with a pure CGImage or CIImage, there's no problem, because those also have their origin at the bottom-left. However, UIImage, like the rest of UIKit, places its origin at the top-left:

Origin Points of Different Image Formats in iOS Development

You can use a helper like the one below to map a UIImage orientation to the equivalent orientation when converting to CGImage format:

// Maps each orientation to its vertically flipped counterpart. This
// compensates for the flipped y-axis between UIImage's top-left
// origin and the bottom-left origin that Vision expects.
func convertImageOrientation(_ originalOrientation: UIImage.Orientation)
  -> UIImage.Orientation {
  switch originalOrientation {
  case .up: // 0
    return .downMirrored // 5
  case .down: // 1
    return .upMirrored // 4
  case .left: // 2
    return .rightMirrored // 7
  case .right: // 3
    return .leftMirrored // 6
  case .upMirrored: // 4
    return .down // 1
  case .downMirrored: // 5
    return .up // 0
  case .leftMirrored: // 6
    return .right // 3
  case .rightMirrored: // 7
    return .left // 2
  @unknown default:
    return originalOrientation
  }
}
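
One way you might use this helper (a sketch, assuming you want to hand a UIImage to Vision) is to re-wrap the image's CGImage with the converted orientation; the function name here is hypothetical:

import UIKit

// A sketch: rebuild the image with the flipped orientation so its
// coordinates line up with Vision's bottom-left-origin space.
func visionReadyImage(from original: UIImage) -> UIImage? {
  guard let cgImage = original.cgImage else { return nil }
  return UIImage(
    cgImage: cgImage,
    scale: original.scale,
    orientation: convertImageOrientation(original.imageOrientation)
  )
}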

Working With Faces

Now that you know about bounding boxes and rotation, it's a good time to learn about a special case: the face requests. Apple provides some requests for faces and some for body poses. In addition to identifying where faces exist in an image, some requests can identify where landmarks like the nose and eyes are. Apple uses many of these in the Camera and Photos apps, so they've made them available to you as well. The requests below are grouped by the iOS version that introduced them, with a minimal example after the lists.

iOS 11

  • VNDetectFaceRectanglesRequest and VNFaceObservation: Detects faces in an image by finding the bounding boxes of face regions.
  • VNDetectFaceLandmarksRequest and VNFaceObservation: Detects facial features such as eyes, nose, and mouth in detected face regions.

iOS 13

  • VNDetectFaceCaptureQualityRequest and VNFaceObservation: Estimates the quality of captured face images.

iOS 14

  • VNDetectHumanBodyPoseRequest and VNHumanBodyPoseObservation: Detects and tracks human body poses in images or videos.
  • VNDetectHumanRectanglesRequest and VNHumanObservation: Detects human figures in an image.
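
As a minimal sketch of how these requests fit together (the function name is made up for illustration), here's one way to run face detection on a CGImage:

import Vision

// A sketch: run a face-rectangles request and return the observations.
// Each observation's boundingBox is normalized, as described above.
func detectFaces(in cgImage: CGImage) throws -> [VNFaceObservation] {
  let request = VNDetectFaceRectanglesRequest()
  let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
  try handler.perform([request])
  return request.results ?? []
}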