Instruction 01

Since its introduction in 2017, the Vision Framework has maintained a rapid pace of evolution. If you understand the history of the framework, you’ll know what features you can use as you support multiple versions of iOS in your apps. In the final lesson of this module, you’ll explore the changes introduced during WWDC 2024.

Apple introduced the Vision Framework with iOS 11, including basic face detection, rectangle detection, barcode detection, and text detection. The text detection at that time didn’t read the text; it only found the rectangles in an image that probably contained text. CoreML was also introduced with this version, so apps that needed more than the basics, such as image classification, could use external models.

With each subsequent revision, the Vision Framework grew in features such as the following:

  • iOS 12 added support for object detection with custom CoreML models as well as improvements to the existing APIs.
  • iOS 13 added OCR (optical character recognition) that actually reads the text in an image, along with animal recognition.
  • iOS 14 added human body-pose and hand-pose detection, plus contour and trajectory detection.
  • iOS 15 added person segmentation and document detection as well as general improvements to existing requests.
  • iOS 16 included a new PhotosUI image picker with enhanced security and privacy features.

You’ll learn about the iOS 17 changes in this module’s final lesson.

Importing the Framework

Although the Vision Framework has many different structures and classes you can use, a single import statement is all you need. Some Apple frameworks are split into separate modules or require you to add a package, but for the Vision Framework, it’s just:

import Vision

Add that to the top of any file in your app where you use the Vision Framework directly. It’s not needed in files that only display images or output results. For instance, if you have a SwiftUI View that displays an image and an associated ViewModel in a different file that does all the image processing, you’d only need to import Vision in the ViewModel file. Additionally, you’ll frequently need to import UIKit. Because the Vision Framework predates SwiftUI, some of the image classes it uses, like CIImage and UIImage, aren’t part of SwiftUI but are available through UIKit.
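
For example, a hypothetical split might look like the following, with all the Vision work, and therefore the extra imports, living in the view model file. The file and type names here are made up for illustration:

// ImageAnalysisViewModel.swift: this file does the image processing,
// so it's the one that needs the extra imports.
import UIKit     // UIImage, CIImage, and related image types
import Vision    // requests, handlers, and observations

final class ImageAnalysisViewModel {
    func analyze(_ image: UIImage) {
        // Build and perform Vision requests here.
    }
}

The SwiftUI view that displays the image and the results only needs to import SwiftUI.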

For a complete app, you’ll probably also import related frameworks like AVFoundation or PhotosUI, but that depends on how you’re getting images in and out of your app.

The Pattern for Vision

Regardless of the request type or the model being used, all Vision Framework operations follow a basic pattern. Once you know the pattern, you can easily implement new request types. The pattern is:

  1. Create a Request object for the data you need and the question you want to ask.
  2. Use a matching Handler to process the request.
  3. Evaluate the observations of the Request and Handler.

[Diagram: Create a Request → Create a Handler → Process the Observations]

If you wanted to ask the Vision Framework what animal was in an image, you’d use a VNRecognizeAnimalsRequest and you’d need a handler to process the array of VNRecognizedObjectObservation results. Similarly, a request to read some text is a VNRecognizeTextRequest, and it returns VNRecognizedTextObservation objects for you to handle.
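
Here’s a minimal sketch of that pattern, assuming you already have a UIImage to analyze; the function name and the printed output are just for illustration:

import UIKit
import Vision

func findAnimals(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    // 1. Create a Request for the question you want to ask.
    let request = VNRecognizeAnimalsRequest { request, error in
        // 3. Evaluate the observations the request produced.
        guard let observations = request.results as? [VNRecognizedObjectObservation] else { return }
        for observation in observations {
            if let topLabel = observation.labels.first {
                print("Found \(topLabel.identifier), confidence \(topLabel.confidence)")
            }
        }
    }

    // 2. Use a matching Handler to process the request.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}

In a real app, you’d usually call perform(_:) on a background queue because Vision requests can take a noticeable amount of time.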

It’s critical to make sure the request type is the right one for your app and that you cast the results to the observation type that request actually produces. Because all request types inherit from VNRequest and all observations inherit from VNObservation, the compiler won’t always be able to spot a mismatch. When they’re mismatched, you’ll get nonsense answers to your requests at best; at worst, your app will crash.

CoreML Fills in the Gaps

A pattern that Apple seems to have adopted is to rely on outside CoreML models for any detectors they don’t provide. As they introduce models for their own purposes, they make those available to developers.

Apple has added about 30 request types to cover a wide range of image processing tasks. If none of the provided requests works for your app, you can use any CoreML model with a VNCoreMLRequest. Then, use the appropriate VNObservation subclass for the data format the model returns. CoreML models can be gigantic files, so if your app can use one of the provided request types, use that. It’s probably more efficient and definitely makes your .ipa file size smaller.
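
As a rough sketch, and assuming FlowerClassifier is a hypothetical model class that Xcode generated from a .mlmodel file bundled in your app, wrapping a custom model looks like this:

import UIKit
import Vision
import CoreML

func classify(_ image: UIImage) throws {
    guard let cgImage = image.cgImage else { return }

    // Wrap the (hypothetical) Xcode-generated CoreML model for use with Vision.
    let coreMLModel = try FlowerClassifier(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    // VNCoreMLRequest follows the same pattern as the built-in requests.
    let request = VNCoreMLRequest(model: visionModel) { request, error in
        // A classifier model returns VNClassificationObservation results; an
        // object-detection model would return VNRecognizedObjectObservation instead.
        guard let results = request.results as? [VNClassificationObservation] else { return }
        if let best = results.first {
            print("\(best.identifier): \(best.confidence)")
        }
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
}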
