Capturing Text From Camera Using SwiftUI

Learn how to capture text from the iPhone camera into your SwiftUI app so your users can enter data more quickly and easily. By Audrey Tam.


It’s Magic

Now look at the code in AddPersonView.swift. There is absolutely nothing in the code about scanning text from the camera. This feature is part of iOS 15, and you get it free in any editable view.

So what’s in the rest of this article? A couple of features to improve the user experience:

  • Filtering camera input for specific text content types like dates, phone numbers and email addresses.
  • Displaying a Scan Text button to make the camera input feature visible to your users.

You can also implement a Scan Text button to create an editable view that isn’t a text field or text view, like the image-with-label example in the WWDC presentation.

Filtering Text Content Types

You’re probably a bit underwhelmed by this scanning, tapping and swiping procedure. If your app is looking for a specific format of information — URL, date, email or phone number — you want the camera to “see” only the relevant text and ignore the rest.

Your apps might already specify keyboard type to make it more convenient for users to enter numbers or email addresses. Maybe you also specify text content type to guide the keyboard’s suggestions and autofill.

Good news: You can use text content type to filter camera text input!
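For example, an email field might already combine these hints, and the same text content type value now also tells the camera what to look for. Here's a minimal sketch; the view and property names are placeholders, not from the sample project:

import SwiftUI

// Hypothetical email field, not part of the sample project.
struct EmailField: View {
  @State private var email = ""

  var body: some View {
    TextField("Email", text: $email)
      .keyboardType(.emailAddress)    // email-friendly keyboard
      .textContentType(.emailAddress) // guides autofill and, now, camera filtering
  }
}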

Filtering Date Text

Start by adding this modifier to the second (Birthday) TextField in AddPersonView.swift:

.textContentType(.dateTime)

This tells the system you expect the input text to be some form of date, time or duration. The Neural Engine’s Vision model will use this hint to filter the camera’s input for date or time text.
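In context, the Birthday field ends up looking something like this (the label and binding name are assumptions, so match whatever AddPersonView actually uses):

TextField("Birthday", text: $birthday)
  .textContentType(.dateTime) // the camera now highlights only date or time text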

There are several text content types related to a person’s name, so why don’t you modify the Name text field? Well, for now, camera input only works with a few text content types.

Of all the text content types in the Attributes inspector Text Content Type menu, the camera currently filters for only fullStreetAddress, telephoneNumber, emailAddress, URL, shipmentTrackingNumber, flightNumber and dateTime.

Camera auto-detects only the content types with check marks.

OK, time to see if your modifier helps.

Build and run on your device and tap the + button. In the Add Person view, tap the Birthday text field then tap it again:

Button label might be Scan Date or Time when textContentType is dateTime.

Note: As with the Scan Text button label, you might see Paste | scan-button-icon instead of Scan Date or Time. The date filter still works.

Now the camera only highlights text that relates to dates or times:

Camera detects only date or time.

As before, any detected date or time text appears immediately in the text field. You must still tap Insert to accept the text.

What a great way to speed up text input from the camera!

Note: I added an email address to try out that text content type. If you change dateTime to emailAddress, the camera will focus only on email addresses.
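For example, the only change is the argument you pass to the modifier:

.textContentType(.emailAddress)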

Display a Camera Button

Everything so far is built into iOS 15, with no code on your part. But you can also add a little code of your own to build on it.

For example, to make the camera input feature more visible to your users, you can add a button that sets the whole magical process in motion. Once you know how the magic happens, you can use it to scan text from the camera into views that aren’t text fields or text views.

Magic Method

The key to the magic is the new method captureTextFromCamera(responder:identifier:). The magic starts when your app calls this method to launch the camera. The responder must conform to UIResponder and UIKeyInput; a responder uses UIKeyInput methods to implement simple text entry.

Uh oh, UI prefixes … Yes indeed, captureTextFromCamera(responder:identifier:) creates a UIAction, so you need a UIView to attach it to. You'll create a UIButton that AddPersonView can display. You'll set this button's action to captureTextFromCamera(responder:identifier:). And the action's responder will pass any text captured from the camera to a TextField in AddPersonView.
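To preview how those pieces fit together, here's a minimal sketch, assuming coordinator is an object that conforms to UIResponder and UIKeyInput. You'll build the real version step by step below.

// Sketch only: coordinator is assumed to conform to UIResponder and UIKeyInput.
let button = UIButton()
button.setImage(UIImage(systemName: "camera"), for: .normal)
let scanAction = UIAction.captureTextFromCamera(
  responder: coordinator, // receives recognized text through insertText(_:)
  identifier: nil)
button.addAction(scanAction, for: .touchUpInside)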

UIViewRepresentable

To create a UIView you can use in a SwiftUI app, you need to build a structure that conforms to UIViewRepresentable.

Note: Learn more about UIViewRepresentable in SwiftUI Apprentice and SwiftUI by Tutorials.

First, create a new Swift file and name it ScanButton. In this new file, replace import Foundation with the following code:

import SwiftUI

struct ScanButton: UIViewRepresentable {
  func makeUIView(context: Context) -> UIButton {
    let button = UIButton()
    return button
  }

  func updateUIView(_ uiView: UIButton, context: Context) { }
}

To conform to UIViewRepresentable, ScanButton must implement makeUIView(context:) and updateUIView(_:context:).

In this minimal form, makeUIView(context:) simply creates a UIButton. AddPersonView won’t update the button, so updateUIView(_:context:) is empty.

Coordinator

Tapping the button will capture text, and ScanButton must pass that text back to AddPersonView. To transfer data from a UIView to a SwiftUI View, ScanButton needs a coordinator.

Add this code inside ScanButton:

func makeCoordinator() -> Coordinator {
  Coordinator(self)
}

class Coordinator: UIResponder, UIKeyInput {
  let parent: ScanButton
  init(_ parent: ScanButton) { self.parent = parent }

  var hasText = false
  func insertText(_ text: String) { }
  func deleteBackward() { }
}

SwiftUI calls makeCoordinator() before makeUIView(context:) and stores the Coordinator object in context.coordinator.

The action captureTextFromCamera(responder:identifier:) needs a UIResponder argument that conforms to UIKeyInput, so you make Coordinator a subclass of UIResponder and add the UIKeyInput protocol. Implementing this protocol will enable the coordinator to control text input.

UIKeyInput requires you to provide hasText, insertText(_:) and deleteBackward(). You want camera input and not keyboard input, so you only have to implement insertText(_:) to handle the camera input. The value of hasText doesn’t matter, so set it to false. And deleteBackward() doesn’t need to do anything.

The purpose of Coordinator is to pass text from the camera back to the SwiftUI view that calls ScanButton, so ScanButton needs a binding to a String property in the SwiftUI view.

Add this property at the top of ScanButton:

@Binding var text: String

AddPersonView will pass either $name or $birthday to ScanButton.

Now you can finish setting up Coordinator. Add this line to insertText:

parent.text = text

Yes, this really is all Coordinator needs to do!
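To see where that binding comes from, here's a sketch of how AddPersonView might eventually place ScanButton next to a field and pass it $birthday. The layout is an assumption, and the button won't launch the camera until you attach the capture action to it:

// Hypothetical placement inside AddPersonView's body; the real layout may differ.
HStack {
  TextField("Birthday", text: $birthday)
    .textContentType(.dateTime)
  ScanButton(text: $birthday) // captured text arrives through this binding
    .frame(width: 56, height: 36)
}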