ML Kit Tutorial for iOS: Recognizing Text in Images
In this ML Kit tutorial, you’ll learn how to leverage Google’s ML Kit to detect and recognize text. By David East.
Contents
- Machine Learning and Tooling
- ML Kit
- Getting Started
- Setting Up ML Kit
- Setting Up a Firebase Account
- Detecting Basic Text
- Creating a Text Detector
- Using the Text Detector
- Understanding the Classes
- Highlighting the Text Frames
- Detecting Frames
- Drawing
- Understanding Image Scaling
- Calculating the Scale
- Taking Photos with the Camera
- Dealing With Image Orientations
- Sharing the Text
- Where to Go From Here?
A few years ago, there were two types of machine learning (ML) developers: the advanced developers and everyone else. The lower levels of ML can be hard; it’s a lot of math, and it uses big words like logistic regression, sparsity and neural nets. But it doesn’t have to be that hard.
You can also be an ML developer! At its core, ML is simple. With it, you solve a problem by teaching a software model to recognize patterns instead of hard coding each situation and corner case you can think of. However, it can be daunting to get started, and this is where you can rely on existing tools.
Machine Learning and Tooling
Just like iOS development, ML is about tooling. You wouldn’t build your own UITableView, or at least you shouldn’t; you would use a framework instead, like UIKit.
It’s the same with ML, which has a booming ecosystem of tooling. TensorFlow, for example, simplifies training and running models, and TensorFlow Lite brings model support to iOS and Android devices.
Each of these tools requires some experience with ML. What if you’re not an ML expert but want to solve a specific problem? For these situations, there’s ML Kit.
ML Kit
ML Kit is a mobile SDK that brings Google’s ML expertise to your app. There are two main parts to ML Kit: APIs for common use cases and support for custom models, both of which are easy to use regardless of your level of ML experience.
The existing APIs currently cover common use cases such as text recognition, face detection, barcode scanning, image labeling and landmark recognition. Each of these use cases comes with a pre-trained model wrapped in an easy-to-use API. Time to start building something!
Getting Started
In this tutorial, you’re going to build an app called Extractor. Have you ever snapped a picture of a sign or a poster just to write down the text content? It would be great if an app could just peel the text off the sign and save it for you, ready to use. You could, for example, take a picture of an addressed envelope and save the address. That’s exactly what you’ll do with this project! Get ready!
Start by downloading the project materials for this tutorial using the Download Materials button at the top or bottom of this tutorial.
This project uses CocoaPods to manage dependencies.
Setting Up ML Kit
Each ML Kit API has a different set of CocoaPods dependencies. This is useful because you only need to bundle the dependencies required by your app. For instance, if you’re not identifying landmarks, you don’t need that model in your app. In Extractor, you’ll use the Text Recognition API.
If you were adding the Text Recognition API to an existing app, you would need to add the following lines to your Podfile. You don’t have to do this for the starter project, since they’re already in the Podfile – you can check.
pod 'Firebase/Core', '5.5.0'
pod 'Firebase/MLVision', '5.5.0'
pod 'Firebase/MLVisionTextModel', '5.5.0'
You do, however, have to open Terminal, switch to the project folder and run the following command to install the CocoaPods used in the project:
pod install
Once the CocoaPods are installed, open Extractor.xcworkspace in Xcode.
If you’re unfamiliar with CocoaPods, our CocoaPods Tutorial will help you get started.
This project contains the following important files:
- ViewController.swift: The only controller in this project.
- +UIImage.swift: A UIImage extension to fix the orientation of images (the sketch below shows the general idea).
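You’ll deal with image orientations in more detail later in this tutorial, but the core idea behind an extension like this is to redraw the image so its pixel data matches the .up orientation. Here’s a minimal sketch of that idea; the method name fixedOrientation() is illustrative and may not match the starter file:

import UIKit

extension UIImage {
  // Returns a copy of the image redrawn so its pixel data is oriented .up,
  // which is what image-processing APIs generally expect.
  func fixedOrientation() -> UIImage {
    guard imageOrientation != .up else { return self }
    UIGraphicsBeginImageContextWithOptions(size, false, scale)
    defer { UIGraphicsEndImageContext() }
    // draw(in:) applies the orientation transform while rendering.
    draw(in: CGRect(origin: .zero, size: size))
    return UIGraphicsGetImageFromCurrentImageContext() ?? self
  }
}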
Setting Up a Firebase Account
To set up a Firebase account, follow the account setup section in this Getting Started With Firebase Tutorial. While the Firebase products are different, the account creation and setup are exactly the same.
The general idea is that you:
- Create an account.
- Create a project.
- Add an iOS app to a project.
- Drag the GoogleService-Info.plist to your project.
- Initialize Firebase in the AppDelegate.
It’s a simple process but, if you hit any snags, the guide above can help.
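If you want a quick reference for that last step, initializing Firebase boils down to a single call in the app delegate. Here’s a minimal sketch, assuming GoogleService-Info.plist is already in the target; your AppDelegate will have more going on, but the Firebase part looks like this:

import UIKit
import Firebase

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
  var window: UIWindow?

  func application(
    _ application: UIApplication,
    didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
  ) -> Bool {
    // Reads GoogleService-Info.plist and configures the Firebase services.
    FirebaseApp.configure()
    return true
  }
}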
Build and run the app, and you’ll see that it looks like this:
It doesn’t do anything yet except allow you to share the hard-coded text via the action button on the top right. You’ll use ML Kit to bring this app to life.
Detecting Basic Text
Get ready for your first text detection! You can begin by demonstrating to the user how to use the app.
A nice demonstration is to scan an example image when the app first boots up. There’s an image bundled in the assets folder named scanned-text, which is currently the default image displayed in the UIImageView of the view controller. You’ll use that as the example image.
But first, you need a text detector to detect the text in the image.
Creating a Text Detector
Create a file named ScaledElementProcessor.swift and add the following code:
import Firebase
class ScaledElementProcessor {
}
Great! You’re all done! Just kidding. Create a text-detector property inside the class:
let vision = Vision.vision()
var textRecognizer: VisionTextRecognizer!

init() {
  textRecognizer = vision.onDeviceTextRecognizer()
}
This textRecognizer is the main object you can use to detect text in images. You’ll use it to recognize the text contained in the image currently displayed by the UIImageView. Add the following detection method to the class:
func process(in imageView: UIImageView,
             callback: @escaping (_ text: String) -> Void) {
  // 1
  guard let image = imageView.image else { return }
  // 2
  let visionImage = VisionImage(image: image)
  // 3
  textRecognizer.process(visionImage) { result, error in
    // 4
    guard
      error == nil,
      let result = result,
      !result.text.isEmpty
    else {
      callback("")
      return
    }
    // 5
    callback(result.text)
  }
}
Take a second to understand this chunk of code:
1. Here, you check if the imageView actually contains an image. If not, simply return. Ideally, however, you would either throw or provide a graceful failure.
2. ML Kit uses a special VisionImage type. It’s useful because it can contain specific metadata for ML Kit to process the image, such as the image’s orientation.
3. The textRecognizer has a process method that takes in the VisionImage, and it returns an array of text results in the form of a parameter passed to a closure.
4. The result could be nil, and, in that case, you’ll want to return an empty string for the callback.
5. Lastly, the callback is triggered to relay the recognized text.
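To tie this back to the demonstration idea from earlier, here’s one way the view controller could call this method when the app boots up. Treat it as a sketch: the processor property name and the imageView outlet are assumptions about the starter project rather than its exact code:

import UIKit

class ViewController: UIViewController {
  @IBOutlet weak var imageView: UIImageView!

  // Hypothetical property holding the detector you just built.
  let processor = ScaledElementProcessor()

  override func viewDidLoad() {
    super.viewDidLoad()
    // Run text recognition on the default scanned-text image
    // and print whatever ML Kit finds.
    processor.process(in: imageView) { text in
      print(text)
    }
  }
}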