Text Recognition with ML Kit
See how to use the new ML Kit library from Google to apply machine learning and computer vision to perform text recognition in an image. By Victoria Gonda.
Moving on to the Firebase Console
Open the console, and make sure you’re signed in with a Google account. From there, you need to create a new project. Click on the “Add project” card.
In the screen that pops up, you have to provide a project name. Enter TwitterSnap as the name of this project. While you’re there, choose your current country from the dropdown, and accept the terms and conditions.
You should then see the project ready confirmation screen, on which you should hit Continue.
Now that you have the project set up, you need to add it to your Android app. Select Add Firebase to your Android app.
On the next screen, you need to provide the package name. This will use the app ID you changed in the app/build.gradle file. Enter com.raywenderlich.android.twittersnap.YOUR_ID_HERE, being sure to replace YOUR_ID_HERE with the unique ID you provided earlier. Then click the Register App button.
After this, you’ll be able to download a google-services.json file. Download it and place this file in your app/ directory.
Finally, you need to add the required dependencies to your build.gradle files. In the top-level build.gradle, add the Google Services classpath in the dependencies block:
classpath 'com.google.gms:google-services:3.3.1'
The Firebase console may suggest a newer version number, but go ahead and stick with the numbers given here so that you’ll be consistent with the rest of the tutorial.
Next, add the Firebase Core and Firebase ML Vision dependencies to app/build.gradle in the dependencies block:
implementation 'com.google.firebase:firebase-core:15.0.2'
implementation 'com.google.firebase:firebase-ml-vision:15.0.0'
Add this line to the bottom of app/build.gradle to apply the Google Services plugin:
apply plugin: 'com.google.gms.google-services'
Sync the project Gradle files, then build and run the app to make sure it’s all working. Nothing will change in the app itself, but if you’ve completed the setup instructions, the Firebase console will show the app as activated, and your app will appear on the console’s Overview screen.
Enabling Firebase for in-cloud text recognition
There are a couple of extra steps to complete in order to run text recognition in the cloud. You’ll do this now, since it takes a couple of minutes to propagate and become usable on your device; it should then be ready by the time you reach that part of the tutorial.
On the Firebase console for your project, ensure you’re on the Blaze plan instead of the default Spark plan. Click on the plan information in the bottom left corner of the screen to change it.
You’ll need to enter payment information for Firebase to proceed with the Blaze plan. Don’t worry: the first 1,000 requests are free, so you won’t be charged while working through this tutorial. You can switch back to the free Spark plan when you’ve finished.
If you’re hesitant to put in payment information, you can stick with the Spark plan, follow the on-device steps in the remainder of the tutorial, and simply skip the steps below for using ML Kit in the cloud. :]
Next, you need to enable the Cloud Vision API. Choose ML Kit in the console menu at the left.
Then choose Cloud API Usage on the resulting screen.
This takes you to the Cloud Vision API screen. Make sure to select your project at the top, and click the Enable button.
In a few minutes, you’ll be able to run your text recognition in the cloud!
Detecting text on-device
Now you get to dig into the code! You’ll start by adding functionality to detect text in the image on-device.
You have the option to run text recognition both on the device and in the cloud, and this flexibility allows you to use whichever is best for the situation. If there’s no network connectivity, or you’re dealing with sensitive data, running the model on-device might be better.
In MainActivityPresenter.kt, find the runTextRecognition() method to fill in. Add this code to the body of runTextRecognition(). Use Alt + Enter on PC or Option + Return on a Mac to import any missing dependencies.
view.showProgress()
val image = FirebaseVisionImage.fromBitmap(selectedImage)
val detector = FirebaseVision.getInstance().visionTextDetector
This starts by signaling to the view to show the progress so you have a visual cue that work is being done. Then you instantiate two objects: a FirebaseVisionImage from the bitmap passed in, and a FirebaseVisionTextDetector that you can use to detect the text.
Now you can use that detector to detect the text in the image. Add the following code to the same runTextRecognition() method, below the code you added previously. It contains one method call, processTextRecognitionResult(), that isn’t implemented yet and will show an error until you add it. Don’t worry, you’ll implement that next.
detector.detectInImage(image)
    .addOnSuccessListener { texts ->
      processTextRecognitionResult(texts)
    }
    .addOnFailureListener { e ->
      // Task failed with an exception
      e.printStackTrace()
    }
Using the detector, you detect the text by passing in the image. The detectInImage() method takes two callbacks: one for success and one for failure. In the success callback, you call the method you still need to implement.
Once you have the results, you need to use them. Create the following method:
private fun processTextRecognitionResult(texts: FirebaseVisionText) {
  view.hideProgress()
  val blocks = texts.blocks
  if (blocks.size == 0) {
    view.showNoTextMessage()
    return
  }
}
At the start of this method, you tell the view to stop showing the progress, then check whether there is any text to process by looking at the size of the text blocks property.
Once you know you have text, you can do something with it. Add the following to the bottom of the processTextRecognitionResult() method:
blocks.forEach { block ->
  block.lines.forEach { line ->
    line.elements.forEach { element ->
      if (looksLikeHandle(element.text)) {
        view.showHandle(element.text, element.boundingBox)
      }
    }
  }
}
The results come back as a nested structure, so you can look at them in whatever granularity you want. The hierarchy for on-device recognition is block > line > element > text. You iterate through each of these, check whether it looks like a Twitter handle using a regular expression in the helper method looksLikeHandle(), and show it if it does. Each of these elements has a boundingBox for where ML Kit found the text in the image. This is what the app uses to draw a box around each detected handle.
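The starter project already provides looksLikeHandle(), so you don’t need to write it yourself. If you’re curious what such a check might look like, here’s a minimal sketch assuming a simple regex for Twitter-style handles; the pattern is illustrative, not necessarily the project’s exact one:

// Sketch only: the starter project ships its own implementation.
// Assumes a handle is an "@" followed by 1 to 15 word characters.
private val handlePattern = Regex("@\\w{1,15}")

private fun looksLikeHandle(text: String): Boolean =
    handlePattern.matches(text)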
Now build and run the app, select an image containing Twitter handles, and see the results! If you tap on one of these results, it will open the Twitter profile. :]
You can click the above screenshot to see it full size and verify that the bounding boxes are surrounding Twitter handles. :]
As a bonus, the view also has a generic showBox(boundingBox: Rect?) method. You can use this at any stage of the loop to show the outline of any of these groups. For example, in the line forEach, you can call view.showBox(line.boundingBox) to show boxes for all the lines found, as sketched below.
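As a rough sketch, assuming the same loop you added above, the tweak amounts to one extra call inside the line forEach:

blocks.forEach { block ->
  block.lines.forEach { line ->
    // Illustrative tweak: outline every line ML Kit found, not just handles.
    view.showBox(line.boundingBox)
    line.elements.forEach { element ->
      if (looksLikeHandle(element.text)) {
        view.showHandle(element.text, element.boundingBox)
      }
    }
  }
}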
Here’s what it would look like if you did that with the line element: