Text Recognition with ML Kit

See how to use the new ML Kit library from Google to apply machine learning and computer vision to perform text recognition in an image. By Victoria Gonda.


Moving on to the Firebase Console

Open the console, and make sure you’re signed in with a Google account. From there, you need to create a new project. Click on the “Add project” card.

Add project button

In the screen that pops up, you have to provide a project name. Input TwitterSnap for the name of this project. While you’re there, choose your current country from the dropdown, and accept the terms and conditions.

Add project window

You should then see the project ready confirmation screen, on which you should hit Continue.

Project ready confirmation

Now that you have the project set up, you need to add it to your Android app. Select Add Firebase to your Android app.

Add Firebase to your Android app

On the next screen, you need to provide the package name. This is the app ID you changed in the app/build.gradle file. Enter com.raywenderlich.android.twittersnap.YOUR_ID_HERE, being sure to replace YOUR_ID_HERE with the unique ID you provided earlier. Then click the Register App button.

Register app

After this, you’ll be able to download a google-services.json file. Download it and place this file in your app/ directory.

Download google-services.json

Finally, you need to add the required dependencies to your build.gradle files. In the top level build.gradle, add the google services classpath in the dependencies block:

classpath 'com.google.gms:google-services:3.3.1'

The Firebase console may suggest a newer version number, but go ahead and stick with the numbers given here so that you’ll be consistent with the rest of the tutorial.
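
Note that this is the dependencies block nested inside buildscript in the project-level build.gradle, not the app module’s dependencies block. As a rough sketch — your starter project’s repositories and existing classpath entries may differ slightly — the result looks like this:

buildscript {
    repositories {
        google()
        jcenter()
    }
    dependencies {
        // ...existing classpath entries, such as the Android Gradle plugin, stay as-is
        classpath 'com.google.gms:google-services:3.3.1'
    }
}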

Next, add the Firebase Core and Firebase ML Vision dependencies to app/build.gradle in the dependencies block.

implementation 'com.google.firebase:firebase-core:15.0.2'
implementation 'com.google.firebase:firebase-ml-vision:15.0.0'

Add this line to the bottom of app/build.gradle to apply the Google Services plugin:

apply plugin: 'com.google.gms.google-services'
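
Put together, the relevant parts of app/build.gradle now look roughly like the sketch below; everything else from the starter project stays as it is:

dependencies {
    // ...existing dependencies from the starter project

    implementation 'com.google.firebase:firebase-core:15.0.2'
    implementation 'com.google.firebase:firebase-ml-vision:15.0.0'
}

// At the very bottom of the file, so the plugin can process google-services.json
apply plugin: 'com.google.gms.google-services'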

Sync the project Gradle files, then build and run the app to make sure everything is working. Nothing will change in the app itself, but once you complete the Firebase console’s instructions for adding Firebase to the app, you should see the app marked as activated. You’ll then also see your app on the Firebase console Overview screen.

Firebase console overview

Enabling Firebase for in-cloud text recognition

There are a couple of extra steps to complete in order to run text recognition in the cloud. You’ll do them now, since it takes a few minutes for the changes to propagate and become usable on your device. That way, everything will be ready by the time you reach that part of the tutorial.

On the Firebase console for your project, ensure you’re on the Blaze plan instead of the default Spark plan. Click on the plan information in the bottom left corner of the screen to change it.

Modify Firebase plan

You’ll need to enter payment information for Firebase to proceed with the Blaze plan. Don’t worry, the first 1000 requests are free, so you won’t have to pay anything over the course of this tutorial. You can switch back to the free Spark plan when you’ve finished.

If you’re hesitant to put in payment information, you can stick with the Spark plan, follow the on-device steps in the remainder of the tutorial, and simply skip over the steps below for using ML Kit in the cloud. :]

Firebase pricing plans

Next, you need to enable the Cloud Vision API. Choose ML Kit in the console menu at the left.

ML Kit menu

Next, choose Cloud API Usage on the resulting screen.

Cloud API Usage link

This takes you to the Cloud Vision API screen. Make sure to select your project at the top, and click the Enable button.

Cloud Vision API

In a few minutes, you’ll be able to run your text recognition in the cloud!

Detecting text on-device

Now you get to dig into the code! You’ll start by adding functionality to detect text in the image on-device.

You have the option to run text recognition both on-device and in the cloud, and this flexibility allows you to use whichever is best for the situation. If there’s no network connectivity, or you’re dealing with sensitive data, running the model on-device might be the better choice.

In MainActivityPresenter.kt, find the runTextRecognition() method to fill in. Add this code to the body of runTextRecognition(). Use Alt + Enter on PC or Option + Return on a Mac to import any missing dependencies.

view.showProgress()
val image = FirebaseVisionImage.fromBitmap(selectedImage)
val detector = FirebaseVision.getInstance().visionTextDetector

This starts by signaling the view to show its progress indicator so that you have a visual cue that work is being done. Then you instantiate two objects: a FirebaseVisionImage from the bitmap passed in, and a FirebaseVisionTextDetector that you can use to detect the text.

Now you can use that detector to detect the text in the image. Add the following code to the same runTextRecognition() method, below the code you added previously. There is one method call, processTextRecognitionResult(), that is not implemented yet, so it will show an error for now. Don’t worry, you’ll implement it next.

detector.detectInImage(image)
    .addOnSuccessListener { texts ->
      processTextRecognitionResult(texts)
    }
    .addOnFailureListener { e ->
      // Task failed with an exception
      e.printStackTrace()
    }

Using the detector, you detect the text by passing in the image. The detectInImage() call returns a task to which you attach two listeners: one for success and one for failure. In the success listener, you call the method you still need to implement.

Once you have the results, you need to use them. Create the following method:

private fun processTextRecognitionResult(texts: FirebaseVisionText) {
  view.hideProgress()
  val blocks = texts.blocks
  if (blocks.size == 0) {
    view.showNoTextMessage()
    return
  }
}

At the start of this method, you tell the view to hide the progress indicator, then check whether there is any text to process by looking at the size of the text’s blocks property. If there are no blocks, you show a message and return early.

Once you know you have text, you can do something with it. Add the following to the bottom of the processTextRecognitionResult() method:

blocks.forEach { block ->
  block.lines.forEach { line ->
    line.elements.forEach { element ->
      if (looksLikeHandle(element.text)) {
        view.showHandle(element.text, element.boundingBox)
      }
    }
  }
}

The results come back as a nested structure, so you can examine them at whatever level of granularity you want. The hierarchy for on-device recognition is block > line > element > text. You iterate through each of these, check whether the text looks like a Twitter handle using a regular expression in the helper method looksLikeHandle(), and show it if it does. Each of these elements has a boundingBox indicating where ML Kit found the text in the image, which is what the app uses to draw a box around each detected handle.
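
The starter project already includes looksLikeHandle(), so you don’t need to write it yourself. For reference, a minimal sketch of what such a helper might look like — the exact regular expression in the project may differ — is:

// Hypothetical sketch only — the starter project ships its own looksLikeHandle().
// Twitter handles are an @ followed by up to 15 letters, digits or underscores.
private val handleRegex = Regex("@\\w{1,15}")

private fun looksLikeHandle(text: String) = text.matches(handleRegex)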

Now build and run the app, select an image containing Twitter handles, and see the results! If you tap on one of these results, it will open the Twitter profile. :]

On-device results

You can click the above screenshot to see it full size and verify that the bounding boxes are surrounding Twitter handles. :]

As a bonus, the view also has a generic showBox(boundingBox: Rect?) method. You can use this at any stage of the loop to show the outline of any of these groups. For example, in the line forEach, you can call view.showBox(line.boundingBox) to show boxes for all the lines found. Here’s what it would look like if you did that with the line element:

Showing line elements
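
If you want to try this yourself, the change is small. Here’s a sketch based on the loop you added earlier — the only addition is the showBox() call inside the line forEach:

blocks.forEach { block ->
  block.lines.forEach { line ->
    // Draw a box around every detected line, regardless of its content
    view.showBox(line.boundingBox)
    line.elements.forEach { element ->
      if (looksLikeHandle(element.text)) {
        view.showHandle(element.text, element.boundingBox)
      }
    }
  }
}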