Text Recognition with ML Kit

See how to use the new ML Kit library from Google to apply machine learning and computer vision to perform text recognition in an image. By Victoria Gonda.


Detecting text in the cloud

After you run the on-device text recognition, you may have noticed that the image on the FAB changes to a cloud icon. This is what you’ll tap to run the in-cloud text recognition. Time to make that button do some work!

When you run text recognition in the cloud, you receive more detailed and accurate predictions. You also avoid doing all that extra processing on the device, saving some battery. Make sure you completed the “Enabling Firebase for in-cloud text recognition” section above before getting started.

The first method you’ll implement is very similar to what you did for the on-device recognition. Add the following code to the runCloudTextRecognition() method:

view.showProgress()
// 1
val options = FirebaseVisionCloudDetectorOptions.Builder()
    .setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
    .setMaxResults(15)
    .build()
val image = FirebaseVisionImage.fromBitmap(selectedImage)
// 2
val detector = FirebaseVision.getInstance()
    .getVisionCloudDocumentTextDetector(options)
detector.detectInImage(image)
    .addOnSuccessListener { texts ->
      processCloudTextRecognitionResult(texts)
    }
    .addOnFailureListener { e ->
      e.printStackTrace()
    }

There are a couple of small differences from what you did for on-device recognition.

  • The first is that you're passing some extra options to your detector. By building these options with a FirebaseVisionCloudDetectorOptions builder, you're saying that you want the latest model and that results should be limited to 15.
  • When you request a detector, you're also specifying that you want a FirebaseVisionCloudDocumentTextDetector, which is what you pass those options to. You handle the success and failure cases in the same way as on-device; for a side-by-side reminder, the on-device call is sketched below.
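
In case it helps to compare, the on-device recognition you set up earlier needs no options at all. The following is only a rough sketch rather than the project's exact code: method names can differ slightly between ML Kit versions, and it assumes the processTextRecognitionResult() helper from the on-device section:

val image = FirebaseVisionImage.fromBitmap(selectedImage)
// On-device: no options object, and a plain text detector rather than the cloud document detector
val detector = FirebaseVision.getInstance().visionTextDetector
detector.detectInImage(image)
    .addOnSuccessListener { texts ->
      // texts is a FirebaseVisionText, organized as blocks > lines > elements
      processTextRecognitionResult(texts)
    }
    .addOnFailureListener { e ->
      e.printStackTrace()
    }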

You'll process the results similarly to before, but dive a little deeper using the extra information that comes back from the in-cloud processing. Add the following nested class and helper functions to the presenter:

class WordPair(val word: String, val handle: FirebaseVisionCloudText.Word)

private fun processCloudTextRecognitionResult(text: FirebaseVisionCloudText?) {
  view.hideProgress()
  if (text == null) {
    view.showNoTextMessage()
    return
  }
  text.pages.forEach { page ->
    page.blocks.forEach { block ->
      block.paragraphs.forEach { paragraph ->
        paragraph.words
            .zipWithNext { a, b ->
              // 1
              val word = wordToString(a) + wordToString(b)
              // 2
              WordPair(word, b)
            }
            .filter { looksLikeHandle(it.word) }
            .forEach {
              // 3
              view.showHandle(it.word, it.handle.boundingBox)
            }
      }
    }
  }
}

private fun wordToString(
    word: FirebaseVisionCloudText.Word): String =
    word.symbols.joinToString("") { it.text }
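
The code above also calls looksLikeHandle(), the helper you wrote back in the on-device section. If you need a refresher, a simple version is just a regular expression check; your implementation may differ slightly, so treat this as a sketch:

// A minimal sketch: treat "@" followed by word characters as a handle
private fun looksLikeHandle(text: String) =
    text.matches(Regex("@(\\w+)"))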

If you look at the results, you'll notice something: the structure of the text recognition result is slightly different than it was on the device. The accuracy of the cloud model provides more detailed information than you had before. The hierarchy you'll see is page > block > paragraph > word > symbol. Because of this, you need to do a little extra work to process it.

  • With the granularity of the results, the “@” and the other characters of a handle come back as separate words. Because of that, you take each word, create a string from it using its symbols in wordToString(), and concatenate each pair of neighboring words.
  • The new class you see, WordPair, is a way to give names to the pair of objects: the string you just created and the Firebase object for the handle.
  • From there, you display it the same way as in the on-device code. (If the nesting is hard to picture, see the sketch after this list.)
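
The sketch below isn't something you need to add to the project; it just illustrates how you could walk the same page > block > paragraph > word > symbol hierarchy to flatten a whole result back into a plain string, reusing wordToString():

private fun cloudTextToString(text: FirebaseVisionCloudText): String =
    text.pages.joinToString("\n\n") { page ->
      page.blocks.joinToString("\n") { block ->
        block.paragraphs.joinToString("\n") { paragraph ->
          // Join each word's symbols into a word, then words into paragraphs,
          // paragraphs into blocks, and blocks into pages
          paragraph.words.joinToString(" ") { wordToString(it) }
        }
      }
    }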

Build and run the project, and test it out! After you pick an image and run the recognition on the device, tap the cloud icon in the bottom corner to run the recognition in the cloud. You may see results that the on-device recognition missed!

In-cloud results

Again, you can use showBox(boundingBox: Rect?) at any level of the loops to see what that level detects. This screenshot has boxes around every paragraph:

In-cloud extra results
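
As a sketch, assuming showBox(boundingBox: Rect?) on the view behaves as described above, the paragraph-level boxes in that screenshot come from a single extra call inside the paragraph loop:

text.pages.forEach { page ->
  page.blocks.forEach { block ->
    block.paragraphs.forEach { paragraph ->
      // Draw a box around each detected paragraph
      view.showBox(paragraph.boundingBox)
      // ...the existing word processing stays here...
    }
  }
}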

Where to go from here?

Congratulations! You can now detect text from an image using ML Kit on both the device and in the cloud. Imagine the possibilities for where you can use this and other parts of ML Kit!

Feel free to download the completed project to check it out. Find the download link at the top or bottom of this tutorial.

Note: You must complete the “Setting up Firebase” and “Enabling Firebase for in-cloud text recognition” sections in order for the final project to work. Remember to also go back to the Firebase Spark plan if you upgraded to the Blaze plan in the Firebase console.

ML Kit doesn’t stop here! You can also use it for face detection, barcode scanning, image labeling, and landmark recognition with similar ease. Be on the lookout for possible future additions to ML Kit as well: Google has talked about adding APIs for face contour and smart replies. Check out the official ML Kit documentation as you continue on your ML Kit journey.

Feel free to share your feedback and findings, or ask any questions in the comments below or in the forums. I hope you enjoyed getting started with text recognition using ML Kit!

Happy coding!