Image Classification on Android with TensorFlow Lite and CameraX
Leverage the GPU delegate for machine learning on the edge
TensorFlow Lite is TensorFlow’s lightweight solution for on-device machine learning and the successor to TensorFlow Mobile. It brings machine learning to your smartphone while keeping the model binary size small and inference latency low. It also supports hardware acceleration via the Neural Networks API, and with the GPU delegate, inference can run several times faster.
CameraX is the latest camera API, released with the Jetpack support library. It’s here to make developing with the camera much easier, and with Google’s automated lab testing, it strives to keep behavior consistent across the many Android devices out there. CameraX is a huge improvement over the Camera2 API in terms of ease of use and simplicity.
The goal of this article is to merge the camera and ML worlds by processing CameraX frames for image classification using a TensorFlow Lite model. We’ll build an Android application in Kotlin that leverages the power of your smartphone’s GPU.
CameraX: A Brief Overview
CameraX is lifecycle-aware, so it removes the need to handle camera state yourself in the onResume and onPause methods.
The API is use case-based. The three main use cases that are currently supported are:
- Preview — Displays the camera feed.
- Analyze — Processes frames for computer vision or other machine learning-related tasks.
- Capture — Saves high-quality images.
Additionally, CameraX provides Extensions to easily access features such as HDR, Portrait, and Night Mode on supported devices.
TensorFlow Lite Converter
The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite FlatBuffer file. The resulting .tflite model can then be deployed on mobile or embedded devices and run locally using the TensorFlow Lite interpreter.
The following code snippet shows one way of converting a Keras model to a mobile-compatible .tflite file:
from tensorflow import lite

converter = lite.TFLiteConverter.from_keras_model_file('model.h5')
tfmodel = converter.convert()
open("model.tflite", "wb").write(tfmodel)
In the following sections, we’ll be demonstrating a hands-on implementation of CameraX with a MobileNet TensorFlow Lite model using Kotlin. You can create your own custom trained models or choose among the hosted, pre-trained ones.
Implementation
Under the Hood
The flow is really simple. We pass the bitmap images from the Analyze use case in CameraX to the TensorFlow interpreter that runs inference on the image using the MobileNet model and the label classes. Here’s an illustration of how CameraX and TensorFlow Lite interact with one another.
Setup
Launch a new Android Studio Kotlin project and add the following dependencies to your app’s build.gradle file.
// CameraX
implementation 'androidx.camera:camera-core:1.0.0-alpha02'
implementation 'androidx.camera:camera-camera2:1.0.0-alpha02'

// Task API
implementation "com.google.android.gms:play-services-tasks:17.0.0"

// TensorFlow Lite
implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly'
The nightly TensorFlow Lite build provides experimental support for GPUs. The Google Play Services Task API is used for handling asynchronous method calls.
Next, add the MVP files, the labels, and the .tflite model file under your assets directory. You also need to ensure that the model isn’t compressed, by setting the following aaptOptions in the build.gradle file:
android {
    aaptOptions {
        noCompress "tflite"
        noCompress "lite"
    }
}
Add the necessary camera permission in your AndroidManifest.xml file:
<uses-permission android:name="android.permission.CAMERA" />
Now that the setup is complete, it’s time to establish the layout!
Layout
The layout is defined inside the activity_main.xml file. It consists of a TextureView for displaying the camera preview and a TextView that shows the predicted output from your image classification model.
Request Camera Permissions
You’ll need to request runtime permissions before accessing the camera. The following code from the MainActivity.kt class shows how that’s done.
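Here’s a minimal sketch of that flow, assuming the textureView id from the layout above; the request code and the allPermissionsGranted() helper are illustrative, not the exact original listing.

private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)
private val REQUEST_CODE_PERMISSIONS = 10

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    // The TFLiteClassifier set up later in this article is also initialized here.

    if (allPermissionsGranted()) {
        // Defer startCamera() until the TextureView has been laid out.
        textureView.post { startCamera() }
    } else {
        ActivityCompat.requestPermissions(this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS)
    }
}

// Returns true only if every required permission has already been granted.
private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
    ContextCompat.checkSelfPermission(this, it) == PackageManager.PERMISSION_GRANTED
}

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    if (requestCode == REQUEST_CODE_PERMISSIONS) {
        if (allPermissionsGranted()) {
            textureView.post { startCamera() }
        } else {
            Toast.makeText(this, "Camera permission was not granted.", Toast.LENGTH_SHORT).show()
            finish()
        }
    }
}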
Once permission is granted, we’ll start our camera!
Setting Up Camera Use Cases
As seen in the previous section’s code, startCamera is called from the post method on the TextureView. This ensures that the camera is started only once the TextureView has been laid out on the screen. In the updateTransform method, we fix the orientation of the view with respect to the device’s orientation.
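Here’s a rough sketch of what startCamera can look like with this alpha release of CameraX. The view ids (textureView, predictedTextView) and the classifyAsync call mirror the snippets later in this article, but treat the details as illustrative rather than the exact original listing.

private fun startCamera() {
    // Preview use case: pipe the camera's SurfaceTexture into our TextureView.
    val previewConfig = PreviewConfig.Builder()
        .setTargetResolution(Size(textureView.width, textureView.height))
        .build()
    val preview = Preview(previewConfig)
    preview.setOnPreviewOutputUpdateListener { previewOutput ->
        // Re-attach the TextureView so it picks up the new SurfaceTexture.
        val parent = textureView.parent as ViewGroup
        parent.removeView(textureView)
        parent.addView(textureView, 0)
        textureView.surfaceTexture = previewOutput.surfaceTexture
        updateTransform()
    }

    // Analyze use case: convert each frame to a Bitmap and classify it.
    val analyzerConfig = ImageAnalysisConfig.Builder()
        .setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
        .build()
    val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
        setAnalyzer { image, _ ->
            val bitmap = image.toBitmap()
            tfLiteClassifier
                .classifyAsync(bitmap)
                .addOnSuccessListener { resultText -> predictedTextView?.text = resultText }
                .addOnFailureListener { e -> Log.e(TAG, "Error classifying frame.", e) }
        }
    }

    // Bind both use cases to the activity's lifecycle.
    CameraX.bindToLifecycle(this, preview, analyzerUseCase)
}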
In the above code, we’re doing quite a few things. Let’s go through each of them:
- Setting up our Preview use case using the PreviewConfig.Builder. setOnPreviewOutputUpdateListener is where we attach the camera preview’s surface texture to the TextureView.
- Inside the Analyze use case, we convert the ImageProxy to a Bitmap and pass it to the TFLiteClassifier’s classifyAsync method. If this looks out of place, skip it for now, as we’ll be discussing the TFLiteClassifier class at length in the next section.
The following code snippet is used for converting the ImageProxy to a Bitmap:
fun ImageProxy.toBitmap(): Bitmap {
val yBuffer = planes[0].buffer // Y
val uBuffer = planes[1].buffer // U
val vBuffer = planes[2].buffer // V
val ySize = yBuffer.remaining()
val uSize = uBuffer.remaining()
val vSize = vBuffer.remaining()
val nv21 = ByteArray(ySize + uSize + vSize)
yBuffer.get(nv21, 0, ySize)
vBuffer.get(nv21, ySize, vSize)
uBuffer.get(nv21, ySize + vSize, uSize)
val yuvImage = YuvImage(nv21, ImageFormat.NV21, this.width, this.height, null)
val out = ByteArrayOutputStream()
yuvImage.compressToJpeg(Rect(0, 0, yuvImage.width, yuvImage.height), 100, out)
val imageBytes = out.toByteArray()
return BitmapFactory.decodeByteArray(imageBytes, 0, imageBytes.size)
}
It’s now time to run image classification! Let’s jump to the next section.
TensorFlow Lite Interpreter
The TensorFlow Lite Interpreter follows these steps to return predictions based on the input.
1. Converting the model into a ByteBuffer
We must memory map the model from the Assets folder to get a ByteBuffer, which is ultimately loaded into the interpreter:
@Throws(IOException::class)
private fun loadModelFile(assetManager: AssetManager, filename: String): ByteBuffer {
val fileDescriptor = assetManager.openFd(filename)
val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
val fileChannel = inputStream.channel
val startOffset = fileDescriptor.startOffset
val declaredLength = fileDescriptor.declaredLength
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
}
2. Loading the Label Classes into a Data Structure
The labels file contains the ImageNet class names (roughly a thousand of them). We’ll load those labels into an ArrayList. In the end, the interpreter’s predictions are mapped back to these label strings.
@Throws(IOException::class)
fun loadLines(context: Context, filename: String): ArrayList<String> {
val s = Scanner(InputStreamReader(context.assets.open(filename)))
val labels = ArrayList<String>()
while (s.hasNextLine()) {
labels.add(s.nextLine())
}
s.close()
return labels
}
var labels = loadLines(context, "labels.txt")
3. Initializing Our Interpreter
Now that we’ve got our ByteBuffer and label list, it’s time to initialize our interpreter. In the following code, we add a GpuDelegate to the Interpreter.Options() before creating the interpreter:
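A sketch of that initialization is shown below. The field names (context, interpreter, labels, inputImageWidth, inputImageHeight, executorService) and the model file name are assumptions that mirror the rest of this article; a TaskCompletionSource bridges the background work back to the Task API.

fun initialize(): Task<Void> {
    val task = TaskCompletionSource<Void>()
    executorService.execute {
        try {
            // Attach the GPU delegate so supported ops run on the GPU.
            val options = Interpreter.Options().apply {
                addDelegate(GpuDelegate())
            }

            // Load the model and labels from the assets directory.
            val model = loadModelFile(context.assets, "mobilenet_v1_1.0_224.tflite")
            interpreter = Interpreter(model, options)
            labels = loadLines(context, "labels.txt")

            // Query the model's input tensor shape so bitmaps can later be resized to match it.
            val inputShape = interpreter.getInputTensor(0).shape()
            inputImageWidth = inputShape[1]
            inputImageHeight = inputShape[2]

            task.setResult(null)
        } catch (e: IOException) {
            task.setException(e)
        }
    }
    return task.task
}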
In the above code, once the model is set up in the interpreter, we retrieve the model’s input tensor shape. This is done so we can preprocess each Bitmap into the same shape that the model accepts.
The Callable interface is similar to Runnable, but it allows us to return a result (we’ll use it for classifyAsync later on). The ExecutorService is used to run these tasks on threads from a thread pool.
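For reference, one possible setup for that executor inside TFLiteClassifier (an assumption, not necessarily the original configuration):

// Supplies background threads for the Task API calls so inference stays off the main thread.
private val executorService: ExecutorService = Executors.newCachedThreadPool()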
The initialize method is called in the onCreate method of our MainActivity, as shown below:
private var tfLiteClassifier: TFLiteClassifier = TFLiteClassifier(this@MainActivity)

tfLiteClassifier
    .initialize()
    .addOnSuccessListener { }
    .addOnFailureListener { e -> Log.e(TAG, "Error in setting up the classifier.", e) }
4. Preprocessing the Input and Running Inference
We can now resize our Bitmap to fit the model input shape. Then, we’ll convert the new Bitmap into a ByteBuffer for model execution:
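A sketch of this step follows. The 127.5f normalization assumes a float MobileNet input; adjust it if your model expects raw 0 to 255 values or a different scheme.

private fun classify(bitmap: Bitmap): String {
    // Resize the frame to the input shape queried during initialization.
    val resizedBitmap =
        Bitmap.createScaledBitmap(bitmap, inputImageWidth, inputImageHeight, true)
    val byteBuffer = convertBitmapToByteBuffer(resizedBitmap)

    // One score per label; shape [1, numLabels] matches the model's output tensor.
    val output = Array(1) { FloatArray(labels.size) }

    val startTime = SystemClock.uptimeMillis()
    interpreter.run(byteBuffer, output)
    val inferenceTime = SystemClock.uptimeMillis() - startTime

    val index = getMaxResult(output[0])
    return "${labels[index]}\nInference time: $inferenceTime ms"
}

private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
    // 4 bytes per float x 3 channels (RGB); the alpha channel is never written.
    val byteBuffer = ByteBuffer.allocateDirect(4 * inputImageWidth * inputImageHeight * 3)
    byteBuffer.order(ByteOrder.nativeOrder())

    val pixels = IntArray(inputImageWidth * inputImageHeight)
    bitmap.getPixels(pixels, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)

    for (pixelValue in pixels) {
        // Shift each color channel into the low 8 bits and mask it with 0xFF.
        val r = (pixelValue shr 16 and 0xFF)
        val g = (pixelValue shr 8 and 0xFF)
        val b = (pixelValue and 0xFF)

        byteBuffer.putFloat((r - 127.5f) / 127.5f)
        byteBuffer.putFloat((g - 127.5f) / 127.5f)
        byteBuffer.putFloat((b - 127.5f) / 127.5f)
    }
    return byteBuffer
}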
In the above code, convertBitmapToByteBuffer shifts each color channel into the lowest 8 bits and masks it with 0xFF, so the alpha channel is ignored entirely.
Along with the ByteBuffer, we pass a float array that the interpreter fills with a prediction score for each image class.
5. Computing Arg Max
Finally, the getMaxResult function returns the index of the label with the highest confidence, as shown in the code snippet below:
private fun getMaxResult(result: FloatArray): Int {
var probability = result[0]
var index = 0
for (i in result.indices) {
if (probability < result[i]) {
probability = result[i]
index = i
}
}
return index
}
The classifyAsync method called from the Analyzer use case returns a string containing the prediction and the inference time via the onSuccessListener, thanks to the Callable interface.
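A minimal sketch of that wrapper, assuming Tasks.call from the Play Services Task API and the classify method from the previous step:

fun classifyAsync(bitmap: Bitmap): Task<String> {
    // Run classify on a background thread; listeners receive the result on the main thread.
    return Tasks.call(executorService, Callable<String> { classify(bitmap) })
}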
tfLiteClassifier
.classifyAsync(bitmap)
.addOnSuccessListener { resultText -> predictedTextView?.text = resultText }
Finally, we display the predicted label and the inference time on the screen, as shown below:
Conclusion
So that sums up this article. We used TensorFlow Lite and CameraX to build an image classification Android application with MobileNet while leveraging the GPU delegate, and we got fairly accurate results, fast. Moving on from here, you can try building your own custom TFLite models and see how they fare with CameraX. CameraX is still in alpha, but there’s already a lot you can do with it.
The full source code of this guide is available here.
That’s it for this one. I hope you enjoyed reading.