Image Classification on Android with TensorFlow Lite and CameraX
Leverage the GPU delegate for machine learning on the edge
TensorFlow Lite is TensorFlow’s lightweight solution for on-device machine learning and the successor to TensorFlow Mobile. It brings machine learning to your smartphone while keeping the model binary size small and inference latency low. It also supports hardware acceleration via the Neural Networks API, and with the GPU delegate, inference can run several times faster.
CameraX is the latest camera API, released with the Jetpack support library. It’s here to make developing with the camera much easier, and with Google’s automated lab testing, it strives to keep behavior consistent across the many Android devices out there. CameraX is a huge improvement over the Camera2 API in terms of ease of use and simplicity.
The goal of this article is to merge the camera and ML worlds by processing CameraX frames for image classification using a TensorFlow Lite model. We’ll build an Android application in Kotlin that leverages the power of your smartphone’s GPU.
CameraX: A Brief Overview
CameraX is lifecycle-aware, so it removes the need to handle camera state yourself in the onResume and onPause methods.
The API is use case-based. The three main use cases that are currently supported are:
- Preview — Displays the camera feed.
- Analyze — Processes frames for computer vision or other machine learning-related tasks.
- Capture — Saves high-quality images.
Additionally, CameraX provides Extensions to easily access features such as HDR, Portrait, and Night Mode on supported devices.
TensorFlow Lite Converter
The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite FlatBuffer file. The resulting .tflite model can then be deployed on mobile or embedded devices and run locally using the TensorFlow Lite interpreter.
The following code snippet shows one way of converting a Keras model to a mobile-compatible .tflite file:
from tensorflow import lite

converter = lite.TFLiteConverter.from_keras_model_file('model.h5')
tfmodel = converter.convert()
open("model.tflite", "wb").write(tfmodel)
In the following sections, we’ll be demonstrating a hands-on implementation of CameraX with a MobileNet TensorFlow Lite model using Kotlin. You can create your own custom trained models or choose among the hosted, pre-trained ones.
Implementation
Under the Hood
The flow is really simple. We pass the bitmap images from the Analyze use case in CameraX to the TensorFlow interpreter that runs inference on the image using the MobileNet model and the label classes. Here’s an illustration of how CameraX and TensorFlow Lite interact with one another.
Setup
Launch a new Android Studio Kotlin project and add the following dependencies to your app’s build.gradle file.
// CameraX
implementation 'androidx.camera:camera-core:1.0.0-alpha02'
implementation 'androidx.camera:camera-camera2:1.0.0-alpha02'

// Task API
implementation "com.google.android.gms:play-services-tasks:17.0.0"

// TensorFlow Lite
implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly'
The nightly TensorFlow Lite build provides experimental support for GPUs. The Google Play Services Task API is used for handling asynchronous method calls.
Next, add the MVP files, the labels, and the .tflite model file under your assets directory. You also need to ensure that the model isn’t compressed, by setting the following aaptOptions in the build.gradle file:
android {
    aaptOptions {
        noCompress "tflite"
        noCompress "lite"
    }
}
Add the necessary camera permission in your AndroidManifest.xml file:
<uses-permission android:name="android.permission.CAMERA" />
Now that the setup is complete, it’s time to establish the layout!
Layout
The layout is defined inside the activity_main.xml file. It consists of a TextureView for displaying the camera preview and a TextView that shows the predicted output from your image classification model.
Request Camera Permissions
You’ll need to request runtime permissions before accessing the camera. The following code from the MainActivity.kt class shows how that’s done.
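Here’s a minimal sketch of that flow, assuming the textureView id from the layout above; the request code and the allPermissionsGranted() helper are illustrative, not the exact original listing.

private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)
private val REQUEST_CODE_PERMISSIONS = 10

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    // The TFLiteClassifier set up later in this article is also initialized here.

    if (allPermissionsGranted()) {
        // Defer startCamera() until the TextureView has been laid out.
        textureView.post { startCamera() }
    } else {
        ActivityCompat.requestPermissions(this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS)
    }
}

// Returns true only if every required permission has already been granted.
private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
    ContextCompat.checkSelfPermission(this, it) == PackageManager.PERMISSION_GRANTED
}

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    if (requestCode == REQUEST_CODE_PERMISSIONS) {
        if (allPermissionsGranted()) {
            textureView.post { startCamera() }
        } else {
            Toast.makeText(this, "Camera permission was not granted.", Toast.LENGTH_SHORT).show()
            finish()
        }
    }
}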
Once permission is granted, we’ll start our camera!
Setting Up Camera Use Cases
As seen in the previous section’s code, startCamera is called from the post method on the TextureView. This ensures that the camera is started only once the TextureView has been laid out on the screen. In the updateTransform method, we fix the orientation of the view with respect to the device’s orientation.
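Here’s a rough sketch of what startCamera can look like with this alpha release of CameraX. The view ids (textureView, predictedTextView) and the classifyAsync call mirror the snippets later in this article, but treat the details as illustrative rather than the exact original listing.

private fun startCamera() {
    // Preview use case: pipe the camera's SurfaceTexture into our TextureView.
    val previewConfig = PreviewConfig.Builder()
        .setTargetResolution(Size(textureView.width, textureView.height))
        .build()
    val preview = Preview(previewConfig)
    preview.setOnPreviewOutputUpdateListener { previewOutput ->
        // Re-attach the TextureView so it picks up the new SurfaceTexture.
        val parent = textureView.parent as ViewGroup
        parent.removeView(textureView)
        parent.addView(textureView, 0)
        textureView.surfaceTexture = previewOutput.surfaceTexture
        updateTransform()
    }

    // Analyze use case: convert each frame to a Bitmap and classify it.
    val analyzerConfig = ImageAnalysisConfig.Builder()
        .setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
        .build()
    val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
        setAnalyzer { image, _ ->
            val bitmap = image.toBitmap()
            tfLiteClassifier
                .classifyAsync(bitmap)
                .addOnSuccessListener { resultText -> predictedTextView?.text = resultText }
                .addOnFailureListener { e -> Log.e(TAG, "Error classifying frame.", e) }
        }
    }

    // Bind both use cases to the activity's lifecycle.
    CameraX.bindToLifecycle(this, preview, analyzerUseCase)
}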
In the above code, we’re doing quite a few things. Let’s go through each of them:
- Setting up our Preview use case using the PreviewConfig.Builder. setOnPreviewOutputUpdateListener is where we attach the camera preview’s surface texture to the TextureView.
- Inside the Analyze use case, we convert the ImageProxy to a Bitmap and pass it to the TFLiteClassifier’s classifyAsync method. If this looks out of place, skip it for now, as we’ll be discussing the TFLiteClassifier class at length in the next section.
The following code snippet is used for converting the ImageProxy to a Bitmap:
fun ImageProxy.toBitmap(): Bitmap {
val yBuffer = planes[0].buffer // Y
val uBuffer = planes[1].buffer // U
val vBuffer = planes[2].buffer // V
val ySize = yBuffer.remaining()
val uSize = uBuffer.remaining()
val vSize = vBuffer.remaining()
val nv21 = ByteArray(ySize + uSize + vSize)
yBuffer.get(nv21, 0, ySize)
vBuffer.get(nv21, ySize, vSize)
uBuffer.get(nv21, ySize + vSize, uSize)
val yuvImage = YuvImage(nv21, ImageFormat.NV21, this.width, this.height, null)
val out = ByteArrayOutputStream()
yuvImage.compressToJpeg(Rect(0, 0, yuvImage.width, yuvImage.height), 100, out)
val imageBytes = out.toByteArray()
return BitmapFactory.decodeByteArray(imageBytes, 0, imageBytes.size)
}
It’s now time to run image classification! Let’s jump to the next section.
TensorFlow Lite Interpreter
The TensorFlow Lite Interpreter follows these steps to return predictions based on the input.
1. Converting the model into a ByteBuffer
We must memory map the model from the Assets folder to get a ByteBuffer, which is ultimately loaded into the interpreter:
@Throws(IOException::class)
private fun loadModelFile(assetManager: AssetManager, filename: String): ByteBuffer {
val fileDescriptor = assetManager.openFd(filename)
val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
val fileChannel = inputStream.channel
val startOffset = fileDescriptor.startOffset
val declaredLength = fileDescriptor.declaredLength
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
}
2. Loading the Label Classes into a Data Structure
The labels file contains the ImageNet class names (roughly a thousand of them). We’ll load those labels into an ArrayList. In the end, the interpreter’s predictions are mapped back to these label strings.
@Throws(IOException::class)
fun loadLines(context: Context, filename: String): ArrayList<String> {
val s = Scanner(InputStreamReader(context.assets.open(filename)))
val labels = ArrayList<String>()
while (s.hasNextLine()) {
labels.add(s.nextLine())
}
s.close()
return labels
}
var labels = loadLines(context, "labels.txt")
3. Initializing Our Interpreter
Now that we’ve got our ByteBuffer and label list, it’s time to initialize our interpreter. In the following code, we add a GpuDelegate to the Interpreter.Options() before creating the interpreter:
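A sketch of that initialization is shown below. The field names (context, interpreter, labels, inputImageWidth, inputImageHeight, executorService) and the model file name are assumptions that mirror the rest of this article; a TaskCompletionSource bridges the background work back to the Task API.

fun initialize(): Task<Void> {
    val task = TaskCompletionSource<Void>()
    executorService.execute {
        try {
            // Attach the GPU delegate so supported ops run on the GPU.
            val options = Interpreter.Options().apply {
                addDelegate(GpuDelegate())
            }

            // Load the model and labels from the assets directory.
            val model = loadModelFile(context.assets, "mobilenet_v1_1.0_224.tflite")
            interpreter = Interpreter(model, options)
            labels = loadLines(context, "labels.txt")

            // Query the model's input tensor shape so bitmaps can later be resized to match it.
            val inputShape = interpreter.getInputTensor(0).shape()
            inputImageWidth = inputShape[1]
            inputImageHeight = inputShape[2]

            task.setResult(null)
        } catch (e: IOException) {
            task.setException(e)
        }
    }
    return task.task
}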
In the above code, once the model is set up in the interpreter, we retrieve the model’s input tensor shape. This is done so we can preprocess each Bitmap into the same shape that the model accepts.
The Callable interface is similar to Runnable, but it allows us to return a result (we’ll use it for classifyAsync later on). The ExecutorService is used to run these tasks on threads from a thread pool.
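For reference, one possible setup for that executor inside TFLiteClassifier (an assumption, not necessarily the original configuration):

// Supplies background threads for the Task API calls so inference stays off the main thread.
private val executorService: ExecutorService = Executors.newCachedThreadPool()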
The initialize method is called in the onCreate method of our MainActivity, as shown below:
private var tfLiteClassifier: TFLiteClassifier = TFLiteClassifier(this@MainActivity)

tfLiteClassifier
    .initialize()
    .addOnSuccessListener { }
    .addOnFailureListener { e -> Log.e(TAG, "Error in setting up the classifier.", e) }
4. Preprocessing the Input and Running Inference
We can now resize our Bitmap to fit the model input shape. Then, we’ll convert the new Bitmap into a ByteBuffer for model execution:
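A sketch of this step follows. The 127.5f normalization assumes a float MobileNet input; adjust it if your model expects raw 0 to 255 values or a different scheme.

private fun classify(bitmap: Bitmap): String {
    // Resize the frame to the input shape queried during initialization.
    val resizedBitmap =
        Bitmap.createScaledBitmap(bitmap, inputImageWidth, inputImageHeight, true)
    val byteBuffer = convertBitmapToByteBuffer(resizedBitmap)

    // One score per label; shape [1, numLabels] matches the model's output tensor.
    val output = Array(1) { FloatArray(labels.size) }

    val startTime = SystemClock.uptimeMillis()
    interpreter.run(byteBuffer, output)
    val inferenceTime = SystemClock.uptimeMillis() - startTime

    val index = getMaxResult(output[0])
    return "${labels[index]}\nInference time: $inferenceTime ms"
}

private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
    // 4 bytes per float x 3 channels (RGB); the alpha channel is never written.
    val byteBuffer = ByteBuffer.allocateDirect(4 * inputImageWidth * inputImageHeight * 3)
    byteBuffer.order(ByteOrder.nativeOrder())

    val pixels = IntArray(inputImageWidth * inputImageHeight)
    bitmap.getPixels(pixels, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)

    for (pixelValue in pixels) {
        // Shift each color channel into the low 8 bits and mask it with 0xFF.
        val r = (pixelValue shr 16 and 0xFF)
        val g = (pixelValue shr 8 and 0xFF)
        val b = (pixelValue and 0xFF)

        byteBuffer.putFloat((r - 127.5f) / 127.5f)
        byteBuffer.putFloat((g - 127.5f) / 127.5f)
        byteBuffer.putFloat((b - 127.5f) / 127.5f)
    }
    return byteBuffer
}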
In the above code, convertBitmapToByteBuffer shifts each color channel into the lowest 8 bits and masks it with 0xFF, so the alpha channel is ignored entirely.
Along with the ByteBuffer, we pass a float array that the interpreter fills with a prediction score for each image class.
5. Computing Arg Max
Finally, the getMaxResult function returns the index of the label with the highest confidence, as shown in the code snippet below:
private fun getMaxResult(result: FloatArray): Int {
var probability = result[0]
var index = 0
for (i in result.indices) {
if (probability < result[i]) {
probability = result[i]
index = i
}
}
return index
}
The classifyAsync method called from the Analyzer use case returns a string containing the prediction and the inference time via the onSuccessListener, thanks to the Callable interface.
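A minimal sketch of that wrapper, assuming Tasks.call from the Play Services Task API and the classify method from the previous step:

fun classifyAsync(bitmap: Bitmap): Task<String> {
    // Run classify on a background thread; listeners receive the result on the main thread.
    return Tasks.call(executorService, Callable<String> { classify(bitmap) })
}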
tfLiteClassifier
.classifyAsync(bitmap)
.addOnSuccessListener { resultText -> predictedTextView?.text = resultText }
Finally, we display the predicted label and the inference time on the screen, as shown below:
Conclusion
So that sums up this article. We used TensorFlow Lite and CameraX to build an image classification Android application with MobileNet while leveraging the GPU delegate, and we got fairly accurate results, fast. Moving on from here, you can try building your own custom TFLite models and see how they fare with CameraX. CameraX is still in alpha, but there’s already a lot you can do with it.
The full source code of this guide is available here.
That’s it for this one. I hope you enjoyed reading.