SwiftUI + Core ML + ARKit — Create an Object Detection iOS App

Eren Çelik · Published in Better Programming · May 19, 2021

[Photo of a camera on a tripod by Patrick on Unsplash]

While reading and researching iOS articles on Medium recently, I realized that there are few articles that cover Core ML, ARKit, and SwiftUI together. That's why I wanted to bring these three frameworks together and write a simple application.

Long story short, the application uses a Core ML version of ResNet, a deep learning model originally trained in Python.

Now let's talk about the application logic. When we tap the screen, we capture the current image at the center of the screen, process it with our Core ML model, and create an augmented reality text label from the result. The logic is very simple.

Let’s Start!

First, we download the ResNet model from the models Apple offers and put it in our project root directory. One thing to pay attention to here is that our build target must be a real device. Otherwise, the model and ARView may throw errors. I guess it's an Xcode bug.

What Are We Going To Do?

We will create an augmented reality project whose interface is SwiftUI. Then we will delete the Experience.rcproject file from the project. We don't need it.

After that, we will create our ResNet model in the UIViewRepresentable. Since initializing the ResNet model can throw errors, we configure it inside a do-catch block.
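As a minimal sketch, assuming you downloaded Apple's Resnet50 model (Xcode generates a Resnet50 class from the .mlmodel file; the visionModel property name is my own), the setup might look like this:

import SwiftUI
import RealityKit
import ARKit
import CoreML
import Vision

struct ARViewContainer: UIViewRepresentable {
    // Wrap the generated Core ML class so Vision can run it.
    // Resnet50 is the class Xcode generates from Resnet50.mlmodel.
    let visionModel: VNCoreMLModel? = {
        do {
            let resNet = try Resnet50(configuration: MLModelConfiguration())
            return try VNCoreMLModel(for: resNet.model)
        } catch {
            print("Failed to load the ResNet model: \(error)")
            return nil
        }
    }()
    // ...
}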

We move the constant named arView out of makeUIView(), and we create an ARRaycastResult property, where we will keep the information about the tapped point on the screen, and a VNRequest array, where we will collect our Core ML requests.

let arView = ARView(frame: .zero)          // shared AR view, moved out of makeUIView()
var hitTestResultValue: ARRaycastResult!   // raycast result for the tapped point
var visionRequests = [VNRequest]()         // pending Vision/Core ML requests

After completing all these steps, we define a Coordinator class inside the struct, pass a reference into the class through its init method, and introduce it to our UIViewRepresentable with the makeCoordinator() function.
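Here is a sketch of what that can look like inside ARViewContainer (the parameter name is my own):

class Coordinator: NSObject {
    // Reference handed in through init, so the Coordinator can reach the AR view.
    let arView: ARView

    init(arView: ARView) {
        self.arView = arView
    }
}

func makeCoordinator() -> Coordinator {
    Coordinator(arView: arView)
}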

We have performed simple operations so far. Now let’s move on to the main operations.

First, we add a tap gesture recognizer to our arView object in our makeUIView() function and point its action at a method in our Coordinator class.
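A sketch of the gesture setup; handleTap(_:) is a placeholder name for the Coordinator action we define later:

func makeUIView(context: Context) -> ARView {
    // Forward taps on the AR view to the Coordinator's handler.
    let tapGesture = UITapGestureRecognizer(
        target: context.coordinator,
        action: #selector(Coordinator.handleTap(_:)))
    arView.addGestureRecognizer(tapGesture)
    return arView
}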

That's all for the UIViewRepresentable. Now let's move on to the operations in our Coordinator class.

First, we create a function that will place the detected object's name on the screen, and we give this function a text parameter.

Let's talk about the function briefly. We create an augmented reality text mesh with the MeshResource class's generateText function, and we create a SimpleMaterial object. Materials apply surface properties (e.g., color and roughness) to our augmented reality object.

We combine the mesh and material into a model entity and, using the coordinates in the raycast result we stored above, place it at the corresponding real-world position.
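Under those assumptions, the function might look like the following sketch (createText is my name for it; the font size and color are arbitrary):

func createText(_ text: String) {
    // Build a 3D text mesh and a simple white material for it.
    let textMesh = MeshResource.generateText(
        text,
        extrusionDepth: 0.01,
        font: .systemFont(ofSize: 0.05))
    let material = SimpleMaterial(color: .white, isMetallic: false)
    let textEntity = ModelEntity(mesh: textMesh, materials: [material])

    // Anchor the entity at the real-world position from the raycast result.
    let anchor = AnchorEntity(world: hitTestResultValue.worldTransform)
    anchor.addChild(textEntity)
    arView.scene.addAnchor(anchor)
}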

Then we create a function named visionRequest that will process images. This function takes a value of the CVPixelBuffer type.

Inside this function, we create a Core ML request; when the request finishes, we create a text object in the real world with the name and confidence percentage of the detected object.
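Here is a sketch of that request, assuming the Coordinator can reach the visionModel wrapper we created earlier (for example by passing it in through init); the label format is my own choice:

func visionRequest(_ pixelBuffer: CVPixelBuffer) {
    guard let model = visionModel else { return }

    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let observations = request.results as? [VNClassificationObservation],
              let best = observations.first else { return }
        // Top label plus confidence, e.g. "laptop 0.87".
        let label = "\(best.identifier) \(String(format: "%.2f", best.confidence))"
        DispatchQueue.main.async {
            self.createText(label)
        }
    }
    request.imageCropAndScaleOption = .centerCrop
    visionRequests = [request]

    // Run the request off the main thread.
    DispatchQueue.global().async {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
        try? handler.perform(self.visionRequests)
    }
}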

As the last step, we create our method marked with the @objc attribute. In this method, we obtain a 3D position in the real world from the screen coordinates coming from our tap gesture, thanks to the raycast method.

We store the raycast result in hitTestResultValue, capture the camera image at the moment of the tap, and then send this image to our Core ML model.
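Putting it together, the handler might look like this sketch; I use the current ARFrame's capturedImage as the snapshot that goes to the model:

@objc func handleTap(_ sender: UITapGestureRecognizer) {
    let tapLocation = sender.location(in: arView)

    // Turn the 2D tap point into a 3D position on an estimated plane.
    guard let raycastResult = arView.raycast(
        from: tapLocation,
        allowing: .estimatedPlane,
        alignment: .any).first else { return }
    hitTestResultValue = raycastResult

    // Capture the camera image at the moment of the tap and classify it.
    guard let pixelBuffer = arView.session.currentFrame?.capturedImage else { return }
    visionRequest(pixelBuffer)
}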

Result

[GIF of the final result]

There is only one small problem here: text meshes are rendered one-sided, and there is no API to make them double-sided. I hope Apple addresses this soon.
