SwiftUI + Core ML + ARKit — Create an Object Detection iOS App
Use the power of machine learning and augmented reality to detect objects around you
While I was reading and researching iOS articles on Medium recently, I realized that there are few articles that combine Core ML, ARKit, and SwiftUI. That’s why I wanted to bring these three frameworks together in a simple application.
Long story short, the application uses a Core ML version of ResNet, a deep learning model originally trained with Python frameworks.
Let’s talk about the application logic now. When we tap the screen, we capture the current image at the center of the screen, process it with our Core ML model, and create an augmented reality text. The logic is very simple.
Let’s Start!
First, we download our model from the models Apple offers and put it in our project’s root directory. One thing to pay attention to here is that the build target must be a real device. Otherwise, Xcode may report errors for both the model and the ARView. I suspect it’s an Xcode bug.
What Are We Going To Do?
We will create an augmented reality project whose interface is SwiftUI. Then we will delete the Experience.rcproject file from the project. We don’t need it.
After that, we will create our ResNet model in the UIViewRepresentable. Since the ResNet model’s initializer can throw errors, we configure the model inside a do-catch block.
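Here is a minimal sketch of what that might look like, assuming the UIViewRepresentable struct is named ARViewContainer and the model file is Apple’s Resnet50.mlmodel (so Xcode generates a Resnet50 class for it):
import SwiftUI
import RealityKit
import CoreML

struct ARViewContainer: UIViewRepresentable {

    // The generated Resnet50 initializer can throw,
    // so we configure the model inside a do-catch block.
    let resnetModel: Resnet50 = {
        do {
            return try Resnet50(configuration: MLModelConfiguration())
        } catch {
            // In this sketch we simply stop if the model cannot be loaded.
            fatalError("Could not create the ResNet model: \(error)")
        }
    }()

    func makeUIView(context: Context) -> ARView {
        ARView(frame: .zero)
    }

    func updateUIView(_ uiView: ARView, context: Context) {}
}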
We move the arView constant out of makeUIView(), and we add an ARRaycastResult property, which will hold the information about the tapped point on the screen, and a VNRequest array, where we will collect our Core ML requests.
let arView = ARView(frame: .zero)          // shared AR view used across the struct
var hitTestResultValue: ARRaycastResult!   // raycast result for the tapped point
var visionRequests = [VNRequest]()         // Vision requests driving the Core ML model
After completing all these steps, we define a Coordinator class inside the struct, pass a reference to our UIViewRepresentable into the class’s init method, and introduce the Coordinator to the UIViewRepresentable with the makeCoordinator() function.
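Roughly sketched, and continuing with the ARViewContainer name assumed above, the wiring might look like this (the property and parameter names are my own):
// Inside ARViewContainer:
class Coordinator: NSObject {
    // Keep a reference back to the UIViewRepresentable so the
    // Coordinator can reach arView, the model, and the stored raycast result.
    var container: ARViewContainer

    init(_ container: ARViewContainer) {
        self.container = container
    }
}

func makeCoordinator() -> Coordinator {
    Coordinator(self)
}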
We have performed simple operations so far. Now let’s move on to the main operations.
First, we add a tap gesture to our arView object in the makeUIView() function and add its action method to our Coordinator class.
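A rough sketch of that step; the handleTap name and the #selector setup are my own assumptions:
// In ARViewContainer: attach the tap gesture and hand it to the Coordinator.
func makeUIView(context: Context) -> ARView {
    let tapGesture = UITapGestureRecognizer(target: context.coordinator,
                                            action: #selector(Coordinator.handleTap(_:)))
    arView.addGestureRecognizer(tapGesture)
    return arView
}

// In the Coordinator: the action method the gesture recognizer calls.
@objc func handleTap(_ recognizer: UITapGestureRecognizer) {
    // Filled in below: raycast, snapshot, and the Core ML request.
}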
That is all our work in the UIViewRepresentable. Now let’s move on to our operations in the Coordinator class.
First, we create a function that places the recognized object’s name on the screen, and we define a text parameter for this function.
Let’s talk about the function briefly. Inside it, we create an augmented reality text mesh with MeshResource’s generateText function, and we create a SimpleMaterial object. The material applies surface properties (e.g. color, roughness) to our augmented reality object.
We combine the mesh and material into a ModelEntity, and we place it at the real-world coordinates stored in the raycast result (hitTestResultValue) we created above.
Then we create the function named visionRequest that will process images. This function takes a value of type CVPixelBuffer. Inside it, we create a Core ML request, and when the request completes, we create a text object in the real world showing the name of the detected object and the confidence percentage.
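A possible sketch of that function, assuming the Resnet50 model from the container and the createText(_:) helper above:
// In the Coordinator: classify the captured image and label the tapped point.
func visionRequest(_ pixelBuffer: CVPixelBuffer) {
    // Wrap the Core ML model so Vision can drive it.
    guard let visionModel = try? VNCoreMLModel(for: container.resnetModel.model) else { return }

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        guard error == nil,
              let observations = request.results as? [VNClassificationObservation],
              let topResult = observations.first else { return }

        // Build the label from the best classification and its confidence.
        let confidence = Int(topResult.confidence * 100)
        let text = "\(topResult.identifier): \(confidence)%"

        DispatchQueue.main.async {
            self.createText(text)
        }
    }
    request.imageCropAndScaleOption = .centerCrop
    container.visionRequests = [request]

    // Run the request on the captured camera image.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    DispatchQueue.global(qos: .userInteractive).async {
        try? handler.perform(self.container.visionRequests)
    }
}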
As the last step, we create our @objc-labeled tap-handler method. In this method, we use the raycast method to convert the screen coordinates coming from our tap gesture into a 3D position in the real world and store it in hitTestResultValue. We then grab a snapshot of the camera image at the moment of the tap and send it to our Core ML model through the visionRequest method.
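Putting it together, the tap handler might look like this; I am assuming the current ARFrame’s capturedImage is used as the snapshot:
// In the Coordinator: handle the tap, raycast, and kick off classification.
@objc func handleTap(_ recognizer: UITapGestureRecognizer) {
    let arView = container.arView
    let touchLocation = recognizer.location(in: arView)

    // Convert the 2D touch point into a real-world 3D position.
    guard let raycastResult = arView.raycast(from: touchLocation,
                                             allowing: .estimatedPlane,
                                             alignment: .any).first else { return }
    container.hitTestResultValue = raycastResult

    // Grab the current camera frame and send it to the Core ML model.
    guard let pixelBuffer = arView.session.currentFrame?.capturedImage else { return }
    visionRequest(pixelBuffer)
}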
Result

There is only one small problem here: the text objects are not rendered double-sided, and there is currently no API option for this. I hope Apple addresses this soon.
Conclusion
Using Core ML and ARKit 4, we developed an object recognition application with a SwiftUI interface. You can find the full project on GitHub. I have left useful resources below for more information.
Thank you for reading!
Core ML
- https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture
- https://www.raywenderlich.com/7960296-core-ml-and-vision-tutorial-on-device-training-on-ios