Introduction to RealityKit on iOS: Entities, Gestures, and Ray Casting
Leveraging RealityKit, Vision, and PencilKit frameworks. Time to say goodbye to SceneKit?
This article was originally posted on my Substack, iOSDevie.
The introduction of iOS 13 brought a major upgrade to Apple’s augmented reality framework. ARKit 3 arrived with a lot of interesting new features: people occlusion, motion capture, simultaneous front and back camera, and collaborative sessions. These enhancements to ARKit strongly indicate Apple’s ambition for pushing AR immersion even further.
Up until iOS 12, we had SceneKit, SpriteKit, and Metal as the primary rendering frameworks. Among these, SceneKit, the 3D graphics framework, had been the most logical choice for building ARKit apps.
While a lot of enhancements in the SceneKit framework were expected to be announced at WWDC 2019, Apple surprised us by introducing a completely new and independent 3D engine framework—RealityKit, which allows developers to create AR experiences and scenes more easily than ever. Additionally, it comes with a utility app, Reality Composer, which allows us to create our own 3D objects and customizations.
Our Goal
The purpose of this article is to get you started with RealityKit and set you up to start building awesome augmented reality-based applications. We’ll start off by setting up an Xcode Project for our AR-based iOS application, followed by a brief tour through the various key components of the RealityKit framework.
As we work through this tutorial, we’ll put the various pieces together to end up with a really cool AR application that lets users add 3D models and structures to RealityKit’s virtual scene and interact with them using gestures.
Additionally, we’ll set up a drawing canvas view for handling user input. In this case, the user input will consist of hand-drawn digits, which are inferred using the MNIST Core ML model and then converted into 3D text that eventually gets placed in the virtual scene.
Besides RealityKit and ARKit, we’ll be using the following iOS frameworks in our application:
- PencilKit — This is a drawing framework introduced in iOS 13 that allows us to create custom, canvas-based applications. We’ll leverage this framework for handling the input.
- SwiftUI and Vision — SwiftUI is the popular new declarative UI framework, and Vision abstracts complex computer vision algorithms with an easy-to-use API.
Project Setup
To start off, open Xcode 11 or above and create a new project. Go to the iOS tab and select the Augmented Reality App template. In the wizard, make sure to choose RealityKit as the technology and SwiftUI as the user interface, as shown below:
If you look at the left panel in Xcode you’ll see a file named Experience.rcproject. This is a Reality Composer file. By default, it comes with a single scene consisting of a steel box. You can create your own scenes with custom models, 3D assets, and effects.
The starter project that you’ve just created consists of an ARView, in which the box entity is loaded and added to the anchor of the ARView. Upon building the project, the following box will be displayed in the middle of your AR app’s screen:
The starter project is devoid of any gestures and interactions with the virtual scene. As we go along, instead of using the Reality Composer to construct scenes and structures, we’ll create our own 3D entities programmatically. But before we do that, let's talk about the core components that build a RealityKit scene and address the fancy terms — scenes, entities, anchors, etc.
Anatomy of RealityKit
RealityKit’s ARView is the view responsible for handling the AR experience. From setting up the onboarding experience (more on this later) to configuring ARKit configurations, camera, and interactions, everything goes through the ARView.
Every ARView consists of a single scene, a read-only instance to which we add our AnchorEntities.
An Entity is the most important component of RealityKit. All objects in a RealityKit scene are entities. An AnchorEntity is the root of all entities. Similar to the ARAnchor of ARKit, it’s responsible for holding the entities and their children.
We can add Components to an entity to further customize it. A ModelComponent lets us define the geometry of the 3D object, and a CollisionComponent lets us handle collisions between objects.
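As a rough illustration of how components attach to an entity, the sketch below builds a bare Entity and gives it a model and a collision shape by hand. The sphere size and color are arbitrary values chosen for the example, not something the starter project uses.

```swift
import RealityKit
import UIKit

// A plain entity has no visual or physical representation on its own.
let entity = Entity()

// ModelComponent: what the entity looks like (mesh + materials).
entity.components.set(ModelComponent(
    mesh: .generateSphere(radius: 0.05),
    materials: [SimpleMaterial(color: .white, isMetallic: false)]
))

// CollisionComponent: the shape used for hit tests and collisions.
entity.components.set(CollisionComponent(
    shapes: [.generateSphere(radius: 0.05)]
))

// Components can be looked up by their type.
let hasCollision = entity.components.has(CollisionComponent.self)
```

Convenience classes such as ModelEntity bundle these components for us, which is why most of the code in this article never touches the component set directly.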
RealityKit makes it really easy to generate simple 3D shapes, such as boxes, spheres, planes, and text.
The following code showcases how to create a ModelEntity that represents a cube:
let box = MeshResource.generateBox(size: 0.3) // size in metres
let material = SimpleMaterial(color: .green, isMetallic: true)
let entity = ModelEntity(mesh: box, materials: [material])
The Material protocol is used to set the color and texture of the entity. Currently, the three built-in types of Material available with RealityKit are:
- SimpleMaterial: For setting the color and whether or not the entity is metallic.
- OcclusionMaterial: An invisible material that hides objects rendered behind it.
- UnlitMaterial: A material that doesn’t react to lights in the AR scene.
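To compare the three materials side by side, a small sketch like the following can help. The entity names, colors, and cube size are made up for illustration.

```swift
import RealityKit
import UIKit

// One shared cube mesh; 0.1 is the edge length in meters.
let cubeMesh = MeshResource.generateBox(size: 0.1)

// Reacts to scene lighting; `isMetallic` toggles a metallic look.
let shinyCube = ModelEntity(
    mesh: cubeMesh,
    materials: [SimpleMaterial(color: .blue, isMetallic: true)]
)

// Ignores scene lighting, so the color is rendered as-is.
let flatCube = ModelEntity(
    mesh: cubeMesh,
    materials: [UnlitMaterial(color: .orange)]
)

// Invisible itself, but hides virtual content rendered behind it.
let occluder = ModelEntity(
    mesh: cubeMesh,
    materials: [OcclusionMaterial()]
)
```

Placing the occluder between the camera and another entity is a quick way to see the effect: the entity behind it disappears while the camera feed stays visible.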
An entity is added to the scene in the following way:
let anchor = AnchorEntity(plane: .horizontal)
anchor.addChild(entity)
arView.scene.addAnchor(anchor)
In order to add the entity to the virtual scene, we need to ensure that it conforms to the HasAnchoring protocol or is added as a child of an anchor that does, as we did above.
So the following won’t work, since ModelEntity doesn’t conform to the HasAnchoring protocol:
arView.scene.anchors.append(entity) //this would not work
Before we create our first custom entity and add it to the scene, let's see what ARCoachingOverlayView is and how to integrate it into our ARView.
Configuring ARCoachingOverlay
The ARCoachingOverlayView is used to provide visual instructions to the user in order to facilitate ARKit’s world tracking. For this, we need to add this view as a subview of the ARView and set up the goal property, which specifies the tracking requirements: horizontalPlane, verticalPlane, anyPlane, or tracking (tracks feature points). Once the goal is met, the ARCoachingOverlayView is dismissed.
extension ARView: ARCoachingOverlayViewDelegate {
    func addCoaching() {
        let coachingOverlay = ARCoachingOverlayView()
        coachingOverlay.delegate = self
        coachingOverlay.session = self.session
        coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        coachingOverlay.goal = .anyPlane
        self.addSubview(coachingOverlay)
    }

    public func coachingOverlayViewDidDeactivate(_ coachingOverlayView: ARCoachingOverlayView) {
        // Ready to add entities next?
    }
}
The delegate’s coachingOverlayViewDidDeactivate function gets triggered once the goal is met. By default, the ARCoachingOverlayView activates automatically. This means that if the feature points or the plane are lost during the session, onboarding starts again. To make coaching a one-off operation, disable the automatic behavior by setting coachingOverlayView.activatesAutomatically = false.
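One way to get the one-off behavior is to switch off automatic activation the first time the overlay deactivates. The sketch below uses a separate, hypothetical delegate class (OneShotCoachingDelegate is not part of the project) so that it stands on its own. The same line could equally go into the delegate method of the ARView extension.

```swift
import ARKit
import UIKit

// Hypothetical delegate (name invented for this example) that turns
// coaching into a one-off step.
final class OneShotCoachingDelegate: NSObject, ARCoachingOverlayViewDelegate {
    func coachingOverlayViewDidDeactivate(_ coachingOverlayView: ARCoachingOverlayView) {
        // Stop the overlay from re-activating if tracking degrades later.
        coachingOverlayView.activatesAutomatically = false
    }
}
```

Keep in mind that the overlay holds its delegate weakly, so whoever creates this object needs to keep a strong reference to it.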
Next, just execute the addCoaching function from above on the ARView instance as shown below:
struct ARViewContainer: UIViewRepresentable {
    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)
        arView.addCoaching()

        let config = ARWorldTrackingConfiguration()
        config.planeDetection = .horizontal
        arView.session.run(config, options: [])
        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) {}
}
Next up, we’ll create a custom entity and add it to the scene once the ARCoachingOverlayView is dismissed.
Creating a Custom Box Entity
We can create our own Entity subclasses of custom shapes and sizes by conforming to the HasModel and HasAnchoring protocols. Additionally, the HasCollision protocol is used to enable interactions with the entity, such as ray casting (more on this later) and gesture handling (scale, translate, rotate).
The following code shows how to create a custom entity box structure:
class CustomBox: Entity, HasModel, HasAnchoring, HasCollision {
    required init(color: UIColor) {
        super.init()
        // Components are keyed by their metatype, hence the explicit `.self`.
        self.components[ModelComponent.self] = ModelComponent(
            mesh: .generateBox(size: 0.1),
            materials: [SimpleMaterial(color: color, isMetallic: false)]
        )
    }

    convenience init(color: UIColor, position: SIMD3<Float>) {
        self.init(color: color)
        self.position = position
    }

    required init() {
        fatalError("init() has not been implemented")
    }
}
There’s also a convenience initializer that allows us to specify the position of the entity in the scene, relative to the world origin (which is where the camera was when the session started):
let box = CustomBox(color: .yellow)
// or
let box = CustomBox(color: .yellow, position: [-0.6, -1, -2])
self.scene.anchors.append(box) // self is arView
Now we’ve added an entity to our AR scene, but we can’t perform any interactions with it yet! To do that we’ll need to add gestures, which we’ll explore next.
Entity Gestures and Child Entities
RealityKit provides us with a bunch of built-in gesture interactions. Specifically, it allows scaling, rotating, and translating the entities in the AR scene. To enable gestures on an entity, we need to ensure that it conforms to the HasCollision protocol (which we did in the previous section).
Also, we need to “install” the relevant gestures (scale, translation, rotation, or all) on the entity in the following way:
let box = CustomBox(color: .yellow, position: [-0.6, -1, -2])
self.installGestures(.all, for: box)
box.generateCollisionShapes(recursive: true)
self.scene.anchors.append(box)
The function generateCollisionShapes generates the shape of the entity’s collision component with the same dimensions as its model component. The collision component is what lets the entity respond to interactions.
To install multiple gestures, we pass them as an option set, written as an array literal, as shown below:
arView.installGestures([.rotation, .scale], for: box)
With this, our entity is ready to be interacted and played around with in the AR scene.
Adding an entity to another entity
We can also add child entities to the current entity and position them relative to it. Let’s extend our current case by adding a 3D text mesh on top of the box, as shown below:
let mesh = MeshResource.generateText(
    "RealityKit",
    extrusionDepth: 0.1,
    font: .systemFont(ofSize: 2),
    containerFrame: .zero,
    alignment: .left,
    lineBreakMode: .byTruncatingTail)
let material = SimpleMaterial(color: .red, isMetallic: false)
let entity = ModelEntity(mesh: mesh, materials: [material])
entity.scale = SIMD3<Float>(0.03, 0.03, 0.1)
box.addChild(entity)
entity.setPosition(SIMD3<Float>(0, 0.05, 0), relativeTo: box)
The following is a glimpse of our RealityKit application with the text placed above the box:
As a note, the world’s environment has an impact on the lighting of the entities. The same box that looks pale yellow in the above illustration would look brighter in different surroundings.
Now that we’ve added interactivity to the entities and created a 3D text mesh, let’s move on to the last segment of RealityKit — ray casting.
Ray Casting
Ray casting, much like hit testing, helps us find a 3D point in an AR scene from your screen point. It’s responsible for converting the 2D points on your touch screen to real 3D coordinates by using ray intersection to find the point on the real-world surface.
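To make the "ray intersection" idea concrete, here is a framework-free sketch of the underlying geometry: a ray from the camera is intersected with a plane. ARKit's real implementation works against detected and estimated surfaces and is considerably more involved. Vec3 and intersect are illustrative names made up for this example.

```swift
// A tiny 3D vector type so the example has no framework dependencies.
struct Vec3 {
    var x, y, z: Double

    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
    static func * (a: Vec3, s: Double) -> Vec3 { Vec3(x: a.x * s, y: a.y * s, z: a.z * s) }

    func dot(_ other: Vec3) -> Double { x * other.x + y * other.y + z * other.z }
}

/// Returns the point where a ray hits a plane, or nil when the ray is
/// parallel to the plane or the plane lies behind the ray's origin.
func intersect(rayOrigin: Vec3, rayDirection: Vec3,
               planePoint: Vec3, planeNormal: Vec3) -> Vec3? {
    let denominator = rayDirection.dot(planeNormal)
    guard abs(denominator) > 1e-9 else { return nil }   // parallel to the plane

    // Distance along the ray (in units of rayDirection) to the plane.
    let t = (planePoint - rayOrigin).dot(planeNormal) / denominator
    guard t >= 0 else { return nil }                     // plane is behind the ray

    return rayOrigin + rayDirection * t
}

// A camera 1.5 m above a floor at y = 0, looking forward and down at 45 degrees.
let hit = intersect(
    rayOrigin: Vec3(x: 0, y: 1.5, z: 0),
    rayDirection: Vec3(x: 0, y: -1, z: -1),
    planePoint: Vec3(x: 0, y: 0, z: 0),
    planeNormal: Vec3(x: 0, y: 1, z: 0)
)
// hit is (0, 0, -1.5): a point on the floor, 1.5 m in front of the camera.
```

In the app, the ray comes from un-projecting the tapped screen point through the camera, and the plane comes from ARKit's plane detection.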
Though hitTest is available in RealityKit for compatibility reasons, ray casting is the preferred method, as it continuously refines the results of tracked surfaces in the scene.
We’ll extend the above application to allow touch gestures in the ARView in SwiftUI to be converted into 3D points, where we’ll eventually position the entities.
Currently, SwiftUI’s TapGesture doesn’t report the location of the tap within the view. So we’ll fall back on the UIKit framework to help us find the 2D location of the tap gesture.
In the following code, we’ve set up our UITapGestureRecognizer in the ARView:
- Take note of the findEntities function. It helps us find nearby entities in 3D space based on the 2D screen point.
- The setupGestures method will be invoked on our ARView instance.
- The makeRaycastQuery method creates an ARRaycastQuery, to which we’ve passed the point from the screen. Optionally, you can pass the center point of the screen if you intend to just add the entities to the center of the screen each time. Additionally, we specify the plane type (exact or estimated) and the orientation (horizontal, vertical, or any).
- The results returned from ray casting are used to create an AnchorEntity, to which we’ve added our box entity with the text.
- overlayText is what we’ll receive from the user input as the label for the 3D text (more on this later).
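A minimal sketch of such a setup is shown below, under a few assumptions: the built-in entities(at:) call stands in for the findEntities helper, a plain yellow box is used as a placeholder for the labeled box, and handleTap is a name chosen for this example.

```swift
import ARKit
import RealityKit
import UIKit

extension ARView {
    /// Registers a tap recognizer on the view (call once after creating it).
    func setupGestures() {
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        addGestureRecognizer(tap)
    }

    @objc func handleTap(_ sender: UITapGestureRecognizer) {
        let point = sender.location(in: self)

        // Skip placement when the tap lands on an existing entity
        // (entities need collision shapes to be found this way).
        guard entities(at: point).isEmpty else { return }

        // Ask for an estimated horizontal plane under the tapped point.
        guard let query = makeRaycastQuery(from: point,
                                           allowing: .estimatedPlane,
                                           alignment: .horizontal),
              let result = session.raycast(query).first else { return }

        // Anchor a small placeholder box at the hit location.
        let anchor = AnchorEntity(raycastResult: result)
        let box = ModelEntity(mesh: .generateBox(size: 0.1),
                              materials: [SimpleMaterial(color: .yellow, isMetallic: false)])
        anchor.addChild(box)
        scene.addAnchor(anchor)
    }
}
```

Swapping .estimatedPlane for .existingPlaneGeometry restricts hits to planes ARKit has already detected, which tends to be more precise but only works once detection has caught up.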
Before we jump into PencilKit for creating input digits, let’s modify the ARViewContainer that loads the ARView with the changes we’ve made so far.
Configuring ARView with SwiftUI Coordinator
In the following code, the Coordinator class is added to the ARViewContainer in order to allow data to flow from the PencilKitView to the ARView.
The overlayText is picked up by the ARView scene from the Coordinator class. Next up, PencilKit meets the Vision framework.
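One possible shape for this data flow is sketched below. It is an assumption-laden illustration rather than the project's actual container: LabeledARViewContainer and handleTap are hypothetical names, and the placement logic is left as a comment.

```swift
import RealityKit
import SwiftUI
import UIKit

// Hypothetical container (name invented for this sketch) showing how text
// from SwiftUI state can reach the object that handles taps on the ARView.
struct LabeledARViewContainer: UIViewRepresentable {
    // Latest recognized text, owned by the parent SwiftUI view.
    var overlayText: String

    func makeCoordinator() -> Coordinator { Coordinator() }

    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)
        context.coordinator.arView = arView
        let tap = UITapGestureRecognizer(target: context.coordinator,
                                         action: #selector(Coordinator.handleTap(_:)))
        arView.addGestureRecognizer(tap)
        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) {
        // SwiftUI calls this when `overlayText` changes; forward the new value.
        context.coordinator.overlayText = overlayText
    }

    final class Coordinator: NSObject {
        weak var arView: ARView?
        var overlayText = ""

        @objc func handleTap(_ sender: UITapGestureRecognizer) {
            // Placement logic would go here, using `overlayText` as the 3D label.
        }
    }
}
```

The coordinator lives as long as the view does, which makes it a convenient place to keep state that UIKit callbacks need but SwiftUI owns.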
Handling Input with PencilKit
PencilKit is the new drawing framework introduced in iOS 13. In our app, we’ll let the user draw digits on the PencilKit’s canvas and classify those handwritten digits by feeding the Core ML MNIST model to the Vision framework.
The following code sets up the PencilKit view (PKCanvasView) in SwiftUI:
struct PKCanvasRepresentation: UIViewRepresentable {
    let canvasView = PKCanvasView()

    func makeUIView(context: Context) -> PKCanvasView {
        canvasView.tool = PKInkingTool(.pen, color: .secondarySystemBackground, width: 40)
        return canvasView
    }

    func updateUIView(_ uiView: PKCanvasView, context: Context) {}
}
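For the classification step, a sketch along the following lines could turn the canvas drawing into a digit label. The function name classifyDigit is hypothetical, and the model is passed in as a generic MLModel because the generated class name of the MNIST model is not shown here. It is assumed to be an image classifier that outputs class labels.

```swift
import CoreML
import PencilKit
import UIKit
import Vision

/// Rasterizes a drawing and returns the top classification label, if any.
/// `model` is assumed to be an MNIST-style image classifier.
func classifyDigit(in drawing: PKDrawing, using model: MLModel) -> String? {
    // Nothing to classify if the canvas is empty.
    guard !drawing.bounds.isEmpty else { return nil }

    // Render the strokes, with a little padding around the ink.
    let image = drawing.image(from: drawing.bounds.insetBy(dx: -20, dy: -20), scale: 1.0)
    guard let cgImage = image.cgImage,
          let visionModel = try? VNCoreMLModel(for: model) else { return nil }

    let request = VNCoreMLRequest(model: visionModel)
    request.imageCropAndScaleOption = .centerCrop

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        // Runs synchronously; results are available on the request afterwards.
        try handler.perform([request])
    } catch {
        return nil
    }

    let observations = request.results as? [VNClassificationObservation]
    return observations?.first?.identifier
}
```

Depending on how the model was trained, the rendered image may need preprocessing first (for example, compositing the transparent drawing onto a solid background), so treat the rasterization step as a starting point.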
ContentView
Now it’s time to merge the ARView and PKCanvasView in our ContentView. By default, SwiftUI views occupy the maximum space available to them. Hence, each of these views takes up roughly half of the screen.
The code for the ContentView.swift file is presented below:
The following code does the styling for the SwiftUI button:
struct MyButtonStyle: ButtonStyle {
    var color: Color = .green

    public func makeBody(configuration: MyButtonStyle.Configuration) -> some View {
        configuration.label
            .foregroundColor(.white)
            .padding(15)
            .background(RoundedRectangle(cornerRadius: 5).fill(color))
            .compositingGroup()
            .shadow(color: .black, radius: 3)
            .opacity(configuration.isPressed ? 0.5 : 1.0)
            .scaleEffect(configuration.isPressed ? 0.8 : 1.0)
    }
}
Finally, our app is ready! An illustration of a working RealityKit + PencilKit iOS application is given below:
Once the digit is extracted from the PencilKit drawing, all we do is a ray cast from the point where the ARView is touched on the screen to create an entity on the plane. Currently, the entities do not support collision and can be dragged in and out of each other. We’ll be handling collisions and more interactions in a subsequent tutorial, so stay tuned!
Conclusion
RealityKit is here to abstract a lot of boilerplate code to allow developers to focus on building more immersive AR experiences. It offers a Swift-first API and is positioned as an AR-focused alternative to SceneKit.
Here, we took a good look at the RealityKit entities and components and saw how to set up a coaching overlay. Furthermore, we created our own custom entity and child entities. Subsequently, we dug into the 3D gestures currently supported with RealityKit and integrated them on the entities, and then explored ray casting. Finally, we integrated PencilKit for handling user inputs and used the Vision framework for predicting hand-drawn digits.
The full source code along with the MNIST Core ML model is available in this GitHub Repository.
Moving on from here, we’ll explore the other interesting functionalities available in RealityKit. Loading different kinds of objects, adding sounds, and the ability to perform and detect collisions will be up next.
That’s it for this one. Thanks for reading.