New in iOS 15: Vision Person Segmentation
Separate people from backgrounds in images and videos
Vision is Apple’s framework that provides out-of-the-box solutions for complex computer vision challenges. It also abstracts away the underlying Core ML requests by handling image pre-processing for you during classification.
With iOS 14, we got half a dozen new Vision features, including contour detection, optical flow, trajectory detection, offline video processing, and hand and body pose estimation.
At WWDC 2021, Apple announced two new Vision requests: person and document segmentation.
In this article, I’ll be focusing on the new Person Segmentation Vision request introduced with iOS 15.
Vision Person Segmentation Request
Semantic segmentation is a technique used to classify each pixel in an image. It’s commonly used to separate foreground objects from the background.
Autonomous driving and virtual backgrounds in video calls are two popular use cases where you might’ve observed semantic segmentation in some form.
DeepLabV3 is a popular machine learning model for performing image segmentation. In case you’re looking to go the Core ML way, check out this tutorial that shows you how to modify backgrounds in images.
Coming back to the Vision framework, the new VNGeneratePersonSegmentationRequest class facilitates person segmentation and returns a segmentation mask for the people in a frame.
Here’s how to set it up:
import Vision

let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .accurate
// 8-bit, single-channel (grayscale) mask output
request.outputPixelFormat = kCVPixelFormatType_OneComponent8
The three quality levels are accurate, balanced, and fast; the latter two are recommended for video processing tasks.
In the next section, we’ll see how to perform person segmentation on images in a SwiftUI app. Subsequently, we’ll see how to do the same using the new Core Image filter.
Setting Up Our SwiftUI View
Here’s a simple SwiftUI interface that contains three images in a vertical stack:
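A minimal sketch of such a view might look like this; the "person" asset name and the three @State image properties are placeholder assumptions, and runVisionRequest is the method we’ll write in the next section.

import SwiftUI

struct ContentView: View {
    // "person" is a placeholder asset name.
    @State private var inputImage = UIImage(named: "person")
    @State private var maskImage: UIImage?
    @State private var outputImage: UIImage?

    var body: some View {
        VStack(spacing: 16) {
            Image(uiImage: inputImage ?? UIImage())
                .resizable()
                .scaledToFit()
            Image(uiImage: maskImage ?? UIImage())
                .resizable()
                .scaledToFit()
            Image(uiImage: outputImage ?? UIImage())
                .resizable()
                .scaledToFit()
        }
        .task {
            // Kick off the Vision request when the view appears.
            runVisionRequest()
        }
    }
}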
task is the all-new SwiftUI modifier for running asynchronous work. Inside it, we invoke our runVisionRequest method.
Running the Vision Request
The following method runs the Vision Person Segmentation Request and returns the masked output result:
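Here’s a sketch of what that method can look like, assuming it lives in an extension of ContentView in the same file (so it can write to the private @State properties); instead of returning the mask, this version stores it in maskImage and hands it to the maskInputImage helper covered in the next step.

import CoreImage
import UIKit
import Vision

extension ContentView {
    func runVisionRequest() {
        guard let original = inputImage, let cgImage = original.cgImage else { return }

        let request = VNGeneratePersonSegmentationRequest()
        request.qualityLevel = .accurate
        request.outputPixelFormat = kCVPixelFormatType_OneComponent8

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

        DispatchQueue.global(qos: .userInitiated).async {
            do {
                try handler.perform([request])
                guard let maskBuffer = request.results?.first?.pixelBuffer else { return }

                // Convert the single-channel mask into a displayable image.
                let maskCI = CIImage(cvPixelBuffer: maskBuffer)
                let context = CIContext()
                guard let maskCG = context.createCGImage(maskCI, from: maskCI.extent) else { return }

                DispatchQueue.main.async {
                    self.maskImage = UIImage(cgImage: maskCG)
                    // Blend the mask with a new background (see maskInputImage below).
                    self.outputImage = self.maskInputImage(original, mask: maskCI)
                }
            } catch {
                print("Vision request failed: \(error)")
            }
        }
    }
}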
Here’s how the original and masked images look:
While the above segmentation mask uses the accurate quality level, here’s a look at the fast and balanced results:
To add a new background, we pass the segmentation mask into the maskInputImage function. This function uses a Core Image blend filter to crop the masked person out of the original image and then blend it with a new background image.
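Here’s a sketch of that function under the same assumptions as before; it uses CIBlendWithMask, and "background" is a placeholder asset name for the new background image.

import CoreImage
import CoreImage.CIFilterBuiltins
import UIKit

extension ContentView {
    func maskInputImage(_ original: UIImage, mask: CIImage) -> UIImage? {
        guard let originalCG = original.cgImage,
              let backgroundCG = UIImage(named: "background")?.cgImage else { return nil }

        let originalCI = CIImage(cgImage: originalCG)
        var backgroundCI = CIImage(cgImage: backgroundCG)

        // Scale the mask and background to the original image's extent.
        let scaledMask = mask.transformed(by: CGAffineTransform(
            scaleX: originalCI.extent.width / mask.extent.width,
            y: originalCI.extent.height / mask.extent.height))
        backgroundCI = backgroundCI.transformed(by: CGAffineTransform(
            scaleX: originalCI.extent.width / backgroundCI.extent.width,
            y: originalCI.extent.height / backgroundCI.extent.height))

        // White mask pixels keep the person; black pixels reveal the new background.
        let blend = CIFilter.blendWithMask()
        blend.inputImage = originalCI
        blend.backgroundImage = backgroundCI
        blend.maskImage = scaledMask

        guard let outputCI = blend.outputImage,
              let outputCG = CIContext().createCGImage(outputCI, from: outputCI.extent) else { return nil }
        return UIImage(cgImage: outputCG)
    }
}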
It returns the following result:
Running the Core Image Filter
Core Image is Apple’s image processing framework that’s widely used in simple computer vision tasks.
In iOS 15, Core Image gets the following new filter:
CIFilter.personSegmentation()
We can apply it to our earlier SwiftUI view in the following way:
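A rough sketch follows; the runCoreImageSegmentation name is a placeholder of mine, and you’d call it from the view’s .task in place of runVisionRequest.

import CoreImage
import CoreImage.CIFilterBuiltins
import UIKit

extension ContentView {
    func runCoreImageSegmentation() {
        guard let cgImage = inputImage?.cgImage else { return }

        // Generate the segmentation mask with Core Image instead of Vision.
        let filter = CIFilter.personSegmentation()
        filter.inputImage = CIImage(cgImage: cgImage)

        guard let outputCI = filter.outputImage,
              let outputCG = CIContext().createCGImage(outputCI, from: outputCI.extent) else { return }
        maskImage = UIImage(cgImage: outputCG)
    }
}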
Here’s how it looks on the simulator:
To blend the above segmentation with a new background, you’d need to transform the red mask color into white for the blend filter to work as intended.
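One way to do that (a sketch, not necessarily the only approach; the whitenedMask helper name is hypothetical) is to copy the mask’s red channel into all three color channels with a CIColorMatrix filter.

import CoreImage
import CoreImage.CIFilterBuiltins

func whitenedMask(from redMask: CIImage) -> CIImage? {
    let matrix = CIFilter.colorMatrix()
    matrix.inputImage = redMask
    // Copy the red channel into R, G, and B so red areas become white.
    let redOnly = CIVector(x: 1, y: 0, z: 0, w: 0)
    matrix.rVector = redOnly
    matrix.gVector = redOnly
    matrix.bVector = redOnly
    matrix.aVector = CIVector(x: 0, y: 0, z: 0, w: 1)  // keep alpha as-is
    return matrix.outputImage
}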
Conclusion
That sums up Vision’s new Person Segmentation request. It’ll be interesting to see the results on a real-time video player.
You can download the full source from the GitHub Repository.
We can also run sequenced Vision requests. For instance, you could combine the above Person Segmentation with a face pose request to build creative AI applications.
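As a sketch, sequenced requests can share a single handler; here VNDetectFaceRectanglesRequest stands in for the face pose request, and the runSequencedRequests name is a placeholder.

import CoreGraphics
import Vision

func runSequencedRequests(on cgImage: CGImage) throws {
    let segmentation = VNGeneratePersonSegmentationRequest()
    let faceRectangles = VNDetectFaceRectanglesRequest()

    // Both requests run against the same image in one perform() call.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([segmentation, faceRectangles])

    let mask = segmentation.results?.first?.pixelBuffer
    let faceCount = faceRectangles.results?.count ?? 0
    print("Segmentation mask available: \(mask != nil), faces detected: \(faceCount)")
}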
Philipp Gehrke shows how to implement Vision Body Pose Estimation. It might be a great place to start integrating the above Person Segmentation Request and detecting human poses more accurately.
That’s it for this one. Thanks for reading.