Synthetic Data Generation for Computer Vision in Blender

Alex Martinelli · Published in Better Programming · 8 min read · Aug 9, 2022
Example synth data from thiscatdoesnotexist.com, paralleldomain.com, and Microsoft's "Fake It Till You Make It"

What: This entry gives an introduction to synthetic data generation and shows how you can use it via Blender to train performant and robust vision models. We'll provide an overview of the Blender setup and, for demonstrative purposes, present a concrete visual-classification scenario from the fashion domain.

Why: to leverage Blender's procedural capabilities and adopt a data-centric approach to get better machine-learning models with little or no need for human annotations.

Who: we will rely on Blender > 3.1 and Python ≥ 3.7. The generated images can then be used for any downstream task, regardless of the framework involved (e.g., TensorFlow, PyTorch).
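
As a first taste of the scripting side, here is a minimal sketch of rendering a single image through Blender's Python API (bpy). The engine, resolution, and output path are illustrative placeholders, not values prescribed by this article:

# render_one.py -- minimal sketch; run headless with:
#   blender --background --python render_one.py
import bpy

scene = bpy.context.scene

# Configure the render output (engine, resolution and path are placeholders).
scene.render.engine = "CYCLES"
scene.render.resolution_x = 512
scene.render.resolution_y = 512
scene.render.image_settings.file_format = "PNG"
scene.render.filepath = "/tmp/synth_sample.png"

# Write a single still image to disk.
bpy.ops.render.render(write_still=True)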

Synthetic Data Generation (SDG)

Synthetic Data Generation (SDG) encompasses a variety of methods that aim to programmatically generate data to support downstream tasks. In statistics and Machine Learning (ML), the goal is to synthesize samples with the same distribution as a target domain, to be used for model training or testing. It is part of the data-centric ML approach, where, in order to achieve better performance, we actively work on the data instead of on models, algorithms, or architectures.

SDG is adopted for multiple reasons, the primary ones being:

  • minimize the need for human labeling and curation
  • meet the data requirements of ever-higher-capacity models
  • tackle issues such as generality, robustness, portability, and bias
  • overcome restrictions on the use of real data (privacy and regulations)

In Computer Vision (CV) we are interested in synthesizing realistic visual samples (most commonly images and videos). The two major approaches for this domain are generative models and Computer Graphics (CG) pipelines. Hybrid approaches also exist, combining the two in different measures depending on the target setup.
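
To give a concrete flavor of the CG-pipeline route, the sketch below randomizes a few scene parameters (pose, camera position, lighting, material color) before each render, a simple form of domain randomization. The object names ("Target", "Camera", "Light"), the parameter ranges, and the output paths are hypothetical placeholders:

# randomize_and_render.py -- sketch of a tiny CG-based SDG loop:
# randomize scene parameters, render, repeat. Assumes a scene with objects
# named "Target", "Camera" and "Light" (illustrative names, not from the article).
import os
import random

import bpy

scene = bpy.context.scene
target = bpy.data.objects["Target"]
camera = bpy.data.objects["Camera"]
light = bpy.data.objects["Light"]

NUM_SAMPLES = 10  # arbitrary batch size
OUTPUT_DIR = "/tmp/sdg"
os.makedirs(OUTPUT_DIR, exist_ok=True)

for i in range(NUM_SAMPLES):
    # Domain randomization: object pose, camera position, light intensity.
    target.rotation_euler = [random.uniform(0.0, 6.283) for _ in range(3)]
    camera.location.x = random.uniform(-2.0, 2.0)
    light.data.energy = random.uniform(100.0, 1000.0)

    # Randomize the material's base color, if the object uses a node material.
    mat = target.active_material
    if mat is not None and mat.use_nodes:
        bsdf = mat.node_tree.nodes.get("Principled BSDF")
        if bsdf is not None:
            bsdf.inputs["Base Color"].default_value = (
                random.random(), random.random(), random.random(), 1.0)

    # Render; the ground-truth label (class, pose, ...) is known by
    # construction, so no human annotation is needed.
    scene.render.filepath = os.path.join(OUTPUT_DIR, f"sample_{i:04d}.png")
    bpy.ops.render.render(write_still=True)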

Think, for example, about generating images of non-existent cats to train a cat-vs-dog classifier, or feeding images from games and simulated environments to bootstrap the training of self-driving systems, or…



Written by Alex Martinelli

Data Scientist @ Zalando Dublin - Machine Learning, Computer Vision and Everything Generative ❤
