Synthetic Data Generation for Computer Vision in Blender
(part 1)

What: This entry introduces synthetic data generation and shows how you can use it with Blender to train performant and robust vision models. We’ll provide an overview of the Blender setup and, for demonstration purposes, present a concrete visual classification scenario from the fashion domain.
Why: to leverage Blender’s procedural capabilities and adopt a data-centric approach to get better machine-learning models with little or no need for human annotations.
Who: we will rely on Blender >3.1 and Python ≥ 3.7. Generated images can then be used for any downstream task, independently of the framework it relies on (e.g. TensorFlow, PyTorch).
Synthetic Data Generation (SDG)
Synthetic Data Generation (SDG) encompasses a variety of methods that aim at programmatically generating data to support downstream tasks. In statistics and Machine Learning (ML), the goal is to synthesize samples with the same distribution as a target domain, to be used for model training or testing purposes. It is part of the data-centric ML approach, where in order to achieve better performance we actively work on data, instead of models, algorithms, or architectures.
SDG is adopted for multiple reasons, the primary ones being:
- minimize the need for human labeling and curation
- help meet the data requirements of ever-higher-capacity models
- tackle issues such as generality, robustness, portability, biases
- overcome real data usage restrictions (privacy and regulations)
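To make the first point concrete, here is a minimal sketch of why programmatic generation removes most of the labeling effort: when the scene parameters are sampled by code, the label is known by construction. Everything in it is an illustrative assumption rather than part of the actual Blender setup: the class names, the `SceneSpec` fields, the parameter ranges, and the `build_manifest` helper are all hypothetical, and no rendering is performed.

```python
import random
from dataclasses import dataclass, asdict
from typing import Dict, List

# Hypothetical class names for a fashion classification task (assumption).
CLASSES = ["shirt", "dress", "shoe"]


@dataclass
class SceneSpec:
    """Description of one synthetic sample; a renderer would consume it."""
    image_name: str
    label: str                 # known by construction, no human annotation
    camera_azimuth_deg: float  # illustrative scene parameter
    light_intensity: float     # illustrative scene parameter


def sample_scene(index: int, rng: random.Random) -> SceneSpec:
    """Draw one random scene description from hand-picked ranges."""
    return SceneSpec(
        image_name=f"sample_{index:05d}.png",
        label=rng.choice(CLASSES),
        camera_azimuth_deg=rng.uniform(0.0, 360.0),
        light_intensity=rng.uniform(0.5, 2.0),
    )


def build_manifest(n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Return n scene descriptions; the same seed gives the same manifest."""
    rng = random.Random(seed)
    return [asdict(sample_scene(i, rng)) for i in range(n)]


if __name__ == "__main__":
    for row in build_manifest(3):
        print(row)
```

Each row pairs an image name with its label and the parameters that produced it, so the annotation file is a by-product of generation. Seeding the random generator keeps a dataset reproducible.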
In Computer Vision (CV) we are interested in synthesizing realistic visual samples, most commonly images and videos. The two major approaches to synthesizing data for this domain are generative models and Computer Graphics (CG) pipelines. Hybrid approaches also exist, combining multiple methods to different degrees depending on the target setup.
Think, for example, about generating images of non-existing cats to train a cat-vs-dog classifier, or feeding images from games and simulated environments to bootstrap the training of self-driving systems, or…