Bagging Tutorial — Classify Higgs Boson Particles With AI
A practical guide to ensemble learning with hands-on Python code

Bagging (short for bootstrap aggregating) is a meta-algorithm from the ensemble learning paradigm, in which multiple models (often termed “weak learners”) are trained to solve the same problem and combined to obtain better results.
With bagging, we fit the same model to multiple bootstrap samples (random samples drawn with replacement from the training data) and combine the individual predictions, typically by majority vote, into an overall classification.
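
To make the idea concrete, here is a minimal sketch of bagging done by hand on synthetic data: the same decision tree model is fit on 25 different bootstrap samples, and the final class is the majority vote across the trees. The data from make_classification is just a stand-in for any binary classification problem, not the HIGGS data used later.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for a binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_models = 25
rng = np.random.default_rng(0)
predictions = []

for _ in range(n_models):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    predictions.append(tree.predict(X_test))

# Majority vote across the 25 trees gives the bagged classification.
bagged = (np.mean(predictions, axis=0) > 0.5).astype(int)
print("bagged test accuracy:", (bagged == y_test).mean())
```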
In this article, I will walk you through a practical example in physics and explain how bagging works for classifying Higgs bosons (controversially called the “God particle”).
I use a small subset of the HIGGS dataset from the UCI Machine Learning Repository. This 2014 paper contains further details about the data.
Each row represents an experiment of colliding beams of protons at high energy. The class column differentiates between collisions that produce Higgs bosons (value 1) and collisions that produce only background noise (value 0). We are interested in predicting the class using the bagging technique.
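
Loading the data might look like the sketch below. Here higgs_subset.csv is a hypothetical local file holding the subset, and the generic feature_1 through feature_28 column names are placeholders for the 28 kinematic features; the raw UCI file ships without a header row, with the class label in the first column.

```python
import pandas as pd

# Hypothetical local copy of a small subset of the UCI HIGGS data.
# The raw file has no header row: the class label comes first,
# followed by 28 kinematic features.
cols = ["class"] + [f"feature_{i}" for i in range(1, 29)]
higgs = pd.read_csv("higgs_subset.csv", header=None, names=cols)

X = higgs.drop(columns="class")
y = higgs["class"]
print(y.value_counts())  # 1 = Higgs boson produced, 0 = background noise
```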

Classification With a Decision Tree
Let's start with a full classification tree, one that splits the training data until each leaf contains a single observation. Such a tree achieves perfect classification of the training observations, so its bias (the misclassification error on the training set) is zero.
In other words, the full tree would overfit the training data. Such a tree is also very sensitive: small changes in the training observations can change the predicted classes significantly, which means the model's variance is very high.
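
As a sketch of that failure mode (reusing the X and y from the loading step above), an unconstrained scikit-learn tree keeps splitting until its leaves are pure, so its training accuracy sits at or near 1.0 while its test accuracy lags behind:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No depth or leaf-size limit: the tree keeps splitting until
# every leaf is pure, effectively memorising the training set.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("train accuracy:", full_tree.score(X_train, y_train))  # at or near 1.0
print("test accuracy: ", full_tree.score(X_test, y_test))    # noticeably lower
```

The gap between the two scores is the overfitting described above; rerunning the split with a different random_state also shifts the test score noticeably, which is the high variance in action.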