Bagging Tutorial — Classify Higgs Boson Particles With AI
A practical guide to ensemble learning with hands-on Python code

Bagging (short for bootstrap aggregating) is a meta-algorithm from the ensemble learning paradigm, in which multiple models (often termed “weak learners”) are trained to solve the same problem and combined to obtain better results.
With bagging, we fit the same model to multiple bootstrap samples (random samples drawn with replacement from the training data) and combine the individual predictions, typically by majority vote, into an overall classification.
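
To make the idea concrete, here is a minimal sketch of bagging done by hand on synthetic data: the same decision tree model is fit on 25 different bootstrap samples, and the final class is the majority vote across the trees. The data from make_classification is just a stand-in for any binary classification problem, not the HIGGS data used later.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for a binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_models = 25
rng = np.random.default_rng(0)
predictions = []

for _ in range(n_models):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    predictions.append(tree.predict(X_test))

# Majority vote across the 25 trees gives the bagged classification.
bagged = (np.mean(predictions, axis=0) > 0.5).astype(int)
print("bagged test accuracy:", (bagged == y_test).mean())
```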
In this article, I will walk you through a practical example in physics and explain how bagging works for classifying Higgs bosons (controversially called the “God particle”).
I use a small subset of the HIGGS dataset from the UCI Machine Learning Repository. This 2014 paper contains further details about the data.
Each row represents an experiment of colliding beams of protons at high energy. The class column differentiates between collisions that produce Higgs bosons (value 1) and collisions that produce only background noise (value 0). We are interested in predicting the class using the bagging technique.
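
Loading the data might look like the sketch below. Here higgs_subset.csv is a hypothetical local file holding the subset, and the generic feature_1 through feature_28 column names are placeholders for the 28 kinematic features; the raw UCI file ships without a header row, with the class label in the first column.

```python
import pandas as pd

# Hypothetical local copy of a small subset of the UCI HIGGS data.
# The raw file has no header row: the class label comes first,
# followed by 28 kinematic features.
cols = ["class"] + [f"feature_{i}" for i in range(1, 29)]
higgs = pd.read_csv("higgs_subset.csv", header=None, names=cols)

X = higgs.drop(columns="class")
y = higgs["class"]
print(y.value_counts())  # 1 = Higgs boson produced, 0 = background noise
```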

Classification With a Decision Tree
Let's start with a full classification tree, one that splits the training data until each leaf contains a single observation. Such a tree achieves perfect classification of the training observations, so its bias (the misclassification error on the training set) is zero.
In other words, the full tree would overfit the training data. Such a tree is also very sensitive: small changes in the training observations can change the predicted classes significantly, which means the model's variance is very high.
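
As a sketch of that failure mode (reusing the X and y from the loading step above), an unconstrained scikit-learn tree keeps splitting until its leaves are pure, so its training accuracy sits at or near 1.0 while its test accuracy lags behind:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No depth or leaf-size limit: the tree keeps splitting until
# every leaf is pure, effectively memorising the training set.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("train accuracy:", full_tree.score(X_train, y_train))  # at or near 1.0
print("test accuracy: ", full_tree.score(X_test, y_test))    # noticeably lower
```

The gap between the two scores is the overfitting described above; rerunning the split with a different random_state also shifts the test score noticeably, which is the high variance in action.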