Data Science Modeling: How to Use Linear Regression with Python
Taking a look at R², Mean Squared Error, and more
by Brian Henriquez, Chris Kazakis, and Dean Sublett

Introduction and Objectives
Linear regression is a widely used technique in data science because linear regression models are relatively simple to implement and interpret.
This tutorial walks through simple and multiple linear regression models of the 80 Cereals dataset using Python and discusses some relevant regression metrics. We do not assume prior experience with linear regression in Python. The 80 Cereals dataset can be found here.
Here are some objectives:
- Understand the meaning and limitations of R²
- Learn about evaluation metrics for linear regression and when to use them
- Implement simple and multiple linear regression models with the 80 Cereals dataset
Exploring the Data
After downloading the dataset, import the necessary Python packages and the cereals dataset itself:
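The import and loading step might look like the minimal sketch below. The file name `cereal.csv` and the helper `load_cereals` are assumptions for illustration, not names from the original tutorial; adjust the path to wherever you saved the download.

```python
import pandas as pd


def load_cereals(path="cereal.csv"):
    """Read the 80 Cereals CSV into a DataFrame.

    The default file name is an assumption; pass the actual path
    of your downloaded copy.
    """
    return pd.read_csv(path)


# Example usage, once the file has been downloaded:
# cereal = load_cereals("cereal.csv")
# cereal.head()  # shows the first five rows
```

Calling `head()` on the resulting DataFrame prints the first few rows, which is a quick way to inspect the columns before modeling.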
Here we see that each row is a brand of cereal, and each column is a nutritional (protein, fat, etc.) or identifying feature (manufacturer, type) of the cereal. Notice that rating is the response or dependent variable.
Next, we created a pairs plot of the correlations between each feature of the dataset, and from this visualization we selected three predictor variables: calories, fiber, and sugars. The plot displaying every correlation is too large to share here, but we can take a closer look with a smaller pairs plot that includes only our predictor variables. Using seaborn.pairplot, we can see three scatter plots with fitted least squares lines:
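One way to produce such a plot is sketched below. It assumes the DataFrame has columns named `calories`, `fiber`, `sugars`, and `rating`; the helper `plot_predictors` and the `height` value are illustrative choices, not the authors' exact code.

```python
import seaborn as sns

# Predictor columns named in the text; assumed to match the CSV headers.
PREDICTORS = ["calories", "fiber", "sugars"]


def plot_predictors(df, response="rating"):
    """Plot each predictor against the response with a fitted line."""
    return sns.pairplot(
        df,
        x_vars=PREDICTORS,
        y_vars=[response],
        kind="reg",  # overlays a least squares line on each scatter plot
        height=4,    # size of each panel in inches (arbitrary choice)
    )


# Example usage (assumes `cereal` was loaded with pandas.read_csv):
# plot_predictors(cereal)
```

Passing `x_vars` and `y_vars` restricts the grid to a single row of three panels, one per predictor, instead of the full feature-by-feature matrix.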