Data Science Modeling: How to Use Linear Regression with Python

Taking a look at R², Mean Squared Error, and more

Dean Sublett

Published in Better Programming

11 min read · May 24, 2019

by Brian Henriquez, Chris Kazakis, and Dean Sublett

Introduction and Objectives

Linear regression is a widely used technique in data science because a linear regression model is relatively simple to implement and interpret.

This tutorial walks through simple and multiple linear regression models of the 80 Cereals dataset using Python and discusses some relevant regression metrics. We do not assume prior experience with linear regression in Python. The 80 Cereals dataset can be found here.

Here are some objectives:

Understand the meaning and limitations of R²
Learn about evaluation metrics for linear regression and when to use them
Implement simple and multiple linear regression models with the 80 Cereals dataset

Exploring the Data

After downloading the dataset, import the necessary Python packages and the cereals dataset itself:
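
A minimal sketch of this step, assuming the dataset was saved locally as `cereal.csv` (a hypothetical filename; use whatever name your download has) and that pandas is installed. The helper name `load_cereals` is ours, for illustration only:

```python
from pathlib import Path

import pandas as pd

# Assumed filename for the downloaded 80 Cereals CSV; adjust to your local copy.
CSV_PATH = Path("cereal.csv")


def load_cereals(path):
    """Read the cereals CSV into a DataFrame, one row per cereal."""
    return pd.read_csv(path)


# Only load and preview the data when the file is actually present.
if CSV_PATH.exists():
    cereals = load_cereals(CSV_PATH)
    print(cereals.head())  # first five rows
    print(cereals.shape)   # (number of rows, number of columns)
```

Calling `head()` right after loading is a quick way to confirm that the columns were parsed as expected before any modeling.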

Here we see that each row is a brand of cereal, and each column is either a nutritional feature (protein, fat, etc.) or an identifying feature (manufacturer, type) of the cereal. Note that rating is the response (dependent) variable.

Next, we created a pairs plot showing the relationship between every pair of features in the dataset, and from this visualization we selected three predictor variables: calories, fiber, and sugars. The full plot is too large to share here, but we can take a closer look with a smaller pairs plot that includes only our predictor variables. Using seaborn.pairplot, we can see three scatter plots with fitted least squares lines:
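
One way such a plot could be produced is sketched below. It assumes the DataFrame columns are named `calories`, `fiber`, `sugars`, and `rating`, and the helper name `plot_predictors_vs_rating` is hypothetical. Passing `kind="reg"` asks seaborn to overlay a fitted regression line on each scatter plot:

```python
import seaborn as sns

# Assumed column names for the three predictors and the response.
PREDICTORS = ["calories", "fiber", "sugars"]
RESPONSE = "rating"


def plot_predictors_vs_rating(df):
    """Draw one scatter plot per predictor against rating, each with a fitted line."""
    # x_vars/y_vars restrict the grid to a single row: rating vs. each predictor.
    return sns.pairplot(df, x_vars=PREDICTORS, y_vars=[RESPONSE], kind="reg")
```

With a loaded DataFrame `cereals`, calling `grid = plot_predictors_vs_rating(cereals)` returns a seaborn `PairGrid`, and `grid.savefig("pairplot.png")` would write the figure to disk.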
