Better Programming

3 Pandas Functions To Group and Aggregate Data

Eugenia Anello
Published in Better Programming
7 min read · May 3, 2021

[Image: flowers. Photo by John-Mark Smith on Unsplash.]

When you work with data in Python, there is one library that will rarely leave your side: pandas. It’s a powerful and intuitive open source library whose data structures are designed for labeled, tabular datasets.

There are two principal data structures:

  • Series for one-dimensional arrays.
  • DataFrame for two-dimensional tables that contain rows and columns.

In this article, I will focus on the most useful functions that split the dataset into groups. Then you can compute statistics, such as average, standard deviation, maximum, minimum, and much more.

You’ll learn to use the apply, cut, groupby, and agg functions. They are useful for gaining new insights into the data, for example through graphical representations.
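As a quick preview of the split-then-summarize idea, here is a minimal sketch on a toy DataFrame. The column names (group, value) and the numbers are made up for illustration and are not part of the Boston dataset.

```python
import pandas as pd

# Toy data: hypothetical column names and values, for illustration only.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Simple aggregation: one statistic per group.
simple = toy.groupby("group")["value"].mean()

# Multiple aggregations: several statistics per group in one call.
multi = toy.groupby("group")["value"].agg(["mean", "min", "max"])

print(simple)
print(multi)
```

groupby splits the rows by the values of the chosen column, and agg applies each listed statistic to every group, returning one row per group.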

Table of contents:
1. Import data
2. Simple aggregations
3. Multiple aggregations

1. Import Data

Let’s import the libraries and the dataset. We’ll use the Boston house prices dataset, which shipped with the sklearn library until version 1.2 removed the load_boston loader.
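The import step might look like the sketch below. It assumes an older scikit-learn in which load_boston still exists. On newer versions, or when scikit-learn is not installed, the hypothetical helper load_boston_frame falls back to a small placeholder frame of random numbers that only reuses a few of the Boston column names (CRIM, RM, AGE, MEDV); the placeholder is not the real data.

```python
import numpy as np
import pandas as pd


def load_boston_frame(seed=0):
    """Return the Boston data as a DataFrame, or a random placeholder."""
    try:
        # Only available in scikit-learn versions before 1.2.
        from sklearn.datasets import load_boston
    except ImportError:
        # Placeholder with random values, NOT the real dataset.
        rng = np.random.default_rng(seed)
        n = 20
        return pd.DataFrame({
            "CRIM": rng.uniform(0.01, 10.0, n),
            "RM": rng.uniform(4.0, 8.0, n),
            "AGE": rng.uniform(5.0, 100.0, n),
            "MEDV": rng.uniform(5.0, 50.0, n),
        })
    boston = load_boston()
    frame = pd.DataFrame(boston.data, columns=boston.feature_names)
    frame["MEDV"] = boston.target  # median house value, the target column
    return frame


df = load_boston_frame()
print(df.head())
```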

Table with data

This DataFrame contains only numeric features, but we need categorical variables to split the dataset into groups. Thus, we’ll create these categorical variables from the descriptive statistics of the dataset:
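One way to do this, sketched here under assumptions, is to take the quartiles reported by describe as bin edges for cut, and to use apply with the mean as a threshold. The small frame, the new column names (AGE_GROUP, PRICE_LEVEL), and the labels are illustrative choices, not necessarily the ones used in the original article.

```python
import pandas as pd

# Small illustrative frame; the values are made up, not taken from the dataset.
houses = pd.DataFrame({
    "AGE": [10.0, 25.0, 40.0, 55.0, 70.0, 85.0, 95.0, 100.0],
    "MEDV": [30.0, 28.0, 25.0, 22.0, 20.0, 18.0, 15.0, 12.0],
})

# Descriptive statistics of AGE provide the bin edges (min, quartiles, max).
age_stats = houses["AGE"].describe()
bins = [age_stats["min"], age_stats["25%"], age_stats["50%"],
        age_stats["75%"], age_stats["max"]]

# cut turns the numeric column into an ordered categorical column.
# include_lowest=True keeps the minimum value inside the first bin.
houses["AGE_GROUP"] = pd.cut(
    houses["AGE"],
    bins=bins,
    labels=["new", "mid-new", "mid-old", "old"],
    include_lowest=True,
)

# apply maps each value to a label, here relative to the column mean.
medv_mean = houses["MEDV"].mean()
houses["PRICE_LEVEL"] = houses["MEDV"].apply(
    lambda v: "high" if v > medv_mean else "low"
)

print(houses)
```

With categorical columns such as these in place, the DataFrame can be split with groupby on AGE_GROUP or PRICE_LEVEL.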

