Pandas Illustrated: The Definitive Visual Guide to Pandas
Is it a copy or a view? Should I merge or join? And what the heck is MultiIndex?

Pandas is an industry standard for analyzing data in Python. With a few keystrokes, you can load, filter, restructure, and visualize gigabytes of heterogeneous information. Built on top of the NumPy library, it borrows many of its concepts and syntax conventions, so if you are comfortable with NumPy, you’ll find Pandas a pretty familiar tool. And even if you’ve never heard of NumPy, Pandas provides a great opportunity to tackle data analysis problems with little or no programming background.
There are a lot of Pandas guides out there. In this particular one, you’re expected to have a basic understanding of NumPy. If you don’t, I’d suggest you skim through the NumPy Illustrated guide to get an idea of what a NumPy array is, in which ways it is superior to a Python list, and how it helps avoid loops in elementary operations.
Two key features that Pandas brings to NumPy arrays are:
1. Heterogeneous types — each column is allowed to have its own type;
2. Index — improves lookup speed for the specified column(s).
It turns out these features are enough to make Pandas a powerful competitor to both spreadsheets and databases.
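To make the two features concrete, here is a minimal sketch. The table and its column names (`name`, `age`, `score`) are made up for illustration and are not taken from the rest of the guide.

```python
import pandas as pd

# Hypothetical toy table, invented for this illustration.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [31, 25, 47],
    "score": [8.5, 7.0, 9.25],
})

# 1. Heterogeneous types: each column keeps its own dtype
#    (text, integer, and float here), unlike a single NumPy array.
print(df.dtypes)

# 2. Index: turn a column into the index, then look rows up by label.
by_name = df.set_index("name")
bob_age = by_name.loc["Bob", "age"]
print(bob_age)
```

Once `name` is the index, `by_name.loc["Bob"]` finds the row by label instead of by scanning the column with a boolean mask such as `df[df["name"] == "Bob"]`.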
Polars, the recent reincarnation of Pandas (written in Rust, thus faster¹), no longer uses NumPy under the hood, yet its syntax is pretty similar, so learning Pandas will let you feel at ease with Polars as well.
The article consists of four parts:
Part 1. Motivation
Part 2. Series and Index
Part 3. DataFrames
Part 4. MultiIndex
… and is quite lengthy, though easy to read as it is mostly images.
For a 1-minute read of the “first steps” in Pandas, I can recommend the excellent Visual Intro to Pandas² by Jay Alammar.
Discussions
• Hacker News (263 points, 41 comments)
• Reddit r/Python (290 points, 29 comments)