Load Data Faster in Python With Compressed Pickles
Store almost any Python object faster and in a smaller file

Do you hate how long it takes to load data? Is your hard drive running low on available space? Here are four easy-to-implement functions that will help any Python programmer, from beginner to advanced, manage their projects.
Compressed Pickles
If you have been working in Python for a while, you may be familiar with the _pickle library.
It serializes almost any Python object (including massive datasets) to bytes. Loading a pickle typically takes a fraction of the time needed to parse the same data from a text format such as CSV. Depending on the object, pickling might save you some space as well. However, it often won’t be enough.
Enter the bz2 library for Python, which enables bzip2 compression for any file. By sacrificing some of the speed gained by pickling your data, you can often compress it to a quarter of its original size or less, depending on how compressible the data is.
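To get a feel for this trade-off before writing anything to disk, here is a minimal in-memory sketch. The sample data (a list of repetitive dictionaries) is made up for illustration. Repetitive data compresses far better than random data, so the sizes printed here say nothing about your own datasets.

```python
import bz2
import pickle

# Hypothetical sample data: repetitive records compress well.
records = [{"id": i, "label": "sample", "value": i % 10} for i in range(10_000)]

raw = pickle.dumps(records)     # object -> pickle bytes
packed = bz2.compress(raw)      # pickle bytes -> bz2-compressed bytes

print(f"pickled:    {len(raw):>9,} bytes")
print(f"compressed: {len(packed):>9,} bytes")

# Undo the two steps in reverse order to get the object back.
restored = pickle.loads(bz2.decompress(packed))
print(restored == records)
```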
The Four Functions
Below are four Python functions that make short work of saving and loading data. I include them in the utils.py file of any project I work on.
Imports
import bz2
import pickle
import _pickle as cPickle
1. Full pickle
The full_pickle function takes almost any object (list, dictionary, pandas.DataFrame, and more) and saves it as a .pickle file.
# Saves "data" to a file named "title" plus the .pickle extension
def full_pickle(title, data):
    with open(title + '.pickle', 'wb') as pikd:
        pickle.dump(data, pikd)
Example usage:
full_pickle('filename', data)
filename is the name of the file with no extension. data is any object.
2. Loosen
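Load the pickle files you or others have saved using the loosen function. Include the .pickle extension in the file argument. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.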
# Loads and returns a pickled object
def loosen(file):
    with open(file, 'rb') as pikd:
        data = pickle.load(pikd)
    return data
Example usage:
data = loosen('example_pickle.pickle')
file is the file name with the .pickle extension.
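The two functions above can be checked with a quick round trip. This sketch repeats both definitions so that it runs on its own, and it writes into a temporary directory (my choice, so nothing is left behind). The sample dictionary is made up.

```python
import os
import pickle
import tempfile


def full_pickle(title, data):
    # Save "data" to "<title>.pickle"
    with open(title + '.pickle', 'wb') as pikd:
        pickle.dump(data, pikd)


def loosen(file):
    # Load and return a pickled object
    with open(file, 'rb') as pikd:
        return pickle.load(pikd)


# Hypothetical sample object
sample = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

with tempfile.TemporaryDirectory() as tmp:
    title = os.path.join(tmp, "example_pickle")   # no extension when saving
    full_pickle(title, sample)
    restored = loosen(title + ".pickle")          # extension required when loading

print(restored == sample)
```

Note the asymmetry the article points out: the saving function adds the extension for you, while the loading function expects it in the argument.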
3. Compressed pickle
The compressed_pickle function works just like full_pickle. It even takes the same arguments. It pickles the object and compresses the result using the bz2 library, adding the .pbz2 extension to the saved file automatically.
# Pickle an object and compress it into a file with the .pbz2 extension
def compressed_pickle(title, data):
    with bz2.BZ2File(title + '.pbz2', 'w') as f:
        cPickle.dump(data, f)
Example usage:
compressed_pickle('filename', data)
filename is the name of the file with no extension. data is any object.
Notice that this compresses the pickled data; it doesn’t work as well the other way around.
4. Decompress pickle
The decompress_pickle function works just like the loosen function. Include the .pbz2 extension in the file argument.
# Load any compressed pickle file
def decompress_pickle(file):
    with bz2.BZ2File(file, 'rb') as f:
        data = cPickle.load(f)
    return data
Example usage:
data = decompress_pickle('example_cp.pbz2')
file is the file name with the .pbz2 extension.
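As a rough check of the compressed pair, the sketch below saves the same object once as a plain pickle and once as a .pbz2 file, then compares the file sizes and verifies the round trip. It repeats the two definitions so that it is self-contained, and it uses plain pickle, which in CPython 3 already uses the C implementation when it is available. The data and file names are made up, and the temporary directory is my addition.

```python
import bz2
import os
import pickle
import tempfile


def compressed_pickle(title, data):
    # Pickle "data" and write it through a bz2 stream to "<title>.pbz2"
    with bz2.BZ2File(title + '.pbz2', 'w') as f:
        pickle.dump(data, f)


def decompress_pickle(file):
    # Read a .pbz2 file back into a Python object
    with bz2.BZ2File(file, 'rb') as f:
        return pickle.load(f)


# Hypothetical, highly repetitive data (compresses well)
rows = [("sensor-%d" % (i % 5), i % 100, "OK") for i in range(50_000)]

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "rows.pickle")
    with open(plain_path, "wb") as f:
        pickle.dump(rows, f)

    compressed_pickle(os.path.join(tmp, "rows"), rows)
    packed_path = os.path.join(tmp, "rows.pbz2")

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)
    restored = decompress_pickle(packed_path)

print(f"plain pickle:      {plain_size:,} bytes")
print(f"compressed pickle: {packed_size:,} bytes")
print(restored == rows)
```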
Benchmarks
So, how much faster is pickling and how much space are we saving?
Here’s a benchmark test I performed on an AWS virtual machine for less than a penny ($0.01) using a module I created for cloud computing.
Save CSV File: 3.384 seconds
Load CSV File: 1.977 seconds
CSV File Size: 39,575,154 bytes

Save Pickle File: 3.422 seconds
Load Pickle File: 0.156 seconds
Pickle File Size: 40,759,166 bytes

Save Compressed Pickle: 4.837 seconds
Load Compressed Pickle: 1.139 seconds
Compressed Pickle File Size: 1,467,842 bytes
Saving the 39 MB pandas.DataFrame object as a .csv file took 3.4 seconds, almost exactly as long as saving the .pickle file and more than one second faster than saving the compressed pickle.
The .pickle file and the .csv file took up about the same space, around 40 MB, but the compressed pickle file took up only 1.5 MB. That’s a lot of saved space.
Another big difference is in the load times. If you’re looking for faster loading, either pickle format will work; which one to choose depends on your space needs.
Loading the .csv file took 2 seconds and loading the compressed .pbz2 file took about 1.1 seconds, whereas loading the plain pickle file took a mere 0.16 seconds.
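If you want to run a comparison of this kind on your own machine, here is a minimal timing sketch. It is not the benchmark used above: it relies only on the standard library (csv instead of pandas), uses made-up rows, and the helper names (timed, save_csv, and so on) are my own. Expect different numbers from the ones reported in this article.

```python
import bz2
import csv
import os
import pickle
import tempfile
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def save_csv(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def load_csv(path):
    with open(path, newline="") as f:
        return [row for row in csv.reader(f)]


def save_pickle(path, rows):
    with open(path, "wb") as f:
        pickle.dump(rows, f)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def save_pbz2(path, rows):
    with bz2.BZ2File(path, "w") as f:
        pickle.dump(rows, f)


def load_pbz2(path):
    with bz2.BZ2File(path, "rb") as f:
        return pickle.load(f)


# Made-up data: rows of strings, so the CSV round trip returns identical values
rows = [[str(i), f"name-{i % 100}", str(i * 0.5)] for i in range(50_000)]

formats = {
    "csv": (save_csv, load_csv),
    "pickle": (save_pickle, load_pickle),
    "pbz2": (save_pbz2, load_pbz2),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext, (save, load) in formats.items():
        path = os.path.join(tmp, "bench." + ext)
        _, save_s = timed(save, path, rows)
        loaded, load_s = timed(load, path)
        results[ext] = (save_s, load_s, os.path.getsize(path), loaded == rows)

for ext, (save_s, load_s, size, ok) in results.items():
    print(f"{ext:>6}: save {save_s:.3f}s  load {load_s:.3f}s  "
          f"{size:,} bytes  round trip ok: {ok}")
```

A single run like this is noisy. Repeating each measurement a few times and taking the best or median value gives a steadier picture.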
Things to Try or Look Out For
- The order of pickling then compressing is tested and works without degrading data. Changing the order leads to worse performance.
- You might want to try other compression methods that suit your needs better than bz2 compression.
- Pickling or compressing certain class objects might not work. In these cases, try saving the class attributes (usually accessible as a dictionary) and then creating another class instance and assigning it the attributes.
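The last suggestion can be sketched as follows. The Counter class, its _lock attribute, and the export_state helper are all hypothetical. A threading lock is used here simply because it is a common attribute that cannot be pickled, so this is one possible way to apply the idea, not a general recipe.

```python
import pickle
import threading


class Counter:
    """Hypothetical class holding an attribute that cannot be pickled."""

    def __init__(self, name="", count=0):
        self.name = name
        self.count = count
        self._lock = threading.Lock()   # locks cannot be pickled


def export_state(obj, skip=("_lock",)):
    # Copy the instance attributes, leaving out the ones named in "skip".
    return {k: v for k, v in vars(obj).items() if k not in skip}


original = Counter(name="clicks", count=42)

# Pickling the instance directly fails because of the lock.
try:
    pickle.dumps(original)
    direct_ok = True
except (TypeError, pickle.PicklingError):
    direct_ok = False

# Pickling only the plain attributes works.
payload = pickle.dumps(export_state(original))

# Later: build a fresh instance (with its own lock) and assign the attributes.
restored = Counter()
restored.__dict__.update(pickle.loads(payload))

print(direct_ok, restored.name, restored.count)
```

The same payload can be passed to the file-based functions from this article instead of pickle.dumps. Which attributes to skip depends entirely on the class in question.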
That’s it. In the future, I will be writing more articles about simple yet remarkably useful functions and classes that I often use in my projects. Some of them build on what we’ve seen here, others do not.
Thanks for reading.