Better Programming

Python’s Missing Batteries: Essential Libraries You’re Missing Out On

Even though Python’s standard library comes with “batteries included,” it’s still missing some essentials…

Martin Heinz
Published in Better Programming
9 min read · May 1, 2023

Photo by Murat Onder on Unsplash

Python is known to come with “batteries included,” thanks to its very extensive standard library, which includes many modules and functions that you would not expect to be there. However, there are many more “essential” Python libraries out there that you should know about and use in all your Python projects, and here’s the list.

General-Purpose Utilities

We will begin with a couple of general-purpose libraries you can use in many projects. The first one is boltons, which its docs describe as:

“Boltons is a set of pure-Python utilities in the same spirit as — and yet conspicuously missing from — the standard library.”

We would need a whole article to go over every function and feature of boltons, but here are a couple of examples of handy functions:

# pip install boltons
from boltons import jsonutils, timeutils, iterutils
from datetime import date

# {"name": "John", "id": 1, "active": true}
# {"name": "Ben", "id": 2, "active": false}
# {"name": "Mary", "id": 3, "active": true}
with open('input.jsonl') as f:
    for line in jsonutils.JSONLIterator(f):  # Automatically converted to dict
        print(f"User: {line['name']} with ID {line['id']} is {'active' if line['active'] else 'inactive'}")
# User: John with ID 1 is active
# ...

start_date = date(year=2023, month=4, day=9)
end_date = date(year=2023, month=4, day=30)

for day in timeutils.daterange(start_date, end_date, step=(0, 0, 2)):
    print(repr(day))
# datetime.date(2023, 4, 9)
# datetime.date(2023, 4, 11)
# datetime.date(2023, 4, 13)
# ...

data = {"deeply": {"nested": {"python": {"dict": "value"}}}}
iterutils.get_path(data, ("deeply", "nested", "python"))
# {'dict': 'value'}

data = {"id": "234411",
        "node1": {"id": 1234, "value": "some data"},
        "node2": {"id": "2352345",
                  "node3": {"id": "123422", "value": "more data"}
                  }
        }

iterutils.remap(data, lambda p, k, v: (k, int(v)) if k == 'id' else (k, v))

While Python’s standard library has a json module, it does not support the JSON Lines (.jsonl) format. The first example shows how you can process a .jsonl file using boltons.
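For comparison, here is a minimal sketch of what reading JSON Lines takes with only the standard library. The helper name iter_jsonl is hypothetical, the in-memory sample stands in for input.jsonl, and the sketch only skips blank lines; it makes no attempt to match everything JSONLIterator does.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed object per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for open('input.jsonl')
sample = io.StringIO(
    '{"name": "John", "id": 1, "active": true}\n'
    '{"name": "Ben", "id": 2, "active": false}\n'
)

users = list(iter_jsonl(sample))
for user in users:
    print(user["name"], "active" if user["active"] else "inactive")
```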

The second example showcases the boltons.timeutils module, which allows you to create date ranges. You can iterate over them and set the step argument to, for example, get every other day. Again, this is something that’s missing from Python’s datetime module.
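Without boltons, a day-based range can be approximated with timedelta. This is only a sketch: every_n_days is a hypothetical helper, it excludes the stop date, and it handles day steps only, whereas the step tuple in the boltons call above also has year and month positions.

```python
from datetime import date, timedelta


def every_n_days(start, stop, days=1):
    """Yield dates from start up to, but not including, stop (hypothetical helper)."""
    current = start
    while current < stop:
        yield current
        current += timedelta(days=days)


days = list(every_n_days(date(2023, 4, 9), date(2023, 4, 14), days=2))
print(days)
```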

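The third example uses get_path from the boltons.iterutils module to pull a value out of a deeply nested dictionary. Finally, in the last example, we use the remap function from the same module to recursively convert all id fields in a dictionary to integers. Here, boltons.iterutils serves as a nice extension to the standard library’s itertools.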
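To see what remap saves you from writing, here is a hand-rolled sketch of the same id conversion. The function convert_ids is hypothetical and handles only dicts and lists, which is an assumption made to keep the example short.

```python
def convert_ids(value):
    """Return a copy of a nested dict/list structure with every 'id' value cast to int."""
    if isinstance(value, dict):
        return {
            key: int(item) if key == "id" else convert_ids(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [convert_ids(item) for item in value]
    return value


data = {
    "id": "234411",
    "node1": {"id": 1234, "value": "some data"},
    "node2": {"id": "2352345", "node3": {"id": "123422", "value": "more data"}},
}

converted = convert_ids(data)
print(converted)
```

Every new rule (say, also stripping whitespace from strings) means another branch in this function, which is the kind of bookkeeping a generic traversal helper takes off your hands.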

Speaking of iterutils and itertools, the next great library you need to check out is more-itertools, which provides, well, more itertools. Again, a discussion of more-itertools would warrant a whole article, and I wrote one.

The last one for this category is sh, which is a subprocess module replacement. It’s great if you find yourself orchestrating lots of other processes in Python:

# https://pypi.org/project/sh/
# pip install sh
import sh

# Run any command in $PATH...
print(sh.ls('-la'))
# total 36
# drwxrwxr-x 2 martin martin 4096 apr 8 14:18 .
# drwxrwxr-x 41 martin martin 20480 apr 7 15:23 ..
# -rw-rw-r-- 1 martin martin 30 apr 8 14:18 examples.py

with sh.contrib.sudo:
    # Do stuff using 'sudo'...
    ...

# Write to a file:
sh.ifconfig(_out='/tmp/interfaces')

# Piping:
print(sh.wc('-l', _in=sh.ls('.', '-1')))
# Same as 'ls -1 | wc -l'

When we invoke sh.some_command, the sh library looks for an executable with that name in your $PATH. If it finds one, it simply runs it for you.

In case you need to use sudo, you can use the sudo context manager from the contrib module, as shown in the second part of the snippet.

To write the output of a command to a file, you only need to provide the _out argument to the function. And finally, you can also build pipes (|) by passing one command’s output as the _in argument of another.
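For contrast, this is roughly what a two-command pipe looks like with the subprocess module alone. It is a sketch under one assumption: the two commands are small Python one-liners standing in for ls -1 and wc -l, so that the example runs on any platform.

```python
import subprocess
import sys

# Stand-ins for 'ls -1' and 'wc -l' so the sketch does not depend on the OS
producer_cmd = [sys.executable, "-c", "print('a.py'); print('b.py'); print('c.py')"]
counter_cmd = [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"]

# Connect the producer's stdout to the counter's stdin, like 'producer | counter'
producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
counter = subprocess.run(
    counter_cmd, stdin=producer.stdout, capture_output=True, text=True, check=True
)
producer.stdout.close()
producer.wait()

line_count = int(counter.stdout.strip())
print(line_count)
```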

Data Validation

Another “missing battery” in Python’s standard library is the category of data validation tools. One small library that provides this is called validators. This library lets you validate common patterns such as emails, IPs, or credit cards:

# https://python-validators.github.io/validators/
# pip install validators
import validators

validators.email('someone@example.com') # True
validators.card.visa('...')
validators.ip_address.ipv4('1.2.3.456') # ValidationFailure(func=ipv4, args={'value': '1.2.3.456'})

Next up is fuzzy string comparison. Python includes difflib for this, but the module could use some improvements, some of which can be found in the thefuzz library (previously known as fuzzywuzzy):

# pip install thefuzz
from thefuzz import fuzz
from thefuzz import process

print(fuzz.ratio("Some text for testing", "text for some testing")) # 76
print(fuzz.token_sort_ratio("Some text for testing", "text for some testing")) # 100
print(fuzz.token_sort_ratio("Some text for testing", "some testing text for some text testing")) # 70
print(fuzz.token_set_ratio("Some text for testing", "some testing text for some text testing")) # 100

songs = [
    '01 Radiohead - OK Computer - Airbag.mp3',
    '02 Radiohead - OK Computer - Paranoid Android.mp3',
    '04 Radiohead - OK Computer - Exit Music (For a Film).mp3',
    '06 Radiohead - OK Computer - Karma Police.mp3',
    '10 Radiohead - OK Computer - No Surprises.mp3',
    '11 Radiohead - OK Computer - Lucky.mp3',
    '02 Radiohead - Pablo Honey - Creep.mp3',
    '04 Radiohead - Pablo Honey - Stop Whispering.mp3',
    '06 Radiohead - Pablo Honey - Anyone Can Play Guitar.mp3',
    "10 Radiohead - Pablo Honey - I Can't.mp3",
    '13 Radiohead - Pablo Honey - Creep (Radio Edit).mp3',
    # ...
]
print(process.extract("Radiohead - No Surprises", songs, limit=1, scorer=fuzz.token_sort_ratio))
# [('10 Radiohead - OK Computer - No Surprises.mp3', 70)]

The appeal of the thefuzz library is its *ratio functions, which will likely do a better job than the standard library’s difflib.get_close_matches or difflib.SequenceMatcher.ratio. The snippet above shows their different uses. First, we use the basic ratio, which computes a simple similarity score of two strings. After that, we use token_sort_ratio, which ignores the order of tokens (words) in the string when calculating the similarity. Finally, we test the token_set_ratio function, which instead ignores duplicate tokens.

We also use the extract function from the process module, which is an alternative to difflib.get_close_matches. This function looks for the best match(es) in a list of strings.
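The idea behind ignoring token order can be imitated with difflib alone, which helps show what the plain ratio misses. This is a sketch of the concept, not of how thefuzz is implemented: simple_ratio and token_sort are hypothetical helpers, and the preprocessing in thefuzz may differ from the lowercase-and-split used here.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale, using difflib only."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort(text):
    """Lowercase the text and put its words in alphabetical order."""
    return " ".join(sorted(text.lower().split()))


a = "Some text for testing"
b = "text for some testing"

plain_score = simple_ratio(a, b)
sorted_score = simple_ratio(token_sort(a), token_sort(b))

# Same words in a different order: the token-sorted comparison is a perfect match
print(plain_score, sorted_score)
```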

If you’re already using difflib and are wondering if you should use thefuzz instead, then make sure to check out an article by the library’s author that nicely demonstrates why difflib is not always sufficient and why the above functions might work better.

Debugging

Quite a few debugging and troubleshooting libraries also bring a superior experience compared to the standard library. One such library is stackprinter, which brings more helpful versions of Python’s built-in exception tracebacks:

# pip install stackprinter

import stackprinter
stackprinter.set_excepthook(style='darkbg2')

def do_stuff():
    some_var = "data"
    raise ValueError("Some error message")

do_stuff()

All you need to do to use it is import it and set the exception hook. Then, running code that throws an exception prints a much more detailed traceback than the default one.

This is a big improvement because it shows local variables and context, things you would otherwise need an interactive debugger for. Check out the docs for additional options, such as integration with logging or different color themes.
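The standard library can get part of the way there. As a rough sketch, traceback.TracebackException accepts capture_locals=True, which adds each frame's local variables to the formatted traceback. It does not add the coloring or the extra options mentioned above, and the run_and_report helper is hypothetical.

```python
import traceback


def do_stuff():
    some_var = "data"
    raise ValueError("Some error message")


def run_and_report():
    """Run do_stuff and return a traceback string that includes local variables."""
    try:
        do_stuff()
    except ValueError as exc:
        detailed = traceback.TracebackException.from_exception(exc, capture_locals=True)
        return "".join(detailed.format())
    return ""


report = run_and_report()
print(report)
```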

stackprinter helps with debugging issues that result in exceptions, but that's only a small fraction of issues we all debug. Most of the time, troubleshooting bugs involves just putting print or log statements all over the code to see the current state of variables or whether the code was run. And there's a library that can improve upon the basic print-style debugging:

# pip install icecream
from icecream import ic

def do_stuff():
    some_var = "data"
    some_list = [1, 2, 3, 4]
    ic()
    return some_var

ic(do_stuff())
# ic| examples.py:46 in do_stuff() at 11:27:44.604
# ic| do_stuff(): 'data'

It’s called icecream, and it provides an ic function that serves as a print replacement. You can use plain ic() (without arguments) to test which parts of the code were executed. Alternatively, you can use ic(some_func(...)), which will print the function/expression along with the return value.

For additional options and configuration, check out the GitHub README.
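The closest thing in plain Python is the = specifier in f-strings (Python 3.8+), which echoes the expression next to its value. The short sketch below shows it; unlike ic(), it gives you no file name, line number, or timestamp.

```python
def do_stuff():
    some_var = "data"
    return some_var


# The '=' specifier prints the expression text followed by the repr of its value
message = f"{do_stuff()=}"
print(message)  # do_stuff()='data'
```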

Testing

While on the topic of debugging, we should also mention testing. I’m not going to tell you to replace the built-in unittest with another test framework (even though pytest is just better). Instead, I want to show you three helpful little tools.

The first one is the freezegun library, which allows you to mock datetime:

# pip install pytest freezegun
from freezegun import freeze_time
import datetime

# Run 'pytest' in shell
@freeze_time("2022-04-09")
def test_datetime():
    assert datetime.datetime.now() == datetime.datetime(2022, 4, 9)  # Passes!


def test_with():
    with freeze_time("Apr 9th, 2022"):
        assert datetime.datetime.now() == datetime.datetime(2022, 4, 9)  # Passes!


@freeze_time("Apr 9th, 2022", tick=True)
def test_time_ticking():
    assert datetime.datetime.now() > datetime.datetime(2022, 4, 9)  # Passes!

All you need to do is add a decorator to the test function that sets the date (or datetime). Alternatively, you can also use it as a context manager (with statement).

Above, you can also see that it allows you to specify the date in a friendly format. And finally, you can also pass in tick=True, which starts the clock at the given value and lets it keep ticking from there.

Optionally, if you’re using pytest, you can also install pytest-freezegun for pytest-style fixtures.

The second essential testing library/helper you need is dirty-equals. It provides helper equality functions for comparing things that are kind of equal:

# pip install dirty-equals
from dirty_equals import IsApprox, IsNow, IsJson, IsPositiveInt, IsPartialDict, IsList, AnyThing
from datetime import datetime

assert 1.0 == IsApprox(1)
assert 123 == IsApprox(120, delta=4) # close enough...

now = datetime.now()
assert now == IsNow # just about...

assert '{"a": 1, "b": 2}' == IsJson
assert '{"a": 1}' == IsJson(a=IsPositiveInt)

assert {'a': 1, 'b': 2, 'c': 3} == IsPartialDict(a=1, b=2) # Validate only subset of keys/values

assert [1, 2, 3] == IsList(1, AnyThing, 3)

Above is a sample of helpers that test whether two numbers or datetimes are approximately the same, whether something is valid JSON (including testing individual keys in that JSON), or whether the value is a dictionary or a list with specific keys/values.
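Helpers like these can work because Python lets the right-hand operand of == decide the outcome when the left-hand one does not know how to compare. The sketch below illustrates that mechanism with a hypothetical ApproxNumber class; it is an illustration of the idea, not the dirty-equals implementation.

```python
class ApproxNumber:
    """Compare equal to any number within `delta` of `expected` (hypothetical helper)."""

    def __init__(self, expected, delta):
        self.expected = expected
        self.delta = delta

    def __eq__(self, other):
        # int.__eq__ returns NotImplemented for this type, so Python falls back here
        return abs(other - self.expected) <= self.delta

    def __repr__(self):
        return f"ApproxNumber({self.expected}, delta={self.delta})"


close_enough = 123 == ApproxNumber(120, delta=4)
too_far = 130 == ApproxNumber(120, delta=4)
print(close_enough, too_far)
```

Defining __repr__ matters in tests: when an assertion fails, the message shows what the value was compared against.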

And finally, the third helpful library is called pyperclip. It provides functions for copying and pasting to/from the clipboard. I find this very useful for debugging, e.g., to copy values of variables or error messages to the clipboard, but this can have a lot of other use cases:

# pip install pyperclip
# sudo apt-get install xclip
import pyperclip
try:
    print("Do something that throws error...")
    raise SyntaxError("Something went wrong...")
except Exception as e:
    pyperclip.copy(str(e))

# CTRL+V -> Something went wrong...

In this snippet, we use pyperclip.copy to automatically copy the exception message into the clipboard so that we don't have to copy it manually from the program output.

CLI

The last category that deserves a mention is CLI tooling. If you build CLI applications in Python, then you can put tqdm to good use. This little library provides a progress bar for your programs:

# pip install tqdm
from tqdm import tqdm, trange
from random import randint
from time import sleep

for i in tqdm(range(100)):
    sleep(0.05)  # 50ms per iteration

# 0% | | 0/100 [00:00<?, ?it/s]
# 100%|██████████| 100/100 [00:05<00:00, 19.95it/s]

with trange(100) as t:
    for i in t:
        t.set_description('Step %i' % i)
        t.set_postfix(throughput=f"{randint(100, 999)/100.00}Mb/s", task=i)
        sleep(0.05)

# Step 60: 60%|██████ | 60/100 [00:03<00:02, 19.78it/s, task=60, throughput=4.06Mb/s]

To use it, we wrap an iterable in tqdm and get a progress bar in the program output. For more advanced cases, you can use trange as a context manager and set additional options, such as a description or custom progress bar fields like throughput or the current task.
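To get a feel for what the library takes care of, here is a rough sketch of just the rendering step of a progress bar. The render_bar function is hypothetical and uses # for the filled part; it leaves out the timing, rate estimation, and in-place terminal redraw that tqdm handles for you.

```python
def render_bar(done, total, width=10):
    """Return a one-line text progress bar (hypothetical helper)."""
    filled = width * done // total
    percent = 100 * done // total
    bar = "#" * filled + " " * (width - filled)
    return f"{percent:3d}%|{bar}| {done}/{total}"


for step in (0, 60, 100):
    print(render_bar(step, 100))
```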

The module can also be executed as a shell command (python -m tqdm), which could be useful, e.g., when creating a backup with tar or looking for files with find.

See the docs for further advanced examples and things like integrations with Pandas or Jupyter Notebook.

Closing Thoughts

With Python, you should always search for existing libraries before implementing anything from scratch. Unless you’re creating a particularly unusual or bespoke solution, chances are someone has already built and shared it on PyPI.

In this article, I listed only general-purpose libraries that anyone can benefit from, but there are many other specialized ones, e.g., for ML or web development. I would recommend you check out https://github.com/vinta/awesome-python, which has an extensive list of interesting libraries, or you can also simply search PyPI by category, and I’m sure you will find something useful there.

Want to Connect?

This article was originally posted at martinheinz.dev.

Written by Martin Heinz

CKA | RHCE | DevOps Engineer | Working with Python, Kubernetes, Linux and more | https://martinheinz.dev/ | https://ko-fi.com/martinheinz
