Should You Use Python pathlib or os?

A duel between two Python path libraries

Adam Green
Better Programming

--

Gokyo, Nepal (photo by author)

This article compares two Python libraries for working with file paths: os and the newer pathlib.

It is a duel at dawn — a four-round showdown comparing two approaches to common programming tasks.

Who will be our champion?

os or pathlib ?

If you’ve been programming in Python a while, it’s likely you are using the functions in os, such as os.path.join.

pathlib was introduced in Python 3.4 and offers a different set of abstractions for working with paths. However, just because it is newer, that doesn't mean it's better.

So how to decide which is best? How about a good old fashioned duel? Pistols at dawn!

Round One — Single Paths

Our first round is composed of the tasks commonly done with a single file path:

  • forming file paths from strings
  • getting the home and current working directory
  • working with file names and suffixes

Creating a single file path

Operating systems take different approaches with file paths — for example, LINUX uses / where Windows uses \ as a separator in paths.

Because of this complexity, portability is a key concern — both os.path and pathlib offer portable ways to construct paths.

os offers os.path.join to create a file path:

import os
path = os.path.join(os.path.expanduser('~'), 'data', 'file.txt')
# /Users/adam/data/file.txt

In pathlib the path is formed using the division operator / with an initialized Path object:

from pathlib import Path
path = Path.home() / ‘data’ / ‘file.txt’
# /Users/adam/data/file.txt

The Path object is the focus of pathlib. Almost all of the functionality we need can be accessed as either attributes or methods on this object.

Getting the home directory

The home directory is in different places on different operating systems. Both our contenders offer a way to get the user’s home directory that will work on both UNIX and Windows systems:

  • Ubuntu /home/$USER
  • MacOS /Users/$USER
  • Windows C:\Users\$USER

With os:

import os
os.path.expanduser(‘~’)
# /Users/adam

And with pathlib:

from pathlib import Path
Path.home()
# /Users/adam

Getting the current working directory

With os:

import os
os.getcwd()

And with pathlib:

from pathlib import Path
Path.cwd()

Working with file names and suffixes

The name of a file includes the suffix.

Getting this with os requires using basename:

import os
os.path.basename('/path/file.suffix')
# file.suffix

With pathlib we can use the name attribute on a Path object:

from pathlib import Path
Path('/path/file.suffix').name
# file.suffix

The stem doesn’t include the suffix. Getting this with os requires using both basename and splitext:

from os.path import basename, splitext
splitext(basename('/path/file.suffix'))[0]
# file

With pathlib we can use the stem attribute on a Path object:

from pathlib import Path
Path('/path/file.suffix').stem
# file

The suffix is the final part of a file path, usually indicating the file type. To get the suffix with os.path:

import os
os.path.splitext('/path/file.suffix')[-1]
# .suffix

pathlib has suffix as an attribute of the Path object:

from pathlib import Path
Path('/path/file.suffix').suffix
# .suffix

Summary — single paths

Our first round is done — the winner is pathlib! Some of the things our judges liked:

  • moving data and functionality onto a single Path object
  • remembering some of the os methods (such as os.path.expanduser) is difficult

Round Two — Making Things

Our second round consists of tasks commonly done when making things, including:

  • making directories
  • saving data to text files
  • appending data to text files

Making directories

First with os:

import os
path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
os.mkdir(path)

And with pathlib:

from pathlib import Path
path = Path.home() / 'python-file-paths'
path.mkdir()

Sometimes we want to make a new folder that is deeper than the last folder that exists. Trying this will raise an error (as foo doesn't exist yet):

from pathlib import Path
path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir()
# FileNotFoundError

We can avoid this by using parents=True:

from pathlib import Path
path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True)

Another cause of error is trying to make a directory that already exists:

from pathlib import Path
path = Path.home() / 'python-file-paths'
path.mkdir()
# FileExistsError

We commonly use both parents=True and exist_ok=True whenever we make a folder:

from pathlib import Path
path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True, exist_ok=True)

The examples above are all about creating a directory from a path. Sometimes we have a path with a filename as well.

Sometimes we actually have a full file path (including both folders and a file name). If we use mkdir on a full file path, we will end up making a directory with the same name as our soon-to-be file!

We can use Path.parent to access the enclosing folder of our file and call .mkdir on that folder:

from pathlib import Path
path = Path.home() / 'python-file-paths' / 'foo' / 'bar' / 'baz.file'
path.parent.mkdir(parents=True, exist_ok=True)

Writing data to files

Imagine we have a dataset of 32 samples, and we want to save each sample in a file in $HOME/python-file-paths/.

Using os, where we the lack of the exist_ok argument in os.mkdir, means we need to check if the base folder exists before making it:

We can use cat to print out our first sample:

$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]

And then using pathlib, where we can use exist_ok=True along with a write_text method on our Path object:

Again using cat to print out our first sample, which due to our random seed, is the same:

$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]

Appending data to a file

The above task was writing to many files: one file per sample. Other times we want to append to a file, the advantage being all our data is stored in one file.

These examples append text to a single file, all_samples.data. First with os:

And with pathlib — note here we are forced to use context management to be able to pass an append flag of a:

Now our data is stored in a single file (one line per row):

$ head -n 2 ~/python-file-paths/samples.data
[37.45401188 95.07143064 73.19939418 59.86584842]
[15.60186404 15.59945203 5.80836122 86.61761458]

Summary — making things

Ding ding ding! The winner is again pathlib! Some of the things our judges liked in this round:

  • The argument of exist_ok in Path.mkdir() avoids an error in making a directory that already exists.
  • The Path.parents attribute allows easy access of the folder a file is in.
  • Writing data to text files with Path.write_text

Round Three — Reading and Finding

Our third round is about working with things that already exist, such as:

  • reading text files
  • finding files
  • finding directories

Reading from text files

Let’s open one of the text files we created earlier.

First we’ll open it with os, which requires context management to properly close the file after opening:

from os.path import join, expanduser
path = join(expanduser('~'), 'python-file-paths', 'samples.data')
with open(path, 'r') as fi:
data = fi.read()

And then we’ll use pathlib, where can open, read, and close the file using the read_text() method on our Path object:

from pathlib import Path
path = Path.home() / 'python-file-paths', 'samples.data')
data = path.read_text()

Finding many files recursively

Sometimes we want to find the paths for many files. We want to find paths deep in the file system, i.e., recursively.

With os we can use os.walk to do this:

With pathlib, glob is best:

from pathlib import Path
path = Path().home()
paths = [p for p in path.glob('**/*.py') if p.is_file()]

glob will not return path orders deterministically. If you are relying on the order, be sure to call sorted on paths.

Finding all directories

Often we want a list of directories at a certain path. Here we use the user’s home directory. We don’t want this to be recursive.

For os.path we use os.path.listdir() to iterate over a path, with os.path.isdir() to check that the path is a directory:

from os import listdir
from os.path import expanduser, join, isdir
path = expanduser('~')
dirs = [join(path, p) for p in listdir(path) if isdir(join(path, p))]

For pathlib we use path.iterdir and path.is_dir. Both methods are called on the Path object:

from pathlib import Path
path = Path().home()
dirs = [p.name for p in path.iterdir() if p.is_dir()]

Finding all directories recursively

Sometimes we want to look beyond a single path and recursively search for folders.

We can do this using os.walk:

With pathlib this is best done using path.glob:

from pathlib import Path
path = Path().home()
paths = [p for p in path.glob('**/*') if p.is_dir()]

Summary — reading and finding

Our third round is done — the winner is pathlib! Some of the things our judges liked:

  • Reading from text files with Path.read_text
  • Being able to check if a path is a directory using Path.is_dir() or a folder with Path.is_file()

Final Round — Removing Things

Our final round is about removing things:

  • removing directories
  • removing files

Removing directories

The best way to do this is with shutil.rmtree, which will remove the directory even if it is not empty.

There is no real difference between os and pathlib except for when creating the file path. The example below uses pathlib:

from shutil import rmtree
from pathlib import Path
path = Path.home() / 'python-file-paths'
rmtree(path)

This is usually the behaviour you want when removing directories: remove even if not empty.

Removing files

Sometimes we want to remove specific files when we know the path.

We can do this with os:

import os
from os.path import expanduser, isdir, join
path = join(expanduser('~'), 'python-file-paths', 'data.txt')
if os.path.exists(path):
os.remove(path)

And with pathlib:

from pathlib import Path
path = Path.home() / 'python-file-paths' / 'data.txt'
path.unlink(missing_ok=True)

Because there is a lot of complexity in this task, a summary is given below:

table comparing the code to use in os and pathlib

Summary — removing things

Our final round is done — and it’s a draw! This was the final task, and due to the complexity of both approaches needing to use the additional library shutil.

It’s a fair result by our judges — perhaps with a hint of sympathy for os, a valiant competitor but outclassed by its successor.

Summary

The duel is over and final results are in:

  • Round One — Single Paths — pathlib
  • Round Two — Making Things — pathlib
  • Round Three — Reading and Finding — pathlib
  • Round Four — Removing Things — draw

pathlib comes out on top due to:

  • Moving most functionality onto a Path class
  • The exist_ok argument on Path.mkdir()
  • Avoiding the need to use context management when reading and writing to text files

Thanks for reading!

Originally published at https://www.datasciencesouth.com.

--

--