Better Programming

Advice for programmers.

Follow publication

From Functions to Python Package

Learn from the scratch about how to convert your functions into packages and distribute

Arafath Hossain
Better Programming
Published in
11 min readApr 8, 2022

Photo by Leone Venter on Unsplash

Coding in Python is fun but what makes it even more fun is the availability of Packages suited for different purposes. For example, availability of scientific calculation and Machine Learning packages is what has made Python the most popular language in Data Science and Analytics. In this post, we will get an introduction to the world of Python packages and see how we can build our own packages.

In my previous post about Python modules, Use Modules to Better Organize your Python Code, I discussed how we can better organize our codes using Python Modules. In this post we will take the next step and learn how we can better organize our Python modules using Package.

What and Why?

Python packages are collections of modules. If Python modules are considered as the homes of functions and variables then packages are the homes of modules.

❓ But if we can organize our codes using modules then why bother to use packages?

As we already know the codebase tends to grow. With the growing code base, you will very likely group your functions into multiple modules based on the types of tasks they perform. As the number of modules grow, a natural progression is to find a way to organize the modules based on a common group they fall into. And a set of modules organized in a directory form can easily be turned into Python Package in order to make the module usage and maintenance streamlined.

Packages are a way of structuring Python’s module namespace by using “dotted module names”. — Source: Python Tutorial

Once a package is installed, we can easily access to the modules stored in different directory levels using a dot notation.

An Example Scenario

Before getting into our discussion about packages, let’s think about a scenario so that it’ll be easier to put our learning into context. Let’s assume that we are working on a project where we get user names separated by a specific character stored in a long string. We need to break them down and assign them with unique numeric IDs so that they can be stored in a database in the future. So our tasks are to:

  1. Break the input string into a set list of names,
  2. Then assign unique IDs to these names.

To perform these tasks we will build a couple of functions, store them in a couple of modules, and eventually will wrap them inside a Python package. It’s definitely an overkill to build a package for such a trivial project but for our learning purpose let’s assume it’s a good idea for now.

Two Modules

Let’s create a couple of modules: stringProcess, idfier.

  • stringProcess: A module that will help us process strings. Currently, it contains just one function called stringSplit() that takes a string and then splits it based on the selected separator.
  • idfier: A module that will help us create unique IDs. Currently, it contains only one function called randomIdfier() that takes a list of name, assign them with randomly generated unique IDs, and save them as a dictionary.

Create two separate .py files with the codes below and name the files by the module names. Also, make sure to save them in the same directory where you are running your script or notebook.

Useful Module Properties for Package Building

We have discussed general module properties in our last post. Here we will discuss some additional properties that will come in handy in our discussion about package development.

Module Initialization

Once a module is imported, Python implicitly executes the module and have it initialize some of its aspects. One important aspect to notice is that:

A module initialization takes place only once in a project.

If a module is imported multiple times, which is not necessary but even if it happens i.e. inside a module used, for the subsequent imports, Python remembers the previous import and silently ignores the following initializations. You can see this feature in action in the following example. Where we see that when we import idfier and stringProcess modules twice but they only print out the messages once during the first initialization and don't produce any message during the subsequent imports.

import idfierimport idfierimport stringProcessimport stringProcessidfier is used as a module
stringProcess is used as a module

Private Properties in Modules

You may want to have variables inside your Modules that are only intended for internal use. By nature these are considered private properties. You can declare such properties i.e. variables or functions, by one or two underscores (__).

But adding underscores is merely a convention and doesn't impose any protection per se.

⚠️ Unlike other languages like Java, Python doesn’t impose any strict restrictions on accessing such private properties. Adding underscores gives other developers a note that the properties are only intended for internal use.

__name__ Variable

Modules are essentially Python scripts. When Python scripts are imported as Modules, Python creates a variable called __name__ and stores the name of the module in it e.g. when we import our idfier module, the __name__ variable contains the value - idfier in it. On the contrary when a script is directly executed, the __name__ variable contains __main__ as the value.

For demonstration, let’s add the following simple if-else condition inside the stringProcess.py file and check out how we can use the __name__ property.

name = "stringProcess"if __name__ == "__main__":print(name, "is used as a script")else:print(name, "is used as a module")

In the code cell below we called stringProcess module both as a script and a module. See how two methods produce two different messages.

🛑 Remember, to restart your Jupyter notebook or re-run your script before running the following code block otherwise you wouldn’t see the outcomes properly. Remember the rule of Python initializing a module only once?

%run -i stringProcess.pyimport stringProcessstringProcess is used as a script
stringProcess is used as a module

💡 So how would this __name__ variable be of any help?

__name__ variable can be a very useful feature to run some primary tests on codes inside the module script. Using this variable’s stored value, you can ask Python to run some tests during the development mode when you are actively working with the script and ignore them while they are used as modules in a project. We will see a demo of this functionality a bit later.

Improving idfier

Now based on our newly known properties of the module, let’s improve the module — idfier.

  • Improving randomIdfier(): Currently this function generates a random integer between 1000 to 9999 and assigns it as a value. But since this number is generated randomly there's no guarantee that the numbers will be unique. But for our purpose, we need it to create unique numbers to be used as IDs. To ensure uniqueness, let’s add a while loop to check for duplicates and re-generate random numbers unless it finds a unique one.
  • Using __name__ for Automated Test: We will add a test code block and wrap it inside a if-else condition so that it will run the test only when the value of __name__ == "__main__" , or in other words the modules scripts are directly run. This simple test will check if the length of unique ID values equals to 2 when we run randomIdfier() with an input of string containing two names.
  • Adding Short Descriptions: We have used """ """ (docstring) to add short descriptions for the code blocks.
import idfier as idfidf.randomIdfier(['name1', 'name2'])idfier is used as a module{'name1': 6571, 'name2': 7469}
  • Execute idfier.py as a script. Can you guess the output?
  • Try changing the code inside so that the test fails. They execute it again.

Python’s Search for Module

So far we have left our modules in the same directory with our project script. But in real projects, we would like to keep our modules and packages in a separate location. So to mimic that, let’s copy our two modules to a different folder, and let’s call this folder Silly_Anonymizer.

So how do we add this location to our Python project?

Python maintains a list of locations or folders in path variable from sys module where Python searches for modules.

It searches the locations inside sys.path in the order they are stored in the list starting with the location where the script's execution happens.

import syssys.path['C:\\Users\\ahfah\\Desktop\\Curious-Joe\\content\\post\\2202-04-02-oop-python-package',
'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\python39.zip',
'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\DLLs',
'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\lib',
'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39',
'c:\\Python_Envs\\pcap02',
'',
'c:\\Python_Envs\\pcap02\\lib\\site-packages',
'c:\\Python_Envs\\pcap02\\lib\\site-packages\\win32',
'c:\\Python_Envs\\pcap02\\lib\\site-packages\\win32\\lib',
'c:\\Python_Envs\\pcap02\\lib\\site-packages\\Pythonwin']

We can add our custom module location to this list and make sure that Python knows where to find the module. The Silly_Anonymizer module directory is located here on my workstation: C:\Users\ahfah\Desktop\Anonymizer\Silly_Anonymizer. Let's add that and check that out.

sys.path.append('C:\\Users\\ahfah\\Desktop\\Anonymizer\\Silly_Anonymizer\\')sys.path[len(sys.path)-1]'C:\\Users\\ahfah\\Desktop\\Anonymizer\\Silly_Anonymizer\\'

🛑 Note the double backslashes. Backslashes are used to escape other characters so we need to use double backslashes to have Python understand that we are looking for a literal backslash.

Building a Package

Putting our modules inside Silly_Anonymizer is the first direct step towards making a Package. Let’s create two sub-directories inside Silly_AnonymizerNonStringOperation and StringOperation, so that in the future if we have more modules we can store them based on their types of tasks - manipulating strings, or manipulating non-string operations. For now, let's move our two modules inside these two sub-directories: idfier.py inside NonStringOperation, and stringProcess.py inside StringOperation. So the folder structure should look like this:

Image by author

Looking at the Silly_Anonymizer directory as our package shows us the directory structure of the Python package!

A concrete view of the function, module, and package relationship for our Silly_Anonymizer package is as follows:

Image by author

Initializing a Package

Like Modules, Python Packages also need an initializer. To do that we need to include a file called __init__.py in the root directory of Silly_Anonymizer. But since a Package is not a file we can't do that as part of a function and hence this separate file is used for initialization. It can be left empty but it needs to be present at the root directory of a module directory to be considered as a Package.

So after adding __init__.py the Anonymizer folder should look like this:

Image by author

🛑 Note, that you can have __init__.py in other sub-folders too depending on if you need any special initialization for them or just want to consider them as a sub-package. Which we will do later on in our package too.

Importing a Module from Package

Once we have __init__.py file in our module's home directory, we are ready to use it as a Package. To import a module from a Python package we need to use a fully qualified path from the root of the package. In our case, for the module - stringProcess the import would look as follows:

import Anonymizer.StringOperation.stringProcess as sp> stringProcess is used as a module
sp.stringSplit(string="Arafath, Samuel, Tiara, Nathan, Moez", separator=",")> ['Arafath', ' Samuel', ' Tiara', ' Nathan', ' Moez']

💡 Python can also read packages from the compressed locations.

Python packages can be imported from zip folder too. If you notice the output from sys.path you may already find some zip folders in the list. That's because Python treats the zip folders as regular folders.

🛑 Try zipping Simmy_Anonymizer into a zipped folder Silly_Anonymizer.zip and try to import it.

Publishing a Package

We have built our package and it’s ready to be used locally. Now let’s talk very briefly about how we can make it available for others to use. For this purpose, we will use PyPi. Python Packaging Index or PyPi is the most commonly used repository to host Python packages.

⚠️ A Word of caution

The steps for package publishing explained here is the bare minimum requirement to publish a package. Use it as the stepping stone and then explore the official documentation to understand the nitty-gritty of package publication. I will add a couple of resources as references.

Preparation

To make our package ready for upload, let’s add the following files to the directory where our module directory is located:

Add a Git Repository for Silly_Anonymizer

Create a remote repository in GitHub, or any other distributed version control solutions, and add the Silly_Anonymizer package to the repo. You can check mine here.

Add a readme.md file

A README file give the users a description of the project. In our case, it will describe the users, what the Silly_Anonymizer package is about, how to use it and so on. Check the GitHub repo for a sample readme file.

Add a setup.py file

This is the main file required to successfully prepare a package for PyPi upload. This file contains some basic instructions that ensure that this local directory is prepared properly for the upload to PyPI. Check out the sample file in the GitHub repo.

The information in the setup files facilitates the package development, hosting, and maintenance. The three absolute minimum required properties are name, version, and packages. For detail about these and all the parameters checkout the official documentation.

With these files included the file directory should look like this:

Image by author

Building the Distribution Package

PyPi distributes Python package source codes wrapped inside distribution packages. Two of the commonly used distribution packages are source archives and Python wheels. To create to source archive and wheel for our package we will use a package called twine and run python setup.py sdist bdist_wheel.

This should create two files in the newly created directory called dist - a source archive (.tar.gz file) and a wheel (.whl file). Checkout the source archive file to make sure all the source codes are populated inside it.

Quick Check

We are ready to push our package to PyPi. Before pushing it to PyPi, let’s run twine check dist/* to quickly check if the package will properly render in PyPi. If everything goes properly, you should see PASSED printed on the screen after running the check.

Upload

PyPi has a test version of it, which we can use for learning and testing purposes. We will use the PyPi test to host our package. Before doing that, make sure you register in PyPi test.

Once you are registered, run twine upload --repository-url https://test.pypi.org/legacy/ dist/*. Enter your username and password once prompted.

🛑 If you have followed the article so far, you will not likely be able to upload the package with the same name as I have already uploaded it. Give it a different name and re-build the package.

🛑 Use Test PyPi search to find out if your package name is available.

That’s it! You should see the location of your newly created Python package on your console. Checkout mine here.

What’s Next?

In this post, I tried to take someone from the functional understanding of writing Python functions to build his/her first Python package. Before ending by reiterating what was stated earlier, this article is by no means a detailed tutorial on how to build a package for production. In a production code, it’s imperative that you add rigorous testing. Also, it’s unlikely that you’d build a package without any dependencies! I didn’t cover either of these, so make sure you learn about them.

As promised, here are a couple of resources that you can use to supplement your understanding:

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Arafath Hossain
Arafath Hossain

Written by Arafath Hossain

Curious Learner and Passionate Data Professional. Read my personal blog: https://www.laminarinsight.com/

Responses (2)

Write a response

Well written and "packaged"! This was the clearest explanation I've found on modules to packages. Following... =)

--

It seems to me that you have skipped over the most valuable use of packages. They can hold classes. which are much more versatile than functions, especially if you have to initialize them as you do here. I would also note that you could guarantee that your ID numbers are unique if you kept them in a set.

--