2 Use Cases of Python Pre-commit Hooks to Tidy Up Your Git Repositories
Strategies to have a better-organized codebase
A well-organized repo leads to well-organized code and a better developer experience.
In this article, I’ll show you two use cases of pre-commit that help maintain a better repository.
Introducing pre-commit
Pre-commit is a Python-based tool that lets you ‘hook’ into your Git repository and run simple checks whenever changes are committed.
It’s written and maintained by the awesome Python developer and YouTuber Anthony Sottile.
With pre-commit, developers can take advantage of Git’s distributed architecture and run short, simple code checks locally before submitting to code review.
This saves a lot of redundant, counterproductive back-and-forth between team members and lets the code reviewer focus on the actual implementation rather than on surface-level concerns.
By using pre-commit and code review before running CI/CD builds, we can shorten delivery time and save some of the costs involved.
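To make the idea of a hook concrete: Git runs the executable at .git/hooks/pre-commit before each commit and aborts the commit if it exits with a non-zero status. Below is a minimal sketch of such a check written by hand in Python. The function names and the choice of check (trailing whitespace) are assumptions for illustration and are not part of pre-commit itself.

```python
from pathlib import Path


def find_trailing_whitespace(path):
    """Return the 1-based numbers of lines in `path` that end with spaces or tabs."""
    bad_lines = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip(" \t"):
            bad_lines.append(number)
    return bad_lines


def run_checks(paths):
    """Check every file; return 1 (block the commit) if any problem is found, else 0."""
    exit_code = 0
    for path in paths:
        for number in find_trailing_whitespace(path):
            print(f"{path}:{number}: trailing whitespace")
            exit_code = 1
    return exit_code
```

Used as a hook, a script like this would end with `sys.exit(run_checks(sys.argv[1:]))`, since pre-commit passes the staged file names to a hook as command-line arguments. Pre-commit's value is that it manages and runs such checks for you from one configuration file.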
Quick start
To start with pre-commit you only need 3 steps:
1. Install pre-commit from pip:
pip install pre-commit
2. Create the pre-commit configuration file in your repository (more on that shortly)
3. Install the pre-commit hook into your Git repo:
pre-commit install
From that moment on, the hooks will run according to your configuration file every time you commit new code.
Use-case 1: Clean outputs
Let's take the example of Jupyter notebooks.
A Jupyter notebook stores both the source code and its outputs in a single JSON file.
pip install pandas plotly jupyter
If you commit the notebook as is, you commit its outputs too, which can take up a lot of space and clutter the repository.
So let's create a hook that clears the notebook outputs on commit.
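To see what "clearing outputs" means at the file level, here is a minimal sketch that uses only the standard library. It assumes the nbformat 4 layout, where each code cell carries `outputs` and `execution_count` keys. The function name `clear_outputs` is hypothetical, and this is a simplified illustration rather than how nbconvert implements it.

```python
import json


def clear_outputs(notebook):
    """Return a copy of a notebook dict (nbformat 4) with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy of JSON-like data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-made notebook with one executed code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(notebook)
# The source survives; only the stored results are dropped.
print(len(json.dumps(notebook)), "->", len(json.dumps(cleaned)))
```

With plots or large data frames in the outputs, the size difference is far bigger than in this toy example.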
1. Let's initialize our git repo:
git init
2. Now let's add a pre-commit configuration file:
3. Install the hook:
pre-commit install
>>>
pre-commit installed at .git\hooks\pre-commit
4. Run git status:
git status
>>>
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
.pre-commit-config.yaml
notebooks/
I’ve placed the notebook under the notebooks directory.
5. Track all the files:
git add --all
>>>
warning: LF will be replaced by CRLF in .gitignore.
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in notebooks/notebook.ipynb.
The file will have its original line endings in your working directory
6. Commit the changes:
git commit -m "good commit"
>>>
jupyter-nb-clear-output..................................................Failed
- hook id: jupyter-nb-clear-output
- exit code: 1
C:\workspace-vscode\pre-commit-demo\venv\Scripts\python.EXE: No module named nbconvert
Our pre-commit hook failed because it couldn’t find the nbconvert module.
We need to install it:
pip install nbconvert
And commit again:
git commit -m "good commit"
>>>
jupyter-nb-clear-output..................................................Failed
- hook id: jupyter-nb-clear-output
- files were modified by this hook
[NbConvertApp] Converting notebook notebooks/notebook.ipynb to notebook
[NbConvertApp] Writing 878 bytes to notebooks\notebook.ipynb
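Now pre-commit notifies us that the file has been modified: the hook stripped the outputs from the notebook.
Pre-commit marks a hook as failed whenever it modifies files, so the commit is aborted and the cleaned file has to be staged again.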
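The idea behind "files were modified by this hook" can be sketched as a before-and-after comparison of the files a hook touches. The helper names and the hash-based comparison below are assumptions for illustration only; pre-commit's own implementation may work differently.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return a SHA-256 digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_and_detect_changes(paths, hook):
    """Run `hook` on each path and return the paths whose contents changed."""
    before = {path: file_digest(path) for path in paths}
    for path in paths:
        hook(path)
    return [path for path in paths if file_digest(path) != before[path]]


def strip_trailing_spaces(path):
    """Example fixer hook: rewrite the file without trailing spaces or tabs."""
    text = Path(path).read_text(encoding="utf-8")
    cleaned = "".join(line.rstrip(" \t") + "\n" for line in text.splitlines())
    Path(path).write_text(cleaned, encoding="utf-8")
```

A non-empty result from `run_and_detect_changes` corresponds to the "Failed" status above: the fix was applied in the working tree, but it is not yet part of the commit.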
Now we need to add the changes:
git add --all
And commit:
git commit -m "good commit"
>>>
jupyter-nb-clear-output..................................................Passed
[main (root-commit) 8560ba3] good commit
4 files changed, 379 insertions(+)
create mode 100644 .gitignore
create mode 100644 .pre-commit-config.yaml
create mode 100644 notebooks/notebook.ipynb
create mode 100644 requirements.txt
Use-case 2: Static code analysis
This can be divided into 2 domains:
Linting
Let’s create a new directory named scripts and add a file named somescript.py.
In this file, we will write some useless code :)
I’ve already written about the benefits of linting elsewhere; now it’s time to add linting to our workflow.
1. Run git status:
git status
>>>
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
scripts
2. Add a new hook to the pre-commit config file:
We’ve added a new hook with the id pylint to our pre-commit config.
3. Install the new hook configuration:
pre-commit install
>>>
pre-commit installed at .git\hooks\pre-commit
4. Track changes:
git add --all
5. Commit:
git commit -m "commit #2"
>>>
jupyter-nb-clear-output..............................(no files to check)Skipped
pylint...................................................................Failed
- hook id: pylint
- exit code: 16
PYLINTHOME is now 'C:\Users\user\AppData\Local\pylint\pylint\Cache' but obsolescent 'C:\Users\user\.pylint.d' is found; you can safely remove the latter
************* Module somescript
scripts\somescript.py:1:0: C0114: Missing module docstring (missing-module-docstring)
-----------------------------------
Your code has been rated at 5.00/10
The commit failed because pylint reported a missing module docstring (C0114).
Now let’s add a module docstring at the top of the script.
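A module docstring is simply a string literal placed as the first statement of the file. As a rough illustration of what the C0114 message is about, the sketch below uses the standard `ast` module to test for one. The helper name `has_module_docstring` and both sample sources are hypothetical, and this is not how pylint itself is implemented.

```python
import ast


def has_module_docstring(source):
    """Return True if the Python source text starts with a module docstring."""
    return ast.get_docstring(ast.parse(source)) is not None


# Hypothetical script contents, before and after the fix.
without_docstring = "import os\n\nprint(os.getcwd())\n"
with_docstring = (
    '"""Print the current working directory."""\n'
    "import os\n\nprint(os.getcwd())\n"
)

print(has_module_docstring(without_docstring))  # False
print(has_module_docstring(with_docstring))  # True
```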
Add the changes:
git add --all
And commit:
git commit -m "commit #2"
jupyter-nb-clear-output..............................(no files to check)Skipped
pylint...................................................................Passed
[master b2d3105] commit #2
2 files changed, 8 insertions(+), 2 deletions(-)
Commit passed :)
Notice that the hook for Jupyter notebooks was skipped because the commit contained no notebook changes.
Pre-commit only runs a hook on the staged files it applies to, which makes it extremely efficient.
Formatting
Just like linting, we can trigger auto-formatting packages such as black.
We added the function foo to our original script, but foo is badly formatted.
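The function itself is not reproduced here, so the following is a hypothetical stand-in that shows the kind of layout problem a formatter fixes. The "after" version follows black's usual style for such code: one space after commas and single spaces around binary operators. In a real file the function name would stay the same; the two names exist only so that both versions can run side by side.

```python
# Hypothetical "before": valid Python, but inconsistently spaced.
def foo_before(a,b   ,c):
    return a+b    +c


# The same function laid out in black's style.
def foo_after(a, b, c):
    return a + b + c


# Formatting changes the layout only, never the behavior.
print(foo_before(1, 2, 3) == foo_after(1, 2, 3))  # True
```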
1. Add the hook configuration:
2. Install the new hook:
pre-commit install
>>>
pre-commit installed at .git\hooks\pre-commit
3. Track the changes:
git add --all
4. Commit:
git commit -m "add function"
>>>
jupyter-nb-clear-output..............................(no files to check)Skipped
pylint...................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1
Executable `black` not found
Our commit failed because the black executable was not found.
Let’s install it:
pip install black
And commit again:
git commit -m "add function"
>>>
jupyter-nb-clear-output..............................(no files to check)Skipped
pylint...................................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted scripts\somescript.py
All done! \u2728 \U0001f370 \u2728
1 file reformatted.
The file has been formatted!
Much better!
Now let’s add the changes:
git add --all
And commit:
git commit -m "add function"
jupyter-nb-clear-output..............................(no files to check)Skipped
pylint...................................................................Passed
black....................................................................Passed
[master 4bd3217] add function
2 files changed, 12 insertions(+)
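The configuration file edited in each walkthrough is not reproduced in the text. Below is a sketch of what the final .pre-commit-config.yaml might look like, wrapped in a small Python helper that writes it to disk. It assumes all three hooks are local hooks with `language: system`, which is consistent with the "No module named nbconvert" and "Executable `black` not found" errors seen earlier. The exact entries, file filters and the helper name `write_config` are assumptions, not the author's actual file.

```python
from pathlib import Path

# Assumed content of .pre-commit-config.yaml, inferred from the hook ids and
# error messages in the walkthrough; it is not the author's actual file.
CONFIG_TEXT = r"""repos:
  - repo: local
    hooks:
      - id: jupyter-nb-clear-output
        name: jupyter-nb-clear-output
        language: system
        files: \.ipynb$
        entry: python -m nbconvert --ClearOutputPreprocessor.enabled=True --inplace
      - id: pylint
        name: pylint
        language: system
        types: [python]
        entry: pylint
      - id: black
        name: black
        language: system
        types: [python]
        entry: black
"""


def write_config(repo_dir="."):
    """Write the sketch config into `repo_dir` and return the file path."""
    config_path = Path(repo_dir) / ".pre-commit-config.yaml"
    config_path.write_text(CONFIG_TEXT, encoding="utf-8")
    return config_path
```

Calling `write_config(".")` from the repository root would create the file. With `language: system`, pre-commit uses whatever tools are installed in the active environment, which is why nbconvert and black had to be installed with pip before their hooks could run.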
Conclusion
Pre-commit can help you quickly identify problems in your local clone of the code without going all the way to a CI/CD build or code review.
It allows you to effectively enforce coding standards and conventions across the organization and keep a better-organized codebase.