Extract Keywords Using spaCy in Python
Find the top keywords from an article and generate hashtags

In this piece, you'll learn how to extract the most important keywords from a chunk of text, whether that is an article, an academic paper, or even a short tweet. You can use the result to generate hashtags, score the importance of a sentence, and so on.
I will be using an industrial-strength natural language processing module called spaCy for this tutorial. I previously wrote a tutorial on similarity matching using spaCy; feel free to check it out. There are three sections in this tutorial:
- Setup
- Implementation
- Conclusion
1. Setup
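We will install the spaCy module with pip. On older spaCy releases that create a symlink when you download a language model, administrative privileges are required, so open a terminal in administrator mode if you are on one of those releases. Recent releases (spaCy v3) install models as regular Python packages, so this should not be needed there. It's highly recommended to create a virtual environment before you run the following command: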
pip install -U spacy
The next step is to download the language model of your choice. I will be using the large English model for this tutorial. Feel free to check the official website for the complete list of available models.
en_core_web_lg (large)
python -m spacy download en_core_web_lg
The model is about 800MB, although the exact size varies between releases. If you just want to try things out, download one of the smaller language models instead.
en_core_web_md (medium)
The medium model is much smaller at just 100MB.
python -m spacy download en_core_web_md
en_core_web_sm (small)
The smallest English language model should take only a moment to download as it’s around 11MB.
python -m spacy download en_core_web_sm
When you're done, run the following command to check that spaCy is set up properly. It lists the models that have been installed and reports whether each one is compatible with your spaCy version.
python -m spacy validate
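As a complementary check from inside Python, here is a minimal sketch that looks for the three model packages named above and loads the largest one it finds. The helper names `installed_models` and `load_best_model` are hypothetical (they are not part of spaCy), and the sketch assumes the models were installed as regular Python packages, which is how the download command installs them in recent spaCy releases.

```python
import importlib.util

# Model packages mentioned in this tutorial, largest first.
CANDIDATES = ("en_core_web_lg", "en_core_web_md", "en_core_web_sm")


def installed_models(candidates=CANDIDATES):
    """Return the candidate model packages that are importable here.

    Hypothetical helper: relies on models being installed as Python packages.
    """
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


def load_best_model(candidates=CANDIDATES):
    """Load the first installed candidate, or return None if nothing is usable."""
    if importlib.util.find_spec("spacy") is None:
        return None  # spaCy itself is not installed
    import spacy
    for name in installed_models(candidates):
        return spacy.load(name)
    return None


nlp = load_best_model()
if nlp is None:
    print("No model found; run one of the download commands above.")
else:
    # Run a tiny sentence through the pipeline to confirm it works.
    doc = nlp("spaCy is ready to extract keywords.")
    print(nlp.meta["name"], [token.text for token in doc])
```

If a model is found, the script prints the model's name and the tokens of the test sentence; otherwise it points you back to the download step.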