How to Build a Text Summarizer (TL;DR) With Simple Natural Language Processing
Learn how to summarize any online article using Python, NLP, and basic web scraping

Wouldn’t it be great if you could automatically get a summary of any online article? Whether you’re too busy or have too many articles on your reading list, sometimes all you really want is a short article summary.
That’s why TL;DR is so commonly used these days. While this internet acronym can criticize a piece of writing as overly long, it is often used to give a helpful summary of a much longer story or complicated phenomenon. Today, we will build a TL;DR for any given article.
Getting Started
For this tutorial, we’ll be using two Python libraries:
- Beautiful Soup for web scraping, that is, parsing the HTML of the article page.
“Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.” — Beautiful Soup’s documentation
- NLTK (Natural Language Toolkit) for text summarization.
“NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries.” — NLTK’s documentation
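To give a feel for what "pulling data out of HTML" looks like, here is a minimal sketch, assuming Beautiful Soup is already installed. The `SAMPLE_HTML` string and the `paragraph_texts` helper are hypothetical names made up for illustration and are not part of this tutorial's code.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded article page.
SAMPLE_HTML = """
<html><body>
  <h1>Example article</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""


def paragraph_texts(html):
    """Return the text of every <p> element, in document order."""
    # "html.parser" is the parser bundled with Python's standard library.
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE_HTML):
        print(text)
```

The same `find_all("p")` call works on a real page once its HTML has been fetched. Nested tags such as `<b>` are flattened into plain text by `get_text()`.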
Go ahead and get familiar with the libraries before continuing. Also, make sure to install them locally. Alternatively, run this command within the project repo directory:
pip install -r requirements.txt
Next, we will download the stopwords corpus separately, since NLTK does not bundle its corpora with the package. Open an interactive Python shell and enter:
import nltk
nltk.download("stopwords")
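If you would rather not type this by hand every time, the download can be wrapped in a small check that runs from a script. This is only a sketch: `ensure_stopwords` is a hypothetical helper name, and it assumes that NLTK reports a missing resource by raising `LookupError` from `nltk.data.find`.

```python
import nltk


def ensure_stopwords():
    """Download the stopwords corpus only if NLTK cannot find it locally.

    Returns True if a download was attempted, False if the corpus
    was already available.
    """
    try:
        # Raises LookupError when the resource is not on NLTK's data path.
        nltk.data.find("corpora/stopwords")
        return False
    except LookupError:
        # quiet=True suppresses the progress messages printed by default.
        nltk.download("stopwords", quiet=True)
        return True
```

Calling `ensure_stopwords()` once at the top of a script, before reading `nltk.corpus.stopwords`, means the first run fetches the corpus and later runs skip the download.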
Text Summarization Using NLP
Let’s describe the algorithm:
- Get URL from user input.