How to Turn the Web Into Data With Python and Scrapy
A guide to web scraping powered by Python and Scrapy

This tutorial is a practical guide to learning web scraping with Python. First, I’ll walk you through some basic examples to familiarize you with web scraping. Later on, we’ll use that knowledge to extract football match data from Livescore.
Without further ado, let’s begin.
Getting Started
To get started, you’ll need to create a new Python 3 project and install Scrapy (a web-scraping and web-crawling library for Python). I’m using Pipenv for this tutorial, but you can use pip and venv — or conda.
pipenv install scrapy
At this point, you have Scrapy installed, but you still need to create a new web-scraping project. For that, Scrapy provides a command-line interface that does the work for us.
Let’s now create a new project named web_scraper by using the Scrapy CLI.
If you’re using Pipenv like me, use:
pipenv run scrapy startproject web_scraper .
Otherwise, launch from your virtual environment using:
scrapy startproject web_scraper .
This will create a basic project in the current directory with the following structure:

scrapy.cfg
web_scraper/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
Building Our First Spider With XPath Queries
We’ll start our web-scraping tutorial with a very simple example: locating the Live Code Stream logo within the website’s HTML. The logo is just text, not an image, so we’ll simply extract that text.
The code
To get started, we need to create a new spider for this project. We can do that by either creating a new file or using the CLI.
Since we already know the code we need, we’ll create a new Python file at /web_scraper/spiders/live_code_stream.py.