How to Turn the Web Into Data With Python and Scrapy
A guide to web scraping powered by Python and Scrapy

This tutorial is a practical guide to learning web scraping with Python. First, I’ll walk you through some basic examples to familiarize you with web scraping. Later on, we’ll use that knowledge to extract football match data from Livescore.
Without further ado, let’s begin.
Getting Started
To get started, you’ll need to create a new Python 3 project and install Scrapy (a web-scraping and web-crawling library for Python). I’m using Pipenv for this tutorial, but you can use pip and venv — or conda.
pipenv install scrapy
At this point, you have Scrapy installed, but you still need to create a new web-scraping project. For that, Scrapy provides a command-line interface that does the work for us.
Let’s now create a new project named web_scraper by using the Scrapy CLI.
If you’re using Pipenv like me, use:
pipenv run scrapy startproject web_scraper .
Otherwise, launch from your virtual environment using:
scrapy startproject web_scraper .
This will create a basic project in the current directory with the following structure:

scrapy.cfg
web_scraper/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
Building Our First Spider With XPath Queries
We’ll start our web-scraping tutorial with a very simple example: locating the Live Code Stream logo within the website’s HTML. The logo is just text, not an image, so we’ll simply extract that text.
The code
To get started, we need to create a new spider for this project. We can do that by either creating a new file or using the CLI.
Since we already know the code we need, we’ll create a new Python file at /web_scraper/spiders/live_code_stream.py.