Better Programming

How to Turn the Web Into Data With Python and Scrapy

Juan Cruz Martinez
Published in Better Programming · 7 min read · Nov 23, 2020
A blurred image of HTML code with certain parts in focus: Peru, Argentina, 0.2, and 20:45.
Image by the author

This tutorial is a hands-on guide to web scraping with Python. First, I’ll walk you through some basic examples to get you familiar with web scraping. Later on, we’ll use that knowledge to extract football match data from Livescore.

Without further ado, let’s begin.

Getting Started

To get started, you’ll need to create a new Python 3 project and install Scrapy (a web-scraping and web-crawling framework for Python). I’m using Pipenv for this tutorial, but you can also use pip and venv, or conda.

pipenv install scrapy

At this point, Scrapy is installed, but you still need to create a web-scraping project. Scrapy provides a command-line tool that does this for us.

Let’s now create a new project named web_scraper by using the Scrapy CLI.

If you’re using Pipenv like me, use:

pipenv run scrapy startproject web_scraper .

Otherwise, run the command from your activated virtual environment:

scrapy startproject web_scraper .

This will create a basic project in the current directory with the following structure:

scrapy.cfg
web_scraper/
    __init__.py
    items.py
    middlewares.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
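Most of these files start out as templates. scrapy.cfg points Scrapy at web_scraper/settings.py, and among its many commented-out options the generated settings file contains lines like the excerpt below. This is only an excerpt: quoting style and the extra settings that follow vary between Scrapy versions.

```python
# web_scraper/settings.py (excerpt of what `scrapy startproject` generates;
# newer Scrapy versions add more settings after these)

# The project's bot name, also used in Scrapy's logging.
BOT_NAME = "web_scraper"

# Packages where Scrapy looks for spider classes (used by `scrapy crawl`)...
SPIDER_MODULES = ["web_scraper.spiders"]
# ...and the package where `scrapy genspider` puts newly generated spiders.
NEWSPIDER_MODULE = "web_scraper.spiders"

# Respect the target site's robots.txt rules.
ROBOTSTXT_OBEY = True
```

SPIDER_MODULES is the reason a spider file dropped into web_scraper/spiders/ is picked up without any extra registration.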

Building Our First Spider With XPath Queries

We’ll start our web-scraping tutorial with a very simple example: locating the Live Code Stream logo within the website’s HTML. The logo is plain text rather than an image, so we’ll simply extract that text.

The code

To get started, we need to create a new spider for this project. We can do that either by creating a new file by hand or by using the CLI (the scrapy genspider command).

Since we already know the code we need, we’ll create a new Python file at web_scraper/spiders/live_code_stream.py (relative to the project root).

Written by Juan Cruz Martinez

I stream, blog, and make YouTube videos about tech stuff. I love coding, I love React, and I love building stuff!
