Better Programming

How to Scrape Modern Websites Without Headless Browsers

Aris Pattakos
Published in Better Programming
9 min read · Jan 5, 2021
Photo by Christopher Gower on Unsplash

Many developers think that web scraping is hard, too slow, or difficult to scale — especially when using headless browsers. In my experience, you can scrape modern websites without even using headless browsers. It’s easy, fast, and highly scalable.

Instead of Selenium, Puppeteer, or any other headless browser solution, we’ll use nothing more than Python’s requests library. I’ll explain how you can scrape information from the public APIs that most modern websites consume in their front end.

With traditional web pages, your goal is to parse the HTML and extract the relevant information. With modern websites, the initial response often contains very little HTML because the data is fetched asynchronously after the first request. That is why most people reach for headless browsers: the browser executes the JavaScript and makes the follow-up requests, so you can then parse the complete page.
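To make that difference concrete, here is a minimal sketch that uses only the standard library. Both the HTML shell and the JSON payload are made-up stand-ins, not responses from any real site. It shows that parsing the first response of a JavaScript-driven page yields no content, while the data lives in a separate response.

```python
import json
from html.parser import HTMLParser

# Hypothetical first response of a JavaScript-driven page: an empty shell.
INITIAL_HTML = """
<html>
  <head><title>Product reviews</title></head>
  <body>
    <div id="root"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""

# Hypothetical JSON that the front end would fetch afterwards.
API_RESPONSE = '{"reviews": [{"author": "Ann", "rating": 5, "text": "Great"}]}'


class BodyTextCollector(HTMLParser):
    """Collects the visible text inside <body>, skipping <script> content."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that a visitor would actually see.
        if self.in_body and not self.in_script and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    parser = BodyTextCollector()
    parser.feed(html)
    return parser.chunks


print(visible_body_text(INITIAL_HTML))      # the shell has no text to parse
print(json.loads(API_RESPONSE)["reviews"])  # the data the page would render
```

A headless browser closes this gap by running the script for you. The alternative is to request the second response yourself.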

But there’s another method that you can use quite often.

Scrape Public APIs

Let’s look at how you can consume the APIs that websites use to fetch their data. I will scrape Amazon product reviews and show you how to do the same thing. If you follow the process that I outline, you may be surprised by how easy it is to build.
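Before the walkthrough, here is a rough sketch of the general pattern, assuming an endpoint that returns JSON and accepts a page number. The URL, the `page` parameter, and the `items` key are all placeholders for illustration and do not describe Amazon's actual endpoint. The functions take a `session` argument that is expected to behave like `requests.Session`.

```python
def fetch_page(session, url, page):
    """Request one page of results and return the decoded JSON body.

    `session` is expected to behave like `requests.Session`.
    """
    response = session.get(
        url,
        params={"page": page},                   # hypothetical query parameter
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()                  # fail loudly on 4xx/5xx
    return response.json()


def fetch_all(session, url, max_pages=50):
    """Walk pages until the endpoint returns an empty list of items."""
    items = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(session, url, page)
        batch = payload.get("items", [])         # hypothetical response key
        if not batch:
            break
        items.extend(batch)
    return items


# Usage with the real library (the URL is a placeholder):
# import requests
# with requests.Session() as session:
#     reviews = fetch_all(session, "https://example.com/api/reviews")
```

Passing the session in keeps connection reuse in one place and makes the functions easy to exercise with a stub instead of a live site.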

The goal is to extract all reviews for a specific product. To follow along, click here or pick any other product.

Screenshot of the product.

Our goal is to extract as much information as we can about each review. Remember: whenever you scrape data, it pays to be greedy. If you skip a field now and need it later, you will have to run the entire process again just to add it. The HTTP requests are the expensive part of scraping, so extracting a few extra fields costs very little. What you should try to minimize is the number of requests.
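One way to be greedy, sketched here under assumptions, is to save every raw response to disk and extract fields in a separate step. The file layout, the `reviews` key, and the field names below are invented for the example. The point is that a second extraction pass reads from disk and makes no new requests.

```python
import json
import tempfile
from pathlib import Path


def save_raw(payload, directory, page):
    """Write one response body to disk untouched, one file per request."""
    path = Path(directory) / f"page_{page:04d}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def load_all(directory):
    """Re-read every saved payload in page order, without any HTTP calls."""
    files = sorted(Path(directory).glob("page_*.json"))
    return [json.loads(f.read_text(encoding="utf-8")) for f in files]


def extract(payloads, fields):
    """Pull the chosen fields out of each saved review."""
    rows = []
    for payload in payloads:
        for review in payload.get("reviews", []):
            rows.append({name: review.get(name) for name in fields})
    return rows


# Made-up payload standing in for one API response.
sample = {"reviews": [{"author": "Ann", "rating": 5, "verified": True}]}

with tempfile.TemporaryDirectory() as tmp:
    save_raw(sample, tmp, page=1)
    saved = load_all(tmp)
    first_pass = extract(saved, ["author", "rating"])
    # Later you decide you also need "verified": no new requests required.
    second_pass = extract(saved, ["author", "rating", "verified"])

print(first_pass)
print(second_pass)
```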

After going to the product page, clicking on the ratings, and then going to “See all reviews”, this is what we see:

Written by Aris Pattakos

Lead Software Engineer @Flash Pack - I post programming advice on https://www.bestpractices.tech/
