How to Scrape Modern Websites Without Headless Browsers
Using Python and common sense

Many developers think that web scraping is hard, slow, or difficult to scale — especially when using headless browsers. In my experience, you can scrape most modern websites without a headless browser at all, and the result is easy to build, fast, and highly scalable.
Instead of using Selenium, Puppeteer, or any other headless browser solution, we’ll use nothing but Python’s requests library. I’ll explain how you can scrape information from the public APIs that most modern websites consume in their front ends.
In traditional web pages, your goal is to parse the HTML and extract the relevant information. In modern websites, the front end will likely not contain a lot of HTML because the data is fetched asynchronously after the first request. Most people use headless browsers for that reason — the headless browser can execute JavaScript, make further requests, and you can then parse the complete page.
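Here is a minimal sketch of the direct approach: call the JSON endpoint the front end uses instead of rendering the page. The endpoint URL below is hypothetical — in practice you find the real one in your browser’s dev tools (Network tab, filtered to XHR/Fetch) while the page loads.

```python
import requests

def api_session():
    """A session with browser-like headers; many JSON APIs reject the default UA."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
        "Accept": "application/json",
    })
    return session

# Hypothetical endpoint for illustration only — replace with the URL
# you see the site request in the Network tab.
API_URL = "https://example.com/api/v1/reviews?page=1"

def fetch_json(url, session=None):
    """Fetch the front end's data directly — no JavaScript execution needed."""
    resp = (session or api_session()).get(url, timeout=10)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return resp.json()
```

Because each call is just one lightweight HTTP request returning structured JSON, this approach is far faster and cheaper to scale than driving a full browser.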
But there’s another method that you can use quite often.
Scrape Public APIs
Let’s look at how you can consume the APIs that websites use to load their data. I will scrape Amazon product reviews and show you how to do the same thing. If you follow the process I outline, you may be surprised how easy it is to build.
The goal is to extract all product reviews for a specific product. To follow along, open any product page on Amazon.

Our goal is to extract as much information as we can from each review. Remember: whenever you are scraping data, it pays to be greedy. If you skip some fields now, you would have to run the entire process again just to add them later. Since the expensive part of scraping is the HTTP requests themselves, the extra processing costs little — what you want to minimize is the number of requests.
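The “be greedy” principle can be sketched as keeping every field a response gives you rather than cherry-picking the ones you think you need today. The review payload below is hypothetical, just to illustrate the shape such an API might return:

```python
# Hypothetical review payload of the kind a reviews API might return.
raw_review = {
    "id": "R1ABC",
    "rating": "5",
    "title": "Great product",
    "body": "Works as advertised.",
    "author": "jane",
    "verified": True,
    "date": "2021-03-01",
}

def normalize_review(raw):
    """Keep every field we received; processing is cheap, re-requesting is not."""
    review = dict(raw)                    # copy everything, don't cherry-pick
    review["rating"] = int(raw["rating"])  # light cleanup is fine on top
    return review
```

This way, adding a new column to your analysis later is a reprocessing step over data you already have, not a second crawl.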
After going to the product page, clicking on the ratings, and then going to “See all reviews”, this is what we see: