Better Programming

Serverless Puppeteer — Use Cases in 2022

Emil Hein
Published in Better Programming
Jun 13, 2022 · 5 min read

Photo by Sivani B on Unsplash

Earlier, I wrote about how to get Puppeteer up and running on AWS using the Serverless Framework and AWS Lambda.

Below I will describe some use cases and give examples of how to implement some of them.

Intercept network traffic

Reading the network traffic from a website

I use this to time how the browser loads different scripts. A short function for this takes a Puppeteer page object as well as a URL to navigate to.
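A minimal sketch of such a function could look like the one below. The names `isScriptLoaded`, `scriptUrl` and `timeoutMs` are made up for this example; `page.on('request', ...)` and `request.url()` are Puppeteer's own. The `page` argument is assumed to come from `browser.newPage()`.

```javascript
// Resolves with { loaded, elapsedMs }: `loaded` is true if the page requested
// a URL containing `scriptUrl` before `timeoutMs` ran out, otherwise false.
// `scriptUrl` and `timeoutMs` are hypothetical parameters for this sketch.
async function isScriptLoaded(page, url, scriptUrl, timeoutMs = 20000) {
  const start = Date.now();
  let onRequest;
  let timer;

  // Settles as soon as a matching request is seen
  const requested = new Promise((resolve) => {
    onRequest = (request) => {
      if (request.url().includes(scriptUrl)) resolve(true);
    };
    page.on('request', onRequest);
  });

  // Settles when we give up waiting
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });

  try {
    // Navigation is not awaited on its own, so a slow page cannot outlast the timeout.
    // Navigation errors are ignored in this sketch; the timeout then decides the result.
    page.goto(url).catch(() => {});
    const loaded = await Promise.race([requested, timedOut]);
    return { loaded, elapsedMs: Date.now() - start };
  } finally {
    // Always clean up the timer and the listener
    clearTimeout(timer);
    page.off('request', onRequest);
  }
}
```

The `elapsedMs` value is measured from the moment the function is called, which is good enough for rough comparisons between scripts.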

The idea is to check whether a specific URL is being loaded by the site. I have set up a timeout that fires after 20 seconds, at which point, in my case, I can safely say the script has not been loaded.

Take screenshots

This is also a pretty basic use case, but it gives you visual feedback that would otherwise be hard to get. It can be extremely useful for automated tests, or for automatically sending your clients screenshots of a given page.

Taking a screenshot with Puppeteer is extremely easy:
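As a rough sketch, assuming you already have a `page` from `browser.newPage()`, it could be as small as this. The function name `takeScreenshot` is made up; `page.goto` and `page.screenshot` with the `type` and `fullPage` options are part of Puppeteer.

```javascript
// Navigates to `url` and resolves with the screenshot as binary PNG data.
async function takeScreenshot(page, url) {
  // 'networkidle2' waits until there are no more than two open
  // network connections for at least 500 ms
  await page.goto(url, { waitUntil: 'networkidle2' });

  // fullPage captures the whole scrollable page, not just the viewport
  return page.screenshot({ type: 'png', fullPage: true });
}
```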

What you do with the screenshot is up to you, but in my case, I save it in S3 and attach it to a Slack channel for automated tests.

Look through the global context

First of all, what does that even mean?

Imagine you're on a website and you open the developer tools. In the console, you can type document or window and some big objects will be printed out. This is because these (together with many others) are global variables available on the site you are on.

If you own a site and somewhere in your code have the following

globalThis.myVariable = 'Something i make up'

you will be able to type myVariable in the console and get that string back: 'Something i make up'.

This means the global scope can, and will, contain information that is only accessible to that tab. With Puppeteer we can look through these global variables as well, by evaluating code inside the page.
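One possible implementation is sketched below. The helper names `listGlobals` and `readGlobal` are made up for this example; `page.evaluate` is Puppeteer's way of running a function inside the tab and passing the result back to Node.js.

```javascript
// Resolves with the names of all own properties on the page's global object.
async function listGlobals(page, url) {
  await page.goto(url);
  // The callback runs inside the browser tab, not in Node.js
  return page.evaluate(() => Object.getOwnPropertyNames(globalThis));
}

// Resolves with the value of a single global variable on the current page,
// or undefined if it is not set. Only serializable values (strings, numbers,
// plain objects and so on) survive the trip back to Node.js.
async function readGlobal(page, name) {
  return page.evaluate((key) => globalThis[key], name);
}
```

With the site from the example above loaded, `readGlobal(page, 'myVariable')` would resolve with the string that the site assigned.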

Scrape website

This use case is most likely the most used one.

Let's imagine you have a site where you want to get all the paragraphs, or the source paths of all the images. This is the kind of situation where Puppeteer shines.

In these cases we can, with relatively little effort, instruct the browser to return all these values for us.
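A small sketch of that idea, assuming a `page` from `browser.newPage()`: `page.$$eval` runs `document.querySelectorAll` in the tab and hands the matching elements to a callback. The function name `scrapeTextAndImages` and the shape of the returned object are made up for this example.

```javascript
// Resolves with the text of every <p> and the src of every <img> on `url`.
async function scrapeTextAndImages(page, url) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // Both callbacks run inside the browser tab
  const paragraphs = await page.$$eval('p', (elements) =>
    elements.map((el) => el.textContent.trim()));

  const imageSources = await page.$$eval('img', (images) =>
    images.map((img) => img.src));

  return { paragraphs, imageSources };
}
```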

For retrieving more nested or non-static data, don’t underestimate how much work can go into creating these scripts.

If you rely on specific selectors, you might end up changing your script every time the site changes or updates its DOM or selector ids.
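One way to soften this, offered here as a suggestion rather than a recipe, is to give the script a list of candidate selectors and use the first one that matches. The helper name `textFromFirstMatch` and the selectors in the usage note are hypothetical; `page.$` resolves with `null` when nothing matches.

```javascript
// Resolves with the trimmed text of the first selector that matches,
// or null if none of the candidates match.
async function textFromFirstMatch(page, selectors) {
  for (const selector of selectors) {
    const handle = await page.$(selector); // null when nothing matches
    if (handle) {
      // Element handles can be passed into the page as arguments
      return page.evaluate((el) => el.textContent.trim(), handle);
    }
  }
  return null;
}
```

A call such as `textFromFirstMatch(page, ['#price', '[data-testid="price"]', '.price'])` keeps working if the site drops one of those hooks, as long as another one still exists.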

When scraping data from an external site you should also be aware that it might be illegal in your country, even though it's technically your own data.

Get login token

This use case is a bit controversial, but I'll mention it anyway :)

Recently I encountered a system I needed to interact with via an API.
This was not possible as the system did not have an official API. So what do you do?

I knew that I could do the actions I needed to do by navigating their UI (behind a login), so I thought I might be able to instrument Puppeteer to do the same.

It turned out that I was able to take some shortcuts that made the solution a bit simpler, and maybe you can use the same technique.

If you want to replicate what an external UI behind a login can do, and that UI communicates with a backend server of some kind, the idea is rather simple:

1. Use Puppeteer to navigate to the login page
2. Use Puppeteer to log in using the form (only works if there is no two-factor authentication)
3. Now you should either get an authorization token directly, or it may be stored in the website's cookies. You need to retrieve this token and send it with your subsequent requests to the external API.
4. With the authorization token you can now emulate whatever you would typically do in the external UI.

Below is a rough sketch of what’s going on.
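The sketch assumes the token ends up in a cookie after login. The form selectors (`#username`, `#password`, `button[type="submit"]`), the `cookieName` option and the function name `getAuthToken` are hypothetical placeholders that will differ for every system. `page.cookies()` was the way to read cookies in Puppeteer at the time of writing; newer versions also expose cookies on the browser object.

```javascript
// Logs in through the UI and resolves with the value of the auth cookie.
// Selectors and `cookieName` are placeholders, not a real system's values.
async function getAuthToken(page, { loginUrl, username, password, cookieName }) {
  // 1. Navigate to the login page
  await page.goto(loginUrl);

  // 2. Fill in and submit the login form
  await page.type('#username', username);
  await page.type('#password', password);
  await Promise.all([
    page.waitForNavigation(), // start waiting before the click triggers navigation
    page.click('button[type="submit"]'),
  ]);

  // 3. Pick the token out of the cookies set by the login
  const cookies = await page.cookies();
  const authCookie = cookies.find((cookie) => cookie.name === cookieName);
  if (!authCookie) {
    throw new Error(`Cookie "${cookieName}" not found after login`);
  }
  return authCookie.value;
}
```

For step 4, the returned value would be sent along with requests to the backend, for example in an Authorization header or a Cookie header, depending on what the system expects.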

Theoretical retrieval of auth token to use with external API

NOTE: This method will not work for all systems.
In some cases you have to be a little creative in trying to emulate what a real user can do behind a login.

Puppeteer for the win!

There are many, many other use cases, but this was to give you an idea of what you could use it for.

It's worth noting that, when using Puppeteer, you typically interact with websites you have little or no control over. This means that your script has to adapt to the website.

If the website changes often, this can become tricky. It might therefore save you time to think a little about making the scripts more general, so they won't break when a website changes a class name from id1 to id2.

Make the computers do the work!
