Why Token-Based Pagination Performs Better Than Offset-Based Pagination

It’s almost 1.5x faster

Shuhan Mirza
Better Programming


Almost every modern digital product has to incorporate some sort of pagination API. The main role of that API is to retrieve data from a large table chunk by chunk, so that your frontend application does not need to make an API call with a huge payload. A smaller payload results in lower latency.

Offset-based pagination

A common approach to building a pagination API is offset-based. Here, the frontend requests 10 rows after the 100th row of the products table. This is harmless when the table is small. However, latency increases as the offset grows.

Why? Because the database (not all kinds of databases) actually processes offset+limit rows for this query. So, as the offset value grows, the database has to walk through offset+limit rows just to return limit of them.

Let’s do a quick experiment. I have prepared a table named products with 110001 rows in a PostgreSQL database. It has 3 columns, namely id, name, and created_at.
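
A minimal sketch of a comparable setup (the column types and generated values here are assumptions, with created_at stored as epoch milliseconds to stay consistent with the query plans below):

-- Illustrative schema; the article's exact setup may differ.
CREATE TABLE products (
    id         SERIAL PRIMARY KEY,
    name       TEXT,
    created_at BIGINT  -- epoch milliseconds
);

-- 110001 rows with strictly increasing (hence unique) created_at values.
INSERT INTO products (name, created_at)
SELECT 'product_' || i, 1662535177496 + i
FROM generate_series(1, 110001) AS i;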

Now, let’s run this query.
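
Run under EXPLAIN ANALYZE, it looks roughly like this sketch:

-- Skip the first 100 rows, then return the next 10.
EXPLAIN ANALYZE
SELECT * FROM products
OFFSET 100
LIMIT 10;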

Database Response:

Limit  (cost=1.74..1.91 rows=10 width=27) (actual time=0.552..0.554 rows=10 loops=1)
-> Seq Scan on products (cost=0.00..1909.01 rows=110001 width=27) (actual time=0.541..0.548 rows=110 loops=1)
Planning Time: 1.231 ms
Execution Time: 0.580 ms

Here, we can see the database processed 110 rows to return just 10. Now, let’s stress this by increasing the offset to 90000.
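
Again as a sketch:

-- Same query, but now skipping 90000 rows before returning 10.
EXPLAIN ANALYZE
SELECT * FROM products
OFFSET 90000
LIMIT 10;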

Limit  (cost=1561.90..1562.08 rows=10 width=27) (actual time=10.275..10.277 rows=10 loops=1)
-> Seq Scan on products (cost=0.00..1909.01 rows=110001 width=27) (actual time=0.425..6.666 rows=90010 loops=1)
Planning Time: 1.070 ms
Execution Time: 10.302 ms

Now, you can see it has read 90010 rows to return only 10 rows! The execution time also rises to a significant 10.302ms.

Token-based pagination

In this type of pagination, the API sends the consumer a next_token with every page, and the consumer passes that token in the next API call, and so on. A column or property is used to pivot through the data in the table. You can choose any column that fits the requirement; the main idea is to come up with conditions and parameters that let you navigate through the table. A common practice is to use the row creation time as the token, provided the creation time is unique. Here, the created_at column of the products table is used as the token: along with 10 products, the API sends the created_at of the 111th row as the next_token. Crucially, token-based pagination incurs much lower latency as the table grows.

Why? Because this query filters the table with the token condition: rows before the first match are rejected by a cheap filter check instead of being fetched, counted, and thrown away the way OFFSET does, and only limit matching rows are fully processed and returned. With an index on the token column, the database can skip the non-matching rows entirely.
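
As a rough sketch of the pattern (the :next_token parameter name is illustrative), the API fetches limit + 1 rows, returns the first 10 as the page, and hands back the 11th row’s created_at as the next token:

-- :next_token is the created_at value returned with the previous page.
-- Fetching limit + 1 rows yields 10 products plus the next token.
SELECT * FROM products
WHERE created_at >= :next_token
ORDER BY created_at
LIMIT 11;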

Let’s do some experiments on the products table created earlier, querying for the same rows as in the earlier experiments. Note that proper indexing will result in even better performance; I kept things simple here.
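
For instance, a B-tree index on the token column (the index name below is just a suggestion) would let the planner jump straight to the first matching row instead of scanning:

-- With this index in place, the filter below can become an index scan.
CREATE INDEX idx_products_created_at ON products (created_at);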

Let’s run this query for the 100th to 110th rows.
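
Sticking with the illustrative setup above, the created_at of the 101st row would be 1662535177597, so the sketch becomes:

-- The literal below stands in for the next_token from the previous page.
EXPLAIN ANALYZE
SELECT * FROM products
WHERE created_at >= 1662535177597
LIMIT 10;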

Here, we can already see a lower execution time than before: the database filters out the preceding rows and only materializes the 10 requested rows instead of 110.

Now, let’s stress the query with a larger token value. This query targets the same rows as the offset-90000 experiment, i.e., rows 90001 to 90010.
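
Given the filter condition visible in the plan below, the query is, as a sketch:

-- The token 1662535267497 is the created_at of the first row of this page.
EXPLAIN ANALYZE
SELECT * FROM products
WHERE created_at >= 1662535267497
LIMIT 10;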

Limit  (cost=0.00..1.11 rows=10 width=27) (actual time=6.587..6.589 rows=10 loops=1)
-> Seq Scan on products (cost=0.00..2184.01 rows=19735 width=27) (actual time=6.586..6.587 rows=10 loops=1)
"Filter: (created_at >= '1662535267497'::bigint)"
Rows Removed by Filter: 90000
Planning Time: 1.314 ms
Execution Time: 6.617 ms

Did you notice the big difference? The query took only 6.617ms to execute. The database did not have to fetch and discard 90000 rows; the sequential scan rejected them with a cheap filter check and only fully read the 10 required rows.

Summary

The database query for offset-based pagination has to read offset+limit rows; the token-based query, on the other hand, only needs to read limit rows, plus some extra processing for the condition.

Want to Connect? If you have any feedback, please ping me on my LinkedIn: https://linkedin.com/in/shuhanmirza/
