Better Programming

Advice for programmers.


A Memory-Friendly Way of Reading Files in Node.js

How to read gigabytes of data with a limited amount of memory

Kasper Moskwiak
Published in Better Programming · 7 min read · Jan 4, 2021


Photo by Zac Harris on Unsplash.

The need to read a file may arise in a variety of situations: a one-time job of parsing error logs, a feature of an application, a scheduled data migration task, part of a deployment pipeline, and so on. Whatever the reason, reading a file in Node.js is a simple and straightforward task. A problem occurs, however, when the file size exceeds the amount of RAM in your machine. And beyond the hardware itself, available RAM may also be capped by your VPS plan, Kubernetes pod resource limits, etc.

A simple fs.readFile won't do the job. And closing your eyes in hopes that it will somehow pass this time won't help.

It’s time to do some memory-aware programming.

In this article, I will look into three ways of reading files in Node.js. My goal is to find the most efficient approach in terms of memory usage. I will cover:

  1. Built-in fs.readFileSync
  2. Iterating over fs.createReadStream
  3. fs.read with a shared buffer

The Experiment

I implemented each approach as a small Node.js app and ran it inside a Docker container. Each application was given the task of processing a 1GB file in chunks of 10MB.

During the program execution, I measured the memory usage of the Docker container multiple times. Note that in the plots shown in this article, the measured memory is the sum of the memory used by the Node.js program itself and all processes in the Docker container (including the OS).
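The sampling mechanism itself is not shown in this excerpt. For a rough in-process number (covering only the Node process, not the whole container as in the article's plots), Node exposes process.memoryUsage():

```javascript
// In-process view only: process.memoryUsage() reports the Node
// process, whereas the article's plots measure the entire Docker
// container (Node plus everything else running inside it).
const { rss, heapUsed } = process.memoryUsage();
console.log(
  `rss: ${(rss / 1024 / 1024).toFixed(1)} MB, ` +
  `heapUsed: ${(heapUsed / 1024 / 1024).toFixed(1)} MB`
);
```

rss (resident set size) is the closest in-process analogue to what an external tool like docker stats reports for the container.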

Have you noticed that the size of a single chunk is rather large? 10MB of text holds about 10,000,000 characters (an ASCII character encoded in UTF-8 takes one byte; other characters take up to four). That far exceeds the length of a single line in an ordinary log file or CSV, and in a real-life application, one line would be a more natural chunk size. I deliberately chose chunks comparable in size to the memory footprint of an idle Docker container, so that any differences between implementations sensitive to chunk size stand out clearly in the charts.

Let’s quickly see what we are dealing with. In the chart below, we can see a moving maximum of the memory…
