How to Store Documents Larger Than 16 MB in MongoDB
A simple approach without using GridFS

I once worked on an app for viewing huge tables of data (250K rows x 120 columns). The main requirement was that all the data be available in the client app. The data was immutable: it was saved once and then used in table views in the client app.
As you know, MongoDB stores data in documents, and a single document is limited to 16 MB. For files that exceed 16 MB, you can use GridFS, which splits a file into multiple chunks and stores each chunk as a separate document.
I decided to store each column as a single document in MongoDB, where _id was a hash of that column's data. In this article, I will share the approach I used to store columns larger than 16 MB in MongoDB.
There are four types of data in our app: strings, numbers, dates (represented as Unix ms), and booleans. At 8 bytes per value, the 16 MB limit leaves room for around 2 million 64-bit numbers before BSON overhead, and dates and booleans fit just as easily. But strings are a special case. Their size depends on their length and content: in UTF-8, an ASCII character takes one byte, and other characters take up to four.
The interface for the column with strings is below:
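As a minimal sketch, such an interface might look like the following in TypeScript. Every field name other than _id is an assumption made for illustration, not the app's actual schema.

```typescript
// Hypothetical shape of a string column stored as one MongoDB document.
// Field names other than _id are illustrative assumptions.
interface StringColumn {
  _id: string;      // hash of the column data
  type: "string";   // distinguishes string columns from number/date/boolean ones
  values: string[]; // one entry per table row
}

// A tiny example column; the _id here is a made-up placeholder, not a real hash.
const exampleColumn: StringColumn = {
  _id: "example-column-hash",
  type: "string",
  values: ["alpha", "beta", "gamma"],
};

console.log(exampleColumn.values.length); // 3
```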
At some point, our users started to upload tables whose columns held 250K 40-character hashes. A rough estimate that allows 2 bytes per character gives 250,000 x 40 characters x 2 bytes = 20 MB, which is over the limit. At that point, our app was unable to save such columns to MongoDB.
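The rough calculation above can be sketched in a few lines. The function name naiveSizeBytes is hypothetical, and the 2 bytes per character figure is this article's naive estimate, not an exact BSON size.

```typescript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's document size limit

// Naive estimate: count 2 bytes for every character in every value.
// This ignores how BSON actually encodes strings and array elements.
function naiveSizeBytes(values: string[]): number {
  let total = 0;
  for (const value of values) {
    total += value.length * 2;
  }
  return total;
}

// 250K placeholder strings of 40 characters each, standing in for hashes.
const hashes: string[] = Array.from({ length: 250_000 }, () => "a".repeat(40));
const estimate = naiveSizeBytes(hashes);

console.log(estimate);                 // 20000000
console.log(estimate > MAX_DOC_BYTES); // true
```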
I had to find a solution.
Ways to Decrease the Size of the Column With Strings
1. Stringify and zip the values
The most naive approach, serializing the values to a string and compressing it, still gets the job done when the text is repetitive:
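A minimal sketch of this idea, assuming a Node.js environment and its built-in zlib module. The helper names packValues and unpackValues are hypothetical, and JSON is only one possible serialization.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Serialize the values to JSON and gzip the result into one binary blob,
// which could then be stored in a single binary field of the document.
function packValues(values: string[]): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(values), "utf8"));
}

// Reverse of packValues: gunzip, then parse the JSON array.
function unpackValues(packed: Buffer): string[] {
  return JSON.parse(gunzipSync(packed).toString("utf8")) as string[];
}

// Highly repetitive sample data compresses well.
const statuses: string[] = Array.from({ length: 10_000 }, (_, i) =>
  i % 2 === 0 ? "open" : "closed",
);
const packedStatuses = packValues(statuses);

console.log(JSON.stringify(statuses).length); // size before compression
console.log(packedStatuses.length);           // size after compression
```

The trade-off is that the values are no longer queryable inside MongoDB; the column has to be unpacked before use, which suits data that is always served whole.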
2. Use the Excel approach with a dictionary
Create a dictionary of the unique strings in the column and replace each value with its index in that dictionary. This method is useful when a column contains many repeated strings.
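The dictionary approach can be sketched as follows. The names DictionaryColumn, encodeDictionary, and decodeDictionary are hypothetical and only illustrate the technique.

```typescript
// A column stored as unique strings plus one dictionary index per row.
interface DictionaryColumn {
  dictionary: string[]; // unique strings, in first-seen order
  indexes: number[];    // position in `dictionary` for each row
}

function encodeDictionary(values: string[]): DictionaryColumn {
  const positions = new Map<string, number>();
  const dictionary: string[] = [];
  const indexes = values.map((value) => {
    let position = positions.get(value);
    if (position === undefined) {
      // First time this string is seen: append it to the dictionary.
      position = dictionary.length;
      positions.set(value, position);
      dictionary.push(value);
    }
    return position;
  });
  return { dictionary, indexes };
}

// Rebuild the original values by looking each index up in the dictionary.
function decodeDictionary(column: DictionaryColumn): string[] {
  return column.indexes.map((index) => column.dictionary[index]);
}

const encoded = encodeDictionary(["red", "green", "red", "red", "green"]);
console.log(encoded.dictionary); // [ 'red', 'green' ]
console.log(encoded.indexes);    // [ 0, 1, 0, 0, 1 ]
```

When every value is unique, the dictionary is as large as the original column and the indexes are pure overhead.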
Unfortunately, if someone wants to save a column of hashes, neither of the two previous methods will help. That leads to the third approach: do what GridFS does for files.
3. Create multiple documents for a single column
We always serve a full column to the user, so we can store it as a linked list of documents. If the column fits within 16 MB, we store it as is. Otherwise, we split the column into chunks and save those chunks in other documents linked to the main document.
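The splitting step can be sketched like this. The ColumnDoc shape, the next field, the chunk _id scheme, and the splitIntoDocs name are all assumptions for illustration, and the size check reuses the naive 2 bytes per character estimate rather than a real BSON size.

```typescript
// One document in the chain; `next` holds the _id of the following chunk, if any.
interface ColumnDoc {
  _id: string;
  values: string[];
  next?: string;
}

const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Naive per-value size estimate (2 bytes per character).
const estimateBytes = (value: string): number => value.length * 2;

// Split a column into a linked list of documents that each stay under maxBytes.
// The first document keeps the column's own _id; later chunks get derived ids.
function splitIntoDocs(id: string, values: string[], maxBytes: number = MAX_DOC_BYTES): ColumnDoc[] {
  const docs: ColumnDoc[] = [{ _id: id, values: [] }];
  let usedBytes = 0;
  for (const value of values) {
    const size = estimateBytes(value);
    let current = docs[docs.length - 1];
    // Start a new chunk when the current one is non-empty and would overflow.
    if (usedBytes + size > maxBytes && current.values.length > 0) {
      const nextDoc: ColumnDoc = { _id: `${id}:${docs.length}`, values: [] };
      current.next = nextDoc._id;
      docs.push(nextDoc);
      current = nextDoc;
      usedBytes = 0;
    }
    current.values.push(value);
    usedBytes += size;
  }
  return docs;
}
```

A column that fits produces a single document with no next field. The resulting array could then be written in one call, for example with the MongoDB driver's insertMany. In practice, maxBytes should leave headroom below 16 MB, because the estimate ignores BSON overhead.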
Our get function for the column will look like this:
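A minimal sketch of such a function, assuming each document carries an optional next field with the _id of the following chunk. The ColumnDoc shape and the findById parameter are hypothetical; with the MongoDB driver, findById could wrap a findOne call on the collection.

```typescript
// One document in the chain; `next` holds the _id of the following chunk, if any.
interface ColumnDoc {
  _id: string;
  values: string[];
  next?: string;
}

// Hypothetical lookup function, injected so the traversal logic stays
// independent of any particular database client.
type FindById = (id: string) => Promise<ColumnDoc | null>;

// Load the main document, then follow the `next` links and concatenate values.
async function getColumn(id: string, findById: FindById): Promise<string[] | null> {
  const head = await findById(id);
  if (head === null) {
    return null; // no such column
  }
  const values = head.values.slice();
  let nextId = head.next;
  while (nextId !== undefined) {
    const chunk = await findById(nextId);
    if (chunk === null) {
      throw new Error(`Missing chunk ${nextId} for column ${id}`);
    }
    // Push in a loop: spreading a very large array into push() can overflow the stack.
    for (const value of chunk.values) {
      values.push(value);
    }
    nextId = chunk.next;
  }
  return values;
}
```

Each link costs one round trip to the database. If that becomes a bottleneck, one alternative is to store the list of chunk ids in the main document and fetch all chunks in a single query.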
Now our users can save large amounts of data without crashing our app, because a column that is too large for one document is stored across several.
Conclusion
I used a naive approach to calculate the MongoDB document size in the code above. In the next article, I will write about how to better calculate document size.