Member-only story
Use Binary Encoding Instead of JSON
Encode the same data in far fewer bytes
Why Should I Care?
In memory, the data is kept as data structures like objects, lists, arrays, etc. But when you want to send the data over a network or store it in a file, you need to encode the data as a self-contained sequence of bytes. The translation from the in-memory representation to a byte sequence is called encoding and the inverse is called decoding. With time, the schema for data that an application handles or stores may evolve, a new field can get added, or an old one can be removed. Therefore, the encoding used needs to support both backward (new code should be able to read the data written by the old code) and forward (old code should be able to read the data written by the new code) compatibility.
In this article, we will discuss different encoding formats, how binary encoding formats are better than JSON, XML, and how these support schema evolution.
Types of Encoding Formats
There are two types of encoding formats:
- Textual formats
- Binary formats
Textual Formats
Textual formats are somewhat human-readable. Some of the common formats are JSON, CSV, and XML. Textual formats are easy to use and understand but can result in different problems:
- Textual formats can contain a lot of ambiguity. For example, in XML and CSV, you cannot distinguish between strings and numbers. JSON can distinguish between string and numbers but cannot distinguish between integers and floating numbers and doesn’t specify a precision. This becomes a problem when dealing with large numbers. An example of numbers larger than 253 occurs on Twitter, which uses a 64-bit number to identify each tweet. The JSON returned by Twitter’s API includes tweet IDs twice — once as a JSON number and once as a decimal string — to work around the fact that the numbers are not correctly parsed by JavaScript applications.
- CSV doesn’t contain any schema, leaving it to the application to define the meaning of each row and column.
- Textual formats take more space than binary encoding. For example, as JSON and XML are…