Getting Started with Cassandra NoSQL Database
Cassandra architecture and internals explained

Since their advent, NoSQL databases have been very popular because of their built-in horizontal partitioning, schema flexibility, and ability to calculate metrics and intelligent metadata over the huge amounts of data they store. That said, they are not a panacea for every use case: joins can be harder on NoSQL databases, relationships between data may not be implicit, read times can be slower, and ACID guarantees are often limited or absent.
There are several broad types of NoSQL databases:
- Key Value Store: Can be thought of as a gigantic distributed hash map. Some of the popular examples include Redis and Couchbase.
- Document Store: Used for storing and retrieving document-like structures such as JSON and XML. A document store is a sub-class of the key-value store; the distinction is that in a key-value store the data is inherently opaque to the database, whereas a document store uses the underlying structure of the document to generate metadata and optimize further with it. MongoDB is one of the popular examples.
- Graph: Graph databases can be viewed as document databases with an added layer of relationships between those documents.
- Columnar Database: A database that stores data in multiple columns. It is similar to a two-dimensional key-value store. One of the most popular examples is Cassandra.
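The "two-dimensional key-value store" idea can be sketched in a few lines of Python. This is only a toy in-memory model, not how any real database stores data; the names `table`, `put`, and `get` and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

# Outer key: row key. Inner key: column name.
# A toy in-memory model with illustrative names and data only.
table = defaultdict(dict)


def put(row_key, column, value):
    """Store a value under the (row_key, column) pair."""
    table[row_key][column] = value


def get(row_key, column, default=None):
    """Look up a single cell; rows do not have to share the same columns."""
    # dict.get does not trigger the defaultdict factory, so reads
    # never create empty rows as a side effect.
    return table.get(row_key, {}).get(column, default)


put("user:1", "name", "Ada")
put("user:1", "email", "ada@example.com")
put("user:2", "name", "Linus")  # this row has no "email" column

print(get("user:1", "email"))  # ada@example.com
print(get("user:2", "email"))  # None
```

Each row is addressed by one key and each cell by a second key, which is why sparse rows (like `user:2` above) cost nothing for the columns they do not have.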
Cassandra was developed and open-sourced by Facebook and is now heavily used by the likes of Netflix, Apple, Spotify, and many more. Columnar storage has the advantage that the values of each column are stored together on disk, so if you query only certain columns, you only need to read those columns instead of parsing whole rows that include columns the query does not use. Furthermore, duplicate sequential data within a column can be compacted to improve storage efficiency.
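To make the column-wise layout and the compaction of repeated values more concrete, here is a minimal sketch. It is a toy model and not Cassandra's actual on-disk format; the sample rows and the helper names `to_columns` and `run_length_encode` are assumptions, and run-length encoding is used here simply as one easy-to-follow way to compact duplicate sequential values.

```python
from itertools import groupby

# Illustrative sample data, laid out row by row.
rows = [
    {"id": 1, "country": "US", "plan": "free"},
    {"id": 2, "country": "US", "plan": "free"},
    {"id": 3, "country": "US", "plan": "paid"},
    {"id": 4, "country": "DE", "plan": "paid"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# A query on "country" only needs this one list, not the full rows.
print(columns["country"])                     # ['US', 'US', 'US', 'DE']
print(run_length_encode(columns["country"]))  # [('US', 3), ('DE', 1)]
print(run_length_encode(columns["plan"]))     # [('free', 2), ('paid', 2)]
```

The encoding pays off most when equal values sit next to each other, which is why the text speaks of duplicate sequential data.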
Features of Cassandra
- Distributed, nonrelational datastore.
- Horizontally scalable (low data density).
- Designed around access patterns (SQL-like queries on data for I/O operations).
- Seamless data replication.
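As a rough illustration of how horizontal scaling and replication fit together, the sketch below hashes a partition key to pick a starting node and then copies the data to the next nodes in a ring. This is a deliberately simplified model under stated assumptions: the node names, the replication factor of 3, and the modulo-based placement are all hypothetical, and Cassandra's real token-based placement and replication strategies are more involved.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICATION_FACTOR = 3                            # illustrative value


def partition_index(partition_key, node_count):
    """Hash the key to a stable integer and map it onto a node index."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def replicas_for(partition_key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes holding a key: the owner plus the next nodes on the ring."""
    start = partition_index(partition_key, len(nodes))
    copies = min(rf, len(nodes))  # cannot have more replicas than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


for key in ("user:1", "user:2", "order:42"):
    print(key, "->", replicas_for(key))
```

Because placement depends only on the hash of the key, any node can work out where a row lives without a central lookup, and losing one node still leaves other copies of each row available.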