Member-only story
Normalization vs. Denormalization
When is each appropriate?
One Place or Many
The difference between normalization and denormalization is simple. When data is normalized it exists in one and only one source-of-truth location. Denormalized data exists in multiple summarized locations.
Data living in one or many locations has important consequences for accuracy and speed. If data lives in one source location (i.e., is normalized), it must be summarized with other data to be useful. If data is summarized in multiple locations (i.e., is denormalized), when it’s updated, it will be out of sync for some time before each location can be changed.
Here is an example of each below:
Normalized data:
Below we can see three users: Patrick, Eric, and Robert. Patrick is a friend of the others. Eric and Robert do not know each other.
Users table:
ID NAME EMAIL
1 Patrick patrick@somewebsite.com
2 Eric eric@somewebsite.com
3 Robert robert@somewebsite.com
Friendship join table:
Friendship join table:
USER_A_ID USER_B_ID
1 1
1 2
The important thing to notice here is that everyone’s name and email address is listed only once. If Eric’s email address changes, it is simply updated in the one spot it exists. However, if Patrick logs into the system and wants to know who his friends are and their email addresses, the database would need to use some compute time to tie it all together and deliver the result to Patrick.
Denormalized data:
Below are the same three users with the same relationship to each other.
{
name: "Patrick",
email: "patrick@somewebsite.com",
friends: [
{
name: "Eric",
email: "eric@somewebsite.com"
},
{
name: "Robert",
email: "robert@somewebsite.com"
},
]
},
{
name: "Robert",
email: "robert@somewebsite.com",
friends: [
{
name: "Patrick",
email: "patrick@somewebsite.com"
}
]
},
{
name: "Eric",
email: "eric@somewebsite.com",
friends: [
{
name: "Patrick",
email: "patrick@somewebsite.com"
}
]
}