Database Normalization

Paul Brown · 07/02/2024 · 1 min read


This post makes no effort to be the final word on normalization, nor does it aim to be exhaustive. It is, rather, a synthesis of existing resources written in plain English as a reference point for the author. More comprehensive coverage of the topic can be found on the corresponding Wikipedia page and in a video on the Decomplexify YouTube channel.

Normalization

Normalization is the process of organizing information in a database. Creating tables and establishing relationships in a way that eliminates redundancy and are flexible for future extension are all part of database normalization.

Sets of rules exist to guide users through database normalization. These rules, when grouped, are called 'normal forms'. The most commonly discussed normal forms range from the first normal form through to the fifth normal form. Adhering to each successive form provides diminishing returns: regarding normalization, the most important thing for database design is that it conforms to the first normal form; the second most important things is that it conforms to the second normal form and so on.

The following sections detail these rules

First normal form

  1. Database rows are not inherently ordered: they are not permittted to convey order information
  2. Each database column (sometimes called a 'field') must hold only one type of data. A 'type' in this case is something like an integer (there are various types -- tinyint, int, bigint) or a datetime or a string (of various specified lengths).
  3. Each table must have a primary key.
  4. Repeating groups are not permitted. That is, columns with names like 'customer01', 'customer02', 'customer03' and so on are not allowed.

Second normal form

Third normal form

Fourth normal form

Fifth normal form

Deletion anomalies

Update anomalies

Insertion anomalies


Discussions

Login to Post Comments
No Comments Posted