Contents

SQL Normalizations

Normalization

Normalization in SQL refers to the process of organizing data in a relational database to minimize data redundancy and improve data integrity. This is done by dividing larger tables into smaller, more manageable ones and establishing relationships between them. There are several normal forms, with the most common being first normal form (1NF), second normal form (2NF), and third normal form (3NF). Each normal form has specific rules that must be followed to ensure that the data is properly organized. Normalization can help to improve the performance and maintainability of a database.

First Normal Form (1NF)

First normal form (1NF) is the most basic level of normalization for a relational database. It is defined by the following rules:

  1. Each table must have a primary key, which is a unique identifier for each row in the table.

  2. Each column in the table can contain only atomic (indivisible) values, meaning that it cannot contain repeating groups of data.

Example: Consider a table “Sales” that contains data about sales of a product. The table has the following columns:

  • Sale ID (primary key)
  • Product Name
  • Product Quantity
  • Product Price
  • Customer Name
  • Customer Address

This table is not in 1NF as it violates the 2nd rule of 1NF, as the “Customer Name” and “Customer Address” columns contain multiple values. To bring it in 1NF we can create another table “Customers” with columns “Customer ID” (Primary Key), “Customer Name”, “Customer Address” and in the “Sales” table we can add a foreign key “Customer ID” which references the “Customer ID” in the “Customers” table.

This way we have brought the “Sales” table in 1NF and eliminated the data redundancy.

By enforcing 1NF, we can ensure that the data is stored in a consistent and organized manner, making it easier to maintain and query the data in the future.

Second Normal Form (2NF)

Second normal form (2NF) builds upon the rules of first normal form (1NF) and adds an additional requirement:

  1. The table must be in 1NF
  2. Non-key columns in the table must be dependent on the primary key.

This means that any column in the table that is not part of the primary key must be functionally dependent on the primary key. This means, if we know the primary key, we can determine the value of the non-key columns.

Example: Consider a table “Orders” that contains data about customer orders, with the following columns:

  • Order ID (Primary Key)
  • Customer ID (Primary Key)
  • Product ID (Primary Key)
  • Order Quantity
  • Order Price

This table is in 1NF because it has a primary key and all columns contain atomic values. However, it is not in 2NF because the non-key columns “Order Quantity” and “Order Price” are not dependent on the primary key. They are dependent on the combination of “Customer ID” and “Product ID” which are not part of the primary key. To bring it in 2NF, we can create two new tables “Order_Details” and “Order_Summary” with columns:

Order_Details:

  • Order ID (Primary Key)
  • Product ID (Primary Key)
  • Order Quantity
  • Order_Summary:

Order_Summary:

  • Order ID (Primary Key)
  • Customer ID (Primary Key)
  • Order Price

By breaking the table into two tables, we can ensure that the non-key columns are dependent on the primary key, bringing the table to 2NF.

By enforcing 2NF, we can ensure that the data is stored in a more efficient and organized manner, making it easier to maintain and query the data in the future. It also eliminates the data redundancy and anomalies caused by partial dependency.

Third Normal Form (3NF)

Third normal form (3NF) builds upon the rules of second normal form (2NF) and adds an additional requirement:

  1. The table must be in 2NF
  2. Non-key columns in the table must not be dependent on other non-key columns.

This means that any column in the table that is not part of the primary key must not be functionally dependent on any other non-key columns. This means, if we know the primary key, we can determine the value of the non-key columns, and the value of other non-key columns should not be dependent on any other non-key columns.

Example: Consider a table “Employees” that contains data about employees, with the following columns:

  • Employee ID (Primary Key)
  • Employee Name
  • Employee Salary
  • Employee Manager ID (Non-key Column)
  • Manager Name (Non-key Column)

This table is in 2NF because it has a primary key and all non-key columns are dependent on the primary key, but it is not in 3NF because the non-key column “Manager Name” is dependent on “Employee Manager ID” which is another non-key column. To bring it in 3NF, we can create a new table “Managers” with columns:

Managers:

  • Manager ID (Primary Key)
  • Manager Name

And in the “Employees” table, we can add a foreign key “Manager ID” which references the “Manager ID” in the “Managers” table.

By breaking the table into two tables, we can ensure that the non-key columns are dependent only on the primary key, bringing the table to 3NF.

By enforcing 3NF, we can ensure that the data is stored in a more efficient and organized manner, making it easier to maintain and query the data in the future. It also eliminates the data redundancy and anomalies caused by transitive dependency. It also makes the database more robust to change, by having a separate table for non-key columns that are dependent on other non-key columns.