Normalization in Database

 

Normalization in Database

 

What is Normalization in Database:

Normalization is a process of organizing and structuring data to reduce redundancy and improve data integrity. The main goal of normalization is to eliminate data anomalies and inconsistencies that can arise when data is duplicated or stored in an inefficient manner. It is typically achieved by dividing a large table into smaller, related tables and establishing relationships between them. The process involves a set of rules called normal forms, with each normal form addressing specific issues related to data organization.

 

Advantages of Normalization:

 

1.     Minimizes Data Redundancy:

One of the primary advantages of normalization is that it reduces data redundancy. Redundant data can lead to inconsistencies and increases storage requirements. By storing data in a more compact and efficient manner, you save storage space and maintenance effort.

 

2.     Improves Data Integrity:

 Normalization enhances data integrity by reducing the risk of anomalies like update anomalies, insertion anomalies, and deletion anomalies. This means that data remains accurate and reliable throughout its lifecycle.

 

3.     Simplifies Data Maintenance:

With normalized databases, updates or changes to data only need to be made in one place, as opposed to multiple locations in a denormalized database. This simplifies data maintenance and reduces the likelihood of errors.

 

4.     Enhances Query Performance:

 While normalization may create more tables and relationships, it often results in improved query performance. Smaller tables with fewer columns are typically faster to query, especially when indexes are properly utilized.

 

5.     Supports Data Consistency:

Normalization ensures that data is consistent by adhering to a set of rules for data organization. This consistency is crucial in maintaining data quality.

 

6.     Facilitates Scalability:

Normalized databases are often more scalable because they can accommodate changes and additions to data without major restructuring. This makes it easier to adapt to evolving business needs.

 

7.     Eases Database Design and Development:

Normalization provides a structured framework for designing and developing databases. It guides the process and helps maintain a clear and logical database schema.

 

8.     Supports Data Integrity Constraints:

Normalized databases can easily implement integrity constraints, such as foreign keys and unique constraints, which further enhance data quality and accuracy.

 

9.     Reduces Data Anomalies:

 By eliminating or minimizing data anomalies, normalization contributes to data reliability and reduces the likelihood of errors in data-driven decision-making.

 

Case Study: Library Database

 

Consider a database for a library with a single table called Books. The initial structure of the Books table might look like this: Books Table:

 

| BookID | Title       | Author      | Genre     | Publisher   | Publication Year |

|--------|-------------|-------------|-----------|-------------|-----------------|

| 1      | Book A      | Author X    | Fiction   | Publisher 1 | 2020            |

| 2      | Book B      | Author Y    | Non-Fiction | Publisher 2 | 2019           |

| 3      | Book C      | Author X    | Fiction   | Publisher 3 | 2021            |

 

Step 1: First Normal Form (1NF)

In a 1NF-compliant table, each column should contain only atomic (indivisible) values. This means that a column should not contain repeating groups or arrays of values. Each cell in the table should hold a single, indivisible piece of data.

In 1NF, you ensure that each column contains only atomic (indivisible) values, and each row is uniquely identifiable. The initial table is already in 1NF because each cell contains a single value, and there is a unique identifier (BookID).

Step 2: Second Normal Form (2NF)

It is in 1NF: Before a table can be in 2NF, it must first satisfy the rules of 1NF, which include having atomic (indivisible) values in each column and unique identifiers for rows.

 

No Partial Dependencies: There should be no partial dependencies of any column on the primary key. In other words, every non-key attribute (a column that is not part of the primary key) must be fully functionally dependent on the entire primary key.

2NF deals with partial dependencies by removing them. To do this, we break the table into two tables: Books and Authors.

Books Table:

| BookID | Title       | Genre     | Publisher   | Publication Year | AuthorID |

|--------|-------------|-----------|-------------|-----------------|----------|

| 1      | Book A      | Fiction   | Publisher 1 | 2020            | 1        |

| 2      | Book B      | Non-Fiction | Publisher 2 | 2019           | 2        |

| 3      | Book C      | Fiction   | Publisher 3 | 2021            | 1        |

 

Authors Table:

| AuthorID | Author      |

|----------|-------------|

| 1        | Author X    |

| 2        | Author Y  

 

Now, the Authors table contains author information, and the Books table is linked to it by AuthorID, removing partial dependencies.

 

Step 3: Third Normal Form (3NF)

It is in 2NF: Before a table can be in 3NF, it must first satisfy the rules of 2NF, which include being in 1NF (having atomic values) and eliminating partial dependencies.

 

No Transitive Dependencies: There should be no transitive dependencies of any non-key attribute on the primary key. In other words, every non-key attribute should be functionally dependent only on the primary key and not on other non-key attributes.

3NF deals with transitive dependencies. We'll create another table, Publishers, to remove the transitive dependency of "Publisher" on "BookID."

Publishers Table:

| PublisherID | Publisher   |

|------------|-------------|

| 1          | Publisher 1 |

| 2          | Publisher 2 |

| 3          | Publisher 3 |

 

Books Table:

| BookID | Title       | Genre     | Publication Year | AuthorID | PublisherID |

|--------|-------------|-----------|-----------------|----------|-------------|

| 1      | Book A      | Fiction   | 2020            | 1        | 1           |

| 2      | Book B      | Non-Fiction | 2019           | 2        | 2           |

| 3      | Book C      | Fiction   | 2021            | 1        | 3           |

 

Now, the Publishers table contains publisher information, and the Books table is linked to it by PublisherID, eliminating transitive dependencies.

 

 

To illustrate 3NF, let's consider a table for a university database:

 

| StudentID | StudentName | Department   | DepartmentHead |

|-----------|-------------|--------------|----------------|

| 101       | Alice       | Computer Sci | Prof. Smith    |

| 102       | Bob         | Physics      | Prof. Johnson  |

| 103       | Carol       | Computer Sci | Prof. Smith    |

 

In this table, StudentID is the primary key. However, there is a transitive dependency of DepartmentHead on Department because the department head's name is determined by the department alone, not by the StudentID. This violates 3NF.

 

To bring this table to 3NF, you would need to create separate tables for departments and department heads and establish relationships:

Students Table:

| StudentID | StudentName | DepartmentID |

|-----------|-------------|--------------|

| 101       | Alice       | 1            |

| 102       | Bob         | 2            |

| 103       | Carol       | 1            |

Departments Table:

| DepartmentID | Department   |

|--------------|--------------|

| 1            | Computer Sci |

| 2            | Physics      |

DepartmentHeads Table:

| DepartmentID | DepartmentHead |

|--------------|----------------|

| 1            | Prof. Smith    |

| 2            | Prof. Johnson  |

Now, the DepartmentHeads table contains department head information, and the Students table is linked to the Departments table by DepartmentID, eliminating the transitive dependency. This satisfies the requirements of 3NF.

 

BCNF (Boyce-Codd Normal Form) 

BCNF is a higher level of database normalization that builds upon the principles of the Third Normal Form (3NF). BCNF is a critical concept in database design that aims to eliminate certain types of redundancy and anomalies in a relational database.


Here's a simple explanation of BCNF with a rule, example, and a case study:


Rule for BCNF:

A relational database table is in BCNF if, 

for every non-trivial functional dependency X -> Y (where X is a superkey), X is a superkey itself. 

In other words, every non-trivial dependency must be determined by a superkey.

Non Trival Dependencies:

It is a relationship between attributes (columns) in a table where the dependent attribute is not a subset of the determinant attribute(s).

For example,


Consider a database table named Student_Courses with the following attributes: Student_ID, Course_ID, and Instructor. 

Suppose we have the following functional dependencies:


Student_ID -> Course_ID

Course_ID -> Instructor

In this case, 

Student_ID is a superkey because it uniquely identifies the courses taken by each student. 

However, the attribute Course_ID is not a superkey by itself because it doesn't uniquely identify instructors. 

To bring the table into BCNF, we must decompose it into two separate tables: Students and Courses.

Case Study:


Original Table: Student_Courses

| Student_ID | Course_ID | Instructor   |

|------------|-----------|--------------|

| 101        | CS101     | Mr. Smith    |

| 101        | MATH201   | Dr. Johnson  |

| 102        | CS101     | Mr. Smith    |

| 103        | MATH201   | Dr. Johnson  |


In this original table, we have redundancy because the same student and course combination is repeated. To bring it into BCNF, we'll create two separate tables:

Table 1: Students

| Student_ID |

|------------|

| 101        |

| 102        |

| 103        |

Table 2: Courses

| Course_ID | Instructor   |

|-----------|--------------|

| CS101     | Mr. Smith    |

| MATH201   | Dr. Johnson  |


Now, each table is in BCNF because every non-trivial dependency is determined by a superkey. The Students table has a superkey Student_ID, and the Courses table has a superkey Course_ID. The original redundancy has been eliminated, and the data is more efficiently organized.


Another example to understand BCNF,

Consider a database table named Customer_Purchases with the following attributes: Customer_ID, Product_ID, Purchase_Date, and Product_Price. This table records customer purchases, including the customer's ID, the product purchased, the date of purchase, and the price of the product.


Let's apply BCNF to the Customer_Purchases table with attributes Customer_ID, Product_ID, Purchase_Date, and Product_Price.


Step 1: Identify Functional Dependencies:

First, identify all the functional dependencies in the table. From your example, we have:


Customer_ID -> Product_ID

Product_ID -> Product_Price

Step 2: Determine Superkeys:

Identify the superkeys in the table. A superkey is a set of attributes that can uniquely identify a tuple. In this case, Customer_ID is a superkey because it uniquely identifies customers, and Product_ID is also a superkey because it uniquely identifies products.


Step 3: Check for Violations:

Check whether there are any violations of BCNF. A violation occurs when a non-trivial functional dependency exists where the left-hand side is not a superkey.


In the Customer_Purchases table, there are no violations of BCNF because both functional dependencies involve superkeys (Customer_ID and Product_ID).


The Fourth Normal Form (4NF) and Fifth Normal Form (5NF) are advanced levels of database normalization, and their rules are defined based on certain dependencies and constraints. Here are the rules for each:

 

Fourth Normal Form (4NF):

4NF deals with multi-valued dependencies and is an extension of the Third Normal Form (3NF). To be in 4NF, a table must satisfy the following conditions:

 

Be in 3NF: The table must already be in 3NF, which means that it is free from transitive dependencies.

 

No Non-Key Dependency: There should be no non-trivial multi-valued dependencies. In other words, if A is a non-key attribute and B is another non-key attribute, then A should not determine B, and vice versa. There should be no functional dependency between non-key attributes.

 

Candidate Key Dependency: All multi-valued dependencies must be dependent on a candidate key. This means that multi-valued attributes should be fully functionally dependent on candidate keys.

Person        Mobile        Food_Likes

Mahesh       9893/9424   Burger/Pizza

Ramesh       9191  Pizza

 

Person->-> mobile,

Person ->-> food_likes

This is read as “person multi determines mobile” and “person multi determines food_likes.”

the multi-valued attribute is "Food_Likes," as a person can like multiple types of food. We'll create separate tables for "Person," "Mobile," and "Food_Likes" to normalize the data.

Step 1: Create the "Person" Table

| Person  |

| ------- |

| Mahesh  |

| Ramesh  |

 

Step 2: Create the "Mobile" Table

| Person  | Mobile     |

| ------- | ---------- |

| Mahesh  | 9893/9424  |

| Ramesh  | 9191       |

 

Step 3: Create the "Food_Likes" Table

| Person  | Food_Likes |

| ------- | ---------- |

| Mahesh  | Burger     |

| Mahesh  | Pizza      |

| Ramesh  | Pizza      |

 

Fifth Normal Form (5NF):

5NF deals with join dependencies, which are not as common as other normal forms. To be in 5NF, a table must satisfy the following conditions:

 

Be in 4NF: The table must already be in 4NF.

 

Join Dependencies: The table can be divided into two or more smaller tables (projections), and each smaller table can be rejoined using a natural join operation to recreate the original table. This means that if you can decompose the table into smaller tables and then recombine them to obtain the original data, it's in 5NF.

 

In practical terms, 5NF is often relevant in complex scenarios where the database design involves multiple lev

टिप्पणी पोस्ट करा

0 टिप्पण्या