Normalization in Database
What is Normalization in Database:
Normalization is a process of
organizing and structuring data to reduce redundancy and improve data
integrity. The main goal of normalization is to eliminate data anomalies and
inconsistencies that can arise when data is duplicated or stored in an
inefficient manner. It is typically achieved by dividing a large table into
smaller, related tables and establishing relationships between them. The
process involves a set of rules called normal forms, with each normal form
addressing specific issues related to data organization.
Advantages of Normalization:
1. Minimizes Data Redundancy:
One of the primary
advantages of normalization is that it reduces data redundancy. Redundant data
can lead to inconsistencies and increases storage requirements. By storing data
in a more compact and efficient manner, you save storage space and maintenance
effort.
2.
Improves Data
Integrity:
Normalization enhances data integrity by
reducing the risk of anomalies like update anomalies, insertion anomalies, and
deletion anomalies. This means that data remains accurate and reliable
throughout its lifecycle.
3.
Simplifies Data
Maintenance:
With normalized databases,
updates or changes to data only need to be made in one place, as opposed to
multiple locations in a denormalized database. This simplifies data maintenance
and reduces the likelihood of errors.
4.
Enhances Query
Performance:
While normalization may create more tables and
relationships, it often results in improved query performance. Smaller tables
with fewer columns are typically faster to query, especially when indexes are
properly utilized.
5.
Supports Data
Consistency:
Normalization ensures that
data is consistent by adhering to a set of rules for data organization. This
consistency is crucial in maintaining data quality.
6.
Facilitates
Scalability:
Normalized databases are often
more scalable because they can accommodate changes and additions to data
without major restructuring. This makes it easier to adapt to evolving business
needs.
7.
Eases Database
Design and Development:
Normalization provides a
structured framework for designing and developing databases. It guides the
process and helps maintain a clear and logical database schema.
8.
Supports Data
Integrity Constraints:
Normalized databases can
easily implement integrity constraints, such as foreign keys and unique
constraints, which further enhance data quality and accuracy.
9.
Reduces Data
Anomalies:
By eliminating or minimizing data anomalies,
normalization contributes to data reliability and reduces the likelihood of
errors in data-driven decision-making.
Case
Study: Library Database
Consider a database for a
library with a single table called Books. The initial structure of the Books
table might look like this: Books Table:
| BookID | Title | Author | Genre | Publisher | Publication Year |
|--------|-------------|-------------|-----------|-------------|-----------------|
| 1 |
Book A | Author X | Fiction
| Publisher 1 | 2020 |
| 2 | Book B | Author Y | Non-Fiction | Publisher 2 | 2019 |
| 3 | Book C | Author X | Fiction
| Publisher 3 | 2021 |
Step 1: First Normal Form (1NF)
In a 1NF-compliant table, each
column should contain only atomic (indivisible) values. This means that a
column should not contain repeating groups or arrays of values. Each cell in
the table should hold a single, indivisible piece of data.
In 1NF, you ensure that each
column contains only atomic (indivisible) values, and each row is uniquely
identifiable. The initial table is already in 1NF because each cell contains a
single value, and there is a unique identifier (BookID).
Step 2: Second Normal Form (2NF)
It is in 1NF: Before a table
can be in 2NF, it must first satisfy the rules of 1NF, which include having
atomic (indivisible) values in each column and unique identifiers for rows.
No Partial Dependencies: There
should be no partial dependencies of any column on the primary key. In other
words, every non-key attribute (a column that is not part of the primary key)
must be fully functionally dependent on the entire primary key.
2NF deals with partial
dependencies by removing them. To do this, we break the table into two tables:
Books and Authors.
Books Table:
| BookID | Title | Genre | Publisher | Publication Year | AuthorID |
|--------|-------------|-----------|-------------|-----------------|----------|
| 1 | Book A
| Fiction | Publisher 1 | 2020 | 1 |
| 2 | Book B | Non-Fiction | Publisher 2 | 2019 | 2 |
| 3 | Book C | Fiction | Publisher 3 | 2021 | 1
|
Authors Table:
| AuthorID | Author |
|----------|-------------|
| 1 | Author X |
| 2 | Author Y
Now, the Authors table
contains author information, and the Books table is linked to it by AuthorID,
removing partial dependencies.
Step 3: Third Normal Form (3NF)
It is in 2NF: Before a table
can be in 3NF, it must first satisfy the rules of 2NF, which include being in
1NF (having atomic values) and eliminating partial dependencies.
No Transitive Dependencies:
There should be no transitive dependencies of any non-key attribute on the
primary key. In other words, every non-key attribute should be functionally
dependent only on the primary key and not on other non-key attributes.
3NF deals with transitive
dependencies. We'll create another table, Publishers, to remove the transitive
dependency of "Publisher" on "BookID."
Publishers Table:
| PublisherID | Publisher |
|------------|-------------|
| 1 | Publisher 1 |
| 2 | Publisher 2 |
| 3 | Publisher 3 |
Books Table:
| BookID | Title | Genre | Publication Year | AuthorID |
PublisherID |
|--------|-------------|-----------|-----------------|----------|-------------|
| 1 | Book A | Fiction | 2020 | 1 | 1 |
| 2 | Book B | Non-Fiction | 2019 | 2 | 2 |
| 3 | Book C | Fiction | 2021 | 1 | 3 |
Now, the Publishers table
contains publisher information, and the Books table is linked to it by
PublisherID, eliminating transitive dependencies.
To
illustrate 3NF, let's consider a table for a university database:
| StudentID | StudentName |
Department | DepartmentHead |
|-----------|-------------|--------------|----------------|
| 101 | Alice | Computer Sci | Prof. Smith |
| 102 | Bob | Physics | Prof. Johnson |
| 103 | Carol | Computer Sci | Prof. Smith |
In this table, StudentID is
the primary key. However, there is a transitive dependency of DepartmentHead on
Department because the department head's name is determined by the department
alone, not by the StudentID. This violates 3NF.
To bring this table to 3NF,
you would need to create separate tables for departments and department heads
and establish relationships:
Students Table:
| StudentID | StudentName |
DepartmentID |
|-----------|-------------|--------------|
| 101 | Alice | 1 |
| 102 | Bob | 2 |
| 103 | Carol | 1 |
Departments Table:
| DepartmentID |
Department |
|--------------|--------------|
| 1 | Computer Sci |
| 2 | Physics |
DepartmentHeads Table:
| DepartmentID |
DepartmentHead |
|--------------|----------------|
| 1 | Prof. Smith |
| 2 | Prof. Johnson |
Now, the DepartmentHeads table
contains department head information, and the Students table is linked to the
Departments table by DepartmentID, eliminating the transitive dependency. This
satisfies the requirements of 3NF.
BCNF (Boyce-Codd Normal Form)
BCNF is a higher level of database normalization that builds upon the principles of the Third Normal Form (3NF). BCNF is a critical concept in database design that aims to eliminate certain types of redundancy and anomalies in a relational database.
Here's a simple explanation of BCNF with a rule, example, and a case study:
Rule for BCNF:
A relational database table is in BCNF if,
for every non-trivial functional dependency X -> Y (where X is a superkey), X is a superkey itself.
In other words, every non-trivial dependency must be determined by a superkey.
Non Trival Dependencies:
It is a relationship between attributes (columns) in a table where the dependent attribute is not a subset of the determinant attribute(s).
For example,
Consider a database table named Student_Courses with the following attributes: Student_ID, Course_ID, and Instructor.
Suppose we have the following functional dependencies:
Student_ID -> Course_ID
Course_ID -> Instructor
In this case,
Student_ID is a superkey because it uniquely identifies the courses taken by each student.
However, the attribute Course_ID is not a superkey by itself because it doesn't uniquely identify instructors.
To bring the table into BCNF, we must decompose it into two separate tables: Students and Courses.
Case Study:
Original Table: Student_Courses
| Student_ID | Course_ID | Instructor |
|------------|-----------|--------------|
| 101 | CS101 | Mr. Smith |
| 101 | MATH201 | Dr. Johnson |
| 102 | CS101 | Mr. Smith |
| 103 | MATH201 | Dr. Johnson |
In this original table, we have redundancy because the same student and course combination is repeated. To bring it into BCNF, we'll create two separate tables:
Table 1: Students
| Student_ID |
|------------|
| 101 |
| 102 |
| 103 |
Table 2: Courses
| Course_ID | Instructor |
|-----------|--------------|
| CS101 | Mr. Smith |
| MATH201 | Dr. Johnson |
Now, each table is in BCNF because every non-trivial dependency is determined by a superkey. The Students table has a superkey Student_ID, and the Courses table has a superkey Course_ID. The original redundancy has been eliminated, and the data is more efficiently organized.
Another example to understand BCNF,
Consider a database table named Customer_Purchases with the following attributes: Customer_ID, Product_ID, Purchase_Date, and Product_Price. This table records customer purchases, including the customer's ID, the product purchased, the date of purchase, and the price of the product.
Let's apply BCNF to the Customer_Purchases table with attributes Customer_ID, Product_ID, Purchase_Date, and Product_Price.
Step 1: Identify Functional Dependencies:
First, identify all the functional dependencies in the table. From your example, we have:
Customer_ID -> Product_ID
Product_ID -> Product_Price
Step 2: Determine Superkeys:
Identify the superkeys in the table. A superkey is a set of attributes that can uniquely identify a tuple. In this case, Customer_ID is a superkey because it uniquely identifies customers, and Product_ID is also a superkey because it uniquely identifies products.
Step 3: Check for Violations:
Check whether there are any violations of BCNF. A violation occurs when a non-trivial functional dependency exists where the left-hand side is not a superkey.
In the Customer_Purchases table, there are no violations of BCNF because both functional dependencies involve superkeys (Customer_ID and Product_ID).
The
Fourth Normal Form (4NF) and Fifth Normal Form (5NF) are advanced
levels of database normalization, and their rules are defined based on certain
dependencies and constraints. Here are the rules for each:
Fourth
Normal Form (4NF):
4NF deals with multi-valued
dependencies and is an extension of the Third Normal Form (3NF). To be in 4NF,
a table must satisfy the following conditions:
Be
in 3NF: The table must already be in 3NF, which means that it is free from
transitive dependencies.
No
Non-Key Dependency: There should be no non-trivial multi-valued
dependencies. In other words, if A is a non-key attribute and B is another
non-key attribute, then A should not determine B, and vice versa. There should
be no functional dependency between non-key attributes.
Candidate
Key Dependency: All multi-valued dependencies must be dependent on a candidate key.
This means that multi-valued attributes should be fully functionally dependent
on candidate keys.
Person Mobile Food_Likes
Mahesh 9893/9424 Burger/Pizza
Ramesh 9191 Pizza
Person->-> mobile,
Person ->-> food_likes
This is read as “person multi
determines mobile” and “person multi determines food_likes.”
the multi-valued attribute is
"Food_Likes," as a person can like multiple types of food. We'll
create separate tables for "Person," "Mobile," and
"Food_Likes" to normalize the data.
Step
1: Create the "Person" Table
| Person |
| ------- |
| Mahesh |
| Ramesh |
Step 2: Create the
"Mobile" Table
| Person | Mobile
|
| ------- | ---------- |
| Mahesh | 9893/9424
|
| Ramesh | 9191
|
Step 3: Create the
"Food_Likes" Table
| Person | Food_Likes |
| ------- | ---------- |
| Mahesh | Burger
|
| Mahesh | Pizza
|
| Ramesh | Pizza
|
Fifth
Normal Form (5NF):
5NF deals with join
dependencies, which are not as common as other normal forms. To be in 5NF, a
table must satisfy the following conditions:
Be
in 4NF: The table must already be in 4NF.
Join
Dependencies: The table can be divided into two or more smaller tables (projections),
and each smaller table can be rejoined using a natural join operation to
recreate the original table. This means that if you can decompose the table
into smaller tables and then recombine them to obtain the original data, it's
in 5NF.
In practical terms, 5NF is often relevant in complex scenarios where the database design involves multiple lev
0 टिप्पण्या
कृपया तुमच्या प्रियजनांना लेख शेअर करा आणि तुमचा अभिप्राय जरूर नोंदवा. 🙏 🙏