Normalization in DBMS

Dasari Swaroop Kumar
4 min readNov 22, 2020

Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion, and update anomalies. So, it helps to minimize the redundancy in relations.

Let’s see why this concept came in the first place.

consider the below table

student_id     name     grade     department   room_id     phone           101        Rama Rao   A         CSE          201    112233,112198
202 Ramesh C ECE 301 334421
104 Arjun B CSE 201 234321
411 Jamuna A ME 501 445577
111 Swaroop B CSE 201 220198,288017
209 Sameera B ECE 301 112233

The problem with the above table is that every time we add the same department student to the table, we have to add the same room number again beside him/her. What if some other columns required to be duplicated in a row? Assume that there is a column class_teacher added to the table. Should we write the name of the class teacher every time we add a new student?

Let’s see how this redundancy affects insertion — insertion anomaly.

Assume that you want to add just a department, since your student_id is a primary key, it cannot be null. so, you cannot add just a department directly. If you want to add a department, you must add student details too, which is not a feasible thing to do.

Secondly, let’s see how this affects the updation — updation anomaly.

Assume that room_id needs to be changed for the CSE department. You have to update it in all the tuples whose department is named as CSE. If some attributes were dependent on this update then those related to this attribute must also be updated accordingly.

Lastly, let’s see how this affects deletion — deletion anomaly.

Assume from our table, I want to delete the student with id 411. It deletes the student as well as the department ME as well, which was not intended to happen.

To fix these problems, we follow rules to design the tables and it’s necessary to see if the tables are holding up to the rules of normal forms.

First normal form(1NF):

If a relation contains a composite or multi-valued attribute, it violates the first normal form. A relation is in first normal form if every attribute in that relation is a single-valued attribute.

let’s convert our table to 1NF.

student_id     name     grade     department   room_id    phone   101        Rama Rao   A         CSE          201        112233
101 Rama Rao A CSE 201 112198
202 Ramesh C ECE 301 334421
104 Arjun B CSE 201 234321
411 Jamuna A ME 501 445577
111 Swaroop B CSE 201 220198
111 Swaroop B CSE 201 288017
209 Sameera B ECE 301 112233

The above table is in 1NF since we removed multi-valued attributes. The problem with the above table is that there is no way you can select a primary key but, we can select the combination of student_id and phone as a primary key.

Second normal form(2NF):

To be in second normal form, a relation must be in first normal form and the relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.

we know student_id and phone is a candidate key from the above table(because it’s the minimal superkey). But the non-prime attributes depend on the student_id which is a part of the primary key in the above table. Since there is a partial dependency the above table we have doesn’t satisfy for 2NF.

So, how to convert it to 2NF?

Just separate the tables where it’s creating a conflict. Let me add a column salutation to the first table below to continue with 3NF in the next bit.

student_id(P.K) name     grade     department   room_id  salutation   101        Rama Rao   A         CSE          201      Mr.
202 Ramesh C ECE 301 Mr.
104 Arjun B CSE 201 Mr.
411 Jamuna A ME 501 Mrs.
111 Swaroop B CSE 201 Mr.
209 Sameera B ECE 301 Mrs.
student_id(FK) phone
101 112233
101 112198
202 334421
104 234321
411 445577
111 220198
111 288017
209 112233

Third normal form(3NF):

A relation is in third normal form if there is no transitive dependency for non-prime attributes as well as it is in second normal form.
A relation is in 3NF if the given relation —

  • Is in 2NF.
  • Has no transitive functional dependencies.

In our previous table, changing of non-key attribute name may change salutation. Thus we need to put that transitive dependency causing columns separate from each other.

student_id(P.K) name  grade   department  room_id salutation_id(F.K)          101        Rama Rao   A         CSE      201      1
202 Ramesh C ECE 301 1
104 Arjun B CSE 201 1
411 Jamuna A ME 501 2
111 Swaroop B CSE 201 1
209 Sameera B ECE 301 2
student_id(FK) phone
101 112233
101 112198
202 334421
104 234321
411 445577
111 220198
111 288017
209 112233
salutation_id(P.K) salutation
1 Mr.
2 Mrs.

We have again divided our tables and created a new table which stores Salutations. There are no transitive functional dependencies, and hence our table is in 3NF.

This is the maximum extent to which we can normalize the above table.

That’s all for now on the normalization in RDBMS. catch you guys in the next one.

--

--