Normalization

Normalization:
Normalization is the process of breaking up the table or relation into multiple tables or relations so that we can reduce redundancy and we can make our database more efficient as well as error free.
Consider StudentInfo table,

Id Name Department
1 S1 CSE
2 S2 EC
3 S3 CSE
4 S5 IT
5 S4 CSE

Now if you have to add one more column for name of HOD also, so now Student table will look like,

Id Name Department HOD
1 S1 CSE Prof. A
2 S2 EC Prof. B
3 S3 CSE Prof. A
4 S5 IT Prof. C
5 S4 CSE Prof. A

Here you can see that whenever name of department is repeating, HOD name is also repeating, this is causing redundancy in the database.
Now suppose we also want to add phone number column for every department. So our table will be like,

Id Name Department HOD PNO
1 S1 CSE Prof. A 32272
2 S2 EC Prof. B 67897
3 S3 CSE Prof. A 32272
4 S5 IT Prof. C 67327
5 S4 CSE Prof. A 32272

Now suppose if phone number of CSE department changed from 32272 to 32222, you have to make the changes at three places.

Id Name Department HOD PNO
1 S1 CSE Prof. A 32222
2 S2 EC Prof. B 67897
3 S3 CSE Prof. A 32222
4 S5 IT Prof. C 67327
5 S4 CSE Prof. A 32222

The same is true for deletion also. So you can see that for insertion, deletion as well as updation, this database is inefficient.
So we will use normalization here,
We will break StudentInfo table into two tables as Student and Department as,
Student

Id Name Department
1 S1 CSE
2 S2 EC
3 S3 CSE
4 S5 IT
5 S4 CSE

Department:

Dept_ID HOD PNO
CSE Prof. A 32222
EC Prof. B 67897
IT Prof. C 67327

We can make Department column of Student table as a foreign key which will refer to Dept_ID of Department table.
Here whenever we will need details of department, we will refer to the Department table with the help of the foreign key.
Now, suppose you want to update phone number of any department so you just need to update that only one place. Hence, our database is more efficient now.

First Normal Form (1NF):
The following criteria must be satisfied for 1NF.

  1. Values of each attribute are atomic.
  2. No composite values
  3. All entries in any column must be of the same kind
  4. Each column must have a unique name
  5. No two rows are identical  

Consider the below Student table

Roll Name Courses
101 Asif DBMS, CN, CD, SE
102 Amit CO, OS
103 Arpit CD, OS, CN, DBMS

Here you can see that column Courses contain multiple values in a single row. This is not allowed in 1NF.
We can convert this table into 1NF as below,

Roll Name Courses
101 Asif DBMS
101 Asif CN
101 Asif CD
101 Asif SE
102 Amit CO
102 Amit OS
103 Arpit CD
103 Arpit OS
103 Arpit CN
103 Arpit DBMS

Second Normal Form (2NF):
A relation is said to be in 2NF if it is in 1NF and all non-prime attributes are fully functional dependent on any key of R.
Example: Consider the relation R(A,B,C,D,E,F) and the following functional dependencies,
F = { A→ BCDEF
         BC →ADEF
         B→F
         D→E }
Candidate Keys = {A, BC}
Prime Attributes = A, B, C
Non prime Attributes = D, E, F
BC is a candidate key so any of it's subset can not determine other attributes of the relation but B is determining attribute F here.
So B→F is a partial dependency and hence this relation is not in 2NF.
In order to make it 2NF, we will break the relation into two parts: R1 (ABCDE) and R2 (BF).
Now the relations Rand Rare in 2NF.

Third Normal Form (3NF):
Criteria for 3NF:
1. Relation should be in 2NF.
2. No non-prime attribute should be transitively dependent on candidate key. 
   OR There should not be the case that a non prime attribute is determined by another non prime attribute.
Example: Consider the relation R(A,B,C,D,E,F) and the following functional dependencies,
F = { A→ BCDEF
         BC →ADEF
         B→F
         D→E }
Check for 2NF:
Candidate Keys = {A, BC}
Prime Attributes = A, B, C
Non prime Attributes = D, E, F
BC is a candidate key so any of it's subset can not determine other attributes of the realtion but B is determining attribute F here.
So B→F is a partial dependency and hence this relation is not in 2NF.
In order to make it 2NF, we will break the relation into two parts: R1 (ABCDE) and R2 (BF).
Now the relations Rand Rare in 2NF.
Check for 3NF: 
We have two relations now R1 (ABCDE) and R2 (BF).
R2 (BF):  We have B→F dependency here.
Candidate key = B
Prime Attribute = B
Non Prime Attribute = F
Since there is only one non prime attribute so no chance of violation of 3NF. So R2 (BF) is in 3NF.
R1 (ABCDE): We have following dependencies here:
A→ BCDEF
BC →ADEF
D→E 
Candidate keys = A, BC
Prime Attributes = A,B,C
Non Prime Attributes = D, E
Here D→E dependency is violating 3NF because 'D' is a non prime attribute and it is determining another non prime attribute 'E'. So in order to make that relation in 3NF we will break the relation into two parts as,
1. R11(ABCD)
2. R12(DE)
Now all relations R11(ABCD), R12(DE) and R2 (BF) are in 3NF.

Boyce Codd Normal Form  (BCNF):
Criteria for  BCNF:  
1. Relation should be in 3NF
2. For each dependency X→Y, X should be a super key.
Example: Consider the realtion R(ABCD) and the following FDs,
F = { A→BCD
         BC→AD
         D→B
      }
Candidate Keys = A, BC
Prime Attributes = A, B, C
Non prime attributes = D
Check for 2NF:
There is no partial dependency, hence relation is in 2NF.
Check for 3NF:
There is only one non prime attribute so no chance of 3NF violation. Hence relation is in 3NF.
Check for BCNF:
Here  D→B is violating BCNF because 'D' is not a superkey. So it is not in BCNF.
In order to make it BCNF, relation will be divided into two parts as,
1. R1(ADC)
2. R2(DB)
Now relations R1(ADC) and R2(DB) are in BCNF.

Contributor's Info

Created: Edited:
3Comments
Michael Doley @michaeldoley 24 May 2017 10:16 pm

Thanq sumit bhaiya..:)

Michael Doley @michaeldoley 24 May 2017 10:17 pm

Thank you Sumit bhaiya

Sumit Verma @sumitverma 25 May 2017 05:41 pm

your welcome @michaeldoley 

Database Normalization

Database Normalization

 

Normalization is a systematic approach which is applied on relations to reduce the degree of redundancy. It is defined as a systematic approach because it always gives guarantee for the following properties:

1) Lossless decomposition and

2) Dependency preservation

 

Normal form: It is the property of relation which indicates the amount of redundancy existing in relation. Normal form and degree of redundancy are inversely proportional.

When the normalization process gets applied on a relation, it performs the following activities:

1) It finds out the highest normal form of the relation

2) Then the normalization process will start decomposing relation from its existing normal form to the higher normal form.

 

Procedure for finding the highest normal form of the relation:

1) Find all the possible candidate keys of the given relation

2)Divide the attributes of the relation into two groups, they are prime or key attributes and non-prime or non-key attributes.

3)Identify all the existing full dependencies, partial dependencies, transitive dependencies and overlapping candidate key dependencies.

4) Refer to the definitions of normal forms and hierarchy of normal form before evaluating highest normal form of the relation.

 

Now, lets see some simple definitions which will help in solving the problems faster:

Prime or key attributes: The attribute of the relation is said to be either prime or key attributes iff it exist in atleast one of the possible candidate keys.

Non-prime or non-key attributes: If the attribute of the relation does not exist in any of the possible candidate keys, then it is a non-prime attribute.

 

The simplified definitions of types of dependencies(for problem solving):

 

Full dependencies: While identifying full dependencies, make sure that determinant(left side of a function dependency) is either a candidate key or a super key.

Partial dependency: While identifying partial dependency, make sure that dependent(right side of relation) is a non-prime attribute and determinant is a part of candidate key.

Transitive dependency: While identifying transitive dependency, make sure both dependent and determinant are non-prime attributes or determinant can be combination of part of C.K. along with a non-prime attribute.

Overlapping candidate key dependency: While identifying these type of dependency, make sure dependent is part of C.K. and determinant can be prime or non-prime attributes, but not the C.K.

 

Finding the normal form:

1) According to Codd's rules, every relation should be in minimum of First normal form.

2) A relation is in Second normal form, if it is in First normal form and does not have any partial dependency.

3)A relation is said to be in Third normal form if it is in Second normal form and does not have any transitive dependencies.

4)A relation is said to be in BCNF if all the dependencies of relation are full dependencies.

Contributor's Info

Created:
0Comment