A Guide to Normalization 

Data is abundant, so organisations have large databases to store the massive volumes of data generated daily. It is well-known that the data stored is analysed by scientists and analysts to identify patterns and trends and forecast upcoming marketing decisions. However, as the data collected from diverse sources may be redundant, data analysts first employ several procedures known as normalization to separate the important data from the redundant ones.

What Is Normalization?

Normalization is the effortless process of organising the data by eliminating redundant data and maintaining its data integrity. The data will then be neatly tabulated after being normalized, making the analysing process hassle-free.

Before learning about the normal forms, first, let’s learn about the keys.

Types of Keys in DBMS

A key in a DBMS can uniquely identify any record or row in a table. Moreover, it is employed to define and pinpoint relationships between tables. Usually, a table comprises a variety of key kinds.

The different types of keys you should be aware of are,

  1. The Primary Key
  2. Candidate Key
  3. Super Key
  4. Foreign Key
  5. Composite Key

The Primary Key

The primary key is the only key used to identify an entity uniquely.

For example, the employee id in an employee table is unique to each employee and is used to identify an entity uniquely.

Candidate Key

The candidate key is a collection of keys, each representing an entity’s distinct attribute. The most obviously unique key among the candidate keys is chosen as the primary key.

In the employee table, for example, in addition to the employee id, other criteria such as passport number and licence card number are other keys that can uniquely identify an entity.

Super Key

A super key is a set of attributes that can be used to identify a tuple uniquely. A super key is a candidate key’s superset.

Foreign Key

Foreign keys are the columns of the table used as the primary key in another table.

For example, the department id column in the employee table is the primary key of the department table.

Composite Key

The DBMS key with two or more characteristics that, when combined, may uniquely identify a tuple in a table is known as a composite key. When no column may serve as a primary key, other columns are concatenated to serve as the primary key, known as a composite key.

Database Normalization

Data is usually collected from various types of sources in large amounts. The collected data initially is collected in large tables in the database.

These enormous tables frequently have an abundance of anomalies that could cause the database to be inconsistent, lose its integrity, or feature redundant data. Furthermore, these inconsistencies might hinder analysis and lead to inaccurate predictions.

Hence why analysts first normalise the database tables using the following types of normal forms,

  • 1NF or First Normal Form
  • 2NF or Second Normal Form
  • 3NF or Third Normal Form

Most tables will reach normalization in the third normal form.

Now, let’s not further delay and look at the different types of normal forms.

First Normal Form 1NF

The rule of the First Normal Form is that,

  • Each record must be distinct.
  • Each cell of the table should contain only one value.

So even if a column were to have two or more values in the same cell, it should be separated into different rows.

Example:

The initial table in the example below includes two values for each row. The two values are divided into two records.

Before 1NF:

ID   Name   Courses

——————————–

1   A      t1, t2

2    E      t3

3    M      t2, t3 

After 1NF:

ID   Name   Courses

———————————

1    A       t1

1    A       t2

2    E       t3

3    M       t2

3    M       t3

Second Normal Form 2NF

The rule of the Second Normal Form is that,

  • The table first should be normalised in the first normal form.
  • The table should not possess partial dependency.

This means the primary key should not depend on other candidate keys.

Example:

The T2 is mentioned twice in the below example, which is redundant data. By applying 2NF, the following table will be split into two; table 1 will mention the student and their topics, while the other table will include the topics and their relevant fees.

Before 2NF:

STUDENT_NO           TOPIC_NO    TOPIC_FEE

1                     T1                  2500

2                     T2                  3333

1                     T4                  6700

4                     T3                  5500

4                     T1                  2500

2                     T5                  5003

After 2NF:

Table 1                                    Table 2

STUDENT_NO           TOPIC_NO          TOPIC_NO                TOPIC_FEE     

1                 T1                  T1                        2500

2                 T2                  T2                        3333

1                 T4                  T3                        5500

4                 T3                 T4                        6700

4                 T1                  T5                        5003        

2                 T5                         

The reduction of redundant data is the aim of 2NF. The table would include 100 entries for the fees if 100 students chose option T1. By splitting the data into two tables, you can now use the other table to determine that T1’s fees are 1000.

Third Normal Form 3NF

The rule of the Third Normal Form is that,

  • The table should be normalised in the second normal form.
  • The table should not have any transitive functional dependency for non-prime attributes.

Non-prime attributes are those that do not belong to the candidate key set. These non-prime attributes should not depend on other non-prime characteristics in the table.

The third normal form is used to enhance data integrity.

Example:

In the below, table the STUD_NO determines the sub_id, which automatically determines the SUB. STU ID, therefore, identifies SUB through SUB ID. This suggests that there is a transitive functional dependency on the table.

Before 3NF:

STUD_NO NAME SUB_ID SUB ADDRESS

1 John 11 SQL California

2 Matthew 12 C++ Florida

3 Abraham 13 Java Illinois

4 Chris 12 C++ Oregon

Hence, the above table will be divided into two, ensuring that the other columns depend only on the primary key.

After 3NF:

STUD_NO NAME SUB_ID ADDRESS

1 John 11 California

2 Matthew 12 Florida

3 Abraham 13 Illinois

4 Chris 12 Oregon

SUB_ID SUB

11 SQL

12 C++

13 Java

12 C++

The Bottom Line on Normalization

Normalization is done to make the process of analysing databases a lot easier and efficient enough to derive solutions. In this article, we have discussed the main three types of normal forms. Though another form exists, usually, analysts only normalise the table till the third form. 

Want to know more about data? Check out these blogs on Interval Data, Qualitative versus Quantitative Data, and this one on Database Programming.

Start learning code

If you want to learn more about things you can do with data, check out our article on predictive analytics. Alternatively, if you want to learn some coding basics for free, then check out our free 5 Day Coding Challenge. Register now through the form below.

Programming & Coding Terms, Defined 

As with any skill, it is essential to familiarise yourself with the fundamental terminologies used in that field before fully committing to practice mode. Understanding coding words can speed up and improve your knowledge of your area. This blog looks at simple definitions for programming and coding terms.    We have provided you with a list […]

6 Essential Non-Technical Skills A Software Developer Should Have in 2023

Whenever you search “What skills should a developer possess” in Google, you will be responded with tons of results advising you on learning various programming languages, algorithms, and other technical aspects of programming. However, is mastering the art of programming alone enough to land a job? Not necessarily. In this article, we look at some of […]

A Guide to Node JS: What is it & What's It Used For?

Node.js is an open-source runtime environment that allows javascript code to be executed outside the browser. It is based on the V8 JavaScript engine of Google Chrome. If you are working on a web application project, you probably have come through the term Node.js. This fairly popular tool is mainly used for backend development and […]