Databases, in combination with a Database Management System (DBMS), allow persistent storage, easy administration, and safe manipulation of information and are therefore among the most important computer applications. Data in a database is structured according to one of the multiple sets of rules. Proposed in 1970 by E.F. Codd, the relational data model is the foundation of the most used DBMS. As a general rule, it is therefore recommended to use a relational database unless the form of data or access patterns is better suited for other forms of databases.
Deciding how to break up data into multiple tables and then establishing the relationships between these tables is called data modelling. The data model shows the tables and their relationships.
Entities are abstractions of real-world objects and feature properties, which are called attributes. As an example, employees are an entity with the attributes employee ID, name, department, and internal telephone number. Department would be another entity with the attributes department ID, department name, and number of employees.
Relationships connect entities with each other, usually through a predicate verb, e.g. <employee name> works in <department name> or <department name> produces <product>. For each side of a relationship, it is specified how many such relationships the entity can have. This is called the cardinality of the relationship. There are four possible cardinalities:
- one-to-one relationship: each employee works in one department
- one-to-many relationship: each department produces multiple products
- many-to-one relationship: many employees work in the same department
- many-to-many relationship: many customers can order the same products
What is a Relational Database?
In a relational database, entities are listed in tables. Each table shows the connections between the entities’ attributes. The table’s schema defines which attributes each record in the table must include. Each attribute is listed in a column, and each realization of the entity is displayed in a row, which is also often called a record. Attributes have a specific data type, like string for name and department, and integer for internal phone number and EmployeeID. A table for the employees from the above example looks like this:
Each record in the table is a tuple, e.g. (2011, Sandra Keller, IT, 45). If there is no data available for an attribute or if an attribute is not applicable for a record (like pregnancy leave for male employees), the special value Null is entered. The attribute that allows the unique identification of a tuple is called the primary key. In the above example, the employeeID is the primary key. The primary key can’t be Null and each table can have only one primary key.
Normalization is the process of optimizing the organization of data in a relational database. It aims to protect the data from corruption and inconsistent dependency, make the database more flexible and speed up data access by eliminating redundancy, and reduce unnecessary disk space usage. The rules for database normalization are called Normal Forms. Although there are 7 Normal Forms, the first three Normal Forms are considered the highest level necessary for most applications. Steps to normalize data include eliminating repeating groups in individual tables, creating separate tables for each set of related data and identifying these sets with a primary key, creating separate tables for sets of values that apply to multiple records and relating these tables with a foreign key, and eliminating fields that do not depend on the key.
What is a Relational Database Management System?
A relational database management system (RDBMS) is used to maintain relational databases. It allows the user to perform CRUD (create, read, update, and delete) operations, as well as relational operations based on relational algebra. Examples of relational operations are
- The selection operation σ which returns all tuples from a relation that meets a specific criterion, e.g. all employees from the IT department.
- The projection operation π which returns only the specified attributes from one or more tuples, e.g. name and department.
To query the database, most RDBMS use SQL (Structured Query Language).
Create, modify or delete a relation
To create our employee table, the command create table is used together with the attributes, datatype and (optional) constraints.
CREATE TABLE Employees ( EmployeeID integer primary key, Name char(50) not null, Department char(30), internal telephone number integer(4) );
- alter table: allows the addition of attributes to a table, but not the removal
- drop table: deletes the relation
Create custom views
To create views for user groups or users by only providing relevant data from the database, the command create view is used. The following code snippet creates a view called salesEmployees and returns the names and internal phone numbers for members of the Sales department in the Employees table.
CREATE INDEX EmployeeName ON Employees(Name);
Insert, update, and delete records
To insert a complete tuple (or record) into the employees’ table:
INSERT INTO Employees values(2274, 'Thomas Hupe', 'IT', 20);
To insert an incomplete tuple:
INSERT INTO Employees (EmployeeID, Name, Department) values(2275, 'Michelle Gunderson', 'Sales');
To update a tuple:
UPDATE Employees SET Department = 'IT' WHERE EmployeeID = 2275;
To delete a specific tuple and with boolean operators:
DELETE FROM Employees WHERE Name = 'Thomas Hupe' AND Department = 'IT';
To delete all vertical entities from a relation:
DELETE FROM Employees WHERE Department = 'HR';
Top Relational Database Management Systems
Here is a brief overview of popular RDBMSs:
- Two licensing models: free Community Server and proprietary Enterprise server
- Supports both SQL and JSON data
- Focus on robustness, stability, and maturity
- Proprietary, enterprise-grade RDBMS
- Supports SQL, JSON, XML, Spatial Data, RDF Store, and offers Blockchain Tables
- Suited for Data Warehousing
- Most advanced Open Source RDBMS with many extensions
- Supports SQL, JSON, XML, Key-Value, and Spatial Data
- Offers a sophisticated query planner and advanced reliability and disaster recovery
Microsoft SQL Server
- Proprietary RDBMS with free and commercial licences
- Supports SQL, JSON, and Spatial Data
- Great tooling support
- Most used database engine in the word
- Stable, cross-plattform, and backward compatible file format
- Full-featured SQL implementation with advanced capabilities, zero-configuration, easy to use API and small code footprint.
Scott Böning, Code Institute Graduate
Experience Software Development