9 minute read

The components of system design

When designing any system you will realise that it will require to store some data. That is where a databases comes in. Choosing a database for a job can either make-or-break your system.

Introduction

A database is a collection of data stored in a specific way so at to allow efficient storage and retrieval of data.

A Database Management System(DBMS) is a software that interacts with a database providing an interface for creating, modifying and querying a database. Examples of DBMS include SQL Server, MySQL, PostgreSQL, Oracle and MongoDB.

Types of Database Management System(DBMS)

There are hundreds of databases which have different properties and are optimized for special functionalities. It’s difficult to group the databases, however, when we look at the structure imposed on the data we can come up with two main groups: relational and non-relational databases.

Relational Databases

Relational databases are also known as Relational Database Management System(RDBMS) or SQL databases. They organize and store data in tabular format of rows and columns, making it easy to understand how data relates to one another. The most important ones are MySQL, SQL Server, PostgreSQL and Oracle.

Benefits of Relational Databases

i. Query capabilities - It provides a powerful query language using SQL to allow easy data retrieval and data manipulation.

ii. Simplicity - The main benefit of relational databases is provides an intuitive way to store data and allows an easy access to related data points. It’s free from complex structuring. This is why it’s commonly used by organizations which handle large amounts of structured data, from tracking products to handling transactions.

iii. Normalization - Relational databases employs normalization to reduce data redundancy.

iv. Data Security - It provides multiple levels of security, including user-level to system-level security which protects data stored in database.

v. ACID Compliance - Relational databases supports ACID compliance. ACID is a concept that describe four properties of a transaction of a database system which ensure accuracy and integrity of data. A transaction is a group of operations that are executed as a single unit of work. An example of a transaction is when transferring money between bank accounts. Money must be debited from one account and credited to another. The key property of the transaction are:

  • Atomicity - When performing a transaction, all the operations are performed or none of them are. Atomicity ensures that if debit is made successfully from one account, the corresponding credit is made in the other account.

  • Consistency - Consistency ensures that the transactions maintain data integrity, leaving the data consistent and correct. Data constraints must be followed. For example we can have a constraint in the amount column that the data cannot be negative. If the transaction leaves data in invalid state, the transaction will be aborted and an error is reported.

  • Isolation - Isolation means that the intermediate state of transaction is invisible to other transactions until a commit is done(concurrent control). For example Account A has $1,000 and there are two transactions happening at the same time. Transaction A wants to transfer $1000 to another account and transaction B wants to transact $200 to another account. If these two transaction were allowed, they would leave the account in an invalid state(-$200). To prevent this, a database should only allow only one transaction on an account at a time. The transactions should be done sequentially and put in some sort of a queue.

  • Durability - Durability ensures that when a transaction is completes successfully, changes will persist in non-volatile memory even in the event of system failure.

Limitations of Relational Databases

i. Expensive - The main drawback of relational databases is increased costs. Relational databases are costly to setup and maintain. It requires performance hardware and a lot of storage which is costly. The initial cost of software can be pricey. It gets worse when you factor in the cost hiring skilled personnel to manage the relational databases.

ii. Scalability - Relational databases may struggle to handle very large amounts of data or a high volume of concurrent transactions. Relational databases are typically designed to run on a single server, you can only scale vertically. Vertical scaling is limited to processors and memory upgrades. To scale relational database effectively it has to be distributed on to multiple servers. Handling tables across different servers is difficult.

iii. Flexibility - Relational databases are designed to store structured data, making it difficult to handle unstructured or semi-structured data such as text or images. They require a schema which makes development difficult if you don’t know the data-shape upfront.

iv. Performance - Relational databases may have performance issues when executing complex queries or when dealing with a high number of joins.

v. Limited Indexing - Relational databases typically have a limited indexing capability, which can lead to poor query performance for large datasets.

Non-Relational Databases

Non-relational databases are also known as non-relational database management system or NoSQL(not only SQL). They are non-tabular. There are four major types of NoSQL databases: document, key-value, wide columns and graph databases.

  • Document Databases - Stores data in a document. A document is a container for key-value pairs. The document is stored in JSON or XML format. The are unstructured and don’t require a schema. The documents are grouped together into collections. The collections can be organized into hierarchy allowing you to model and retrieve relational data to a significant degree without joins. They are mainly suited for Online games, IOT, mobile apps etc. The most popular ones are Firestore, DynamoDB, CouchDB and MongoDB.

  • Key-Value databases - Store data in a format where each item contains keys and values. They store data in a hash table where each key is unique just like in dictionaries and the value can be JSON, string, BLOB(Binary Large Object) etc. Key-Value data store mainly store data in memory and not the disk. This limits the data you can store but makes it extremely fast.They are mainly used for data caching.A good example is Redis or memcached.

  • Wide-Column database - A wide-column database, also known as a column-family database, is a type of NoSQL database that organizes data into rows and dynamic columns. It just like key-value databases but it adds columns in the values. Individual rows then constitute a column family. This allows for more flexible and efficient storage of large amounts of unstructured data, as well as faster querying and retrieval of specific data subsets. Column Dbs are ideal when your app is write heavy with little updates. A popular use case is for storing time-series data such as records for an IOT device. Examples of wide-column databases include Apache Cassandra, Hbase and Amazon SimpleDB.