Back to top

Relational vs No-SQL

Quick Video Glance

Develop Intuition

The comparison between RDS and NoSQL data-store can be clearly understood using a real-life analogy of living in downtown vs. living in the suburbs. Due to the scarcity of land in the city, the development is often done vertically in the form of skyscrapers, whereas, in the suburbs, there is an abundance of land, so new constructions are done on land (aka horizontal growth). Same as downtown (i.e. vertical growth), relational databases are generally scaled up by adding more computational resources to the existing hardware (which is costly and soon hits the threshold), whereas, NoSQL stores can be scaled out by simply adding more commodity hardwares which are comparatively a lot cheaper.

Fig 0: Relational vs NoSQL - City vs Suburb Landscape Analogy

Evolution of Databases

Database Management Systems (aka database) are software applications which are used to provide data to other applications. In the early days, data used to be stored in a flat-file system which could not store relationships between different data entities. The evolution of databases started with the advent of Hierarchical Data-Model in the 1960s. In this model, the data was stored as records in a tree-like structure where each record can have only one parent node, which resulted in this data-model being inflexible. In 1969, the research on Network Data-Model got published by the Conference on Data Systems Languages (CODASYL). This model structured the data in a graph-form where each record can have multiple parents. This data-model failed to become dominant due to two reasons: a) IBM chose to stick with the hierarchical model with semi-network extensions ii) relational model became more popular around the same time (after E. F. Codd introduced the concept in 1970s). The Relational Data-Model stored the data as relations (collection of tables) and provided relational operators for data manipulation. Object-Oriented-Databases first appeared around 1985 which allowed information to be stored and manipulated as objects. However, object-oriented databases didn’t get enough traction as a relational model provided lesser coupling between application and databases.

The data needs grew in the 21st century resulting in the further evolution of databases with the center of the focus on catering to the different flavors of usage patterns. This led to the rise of NoSQL, which in most cases is more flexible and scalable than relational databases. In the rest of this chapter, we will gain deeper insights into Relational and NoSQL databases. The image below shows the evolution which databases have had so far after its inception.

_Fig 1: Journey of DBMS so far_

Relational Databases

Relational databases have had a long period of dominance and have been the default choice for enterprise data storage, even though there have been alternative database technologies such as object databases since the 1990s. Some of the popular relational databases are: Oracle, MySQL, Postgres and so forth. These databases use structured query language (SQL) for defining and manipulating data and structure of the data stored in them is determined using pre-defined schemas. This implies that all the data needs to follow the same structure and any change in the structure may be disruptive.

Relational databases provide multiple advantages due to ACID properties, which stands for Atomicity, Consistency, Isolation, and Durability. The acronym ACID was coined by Andreas Reuter and Theo Härder building on earlier research by Jim Gray. In the context of databases, a sequence of operations that satisfies the ACID properties is called a transaction. Let’s try to understand what these individual properties signify.

  • Atomicity: It guarantees that each transaction is treated as a single “unit,” which either succeeds completely or fails completely. In the scenario any of the statements constituting a transaction fails to complete, the entire transaction fails, and the database is left unmodified.
  • Consistency: This property ensures that any transaction on the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
  • Isolation: It ensures that the concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
  • Durability: This property guarantees that the transaction is committed even in case of power outages. It implies that transactions are recorded in non-volatile memory.

Limitations of Relational Databases

Nevertheless, they have had some major issues from their early days, some of them are listed below.

  • Impedance Mismatch: It refers to the different representations of the relational model and in-memory databases which require translations. The issue lies with the foundation of relational databases which are: i) tuple or row, which is a set of name-value pairs and ii) relation or table, which is a set of tuples. The values in a relational tuple have to unaffected - they can’t contain any structure, such as nested record or list. Therefore, a richer memory data structure had to be translated into multiple tables, as shown in Fig 2.

Fig 2: An order data structure is split into relations and tuples

The frustration with impedance mismatch has been minimized to a great extent by using object relational mapping frameworks - such as Hibernate and Mybatis. However, the mapping problem with relational databases still remains unsolved.

  • Running on clusters: With the advent of the Internet, there is an apparent increase in the demand for data storage. In order to combat this increased demand, we have two choices: Scaling Up and Scaling Out. Given that scaling up requires bigger machines and more significant bills, the option we had was to scale out by using a lot of small machines in a cluster using commodity hardware. Often, relational databases which run on clusters have a single point of failure which make them inefficient in running on large clusters.

NoSQL Databases

The term NoSQL database was first coined by Carlo Strozzi in the late 90s when he created an open source relational database: Strozzi NoSQL. The term NoSQL was derived from the fact that the database got manipulated using shell scripts rather than SQL as a query language. However, the usage of the term NoSQL kicked off when Johan Oskarsson used it as a Twitter hashtag for a meetup on June 11, 2009, in San Francisco. The emergence of NoSQL databases is based on the massive data storage needs of the early 21st century which also led to the rise of polyglot persistence - using different data stores in different circumstances based on the understanding of the nature of data. NoSQL databases share some common characteristics which are listed below.

  • Running on Clusters: Most NoSQL databases are efficient at running on large clusters. However, some NoSQL databases such as Graph database aren’t strongly geared towards running on clusters but offers a range of different data models which are ideal for dealing with complicated relationships.
  • Schema-less: These databases generally operate without a schema, allowing you to freely add fields to database records without having the need to change the data-storage structure.
  • Open Source: Often, NoSQL databases are open-source projects. Nevertheless, NoSQL has been frequently applied to closed-source systems.

Comparison Chart

We have provided comparison between relational and NoSQL data stores on some important aspects below.

Sl. No.

Description

Relational Database

NoSQL Database

Schema

Follow a pre-defined schema for structured data

Follow a dynamic schema for unstructured data

Scalability

Scale Up – Increase hardware configurations such as memory, CPU and so forth.

Scale Out – Adding more commodity hardware

Query Language

Use structured query language (SQL)

Aggregate oriented NoSQL databases don’t use standard query language. Graph based NoSQL stores do use query language for traversal (e.g. Gremlin)

Storing Hierarchical Data

Not fit for storing hierarchical data

Good for storing hierarchical data as it stores data in a key-value pair way

Handling complex transactions

Best fit for handling complex transactions spanning over different data entities as it guarantees atomicity

Doesn’t support transactions spanning over multiple aggregates

Support for complex queries

Supports queries spanning over multiple data entities using joins and foreign keys

Complex queries need to be implemented in the application logic

Conclusion

The growth in NoSQL development is not only the result of big data running on clusters, but also, it’s the age-old frustration with impedance mismatch problem of relational databases. The primary reasons for considering NoSQL are i) handling significant data access that requires a cluster ii) enhancing application development productivity by using more adequate data access patterns. However, relational databases are still the choice for data which isn’t changing frequently and has a lesser hierarchical depth.

References

  1. https://martinfowler.com/books/nosql.html
  2. https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
  3. https://en.wikipedia.org/wiki/Hierarchical_database_model
  4. https://en.wikipedia.org/wiki/Network_model
  5. https://en.wikipedia.org/wiki/Relational_database
  6. https://en.wikipedia.org/wiki/ACID