Friday, October 14, 2016

Vertical scaling vs Horizontal scaling: Which one is more challenging?

I think this topic is most relevant when you are choosing a database system for your application. Vertical scaling means scaling the database system by giving the host machine more hardware resources. Horizontal scaling, on the other hand, means scaling out the database system by adding more nodes. Either way, the objective is a better-performing database system that can serve users quickly.

I opened a discussion with my fellow DBAs and DB enthusiasts, and below are some of the views they expressed on the topic.

"I think horizontal scaling (by adding nodes/machines) is less challenging. Anyway, in vertical scaling you will hit a ceiling with CPU, memory, etc., depending on the capacity of the machines used." [Bhanu]

"Vertical scaling is less challenging to implement, as it just means increasing the resources of a given machine. In a virtual environment, VS becomes even easier to implement. However, as Bhanu pointed out, there is a limit to how far you can go with VS.
Horizontal scaling will require more expertise (hence it is costly), additional software and, in some cases, even additional hardware features. Therefore, HS will be more challenging to implement." [Shamil]

"Horizontal scaling is much more aligned with NoSQL technologies, whereas relational ones aren't.
Relational does this through partitioning, and some routing mechanism has to be implemented somewhere to cater for it.
My guess is that it depends on the technologies we are trying to implement." [Abi]

"I would say that, irrespective of the technology, vertical scaling is still less challenging to implement. For example, adding more RAM and processor power to a MongoDB box is much easier than implementing Mongo sharding.

On the other hand, even relational DB technologies support horizontal scaling. SQL Server has AlwaysOn and Oracle has Real Application Clusters. Both can do horizontal scaling." [Shamil]

My view

Regardless of the DB system, the core concept is to process small chunks of data to achieve better performance. In SQL Server, for example, we always try to work with small chunks of data when scanning, deleting, updating, inserting, and so on. That is because no matter how much you increase the processing power, the underlying algorithms are not scalable. As a result, you are forced to deal with small chunks of data to achieve better performance. Adding more cores does not remove this mathematical limitation of the algorithms, and thread management brings additional overhead.
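
To make the "small chunks" idea concrete, here is a minimal sketch of deleting old rows in batches instead of in one large statement. It is an illustration only: it uses Python's built-in SQLite module as a stand-in for a server database, and the `events` table, the `created` column and the `delete_in_batches` helper are hypothetical names I made up for the example. In SQL Server the same pattern is usually written as a loop around `DELETE TOP (n)`.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` a batch at a time; return rows deleted."""
    total = 0
    while True:
        # Each statement touches at most `batch_size` rows, so every
        # transaction stays short and holds its locks only briefly.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Hypothetical demo data: 5000 rows with created = 0..4999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany(
    "INSERT INTO events (created) VALUES (?)", [(i,) for i in range(5000)]
)
conn.commit()

deleted = delete_in_batches(conn, cutoff=3500, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The batch size is the tuning knob here: smaller batches mean shorter transactions but more round trips.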

Relational DB systems are designed to handle large monolithic databases on a single server. That is why we have databases of many TBs in relational systems, and why we face many limitations and challenges with them.

SQL Server has some horizontal scaling solutions, such as table partitioning and replicas. Table partitioning still operates within a single machine: you can spread IO operations across partitions, but only within that machine. You cannot distribute partitions among many nodes, so the limitation remains. (There is one exception in the newly released SQL Server 2016: a feature called Stretch Database lets you keep the frequently used data of a table on-premises and the less frequently used data in the cloud.)

Replicas in SQL Server can be distributed across many nodes. However, replicas are read-only, so you can scale out application reads, but writes are still not scalable. In AlwaysOn, only the primary node accepts writes. Consequently, SQL Server does not provide true horizontal scaling, because of the limitations mentioned above.
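
The read/write split described above can be sketched in a few lines. This is a toy router, not how AlwaysOn itself routes connections: the `ReplicaRouter` class, the node names and the "anything that is not a SELECT is a write" rule are all assumptions made for illustration.

```python
import itertools


class ReplicaRouter:
    """Send writes to the single primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if there are no replicas.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, statement):
        # Simplifying assumption: a statement is a read only if it
        # starts with SELECT; everything else is treated as a write.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReplicaRouter("node-a", ["node-b", "node-c"])
targets = [
    router.route(statement)
    for statement in [
        "SELECT * FROM orders",
        "SELECT * FROM customers",
        "UPDATE orders SET status = 'shipped'",
        "INSERT INTO orders VALUES (1)",
    ]
]
```

Adding more replicas gives the reads more places to go, but every write still lands on `node-a`, which is the bottleneck I am describing.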

In true horizontal scaling, it should be possible to partition the data and assign the partitions to different nodes/machines. This technique is called sharding. Database systems like MongoDB, Cassandra and DynamoDB all provide this capability. In these systems, the data can be partitioned and the partitions placed on many nodes, which gives horizontal scaling backed by additional CPU and memory. In other words, CPU and memory can be scaled out to other nodes.
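
As a rough sketch of what sharding means at its simplest, the snippet below hashes each key and uses the result to pick a node. It is a bare-bones illustration under my own assumptions (the `shard_for` helper, the node names and the keys are hypothetical), and real systems such as MongoDB, Cassandra and DynamoDB use more elaborate schemes than a plain modulo.

```python
import hashlib


def shard_for(key, nodes):
    """Pick the node responsible for `key` using a stable hash."""
    # A cryptographic digest is used only because it is stable across
    # processes, unlike Python's built-in hash() for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


NODES = ["node-0", "node-1", "node-2"]
KEYS = ["user:%d" % i for i in range(12)]

# Group the keys by the node that owns them.
placement = {}
for key in KEYS:
    placement.setdefault(shard_for(key, NODES), []).append(key)
```

One known weakness of plain modulo placement is that changing the number of nodes remaps most keys, which is one reason distributed databases prefer schemes such as consistent hashing.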

What is special about Cassandra and DynamoDB is their masterless architecture, which means any node of the cluster can accept reads as well as writes. So there is no single hot spot for writes, as there is in SQL Server. However, this architecture has numerous challenges, because the data is fully distributed among many nodes using some kind of hashing technique. You can imagine how complex it is to handle concurrency-related issues when your data is distributed, and the complexity increases when any node can accept a write. These challenges exist simply because those DB systems are designed for horizontal scaling.
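
Here is a toy model of that masterless design, to show where the complexity comes from. It is a sketch under heavy assumptions and not how Cassandra or DynamoDB are actually implemented: the `Ring` class, the replication factor of 3, the explicit `version` numbers and the last-write-wins rule are all choices I made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy masterless cluster: any node coordinates, hashing finds owners."""

    def __init__(self, node_names, replication_factor=3):
        self.rf = min(replication_factor, len(node_names))
        self.tokens = sorted((_hash(name), name) for name in node_names)
        self.stores = {name: {} for name in node_names}

    def owners(self, key):
        # Walk clockwise from the key's position and take `rf` nodes.
        start = bisect.bisect(self.tokens, (_hash(key), ""))
        count = len(self.tokens)
        return [self.tokens[(start + i) % count][1] for i in range(self.rf)]

    def write(self, coordinator, key, value, version):
        if coordinator not in self.stores:
            raise ValueError("unknown node: " + coordinator)
        for node in self.owners(key):
            current = self.stores[node].get(key)
            # Last-write-wins: an older version never overwrites a newer one.
            if current is None or version > current[0]:
                self.stores[node][key] = (version, value)

    def read(self, coordinator, key):
        if coordinator not in self.stores:
            raise ValueError("unknown node: " + coordinator)
        found = [
            self.stores[node][key]
            for node in self.owners(key)
            if key in self.stores[node]
        ]
        return max(found, key=lambda pair: pair[0])[1] if found else None


ring = Ring(["n1", "n2", "n3", "n4", "n5"])
ring.write("n1", "order:42", "pending", version=1)
ring.write("n4", "order:42", "shipped", version=2)
ring.write("n2", "order:42", "stale", version=1)  # late, older write
latest = ring.read("n5", "order:42")
copies = sum(1 for store in ring.stores.values() if "order:42" in store)
```

Even this toy has to decide which write wins when two nodes accept updates for the same key. A real system also has to cope with nodes being down, clocks disagreeing and replicas drifting apart, none of which arises on a single machine.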

So, in my opinion, horizontal scaling is more challenging than vertical scaling. In fact, vertical scaling poses no distinctive challenge.

Hope this is an interesting discussion.

Cheers!

7 comments:

  1. Great content, and I love the collection of views.

    I think horizontal and vertical scaling are both equally challenging.
    In fact, I believe horizontal scaling is the solution (challenging in itself) that we found for the practical limit of scaling up on a single machine, and I believe that limit is itself a great challenge.

  2. I think I need to clarify what "challenging" means in this context: it means the complexity of the database system software itself. Adding more memory, CPU and storage to a single machine does not change the way concurrency, memory management and storage management are handled, because the data is not distributed. The data stays within the scope and control of a single machine. However, partitioning the same database among many nodes adds a lot of complexity to concurrency handling and to managing the data in a globally distributed architecture. For example, in a distributed system, let's assume that an insert can be accepted by any node and is then replicated to two other nodes. The next time an update is issued for the same record, the system has to locate the original entry using the hashing algorithm, and the update again needs to be replicated to two other nodes. I think this is more complex than working with a database that resides on a single machine.

  3. I agree.

    However, as you mentioned, beefing up a single machine doesn't scale (in many aspects: concurrency, etc.) as well as horizontal scaling would, which is exactly my point. I see that as a great challenge (which may or may not have been solved yet).

    To elaborate further: if we consider similar workloads on the two rivals, vertical scaling will face practical ceilings as the load goes up. We would have to think outside the box to solve that (I believe all problems can be solved one way or another; as mentioned in my previous comment, that is how we came up with horizontal scaling), and maybe there are other ways.

    So there is no doubt that horizontal scaling is challenging, yet a great way to handle many problems. But if we consider the definition of challenge, I would vote for vertical scaling, since it is about managing what we have to achieve our goals.

  4. This comment has been removed by the author.

    Replies
    1. One quick advantage of vertical scaling over horizontal scaling is that it can be done without code changes, and it simplifies debugging.

      I believe that the choice between horizontal and vertical scaling depends on the requirements and other factors (like the ones you have discussed here). For some applications distributing the data is acceptable, and for others it is not. Both sides have pros and cons, and we need to make the right decision at the right time.

      Cheers!

  5. From: Dinesh Asanka
    Horizontal scaling is the future, I think. With the emergence of Hadoop and NoSQL technologies, horizontal scaling will grow. Cloud is another concept that is increasingly making its mark in the technology arena now. With horizontal scaling, you can leverage the benefits of the cloud much more.
    Vertical scaling is more classical and kind of boring :)

  6. Yep, agreed that horizontal scaling is the way of the future, and it has its own challenges/limitations.
    On a side note to what Susantha said about SQL Server not being a true write-scaling technology with its HADR solution: there is also a read limitation on the passive replica nodes. For example, assume a two-node synchronous replica: a transaction commits on the primary only after it is hardened on the secondary. Hardening a transaction requires the data to be loaded into memory before it is written to disk. This makes the passive node less effective as a true read-only replica, because its data cache is consumed (even polluted).
