When a system grows it’s data also grows with it requiring the database to be scaled in order to handle loads of data. The two broad approaches for scaling a database is vertical scaling and horizontal scaling. Same as in scaling an application server vertical database scaling refers to allocating more resources to the database server and horizontal scaling refers to techniques like database replication and database sharding
However as in every system design it all depends on the context. Before blindly jumping into replication or sharding there might be other possible optimization you could do to scale your database to support the load
Query Optimizations & Connection Pools
Effectively these things are should be there from the beginning if you are planning to cater to a large amount of requests with low latency. But it never hurts to pay the first attention to these basics and make sure things are in order.
- Proper indexing
- Add redundant columns (denormalize data) to reduce join queries
- Use connection pooling and monitor/tweak connection pool capacity
Vertical Scaling
Vertical scaling refers to adding more resources to the database server to improve it’s performance and support the load. This can be either CPU/RAM or Disk depending on the requirement.
Pros
- Easier to do since there is zero configuration for the database. Simply increase the CPU/RAM/Disk for the server instance
- Powerful instances are already available via major cloud providers (Ex: Amazon RDS with 24TB of RAM)
Note
Stackoverflow supported 10 million MAU with a single master database
Cons
- Single point of failure
- Limited by the possible maxed-out configuration for the server
- Cost
Database Replication
Database replication can be handy when you want to improve read performance for database queries. In most of the cases read
operations are more frequent than write
operations in the database. Replication allows us to distribute read
operations between multiple instances thus reducing the load on the source
node and improving read
throughput.
To setup replication
- Configure two or more database servers having the same schema
- Configure the DBMS to use one of the nodes as the
source
and others asreplica
nodes.
replica
nodes will receive copies of data from the source
node and will only support read
operations. All write
operations should be directed to the source
node.
Pros
- Improved performance as now we can perform multiple read operations in parallel from each
replica
node - Improved reliability as multiple copies of data are available in case of a failure in one node
- High Availability can be achieved by having replicas in multiple locations
Cons
- Single
source
configuration will not improve write performance significantly - If the
source
server fails promoting areplica
node as the newsource
could require some manual work
Horizontal Scaling
References:
- A Guide To Understanding Database Scaling Patterns - FreeCodeCamp
- How to set up replication in MySQL - DigitalOcean
- System Design Interview – An insider’s guide, Second Edition - Alex Xu