Database: Replication (Part 1: Single-Leader Architectures and Their Challenges)
In this first part of our series on database replication, we dive into the mechanics of single-leader replication, exploring its strategies for managing consistency, failover, and replication lag.
Introduction
Replication means keeping a copy of the same data on multiple machines that are connected via a network. It is a fundamental technique used in database systems to enhance various aspects of data management and service delivery. There are several key reasons for implementing replication in a database system:
Geographic Proximity: By replicating data across multiple locations, you can keep data geographically close to your users, thereby reducing latency and improving user experience.
Fault Tolerance: Replication ensures that your system can continue to operate even if some of its parts fail, thus increasing the overall availability of your service.
Read Scalability: By spreading data across multiple machines, replication allows you to scale out the number of machines that can serve read queries, thereby increasing read throughput and performance.
In this blog, we will explore how replication works, the challenges involved — especially when dealing with changes to replicated data — and how different replication strategies address those challenges. For simplicity, we will assume that your dataset is small enough that each machine can hold a copy of the entire dataset. If the data you're replicating does not change over time, replication is straightforward: simply copy the data to every node once, and the job is done. The real challenge lies in managing changes to replicated data, ensuring consistency, and maintaining performance.
Single-Leader Replication
The leader-follower architecture is a common replication strategy used to manage the complexities of data synchronization across multiple nodes. A leader is a node that holds the authoritative copy of the data and is responsible for handling all write operations. When a client needs to update the data, the write request is sent to the leader. The leader performs the update and then propagates the changes to its followers. This process is known as leader-based replication. Followers are nodes that replicate the data from the leader. They primarily handle read operations, which offloads read load from the leader and improves overall system performance and scalability.
Many modern databases and distributed message brokers use leader-based replication to manage data consistency and improve performance. Here are some prominent examples:
Databases: MySQL, PostgreSQL (streaming replication), MongoDB (replica sets), Elasticsearch, and Apache HBase.
Message Brokers: Apache Kafka, RabbitMQ, and Apache Pulsar.
Methods of Passing Data Changes
The mechanism by which the leader passes data changes to followers can vary based on the database system and the type of replication (synchronous or asynchronous).
Log-Based Replication: The leader writes all changes to a Write-Ahead Log (WAL) before applying them to the database. This log is then used as the source of truth for replication. Followers read the WAL to determine what changes need to be applied to their local copy of the data.
Statement-Based Replication: The leader sends the actual SQL statements or commands that modified the data to the followers. If a statement involves non-deterministic functions (e.g., RAND(), NOW()), it may yield different results on each follower. It may also produce different side effects if the statement interacts with external systems or contains conditional logic based on data that differs between the leader and its followers.
Row-Based Replication: The leader sends the actual data rows that were changed, rather than the SQL statements. This may transfer a larger data volume than log-based replication.
Logical Replication: Involves sending a higher-level representation of changes, such as logical change records (such as INSERT, UPDATE, DELETE operations, and their associated data), to followers.
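The divergence risk of statement-based replication can be made concrete with a small sketch. The snippet below is purely illustrative (not any real database's replication code): re-executing a statement with a non-deterministic function such as RAND() on each replica produces different values, while shipping the resulting row converges.

```python
import random

def apply_statement(db, seed):
    # Statement-based: each replica re-executes the statement, so a
    # non-deterministic function like RAND() can evaluate differently.
    rng = random.Random(seed)
    db["token"] = rng.random()

def apply_row(db, row):
    # Row-based: the follower applies the exact values the leader wrote.
    db.update(row)

leader, follower = {}, {}
apply_statement(leader, seed=1)
apply_statement(follower, seed=2)   # different evaluation on the follower
assert leader["token"] != follower["token"]  # replicas diverge

row_change = dict(leader)           # leader ships the actual row instead
apply_row(follower, row_change)
assert leader == follower           # row-based replication converges
```

Real databases avoid this either by switching to row-based logging for non-deterministic statements (as MySQL's MIXED binlog format does) or by replacing the non-deterministic call with its evaluated result before shipping the statement.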
Synchronous vs. Asynchronous Replication
Synchronous Replication
In synchronous replication, data changes made to the leader (primary) node are immediately propagated to the follower (secondary) nodes. The leader waits for confirmation from all followers that they have successfully received and written the data before acknowledging the write operation as complete to the client. This ensures that all replicas are always consistent with the leader. In case of a leader failure, followers are ready to take over immediately, as they are always up-to-date. Synchronous replication is used where data consistency is crucial, such as in banking transactions.
Drawbacks:
Higher Latency: The write operation is not complete until all followers acknowledge the write, which can increase latency.
Potential for Bottlenecks: If one follower is slow or unreachable, it can delay the entire replication process
In the context of synchronous replication, there is often a misconception that all followers must be synchronous with the leader. In practice, a hybrid (often called semi-synchronous) approach is common, where only a subset of followers is synchronous while the others replicate asynchronously. Examples of systems using hybrid replication include MySQL (semi-synchronous replication), MongoDB (configurable write concerns), and CockroachDB (Raft consensus).
Asynchronous Replication
In asynchronous replication, the leader node does not wait for followers to confirm that they have received and written the data before acknowledging the write operation as complete. Instead, the leader sends the data to the followers and immediately confirms the write to the client. Followers update their data at their own pace, which may lead to temporary inconsistencies. Temporary network issues or slow followers do not affect write performance. Asynchronous replication is used where eventual consistency is sufficient, such as in content delivery networks (CDNs).
Drawbacks:
Delayed Failover: If the leader fails, followers may not have the latest data, which can complicate failover processes.
Data Loss Risk: In case of a leader failure before data is replicated to followers, there is a risk of data loss.
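The trade-off between the two modes can be sketched in a few lines. This is a toy model with made-up class names, not a real database API: the synchronous write replicates before acknowledging, while the asynchronous write acknowledges first, which opens a data-loss window if the leader fails before propagation.

```python
class Follower:
    def __init__(self):
        self.log = []
    def replicate(self, record):
        self.log.append(record)

class Leader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers

    def write_sync(self, record):
        # Synchronous: replicate to every follower before acking the client.
        self.log.append(record)
        for f in self.followers:
            f.replicate(record)
        return "ack"

    def write_async(self, record):
        # Asynchronous: ack immediately; replication happens later. Here it
        # never happens, modeling a leader crash before propagation.
        self.log.append(record)
        return "ack"

followers = [Follower(), Follower()]
leader = Leader(followers)

leader.write_sync("tx1")
assert all(f.log == ["tx1"] for f in followers)    # no data-loss window

leader.write_async("tx2")                          # leader "crashes" here
assert all("tx2" not in f.log for f in followers)  # acked write is lost
```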
Set Up a New Follower
Setting up a new follower in a database replication system involves several steps to ensure that the new node is fully synchronized with the current leader and ready to handle read requests or act as a backup in case of failover.
Provisioning: Setting up the necessary hardware or virtual machine resources (CPU, memory, storage, network) to run the new follower. Installing the appropriate database software on the new node to ensure compatibility with the leader and other followers.
Snapshot Copy: A snapshot of the current data on the leader is created and transferred to the new follower (Taking a backup of the leader and restoring it on the follower).
Replaying the Write-Ahead Log (WAL): To ensure that the new follower is fully up-to-date with any transactions that occurred during the “snapshot copy” stage, the write-ahead log (WAL) or transaction log is replayed.
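The snapshot-then-replay procedure can be sketched as follows. This is a simplified model under the assumption that the WAL is a list of key-value writes indexed by position; the snapshot records the WAL position it covers, and the new follower replays everything after that position.

```python
# Leader's WAL: writes keep arriving while the snapshot is being taken.
leader_wal = [("SET", "a", 1), ("SET", "b", 2), ("SET", "a", 3)]

def take_snapshot(wal, upto):
    # Materialize the state covered by the first `upto` WAL entries.
    state = {}
    for op, key, value in wal[:upto]:
        state[key] = value
    return state, upto            # data plus the WAL position it covers

def bootstrap_follower(wal, snapshot, position):
    # Restore the snapshot, then replay writes made since it was taken.
    state = dict(snapshot)
    for op, key, value in wal[position:]:
        state[key] = value
    return state

snapshot, pos = take_snapshot(leader_wal, upto=2)
follower = bootstrap_follower(leader_wal, snapshot, pos)
assert follower == {"a": 3, "b": 2}   # fully caught up with the leader
```

The key detail is that the snapshot must be associated with an exact position in the WAL (PostgreSQL calls this the log sequence number; MySQL uses binlog coordinates), so the follower knows precisely where replay should begin.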
Node Outages
In distributed systems, node outages are inevitable due to unexpected faults or planned maintenance activities. Whether it's a server crash or a routine reboot for security updates, the ability to handle node outages without affecting system availability is crucial for maintaining seamless operations. In this section, we'll explore how leader-based replication architectures handle node outages to ensure high availability and minimize disruption.
Follower Failure
Followers can go offline for various reasons, such as crashes or network interruptions. To manage follower failures, systems employ a "catch-up recovery" mechanism. Each follower maintains a log of data changes received from the leader. This log records the last transaction processed before the outage occurred. When a follower is back online, it uses its log to determine where it left off. It then requests any missing data changes from the leader. The follower applies the pending changes to its local dataset, ensuring it is synchronized with the leader. Once caught up, the follower resumes receiving live updates from the leader and continues its role in the replication architecture.
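Catch-up recovery can be sketched with a follower that tracks the offset of the last change it applied and, on reconnect, requests only what it missed. The class and field names are illustrative assumptions, not a real system's API.

```python
class CatchUpFollower:
    def __init__(self):
        self.state = {}
        self.last_offset = 0      # position of the next change to apply

    def apply(self, changes, start_offset):
        for i, (key, value) in enumerate(changes, start=start_offset):
            self.state[key] = value
            self.last_offset = i + 1

leader_log = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]

f = CatchUpFollower()
f.apply(leader_log[:2], start_offset=0)   # follower goes offline here
assert f.last_offset == 2

missing = leader_log[f.last_offset:]      # request the changes it missed
f.apply(missing, start_offset=f.last_offset)
assert f.state == {"x": 3, "y": 2, "z": 4}
assert f.last_offset == len(leader_log)   # caught up; resumes live updates
```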
Leader Failure
Handling a leader failure is more complex due to the leader's central role in processing write requests. To address this, systems implement a "failover" process, which can be either manual or automatic. Systems continuously monitor the leader's health through heartbeat messages exchanged between nodes. If the leader doesn't respond within a predefined timeout period (e.g., 30 seconds), it is considered down. Upon detecting a leader failure, a new leader is elected among the followers. This can be achieved through a consensus protocol, where the follower with the most up-to-date data is typically chosen to minimize data loss. Once a new leader is elected, the system must redirect write requests to the new leader. Additionally, the old leader, if it comes back online, needs to recognize the new leader and step down to avoid conflicts.
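The two core steps of failover — detecting the dead leader and electing the most up-to-date follower — can be sketched like this. Real systems use consensus protocols rather than a single comparison, so treat this as a simplified illustration of the selection criterion, not an election algorithm.

```python
def detect_failure(last_heartbeat, now, timeout=30):
    # Leader is presumed down if it has been silent longer than the timeout.
    return (now - last_heartbeat) > timeout

def elect_new_leader(followers):
    # followers: {node_name: last_applied_log_position}. Promoting the
    # follower with the highest position minimizes lost writes.
    return max(followers, key=followers.get)

assert detect_failure(last_heartbeat=100, now=140)      # 40s of silence
assert not detect_failure(last_heartbeat=100, now=120)  # still healthy

followers = {"follower-1": 1041, "follower-2": 1057, "follower-3": 998}
assert elect_new_leader(followers) == "follower-2"      # most up-to-date
```

A subtlety this sketch omits: if the timeout is too short, a merely slow leader can be wrongly declared dead, leading to "split brain" where two nodes believe they are the leader — which is exactly why production systems wrap election in a consensus protocol.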
Replication Lag
Replication lag is the delay between a write being applied on the leader and the same write becoming visible on a follower. It occurs for several reasons:
High network latency between the leader and its followers.
Limited network bandwidth.
Followers handling a high volume of read operations or other processing tasks, leaving them less capacity to apply updates from the leader.
Large transactions or bulk updates on the leader, producing a rapid influx of data that followers struggle to keep up with.
Temporary network partitions or connectivity issues.
Problems with Replication Lag
Consistency models define the rules for the visibility and ordering of data updates in distributed systems. Replication lag, which delays the propagation of updates from the leader to the followers, can significantly impact these models. Here's a closer look at how different consistency models are affected.
Read-Your-Writes Consistency: guarantees that after a client performs a write, any subsequent reads by that client will reflect that write.
Impact of Replication Lag: A client may not immediately see their own updates if they read from a lagging follower node. Users may observe inconsistencies if their session spans multiple nodes with different replication lags.
Solutions
Leader-Directed Reads: Direct reads to the leader node for operations requiring read-your-writes consistency, ensuring that clients see their recent updates immediately.
Session Stickiness: Maintain session affinity by directing a client’s operations to the same node, which can provide a consistent view of their writes.
Bounded Staleness: Implement bounded staleness guarantees, where reads are guaranteed to be only a certain amount of time behind the latest writes, providing a trade-off between consistency and performance.
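The leader-directed-reads solution can be sketched as a small routing rule: if this client wrote within the last lag-bound window, read from the leader; otherwise any follower will do. All names here (Router, pick_replica) are illustrative assumptions.

```python
class Router:
    def __init__(self, lag_bound=1.0):
        self.lag_bound = lag_bound    # assumed upper bound on follower lag
        self.last_write = {}          # client_id -> timestamp of last write

    def record_write(self, client_id, now):
        self.last_write[client_id] = now

    def pick_replica(self, client_id, now):
        wrote_at = self.last_write.get(client_id)
        if wrote_at is not None and now - wrote_at < self.lag_bound:
            return "leader"           # follower may not have the write yet
        return "follower"

r = Router(lag_bound=1.0)
assert r.pick_replica("alice", now=10.0) == "follower"  # no recent write
r.record_write("alice", now=10.0)
assert r.pick_replica("alice", now=10.5) == "leader"    # inside the window
assert r.pick_replica("alice", now=12.0) == "follower"  # lag has caught up
```

This only guards reads against the client's own recent writes; it says nothing about other clients' writes, which is exactly the scope of the read-your-writes guarantee.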
Monotonic Reads Consistency: ensures that if a client has seen a particular version of a data item, any subsequent reads will return the same version or a more recent version. It prevents a client from observing "time travel" effects where older data is seen after newer data.
Impact of Replication Lag: Monotonic reads consistency can be violated if a client reads an older version of data after reading a newer version due to replication lag.
Solutions
Session Affinity: Route a client’s read requests to the same replica or set of replicas to maintain a monotonic view of data.
Consistent Hashing: Use consistent hashing to ensure that a client’s requests are directed to a consistent set of nodes, minimizing the risk of non-monotonic reads.
Version Checks: Implement version checks that detect and correct out-of-order reads, ensuring that clients do not see older data after newer data.
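The version-check approach can be sketched on the client side: the client remembers the highest version it has seen and rejects any replica response older than that, so it never observes "time travel". The class name and retry convention are illustrative.

```python
class MonotonicClient:
    def __init__(self):
        self.highest_seen = 0     # highest data version observed so far

    def read(self, replica_version, replica_value):
        if replica_version < self.highest_seen:
            return None           # stale replica: caller should retry elsewhere
        self.highest_seen = replica_version
        return replica_value

c = MonotonicClient()
assert c.read(5, "v5") == "v5"    # first read returns version 5
assert c.read(3, "v3") is None    # a lagging replica would "time travel"
assert c.read(6, "v6") == "v6"    # equal-or-newer versions are fine
```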
Causal Consistency: ensures that operations that are causally related are seen by all clients in the same order. If an operation A causally precedes another operation B, all clients will see A before B.
Impact of Replication Lag: Clients may see different causal orders due to varying degrees of replication lag across nodes. Causal consistency requires maintaining the causal order of operations, which can be disrupted by replication lag.
Solutions
Causal Tracking Mechanisms: Implement mechanisms such as version vectors or dependency tracking to maintain causal relationships. These mechanisms allow nodes to track dependencies and ensure that operations are applied in the correct causal order
Causal Consistency Protocols: Use protocols specifically designed for causal consistency, such as Chain Replication or COPS, which are optimized for maintaining causal order despite replication lag.
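Dependency tracking can be sketched with a minimal version-vector scheme (an illustration of the idea, not the COPS protocol itself): a replica buffers an incoming write until every write it causally depends on has been applied locally.

```python
def dominates(a, b):
    # True if vector clock `a` has seen at least everything `b` has.
    return all(a.get(k, 0) >= v for k, v in b.items())

class Replica:
    def __init__(self):
        self.clock = {}       # highest write count applied per origin node
        self.applied = []
        self.pending = []

    def deliver(self, op, deps):
        # op = (origin_node, write_count); deps = vector clock of the
        # writes this op causally depends on.
        self.pending.append((op, deps))
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                op, deps = item
                if dominates(self.clock, deps):
                    node, count = op
                    self.clock[node] = max(self.clock.get(node, 0), count)
                    self.applied.append(op)
                    self.pending.remove(item)
                    progress = True

r = Replica()
# B's write depends on A's write, but arrives first because of lag.
r.deliver(op=("B", 1), deps={"A": 1})
assert r.applied == []                     # held back: dependency missing
r.deliver(op=("A", 1), deps={})
assert r.applied == [("A", 1), ("B", 1)]   # causal order preserved
```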
Eventual Consistency: guarantees that if no new updates are made to a data item, eventually all accesses to that item will return the last updated value. It allows for temporary inconsistencies but ensures convergence over time.
Impact of Replication Lag: Eventual consistency inherently allows for temporary inconsistencies due to replication lag, with the expectation that all nodes will eventually converge. Clients may observe stale data while waiting for updates to propagate across replicas.
Solutions
Anti-Entropy Mechanisms: Periodic background processes reconcile differences between replicas to ensure eventual consistency.
Conflict Resolution: Techniques like last-write-wins, version vectors, or application-specific logic are used to resolve conflicts.
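Last-write-wins convergence can be sketched as a merge function over (timestamp, value) pairs: merging always keeps the newer write, so replicas reach the same state regardless of the order in which updates arrive. (LWW is simple but lossy — concurrent writes with the older timestamp are silently discarded.)

```python
def lww_merge(local, incoming):
    # Each replica stores key -> (timestamp, value); keep the newer write.
    merged = dict(local)
    for key, (ts, value) in incoming.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

r1 = {"color": (10, "red")}
r2 = {"color": (12, "blue")}          # later write on another replica

# Merging in either order yields the same result: the replicas converge.
assert lww_merge(r1, r2) == {"color": (12, "blue")}
assert lww_merge(r2, r1) == {"color": (12, "blue")}
```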
Conclusion
Replication stands as a cornerstone in the architecture of resilient, scalable, and efficient database systems. Through the exploration of single-leader replication, we've delved into the mechanisms that ensure data consistency, manage node outages, and handle the inevitable replication lag that accompanies distributed systems. Understanding these dynamics is crucial for architects and engineers aiming to design systems that not only perform well but also gracefully handle failures and maintain data integrity.
However, single-leader replication is just one approach in the diverse landscape of replication strategies. In scenarios demanding higher write throughput, reduced latency, or greater fault tolerance, alternative methods like multi-leader replication and leaderless replication offer compelling advantages.
Stay Tuned for Part 2
In the second part of this blog series, we will delve deeper into Multi-Leader Replication and Leaderless Replication. We'll explore how these models address the challenges faced by single-leader systems and the trade-offs they introduce. Whether you're dealing with cross-data center replication, require flexible write operations, or seek to understand the nuances of eventual consistency, the next installment promises valuable insights into these advanced replication strategies.
Copyright Notice
© 2024 trnquiltrips. All rights reserved.
This content is freely available for academic and research purposes only. Redistribution, modification, and use in any form for commercial purposes is strictly prohibited without explicit written permission from the copyright holder.
For inquiries regarding permissions, please contact trnquiltrips@gmail.com

