Transaction: (Part 1: ACID, Read Committed, Read Skew, Snapshot Isolation)

Unraveling ACID, Isolation, and Serializability for Bulletproof Data Integrity

Aug 24, 2024

Introduction

In the world of data systems, where failures and unexpected issues are a harsh reality, transactions play a crucial role. Imagine a scenario where a database operation is interrupted—be it by hardware failure, software crash, or a network hiccup. Without transactions, you risk ending up with incomplete or inconsistent data, leading to potential chaos.

A transaction groups several database operations into a single, logical unit that either fully succeeds or fully fails. This all-or-nothing approach ensures that, even if something goes wrong mid-operation, the database remains consistent and reliable. Transactions simplify error handling and concurrency management, allowing applications to focus on their core functionality rather than on the myriad of things that can go wrong in a database environment.

By providing a safety net against partial failures, race conditions, and other concurrency issues, transactions make it easier for developers to build robust and reliable applications. However, they come with trade-offs, and understanding these is key to determining whether your application truly needs the guarantees that transactions offer.

ACID

In the world of databases, ensuring that transactions are processed safely and reliably is paramount. This is where the ACID properties—Atomicity, Consistency, Isolation, and Durability—come into play. These properties form the backbone of transaction safety, ensuring that your data remains accurate and resilient, even in the face of unexpected challenges.

Atomicity ensures that a transaction is all-or-nothing. Imagine you're performing several writes, but a failure occurs midway—perhaps a process crashes or a network connection drops. Without atomicity, you would be left in a state of uncertainty, with some changes applied and others not. Atomicity guarantees that if a transaction cannot be fully completed, none of its changes will be applied. It's as if the transaction never happened, protecting the integrity of your data.
Consistency is about maintaining a "good state" within the database, defined by your application. Every transaction should move the database from one valid state to another, preserving the rules you've set for your data. While atomicity, isolation, and durability are responsibilities of the database, consistency is something your application needs to define and enforce within its transactions.
Isolation addresses the challenges of concurrency. In a multi-user environment, transactions often occur simultaneously. Isolation ensures that these transactions don't interfere with each other, preserving the illusion that each transaction is operating in isolation. This prevents race conditions and ensures that the outcome of concurrent transactions is the same as if they were executed sequentially. However, true serializable isolation is rarely used due to performance costs, with databases often opting for weaker, but faster, isolation levels.
Durability is the promise that once a transaction has been committed, it won't be lost, even in the face of system crashes or hardware failures. Durability means that the changes are safely stored on non-volatile media like SSDs or replicated across multiple nodes in a distributed system. This ensures that your data remains intact and accessible, regardless of the challenges your database might face.

Together, these ACID properties ensure that your transactions are handled with the highest degree of safety and reliability, laying the groundwork for robust and trustworthy database operations.

Importance of ACID property on Single-Object and Multi-Object Operations

Single-Object operations

ACID properties are important when performing operations on a single object. Storage engines in databases are designed to ensure that even a single object, like a key-value pair in a key-value store, is updated atomically and in isolation from other operations. This means that if a crash occurs during the update of this object, mechanisms like write-ahead logging (WAL) or other forms of logging are employed to ensure that the operation can either be completed or rolled back without leaving the object in an inconsistent state. This level of atomicity and isolation at the single-object level is critical for maintaining the integrity of the data, even in the face of system failures.

Multi-Object Operations

When dealing with transactions in databases, the ACID properties (Atomicity, Consistency, Isolation, Durability) are often discussed in the context of operations that involve multiple objects—whether they are rows in a table, documents in a NoSQL database, or records in a key-value store. These multi-object transactions are essential in scenarios where multiple pieces of data must be kept in sync to maintain the integrity of the application. For instance, in a banking application, transferring money between accounts involves debiting one account and crediting another, which must happen together to avoid any inconsistencies.

In relational databases, multi-object transactions are typically managed by grouping read and write operations under the same transaction boundary. This boundary is often defined by a BEGIN TRANSACTION and a COMMIT statement within a client’s session or connection to the database server. Everything executed between these two commands is treated as part of a single transaction, ensuring that either all the changes are applied, or none are, thus maintaining atomicity and consistency.

Weak Isolation Levels

In database systems, weak isolation levels come into play when managing concurrency, which is the simultaneous execution of multiple transactions. Concurrency can introduce race conditions—situations where the outcome of a transaction depends on the timing of other transactions—making application development challenging. The complexity increases as multiple users interact with the data simultaneously, leading to potential data changes at any moment.

To simplify the development process, databases offer transaction isolation, a mechanism designed to hide concurrency issues from application developers. The goal is to allow developers to work as if transactions are executed sequentially, one after another, without overlapping—this is known as serializable isolation. However, achieving this level of isolation comes at a performance cost that many databases are reluctant to bear. As a result, weaker isolation levels are often employed. These weaker levels offer protection against some, but not all, concurrency issues.

Read Committed

One of the most basic forms of transaction isolation is the Read Committed level. It provides two essential guarantees:

No Dirty Reads: When a transaction reads data from the database, it will only see data that has been committed. This means that any uncommitted changes made by other transactions are invisible to the current transaction, thus preventing dirty reads.
No Dirty Writes: When writing data to the database, a transaction will only overwrite data that has already been committed. This ensures that one transaction cannot overwrite uncommitted changes made by another transaction, usually by delaying the second write until the first write’s transaction has committed or aborted, avoiding dirty writes.

To enforce these Read Committed guarantees, databases commonly use row-level locks. When a transaction wants to modify a specific row or document, it must first acquire a lock on that object. This lock is held until the transaction is either committed or aborted, ensuring that no other transaction can modify the same object concurrently. If another transaction attempts to write to the locked object, it must wait until the first transaction completes, maintaining data consistency.

However, preventing dirty reads through locking can cause significant delays in read operations, particularly if a long-running write transaction forces multiple read-only transactions to wait. To address this, many databases avoid using read locks. Instead, they maintain both the old committed value and the new uncommitted value during a transaction. While the transaction is ongoing, any read requests for that object are served the old value, ensuring that dirty reads do not occur. Once the transaction commits, the new value is made available for subsequent reads.

Snapshot Isolation

Snapshot Isolation (SI) is a powerful technique employed by modern databases to provide a consistent view of the data during concurrent transactions. Unlike weaker isolation levels like Read Committed, which can still allow concurrency anomalies such as non-repeatable reads or read skew, Snapshot Isolation guarantees that each transaction operates on a stable snapshot of the database. This snapshot reflects the state of the data as it was at the start of the transaction, regardless of any subsequent changes made by other transactions.

How Snapshot Isolation Works

At the core of Snapshot Isolation is the concept of Multi-Version Concurrency Control (MVCC). In MVCC, multiple versions of a database object are maintained simultaneously, allowing each transaction to view the version of the data that was committed at the moment the transaction began. This means that even if another transaction modifies the data during the execution of your transaction, you will continue to see the consistent, unaltered snapshot of the database as it existed when your transaction started.

The implementation of Snapshot Isolation typically involves assigning a unique transaction ID (txid) to each transaction when it begins. Every data modification (insert, update, delete) is tagged with the transaction ID of the transaction that made the change. When a transaction reads data, the database uses these transaction IDs to determine which versions of the data are visible to the transaction based on two key rules:

Commit Visibility: The data being read must have been committed by the time the reading transaction started.
Delete Visibility: If a row has been marked for deletion, the deletion must not have been committed before the reading transaction started.

These rules ensure that each transaction sees a consistent view of the data, as if it were the only transaction running in the system, thus preventing issues like read skew.

Benefits and Performance Considerations

One of the significant advantages of Snapshot Isolation is its ability to allow readers and writers to operate without blocking each other. This non-blocking behavior is particularly beneficial for long-running analytical queries that need to scan large portions of the database. Such queries can operate on a consistent snapshot without being affected by ongoing write operations, leading to better performance and reduced contention.

Practical Implementation

Many popular databases, such as PostgreSQL, implement Snapshot Isolation using MVCC. In PostgreSQL, when a transaction modifies a row, the row is not physically removed but marked with the transaction ID of the modifying transaction. This approach enables the database to maintain multiple versions of the row, each tagged with the transaction that created or deleted it. A background process eventually cleans up old versions once they are no longer visible to any active transactions, thereby reclaiming storage space.

Example:

With snapshot isolation:

Conclusion

In this part of the blog, we explored the fundamentals of database transactions, focusing on the ACID properties that ensure data reliability, even during complex operations involving single or multiple objects. We also delved into the limitations of weak isolation levels, particularly Read Committed, which, while useful, does not address issues like Read Skew. Snapshot Isolation was introduced as a stronger alternative, providing a consistent view of the database throughout a transaction, reducing the risk of anomalies in concurrent environments.

In Part 2, we will further explore advanced transaction concepts, including techniques for "Preventing Lost Updates," handling "Write Skew," and managing "Phantoms." These discussions will build on the foundation laid here, offering a deeper understanding of transaction management in modern databases. Stay tuned!

Copyright Notice

This content is freely available for academic and research purposes only. Redistribution, modification, and use in any form for commercial purposes is strictly prohibited without explicit written permission from the copyright holder.

For inquiries regarding permissions, please contact trnquiltrips@gmail.com

System Design

Discussion about this post