
DDIA-Transactions

Introduction

In the harsh reality of data systems, many things can go wrong:

  • The database software or hardware may fail at any time (including in the middle of a write operation).
  • The application may crash at any time (including halfway through a series of operations).
  • Interruptions in the network can unexpectedly cut off the application from the database, or one database node from another.
  • Several clients may write to the database at the same time, overwriting each other’s changes.
  • A client may read data that doesn’t make sense because it has only partially been updated.
  • Race conditions between clients can cause surprising bugs.

In order to be reliable, a system has to deal with these faults and ensure that they don’t cause catastrophic failure of the entire system. However, implementing fault-tolerance mechanisms is a lot of work. It requires a lot of careful thinking about all the things that can go wrong, and a lot of testing to ensure that the solution actually works.

For decades, transactions have been the mechanism of choice for simplifying these issues. A transaction is a way for an application to group several reads and writes together into a logical unit. Conceptually, all the reads and writes in a transaction are executed as one operation: either the entire transaction succeeds (commit) or it fails (abort, rollback). If it fails, the application can safely retry. With transactions, error handling becomes much simpler for an application, because it doesn’t need to worry about partial failure—i.e., the case where some operations succeed and some fail (for whatever reason).
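As a concrete sketch of this commit/abort/retry pattern, here is a minimal example using Python's built-in sqlite3 module; the accounts schema and the retry policy are illustrative assumptions, not anything prescribed by the book:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

def transfer(conn, src, dst, amount, retries=3):
    """Group two writes into one atomic unit: both happen, or neither does."""
    for attempt in range(retries):
        try:
            with conn:  # opens a transaction; commits on success, rolls back on error
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                             (amount, src))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                             (amount, dst))
            return  # committed
        except sqlite3.OperationalError:
            continue  # aborted (e.g. the database was locked) -- safe to retry
    raise RuntimeError("transfer failed after retries")

transfer(conn, 1, 2, 30)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 70), (2, 30)]
```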

Transactions are not a law of nature; they were created with a purpose, namely to simplify the programming model for applications accessing a database. By using transactions, the application is free to ignore certain potential error scenarios and concurrency issues, because the database takes care of them instead (we call these safety guarantees).

Isolation levels and race conditions

Transactions are an abstraction layer that allows an application to pretend that certain concurrency problems and certain kinds of hardware and software faults don’t exist. A large class of errors is reduced down to a simple transaction abort, and the application just needs to try again.

In this chapter, we went particularly deep into the topic of concurrency control. We discussed several widely used isolation levels, in particular:

  • read committed
  • snapshot isolation (sometimes called repeatable read)
  • serializable

We characterized those isolation levels by discussing various examples of race conditions:

Dirty reads (reading uncommitted data)

One client reads another client’s writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.
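SQLite happens to make this easy to observe: a reader connection never sees another connection's uncommitted write. A runnable sketch (the temp file and WAL pragma are just setup details for this demo):

```python
import os
import sqlite3
import tempfile

# WAL mode needs a file-backed database, so use a throwaway temp file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # manage transactions by hand
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE balance (v INTEGER)")
writer.execute("INSERT INTO balance VALUES (100)")

reader = sqlite3.connect(path)

writer.execute("BEGIN")
writer.execute("UPDATE balance SET v = 999")  # written, but NOT yet committed

print(reader.execute("SELECT v FROM balance").fetchone())  # (100,) -- no dirty read
writer.execute("COMMIT")
print(reader.execute("SELECT v FROM balance").fetchone())  # (999,) -- now visible
```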

Dirty writes

One client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.
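The book illustrates this with a used-car sale where two clients each update two objects (the listing and the invoice). A hand-interleaved simulation, with plain Python dicts standing in for the database, shows how unprotected writes can leave the two objects disagreeing:

```python
listing, invoice = {}, {}

# Alice and Bob both try to buy the same car. Without transactions, the four
# writes interleave freely; with dirty-write prevention, Bob's transaction
# would instead wait for (or abort on) Alice's uncommitted writes.
listing["buyer"] = "alice"   # Alice's write
listing["buyer"] = "bob"     # Bob's write wins on this object
invoice["buyer"] = "bob"     # Bob's write
invoice["buyer"] = "alice"   # Alice's write wins on this object

print(listing, invoice)  # sold to bob, but invoiced to alice -- inconsistent
```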

Read skew (also known as nonrepeatable reads)

A client sees different parts of the database at different points in time. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time. It is usually implemented with multi-version concurrency control (MVCC).
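A toy sketch of the MVCC idea: each write creates a new version tagged with a commit ID, and a transaction reads only versions committed before its snapshot was taken. Every name here is invented for illustration; real implementations also track in-progress transactions, handle aborts, and garbage-collect old versions:

```python
class Store:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_id, value), oldest first
        self.next_id = 1

    def begin(self):
        snapshot = self.next_id  # the transaction sees only earlier commits
        self.next_id += 1
        return snapshot

    def write(self, key, value):
        commit_id = self.next_id  # writes commit immediately, for brevity
        self.next_id += 1
        self.versions.setdefault(key, []).append((commit_id, value))

    def read(self, snapshot, key):
        # Newest version committed before our snapshot was taken.
        visible = [v for cid, v in self.versions.get(key, []) if cid < snapshot]
        return visible[-1] if visible else None

db = Store()
db.write("x", 1)
t1 = db.begin()          # snapshot taken here
db.write("x", 2)         # a later commit must stay invisible to t1
print(db.read(t1, "x"))  # 1 -- t1 keeps reading from its consistent snapshot
```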

Lost updates

Two clients concurrently perform a read-modify-write cycle. One overwrites the other’s write without incorporating its changes, so data is lost. Raising the isolation level to repeatable read lets some databases detect the conflict and abort the offending transaction automatically; explicit optimistic or pessimistic locking is another common way to prevent lost updates.
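Two of the prevention techniques just mentioned, sketched with sqlite3 (the counter schema is an illustrative assumption): an atomic write that lets the database do the read-modify-write itself, and optimistic compare-and-set, which retries when it detects that someone else wrote in between:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")

# 1) Atomic write: the database performs the read-modify-write in one
#    statement, so concurrent increments cannot overwrite each other.
conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'hits'")

# 2) Optimistic compare-and-set: apply the write only if the value is still
#    what we read; otherwise retry the whole read-modify-write cycle.
def cas_increment(conn):
    while True:
        (old,) = conn.execute(
            "SELECT value FROM counters WHERE name = 'hits'").fetchone()
        cur = conn.execute(
            "UPDATE counters SET value = ? WHERE name = 'hits' AND value = ?",
            (old + 1, old))
        if cur.rowcount == 1:
            return  # nobody changed it under us; the write was applied
        # rowcount == 0 means a concurrent update won; loop and try again

cas_increment(conn)
print(conn.execute("SELECT value FROM counters").fetchone())  # (2,)
```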

Write skew

A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.
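The book's on-call doctors example, hand-interleaved in plain Python: both transactions check the same invariant against their own snapshot, then write to different rows, so neither dirty-write nor lost-update detection ever fires:

```python
on_call = {"alice": True, "bob": True}

# Both transactions begin and read from the same consistent snapshot:
snapshot_a = dict(on_call)
snapshot_b = dict(on_call)

# Each checks the premise "at least two doctors are on call" and then
# takes a different doctor off the rota:
if sum(snapshot_a.values()) >= 2:
    on_call["alice"] = False  # transaction A's write
if sum(snapshot_b.values()) >= 2:
    on_call["bob"] = False    # transaction B's write

print(on_call)  # {'alice': False, 'bob': False} -- nobody is on call anymore
```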

Phantom reads

A transaction reads objects that match some search condition. Another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.
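A sketch of the meeting-room-booking flavor of this problem: the row that transaction A's search is based on does not exist yet, so there is nothing for an ordinary row lock to attach to, which is why index-range locks (locking the searched-for range itself) are needed:

```python
bookings = [("room1", "14:00")]

def conflicts(room, time):
    return [b for b in bookings if b == (room, time)]

# Transaction A checks whether the slot is free ...
premise = conflicts("room2", "15:00")  # [] -- looks free

# ... transaction B books the same slot in the meantime ...
bookings.append(("room2", "15:00"))

# ... and A, acting on its stale premise, double-books the room.
if not premise:
    bookings.append(("room2", "15:00"))

print(bookings)  # ("room2", "15:00") appears twice: the phantom struck
```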

Weak isolation levels protect against some of those anomalies but leave you, the application developer, to handle others manually (e.g., using explicit locking). Only serializable isolation protects against all of these issues.

Three different approaches to implementing serializable transactions:

1. Literally executing transactions in a serial order

If you can make each transaction very fast to execute, and the transaction throughput is low enough to process on a single CPU core, this is a simple and effective option.
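A sketch of what this looks like structurally: every transaction is a function funneled through a single worker thread, so transactions are trivially serializable. The queue-based design here is an illustrative assumption (systems like VoltDB achieve the same effect with stored procedures):

```python
import queue
import threading

store = {}
requests = queue.Queue()

def worker():
    while True:
        txn, done = requests.get()
        txn(store)  # runs to completion before the next transaction starts
        done.set()

threading.Thread(target=worker, daemon=True).start()

def run_serially(txn):
    """Submit a transaction and block until the single worker has run it."""
    done = threading.Event()
    requests.put((txn, done))
    done.wait()

run_serially(lambda db: db.update(x=1))
run_serially(lambda db: db.update(y=db["x"] + 1))
print(store)  # {'x': 1, 'y': 2}
```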

2. Two-phase locking (2PL)

For decades this has been the standard way of implementing serializability, but many applications avoid using it because of its performance characteristics.
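A toy sketch of the two phases that give the technique its name: locks are acquired as the transaction touches objects (growing phase) and released only at commit (shrinking phase). Shared/exclusive lock modes and deadlock detection, which real 2PL needs, are omitted; all names are invented for illustration:

```python
import threading

class LockManager:
    def __init__(self):
        self.locks = {}  # key -> threading.Lock
        self.guard = threading.Lock()

    def lock_for(self, key):
        with self.guard:
            return self.locks.setdefault(key, threading.Lock())

class Transaction:
    def __init__(self, mgr, store):
        self.mgr, self.store, self.held = mgr, store, []

    def write(self, key, value):
        lock = self.mgr.lock_for(key)
        lock.acquire()  # growing phase: take locks as we go, never release early
        self.held.append(lock)
        self.store[key] = value

    def commit(self):
        for lock in self.held:  # shrinking phase: release everything at the end
            lock.release()
        self.held.clear()

mgr, store = LockManager(), {}
t = Transaction(mgr, store)
t.write("x", 1)
t.write("y", 2)  # holds both locks until commit, blocking any conflicting writer
t.commit()
print(store)  # {'x': 1, 'y': 2}
```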

3. Serializable snapshot isolation (SSI)

A fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable.
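A deliberately simplified sketch of the optimistic shape of SSI: transactions run without blocking while recording what they read, and at commit time the database checks whether anything in the read set was overwritten by a concurrent commit, aborting if so. Real SSI tracks more precise dependency information; everything here is invented for illustration:

```python
class Database:
    def __init__(self):
        self.data = {}
        self.last_writer = {}  # key -> sequence number of the latest commit
        self.commit_seq = 0

class Txn:
    def __init__(self, db):
        self.db = db
        self.start_seq = db.commit_seq
        self.reads, self.writes = set(), {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.db.data.get(key))

    def write(self, key, value):
        self.writes[key] = value  # buffered locally; nothing ever blocks

    def commit(self):
        # Validation: abort if any key we read was committed after we started.
        for key in self.reads:
            if self.db.last_writer.get(key, 0) > self.start_seq:
                raise RuntimeError("abort: execution was not serializable")
        self.db.commit_seq += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.last_writer[key] = self.db.commit_seq

db = Database()
t1, t2 = Txn(db), Txn(db)                    # two concurrent transactions
t1.read("x"); t1.write("x", 1); t1.commit()  # commits cleanly
t2.read("x"); t2.write("x", 2)
t2.commit()  # raises: t2's read of "x" was invalidated by t1's commit
```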

The examples in this chapter used a relational data model. However, as discussed in “The need for multi-object transactions”, transactions are a valuable database feature, no matter which data model is used.

In this chapter, we explored ideas and algorithms mostly in the context of a database running on a single machine. Transactions in distributed databases open a new set of difficult challenges, which we’ll discuss in the next two chapters.