Lesson 17-18 原子性与隔离(事务处理)
在上一节中,我们讨论了如何利用“可靠性工程”防止系统彻底失效。然而,仅仅“不崩溃”是不够的。在多用户并发、硬件随时可能断电的现实环境中,我们还需要保证数据的一致性。
想象一个简单的银行转账场景:A 转账给 B。这包含两个动作:A-100 和 B+100。
- 如果执行完
A-100后系统断电了,钱没了,这是原子性 问题。 - 如果 A 给 B 转账的同时,C 也在给 A 转账,两个线程乱序读写导致余额计算错误,这是隔离性 问题。
为了解决这些问题,系统设计中最伟大的抽象之一诞生了:事务。
1. 事务处理模型 (ACID)
事务是一个抽象层,它向应用层隐藏了内部的复杂性(如并发竞争、磁盘写入延迟、奔溃恢复),提供了一个干净的编程接口。
- Atomicity (原子性):All-or-Nothing。要么全部成功,要么仿佛什么都没发生。这主要应对故障。
- Consistency (一致性):事务必须使系统从一个有效状态变为另一个有效状态(例如,转账前后总金额守恒)。这是应用层的目标。
- Isolation (隔离性):并发执行的多个事务,其结果应当与某种串行执行的结果相同。这主要应对并发。
- Durability (持久性):一旦提交,数据就是永恒的。
核心矛盾:我们试图在不可靠的组件(可能随时崩溃的内存、缓慢的磁盘)上,构建出一个逻辑上完美串行且永不丢失数据的系统。
2. 原子性的实现演进
如何保证“崩了也要正确”?核心在于如何处理中间状态。
2.1 方案一:影子副本 (Shadow Copy)
这是最直观的“计算机思维”解法。
- 做法:修改文件时,不直接改原文件,而是复制一份副本。在副本上修改完成后,通过原子的
rename操作,将指针瞬间指向新文件。 - 优点:实现简单,利用了文件系统的原子性元操作。
- 缺点:性能极差。改一个字节也要复制整个 1GB 的文件,通常不可接受。
2.2 方案二:预写日志 (WAL - Write Ahead Logging)
这是现代数据库的标准解法。系统不再复制整个文件,而是记录“变更历史”。
架构演变:
- 单元存储 (Cell Storage):数据库中的记录(即变量 Variable)最终需要落在磁盘的某个位置。
- 日志 (Log):在修改 Cell 之前,必须先将“我要修改 X,旧值是 A,新值是 B”这条记录追加写入到磁盘的 Log 文件中。
- 缓存 (Cache):为了性能,我们不能每次修改 Cell 都刷盘。我们引入内存 Cache。
关键流程:
- 写操作:
- 生成日志条目,并强制落盘(确保 Durability)。
- 在内存 Cache 中更新数据。
- 此时事务即视为提交(Commit)。注意,此时最新的数据在 Log(磁盘)和 Cache(内存)中,而不在数据库原本的存储位置(磁盘 Cell)中。
- 刷盘 (也称 检查点Checkpoint):系统会在后台偶尔将 Cache 中的脏数据刷新到磁盘 Cell 中。
崩溃恢复 (Recovery):
当系统崩溃重启时,内存 Cache 丢失,磁盘 Cell 中的数据可能是旧的。系统必须扫描 Log 进行重放:
- Redo (重做):对于 Log 中显示“已 Commit”但 Cell 中还是旧值的事务,重新执行写入。
- Undo (回滚):对于 Log 中显示“已开始”但“未 Commit”的事务(即崩溃打断了事务),执行逆向操作,清除其影响。
结论:日志即真理。数据库文件只是日志的一个滞后快照。
3. 隔离性的实现:两阶段锁 (2PL)
如果说原子性是对抗时间(断电),隔离性就是对抗空间(并发)。
3.1 理论基石:可串行化
并发系统正确性的黄金标准是可串行化。
- 定义:如果一个并发调度的结果,等于这些事务按某种顺序依次串行执行的结果,那么这个调度就是正确的。
- 冲突:当两个操作访问同一数据且至少有一个是写操作时,就发生了冲突。
- 冲突可串行化:如果在冲突图中没有环,则该调度是可串行化的。
3.2 实现机制:两阶段锁 (2PL)
为了保证调度不产生冲突环,我们引入锁机制。但普通的加锁不够,必须遵循两阶段协议:
- 增长阶段:事务可以获得锁,但不能释放锁。
- 缩减阶段:事务可以释放锁,但决不能再申请新的锁。
关键点:
- 严格 2PL:为了防止级联回滚,通常要求写锁必须在事务 Commit 之后才能释放。
- 读写锁优化:读锁(共享)和写锁(互斥)分离,提高并发度。
- 代价:死锁 (Deadlock)。不同事务互相持有对方需要的锁,导致死循环。系统必须有机制检测死锁(如超时或构建等待图)并强制回滚其中一个事务。
4. 对照
Lesson 17-18 Atomicity and Isolation: Transaction Processing
In the previous session, we discussed how to use “Reliability Engineering” to prevent total system failure. However, merely “not crashing” is not enough. In a reality filled with concurrent users and potential hardware power failures, we also need to guarantee data Consistency.
Imagine a simple bank transfer scenario: A transfers to B. This involves two actions: A-100 and B+100.
- If the power fails after
A-100, the money disappears. This is an Atomicity problem. - If A transfers to B while C transfers to A, and threads interleave causing incorrect balance calculations, this is an Isolation problem.
To solve these problems, one of the greatest abstractions in system design was born: Transactions.
1. Transaction Processing Model (ACID)
A transaction is an abstraction layer that hides internal complexity (such as concurrency competition, disk write latency, and crash recovery) from the application, providing a clean programming interface.
- Atomicity: All-or-Nothing. Either everything succeeds, or it is as if nothing happened. This primarily deals with Faults.
- Consistency: A transaction must transition the system from one valid state to another (e.g., conservation of total money before and after a transfer). This is the application’s goal.
- Isolation: The result of executing multiple transactions concurrently should be the same as if they were executed serially in some order. This primarily deals with Concurrency.
- Durability: Once committed, the data is eternal.
Core Conflict: We are trying to build a system that is logically perfectly serial and never loses data, on top of unreliable components (volatile memory, slow disks).
2. Evolution of Atomicity Implementation
How do we guarantee “correctness even after a crash”? The core lies in how we handle Intermediate States.
2.1 Solution 1: Shadow Copy
This is the most intuitive “computer science” solution.
- Method: When modifying a file, do not modify the original directly. Instead, make a copy. After finishing modifications on the copy, use an atomic
renameoperation to instantly switch the pointer to the new file. - Pros: Simple implementation; utilizes atomic primitives of the file system.
- Cons: Terrible performance. Changing a single byte requires copying an entire 1GB file. Usually unacceptable.
2.2 Solution 2: Write Ahead Logging (WAL)
This is the standard solution for modern databases. The system no longer copies the entire file but records the “Change History.”
Architecture Evolution:
- Cell Storage: Records in the database (i.e., Variables) ultimately need to reside in a specific location on the disk.
- Log: Before modifying a Cell, a record saying “I am going to modify X, Old Value: A, New Value: B” must be appended to the Log file on the disk.
- Cache: For performance, we cannot flush to disk every time a Cell is modified. We introduce an in-memory Cache.
Key Protocol:
- Write Operation:
- Generate a log entry and Force Write to Disk (ensuring Durability).
- Update the data in the memory Cache.
- At this point, the transaction is considered Committed. Note that the latest data exists in the Log (Disk) and Cache (Memory), but not yet in the database’s original storage location (Disk Cell).
- Flush (Checkpoint): A background process occasionally flushes dirty data from the Cache to the Disk Cell.
Crash Recovery:
When the system restarts after a crash, the memory Cache is lost, and data in the Disk Cell might be stale. The system must scan the Log and replay it:
- Redo: For transactions marked as “Committed” in the Log but where the Cell still holds the old value, execute the write again.
- Undo: For transactions marked as “Started” but “Not Committed” in the Log (meaning the crash interrupted the transaction), execute reverse operations to clear their effects.
Conclusion: The Log is the Truth. The database file is just a lagging snapshot of the log.
3. Implementation of Isolation: Two-Phase Locking (2PL)
If Atomicity fights Time (power loss), Isolation fights Space (concurrency).
3.1 Theoretical Foundation: Serializability
The gold standard for correctness in concurrent systems is Serializability.
- Definition: If the result of a concurrent schedule is equal to the result of these transactions executed serially in some order, then the schedule is correct.
- Conflict: A conflict occurs when two operations access the same data, and at least one of them is a Write.
- Conflict Serializability: If there are no cycles in the conflict graph, the schedule is conflict-serializable.
3.2 Implementation Mechanism: Two-Phase Locking (2PL)
To ensure the schedule produces no conflict cycles, we introduce locking. However, simple locking is not enough; the Two-Phase Protocol must be followed:
- Growing Phase: The transaction can acquire locks but cannot release them.
- Shrinking Phase: The transaction can release locks but must never acquire new locks.
Key Points:
- Strict 2PL: To prevent cascading rollbacks, it is usually required that Write locks must be released only after the transaction Commits.
- Reader/Writer Lock Optimization: Separating Read locks (Shared) and Write locks (Exclusive) improves concurrency.
- The Cost: Deadlock. Different transactions hold locks needed by each other, causing an infinite loop. The system must have mechanisms to detect deadlocks (like timeouts or wait-for graphs) and force a rollback on one of the transactions.