Lesson 08 虚拟链路(并发与锁)
在并发的世界里,唯一可以确定的就是不确定性。我们将链路抽象为虚拟链路,但在实现多任务通信(如本地 RPC)时存在一个棘手的问题:如果多个线程同时读写同一块内存区域(缓冲区),数据就会乱套。
这一讲的核心在于协调,我们将看到系统如何从理想化的“正确性假设”一步步走向现实,并引入锁这一关键机制来对抗竞态条件(Race Condition)。
1. 6大正确性假设
在初级版本的虚拟链路设计中,我们默认系统是完美的,并基于以下 6 个假设编写代码:
- 单线程收发:每个缓冲区只有一个写者和一个读者(单一写原则)。
- 独占处理器:每个线程都有自己独立的物理处理器,不涉及复杂的抢占。
- 不溢出:
in和out指针的增加不会溢出,算法逻辑永远正确。 - 读写连贯性 (Coherence):一旦数据写入内存,后续的读操作能立刻看到最新值。
- 时间先后原子性 (Atomicity/Isolation):即使是跨越多个机器字长的数据读写,也是不可分割的(不会读到一半的新数据和一半的旧数据)。
- 指令不重排:编译器和 CPU 不会为了优化性能而交换指令的执行顺序。
2. 竞态条件:当假设破灭
一旦去掉这些假设(尤其是假设 1 和 5),系统就会陷入竞态条件。
- Heisenbug:这类并发错误难以检测,因为它们只在特定的时序下出现,一旦你加入调试代码,时序改变,错误可能就消失了。
- “1 vs 5”的噩梦:如果数据的长度超过了机器字长(例如 64 位系统写 128 位数据),在没有原子性保证的情况下,读者可能在写者更新了一半时就读取了数据,导致数据损坏。
3. 锁 (Lock):强制建立原子性
为了对抗竞态条件,我们引入了锁。它是一种人为制造的时间先后原子性。
- 策略与颗粒度:
- 粗粒度锁:一个锁保护所有资源(如 Python 的 GIL)。简单安全,但并发性差,本质上是将并行变成了串行。
- 细粒度锁:多个锁保护不同的变量。并发性高,但复杂度飙升,极易引发死锁。
- 死锁与消除:
- 成因:循环等待(A 等 B,B 等 A)。
- 方案:静态方案(全局统一加锁顺序)或动态方案(维护等待图 Wait-for Graph,检测到回路则回滚)。
4. 锁的底层实现:自举
锁不能靠简单的 if 语句实现,它需要硬件和软件的共同配合。
- 硬件锁 (RSM 指令):
- 核心是 Read-Set-Modify (RSM)。硬件通过总线仲裁确保一条指令能同时完成读和写,且在执行期间抢占总线,不让其他处理器插足。
- 常见形式:
test-and-set,compare_and_swap (CAS)。
- 软件锁 (Lamport 方案):
- 在没有 RSM 指令时,依靠标志位 (flag) 和轮转位 (turn) 实现。前提是必须保证读写连贯且指令不重排。
- 高级技巧:RCU:
- Linux 内核常用。读操作完全不加锁,写操作通过创建副本并原子替换指针。
5. 对照
Lesson 08 Virtual Link(Concurrency and Locks)
In the world of concurrency, the only certainty is uncertainty. We abstract links into virtual links, but a thorny problem arises when implementing multi-task communication (such as local RPC): if multiple threads read from and write to the same memory area (buffer) simultaneously, the data becomes a mess.
The core of this lesson is coordination. We will see how systems transition step-by-step from idealized “correctness assumptions” to reality, and how the Lock is introduced as a key mechanism to combat Race Conditions.
1. The 6 Correctness Assumptions
In the primary design of virtual links, we assume the system is perfect and write code based on the following six assumptions:
- Single-threaded Transmission/Reception: Each buffer has only one writer and one reader (the Single Writer Principle).
- Dedicated Processors: Each thread has its own independent physical processor, involving no complex preemption.
- No Overflow: The incrementing of
inandoutpointers never overflows; the algorithmic logic is always correct. - Read/Write Coherence: Once data is written to memory, subsequent read operations can see the latest value immediately.
- Temporal Atomicity/Isolation: Even for data reads and writes spanning multiple machine word lengths, the operation is indivisible (you won’t read half-new and half-old data).
- No Instruction Reordering: The compiler and CPU will not swap the execution order of instructions to optimize performance.
2. Race Conditions: When Assumptions Break
Once these assumptions are removed (especially assumptions 1 and 5), the system falls into a Race Condition.
- Heisenbug: These types of concurrency errors are difficult to detect because they only appear under specific timings. Once you add debugging code, the timing changes, and the error may disappear.
- The “1 vs. 5” Nightmare: If the data length exceeds the machine word length (e.g., writing 128-bit data on a 64-bit system), without an atomicity guarantee, a reader might access the data while the writer has only updated half of it, leading to data corruption.
3. Locks: Enforcing Atomicity
To combat race conditions, we introduce Locks. A lock is an artificially manufactured form of temporal atomicity.
- Strategy and Granularity:
- Coarse-grained Lock: One lock protects all resources (e.g., Python’s GIL). It is simple and safe but offers poor concurrency, essentially turning parallel execution into serial execution.
- Fine-grained Lock: Multiple locks protect different variables. This offers high concurrency but causes complexity to skyrocket, easily leading to Deadlocks.
- Deadlock and Elimination:
- Cause: Cyclic waiting (A waits for B, B waits for A).
- Solutions: Static schemes (enforcing a global locking order) or dynamic schemes (maintaining a Wait-for Graph and rolling back when a cycle is detected).
4. Bottom-level Implementation of Locks: Bootstrapping
Locks cannot be implemented with simple if statements; they require the cooperation of both hardware and software.
- Hardware Locks (RSM Instructions):
- The core is Read-Set-Modify (RSM). The hardware uses Bus Arbitration to ensure that a single instruction can complete both a read and a write simultaneously, preempting the bus during execution so no other processor can intervene.
- Common forms:
test-and-set,compare_and_swap (CAS).
- Software Lock (Lamport’s Scheme):
- In the absence of RSM instructions, locks are implemented using flags and “turn” variables. This requires guaranteed read/write coherence and no instruction reordering.
- Advanced Technique: RCU (Read-Copy Update):
- Commonly used in the Linux kernel. Read operations are completely lock-free; write operations create a copy and then atomically replace the pointer.