Lesson 08 虚拟链路（并发与锁）

在并发的世界里，唯一可以确定的就是不确定性。我们将链路抽象为虚拟链路，但在实现多任务通信（如本地 RPC）时存在一个棘手的问题：如果多个线程同时读写同一块内存区域（缓冲区），数据就会乱套。

这一讲的核心在于协调，我们将看到系统如何从理想化的“正确性假设”一步步走向现实，并引入锁这一关键机制来对抗竞态条件（Race Condition）。

1. 6大正确性假设

在初级版本的虚拟链路设计中，我们默认系统是完美的，并基于以下 6 个假设编写代码：

单线程收发：每个缓冲区只有一个写者和一个读者（单一写原则）。
独占处理器：每个线程都有自己独立的物理处理器，不涉及复杂的抢占。
不溢出：in 和 out 指针的增加不会溢出，算法逻辑永远正确。
读写连贯性 (Coherence)：一旦数据写入内存，后续的读操作能立刻看到最新值。
时间先后原子性 (Atomicity/Isolation)：即使是跨越多个机器字长的数据读写，也是不可分割的（不会读到一半的新数据和一半的旧数据）。
指令不重排：编译器和 CPU 不会为了优化性能而交换指令的执行顺序。

2. 竞态条件：当假设破灭

一旦去掉这些假设（尤其是假设 1 和 5），系统就会陷入竞态条件。

Heisenbug：这类并发错误难以检测，因为它们只在特定的时序下出现，一旦你加入调试代码，时序改变，错误可能就消失了。
“1 vs 5”的噩梦：如果数据的长度超过了机器字长（例如 64 位系统写 128 位数据），在没有原子性保证的情况下，读者可能在写者更新了一半时就读取了数据，导致数据损坏。

3. 锁 (Lock)：强制建立原子性

为了对抗竞态条件，我们引入了锁。它是一种人为制造的时间先后原子性。

策略与颗粒度：
- 粗粒度锁：一个锁保护所有资源（如 Python 的 GIL）。简单安全，但并发性差，本质上是将并行变成了串行。
- 细粒度锁：多个锁保护不同的变量。并发性高，但复杂度飙升，极易引发死锁。
死锁与消除：
- 成因：循环等待（A 等 B，B 等 A）。
- 方案：静态方案（全局统一加锁顺序）或动态方案（维护等待图 Wait-for Graph，检测到回路则回滚）。

4. 锁的底层实现：自举

锁不能靠简单的 if 语句实现，它需要硬件和软件的共同配合。

硬件锁 (RSM 指令)：
- 核心是 Read-Set-Modify (RSM)。硬件通过总线仲裁确保一条指令能同时完成读和写，且在执行期间抢占总线，不让其他处理器插足。
- 常见形式：test-and-set, compare_and_swap (CAS)。
软件锁 (Lamport 方案)：
- 在没有 RSM 指令时，依靠标志位 (flag) 和轮转位 (turn) 实现。前提是必须保证读写连贯且指令不重排。
高级技巧：RCU：
- Linux 内核常用。读操作完全不加锁，写操作通过创建副本并原子替换指针。

5. 对照

Lesson 08 Virtual Link(Concurrency and Locks)

In the world of concurrency, the only certainty is uncertainty. We abstract links into virtual links, but a thorny problem arises when implementing multi-task communication (such as local RPC): if multiple threads read from and write to the same memory area (buffer) simultaneously, the data becomes a mess.

The core of this lesson is coordination. We will see how systems transition step-by-step from idealized “correctness assumptions” to reality, and how the Lock is introduced as a key mechanism to combat Race Conditions.

1. The 6 Correctness Assumptions

In the primary design of virtual links, we assume the system is perfect and write code based on the following six assumptions:

Single-threaded Transmission/Reception: Each buffer has only one writer and one reader (the Single Writer Principle).
Dedicated Processors: Each thread has its own independent physical processor, involving no complex preemption.
No Overflow: The incrementing of in and out pointers never overflows; the algorithmic logic is always correct.
Read/Write Coherence: Once data is written to memory, subsequent read operations can see the latest value immediately.
Temporal Atomicity/Isolation: Even for data reads and writes spanning multiple machine word lengths, the operation is indivisible (you won’t read half-new and half-old data).
No Instruction Reordering: The compiler and CPU will not swap the execution order of instructions to optimize performance.

2. Race Conditions: When Assumptions Break

Once these assumptions are removed (especially assumptions 1 and 5), the system falls into a Race Condition.

Heisenbug: These types of concurrency errors are difficult to detect because they only appear under specific timings. Once you add debugging code, the timing changes, and the error may disappear.
The “1 vs. 5” Nightmare: If the data length exceeds the machine word length (e.g., writing 128-bit data on a 64-bit system), without an atomicity guarantee, a reader might access the data while the writer has only updated half of it, leading to data corruption.

3. Locks: Enforcing Atomicity

To combat race conditions, we introduce Locks. A lock is an artificially manufactured form of temporal atomicity.

Strategy and Granularity:
- Coarse-grained Lock: One lock protects all resources (e.g., Python’s GIL). It is simple and safe but offers poor concurrency, essentially turning parallel execution into serial execution.
- Fine-grained Lock: Multiple locks protect different variables. This offers high concurrency but causes complexity to skyrocket, easily leading to Deadlocks.
Deadlock and Elimination:
- Cause: Cyclic waiting (A waits for B, B waits for A).
- Solutions: Static schemes (enforcing a global locking order) or dynamic schemes (maintaining a Wait-for Graph and rolling back when a cycle is detected).

4. Bottom-level Implementation of Locks: Bootstrapping

Locks cannot be implemented with simple if statements; they require the cooperation of both hardware and software.

Hardware Locks (RSM Instructions):
- The core is Read-Set-Modify (RSM). The hardware uses Bus Arbitration to ensure that a single instruction can complete both a read and a write simultaneously, preempting the bus during execution so no other processor can intervene.
- Common forms: test-and-set, compare_and_swap (CAS).
Software Lock (Lamport’s Scheme):
- In the absence of RSM instructions, locks are implemented using flags and “turn” variables. This requires guaranteed read/write coherence and no instruction reordering.
Advanced Technique: RCU (Read-Copy Update):
- Commonly used in the Linux kernel. Read operations are completely lock-free; write operations create a copy and then atomically replace the pointer.