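Lesson 03-04 Constructing Abstract Computer Systems
“All problems in computer science can be solved by another level of indirection.” -- David Wheeler
The previous lesson introduced the problem of complexity in computer systems; this lesson focuses on the ways to cope with it: modularity, abstraction, layering, and hierarchy.
A large system cannot be built by simply stacking parts like toy blocks. We need mechanisms that hide details and restrict interactions, which is what modularity and abstraction provide. We also use naming to build powerful hierarchies so that storage, communication, and computing resources can be managed effectively.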
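1. The Three Pillars of Abstraction: The DNA of Computing Systems
To cope with complexity, we abstract the diverse, fragmented hardware into three fundamental components. Whether it is a simple microcontroller or a complex cloud server, a system is essentially a combination of these three:
Storage:
Essence: more than just RAM; storage is a mapping from names to values (WRITE binds $Name \rightarrow Value$; READ looks up $Name \rightarrow Value$).
ACID properties: atomicity, consistency, isolation, durability. To guard against timing errors and system crashes, we need isolation and atomicity.
The name is an acronym of these four words, and it is a very important concept.
- Interface design: bind (write) and resolve (read) decouple logical needs from physical storage.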
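To make the "names to values" view concrete, here is a minimal sketch of a storage component that exposes only bind and resolve. The class name `NameStore` and the dictionary used as the backing store are assumptions made for illustration; the notes do not prescribe an implementation.

```python
class NameStore:
    """Toy storage abstraction: a mapping from names to values."""

    def __init__(self):
        # Hypothetical backing store. Callers never see it, so it could be
        # swapped for a file or a remote service without changing the interface.
        self._cells = {}

    def bind(self, name, value):
        """Write: associate `name` with `value`, replacing any older binding."""
        self._cells[name] = value

    def resolve(self, name):
        """Read: return the value currently bound to `name`."""
        if name not in self._cells:
            raise KeyError(f"unbound name: {name!r}")
        return self._cells[name]


store = NameStore()
store.bind("x", 42)
store.bind("x", 43)        # a later write replaces the earlier one
print(store.resolve("x"))  # prints 43
```

Because callers only ever use names, the physical placement of the data can change freely behind the two operations.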
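Interpreter:
- Essence: the component that performs actions. It is a never-ending loop: fetch $\rightarrow$ decode $\rightarrow$ execute.
- **Three elements:** the **instruction reference (PC)** points to what to do next, the **instruction set (ISA)** defines what can be done, and the **environment reference (context)** records where execution currently stands.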
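The loop and its three elements can be sketched with a toy interpreter. The three-instruction repertoire (`LOAD`, `ADD`, `HALT`) and the tuple encoding of instructions are invented for this example; a real processor loops until it is powered off, and `HALT` exists here only so the sketch terminates.

```python
def run(program, env):
    """Fetch-decode-execute over a hypothetical three-instruction ISA."""

    # Repertoire (the "ISA"): everything this interpreter is able to do.
    def op_load(env, reg, value):
        env[reg] = value

    def op_add(env, dst, src):
        env[dst] += env[src]

    repertoire = {"LOAD": op_load, "ADD": op_add}

    pc = 0  # instruction reference: index of the next instruction
    while True:
        opcode, *operands = program[pc]     # fetch
        pc += 1
        if opcode == "HALT":                # decode
            return env
        repertoire[opcode](env, *operands)  # execute against the environment


program = [
    ("LOAD", "a", 2),
    ("LOAD", "b", 3),
    ("ADD", "a", "b"),
    ("HALT",),
]
result = run(program, {})  # `{}` is the initial environment (context)
print(result)              # prints {'a': 5, 'b': 3}
```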
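Link:
Essence: the path along which information moves between physically separated components.
Link vs. storage: an interesting contrast! Storage carries information forward in time, while a link moves information across space. The difficulty with links is their uncertainty: latency is hard to predict and packet loss is unavoidable.
We will return to this question in later lessons; it is also a central topic of the computer networking course.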
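2. Naming and Indirection: The Soul of System Design
If the three components are the skeleton, naming is the fascia that connects them and the foundation of modularity.
- Why name things? To decouple. Once a resource has a name, we no longer need to care about its physical location.
- Double indirection: PLT (Procedure Linkage Table, a table of jump stubs) $\rightarrow$ GOT (Global Offset Table)
- It lets a program map a name to the actual function address only at run time. This late binding of name to value is what makes dynamically linked libraries (DLLs / shared libraries) possible.
- The three elements of a naming model:
- Name space: the set of all possible names.
- Mapping algorithm: how to get from a name to a value (resolution).
- Value space: the objects that names ultimately refer to.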
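Late binding through a table can be imitated in a few lines of Python. This is only an analogy for the PLT and GOT mechanism, not how a real dynamic linker is implemented: `LIBRARY`, `got`, `plt_call`, and `make_resolver` are hypothetical names, and Python's `math` functions stand in for the code of a shared library. In naming-model terms, the keys of `LIBRARY` are the name space, the dictionary lookup is the mapping algorithm, and the functions are the value space.

```python
import math

# Hypothetical "shared library": the real targets that names resolve to.
LIBRARY = {"sqrt": math.sqrt, "floor": math.floor}

got = {}          # GOT analogue: name -> whatever should be called right now
resolutions = []  # records every time the slow resolver actually runs


def make_resolver(name):
    """Build the stub that a table entry points to before its first use."""
    def resolver(*args):
        resolutions.append(name)
        target = LIBRARY[name]  # mapping algorithm: look the name up
        got[name] = target      # patch the table so later calls skip this step
        return target(*args)
    return resolver


def plt_call(name, *args):
    """PLT analogue: always jump through the table entry, never directly."""
    return got[name](*args)


# At "load time" every entry points at a resolver stub, not the real function.
for name in LIBRARY:
    got[name] = make_resolver(name)

print(plt_call("sqrt", 9.0))   # first call: resolves, patches, then runs
print(plt_call("sqrt", 16.0))  # second call: goes straight to math.sqrt
print(resolutions)             # prints ['sqrt']
```

The caller only ever uses the name; which code that name refers to is decided on first use, and the cost of resolution is paid once.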
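3. Case Study: The Layered Beauty of the UNIX File System
UNIX does not hand the user a raw disk. Instead, it builds an elegant naming system out of successive layers of abstraction:
- Block layer: the lowest layer; a block is the smallest unit the operating system accesses.
- inode layer: the core of the file system. It knows nothing about file names and holds only a file's metadata (length, permissions, physical block addresses).
- File name layer: maps human-readable names to inode numbers.
- Path name layer: resolves a path such as /etc/passwd recursively, one component at a time, which yields a hierarchical organization.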
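The four layers can be sketched as a toy in-memory model. All of the data below (block numbers, inode numbers, file contents) is invented for illustration, and the dictionaries stand in for on-disk structures; real UNIX file systems store directories in blocks too and handle many cases omitted here.

```python
# Block layer: numbered storage units (toy contents).
blocks = {10: b"root:x:0:0:", 11: b"root:/bin/sh\n"}

# inode layer: metadata plus block numbers; no file names are stored here.
# In this sketch a directory inode holds a name -> inode-number table.
inodes = {
    1: {"type": "dir", "entries": {"etc": 2}},  # root directory
    2: {"type": "dir", "entries": {"passwd": 3}},
    3: {"type": "file", "size": 24, "blocks": [10, 11]},
}
ROOT_INODE = 1


def lookup(dir_inode, filename):
    """File name layer: map one name to an inode number inside a directory."""
    node = inodes[dir_inode]
    if node["type"] != "dir":
        raise NotADirectoryError(filename)
    if filename not in node["entries"]:
        raise FileNotFoundError(filename)
    return node["entries"][filename]


def resolve_path(path, current=ROOT_INODE):
    """Path name layer: resolve the first component, then recurse on the rest."""
    first, _, rest = path.strip("/").partition("/")
    if not first:
        return current
    return resolve_path(rest, lookup(current, first))


def read_file(path):
    """Walk down the layers: path -> inode -> blocks -> bytes."""
    node = inodes[resolve_path(path)]
    data = b"".join(blocks[number] for number in node["blocks"])
    return data[: node["size"]]


print(resolve_path("/etc/passwd"))  # prints 3
print(read_file("/etc/passwd"))
```

Each function touches only the layer directly beneath it, which is the point of the layering: the path resolver never looks at blocks, and the block store knows nothing about names.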
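This layering lets UNIX realize the grand vision that everything is a file: a hard disk, a keyboard, or a network interface all appear, at the abstraction layer, as names that can be opened, read, and written (open/read/write).
“Everything is a file” is a great idea; we will come back to it in the later operating systems course.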
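4. Classic Reading: The UNIX Time-Sharing System
The legendary paper by Ritchie and Thompson.
- Design goal: not peak performance, but a programming environment that is pleasant to use.
- Core contributions:
- Hierarchical file system: the directory structure we now take for granted was a major innovation at the time.
- I/O redirection and pipes: simple symbols (> or |) combine separate programs, which fits the modular way of thinking.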
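What the shell does for `producer | consumer` can be approximated from Python with the standard `subprocess` module. In this sketch the two programs are small inline Python scripts chosen for portability (an assumption of the example, standing in for tools such as `sort`); neither program knows anything about the other, and only the pipe connects them.

```python
import subprocess
import sys

# First program: writes three lines to its standard output.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('b'); print('a'); print('c')"],
    stdout=subprocess.PIPE,
)

# Second program: sorts whatever arrives on its standard input.
consumer = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print(''.join(sorted(sys.stdin)), end='')"],
    stdin=producer.stdout,  # the pipe: producer's output feeds the consumer
    stdout=subprocess.PIPE,
    text=True,
)

# Close the parent's copy of the pipe's read end so that only the consumer
# holds it; the producer can then be notified if the consumer exits early.
producer.stdout.close()
output, _ = consumer.communicate()
producer.wait()

print(output, end="")
```

Each program stays a self-contained module with one input and one output, and composition happens entirely outside of them.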
The success of UNIX is one more confirmation of “Worse is Better.”
5. Parallel Text
Lesson 03-04 Constructing Abstract Computer Systems
“All problems in computer science can be solved by another level of indirection.” -- David Wheeler
The previous lesson introduced the problem of “complexity” in computer systems; this lesson highlights the methods to counter it: modularity, abstraction, layering, and hierarchy.
When building large-scale systems, we cannot simply stack components like toy blocks. We need mechanisms to hide details and restrict interactions—this is modularity and abstraction. Simultaneously, we use naming to construct powerful hierarchical structures, enabling the effective management of storage, communication, and computing resources.
1. The Three Pillars of Abstraction: The DNA of Computing
To tame complexity, we abstract diverse and fragmented hardware into three fundamental components. Whether it is a simple microcontroller or a complex cloud server, it is essentially a combination of these three:
- Memory (Storage):
Essence: More than just RAM, it is a mapping from names to values (WRITE binds $Name \rightarrow Value$; READ looks up $Name \rightarrow Value$).
ACID Properties: Atomicity, Consistency, Isolation, and Durability. To combat timing errors and system crashes, we need Isolation and Atomicity.
Formed by the initials of four words, ACID is an incredibly important concept in system design.
- Interface Design: Decoupling logical needs from physical storage via bind (write) and resolve (read).
Interpreter:
- Essence: The component that executes actions. It is an endless loop: Fetch $\rightarrow$ Decode $\rightarrow$ Execute.
- The Three Elements: Instruction Reference (PC) points to what to do next, Instruction Set (ISA) defines what can be done, and Environment Reference (Context) tracks where execution currently stands.
Links:
Essence: The pathway for information moving between physically separated components.
Links vs. Storage: A fascinating comparison! Storage preserves information over time, while links move information across space. The challenge of links lies in their uncertainty: latency is hard to predict and packet loss is unavoidable.
We will continue to explore this in later lessons; it is also a fundamental proposition in computer networking.
2. Naming & Indirection: The Soul of System Design
If the three pillars are the skeleton, then naming is the fascia connecting them—the powerful foundation of modularity.
- Why name? For decoupling. By naming resources, we no longer need to worry about their specific physical locations.
- Double Indirection: PLT (Procedure Linkage Table) $\rightarrow$ GOT (Global Offset Table)
- It allows programs to map actual function addresses only at runtime. This “late binding” of names to values is the fundamental reason why Shared Libraries (DLLs) can exist.
- The Three Elements of a Naming Model:
- Name Space: The set of all possible names.
- Mapping Algorithm: How to find the value from a name (resolution).
- Value Space: The objects that names eventually point to.
3. Case Study: The Layered Beauty of the UNIX File System
UNIX does not present a “raw disk” directly to the user; instead, it builds an elegant naming system through layers of abstraction:
- Block Layer: The lowest level; a block is the smallest unit the OS accesses.
- inode Layer: The heart of the file system. It ignores filenames and focuses on metadata (length, permissions, physical block addresses).
- Filename Layer: Maps human-readable names to inode numbers.
- Pathname Layer: Through recursive resolution (e.g., /etc/passwd), it achieves a hierarchical organizational structure.
This layering realizes the grand vision of “Everything is a file”: whether it is a hard drive, a keyboard, or a network interface, the abstraction layer treats them all as names that can be opened, read, and written (open/read/write).
The idea of “Everything is a file” is profound; we will revisit this in our future Operating Systems course.
4. Reading: The UNIX Time-Sharing System
The legendary masterpiece by Ritchie and Thompson.
- Design Goal: Not to chase peak performance, but to provide a “comfortable” programming environment.
- Core Contributions:
- Hierarchical File System: The directory structure we take for granted today was a massive innovation at the time.
- I/O Redirection & Pipes: Using simple symbols like > or | to combine different programs, perfectly aligning with the philosophy of modularity.
The success of UNIX once again validates the philosophy of “Worse is Better.”