Lesson 14-15:性能工程


Lesson 14-15 性能工程

性能 是计算机系统的“货币”。我们通过牺牲性能(花费货币)来换取功能的便利性、抽象的简洁性或系统的安全性。

而在摩尔定律逐渐失效的今天,单纯等待硬件变快已经行不通了,我们需要主动进行性能工程。其核心目标只有一个:消除瓶颈

1. 核心指标与权衡

  • 时延:完成一项任务需要的时间(从请求到响应)。
    • 优化目标:越短越好。
  • 吞吐率:单位时间内完成的任务数量。
    • 优化目标:越大越好。
  • **利用率 **:资源忙碌的时间比例。
    • 权衡:利用率太低是浪费;利用率太高(接近 100%)是灾难
    • 原理:根据排队论,当系统接近满载时,排队长度会指数级暴增,导致时延急剧恶化。

关系:在没有排队的情况下,$\text{Throughput} = 1 / \text{Latency}$。但一旦出现瓶颈,吞吐率封顶,时延开始飙升。

2. 结构优化:从架构层面提速

当系统遇到瓶颈时,我们有三个方法来重构系统。

2.1 局部性与缓存

这是计算机科学中最普遍的现象:程序对数据的访问不是随机的

  • 时间局部性:刚被访问过的数据,很可能马上再次被访问。(做法:缓存热点数据)
  • 空间局部性:访问了地址 A,很可能马上访问 A 附近的数据。(做法:预取 Pre-fetching)

设计原则:优化高频用例

  • 区分快路径慢路径
  • 让 90% 的常见请求走经过高度优化的快路径(如命中 Cache),剩下的 10% 走慢路径。

2.2 并行

  • 思路:把一个大任务拆成 N 个小任务,同时做。
  • 案例:Google 搜索。一个查询被分发给几千台机器,每台机器搜一部分文档,最后合并结果。
  • 限制
    • Amdahl 定律:并行的加速比受限于任务中“必须串行执行”的那部分比例。
    • 短板效应:整体速度取决于最慢的那个子任务。

2.3 交叠与流水线

如果任务不能完全拆分,且必须串行(如 CPU 指令:取指 -> 解码 -> 执行),怎么办?

  • 做法流水线 (Pipeline)。当第一条指令在“执行”时,第二条指令已经在“解码”,第三条指令正在“取指”。
  • 收益:虽然单条指令的时延没变(甚至因为流水线控制略微变长),但吞吐率大幅提升。
  • 挑战:冒险
    • 数据冒险 (RAW):第二条指令要用第一条指令还没算出来的结果。 $\rightarrow$ 解法:数据前递
    • 控制冒险:遇到 if/else 跳转,不知道下一条该取谁。 $\rightarrow$ 解法:分支预测

3. 策略优化:用算法换时间

除了改架构,我们还可以在处理策略上使用trick。

3.1 批处理

  • 原理:很多操作有固定的启动开销。比如磁盘寻道,无论读 1字节还是 1MB,寻道时间都差不多。
  • 做法:攒一堆请求,一次性处理。
  • 收益:分摊了启动开销,提高了吞吐率。
  • 代价:增加了第一个请求的等待时延

3.2 等待

  • 这是一种“以退为进”的智慧。为了进行批处理,系统故意等待一小会儿,赌马上会有新请求到来,从而能合并操作。
  • 例子:Nagle 算法(TCP)、磁盘写入缓冲。

3.3 推测

  • 原理:既然等待有代价,不如。利用空闲资源提前执行可能需要的操作。
  • 案例:分支预测(猜你会跳进循环)、预取文件块。
  • 风险
    • 猜错了要回滚 (Undo),设计复杂。

    • 安全漏洞:这是 Spectre/Meltdown 攻击的根源(利用推测执行留下的痕迹窃取数据)。

      Spectre 攻击在最后安全部分会讲,非常非常精彩!!!

4. 调度问题:资源分配的艺术

当排队不可避免时,调度 决定了谁先谁后。

4.1 核心设计模式:策略与机制分离

这是系统设计的一条黄金法则(牛!!!!!!)

  • 调度策略决定谁该运行。是高层决策(如“优先服务 VIP”)。
  • 调度机制负责执行切换。是底层实现(如“保存寄存器,加载新栈”)。
  • 好处:我们可以随时更换策略(比如从 FIFO 换成 LRU),而不需要重写底层的机制代码。

4.2 常见策略

  • FCFS (先来先服务):公平,但容易被长任务阻塞。
  • SJF (最短任务优先):吞吐率最高,但长任务会“饿死”。
  • Round Robin (轮询):保证响应时间,适合交互式系统。

5. 对照

Lesson 14-15 Performance Engineering

Performance is the “currency” of computer systems. We pay with performance (spending this currency) to gain functionality, simplicity of abstraction, or system security.

With Moore’s Law fading, we can no longer simply wait for hardware to get faster. We must actively engage in Performance Engineering. The core goal is singular: Eliminate Bottlenecks.

1. Core Metrics & Trade-offs

Before optimizing, we must define what “good” looks like.

  • Latency: The time required to complete a task (time from request to response).
    • Goal: Minimize it.
  • Throughput: The number of tasks completed per unit of time.
    • Goal: Maximize it.
  • Utilization: The proportion of time resources are busy.
    • Trade-off: Too low is waste; Too high (near 100%) is a disaster.
    • Principle: According to queuing theory, as the system approaches saturation, queue lengths increase exponentially, causing latency to skyrocket.

Relationship: Without queuing, $\text{Throughput} = 1 / \text{Latency}$. However, once a bottleneck is hit, throughput caps, and latency begins to spike.

2. Structural Optimization: Speeding Up via Architecture

When a system hits a bottleneck, we have three “scalpels” to refactor the system.

2.1 Locality & Caching

This is the most universal phenomenon in computer science: Data access is not random.

  • Temporal Locality: Recently accessed data is likely to be accessed again soon. (Action: Cache hot data).
  • Spatial Locality: If address A is accessed, data near A is likely to be accessed soon. (Action: Pre-fetching).

Design Principle: Make the Common Case Fast

  • Distinguish between the Fast Path and the Slow Path.
  • Let 90% of common requests go through the highly optimized fast path (e.g., Cache Hit), and leave the remaining 10% for the slow path.
2.2 Parallelism
  • Idea: Split a large task into N small tasks and do them simultaneously.
  • Case: Google Search. A query is distributed to thousands of machines; each searches a fraction of the documents, and results are merged.
  • Constraints:
    • Amdahl’s Law: Parallel speedup is limited by the portion of the task that must be serial.
    • Tail Latency: Overall speed is determined by the slowest sub-task.
2.3 Interleaving & Pipelining

If tasks cannot be fully split and must be serial (e.g., CPU instructions: Fetch -> Decode -> Execute), what then?

  • Action: Pipeline. When the first instruction is “Executing,” the second is “Decoding,” and the third is “Fetching.”
  • Benefit: Although the latency of a single instruction hasn’t changed (or slightly increased due to overhead), Throughput increases dramatically.
  • Challenge: Hazards
    • Data Hazard (RAW): Instruction 2 needs a result from Instruction 1 that isn’t ready. $\rightarrow$ Solution: Data Forwarding.
    • Control Hazard: Encountering an if/else jump, not knowing which to fetch next. $\rightarrow$ Solution: Branch Prediction.

3. Strategic Optimization: Trading Algorithms for Time

Besides architectural changes, we can use clever processing strategies.

3.1 Batching
  • Principle: Many operations have fixed startup Overhead (e.g., Disk Seek time is similar whether reading 1 byte or 1 MB).
  • Action: Accumulate requests and process them in a bunch.
  • Benefit: Amortizes overhead, improving Throughput.
  • Cost: Increases Wait Latency for the first request in the batch.
3.2 Dallying
  • The wisdom of “advancing by retreating.” To enable batching, the system deliberately waits a bit, betting that new requests will arrive soon to be merged.
  • Example: Nagle’s Algorithm (TCP), Disk Write Buffers.
3.3 Speculation
  • Principle: Since waiting has a cost, why not guess? Use idle resources to execute operations that might be needed.
  • Case: Branch Prediction (guessing you’ll enter the loop), File Prefetching.
  • Risks:
    • Must Undo if guessed wrong (complex design).
    • Security Vulnerabilities: This is the root of Spectre/Meltdown attacks (stealing data via traces left by speculative execution).

4. Scheduling: The Art of Resource Allocation

When queuing is inevitable, Scheduling decides who goes first.

4.1 Core Design Pattern: Separation of Policy and Mechanism

A golden rule of system design.

  • Scheduling Policy: Decides WHO runs. High-level decision (e.g., “Prioritize VIPs”).
  • Scheduling Mechanism: Handles the SWITCH. Low-level implementation (e.g., “Save registers, load new stack”).
  • Benefit: We can swap policies (e.g., from FIFO to LRU) anytime without rewriting the underlying mechanism code.
4.2 Common Policies
  • FCFS (First Come First Served): Fair, but prone to Head-of-Line Blocking.
  • SJF (Shortest Job First): Highest throughput, but long tasks can “starve.”
  • Round Robin: Guarantees response time, suitable for interactive systems.

Author: linda1729
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source linda1729 !
评论
  TOC