6.5950 Lecture 1

Hardware Attacks in the Spotlight

Rowhammer
Spectre
- Logo: a ghost with a branch in hand
- The branch alludes to the exploitation of branch prediction

Hardware attacks

Attacks such as Spectre target a key microarchitectural mechanism of the processor: speculative execution.

This is not a bug in the usual sense. A bug is a mis-implementation of the spec; here the hardware follows the spec, and the spec itself is wrong.

Performance vs Security vs Power Tradeoffs

It is hard to make a processor that is faster, more secure, and uses less power all at once. It is something of a tug-of-war.

In practice, we usually end up with ad-hoc mitigations.

Hardware Security Features

Example: the Secure Enclave on the Apple M1 chip.

System Abstractions

Programs –> Virtual Machine
System Software –> ISA
Computer Architecture
Digital Circuits –> Digital Abstraction
Analog Circuits; Devices (transistors)

You really need to know how these layers are connected in order to exploit the interactions between the abstractions.

Demo: Abstractions Hide Details

Program 1

Sums up integers 1 to 10.

int sum = 0;
for (int i = 1; i <= 10; i++) {
  sum += i;  /* add each integer from 1 to 10 */
}

Program 2

printf("Hello world");

Which one executes more instructions?

Program 1: 85
Program 2: 616

These counts include only non-kernel instructions.

Counting kernel instructions as well:

Program 1: 1240
Program 2: 5321

Keep in mind that the counts are non-deterministic because of other activity on the computer, but they make the point. Case in point: abstractions hide a lot of detail.

6.004 Review

Basic Architecture Concepts

Let’s say we are running a program. We have a laptop. The laptop has:

CPU
main memory (DRAM, 16GB)
non-volatile storage (Disk, SSD, etc.)

How do we execute the program? The program is usually stored in non-volatile storage, so it first has to be pulled into main memory, and this is not a simple procedure.

In the old days there was no OS. Non-volatile storage was like a deck of punchcards, and you had to feed the cards manually into the punchcard reader, which loaded them into main memory.

To run the program, look inside the CPU. A special register called the PC (program counter) holds the address of the next instruction. The CPU sends that address to memory as a read request; memory replies, and the instruction is written into the instruction buffer/queue. This process is called “fetch.”

Next we decode the instruction, for example add r1, r2, r3. After decoding, we execute it: the register file supplies the source operands, and the ALU performs the operation. Finally we “write back” the result to the destination register. Depending on the instruction, the value written back may come from memory (which includes the cache) instead of the ALU, so there is a selector at the writeback stage.

Overall, the basic idea:

Fetch
Decode
Execute
Access memory
Writeback (from memory, or from ALU, based on instruction)

Timing (unpipelined, each instruction passes through all five stages in one clock cycle):

t_clk = t_f + t_d + t_e + t_m + t_wb

To make this faster, we can employ some ideas, like:

Pipelining (different instructions occupy different stages at the same time)

Timing becomes:

t_clk = max(t_f, t_d, t_e, t_m, t_wb)

We end up with higher throughput. Some issues still keep us from reaching the best possible throughput:

data hazard
control hazard

Some solutions:

Stall
Bypass
Guess / Speculate

Date: February 3, 2025

Tags: lecture