6.5950 Lecture 1
Hardware Attacks on the Spotlight
- Rowhammer
- Spectre
- With branch in hand
- Correlates to branch prediction exploitation
Hardware attacks
The attacks target a key micro-architecture mechanism of the processor: speculative execution.
It is not a bug, which is a mis-implementation of the spec, it just means the spec itself is wrong.
Performance vs Security vs Power Tradeoffs
It is hard to make a processor that is faster, more secure, and doesn’t use as much power. Sort of a rope-pulling problem.
Usually get ad-hoc mitigation solutions.
Hardware Security Features
Example: Apple M1 chip. Secure enclave.
System Abstractions
- Programs –> Virtual Machine
- System Software –> ISA
- Comp Arch.
- Digital Circuits –> Digital Abstraction
- Analog Circuits; Devices (transistors)
You really need to know how these things are connected together to exploit the interactions between the abstractions
Demo: Abstractions Hides Details
Program 1
Sums up integers 1 to 10.
for (i = 0; i < 10; i++) {
sum += 1;
}
Program 2
printf("Hello world");
Which one will take more instructions? Program 1: 85 Program 2: 616
And this only counts non-kernel instructions.
Counting kernel: Program 1: 1240 Program 2: 5321
Keep in mind that it’s non-deterministic due to other things happening on computer, but this shows the point. Point in case: abstractions hide away a lot of detail.
6.004 Review
Basic Architecture Concepts
Let’s say we are running a program. We have a laptop. The laptop has:
- CPU
- main memory (DRAM, 16GB)
- non-volatile storage (Disk, SSD, etc.)
How to execute the program? Program is usually stored in non-volatile storage. What should happen? Pull things into main memory, but this is not a simple procedure.
In the old days, no OS. Non-volatile storage is like the punchcard, and you need to manually put things into the punchcard reader (which goes to main memory). To run program, look inside CPU, we have a special register called PC (program counter), which points to the place where we can read the program, which issues a command / instruction. The memory replies and we write the instruction into the instruction buffer/queue. This process is called “fetch.” Now we have to decode the instruction. For example, add r1, r2, r3. After decoding, we are going to execute it. We have a register file, from which we can read source data, write destination data, etc. After performing the right operation with the ALU, we “write-back.” Although, we might be wanting to use something from the memory (includes cache) instead, based on instruction. We have a selector at the writeback stage Overall, basic idea:
- Fetch
- Decode
- Execute
- Access memory
- Writeback (from memory, or from ALU, based on instruction)
Timing:
t_clk = t_f + t_d + t_e + t_m + t_wb
To make this faster, we can employ some ideas, like:
- pipelining (allow multiple stages to execute at once)
Timing becomes:
t_clk = max(t_f, t_d, t_e, t_m , t_wb)
End up getting higher throughput. Still some issues from allowing us to get the best throughput:
- data hazard
- control hazard
Some solutions:
- Stall
- Bypass
- Guess / Speculate