Processor Components
Overview
This lecture bridges the gap from last class's 4-bit counter to a full RISC-V processor. We start by generalizing the counter into the canonical "state + combinational logic" picture that every digital system fits into, then identify the concrete state elements a processor needs (general-purpose registers, program counter, and data memory). Next we lay out the major processor components as a block diagram — labeling each as sequential or combinational — and walk through what it means for one instruction to complete per clock cycle. The rest of the lecture zooms in on the first three components: the program counter, the instruction memory (ROM), and the register file. The ALU and the instruction decoder come next class.
Learning Objectives
- Generalize a counter into the universal "state + combinational logic" model of digital systems
- Identify the state elements of a RISC-V processor: PC, register file, data memory
- Read and interpret a block diagram of the processor components, labeling each as sequential or combinational
- Distinguish single-cycle, multi-cycle, and pipelined processor designs
- Specify the Program Counter as a 64-bit register with synchronous clear
- Compute the word address that drives the instruction ROM from a byte-addressed PC
- Describe the register file interface: two read ports, one write port, write enable, `x0` hard-wired to zero
- Sketch the internal structure of a register file built from 31 registers plus a hard-wired `x0`, two 32:1 MUXes, and a 5-to-32 write decoder
Prerequisites
- Digital Design 2: Sequential Logic — D flip-flops, registers with enable and clear, 4-bit counter
- Digital Design — combinational logic, multiplexers, ripple-carry adder
- RISC-V Machine Code — instruction word format, opcodes, register numbers
- RISC-V Emulation — fetch/decode/execute cycle
From Counter to Processor
The 4-Bit Counter, Recapped
Last class we built a 4-bit counter from three pieces:
- A 4-bit register (four D flip-flops with synchronous enable and clear)
- A 4-bit ripple-carry adder that computes `register + 1`
- A clock whose rising edge triggers each update
On each rising edge the register captures the adder's output. The register holds the current count; the adder produces the next count.
Every Digital System Has the Same Shape
Generalize the counter. Replace "register + adder" with "state + combinational logic" and you have the canonical picture of every synchronous digital system — including a processor:
```mermaid
flowchart LR
    CLK[CLK] --> STATE[STATE<br/>registers, PC,<br/>data memory]
    STATE --> CL[Combinational<br/>Logic]
    CL --> STATE
```
The state elements store the current values; the combinational logic computes the next values. Each rising edge promotes "next" to "current."
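As a minimal sketch of this model (in Python, which the lecture itself does not use; the names `SyncSystem`, `next_state`, and `tick` are hypothetical), a synchronous system is a stored state plus a pure next-state function, and each call to `tick` stands for one rising edge. The 4-bit counter from last class is one instance of it.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")


@dataclass
class SyncSystem(Generic[S]):
    """Hypothetical "state + combinational logic" model, not a hardware description."""
    state: S                        # current value held by the state elements
    next_state: Callable[[S], S]    # combinational logic: a pure function of the state

    def tick(self) -> None:
        """One rising clock edge: the computed "next" value becomes "current"."""
        self.state = self.next_state(self.state)


# The 4-bit counter: a 4-bit register fed by an adder computing register + 1.
# Masking with 0xF models the 4-bit width (the carry out of bit 3 is dropped).
counter = SyncSystem(state=0, next_state=lambda q: (q + 1) & 0xF)

for _ in range(17):
    counter.tick()
print(counter.state)  # 1: seventeen edges from 0 wrap past 15 back to 0, then 1
```

Swapping in a different state type and next-state function (for example, a PC plus register array and an instruction executor) gives the processor picture without changing `tick`.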
Processor State Elements
A RISC-V processor has three kinds of state:
| State Element | Purpose | Size |
|---|---|---|
| General-purpose registers | Instruction operands and results (x0–x31) | 32 × 64 bits |
| Program counter (PC) | Address of current instruction | 64 bits |
| Data memory | Stack and heap storage | Variable (RAM) |
Everything else in the processor is combinational logic that reads from state and produces the next state.
What One Instruction Does
Consider a single RISC-V instruction: `add t0, t1, t2`.
Two things have to happen this clock cycle:
1. Compute the result: read `t1` and `t2` from the register file, add them in the ALU, and write the sum back to `t0`.
2. Advance the PC: `PC = PC + 4`, so that the next rising edge fetches the next instruction.
Both happen within the same clock cycle: the new `t0` value and the new PC are computed combinationally during the cycle, and the register file and the PC both capture them at the same rising edge.
For branches and jumps, step 2 becomes PC = BTA (branch target address) instead of PC + 4.
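A minimal Python sketch of one single-cycle step, assuming a hypothetical pre-decoded instruction tuple (not real machine code) and supporting only `add` and `beq`. The next PC and the destination value are both computed from the current state first, and only then committed together, mirroring the single rising edge. The ABI names map as `t0` = `x5`, `t1` = `x6`, `t2` = `x7`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MASK64 = (1 << 64) - 1  # PC and registers are 64 bits wide (RV64)


@dataclass
class CpuState:
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 32)


def compute_next(s: CpuState, instr: tuple) -> Tuple[int, Optional[int], int]:
    """Combinational part: (next_pc, dest_reg, dest_value) from the CURRENT state only.

    `instr` is a hypothetical pre-decoded tuple, ("add", rd, rs1, rs2) or
    ("beq", rs1, rs2, byte_offset); real hardware decodes a 32-bit word instead.
    """
    op = instr[0]
    if op == "add":
        _, rd, rs1, rs2 = instr
        value = (s.regs[rs1] + s.regs[rs2]) & MASK64
        return (s.pc + 4) & MASK64, rd, value          # PC = PC + 4
    if op == "beq":
        _, rs1, rs2, offset = instr
        if s.regs[rs1] == s.regs[rs2]:
            return (s.pc + offset) & MASK64, None, 0   # PC = BTA, no register write
        return (s.pc + 4) & MASK64, None, 0
    raise ValueError(f"unsupported op {op!r}")


def rising_edge(s: CpuState, instr: tuple) -> None:
    """Sequential part: commit the register write and the new PC at the same edge."""
    next_pc, rd, value = compute_next(s, instr)
    if rd is not None and rd != 0:   # writes to x0 are discarded
        s.regs[rd] = value
    s.pc = next_pc


s = CpuState()
s.regs[6], s.regs[7] = 20, 22          # t1 = 20, t2 = 22
rising_edge(s, ("add", 5, 6, 7))       # add t0, t1, t2
print(s.regs[5], s.pc)                 # 42 4
rising_edge(s, ("beq", 6, 6, 16))      # taken branch: BTA = 4 + 16
print(s.pc)                            # 20
```

Splitting `compute_next` from `rising_edge` is the point of the sketch: nothing in the state changes until every next value has been computed from the old one.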
Processor Components
Here is the block diagram we built up on the whiteboard, with each component labeled S (sequential: holds state) or C (combinational: a pure function of its inputs):
```mermaid
flowchart LR
    PC["PC<br/>(S)"] -->|addr| IM["Instruction<br/>Memory<br/>(C, ROM)"]
    IM -->|IW| DEC["Instruction<br/>Decoder<br/>(C)"]
    DEC --> RF["Register<br/>File<br/>(S)"]
    RF -->|RD0, RD1| ALU["ALU<br/>(C)"]
    ALU --> DM["Data<br/>Memory<br/>(S, RAM)"]
    DM --> RF
```
Component Roles
| Component | Type | Function |
|---|---|---|
| PC | Sequential | Holds the address of the current instruction |
| Instruction Memory | Combinational (ROM) | Maps PC to a 32-bit instruction word |
| Instruction Decoder | Combinational | Extracts register numbers, immediates, control lines |
| Register File | Sequential | 32 general-purpose registers, x0–x31 |
| ALU | Combinational | Arithmetic, logical, shift operations |
| Data Memory | Sequential (RAM) | Load/store target memory |
One Instruction per Clock Cycle
Here is the picture of the clock driving the processor. Each rising edge completes one instruction:
```text
    init        add        sub        beq
      ↓           ↓          ↓          ↓
CLK ────┐    ┌─────┐    ┌─────┐    ┌─────┐    ┌────
        └────┘     └────┘     └────┘     └────┘
             ↑          ↑          ↑          ↑
         reset PC    execute    execute    execute
                       add        sub        beq
```
Between edges, values flow through the combinational logic (instruction fetch → decode → read registers → ALU → memory). On the rising edge, the new values are captured into PC, register file, and data memory.
Design Evolution
Processor designs have grown more complex over time:
| Design | Description |
|---|---|
| Single-cycle | Each instruction completes in one clock cycle |
| Multi-cycle | Instructions take a variable number of cycles (common in the 1970s) |
| Pipelined | Multiple instructions are in flight at once, each in a different stage |
Modern CPUs combine pipelining with out-of-order execution, superscalar issue, and speculative execution. In this course we build a single-cycle processor — the simplest design that can execute every RISC-V instruction.
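A small arithmetic sketch of why the single-cycle choice costs speed. Every delay below is a hypothetical round number chosen for illustration, not a figure from the lecture or from any real processor, and the pipelined estimate is idealized (it ignores pipeline-register overhead and hazards). The single-cycle clock must fit the slowest instruction's entire path; an idealized pipeline only has to fit its slowest stage.

```python
# Hypothetical per-stage combinational delays in picoseconds (illustrative only).
STAGE_DELAY_PS = {
    "fetch": 200,        # instruction memory read
    "decode_regs": 100,  # decode + register file read
    "alu": 200,
    "memory": 200,       # data memory access (loads and stores only)
    "writeback": 100,    # register file write setup
}

# Which stages each instruction class actually passes through.
PATHS = {
    "add": ["fetch", "decode_regs", "alu", "writeback"],
    "load": ["fetch", "decode_regs", "alu", "memory", "writeback"],
}


def path_delay(stages):
    """Total combinational delay along one instruction's path."""
    return sum(STAGE_DELAY_PS[s] for s in stages)


# Single-cycle: one clock period must cover the slowest instruction's whole path.
single_cycle_period = max(path_delay(p) for p in PATHS.values())

# Idealized pipeline: the period only has to cover the slowest single stage.
pipelined_period = max(STAGE_DELAY_PS.values())

print(single_cycle_period)        # 800 (the load path)
print(path_delay(PATHS["add"]))   # 600: add settles early but still waits for the edge
print(pipelined_period)           # 200
```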
Trade-off
In a single-cycle design, the clock period must be at least as long as the longest combinational path, typically the load-word path through the ALU and data memory. Faster instructions (like `add`) don't finish early; they wait for the next clock edge. Pipelining attacks this by splitting the datapath into shorter stages, so the clock period only has to cover the slowest stage rather than the whole path.
Program Counter (PC)
The PC is a 64-bit register that holds the address of the instruction currently being executed.
PC Signals
| Signal | Width | Direction | Description |
|---|---|---|---|
| D | 64 | Input | Next PC value |
| Q | 64 | Output | Current PC value |
| CLK | 1 | Input | Clock |
| EN | 1 | Input | Enable update (normally 1) |
| CLR | 1 | Input | Synchronous clear to 0 |
Update Rule
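For sequential execution, the PC advances by 4 bytes each cycle (each RISC-V instruction is 4 bytes):
$$\mathrm{PC} \leftarrow \mathrm{PC} + 4$$
For branches and jumps, the PC receives a calculated target, the branch target address (BTA):
$$\mathrm{PC} \leftarrow \mathrm{BTA}$$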
Adding CLR in Digital
The Digital simulator's built-in register component does not have a CLR input. We add one the same way we did for the 1-bit register last class — a synchronous clear via MUX:
```mermaid
flowchart LR
    D[D] --> CM["CLR MUX"] --> FF["64-bit<br/>register<br/>(D, CLK, EN)"] --> Q[Q]
    ZERO["constant 0"] --> CM
    CLR[CLR] --> CM
    CLK[CLK] --> FF
    EN[EN] --> FF
```
When CLR = 1, the MUX routes 0 into the flip-flop's D input, so the next rising edge loads 0 instead of the incoming value. The CLR is synchronous — it still waits for the clock edge.
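A behavioral Python sketch of the PC as drawn above: a 64-bit register with enable, preceded by a 2:1 CLR MUX. The class and method names are hypothetical (in Digital this is a Register component plus a Multiplexer). Because the MUX feeds the register's D input and EN gates the register itself, this arrangement clears only on an edge where EN = 1.

```python
MASK64 = (1 << 64) - 1


def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in1 when sel == 1, else in0."""
    return in1 if sel else in0


class ProgramCounter:
    """Hypothetical model of the 64-bit PC register with a synchronous-clear MUX."""

    def __init__(self) -> None:
        self.q = 0  # current PC value (output Q)

    def rising_edge(self, d: int, en: int = 1, clr: int = 0) -> None:
        # The CLR MUX sits in front of D: CLR = 1 routes the constant 0 instead of d.
        d_in = mux2(clr, d & MASK64, 0)
        # Q changes only at the edge, and only if EN = 1.
        if en:
            self.q = d_in


pc = ProgramCounter()
for _ in range(3):
    pc.rising_edge(pc.q + 4)        # sequential execution: PC <- PC + 4
print(hex(pc.q))                    # 0xc
pc.rising_edge(0x400)               # taken branch/jump to a hypothetical target address
print(hex(pc.q))                    # 0x400
pc.rising_edge(pc.q + 4, clr=1)     # synchronous clear: loads 0 at this edge
print(pc.q)                         # 0
```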
Instruction Memory
Instruction memory stores the program as 32-bit instruction words. We implement it with Digital's ROM (read-only memory) component, loaded from a .hex file produced by makerom3.py.
Specifications
| Parameter | Value | Notes |
|---|---|---|
| Data width | 32 bits | One RISC-V instruction |
| Address width | 8 bits (typical) | 2⁸ = 256 instructions |
| Type | ROM | Read-only, loaded from .hex |
Byte Address vs. Word Address
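The PC holds a byte address, but the ROM is indexed by word address. Since each instruction is 4 bytes, the word address is the byte address shifted right by 2 (equivalently, divided by 4):
$$\text{word address} = \mathrm{PC} \gg 2 = \lfloor \mathrm{PC} / 4 \rfloor$$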
We do not build a shifter — we use a splitter on the PC and pick off the right bits. With an 8-bit ROM address, we take bits [9:2] of the PC:
```mermaid
flowchart LR
    PC["PC<br/>(64 bits)"] -->|"bits [9:2]"| SPL["splitter"]
    SPL -->|"8-bit word addr"| ROM["Instruction<br/>Memory (ROM)"]
    ROM -->|"32-bit IW"| OUT["instruction word"]
```
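A Python sketch of the splitter and the ROM lookup, assuming hypothetical helper names (`rom_word_address`, `fetch`) and a hypothetical ROM image built in code. The real ROM is loaded from a `.hex` file produced by `makerom3.py`, whose format is not modeled here. The two stored words are standard RV64 encodings: `0x00000013` is `addi x0, x0, 0` and `0x007302B3` is `add t0, t1, t2`.

```python
def rom_word_address(pc: int, addr_bits: int = 8) -> int:
    """Splitter: keep bits [addr_bits + 1 : 2] of the byte-addressed PC."""
    return (pc >> 2) & ((1 << addr_bits) - 1)


# Hypothetical ROM contents: 256 32-bit words indexed by word address.
NOP = 0x00000013                 # addi x0, x0, 0
rom = [NOP] * 256
rom[3] = 0x007302B3              # add t0, t1, t2, stored at byte address 0x0C


def fetch(pc: int) -> int:
    """Instruction memory as a pure function of the PC: no clock involved."""
    return rom[rom_word_address(pc)]


print(rom_word_address(0x0C))        # 3
print(hex(fetch(0x0C)))              # 0x7302b3
print(rom_word_address(0x400))       # 0: bit 10 is outside PC[9:2], so the address wraps
print(rom_word_address(0x400, 10))   # 256 with a 10-bit ROM address (bits [11:2])
```

The wrap-around in the third line is the practical consequence of using only eight PC bits: a program longer than 256 instructions would silently alias back to the start of the ROM.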
Why bits [9:2]?
Bits 0 and 1 of the byte address are always 0 for an aligned 4-byte instruction, because the PC advances by 4 every cycle. Bit 2 is the low bit of the word address. With 8 address bits (supporting 256 instructions), we need bits 2 through 9 of the byte address, written `PC[9:2]`.
Register File
The register file holds the 32 general-purpose registers x0–x31. In a single clock cycle it must:
- Read up to two registers (for the two ALU operands)
- Write at most one register (for the instruction's destination)
Interface
| Signal | Width | Direction | Description |
|---|---|---|---|
| RR0 | 5 | Input | Read register number for port 0 |
| RR1 | 5 | Input | Read register number for port 1 |
| WR | 5 | Input | Write register number |
| WD | 64 | Input | Write data |
| WE | 1 | Input | Write enable |
| CLK | 1 | Input | Clock |
| CLR | 1 | Input | Clear all registers to 0 |
| RD0 | 64 | Output | Value of register RR0 |
| RD1 | 64 | Output | Value of register RR1 |
| x0–x31 | 64 each | Output | Individual register values (debug) |
Reads are combinational: once RR0 or RR1 stabilizes, the selected register's value appears on RD0 or RD1 with no clock edge needed. Writes are sequential: on the rising edge, if WE = 1, the register selected by WR captures WD.
x0 Is Hard-Wired to Zero
RISC-V's x0 always reads as 0, and any write to it is ignored.
In the circuit, x0 is not a real register — it is literally wired to a constant 0. Any write with WR = 0 has no effect because there is no register to update at index 0.
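A behavioral Python sketch of this interface, with hypothetical class and method names. Reads are plain lookups that need no clock; the write port acts only when `rising_edge` is called. The lecture does not say what happens if CLR and WE are both asserted, so giving CLR priority here is an assumption.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1


class RegisterFile:
    """Hypothetical behavioral model of the register file interface."""

    def __init__(self) -> None:
        self.x = [0] * 32   # x[0] is never written, so x0 always reads as 0

    def read(self, rr0: int, rr1: int) -> Tuple[int, int]:
        """Combinational reads: RD0 and RD1 follow RR0 and RR1, no clock edge needed."""
        return self.x[rr0], self.x[rr1]

    def rising_edge(self, wr: int, wd: int, we: int, clr: int = 0) -> None:
        """Sequential write port, evaluated only at a rising clock edge."""
        if clr:                       # assumption: CLR wins over a simultaneous write
            self.x = [0] * 32
        elif we and wr != 0:          # a write to x0 has no register to land in
            self.x[wr] = wd & MASK64


rf = RegisterFile()
rf.rising_edge(wr=5, wd=42, we=1)
rf.rising_edge(wr=0, wd=99, we=1)     # attempted write to x0 is ignored
print(rf.read(5, 0))                  # (42, 0)
```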
Internal Structure
The register file is built from 31 D-flip-flop-based registers (x1–x31) plus the hard-wired x0. Two 32:1 multiplexers select the read-port outputs. A 5-to-32 decoder, AND-ed with WE, drives the individual register enables.
```mermaid
flowchart LR
    X0["x0 = 0<br/>(hard-wired)"] --> M0["32:1 MUX<br/>(RD0)"]
    X1["x1"] --> M0
    XN["... x31"] --> M0
    X0 --> M1["32:1 MUX<br/>(RD1)"]
    X1 --> M1
    XN --> M1
    RR0[RR0] --> M0
    RR1[RR1] --> M1
    M0 --> RD0[RD0]
    M1 --> RD1[RD1]
    WR[WR] --> DEC["5-to-32<br/>decoder"]
    WE[WE] --> DEC
    DEC --> X1
    DEC --> XN
```
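When WE = 1, the decoder activates only the enable line of the register selected by WR, so on the next rising edge only that register captures WD. When WE = 0, no enable line is active; when WR = 0, the active decoder output has no register attached, so again nothing changes.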
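The same structure can be sketched in Python at the block level (no gates): a hypothetical `decoder_5to32` returns one-hot enable lines already AND-ed with WE, and `mux32` forwards one of 32 inputs, with input 0 tied to constant 0 for x0. CLR is omitted to keep the sketch short. The usage lines replay the same-cycle read/write case from Problem 4.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def decoder_5to32(wr: int, we: int) -> List[int]:
    """5-to-32 decoder AND-ed with WE: one-hot enable lines, all 0 when WE = 0."""
    return [1 if (we and i == wr) else 0 for i in range(32)]


def mux32(sel: int, inputs: List[int]) -> int:
    """32:1 multiplexer: forwards the input chosen by the 5-bit select."""
    return inputs[sel]


class StructuralRegisterFile:
    """Hypothetical sketch: 31 registers (x1-x31), two read MUXes, one write decoder."""

    def __init__(self) -> None:
        self.x = [0] * 31          # self.x[i - 1] holds register xi; x0 has no storage

    def _mux_inputs(self) -> List[int]:
        return [0] + self.x        # MUX input 0 is the constant 0 (x0)

    def read(self, rr0: int, rr1: int) -> Tuple[int, int]:
        """Two independent read ports (RD0, RD1), no clock involved."""
        inputs = self._mux_inputs()
        return mux32(rr0, inputs), mux32(rr1, inputs)

    def rising_edge(self, wr: int, wd: int, we: int) -> None:
        """Each register xi loads WD only if its enable line is 1."""
        enables = decoder_5to32(wr, we)
        for i in range(1, 32):     # decoder output 0 is left unconnected
            if enables[i]:
                self.x[i - 1] = wd & MASK64


rf = StructuralRegisterFile()
rf.rising_edge(wr=5, wd=7, we=1)       # x5 = 7
print(rf.read(5, 7))                   # (7, 0): during the next cycle, RD0 shows the old x5
rf.rising_edge(wr=5, wd=42, we=1)      # that cycle's write of 42 lands at its end
print(rf.read(5, 7))                   # (42, 0)
print(decoder_5to32(3, 0).count(1))    # 0: WE = 0 enables nothing
```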
Read vs. write timing
Reads are asynchronous: the MUX outputs track `RR0`/`RR1` directly. That matters because the ALU needs `RD0` and `RD1` *this cycle*, before the next rising edge. Writes are synchronous: the destination register only changes on the rising edge. Thus within a single cycle, an instruction reads the *old* values of the registers selected by `RR0`/`RR1` and writes its result into the register selected by `WR` at the end of the cycle, so the read and the write never conflict, even when `WR` equals `RR0` or `RR1`.
Coming Up
Next class we will complete the datapath by covering:
- The ALU: a combinational unit that performs add, sub, mul, shift, and logical operations
- The instruction decoder: extracts the opcode, `funct3`, `funct7`, register numbers, and immediate values from each instruction word, and drives the control lines that tell every other component what to do
With those two pieces, the single-cycle processor will be able to run a small program end-to-end.
Practice Problems
Problem 1: PC Calculation
After executing a jal that jumps forward 20 instructions, the PC is 0x1000. What was the PC before the jal?
Solution
Twenty instructions is `20 × 4 = 80 = 0x50` bytes. The PC before the jump was `0x1000 - 0x50 = 0xFB0`.
Problem 2: Writing to x0
A program contains the instruction add x0, x5, x6. What happens on the rising edge?
Solution
Nothing visible. The ALU computes `x5 + x6` and `WD` is driven with that value, but `x0` is hard-wired to 0, so there is no flip-flop to update. The next read of `x0` still returns 0. This is why an ALU instruction that targets `x0` does nothing; the canonical RISC-V `nop` is exactly such an instruction, `addi x0, x0, 0`.
Problem 3: Address Extraction
A Digital ROM is configured with 10 address bits. Which PC bits do we feed to the ROM?
Solution
Bits `[11:2]` of the PC. Bits 0 and 1 are always 0 for aligned 4-byte instructions; bits 2 through 11 give us the 10-bit word address needed to index 2¹⁰ = 1024 instructions.
Problem 4: Read vs. Write in the Same Cycle
An instruction has RR0 = 5, RR1 = 7, WR = 5, WE = 1. If the ALU computes WD = 42, what value does RD0 carry during this clock cycle?
Solution
`RD0` carries the *old* value of `x5`. Reads are combinational and reflect the current state of the register file; the write of `42` into `x5` only happens on the rising edge at the *end* of the cycle. Starting next cycle, `x5` reads as 42.
Problem 5: Why Two 32:1 MUXes?
Why does the register file need two 32:1 MUXes on the read side instead of one?
Solution
Every R-type instruction reads two source registers simultaneously (e.g., `add t0, t1, t2` reads `t1` and `t2` together). The two ALU operands `RD0` and `RD1` must both be available within a single clock cycle, so we need two independent read ports, each with its own 32:1 MUX selected by its own read-register number (`RR0` and `RR1`).
Key Concepts
| Concept | Description |
|---|---|
| State + CL | Every synchronous digital system = storage elements + combinational logic |
| Sequential components | PC, register file, data memory — hold values across clock edges |
| Combinational components | Instruction memory (ROM), decoder, ALU — pure functions |
| Single-cycle | One instruction completes per clock cycle |
| Byte vs. word address | PC is byte-addressed; instruction ROM is word-addressed — use a splitter |
| x0 | Hard-wired to zero; writes are silently discarded |
| Two reads + one write | Register file services both ALU operands and the destination in one cycle |
Summary
- Every synchronous digital system has the same shape: state elements driven by combinational logic, connected in a feedback loop through the clock.
- A RISC-V processor's state is its general-purpose registers, program counter, and data memory. Everything else is pure combinational logic.
- In a single-cycle processor, one instruction finishes every clock cycle. The clock period must cover the longest combinational path.
- The program counter is a 64-bit register with a synchronous-clear MUX. It normally advances by 4; for branches and jumps it takes a calculated target address.
- The instruction memory is a ROM addressed by a slice of the PC (bits `[9:2]` for an 8-bit ROM address), producing a 32-bit instruction word.
- The register file has two combinational read ports and one synchronous write port. `x0` is hard-wired to zero; the other 31 registers are selected by 32:1 MUXes for reads and by a 5-to-32 decoder AND-ed with `WE` for writes.
Further Reading
- Processor Design Part 1 — the 2024 guide, with full diagrams for the PC, instruction memory, register file, and ALU
- Processor Design Part 2 — the register decoder, immediate decoder, and instruction decoder (covered in later lectures)
- Patterson & Hennessy, Computer Organization and Design, RISC-V Edition, Chapter 4 — single-cycle datapath