RISC-V Emulation¶

Overview¶

This lecture introduces the design and implementation of a RISC-V emulator in Rust. An emulator simulates the behavior of a processor by maintaining software representations of CPU state (registers, program counter) and memory, then executing instructions by decoding machine code and updating that state. We cover the fetch-decode-execute cycle, instruction decoding with pattern matching, Rust-specific challenges like raw pointers and unsafe code, and detailed implementation strategies for load and store instructions.

Learning Objectives¶

Understand the architecture of a software CPU emulator
Implement the fetch-decode-execute cycle for RISC-V instructions in Rust
Use raw pointers and unsafe blocks to perform memory operations required by emulation
Decode instruction words to extract opcode, registers, and immediates
Handle different instruction formats (R, I, S, B, J, RW types)
Implement load and store instructions using Rust pointer operations

Prerequisites¶

RISC-V Machine Code — instruction encoding and bit extraction in Rust
RISC-V Assembly 2 — control flow, branches, jumps
RISC-V Assembly Functions — calling convention, stack frames

What is an Emulator?¶

An emulator is software that mimics the behavior of hardware. A CPU emulator:

Maintains a software representation of processor state (registers, PC, stack)
Reads machine code from memory
Decodes each instruction and performs the specified operation
Updates state (registers, memory, PC) to reflect execution

Our emulator will execute real RISC-V machine code — the same binary instructions that a physical RISC-V processor would run. The assembly functions compiled in our project are actual machine code sitting in memory. The emulator reads those bytes and simulates what the hardware would do.

graph LR
    subgraph "Real RISC-V Processor"
        A[Hardware Registers]
        B[Hardware PC]
        C[Execution Unit]
    end

    subgraph "Our Rust Emulator"
        D["regs: [u64; 32]"]
        E["pc: *const u8"]
        F["match opcode { ... }"]
    end

    G[Same Machine Code] --> C
    G --> F

Emulator State in Rust¶

The RvState Struct¶

The emulator maintains all processor state in a Rust struct:

const RV_NUM_REGS: usize = 32;
const STACK_SIZE: usize = 8192;

pub struct RvState {
    pub regs: [u64; RV_NUM_REGS],  // 32 general-purpose registers
    pub pc: *const u8,              // Program counter (raw pointer)
    pub stack: [u8; STACK_SIZE],    // Emulated stack memory
}

Why a Raw Pointer for PC?¶

The program counter (pc) is a raw pointer (*const u8) rather than a regular Rust reference or integer. This is because:

PC points to real memory — the compiled assembly functions live at actual memory addresses. PC must point directly to those bytes.
Pointer arithmetic — we need to advance PC by 4 bytes per instruction, or jump forward/backward by arbitrary byte offsets for branches. Raw pointers support this via ptr.add() and ptr.offset().
Fetching instructions — we dereference PC to read the 32-bit instruction word, which requires casting to *const u32.

ABI Register Constants¶

const RV_ZERO: usize = 0;   // x0 — hardwired zero
const RV_RA: usize = 1;     // x1 — return address
const RV_SP: usize = 2;     // x2 — stack pointer
const RV_A0: usize = 10;    // x10 — argument 0 / return value
const RV_A1: usize = 11;    // x11 — argument 1
const RV_A2: usize = 12;    // x12 — argument 2
const RV_A3: usize = 13;    // x13 — argument 3

These are usize so they can be used directly as array indices into regs.

Initialization¶

Before emulating a function, we set up the initial processor state:

pub fn rv_init(state: &mut RvState, target: *const u32,
               a0: u64, a1: u64, a2: u64, a3: u64) {
    // Point PC to the function's machine code
    state.pc = target as *const u8;

    // Load function arguments into a0-a3
    state.regs[RV_A0] = a0;
    state.regs[RV_A1] = a1;
    state.regs[RV_A2] = a2;
    state.regs[RV_A3] = a3;

    // x0 is always zero
    state.regs[RV_ZERO] = 0;

    // Return address = null (halt sentinel)
    state.regs[RV_RA] = 0;

    // Stack pointer = top of emulated stack
    state.regs[RV_SP] = unsafe {
        state.stack.as_ptr().add(STACK_SIZE) as u64
    };
}

Key points:

target is a function pointer cast to *const u32 — it's the address of the first instruction of the assembly function
ra is set to null (0) — this is the halt sentinel. When the function returns, ret jumps to ra, which sets PC to null, ending emulation
Stack pointer points to the top (end) of the stack array, because the stack grows downward

Memory Layout¶

graph TD
    subgraph "RvState"
        A["regs[32] — 64-bit registers"]
        B["pc — raw pointer to code"]
        C["stack[8192] — emulated stack"]
    end

    subgraph "Host Memory"
        D["Assembly Function<br>(actual machine code bytes)"]
    end

    B -.->|"points to"| D
    E["SP (regs[2])"] -.->|"points into"| C

The Fetch-Decode-Execute Cycle¶

Every CPU — real or emulated — operates in a continuous cycle:

flowchart LR
    A[Fetch] --> B[Decode]
    B --> C[Execute]
    C --> D[Update State]
    D --> A

The Main Loop¶

pub fn rv_emulate(state: &mut RvState) -> u64 {
    while !state.pc.is_null() {
        rv_one(state);
        state.regs[RV_ZERO] = 0;  // x0 must always be 0
    }
    state.regs[RV_A0]  // Return value in a0
}

The loop continues until PC becomes null (our halt sentinel). After each instruction, we force x0 back to zero — if any instruction writes to x0, the write is silently discarded.

Executing One Instruction¶

fn rv_one(state: &mut RvState) {
    // FETCH: Read 32-bit instruction word at PC
    let iw = unsafe { *(state.pc as *const u32) };

    // DECODE: Extract opcode (bits [6:0])
    let opcode = get_bits(iw as u64, 0, 7);

    // EXECUTE: Dispatch to format handler
    match opcode {
        FMT_R       => run_r_format(state, iw),
        FMT_I_ARITH => run_i_arith(state, iw),
        FMT_I_JALR  => run_i_jalr(state, iw),
        _           => unsupported("format", opcode),
    }
}

Fetch requires unsafe because we dereference a raw pointer. We cast pc (a *const u8) to *const u32 to read a full 32-bit instruction word.
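The fetch step and the default PC update can be tried in isolation on a small array of instruction words. This is a minimal sketch: the helper name `fetch_and_advance` is hypothetical, and it assumes the caller passes a pointer to a readable, 4-byte-aligned word.

```rust
/// Hypothetical helper: read the instruction word at `pc` and return it
/// together with the address of the next instruction.
fn fetch_and_advance(pc: *const u8) -> (u32, *const u8) {
    // SAFETY (assumed by this sketch): `pc` points at a readable,
    // 4-byte-aligned 32-bit instruction word.
    let iw = unsafe { *(pc as *const u32) };
    // SAFETY (assumed): pc + 4 is still inside, or one past, the code region.
    let next = unsafe { pc.add(4) };
    (iw, next)
}
```

Calling it twice on a two-word array returns both words in order and leaves the pointer 8 bytes past the start.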

Opcode Constants¶

Each instruction format has a unique opcode in bits [6:0]:

const FMT_R: u32       = 0b0110011;  // Register operations (add, sub, etc.)
const FMT_RW: u32      = 0b0111011;  // Word operations (sllw, srlw, sraw)
const FMT_I_JALR: u32  = 0b1100111;  // JALR (jump and link register)
const FMT_I_ARITH: u32 = 0b0010011;  // Arithmetic immediates (addi, slli)
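Loads, stores, branches, and JAL need opcode constants as well. The values below come from the RISC-V base ISA encoding; the constant names and the `format_name` helper are hypothetical and only illustrate how the dispatch could be extended.

```rust
// Opcode values are from the RISC-V base ISA; the names are assumptions.
const FMT_I_LOAD: u32 = 0b0000011; // lb, lw, ld
const FMT_S: u32 = 0b0100011;      // sb, sw, sd
const FMT_B: u32 = 0b1100011;      // beq, bne, blt, bge
const FMT_J: u32 = 0b1101111;      // jal

/// Hypothetical helper: name the format selected by bits [6:0] of `iw`.
fn format_name(iw: u32) -> &'static str {
    match iw & 0x7f {
        0b0110011 => "R",
        0b0111011 => "RW",
        0b0010011 => "I-arith",
        0b1100111 => "I-jalr",
        FMT_I_LOAD => "I-load",
        FMT_S => "S",
        FMT_B => "B",
        FMT_J => "J",
        _ => "unsupported",
    }
}
```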

Bit Manipulation Utilities¶

These were introduced in the Machine Code lecture. Here are the Rust implementations:

/// Extract `count` bits starting at position `start`
pub fn get_bits(num: u64, start: u32, count: u32) -> u32 {
    let mask: u64 = (1u64 << count) - 1;
    ((num >> start) & mask) as u32
}

/// Extract a single bit as a boolean
pub fn get_bit(num: u64, which: u32) -> bool {
    ((num >> which) & 1) != 0
}

/// Sign-extend a value from `start` bits to 64 bits
pub fn sign_extend(num: u64, start: u32) -> i64 {
    let dist = 64 - start;
    let shifted = (num << dist) as i64;
    shifted >> dist
}

And the field extraction helpers:

fn get_rd(iw: u32) -> usize     { get_bits(iw as u64, 7, 5) as usize }
fn get_funct3(iw: u32) -> u32   { get_bits(iw as u64, 12, 3) }
fn get_rs1(iw: u32) -> usize    { get_bits(iw as u64, 15, 5) as usize }
fn get_rs2(iw: u32) -> usize    { get_bits(iw as u64, 20, 5) as usize }
fn get_funct7(iw: u32) -> u32   { get_bits(iw as u64, 25, 7) }

Note that get_rd, get_rs1, and get_rs2 return usize so they can be used directly as array indices: state.regs[rd].
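A quick way to check these helpers is to decode an I-type immediate from known encodings. In this sketch `get_bits` and `sign_extend` are repeated from above so the block compiles on its own; `imm_i` is a hypothetical name.

```rust
fn get_bits(num: u64, start: u32, count: u32) -> u32 {
    let mask: u64 = (1u64 << count) - 1;
    ((num >> start) & mask) as u32
}

fn sign_extend(num: u64, start: u32) -> i64 {
    let dist = 64 - start;
    ((num << dist) as i64) >> dist
}

/// Hypothetical helper: the sign-extended 12-bit immediate in bits [31:20].
fn imm_i(iw: u32) -> i64 {
    sign_extend(get_bits(iw as u64, 20, 12) as u64, 12)
}
```

For `addi a1, a1, 10` (0x00A58593) this yields 10, and for `addi a0, a0, -1` (0xFFF50513) it yields -1, because bit 11 of the immediate is copied into all upper bits.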

Unsafe Rust for Emulation¶

Emulation requires several operations that Rust's safety system cannot verify at compile time. These must be wrapped in unsafe blocks.

Why Unsafe?¶

The emulator needs to:

Dereference raw pointers — reading the instruction word at PC
Perform pointer arithmetic — advancing PC, computing branch targets
Read arbitrary memory addresses — load instructions read from addresses computed at runtime
Write arbitrary memory addresses — store instructions write to computed addresses

None of these can be checked by the Rust compiler, so they require unsafe.

Pointer Operations Summary¶

Operation	Rust Code	Use Case
Read instruction	`unsafe { *(pc as *const u32) }`	Fetch stage
Advance PC by 4 bytes	`unsafe { pc.add(4) }`	After most instructions
Jump by signed offset	`unsafe { pc.offset(n as isize) }`	Branches, JAL
Read from memory	`unsafe { std::ptr::read_unaligned(addr as *const T) }`	Load instructions
Write to memory	`unsafe { std::ptr::write_unaligned(addr as *mut T, val) }`	Store instructions
Integer to pointer	`sum as *const u8`	JALR jump target
Pointer to integer	`pc as u64`	Saving return address

`ptr.add()` vs `ptr.offset()`¶

ptr.add(n): advances the pointer by n elements, where n is an unsigned usize. Because pc is a *const u8, one element is one byte, so pc.add(4) moves to the next instruction.
ptr.offset(n): moves the pointer by n elements, where n is a signed isize. Used for branches and jumps, where the byte offset can be negative (jumping backward).
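The difference shows up when the offset is negative. A minimal sketch, with a hypothetical helper name `pc_relative`, assuming the result stays inside the same code region as the starting pointer:

```rust
/// Hypothetical helper: move `pc` by a signed byte offset.
fn pc_relative(pc: *const u8, offset: i64) -> *const u8 {
    // SAFETY (assumed by this sketch): the resulting pointer stays within
    // the same allocation that `pc` points into.
    unsafe { pc.offset(offset as isize) }
}
```

Moving forward by 12 bytes and then back by 8 leaves the pointer 4 bytes past where it started, which `add` could not express because its argument is unsigned.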

R-Type Instructions¶

Format¶

┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ funct7   │ rs2      │ rs1      │ funct3   │ rd       │ opcode   │
│ [31:25]  │ [24:20]  │ [19:15]  │ [14:12]  │ [11:7]   │ [6:0]    │
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘

Implementation¶

R-type instructions perform register-to-register operations. The operation is determined by matching on (funct3, funct7):

fn run_r_format(s: &mut RvState, iw: u32) {
    let rd = get_rd(iw);
    let funct3 = get_funct3(iw);
    let funct7 = get_funct7(iw);
    let rs1 = get_rs1(iw);
    let rs2 = get_rs2(iw);

    match (funct3, funct7) {
        (0b000, 0b0000000) => {
            // ADD
            s.regs[rd] = s.regs[rs1].wrapping_add(s.regs[rs2]);
        }
        (0b000, 0b0000001) => {
            // MUL
            s.regs[rd] = s.regs[rs1].wrapping_mul(s.regs[rs2]);
        }
        (0b001, 0b0000000) => { /* SLL */ }
        (0b101, 0b0000000) => { /* SRL */ }
        (0b101, 0b0100000) => { /* SRA */ }
        (0b111, 0b0000000) => { /* AND */ }
        (0b100, 0b0000001) => { /* DIV */ }
        _ => unsupported("R-type funct3", funct3),
    }

    s.pc = unsafe { s.pc.add(4) };
}

Wrapping Arithmetic¶

In Rust, arithmetic on unsigned integers panics on overflow in debug mode. Since RISC-V arithmetic wraps naturally, we must use the wrapping_* methods:

// This panics in debug mode if overflow occurs:
s.regs[rd] = s.regs[rs1] + s.regs[rs2];  // Don't do this!

// This wraps correctly, matching RISC-V behavior:
s.regs[rd] = s.regs[rs1].wrapping_add(s.regs[rs2]);  // Correct

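Use wrapping_add, wrapping_sub, and wrapping_mul for arithmetic operations. Division needs separate handling: RISC-V's div is a signed operation that returns all ones on division by zero instead of trapping, whereas Rust's wrapping_div still panics when the divisor is zero.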
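The remaining R-type arms follow the same pattern. The sketch below is one possible way to fill them in, written as a pure function so it can be tested without emulator state. The name `alu_r` and the `Option` return are assumptions; the funct3/funct7 values, the 6-bit shift amount, and the division-by-zero result follow the RV64 ISA.

```rust
/// Hypothetical pure helper: compute an R-type result from funct3/funct7 and
/// the two source values. Returns None for encodings this sketch omits.
fn alu_r(funct3: u32, funct7: u32, a: u64, b: u64) -> Option<u64> {
    let shamt = (b & 0x3f) as u32; // RV64 shifts use the low 6 bits of rs2
    let v = match (funct3, funct7) {
        (0b000, 0b0000000) => a.wrapping_add(b),            // ADD
        (0b000, 0b0100000) => a.wrapping_sub(b),            // SUB
        (0b000, 0b0000001) => a.wrapping_mul(b),            // MUL
        (0b001, 0b0000000) => a << shamt,                   // SLL
        (0b101, 0b0000000) => a >> shamt,                   // SRL (zero fill)
        (0b101, 0b0100000) => ((a as i64) >> shamt) as u64, // SRA (sign fill)
        (0b111, 0b0000000) => a & b,                        // AND
        (0b100, 0b0000001) => {
            // DIV (signed): division by zero yields all ones, no trap
            if b == 0 {
                u64::MAX
            } else {
                (a as i64).wrapping_div(b as i64) as u64
            }
        }
        _ => return None,
    };
    Some(v)
}
```

A handler like run_r_format could then write the returned value to regs[rd] and report the unsupported case when the result is None.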

I-Type Instructions¶

Format¶

┌──────────────┬──────────┬──────────┬──────────┬──────────┐
│ imm[11:0]    │ rs1      │ funct3   │ rd       │ opcode   │
│ [31:20]      │ [19:15]  │ [14:12]  │ [11:7]   │ [6:0]    │
└──────────────┴──────────┴──────────┴──────────┴──────────┘

Three Variants¶

I-type instructions share the same format but have three different opcodes:

Variant	Opcode	Instructions
I_ARITH	`0b0010011`	addi, slli, srli, srai
I_LOAD	`0b0000011`	lb, lw, ld
I_JALR	`0b1100111`	jalr

Each variant is handled by its own function:

I-Type Arithmetic¶

fn run_i_arith(s: &mut RvState, iw: u32) {
    let rd = get_rd(iw);
    let funct3 = get_funct3(iw);
    let v1 = s.regs[get_rs1(iw)];
    let imm = sign_extend(get_bits(iw as u64, 20, 12) as u64, 12);
    let shamt = get_bits(iw as u64, 20, 6);

    match funct3 {
        0b000 => { s.regs[rd] = v1.wrapping_add(imm as u64); } // ADDI
        0b001 => { s.regs[rd] = v1 << shamt; }                  // SLLI
        0b101 => { /* SRLI / SRAI depending on bit 30 */ }
        _ => unsupported("I-arith funct3", funct3),
    }

    s.pc = unsafe { s.pc.add(4) };
}
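The funct3 = 0b101 arm is left open above. One way to fill it, sketched here with a hypothetical helper `shift_right_imm`, is to test bit 30 of the instruction word: on RV64 it is clear for srli and set for srai, and the shift amount occupies bits [25:20].

```rust
/// Hypothetical helper for funct3 = 0b101: SRLI when bit 30 is clear,
/// SRAI when bit 30 is set.
fn shift_right_imm(iw: u32, v1: u64) -> u64 {
    let shamt = (iw >> 20) & 0x3f;           // 6-bit shift amount on RV64
    let arithmetic = ((iw >> 30) & 1) == 1;  // bit 30 selects SRAI
    if arithmetic {
        // Shifting as i64 copies the sign bit into the vacated positions
        ((v1 as i64) >> shamt) as u64
    } else {
        // Shifting as u64 fills the vacated positions with zeros
        v1 >> shamt
    }
}
```

With `srai a0, a0, 1` (0x40155513) a register holding -8 becomes -4, while `srli a0, a0, 1` (0x00155513) produces a large positive value because the top bit is cleared.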

I-Type Load¶

fn run_i_load(s: &mut RvState, iw: u32) {
    let rd = get_rd(iw);
    let funct3 = get_funct3(iw);
    let v1 = s.regs[get_rs1(iw)];
    let imm = sign_extend(get_bits(iw as u64, 20, 12) as u64, 12);
    let addr = v1.wrapping_add(imm as u64);

    match funct3 {
        0b000 | 0b010 | 0b011 => {
            s.regs[rd] = run_load(funct3, addr); // LB / LW / LD
        }
        _ => unsupported("I-load funct3", funct3),
    }

    s.pc = unsafe { s.pc.add(4) };
}

I-Type JALR¶

fn run_i_jalr(s: &mut RvState, iw: u32) {
    let rd = get_rd(iw);
    let v1 = s.regs[get_rs1(iw)];
    let imm = sign_extend(get_bits(iw as u64, 20, 12) as u64, 12);
    let target = v1.wrapping_add(imm as u64);

    // Link: save return address in rd
    if rd != 0 {
        s.regs[rd] = (s.pc as u64).wrapping_add(4);
    }

    // Jump — do NOT advance PC
    s.pc = target as *const u8;
}

Notice that JALR sets PC directly — it does not advance PC by 4 afterwards.

Load Instructions — Working with Pointers¶

Load instructions are the most pointer-intensive part of the emulator. They read data from memory at an address computed from a register value plus a sign-extended offset.

Computing the Target Address¶

For an instruction like lw a0, 4(sp):

// rs1 holds the base address (e.g., stack pointer value)
let base: u64 = s.regs[get_rs1(iw)];  // e.g., sp value

// The 12-bit immediate is the offset (e.g., 4)
let imm = sign_extend(get_bits(iw as u64, 20, 12) as u64, 12);

// Target address = base + offset
let addr: u64 = base.wrapping_add(imm as u64);

The address is a u64 integer that represents a memory location. To actually read from that location, we must cast it to a pointer.

Reading from Memory¶

fn run_load(width: u32, addr: u64) -> u64 {
    unsafe {
        match width {
            0 => std::ptr::read_unaligned(addr as *const u8) as i8 as i64 as u64,   // LB (sign-extends)
            2 => std::ptr::read_unaligned(addr as *const u32) as i32 as i64 as u64, // LW (sign-extends)
            3 => std::ptr::read_unaligned(addr as *const u64),        // LD
            _ => panic!("unsupported load width"),
        }
    }
}

Why `read_unaligned`?¶

Rust's normal pointer dereference (*ptr) requires the pointer to be aligned — a *const u32 must point to a 4-byte-aligned address, and *const u64 must be 8-byte-aligned. RISC-V code might compute addresses that don't meet these alignment requirements.

std::ptr::read_unaligned reads the value regardless of alignment, which is what we need:

// This may cause undefined behavior if addr is not aligned:
let value = unsafe { *(addr as *const u32) };  // Risky!

// This works regardless of alignment:
let value = unsafe { std::ptr::read_unaligned(addr as *const u32) };  // OK for any alignment

The Integer-to-Pointer Cast¶

The key insight is that register values are integers, but they represent memory addresses. When a load instruction computes base + offset, the result is a u64 that we must interpret as a pointer:

let addr: u64 = base + offset;  // This is a number

// Cast the number to a pointer to read memory at that address
let ptr: *const u32 = addr as *const u32;
let value: u32 = unsafe { std::ptr::read_unaligned(ptr) };

Load Width¶

The funct3 field determines how many bytes to read:

funct3	Instruction	Width	Rust Type
`0b000`	`lb`	1 byte	`*const u8`
`0b010`	`lw`	4 bytes	`*const u32`
`0b011`	`ld`	8 bytes	`*const u64`

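All loaded values are widened to 64 bits before being stored in the destination register, since all registers are 64 bits wide. RISC-V defines lb and lw as sign-extending loads (lbu and lwu are the zero-extending variants), so the widening should pass through a signed type, for example as i8 as i64 as u64 for lb, rather than a plain as u64 on an unsigned value.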
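The difference between sign extension and zero extension is easy to see on a byte with its top bit set. The following sketch uses a hypothetical helper `load_value` and reads from a local buffer; funct3 = 0b100 (lbu) is included only for contrast.

```rust
/// Hypothetical helper: load from `addr` according to a load funct3 value.
fn load_value(funct3: u32, addr: u64) -> u64 {
    // SAFETY (assumed by this sketch): `addr` points at enough readable
    // bytes for the requested width.
    unsafe {
        match funct3 {
            // lb: read one byte, reinterpret as signed, sign-extend
            0b000 => std::ptr::read_unaligned(addr as *const u8) as i8 as i64 as u64,
            // lw: read four bytes, reinterpret as signed, sign-extend
            0b010 => std::ptr::read_unaligned(addr as *const u32) as i32 as i64 as u64,
            // ld: read eight bytes, no extension needed
            0b011 => std::ptr::read_unaligned(addr as *const u64),
            // lbu: read one byte, zero-extend
            0b100 => std::ptr::read_unaligned(addr as *const u8) as u64,
            _ => panic!("unsupported load width"),
        }
    }
}
```

Loading the byte 0xFF with lb fills the register with ones (the value -1), while lbu leaves 0xFF in the low byte and zeros elsewhere.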

S-Type Instructions (Stores)¶

Store instructions write register values to memory. They are the mirror of loads, but with a split immediate.

Format¶

┌──────────────┬──────────┬──────────┬──────────┬──────────────┬──────────┐
│ imm[11:5]    │ rs2      │ rs1      │ funct3   │ imm[4:0]     │ opcode   │
│ [31:25]      │ [24:20]  │ [19:15]  │ [14:12]  │ [11:7]       │ [6:0]    │
└──────────────┴──────────┴──────────┴──────────┴──────────────┴──────────┘

The immediate is split across two fields. This is because the rs2 field (the value to store) occupies the position where rd would be in other formats, and RISC-V keeps register fields in consistent positions.

Reassembling the Immediate¶

let off_4_0 = get_bits(iw as u64, 7, 5);     // Lower 5 bits
let off_11_5 = get_bits(iw as u64, 25, 7);   // Upper 7 bits
let imm12 = (off_11_5 << 5) | off_4_0;       // Combine into 12 bits
let offset = sign_extend(imm12 as u64, 12);  // Offsets can be negative
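Store offsets are signed 12-bit values, so a negative offset such as -8 only comes out right if the reassembled field is sign-extended. The sketch below wraps the reassembly in a hypothetical helper `s_offset` and does the sign extension with a shift pair instead of the lecture's sign_extend, purely so the block stands alone.

```rust
/// Hypothetical helper: reassemble and sign-extend the S-type store offset.
fn s_offset(iw: u32) -> i64 {
    let off_4_0 = (iw >> 7) & 0x1f;    // imm[4:0]  from bits [11:7]
    let off_11_5 = (iw >> 25) & 0x7f;  // imm[11:5] from bits [31:25]
    let imm12 = ((off_11_5 << 5) | off_4_0) as i64;
    // Move bit 11 up to bit 63, then arithmetic-shift back down
    (imm12 << 52) >> 52
}
```

For `sw a1, 8(sp)` (0x00B12423) the result is 8, and for `sd ra, -8(sp)` (0xFE113C23) it is -8.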

Writing to Memory¶

fn run_s_format(s: &mut RvState, iw: u32) {
    let off_4_0 = get_bits(iw as u64, 7, 5);
    let off_11_5 = get_bits(iw as u64, 25, 7);
    let offset = sign_extend(((off_11_5 << 5) | off_4_0) as u64, 12);

    let width = get_funct3(iw);
    let rs1 = get_rs1(iw);
    let rs2 = get_rs2(iw);

    // Target address = base register + sign-extended offset
    let target = s.regs[rs1].wrapping_add(offset as u64);

    unsafe {
        match width {
            0 => std::ptr::write_unaligned(
                     target as *mut u8, s.regs[rs2] as u8),    // SB
            2 => std::ptr::write_unaligned(
                     target as *mut u32, s.regs[rs2] as u32),  // SW
            3 => std::ptr::write_unaligned(
                     target as *mut u64, s.regs[rs2]),         // SD
            _ => panic!("unsupported store width"),
        }
    }

    s.pc = unsafe { s.pc.add(4) };
}

Store Width¶

funct3	Instruction	Width	Written Type
`0b000`	`sb`	1 byte	`*mut u8` — truncate with `as u8`
`0b010`	`sw`	4 bytes	`*mut u32` — truncate with `as u32`
`0b011`	`sd`	8 bytes	`*mut u64` — full 64-bit value

For sb and sw, the register value is truncated to the appropriate width using Rust's as cast.

Loads vs Stores: Pointer Direction¶

flowchart LR
    subgraph "Load (lb, lw, ld)"
        A[Memory] -->|"read_unaligned<br>*const T"| B[Register]
    end

    subgraph "Store (sb, sw, sd)"
        C[Register] -->|"write_unaligned<br>*mut T"| D[Memory]
    end

Loads use *const T (read-only pointer) and read_unaligned
Stores use *mut T (mutable pointer) and write_unaligned

B-Type Instructions (Branches)¶

Format¶

┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ imm[12]  │ imm[10:5]│ rs2      │ rs1      │ funct3   │ imm[4:1] │ imm[11]  │ opcode   │
│ [31]     │ [30:25]  │ [24:20]  │ [19:15]  │ [14:12]  │ [11:8]   │ [7]      │ [6:0]    │
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘

The immediate is scattered across the instruction for hardware efficiency.

Reconstructing the Branch Offset¶

fn b_offset(iw: u32) -> i64 {
    let bit12 = if get_bit(iw as u64, 31) { 1u32 } else { 0 };
    let bit11 = if get_bit(iw as u64, 7) { 1u32 } else { 0 };
    let bits10_5 = get_bits(iw as u64, 25, 6);
    let bits4_1 = get_bits(iw as u64, 8, 4);

    // Reassemble: imm[12|11|10:5|4:1|0]
    // Bit 0 is always 0 (2-byte alignment)
    let offset = (bit12 << 12) | (bit11 << 11)
               | (bits10_5 << 5) | (bits4_1 << 1);

    sign_extend(offset as u64, 13)
}

Implementation¶

fn run_b_format(s: &mut RvState, iw: u32) {
    let funct3 = get_funct3(iw);
    let v1 = s.regs[get_rs1(iw)];
    let v2 = s.regs[get_rs2(iw)];

    let taken = match funct3 {
        0b000 => v1 == v2,                          // BEQ
        0b100 => (v1 as i64) < (v2 as i64),         // BLT (signed)
        _ => unsupported("B-type funct3", funct3),
    };

    if taken {
        let offset = b_offset(iw);
        s.pc = unsafe { s.pc.offset(offset as isize) };
    } else {
        s.pc = unsafe { s.pc.add(4) };
    }
}

Signed vs Unsigned Comparison¶

Registers store u64 values. For signed comparisons (blt, bge), we cast to i64:

// Unsigned comparison (bltu, bgeu):
v1 < v2

// Signed comparison (blt, bge):
(v1 as i64) < (v2 as i64)

This reinterprets the bit pattern as a two's complement signed value.
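All six base-ISA branch conditions can be expressed with these two kinds of comparison. A minimal sketch, using a hypothetical helper `branch_taken` that returns None for funct3 values that do not encode a branch:

```rust
/// Hypothetical helper: evaluate a branch condition from funct3 and the two
/// source register values. Returns None for unassigned funct3 values.
fn branch_taken(funct3: u32, v1: u64, v2: u64) -> Option<bool> {
    Some(match funct3 {
        0b000 => v1 == v2,                   // BEQ
        0b001 => v1 != v2,                   // BNE
        0b100 => (v1 as i64) < (v2 as i64),  // BLT  (signed)
        0b101 => (v1 as i64) >= (v2 as i64), // BGE  (signed)
        0b110 => v1 < v2,                    // BLTU (unsigned)
        0b111 => v1 >= v2,                   // BGEU (unsigned)
        _ => return None,
    })
}
```

With v1 = 0xFFFFFFFFFFFFFFFF and v2 = 0, blt is taken (v1 is -1 when read as signed) but bltu is not (v1 is the largest unsigned value).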

Branch PC Update¶

Taken: pc.offset(offset as isize) — jump by a signed byte offset relative to the current instruction
Not taken: pc.add(4) — advance to the next instruction

J-Type Instructions (JAL)¶

Format¶

┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ imm[20]    │ imm[10:1]  │ imm[11]    │ imm[19:12] │ rd         │ opcode     │
│ [31]       │ [30:21]    │ [20]       │ [19:12]    │ [11:7]     │ [6:0]      │
└────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘

Implementation¶

fn run_j_format(s: &mut RvState, iw: u32) {
    let rd = get_rd(iw);

    // Link: save return address (PC + 4) in rd
    if rd != 0 {
        s.regs[rd] = (s.pc as u64).wrapping_add(4);
    }

    // Extract scattered immediate bits
    ... 
    // Reassemble and sign-extend
    let offset = ...

    let signed_offset = sign_extend(offset as u64, 20).wrapping_mul(2);

    // Jump
    s.pc = unsafe { s.pc.offset(signed_offset as isize) };
}
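The immediate extraction is elided above. As one possible approach, the sketch below mirrors b_offset: it places each field at its byte-offset position and sign-extends from 21 bits, which gives the same result as assembling a 20-bit value and multiplying by 2. The helper name `j_offset` is hypothetical, and this is not necessarily how the course solution does it.

```rust
/// Hypothetical helper: reassemble the J-type immediate as a signed byte offset.
fn j_offset(iw: u32) -> i64 {
    let bit20 = (iw >> 31) & 0x1;       // imm[20]    from bit 31
    let bits10_1 = (iw >> 21) & 0x3ff;  // imm[10:1]  from bits [30:21]
    let bit11 = (iw >> 20) & 0x1;       // imm[11]    from bit 20
    let bits19_12 = (iw >> 12) & 0xff;  // imm[19:12] from bits [19:12]

    // Bit 0 of the offset is always 0, so imm[10:1] is shifted left by 1
    let imm21 = ((bit20 << 20) | (bits19_12 << 12) | (bit11 << 11) | (bits10_1 << 1)) as i64;

    // Sign-extend from 21 bits: move bit 20 up to bit 63, then shift back
    (imm21 << 43) >> 43
}
```

For `jal ra, 8` (0x008000EF) this returns 8, and for `j -4` (0xFFDFF06F) it returns -4.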

JAL vs J (pseudo-instruction)¶

jal ra, label — saves return address in ra, then jumps
j label — pseudo-instruction for jal x0, label (discards link since x0 is always 0)

When rd = 0, the if rd != 0 check skips the link, matching the j pseudo-instruction behavior.

Stopping the Emulator¶

The Null Sentinel¶

The emulator runs until PC becomes null:

// During initialization:
state.regs[RV_RA] = 0;  // ra = null

// When the function executes 'ret' (= jalr x0, 0(ra)):
// PC = ra + 0 = null

// The main loop exits:
while !state.pc.is_null() {
    rv_one(state);
}

flowchart TD
    A["rv_init: ra = null"] --> B["Emulate instructions"]
    B --> C{"pc.is_null()?"}
    C -->|No| B
    C -->|Yes| D["Return regs[a0]"]

    E["ret instruction<br>(jalr x0, 0(ra))"] -->|"pc = ra = null"| C
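The halt mechanism can be seen end to end in a stripped-down emulator that supports only addi and jalr. This is a sketch under stated assumptions: `MiniState`, `step`, and `run` are hypothetical names, and the code is a hand-encoded array of instruction words rather than a compiled assembly function.

```rust
const RV_RA: usize = 1;
const RV_A0: usize = 10;

/// Hypothetical reduced state: registers and PC only, no stack.
struct MiniState {
    regs: [u64; 32],
    pc: *const u8,
}

fn bits(iw: u32, start: u32, count: u32) -> u32 {
    (iw >> start) & ((1u32 << count) - 1)
}

/// Sign-extended I-type immediate (arithmetic shift on i32 copies bit 31).
fn imm_i(iw: u32) -> i64 {
    ((iw as i32) >> 20) as i64
}

fn step(s: &mut MiniState) {
    // SAFETY (assumed): pc points at a valid, aligned instruction word.
    let iw = unsafe { *(s.pc as *const u32) };
    let rd = bits(iw, 7, 5) as usize;
    let rs1 = bits(iw, 15, 5) as usize;
    match bits(iw, 0, 7) {
        // ADDI (opcode 0010011, funct3 000)
        0b0010011 if bits(iw, 12, 3) == 0 => {
            s.regs[rd] = s.regs[rs1].wrapping_add(imm_i(iw) as u64);
            s.pc = unsafe { s.pc.add(4) };
        }
        // JALR: compute the target first, then link, then jump
        0b1100111 => {
            let target = s.regs[rs1].wrapping_add(imm_i(iw) as u64);
            if rd != 0 {
                s.regs[rd] = (s.pc as u64).wrapping_add(4);
            }
            s.pc = target as *const u8;
        }
        op => panic!("unsupported opcode {op:#09b}"),
    }
    s.regs[0] = 0; // x0 stays zero
}

fn run(code: &[u32], a0: u64) -> u64 {
    let mut s = MiniState { regs: [0; 32], pc: code.as_ptr() as *const u8 };
    s.regs[RV_A0] = a0;
    s.regs[RV_RA] = 0; // halt sentinel: ret will set pc to null
    while !s.pc.is_null() {
        step(&mut s);
    }
    s.regs[RV_A0]
}
```

Running two `addi a0, a0, 1` words (0x00150513) followed by `ret` (0x00008067) with a0 = 40 returns 42: the ret jumps to ra, which is 0, so the loop condition fails and the function returns a0.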

Incremental Development Strategy¶

Building an emulator is best done incrementally, one instruction at a time:

flowchart TD
    A["1. Pick a target program<br>(start with quadratic)"] --> B["2. Run it — see which<br>instruction is unsupported"]
    B --> C["3. Identify the format<br>from the opcode"]
    C --> D["4. Decode the instruction<br>fields"]
    D --> E["5. Implement the<br>operation"]
    E --> F["6. Test: does Emu<br>match Rust and Asm?"]
    F -->|"No"| B
    F -->|"Yes"| G["7. Move to next<br>program"]

Suggested order:

quadratic — needs addi, add, sub, mul, jalr (simple arithmetic, no memory)
midpoint — adds srai (arithmetic shift)
max3 — adds branches (bge, blt) and jal
get_bitseq — adds sll, srl, and
to_upper — adds lb, sb (byte loads/stores, loops)
swap — adds lw, sw (word loads/stores)
sort — adds sd, ld (doubleword stores/loads, function calls)
fib_rec — tests recursive function calls (heavy stack usage)

Key Concepts¶

Concept	Description
Emulator	Software that simulates hardware behavior by maintaining state and executing instructions
Fetch	Reading the 32-bit instruction word from memory at PC
Decode	Extracting opcode, registers, and immediates from the instruction word
Execute	Performing the operation and updating registers, memory, and PC
Raw Pointer	`*const u8` / `*mut u8`: Rust's unchecked pointer types, needed for memory operations
`unsafe` block	Rust syntax for operations the compiler cannot verify (pointer dereference, etc.)
`read_unaligned`	Reads from a potentially unaligned memory address; still requires `unsafe`, but has no alignment requirement
`write_unaligned`	Writes to a potentially unaligned memory address; still requires `unsafe`, but has no alignment requirement
`wrapping_add`	Arithmetic that wraps on overflow instead of panicking
Halt Sentinel	Null PC signals the emulator to stop (set via `ra = 0` at init)
Split Immediate	Immediate value scattered across multiple fields (S/B/J types)

Practice Problems¶

Problem 1: Trace ADDI Execution¶

Given state.regs[11] = 100 and the instruction word 0x00A58593, trace through the emulator.

Show Solution

**Step 1: Extract opcode**

0x00A58593 = 0000 0000 1010 0101 1000 0101 1001 0011
opcode = 0010011 = FMT_I_ARITH

**Step 2: Extract fields**

imm[11:0] = 000000001010 = 10
rs1 = 01011 = 11 (a1)
funct3 = 000 = ADDI
rd = 01011 = 11 (a1)

**Step 3: Execute in Rust**

let imm = sign_extend(10, 12);  // imm = 10
let sum = 100u64.wrapping_add(10u64);  // sum = 110
s.regs[11] = 110;
s.pc = unsafe { s.pc.add(4) };

**Answer:** `addi a1, a1, 10` — sets a1 to 110, advances PC by 4.

Problem 2: Implement a Load Word¶

Write the code to execute lw a0, 8(sp) given that state.regs[2] (sp) contains address 0x7fff1000.

Show Solution

// Decode
let rd = 10;   // a0
let rs1 = 2;   // sp
let funct3 = 2; // LW
let imm = 8;   // offset

// Compute target address
let base: u64 = state.regs[2];     // 0x7fff1000
let addr: u64 = base.wrapping_add(8);  // 0x7fff1008

// Read 4 bytes from memory at addr
let value: u32 = unsafe {
    std::ptr::read_unaligned(addr as *const u32)
};

// Store in destination register (lw sign-extends the 32-bit value to 64 bits)
state.regs[10] = value as i32 as i64 as u64;

// Advance PC
state.pc = unsafe { state.pc.add(4) };

Problem 3: Store Immediate Reconstruction¶

Given the instruction word 0x00B12423, extract the S-type store offset.

Show Solution

0x00B12423 = 0000 0000 1011 0001 0010 0100 0010 0011

**Extract fields:**

opcode = 0100011 (S-type)
imm[4:0]  = 01000 = 8    (bits [11:7])
funct3    = 010   = SW   (bits [14:12])
rs1       = 00010 = sp   (bits [19:15])
rs2       = 01011 = a1   (bits [24:20])
imm[11:5] = 0000000 = 0  (bits [31:25])

**Reassemble offset:**

let off_4_0 = 0b01000;    // 8
let off_11_5 = 0b0000000; // 0
let offset = (0 << 5) | 8; // = 8

**Answer:** `sw a1, 8(sp)` — stores the word in a1 to address sp + 8.

Problem 4: Branch Condition¶

What Rust expression evaluates the condition for blt a0, a1, target?

Show Solution

`blt` is a **signed** less-than comparison. Since registers are stored as `u64`, we need to reinterpret them as signed:

let v1: u64 = s.regs[10];  // a0
let v2: u64 = s.regs[11];  // a1

let taken: bool = (v1 as i64) < (v2 as i64);

The `as i64` cast reinterprets the unsigned 64-bit value as a signed two's complement value. For example, `0xFFFFFFFFFFFFFFFF` as `u64` is a very large number, but as `i64` it is `-1`.

Problem 5: Why `wrapping_add`?¶

What happens if you write s.regs[rd] = s.regs[rs1] + s.regs[rs2] in Rust when the values overflow?

Show Solution

In **debug mode** (`cargo build`), Rust checks for integer overflow on `u64` addition. If `s.regs[rs1] + s.regs[rs2]` exceeds `u64::MAX`, the program **panics** with an "attempt to add with overflow" error. In **release mode** (`cargo build --release`), the overflow wraps silently (like C). Since RISC-V arithmetic naturally wraps, and we want consistent behavior in both debug and release, we use:

s.regs[rd] = s.regs[rs1].wrapping_add(s.regs[rs2]);

This wraps on overflow in both modes, correctly matching RISC-V behavior.

Summary¶

Emulator architecture: An emulator maintains software state (registers, PC, stack) and simulates CPU operation by fetching, decoding, and executing instructions in a loop.
Rust raw pointers: The PC is a *const u8 raw pointer. Fetching, advancing PC, and all memory operations require unsafe blocks because the compiler cannot verify pointer validity.
Fetch-decode-execute cycle: Read the instruction word via pointer dereference, extract the opcode to determine format, dispatch to a handler that decodes fields, performs the operation, and updates state.
Loads and stores: Use std::ptr::read_unaligned and std::ptr::write_unaligned to safely handle potentially unaligned memory addresses. Register values are cast to pointers (addr as *const T) for memory access.
Wrapping arithmetic: Use wrapping_add, wrapping_sub, wrapping_mul instead of +, -, * to match RISC-V's wrapping overflow behavior and avoid panics in debug mode.
Immediate reconstruction: S-type, B-type, and J-type instructions have split immediates that must be reassembled and sign-extended before use.
Halting: Initialize ra to null. When the emulated function returns (ret = jalr x0, 0(ra)), PC becomes null, ending the emulation loop.
