# RISC-V Assembly 2

## CS 631 Systems Foundations (Mar 3, 2026)

---

## Today's Agenda

1. Memory operations: data sizes, load/store
2. Arrays and indexing
3. Memory layout (stack, heap, data, code)
4. Control flow: if/else (deeper)
5. Control flow: loops (for, while, do-while)
6. Leaf functions vs non-leaf functions

---

## Memory Operations: Data Sizes

Memory is **byte-addressable**: each byte has a unique address.

| Data Size  | Bytes | Load / Store | Rust Type              |
|------------|-------|--------------|------------------------|
| Byte       | 1     | `lb` / `sb`  | `i8`, `u8`             |
| Halfword   | 2     | `lh` / `sh`  | `i16`, `u16`           |
| Word       | 4     | `lw` / `sw`  | `i32`, `u32`           |
| Doubleword | 8     | `ld` / `sd`  | `i64`, `u64`, pointers |

Lecture 05 used `lw`/`sw` for `i32`. Today we see the full set.
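
The size table can be cross-checked against Rust's own type sizes. The sketch below is illustrative only: `mnemonics_for` is a hypothetical helper, not part of the course code, and it simply maps a byte count to the load/store pair from the table.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the (load, store) mnemonics that move
/// `bytes` bytes, following the data-size table. Other sizes have no
/// single load/store instruction, so they map to `None`.
fn mnemonics_for(bytes: usize) -> Option<(&'static str, &'static str)> {
    match bytes {
        1 => Some(("lb", "sb")),
        2 => Some(("lh", "sh")),
        4 => Some(("lw", "sw")),
        8 => Some(("ld", "sd")),
        _ => None,
    }
}

fn main() {
    println!("i8  -> {:?}", mnemonics_for(size_of::<i8>()));
    println!("i16 -> {:?}", mnemonics_for(size_of::<i16>()));
    println!("i32 -> {:?}", mnemonics_for(size_of::<i32>()));
    println!("i64 -> {:?}", mnemonics_for(size_of::<i64>()));
    // Pointer size depends on the compilation target; on a 64-bit
    // target it is 8 bytes, matching the `ld` / `sd` row.
    println!("pointer size on this target: {} bytes", size_of::<*const u8>());
}
```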

---

## Load Instructions

Smaller values must be **extended** to fill a 64-bit register:

| Instruction  | Bytes | Extension   | Use For          |
|--------------|-------|-------------|------------------|
| `lb` / `lbu` | 1     | Sign / Zero | `i8` / `u8`      |
| `lh` / `lhu` | 2     | Sign / Zero | `i16` / `u16`    |
| `lw` / `lwu` | 4     | Sign / Zero | `i32` / `u32`    |
| `ld`         | 8     | None needed | `i64` / pointers |

Loading `0xFF` as a byte:

- `lb` → `0xFFFFFFFFFFFFFFFF` (−1)
- `lbu` → `0x00000000000000FF` (255)
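
The same extension behavior can be modeled with Rust casts. This is a minimal sketch, assuming a 64-bit register is represented as a `u64`; the function names `load_byte_signed` and `load_byte_unsigned` are hypothetical stand-ins for `lb` and `lbu`.

```rust
/// Models `lb`: reinterpret the byte as signed, then widen.
/// Widening a signed value copies the sign bit into the upper 56 bits.
fn load_byte_signed(byte: u8) -> u64 {
    byte as i8 as i64 as u64
}

/// Models `lbu`: widen the unsigned byte, filling the upper bits with 0.
fn load_byte_unsigned(byte: u8) -> u64 {
    byte as u64
}

fn main() {
    let byte = 0xFFu8;
    println!("lb  -> {:#018X}", load_byte_signed(byte));
    println!("lbu -> {:#018X}", load_byte_unsigned(byte));
}
```

Bytes below `0x80` have a clear sign bit, so both functions return the same value for them.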

---

## Store Instructions

| Instruction           | Bytes | Use For          |
|-----------------------|-------|------------------|
| `sb rs2, offset(rs1)` | 1     | `i8` / `u8`      |
| `sh rs2, offset(rs1)` | 2     | `i16` / `u16`    |
| `sw rs2, offset(rs1)` | 4     | `i32` / `u32`    |
| `sd rs2, offset(rs1)` | 8     | `i64` / pointers |

**Load vs Store: operand order differs**

```asm
lw t0, (a0)    # Load:  DESTINATION first, t0 = memory[a0]
sw t0, (a0)    # Store: SOURCE first, memory[a0] = t0
```

---

## Load with Offset

Syntax: `offset(base)` computes address = base + offset

```asm
lw t0, 0(a0)    # load from a0 + 0
lw t1, 4(a0)    # load from a0 + 4
lw t2, 8(a0)    # load from a0 + 8
```

```text
Address     Content
          +----------------+
a0 + 8    |     value3     |  ← lw t2, 8(a0)
          +----------------+
a0 + 4    |     value2     |  ← lw t1, 4(a0)
          +----------------+
a0        |     value1     |  ← lw t0, (a0)
          +----------------+
```

When offset is 0: `(a0)` is shorthand for `0(a0)`

---

## Array Layout in Memory

Arrays are stored contiguously. For `arr: [i32; 4] = [3, 4, 5, 6]`:

```text
Address     Value
          +----------------+
arr + 12  |       6        |  ← arr[3]
          +----------------+
arr + 8   |       5        |  ← arr[2]
          +----------------+
arr + 4   |       4        |  ← arr[1]
          +----------------+
arr       |       3        |  ← arr[0]
          +----------------+
```

Element address = **base + (index × sizeof(element))**

For `i32` (4 bytes): address = `arr + (i × 4)`

---

## Array Indexing: Shift vs Multiply

Two ways to compute the byte offset for element `i`:

```asm
# Method 1: Multiplication
li   t0, 4          # sizeof(i32)
mul  t0, a1, t0     # offset = i * 4
add  t0, a0, t0     # address = arr + offset
lw   t1, (t0)       # value = arr[i]

# Method 2: Shift (faster)
slli t0, a1, 2      # offset = i << 2 = i * 4
add  t0, a0, t0     # address = arr + offset
lw   t1, (t0)       # value = arr[i]
```

| Shift          | Multiplier | Array Type       |
|----------------|------------|------------------|
| `slli x, y, 1` | × 2        | `i16`            |
| `slli x, y, 2` | × 4        | `i32`            |
| `slli x, y, 3` | × 8        | `i64` / pointers |

---

## Example: `sum_array`

**Rust**

```rust
fn sum_array_rust(arr: &[i32]) -> i32 {
    let mut sum = 0;
    for i in 0..arr.len() {
        sum += arr[i];
    }
    sum
}
```

**Assembly**

```asm
# a0=arr, a1=len, t0=sum, t1=i
sum_array_s:
    li   t0, 0              # sum = 0
    li   t1, 0              # i = 0
sum_loop:
    bge  t1, a1, sum_done
    slli t2, t1, 2          # i * 4
    add  t2, a0, t2         # &arr[i]
    lw   t3, (t2)           # arr[i]
    add  t0, t0, t3         # sum += arr[i]
    addi t1, t1, 1          # i++
    j    sum_loop
sum_done:
    mv   a0, t0             # return sum
    ret
```
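
The address arithmetic in `sum_array_s` (`slli` then `add`) can be checked against the addresses Rust itself assigns to array elements. This is a small sketch; `elem_addr` is a hypothetical helper name, and addresses are treated as plain `usize` values for illustration.

```rust
/// Byte address of element `i` of an `i32` array that starts at `base`.
/// Mirrors `slli t2, t1, 2` followed by `add t2, a0, t2`.
fn elem_addr(base: usize, i: usize) -> usize {
    base + (i << 2) // i << 2 is i * 4, the size of an i32
}

fn main() {
    let arr: [i32; 4] = [3, 4, 5, 6];
    let base = arr.as_ptr() as usize;
    for i in 0..arr.len() {
        // Address the compiler actually uses for arr[i].
        let actual = &arr[i] as *const i32 as usize;
        assert_eq!(elem_addr(base, i), actual);
        println!("arr[{}] = {} lives at base + {}", i, arr[i], actual - base);
    }
}
```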

---

## Memory Layout

Program memory is organized into distinct regions (high → low addresses):

```text
+---------------------------+  High addresses
|          Stack            |  ← sp (grows DOWN)
|  (local vars, saved ra)   |
+---------------------------+
|           ...             |
+---------------------------+
|          Heap             |  (grows UP)
|   (Box, Vec, dynamic)     |
+---------------------------+
|          Data             |
|  (globals, string lits)   |
+---------------------------+
|       Code (.text)        |
|      (instructions)       |
+---------------------------+  Low addresses
```

---

## The Stack

`sp` (stack pointer) tracks the top of the stack. Stack grows **downward**.

```asm
addi sp, sp, -16    # allocate 16 bytes (stack grows DOWN)
sw   t0, 0(sp)      # store at sp + 0
sw   t1, 4(sp)      # store at sp + 4
sd   ra, 8(sp)      # store ra at sp + 8 (8 bytes)

# ... do work ...

ld   ra, 8(sp)      # restore (reverse order)
lw   t1, 4(sp)
lw   t0, 0(sp)
addi sp, sp, 16     # deallocate
```

Stack must stay **16-byte aligned**. Always allocate in multiples of 16.
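
One common way to pick a frame size that respects this rule is to round the bytes you need up to the next multiple of 16. The sketch below shows that arithmetic; `align16` is a hypothetical helper name, not something from the lab code.

```rust
/// Rounds `bytes` up to the next multiple of 16.
/// Adding 15 and clearing the low four bits works because 16 is a power of two.
fn align16(bytes: usize) -> usize {
    (bytes + 15) & !15
}

fn main() {
    // Two 4-byte temporaries plus an 8-byte `ra` need exactly 16 bytes.
    println!("need 16 -> allocate {}", align16(4 + 4 + 8));
    // Saving `ra` plus two 8-byte `s` registers needs 24, so allocate 32.
    println!("need 24 -> allocate {}", align16(8 + 8 + 8));
}
```

The result is the immediate to use in `addi sp, sp, -N` and again in the matching `addi sp, sp, N`.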

---

## Branch Instructions (Complete)

Lecture 05 covered signed branches. Here's the full set with **unsigned** variants:

| Instruction            | Condition  | Type     |
|------------------------|------------|----------|
| `beq rs1, rs2, label`  | rs1 == rs2 | Any      |
| `bne rs1, rs2, label`  | rs1 != rs2 | Any      |
| `blt rs1, rs2, label`  | rs1 < rs2  | Signed   |
| `bge rs1, rs2, label`  | rs1 >= rs2 | Signed   |
| `bltu rs1, rs2, label` | rs1 < rs2  | Unsigned |
| `bgeu rs1, rs2, label` | rs1 >= rs2 | Unsigned |
| `ble rs1, rs2, label`  | rs1 <= rs2 | Pseudo   |
| `bgt rs1, rs2, label`  | rs1 > rs2  | Pseudo   |

`bltu`/`bgeu` matter when the same bit pattern means different values: as a 32-bit value, `0xFFFFFFFF` is −1 signed but about 4 billion unsigned.

---

## Condition Inversion Table

To implement `if (condition)`, branch on the **opposite** to skip the then-block:

| Rust Condition | Branch to Skip   |
|----------------|------------------|
| `a == b`       | `bne a, b, else` |
| `a != b`       | `beq a, b, else` |
| `a < b`        | `bge a, b, else` |
| `a >= b`       | `blt a, b, else` |
| `a > b`        | `ble a, b, else` |
| `a <= b`       | `bgt a, b, else` |

Same pattern as Lecture 05: branch on the **inverse** to skip the then-block.
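
Both ideas from the branch slides can be modeled in a few lines of Rust: signedness changes the outcome of a comparison on the same bits, and each inverse branch is the logical negation of the original condition. This is a sketch that assumes registers are modeled as `u64`; the function names mirror the instructions but are otherwise hypothetical.

```rust
/// Models `blt`: compare the register bits as signed 64-bit values.
fn blt(rs1: u64, rs2: u64) -> bool {
    (rs1 as i64) < (rs2 as i64)
}

/// Models `bltu`: compare the same bits as unsigned values.
fn bltu(rs1: u64, rs2: u64) -> bool {
    rs1 < rs2
}

/// Models `bge`: taken exactly when `blt` is not taken.
fn bge(rs1: u64, rs2: u64) -> bool {
    !blt(rs1, rs2)
}

fn main() {
    let all_ones = u64::MAX; // -1 when read as signed
    println!("blt  all_ones, 1 taken? {}", blt(all_ones, 1));
    println!("bltu all_ones, 1 taken? {}", bltu(all_ones, 1));
    println!("bge  all_ones, 1 taken? {}", bge(all_ones, 1));
}
```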

---

## Example: `max` Function

**Rust**

```rust
fn max_rust(a: i32, b: i32) -> i32 {
    if a > b {
        a
    } else {
        b
    }
}
```

**Assembly**

```asm
.global max_s

# a0 = a, a1 = b
max_s:
    ble a0, a1, return_b
    # a > b: a0 has a
    ret
return_b:
    mv  a0, a1              # a0 = b
    ret
```

When `a > b`, `a0` already has the answer. Only need `mv` when `a <= b`.

---

## For Loop Pattern

```text
Rust:                    Assembly:
for i in 0..n {              li   t0, 0           # i = 0
    // body              loop:
}                            bge  t0, a0, done    # i >= n? exit
                             # body
                             addi t0, t0, 1       # i++
                             j    loop
                         done:
```

**Structure**: initialize → check condition → body → update → jump back

Same pattern as `loop_s` from Lecture 05. The exit branch uses the **inverse** condition.
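
To see the correspondence more directly, the `for` loop can be rewritten in Rust with the same explicit steps the assembly uses. This is an illustrative sketch; `sum_for` and `sum_explicit` are hypothetical names, and the body (summing `i`) is chosen only to give the loop something to do.

```rust
/// The high-level form: sum of 0..n.
fn sum_for(n: i64) -> i64 {
    let mut sum = 0;
    for i in 0..n {
        sum += i;
    }
    sum
}

/// The same loop with each assembly step spelled out.
fn sum_explicit(n: i64) -> i64 {
    let mut sum = 0;
    let mut i = 0; // li t0, 0
    loop {
        if i >= n {
            break; // bge t0, a0, done  (inverse of i < n)
        }
        sum += i; // body
        i += 1; // addi t0, t0, 1
    } // j loop
    sum
}

fn main() {
    for n in [0, 1, 5, 10] {
        assert_eq!(sum_for(n), sum_explicit(n));
        println!("n = {:2}: sum = {}", n, sum_explicit(n));
    }
}
```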

---

## While Loop + Do-While Patterns

**While loop**: condition at **top**

```asm
# while x > 0 { sum += x; x -= 1; }
while_loop:
    ble  a0, zero, while_end    # x <= 0? exit
    add  t0, t0, a0             # sum += x
    addi a0, a0, -1             # x--
    j    while_loop
while_end:
```

**Do-while**: condition at **bottom** (one fewer jump per iteration, but the body always runs at least once)

```asm
# loop { sum += x; x -= 1; if x <= 0 { break } }
do_loop:
    add  t0, t0, a0             # sum += x
    addi a0, a0, -1             # x--
    bgt  a0, zero, do_loop      # x > 0? repeat
```

---

## Leaf Functions

A **leaf function** does **not** call other functions. All Lab03 functions are leaf functions.

**Rules:**

1. Arguments arrive in `a0`–`a7`
2. Return value goes in `a0`
3. Use `t0`–`t6` for temporaries
4. Don't touch `s` registers (must be preserved)
5. Return with `ret`

**Lab03**: all six functions are leaf functions, so no stack management is needed. Just use `a` and `t` registers.

---

## Non-Leaf Functions

A **non-leaf function** calls other functions. Problem: `jal` overwrites `ra`.

**Solution**: save `ra` on the stack before calling, restore after.

```asm
my_function:
    addi sp, sp, -16    # allocate stack
    sd   ra, 0(sp)      # save ra
    jal  helper         # call (overwrites ra)
    ld   ra, 0(sp)      # restore ra
    addi sp, sp, 16     # deallocate
    ret                 # returns to OUR caller
```
Lab03 uses only leaf functions. Non-leaf functions appear in later assignments.

---

## Practice: Array Access

Write assembly to set `arr[5] = 100` (`arr` is `*mut i32` in `a0`):

```asm
# Method 1: computed offset
li   t0, 100
li   t1, 5
slli t1, t1, 2      # 5 * 4 = 20
add  t1, a0, t1     # &arr[5]
sw   t0, (t1)       # arr[5] = 100

# Method 2: constant offset
li   t0, 100
sw   t0, 20(a0)     # 5 * 4 = 20
```

When the index is a constant, compute the byte offset yourself!

---

## Practice: Translate If

Translate to assembly (`x` in `a0`):

```rust
if x < 0 {
    x = -x; // absolute value
}
```

```asm
    bge a0, zero, skip    # x >= 0? skip
    neg a0, a0            # x = -x
skip:
```

**Pattern**: invert the condition (`< 0` → `>= 0`), branch to skip the body.

`neg` is a pseudo-instruction: `sub a0, zero, a0`

---

## Key Takeaways

- **Load/store**: `lb`/`lh`/`lw`/`ld`; signed vs unsigned
- **Arrays**: base + index × size; `slli` to multiply
- **Memory**: stack grows down, heap up
- **Branches**: `bltu`/`bgeu` for unsigned; invert to skip
- **Loops**: init → check → body → update → repeat
- **Leaf** functions: simple. **Non-leaf**: save `ra`

---

## Further Reading

- [RISC-V ISA Specification](https://riscv.org/technical/specifications/)
- [RISC-V Assembly Programmer's Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/main/riscv-asm.md)
- [The RISC-V Reader](http://www.riscvbook.com/) by Patterson & Waterman
- [Lab03: RISC-V Assembly Programming](/assignments/lab03)