RISC-V Assembly Functions¶
Overview¶
This lecture covers how functions work at the assembly level in RISC-V. We explore the call/return mechanism, the calling convention that divides registers into caller-saved and callee-saved groups, and stack frame management. We then apply these concepts to non-leaf functions (functions that call other functions) using both caller-saved and callee-saved approaches, and finish with recursive functions. These concepts are essential for writing correct multi-function assembly programs.
Learning Objectives¶
- Explain how call and ret use the ra register and program counter
- Classify registers as caller-saved or callee-saved and explain the implications
- Write function prologues and epilogues that manage the stack correctly
- Implement non-leaf functions using both caller-saved and callee-saved approaches
- Trace recursive function execution through nested stack frames
Prerequisites¶
- RISC-V Assembly 1 & 2 (Lectures 05–06): registers, instructions, memory operations, control flow, leaf functions
- Program memory layout (stack grows downward, 16-byte alignment)
1. The Call/Return Mechanism¶
How Function Calls Work¶
When a program calls a function, two things must happen:
- Jump to the function's code
- Remember where to return when the function finishes
RISC-V uses the program counter (PC) and the return address register (ra) to accomplish this.
call and ret¶
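The call pseudo-instruction saves the return address and jumps to the target. The toolchain may emit a two-instruction auipc/jalr pair rather than a single jal, in which case ra holds the address just past the pair, but the effect is the same as the jal form shown here: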
call target # Pseudo-instruction, equivalent to:
# jal ra, target
# 1. ra = PC + 4 (address of next instruction)
# 2. PC = target (jump to function)
The ret pseudo-instruction jumps back to the saved return address. It is equivalent to jalr zero, ra, 0, which sets PC = ra without saving a new return address.
How ra Links Caller and Callee¶
Address Instruction
0x1000 call bar # ra = 0x1004, PC = bar
0x1004 addi a0, a0, 1 # execution resumes here after bar returns
...
bar:
0x2000 addi a0, a0, 1 # bar's code
0x2004 ret # PC = ra = 0x1004
The call instruction stores PC + 4 (the address of the next instruction) into ra, then jumps to the target. When the called function executes ret, it sets PC = ra, returning execution to the instruction after the call.
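The link-then-jump behavior can be modeled in a few lines of Rust. This is a minimal sketch rather than part of the lecture code: the Cpu struct and its call/ret methods are hypothetical names, and the model assumes call is a single 4-byte jal, as in the example above.

```rust
/// Hypothetical two-register model of the RISC-V call/return mechanism.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cpu {
    pc: u64, // program counter
    ra: u64, // return address register
}

impl Cpu {
    /// Models `call target` as `jal ra, target`: link first, then jump.
    fn call(&mut self, target: u64) {
        self.ra = self.pc + 4; // address of the instruction after the call
        self.pc = target;
    }

    /// Models `ret` as `jalr zero, ra, 0`: jump to the saved address.
    fn ret(&mut self) {
        self.pc = self.ra;
    }
}

/// Replays the example: `call bar` at 0x1000, with `bar` at 0x2000.
/// Returns the CPU state right after the call and right after the ret.
fn trace_call_and_return() -> (Cpu, Cpu) {
    let mut cpu = Cpu { pc: 0x1000, ra: 0 };
    cpu.call(0x2000);
    let after_call = cpu;
    cpu.ret();
    (after_call, cpu)
}
```

Calling trace_call_and_return() yields ra = 0x1004 and PC = 0x2000 after the call, and PC = 0x1004 after the ret, matching the address listing.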
Example: call_s.s / call.rs¶
Rust (src/bin/call.rs):
extern "C" {
    fn foo_s(a: i32) -> i32;
}
fn bar(a: i32) -> i32 {
    a + 1
}
fn foo(a: i32) -> i32 {
    bar(a) + 1
}
foo calls bar, so foo is a non-leaf function. It must save ra before calling bar, because call bar will overwrite ra.
Assembly (asm/call_s.s):
.global foo_s
bar_s:
addi a0, a0, 1
ret
foo_s:
addi sp, sp, -16 # Allocate 16 bytes of stack space
sd ra, (sp) # Preserve ra
call bar_s # ra = PC + 4
addi a0, a0, 1
ld ra, (sp) # Restore ra
addi sp, sp, 16 # Deallocate 16 bytes
ret
Trace for foo_s(3):
| Step | Instruction | a0 | ra | Notes |
|---|---|---|---|---|
| 1 | foo_s: addi sp, sp, -16 | 3 | caller's addr | Allocate stack |
| 2 | sd ra, (sp) | 3 | caller's addr | Save ra on stack |
| 3 | call bar_s | 3 | after call | ra overwritten |
| 4 | bar_s: addi a0, a0, 1 | 4 | after call | bar adds 1 |
| 5 | ret (from bar_s) | 4 | after call | Return to foo_s |
| 6 | addi a0, a0, 1 | 5 | after call | foo adds 1 |
| 7 | ld ra, (sp) | 5 | caller's addr | Restore ra |
| 8 | ret (from foo_s) | 5 | caller's addr | Return to caller |
Result: foo_s(3) = 5 (bar adds 1, foo adds 1).
Forgetting to Save ra
If foo_s did not save ra before call bar_s, the ret at the end would jump back to the instruction after call bar_s instead of returning to the original caller — creating an infinite loop.
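The effect of skipping the save can be checked with a small model. The sketch below is illustrative only: the function name and both addresses are hypothetical, chosen so that the two outcomes are easy to tell apart.

```rust
/// Returns the address that foo_s's final `ret` jumps to.
/// All addresses are hypothetical and chosen only for illustration.
fn foo_ret_target(save_ra: bool) -> u64 {
    const CALLER_RETURN: u64 = 0x0504; // value of ra on entry to foo_s
    const CALL_BAR_ADDR: u64 = 0x1008; // address of `call bar_s` inside foo_s

    let mut ra = CALLER_RETURN;

    // Prologue: sd ra, (sp) -- only if the function saves ra.
    let stack_slot = if save_ra { Some(ra) } else { None };

    // call bar_s overwrites ra with the address after the call.
    ra = CALL_BAR_ADDR + 4;
    // bar_s runs and returns to ra; foo_s continues after the call.

    // Epilogue: ld ra, (sp) -- only possible if ra was saved.
    if let Some(saved) = stack_slot {
        ra = saved;
    }

    ra // ret: PC = ra
}
```

With save_ra set to true the function returns 0x0504, the caller's address. With save_ra set to false it returns 0x100C, the instruction right after call bar_s, so foo_s would keep re-executing its own tail.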
2. Caller-Saved vs Callee-Saved Registers¶
The Register Convention¶
The RISC-V calling convention divides registers into two groups based on who is responsible for preserving their values across a function call:
| Category | Registers | Preserved Across Calls? | Responsibility |
|---|---|---|---|
| Caller-saved (temporary) | a0–a7, t0–t6, ra | No | Caller saves before call if needed |
| Callee-saved (saved) | s0–s11, sp | Yes | Callee saves in prologue, restores in epilogue |
Caller-Saved Registers¶
The called function is free to overwrite these registers. If the caller needs their values after the call, it must save them before the call and restore them after:
- a0–a7: argument/return registers
- t0–t6: temporary registers
- ra: return address (overwritten by call)
Callee-Saved Registers¶
The called function must preserve these registers. If the callee wants to use them, it must save them at the start and restore them before returning:
- s0–s11: saved registers
- sp: stack pointer
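The two groups can be written down as a lookup, which is handy when checking a function by hand. This is a sketch under the convention described above; the SavedBy enum and saved_by function are hypothetical names, and registers outside the two lists (such as zero, gp, and tp) are deliberately left unclassified.

```rust
/// Who must preserve a register across a call (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum SavedBy {
    Caller,
    Callee,
}

/// Caller-saved registers: ra, a0-a7, t0-t6.
const CALLER_SAVED: [&str; 16] = [
    "ra", "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6",
];

/// Callee-saved registers: sp, s0-s11.
const CALLEE_SAVED: [&str; 13] = [
    "sp", "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11",
];

/// Classifies an ABI register name; returns None for names not in either list.
fn saved_by(reg: &str) -> Option<SavedBy> {
    if CALLER_SAVED.contains(&reg) {
        Some(SavedBy::Caller)
    } else if CALLEE_SAVED.contains(&reg) {
        Some(SavedBy::Callee)
    } else {
        None
    }
}
```

For example, saved_by("t3") reports Caller and saved_by("s11") reports Callee.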
Argument Passing¶
| Argument | Register | Notes |
|---|---|---|
| 1st | a0 | Also used for return value |
| 2nd | a1 | |
| 3rd | a2 | |
| 4th | a3 | |
| 5th | a4 | |
| 6th | a5 | |
| 7th | a6 | |
| 8th | a7 | |
| 9th+ | stack | Passed on the stack |
Return values are placed in a0. On RV64, a value wider than 64 bits (up to 128 bits) also uses a1.
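The argument table amounts to a simple mapping from position to location. The sketch below assumes every argument is an integer or pointer that fits in one register; ArgLocation and arg_location are hypothetical names, and positions are zero-based.

```rust
/// Where an argument is passed (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum ArgLocation {
    /// One of "a0" through "a7".
    Register(String),
    /// Passed on the stack; slot 0 is the 9th argument, slot 1 the 10th, ...
    Stack { slot: usize },
}

/// Maps a zero-based argument position to its location, assuming each
/// argument occupies exactly one integer register.
fn arg_location(position: usize) -> ArgLocation {
    const NUM_ARG_REGS: usize = 8;
    if position < NUM_ARG_REGS {
        ArgLocation::Register(format!("a{}", position))
    } else {
        ArgLocation::Stack { slot: position - NUM_ARG_REGS }
    }
}
```

For the four-argument functions used later in this lecture, positions 0 through 3 map to a0 through a3, so nothing is passed on the stack.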
Two Valid Approaches
When you need values to survive across a function call, you have two choices:
- Caller-saved approach: save the values on the stack before each call, restore after
- Callee-saved approach: move values into s registers (which survive calls), save/restore s registers in prologue/epilogue
Both are correct. The caller-saved approach uses more stack operations per call; the callee-saved approach requires prologue/epilogue saves but is simpler around each call site.
3. Stack Frame Management¶
The Prologue/Epilogue Pattern¶
Every non-leaf function follows this structure:
Prologue:
1. Allocate stack space (subtract from sp)
2. Save registers that need preserving (ra, s-regs)
Body:
3. Function logic, including calls to other functions
Epilogue:
4. Restore saved registers
5. Deallocate stack space (add to sp)
6. ret
my_function:
# === Prologue ===
addi sp, sp, -32 # 1. Allocate (multiple of 16)
sd ra, 0(sp) # 2. Save ra
sd s0, 8(sp) # Save s0
sd s1, 16(sp) # Save s1
# === Body ===
# ... function logic ...
call other_function
# ... more logic ...
# === Epilogue ===
ld s1, 16(sp) # 4. Restore s1
ld s0, 8(sp) # Restore s0
ld ra, 0(sp) # Restore ra
addi sp, sp, 32 # 5. Deallocate
ret # 6. Return
Stack Frame Layout¶
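Each function call creates a stack frame: a block of memory on the stack for that function's saved registers and local variables. The diagram shows one possible layout of a 32-byte frame. The function chooses its own offsets, so the prologue example above (ra at sp + 0, s0 at sp + 8, s1 at sp + 16) is an equally valid arrangement.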
Higher addresses
+---------------------------+
| Caller's frame |
+---------------------------+ <- sp (before prologue)
| ra | sp + 24
+---------------------------+
| s0 | sp + 16
+---------------------------+
| s1 | sp + 8
+---------------------------+
| local variable | sp + 0
+---------------------------+ <- sp (after prologue)
Lower addresses
Stack Alignment¶
The RISC-V calling convention requires sp to always be 16-byte aligned. Even if you only need 8 bytes of stack space, you must allocate 16.
| Values to Save | Bytes Needed | Allocate |
|---|---|---|
| ra only (8 bytes) | 8 | 16 |
| ra + 1 s-reg (16 bytes) | 16 | 16 |
| ra + 2 s-regs (24 bytes) | 24 | 32 |
| ra + 3 s-regs (32 bytes) | 32 | 32 |
| ra + 3 s-regs + local (36 bytes) | 36 | 48 |
Calculating Frame Size
Count the bytes you need, then round up to the next multiple of 16. Formula: frame_size = ceil(bytes_needed / 16) * 16.
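The rounding formula can be written with integer arithmetic only. This is a small sketch; frame_size and STACK_ALIGN are hypothetical names.

```rust
/// Required stack alignment in bytes.
const STACK_ALIGN: u64 = 16;

/// Rounds the number of bytes needed up to the next multiple of 16.
/// Integer form of ceil(bytes_needed / 16) * 16.
fn frame_size(bytes_needed: u64) -> u64 {
    (bytes_needed + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN
}
```

Applied to the rows of the table above, frame_size gives 16 for 8 and 16 bytes, 32 for 24 and 32 bytes, and 48 for 36 bytes.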
4. Non-Leaf Functions: Caller-Saved Approach¶
Strategy¶
In the caller-saved approach, we keep values in a registers and save/restore them on the stack around each function call. This is straightforward but requires more stack operations.
Example: add4f_s.s / add4f.rs¶
This function adds four numbers using three calls to add2_s. We save caller-saved registers (a regs) on the stack before each call.
Rust (src/bin/add4f.rs):
extern "C" {
    fn add4f_s(a: i32, b: i32, c: i32, d: i32) -> i32;
    fn add4f_callee_s(a: i32, b: i32, c: i32, d: i32) -> i32;
}
fn add4f(a: i32, b: i32, c: i32, d: i32) -> i32 {
    a + b + c + d
}
The approach: add4f(a, b, c, d) = add2(add2(a, b), add2(c, d))
Assembly (asm/add4f_s.s):
.global add4f_s
.global add2_s
# Add two args
add2_s:
add a0, a0, a1
ret
# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Caller-saved approach
add4f_s:
addi sp, sp, -32 # Allocate stack
sd ra, (sp) # Preserve ra
# a0 and a1 already set for add2_s(a, b)
sd a2, 8(sp) # Save c on stack
sd a3, 16(sp) # Save d on stack
call add2_s # a0 = a + b
sd a0, 24(sp) # Save result (tmp0) on stack
ld a0, 8(sp) # a0 = c (from stack)
ld a1, 16(sp) # a1 = d (from stack)
call add2_s # a0 = c + d
mv a1, a0 # a1 = c + d
ld a0, 24(sp) # a0 = a + b (from stack)
call add2_s # a0 = (a+b) + (c+d)
ld ra, (sp) # Restore ra
addi sp, sp, 32 # Deallocate stack
ret
Stack layout for add4f_s:
+---------------------------+
| ra | sp + 0
+---------------------------+
| c (a2) | sp + 8
+---------------------------+
| d (a3) | sp + 16
+---------------------------+
| tmp0 (a+b) | sp + 24
+---------------------------+
Notice how we save values on the stack before each call (because a regs may be overwritten) and load them back after.
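To see why the saves are necessary, the sketch below models add4f_s in Rust with a callee that deliberately destroys a1 through a3 on every call. The ArgRegs struct, the clobbering behavior, and the function names are assumptions made for illustration; the real add2_s happens to leave those registers alone, but the convention allows any callee to overwrite them.

```rust
/// Hypothetical model of the four argument registers used by add4f_s.
struct ArgRegs {
    a: [i64; 4],
}

/// Models add2_s, then clobbers a1..a3 to mimic a callee that is free
/// to overwrite caller-saved registers.
fn add2_clobbering(regs: &mut ArgRegs) {
    regs.a[0] += regs.a[1];
    regs.a[1] = -1;
    regs.a[2] = -1;
    regs.a[3] = -1;
}

/// Mirrors add4f_s step by step. Frame slots: [ra, c, d, tmp0];
/// slot 0 (ra) is unused because this model has no return address.
fn add4f_caller_saved(a: i64, b: i64, c: i64, d: i64) -> i64 {
    let mut regs = ArgRegs { a: [a, b, c, d] };
    let mut frame = [0i64; 4];

    frame[1] = regs.a[2]; // sd a2, 8(sp)
    frame[2] = regs.a[3]; // sd a3, 16(sp)
    add2_clobbering(&mut regs); // call add2_s: a0 = a + b
    frame[3] = regs.a[0]; // sd a0, 24(sp)

    regs.a[0] = frame[1]; // ld a0, 8(sp)
    regs.a[1] = frame[2]; // ld a1, 16(sp)
    add2_clobbering(&mut regs); // call add2_s: a0 = c + d

    regs.a[1] = regs.a[0]; // mv a1, a0
    regs.a[0] = frame[3]; // ld a0, 24(sp)
    add2_clobbering(&mut regs); // call add2_s: a0 = (a+b) + (c+d)

    regs.a[0]
}
```

Because c, d, and the intermediate sum are reloaded from the frame rather than read from registers, the result is correct even though every call wipes a1 through a3.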
5. Non-Leaf Functions: Callee-Saved Approach¶
Strategy¶
In the callee-saved approach, we move values into s registers, which are preserved across function calls. We save the s registers once in the prologue and restore them once in the epilogue. This avoids repeated save/restore around each call.
Example: add4f_callee_s.s¶
Assembly (asm/add4f_callee_s.s):
.global add4f_callee_s
# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Callee-saved approach: use s0, s1, s2
add4f_callee_s:
addi sp, sp, -32 # Allocate stack
sd ra, (sp) # Save ra
sd s0, 8(sp) # Save s0 (we'll use it)
sd s1, 16(sp) # Save s1
sd s2, 24(sp) # Save s2
mv s0, a2 # s0 = c (survives calls)
mv s1, a3 # s1 = d (survives calls)
# a0=a, a1=b already set
call add2_s # a0 = a + b
mv s2, a0 # s2 = a + b (survives calls)
mv a0, s0 # a0 = c
mv a1, s1 # a1 = d
call add2_s # a0 = c + d
mv a1, a0 # a1 = c + d
mv a0, s2 # a0 = a + b
call add2_s # a0 = (a+b) + (c+d)
ld ra, (sp) # Restore ra
ld s0, 8(sp) # Restore s0
ld s1, 16(sp) # Restore s1
ld s2, 24(sp) # Restore s2
addi sp, sp, 32 # Deallocate stack
ret
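The same computation can be modeled in Rust to check both halves of the contract: values parked in s registers survive the calls, and the caller's original s values are back in place on return. This is a sketch with hypothetical names (RegFile, add2_preserving_s, add4f_callee_saved); the callee is assumed to clobber a1 through a3 while leaving s registers untouched.

```rust
/// Hypothetical register file: a0..a3 and s0..s2.
struct RegFile {
    a: [i64; 4],
    s: [i64; 3],
}

/// Models add2_s: clobbers a1..a3 (allowed) but never touches s registers.
fn add2_preserving_s(regs: &mut RegFile) {
    regs.a[0] += regs.a[1];
    regs.a[1] = -1;
    regs.a[2] = -1;
    regs.a[3] = -1;
}

/// Mirrors add4f_callee_s. The result is left in a0.
fn add4f_callee_saved(regs: &mut RegFile) {
    let saved_s = regs.s; // prologue: sd s0, s1, s2

    regs.s[0] = regs.a[2]; // mv s0, a2
    regs.s[1] = regs.a[3]; // mv s1, a3
    add2_preserving_s(regs); // call add2_s: a0 = a + b
    regs.s[2] = regs.a[0]; // mv s2, a0

    regs.a[0] = regs.s[0]; // mv a0, s0
    regs.a[1] = regs.s[1]; // mv a1, s1
    add2_preserving_s(regs); // call add2_s: a0 = c + d

    regs.a[1] = regs.a[0]; // mv a1, a0
    regs.a[0] = regs.s[2]; // mv a0, s2
    add2_preserving_s(regs); // call add2_s: a0 = (a+b) + (c+d)

    regs.s = saved_s; // epilogue: ld s0, s1, s2
}
```

Starting from a = [1, 2, 3, 4] and arbitrary s values, the model leaves 10 in a0 and the s registers exactly as the caller had them.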
Comparing the Two Approaches¶
| Aspect | Caller-Saved | Callee-Saved |
|---|---|---|
| Where values live | On the stack | In s registers |
| Save/restore timing | Before/after each call | Once in prologue/epilogue |
| Stack accesses | More (per call) | Fewer (just prologue/epilogue) |
| Register usage | a and t regs only | Uses s regs for persistence |
| Between calls | ld/sd to stack | mv to/from s regs |
Which to Use?
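The callee-saved approach is generally preferred when values must survive multiple calls: you save/restore s registers once instead of repeatedly saving/restoring values around each call. The caller-saved approach is simpler when you only have one or two calls. In the add4f example the two versions happen to perform the same number of stack accesses (eight loads and stores each); the callee-saved advantage grows when the same values must survive many calls, for example inside a loop.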
6. Recursive Functions¶
Recursion in Assembly¶
A recursive function calls itself. Each call creates a new stack frame, so each invocation has its own saved registers and local variables. The stack grows with each recursive call and shrinks as calls return.
Example: factrec_s.s / factrec.rs¶
Rust (src/bin/factrec.rs):
extern "C" {
fn factrec_s(n: i32) -> i32;
}
fn factrec(n: i32) -> i32 {
    if n <= 0 {
        1
    } else {
        n * factrec(n - 1)
    }
}
Assembly (asm/factrec_s.s):
.global factrec_s
# Compute n! using recursion
# a0 = n
factrec_s:
addi sp, sp, -16
sd ra, (sp)
# Base case: if n <= 0, return 1
bgt a0, zero, factrec_recstep
li a0, 1
j factrec_done
# Recursive step
factrec_recstep:
sd a0, 8(sp) # Save n on stack
addi a0, a0, -1 # a0 = n - 1
call factrec_s # a0 = factorial(n - 1)
ld t0, 8(sp) # Restore n from stack
mul a0, a0, t0 # a0 = factorial(n-1) * n
factrec_done:
ld ra, (sp)
addi sp, sp, 16
ret
Stack Diagram: factrec_s(3)¶
Each recursive call creates a new 16-byte stack frame:
Higher addresses
+---------------------------+
| Frame for factrec_s(3) |
| n = 3 | sp₃ + 8
| ra (caller's return) | sp₃ + 0
+---------------------------+ <- sp₃
| Frame for factrec_s(2) |
| n = 2 | sp₂ + 8
| ra (return to call 3) | sp₂ + 0
+---------------------------+ <- sp₂
| Frame for factrec_s(1) |
| n = 1 | sp₁ + 8
| ra (return to call 2) | sp₁ + 0
+---------------------------+ <- sp₁
| Frame for factrec_s(0) |
| (unused: base case, no n save) | sp₀ + 8
| ra (return to call 1) | sp₀ + 0
+---------------------------+ <- sp₀ (deepest point)
Lower addresses
Execution trace:
| Call | n | Action | Return Value |
|---|---|---|---|
| factrec_s(3) | 3 | Save n=3, call factrec_s(2) | 2 * 3 = 6 |
| factrec_s(2) | 2 | Save n=2, call factrec_s(1) | 1 * 2 = 2 |
| factrec_s(1) | 1 | Save n=1, call factrec_s(0) | 1 * 1 = 1 |
| factrec_s(0) | 0 | Base case: return 1 | 1 |
The stack unwinds as each call returns, multiplying:
1 * 1 = 1 -> 1 * 2 = 2 -> 2 * 3 = 6
Why Each Frame Needs Its Own n
Each recursive call overwrites a0 with n - 1 before the call. Without saving n on the stack, the original value of n would be lost. Each stack frame stores its own copy of n, so when the recursive call returns, we can load it back and multiply.
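The grow-then-unwind behavior can be made explicit by replacing the hardware stack with a Vec of frames. This is a sketch, not the lecture's implementation: Frame and factrec_model are hypothetical names, and each frame is assumed to be 16 bytes as in factrec_s.

```rust
/// One 16-byte frame of factrec_s in this hypothetical model.
/// saved_n is None for the base case, which never stores n.
struct Frame {
    saved_n: Option<i64>,
}

/// Returns (n!, deepest stack usage in bytes).
fn factrec_model(n: i64) -> (i64, usize) {
    const FRAME_BYTES: usize = 16;
    let mut stack: Vec<Frame> = Vec::new();
    let mut a0 = n;

    // Descend: every invocation allocates a frame, then either
    // reaches the base case or saves n and recurses with n - 1.
    loop {
        if a0 <= 0 {
            stack.push(Frame { saved_n: None });
            a0 = 1; // li a0, 1
            break;
        }
        stack.push(Frame { saved_n: Some(a0) }); // sd a0, 8(sp)
        a0 -= 1; // addi a0, a0, -1
    }
    let max_bytes = stack.len() * FRAME_BYTES;

    // Unwind: each return pops a frame and multiplies by its saved n.
    while let Some(frame) = stack.pop() {
        if let Some(saved) = frame.saved_n {
            a0 *= saved; // ld t0, 8(sp); mul a0, a0, t0
        }
    }

    (a0, max_bytes)
}
```

For n = 3 the model pushes four frames (64 bytes at the deepest point) and returns 6, matching the trace above.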
Key Concepts¶
| Concept | Description |
|---|---|
| Program Counter (PC) | Address of the current instruction being executed |
| ra (return address) | Register that holds the address to return to after a function call |
| call | Pseudo-instruction: saves PC+4 in ra, jumps to target (jal ra, target) |
| ret | Pseudo-instruction: jumps to address in ra (jalr zero, ra, 0) |
| Caller-saved registers | a0–a7, t0–t6, ra — may be overwritten by called functions |
| Callee-saved registers | s0–s11, sp — must be preserved by called functions |
| Stack frame | Block of stack memory allocated by a function for saved registers and locals |
| Prologue | Code at the start of a function: allocate stack, save registers |
| Epilogue | Code at the end of a function: restore registers, deallocate stack, ret |
| Leaf function | Does not call other functions, so ra needs no saving; a stack frame is needed only if it uses s registers or stack-allocated locals |
| Non-leaf function | Calls other functions — must save ra and manage a stack frame |
Practice Problems¶
Problem 1: Call/Return Mechanics¶
Consider a call bar instruction at address 0x1000, where bar is located at address 0x2000.
After call bar executes, what are the values of ra and PC?
Solution
- `ra = 0x1004` (address of the instruction after `call`)
- `PC = 0x2000` (address of `bar`)
The `call` instruction stores the return address (PC + 4 = 0x1004) into `ra`, then sets PC to the target address (0x2000).
Problem 2: Register Preservation¶
A function uses registers t0, s0, and calls another function. Which registers must be saved, and by whom?
Solution
- **`t0`**: caller-saved. If our function needs `t0`'s value after the call, **we** must save it on the stack before the call and restore it after. The called function is free to overwrite `t0`.
- **`s0`**: callee-saved. If our function uses `s0`, **we** must save it in our prologue and restore it in our epilogue (because we are the callee who modifies it). But `s0` will survive the call to the other function automatically, because the other function is also required to preserve it.
- **`ra`**: caller-saved. Since we call another function, `call` will overwrite `ra`. We must save `ra` before the call and restore it before our `ret`.
Summary: save `ra` and `s0` in our prologue (we modify both); save `t0` on the stack before the call if we need it after.
Problem 3: Stack Frame Size¶
A function needs to save ra, s0, s1, and a 4-byte local variable on the stack. What should the stack frame size be?
Solution
Calculate the bytes needed:
- `ra`: 8 bytes
- `s0`: 8 bytes
- `s1`: 8 bytes
- local variable: 4 bytes
- **Total**: 28 bytes
Round up to the next multiple of 16: **32 bytes**. The remaining 4 bytes (sp+28 to sp+31) are padding for 16-byte alignment.
Problem 4: Fix the Bug¶
This non-leaf function has a bug. Find and fix it:
.global broken_func
broken_func:
addi sp, sp, -16
sd s0, 0(sp)
mv s0, a0
call helper
add a0, a0, s0
ld s0, 0(sp)
addi sp, sp, 16
ret
Solution
The bug: **`ra` is not saved**. Since `broken_func` calls `helper`, the `call` instruction overwrites `ra`. When `broken_func` executes `ret`, it jumps to the wrong address (the instruction after `call helper` instead of the original caller), creating an infinite loop.
**Fix**: save `ra` in the prologue and restore it in the epilogue. The existing 16-byte frame has a free slot, so adding `sd ra, 8(sp)` after `addi sp, sp, -16` and `ld ra, 8(sp)` before `addi sp, sp, 16` is enough.
Problem 5: Recursive Sum¶
Write a recursive function sum_s(n: i32) -> i32 that computes 1 + 2 + ... + n. Base case: sum(0) = 0. Recursive case: sum(n) = n + sum(n - 1).
Solution
.global sum_s
# int sum_s(int n)
# a0 = n
# Returns: 1 + 2 + ... + n
sum_s:
addi sp, sp, -16
sd ra, (sp)
# Base case: if n <= 0, return 0
bgt a0, zero, sum_recstep
li a0, 0
j sum_done
sum_recstep:
sd a0, 8(sp) # Save n on stack
addi a0, a0, -1 # a0 = n - 1
call sum_s # a0 = sum(n - 1)
ld t0, 8(sp) # Restore n
add a0, a0, t0 # a0 = sum(n-1) + n
sum_done:
ld ra, (sp)
addi sp, sp, 16
ret
Further Reading¶
- RISC-V ISA Specification — official standard including calling convention
- RISC-V Assembly Programmer's Manual — practical reference
- The RISC-V Reader — Patterson & Waterman textbook
- RISC-V Calling Convention — register usage and stack frame specification
Summary¶
- Call/return uses call (saves PC+4 to ra, jumps to target) and ret (jumps to address in ra). Any function that calls another function must save ra before the call, or it will be overwritten.
- Register convention divides registers into caller-saved (a0–a7, t0–t6, ra) and callee-saved (s0–s11, sp). Caller-saved registers may be freely overwritten by called functions; callee-saved registers must be preserved.
- Stack frames follow the prologue/epilogue pattern: allocate space, save registers, execute the body, restore registers, deallocate, return. Stack space must be allocated in multiples of 16 bytes.
- Non-leaf functions can use either a caller-saved approach (saving a/t registers around each call) or a callee-saved approach (using s registers that survive calls). The callee-saved approach is generally cleaner when values must survive multiple calls.
- Recursive functions work because each call creates its own stack frame with its own saved ra and local values. The stack grows with each recursive call and shrinks as calls return, allowing each invocation to maintain independent state.