RISC-V Assembly Functions¶

Overview¶

This lecture covers how functions work at the assembly level in RISC-V. We explore the call/return mechanism, the calling convention that divides registers into caller-saved and callee-saved groups, and stack frame management. We then apply these concepts to non-leaf functions (functions that call other functions) using both caller-saved and callee-saved approaches, and finish with recursive functions. These concepts are essential for writing correct multi-function assembly programs.

Learning Objectives¶

Explain how call and ret use the ra register and program counter
Classify registers as caller-saved or callee-saved and explain the implications
Write function prologues and epilogues that manage the stack correctly
Implement non-leaf functions using both caller-saved and callee-saved approaches
Trace recursive function execution through nested stack frames

Prerequisites¶

RISC-V Assembly 1 & 2 (Lectures 05–06): registers, instructions, memory operations, control flow, leaf functions
Program memory layout (stack grows downward, 16-byte alignment)

1. The Call/Return Mechanism¶

How Function Calls Work¶

When a program calls a function, two things must happen:

Jump to the function's code
Remember where to return when the function finishes

RISC-V uses the program counter (PC) and the return address register (ra) to accomplish this.

`call` and `ret`¶

The call pseudo-instruction saves the return address and jumps to the target:

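call target    # Pseudo-instruction; in its single-instruction form it behaves like:
               # jal ra, target
               # 1. ra = PC + 4 (address of next instruction)
               # 2. PC = target (jump to function)
               # (The toolchain may instead use a two-instruction auipc + jalr
               #  sequence; either way ra receives the address of the
               #  instruction after the call.)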

The ret pseudo-instruction jumps back to the saved return address:

ret            # Pseudo-instruction, equivalent to:
               # jalr zero, ra, 0
               # 1. PC = ra (jump back to caller)

How `ra` Links Caller and Callee¶

Address    Instruction
0x1000     call bar        # ra = 0x1004, PC = bar
0x1004     addi a0, a0, 1  # execution resumes here after bar returns
...
bar:
0x2000     addi a0, a0, 1  # bar's code
0x2004     ret              # PC = ra = 0x1004

The call instruction stores PC + 4 (the address of the next instruction) into ra, then jumps to the target. When the called function executes ret, it sets PC = ra, returning execution to the instruction after the call.
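The bookkeeping above can be modeled in a few lines of Rust. This is only a sketch for reasoning about PC and ra, not an emulator: the `Cpu` type and its `call` and `ret` methods are hypothetical names, and `call` is modeled in its single-instruction `jal ra, target` form.

```rust
/// Minimal model of the two registers involved in call/ret.
/// `Cpu`, `call`, and `ret` are hypothetical names for this sketch.
#[derive(Debug, Default)]
struct Cpu {
    pc: u64, // program counter
    ra: u64, // return address register
}

impl Cpu {
    /// Models `call target` in its single-instruction `jal ra, target` form.
    fn call(&mut self, target: u64) {
        self.ra = self.pc + 4; // address of the instruction after the call
        self.pc = target; // jump to the function
    }

    /// Models `ret` (`jalr zero, ra, 0`).
    fn ret(&mut self) {
        self.pc = self.ra; // jump back to the saved return address
    }
}

/// Replays the address example: the call sits at 0x1000 and bar at 0x2000.
/// Returns (ra right after the call, PC after bar's ret).
fn demo_call_ret() -> (u64, u64) {
    let mut cpu = Cpu { pc: 0x1000, ra: 0 };
    cpu.call(0x2000);
    let ra_after_call = cpu.ra;
    cpu.pc += 4; // bar executes `addi a0, a0, 1` at 0x2000 and reaches `ret` at 0x2004
    cpu.ret();
    (ra_after_call, cpu.pc)
}
```

Calling `demo_call_ret()` yields `(0x1004, 0x1004)`: ra holds 0x1004 after the call, and PC is 0x1004 again once bar returns.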

Example: `call_s.s` / `call.rs`¶

Rust (src/bin/call.rs):

extern "C" {
    fn foo_s(a: i32) -> i32;
}

fn bar(a: i32) -> i32 {
    a + 1
}

fn foo(a: i32) -> i32 {
    bar(a) + 1
}

foo calls bar, so foo is a non-leaf function. It must save ra before calling bar, because call bar will overwrite ra.

Assembly (asm/call_s.s):

.global foo_s

bar_s:
    addi a0, a0, 1
    ret

foo_s:
    addi sp, sp, -16    # Allocate 16 bytes of stack space
    sd ra, (sp)         # Preserve ra

    call bar_s          # ra = PC + 4

    addi a0, a0, 1

    ld ra, (sp)         # Restore ra
    addi sp, sp, 16     # Deallocate 16 bytes
    ret

Trace for foo_s(3):

Step	Instruction	`a0`	`ra`	Notes
1	`foo_s: addi sp, sp, -16`	3	caller's addr	Allocate stack
2	`sd ra, (sp)`	3	caller's addr	Save ra on stack
3	`call bar_s`	3	after call	ra overwritten
4	`bar_s: addi a0, a0, 1`	4	after call	bar adds 1
5	`ret` (from bar_s)	4	after call	Return to foo_s
6	`addi a0, a0, 1`	5	after call	foo adds 1
7	`ld ra, (sp)`	5	caller's addr	Restore ra
8	`addi sp, sp, 16`	5	caller's addr	Deallocate stack
9	`ret` (from foo_s)	5	caller's addr	Return to caller

Result: foo_s(3) = 5 (bar adds 1, foo adds 1).

Forgetting to Save ra

If foo_s did not save ra before call bar_s, the ret at the end would jump back to the instruction after call bar_s instead of returning to the original caller — creating an infinite loop.
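To see why the saved copy matters, here is a small Rust sketch that tracks only the value of ra through foo_s. The function name `foo_return_target` and both addresses are made up for illustration, and a `Vec` stands in for the stack slot.

```rust
/// Returns the address that foo_s's final `ret` jumps to.
/// All addresses are invented for this sketch.
fn foo_return_target(save_ra: bool) -> u64 {
    const CALLER_RETURN: u64 = 0x1004; // where foo_s's caller expects to resume
    const CALL_BAR_ADDR: u64 = 0x3008; // hypothetical address of `call bar_s` inside foo_s

    let mut ra = CALLER_RETURN; // ra on entry to foo_s
    let mut stack: Vec<u64> = Vec::new();

    if save_ra {
        stack.push(ra); // sd ra, (sp)
    }

    ra = CALL_BAR_ADDR + 4; // `call bar_s` overwrites ra
    // bar_s runs; its `ret` jumps to ra, which is back inside foo_s.

    if save_ra {
        ra = stack.pop().expect("ra was pushed above"); // ld ra, (sp)
    }

    ra // foo_s's `ret` sets PC = ra
}
```

With `save_ra = true` the function returns 0x1004, the caller's address. With `save_ra = false` it returns 0x300c, the instruction after `call bar_s`, so foo_s would keep re-running its own tail.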

2. Caller-Saved vs Callee-Saved Registers¶

The Register Convention¶

The RISC-V calling convention divides registers into two groups based on who is responsible for preserving their values across a function call:

Category	Registers	Preserved Across Calls?	Responsibility
Caller-saved (temporary)	`a0`–`a7`, `t0`–`t6`, `ra`	No	Caller saves before call if needed
Callee-saved (saved)	`s0`–`s11`, `sp`	Yes	Callee saves in prologue, restores in epilogue

Caller-Saved Registers¶

The called function is free to overwrite these registers. If the caller needs their values after the call, it must save them before the call and restore them after:

a0–a7: argument/return registers
t0–t6: temporary registers
ra: return address (overwritten by call)

Callee-Saved Registers¶

The called function must preserve these registers. If the callee wants to use them, it must save them at the start and restore them before returning:

s0–s11: saved registers
sp: stack pointer
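The two lists can be captured as a lookup, which is handy when checking a prologue by hand. This is a sketch covering only the registers named in this lecture; `SavedBy` and `classify` are hypothetical names, and registers not listed above (for example zero, gp, tp) deliberately return `None`.

```rust
/// Who is responsible for preserving a register across a call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SavedBy {
    Caller,
    Callee,
}

/// Classifies a register by its ABI name, following the table in this section.
/// Returns None for names the table does not cover.
fn classify(reg: &str) -> Option<SavedBy> {
    match reg {
        // Return address, argument registers, and temporaries.
        "ra" | "a0" | "a1" | "a2" | "a3" | "a4" | "a5" | "a6" | "a7" | "t0" | "t1" | "t2"
        | "t3" | "t4" | "t5" | "t6" => Some(SavedBy::Caller),
        // Stack pointer and saved registers.
        "sp" | "s0" | "s1" | "s2" | "s3" | "s4" | "s5" | "s6" | "s7" | "s8" | "s9" | "s10"
        | "s11" => Some(SavedBy::Callee),
        _ => None,
    }
}
```

For example, `classify("t0")` is `Some(SavedBy::Caller)` and `classify("s0")` is `Some(SavedBy::Callee)`.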

Argument Passing¶

Argument	Register	Notes
1st	`a0`	Also used for return value
2nd	`a1`
3rd	`a2`
4th	`a3`
5th	`a4`
6th	`a5`
7th	`a6`
8th	`a7`
9th+	stack	Passed on the stack

Return values are placed in a0. A value wider than one register (up to 128 bits on RV64) also uses a1.
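As a quick reference, the argument table can be written as a function from argument position to location. This sketch assumes every argument is an integer or pointer that fits in one register, which is the case for all examples in this lecture; `ArgLocation`, `ARG_REGS`, and `arg_location` are hypothetical names.

```rust
/// Where an integer-sized argument is passed.
#[derive(Debug, PartialEq, Eq)]
enum ArgLocation {
    Register(&'static str),
    Stack,
}

/// Argument registers in order: index 0 is the 1st argument.
const ARG_REGS: [&str; 8] = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"];

/// Maps a zero-based argument index to its location.
/// Assumes each argument fits in a single register.
fn arg_location(index: usize) -> ArgLocation {
    match ARG_REGS.get(index) {
        Some(&name) => ArgLocation::Register(name),
        None => ArgLocation::Stack, // 9th argument and beyond
    }
}
```

Here `arg_location(3)` gives `Register("a3")` for the 4th argument, and `arg_location(8)` gives `Stack` for the 9th.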

Two Valid Approaches

When you need values to survive across a function call, you have two choices:

Caller-saved approach: save the values on the stack before each call, restore after
Callee-saved approach: move values into s registers (which survive calls), save/restore s registers in prologue/epilogue

Both are correct. The caller-saved approach uses more stack operations per call; the callee-saved approach requires prologue/epilogue saves but is simpler around each call site.

3. Stack Frame Management¶

The Prologue/Epilogue Pattern¶

Every non-leaf function follows this structure:

Prologue:
    1. Allocate stack space (subtract from sp)
    2. Save registers that need preserving (ra, s-regs)

Body:
    3. Function logic, including calls to other functions

Epilogue:
    4. Restore saved registers
    5. Deallocate stack space (add to sp)
    6. ret

my_function:
    # === Prologue ===
    addi sp, sp, -32        # 1. Allocate (multiple of 16)
    sd   ra, 0(sp)          # 2. Save ra
    sd   s0, 8(sp)          #    Save s0
    sd   s1, 16(sp)         #    Save s1

    # === Body ===
    # ... function logic ...
    call other_function
    # ... more logic ...

    # === Epilogue ===
    ld   s1, 16(sp)         # 4. Restore s1
    ld   s0, 8(sp)          #    Restore s0
    ld   ra, 0(sp)          #    Restore ra
    addi sp, sp, 32         # 5. Deallocate
    ret                     # 6. Return

Stack Frame Layout¶

A function that needs stack storage creates a stack frame on each call: a block of memory on the stack for that call's saved registers and local variables. The layout below uses the offsets from the my_function prologue above, with the spare 8 bytes at sp + 24 available for a local variable:

Higher addresses
+---------------------------+
|    Caller's frame         |
+---------------------------+  <- sp (before prologue)
|    local variable         |  sp + 24
+---------------------------+
|    s1                     |  sp + 16
+---------------------------+
|    s0                     |  sp + 8
+---------------------------+
|    ra                     |  sp + 0
+---------------------------+  <- sp (after prologue)
Lower addresses

Stack Alignment¶

The RISC-V calling convention requires sp to always be 16-byte aligned. Even if you only need 8 bytes of stack space, you must allocate 16.

Values to Save	Bytes Needed	Allocate
ra only (8 bytes)	8	16
ra + 1 s-reg (16 bytes)	16	16
ra + 2 s-regs (24 bytes)	24	32
ra + 3 s-regs (32 bytes)	32	32
ra + 3 s-regs + local (36 bytes)	36	48

Calculating Frame Size

Count the bytes you need, then round up to a multiple of 16 (a count that is already a multiple of 16 stays as it is). Formula: frame_size = ceil(bytes_needed / 16) * 16.
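The formula is easy to check with integer arithmetic. A minimal sketch, where `frame_size` is a hypothetical helper name and adding 15 before the integer division plays the role of the ceiling:

```rust
/// Rounds a byte count up to a multiple of 16, as required for sp alignment.
fn frame_size(bytes_needed: usize) -> usize {
    // Adding 15 first makes the truncating division behave like ceil().
    (bytes_needed + 15) / 16 * 16
}
```

This reproduces the table above: `frame_size(8)` is 16, `frame_size(24)` is 32, and `frame_size(36)` is 48.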

4. Non-Leaf Functions: Caller-Saved Approach¶

Strategy¶

In the caller-saved approach, we keep values in a registers and save/restore them on the stack around each function call. This is straightforward but requires more stack operations.

Example: `add4f_s.s` / `add4f.rs`¶

This function adds four numbers using three calls to add2_s. We save caller-saved registers (a regs) on the stack before each call.

Rust (src/bin/add4f.rs):

extern "C" {
    fn add4f_s(a: i32, b: i32, c: i32, d: i32) -> i32;
    fn add4f_callee_s(a: i32, b: i32, c: i32, d: i32) -> i32;
}

fn add4f(a: i32, b: i32, c: i32, d: i32) -> i32 {
    a + b + c + d
}

The approach: add4f(a, b, c, d) = add2(add2(a, b), add2(c, d))
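Written out in Rust, the nested form looks like the sketch below. The names `add2`, `add4f_nested`, `tmp0`, and `tmp1` are chosen here for illustration (tmp0 mirrors the label used for the intermediate a + b); this is not code from the course repository.

```rust
/// Adds two values, like add2_s.
fn add2(a: i32, b: i32) -> i32 {
    a + b
}

/// Adds four values using three calls to add2.
fn add4f_nested(a: i32, b: i32, c: i32, d: i32) -> i32 {
    let tmp0 = add2(a, b); // first call: a + b, must survive the second call
    let tmp1 = add2(c, d); // second call: c + d
    add2(tmp0, tmp1) // third call: (a + b) + (c + d)
}
```

The point to notice is that c, d, and tmp0 each have to outlive at least one call. In Rust the compiler arranges that; in assembly it is the programmer's job.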

Assembly (asm/add4f_s.s):

.global add4f_s
.global add2_s

# Add two args
add2_s:
    add a0, a0, a1
    ret

# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Caller-saved approach

add4f_s:
    addi sp, sp, -32     # Allocate stack
    sd ra, (sp)          # Preserve ra

    # a0 and a1 already set for add2_s(a, b)
    sd a2, 8(sp)         # Save c on stack
    sd a3, 16(sp)        # Save d on stack

    call add2_s          # a0 = a + b

    sd a0, 24(sp)        # Save result (tmp0) on stack

    ld a0, 8(sp)         # a0 = c (from stack)
    ld a1, 16(sp)        # a1 = d (from stack)

    call add2_s          # a0 = c + d

    mv a1, a0            # a1 = c + d
    ld a0, 24(sp)        # a0 = a + b (from stack)

    call add2_s          # a0 = (a+b) + (c+d)

    ld ra, (sp)          # Restore ra
    addi sp, sp, 32      # Deallocate stack
    ret

Stack layout for add4f_s:

Higher addresses
+---------------------------+
|    tmp0 (a+b)             |  sp + 24
+---------------------------+
|    d (a3)                 |  sp + 16
+---------------------------+
|    c (a2)                 |  sp + 8
+---------------------------+
|    ra                     |  sp + 0
+---------------------------+  <- sp
Lower addresses

Notice how we save values on the stack before each call (because a regs may be overwritten) and load them back after.

5. Non-Leaf Functions: Callee-Saved Approach¶

Strategy¶

In the callee-saved approach, we move values into s registers, which are preserved across function calls. We save the s registers once in the prologue and restore them once in the epilogue. This avoids repeated save/restore around each call.

Example: `add4f_callee_s.s`¶

Assembly (asm/add4f_callee_s.s):

.global add4f_callee_s

# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Callee-saved approach: use s0, s1, s2

add4f_callee_s:
    addi sp, sp, -32     # Allocate stack
    sd ra, (sp)          # Save ra
    sd s0, 8(sp)         # Save s0 (we'll use it)
    sd s1, 16(sp)        # Save s1
    sd s2, 24(sp)        # Save s2

    mv s0, a2            # s0 = c (survives calls)
    mv s1, a3            # s1 = d (survives calls)

    # a0=a, a1=b already set
    call add2_s          # a0 = a + b

    mv s2, a0            # s2 = a + b (survives calls)

    mv a0, s0            # a0 = c
    mv a1, s1            # a1 = d

    call add2_s          # a0 = c + d

    mv a1, a0            # a1 = c + d
    mv a0, s2            # a0 = a + b

    call add2_s          # a0 = (a+b) + (c+d)

    ld ra, (sp)          # Restore ra
    ld s0, 8(sp)         # Restore s0
    ld s1, 16(sp)        # Restore s1
    ld s2, 24(sp)        # Restore s2
    addi sp, sp, 32      # Deallocate stack
    ret

Comparing the Two Approaches¶

Aspect	Caller-Saved	Callee-Saved
Where values live	On the stack	In `s` registers
Save/restore timing	Before/after each call	Once in prologue/epilogue
Stack accesses	More (per call)	Fewer (just prologue/epilogue)
Register usage	`a` and `t` regs only	Uses `s` regs for persistence
Between calls	`ld`/`sd` to stack	`mv` to/from `s` regs

Which to Use?

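The callee-saved approach is generally preferred when values must survive multiple calls: you save and restore s registers once instead of repeatedly saving and restoring values around each call. The caller-saved approach is simpler when you only have one or two calls. In the small add4f examples above, both versions happen to perform the same number of stack accesses (four stores and four loads each), so the savings appear mainly as values stay live across more calls.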

6. Recursive Functions¶

Recursion in Assembly¶

A recursive function calls itself. Each call creates a new stack frame, so each invocation has its own saved registers and local variables. The stack grows with each recursive call and shrinks as calls return.

Example: `factrec_s.s` / `factrec.rs`¶

Rust (src/bin/factrec.rs):

extern "C" {
    fn factrec_s(n: i32) -> i32;
}

fn factrec(n: i32) -> i32 {
    if n <= 0 {
        1
    } else {
        n * factrec(n - 1)
    }
}

Assembly (asm/factrec_s.s):

.global factrec_s

# Compute n! using recursion
# a0 = n

factrec_s:
    addi sp, sp, -16
    sd ra, (sp)

    # Base case: if n <= 0, return 1
    bgt a0, zero, factrec_recstep
    li a0, 1
    j factrec_done

    # Recursive step
factrec_recstep:
    sd a0, 8(sp)        # Save n on stack
    addi a0, a0, -1     # a0 = n - 1

    call factrec_s      # a0 = factorial(n - 1)

    ld t0, 8(sp)        # Restore n from stack
    mul a0, a0, t0      # a0 = factorial(n-1) * n

factrec_done:
    ld ra, (sp)
    addi sp, sp, 16
    ret

Stack Diagram: `factrec_s(3)`¶

Each recursive call creates a new 16-byte stack frame:

Higher addresses
+---------------------------+
|  Frame for factrec_s(3)   |
|    n = 3                  |  sp₃ + 8
|    ra (caller's return)   |  sp₃ + 0
+---------------------------+  <- sp₃
|  Frame for factrec_s(2)   |
|    n = 2                  |  sp₂ + 8
|    ra (return to call 3)  |  sp₂ + 0
+---------------------------+  <- sp₂
|  Frame for factrec_s(1)   |
|    n = 1                  |  sp₁ + 8
|    ra (return to call 2)  |  sp₁ + 0
+---------------------------+  <- sp₁
|  Frame for factrec_s(0)   |
|    (unused in base case)  |  sp₀ + 8
|    ra (return to call 1)  |  sp₀ + 0
+---------------------------+  <- sp₀ (deepest point)
Lower addresses

Execution trace:

Call	`n`	Action	Return Value
`factrec_s(3)`	3	Save n=3, call factrec_s(2)	2 * 3 = 6
`factrec_s(2)`	2	Save n=2, call factrec_s(1)	1 * 2 = 2
`factrec_s(1)`	1	Save n=1, call factrec_s(0)	1 * 1 = 1
`factrec_s(0)`	0	Base case: return 1	1

The stack unwinds as each call returns, multiplying: 1 * 1 = 1 -> 1 * 2 = 2 -> 2 * 3 = 6

Why Each Frame Needs Its Own n

Each recursive call overwrites a0 with n - 1 before the call. Without saving n on the stack, the original value of n would be lost. Each stack frame stores its own copy of n, so when the recursive call returns, we can load it back and multiply.
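One way to convince yourself is to replay factrec_s with the stack made explicit. The Rust sketch below is an illustration only: `factrec_explicit` is a hypothetical name, a `Vec` stands in for the saved-n slots, and the frame count assumes one frame per invocation as in the diagram above. It uses `wrapping_mul` so that large inputs wrap instead of panicking, which is an assumption of this sketch rather than a statement about the course code.

```rust
/// Computes n! the way factrec_s does, with the saved copies of n kept
/// in an explicit stack. Returns (result, number of frames at the deepest point).
fn factrec_explicit(n: i32) -> (i32, usize) {
    let mut stack: Vec<i32> = Vec::new(); // one saved n per recursive-step frame
    let mut a0 = n;

    // Descend: each recursive step saves its own n, then passes n - 1 in a0.
    while a0 > 0 {
        stack.push(a0); // sd a0, 8(sp)
        a0 -= 1; // addi a0, a0, -1
    }

    // The base-case invocation also has a frame, but it saves only ra.
    let max_frames = stack.len() + 1;

    a0 = 1; // li a0, 1 (base case result)

    // Unwind: each returning frame reloads its own n and multiplies.
    while let Some(saved_n) = stack.pop() {
        a0 = a0.wrapping_mul(saved_n); // ld t0, 8(sp); mul a0, a0, t0
    }

    (a0, max_frames)
}
```

For n = 3 this returns `(6, 4)`: the result 6 and four frames at the deepest point, matching the stack diagram. If the saved values were not kept per frame, the unwinding loop would have nothing to multiply by.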

Key Concepts¶

Concept	Description
Program Counter (PC)	Address of the current instruction being executed
`ra` (return address)	Register that holds the address to return to after a function call
`call`	Pseudo-instruction: saves PC+4 in `ra`, jumps to target (`jal ra, target`)
`ret`	Pseudo-instruction: jumps to address in `ra` (`jalr zero, ra, 0`)
Caller-saved registers	`a0`–`a7`, `t0`–`t6`, `ra` — may be overwritten by called functions
Callee-saved registers	`s0`–`s11`, `sp` — must be preserved by called functions
Stack frame	Block of stack memory allocated by a function for saved registers and locals
Prologue	Code at the start of a function: allocate stack, save registers
Epilogue	Code at the end of a function: restore registers, deallocate stack, `ret`
Leaf function	Does not call other functions; usually needs no stack frame (unless it uses `s` registers or stack locals)
Non-leaf function	Calls other functions — must save `ra` and manage a stack frame

Practice Problems¶

Problem 1: Call/Return Mechanics¶

Consider this code at address 0x1000:

0x1000: call bar      # bar is at address 0x2000
0x1004: addi a0, a0, 1

After call bar executes, what are the values of ra and PC?

Solution:

- `ra = 0x1004` (address of the instruction after `call`)
- `PC = 0x2000` (address of `bar`)

The `call` instruction stores the return address (PC + 4 = 0x1004) into `ra`, then sets PC to the target address (0x2000).

Problem 2: Register Preservation¶

A function uses registers t0, s0, and calls another function. Which registers must be saved, and by whom?

Solution:

- **`t0`**: caller-saved. If our function needs `t0`'s value after the call, **we** must save it on the stack before the call and restore it after. The called function is free to overwrite `t0`.
- **`s0`**: callee-saved. If our function uses `s0`, **we** must save it in our prologue and restore it in our epilogue (because we are the callee who modifies it). `s0` still survives the call to the other function automatically, because that function is also required to preserve it.
- **`ra`**: caller-saved. Since we call another function, `call` will overwrite `ra`. We must save `ra` before the call and restore it before our `ret`.

Summary: save `ra` and `s0` in our prologue (we modify both); save `t0` on the stack before the call if we need it after.

Problem 3: Stack Frame Size¶

A function needs to save ra, s0, s1, and a 4-byte local variable on the stack. What should the stack frame size be?

Solution:

Calculate the bytes needed:

- `ra`: 8 bytes
- `s0`: 8 bytes
- `s1`: 8 bytes
- local variable: 4 bytes
- **Total**: 28 bytes

Round up to a multiple of 16: **32 bytes**.

addi sp, sp, -32     # Allocate 32 bytes
sd   ra, 0(sp)       # 8 bytes at sp+0
sd   s0, 8(sp)       # 8 bytes at sp+8
sd   s1, 16(sp)      # 8 bytes at sp+16
sw   t0, 24(sp)      # 4 bytes at sp+24 (local variable)

The remaining 4 bytes (sp+28 to sp+31) are padding for 16-byte alignment.

Problem 4: Fix the Bug¶

This non-leaf function has a bug. Find and fix it:

.global broken_func

broken_func:
    addi sp, sp, -16
    sd   s0, 0(sp)

    mv   s0, a0
    call helper
    add  a0, a0, s0

    ld   s0, 0(sp)
    addi sp, sp, 16
    ret

Solution:

The bug: **`ra` is not saved**. Since `broken_func` calls `helper`, the `call` instruction overwrites `ra`. When `broken_func` executes `ret`, it jumps to the instruction after `call helper` instead of returning to the original caller. The tail of the function then runs again and again, and `sp` moves up by 16 on every pass, so the loop also walks into the caller's stack.

**Fix**: save and restore `ra`:

.global broken_func

broken_func:
    addi sp, sp, -16
    sd   ra, 0(sp)       # Save ra
    sd   s0, 8(sp)       # Save s0

    mv   s0, a0
    call helper
    add  a0, a0, s0

    ld   s0, 8(sp)       # Restore s0
    ld   ra, 0(sp)       # Restore ra
    addi sp, sp, 16
    ret

Problem 5: Recursive Sum¶

Write a recursive function sum_s(n: i32) -> i32 that computes 1 + 2 + ... + n. Base case: sum(0) = 0. Recursive case: sum(n) = n + sum(n - 1).

Solution:

.global sum_s

# int sum_s(int n)
# a0 = n
# Returns: 1 + 2 + ... + n

sum_s:
    addi sp, sp, -16
    sd   ra, (sp)

    # Base case: if n <= 0, return 0
    bgt  a0, zero, sum_recstep
    li   a0, 0
    j    sum_done

sum_recstep:
    sd   a0, 8(sp)       # Save n on stack
    addi a0, a0, -1      # a0 = n - 1
    call sum_s            # a0 = sum(n - 1)
    ld   t0, 8(sp)       # Restore n
    add  a0, a0, t0      # a0 = sum(n-1) + n

sum_done:
    ld   ra, (sp)
    addi sp, sp, 16
    ret

This follows the same pattern as `factrec_s`: save `ra` and `n`, make the recursive call, then combine the result with the saved `n`. For `sum_s(3)`: `sum(3) = 3 + sum(2) = 3 + 2 + sum(1) = 3 + 2 + 1 + sum(0) = 3 + 2 + 1 + 0 = 6`.
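The other examples in this lecture pair each assembly file with a Rust version, so a Rust reference for checking sum_s might look like the sketch below. The name `sum` is an assumption; no such file is given in the problem statement.

```rust
/// Reference implementation: 1 + 2 + ... + n, with sum(n) = 0 for n <= 0.
fn sum(n: i32) -> i32 {
    if n <= 0 {
        0 // base case, mirrors `li a0, 0`
    } else {
        n + sum(n - 1) // mirrors `add a0, a0, t0` after the recursive call
    }
}
```

Comparing `sum(n)` against `sum_s(n)` for a few small inputs, such as 0, 1, 3, and 10, is a quick way to catch a wrong base case or a lost copy of n.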

Summary¶

Call/return uses call (saves PC+4 to ra, jumps to target) and ret (jumps to address in ra). Any function that calls another function must save ra before the call, or it will be overwritten.
Register convention divides registers into caller-saved (a0–a7, t0–t6, ra) and callee-saved (s0–s11, sp). Caller-saved registers may be freely overwritten by called functions; callee-saved registers must be preserved.
Stack frames follow the prologue/epilogue pattern: allocate space, save registers, execute the body, restore registers, deallocate, return. Stack space must be allocated in multiples of 16 bytes.
Non-leaf functions can use either a caller-saved approach (saving a/t registers around each call) or a callee-saved approach (using s registers that survive calls). The callee-saved approach is generally cleaner when values must survive multiple calls.
Recursive functions work because each call creates its own stack frame with its own saved ra and local values. The stack grows with each recursive call and shrinks as calls return, allowing each invocation to maintain independent state.
