RISC-V Assembly Functions

Overview

This lecture covers how functions work at the assembly level in RISC-V. We explore the call/return mechanism, the calling convention that divides registers into caller-saved and callee-saved groups, and stack frame management. We then apply these concepts to non-leaf functions (functions that call other functions) using both caller-saved and callee-saved approaches, and finish with recursive functions. These concepts are essential for writing correct multi-function assembly programs.

Learning Objectives

  • Explain how call and ret use the ra register and program counter
  • Classify registers as caller-saved or callee-saved and explain the implications
  • Write function prologues and epilogues that manage the stack correctly
  • Implement non-leaf functions using both caller-saved and callee-saved approaches
  • Trace recursive function execution through nested stack frames

Prerequisites

  • RISC-V Assembly 1 & 2 (Lectures 05–06): registers, instructions, memory operations, control flow, leaf functions
  • Program memory layout (stack grows downward, 16-byte alignment)

1. The Call/Return Mechanism

How Function Calls Work

When a program calls a function, two things must happen:

  1. Jump to the function's code
  2. Remember where to return when the function finishes

RISC-V uses the program counter (PC) and the return address register (ra) to accomplish this.

call and ret

The call pseudo-instruction saves the return address and jumps to the target:

call target    # Pseudo-instruction, equivalent to:
               # jal ra, target
               # 1. ra = PC + 4 (address of next instruction)
               # 2. PC = target (jump to function)

The ret pseudo-instruction jumps back to the saved return address:

ret            # Pseudo-instruction, equivalent to:
               # jalr zero, ra, 0
               # 1. PC = ra (jump back to caller)

Example with concrete addresses:

Address    Instruction
0x1000     call bar        # ra = 0x1004, PC = bar
0x1004     addi a0, a0, 1  # execution resumes here after bar returns
...
bar:
0x2000     addi a0, a0, 1  # bar's code
0x2004     ret              # PC = ra = 0x1004

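The call instruction stores PC + 4 (the address of the next instruction) into ra, then jumps to the target. When the called function executes ret, it sets PC = ra, returning execution to the instruction after the call. Assemblers may expand call into a two-instruction auipc/jalr sequence, for example when the target is out of jal's range; either way, ra ends up holding the address of the instruction that follows the call.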
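The register updates described above can be sketched as a small Rust model. This is an illustration only: the Cpu struct and the replay_example function are hypothetical names, and the model assumes every instruction is 4 bytes long, as in the address listing.

```rust
/// Minimal model of the two registers involved in call/ret.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cpu {
    pc: u64, // program counter
    ra: u64, // return address register
}

impl Cpu {
    /// Model of `call target` (jal ra, target), assuming a 4-byte instruction.
    fn call(&mut self, target: u64) {
        self.ra = self.pc + 4; // 1. remember the address of the next instruction
        self.pc = target; // 2. jump to the function
    }

    /// Model of `ret` (jalr zero, ra, 0).
    fn ret(&mut self) {
        self.pc = self.ra; // jump back to the saved return address
    }
}

/// Replays the 0x1000 / 0x2000 listing.
/// Returns (ra after call, pc after call, pc after ret).
fn replay_example() -> (u64, u64, u64) {
    let mut cpu = Cpu { pc: 0x1000, ra: 0 };
    cpu.call(0x2000);
    let (ra_after_call, pc_after_call) = (cpu.ra, cpu.pc);
    cpu.pc += 4; // bar runs `addi a0, a0, 1` at 0x2000 and reaches `ret` at 0x2004
    cpu.ret();
    (ra_after_call, pc_after_call, cpu.pc)
}
```

Calling replay_example() yields (0x1004, 0x2000, 0x1004), matching the comments in the listing.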

Example: call_s.s / call.rs

Rust (src/bin/call.rs):

extern "C" {
    fn foo_s(a: i32) -> i32;
}

fn bar(a: i32) -> i32 {
    a + 1
}

fn foo(a: i32) -> i32 {
    bar(a) + 1
}

foo calls bar, so foo is a non-leaf function. It must save ra before calling bar, because call bar will overwrite ra.

Assembly (asm/call_s.s):

.global foo_s

bar_s:
    addi a0, a0, 1
    ret

foo_s:
    addi sp, sp, -16    # Allocate 16 bytes of stack space
    sd ra, (sp)         # Preserve ra

    call bar_s          # ra = PC + 4

    addi a0, a0, 1

    ld ra, (sp)         # Restore ra
    addi sp, sp, 16     # Deallocate 16 bytes
    ret

Trace for foo_s(3):

Step  Instruction              a0  ra             Notes
1     foo_s: addi sp, sp, -16  3   caller's addr  Allocate stack
2     sd ra, (sp)              3   caller's addr  Save ra on stack
3     call bar_s               3   after call     ra overwritten
4     bar_s: addi a0, a0, 1    4   after call     bar adds 1
5     ret (from bar_s)         4   after call     Return to foo_s
6     addi a0, a0, 1           5   after call     foo adds 1
7     ld ra, (sp)              5   caller's addr  Restore ra
8     addi sp, sp, 16          5   caller's addr  Deallocate stack
9     ret (from foo_s)         5   caller's addr  Return to caller

Result: foo_s(3) = 5 (bar adds 1, foo adds 1).

Forgetting to Save ra

If foo_s did not save ra before call bar_s, the ret at the end would jump back to the instruction after call bar_s instead of returning to the original caller. foo_s would then execute addi a0, a0, 1 and ret over and over: an infinite loop.
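To see the difference between saving and not saving ra, here is a toy interpreter sketch in Rust. It is not a RISC-V simulator: the Op enum, the CALLER sentinel, and the instruction-index "addresses" are all assumptions made for illustration, and the stack only holds saved ra values.

```rust
/// The handful of operations needed to model foo_s and bar_s.
#[derive(Clone, Copy)]
enum Op {
    AddA0(i64),  // addi a0, a0, imm
    Call(usize), // call <instruction index>
    Ret,         // ret
    SaveRa,      // addi sp, sp, -16 ; sd ra, (sp)
    RestoreRa,   // ld ra, (sp) ; addi sp, sp, 16
}

/// Sentinel "address" standing in for the original caller.
const CALLER: usize = usize::MAX;

/// Runs `program` from `entry` with a0 = `arg`. Returns Some(a0) once control
/// reaches CALLER, or None if that has not happened within `max_steps` steps.
fn run(program: &[Op], entry: usize, arg: i64, max_steps: usize) -> Option<i64> {
    let (mut pc, mut ra, mut a0) = (entry, CALLER, arg);
    let mut stack: Vec<usize> = Vec::new(); // saved ra values
    for _ in 0..max_steps {
        if pc == CALLER {
            return Some(a0);
        }
        match program[pc] {
            Op::AddA0(imm) => {
                a0 += imm;
                pc += 1;
            }
            Op::Call(target) => {
                ra = pc + 1; // overwrite ra with the next instruction
                pc = target;
            }
            Op::Ret => pc = ra,
            Op::SaveRa => {
                stack.push(ra);
                pc += 1;
            }
            Op::RestoreRa => {
                ra = stack.pop().expect("unbalanced stack");
                pc += 1;
            }
        }
    }
    None
}

/// bar_s at index 0, foo_s at index 2, with ra saved and restored.
fn foo_with_save() -> Vec<Op> {
    vec![
        Op::AddA0(1), Op::Ret, // bar_s
        Op::SaveRa, Op::Call(0), Op::AddA0(1), Op::RestoreRa, Op::Ret, // foo_s
    ]
}

/// The same program with the save/restore of ra left out.
fn foo_without_save() -> Vec<Op> {
    vec![
        Op::AddA0(1), Op::Ret, // bar_s
        Op::Call(0), Op::AddA0(1), Op::Ret, // foo_s (buggy)
    ]
}
```

With entry index 2 and argument 3, run(&foo_with_save(), 2, 3, 1000) returns Some(5), while run(&foo_without_save(), 2, 3, 1000) returns None because the buggy version never gets back to the caller within the step limit.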


2. Caller-Saved vs Callee-Saved Registers

The Register Convention

The RISC-V calling convention divides registers into two groups based on who is responsible for preserving their values across a function call:

Category                  Registers          Preserved Across Calls?  Responsibility
Caller-saved (temporary)  a0–a7, t0–t6, ra   No                       Caller saves before call if needed
Callee-saved (saved)      s0–s11, sp         Yes                      Callee saves in prologue, restores in epilogue

Caller-Saved Registers

The called function is free to overwrite these registers. If the caller needs their values after the call, it must save them before the call and restore them after:

  • a0–a7: argument/return registers
  • t0–t6: temporary registers
  • ra: return address (overwritten by call)

Callee-Saved Registers

The called function must preserve these registers. If the callee wants to use them, it must save them at the start and restore them before returning:

  • s0–s11: saved registers
  • sp: stack pointer
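As a quick reference, the two lists can be encoded as a lookup function. The sketch below is a hypothetical helper (the Saver enum and saver function are not part of the course code) and covers only the registers named in this section.

```rust
/// Who is responsible for preserving a register across a call.
#[derive(Debug, PartialEq)]
enum Saver {
    Caller,
    Callee,
}

/// Classifies a register by ABI name, following the lists in this section.
/// Returns None for names the lists do not cover (for example zero, gp, tp).
fn saver(reg: &str) -> Option<Saver> {
    if reg == "ra" {
        return Some(Saver::Caller);
    }
    if reg == "sp" {
        return Some(Saver::Callee);
    }
    if reg.len() < 2 || !reg.is_ascii() {
        return None;
    }
    let (prefix, digits) = reg.split_at(1);
    // Accept only plain decimal numbers without leading zeros.
    if !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    if digits.len() > 1 && digits.starts_with('0') {
        return None;
    }
    let n: u32 = digits.parse().ok()?;
    match (prefix, n) {
        ("a", 0..=7) | ("t", 0..=6) => Some(Saver::Caller),
        ("s", 0..=11) => Some(Saver::Callee),
        _ => None,
    }
}
```

For example, saver("t0") is Some(Saver::Caller) and saver("s11") is Some(Saver::Callee), while saver("s12") is None because there is no such register.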

Argument Passing

Argument  Register  Notes
1st       a0        Also used for return value
2nd       a1
3rd       a2
4th       a3
5th       a4
6th       a5
7th       a6
8th       a7
9th+      stack     Passed on the stack

Return values are placed in a0 (with a1 holding the upper half of values wider than one register, such as 128-bit values on RV64).
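The argument table can be expressed as a small function. This is a sketch under a simplifying assumption: every argument is an integer or pointer that fits in one register. The ArgLocation enum and arg_location function are hypothetical names.

```rust
/// Where an argument is passed.
#[derive(Debug, PartialEq)]
enum ArgLocation {
    Register(String), // one of a0 through a7
    Stack,            // ninth argument and beyond
}

/// `index` is zero-based: 0 is the first argument.
/// Assumes each argument fits in a single integer register.
fn arg_location(index: usize) -> ArgLocation {
    if index < 8 {
        ArgLocation::Register(format!("a{}", index))
    } else {
        ArgLocation::Stack
    }
}
```

So arg_location(0) is Register("a0"), arg_location(7) is Register("a7"), and arg_location(8) is Stack.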

Two Valid Approaches

When you need values to survive across a function call, you have two choices:

  1. Caller-saved approach: save the values on the stack before each call, restore after
  2. Callee-saved approach: move values into s registers (which survive calls), save/restore s registers in prologue/epilogue

Both are correct. The caller-saved approach uses more stack operations per call; the callee-saved approach requires prologue/epilogue saves but is simpler around each call site.


3. Stack Frame Management

The Prologue/Epilogue Pattern

Every non-leaf function follows this structure:

Prologue:
    1. Allocate stack space (subtract from sp)
    2. Save registers that need preserving (ra, s-regs)

Body:
    3. Function logic, including calls to other functions

Epilogue:
    4. Restore saved registers
    5. Deallocate stack space (add to sp)
    6. ret

In assembly, the pattern looks like this:

my_function:
    # === Prologue ===
    addi sp, sp, -32        # 1. Allocate (multiple of 16)
    sd   ra, 0(sp)          # 2. Save ra
    sd   s0, 8(sp)          #    Save s0
    sd   s1, 16(sp)         #    Save s1

    # === Body ===
    # ... function logic ...
    call other_function
    # ... more logic ...

    # === Epilogue ===
    ld   s1, 16(sp)         # 4. Restore s1
    ld   s0, 8(sp)          #    Restore s0
    ld   ra, 0(sp)          #    Restore ra
    addi sp, sp, 32         # 5. Deallocate
    ret                     # 6. Return

Stack Frame Layout

Each function call creates a stack frame — a block of memory on the stack for that function's saved registers and local variables:

Higher addresses
+---------------------------+
|    Caller's frame         |
+---------------------------+  <- sp (before prologue)
|    local / unused         |  sp + 24
+---------------------------+
|    s1                     |  sp + 16
+---------------------------+
|    s0                     |  sp + 8
+---------------------------+
|    ra                     |  sp + 0
+---------------------------+  <- sp (after prologue)
Lower addresses

Offsets are relative to sp after the prologue and match the my_function example above (32-byte frame: ra at 0, s0 at 8, s1 at 16).

Stack Alignment

The RISC-V calling convention requires sp to always be 16-byte aligned. Even if you only need 8 bytes of stack space, you must allocate 16.

Values to Save                Bytes Needed  Allocate
ra only                       8             16
ra + 1 s-reg                  16            16
ra + 2 s-regs                 24            32
ra + 3 s-regs                 32            32
ra + 3 s-regs + 4-byte local  36            48

Calculating Frame Size

Count the bytes you need, then round up to a multiple of 16 (a count that is already a multiple of 16 stays as it is). Formula: frame_size = ceil(bytes_needed / 16) * 16.
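The rounding rule can be checked with a few lines of Rust. The function name frame_size follows the formula above; STACK_ALIGN is a name introduced here for the 16-byte requirement.

```rust
/// Required stack alignment in bytes.
const STACK_ALIGN: usize = 16;

/// Rounds `bytes_needed` up to a multiple of 16 using integer arithmetic,
/// which is equivalent to ceil(bytes_needed / 16) * 16.
fn frame_size(bytes_needed: usize) -> usize {
    (bytes_needed + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN
}
```

Applied to the rows of the alignment table: frame_size(8) = 16, frame_size(16) = 16, frame_size(24) = 32, frame_size(32) = 32, and frame_size(36) = 48.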


4. Non-Leaf Functions: Caller-Saved Approach

Strategy

In the caller-saved approach, we keep values in a registers and save/restore them on the stack around each function call. This is straightforward but requires more stack operations.

Example: add4f_s.s / add4f.rs

This function adds four numbers using three calls to add2_s. We save caller-saved registers (a regs) on the stack before each call.

Rust (src/bin/add4f.rs):

extern "C" {
    fn add4f_s(a: i32, b: i32, c: i32, d: i32) -> i32;
    fn add4f_callee_s(a: i32, b: i32, c: i32, d: i32) -> i32;
}

fn add4f(a: i32, b: i32, c: i32, d: i32) -> i32 {
    a + b + c + d
}

The approach: add4f(a, b, c, d) = add2(add2(a, b), add2(c, d))
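Written out in Rust, the three-call decomposition looks like the sketch below. The name add4f_via_add2 is hypothetical (the course's add4f simply computes a + b + c + d); the point is to make visible which intermediate values must survive a call.

```rust
/// Adds two values; stands in for add2_s.
fn add2(a: i32, b: i32) -> i32 {
    a + b
}

/// Adds four values using three calls to add2.
fn add4f_via_add2(a: i32, b: i32, c: i32, d: i32) -> i32 {
    // c and d must survive this first call.
    let tmp0 = add2(a, b);
    // tmp0 must survive this second call.
    let tmp1 = add2(c, d);
    // Third call combines the two partial sums.
    add2(tmp0, tmp1)
}
```

For example, add4f_via_add2(1, 2, 3, 4) returns 10. In Rust the compiler decides where c, d, and tmp0 live across the calls; in assembly that decision is the programmer's.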

Assembly (asm/add4f_s.s):

.global add4f_s
.global add2_s

# Add two args
add2_s:
    add a0, a0, a1
    ret

# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Caller-saved approach

add4f_s:
    addi sp, sp, -32     # Allocate stack
    sd ra, (sp)          # Preserve ra

    # a0 and a1 already set for add2_s(a, b)
    sd a2, 8(sp)         # Save c on stack
    sd a3, 16(sp)        # Save d on stack

    call add2_s          # a0 = a + b

    sd a0, 24(sp)        # Save result (tmp0) on stack

    ld a0, 8(sp)         # a0 = c (from stack)
    ld a1, 16(sp)        # a1 = d (from stack)

    call add2_s          # a0 = c + d

    mv a1, a0            # a1 = c + d
    ld a0, 24(sp)        # a0 = a + b (from stack)

    call add2_s          # a0 = (a+b) + (c+d)

    ld ra, (sp)          # Restore ra
    addi sp, sp, 32      # Deallocate stack
    ret

Stack layout for add4f_s:

Higher addresses
+---------------------------+
|    tmp0 (a+b)             |  sp + 24
+---------------------------+
|    d (a3)                 |  sp + 16
+---------------------------+
|    c (a2)                 |  sp + 8
+---------------------------+
|    ra                     |  sp + 0
+---------------------------+  <- sp
Lower addresses

Notice how we save values on the stack before each call (because a regs may be overwritten) and load them back after.


5. Non-Leaf Functions: Callee-Saved Approach

Strategy

In the callee-saved approach, we move values into s registers, which are preserved across function calls. We save the s registers once in the prologue and restore them once in the epilogue. This avoids repeated save/restore around each call.

Example: add4f_callee_s.s

Assembly (asm/add4f_callee_s.s):

.global add4f_callee_s

# Add four args using 3 calls to add2_s
# a0=a, a1=b, a2=c, a3=d
# Callee-saved approach: use s0, s1, s2

add4f_callee_s:
    addi sp, sp, -32     # Allocate stack
    sd ra, (sp)          # Save ra
    sd s0, 8(sp)         # Save s0 (we'll use it)
    sd s1, 16(sp)        # Save s1
    sd s2, 24(sp)        # Save s2

    mv s0, a2            # s0 = c (survives calls)
    mv s1, a3            # s1 = d (survives calls)

    # a0=a, a1=b already set
    call add2_s          # a0 = a + b

    mv s2, a0            # s2 = a + b (survives calls)

    mv a0, s0            # a0 = c
    mv a1, s1            # a1 = d

    call add2_s          # a0 = c + d

    mv a1, a0            # a1 = c + d
    mv a0, s2            # a0 = a + b

    call add2_s          # a0 = (a+b) + (c+d)

    ld ra, (sp)          # Restore ra
    ld s0, 8(sp)         # Restore s0
    ld s1, 16(sp)        # Restore s1
    ld s2, 24(sp)        # Restore s2
    addi sp, sp, 32      # Deallocate stack
    ret

Comparing the Two Approaches

Aspect               Caller-Saved                    Callee-Saved
Where values live    On the stack                    In s registers
Save/restore timing  Before/after each call          Once in prologue/epilogue
Stack accesses       Grows with the number of calls  Fixed (prologue/epilogue only)
Register usage       a and t regs only               Uses s regs for persistence
Between calls        ld/sd to stack                  mv to/from s regs

Which to Use?

The callee-saved approach is generally preferred when values must survive multiple calls — you save/restore s registers once instead of repeatedly saving/restoring values around each call. The caller-saved approach is simpler when you only have one or two calls.


6. Recursive Functions

Recursion in Assembly

A recursive function calls itself. Each call creates a new stack frame, so each invocation has its own saved registers and local variables. The stack grows with each recursive call and shrinks as calls return.

Example: factrec_s.s / factrec.rs

Rust (src/bin/factrec.rs):

extern "C" {
    fn factrec_s(n: i32) -> i32;
}

fn factrec(n: i32) -> i32 {
    if n <= 0 {
        1
    } else {
        n * factrec(n - 1)
    }
}

Assembly (asm/factrec_s.s):

.global factrec_s

# Compute n! using recursion
# a0 = n

factrec_s:
    addi sp, sp, -16
    sd ra, (sp)

    # Base case: if n <= 0, return 1
    bgt a0, zero, factrec_recstep
    li a0, 1
    j factrec_done

    # Recursive step
factrec_recstep:
    sd a0, 8(sp)        # Save n on stack
    addi a0, a0, -1     # a0 = n - 1

    call factrec_s      # a0 = factorial(n - 1)

    ld t0, 8(sp)        # Restore n from stack
    mul a0, a0, t0      # a0 = factorial(n-1) * n

factrec_done:
    ld ra, (sp)
    addi sp, sp, 16
    ret

Stack Diagram: factrec_s(3)

Each recursive call creates a new 16-byte stack frame:

Higher addresses
+---------------------------+
|  Frame for factrec_s(3)   |
|    n = 3                  |  sp₃ + 8
|    ra (caller's return)   |  sp₃ + 0
+---------------------------+  <- sp₃
|  Frame for factrec_s(2)   |
|    n = 2                  |  sp₂ + 8
|    ra (return to call 3)  |  sp₂ + 0
+---------------------------+  <- sp₂
|  Frame for factrec_s(1)   |
|    n = 1                  |  sp₁ + 8
|    ra (return to call 2)  |  sp₁ + 0
+---------------------------+  <- sp₁
|  Frame for factrec_s(0)   |
|    (unused: base case)    |  sp₀ + 8
|    ra (return to call 1)  |  sp₀ + 0
+---------------------------+  <- sp₀ (deepest point)
Lower addresses

Execution trace:

Call          n  Action                       Return Value
factrec_s(3)  3  Save n=3, call factrec_s(2)  2 * 3 = 6
factrec_s(2)  2  Save n=2, call factrec_s(1)  1 * 2 = 2
factrec_s(1)  1  Save n=1, call factrec_s(0)  1 * 1 = 1
factrec_s(0)  0  Base case: return 1          1

The stack unwinds as each call returns, multiplying: 1 * 1 = 1 -> 1 * 2 = 2 -> 2 * 3 = 6

Why Each Frame Needs Its Own n

Each recursive call overwrites a0 with n - 1 before the call. Without saving n on the stack, the original value of n would be lost. Each stack frame stores its own copy of n, so when the recursive call returns, we can load it back and multiply.
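The role of the per-frame copy of n can be made explicit by replacing the recursion with a hand-managed stack. The sketch below is an illustration, not the course's implementation: Frame and factrec_explicit are hypothetical names, each Frame stands for one 16-byte stack frame, and the saved ra is omitted because the loop structure takes its place. Like the i32 original, it overflows for large n.

```rust
/// One simulated stack frame. The slot for n stays empty in the base case.
struct Frame {
    saved_n: Option<i32>,
}

/// Computes n! with an explicit stack of frames.
/// Returns (result, maximum number of frames live at once).
fn factrec_explicit(n: i32) -> (i32, usize) {
    let mut stack: Vec<Frame> = Vec::new();
    let mut a0 = n;

    // Descend: one frame per invocation, as the prologue would allocate.
    loop {
        stack.push(Frame { saved_n: None });
        if a0 <= 0 {
            a0 = 1; // base case: return 1
            break;
        }
        if let Some(frame) = stack.last_mut() {
            frame.saved_n = Some(a0); // sd a0, 8(sp)
        }
        a0 -= 1; // addi a0, a0, -1 (the original n is now only in the frame)
    }
    let max_depth = stack.len();

    // Unwind: each return reloads that frame's n and multiplies.
    while let Some(frame) = stack.pop() {
        if let Some(saved) = frame.saved_n {
            a0 *= saved; // ld t0, 8(sp) ; mul a0, a0, t0
        }
    }
    (a0, max_depth)
}
```

factrec_explicit(3) returns (6, 4): the result 6 and the four frames shown in the stack diagram.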


Key Concepts

Concept                 Description
Program Counter (PC)    Address of the current instruction being executed
ra (return address)     Register that holds the address to return to after a function call
call                    Pseudo-instruction: saves PC+4 in ra, jumps to target (jal ra, target)
ret                     Pseudo-instruction: jumps to address in ra (jalr zero, ra, 0)
Caller-saved registers  a0–a7, t0–t6, ra; may be overwritten by called functions
Callee-saved registers  s0–s11, sp; must be preserved by called functions
Stack frame             Block of stack memory allocated by a function for saved registers and locals
Prologue                Code at the start of a function: allocate stack, save registers
Epilogue                Code at the end of a function: restore registers, deallocate stack, ret
Leaf function           Does not call other functions; needs no stack frame unless it uses s registers or stack locals
Non-leaf function       Calls other functions; must save ra and manage a stack frame

Practice Problems

Problem 1: Call/Return Mechanics

Consider this code at address 0x1000:

0x1000: call bar      # bar is at address 0x2000
0x1004: addi a0, a0, 1

After call bar executes, what are the values of ra and PC?

Solution:

  • ra = 0x1004 (address of the instruction after call)
  • PC = 0x2000 (address of bar)

The call instruction stores the return address (PC + 4 = 0x1004) into ra, then sets PC to the target address (0x2000).

Problem 2: Register Preservation

A function uses registers t0, s0, and calls another function. Which registers must be saved, and by whom?

Solution:

  • t0: caller-saved. If our function needs t0's value after the call, we must save it on the stack before the call and restore it after. The called function is free to overwrite t0.
  • s0: callee-saved. If our function uses s0, we must save it in our prologue and restore it in our epilogue (because we are the callee who modifies it). But s0 will survive the call to the other function automatically, because the other function is also required to preserve it.
  • ra: caller-saved. Since we call another function, call will overwrite ra. We must save ra before the call and restore it before our ret.

Summary: save ra and s0 in our prologue (we modify both); save t0 on the stack before the call if we need it after.

Problem 3: Stack Frame Size

A function needs to save ra, s0, s1, and a 4-byte local variable on the stack. What should the stack frame size be?

Solution: Calculate the bytes needed:

  • ra: 8 bytes
  • s0: 8 bytes
  • s1: 8 bytes
  • local variable: 4 bytes
  • Total: 28 bytes

Round up to the next multiple of 16: 32 bytes.

addi sp, sp, -32     # Allocate 32 bytes
sd   ra, 0(sp)       # 8 bytes at sp+0
sd   s0, 8(sp)       # 8 bytes at sp+8
sd   s1, 16(sp)      # 8 bytes at sp+16
sw   t0, 24(sp)      # 4 bytes at sp+24 (local variable)

The remaining 4 bytes (sp+28 to sp+31) are padding for 16-byte alignment.

Problem 4: Fix the Bug

This non-leaf function has a bug. Find and fix it:

.global broken_func

broken_func:
    addi sp, sp, -16
    sd   s0, 0(sp)

    mv   s0, a0
    call helper
    add  a0, a0, s0

    ld   s0, 0(sp)
    addi sp, sp, 16
    ret

Solution: The bug is that ra is not saved. Since broken_func calls helper, the call instruction overwrites ra. When broken_func executes ret, it jumps to the instruction after call helper instead of returning to the original caller. It then re-executes the code after the call, moving sp up by 16 each time, and never returns.

Fix: save and restore ra:

.global broken_func

broken_func:
    addi sp, sp, -16
    sd   ra, 0(sp)       # Save ra
    sd   s0, 8(sp)       # Save s0

    mv   s0, a0
    call helper
    add  a0, a0, s0

    ld   s0, 8(sp)       # Restore s0
    ld   ra, 0(sp)       # Restore ra
    addi sp, sp, 16
    ret

Problem 5: Recursive Sum

Write a recursive function sum_s(n: i32) -> i32 that computes 1 + 2 + ... + n. Base case: sum(0) = 0. Recursive case: sum(n) = n + sum(n - 1).

Solution:

.global sum_s

# int sum_s(int n)
# a0 = n
# Returns: 1 + 2 + ... + n

sum_s:
    addi sp, sp, -16
    sd   ra, (sp)

    # Base case: if n <= 0, return 0
    bgt  a0, zero, sum_recstep
    li   a0, 0
    j    sum_done

sum_recstep:
    sd   a0, 8(sp)       # Save n on stack
    addi a0, a0, -1      # a0 = n - 1
    call sum_s            # a0 = sum(n - 1)
    ld   t0, 8(sp)       # Restore n
    add  a0, a0, t0      # a0 = sum(n-1) + n

sum_done:
    ld   ra, (sp)
    addi sp, sp, 16
    ret

This follows the same pattern as factrec_s: save ra and n, make the recursive call, then combine the result with the saved n. For sum_s(3): sum(3) = 3 + sum(2) = 3 + 2 + sum(1) = 3 + 2 + 1 + sum(0) = 3 + 2 + 1 + 0 = 6.

Summary

  1. Call/return uses call (saves PC+4 to ra, jumps to target) and ret (jumps to address in ra). Any function that calls another function must save ra before the call, or it will be overwritten.

  2. Register convention divides registers into caller-saved (a0–a7, t0–t6, ra) and callee-saved (s0–s11, sp). Caller-saved registers may be freely overwritten by called functions; callee-saved registers must be preserved.

  3. Stack frames follow the prologue/epilogue pattern: allocate space, save registers, execute the body, restore registers, deallocate, return. Stack space must be allocated in multiples of 16 bytes.

  4. Non-leaf functions can use either a caller-saved approach (saving a/t registers around each call) or a callee-saved approach (using s registers that survive calls). The callee-saved approach is generally cleaner when values must survive multiple calls.

  5. Recursive functions work because each call creates its own stack frame with its own saved ra and local values. The stack grows with each recursive call and shrinks as calls return, allowing each invocation to maintain independent state.