Project02 - RISC-V Assembly and NTLang Compiler

Due Thu Mar 26th by 11:59pm in your Project02 GitHub repo

Tests: https://github.com/USF-CS631-S26/tests

Background

The goal of this project is to learn more RISC-V Assembly, specifically working with strings and calling C library functions, and to extend your NTLang tool with register arguments and a compiler back end. The project has two parts:

  1. RISC-V Assembly Programs — You will implement six functions in RISC-V assembly (string reversal, byte packing/unpacking, bit sequence extraction). Each function has a Rust driver that calls both a Rust reference implementation and your assembly implementation, printing results labeled Rust: and Asm: for comparison.

  2. NTLang Compiler Mode — You will extend your NTLang interpreter with register parameter support (a0–a7) and a compile mode (-c) that generates RISC-V executables from NTLang expressions using stack-based code generation.

All programs target RISC-V 64-bit. On non-RISC-V hosts, you will use Docker and QEMU for cross-compilation and execution. A riscv-run script abstracts this so you can develop on any platform.

Requirements

  1. Implement six RISC-V assembly functions: rstr, rstr_rec, pack_bytes, unpack_bytes, get_bitseq, get_bitseq_signed.

  2. Each assembly function has a corresponding Rust implementation and a Rust driver program. The Rust drivers print two lines: Rust: and Asm: comparing both implementations.

  3. Extend NTLang to support register parameters a0–a7 in expressions, settable via -aX val command-line flags.

  4. Implement NTLang compile mode (-c <name>) that generates a RISC-V executable, and assembly-only mode (-c <name> -s) that generates a .s file.

  5. Your project uses Cargo with a build.rs build script. Assembly and C files are compiled via the cc crate. A Makefile and Dockerfile handle cross-compilation with Docker.

  6. All programs must produce output matching the automated tests exactly.

Section 1: RISC-V Assembly Programs

Each program below has two implementations:

  • Rust — A reference implementation in the Rust driver
  • Assembly — Your RISC-V assembly implementation (the _s suffix function)

The Rust driver calls both implementations and prints the results. Your assembly output on the Asm: line must match the Rust: line.

Driver Pattern

Each Rust driver follows this pattern using unsafe extern "C" FFI:

use std::env;
use std::process;

unsafe extern "C" {
    fn example_s(arg: u32) -> u32;
}

fn example(arg: u32) -> u32 {
    // Rust reference implementation
    arg
}

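fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("usage: {} <arg>", args[0]);
        process::exit(1);
    }
    // Parse the single argument; exit with a message if it is not a u32
    let arg: u32 = args[1].parse().unwrap_or_else(|_| {
        eprintln!("invalid argument: {}", args[1]);
        process::exit(1);
    });

    let rust_result = example(arg);
    println!("Rust: {}", rust_result);

    let s_result = unsafe { example_s(arg) };
    println!("Asm: {}", s_result);
}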

1.1 rstr — Iterative String Reverse

Reverse a string iteratively.

Rust driver (src/bin/rstr.rs):

unsafe extern "C" {
    fn rstr_s(dst: *mut u8, src: *const u8);
}

fn rstr(src: &str) -> String {
    let src_bytes = src.as_bytes();
    let src_len = src_bytes.len();
    let mut dst = vec![0u8; src_len];

    let mut j = src_len;
    for i in 0..src_len {
        j -= 1;
        dst[i] = src_bytes[j];
    }

    String::from_utf8(dst).unwrap()
}

Assembly function signature: rstr_s(dst: *mut u8, src: *const u8) — copies the reverse of the null-terminated string src into dst.

Example:

$ ./riscv-run rstr FooBar
Rust: raBooF
Asm: raBooF
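
Because rstr_s receives raw pointers and no length, the assembly has to locate the end of src on its own. The following is a rough Rust model of that pointer-level logic, not the required implementation. The name rstr_model is hypothetical, and the sketch assumes that dst must also be null-terminated and has room for the terminator, which the text above does not state explicitly.

```rust
/// Hypothetical model of rstr_s working on C-style byte buffers.
/// Assumption: dst has room for the reversed string plus a 0 terminator.
fn rstr_model(dst: &mut [u8], src: &[u8]) {
    // Pass 1: find the length by scanning for the null terminator.
    let mut len = 0;
    while src[len] != 0 {
        len += 1;
    }

    // Pass 2: copy bytes from the end of src to the start of dst.
    for i in 0..len {
        dst[i] = src[len - 1 - i];
    }

    // Assumed: terminate dst so it can be read back as a C string.
    dst[len] = 0;
}

fn main() {
    let src = b"FooBar\0";
    let mut dst = [0u8; 7];
    rstr_model(&mut dst, src);
    println!("{}", String::from_utf8_lossy(&dst[..6]));
}
```

In assembly, the two passes map naturally onto two loops using lb and sb with a pair of index or pointer registers.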

1.2 rstr_rec — Recursive String Reverse

Same as rstr but implemented recursively in assembly. The Rust driver is identical in structure.

Example:

$ ./riscv-run rstr_rec "CS631 Systems Foundations"
Rust: snoitadnuoF smetsyS 136SC
Asm: snoitadnuoF smetsyS 136SC
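
One possible way to structure the recursion is sketched here in Rust. The helper name, its extra index parameters, and the choice to terminate dst in the base case are all assumptions made for illustration; the assignment only fixes the outer rstr_rec_s behavior.

```rust
/// Hypothetical recursive helper: writes src[remaining - 1] into dst[dst_idx],
/// then recurses on the shorter prefix of src with the next dst slot.
fn rstr_rec_helper(dst: &mut [u8], dst_idx: usize, src: &[u8], remaining: usize) {
    // Base case: nothing left to copy, so terminate dst (assumed behavior).
    if remaining == 0 {
        dst[dst_idx] = 0;
        return;
    }
    // Copy the last unread byte of src to the next free slot of dst.
    dst[dst_idx] = src[remaining - 1];
    rstr_rec_helper(dst, dst_idx + 1, src, remaining - 1);
}

/// Hypothetical entry point: measures src, then starts the recursion.
fn rstr_rec_model(dst: &mut [u8], src: &[u8]) {
    // Length up to, but not including, the null terminator.
    let len = src.iter().position(|&b| b == 0).unwrap_or(src.len());
    rstr_rec_helper(dst, 0, src, len);
}

fn main() {
    let mut dst = [0u8; 6];
    rstr_rec_model(&mut dst, b"CS631\0");
    println!("{}", String::from_utf8_lossy(&dst[..5]));
}
```

In the assembly version, every recursive call must save ra on the stack before the nested call and restore it afterwards, since the call overwrites ra.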

1.3 pack_bytes — Pack Four Bytes

Combine four byte values (b3, b2, b1, b0) into a single 32-bit integer: (b3 << 24) | (b2 << 16) | (b1 << 8) | b0.

Rust driver (src/bin/pack_bytes.rs):

unsafe extern "C" {
    fn pack_bytes_s(b3: u32, b2: u32, b1: u32, b0: u32) -> i32;
}

fn pack_bytes(b3: u32, b2: u32, b1: u32, b0: u32) -> i32 {
    let mut val: u32 = b3;
    val = (val << 8) | b2;
    val = (val << 8) | b1;
    val = (val << 8) | b0;
    val as i32
}

Examples:

$ ./riscv-run pack_bytes 1 2 3 4
Rust: 16909060
Asm: 16909060

$ ./riscv-run pack_bytes 255 255 255 255
Rust: -1
Asm: -1
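
The second example prints -1 because the packed bit pattern 0xFFFFFFFF is reinterpreted as a signed 32-bit value. A small sketch of that arithmetic follows; pack_u32 is a hypothetical helper that uses the shift-and-or formula from the description and returns the unsigned pattern before the cast.

```rust
/// Hypothetical helper: packs four bytes and returns the raw u32 pattern.
fn pack_u32(b3: u32, b2: u32, b1: u32, b0: u32) -> u32 {
    (b3 << 24) | (b2 << 16) | (b1 << 8) | b0
}

fn main() {
    // 1 2 3 4 packs to 0x01020304, which is 16909060 in decimal.
    let small = pack_u32(1, 2, 3, 4);
    println!("{:#010X} = {}", small, small as i32);

    // 255 255 255 255 packs to 0xFFFFFFFF; as an i32 that pattern is -1.
    let all_ones = pack_u32(255, 255, 255, 255);
    println!("{:#010X} = {}", all_ones, all_ones as i32);
}
```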

1.4 unpack_bytes — Unpack to Four Bytes

Extract four byte values from a 32-bit integer, storing them in an array. Byte 0 is the least significant byte.

Rust driver (src/bin/unpack_bytes.rs):

unsafe extern "C" {
    fn unpack_bytes_s(val: i32, bytes: *mut u32);
}

fn unpack_bytes(val: i32, bytes: &mut [u32; 4]) {
    let mut v = val as u32;
    for i in 0..4 {
        bytes[i] = v & 0xFF;
        v >>= 8;
    }
}

The output prints bytes in order: bytes[3] bytes[2] bytes[1] bytes[0] (most significant first).

Examples:

$ ./riscv-run unpack_bytes 16909060
Rust: 1 2 3 4
Asm: 1 2 3 4

$ ./riscv-run unpack_bytes -2
Rust: 255 255 255 254
Asm: 255 255 255 254

1.5 get_bitseq — Extract Unsigned Bit Sequence

Extract bits from position start to end (inclusive) from a 32-bit number, returned as an unsigned value.

Rust driver (src/bin/get_bitseq.rs):

unsafe extern "C" {
    fn get_bitseq_s(num: u32, start: i32, end: i32) -> u32;
}

fn get_bitseq(num: u32, start: i32, end: i32) -> u32 {
    let len = (end - start) + 1;
    let val = num >> start;
    let mask = if len == 32 {
        0xFFFFFFFF
    } else {
        (1u32 << len) - 1
    };
    val & mask
}

Example:

$ ./riscv-run get_bitseq 94116 12 15
Rust: 6
Asm: 6

1.6 get_bitseq_signed — Extract Signed Bit Sequence

Same as get_bitseq but sign-extends the extracted bit sequence to 32 bits.

Rust driver (src/bin/get_bitseq_signed.rs):

unsafe extern "C" {
    fn get_bitseq_signed_s(num: u32, start: i32, end: i32) -> i32;
}

fn get_bitseq(num: u32, start: i32, end: i32) -> u32 {
    let len = (end - start) + 1;
    let val = num >> start;
    let mask = if len == 32 { 0xFFFFFFFF } else { (1u32 << len) - 1 };
    val & mask
}

fn get_bitseq_signed(num: u32, start: i32, end: i32) -> i32 {
    let val = get_bitseq(num, start, end);
    let len = (end - start) + 1;
    let shift_amt = 32 - len;
    let val = val << shift_amt;
    (val as i32) >> shift_amt
}

The sign extension technique: shift the extracted bits to the top of the 32-bit word, then arithmetic shift right to sign-extend.

Examples:

$ ./riscv-run get_bitseq_signed 94117 12 15
Rust: 6
Asm: 6

$ ./riscv-run get_bitseq_signed 94117 4 7
Rust: -6
Asm: -6
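
To make the shift-up, shift-down technique concrete, the sketch below traces the second example (94117, bits 4 to 7) and returns each intermediate value. The function name sign_extend_trace and its tuple return type are hypothetical and exist only for this illustration.

```rust
/// Hypothetical tracing helper: returns (extracted field, field shifted to
/// the top of the word, sign-extended result).
fn sign_extend_trace(num: u32, start: i32, end: i32) -> (u32, u32, i32) {
    let len = end - start + 1;
    // Same mask logic as get_bitseq, including the len == 32 special case.
    let mask = if len == 32 { 0xFFFF_FFFF } else { (1u32 << len) - 1 };
    let field = (num >> start) & mask;
    let shift = 32 - len;
    // Move the field so that its top bit lands in bit 31.
    let top = field << shift;
    // An arithmetic right shift on i32 copies bit 31 into the vacated bits.
    let signed = (top as i32) >> shift;
    (field, top, signed)
}

fn main() {
    // 94117 is 0x16FA5, so bits 4..=7 hold 0xA (binary 1010).
    let (field, top, signed) = sign_extend_trace(94117, 4, 7);
    println!("{:#X} -> {:#X} -> {}", field, top, signed);
}
```

In RISC-V assembly the same two steps correspond to a logical left shift followed by an arithmetic right shift.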

Section 2: NTLang Parameter Support

Register Parameters

NTLang expressions can now reference argument registers a0–a7 as operands. These represent the RISC-V argument registers and can be set from the command line using -aX val flags.

Scanner and Parser

No changes are needed to the scanner or parser from Project01. Register names a0–a7 are already recognized as ident tokens by the scanner and parsed as ordinary variable references (ParseNode::Var) by the parser. The operand rule is unchanged:

operand    ::= intlit | hexlit | binlit | ident
             | '-' operand | '~' operand
             | '(' expression ')'

Evaluator Changes

The evaluator resolves a0–a7 through normal variable lookup. Before evaluating an expression, eval_expression() (and eval_program() for file mode) pre-populates the env HashMap with register values from config.regs:

let mut env: HashMap<String, u32> = HashMap::new();
for i in 0..8 {
    env.insert(format!("a{}", i), config.regs[i]);
}

When the evaluator encounters a Var node named a0–a7, it finds the value in env just like any other variable. This is the only evaluator change needed for register support.

Command Line

$ ntlang -e "expression" -aX val

Where X is 0–7 and val is an integer value to assign to that register.

Examples

$ ./riscv-run ntlang -e "(a0 * a1) + a2" -a0 3 -a1 4 -a2 5
17

$ ./riscv-run ntlang -e "a0 + a1" -a0 10 -a1 20
30
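
The -aX val flags could be collected into the register array along the following lines. This is only a sketch: the function name parse_reg_flags, the error messages, and the decision to parse values as signed decimal and store the 32-bit pattern are assumptions, and your existing config code may be organized differently.

```rust
/// Hypothetical helper: scans args for "-aX val" pairs and fills regs[X].
fn parse_reg_flags(args: &[String]) -> Result<[u32; 8], String> {
    let mut regs = [0u32; 8];
    let mut i = 0;
    while i < args.len() {
        let arg = args[i].as_str();
        if arg == "-e" {
            // Skip the expression text so "-a0 + 1" is not mistaken for a flag.
            i += 2;
        } else if let Some(digits) = arg.strip_prefix("-a") {
            let reg: usize = digits
                .parse()
                .map_err(|_| format!("bad register flag: {}", arg))?;
            if reg > 7 {
                return Err(format!("register out of range: {}", arg));
            }
            let text = args
                .get(i + 1)
                .ok_or_else(|| format!("missing value for {}", arg))?;
            // Assumption: accept signed decimal and keep the low 32 bits.
            let val: i64 = text
                .parse()
                .map_err(|_| format!("bad value: {}", text))?;
            regs[reg] = val as u32;
            i += 2;
        } else {
            i += 1;
        }
    }
    Ok(regs)
}

fn main() {
    let args: Vec<String> = ["-e", "(a0 * a1) + a2", "-a0", "3", "-a1", "4", "-a2", "5"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", parse_reg_flags(&args));
}
```

Registers that are not set on the command line stay at 0 in this sketch.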

Section 3: NTLang Compile Mode

Overview

Compile mode generates a RISC-V executable from an NTLang expression. Instead of evaluating the expression, the compiler walks the parse tree and emits RISC-V assembly that computes the expression using a stack machine approach.

Command Line Options

| Option | Description |
| --- | --- |
| -c <name> | Compile expression to a RISC-V executable named <name> |
| -c <name> -s | Generate assembly only (produces <name>.s) |

How It Works

  1. The expression is scanned and parsed as usual.
  2. Instead of calling eval, the compiler calls compile::compile_tree() which walks the parse tree and emits RISC-V assembly.
  3. The generated assembly is combined with codegen_main.s (a provided preamble) to form a complete program.
  4. The preamble handles: parsing command-line arguments into a0–a7, calling the generated expression function, and printing the result.
  5. With -c <name>, the compiler invokes gcc to assemble and link the .s file into an executable, then removes the .s file.
  6. With -c <name> -s, only the .s file is produced.

The Preamble: codegen_main.s

To generate a complete working assembly program, the compiler emits the assembly code in two parts: a preamble and the generated expression code. One of your jobs for this project is to convert the given C version of the preamble (c/codegen_main.c) into assembly (asm/codegen_main.s). This will give you practice with a slightly more complicated assembly function that calls C library functions. The preamble does the following:

  1. Parses command-line arguments (converting strings to integers via atoi)
  2. Stores argument values into registers a0–a7
  3. Calls the generated expression function
  4. Prints the return value in decimal and hex: %d (0x%X)
  5. Returns
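
The five steps above can be modeled in Rust to check the expected output format before writing the assembly. Everything in this sketch is illustrative: expr_func stands in for the generated expression function, run_preamble and format_result are hypothetical names, and the provided c/codegen_main.c remains the authoritative reference.

```rust
/// Hypothetical stand-in for the generated function; computes (a0 + a1) * a2.
fn expr_func(a: [i32; 8]) -> i32 {
    a[0].wrapping_add(a[1]).wrapping_mul(a[2])
}

/// Mirrors the "%d (0x%X)" format: signed decimal, then uppercase hex
/// of the same 32-bit pattern.
fn format_result(v: i32) -> String {
    format!("{} (0x{:X})", v, v as u32)
}

/// Models the preamble: argv[1..] become a0, a1, ... (at most eight values).
fn run_preamble(argv: &[&str]) -> String {
    let mut regs = [0i32; 8];
    for (i, arg) in argv.iter().skip(1).take(8).enumerate() {
        // atoi yields 0 for text with no leading number; unwrap_or(0) only
        // roughly approximates that behavior.
        regs[i] = arg.parse().unwrap_or(0);
    }
    format_result(expr_func(regs))
}

fn main() {
    println!("{}", run_preamble(&["myprog", "3", "4", "7"]));
}
```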

Your compiler's compile_tree() function uses include_str!("../asm/codegen_main.s") to embed the preamble into the generated .s file, replacing the placeholder function name with the actual name from -c.

Stack Machine Code Generation

The code generator walks the parse tree depth-first and emits assembly instructions that use the stack to store intermediate values:

Push a constant (integer literal):

addi sp, sp, -4       # make room on stack
li t0, <val>          # load immediate value
sw t0, (sp)           # store to top of stack

Push a register (e.g., a0):

addi sp, sp, -4       # make room on stack
sw aX, (sp)           # store register to top of stack

Unary operator (e.g., negation):

lw t0, (sp)           # load operand from top of stack
<unary_op> t0, t0     # apply operation
sw t0, (sp)           # store result back

Binary operator (e.g., add):

lw t1, (sp)           # load right operand
addi sp, sp, 4        # pop right operand
lw t0, (sp)           # load left operand
<binary_op> t0, t0, t1  # apply operation
sw t0, (sp)           # store result (replaces left operand)

The function epilogue pops the final result into a0 and returns:

lw a0, (sp)
addi sp, sp, 4
ret
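
The stack discipline used by these templates can be simulated in a few lines of Rust, which may help when checking that every push has a matching pop. The Step enum and run function are hypothetical and model only a handful of operations; a Vec plays the role of the memory below sp.

```rust
/// Hypothetical model of the emitted instruction groups.
#[derive(Clone, Copy)]
enum Step {
    PushConst(u32),
    PushReg(usize),
    Neg,
    Add,
    Mul,
}

/// Runs the steps against a value stack and returns the final result,
/// mirroring the epilogue that pops the result into a0.
fn run(steps: &[Step], regs: &[u32; 8]) -> u32 {
    let mut stack: Vec<u32> = Vec::new();
    for step in steps {
        match *step {
            Step::PushConst(v) => stack.push(v),
            Step::PushReg(r) => stack.push(regs[r]),
            Step::Neg => {
                // Unary: operate on the top of the stack in place.
                let v = stack.pop().expect("missing operand");
                stack.push(v.wrapping_neg());
            }
            Step::Add | Step::Mul => {
                // Binary: the right operand is on top, the left one below it.
                let right = stack.pop().expect("missing right operand");
                let left = stack.pop().expect("missing left operand");
                stack.push(match *step {
                    Step::Add => left.wrapping_add(right),
                    _ => left.wrapping_mul(right),
                });
            }
        }
    }
    stack.pop().expect("empty stack at epilogue")
}

fn main() {
    // Post-order walk of (a0 + a1) * a2 with a0 = 3, a1 = 4, a2 = 7.
    let steps = [Step::PushReg(0), Step::PushReg(1), Step::Add, Step::PushReg(2), Step::Mul];
    println!("{}", run(&steps, &[3, 4, 7, 0, 0, 0, 0, 0]));
}
```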

Operator Instructions

| NTLang Operator | RISC-V Instruction | Description |
| --- | --- | --- |
| + | add t0, t0, t1 | Addition |
| - | sub t0, t0, t1 | Subtraction |
| * | mul t0, t0, t1 | Multiplication |
| / | div t0, t0, t1 | Signed division |
| & | and t0, t0, t1 | Bitwise AND |
| \| | or t0, t0, t1 | Bitwise OR |
| ^ | xor t0, t0, t1 | Bitwise XOR |
| << | sll t0, t0, t1 | Shift left logical |
| >> | srl t0, t0, t1 | Shift right logical |
| >- | sra t0, t0, t1 | Shift right arithmetic |
| - (unary) | sub t0, zero, t0 | Negation |
| ~ (unary) | xor t0, t1, t0 (after li t1, -1) | Bitwise NOT |
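
One way to encode the binary rows of this table in the compiler is a lookup from operator text to mnemonic. The function names binary_mnemonic and emit_binary are hypothetical, and your parse tree may represent operators as an enum rather than as strings.

```rust
/// Returns the RISC-V mnemonic for an NTLang binary operator, per the table.
fn binary_mnemonic(op: &str) -> Option<&'static str> {
    match op {
        "+" => Some("add"),
        "-" => Some("sub"),
        "*" => Some("mul"),
        "/" => Some("div"),
        "&" => Some("and"),
        "|" => Some("or"),
        "^" => Some("xor"),
        "<<" => Some("sll"),
        ">>" => Some("srl"),
        ">-" => Some("sra"),
        _ => None,
    }
}

/// Formats the instruction applied to the two popped operands in t0 and t1.
fn emit_binary(op: &str) -> Option<String> {
    binary_mnemonic(op).map(|m| format!("{} t0, t0, t1", m))
}

fn main() {
    for op in ["+", ">-", "%"] {
        match emit_binary(op) {
            Some(line) => println!("{:>2}  {}", op, line),
            None => println!("{:>2}  unsupported operator", op),
        }
    }
}
```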

The compile.rs Module

Your compile.rs module implements two functions:

  • compile_tree_expr(node, output) — Recursively walks the parse tree and appends assembly instructions to the output string. For Var nodes, it checks if the name matches a register pattern (a0–a7) and emits sw aX, (sp) accordingly, producing an error for unsupported variable names.
  • compile_tree(node, name) — Entry point that includes the preamble, emits the function label, calls compile_tree_expr, and appends the epilogue.
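
A minimal sketch of the recursive walk is given here, under several assumptions: the Node enum is a simplified stand-in for your real ParseNode type, only two binary operators are modeled, and errors are reported as plain strings. The function is named compile_expr_sketch to make clear it is not the required compile_tree_expr.

```rust
/// Simplified, hypothetical parse tree; the real ParseNode has more variants.
enum Node {
    IntLit(u32),
    Var(String),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

/// Appends stack-machine assembly for `node` to `out` (post-order walk).
fn compile_expr_sketch(node: &Node, out: &mut String) -> Result<(), String> {
    match node {
        Node::IntLit(v) => {
            out.push_str("    addi sp, sp, -4\n");
            out.push_str(&format!("    li t0, {}\n", v));
            out.push_str("    sw t0, (sp)\n");
        }
        Node::Var(name) => {
            // Only a0 through a7 are accepted in compile mode.
            let b = name.as_bytes();
            if !(b.len() == 2 && b[0] == b'a' && (b'0'..=b'7').contains(&b[1])) {
                return Err(format!("unsupported variable: {}", name));
            }
            out.push_str("    addi sp, sp, -4\n");
            out.push_str(&format!("    sw {}, (sp)\n", name));
        }
        Node::Add(left, right) | Node::Mul(left, right) => {
            // Left operand first, so the right operand ends up on top.
            compile_expr_sketch(left, out)?;
            compile_expr_sketch(right, out)?;
            let op = if matches!(node, Node::Add(..)) { "add" } else { "mul" };
            out.push_str("    lw t1, (sp)\n    addi sp, sp, 4\n    lw t0, (sp)\n");
            out.push_str(&format!("    {} t0, t0, t1\n", op));
            out.push_str("    sw t0, (sp)\n");
        }
    }
    Ok(())
}

fn main() {
    // (a0 + a1) * a2
    let tree = Node::Mul(
        Box::new(Node::Add(
            Box::new(Node::Var("a0".to_string())),
            Box::new(Node::Var("a1".to_string())),
        )),
        Box::new(Node::Var("a2".to_string())),
    );
    let mut out = String::new();
    match compile_expr_sketch(&tree, &mut out) {
        Ok(()) => print!("{}", out),
        Err(e) => eprintln!("{}", e),
    }
}
```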

The find_gcc() Function

When linking the generated assembly into an executable, ntlang needs to find the right C compiler:

fn find_gcc() -> String {
    if Command::new("riscv64-linux-gnu-gcc").arg("--version")
        .stdout(process::Stdio::null()).stderr(process::Stdio::null())
        .status().is_ok()
    {
        "riscv64-linux-gnu-gcc".to_string()
    } else {
        "gcc".to_string()
    }
}

Inside the Docker container, riscv64-linux-gnu-gcc is available. On a native RISC-V machine, plain gcc works.

Example: Compiling (a0 + a1) * a2

Generate the executable:

$ ./riscv-run ntlang -e "(a0 + a1) * a2" -c myprog

Run it with arguments (the preamble maps command-line args to a0, a1, a2, ...):

$ ./riscv-run ./myprog 3 4 7
49 (0x31)

Generated assembly (with -s):

    # Push a0
    addi sp, sp, -4
    sw a0, (sp)

    # Push a1
    addi sp, sp, -4
    sw a1, (sp)

    # Binary add
    lw t1, (sp)
    addi sp, sp, 4
    lw t0, (sp)
    add t0, t0, t1
    sw t0, (sp)

    # Push a2
    addi sp, sp, -4
    sw a2, (sp)

    # Binary mul
    lw t1, (sp)
    addi sp, sp, 4
    lw t0, (sp)
    mul t0, t0, t1
    sw t0, (sp)

    # Pop result into a0 and return
    lw a0, (sp)
    addi sp, sp, 4
    ret

Test Examples

# Compile with literal values
$ ./riscv-run ntlang -e '(3 * 4) + 5' -c p1
$ ./riscv-run ./p1
17 (0x11)

# Compile with register parameters
$ ./riscv-run ntlang -e '(a0 * a1) + a2' -c p2
$ ./riscv-run ./p2 3 4 5
17 (0x11)

# Complex expression
$ ./riscv-run ntlang -e "(((((~((-(2 * ((1023 + 1) / 4)) >- 2) << 8)) >> 10) ^ 0b01110) & 0x1E) | ~(0b10000))" -c p3
$ ./riscv-run ./p3
-1 (0xFFFFFFFF)
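
The result of the complex expression can be checked by hand. The sketch below evaluates it step by step with 32-bit arithmetic in Rust, treating >- as an arithmetic shift and >> as a logical shift; the function name eval_p3 is hypothetical.

```rust
/// Evaluates the p3 expression one operator at a time (hypothetical helper).
fn eval_p3() -> u32 {
    let quotient = (1023u32 + 1) / 4;          // 256
    let doubled = 2 * quotient;                // 512
    let negated = doubled.wrapping_neg();      // 0xFFFFFE00 (-512)
    let sra = ((negated as i32) >> 2) as u32;  // 0xFFFFFF80 (-128), arithmetic shift
    let shifted = sra << 8;                    // 0xFFFF8000
    let inverted = !shifted;                   // 0x00007FFF
    let srl = inverted >> 10;                  // 0x0000001F, logical shift
    let xored = srl ^ 0b01110;                 // 0x00000011
    let masked = xored & 0x1E;                 // 0x00000010
    masked | !0b10000u32                       // 0x10 | 0xFFFFFFEF = 0xFFFFFFFF
}

fn main() {
    let v = eval_p3();
    println!("{} (0x{:X})", v as i32, v);
}
```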

Section 4: Build System

Project Structure

project02/
├── Cargo.toml
├── build.rs
├── Makefile
├── Dockerfile
├── riscv-run
├── .cargo/
│   └── config.toml
├── asm/
│   ├── codegen_main.s
│   ├── rstr_s.s
│   ├── rstr_rec_s.s
│   ├── pack_bytes_s.s
│   ├── unpack_bytes_s.s
│   ├── get_bitseq_s.s
│   └── get_bitseq_signed_s.s
├── c/
│   └── codegen_main.c
└── src/
    ├── lib.rs
    ├── scan.rs
    ├── parse.rs
    ├── eval.rs
    ├── conv.rs
    ├── config.rs
    ├── compile.rs
    └── bin/
        ├── ntlang.rs
        ├── rstr.rs
        ├── rstr_rec.rs
        ├── pack_bytes.rs
        ├── unpack_bytes.rs
        ├── get_bitseq.rs
        └── get_bitseq_signed.rs

Cargo.toml

Your Cargo.toml defines seven binary targets and uses the cc crate for building assembly and C files:

[package]
name = "project02"
version = "0.1.0"
edition = "2024"

[lib]
name = "project02"
path = "src/lib.rs"

[build-dependencies]
cc = "1"

[[bin]]
name = "get_bitseq"
path = "src/bin/get_bitseq.rs"

[[bin]]
name = "get_bitseq_signed"
path = "src/bin/get_bitseq_signed.rs"

[[bin]]
name = "pack_bytes"
path = "src/bin/pack_bytes.rs"

[[bin]]
name = "unpack_bytes"
path = "src/bin/unpack_bytes.rs"

[[bin]]
name = "rstr"
path = "src/bin/rstr.rs"

[[bin]]
name = "rstr_rec"
path = "src/bin/rstr_rec.rs"

[[bin]]
name = "ntlang"
path = "src/bin/ntlang.rs"

build.rs

The build.rs script conditionally compiles assembly and C files only when targeting RISC-V:

fn main() {
    let target = std::env::var("TARGET").unwrap_or_default();

    if target.contains("riscv") {
        cc::Build::new()
            .file("asm/get_bitseq_s.s")
            .file("asm/get_bitseq_signed_s.s")
            .file("asm/pack_bytes_s.s")
            .file("asm/unpack_bytes_s.s")
            .file("asm/rstr_s.s")
            .file("asm/rstr_rec_s.s")
            .compile("asm_functions");

        cc::Build::new()
            .file("c/get_bitseq_c.c")
            .file("c/get_bitseq_signed_c.c")
            .file("c/pack_bytes_c.c")
            .file("c/unpack_bytes_c.c")
            .file("c/rstr_c.c")
            .file("c/rstr_rec_c.c")
            .compile("c_functions");

        println!("cargo:rustc-link-arg-bins=-lasm_functions");
        println!("cargo:rustc-link-arg-bins=-lc_functions");
    }

    println!("cargo:rerun-if-changed=asm/");
    println!("cargo:rerun-if-changed=c/");
}

The target.contains("riscv") check allows the Rust library code (scanner, parser, etc.) to compile on your host machine for development, while assembly and C files are only compiled when cross-compiling for RISC-V.

.cargo/config.toml

Configure the RISC-V cross-compilation toolchain:

[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc"
runner = "qemu-riscv64 -L /usr/riscv64-linux-gnu"

This tells Cargo to use the cross-compiler as the linker and QEMU as the runner when targeting RISC-V.

Makefile

The Makefile auto-detects whether you're on a RISC-V machine or need Docker:

UNAME_M := $(shell uname -m)
BINS := get_bitseq get_bitseq_signed pack_bytes unpack_bytes rstr rstr_rec ntlang
IMAGE_NAME := project02-riscv

ifeq ($(UNAME_M),riscv64)
  CARGO_CMD = cargo
  CARGO_FLAGS =
else
  CARGO_CMD = docker run --rm -v $(CURDIR):/project $(IMAGE_NAME) cargo
  CARGO_FLAGS = --target riscv64gc-unknown-linux-gnu
endif

build: docker-image-ensure
    $(CARGO_CMD) build $(CARGO_FLAGS)

run-%: docker-image-ensure
    $(CARGO_CMD) run $(CARGO_FLAGS) --bin $* -- $(ARGS)

clean:
    cargo clean

docker-build:
    docker build -t $(IMAGE_NAME) .

docker-shell:
    docker run --rm -it -v $(CURDIR):/project $(IMAGE_NAME) bash

Key targets:

  • make build — build all binaries
  • make run-rstr ARGS="FooBar" — run a specific binary with arguments
  • make docker-build — rebuild the Docker image
  • make docker-shell — open a shell inside the Docker container

Dockerfile

FROM ubuntu:24.04

RUN apt-get update && apt-get install -y \
    gcc-riscv64-linux-gnu qemu-user curl build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
ENV PATH="/root/.cargo/bin:${PATH}"
RUN rustup target add riscv64gc-unknown-linux-gnu

WORKDIR /project

The Docker image provides: the RISC-V cross-compiler (riscv64-linux-gnu-gcc), QEMU for running RISC-V binaries, and Rust with the RISC-V target.

riscv-run Script

The riscv-run script is a convenience wrapper that handles Docker transparently:

#!/usr/bin/env bash
set -euo pipefail

BIN="${1:?usage: riscv-run <binary|executable> [args...]}"
shift

PROJECT_DIR="$(cd "$(dirname "$0")" && pwd)"
IMAGE="project02-riscv"

ensure_image() {
    if ! docker image inspect "$IMAGE" &>/dev/null; then
        echo "Building Docker image '$IMAGE'..." >&2
        docker build -t "$IMAGE" "$PROJECT_DIR"
    fi
}

if [ -f "$BIN" ]; then
    # Direct executable (e.g. ./prog compiled by ntlang -c)
    if [ "$(uname -m)" = "riscv64" ]; then
        exec "$BIN" "$@"
    else
        ensure_image
        exec docker run --rm -v "$PROJECT_DIR":/project -w /project "$IMAGE" \
            qemu-riscv64 -L /usr/riscv64-linux-gnu "$BIN" "$@"
    fi
else
    # Cargo binary
    if [ "$(uname -m)" = "riscv64" ]; then
        exec cargo run --quiet --manifest-path "$PROJECT_DIR/Cargo.toml" --bin "$BIN" -- "$@"
    else
        ensure_image
        exec docker run --rm -v "$PROJECT_DIR":/project \
            -e CC_riscv64gc_unknown_linux_gnu=riscv64-linux-gnu-gcc \
            "$IMAGE" \
            cargo run --quiet --target riscv64gc-unknown-linux-gnu --bin "$BIN" -- "$@"
    fi
fi

Usage:

# Run a Cargo binary (builds if needed)
$ ./riscv-run rstr FooBar

# Run a compiled executable
$ ./riscv-run ./myprog 3 4 5

On a native RISC-V machine, riscv-run runs the binary directly. On other platforms, it uses Docker with QEMU.

Running Natively on RISC-V

If you are on a RISC-V machine (e.g., the class server), you can build and run directly with Cargo:

$ cargo build
$ cargo run --bin rstr -- FooBar
$ cargo run --bin ntlang -- -e "(a0 + a1) * a2" -a0 3 -a1 4 -a2 5

Cross-Compilation with Docker

On non-RISC-V hosts (macOS, x86 Linux), use the riscv-run script or the Makefile, which handle Docker automatically. Make sure you have Docker installed and running (Docker Desktop, Colima, or OrbStack all work).

The first build takes a few minutes to create the Docker image. Subsequent builds are fast because Docker caches the image layers.

Grading

Tests: https://github.com/USF-CS631-S26/tests

Grading is based on automated tests (100 points total):

| Tests | Points | Description |
| --- | --- | --- |
| rstr_1, rstr_2 | 10 | String reverse (iterative) (5 pts each) |
| rstr_rec_1, rstr_rec_2 | 10 | String reverse (recursive) (5 pts each) |
| pack_bytes_1, pack_bytes_2 | 10 | Byte packing (5 pts each) |
| unpack_bytes_1, unpack_bytes_2 | 10 | Byte unpacking (5 pts each) |
| get_bitseq_1, get_bitseq_2 | 2 | Unsigned bit extraction (1 pt each) |
| get_bitseq_signed_1, get_bitseq_signed_2 | 8 | Signed bit extraction (4 pts each) |
| ntlang_args | 10 | NTLang register parameter eval |
| ntlang_comp_p1 | 10 | Compile mode with literals |
| ntlang_comp_p2 | 10 | Compile mode with register params |
| ntlang_comp_p3 | 10 | Compile mode with complex expression |
| ntlang_comp_p4 | 10 | Compile mode with complex expression and args |
| Total | 100 | |

Code Quality

Code quality deductions may be applied and can be earned back. We are looking for:

  • Consistent spacing and indentation
  • Consistent naming and commenting
  • No commented-out ("dead") code
  • No redundant or overly complicated code
  • A clean repo: no build products, extra files, etc.