Project02 - RISC-V Assembly and NTLang Compiler¶
Due Thu Mar 26th by 11:59pm in your Project02 GitHub repo
Links¶
Tests: https://github.com/USF-CS631-S26/tests
Background¶
The goal of this project is to learn more RISC-V Assembly, specifically working with strings and calling C library functions, and to extend your NTLang tool with arguments and a compiler back-end. The project has two parts:
- RISC-V Assembly Programs: You will implement six functions in RISC-V assembly (string reversal, byte packing/unpacking, bit sequence extraction). Each function has a Rust driver that calls both a Rust reference implementation and your assembly implementation, printing results labeled Rust: and Asm: for comparison.
- NTLang Compiler Mode: You will extend your NTLang interpreter with register parameter support (a0–a7) and a compile mode (-c) that generates RISC-V executables from NTLang expressions using stack-based code generation.
All programs target RISC-V 64-bit. On non-RISC-V hosts, you will use Docker and QEMU for cross-compilation and execution. A riscv-run script abstracts this so you can develop on any platform.
Requirements¶
- Implement six RISC-V assembly functions: rstr, rstr_rec, pack_bytes, unpack_bytes, get_bitseq, get_bitseq_signed.
- Each assembly function has a corresponding Rust implementation and a Rust driver program. The Rust drivers print two lines, Rust: and Asm:, comparing both implementations.
- Extend NTLang to support register parameters a0–a7 in expressions, settable via -aX val command-line flags.
- Implement NTLang compile mode (-c <name>) that generates a RISC-V executable, and assembly-only mode (-c <name> -s) that generates a .s file.
- Your project uses Cargo with a build.rs build script. Assembly and C files are compiled via the cc crate. A Makefile and Dockerfile handle cross-compilation with Docker.
- All programs must produce output matching the automated tests exactly.
Section 1: RISC-V Assembly Programs¶
Each program below has two implementations:
- Rust: A reference implementation in the Rust driver
- Assembly: Your RISC-V assembly implementation (the _s suffix function)
The Rust driver calls both implementations and prints the results. Your assembly output on the Asm: line must match the Rust: line.
Driver Pattern¶
Each Rust driver follows this pattern using unsafe extern "C" FFI:
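use std::env;
use std::process;

unsafe extern "C" {
    fn example_s(arg: u32) -> u32;
}

fn example(arg: u32) -> u32 {
    // Rust reference implementation
    arg
}

fn main() {
    let args: Vec<String> = env::args().collect();
    // Parse the single argument, exiting with a usage message on error
    let arg: u32 = match args.get(1).and_then(|s| s.parse().ok()) {
        Some(v) => v,
        None => {
            eprintln!("usage: example <arg>");
            process::exit(1);
        }
    };

    let rust_result = example(arg);
    println!("Rust: {}", rust_result);

    let s_result = unsafe { example_s(arg) };
    println!("Asm: {}", s_result);
}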
1.1 rstr — Iterative String Reverse¶
Reverse a string iteratively.
Rust driver (src/bin/rstr.rs):
unsafe extern "C" {
    fn rstr_s(dst: *mut u8, src: *const u8);
}

fn rstr(src: &str) -> String {
    let src_bytes = src.as_bytes();
    let src_len = src_bytes.len();
    let mut dst = vec![0u8; src_len];
    let mut j = src_len;
    for i in 0..src_len {
        j -= 1;
        dst[i] = src_bytes[j];
    }
    String::from_utf8(dst).unwrap()
}
Assembly function signature: rstr_s(dst: *mut u8, src: *const u8) — copies the reverse of the null-terminated string src into dst.
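To make the pointer contract concrete, here is a minimal pure-Rust model of what rstr_s is expected to do with its two pointers. The names rstr_model and reverse_via_model are hypothetical, and the sketch assumes ASCII input, that the assembly writes a terminating NUL byte, and that the driver allocates one extra byte for it; check the provided driver for the actual buffer handling.

```rust
use std::ffi::{CStr, CString};

/// Pure-Rust stand-in for `rstr_s` (hypothetical name): reads `src` up to its
/// NUL terminator and writes the reversed bytes plus a NUL into `dst`.
///
/// # Safety
/// `src` must point to a NUL-terminated string and `dst` must have room for
/// the string length plus one byte.
unsafe fn rstr_model(dst: *mut u8, src: *const u8) {
    unsafe {
        // Find the length by scanning for the NUL terminator.
        let mut len = 0usize;
        while *src.add(len) != 0 {
            len += 1;
        }
        // Copy bytes from the end of src to the start of dst.
        for i in 0..len {
            *dst.add(i) = *src.add(len - 1 - i);
        }
        // Terminate the destination string (an assumption of this sketch).
        *dst.add(len) = 0;
    }
}

/// Hypothetical wrapper showing how a driver might prepare the buffers.
fn reverse_via_model(input: &str) -> String {
    let src = CString::new(input).expect("input must not contain NUL bytes");
    let mut dst = vec![0u8; input.len() + 1];
    unsafe { rstr_model(dst.as_mut_ptr(), src.as_ptr() as *const u8) };
    CStr::from_bytes_until_nul(&dst)
        .unwrap()
        .to_str()
        .unwrap()
        .to_string()
}

fn main() {
    println!("Asm: {}", reverse_via_model("FooBar"));
}
```

The two loops map directly onto an assembly implementation: one pass to find the length, one pass to copy bytes in reverse.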
Example:
$ ./riscv-run rstr FooBar
Rust: raBooF
Asm: raBooF
1.2 rstr_rec — Recursive String Reverse¶
Same as rstr but implemented recursively in assembly. The Rust driver is identical in structure.
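One possible shape for the recursion, sketched in Rust so it can be run on any host. The helper name rstr_rec_helper and its index parameter are assumptions for illustration; the assembly version may carry different state in its argument registers. Like the reference implementation, it assumes ASCII input.

```rust
/// Recursive helper (hypothetical): fills dst[i] from the mirrored position
/// in src, then recurses on the next index.
fn rstr_rec_helper(dst: &mut [u8], src: &[u8], i: usize) {
    // Base case: every byte has been copied.
    if i == src.len() {
        return;
    }
    dst[i] = src[src.len() - 1 - i];
    rstr_rec_helper(dst, src, i + 1);
}

fn rstr_rec(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut dst = vec![0u8; bytes.len()];
    rstr_rec_helper(&mut dst, bytes, 0);
    String::from_utf8(dst).unwrap()
}

fn main() {
    println!("Rust: {}", rstr_rec("CS631 Systems Foundations"));
}
```

In assembly, each recursive call needs to save ra (and any live registers) on the stack before the call and restore them afterwards.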
Example:
$ ./riscv-run rstr_rec "CS631 Systems Foundations"
Rust: snoitadnuoF smetsyS 136SC
Asm: snoitadnuoF smetsyS 136SC
1.3 pack_bytes — Pack Four Bytes¶
Combine four byte values (b3, b2, b1, b0) into a single 32-bit integer: (b3 << 24) | (b2 << 16) | (b1 << 8) | b0.
Rust driver (src/bin/pack_bytes.rs):
unsafe extern "C" {
    fn pack_bytes_s(b3: u32, b2: u32, b1: u32, b0: u32) -> i32;
}

fn pack_bytes(b3: u32, b2: u32, b1: u32, b0: u32) -> i32 {
    let mut val: u32 = b3;
    val = (val << 8) | b2;
    val = (val << 8) | b1;
    val = (val << 8) | b0;
    val as i32
}
Examples:
$ ./riscv-run pack_bytes 1 2 3 4
Rust: 16909060
Asm: 16909060
$ ./riscv-run pack_bytes 255 255 255 255
Rust: -1
Asm: -1
1.4 unpack_bytes — Unpack to Four Bytes¶
Extract four byte values from a 32-bit integer, storing them in an array. Byte 0 is the least significant byte.
Rust driver (src/bin/unpack_bytes.rs):
unsafe extern "C" {
    fn unpack_bytes_s(val: i32, bytes: *mut u32);
}

fn unpack_bytes(val: i32, bytes: &mut [u32; 4]) {
    let mut v = val as u32;
    for i in 0..4 {
        bytes[i] = v & 0xFF;
        v >>= 8;
    }
}
The output prints bytes in order: bytes[3] bytes[2] bytes[1] bytes[0] (most significant first).
Examples:
$ ./riscv-run unpack_bytes 16909060
Rust: 1 2 3 4
Asm: 1 2 3 4
$ ./riscv-run unpack_bytes -2
Rust: 255 255 255 254
Asm: 255 255 255 254
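Packing and unpacking are inverses, which gives a quick sanity check. The sketch below restates both operations in a compact form (unpack_bytes returns an array here instead of writing through a pointer, which is a simplification for illustration) and reproduces the example values above.

```rust
/// Combines four byte values into one 32-bit integer, b3 most significant.
fn pack_bytes(b3: u32, b2: u32, b1: u32, b0: u32) -> i32 {
    ((b3 << 24) | (b2 << 16) | (b1 << 8) | b0) as i32
}

/// Splits a 32-bit integer into four bytes; index 0 is least significant.
fn unpack_bytes(val: i32) -> [u32; 4] {
    let v = val as u32;
    [v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF]
}

fn main() {
    let packed = pack_bytes(1, 2, 3, 4);
    let bytes = unpack_bytes(packed);
    // Print most significant byte first, as the drivers do.
    println!("{} -> {} {} {} {}", packed, bytes[3], bytes[2], bytes[1], bytes[0]);
}
```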
1.5 get_bitseq — Extract Unsigned Bit Sequence¶
Extract bits from position start to end (inclusive) from a 32-bit number, returned as an unsigned value.
Rust driver (src/bin/get_bitseq.rs):
unsafe extern "C" {
    fn get_bitseq_s(num: u32, start: i32, end: i32) -> u32;
}

fn get_bitseq(num: u32, start: i32, end: i32) -> u32 {
    let len = (end - start) + 1;
    let val = num >> start;
    let mask = if len == 32 {
        0xFFFFFFFF
    } else {
        (1u32 << len) - 1
    };
    val & mask
}
Example:
$ ./riscv-run get_bitseq 94117 12 15
Rust: 6
Asm: 6
1.6 get_bitseq_signed — Extract Signed Bit Sequence¶
Same as get_bitseq but sign-extends the extracted bit sequence to 32 bits.
Rust driver (src/bin/get_bitseq_signed.rs):
unsafe extern "C" {
    fn get_bitseq_signed_s(num: u32, start: i32, end: i32) -> i32;
}

fn get_bitseq(num: u32, start: i32, end: i32) -> u32 {
    let len = (end - start) + 1;
    let val = num >> start;
    let mask = if len == 32 { 0xFFFFFFFF } else { (1u32 << len) - 1 };
    val & mask
}

fn get_bitseq_signed(num: u32, start: i32, end: i32) -> i32 {
    let val = get_bitseq(num, start, end);
    let len = (end - start) + 1;
    let shift_amt = 32 - len;
    let val = val << shift_amt;
    ((val as i32) >> shift_amt) as i32
}
The sign extension technique: shift the extracted bits to the top of the 32-bit word, then arithmetic shift right to sign-extend.
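A worked instance of the shift-up, shift-down trick, using the value 94117 (0x16FA5) from this section. The helper name sign_extend is hypothetical, and the sketch assumes a field length between 1 and 32 bits.

```rust
/// Sign-extends the low `len` bits of `val` to a full 32-bit signed value.
/// Assumes 1 <= len <= 32.
fn sign_extend(val: u32, len: u32) -> i32 {
    let shift_amt = 32 - len;
    // Move the field's top bit into bit 31, then shift back arithmetically
    // so that bit is copied into all the upper positions.
    ((val << shift_amt) as i32) >> shift_amt
}

fn main() {
    // 94117 is 0x16FA5; bits 4..=7 hold 0xA (binary 1010).
    let field = (94117u32 >> 4) & 0xF;
    println!("unsigned: {}", field);
    println!("signed:   {}", sign_extend(field, 4));
}
```

In assembly the same effect comes from a logical shift left followed by an arithmetic shift right by the same amount.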
Examples:
$ ./riscv-run get_bitseq_signed 94117 12 15
Rust: 6
Asm: 6
$ ./riscv-run get_bitseq_signed 94117 4 7
Rust: -6
Asm: -6
Section 2: NTLang Parameter Support¶
Register Parameters¶
NTLang expressions can now reference argument registers a0–a7 as operands. These represent the RISC-V argument registers and can be set from the command line using -aX val flags.
Scanner and Parser¶
No changes are needed to the scanner or parser from Project01. Register names a0–a7 are already recognized as ident tokens by the scanner and parsed as ordinary variable references (ParseNode::Var) by the parser. The operand rule is unchanged:
Evaluator Changes¶
The evaluator resolves a0–a7 through normal variable lookup. Before evaluating an expression, eval_expression() (and eval_program() for file mode) pre-populates the env HashMap with register values from config.regs:
let mut env: HashMap<String, u32> = HashMap::new();
for i in 0..8 {
    env.insert(format!("a{}", i), config.regs[i]);
}
When the evaluator encounters a Var node named a0–a7, it finds the value in env just like any other variable. This is the only evaluator change needed for register support.
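The lookup path can be sketched end to end with trimmed-down stand-ins for the project's types. Config, ParseNode, build_env, and eval below are hypothetical simplifications, not the Project01 definitions; only the pre-population loop mirrors the snippet above.

```rust
use std::collections::HashMap;

// Hypothetical, trimmed-down stand-ins for the project's own types.
struct Config {
    regs: [u32; 8],
}

enum ParseNode {
    IntVal(u32),
    Var(String),
    Add(Box<ParseNode>, Box<ParseNode>),
}

/// Pre-populates the environment with a0..a7 from the config.
fn build_env(config: &Config) -> HashMap<String, u32> {
    let mut env = HashMap::new();
    for i in 0..8 {
        env.insert(format!("a{}", i), config.regs[i]);
    }
    env
}

/// Evaluates a node; register names resolve like any other variable.
fn eval(node: &ParseNode, env: &HashMap<String, u32>) -> Result<u32, String> {
    match node {
        ParseNode::IntVal(v) => Ok(*v),
        ParseNode::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("undefined variable: {}", name)),
        ParseNode::Add(l, r) => Ok(eval(l, env)?.wrapping_add(eval(r, env)?)),
    }
}

fn main() {
    let config = Config { regs: [10, 20, 0, 0, 0, 0, 0, 0] };
    let env = build_env(&config);
    // (a0 + a1) + 1
    let expr = ParseNode::Add(
        Box::new(ParseNode::Add(
            Box::new(ParseNode::Var("a0".to_string())),
            Box::new(ParseNode::Var("a1".to_string())),
        )),
        Box::new(ParseNode::IntVal(1)),
    );
    println!("{:?}", eval(&expr, &env));
}
```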
Command Line¶
Registers are set with -aX val flags, where X is 0–7 and val is an integer value to assign to that register.
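One way the -aX val flags might be collected into the register array is sketched below. The function names parse_reg_flags and reg_index are hypothetical, and the sketch assumes values are decimal and that negative values wrap to their 32-bit two's-complement form; your existing argument parser may already handle this differently.

```rust
/// Returns the register index for a flag of the form -a0 .. -a7.
fn reg_index(flag: &str) -> Option<usize> {
    let digits = flag.strip_prefix("-a")?;
    match digits.parse::<usize>() {
        Ok(n) if digits.len() == 1 && n < 8 => Some(n),
        _ => None,
    }
}

/// Scans the argument list for `-aX val` pairs and fills the register array.
/// Other arguments (such as -e and its expression) are skipped.
fn parse_reg_flags(args: &[String]) -> Result<[u32; 8], String> {
    let mut regs = [0u32; 8];
    let mut i = 0;
    while i < args.len() {
        if let Some(idx) = reg_index(&args[i]) {
            let val = args
                .get(i + 1)
                .ok_or_else(|| format!("{} requires a value", args[i]))?;
            // Parse as i64 first so negative inputs wrap to u32 (assumption).
            let parsed: i64 = val
                .parse()
                .map_err(|_| format!("invalid value for {}: {}", args[i], val))?;
            regs[idx] = parsed as u32;
            i += 2;
        } else {
            i += 1;
        }
    }
    Ok(regs)
}

fn main() {
    let args: Vec<String> = ["-e", "a0 + a1", "-a0", "10", "-a1", "20"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", parse_reg_flags(&args));
}
```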
Examples¶
$ ./riscv-run ntlang -e "(a0 * a1) + a2" -a0 3 -a1 4 -a2 5
17
$ ./riscv-run ntlang -e "a0 + a1" -a0 10 -a1 20
30
Section 3: NTLang Compile Mode¶
Overview¶
Compile mode generates a RISC-V executable from an NTLang expression. Instead of evaluating the expression, the compiler walks the parse tree and emits RISC-V assembly that computes the expression using a stack machine approach.
Command Line Options¶
| Option | Description |
|---|---|
| -c <name> | Compile expression to a RISC-V executable named <name> |
| -c <name> -s | Generate assembly only (produces <name>.s) |
How It Works¶
- The expression is scanned and parsed as usual.
- Instead of calling eval, the compiler calls compile::compile_tree() which walks the parse tree and emits RISC-V assembly.
- The generated assembly is combined with codegen_main.s (a provided preamble) to form a complete program.
- The preamble handles: parsing command-line arguments into a0–a7, calling the generated expression function, and printing the result.
- With -c <name>, the compiler invokes gcc to assemble and link the .s file into an executable, then removes the .s file.
- With -c <name> -s, only the .s file is produced.
The Preamble: codegen_main.s¶
To generate a complete working assembly program, the compiler emits the assembly code in two parts: a preamble and the generated expression code. One of your jobs for this project is to convert the given C version of the preamble (c/codegen_main.c) into assembly (asm/codegen_main.s). This will give you practice with a slightly more complicated assembly function that calls C library functions. The preamble does the following:
- Parses command-line arguments (converting strings to integers via atoi)
- Stores argument values into registers a0–a7
- Calls the generated expression function
- Prints the return value in decimal and hex: %d (0x%X)
- Returns
Your compiler's compile_tree() function uses include_str!("../asm/codegen_main.s") to embed the preamble into the generated .s file, replacing the placeholder function name with the actual name from -c.
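The substitution step can be sketched as a plain string replace. Everything specific below is an assumption for illustration: the PREAMBLE text is a tiny stand-in for include_str!("../asm/codegen_main.s"), the placeholder token codegen_func_s is invented, and assemble_source is a hypothetical helper. Use whatever placeholder your own preamble contains.

```rust
// Stand-in for include_str!("../asm/codegen_main.s"); the real preamble is
// longer. The placeholder token is an assumption made for this sketch.
const PREAMBLE: &str = ".global main\nmain:\n    call codegen_func_s\n    ret\n";
const PLACEHOLDER: &str = "codegen_func_s";

/// Builds the full .s text: the preamble with the placeholder replaced,
/// followed by the generated function label and body.
fn assemble_source(name: &str, body: &str) -> String {
    let mut out = PREAMBLE.replace(PLACEHOLDER, name);
    out.push_str(&format!("\n.global {name}\n{name}:\n"));
    out.push_str(body);
    out
}

fn main() {
    let body = "    addi sp, sp, -4\n    sw a0, (sp)\n    lw a0, (sp)\n    addi sp, sp, 4\n    ret\n";
    print!("{}", assemble_source("p1", body));
}
```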
Stack Machine Code Generation¶
The code generator walks the parse tree depth-first and emits assembly instructions that use the stack to store intermediate values:
Push a constant (integer literal):
addi sp, sp, -4 # make room on stack
li t0, <val> # load immediate value
sw t0, (sp) # store to top of stack
Push a register (e.g., a0):
addi sp, sp, -4 # make room on stack
sw a0, (sp) # store register to top of stack
Unary operator (e.g., negation):
lw t0, (sp) # load operand from top of stack
<unary_op> t0, t0 # apply operation
sw t0, (sp) # store result back
Binary operator (e.g., add):
lw t1, (sp) # load right operand
addi sp, sp, 4 # pop right operand
lw t0, (sp) # load left operand
<binary_op> t0, t0, t1 # apply operation
sw t0, (sp) # store result (replaces left operand)
The function epilogue pops the final result into a0 and returns:
lw a0, (sp) # load final result from top of stack
addi sp, sp, 4 # pop the stack
ret
Operator Instructions¶
| NTLang Operator | RISC-V Instruction | Description |
|---|---|---|
| + | add t0, t0, t1 | Addition |
| - | sub t0, t0, t1 | Subtraction |
| * | mul t0, t0, t1 | Multiplication |
| / | div t0, t0, t1 | Signed division |
| & | and t0, t0, t1 | Bitwise AND |
| \| | or t0, t0, t1 | Bitwise OR |
| ^ | xor t0, t0, t1 | Bitwise XOR |
| << | sll t0, t0, t1 | Shift left logical |
| >> | srl t0, t0, t1 | Shift right logical |
| >- | sra t0, t0, t1 | Shift right arithmetic |
| - (unary) | sub t0, zero, t0 | Negation |
| ~ (unary) | xor t0, t1, t0 with li t1, -1 | Bitwise NOT |
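The binary rows of this table translate naturally into a lookup function. The sketch below is one possible arrangement; binary_instr and emit_binary are hypothetical names, and operators are represented as plain strings here, which may not match how your parser stores them.

```rust
/// Maps an NTLang binary operator to its RISC-V mnemonic, per the table.
fn binary_instr(op: &str) -> Option<&'static str> {
    let instr = match op {
        "+" => "add",
        "-" => "sub",
        "*" => "mul",
        "/" => "div",
        "&" => "and",
        "|" => "or",
        "^" => "xor",
        "<<" => "sll",
        ">>" => "srl",
        ">-" => "sra",
        _ => return None,
    };
    Some(instr)
}

/// Emits the binary-operator sequence: pop right, load left, apply, store.
fn emit_binary(op: &str) -> Option<String> {
    let instr = binary_instr(op)?;
    Some(format!(
        "    lw t1, (sp)\n    addi sp, sp, 4\n    lw t0, (sp)\n    {instr} t0, t0, t1\n    sw t0, (sp)\n"
    ))
}

fn main() {
    print!("{}", emit_binary(">-").unwrap());
}
```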
The compile.rs Module¶
Your compile.rs module implements two functions:
- compile_tree_expr(node, output): Recursively walks the parse tree and appends assembly instructions to the output string. For Var nodes, it checks if the name matches a register pattern (a0–a7) and emits sw aX, (sp) accordingly, producing an error for unsupported variable names.
- compile_tree(node, name): Entry point that includes the preamble, emits the function label, calls compile_tree_expr, and appends the epilogue.
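A minimal sketch of the recursive walk, under the assumption of a simplified node type. The ParseNode enum below is hypothetical (binary nodes carry the instruction mnemonic directly, and only negation is shown for unary operators); your Project01 tree will look different, but the post-order emission pattern is the same.

```rust
// Hypothetical node type for illustration only.
enum ParseNode {
    IntVal(u32),
    Var(String),
    Neg(Box<ParseNode>),
    // Holds the RISC-V mnemonic (e.g. "add") plus left and right operands.
    Binary(&'static str, Box<ParseNode>, Box<ParseNode>),
}

/// True for the register names a0 through a7.
fn is_arg_register(name: &str) -> bool {
    matches!(name.as_bytes(), [b'a', b'0'..=b'7'])
}

/// Post-order walk: operands are emitted first, then the operator.
fn compile_tree_expr(node: &ParseNode, output: &mut String) -> Result<(), String> {
    match node {
        ParseNode::IntVal(v) => {
            output.push_str(&format!(
                "    addi sp, sp, -4\n    li t0, {v}\n    sw t0, (sp)\n"
            ));
        }
        ParseNode::Var(name) => {
            if !is_arg_register(name) {
                return Err(format!("unsupported variable: {name}"));
            }
            output.push_str(&format!("    addi sp, sp, -4\n    sw {name}, (sp)\n"));
        }
        ParseNode::Neg(operand) => {
            compile_tree_expr(operand, output)?;
            output.push_str("    lw t0, (sp)\n    sub t0, zero, t0\n    sw t0, (sp)\n");
        }
        ParseNode::Binary(instr, left, right) => {
            compile_tree_expr(left, output)?;
            compile_tree_expr(right, output)?;
            output.push_str(&format!(
                "    lw t1, (sp)\n    addi sp, sp, 4\n    lw t0, (sp)\n    {instr} t0, t0, t1\n    sw t0, (sp)\n"
            ));
        }
    }
    Ok(())
}

fn main() {
    // -(a0) + 5
    let expr = ParseNode::Binary(
        "add",
        Box::new(ParseNode::Neg(Box::new(ParseNode::Var("a0".to_string())))),
        Box::new(ParseNode::IntVal(5)),
    );
    let mut out = String::new();
    compile_tree_expr(&expr, &mut out).unwrap();
    print!("{out}");
}
```

Because the left subtree is emitted before the right, the right operand is always on top of the stack when the operator's instructions run.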
The find_gcc() Function¶
When linking the generated assembly into an executable, ntlang needs to find the right C compiler:
use std::process::{self, Command};

fn find_gcc() -> String {
    // Prefer the cross-compiler when it can be launched
    if Command::new("riscv64-linux-gnu-gcc")
        .arg("--version")
        .stdout(process::Stdio::null())
        .stderr(process::Stdio::null())
        .status()
        .is_ok()
    {
        "riscv64-linux-gnu-gcc".to_string()
    } else {
        "gcc".to_string()
    }
}
Inside the Docker container, riscv64-linux-gnu-gcc is available. On a native RISC-V machine, plain gcc works.
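Once the compiler name is known, the assemble-and-link step might look like the sketch below. The helper names link_args and assemble_and_link are hypothetical, and the argument list (-o <name> <name>.s) is an assumption; your build may need additional flags.

```rust
use std::fs;
use std::io;
use std::process::Command;

/// Builds the argument list for turning `<name>.s` into `<name>`.
/// The exact flags are an assumption of this sketch.
fn link_args(name: &str) -> Vec<String> {
    vec!["-o".to_string(), name.to_string(), format!("{name}.s")]
}

/// Runs the compiler on `<name>.s`, then removes the .s file.
/// With `keep_asm` set (the -s mode), nothing is run and the .s file stays.
fn assemble_and_link(gcc: &str, name: &str, keep_asm: bool) -> io::Result<()> {
    if keep_asm {
        return Ok(());
    }
    let status = Command::new(gcc).args(link_args(name)).status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{gcc} failed with {status}"),
        ));
    }
    fs::remove_file(format!("{name}.s"))
}

fn main() {
    // Show the command that would be run for -c p1.
    println!("gcc {}", link_args("p1").join(" "));
    // In -s mode the function returns without invoking the compiler.
    assemble_and_link("gcc", "p1", true).unwrap();
}
```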
Example: Compiling (a0 + a1) * a2¶
Generate the executable:
Run it with arguments (the preamble maps command-line args to a0, a1, a2, ...):
Generated assembly (with -s):
# Push a0
addi sp, sp, -4
sw a0, (sp)
# Push a1
addi sp, sp, -4
sw a1, (sp)
# Binary add
lw t1, (sp)
addi sp, sp, 4
lw t0, (sp)
add t0, t0, t1
sw t0, (sp)
# Push a2
addi sp, sp, -4
sw a2, (sp)
# Binary mul
lw t1, (sp)
addi sp, sp, 4
lw t0, (sp)
mul t0, t0, t1
sw t0, (sp)
# Pop result into a0 and return
lw a0, (sp)
addi sp, sp, 4
ret
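To see why this sequence computes (a0 + a1) * a2, the same steps can be replayed in Rust with a Vec standing in for the hardware stack. The function name run_example is hypothetical, and the argument values in main are chosen only for illustration.

```rust
/// Replays the generated instruction sequence with a Vec as the stack.
fn run_example(a0: u32, a1: u32, a2: u32) -> u32 {
    let mut stack: Vec<u32> = Vec::new();

    stack.push(a0); // Push a0
    stack.push(a1); // Push a1

    // Binary add: pop right operand, replace left operand with the sum.
    let t1 = stack.pop().unwrap();
    let t0 = stack.pop().unwrap();
    stack.push(t0.wrapping_add(t1));

    stack.push(a2); // Push a2

    // Binary mul: pop right operand, replace left operand with the product.
    let t1 = stack.pop().unwrap();
    let t0 = stack.pop().unwrap();
    stack.push(t0.wrapping_mul(t1));

    // Epilogue: pop the final result (into a0 in the real code).
    stack.pop().unwrap()
}

fn main() {
    let result = run_example(3, 4, 5);
    println!("{} (0x{:X})", result as i32, result);
}
```

The stack never holds more than two values for this expression, and it is empty again when the function returns.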
Test Examples¶
# Compile with literal values
$ ./riscv-run ntlang -e '(3 * 4) + 5' -c p1
$ ./riscv-run ./p1
17 (0x11)
# Compile with register parameters
$ ./riscv-run ntlang -e '(a0 * a1) + a2' -c p2
$ ./riscv-run ./p2 3 4 5
17 (0x11)
# Complex expression
$ ./riscv-run ntlang -e "(((((~((-(2 * ((1023 + 1) / 4)) >- 2) << 8)) >> 10) ^ 0b01110) & 0x1E) | ~(0b10000))" -c p3
$ ./riscv-run ./p3
-1 (0xFFFFFFFF)
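The complex expression can be checked by hand, one operator at a time, using 32-bit arithmetic. The sketch below follows the parenthesization of the p3 expression; complex_expr and format_result are hypothetical names, and format_result mirrors the preamble's %d (0x%X) output format.

```rust
/// Evaluates the p3 expression step by step with 32-bit semantics.
fn complex_expr() -> u32 {
    let v = 2u32.wrapping_mul((1023 + 1) / 4); // 2 * 256 = 512
    let v = v.wrapping_neg(); // -512
    let v = ((v as i32) >> 2) as u32; // >- 2 (arithmetic): -128
    let v = v << 8; // 0xFFFF8000
    let v = !v; // 0x00007FFF
    let v = v >> 10; // >> 10 (logical): 31
    let v = v ^ 0b01110; // 17
    let v = v & 0x1E; // 16
    v | !0b10000u32 // 0xFFFFFFFF
}

/// Formats a result the way the preamble prints it: decimal, then hex.
fn format_result(v: u32) -> String {
    format!("{} (0x{:X})", v as i32, v)
}

fn main() {
    println!("{}", format_result(complex_expr()));
}
```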
Section 4: Build System¶
Project Structure¶
project02/
├── Cargo.toml
├── build.rs
├── Makefile
├── Dockerfile
├── riscv-run
├── .cargo/
│   └── config.toml
├── asm/
│   ├── codegen_main.s
│   ├── rstr_s.s
│   ├── rstr_rec_s.s
│   ├── pack_bytes_s.s
│   ├── unpack_bytes_s.s
│   ├── get_bitseq_s.s
│   └── get_bitseq_signed_s.s
├── c/
│   └── codegen_main.c
└── src/
    ├── lib.rs
    ├── scan.rs
    ├── parse.rs
    ├── eval.rs
    ├── conv.rs
    ├── config.rs
    ├── compile.rs
    └── bin/
        ├── ntlang.rs
        ├── rstr.rs
        ├── rstr_rec.rs
        ├── pack_bytes.rs
        ├── unpack_bytes.rs
        ├── get_bitseq.rs
        └── get_bitseq_signed.rs
Cargo.toml¶
Your Cargo.toml defines seven binary targets and uses the cc crate for building assembly and C files:
[package]
name = "project02"
version = "0.1.0"
edition = "2024"
[lib]
name = "project02"
path = "src/lib.rs"
[build-dependencies]
cc = "1"
[[bin]]
name = "get_bitseq"
path = "src/bin/get_bitseq.rs"
[[bin]]
name = "get_bitseq_signed"
path = "src/bin/get_bitseq_signed.rs"
[[bin]]
name = "pack_bytes"
path = "src/bin/pack_bytes.rs"
[[bin]]
name = "unpack_bytes"
path = "src/bin/unpack_bytes.rs"
[[bin]]
name = "rstr"
path = "src/bin/rstr.rs"
[[bin]]
name = "rstr_rec"
path = "src/bin/rstr_rec.rs"
[[bin]]
name = "ntlang"
path = "src/bin/ntlang.rs"
build.rs¶
The build.rs script conditionally compiles assembly and C files only when targeting RISC-V:
fn main() {
    let target = std::env::var("TARGET").unwrap_or_default();
    if target.contains("riscv") {
        cc::Build::new()
            .file("asm/get_bitseq_s.s")
            .file("asm/get_bitseq_signed_s.s")
            .file("asm/pack_bytes_s.s")
            .file("asm/unpack_bytes_s.s")
            .file("asm/rstr_s.s")
            .file("asm/rstr_rec_s.s")
            .compile("asm_functions");

        cc::Build::new()
            .file("c/get_bitseq_c.c")
            .file("c/get_bitseq_signed_c.c")
            .file("c/pack_bytes_c.c")
            .file("c/unpack_bytes_c.c")
            .file("c/rstr_c.c")
            .file("c/rstr_rec_c.c")
            .compile("c_functions");

        println!("cargo:rustc-link-arg-bins=-lasm_functions");
        println!("cargo:rustc-link-arg-bins=-lc_functions");
    }
    println!("cargo:rerun-if-changed=asm/");
    println!("cargo:rerun-if-changed=c/");
}
The target.contains("riscv") check allows the Rust library code (scanner, parser, etc.) to compile on your host machine for development, while assembly and C files are only compiled when cross-compiling for RISC-V.
.cargo/config.toml¶
Configure the RISC-V cross-compilation toolchain:
[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc"
runner = "qemu-riscv64 -L /usr/riscv64-linux-gnu"
This tells Cargo to use the cross-compiler as the linker and QEMU as the runner when targeting RISC-V.
Makefile¶
The Makefile auto-detects whether you're on a RISC-V machine or need Docker:
UNAME_M := $(shell uname -m)
BINS := get_bitseq get_bitseq_signed pack_bytes unpack_bytes rstr rstr_rec ntlang
IMAGE_NAME := project02-riscv
ifeq ($(UNAME_M),riscv64)
CARGO_CMD = cargo
CARGO_FLAGS =
else
CARGO_CMD = docker run --rm -v $(CURDIR):/project $(IMAGE_NAME) cargo
CARGO_FLAGS = --target riscv64gc-unknown-linux-gnu
endif
build: docker-image-ensure
	$(CARGO_CMD) build $(CARGO_FLAGS)

run-%: docker-image-ensure
	$(CARGO_CMD) run $(CARGO_FLAGS) --bin $* -- $(ARGS)

clean:
	cargo clean

docker-build:
	docker build -t $(IMAGE_NAME) .

docker-shell:
	docker run --rm -it -v $(CURDIR):/project $(IMAGE_NAME) bash
Key targets:
- make build: build all binaries
- make run-rstr ARGS="FooBar": run a specific binary with arguments
- make docker-build: rebuild the Docker image
- make docker-shell: open a shell inside the Docker container
Dockerfile¶
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y \
gcc-riscv64-linux-gnu qemu-user curl build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
ENV PATH="/root/.cargo/bin:${PATH}"
RUN rustup target add riscv64gc-unknown-linux-gnu
WORKDIR /project
The Docker image provides: the RISC-V cross-compiler (riscv64-linux-gnu-gcc), QEMU for running RISC-V binaries, and Rust with the RISC-V target.
riscv-run Script¶
The riscv-run script is a convenience wrapper that handles Docker transparently:
#!/usr/bin/env bash
set -euo pipefail

BIN="${1:?usage: riscv-run <binary|executable> [args...]}"
shift

PROJECT_DIR="$(cd "$(dirname "$0")" && pwd)"
IMAGE="project02-riscv"

ensure_image() {
    if ! docker image inspect "$IMAGE" &>/dev/null; then
        echo "Building Docker image '$IMAGE'..." >&2
        docker build -t "$IMAGE" "$PROJECT_DIR"
    fi
}

if [ -f "$BIN" ]; then
    # Direct executable (e.g. ./prog compiled by ntlang -c)
    if [ "$(uname -m)" = "riscv64" ]; then
        exec "$BIN" "$@"
    else
        ensure_image
        exec docker run --rm -v "$PROJECT_DIR":/project -w /project "$IMAGE" \
            qemu-riscv64 -L /usr/riscv64-linux-gnu "$BIN" "$@"
    fi
else
    # Cargo binary
    if [ "$(uname -m)" = "riscv64" ]; then
        exec cargo run --quiet --manifest-path "$PROJECT_DIR/Cargo.toml" --bin "$BIN" -- "$@"
    else
        ensure_image
        exec docker run --rm -v "$PROJECT_DIR":/project \
            -e CC_riscv64gc_unknown_linux_gnu=riscv64-linux-gnu-gcc \
            "$IMAGE" \
            cargo run --quiet --target riscv64gc-unknown-linux-gnu --bin "$BIN" -- "$@"
    fi
fi
Usage:
# Run a Cargo binary (builds if needed)
$ ./riscv-run rstr FooBar
# Run a compiled executable
$ ./riscv-run ./myprog 3 4 5
On a native RISC-V machine, riscv-run runs the binary directly. On other platforms, it uses Docker with QEMU.
Running Natively on RISC-V¶
If you are on a RISC-V machine (e.g., the class server), you can build and run directly with Cargo:
$ cargo build
$ cargo run --bin rstr -- FooBar
$ cargo run --bin ntlang -- -e "(a0 + a1) * a2" -a0 3 -a1 4 -a2 5
Cross-Compilation with Docker¶
On non-RISC-V hosts (macOS, x86 Linux), use the riscv-run script or the Makefile, which handle Docker automatically. Make sure you have Docker installed and running (Docker Desktop, Colima, or OrbStack all work).
The first build takes a few minutes to create the Docker image. Subsequent builds are fast because Docker caches the image layers.
Grading¶
Tests: https://github.com/USF-CS631-S26/tests
Grading is based on automated tests (100 points total):
| Tests | Points | Description |
|---|---|---|
| rstr_1, rstr_2 | 10 | String reverse (iterative) (5 pts each) |
| rstr_rec_1, rstr_rec_2 | 10 | String reverse (recursive) (5 pts each) |
| pack_bytes_1, pack_bytes_2 | 10 | Byte packing (5 pts each) |
| unpack_bytes_1, unpack_bytes_2 | 10 | Byte unpacking (5 pts each) |
| get_bitseq_1, get_bitseq_2 | 2 | Unsigned bit extraction (1 pt each) |
| get_bitseq_signed_1, get_bitseq_signed_2 | 8 | Signed bit extraction (4 pts each) |
| ntlang_args | 10 | NTLang register parameter eval |
| ntlang_comp_p1 | 10 | Compile mode with literals |
| ntlang_comp_p2 | 10 | Compile mode with register params |
| ntlang_comp_p3 | 10 | Compile mode with complex expression |
| ntlang_comp_p4 | 10 | Compile mode with complex expression and args |
| Total | 100 | |
Code Quality¶
Code quality deductions may be applied and can be earned back. We are looking for:
- Consistent spacing and indentation
- Consistent naming and commenting
- No commented-out ("dead") code
- No redundant or overly complicated code
- A clean repo: no build products, extra files, etc.