Project06 - Octox Kernel Tracking and the `track` Command

Due Thu May 21st by 11:59pm in your Project06 GitHub repo

Links

Tests: https://github.com/USF-CS631-S26/tests

Background

In Project 05 you wrote user programs that call into the Octox kernel through the existing syscall interface. In Project 06 you cross to the other side of that interface: you will extend the kernel itself with a small per-process tracking facility, expose two new syscalls, and write a user program named track that uses them to fork+exec a target command and print a report of what that target did — its syscalls, byte counts, memory map, and page tables.

The work touches almost every layer of the kernel you have seen this semester: the syscall dispatcher (UNIX System Calls lecture), process state and exec (OS Kernel lecture), and page tables (Page Tables lecture).

Setup

Clone the Octox repo into your Project06 GitHub repo. Build it once to confirm your toolchain works:

$ cd octox
$ cargo build --target riscv64gc-unknown-none-elf
$ cargo run --target riscv64gc-unknown-none-elf

You should land at the $ shell prompt. Ctrl-A x exits QEMU.

Requirements

Add a TrackMetrics struct in src/kernel/syscall.rs. This is the ABI shared between kernel and user — the autograder depends on the exact field layout shown in Section 2.
Add per-process tracking fields to ProcData in src/kernel/proc.rs.
Add two new syscalls, track_self and track_wait, with numbers 24 and 25, wired through the SysCalls enum, the dispatch table, from_usize, and the gen_usys() generator.
Instrument the syscall dispatcher, read, write, and exec so that tracked processes accumulate counters and untracked processes pay only a single boolean check.
Write src/user/bin/track.rs with a matching [[bin]] entry in src/user/Cargo.toml. The binary lands at /bin/track.
All output must match the autograder spec (spacing on each line, blank lines, and page counts all included).
Use only sys::* and ulib — do not pull in additional crates.

Section 1: The `track` user program

track runs a target command in a child process and prints a per-process metrics report after the child exits.

Usage:

$ track <cmd> [args...]

With no arguments, print exactly:

usage: track <cmd> [args...]

…and exit 1.

For a command without a / in it, resolve it to /bin/<cmd> before passing it to sys::exec. (Octox's mkfs strips the leading _, so _track lands at /bin/track, and likewise for the command you exec.)
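The resolution rule is small enough to sketch as a pure function. resolve_path is a hypothetical helper written against std's String so it can be tested off-target; in track.rs you would build the same path with whatever string or buffer type your program already hands to sys::exec.

```rust
/// Hypothetical helper: map a bare command name to its /bin path.
/// Anything that already contains a '/' is treated as a path and left alone.
fn resolve_path(cmd: &str) -> String {
    if cmd.contains('/') {
        cmd.to_string()
    } else {
        format!("/bin/{}", cmd)
    }
}
```

So track echo hello execs /bin/echo, while track /bin/ls or track ./a pass the argument through unchanged.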

The control flow is the same fork/exec/wait pattern you saw in src/user/bin/ex_exec.rs, with two substitutions:

The child calls sys::track_self() before sys::exec(...). This sets the tracked flag on the calling process so the kernel knows to start counting.
The parent replaces sys::wait(...) with sys::track_wait(&mut xstatus, &mut metrics). This both reaps the zombie child and fills in a TrackMetrics struct with the counters the kernel accumulated.

Worked example 1: track echo hello

$ track echo hello
hello
=== track report ===
PID: 3
Name: echo
Exit: 0

System Calls (total: 4)
  exit: 1
  write: 3 bytes: 7

Memory (total: 118784 bytes)
  TEXT: 2 pages
  DATA: 2 pages
  HEAP: 0 pages
  STACK: 25 pages

Page Tables (5 pages, 20480 bytes)
  L2: 1
  L1: 2
  L0: 2

Worked example 2: track ls

$ track ls
dev            Dir      2 64
bin            Dir      3 432
lib            Dir      4 32
etc            Dir      5 48
README.org     File     6 5260
init           File    12 171576
initcode       File    21 91600
=== track report ===
PID: 3
Name: ls
Exit: 0

System Calls (total: 241)
  exit: 1
  read: 65 bytes: 1024
  fstat: 8
  sbrk: 1
  open: 8
  write: 150 bytes: 213
  close: 8

Memory (total: 196608 bytes)
  TEXT: 4 pages
  DATA: 3 pages
  HEAP: 16 pages
  STACK: 25 pages

Page Tables (5 pages, 20480 bytes)
  L2: 1
  L1: 2
  L0: 2

Worked example 3: head -c 4194304 README.org > /big, then track grep zzzzzz /big. This exercises the page-table walker. Each L0 page holds 512 PTEs and so maps 2 MB of virtual address space; grep's heap pushes the user image to 2082 mapped pages (just over 8 MB), which needs ⌈2082/512⌉ = 5 L0 pages where the smaller programs needed one. L0 therefore rises from 2 (the value in the smaller programs above) to 6. The rest of the report has the same shape.

$ head -c 4194304 README.org > /big
$ track grep zzzzzz /big
=== track report ===
PID: 4
Name: grep
Exit: 0

System Calls (total: 131084)
  exit: 1
  read: 131073 bytes: 4194304
  sbrk: 8
  open: 1
  close: 1

Memory (total: 8527872 bytes)
  TEXT: 4 pages
  DATA: 4 pages
  HEAP: 2049 pages
  STACK: 25 pages

Page Tables (9 pages, 36864 bytes)
  L2: 1
  L1: 2
  L0: 6
$ rm /big
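The jump from 2 to 6 is arithmetic on Sv39's fixed table size: an L0 page holds 512 PTEs, each mapping one 4 KB page, so it covers 2 MB. A minimal sketch, assuming the user image is one contiguous range starting at virtual address 0 as exec lays it out (l0_pages_for is a hypothetical helper, not kernel code):

```rust
const PGSIZE: usize = 4096;
/// Number of PTEs in one Sv39 page-table page.
const PTES_PER_TABLE: usize = 512;
/// Virtual address span mapped by one L0 page: 512 * 4 KB = 2 MB.
const L0_COVERAGE: usize = PTES_PER_TABLE * PGSIZE;

/// L0 pages needed to map `user_pages` contiguous pages starting at va 0
/// (ceiling division of the image size by the 2 MB an L0 page covers).
fn l0_pages_for(user_pages: usize) -> usize {
    (user_pages * PGSIZE + L0_COVERAGE - 1) / L0_COVERAGE
}
```

For the 29-page echo image this gives 1; for grep's 2082 pages it gives 5. The other L0 page in each report sits elsewhere in the address space and is common to all three examples.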

Print format. The autograder compares output line by line: it strips leading and trailing whitespace from each line, then requires an exact byte match, so interior runs of spaces are not collapsed. Use these format strings exactly:

Header: "=== track report ===".
Identity block (three separate lines), then a blank line:
- "PID: {}"
- "Name: {}"
- "Exit: {}"
Syscall section header: "System Calls (total: {})".
Syscall rows (skip entries with count 0):
- normal: "  {}: {}"
- read / write: "  {}: {} bytes: {}"
Memory section: "Memory (total: {} bytes)" then four lines: "  TEXT: {} pages", "  DATA: {} pages", "  HEAP: {} pages", "  STACK: {} pages".
Page tables: "Page Tables ({} pages, {} bytes)" then "  L2: {}", "  L1: {}", "  L0: {}".
One blank line between major sections (matches the examples above).
Every detail line uses a single space between every adjacent token. No format-string width specifiers ({:>N} / {:<N}) anywhere.
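One way to keep the syscall section exact is to build each row with plain format! strings and no width specifiers. This is a sketch, not the required structure: format_syscalls is a hypothetical helper that takes the counts and byte totals as plain arguments (rather than a TrackMetrics) and returns the lines instead of printing them, so it can be tested off-target. Rows are indented by two spaces as in the worked examples; the autograder strips leading whitespace either way.

```rust
/// Hypothetical helper: render the "System Calls" section of the report.
/// `counts[i]` is the count for syscall number i and `names[i]` its name.
fn format_syscalls(
    names: &[&str],
    counts: &[usize],
    total: usize,
    bytes_read: usize,
    bytes_written: usize,
) -> Vec<String> {
    let mut lines = vec![format!("System Calls (total: {})", total)];
    for (i, &count) in counts.iter().enumerate() {
        if count == 0 {
            continue; // the spec skips syscalls that were never called
        }
        // Fall back to "?" if the names table is shorter than the counts.
        let name = names.get(i).copied().unwrap_or("?");
        let line = match name {
            "read" => format!("  {}: {} bytes: {}", name, count, bytes_read),
            "write" => format!("  {}: {} bytes: {}", name, count, bytes_written),
            _ => format!("  {}: {}", name, count),
        };
        lines.push(line);
    }
    lines
}
```

Called with the counts from example 1 (exit = 1, write = 3, 7 bytes written), it yields exactly the three System Calls lines shown there.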

You will need a SYSCALL_NAMES table in track.rs indexed by the same numbering as the kernel's SysCalls enum. Keep the two in sync — if you add or rename a syscall in the kernel, update the table.
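A sketch of the table, assuming Octox numbers syscalls 1 through 21 the same way xv6 does (this agrees with the row order in the worked examples: exit, read, fstat, sbrk, open, write, close) and that 22 and 23 are dup2 and fcntl. Those last two entries and the name used for slot 0 are assumptions; copy the real order from your SysCalls enum. TrackSelf and TrackWait (24 and 25) have no slot in the NUM_SYSCALLS-sized counts array, so they are not listed.

```rust
const NUM_SYSCALLS: usize = 24;

/// Names indexed by syscall number; slot 0 is the invalid slot.
/// ASSUMPTION: 1..=21 follow xv6 numbering; 22 and 23 are guesses.
/// Verify every entry against the kernel's SysCalls enum.
const SYSCALL_NAMES: [&str; NUM_SYSCALLS] = [
    "invalid", // 0
    "fork",    // 1
    "exit",    // 2
    "wait",    // 3
    "pipe",    // 4
    "read",    // 5
    "kill",    // 6
    "exec",    // 7
    "fstat",   // 8
    "chdir",   // 9
    "dup",     // 10
    "getpid",  // 11
    "sbrk",    // 12
    "sleep",   // 13
    "uptime",  // 14
    "open",    // 15
    "write",   // 16
    "mknod",   // 17
    "unlink",  // 18
    "link",    // 19
    "mkdir",   // 20
    "close",   // 21
    "dup2",    // 22 (assumed)
    "fcntl",   // 23 (assumed)
];
```

Typing the table as [&str; NUM_SYSCALLS] means the compiler rejects it if an entry is added or dropped without updating the length.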

Section 2: The shared `TrackMetrics` struct (the ABI)

Add this struct near the top of src/kernel/syscall.rs, before the SysCalls enum. The exact field names, types, and order are part of the ABI — student kernels and the autograder must agree.

pub const NUM_SYSCALLS: usize = 24;
pub const TRACK_NAME_LEN: usize = 16;

#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct TrackMetrics {
    pub pid: usize,
    pub name: [u8; TRACK_NAME_LEN],
    pub total_syscalls: usize,
    pub syscall_counts: [usize; NUM_SYSCALLS],
    pub bytes_read: usize,
    pub bytes_written: usize,
    pub mem_bytes: usize,
    pub text_pages: usize,
    pub data_pages: usize,
    pub heap_pages: usize,
    pub stack_pages: usize,
    pub pt_l2_pages: usize,
    pub pt_l1_pages: usize,
    pub pt_l0_pages: usize,
    pub pt_total_bytes: usize,
}

Also provide impl Default for TrackMetrics returning the all-zero value (the user program builds an empty one and hands it to track_wait as the out-param), and mark the struct as kernel-copyable so Uvm::copyout will accept it:

#[cfg(all(target_os = "none", feature = "kernel"))]
unsafe impl crate::defs::AsBytes for TrackMetrics {}

On the user side, TrackMetrics, NUM_SYSCALLS, and TRACK_NAME_LEN come through the same pub use chain that already exposes Result, FcntlCmd, etc., so you do not need to redeclare them. The user-side syscall wrappers are not hand-written either: once the new variants are in the SysCalls enum and the gen_usys() generator (Section 5), the build script regenerates usys.rs.
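Because name is a fixed [u8; TRACK_NAME_LEN] buffer rather than a string, track has to find where the name ends before printing the Name: line. A sketch, assuming the kernel NUL-pads names shorter than 16 bytes (name_str is a hypothetical helper; check how your ProcData actually stores the name):

```rust
const TRACK_NAME_LEN: usize = 16;

/// Hypothetical helper: view the NUL-padded name buffer as a &str.
/// Stops at the first 0 byte (or uses all 16 bytes if there is none)
/// and falls back to "?" if the bytes are not valid UTF-8.
fn name_str(name: &[u8; TRACK_NAME_LEN]) -> &str {
    let len = name.iter().position(|&b| b == 0).unwrap_or(TRACK_NAME_LEN);
    core::str::from_utf8(&name[..len]).unwrap_or("?")
}
```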

Section 3: Per-process tracking state

Extend ProcData in src/kernel/proc.rs with the fields the syscall handlers will update. You need:

tracked: bool — the fast-path gate; checked on every syscall.
stack_top: usize — the value of sz at the end of exec; used later to classify pages as STACK vs HEAP. Set by every successful exec and unchanged until the next exec or until the proc is freed.
track_total_syscalls: usize
track_counts: [usize; NUM_SYSCALLS]
track_bytes_read: usize
track_bytes_written: usize

Three rules to honor:

Zero-initialize all fields in ProcData::new().
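Reset them in Proc::free() alongside data.sz = 0. This matters because Proc slots are recycled: a stale tracked = true left over from a previous occupant would make the next process in that slot start counting without ever calling track_self, and stale counters could carry over into its totals.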
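fork() must not propagate tracked to the child. Only the process that called track_self (and then exec'd the target) is tracked; its parent, your track program, is not, and neither are any children the target itself forks. The zeroed defaults (from ProcData::new() at boot and Proc::free() on reuse) already handle this; just make sure fork() does not add an explicit copy.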

Section 4: Kernel instrumentation points

Four locations. In each, the work is small (a counter bump or a stash), and the counter updates are gated on pdata.tracked so untracked processes pay only one boolean check. The one ungated store is stack_top in exec, described below.

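Syscall counter — src/kernel/syscall.rs, inside the syscall() dispatcher, right after syscall_id is decoded and before the table dispatch. If the current proc is tracked, bump track_counts[syscall_id] and track_total_syscalls. Check syscall_id < NUM_SYSCALLS before indexing: TrackSelf (24) and TrackWait (25) lie past the end of the 24-entry array, so an unchecked index would panic the kernel if a tracked process ever called either one.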
read byte counter — src/kernel/syscall.rs, inside the read() handler. Capture the byte count returned by the underlying file read, and if the proc is tracked, add it to track_bytes_read before returning.
write byte counter — same file, inside write(). Mirror the read() change against track_bytes_written.
Exec reset and stack_top capture — src/kernel/exec.rs, inside the "commit to the new user image" block (after proc_data.uvm.replace(...) and proc_data.sz = sz). Two things happen here:
Always record proc_data.stack_top = sz. This is true for every exec, tracked or not — it costs you nothing and means the field is valid whenever you need it later.
If proc_data.tracked, zero the four counters (track_counts, track_total_syscalls, track_bytes_read, track_bytes_written). This makes the report cover only the target binary's lifetime rather than the track child's pre-exec code: the exec call itself, which the dispatcher counted just before this point, is discarded.

Use saturating_add for the byte counters. A 64-bit usize will not realistically overflow, but a plain += would panic on overflow in a debug build (where overflow checks are on by default), whereas saturating_add simply stops at usize::MAX.
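The Section 3 fields and the four hooks above can be modelled off-target as a plain struct. TrackState and its method names are hypothetical: in the kernel these are fields on ProcData, and the method bodies go inline in syscall(), read(), write(), exec(), and track_self(). on_syscall skips ids that have no slot in the NUM_SYSCALLS-sized array, since TrackSelf (24) and TrackWait (25) lie past its end.

```rust
const NUM_SYSCALLS: usize = 24;

/// Hypothetical stand-in for the tracking fields added to ProcData.
/// #[derive(Default)] yields the all-zero state that ProcData::new()
/// and Proc::free() are expected to produce.
#[derive(Debug, Default)]
struct TrackState {
    tracked: bool,
    stack_top: usize,
    track_total_syscalls: usize,
    track_counts: [usize; NUM_SYSCALLS],
    track_bytes_read: usize,
    track_bytes_written: usize,
}

impl TrackState {
    /// Zero the four counters (shared by track_self and exec).
    fn reset_counters(&mut self) {
        self.track_total_syscalls = 0;
        self.track_counts = [0; NUM_SYSCALLS];
        self.track_bytes_read = 0;
        self.track_bytes_written = 0;
    }

    /// track_self(): mark the caller tracked and start from zero.
    fn track_self(&mut self) {
        self.tracked = true;
        self.reset_counters();
    }

    /// Dispatcher hook. Untracked processes pay one boolean test.
    fn on_syscall(&mut self, syscall_id: usize) {
        if self.tracked {
            // Ids with no slot (TrackSelf = 24, TrackWait = 25) are skipped
            // instead of panicking; the total stays equal to the row sum.
            if let Some(count) = self.track_counts.get_mut(syscall_id) {
                *count += 1;
                self.track_total_syscalls += 1;
            }
        }
    }

    /// read() hook: add the byte count the file layer returned.
    fn on_read(&mut self, n: usize) {
        if self.tracked {
            self.track_bytes_read = self.track_bytes_read.saturating_add(n);
        }
    }

    /// write() hook: mirror of on_read.
    fn on_write(&mut self, n: usize) {
        if self.tracked {
            self.track_bytes_written = self.track_bytes_written.saturating_add(n);
        }
    }

    /// exec commit hook: always record stack_top, reset only if tracked.
    fn on_exec_commit(&mut self, sz: usize) {
        self.stack_top = sz;
        if self.tracked {
            self.reset_counters();
        }
    }
}
```

Run against the track child's sequence, this model shows why example 1 has no exec row: track_self is dispatched while tracked is still false, so it is never counted, and the exec call is counted and then wiped at the exec commit.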

Section 5: The two new syscalls

Add the new variants to SysCalls in src/kernel/syscall.rs:

TrackSelf = 24,
TrackWait = 25,

…and matching entries in the dispatch TABLE, the from_usize match, and the gen_usys() generator (read the surrounding code — the pattern is mechanical).

track_self(). Mark the calling process tracked and clear its counters. There is no userspace argument; you just look up the current proc with Cpus::myproc().unwrap(), flip tracked, and zero the four counter fields. Returns Ok(()).

track_wait(xstatus_addr, metrics_addr). This is a near-copy of the existing wait() (in src/kernel/proc.rs). The shape is identical: scan the proc table for an exited child of the calling process, block on the parent's wait channel if none have exited yet, and once you find a zombie child, copy its exit status out to userspace and free the slot.

The one extra step happens just before c.free(c_guard): build a TrackMetrics struct from the zombie child's ProcData and use Uvm::copyout to write it into the parent's metrics_addr buffer. The fields you read from ProcData are direct copies; the only fields that require new computation are the memory / page-table fields described in the next section. Note that the child is in ZOMBIE state and not running, so it is safe to read its ProcData and walk its page table directly.

Order matters: build and copy out the metrics before calling c.free(), because free() resets data.sz, data.stack_top, and the page table you need to walk.

Section 6: Memory and page-table classification

The four memory buckets (TEXT, DATA, HEAP, STACK) are not stored explicitly in ProcData — you derive them by walking the child's user address space one virtual page at a time and classifying each mapped page by its permissions and where it sits relative to stack_top. This is the part of the project where the OS Kernel and Page Tables lectures are most useful.

The layout (set up by exec in src/kernel/exec.rs):

0                                                            sz
+---------+---------+- - -+------+--------+------+- - -+----+
|  TEXT   |  DATA   |     |guard | STACK  | HEAP |     |    |
+---------+---------+- - -+------+--------+------+- - -+----+
                                          ^
                                          stack_top (= sz at end of exec)

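Walk the address space from va = 0 up to (but not including) sz in PGSIZE steps, calling page_table.walk(va, false) on each. Skip entries that are not valid (PTE_V clear) or not user-accessible (PTE_U clear), and classify each remaining leaf as follows: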

TEXT = flags & PTE_X != 0 (executable).
STACK = stack_top - STACK_PAGE_NUM * PGSIZE <= va < stack_top (STACK_PAGE_NUM = 25, see src/kernel/exec.rs).
HEAP = va >= stack_top (sbrk grows sz upward, above the original stack region).
DATA = remaining user-readable pages below the stack.

The guard page exec installs at the bottom of the stack lacks PTE_U and is naturally skipped — walk will report it but your PTE_U filter will discard it.

Once classified, fill in text_pages, data_pages, heap_pages, stack_pages directly, and set mem_bytes = (text + data + heap + stack) * PGSIZE.
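A sketch of the walk and classification, with the page table abstracted as a closure that returns the leaf PTE flags for a virtual address. The closure stands in for page_table.walk(va, false), and this sketch does not assume that method's return type. The PTE bit positions are the standard RISC-V Sv39 ones; classify and MemCounts are hypothetical names. The checks run in the order listed above, after invalid and non-PTE_U entries are discarded.

```rust
const PGSIZE: usize = 4096;
const STACK_PAGE_NUM: usize = 25;
// Standard RISC-V Sv39 PTE flag bits.
const PTE_V: usize = 1 << 0;
const PTE_X: usize = 1 << 3;
const PTE_U: usize = 1 << 4;

#[derive(Debug, Default, PartialEq)]
struct MemCounts {
    text: usize,
    data: usize,
    heap: usize,
    stack: usize,
}

impl MemCounts {
    fn mem_bytes(&self) -> usize {
        (self.text + self.data + self.heap + self.stack) * PGSIZE
    }
}

/// Walk [0, sz) one page at a time and bucket each valid user page.
/// `leaf_flags(va)` returns the leaf PTE's flag bits, or None if unmapped.
fn classify<F: Fn(usize) -> Option<usize>>(leaf_flags: F, sz: usize, stack_top: usize) -> MemCounts {
    let stack_bottom = stack_top.saturating_sub(STACK_PAGE_NUM * PGSIZE);
    let mut m = MemCounts::default();
    for va in (0..sz).step_by(PGSIZE) {
        let flags = match leaf_flags(va) {
            // Skip unmapped pages and pages without PTE_U (e.g. the guard page).
            Some(f) if f & PTE_V != 0 && f & PTE_U != 0 => f,
            _ => continue,
        };
        if flags & PTE_X != 0 {
            m.text += 1;
        } else if va >= stack_bottom && va < stack_top {
            m.stack += 1;
        } else if va >= stack_top {
            m.heap += 1;
        } else {
            m.data += 1;
        }
    }
    m
}
```

With text on pages 0-1, data on 2-3, a guard page on 4, and 25 stack pages ending at stack_top, this returns the 2/2/0/25 split and the 118784-byte total from example 1.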

Page-table page counts. Add a small read-only method on PageTable<V> in src/kernel/vm.rs that returns (l2, l1, l0), the number of page-table pages allocated at each Sv39 level under this root. l2 is always 1 (the root). For l1, count the valid non-leaf entries in the root (PTE_V set, with PTE_R, PTE_W, and PTE_X all clear). For l0, count the valid non-leaf entries in each of those L1 tables. Then fill pt_l2_pages, pt_l1_pages, pt_l0_pages, and pt_total_bytes = (l2 + l1 + l0) * PGSIZE.
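A sketch of the (l2, l1, l0) count over a simulated Sv39 tree. Physical memory is modelled as a Vec of 512-entry pages indexed by physical page number, and a PTE's child table is found the usual Sv39 way (PPN in bits 10 through 53). pt_counts and the Vec-backed memory are hypothetical; a real method on PageTable<V> would follow the PTEs through the kernel's own physical-address mapping instead.

```rust
const PGSIZE: usize = 4096;
const PTE_V: u64 = 1 << 0;
const PTE_R: u64 = 1 << 1;
const PTE_W: u64 = 1 << 2;
const PTE_X: u64 = 1 << 3;

/// One simulated page-table page: 512 Sv39 PTEs.
type PtPage = [u64; 512];

/// Valid with none of R/W/X set: the entry points at a lower-level table.
fn is_branch(pte: u64) -> bool {
    pte & PTE_V != 0 && pte & (PTE_R | PTE_W | PTE_X) == 0
}

/// Physical page number stored in bits 10..=53 of an Sv39 PTE.
fn ppn(pte: u64) -> usize {
    ((pte >> 10) & ((1 << 44) - 1)) as usize
}

/// Count page-table pages per level under `root` (an index into `mem`).
fn pt_counts(mem: &[PtPage], root: usize) -> (usize, usize, usize) {
    let l2 = 1; // the root itself
    let mut l1 = 0;
    let mut l0 = 0;
    for &pte in mem[root].iter().filter(|&&p| is_branch(p)) {
        l1 += 1;
        // Each valid non-leaf entry in an L1 table is one L0 page.
        l0 += mem[ppn(pte)].iter().filter(|&&p| is_branch(p)).count();
    }
    (l2, l1, l0)
}

fn pt_total_bytes(counts: (usize, usize, usize)) -> usize {
    (counts.0 + counts.1 + counts.2) * PGSIZE
}
```

A root with two table entries, each pointing at an L1 table with one table entry, gives (1, 2, 2) and 20480 bytes, the shape of the smaller examples; a leaf (megapage) entry in an L1 table is not counted as an L0 page.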

Building and Running

# build kernel + user programs
$ cargo build --target riscv64gc-unknown-none-elf

# boot in QEMU
$ cargo run --target riscv64gc-unknown-none-elf

# exit QEMU
Ctrl-A x

To run one or more commands end-to-end (the way the autograder tests them), pass each command as a separate argument to runoctox.py:

$ python3 runoctox.py "track echo hello"
$ python3 runoctox.py "echo hi > /a" "track cat /a" "rm /a"
$ python3 runoctox.py "head -c 4194304 README.org > /big" "track grep zzzzzz /big" "rm /big"

Tips and Pitfalls

exec does not return on success. Always follow sys::exec(...) in the child with sys::exit(1) so the failure path is defined.
Counters must reset on exec, not only on track_self. track_self zeroes them too, but if you reset only in track_self and not in exec, your counts will include the syscalls made between track_self and exec (which is just the exec itself, but still wrong) and your report will not match the autograder.
Build the report before free(). Proc::free() zeros sz, stack_top, and the page table — exactly the state you need to walk.
Keep the syscall fast path cheap. The tracked check inside syscall() runs on every system call from every process; keep it a single boolean test and an if-guarded counter bump, nothing more.
Whitespace tolerance. The autograder strips leading/trailing whitespace on each line before comparing, so the 2-space indent on detail rows can be any width (or absent) without affecting grading. Interior spaces on a line still have to match, which is why every detail line in the spec uses a single space between tokens.
Keep SYSCALL_NAMES in sync. Index 0 is the invalid slot; index 1 is fork; etc. The user-side table in track.rs and the kernel's SysCalls enum must agree, or your report will print the wrong name for the right count.
STACK_PAGE_NUM = 25. Your STACK page count in track echo hello must be exactly 25. If it is not, your stack/heap boundary is off.

Grading