UNIX System Calls

A tour of the core UNIX system-call interface — fork, exec, wait, open, read, write, close, dup2, pipe — through nine small teaching programs in the Octox kernel (a Rust port of xv6). By the end of this lecture you should be able to read, modify, and write Octox user programs of your own.

Overview

Everything that a program does outside its own address space goes through a system call: opening files, reading bytes, spawning processes, sending data to another process. UNIX made two unusual design choices that have shaped every major operating system since:

Files, pipes, sockets, and devices are all "file descriptors." You read and write them with the same two syscalls (read, write).
Creating a new process is split into two steps. fork duplicates the caller, then exec replaces the program running in the duplicate. This seam is what lets a shell wire up redirection and pipelines without cooperation from the programs it launches.

The nine example programs in src/user/bin/ex_*.rs each isolate one idea. We'll walk them in order.

Learning Objectives

After this lecture you should be able to:

Explain what a system call is, and what happens on a RISC-V ecall trap.
List the core UNIX process and file syscalls and what each one does.
Trace fork/wait/exit and predict what each process sees.
Describe why exec does not return on success, and what survives across it.
Use dup2 to rewire a file descriptor — and show how a shell uses this to implement > redirection.
Set up a pipe between two child processes and explain why everyone must close the unused end.

Prerequisites

Advanced Architecture — the hardware story the kernel is sitting on top of.
Octox Guide — build, run, and user-program layout.
RISC-V assembly lectures — in particular the ecall instruction.

What Is a System Call?

A system call is a function whose body runs in the kernel's address space instead of yours. The hardware forces a protection transition. On RISC-V that transition is the ecall instruction: it saves the user PC, switches the CPU to supervisor mode, and jumps to a fixed trap vector.

flowchart LR
    U["User program<br/>fn main()"] --> W["sys::fork wrapper<br/>(ulib, user mode)"]
    W --> E["ecall<br/>trap"]
    E --> K["kernel trap<br/>handler"]
    K --> S["fork impl<br/>(kernel)"]
    S --> R["sret<br/>return"]
    R --> W
    W --> U

The RISC-V syscall ABI Octox uses:

Register	Holds
`a7`	syscall number
`a0`..`a5`	arguments
`a0`	return value after `sret` — `Ok` or negative error code

The ulib user library hides this: you call sys::fork(), it emits the ecall, the kernel runs, you get back a Result<usize>.
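To make the last row of the table concrete, here is a minimal host-side sketch of how a wrapper could turn the raw `a0` value into a `Result`. It uses only `std`. The name `decode_ret` and the use of `isize` for the error code are assumptions for illustration, not the actual ulib code.

```rust
/// Interpret a raw syscall return register as a Result.
/// Assumption: negative values are error codes, everything else is success.
fn decode_ret(raw_a0: isize) -> Result<usize, isize> {
    if raw_a0 < 0 {
        Err(raw_a0) // keep the negative code so the caller can inspect it
    } else {
        Ok(raw_a0 as usize) // non-negative: a pid, an fd, a byte count, ...
    }
}
```

With this shape, `decode_ret(0)` is the `Ok(0)` a forked child sees, and `decode_ret(-1)` surfaces as `Err(-1)`.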

The Octox Syscall Surface

Every program in this lecture uses only these wrappers from ulib::sys (generated from src/kernel/syscall.rs into usys.rs):

pub fn fork() -> Result<usize>                  // child: 0; parent: child pid
pub fn exit(xstatus: i32) -> !
pub fn wait(xstatus: &mut i32) -> Result<usize> // returns child pid
pub fn pipe(p: &mut [usize]) -> Result<()>      // p[0]=read fd, p[1]=write fd
pub fn read(fd: usize, buf: &mut [u8]) -> Result<usize>
pub fn write(fd: usize, b: &[u8]) -> Result<usize>
pub fn exec(filename: &str, argv: &[&str], envp: Option<&[Option<&str>]>)
    -> Result<usize>                            // does not return on success
pub fn open(filename: &str, flags: usize) -> Result<usize>
pub fn close(fd: usize) -> Result<()>
pub fn dup(fd: usize) -> Result<usize>
pub fn dup2(src: usize, dst: usize) -> Result<usize>
pub fn getpid() -> Result<usize>
pub fn sleep(n: usize) -> Result<()>

The print! and println! macros in ulib are not syscalls — they are thin wrappers that ultimately call sys::write(STDOUT_FILENO, ...). When we want to teach a syscall, we call it directly.

File Descriptors and Open Modes

A file descriptor is a small non-negative integer that indexes into a per-process table the kernel maintains. When you open a file you get back a fresh fd; when you fork, the child inherits a copy of that whole table; when you exec, the fd table survives (this is the key to how redirection works).

Three fds are already open when your program starts:

fd	Constant	Normally
0	`STDIN_FILENO`	keyboard / pipe input
1	`STDOUT_FILENO`	terminal / pipe output
2	`STDERR_FILENO`	terminal

open takes a flag mask built from ulib::sys::fcntl::omode:

Flag	Value	Meaning
`RDONLY`	`0x000`	read only
`WRONLY`	`0x001`	write only
`RDWR`	`0x002`	read and write
`CREATE`	`0x200`	create file if it does not exist
`TRUNC`	`0x400`	shrink file to 0 bytes on open
`APPEND`	`0x800`	position writes at end of file

The canonical "open for output" combination is WRONLY | CREATE | TRUNC — create it if missing, wipe it if present.
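The flag arithmetic can be checked on the host with a small sketch. The constants are copied from the table above. The helper names (`access_mode`, `is_writable`, `has`) are hypothetical, and treating the low two bits as the access mode is an assumption based on the table's values, not a statement about the Octox kernel source.

```rust
// Flag values copied from the table above.
const RDONLY: usize = 0x000;
const WRONLY: usize = 0x001;
const RDWR: usize = 0x002;
const CREATE: usize = 0x200;
const TRUNC: usize = 0x400;
const APPEND: usize = 0x800;

/// Assumed layout: the access mode occupies the low two bits.
/// RDONLY is 0, so `flags & RDONLY` can never detect it; mask and compare.
fn access_mode(flags: usize) -> usize {
    flags & 0x3
}

/// True when the mask allows writing (WRONLY or RDWR).
fn is_writable(flags: usize) -> bool {
    let mode = access_mode(flags);
    mode == WRONLY || mode == RDWR
}

/// True when a single option bit such as CREATE or TRUNC is set.
fn has(flags: usize, bit: usize) -> bool {
    flags & bit != 0
}
```

Here `WRONLY | CREATE | TRUNC` evaluates to `0x601`, which is writable, has `TRUNC` set, and does not have `APPEND` set.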

1. `ex_args` — Command-Line Arguments

When the kernel executes a program, it places the argv array on the new stack. The lang_start wrapper in ulib publishes it through env::args(). argv[0] is the program name; argv[1..] are what the user typed.

#![no_std]
use ulib::{env, print, println};

fn main() {
    // Iterate every entry (including argv[0]) and echo it on its own line.
    for arg in env::args() {
        println!("{}", arg);
    }
}

Try it:

$ ex_args hello world
ex_args
hello
world

2. `ex_count` — `open` + `read` Loop + `close`

A stripped-down wc -c. The lesson is the read loop: read may return fewer bytes than the buffer holds, and returns Ok(0) at end-of-file, so the caller must loop until it sees that zero.

#![no_std]
use ulib::{env, print, println, sys, sys::fcntl::omode};

fn main() {
    let mut args = env::args().skip(1);
    let path = args.next().expect("usage: ex_count FILE");

    // open() returns a small integer file descriptor. RDONLY means we
    // only plan to read from it.
    let fd = sys::open(path, omode::RDONLY).expect("open");

    // read() is allowed to return fewer bytes than the buffer holds,
    // and returns Ok(0) at end-of-file — so the caller must loop.
    let mut buf = [0u8; 512];
    let mut count: usize = 0;
    loop {
        let n = sys::read(fd, &mut buf).expect("read");
        if n == 0 {
            break; // EOF
        }
        count += n;
    }

    // Always release the descriptor when done.
    sys::close(fd).expect("close");

    println!("{}: {} bytes", path, count);
}

Key idea: "partial reads are normal." A single read call is one trip into the kernel, and the kernel is allowed to give you whatever is conveniently available right now. Loop until EOF.
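The same loop can be exercised on a host machine without Octox. The sketch below uses `std::io::Read` in place of `sys::read`. `Trickle` is a hypothetical reader that deliberately returns short reads, so the loop is actually tested against partial results.

```rust
use std::io::{self, Read};

/// A reader that hands out at most `chunk` bytes per call, imitating a
/// kernel that returns short reads.
struct Trickle<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl<'a> Read for Trickle<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.data.len().min(self.chunk).min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n) // Ok(0) once the data is exhausted, i.e. EOF
    }
}

/// Count bytes by looping until read() reports EOF with Ok(0).
fn count_bytes<R: Read>(mut r: R) -> io::Result<usize> {
    let mut buf = [0u8; 512];
    let mut count = 0;
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            return Ok(count);
        }
        count += n;
    }
}
```

Feeding it `Trickle { data: b"hello, world", chunk: 5 }` takes three non-empty reads (5, 5, and 2 bytes) before the final `Ok(0)`, and the total is still 12.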

3. `ex_write` — `open(... CREATE | TRUNC)` + `write`

Mirror image of ex_count: open for output, write bytes. write is also allowed to accept fewer bytes than you offered, so we loop.

#![no_std]
use ulib::{env, sys, sys::fcntl::omode};

fn main() {
    let mut args = env::args().skip(1);
    let path = args.next().expect("usage: ex_write FILE TEXT");
    let text = args.next().expect("usage: ex_write FILE TEXT");

    // Flag semantics:
    //   WRONLY  — open for writing only
    //   CREATE  — create the file if it does not exist
    //   TRUNC   — if it already exists, shrink it back to 0 bytes
    let fd = sys::open(path, omode::WRONLY | omode::CREATE | omode::TRUNC)
        .expect("open");

    // write() is permitted to accept fewer bytes than we offered, so
    // loop until every byte of TEXT has been delivered.
    let bytes = text.as_bytes();
    let mut off = 0;
    while off < bytes.len() {
        let n = sys::write(fd, &bytes[off..]).expect("write");
        off += n;
    }

    sys::close(fd).expect("close");
}

Try it:

$ ex_write /tmp/x hello
$ ex_count /tmp/x
/tmp/x: 5 bytes
$ cat /tmp/x
hello
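A host-side sketch of the write loop, using `std::io::Write` instead of `sys::write`. `Stingy` is a hypothetical sink that accepts only a few bytes per call. The check for `Ok(0)` is an addition that `ex_write` does not have: it turns a writer that makes no progress into an error rather than an endless loop.

```rust
use std::io::{self, Write};

/// A sink that accepts at most `limit` bytes per call.
struct Stingy {
    out: Vec<u8>,
    limit: usize,
}

impl Write for Stingy {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Deliver every byte, treating Ok(0) as an error instead of spinning.
fn write_fully<W: Write>(w: &mut W, bytes: &[u8]) -> io::Result<()> {
    let mut off = 0;
    while off < bytes.len() {
        let n = w.write(&bytes[off..])?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::WriteZero, "no progress"));
        }
        off += n;
    }
    Ok(())
}
```

With `limit: 2`, writing `b"hello"` takes three calls and still delivers all five bytes.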

4. `ex_fork` — Two Processes from One

fork is the most famous syscall in UNIX and the one that confuses students the most. A single call returns twice: once in the parent with the child's pid, once in the child with 0. After fork, parent and child are two independent processes with separate copies of every variable.

#![no_std]
use ulib::{print, println, sys};

fn main() {
    // Both parent and child will see x == 100 immediately after fork.
    // The child then bumps its own copy; the parent's copy is untouched.
    let mut x: i32 = 100;
    println!("before fork: x={}", x);

    // fork() returns:
    //   Ok(0)         in the child
    //   Ok(child_pid) in the parent
    match sys::fork().expect("fork") {
        0 => {
            // --- child ---
            x += 1;
            println!("child : pid={} x={}", sys::getpid().unwrap(), x);
            // Exit explicitly so the child never falls through to the
            // parent branch below.
            sys::exit(0);
        }
        child_pid => {
            // --- parent ---
            // wait() blocks until some child exits and writes that
            // child's exit status into the i32 we hand it.
            let mut status: i32 = 0;
            sys::wait(&mut status).expect("wait");
            println!(
                "parent: pid={} x={} (child {} exited with {})",
                sys::getpid().unwrap(), x, child_pid, status
            );
        }
    }
}

Copy-on-write

A real UNIX kernel does not literally duplicate every page of memory at fork. It shares the pages between parent and child read-only, and only copies a page when one side writes to it. From a correctness standpoint you can pretend it was a full copy — the mutations are invisible to the other process either way.
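Rust's `Rc::make_mut` follows the same share-then-copy-on-first-write pattern, which makes it a convenient analogy to run on the host. This is only an analogy: a kernel does this per memory page using page faults, not reference-counted pointers. The function name `cow_demo` is hypothetical.

```rust
use std::rc::Rc;

/// Returns (shared_before_write, shared_after_write, parent_x, child_x).
fn cow_demo() -> (bool, bool, i32, i32) {
    let parent = Rc::new(100); // one "page" holding x
    let mut child = Rc::clone(&parent); // "fork": share the page, copy nothing

    let shared_before = Rc::ptr_eq(&parent, &child);

    // First write: make_mut sees the page is shared, so it clones the value
    // into a private allocation for the child before mutating it.
    *Rc::make_mut(&mut child) += 1;

    let shared_after = Rc::ptr_eq(&parent, &child);
    (shared_before, shared_after, *parent, *child)
}
```

The result is `(true, false, 100, 101)`: the data is shared until the child writes, and the parent's value is untouched afterwards.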

Try it:

$ ex_fork
before fork: x=100
child : pid=6 x=101
parent: pid=5 x=100 (child 6 exited with 0)

Notice the parent's x is still 100.

5. `ex_exec` — Replace the Running Program

exec takes a filename plus an argv and replaces the current process's program image with that binary. On success it does not return; the old code is gone, and execution starts at the new program's entry point.

#![no_std]
use ulib::{print, println, sys};

fn main() {
    match sys::fork().expect("fork") {
        0 => {
            // --- child ---
            // The "sleep" command lives at /bin/sleep. sleep takes one
            // argument: seconds to pause.
            let argv = ["sleep", "10"];
            sys::exec("/bin/sleep", &argv, None).expect("exec");

            // exec() never returns on success, and expect() panics on
            // failure, so this exit is only a defensive backstop.
            sys::exit(1);
        }
        child => {
            // --- parent ---
            println!("parent: launched child pid={}, waiting...", child);
            let mut status: i32 = 0;
            let reaped = sys::wait(&mut status).expect("wait");
            println!("parent: child {} exited with {}", reaped, status);
        }
    }
}

Key idea: fork + exec is the UNIX answer to "run another program." fork gives you a process you own; exec gives it a different program to run. Between the two calls is where a shell does its setup work (redirection, pipes, close-on-exec, setuid, ...). That's the point of splitting the operation.

About the binary names

In Octox every user binary is built as _<name> (see src/user/Cargo.toml) so it does not clash with host-system binaries of the same name during the cross-build. mkfs strips the leading _ when writing the program into fs.img, so on the running system it lives at /bin/<name>. exec requires a full path — there is no PATH lookup in the kernel.

6. `ex_redir` — Rewiring Your Own Stdout

Before we can redirect other programs we need one more syscall: dup2. dup2(src, dst) atomically:

closes whatever dst currently refers to (if anything), then
makes dst an alias for the same underlying file as src.

After dup2(fd, 1), writes to fd 1 go wherever fd (fd 3, say) was pointing. The program doesn't have to know it is "redirected." This example only forks, with no exec: the child keeps running ex_redir, but with its stdout replaced.
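The two steps of dup2 can be modelled with a toy fd table. This is a host-side sketch, not kernel code: the `FdTable` alias and both functions are hypothetical, and an `Rc<String>` stands in for the kernel's open-file object so that aliasing can be checked with `Rc::ptr_eq`.

```rust
use std::rc::Rc;

/// One slot per fd; the Rc stands in for the kernel's open-file object.
type FdTable = Vec<Option<Rc<String>>>;

/// Toy dup2: drop whatever `dst` held, then alias it to `src`'s object.
fn dup2(table: &mut FdTable, src: usize, dst: usize) -> Result<usize, ()> {
    // Fail if `src` is out of range or not open.
    let file = table.get(src).and_then(|slot| slot.clone()).ok_or(())?;
    if dst >= table.len() {
        table.resize(dst + 1, None);
    }
    table[dst] = Some(file); // the old occupant of dst is dropped ("closed")
    Ok(dst)
}

/// Toy close: empty the slot; error if it was not open.
fn close(table: &mut FdTable, fd: usize) -> Result<(), ()> {
    table
        .get_mut(fd)
        .and_then(|slot| slot.take())
        .map(|_| ())
        .ok_or(())
}
```

After `dup2(&mut t, 3, 1)` followed by `close(&mut t, 3)`, slot 1 still refers to the file object: closing the original descriptor does not close the file, because fd 1 keeps it alive.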

#![no_std]
use ulib::{env, print, println, stdio::STDOUT_FILENO, sys, sys::fcntl::omode};

fn main() {
    let mut args = env::args().skip(1);
    let path = args.next().expect("usage: ex_redir OUTFILE");

    match sys::fork().expect("fork") {
        0 => {
            // --- child ---
            let fd = sys::open(path, omode::WRONLY | omode::CREATE | omode::TRUNC)
                .expect("open");

            // Redirect stdout. After this call, fd 1 refers to the file.
            sys::dup2(fd, STDOUT_FILENO).expect("dup2");

            // fd 3 (or whatever the original was) is now redundant: the
            // file is still referenced by fd 1. Closing it avoids leaking
            // a descriptor across exec or just within this process.
            sys::close(fd).expect("close");

            // println! writes to fd 1 — which is now the file.
            println!("hello from child: my stdout is redirected");
            sys::exit(0);
        }
        _ => {
            // --- parent ---
            let mut status: i32 = 0;
            sys::wait(&mut status).expect("wait");
            // The parent never touched its own fd 1, so this prints to
            // the terminal, not to the file.
            println!("parent: child finished; my stdout is still the terminal");
        }
    }
}

Try it:

$ ex_redir /tmp/r
parent: child finished; my stdout is still the terminal
$ cat /tmp/r
hello from child: my stdout is redirected

7. `ex_redir2` — Redirection Survives `exec`

The point of this variant is a single fact: a process's fd table is preserved across exec. So if we set up fd 1 before calling exec, the new program's writes to stdout land in the file. The program being exec'd does not need to know, or cooperate.

This is exactly how cmd > file works in a UNIX shell.

#![no_std]
use ulib::{env, print, println, stdio::STDOUT_FILENO, sys, sys::fcntl::omode};

fn main() {
    let mut args = env::args().skip(1);
    let path = args.next().expect("usage: ex_redir2 OUTFILE");

    match sys::fork().expect("fork") {
        0 => {
            // --- child ---
            // Open the output file and splice it onto stdout.
            let fd = sys::open(path, omode::WRONLY | omode::CREATE | omode::TRUNC)
                .expect("open");
            sys::dup2(fd, STDOUT_FILENO).expect("dup2");
            sys::close(fd).expect("close");

            // Now exec() into /bin/echo. Because fd 1 is inherited, the
            // echo program's output lands in the file rather than on
            // the terminal. This is why redirection "just works" for
            // arbitrary programs — they never need to know they are
            // being redirected.
            let argv = ["echo", "hello", "from", "exec"];
            sys::exec("/bin/echo", &argv, None).expect("exec");
            sys::exit(1); // defensive: expect() already panics if exec fails
        }
        _ => {
            // --- parent ---
            let mut status: i32 = 0;
            sys::wait(&mut status).expect("wait");
            println!("parent: child exited with {}", status);
        }
    }
}

Recipe for cmd > file:

fork()
if child:
    fd = open(file, WRONLY | CREATE | TRUNC)
    dup2(fd, 1)
    close(fd)
    exec(cmd, argv)
else:
    wait()
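For comparison, the same recipe can be run on a Unix-like host through `std::process::Command`, which performs the fork, fd setup, and exec internally. This is a sketch under assumptions: the function name `run_redirected` is hypothetical, and it expects the named program to be on the host's `PATH`.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Run `program args...` with its stdout sent to `out_path`,
/// the host-side equivalent of `cmd > file`.
fn run_redirected(program: &str, args: &[&str], out_path: &Path) -> io::Result<ExitStatus> {
    // File::create opens write-only, creating or truncating the file:
    // the same intent as WRONLY | CREATE | TRUNC.
    let file = File::create(out_path)?;
    Command::new(program)
        .args(args)
        .stdout(Stdio::from(file)) // the file becomes the child's fd 1
        .status() // spawn the child, then wait for it to exit
}
```

Calling `run_redirected("echo", &["hello", "from", "exec"], path)` leaves the text in the file, and `echo` itself contains no redirection logic.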

8. `ex_pipe` — Talking Through a Pipe

A pipe is a one-way in-kernel byte buffer exposed as a pair of file descriptors. sys::pipe(&mut p) fills in p[0] (the read end) and p[1] (the write end). Bytes written to p[1] come back out of p[0].

After fork, both processes hold both ends. By convention each side closes the end it does not use. This is not a stylistic choice:

The EOF rule

read returns Ok(0) (EOF) on a pipe only when every open write end has been closed. If the reader forgets to close its own copy of the write end, the EOF it is waiting for never arrives: the kernel still counts the reader itself as a writer, so the read blocks forever.
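The rule can be written down as a tiny state machine. The model below is a hypothetical teaching aid, not the Octox pipe implementation: it tracks only how many bytes are buffered and how many write-end descriptors remain open.

```rust
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Data(usize), // bytes handed to the reader
    Eof,         // read would return Ok(0)
    Blocks,      // read would sleep until a writer acts
}

/// Toy pipe: buffered byte count plus the number of open write-end fds.
struct PipeModel {
    buffered: usize,
    open_write_ends: usize,
}

impl PipeModel {
    /// Decide what a read() of up to `want` bytes would do right now.
    fn read(&mut self, want: usize) -> ReadOutcome {
        if self.buffered > 0 {
            let n = want.min(self.buffered);
            self.buffered -= n;
            ReadOutcome::Data(n)
        } else if self.open_write_ends == 0 {
            ReadOutcome::Eof
        } else {
            ReadOutcome::Blocks
        }
    }

    /// One process closes its copy of the write end.
    fn close_write_end(&mut self) {
        self.open_write_ends = self.open_write_ends.saturating_sub(1);
    }
}
```

Right after fork there are two open write ends, one per process. If only the child closes its copy, an empty pipe yields `Blocks`. Only after the second close does it yield `Eof`.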

#![no_std]
use ulib::{print, println, sys, stdio::STDOUT_FILENO};

fn main() {
    let mut p = [0usize; 2];
    sys::pipe(&mut p).expect("pipe");
    let (read_fd, write_fd) = (p[0], p[1]);

    match sys::fork().expect("fork") {
        0 => {
            // --- child: the writer ---
            // We will not read from the pipe, so close that end.
            sys::close(read_fd).expect("close");

            let msg = b"hello from child\n";
            sys::write(write_fd, msg).expect("write");

            // Closing the write end is what lets the parent's read()
            // return 0 (EOF) once we are done.
            sys::close(write_fd).expect("close");
            sys::exit(0);
        }
        _ => {
            // --- parent: the reader ---
            // Symmetric: we will not write, so close the write end.
            // This is important — if we left it open and then tried to
            // read until EOF, we would block forever because the kernel
            // thinks *we* are still a writer.
            sys::close(write_fd).expect("close");

            let mut buf = [0u8; 64];
            let n = sys::read(read_fd, &mut buf).expect("read");
            sys::close(read_fd).expect("close");

            // Echo exactly what we received to our own stdout.
            sys::write(STDOUT_FILENO, &buf[..n]).expect("write");

            let mut status: i32 = 0;
            sys::wait(&mut status).expect("wait");
            println!("parent: child exited with {}", status);
        }
    }
}

9. `ex_pipe2` — The Shell's Pipeline: `ls | wc`

Combine every idea so far. Two children, two execs, one pipe. Each child dup2s one end of the pipe onto its stdin or stdout before exec, so ls and wc run as if they had a normal terminal on the other side.

flowchart LR
    P1["child1<br/>ls"] -- "stdout&nbsp;=&nbsp;write_fd" --> PIPE["pipe<br/>buffer"]
    PIPE -- "read_fd&nbsp;=&nbsp;stdin" --> P2["child2<br/>wc"]
    PARENT["parent<br/>(closes both ends,<br/>waits twice)"] -.-> P1
    PARENT -.-> P2

#![no_std]
use ulib::{
    print, println,
    stdio::{STDIN_FILENO, STDOUT_FILENO},
    sys,
};

fn main() {
    let mut p = [0usize; 2];
    sys::pipe(&mut p).expect("pipe");
    let (read_fd, write_fd) = (p[0], p[1]);

    // --- first child: the producer (`ls`) ---
    match sys::fork().expect("fork") {
        0 => {
            // Redirect stdout to the pipe's write end.
            sys::dup2(write_fd, STDOUT_FILENO).expect("dup2");

            // After dup2, fd 1 already refers to the write end. The
            // original pipe fds are redundant in this process, and we
            // must close the read end because we are not going to read.
            sys::close(read_fd).expect("close");
            sys::close(write_fd).expect("close");

            let argv = ["ls"];
            sys::exec("/bin/ls", &argv, None).expect("exec");
            sys::exit(1);
        }
        _ => {}
    }

    // --- second child: the consumer (`wc`) ---
    match sys::fork().expect("fork") {
        0 => {
            // Redirect stdin to the pipe's read end.
            sys::dup2(read_fd, STDIN_FILENO).expect("dup2");

            // Same cleanup as above, mirror image.
            sys::close(read_fd).expect("close");
            sys::close(write_fd).expect("close");

            let argv = ["wc"];
            sys::exec("/bin/wc", &argv, None).expect("exec");
            sys::exit(1);
        }
        _ => {}
    }

    // --- parent ---
    // Close BOTH pipe ends here. If we did not, the kernel would still
    // count us as a writer, and wc would block on read() forever
    // waiting for an EOF that never comes.
    sys::close(read_fd).expect("close");
    sys::close(write_fd).expect("close");

    // Reap both children. wait() returns whichever finishes first.
    let mut status: i32 = 0;
    sys::wait(&mut status).expect("wait 1");
    sys::wait(&mut status).expect("wait 2");
    println!("parent: both children finished");
}

The classic deadlock

The parent calls pipe before any of the children exist, so the parent holds both ends too. If the parent forgets to close them, wc will hang forever on read waiting for an EOF that never comes — because the kernel still sees the parent as a live writer.
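On a Unix-like host, `std::process::Command` can build the same two-stage pipeline, and its ownership rules help avoid the stray-writer problem. This is a sketch under assumptions: `command_from` and `pipeline` are hypothetical names, each argv slice starts with a program found on `PATH`, and the remark about the write end reflects how the standard library is understood to behave after `spawn`.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Build a Command from an argv-style slice (program name first).
fn command_from(argv: &[&str]) -> io::Result<Command> {
    let (program, args) = argv
        .split_first()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "empty argv"))?;
    let mut cmd = Command::new(program);
    cmd.args(args);
    Ok(cmd)
}

/// Run `producer | consumer` and return the consumer's stdout as text.
fn pipeline(producer: &[&str], consumer: &[&str]) -> io::Result<String> {
    let mut first = command_from(producer)?
        .stdout(Stdio::piped()) // std creates the pipe for us
        .spawn()?;

    // We hold only the read end here. Moving it into the second command
    // makes it that child's stdin; no write end is left open in this
    // process, so the consumer can see EOF when the producer exits.
    let read_end = first.stdout.take().expect("stdout was piped");
    let output = command_from(consumer)?
        .stdin(Stdio::from(read_end))
        .output()?; // spawn, collect stdout, and wait for the consumer

    first.wait()?; // reap the producer as well
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

For example, `pipeline(&["echo", "hello"], &["wc", "-c"])` runs the equivalent of `echo hello | wc -c` and returns the byte count as text.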

Building a Shell from Seven Syscalls

Every feature a minimalist shell provides reduces to a small fixed recipe over these syscalls:

Shell feature	Syscall recipe
Run a command	`fork` → child `exec`; parent `wait`
Run in background	`fork` → child `exec`; parent does not `wait`
`cmd > file`	`fork` → `open` + `dup2(fd,1)` + `close` → `exec`
`cmd < file`	`fork` → `open` + `dup2(fd,0)` + `close` → `exec`
`cmd1 | cmd2`	`pipe`; `fork` twice; each child rewires one end; parent closes both, `wait` twice
Exit a subprocess	`exit(status)`
Reap child	`wait(&mut status)`

The Octox shell (src/user/bin/sh.rs) is implemented with exactly this toolkit.

Key Concepts

Concept	Takeaway
System call	User code asks the kernel to do something via `ecall`; returns to user.
File descriptor	Small integer indexing a per-process open-file table.
`fork` returns twice	`0` in child, `child_pid` in parent; memory is (logically) copied.
`exec` does not return	Success replaces the image; only failure paths run the code after it.
fds survive `fork`	Child inherits a copy of the whole fd table.
fds survive `exec`	New program runs with the fd table the caller set up.
`dup2(src, dst)`	Atomically closes `dst`, then aliases it onto `src`.
Pipe EOF rule	Readers see EOF only after every write-end fd is closed.
Partial `read`/`write`	Always loop; never assume a single call moved all bytes.

Practice Problems

Problem 1 — What does `ex_fork` print?

If you change x += 1; in the child branch to x += 50;, what does the parent line print? Why?

Solution

The parent still prints `x=100`. `fork` gives the child a separate copy of `x`; whatever the child does to its copy is invisible to the parent. Only the child's line changes (to `x=150`).

Problem 2 — The hanging pipeline

In ex_pipe2, delete the two sys::close calls the parent makes on the pipe fds (lines 72–73, the pair directly under the "--- parent ---" comment). The program now hangs. Which process is stuck, on which call, and why?

Solution

`wc` (child 2) hangs inside its final `read` on stdin. The parent still holds the write end of the pipe open, so the kernel cannot signal EOF to the reader even after `ls` has exited and closed its own write end. `read` waits for more data forever. The parent's first `sys::wait` returns once `ls` exits, but the second blocks waiting for `wc`, so the whole program is stuck. Lesson: **everyone** who ever held a pipe fd must close the ends they do not use. The reader needs to see the *last* write-end fd go away.

Problem 3 — A three-command pipeline

Sketch the syscalls needed to implement cmd1 | cmd2 | cmd3. How many pipes? How many forks? How many close calls does the parent make?

Solution

Two pipes, three `fork`s. Let `p1` and `p2` be the two pipes.

- child 1: `dup2(p1[1], 1)`; close `p1[0]`, `p1[1]`, `p2[0]`, `p2[1]`; `exec(cmd1)`.
- child 2: `dup2(p1[0], 0)`; `dup2(p2[1], 1)`; close all four raw fds; `exec(cmd2)`.
- child 3: `dup2(p2[0], 0)`; close all four raw fds; `exec(cmd3)`.
- parent: close all four raw fds (`p1[0]`, `p1[1]`, `p2[0]`, `p2[1]`); `wait` three times.

Four closes in the parent (one per pipe end), plus the four each child did for cleanup. Missing any of the closes in the parent can deadlock the pipeline.
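The counting generalises to any number of stages. The sketch below assumes, as in the solution above, that the parent creates every pipe before forking, so each child inherits all of the raw pipe fds. `PipelinePlan` and `plan` are hypothetical names.

```rust
/// Resource counts for an n-stage pipeline `cmd1 | cmd2 | ... | cmdN`.
#[derive(Debug, PartialEq)]
struct PipelinePlan {
    pipes: usize,
    forks: usize,
    parent_closes: usize,
    closes_per_child: usize,
}

/// Assumes all pipes are created up front, before any fork.
fn plan(stages: usize) -> PipelinePlan {
    assert!(stages >= 1, "a pipeline needs at least one command");
    let pipes = stages - 1; // one pipe between each adjacent pair
    let raw_fds = 2 * pipes; // every pipe has a read end and a write end
    PipelinePlan {
        pipes,
        forks: stages,
        parent_closes: raw_fds,
        closes_per_child: raw_fds,
    }
}
```

`plan(3)` gives 2 pipes, 3 forks, and 4 closes in the parent and in each child. `plan(2)` gives 1 pipe, 2 forks, and 2 closes each, matching ex_pipe2.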
