# Scanning in Rust

## CS 631 Systems Foundations

---

## The Rust Compilation Pipeline

Flowchart: `.rs` files + `Cargo.toml` → Cargo/rustc → compiled binary (`target/debug/` or `target/release/`)

```bash
cargo new myproject      # Create new project
cargo build              # Build (debug mode)
cargo run                # Build and run
cargo build --release    # Optimized release build
```

---

## Project Structure

```text
myproject/
├── Cargo.toml          # Project manifest
├── src/
│   ├── main.rs         # Main entry point
│   ├── lib.rs          # Library entry point
│   └── bin/            # Additional binaries
│       ├── scan1.rs    # cargo run --bin scan1
│       └── scan2.rs    # cargo run --bin scan2
└── target/             # Build artifacts
```

---

## Hello World in Rust

```rust
// hello.rs - main() is the entry point of every Rust program
fn main() {
    println!("Hello, World!");
}
```

- `println!` is a **macro** (note the `!`)
- A final expression with no trailing semicolon is the function's return value
- `main` returns `()`, so there is no need for `return 0;`

---

## Native Data Types

```rust
fn main() {
    // Signed integers (explicit bit width)
    let a: i8 = -128;              // 8-bit signed
    let b: i32 = -2_147_483_648;   // 32-bit signed (default integer type)
    let c: i64 = 0;                // 64-bit signed

    // Unsigned integers
    let d: u8 = 255;               // 8-bit unsigned
    let e: u32 = 0xDEAD_BEEF;      // 32-bit unsigned
    let f: u64 = 0x123_456_789;    // 64-bit unsigned

    // Other primitives
    let g: char = 'A';             // 4 bytes (Unicode scalar value)
    let h: bool = true;            // 1 byte
    let i: f64 = 3.14159;          // 64-bit float (default float type)
}
```

---

## Type Comparison: C vs Rust

| C Type | Rust Type | Size |
|--------|-----------|------|
| `int32_t` | `i32` | 4 bytes |
| `uint32_t` | `u32` | 4 bytes |
| `int64_t` | `i64` | 8 bytes |
| `uint64_t` | `u64` | 8 bytes |
| `size_t` | `usize` | platform-dependent |
| `char` | `u8` or `char` | 1 or 4 bytes |
| `double` | `f64` | 8 bytes |

---

## Arrays and Vectors

```rust
fn main() {
    // Arrays: fixed size, stack-allocated
    let arr: [i32; 5] = [1, 2, 3, 4, 5];
    println!("arr[0] = {}", arr[0]);

    // Initialize all elements to the same value
    let zeros: [i32; 10] = [0; 10];

    // Vectors: dynamic size, heap-allocated
    let mut vec: Vec<i32> = Vec::new();
    vec.push(10);
    vec.push(20);

    // Vector with initial values
    let primes = vec![2, 3, 5, 7, 11, 13];
}
```

---

## Structs

```rust
// Define a struct
struct Person {
    id: u32,
    name: String,
}

fn main() {
    // Create a struct instance
    let person = Person {
        id: 12345,
        name: String::from("Alice"),
    };

    // Access struct fields
    println!("ID: {}", person.id);
    println!("Name: {}", person.name);
}
```

---

## Enums with Associated Data

```rust
// Rust enums can hold data in each variant
#[derive(Debug)]
enum Token {
    IntLit(String), // Holds a String
    Plus,           // No data
    Minus,          // No data
    Mult,           // No data (used by scan_token later)
    Div,            // No data (used by scan_token later)
    Eot,            // No data
}

fn main() {
    let t1 = Token::IntLit(String::from("42"));
    let t2 = Token::Plus;

    println!("{:?}", t1); // IntLit("42")
    println!("{:?}", t2); // Plus
}
```

This is more powerful than C enums, which are just numbers!
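
As a minimal sketch of what the attached data makes possible, the example below reads the `String` stored in an `IntLit` variant with `match`. The `describe` function and its messages are hypothetical, introduced only for this illustration; they are not part of the course scanner.

```rust
// Hypothetical illustration: `describe` is not part of the course code.
#[derive(Debug)]
enum Token {
    IntLit(String), // Carries the digits that were scanned
    Plus,
    Minus,
    Eot,
}

// Build a human-readable description; only IntLit has data to extract.
fn describe(token: &Token) -> String {
    match token {
        Token::IntLit(digits) => format!("integer literal {}", digits),
        Token::Plus => String::from("plus operator"),
        Token::Minus => String::from("minus operator"),
        Token::Eot => String::from("end of text"),
    }
}

fn main() {
    let tokens = [
        Token::IntLit(String::from("42")),
        Token::Plus,
        Token::Minus,
        Token::Eot,
    ];
    for t in &tokens {
        println!("{:?} -> {}", t, describe(t));
    }
}
```

A C enum would need a separate field or union to carry the digits alongside the tag; here the variant and its data travel together.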
---

## Ownership Rule 1

> Each value in Rust has exactly **one owner**

```rust
fn main() {
    let s1 = String::from("hello"); // s1 owns the String
    let s2 = s1;                    // Ownership MOVES to s2

    // Error! s1 is no longer valid
    // println!("{}", s1); // compile error!

    println!("{}", s2); // OK - s2 owns it now
}
```

---

## Ownership Rule 2

> When the owner goes out of scope, the value is **dropped** (freed)

```rust
fn main() {
    {
        let s = String::from("hello");
        // s is valid here
    } // s is dropped here - memory freed automatically!

    // No garbage collector needed
    // No manual free() needed
}
```

---

## Ownership Rule 3

> Ownership can be **moved** to a new owner

```rust
fn take_ownership(s: String) {
    println!("{}", s);
} // s is dropped here

fn main() {
    let s = String::from("hello");
    take_ownership(s); // s moves into the function

    // Error! s was moved
    // println!("{}", s);
}
```

---

## References and Borrowing

**References** allow access without taking ownership:

```rust
fn calculate_length(s: &String) -> usize {
    s.len()
} // s goes out of scope, but it doesn't own the String

fn main() {
    let s = String::from("hello");

    // &s creates an immutable reference (borrow)
    let len = calculate_length(&s);

    // s is still valid!
    println!("'{}' has length {}", s, len);
}
```

---

## Mutable References

To **modify** borrowed data, use `&mut`:

```rust
fn add_world(s: &mut String) {
    s.push_str(", world!");
}

fn main() {
    let mut s = String::from("hello");
    add_world(&mut s); // Mutable borrow
    println!("{}", s); // "hello, world!"
}
```

---

## Borrowing Rules

1. At any given time, you can have **either**:
   - One mutable reference (`&mut T`), OR
   - Any number of immutable references (`&T`)
2. References must always be **valid**

```rust
fn main() {
    let mut s = String::from("hello");

    let r1 = &s; // OK
    let r2 = &s; // OK - multiple immutable refs
    println!("{} {}", r1, r2);

    let r3 = &mut s; // OK - r1, r2 are no longer used
    r3.push_str("!");
}
```

---

## String vs &str

| Type | Ownership | Storage | Use Case |
|------|-----------|---------|----------|
| `String` | Owned | Heap | Mutable, growable |
| `&str` | Borrowed | Anywhere | Read-only slice |

```rust
fn main() {
    // String: owned, heap-allocated
    let mut owned = String::from("hello");
    owned.push_str(", world");

    // &str: borrowed, often from literals
    let borrowed: &str = "hello";

    // Converting
    let s: String = borrowed.to_string(); // &str -> String
    let slice: &str = &s;                 // String -> &str
}
```

---

## Pattern Matching with `match`

```rust
fn token_name(token: &Token) -> &str {
    match token {
        Token::IntLit(_) => "TK_INTLIT", // _ ignores the value
        Token::Plus => "TK_PLUS",
        Token::Minus => "TK_MINUS",
        Token::Mult => "TK_MULT",
        Token::Div => "TK_DIV",
        Token::Eot => "TK_EOT",
    }
}

fn token_value(token: &Token) -> &str {
    match token {
        Token::IntLit(s) => s, // Extract the String
        Token::Plus => "+",
        Token::Minus => "-",
        Token::Mult => "*",
        Token::Div => "/",
        Token::Eot => "",
    }
}
```

---

## Scanner Struct

```rust
struct Scanner {
    chars: Vec<char>, // Input as character array
    pos: usize,       // Current position
}

impl Scanner {
    fn new(input: &str) -> Scanner {
        Scanner {
            chars: input.chars().collect(),
            pos: 0,
        }
    }
}
```

---

## Scanner Helper Methods

```rust
impl Scanner {
    fn at_end(&self) -> bool {
        self.pos >= self.chars.len()
    }

    fn current(&self) -> Option<char> {
        if self.at_end() {
            None
        } else {
            Some(self.chars[self.pos])
        }
    }

    fn advance(&mut self) {
        self.pos += 1;
    }

    fn skip_whitespace(&mut self) {
        while let Some(ch) = self.current() {
            if ch == ' ' || ch == '\t' {
                self.advance();
            } else {
                break;
            }
        }
    }
}
```

---

## Scanning Integer Literals

```rust
impl Scanner {
    fn scan_intlit(&mut self) -> String {
        let mut value = String::new();
        while let Some(ch) = self.current() {
            if ch.is_ascii_digit() {
                value.push(ch);
                self.advance();
            } else {
                break;
            }
        }
        value
    }
}
```

---

## The scan_token Function

```rust
impl Scanner {
    fn scan_token(&mut self) -> Token {
        self.skip_whitespace();
        match self.current() {
            None => Token::Eot,
            Some(ch) => {
                if ch.is_ascii_digit() {
                    Token::IntLit(self.scan_intlit())
                } else {
                    self.advance();
                    match ch {
                        '+' => Token::Plus,
                        '-' => Token::Minus,
                        '*' => Token::Mult,
                        '/' => Token::Div,
                        _ => {
                            eprintln!("scan error: {}", ch);
                            std::process::exit(1);
                        }
                    }
                }
            }
        }
    }
}
```

---

## Scanning All Tokens

```rust
impl Scanner {
    fn scan_all(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        loop {
            let token = self.scan_token();
            let is_eot = matches!(token, Token::Eot);
            tokens.push(token);
            if is_eot {
                break;
            }
        }
        tokens
    }
}
```

---

## Complete Example: main()

```rust
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: scan2");
        std::process::exit(1);
    }

    let mut scanner = Scanner::new(&args[1]);
    let tokens = scanner.scan_all();

    // Token does not implement Display, so print name and value explicitly
    for token in &tokens {
        println!("{}(\"{}\")", token_name(token), token_value(token));
    }
}
```

---

## Running the Scanner

```bash
$ cargo run --bin scan2 -- "10 + 20 * 3"
TK_INTLIT("10")
TK_PLUS("+")
TK_INTLIT("20")
TK_MULT("*")
TK_INTLIT("3")
TK_EOT("")
```

---

## C vs Rust Comparison

| Aspect | C | Rust |
|--------|---|------|
| Memory | Manual (`malloc`/`free`) | Automatic (ownership) |
| Null | `NULL` pointer | `Option<T>` |
| Strings | `char` arrays + `\0` | `String` / `&str` |
| Enums | Just integers | Can hold data |
| Safety | Runtime crashes | Compile-time checks |
| Position | `char *p` | `usize` index |

---

## Key Rust Concepts

| Concept | Description |
|---------|-------------|
| Ownership | Values have a single owner |
| Borrowing | References allow temporary access |
| `match` | Exhaustive pattern matching |
| `Option<T>` | Represents optional values |
| `Vec<T>` | Growable array type |
| `String` vs `&str` | Owned vs borrowed strings |
| `impl` | Add methods to types |

---

## Summary (1/2)

1. **Ownership**: Values have one owner; the value is freed when its owner goes out of scope
2. **Borrowing**: `&T` (immutable) and `&mut T` (mutable) references
3. **Enums**: Can hold data, perfect for tokens

---

## Summary (2/2)

4. **Pattern Matching**: `match` extracts enum data
5. **Scanner**: Uses `Vec<char>` and a `usize` index instead of pointers
6. **Safety**: Compiler prevents use-after-free, double-free, and null dereferences
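
---

The scanner fragments from the slides above can be assembled into one file. The following is a rough sketch under stated assumptions, not the course's reference solution: it assumes `Token` has `Mult` and `Div` variants, folds the helper methods into `scan_token`, and (a hypothetical design choice made here so the code is easy to test) returns a `Result` on a bad character instead of calling `std::process::exit`.

```rust
// Sketch only: condenses the slide fragments into a single file.
// Assumptions: Token has Mult/Div variants; errors are returned as
// Result values (hypothetical choice) instead of exiting the process.
#[derive(Debug, PartialEq)]
enum Token {
    IntLit(String),
    Plus,
    Minus,
    Mult,
    Div,
    Eot,
}

struct Scanner {
    chars: Vec<char>, // Input as a vector of characters
    pos: usize,       // Index of the current character
}

impl Scanner {
    fn new(input: &str) -> Scanner {
        Scanner { chars: input.chars().collect(), pos: 0 }
    }

    // Current character, or None once pos is past the end of input.
    fn current(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    fn scan_token(&mut self) -> Result<Token, String> {
        // Skip spaces and tabs before the next token.
        while matches!(self.current(), Some(' ') | Some('\t')) {
            self.pos += 1;
        }
        let ch = match self.current() {
            None => return Ok(Token::Eot),
            Some(ch) => ch,
        };
        if ch.is_ascii_digit() {
            // Collect a run of consecutive ASCII digits.
            let mut value = String::new();
            while let Some(d) = self.current().filter(|c| c.is_ascii_digit()) {
                value.push(d);
                self.pos += 1;
            }
            return Ok(Token::IntLit(value));
        }
        self.pos += 1; // Consume the single-character token
        match ch {
            '+' => Ok(Token::Plus),
            '-' => Ok(Token::Minus),
            '*' => Ok(Token::Mult),
            '/' => Ok(Token::Div),
            _ => Err(format!("scan error: {}", ch)),
        }
    }

    // Scan until Eot; the Eot token is included in the result.
    fn scan_all(&mut self) -> Result<Vec<Token>, String> {
        let mut tokens = Vec::new();
        loop {
            let token = self.scan_token()?;
            let done = token == Token::Eot;
            tokens.push(token);
            if done {
                return Ok(tokens);
            }
        }
    }
}

fn main() {
    match Scanner::new("10 + 20 * 3").scan_all() {
        Ok(tokens) => println!("{:?}", tokens),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

Returning `Result` keeps the decision about how to report an error in the caller; a command-line `main` could still print the message and exit with status 1, as the slide version does.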