# RISC-V Machine Code

## CS 631 Systems Foundations, Mar 24, 2026

---

## Today's Agenda

1. What is machine code?
2. R-type instruction format
3. Decoding and encoding instructions
4. Common R-type instructions
5. Bit extraction in Rust
6. I-type format and sign extension
7. All six RISC-V instruction formats
8. In-class: runtime instruction decoding

---

## What is Machine Code?

**Machine code** = binary representation of assembly instructions

In RISC-V:

- All base-ISA instructions are **32 bits** wide
- 2^5 = **32 registers** (5 bits per register field)
- Stored in memory, fetched and executed by the processor

The **assembler** translates assembly → machine code.
The **processor** executes machine code directly.

---

## Assembly → Machine Code

```asm
add a0, a0, a1
```

**Hexadecimal:**

```
0x00B50533
```

**Binary:**

```
0000 0000 1011 0101 0000 0101 0011 0011
```

Every base RISC-V instruction maps to exactly one 32-bit machine code word; a pseudo-instruction such as `li` or `call` may expand to more than one.

---

## R-Type Instruction Format

Register-to-register operations:

```
┌─────────┬─────┬─────┬───────┬─────┬─────────┐
│ funct7  │ rs2 │ rs1 │ funct3│ rd  │ opcode  │
├─────────┼─────┼─────┼───────┼─────┼─────────┤
│ 31...25 │24.20│19.15│ 14..12│11..7│ 6...0   │
│ 7 bits  │5 bit│5 bit│ 3 bits│5 bit│ 7 bits  │
└─────────┴─────┴─────┴───────┴─────┴─────────┘
```

---

## R-Type Field Descriptions

| Field | Bits | Description |
|-------|------|-------------|
| **opcode** | [6:0] | Instruction class (R-type ALU ops = `0110011`) |
| **rd** | [11:7] | Destination register |
| **funct3** | [14:12] | Operation selector |
| **rs1** | [19:15] | First source register |
| **rs2** | [24:20] | Second source register |
| **funct7** | [31:25] | Additional function bits |

---

## Decoding: `add a0, a0, a1`

Machine code: `0x00B50533`

**Step 1:** Convert to binary

```
0x00B50533 = 0000 0000 1011 0101 0000 0101 0011 0011
```

**Step 2:** Extract fields

```
0000000 01011 01010 000 01010 0110011
funct7  rs2   rs1   f3  rd    opcode
```

---

## Decoding: Field Values

| Field | Binary | Decimal | Meaning |
|-------|--------|---------|---------|
| opcode | 0110011 | 51 | R-type |
| rd | 01010 | 10 | a0 (x10) |
| funct3 | 000 | 0 | ADD |
| rs1 | 01010 | 10 | a0 (x10) |
| rs2 | 01011 | 11 | a1 (x11) |
| funct7 | 0000000 | 0 | ADD (not SUB) |

**Result:** `add a0, a0, a1`

---

## Visual Breakdown
Fields of `0x00B50533`, from bit 31 down to bit 0:

- `0000000` → funct7 = 0 (ADD)
- `01011` → rs2 = 11 (a1)
- `01010` → rs1 = 10 (a0)
- `000` → funct3 = 0 (ADD)
- `01010` → rd = 10 (a0)
- `0110011` → opcode (R-type)
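
The Step 2 field split from the decoding walkthrough can be reproduced mechanically. The sketch below prints the word as 32 binary digits and cuts it at the R-type field widths. `R_TYPE_WIDTHS` and `split_r_type` are hypothetical names chosen for illustration, not part of the course code.

```rust
/// Widths of the R-type fields, read left to right from bit 31:
/// funct7, rs2, rs1, funct3, rd, opcode.
const R_TYPE_WIDTHS: [usize; 6] = [7, 5, 5, 3, 5, 7];

/// Split an instruction word into its six R-type fields as binary strings.
/// (Hypothetical helper, written only to mirror the manual decoding steps.)
fn split_r_type(iw: u32) -> Vec<String> {
    // Zero-pad to exactly 32 binary digits so slicing positions are fixed.
    let bits = format!("{:032b}", iw);
    let mut fields = Vec::new();
    let mut pos = 0;
    for &width in R_TYPE_WIDTHS.iter() {
        fields.push(bits[pos..pos + width].to_string());
        pos += width;
    }
    fields
}
```

Calling `split_r_type(0x00B50533)` yields `0000000`, `01011`, `01010`, `000`, `01010`, `0110011`, matching the field table for `add a0, a0, a1`.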

---

## Common R-Type Instructions

Opcode is always `0110011`. Operation determined by funct3 + funct7:

| Instruction | funct7 | funct3 | Operation |
|-------------|--------|--------|-----------|
| **add** | 0000000 | 000 | rs1 + rs2 |
| **sub** | 0100000 | 000 | rs1 - rs2 |
| **sll** | 0000000 | 001 | rs1 << rs2 |
| **srl** | 0000000 | 101 | rs1 >> rs2 (logical) |
| **sra** | 0100000 | 101 | rs1 >> rs2 (arith) |
| **and** | 0000000 | 111 | rs1 & rs2 |
| **or** | 0000000 | 110 | rs1 \| rs2 |
| **xor** | 0000000 | 100 | rs1 ^ rs2 |

---

## ADD vs SUB vs MUL

All three have funct3 = 000. Distinguished by funct7:

```
ADD: funct7 = 0000000 (bit 30 = 0, bit 25 = 0)
SUB: funct7 = 0100000 (bit 30 = 1, bit 25 = 0)
MUL: funct7 = 0000001 (bit 30 = 0, bit 25 = 1)
```
- **Bit 30**: ADD (0) vs SUB (1)
- **Bit 25**: Base (0) vs M extension (1)
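
A minimal sketch of how a decoder might tell these three apart, assuming only the encodings listed on this slide. The enum `Funct3Zero` and the function `classify_funct3_zero` are hypothetical names; anything that is not an R-type word with funct3 = 000 and one of the three funct7 values is reported as `Unknown`.

```rust
/// The three R-type operations on the slide that share funct3 = 000.
#[derive(Debug, PartialEq)]
enum Funct3Zero {
    Add,
    Sub,
    Mul,
    Unknown,
}

/// Classify an instruction word as ADD, SUB, or MUL using funct7.
/// (Hypothetical helper for illustration only.)
fn classify_funct3_zero(iw: u32) -> Funct3Zero {
    let opcode = iw & 0b1111111;
    let funct3 = (iw >> 12) & 0b111;
    let funct7 = (iw >> 25) & 0b1111111;

    // Only R-type words with funct3 = 000 are candidates.
    if opcode != 0b0110011 || funct3 != 0b000 {
        return Funct3Zero::Unknown;
    }
    match funct7 {
        0b0000000 => Funct3Zero::Add, // bit 30 = 0, bit 25 = 0
        0b0100000 => Funct3Zero::Sub, // bit 30 = 1
        0b0000001 => Funct3Zero::Mul, // bit 25 = 1 (M extension)
        _ => Funct3Zero::Unknown,
    }
}
```

For example, `classify_funct3_zero(0x00B50533)` returns `Funct3Zero::Add`, and the same word with bit 25 set, `0x02B50533`, returns `Funct3Zero::Mul`.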

---

## Encoding: `sub a2, a3, a4`

**Step 1:** Identify components

- SUB: funct7 = 0100000, funct3 = 000, opcode = 0110011
- rd = a2 = x12 = 01100
- rs1 = a3 = x13 = 01101
- rs2 = a4 = x14 = 01110

**Step 2:** Assemble bits

```
0100000 01110 01101 000 01100 0110011
funct7  rs2   rs1   f3  rd    opcode
```

**Step 3:** Convert → `0x40E68633`

---

## Bit Extraction in Rust

```rust
/// Return `count` bits of `iw`, starting at bit `start` (bit 0 = LSB).
/// `count` must be less than 32, which holds for every field here.
fn get_bits(iw: u32, start: u32, count: u32) -> u32 {
    let mask = (1u32 << count) - 1; // `count` low bits set to 1
    (iw >> start) & mask
}

fn get_opcode(iw: u32) -> u32 { get_bits(iw, 0, 7) }
fn get_rd(iw: u32) -> u32 { get_bits(iw, 7, 5) }
fn get_funct3(iw: u32) -> u32 { get_bits(iw, 12, 3) }
fn get_rs1(iw: u32) -> u32 { get_bits(iw, 15, 5) }
fn get_rs2(iw: u32) -> u32 { get_bits(iw, 20, 5) }
fn get_funct7(iw: u32) -> u32 { get_bits(iw, 25, 7) }
```

---

## Inline Extraction

Without a helper function, shift and mask directly:

```rust
let iw: u32 = 0x00B50533;
let opcode = iw & 0b1111111;
let rd = (iw >> 7) & 0b11111;
let funct3 = (iw >> 12) & 0b111;
let rs1 = (iw >> 15) & 0b11111;
let rs2 = (iw >> 20) & 0b11111;
let funct7 = (iw >> 25) & 0b1111111;
```
`>>` shifts bits right, `&` masks to keep only the bits we want.
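
Encoding is the reverse of extraction: mask each field to its width, shift it left to its position, and OR the pieces together. The sketch below reproduces the `sub a2, a3, a4` encoding worked out by hand earlier. `encode_r_type` is a hypothetical helper name, not part of the course code.

```rust
/// Assemble an R-type instruction word from its six fields.
/// Each field is masked to its width so stray high bits cannot
/// spill into a neighboring field.
fn encode_r_type(funct7: u32, rs2: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    ((funct7 & 0b1111111) << 25)
        | ((rs2 & 0b11111) << 20)
        | ((rs1 & 0b11111) << 15)
        | ((funct3 & 0b111) << 12)
        | ((rd & 0b11111) << 7)
        | (opcode & 0b1111111)
}
```

With the values from the encoding slide, `encode_r_type(0b0100000, 14, 13, 0b000, 12, 0b0110011)` gives `0x40E68633`.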

---

## I-Type Instruction Format

Encodes an immediate (constant) value:

```
┌──────────────┬─────┬───────┬─────┬─────────┐
│  imm[11:0]   │ rs1 │ funct3│ rd  │ opcode  │
├──────────────┼─────┼───────┼─────┼─────────┤
│   31...20    │19.15│ 14..12│11..7│ 6...0   │
│   12 bits    │5 bit│ 3 bits│5 bit│ 7 bits  │
└──────────────┴─────┴───────┴─────┴─────────┘
```

Examples: `addi`, `lw`, `lb`, `jalr`

The 12-bit immediate is **sign-extended** before use.

---

## Sign Extension in Rust

12-bit value → 64-bit signed value:

```rust
let immu: u64 = 0b111111111101;  // 12-bit immediate, zero-extended
let start: u32 = 11;             // sign bit position
let distance: u32 = 63 - start;  // moves bit 11 up to bit 63 (the MSB)
let imm: i64 = ((immu as i64) << distance) >> distance;
// imm = -3
```

**How it works:**

1. Shift left so sign bit reaches MSB
2. Arithmetic right shift fills with sign bit
3. Result is correctly sign-extended

---

## All Six Instruction Formats

```
R-type: |funct7   |rs2|rs1|f3|   rd   |opcode|  Register ops
I-type: |  imm[11:0]  |rs1|f3|   rd   |opcode|  Immediates, loads
S-type: |imm[11:5]|rs2|rs1|f3|imm[4:0]|opcode|  Stores
B-type: |imm bits |rs2|rs1|f3|imm bits|opcode|  Branches
U-type: |     imm[31:12]     |   rd   |opcode|  LUI, AUIPC
J-type: |      imm bits      |   rd   |opcode|  JAL
```
The opcode is always in bits [6:0], and rd, rs1, and rs2 sit at the same bit positions in every format that includes them.
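
Because the register fields do not move, an I-type decoder reuses the R-type shifts for rd, funct3, and rs1, and only the immediate needs new handling. The sketch below is one possible way to do it: casting the word to `i32` makes `>>` an arithmetic shift, so the 12-bit immediate comes out already sign-extended. The `IType` struct and `decode_i_type` are hypothetical names, and `0xFDF30293` in the note after the block is a hand encoding of `addi t0, t1, -33` (the instruction from the in-class exercise), assuming t0 = x5 and t1 = x6.

```rust
/// Decoded fields of an I-type instruction (hypothetical struct).
struct IType {
    imm: i64,
    rs1: u32,
    funct3: u32,
    rd: u32,
    opcode: u32,
}

/// Decode an I-type word; the immediate is sign-extended to 64 bits.
fn decode_i_type(iw: u32) -> IType {
    IType {
        // `iw as i32` reinterprets the bits; `>> 20` on a signed value is an
        // arithmetic shift, so bit 31 (the immediate's sign bit) is replicated.
        imm: ((iw as i32) >> 20) as i64,
        rs1: (iw >> 15) & 0b11111,
        funct3: (iw >> 12) & 0b111,
        rd: (iw >> 7) & 0b11111,
        opcode: iw & 0b1111111,
    }
}
```

Decoding `0xFDF30293` this way gives imm = -33, rs1 = 6, funct3 = 0, rd = 5, and opcode = `0010011`.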

---

## In-Class: Runtime Decoding

Assembly function we'll decode at runtime:

```asm
.global add2_s
add2_s:
    add a0, a0, a1
    addi t0, t1, -33
    ret
```

---

## In-Class: Rust Decoder

```rust
// Treat the function's address as a pointer to 32-bit instruction words.
// Assumes add2_s is declared elsewhere (for example in an `extern "C"` block).
let pc = add2_s as *const u32;
let iw = unsafe { *pc }; // first instruction word: add a0, a0, a1

let opcode = iw & 0b1111111;
let funct3 = (iw >> 12) & 0b111;
let funct7 = (iw >> 25) & 0b1111111;
let rd = (iw >> 7) & 0b11111;
let rs1 = (iw >> 15) & 0b11111;
let rs2 = (iw >> 20) & 0b11111;

// Advance to next instruction (one u32 = 4 bytes)
let pc = unsafe { pc.add(1) };
let iw = unsafe { *pc }; // second instruction word: addi t0, t1, -33
```

Function pointer → address of first instruction word in memory.

---

## Key Takeaways

- **Machine code** = binary encoding of assembly (fixed 32-bit words)
- **R-type**: opcode + rd + funct3 + rs1 + rs2 + funct7
- **Decode**: shift right to field position, mask to field width
- **Bit 30/25**: distinguish ADD, SUB, MUL (same funct3)
- **I-type**: 12-bit sign-extended immediate
- **Six formats**: R, I, S, B, U, J, with the opcode always in [6:0]

---

## Further Reading

- [RISC-V ISA Specification](https://riscv.org/technical/specifications/)
- [RISC-V Assembly Programmer's Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/main/riscv-asm.md)
- [The RISC-V Reader](http://www.riscvbook.com/), by Patterson & Waterman
- RISC-V Green Card: quick reference for instruction encodings