RISC-V Machine Code¶
Overview¶
This lecture covers how RISC-V assembly instructions are encoded as machine code — the actual 1s and 0s that the processor executes. We examine the binary format of RISC-V instructions, learn to decode and encode R-type instructions by hand, and use bit manipulation in Rust to extract instruction fields programmatically. We also survey all six RISC-V instruction formats and explore sign extension for immediate values.
Learning Objectives¶
- Understand machine code as the binary representation of assembly instructions
- Decode R-type instructions by extracting opcode, rd, rs1, rs2, funct3, and funct7 fields
- Encode assembly instructions into 32-bit machine code by hand
- Extract instruction fields using bit shifting and masking in Rust
- Understand the six core RISC-V instruction formats
- Perform sign extension on immediate values
Prerequisites¶
- RISC-V Assembly 1 — registers, instructions, memory operations
- RISC-V Assembly 2 — control flow, functions
- RISC-V Assembly Strings and Bits — byte operations, bitwise instructions
- Binary, Bases, and Bitwise Operations — binary/hex number systems, bit manipulation
Machine Code Fundamentals¶
What is Machine Code?¶
Machine code is the binary representation of assembly instructions — the actual 1s and 0s that the processor executes. In RISC-V:
- All instructions in the RV32I and RV64I base instruction sets are 32 bits wide
- There are 2^5 = 32 possible registers (5 bits to encode each)
- Instructions are stored in memory and fetched by the processor
Assembly to Machine Code¶
Assembly instruction: add a0, a0, a1
Machine code (hexadecimal): 0x00B50533
Machine code (binary): 0000 0000 1011 0101 0000 0101 0011 0011
The assembler translates assembly mnemonics into machine code. The processor fetches and executes machine code directly.
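To make the hex and binary views concrete, here is a minimal sketch that formats one instruction word both ways. The helper names `to_hex` and `to_binary_groups` are hypothetical, and the word `0x00B50533` is assumed as the example because it is the encoding of `add a0, a0, a1` decoded later in this lecture.

```rust
/// Format a 32-bit instruction word as 0x-prefixed, zero-padded hexadecimal.
/// (Hypothetical helper, not part of the lecture code.)
fn to_hex(iw: u32) -> String {
    format!("0x{:08X}", iw)
}

/// Format a 32-bit instruction word as eight space-separated groups of
/// four bits, most significant group first.
fn to_binary_groups(iw: u32) -> String {
    let bits = format!("{:032b}", iw); // zero-padded to 32 binary digits
    bits.as_bytes()
        .chunks(4)
        .map(|group| std::str::from_utf8(group).unwrap())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Each hexadecimal digit corresponds to exactly one four-bit group, which is why hex is the usual shorthand for machine code.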
Why 32 Registers?¶
With 5 bits, we can represent 2^5 = 32 different values (0–31), which maps perfectly to RISC-V's 32 general-purpose registers (x0–x31).
R-Type Instruction Format¶
Instruction Fields¶
R-type instructions perform register-to-register operations. The 32-bit instruction word is divided into fields:
┌─────────┬─────┬─────┬───────┬─────┬─────────┐
│ funct7 │ rs2 │ rs1 │ funct3│ rd │ opcode │
├─────────┼─────┼─────┼───────┼─────┼─────────┤
│ 31...25 │24.20│19.15│ 14..12│11..7│ 6...0 │
│ 7 bits │5 bit│5 bit│ 3 bits│5 bit│ 7 bits │
└─────────┴─────┴─────┴───────┴─────┴─────────┘
| Field | Bits | Description |
|---|---|---|
| opcode | [6:0] | Identifies the instruction class and hence its format (0110011 for the R-type integer operations) |
| rd | [11:7] | Destination register |
| funct3 | [14:12] | Function selector (operation within format) |
| rs1 | [19:15] | First source register |
| rs2 | [24:20] | Second source register |
| funct7 | [31:25] | Additional function bits (distinguishes add/sub, etc.) |
Decoding Example: add a0, a0, a1¶
Instruction: add a0, a0, a1
Machine code: 0x00B50533
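Step 1: Convert to binary: 0x00B50533 = 0000 0000 1011 0101 0000 0101 0011 0011
Step 2: Extract fields (funct7 | rs2 | rs1 | funct3 | rd | opcode): 0000000 | 01011 | 01010 | 000 | 01010 | 0110011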
Step 3: Interpret values
| Field | Binary | Decimal | Meaning |
|---|---|---|---|
| opcode | 0110011 | 51 | R-type format |
| rd | 01010 | 10 | a0 (x10) |
| funct3 | 000 | 0 | ADD operation |
| rs1 | 01010 | 10 | a0 (x10) |
| rs2 | 01011 | 11 | a1 (x11) |
| funct7 | 0000000 | 0 | ADD (not SUB) |
Visual Breakdown¶
graph LR
subgraph "0x00B50533"
A["0000000"] --> B["funct7=0<br>ADD"]
C["01011"] --> D["rs2=11<br>a1"]
E["01010"] --> F["rs1=10<br>a0"]
G["000"] --> H["funct3=0<br>ADD"]
I["01010"] --> J["rd=10<br>a0"]
K["0110011"] --> L["opcode<br>R-type"]
end
Common R-Type Instructions¶
Opcode and Function Fields¶
For the integer register-register instructions covered here (the RV32I base set and the M extension), the opcode is always 0110011 (0x33). The specific operation is determined by funct3 and funct7:
| Instruction | funct7 | funct3 | Operation |
|---|---|---|---|
| add | 0000000 | 000 | rd = rs1 + rs2 |
| sub | 0100000 | 000 | rd = rs1 - rs2 |
| sll | 0000000 | 001 | rd = rs1 << rs2 |
| srl | 0000000 | 101 | rd = rs1 >> rs2 (logical) |
| sra | 0100000 | 101 | rd = rs1 >> rs2 (arithmetic) |
| and | 0000000 | 111 | rd = rs1 & rs2 |
| or | 0000000 | 110 | rd = rs1 | rs2 |
| xor | 0000000 | 100 | rd = rs1 ^ rs2 |
M Extension (Multiply/Divide)¶
With the M extension, additional operations use funct7 = 0000001:
| Instruction | funct7 | funct3 | Operation |
|---|---|---|---|
| mul | 0000001 | 000 | rd = rs1 * rs2 |
| div | 0000001 | 100 | rd = rs1 / rs2 |
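The two tables above can be turned into a lookup. The following is a minimal sketch, assuming the opcode has already been checked to be 0110011; the function name `r_type_mnemonic` is hypothetical, and only the instructions listed in the tables are covered.

```rust
/// Map a (funct7, funct3) pair to a mnemonic for opcode 0110011.
/// Covers only the instructions listed in the tables above; every other
/// combination returns None. (Hypothetical helper for illustration.)
fn r_type_mnemonic(funct7: u32, funct3: u32) -> Option<&'static str> {
    match (funct7, funct3) {
        (0b0000000, 0b000) => Some("add"),
        (0b0100000, 0b000) => Some("sub"),
        (0b0000000, 0b001) => Some("sll"),
        (0b0000000, 0b101) => Some("srl"),
        (0b0100000, 0b101) => Some("sra"),
        (0b0000000, 0b111) => Some("and"),
        (0b0000000, 0b110) => Some("or"),
        (0b0000000, 0b100) => Some("xor"),
        (0b0000001, 0b000) => Some("mul"),
        (0b0000001, 0b100) => Some("div"),
        _ => None,
    }
}
```

Returning `Option` keeps unknown encodings explicit instead of guessing a mnemonic.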
Distinguishing ADD vs SUB vs MUL¶
All three have funct3 = 000, but differ in funct7:
ADD: funct7 = 0000000, funct3 = 000 (bit 30 = 0, bit 25 = 0)
SUB: funct7 = 0100000, funct3 = 000 (bit 30 = 1, bit 25 = 0)
MUL: funct7 = 0000001, funct3 = 000 (bit 30 = 0, bit 25 = 1)
Key distinguishing bits:
- Bit 30: Distinguishes ADD (0) from SUB (1)
- Bit 25: Distinguishes base instructions (0) from M extension (1)
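The bit tests described above could be sketched as follows. The function name `classify_funct3_zero` is hypothetical, and this sketch deliberately inspects only bits 30 and 25, so it does not validate the opcode or the remaining funct7 bits.

```rust
/// Classify an instruction word whose funct3 is 000 by testing
/// bit 30 (SUB) and bit 25 (M extension).
/// Returns None if funct3 is not 000 or both bits are set.
/// (Hypothetical helper; other funct7 bits are not checked.)
fn classify_funct3_zero(iw: u32) -> Option<&'static str> {
    let funct3 = (iw >> 12) & 0b111;
    if funct3 != 0 {
        return None;
    }
    let bit30 = (iw >> 30) & 1;
    let bit25 = (iw >> 25) & 1;
    match (bit30, bit25) {
        (0, 0) => Some("add"),
        (1, 0) => Some("sub"),
        (0, 1) => Some("mul"),
        _ => None,
    }
}
```

For example, `0x00B50533` (`add a0, a0, a1`) has both bits clear, while setting bit 25 gives `0x02B50533`, which is `mul a0, a0, a1`.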
Encoding Instructions¶
Manual Encoding: sub a2, a3, a4¶
Let's encode sub a2, a3, a4:
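Step 1: Identify components
- Instruction: SUB (R-type)
- rd = a2 = x12 = 01100
- rs1 = a3 = x13 = 01101
- rs2 = a4 = x14 = 01110
- funct3 = 000
- funct7 = 0100000
- opcode = 0110011
Step 2: Assemble the bits in the order funct7 | rs2 | rs1 | funct3 | rd | opcode: 0100000 01110 01101 000 01100 0110011
Step 3: Convert to hex: regrouped into nibbles, 0100 0000 1110 0110 1000 0110 0011 0011 = 0x40E68633
Verification: extracting the fields from 0x40E68633 gives back funct7 = 0100000, rs2 = 14, rs1 = 13, funct3 = 000, rd = 12, and opcode = 0110011.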
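The manual encoding steps can also be expressed in code. Below is a minimal sketch of an R-type encoder; the function name `encode_r` and its parameter order are assumptions made for illustration. Each field is masked to its width and shifted to its bit position, then the pieces are OR-ed together.

```rust
/// Pack R-type fields into a 32-bit instruction word.
/// Each argument is masked to its field width before being shifted into
/// place. (Hypothetical helper for illustration.)
fn encode_r(funct7: u32, rs2: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    ((funct7 & 0x7F) << 25)     // bits [31:25]
        | ((rs2 & 0x1F) << 20)  // bits [24:20]
        | ((rs1 & 0x1F) << 15)  // bits [19:15]
        | ((funct3 & 0x7) << 12) // bits [14:12]
        | ((rd & 0x1F) << 7)    // bits [11:7]
        | (opcode & 0x7F)       // bits [6:0]
}
```

Calling `encode_r(0b0100000, 14, 13, 0b000, 12, 0b0110011)` encodes `sub a2, a3, a4` and yields `0x40E68633`.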
Bit Extraction in Rust¶
Extracting Instruction Fields¶
To decode machine code in software, we extract fields using bit manipulation:
/// Extract `count` bits starting at position `start` from instruction word `iw`
fn get_bits(iw: u32, start: u32, count: u32) -> u32 {
    let mask = (1 << count) - 1; // Create mask of `count` 1s
    (iw >> start) & mask         // Shift and mask
}

// Extract specific fields from instruction word
fn get_opcode(iw: u32) -> u32 { get_bits(iw, 0, 7) }
fn get_rd(iw: u32) -> u32     { get_bits(iw, 7, 5) }
fn get_funct3(iw: u32) -> u32 { get_bits(iw, 12, 3) }
fn get_rs1(iw: u32) -> u32    { get_bits(iw, 15, 5) }
fn get_rs2(iw: u32) -> u32    { get_bits(iw, 20, 5) }
fn get_funct7(iw: u32) -> u32 { get_bits(iw, 25, 7) }
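The helper above is sufficient for the field widths used in this lecture (3, 5, and 7 bits). One edge case worth knowing: `1 << count` overflows a `u32` when `count` is 32. A possible variant that guards against this is sketched below; the name `get_bits_checked` is hypothetical.

```rust
/// Variant of get_bits that also accepts count == 32.
/// Panics if the requested field is empty or does not fit in 32 bits.
/// (Hypothetical helper, not part of the lecture code.)
fn get_bits_checked(iw: u32, start: u32, count: u32) -> u32 {
    assert!(count >= 1 && start + count <= 32, "field out of range");
    // Build the mask in 64 bits so that count == 32 does not overflow.
    let mask = ((1u64 << count) - 1) as u32;
    (iw >> start) & mask
}
```

For every field in the R-type layout it returns the same value as the simpler helper.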
Example: Decoding 0x00B50533¶
let iw: u32 = 0x00B50533;
let opcode = get_opcode(iw); // 0110011 = 51 = R-type
let rd = get_rd(iw); // 01010 = 10 = a0
let funct3 = get_funct3(iw); // 000 = 0 = ADD
let rs1 = get_rs1(iw); // 01010 = 10 = a0
let rs2 = get_rs2(iw); // 01011 = 11 = a1
let funct7 = get_funct7(iw); // 0000000 = 0 = ADD (not SUB)
// Result: add a0, a0, a1
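Going one step further, the decoded register numbers can be printed with their ABI names. The sketch below handles only `add` and `sub`; the names `ABI_NAMES` and `disassemble_add_sub` are hypothetical, and the table follows the standard RISC-V ABI register naming (x0 = zero through x31 = t6).

```rust
/// ABI names for x0..x31 in the standard RISC-V calling convention.
const ABI_NAMES: [&str; 32] = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
];

/// Render an `add` or `sub` instruction word as assembly text.
/// Returns None for any other encoding. (Hypothetical helper.)
fn disassemble_add_sub(iw: u32) -> Option<String> {
    let opcode = iw & 0x7F;
    let rd = ((iw >> 7) & 0x1F) as usize;
    let funct3 = (iw >> 12) & 0x7;
    let rs1 = ((iw >> 15) & 0x1F) as usize;
    let rs2 = ((iw >> 20) & 0x1F) as usize;
    let funct7 = (iw >> 25) & 0x7F;

    let name = match (opcode, funct3, funct7) {
        (0b0110011, 0b000, 0b0000000) => "add",
        (0b0110011, 0b000, 0b0100000) => "sub",
        _ => return None,
    };
    Some(format!(
        "{} {}, {}, {}",
        name, ABI_NAMES[rd], ABI_NAMES[rs1], ABI_NAMES[rs2]
    ))
}
```

The register indices are masked to 5 bits, so indexing the 32-entry table cannot go out of bounds.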
Inline Bit Extraction¶
You can also extract fields directly without a helper function using shift and mask:
let iw: u32 = 0x00B50533;
let opcode = iw & 0b1111111; // bits [6:0]
let rd = (iw >> 7) & 0b11111; // bits [11:7]
let funct3 = (iw >> 12) & 0b111; // bits [14:12]
let rs1 = (iw >> 15) & 0b11111; // bits [19:15]
let rs2 = (iw >> 20) & 0b11111; // bits [24:20]
let funct7 = (iw >> 25) & 0b1111111; // bits [31:25]
I-Type Instructions and Sign Extension¶
I-Type Format¶
I-type instructions encode an immediate value (constant) within the instruction:
┌──────────────┬─────┬───────┬─────┬─────────┐
│ imm[11:0] │ rs1 │ funct3│ rd │ opcode │
├──────────────┼─────┼───────┼─────┼─────────┤
│ 31...20 │19.15│ 14..12│11..7│ 6...0 │
│ 12 bits │5 bit│ 3 bits│5 bit│ 7 bits │
└──────────────┴─────┴───────┴─────┴─────────┘
Examples: addi, lw, lb, jalr
The 12-bit immediate is sign-extended to the full register width (32 or 64 bits) before use.
Decoding Example: addi t0, t1, -33¶
Instruction: addi t0, t1, -33
The value -33 in 12-bit two's complement: 33 = 000000100001; inverting the bits gives 111111011110; adding 1 gives 111111011111 (0xFDF).
Assembled fields:
- opcode = 0010011 (I-type arithmetic)
- rd = 00101 = 5 (t0 = x5)
- funct3 = 000 (ADDI)
- rs1 = 00110 = 6 (t1 = x6)
- imm = 111111011111 = -33 (sign-extended)
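Packing the fields above in I-type order (imm | rs1 | funct3 | rd | opcode) gives 1111 1101 1111 0011 0000 0010 1001 0011, or `0xFDF30293`; that word is computed by hand here, not taken from assembler output. The sketch below decodes it again. The function name `decode_i_type` and its tuple return type are assumptions for illustration.

```rust
/// Decode the fields of an I-type instruction word.
/// Returns (opcode, rd, funct3, rs1, imm) with imm sign-extended.
/// (Hypothetical helper for illustration.)
fn decode_i_type(iw: u32) -> (u32, u32, u32, u32, i32) {
    let opcode = iw & 0x7F;
    let rd = (iw >> 7) & 0x1F;
    let funct3 = (iw >> 12) & 0x7;
    let rs1 = (iw >> 15) & 0x1F;
    // The immediate occupies bits [31:20], so its sign bit is bit 31 of the
    // word. Reinterpreting the word as i32 makes >> an arithmetic shift,
    // which replicates that sign bit into the upper bits.
    let imm = (iw as i32) >> 20;
    (opcode, rd, funct3, rs1, imm)
}
```

Because the immediate sits at the top of the word, a single arithmetic shift both extracts and sign-extends it.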
Sign Extension in Rust¶
When extracting an immediate value from an instruction, the 12-bit value must be sign-extended to preserve negative numbers. The technique uses arithmetic shift:
fn signext() {
    let immu: u64 = 0b111111111101; // 12-bit immediate, not yet sign-extended
    let start: u32 = 11;            // sign bit position (bit 11 of 12-bit value)
    let distance: u32 = 63 - start; // shift that moves bit 11 to bit 63
    // Shift left to put sign bit at MSB, then arithmetic shift right
    let imm: i64 = ((immu as i64) << distance) >> distance;
    println!("imm = {}", imm); // prints: imm = -3
}
How it works:
- Cast the unsigned value to a signed type (i64)
- Shift left so the sign bit (bit 11) moves to the most significant position
- Arithmetic right shift (>> on signed types in Rust) fills with the sign bit
- The result is the correctly sign-extended value
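The same technique can be written as a reusable function. This is a minimal sketch with a hypothetical name, `sign_extend`; the shift distance is `63 - sign_bit` so that the sign bit lands exactly in bit 63 of the 64-bit value before the arithmetic shift back down.

```rust
/// Sign-extend `value`, treating bit `sign_bit` as its sign bit.
/// Bits above `sign_bit` in the input are discarded.
/// (Hypothetical helper for illustration.)
fn sign_extend(value: u64, sign_bit: u32) -> i64 {
    assert!(sign_bit < 64, "sign bit must be within a 64-bit value");
    let distance = 63 - sign_bit;
    // Left shift moves the sign bit to bit 63; casting to i64 makes the
    // right shift arithmetic, so the sign bit fills the vacated positions.
    ((value << distance) as i64) >> distance
}
```

Edge cases such as `0x800` (the most negative 12-bit value, -2048) and `0x7FF` (the largest positive one, 2047) are useful checks that the distance is right.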
RISC-V Instruction Formats Overview¶
RISC-V uses six core instruction formats. Each has the opcode in bits [6:0]:
R-type: | funct7       | rs2 | rs1 | funct3 | rd          | opcode |  Register ops
I-type: | imm[11:0]          | rs1 | funct3 | rd          | opcode |  Immediate ops, loads
S-type: | imm[11:5]    | rs2 | rs1 | funct3 | imm[4:0]    | opcode |  Stores
B-type: | imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode |  Branches
U-type: | imm[31:12]                        | rd          | opcode |  Upper immediate
J-type: | imm[20,10:1,11,19:12]             | rd          | opcode |  Jumps (JAL)
graph TD
subgraph "Instruction Formats"
R["R-type<br>opcode=0110011<br>Register operations"]
I["I-type<br>opcode varies<br>Immediate ops, loads"]
S["S-type<br>opcode=0100011<br>Stores"]
B["B-type<br>opcode=1100011<br>Conditional branches"]
U["U-type<br>opcode=0110111/0010111<br>LUI, AUIPC"]
J["J-type<br>opcode=1101111<br>JAL"]
end
Design principle: The opcode is always in bits [6:0], and the register fields (rd, rs1, rs2) occupy the same bit positions in every format that includes them, which simplifies hardware decoding.
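Since the opcode always sits in bits [6:0], a decoder can pick the format before looking at anything else. The sketch below uses the opcodes named in this lecture, plus two I-type opcodes that are assumptions taken from the base ISA encoding rather than from the text above: 0000011 for loads and 1100111 for `jalr`. The names `Format` and `format_of` are hypothetical.

```rust
/// The six instruction formats surveyed above.
#[derive(Debug, PartialEq)]
enum Format { R, I, S, B, U, J }

/// Choose the instruction format from the 7-bit opcode.
/// Only opcodes mentioned in this lecture (plus loads and jalr, see the
/// lead-in) are recognised; everything else returns None.
fn format_of(opcode: u32) -> Option<Format> {
    match opcode {
        0b0110011 => Some(Format::R),                         // register ops
        0b0010011 | 0b0000011 | 0b1100111 => Some(Format::I), // addi-style, loads, jalr
        0b0100011 => Some(Format::S),                         // stores
        0b1100011 => Some(Format::B),                         // conditional branches
        0b0110111 | 0b0010111 => Some(Format::U),             // lui, auipc
        0b1101111 => Some(Format::J),                         // jal
        _ => None,
    }
}
```

A caller would typically pass `iw & 0x7F` and then dispatch to a format-specific field extractor.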
In-Class Example: Runtime Instruction Decoding¶
This example demonstrates reading machine code at runtime from an assembly function and decoding it in Rust.
Assembly Function (add2_s.s)¶
Rust Program (main.rs)¶
unsafe extern "C" {
    fn add2_s(a0: i32, a1: i32) -> i32;
}

fn decode() {
    let r = unsafe { add2_s(3, 4) };
    println!("add2_s(3, 4) = {}", r);

    // Treat the function's address as a pointer to 32-bit instruction words
    let pc = add2_s as *const u32;
    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);

    // Decode R-type fields
    let opcode = iw & 0b1111111;
    let funct3 = (iw >> 12) & 0b111;
    let funct7 = (iw >> 25) & 0b1111111;
    let rd = (iw >> 7) & 0b11111;
    let rs1 = (iw >> 15) & 0b11111;
    let rs2 = (iw >> 20) & 0b11111;
    println!("opcode = {}", opcode);
    println!("funct3 = {}", funct3);
    println!("funct7 = {}", funct7);
    println!("rd = {}", rd);
    println!("rs1 = {}", rs1);
    println!("rs2 = {}", rs2);

    // Advance to next instruction
    let pc = unsafe { pc.add(1) };
    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);
}
fn signext() {
    let immu: u64 = 0b111111111101;
    let start: u32 = 11;            // sign bit position of the 12-bit value
    let distance: u32 = 63 - start; // shift that moves bit 11 to bit 63
    let imm: i64 = ((immu as i64) << distance) >> distance;
    println!("imm = {}", imm);
}
fn main() {
    println!("== decode ==");
    decode();
    println!();
    println!("== signext ==");
    signext();
}
Key points:
- The function pointer add2_s as *const u32 gives us the address of the first instruction
- Reading *pc loads the 32-bit instruction word from memory
- pc.add(1) advances by 4 bytes (one 32-bit instruction) to the next instruction
- We can verify our manual decoding matches the extracted field values
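The same decoding can be tried without linking an assembly function by reading instruction words out of a byte buffer. The sketch below assumes the bytes are stored little-endian, as RISC-V instruction memory is; the function name `words_from_le_bytes` and the sample bytes in the usage note are hypothetical.

```rust
/// Split a byte slice into 32-bit instruction words, interpreting each
/// group of four bytes as little-endian. Trailing bytes that do not form a
/// complete word are ignored. (Hypothetical helper for illustration.)
fn words_from_le_bytes(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

For instance, the bytes `33 05 B5 00` form the word `0x00B50533`, because the least significant byte comes first in memory.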
Key Concepts¶
| Concept | Definition |
|---|---|
| Machine Code | Binary representation of instructions executed by the CPU |
| Instruction Word (iw) | The 32-bit binary value encoding one instruction |
| Opcode | 7-bit field identifying the instruction format |
| funct3 | 3-bit field selecting operation within a format |
| funct7 | 7-bit field providing additional operation distinction |
| rd | Destination register (5 bits) |
| rs1, rs2 | Source registers (5 bits each) |
| R-type | Format for register-to-register operations |
| I-type | Format for immediate (constant) operations and loads |
| Sign Extension | Extending a smaller signed value to a wider type preserving the sign |
Practice Problems¶
Problem 1: Decode Machine Code¶
Decode the instruction 0x40A60633. What assembly instruction does it represent?
Solution
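**Step 1: Convert to binary**
0x40A60633 = 0100 0000 1010 0110 0000 0110 0011 0011
**Step 2: Extract fields**
- funct7 = 0100000
- rs2 = 01010 = 10 (a0)
- rs1 = 01100 = 12 (a2)
- funct3 = 000
- rd = 01100 = 12 (a2)
- opcode = 0110011
**Step 3: Determine instruction**
- opcode = R-type
- funct3 = 0, funct7 bit 30 = 1: SUB
**Answer:** `sub a2, a2, a0`
Problem 2: Encode Instruction¶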
Encode and a5, a6, a7 as a 32-bit hexadecimal machine code.
Solution
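**Step 1: Look up encoding**
- AND: funct7 = 0000000, funct3 = 111, opcode = 0110011
- rd = a5 = x15 = 01111
- rs1 = a6 = x16 = 10000
- rs2 = a7 = x17 = 10001
**Step 2: Assemble bits**
0000000 10001 10000 111 01111 0110011 (funct7 | rs2 | rs1 | funct3 | rd | opcode)
**Step 3: Group and convert**
0000 0001 0001 1000 0111 0111 1011 0011 = 0x011877B3
**Answer:** `0x011877B3`
Problem 3: What Does Bit 30 Tell Us?¶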
Given an R-type instruction with funct3 = 000, how do you determine if it's ADD, SUB, or MUL?
Solution
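Check bits 30 and 25 of the instruction word, both of which lie in the funct7 field:
| Bit 30 | Bit 25 | Instruction |
|--------|--------|-------------|
| 0 | 0 | ADD |
| 1 | 0 | SUB |
| 0 | 1 | MUL |
**Code to determine:** `(iw >> 30) & 1` yields bit 30 and `(iw >> 25) & 1` yields bit 25; compare the pair against the table.
Problem 4: Register Encoding¶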
If an instruction has rd = 01010, rs1 = 01011, rs2 = 01100, what are the ABI register names?
Solution
| Field | Binary | Decimal | ABI Name |
|-------|--------|---------|----------|
| rd | 01010 | 10 | a0 |
| rs1 | 01011 | 11 | a1 |
| rs2 | 01100 | 12 | a2 |
**Register mapping reference:**
- x0 = zero
- x1 = ra
- x2 = sp
- x10–x17 = a0–a7 (arguments/return values)
- x8–x9, x18–x27 = s0–s11 (saved registers)
- x5–x7, x28–x31 = t0–t6 (temporaries)
Problem 5: Field Extraction¶
Write a Rust expression to extract funct3 from an instruction word iw.
Solution
**Method 1: Shift and mask** `(iw >> 12) & 0x7`
**Method 2: Using get_bits function** `get_bits(iw, 12, 3)`
**Explanation:**
- funct3 is at bits [14:12]
- Shift right by 12 to move those bits to position [2:0]
- AND with 0x7 (binary 111) to keep only 3 bits
Further Reading¶
- RISC-V Specification — Volume 1: Unprivileged ISA, Chapter 2 (RV32I Base Integer Instruction Set)
- https://riscv.org/specifications/
- RISC-V Green Card — Quick reference for instruction encodings
- Computer Organization and Design: RISC-V Edition (Patterson & Hennessy) — Chapter 2
- GNU Assembler Manual — RISC-V specific options and directives
Summary¶
- Machine code is the binary encoding of assembly instructions. RISC-V uses fixed 32-bit instruction words.
- R-type format encodes register-to-register operations with fields: opcode (7 bits), rd (5 bits), funct3 (3 bits), rs1 (5 bits), rs2 (5 bits), and funct7 (7 bits).
- Decoding machine code requires extracting fields using bit shifting and masking. The opcode identifies the format, funct3 and funct7 identify the specific operation.
- Bit 30 and bit 25 of funct7 distinguish operations with the same funct3 (e.g., ADD vs SUB vs MUL all have funct3 = 000).
- I-type instructions encode a 12-bit immediate that is sign-extended to the full register width.
- Six instruction formats (R, I, S, B, U, J) cover all RISC-V operations, with the opcode always in bits [6:0].