RISC-V Machine Code¶

Overview¶

This lecture covers how RISC-V assembly instructions are encoded as machine code — the actual 1s and 0s that the processor executes. We examine the binary format of RISC-V instructions, learn to decode and encode R-type instructions by hand, and use bit manipulation in Rust to extract instruction fields programmatically. We also survey all six RISC-V instruction formats and explore sign extension for immediate values.

Learning Objectives¶

Understand machine code as the binary representation of assembly instructions
Decode R-type instructions by extracting opcode, rd, rs1, rs2, funct3, and funct7 fields
Encode assembly instructions into 32-bit machine code by hand
Extract instruction fields using bit shifting and masking in Rust
Understand the six core RISC-V instruction formats
Perform sign extension on immediate values

Prerequisites¶

RISC-V Assembly 1 — registers, instructions, memory operations
RISC-V Assembly 2 — control flow, functions
RISC-V Assembly Strings and Bits — byte operations, bitwise instructions
Binary, Bases, and Bitwise Operations — binary/hex number systems, bit manipulation

Machine Code Fundamentals¶

What is Machine Code?¶

Machine code is the binary representation of assembly instructions — the actual 1s and 0s that the processor executes. In RISC-V:

All instructions are 32 bits (in RV32I and the base instruction set of RV64I)
There are 2^5 = 32 possible registers (5 bits to encode each)
Instructions are stored in memory and fetched by the processor

Assembly to Machine Code¶

Assembly instruction:

add a0, a0, a1

Machine code (hexadecimal):

0x00B50533

Machine code (binary):

0000 0000 1011 0101 0000 0101 0011 0011

The assembler translates assembly mnemonics into machine code. The processor fetches and executes machine code directly.
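
To check the hexadecimal and binary forms against each other, a short sketch can print a word as nibble-grouped binary. The helper name `to_nibbles` is an illustrative assumption, not part of the lecture code.

```rust
/// Format a 32-bit word as eight space-separated 4-bit groups.
/// (`to_nibbles` is a hypothetical helper name used for illustration.)
fn to_nibbles(iw: u32) -> String {
    let bits = format!("{:032b}", iw); // zero-padded 32-digit binary string
    bits.as_bytes()
        .chunks(4) // four binary digits correspond to one hex digit
        .map(|c| std::str::from_utf8(c).unwrap())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let iw: u32 = 0x00B50533; // add a0, a0, a1
    println!("{:#010X} = {}", iw, to_nibbles(iw));
}
```

Each group of four binary digits lines up with one hexadecimal digit, which is why hex is the usual shorthand for instruction words.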

Why 32 Registers?¶

With 5 bits, we can represent 2^5 = 32 different values (0–31), which maps perfectly to RISC-V's 32 general-purpose registers (x0–x31).

R-Type Instruction Format¶

Instruction Fields¶

R-type instructions perform register-to-register operations. The 32-bit instruction word is divided into fields:

┌─────────┬─────┬─────┬───────┬─────┬─────────┐
│ funct7  │ rs2 │ rs1 │ funct3│  rd │ opcode  │
├─────────┼─────┼─────┼───────┼─────┼─────────┤
│ 31...25 │24.20│19.15│ 14..12│11..7│  6...0  │
│  7 bits │5 bit│5 bit│ 3 bits│5 bit│ 7 bits  │
└─────────┴─────┴─────┴───────┴─────┴─────────┘

Field	Bits	Description
opcode	[6:0]	Identifies instruction format (R-type = 0110011)
rd	[11:7]	Destination register
funct3	[14:12]	Function selector (operation within format)
rs1	[19:15]	First source register
rs2	[24:20]	Second source register
funct7	[31:25]	Additional function bits (distinguishes add/sub, etc.)

Decoding Example: add a0, a0, a1¶

Instruction: add a0, a0, a1
Machine code: 0x00B50533

Step 1: Convert to binary

0x00B50533 = 0000 0000 1011 0101 0000 0101 0011 0011

Step 2: Extract fields

Binary:  0000000 01011 01010 000 01010 0110011
Field:   funct7  rs2   rs1  f3   rd   opcode

Step 3: Interpret values

Field	Binary	Decimal	Meaning
opcode	0110011	51	R-type format
rd	01010	10	a0 (x10)
funct3	000	0	ADD operation
rs1	01010	10	a0 (x10)
rs2	01011	11	a1 (x11)
funct7	0000000	0	ADD (not SUB)

Visual Breakdown¶

graph LR
    subgraph "0x00B50533"
        A["0000000"] --> B["funct7=0<br>ADD"]
        C["01011"] --> D["rs2=11<br>a1"]
        E["01010"] --> F["rs1=10<br>a0"]
        G["000"] --> H["funct3=0<br>ADD"]
        I["01010"] --> J["rd=10<br>a0"]
        K["0110011"] --> L["opcode<br>R-type"]
    end

Common R-Type Instructions¶

Opcode and Function Fields¶

The R-type integer instructions in the tables below all share the opcode 0110011 (0x33). The specific operation is determined by funct3 and funct7:

Instruction	funct7	funct3	Operation
add	0000000	000	rd = rs1 + rs2
sub	0100000	000	rd = rs1 - rs2
sll	0000000	001	rd = rs1 << rs2
srl	0000000	101	rd = rs1 >> rs2 (logical)
sra	0100000	101	rd = rs1 >> rs2 (arithmetic)
and	0000000	111	rd = rs1 & rs2
or	0000000	110	rd = rs1 | rs2
xor	0000000	100	rd = rs1 ^ rs2

M Extension (Multiply/Divide)¶

With the M extension, additional operations use funct7 = 0000001:

Instruction	funct7	funct3	Operation
mul	0000001	000	rd = rs1 * rs2
div	0000001	100	rd = rs1 / rs2

Distinguishing ADD vs SUB vs MUL¶

All three have funct3 = 000, but differ in funct7:

ADD: funct7 = 0000000, funct3 = 000  (bit 30 = 0, bit 25 = 0)
SUB: funct7 = 0100000, funct3 = 000  (bit 30 = 1, bit 25 = 0)
MUL: funct7 = 0000001, funct3 = 000  (bit 30 = 0, bit 25 = 1)

Key distinguishing bits:

Bit 30: distinguishes ADD (0) from SUB (1)
Bit 25: distinguishes base instructions (0) from M extension (1)
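
The two tables above can be read as a lookup keyed on the (funct7, funct3) pair. The following is a minimal sketch under that reading; the function name `r_type_mnemonic` is an assumption made for illustration, and only the instructions listed in this section are covered.

```rust
/// Map an R-type word (opcode 0110011) to its mnemonic using (funct7, funct3).
/// Hypothetical helper: covers only the instructions listed in the tables above.
fn r_type_mnemonic(iw: u32) -> Option<&'static str> {
    if iw & 0b1111111 != 0b0110011 {
        return None; // not the R-type opcode used in this section
    }
    let funct3 = (iw >> 12) & 0b111;
    let funct7 = (iw >> 25) & 0b1111111;
    match (funct7, funct3) {
        (0b0000000, 0b000) => Some("add"),
        (0b0100000, 0b000) => Some("sub"),
        (0b0000001, 0b000) => Some("mul"),
        (0b0000000, 0b001) => Some("sll"),
        (0b0000000, 0b100) => Some("xor"),
        (0b0000001, 0b100) => Some("div"),
        (0b0000000, 0b101) => Some("srl"),
        (0b0100000, 0b101) => Some("sra"),
        (0b0000000, 0b110) => Some("or"),
        (0b0000000, 0b111) => Some("and"),
        _ => None, // combination not listed in this section
    }
}

fn main() {
    println!("{:?}", r_type_mnemonic(0x00B50533)); // add a0, a0, a1
    println!("{:?}", r_type_mnemonic(0x40E68633)); // sub a2, a3, a4
}
```

Matching on the full pair rather than on single bits keeps unlisted combinations from being mistaken for a known instruction.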

Encoding Instructions¶

Manual Encoding: sub a2, a3, a4¶

Let's encode sub a2, a3, a4:

Step 1: Identify components

Instruction: SUB (R-type)
rd = a2 = x12 = 01100
rs1 = a3 = x13 = 01101
rs2 = a4 = x14 = 01110
funct3 = 000
funct7 = 0100000
opcode = 0110011

Step 2: Assemble the bits

funct7   rs2    rs1   funct3  rd     opcode
0100000  01110  01101  000    01100  0110011

Step 3: Convert to hex

Binary: 0100 0000 1110 0110 1000 0110 0011 0011
Hex:    0x40E68633

Verification:

# Assembly
sub a2, a3, a4

# Machine code
0x40E68633
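
The manual steps above amount to shifting each field to its position and OR-ing the results together. A minimal sketch, assuming a hypothetical helper named `encode_r` that is not part of the lecture code:

```rust
/// Pack R-type fields into a 32-bit instruction word.
/// Each field is masked to its width so stray high bits cannot spill into a neighbour.
/// (`encode_r` is a hypothetical helper used for illustration.)
fn encode_r(funct7: u32, rs2: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    ((funct7 & 0x7F) << 25)
        | ((rs2 & 0x1F) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
}

fn main() {
    // sub a2, a3, a4: rd = x12, rs1 = x13, rs2 = x14
    let iw = encode_r(0b0100000, 14, 13, 0b000, 12, 0b0110011);
    println!("{:#010X}", iw); // prints: 0x40E68633
}
```

The shift amounts (25, 20, 15, 12, 7, 0) are the low bit positions of the fields in the R-type layout.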

Bit Extraction in Rust¶

Extracting Instruction Fields¶

To decode machine code in software, we extract fields using bit manipulation:

/// Extract `count` bits starting at position `start` from instruction word `iw`
fn get_bits(iw: u32, start: u32, count: u32) -> u32 {
    let mask = (1 << count) - 1;  // Create mask of `count` 1s
    (iw >> start) & mask           // Shift and mask
}

// Extract specific fields from instruction word
fn get_opcode(iw: u32) -> u32 { get_bits(iw, 0, 7) }
fn get_rd(iw: u32) -> u32     { get_bits(iw, 7, 5) }
fn get_funct3(iw: u32) -> u32 { get_bits(iw, 12, 3) }
fn get_rs1(iw: u32) -> u32    { get_bits(iw, 15, 5) }
fn get_rs2(iw: u32) -> u32    { get_bits(iw, 20, 5) }
fn get_funct7(iw: u32) -> u32 { get_bits(iw, 25, 7) }

Example: Decoding 0x00B50533¶

let iw: u32 = 0x00B50533;

let opcode = get_opcode(iw);  // 0110011 = 51 = R-type
let rd     = get_rd(iw);      // 01010 = 10 = a0
let funct3 = get_funct3(iw);  // 000 = 0 = ADD
let rs1    = get_rs1(iw);     // 01010 = 10 = a0
let rs2    = get_rs2(iw);     // 01011 = 11 = a1
let funct7 = get_funct7(iw);  // 0000000 = 0 = ADD (not SUB)

// Result: add a0, a0, a1

Inline Bit Extraction¶

You can also extract fields directly without a helper function using shift and mask:

let iw: u32 = 0x00B50533;

let opcode = iw & 0b1111111;            // bits [6:0]
let rd     = (iw >> 7)  & 0b11111;      // bits [11:7]
let funct3 = (iw >> 12) & 0b111;        // bits [14:12]
let rs1    = (iw >> 15) & 0b11111;      // bits [19:15]
let rs2    = (iw >> 20) & 0b11111;      // bits [24:20]
let funct7 = (iw >> 25) & 0b1111111;    // bits [31:25]

I-Type Instructions and Sign Extension¶

I-Type Format¶

I-type instructions encode an immediate value (constant) within the instruction:

┌──────────────┬─────┬───────┬─────┬─────────┐
│  imm[11:0]   │ rs1 │ funct3│  rd │ opcode  │
├──────────────┼─────┼───────┼─────┼─────────┤
│   31...20    │19.15│ 14..12│11..7│  6...0  │
│   12 bits    │5 bit│ 3 bits│5 bit│ 7 bits  │
└──────────────┴─────┴───────┴─────┴─────────┘

Examples: addi, lw, lb, jalr

The 12-bit immediate is sign-extended to the full register width (32 or 64 bits) before use.

Decoding Example: addi t0, t1, -33¶

Instruction: addi t0, t1, -33

The value -33 in 12-bit two's complement:

-33 = 0b111111011111 (12 bits)

Assembled fields:

imm[11:0]     rs1    funct3  rd     opcode
111111011111  00110  000     00101  0010011

opcode = 0010011 (I-type arithmetic)
rd = 00101 = 5 (t0 = x5)
funct3 = 000 (ADDI)
rs1 = 00110 = 6 (t1 = x6)
imm = 111111011111 = -33 (sign-extended)
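
The field layout above can be checked with a short sketch that packs the I-type fields and then recovers the signed immediate. The names `encode_i` and `imm_i` are assumptions made for illustration. Because the immediate occupies the top 12 bits of the word, reinterpreting the word as `i32` and shifting right by 20 sign-extends it in one step.

```rust
/// Pack I-type fields; `imm` is truncated to its low 12 bits (two's complement).
/// (`encode_i` is a hypothetical helper used for illustration.)
fn encode_i(imm: i32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    (((imm as u32) & 0xFFF) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
}

/// Recover the signed immediate: bit 31 of the word is the immediate's sign bit,
/// so an arithmetic right shift on `i32` fills the upper bits with it.
fn imm_i(iw: u32) -> i32 {
    (iw as i32) >> 20
}

fn main() {
    // addi t0, t1, -33: rd = x5, rs1 = x6
    let iw = encode_i(-33, 6, 0b000, 5, 0b0010011);
    println!("{:#010X} imm = {}", iw, imm_i(iw)); // prints: 0xFDF30293 imm = -33
}
```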

Sign Extension in Rust¶

When extracting an immediate value from an instruction, the 12-bit value must be sign-extended to preserve negative numbers. The technique uses arithmetic shift:

fn signext() {
    let immu: u64 = 0b111111111101;  // raw 12-bit immediate field (-3 in two's complement)
    let start: u32 = 11;             // sign bit position (bit 11 of 12-bit value)
    let distance: u32 = 63 - start;  // shift that moves bit 11 to bit 63 (52 here)

    // Shift left to put sign bit at MSB, then arithmetic shift right
    let imm: i64 = ((immu as i64) << distance) >> distance;

    println!("imm = {}", imm);  // prints: imm = -3
}

How it works:

Cast the unsigned value to a signed type (i64)
Shift left so the sign bit (bit 11) moves to the most significant position
Arithmetic right shift (>> on signed types in Rust) fills with the sign bit
The result is the correctly sign-extended value
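
The same shift pair generalizes to any field width. A minimal sketch, assuming a hypothetical helper `sign_extend(value, bits)` where `bits` is the width of the field:

```rust
/// Sign-extend the low `bits` bits of `value` to a full `i64`.
/// Assumes 1 <= bits <= 64. (`sign_extend` is a hypothetical helper.)
fn sign_extend(value: u64, bits: u32) -> i64 {
    let distance = 64 - bits; // moves the field's top bit (bit `bits - 1`) to bit 63
    ((value << distance) as i64) >> distance
}

fn main() {
    println!("{}", sign_extend(0b1111_1111_1101, 12)); // prints: -3
    println!("{}", sign_extend(0b0111_1111_1111, 12)); // prints: 2047
    println!("{}", sign_extend(0b1000_0000_0000, 12)); // prints: -2048
}
```

The shift distance is 64 minus the field width (52 for a 12-bit field), which is the same as 63 minus the index of the sign bit. The boundary values 2047 and -2048 are useful test cases because an off-by-one in the distance changes their results.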

RISC-V Instruction Formats Overview¶

RISC-V uses six core instruction formats. Each has the opcode in bits [6:0]:

R-type:  | funct7 | rs2 | rs1 | funct3 | rd | opcode |  Register ops
I-type:  |    imm[11:0]  | rs1 | funct3 | rd | opcode |  Immediate ops, loads
S-type:  | imm[11:5]| rs2| rs1 | funct3 |imm[4:0]|opcode|  Stores
B-type:  | imm bits | rs2| rs1 | funct3 |imm bits|opcode|  Branches
U-type:  |        imm[31:12]        | rd | opcode |  Upper immediate
J-type:  |        imm bits          | rd | opcode |  Jumps (JAL)

graph TD
    subgraph "Instruction Formats"
        R["R-type<br>opcode=0110011<br>Register operations"]
        I["I-type<br>opcode varies<br>Immediate ops, loads"]
        S["S-type<br>opcode=0100011<br>Stores"]
        B["B-type<br>opcode=1100011<br>Conditional branches"]
        U["U-type<br>opcode=0110111/0010111<br>LUI, AUIPC"]
        J["J-type<br>opcode=1101111<br>JAL"]
    end

Design principle: The opcode and the register fields (rd, rs1, rs2) occupy the same bit positions in every format that contains them, which simplifies hardware decoding.
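
Because the opcode always sits in bits [6:0], a decoder can pick the format before looking at any other field. The sketch below assumes a hypothetical `Format` enum and `format_of` function, and covers only the RV32I opcodes named in this section plus the load (0000011) and jalr (1100111) opcodes for the I-type examples.

```rust
#[derive(Debug, PartialEq)]
enum Format { R, I, S, B, U, J }

/// Classify a word by its opcode (bits [6:0]).
/// Hypothetical helper: opcodes outside this section yield `None`.
fn format_of(iw: u32) -> Option<Format> {
    match iw & 0b1111111 {
        0b0110011 => Some(Format::R),                         // register ops (add, sub, ...)
        0b0010011 | 0b0000011 | 0b1100111 => Some(Format::I), // addi etc., loads, jalr
        0b0100011 => Some(Format::S),                         // stores
        0b1100011 => Some(Format::B),                         // conditional branches
        0b0110111 | 0b0010111 => Some(Format::U),             // lui, auipc
        0b1101111 => Some(Format::J),                         // jal
        _ => None,
    }
}

fn main() {
    println!("{:?}", format_of(0x00B50533)); // add a0, a0, a1
    println!("{:?}", format_of(0xFDF30293)); // addi t0, t1, -33
}
```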

In-Class Example: Runtime Instruction Decoding¶

This example demonstrates reading machine code at runtime from an assembly function and decoding it in Rust.

Assembly Function (`add2_s.s`)¶

.global add2_s

add2_s:
    add a0, a0, a1
    addi t0, t1, -33
    ret

Rust Program (`main.rs`)¶

unsafe extern "C" {
    fn add2_s(a0: i32, a1: i32) -> i32;
}

fn decode() {
    let r = unsafe { add2_s(3, 4) };
    println!("add2_s(3, 4) = {}", r);

    let pc = add2_s as *const u32;

    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);

    // Decode R-type fields
    let opcode = iw & 0b1111111;
    let funct3 = (iw >> 12) & 0b111;
    let funct7 = (iw >> 25) & 0b1111111;
    let rd     = (iw >> 7)  & 0b11111;
    let rs1    = (iw >> 15) & 0b11111;
    let rs2    = (iw >> 20) & 0b11111;

    println!("opcode  = {}", opcode);
    println!("funct3  = {}", funct3);
    println!("funct7  = {}", funct7);
    println!("rd      = {}", rd);
    println!("rs1     = {}", rs1);
    println!("rs2     = {}", rs2);

    // Advance to next instruction
    let pc = unsafe { pc.add(1) };
    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);
}

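fn signext() {
    let immu: u64 = 0b111111111101;  // raw 12-bit immediate field (-3 in two's complement)
    let start: u32 = 11;             // sign bit position
    let distance: u32 = 63 - start;  // shift that moves bit 11 to bit 63 (52 here)

    let imm: i64 = ((immu as i64) << distance) >> distance;

    println!("imm = {}", imm);
}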

fn main() {
    println!("== decode ==");
    decode();

    println!();

    println!("== signext ==");
    signext();
}

Key points:

The function pointer add2_s as *const u32 gives us the address of the first instruction
Reading *pc loads the 32-bit instruction word from memory
pc.add(1) advances by 4 bytes (one 32-bit instruction) to the next instruction
We can verify our manual decoding matches the extracted field values
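
The same walk can be tried on any machine by replacing the function's memory with an array of instruction words. This is only a sketch: the array `CODE` and the helper `common_fields` are assumptions made for illustration, and the words assume the assembler emitted uncompressed 32-bit encodings (no C extension).

```rust
/// Stand-in for the code of `add2_s`, assuming uncompressed 32-bit encodings.
const CODE: [u32; 3] = [
    0x00B50533, // add  a0, a0, a1
    0xFDF30293, // addi t0, t1, -33
    0x00008067, // ret  (jalr x0, 0(x1))
];

/// Fields that sit at the same bit positions in R-type and I-type words:
/// (opcode, rd, funct3, rs1). Hypothetical helper for illustration.
fn common_fields(iw: u32) -> (u32, u32, u32, u32) {
    (iw & 0x7F, (iw >> 7) & 0x1F, (iw >> 12) & 0x7, (iw >> 15) & 0x1F)
}

fn main() {
    for (i, &iw) in CODE.iter().enumerate() {
        let (opcode, rd, funct3, rs1) = common_fields(iw);
        // Byte offset from the start of the function: 4 bytes per instruction.
        println!(
            "[+{}] iw = {:08X} opcode = {:07b} rd = x{} funct3 = {} rs1 = x{}",
            i * 4, iw, opcode, rd, funct3, rs1
        );
    }
}
```

Indexing the array by `i` plays the role of `pc.add(i)` in the pointer-based version.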

Key Concepts¶

Concept	Definition
Machine Code	Binary representation of instructions executed by the CPU
Instruction Word (iw)	The 32-bit binary value encoding one instruction
Opcode	7-bit field identifying the instruction format
funct3	3-bit field selecting operation within a format
funct7	7-bit field providing additional operation distinction
rd	Destination register (5 bits)
rs1, rs2	Source registers (5 bits each)
R-type	Format for register-to-register operations
I-type	Format for immediate (constant) operations and loads
Sign Extension	Extending a smaller signed value to a wider type preserving the sign

Practice Problems¶

Problem 1: Decode Machine Code¶

Decode the instruction 0x40A60633. What assembly instruction does it represent?

Solution

Step 1: Convert to binary

0x40A60633 = 0100 0000 1010 0110 0000 0110 0011 0011

Step 2: Extract fields

funct7:  0100000 = 32 (bit 30 = 1, indicates SUB)
rs2:     01010 = 10 (a0)
rs1:     01100 = 12 (a2)
funct3:  000 = 0
rd:      01100 = 12 (a2)
opcode:  0110011 = R-type

Step 3: Determine instruction

opcode = R-type
funct3 = 0, funct7 bit 30 = 1: SUB

Answer: sub a2, a2, a0

Problem 2: Encode Instruction¶

Encode and a5, a6, a7 as a 32-bit hexadecimal machine code.

Solution

Step 1: Look up encoding

AND: funct7 = 0000000, funct3 = 111, opcode = 0110011
a5 = x15 = 01111
a6 = x16 = 10000
a7 = x17 = 10001

Step 2: Assemble bits

funct7   rs2    rs1   funct3  rd     opcode
0000000  10001  10000  111    01111  0110011

Step 3: Group and convert

Binary: 0000 0001 0001 1000 0111 0111 1011 0011
Hex:    0x011877B3

Answer: 0x011877B3

Problem 3: What Does Bit 30 Tell Us?¶

Given an R-type instruction with funct3 = 000, how do you determine if it's ADD, SUB, or MUL?

Solution

Check bits 30 and 25 of the funct7 field:

Bit 30	Bit 25	Instruction
0	0	ADD
1	0	SUB
0	1	MUL

Code to determine:

let bit_30 = (iw >> 30) & 1;
let bit_25 = (iw >> 25) & 1;

if bit_25 == 1 {
    // MUL (M extension)
} else if bit_30 == 1 {
    // SUB (bit 30 set)
} else {
    // ADD
}

Problem 4: Register Encoding¶

If an instruction has rd = 01010, rs1 = 01011, rs2 = 01100, what are the ABI register names?

Solution

Field	Binary	Decimal	ABI Name
rd	01010	10	a0
rs1	01011	11	a1
rs2	01100	12	a2

Register mapping reference:

x0 = zero
x1 = ra
x2 = sp
x10–x17 = a0–a7 (arguments/return values)
x8–x9, x18–x27 = s0–s11 (saved registers)
x5–x7, x28–x31 = t0–t6 (temporaries)
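
The register mapping can also be written as a lookup table indexed by register number. A minimal sketch, assuming hypothetical names `ABI_NAMES` and `abi_name`; it also fills in x3 = gp and x4 = tp, which the reference list above leaves out.

```rust
/// ABI names indexed by register number x0..x31.
/// (`ABI_NAMES` and `abi_name` are hypothetical names used for illustration.)
const ABI_NAMES: [&str; 32] = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
];

/// Look up the ABI name for a 5-bit register field.
fn abi_name(reg: u32) -> &'static str {
    ABI_NAMES[(reg & 0x1F) as usize] // mask keeps the index within 0..=31
}

fn main() {
    println!("{} {} {}", abi_name(0b01010), abi_name(0b01011), abi_name(0b01100));
    // prints: a0 a1 a2
}
```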

Problem 5: Field Extraction¶

Write a Rust expression to extract funct3 from an instruction word iw.

Solution

Method 1: Shift and mask

let funct3 = (iw >> 12) & 0x7;  // 0x7 = 0b111 (3 bits)

Method 2: Using get_bits function

let funct3 = get_bits(iw, 12, 3);  // start at bit 12, 3 bits

Explanation:

funct3 is at bits [14:12]
Shift right by 12 to move those bits to position [2:0]
AND with 0x7 (binary 111) to keep only 3 bits

Summary¶

Machine code is the binary encoding of assembly instructions. RISC-V uses fixed 32-bit instruction words.
R-type format encodes register-to-register operations with fields: opcode (7 bits), rd (5 bits), funct3 (3 bits), rs1 (5 bits), rs2 (5 bits), and funct7 (7 bits).
Decoding machine code requires extracting fields using bit shifting and masking. The opcode identifies the format, funct3 and funct7 identify the specific operation.
Bit 30 and bit 25 of funct7 distinguish operations with the same funct3 (e.g., ADD vs SUB vs MUL all have funct3 = 000).
I-type instructions encode a 12-bit immediate that is sign-extended to the full register width.
Six instruction formats (R, I, S, B, U, J) cover all RISC-V operations, with the opcode always in bits [6:0].
