RISC-V Machine Code

Overview

This lecture covers how RISC-V assembly instructions are encoded as machine code — the actual 1s and 0s that the processor executes. We examine the binary format of RISC-V instructions, learn to decode and encode R-type instructions by hand, and use bit manipulation in Rust to extract instruction fields programmatically. We also survey all six RISC-V instruction formats and explore sign extension for immediate values.

Learning Objectives

  • Understand machine code as the binary representation of assembly instructions
  • Decode R-type instructions by extracting opcode, rd, rs1, rs2, funct3, and funct7 fields
  • Encode assembly instructions into 32-bit machine code by hand
  • Extract instruction fields using bit shifting and masking in Rust
  • Understand the six core RISC-V instruction formats
  • Perform sign extension on immediate values

Prerequisites


Machine Code Fundamentals

What is Machine Code?

Machine code is the binary representation of assembly instructions — the actual 1s and 0s that the processor executes. In RISC-V:

  • All instructions are 32 bits (in RV32I and the base instruction set of RV64I)
  • There are 2^5 = 32 possible registers (5 bits to encode each)
  • Instructions are stored in memory and fetched by the processor

Assembly to Machine Code

Assembly instruction:

add a0, a0, a1

Machine code (hexadecimal):

0x00B50533

Machine code (binary):

0000 0000 1011 0101 0000 0101 0011 0011

The assembler translates assembly mnemonics into machine code. The processor fetches and executes machine code directly.
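The hex-to-binary conversion above can be reproduced with Rust's formatting machinery. The following is a minimal sketch; the helper name `to_nibble_groups` is made up for illustration and is not part of the lecture code.

```rust
/// Format a 32-bit word as eight space-separated groups of four bits.
/// (Hypothetical helper, introduced only for this illustration.)
fn to_nibble_groups(word: u32) -> String {
    let bits = format!("{:032b}", word); // zero-padded, 32 binary digits
    bits.as_bytes()
        .chunks(4) // one chunk per hex digit
        .map(|chunk| std::str::from_utf8(chunk).unwrap())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let iw: u32 = 0x00B50533; // add a0, a0, a1
    println!("hex:    {:#010X}", iw);
    println!("binary: {}", to_nibble_groups(iw));
}
```

Each group of four bits corresponds to one hex digit, which is why hexadecimal is the usual shorthand for instruction words.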

Why 32 Registers?

With 5 bits, we can represent 2^5 = 32 different values (0–31), which maps perfectly to RISC-V's 32 general-purpose registers (x0–x31).


R-Type Instruction Format

Instruction Fields

R-type instructions perform register-to-register operations. The 32-bit instruction word is divided into fields:

┌─────────┬─────┬─────┬───────┬─────┬─────────┐
│ funct7  │ rs2 │ rs1 │ funct3│  rd │ opcode  │
├─────────┼─────┼─────┼───────┼─────┼─────────┤
│ 31...25 │24.20│19.15│ 14..12│11..7│  6...0  │
│  7 bits │5 bit│5 bit│ 3 bits│5 bit│ 7 bits  │
└─────────┴─────┴─────┴───────┴─────┴─────────┘

| Field | Bits | Description |
|-------|------|-------------|
| opcode | [6:0] | Identifies instruction format (R-type = 0110011) |
| rd | [11:7] | Destination register |
| funct3 | [14:12] | Function selector (operation within format) |
| rs1 | [19:15] | First source register |
| rs2 | [24:20] | Second source register |
| funct7 | [31:25] | Additional function bits (distinguishes add/sub, etc.) |

Decoding Example: add a0, a0, a1

Instruction: add a0, a0, a1

Machine code: 0x00B50533

Step 1: Convert to binary

0x00B50533 = 0000 0000 1011 0101 0000 0101 0011 0011

Step 2: Extract fields

Binary:  0000000 01011 01010 000 01010 0110011
Field:   funct7  rs2   rs1   f3  rd    opcode

Step 3: Interpret values

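| Field | Binary | Decimal | Meaning |
|-------|--------|---------|---------|
| opcode | 0110011 | 51 | R-type format |
| rd | 01010 | 10 | a0 (x10) |
| funct3 | 000 | 0 | ADD operation |
| rs1 | 01010 | 10 | a0 (x10) |
| rs2 | 01011 | 11 | a1 (x11) |
| funct7 | 0000000 | 0 | ADD (not SUB) |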

Visual Breakdown

graph LR
    subgraph "0x00B50533"
        A["0000000"] --> B["funct7=0<br>ADD"]
        C["01011"] --> D["rs2=11<br>a1"]
        E["01010"] --> F["rs1=10<br>a0"]
        G["000"] --> H["funct3=0<br>ADD"]
        I["01010"] --> J["rd=10<br>a0"]
        K["0110011"] --> L["opcode<br>R-type"]
    end

Common R-Type Instructions

Opcode and Function Fields

For R-type instructions, the opcode is always 0110011 (0x33). The specific operation is determined by funct3 and funct7:

| Instruction | funct7 | funct3 | Operation |
|-------------|--------|--------|-----------|
| add | 0000000 | 000 | rd = rs1 + rs2 |
| sub | 0100000 | 000 | rd = rs1 - rs2 |
| sll | 0000000 | 001 | rd = rs1 << rs2 |
| srl | 0000000 | 101 | rd = rs1 >> rs2 (logical) |
| sra | 0100000 | 101 | rd = rs1 >> rs2 (arithmetic) |
| and | 0000000 | 111 | rd = rs1 & rs2 |
| or | 0000000 | 110 | rd = rs1 \| rs2 |
| xor | 0000000 | 100 | rd = rs1 ^ rs2 |

M Extension (Multiply/Divide)

With the M extension, additional operations use funct7 = 0000001:

| Instruction | funct7 | funct3 | Operation |
|-------------|--------|--------|-----------|
| mul | 0000001 | 000 | rd = rs1 * rs2 |
| div | 0000001 | 100 | rd = rs1 / rs2 |

Distinguishing ADD vs SUB vs MUL

All three have funct3 = 000, but differ in funct7:

ADD: funct7 = 0000000, funct3 = 000  (bit 30 = 0, bit 25 = 0)
SUB: funct7 = 0100000, funct3 = 000  (bit 30 = 1, bit 25 = 0)
MUL: funct7 = 0000001, funct3 = 000  (bit 30 = 0, bit 25 = 1)

Key distinguishing bits:

  • Bit 30: Distinguishes ADD (0) from SUB (1)
  • Bit 25: Distinguishes base instructions (0) from M extension (1)
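The funct7/funct3 tables above translate directly into a lookup. The sketch below matches on the full funct7 value rather than on bits 30 and 25 alone, which is a slightly stricter check; the function name `r_type_mnemonic` is an assumption made for this example, and only the instructions listed in this section are covered.

```rust
/// Return the mnemonic for an R-type word (opcode 0110011), or None when the
/// funct7/funct3 pair is not one of the encodings listed in this section.
/// (Hypothetical helper, not part of the lecture code.)
fn r_type_mnemonic(iw: u32) -> Option<&'static str> {
    let opcode = iw & 0x7F;
    let funct3 = (iw >> 12) & 0x7;
    let funct7 = (iw >> 25) & 0x7F;
    if opcode != 0b0110011 {
        return None; // not an R-type instruction
    }
    match (funct7, funct3) {
        (0b0000000, 0b000) => Some("add"),
        (0b0100000, 0b000) => Some("sub"),
        (0b0000000, 0b001) => Some("sll"),
        (0b0000000, 0b101) => Some("srl"),
        (0b0100000, 0b101) => Some("sra"),
        (0b0000000, 0b111) => Some("and"),
        (0b0000000, 0b110) => Some("or"),
        (0b0000000, 0b100) => Some("xor"),
        (0b0000001, 0b000) => Some("mul"), // M extension
        (0b0000001, 0b100) => Some("div"), // M extension
        _ => None,
    }
}

fn main() {
    // add, sub, and mul with the same registers (a0, a0, a1) differ only in funct7.
    for iw in [0x00B50533u32, 0x40B50533, 0x02B50533] {
        println!("{:#010X} -> {:?}", iw, r_type_mnemonic(iw));
    }
}
```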


Encoding Instructions

Manual Encoding: sub a2, a3, a4

Let's encode sub a2, a3, a4:

Step 1: Identify components

  • Instruction: SUB (R-type)
  • rd = a2 = x12 = 01100
  • rs1 = a3 = x13 = 01101
  • rs2 = a4 = x14 = 01110
  • funct3 = 000
  • funct7 = 0100000
  • opcode = 0110011

Step 2: Assemble the bits

funct7   rs2    rs1   funct3  rd     opcode
0100000  01110  01101  000    01100  0110011

Step 3: Convert to hex

Binary: 0100 0000 1110 0110 1000 0110 0011 0011
Hex:    0x40E68633

Verification:

# Assembly
sub a2, a3, a4

# Machine code
0x40E68633
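The manual encoding steps can also be checked in code by shifting each field into place and OR-ing the results together. This is a minimal sketch; `encode_r_type` is a name chosen for this example rather than something from the lecture code.

```rust
/// Pack the six R-type fields into a 32-bit instruction word.
/// Each field is masked to its width first so stray high bits cannot
/// spill into a neighboring field. (Hypothetical helper.)
fn encode_r_type(funct7: u32, rs2: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    ((funct7 & 0x7F) << 25)
        | ((rs2 & 0x1F) << 20)
        | ((rs1 & 0x1F) << 15)
        | ((funct3 & 0x7) << 12)
        | ((rd & 0x1F) << 7)
        | (opcode & 0x7F)
}

fn main() {
    // sub a2, a3, a4: rd = x12, rs1 = x13, rs2 = x14
    let iw = encode_r_type(0b0100000, 14, 13, 0b000, 12, 0b0110011);
    println!("sub a2, a3, a4 = {:#010X}", iw);
}
```

Encoding is the exact mirror of decoding: every right shift used to extract a field becomes a left shift by the same amount.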


Bit Extraction in Rust

Extracting Instruction Fields

To decode machine code in software, we extract fields using bit manipulation:

/// Extract `count` bits starting at position `start` from instruction word `iw`.
/// Requires `count < 32`: `1 << 32` overflows a u32 (no RISC-V field is that wide).
fn get_bits(iw: u32, start: u32, count: u32) -> u32 {
    let mask = (1u32 << count) - 1;  // Create mask of `count` 1s
    (iw >> start) & mask             // Shift and mask
}

// Extract specific fields from instruction word
fn get_opcode(iw: u32) -> u32 { get_bits(iw, 0, 7) }
fn get_rd(iw: u32) -> u32     { get_bits(iw, 7, 5) }
fn get_funct3(iw: u32) -> u32 { get_bits(iw, 12, 3) }
fn get_rs1(iw: u32) -> u32    { get_bits(iw, 15, 5) }
fn get_rs2(iw: u32) -> u32    { get_bits(iw, 20, 5) }
fn get_funct7(iw: u32) -> u32 { get_bits(iw, 25, 7) }

Example: Decoding 0x00B50533

let iw: u32 = 0x00B50533;

let opcode = get_opcode(iw);  // 0110011 = 51 = R-type
let rd     = get_rd(iw);      // 01010 = 10 = a0
let funct3 = get_funct3(iw);  // 000 = 0 = ADD
let rs1    = get_rs1(iw);     // 01010 = 10 = a0
let rs2    = get_rs2(iw);     // 01011 = 11 = a1
let funct7 = get_funct7(iw);  // 0000000 = 0 = ADD (not SUB)

// Result: add a0, a0, a1

Inline Bit Extraction

You can also extract fields directly without a helper function using shift and mask:

let iw: u32 = 0x00B50533;

let opcode = iw & 0b1111111;            // bits [6:0]
let rd     = (iw >> 7)  & 0b11111;      // bits [11:7]
let funct3 = (iw >> 12) & 0b111;        // bits [14:12]
let rs1    = (iw >> 15) & 0b11111;      // bits [19:15]
let rs2    = (iw >> 20) & 0b11111;      // bits [24:20]
let funct7 = (iw >> 25) & 0b1111111;    // bits [31:25]

I-Type Instructions and Sign Extension

I-Type Format

I-type instructions encode an immediate value (constant) within the instruction:

┌──────────────┬─────┬───────┬─────┬─────────┐
│  imm[11:0]   │ rs1 │ funct3│  rd │ opcode  │
├──────────────┼─────┼───────┼─────┼─────────┤
│   31...20    │19.15│ 14..12│11..7│  6...0  │
│   12 bits    │5 bit│ 3 bits│5 bit│ 7 bits  │
└──────────────┴─────┴───────┴─────┴─────────┘

Examples: addi, lw, lb, jalr

The 12-bit immediate is sign-extended to the full register width (32 or 64 bits) before use.

Decoding Example: addi t0, t1, -33

Instruction: addi t0, t1, -33

The value -33 in 12-bit two's complement:

-33 = 0b111111011111 (12 bits)

Assembled fields:

imm[11:0]     rs1    funct3  rd     opcode
111111011111  00110  000     00101  0010011

  • opcode = 0010011 (I-type arithmetic)
  • rd = 00101 = 5 (t0 = x5)
  • funct3 = 000 (ADDI)
  • rs1 = 00110 = 6 (t1 = x6)
  • imm = 111111011111 = -33 (sign-extended)
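Concatenating the fields above gives the word 0xFDF30293 (worked out here from the listed bits, not taken from assembler output). The sketch below decodes that word; the struct and function names are assumptions for illustration. Because the immediate occupies the top 12 bits, reinterpreting the word as `i32` and shifting right by 20 extracts and sign-extends it in one step.

```rust
/// Decoded fields of an I-type instruction (hypothetical struct for this example).
struct IType {
    imm: i32,
    rs1: u32,
    funct3: u32,
    rd: u32,
    opcode: u32,
}

fn decode_i_type(iw: u32) -> IType {
    IType {
        // Arithmetic shift on a signed value copies bit 31 into the vacated bits.
        imm: (iw as i32) >> 20,
        rs1: (iw >> 15) & 0x1F,
        funct3: (iw >> 12) & 0x7,
        rd: (iw >> 7) & 0x1F,
        opcode: iw & 0x7F,
    }
}

fn main() {
    let f = decode_i_type(0xFDF30293); // addi t0, t1, -33
    println!(
        "opcode={:07b} rd=x{} funct3={} rs1=x{} imm={}",
        f.opcode, f.rd, f.funct3, f.rs1, f.imm
    );
}
```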

Sign Extension in Rust

When extracting an immediate value from an instruction, the 12-bit value must be sign-extended to preserve negative numbers. The technique uses arithmetic shift:

fn signext() {
    let immu: u64 = 0b111111111101;  // 12-bit immediate, not yet sign-extended
    let start: u32 = 11;             // sign bit position (bit 11 of 12-bit value)
    let distance: u32 = 63 - start;  // 52: moves bit 11 up to bit 63 (64 - start would shift it out)

    // Shift left to put sign bit at MSB, then arithmetic shift right
    let imm: i64 = ((immu as i64) << distance) >> distance;

    println!("imm = {}", imm);  // prints: imm = -3
}

How it works:

  1. Cast the unsigned value to a signed type (i64)
  2. Shift left so the sign bit (bit 11) moves to the most significant position
  3. Arithmetic right shift (>> on signed types in Rust) fills with the sign bit
  4. The result is the correctly sign-extended value
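The same steps generalize to any field width. The sketch below takes the width as a parameter; `sign_extend` is a name chosen for this example. The boundary values -2048 and 1024 are worth testing because they are the cases where an off-by-one shift distance gives a wrong answer.

```rust
/// Sign-extend the low `bits` bits of `value` to a full i64.
/// Assumes 1 <= bits <= 64. (Hypothetical helper.)
fn sign_extend(value: u64, bits: u32) -> i64 {
    let shift = 64 - bits; // moves the field's sign bit (bit `bits - 1`) up to bit 63
    // Left shift discards anything above the field; the arithmetic right
    // shift on i64 then replicates the sign bit back down.
    ((value << shift) as i64) >> shift
}

fn main() {
    for raw in [0b111111111101u64, 0b111111011111, 0b100000000000, 0b010000000000] {
        println!("{:012b} -> {}", raw, sign_extend(raw, 12));
    }
}
```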

RISC-V Instruction Formats Overview

RISC-V uses six core instruction formats. Each has the opcode in bits [6:0]:

R-type:  | funct7 | rs2 | rs1 | funct3 | rd | opcode |  Register ops
I-type:  |    imm[11:0]  | rs1 | funct3 | rd | opcode |  Immediate ops, loads
S-type:  | imm[11:5]| rs2| rs1 | funct3 |imm[4:0]|opcode|  Stores
B-type:  | imm bits | rs2| rs1 | funct3 |imm bits|opcode|  Branches
U-type:  |        imm[31:12]        | rd | opcode |  Upper immediate
J-type:  |        imm bits          | rd | opcode |  Jumps (JAL)
graph TD
    subgraph "Instruction Formats"
        R["R-type<br>opcode=0110011<br>Register operations"]
        I["I-type<br>opcode varies<br>Immediate ops, loads"]
        S["S-type<br>opcode=0100011<br>Stores"]
        B["B-type<br>opcode=1100011<br>Conditional branches"]
        U["U-type<br>opcode=0110111/0010111<br>LUI, AUIPC"]
        J["J-type<br>opcode=1101111<br>JAL"]
    end

Design principle: The opcode is always in bits [6:0], and the register fields (rd, rs1, rs2) occupy the same bit positions in every format that includes them, which simplifies hardware decoding.
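Since the opcode sits in the same place in every format, a decoder can pick the format before looking at any other field. The sketch below uses the opcodes from the diagram above. For I-type, whose opcode varies, it assumes three common RV32I values: 0010011 (immediate arithmetic), 0000011 (loads), and 1100111 (jalr). The enum and function names are made up for illustration, and opcodes outside this list return `None`.

```rust
/// The six core formats (hypothetical enum for this example).
#[derive(Debug, PartialEq)]
enum Format {
    R,
    I,
    S,
    B,
    U,
    J,
}

/// Classify an instruction word by its opcode, bits [6:0].
fn format_of(iw: u32) -> Option<Format> {
    match iw & 0x7F {
        0b0110011 => Some(Format::R),
        0b0010011 | 0b0000011 | 0b1100111 => Some(Format::I), // arith-imm, loads, jalr
        0b0100011 => Some(Format::S),
        0b1100011 => Some(Format::B),
        0b0110111 | 0b0010111 => Some(Format::U), // lui, auipc
        0b1101111 => Some(Format::J),             // jal
        _ => None, // opcode not covered by this sketch
    }
}

fn main() {
    println!("{:?}", format_of(0x00B50533)); // add a0, a0, a1
    println!("{:?}", format_of(0xFDF30293)); // addi t0, t1, -33
}
```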


In-Class Example: Runtime Instruction Decoding

This example demonstrates reading machine code at runtime from an assembly function and decoding it in Rust.

Assembly Function (add2_s.s)

.global add2_s

add2_s:
    add a0, a0, a1
    addi t0, t1, -33
    ret

Rust Program (main.rs)

unsafe extern "C" {
    fn add2_s(a0: i32, a1: i32) -> i32;
}

fn decode() {
    let r = unsafe { add2_s(3, 4) };
    println!("add2_s(3, 4) = {}", r);

    let pc = add2_s as *const u32;

    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);

    // Decode R-type fields
    let opcode = iw & 0b1111111;
    let funct3 = (iw >> 12) & 0b111;
    let funct7 = (iw >> 25) & 0b1111111;
    let rd     = (iw >> 7)  & 0b11111;
    let rs1    = (iw >> 15) & 0b11111;
    let rs2    = (iw >> 20) & 0b11111;

    println!("opcode  = {}", opcode);
    println!("funct3  = {}", funct3);
    println!("funct7  = {}", funct7);
    println!("rd      = {}", rd);
    println!("rs1     = {}", rs1);
    println!("rs2     = {}", rs2);

    // Advance to next instruction
    let pc = unsafe { pc.add(1) };
    let iw = unsafe { *pc };
    println!("[pc = {:p}] iw = {:X}", pc, iw);
}

fn signext() {
    let immu: u64 = 0b111111111101;
    let start: u32 = 11;
    let distance: u32 = 63 - start;  // 52: moves bit 11 up to bit 63

    let imm: i64 = ((immu as i64) << distance) >> distance;

    println!("imm = {}", imm);
}

fn main() {
    println!("== decode ==");
    decode();

    println!();

    println!("== signext ==");
    signext();
}

Key points:

  • The function pointer add2_s as *const u32 gives us the address of the first instruction
  • Reading *pc loads the 32-bit instruction word from memory
  • pc.add(1) advances by 4 bytes (one 32-bit instruction) to the next instruction
  • We can verify our manual decoding matches the extracted field values
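The program above only runs on a RISC-V target linked against add2_s.s. To try the same decoding walk on any machine, the instruction words can be placed in an array instead of being read through a function pointer. In the sketch below the three words are hand-assembled, on the assumption that the assembler emits uncompressed 32-bit encodings; `describe` is a helper name invented for this example.

```rust
/// Summarize one instruction word. R-type words report rs2; anything else is
/// treated as I-type here, which is only valid for the words in this example.
fn describe(iw: u32) -> String {
    let opcode = iw & 0x7F;
    let rd = (iw >> 7) & 0x1F;
    let rs1 = (iw >> 15) & 0x1F;
    if opcode == 0b0110011 {
        let rs2 = (iw >> 20) & 0x1F;
        format!("R-type rd=x{} rs1=x{} rs2=x{}", rd, rs1, rs2)
    } else {
        let imm = (iw as i32) >> 20; // sign-extended imm[11:0]
        format!("opcode={:07b} rd=x{} rs1=x{} imm={}", opcode, rd, rs1, imm)
    }
}

fn main() {
    // add a0, a0, a1 ; addi t0, t1, -33 ; ret (jalr x0, 0(ra))
    let code: [u32; 3] = [0x00B50533, 0xFDF30293, 0x00008067];
    for (i, iw) in code.iter().enumerate() {
        // i * 4 plays the role of the byte offset that pc.add(1) advances by.
        println!("[+{}] iw = {:08X}  {}", i * 4, iw, describe(*iw));
    }
}
```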


Key Concepts

| Concept | Definition |
|---------|------------|
| Machine Code | Binary representation of instructions executed by the CPU |
| Instruction Word (iw) | The 32-bit binary value encoding one instruction |
| Opcode | 7-bit field identifying the instruction format |
| funct3 | 3-bit field selecting operation within a format |
| funct7 | 7-bit field providing additional operation distinction |
| rd | Destination register (5 bits) |
| rs1, rs2 | Source registers (5 bits each) |
| R-type | Format for register-to-register operations |
| I-type | Format for immediate (constant) operations and loads |
| Sign Extension | Extending a smaller signed value to a wider type preserving the sign |

Practice Problems

Problem 1: Decode Machine Code

Decode the instruction 0x40A60633. What assembly instruction does it represent?

Solution

Step 1: Convert to binary

0x40A60633 = 0100 0000 1010 0110 0000 0110 0011 0011

Step 2: Extract fields

funct7:  0100000 = 32 (bit 30 = 1, indicates SUB)
rs2:     01010 = 10 (a0)
rs1:     01100 = 12 (a2)
funct3:  000 = 0
rd:      01100 = 12 (a2)
opcode:  0110011 = R-type

Step 3: Determine instruction

  • opcode = R-type
  • funct3 = 0, funct7 bit 30 = 1: SUB

Answer: sub a2, a2, a0

Problem 2: Encode Instruction

Encode and a5, a6, a7 as a 32-bit hexadecimal machine code.

Solution

Step 1: Look up encoding

  • AND: funct7 = 0000000, funct3 = 111, opcode = 0110011
  • a5 = x15 = 01111
  • a6 = x16 = 10000
  • a7 = x17 = 10001

Step 2: Assemble bits

funct7   rs2    rs1   funct3  rd     opcode
0000000  10001  10000  111    01111  0110011

Step 3: Group and convert

Binary: 0000 0001 0001 1000 0111 0111 1011 0011
Hex:    0x011877B3

Answer: 0x011877B3

Problem 3: What Does Bit 30 Tell Us?

Given an R-type instruction with funct3 = 000, how do you determine if it's ADD, SUB, or MUL?

Solution

Check bits 30 and 25 of the instruction word (both lie within the funct7 field):

| Bit 30 | Bit 25 | Instruction |
|--------|--------|-------------|
| 0 | 0 | ADD |
| 1 | 0 | SUB |
| 0 | 1 | MUL |

Code to determine:

let bit_30 = (iw >> 30) & 1;
let bit_25 = (iw >> 25) & 1;

if bit_25 == 1 {
    // MUL (M extension)
} else if bit_30 == 1 {
    // SUB (bit 30 set)
} else {
    // ADD
}

Problem 4: Register Encoding

If an instruction has rd = 01010, rs1 = 01011, rs2 = 01100, what are the ABI register names?

Solution

| Field | Binary | Decimal | ABI Name |
|-------|--------|---------|----------|
| rd | 01010 | 10 | a0 |
| rs1 | 01011 | 11 | a1 |
| rs2 | 01100 | 12 | a2 |

Register mapping reference:

  • x0 = zero
  • x1 = ra
  • x2 = sp
  • x10–x17 = a0–a7 (arguments/return values)
  • x8–x9, x18–x27 = s0–s11 (saved registers)
  • x5–x7, x28–x31 = t0–t6 (temporaries)
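A register number can be turned into its ABI name with a 32-entry lookup table. The sketch below follows the mapping listed above and fills in x3 = gp and x4 = tp from the standard RISC-V calling convention, which the list omits; `ABI_NAMES` and `abi_name` are names chosen for this example.

```rust
/// ABI names indexed by register number x0..x31 (hypothetical table).
const ABI_NAMES: [&str; 32] = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", // x0-x7
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5", // x8-x15
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7", // x16-x23
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6", // x24-x31
];

/// Look up the ABI name for a 5-bit register field.
fn abi_name(reg: u32) -> &'static str {
    ABI_NAMES[(reg & 0x1F) as usize] // mask keeps the index in 0..=31
}

fn main() {
    for (field, bits) in [("rd", 0b01010u32), ("rs1", 0b01011), ("rs2", 0b01100)] {
        println!("{} = {:05b} = x{} = {}", field, bits, bits, abi_name(bits));
    }
}
```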

Problem 5: Field Extraction

Write a Rust expression to extract funct3 from an instruction word iw.

Solution

Method 1: Shift and mask

let funct3 = (iw >> 12) & 0x7;  // 0x7 = 0b111 (3 bits)

Method 2: Using get_bits function

let funct3 = get_bits(iw, 12, 3);  // start at bit 12, 3 bits

Explanation:

  • funct3 is at bits [14:12]
  • Shift right by 12 to move those bits to position [2:0]
  • AND with 0x7 (binary 111) to keep only 3 bits

Further Reading

  1. RISC-V Specification, Volume 1: Unprivileged ISA, Chapter 2 (RV32I Base Integer Instruction Set): https://riscv.org/specifications/
  2. RISC-V Green Card: quick reference for instruction encodings
  3. Computer Organization and Design: RISC-V Edition (Patterson & Hennessy), Chapter 2
  4. GNU Assembler Manual: RISC-V specific options and directives

Summary

  1. Machine code is the binary encoding of assembly instructions. In the base instruction sets (RV32I and RV64I), every instruction is a fixed 32-bit word.

  2. R-type format encodes register-to-register operations with fields: opcode (7 bits), rd (5 bits), funct3 (3 bits), rs1 (5 bits), rs2 (5 bits), and funct7 (7 bits).

  3. Decoding machine code requires extracting fields using bit shifting and masking. The opcode identifies the format; funct3 and funct7 identify the specific operation.

  4. Bits 30 and 25 of the instruction word, both inside funct7, distinguish operations that share a funct3 value (e.g., ADD, SUB, and MUL all have funct3 = 000).

  5. I-type instructions encode a 12-bit immediate that is sign-extended to the full register width.

  6. Six instruction formats (R, I, S, B, U, J) cover all RISC-V operations, with the opcode always in bits [6:0].