
RISC-V Assembly: Strings and Bit Manipulation

Overview

This lecture covers essential RISC-V assembly programming techniques for working with strings and bit manipulation. We explore endianness and byte ordering, byte-level memory operations using lb and sb instructions, string processing patterns (length, copy), and bitwise operations including shifts, masks, and bit sequence extraction. Most concepts are demonstrated with RISC-V assembly functions called from Rust driver programs.

Learning Objectives

  • Explain endianness and how RISC-V stores multi-byte values in memory
  • Use byte load/store instructions (lb, lbu, sb) for character and byte manipulation
  • Implement string iteration patterns in assembly with proper null termination
  • Apply bitwise operations (AND, OR, XOR, NOT) and shifts in RISC-V assembly
  • Extract bit sequences using the shift-and-mask technique
  • Sign-extend extracted bit sequences using arithmetic shifts

Prerequisites

  • RISC-V Assembly 1 & 2 (Lectures 05–06): registers, instructions, memory operations, control flow, functions, calling convention
  • Stack frame management (prologue/epilogue pattern)
  • Binary representation and two's complement

1. Endianness

Big Endian vs Little Endian

When storing multi-byte values, the order of bytes in memory depends on the system's endianness.

Consider the 32-bit integer 0x22AA33CC:

| Address | Big Endian | Little Endian |
|---|---|---|
| 0 | 0x22 (MSB) | 0xCC (LSB) |
| 1 | 0xAA | 0x33 |
| 2 | 0x33 | 0xAA |
| 3 | 0xCC (LSB) | 0x22 (MSB) |

Big endian: most significant byte (MSB) at the lowest address — bytes are stored in the order you'd write the number.

Little endian: least significant byte (LSB) at the lowest address — bytes are stored in reverse order.

RISC-V is little-endian: The least significant byte is stored at the lowest address.
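The table can be reproduced with Rust's standard to_le_bytes and to_be_bytes methods, which return a value's bytes in little- or big-endian order (index 0 corresponds to the lowest address). The helper names below are our own; this is a sketch for exploring layouts, not part of the course code.

```rust
/// Bytes of `x` in little-endian order (RISC-V's layout): index 0 is the lowest address.
fn little_endian_bytes(x: u32) -> [u8; 4] {
    x.to_le_bytes()
}

/// Bytes of `x` in big-endian order: index 0 holds the most significant byte.
fn big_endian_bytes(x: u32) -> [u8; 4] {
    x.to_be_bytes()
}

/// Print an address-by-address comparison in the same shape as the table.
fn print_layout(x: u32) {
    let (be, le) = (big_endian_bytes(x), little_endian_bytes(x));
    println!("Address  Big Endian  Little Endian");
    for addr in 0..4 {
        println!("{:<8} 0x{:02X}        0x{:02X}", addr, be[addr], le[addr]);
    }
}
```

Calling print_layout(0x22AA33CC) prints the same four rows as the table.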

Detecting Endianness

We can detect endianness by storing a multi-byte value and reading its first byte:

Rust (src/bin/endian.rs):

fn main() {
    let x: i32 = 1;
    let ptr = &x as *const i32 as *const u8;
    let first_byte = unsafe { *ptr };

    if first_byte == 1 {
        println!("Little-endian");
    } else {
        println!("Big-endian");
    }
}

The integer 1 is 0x00000001. On a little-endian system, the byte at the lowest address is 0x01, so first_byte == 1. On a big-endian system, the lowest address would hold 0x00.
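The same check can be done without raw pointers or unsafe. A minimal sketch (function names are hypothetical): to_ne_bytes returns the value's bytes in the machine's native order, and the cfg! macro reports the target's endianness at compile time, so the two should always agree.

```rust
/// Runtime check: the first native-order byte of 1 is 1 only on a little-endian machine.
fn is_little_endian() -> bool {
    1i32.to_ne_bytes()[0] == 1
}

/// Compile-time answer taken from the target description.
fn target_is_little_endian() -> bool {
    cfg!(target_endian = "little")
}
```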

Byte Access with lb

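When loading individual bytes from a multi-byte value, endianness determines which byte each offset returns. Because lb sign-extends, any byte of 0x80 or above comes back as a negative 64-bit value:

# If 0x22AA33CC is stored at the address in a0 (little-endian):
lw t0, 0(a0)      # t0 = 0x22AA33CC (full word; bit 31 is 0, so sign extension leaves it unchanged)
lb t1, 0(a0)      # byte 0 = 0xCC (least significant); sign-extended: t1 = 0xFFFFFFFFFFFFFFCC = -52
lb t2, 1(a0)      # byte 1 = 0x33; t2 = 0x33
lb t3, 2(a0)      # byte 2 = 0xAA; sign-extended: t3 = 0xFFFFFFFFFFFFFFAA = -86
lb t4, 3(a0)      # byte 3 = 0x22 (most significant); t4 = 0x22
lbu t5, 0(a0)     # zero-extended: t5 = 0xCC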

2. Byte Load/Store Instructions

Instruction Summary

| Instruction | Description | Sign Extension |
|---|---|---|
| lb rd, offset(rs1) | Load byte, sign-extend to 64 bits | Yes (bit 7 → bits 8–63) |
| lbu rd, offset(rs1) | Load byte unsigned, zero-extend to 64 bits | No (zeros in bits 8–63) |
| sb rs2, offset(rs1) | Store the low byte of rs2 to address rs1 + offset | N/A |

Sign extension: lb copies bit 7 of the loaded byte into bits 8–63. For ASCII characters (0–127), this doesn't matter. For values ≥ 128, lb produces a negative number while lbu produces a positive one.
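To see the difference numerically, the two loads can be modeled in Rust: casting a byte through i8 before widening reproduces lb's sign extension, while widening directly from u8 reproduces lbu. The function names lb and lbu here are hypothetical models, not real instructions being called. Applied to the in-memory bytes of 0x22AA33CC, lb at offset 0 yields -52 (0xFFFFFFFFFFFFFFCC) while lbu yields 0xCC.

```rust
/// Model of `lb`: read one byte and sign-extend it to 64 bits (bit 7 copied into bits 8-63).
fn lb(mem: &[u8], addr: usize) -> i64 {
    mem[addr] as i8 as i64
}

/// Model of `lbu`: read one byte and zero-extend it to 64 bits.
fn lbu(mem: &[u8], addr: usize) -> u64 {
    mem[addr] as u64
}
```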

Example: Pointer Dereference — intdr_s.s / intdr.rs

A simple example showing how assembly can dereference a pointer and modify the value:

Rust (src/bin/intdr.rs):

extern "C" {
    fn intdr_s(p: *mut i32);
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let val: i32 = if args.len() > 1 {
        args[1].parse().unwrap()
    } else {
        5
    };

    // Rust version
    let mut x = val;
    x += 1;
    println!("Rust: {}", x);

    // Asm version
    let mut x = val;
    unsafe { intdr_s(&mut x as *mut i32) };
    println!("Asm: {}", x);
}

Assembly (asm/intdr_s.s):

.global intdr_s

.text

# a0 = pointer to int
# increment the value at the pointer
intdr_s:
    lw t0, 0(a0)        # load value
    addi t0, t0, 1      # increment
    sw t0, 0(a0)        # store back
    ret

The assembly receives a pointer in a0, uses lw to load the 32-bit value, increments it, and uses sw to store it back. The Rust caller sees the modified value because both sides share the same memory location.
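For comparison, the same load/increment/store can be written in Rust against a raw pointer. This is a sketch with a hypothetical name, intdr_model. It uses wrapping_add because the assembly's addi/sw pair silently wraps at i32::MAX, whereas the `x += 1` in intdr.rs panics on overflow in a debug build.

```rust
/// Hypothetical Rust mirror of intdr_s: load through the pointer, add one, store back.
///
/// # Safety
/// `p` must be a valid, aligned pointer to an initialized i32.
unsafe fn intdr_model(p: *mut i32) {
    let v = unsafe { p.read() };              // lw t0, 0(a0)
    unsafe { p.write(v.wrapping_add(1)) };    // addi t0, t0, 1 ; sw t0, 0(a0)
}
```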


3. String Operations

C Strings in Memory

Strings in C (and C strings passed across the FFI) are null-terminated arrays of bytes. Each ASCII character occupies one byte (a non-ASCII character takes two to four bytes in UTF-8), and the string ends with a 0x00 byte.

In assembly, we process strings character by character using lb (load byte) and sb (store byte), advancing a pointer until we hit the null terminator.

The Rust FFI Pattern

All our string examples follow the same Rust FFI pattern:

  • extern "C" { ... } — declares the assembly function
  • CString::new(s) — creates a null-terminated C string from a Rust string; it returns an error if s contains an interior NUL byte, which is why the examples call .unwrap()
  • unsafe { ... } — required for calling foreign functions
  • .as_ptr() — gets a raw pointer to the string data
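A quick way to see exactly what the assembly receives is to inspect the CString's bytes. The helper below (c_bytes, a hypothetical name) is a sketch built on the standard CString::new and as_bytes_with_nul. It also shows that a non-ASCII character such as é occupies two bytes, so a byte-counting loop like strlen_s counts bytes, not characters.

```rust
use std::ffi::CString;

/// Bytes an assembly routine would see behind `cs.as_ptr()`:
/// the string's UTF-8 bytes followed by a single 0x00 terminator.
fn c_bytes(s: &str) -> Vec<u8> {
    CString::new(s)
        .expect("interior NUL byte") // CString::new rejects embedded zeros
        .as_bytes_with_nul()
        .to_vec()
}
```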

String Length — strlen_s.s / strlen.rs

Rust (src/bin/strlen.rs):

use std::ffi::CString;

extern "C" {
    fn strlen_s(s: *const u8) -> usize;
}

fn strlen_c(s: &str) -> usize {
    s.len()
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let s = if args.len() > 1 { args[1].as_str() } else { "hello" };

    let r = strlen_c(s);
    println!("Rust: {}", r);

    let cs = CString::new(s).unwrap();
    let r = unsafe { strlen_s(cs.as_ptr() as *const u8) };
    println!("Asm: {}", r);
}

Assembly (asm/strlen_s.s):

.global strlen_s

.text

# a0 = pointer to string
# return length of string
strlen_s:
    li t0, 0            # length = 0
strlen_s_loop:
    lb t1, 0(a0)        # load byte
    beqz t1, strlen_s_done  # if null terminator, done
    addi t0, t0, 1      # length++
    addi a0, a0, 1      # advance pointer
    j strlen_s_loop
strlen_s_done:
    mv a0, t0           # return length
    ret

Algorithm: Initialize a counter to 0. Load each byte; if it's zero (null terminator), return the counter. Otherwise, increment both the counter and the pointer, and loop.
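The same loop can be written in Rust over a raw pointer, one statement per instruction, which makes it easy to test the algorithm before writing the assembly. This is a sketch; strlen_model is a hypothetical name.

```rust
/// Pointer-walking strlen that mirrors strlen_s step for step.
///
/// # Safety
/// `p` must point to a readable, NUL-terminated byte string.
unsafe fn strlen_model(mut p: *const u8) -> usize {
    let mut len = 0;                  // li t0, 0
    loop {
        let c = unsafe { *p };        // lb t1, 0(a0)
        if c == 0 {                   // beqz t1, strlen_s_done
            return len;               // mv a0, t0 ; ret
        }
        len += 1;                     // addi t0, t0, 1
        p = unsafe { p.add(1) };      // addi a0, a0, 1
    }
}
```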

String Copy — strcpy_s.s / strcpy.rs

Rust (src/bin/strcpy.rs):

use std::ffi::{CStr, CString};

extern "C" {
    fn strcpy_s(dest: *mut u8, src: *const u8) -> *mut u8;
}

fn strcpy_c(dest: &mut [u8], src: &str) {
    let bytes = src.as_bytes();
    dest[..bytes.len()].copy_from_slice(bytes);
    dest[bytes.len()] = 0;
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let s = if args.len() > 1 { args[1].as_str() } else { "hello" };

    // Rust version
    let mut buf = vec![0u8; 256];
    strcpy_c(&mut buf, s);
    let result = CStr::from_bytes_until_nul(&buf).unwrap();
    println!("Rust: {}", result.to_str().unwrap());

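    // Asm version: strcpy_s does no bounds checking, so confirm that the
    // string plus its null terminator fits in the buffer before calling it
    let mut buf = vec![0u8; 256];
    let cs = CString::new(s).unwrap();
    assert!(cs.as_bytes_with_nul().len() <= buf.len(), "input too long for buffer");
    unsafe { strcpy_s(buf.as_mut_ptr(), cs.as_ptr() as *const u8) };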
    let result = CStr::from_bytes_until_nul(&buf).unwrap();
    println!("Asm: {}", result.to_str().unwrap());
}

Assembly (asm/strcpy_s.s):

.global strcpy_s

.text

# a0 = dest, a1 = src
# copy string from src to dest, return dest
strcpy_s:
    mv t0, a0           # save dest
strcpy_s_loop:
    lb t1, 0(a1)        # load byte from src
    sb t1, 0(a0)        # store byte to dest
    beqz t1, strcpy_s_done  # if null terminator, done
    addi a0, a0, 1      # advance dest
    addi a1, a1, 1      # advance src
    j strcpy_s_loop
strcpy_s_done:
    mv a0, t0           # return original dest
    ret

Algorithm: Copy each byte from source to destination, including the null terminator. We copy first, then check — this ensures the null terminator itself gets copied. Return the original destination pointer.
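The copy-then-test ordering is the important detail, and a Rust model makes it explicit: the byte is stored before it is compared with zero, so the terminator lands in dest. This is a sketch with a hypothetical name, strcpy_model.

```rust
/// Copy-then-test loop mirroring strcpy_s; returns the original `dest`.
///
/// # Safety
/// `src` must be NUL-terminated and `dest` must have room for every byte, including the terminator.
unsafe fn strcpy_model(dest: *mut u8, mut src: *const u8) -> *mut u8 {
    let mut d = dest;                 // mv t0, a0 keeps the original dest
    loop {
        let c = unsafe { *src };      // lb t1, 0(a1)
        unsafe { *d = c };            // sb t1, 0(a0): copy BEFORE testing
        if c == 0 {
            return dest;              // the terminator has already been copied
        }
        d = unsafe { d.add(1) };      // addi a0, a0, 1
        src = unsafe { src.add(1) };  // addi a1, a1, 1
    }
}
```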


4. Bitwise Operations

Two's Complement Review

Before working with bitwise operations, recall how signed integers are represented:

Rust (src/bin/twos.rs):

fn main() {
    let vals: Vec<i8> = vec![0, 1, -1, 127, -128, 42, -42];

    for v in vals {
        println!("{:4} = 0x{:02x}", v, v as u8);
    }
}

Output:

   0 = 0x00
   1 = 0x01
  -1 = 0xff
 127 = 0x7f
-128 = 0x80
  42 = 0x2a
 -42 = 0xd6

Key insight: -1 is 0xFF (all ones), -128 is 0x80 (just the sign bit). The MSB (bit 7 for 8-bit values) is the sign bit.
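Two facts follow from this encoding and are worth checking directly: negating a value means flipping every bit and adding 1, and the sign bit is simply bit 7. The sketch below (hypothetical function names) uses wrapping_add so that -128, which has no positive 8-bit counterpart, negates to itself instead of overflowing.

```rust
/// Two's complement negation: flip every bit, then add 1.
fn negate(x: i8) -> i8 {
    (!x).wrapping_add(1)
}

/// The sign bit of an 8-bit value is bit 7.
fn sign_bit(x: i8) -> u8 {
    (x as u8) >> 7
}
```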

Bitwise Operations in Assembly — bits_s.s / bits.rs

Rust (src/bin/bits.rs):

extern "C" {
    fn and_s(a: u32, b: u32) -> u32;
    fn or_s(a: u32, b: u32) -> u32;
    fn xor_s(a: u32, b: u32) -> u32;
    fn not_s(a: u32) -> u32;
    fn sll_w(a: u32, n: u32) -> u32;
    fn srl_w(a: u32, n: u32) -> u32;
    fn sra_w(a: i32, n: u32) -> i32;
}

fn prbin(v: u32, bits: u32) {
    for i in (0..bits).rev() {
        print!("{}", (v >> i) & 1);
    }
    println!();
}

fn main() {
    let a: u8 = 0b11001100;
    let b: u8 = 0b10101010;

    println!("a = 0x{:02x}", a);
    prbin(a as u32, 8);
    println!("b = 0x{:02x}", b);
    prbin(b as u32, 8);
    println!();

    // AND, OR, XOR, NOT demonstrations...
    let r = a & b;
    print!("Rust: a & b = 0x{:02x} ", r);
    prbin(r as u32, 8);
    let r = unsafe { and_s(a as u32, b as u32) } as u8;
    print!("Asm:  a & b = 0x{:02x} ", r);
    prbin(r as u32, 8);
    // ... (similar for OR, XOR, NOT, shifts)
}

Assembly (asm/bits_s.s):

.global and_s
.global or_s
.global xor_s
.global not_s
.global sll_w
.global srl_w
.global sra_w

.text

# a0 = a, a1 = b
# return a & b
and_s:
    and a0, a0, a1
    ret

# a0 = a, a1 = b
# return a | b
or_s:
    or a0, a0, a1
    ret

# a0 = a, a1 = b
# return a ^ b
xor_s:
    xor a0, a0, a1
    ret

# a0 = a
# return ~a
not_s:
    not a0, a0
    ret

# a0 = a, a1 = n
# return a << n
sll_w:
    sllw a0, a0, a1
    ret

# a0 = a, a1 = n
# return a >> n (logical)
srl_w:
    srlw a0, a0, a1
    ret

# a0 = a, a1 = n
# return a >> n (arithmetic)
sra_w:
    sraw a0, a0, a1
    ret

Bitwise Operation Summary

| Operation | Instruction | Description | Example (8-bit) |
|---|---|---|---|
| AND | and rd, rs1, rs2 | 1 only if both bits are 1 | 11001100 & 10101010 = 10001000 |
| OR | or rd, rs1, rs2 | 1 if either bit is 1 | 11001100 \| 10101010 = 11101110 |
| XOR | xor rd, rs1, rs2 | 1 if bits differ | 11001100 ^ 10101010 = 01100110 |
| NOT | not rd, rs (pseudoinstruction for xori rd, rs, -1) | Flip all bits | ~11001100 = 00110011 |
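In practice these operations are usually combined with a one-bit mask (1 << n) to manipulate a single bit. The helpers below are a sketch with hypothetical names; each is one of the table's operations plus a shift to build the mask.

```rust
/// OR with the mask forces bit n to 1.
fn set_bit(x: u32, n: u32) -> u32 { x | (1 << n) }

/// AND with the inverted mask forces bit n to 0.
fn clear_bit(x: u32, n: u32) -> u32 { x & !(1 << n) }

/// XOR with the mask flips bit n.
fn toggle_bit(x: u32, n: u32) -> u32 { x ^ (1 << n) }

/// AND with the mask isolates bit n; nonzero means it was set.
fn test_bit(x: u32, n: u32) -> bool { (x & (1 << n)) != 0 }
```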

Logical vs Arithmetic Shifts

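| Instruction | Operation | Fill Bits | Use Case |
|---|---|---|---|
| sllw rd, rs1, rs2 | Shift left logical (32-bit) | Zeros on right | Multiply by powers of 2 |
| srlw rd, rs1, rs2 | Shift right logical (32-bit) | Zeros on left | Unsigned divide by powers of 2 |
| sraw rd, rs1, rs2 | Shift right arithmetic (32-bit) | Sign bit on left | Signed divide by powers of 2, rounding toward negative infinity (-7 >> 1 = -4, while C's -7 / 2 = -3) |

All three take the shift amount from the low 5 bits of rs2 and sign-extend the 32-bit result into the 64-bit destination register.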

The key difference: srlw fills vacated bits with zeros, while sraw fills them with copies of the sign bit. This matters for negative numbers:

0xF0000000 >> 4 (logical):     0x0F000000  (positive result)
0xF0000000 >> 4 (arithmetic):  0xFF000000  (stays negative)
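Rust exposes both behaviors through the type system: >> on an unsigned integer is a logical shift, and >> on a signed integer is an arithmetic shift. The sketch below (hypothetical names srl32 and sra32) reproduces the two results above by reinterpreting the bits as i32 for the arithmetic case.

```rust
/// Logical right shift: vacated high bits are filled with zeros (like srlw).
fn srl32(x: u32, n: u32) -> u32 {
    x >> n
}

/// Arithmetic right shift: vacated high bits copy the sign bit (like sraw).
/// `as i32` reinterprets the same 32 bits as signed; >> on i32 is arithmetic.
fn sra32(x: u32, n: u32) -> u32 {
    ((x as i32) >> n) as u32
}
```

The same distinction explains why an arithmetic shift is not exactly signed division: -7i32 >> 1 is -4 (rounding toward negative infinity), while -7i32 / 2 is -3 (rounding toward zero).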

5. Bit Sequence Extraction

The Problem

Extract a contiguous range of bits from a value. Given a number, a start bit position, and an end bit position (both inclusive, with bit 0 the least significant), return the bits in that range as an unsigned integer.

The Algorithm

  1. Shift right by start to move the desired bits to position 0
  2. Create a mask with (end - start + 1) ones: (1 << len) - 1
  3. AND with the mask to zero out all other bits
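Step 2 has one edge case: when the range covers every bit (len = 32 for a u32), 1 << len overflows, which is a panic in a Rust debug build. A minimal sketch that handles it, using the standard checked_shl (which returns None when the shift amount is at least the bit width); low_mask and get_bitseq_checked are hypothetical names.

```rust
/// Mask of `len` low ones: (1 << len) - 1, with the full-width case handled explicitly.
fn low_mask(len: u32) -> u32 {
    match 1u32.checked_shl(len) {
        Some(bit) => bit - 1,
        None => u32::MAX, // len >= 32: every bit selected
    }
}

/// The three steps: shift right by start, build the mask, AND.
fn get_bitseq_checked(num: u32, start: u32, end: u32) -> u32 {
    assert!(start <= end && end < 32, "need start <= end <= 31");
    let shifted = num >> start;           // step 1
    let mask = low_mask(end - start + 1); // step 2
    shifted & mask                        // step 3
}
```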

Step-by-Step Worked Example

Extract bits 3–5 from 552:

552 = 0b1000101000

Step 1: Shift right by 3 (start position)
  552 >> 3 = 69 = 0b1000101

Step 2: Create mask for 3 bits (end - start + 1 = 5 - 3 + 1 = 3)
  (1 << 3) - 1 = 8 - 1 = 7 = 0b111

Step 3: AND with mask
  69 & 7 = 0b1000101 & 0b111 = 0b101 = 5

Result: 5

Assembly Implementation — get_bitseq_s.s / get_bitseq.rs

Rust (src/bin/get_bitseq.rs):

extern "C" {
    fn get_bitseq_s(num: u32, start: u32, end: u32) -> u32;
}

fn get_bitseq(num: u32, start: u32, end: u32) -> u32 {
    let shifted = num >> start;
    let mask = (1u32 << (end - start + 1)) - 1;
    shifted & mask
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let (num, start, end) = if args.len() > 3 {
        (
            args[1].parse::<u32>().unwrap(),
            args[2].parse::<u32>().unwrap(),
            args[3].parse::<u32>().unwrap(),
        )
    } else {
        (552, 3, 5)
    };

    let r = get_bitseq(num, start, end);
    println!("Rust: {}", r);

    let r = unsafe { get_bitseq_s(num, start, end) };
    println!("Asm: {}", r);
}

Assembly (asm/get_bitseq_s.s):

.global get_bitseq_s

.text

# a0 = num, a1 = start, a2 = end
# extract bits [start:end] from num
get_bitseq_s:
    srl a0, a0, a1      # shift right by start
    sub t0, a2, a1      # end - start
    addi t0, t0, 1      # end - start + 1
    li t1, 1
    sll t1, t1, t0      # 1 << (end - start + 1)
    addi t1, t1, -1     # mask = (1 << (end - start + 1)) - 1
    and a0, a0, t1      # num & mask
    ret

Note that this in-class version is a leaf function (no call instructions), so it needs no stack frame. Compare with the 2024 lecture version that handles the special case of extracting all 64 bits.

Signed Bit Sequence Extraction

Sometimes we need to treat the extracted bits as a signed value. The technique is:

  1. Call get_bitseq to get the unsigned value
  2. Shift left to put the extracted sign bit at the MSB of a 32-bit word
  3. Arithmetic shift right to propagate the sign bit

C-style pseudocode:

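#include <stdint.h>

uint32_t get_bitseq(uint32_t num, int start, int end);  // unsigned extraction from above

int32_t get_bitseq_signed(uint32_t num, int start, int end) {
    uint32_t val = get_bitseq(num, start, end);
    int len = (end - start) + 1;
    int shift_amt = 32 - len;

    val = val << shift_amt;                             // Shift sign bit to MSB
    // >> on a negative int32_t is implementation-defined in C;
    // GCC and Clang perform an arithmetic (sign-extending) shift
    int32_t val_signed = ((int32_t) val) >> shift_amt;
    return val_signed;
}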

Assembly:

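get_bitseq_signed_s:
    addi sp, sp, -16
    sd ra, 0(sp)

    # Compute the shift amount BEFORE the call: a1, a2 and the t registers
    # are caller-saved, so get_bitseq_s is allowed to overwrite them
    sub t0, a2, a1           # end - start
    addi t0, t0, 1           # len = end - start + 1
    li t1, 32                # Word width
    sub t1, t1, t0           # shift_amt = 32 - len
    sd t1, 8(sp)             # Save shift_amt in the stack frame

    call get_bitseq_s        # Get unsigned sequence in a0

    ld t1, 8(sp)             # Restore shift_amt
    sllw t2, a0, t1          # Shift left to put sign bit at MSB
    sraw a0, t2, t1          # Arithmetic shift right to sign-extend

    ld ra, 0(sp)
    addi sp, sp, 16
    ret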

Example: Extract bits 4–7 from 94117 as signed:

Unsigned result: 10 = 0b1010 (MSB is 1 → negative in 4-bit signed)

shift_amt = 32 - 4 = 28

Step 1: 0b1010 << 28 = 0xA0000000
Step 2: 0xA0000000 >> 28 (arithmetic) = 0xFFFFFFFA = -6

Result: -6
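The whole signed extraction fits in a few lines of Rust and reproduces the worked example. This is a sketch, not the course's driver: the names are hypothetical, the unsigned helper special-cases len = 32 to avoid shift overflow, and `as i32` reinterprets the shifted bits so that the following >> is arithmetic.

```rust
/// Unsigned extraction of bits [start, end] (inclusive).
fn get_bitseq(num: u32, start: u32, end: u32) -> u32 {
    let len = end - start + 1;
    let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
    (num >> start) & mask
}

/// Signed extraction: move the field's top bit to bit 31, then shift back arithmetically.
fn get_bitseq_signed(num: u32, start: u32, end: u32) -> i32 {
    let len = end - start + 1;
    let shift_amt = 32 - len;
    let val = get_bitseq(num, start, end);
    ((val << shift_amt) as i32) >> shift_amt
}
```

With these definitions, get_bitseq_signed(94117, 4, 7) returns -6, and bits 3–5 of 552 (0b101) come back as -3.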

6. Byte Packing and Unpacking

Pack Bytes

Combine four individual bytes into a single 32-bit integer by shifting and OR-ing:

# pack_bytes_s: combine 4 bytes into 32-bit value
# a0 = b3 (MSB), a1 = b2, a2 = b1, a3 = b0 (LSB)

pack_bytes_s:
    mv t0, a0           # val = b3
    slli t0, t0, 8      # val <<= 8
    or t0, t0, a1       # val |= b2
    slli t0, t0, 8      # val <<= 8
    or t0, t0, a2       # val |= b1
    slli t0, t0, 8      # val <<= 8
    or t0, t0, a3       # val |= b0
    mv a0, t0           # return val
    ret

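How it works: Start with the most significant byte, shift left 8 to make room, OR in the next byte. Repeat until all four bytes are packed. This assumes each argument already fits in 8 bits; any higher bits set in an argument would be OR-ed into a neighboring byte.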
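A Rust version of the same shift-and-OR loop can be checked against the standard u32::from_be_bytes, since packing with b3 as the most significant byte is exactly a big-endian interpretation. Taking u8 parameters (an assumption of this sketch; pack_bytes is a hypothetical name) rules out stray high bits by construction.

```rust
/// Shift-and-OR packing, mirroring pack_bytes_s (b3 ends up most significant).
fn pack_bytes(b3: u8, b2: u8, b1: u8, b0: u8) -> u32 {
    let mut val = b3 as u32;
    for b in [b2, b1, b0] {
        val = (val << 8) | b as u32; // make room, then OR in the next byte
    }
    val
}
```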

Unpack Bytes

Extract individual bytes from a 32-bit integer into an array:

# unpack_bytes_s: split value into byte array
# a0 = value
# a1 = array of uint32_t (4 elements)

unpack_bytes_s:
    li t0, 0                # loop index
    li t1, 4                # loop limit
unpack_loop:
    beq t0, t1, unpack_done
    andi t2, a0, 0xFF       # mask off low byte
    sw t2, (a1)             # *arr = byte
    srli a0, a0, 8          # shift to next byte
    addi t0, t0, 1          # i++
    addi a1, a1, 4          # advance array pointer
    j unpack_loop
unpack_done:
    ret

How it works: Use andi with 0xFF to isolate the lowest byte, store it, then shift right by 8 to bring the next byte into position. The loop processes bytes from LSB to MSB.
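Because the loop emits bytes from LSB to MSB, the resulting array has the same order as the value's little-endian memory layout. The sketch below (unpack_bytes is a hypothetical name) follows the same mask-and-shift steps and can be compared with to_le_bytes.

```rust
/// Mask-and-shift unpacking, mirroring unpack_bytes_s: arr[0] gets the least significant byte.
fn unpack_bytes(mut value: u32) -> [u32; 4] {
    let mut arr = [0u32; 4];
    for slot in arr.iter_mut() {
        *slot = value & 0xFF; // andi t2, a0, 0xFF
        value >>= 8;          // srli a0, a0, 8
    }
    arr
}
```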


Key Concepts

| Concept | Description |
|---|---|
| Endianness | Byte ordering in memory; RISC-V is little-endian (LSB at lowest address) |
| lb / lbu | Load byte with sign extension / load byte unsigned (zero extension) |
| sb | Store the low byte of a register to memory |
| Null termination | C strings end with a 0x00 byte; loops check for this to find string end |
| and, or, xor, not | Bitwise operations: AND, OR, XOR, bitwise complement |
| Logical shift (sll, srl) | Shift that fills vacated bits with zeros |
| Arithmetic shift (sra) | Right shift that fills vacated bits with the sign bit |
| Bit mask | Pattern of bits used to extract specific bit positions: (1 << len) - 1 |
| Sign extension | Extending a value to more bits while preserving its signed interpretation |

Practice Problems

Problem 1: Population Count (popcount)

Write RISC-V assembly to count the number of 1-bits in a 32-bit value.

Solution:
.global popcount_s

# a0 = value to count bits in
# Returns count in a0

popcount_s:
    li t0, 0            # count = 0
    li t1, 32           # loop limit
    li t2, 0            # loop index
popcount_loop:
    beq t2, t1, popcount_done
    andi t3, a0, 1      # Check LSB
    add t0, t0, t3      # count += LSB
    srli a0, a0, 1      # Shift right
    addi t2, t2, 1      # i++
    j popcount_loop
popcount_done:
    mv a0, t0           # Return count
    ret
Check LSB with `andi`, add it to the count, shift right to expose the next bit. Repeat 32 times.
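To test popcount_s, a Rust reference model that follows the same LSB-test-and-shift loop can be compared against the standard u32::count_ones. The name popcount is our own; this is a sketch for checking the algorithm, not part of the original solution.

```rust
/// Loop-based popcount mirroring popcount_s: test the LSB, add it, shift right, 32 times.
fn popcount(mut x: u32) -> u32 {
    let mut count = 0;
    for _ in 0..32 {
        count += x & 1; // andi t3, a0, 1 ; add t0, t0, t3
        x >>= 1;        // srli a0, a0, 1
    }
    count
}
```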

Problem 2: Extract Nibble

Extract the nibble (4 bits) at position n from a 32-bit value (n=0 is bits 0–3, n=1 is bits 4–7, etc.).

Solution:
.global get_nibble_s

# a0 = value
# a1 = nibble position (0-7)
# Returns nibble value (0-15)

get_nibble_s:
    slli t0, a1, 2      # t0 = n * 4 (bit position)
    srl t1, a0, t0      # Shift nibble to position 0
    andi a0, t1, 0xF    # Mask to 4 bits
    ret
Multiply the nibble index by 4 to get the bit position, shift right, then mask with `0xF`.

Problem 3: Case Conversion

Write assembly to convert a lowercase ASCII character to uppercase. Return non-lowercase characters unchanged.

Solution:
.global toupper_s

# a0 = character
# Returns uppercase version (or unchanged if not lowercase)

toupper_s:
    li t0, 'a'          # 97
    blt a0, t0, toupper_done  # if ch < 'a', return unchanged
    li t0, 'z'          # 122
    bgt a0, t0, toupper_done  # if ch > 'z', return unchanged
    addi a0, a0, -32    # Convert: 'a' - 'A' = 32
toupper_done:
    ret
ASCII lowercase letters are 97–122, uppercase are 65–90. The difference is exactly 32.
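Since 32 is 0x20, a single bit (bit 5), the subtraction can equivalently be done with a bitwise AND that clears that bit once the range check has passed. The sketch below (hypothetical names) shows both variants and checks them against the standard u8::to_ascii_uppercase over all 256 byte values.

```rust
/// Range check plus subtract 32, mirroring toupper_s.
fn toupper(c: u8) -> u8 {
    if (b'a'..=b'z').contains(&c) { c - 32 } else { c }
}

/// Same conversion done by clearing bit 5 (0x20) of a lowercase letter.
fn toupper_bitwise(c: u8) -> u8 {
    if c.is_ascii_lowercase() { c & !0x20 } else { c }
}
```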

Problem 4: Sign-Extend 8-bit Value

Write assembly to sign-extend an 8-bit value (in the low byte of a0) to a full 64-bit register.

Solution:
.global sign_extend_8_s

# a0 = 8-bit value (in low byte)
# Returns sign-extended 64-bit value

sign_extend_8_s:
    slli a0, a0, 56     # Shift to top byte
    srai a0, a0, 56     # Arithmetic shift right
    ret
Shifting left by 56 puts bit 7 (the sign bit of the 8-bit value) at bit 63. The arithmetic right shift then propagates this sign bit back down through the 56 vacated positions. Alternatively, if the byte is still in memory, loading it with `lb` performs the sign extension automatically.
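The same two shifts work in Rust on a u64 reinterpreted as i64, and the result can be checked against the cast chain `b as i8 as i64`, which sign-extends a byte directly. sign_extend_8 is a hypothetical name for this sketch.

```rust
/// Shift-left-then-arithmetic-shift-right sign extension of the low byte, as in sign_extend_8_s.
fn sign_extend_8(x: u64) -> i64 {
    ((x << 56) as i64) >> 56 // bits above bit 7 are shifted out and ignored
}
```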

Problem 5: Swap Bytes

Write assembly to swap the two low bytes of a 32-bit value. For example, 0x0000AABB becomes 0x0000BBAA.

Solution:
.global swap_low_bytes_s

# a0 = value
# Returns value with low two bytes swapped

swap_low_bytes_s:
    andi t0, a0, 0xFF       # t0 = low byte (byte 0)
    srli t1, a0, 8          # shift right by 8
    andi t1, t1, 0xFF       # t1 = byte 1
    slli t0, t0, 8          # move byte 0 to byte 1 position
    or a0, t0, t1           # combine: byte 0 in position 1, byte 1 in position 0
    ret
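Extract each byte with shift and mask, then recombine in swapped positions. Note that this solution clears bytes 2 and 3, which matches the example because they are zero there; to keep them, also OR in the original value ANDed with 0xFFFF0000.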
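A sketch of the variant that preserves the upper bytes (swap_low_bytes_keep_upper is a hypothetical name): mask the upper half out of the original value and OR it back in alongside the two swapped bytes. For the low 16 bits the result should match the standard u16::swap_bytes.

```rust
/// Swap bytes 0 and 1 of `x`; unlike swap_low_bytes_s, bytes 2 and 3 are kept.
fn swap_low_bytes_keep_upper(x: u32) -> u32 {
    let b0 = x & 0xFF;        // low byte
    let b1 = (x >> 8) & 0xFF; // next byte
    (x & 0xFFFF_0000) | (b0 << 8) | b1
}
```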


Summary

  1. RISC-V is little-endian: The least significant byte is stored at the lowest memory address. This affects how multi-byte values appear when accessed byte-by-byte with lb.

  2. Byte load/store (lb, lbu, sb) are essential for string processing. lb sign-extends, lbu zero-extends. Always use byte operations when working with characters.

  3. String operations follow a pointer-advancing loop pattern: load a byte, check for null terminator, process, advance pointer, repeat. The Rust FFI uses CString and unsafe blocks to pass strings to assembly.

  4. Bitwise operations (AND, OR, XOR, NOT) and shifts map directly to single RISC-V instructions. The critical distinction is between logical shifts (fill with zeros) and arithmetic shifts (fill with sign bit).

  5. Bit sequence extraction uses a three-step process: shift right to position 0, create a mask with (1 << len) - 1, then AND. For signed extraction, use the shift-left-then-arithmetic-shift-right technique.

  6. Byte packing/unpacking combines shifts and OR/AND operations to assemble or disassemble multi-byte values from individual bytes.