RISC-V Assembly: Strings and Bit Manipulation
Overview
This lecture covers essential RISC-V assembly programming techniques for working with strings and bit manipulation. We explore endianness and byte ordering, byte-level memory operations using lb and sb instructions, string processing patterns (length, copy), bitwise operations including shifts, masks, and bit sequence extraction, and byte packing/unpacking. Most concepts are demonstrated with RISC-V assembly functions called from Rust driver programs.
Learning Objectives
- Explain endianness and how RISC-V stores multi-byte values in memory
- Use byte load/store instructions (`lb`, `lbu`, `sb`) for character and byte manipulation
- Implement string iteration patterns in assembly with proper null termination
- Apply bitwise operations (AND, OR, XOR, NOT) and shifts in RISC-V assembly
- Extract bit sequences using the shift-and-mask technique
- Sign-extend extracted bit sequences using arithmetic shifts
Prerequisites
- RISC-V Assembly 1 & 2 (Lectures 05–06): registers, instructions, memory operations, control flow, functions, calling convention
- Stack frame management (prologue/epilogue pattern)
- Binary representation and two's complement
1. Endianness
Big Endian vs Little Endian
When storing multi-byte values, the order of bytes in memory depends on the system's endianness.
Consider the 32-bit integer 0x22AA33CC:
| Address | Big Endian | Little Endian |
|---|---|---|
| 0 | 0x22 (MSB) | 0xCC (LSB) |
| 1 | 0xAA | 0x33 |
| 2 | 0x33 | 0xAA |
| 3 | 0xCC (LSB) | 0x22 (MSB) |
Big endian: most significant byte (MSB) at the lowest address — bytes are stored in the order you'd write the number.
Little endian: least significant byte (LSB) at the lowest address — bytes are stored in reverse order.
RISC-V is little-endian: The least significant byte is stored at the lowest address.
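Rust's standard library exposes both byte orders directly, which makes the table above easy to check. A small sketch, assuming a hypothetical helper `byte_orders`; it relies only on `u32::to_le_bytes` and `u32::to_be_bytes`, with array index 0 standing in for the lowest address:

```rust
/// Returns the bytes of `x` in little-endian and big-endian order.
/// Index 0 of each array plays the role of the lowest address.
fn byte_orders(x: u32) -> ([u8; 4], [u8; 4]) {
    (x.to_le_bytes(), x.to_be_bytes())
}

fn main() {
    let (le, be) = byte_orders(0x22AA33CC);
    println!("little-endian: {:02X?}", le); // LSB (0xCC) at index 0
    println!("big-endian:    {:02X?}", be); // MSB (0x22) at index 0
    // Native order matches little-endian order on targets like RISC-V.
    if cfg!(target_endian = "little") {
        assert_eq!(0x22AA33CCu32.to_ne_bytes(), le);
    }
}
```

On a little-endian target such as RISC-V, `to_ne_bytes` (native order) returns the same array as `to_le_bytes`.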
Detecting Endianness
We can detect endianness by storing a multi-byte value and reading its first byte:
Rust (src/bin/endian.rs):
fn main() {
let x: i32 = 1;
let ptr = &x as *const i32 as *const u8;
let first_byte = unsafe { *ptr };
if first_byte == 1 {
println!("Little-endian");
} else {
println!("Big-endian");
}
}
The integer 1 is 0x00000001. On a little-endian system, the byte at the lowest address is 0x01, so first_byte == 1. On a big-endian system, the lowest address would hold 0x00.
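Byte Access with lb
When loading individual bytes from a multi-byte value, endianness determines which byte you get. Because lb sign-extends, the bytes 0xCC and 0xAA (bit 7 set) arrive as negative 64-bit values; lbu would return 0xCC and 0xAA unchanged:
# If 0x22AA33CC is stored at the address in a0 (little-endian):
lw t0, 0(a0) # t0 = 0x22AA33CC (full word)
lb t1, 0(a0) # t1 = 0xFFFFFFFFFFFFFFCC (byte 0 = 0xCC, least significant, sign-extended)
lb t2, 1(a0) # t2 = 0x33 (byte 1)
lb t3, 2(a0) # t3 = 0xFFFFFFFFFFFFFFAA (byte 2 = 0xAA, sign-extended)
lb t4, 3(a0) # t4 = 0x22 (byte 3, most significant)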
2. Byte Load/Store Instructions
Instruction Summary
| Instruction | Description | Sign Extension |
|---|---|---|
| `lb rd, offset(rs1)` | Load byte, sign-extend to 64 bits | Yes (bit 7 → bits 8–63) |
| `lbu rd, offset(rs1)` | Load byte unsigned, zero-extend to 64 bits | No (zeros in bits 8–63) |
| `sb rs2, offset(rs1)` | Store the low byte of `rs2` to memory at `rs1 + offset` | N/A |
Sign extension: lb copies bit 7 of the loaded byte into bits 8–63. For ASCII characters (0–127), this doesn't matter. For values ≥ 128, lb produces a negative number while lbu produces a positive one.
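The effect of each load on a 64-bit register can be modeled with Rust integer casts. This is a sketch for intuition, not a hardware definition; `lb` and `lbu` below are hypothetical helper functions:

```rust
/// Model of `lb`: reinterpret the byte as i8, then widen to 64 bits (bit 7 is copied upward).
fn lb(byte: u8) -> i64 {
    byte as i8 as i64
}

/// Model of `lbu`: widen the byte to 64 bits with zeros above bit 7.
fn lbu(byte: u8) -> u64 {
    byte as u64
}

fn main() {
    for b in [0x41u8, 0x7F, 0x80, 0xCC, 0xFF] {
        println!(
            "byte 0x{:02X}: lb -> {:>4} (0x{:016X}), lbu -> {:>3}",
            b,
            lb(b),
            lb(b) as u64,
            lbu(b)
        );
    }
}
```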
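Example: Pointer Dereference (intdr_s.s / intdr.rs)
A simple example showing how assembly can dereference a pointer and modify the value it points to. It uses the word-sized lw and sw; lb/lbu and sb use the same offset(register) addressing for single bytes: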
Rust (src/bin/intdr.rs):
extern "C" {
fn intdr_s(p: *mut i32);
}
fn main() {
let args: Vec<String> = std::env::args().collect();
let val: i32 = if args.len() > 1 {
args[1].parse().unwrap()
} else {
5
};
// Rust version
let mut x = val;
x += 1;
println!("Rust: {}", x);
// Asm version
let mut x = val;
unsafe { intdr_s(&mut x as *mut i32) };
println!("Asm: {}", x);
}
Assembly (asm/intdr_s.s):
.global intdr_s
.text
# a0 = pointer to int
# increment the value at the pointer
intdr_s:
lw t0, 0(a0) # load value
addi t0, t0, 1 # increment
sw t0, 0(a0) # store back
ret
The assembly receives a pointer in a0, uses lw to load the 32-bit value, increments it, and uses sw to store it back. The Rust caller sees the modified value because both sides share the same memory location.
3. String Operations
C Strings in Memory
Strings in C (and when passed via FFI) are null-terminated arrays of bytes. Each character occupies one byte, and the string ends with a 0x00 byte.
In assembly, we process strings character by character using lb (load byte) and sb (store byte), advancing a pointer until we hit the null terminator.
The Rust FFI Pattern
All our string examples follow the same Rust FFI pattern:
- `extern "C" { ... }`: declares the assembly function
- `CString::new(s)`: creates a null-terminated C string from a Rust string
- `unsafe { ... }`: required for calling foreign functions
- `.as_ptr()`: gets a raw pointer to the string data
String Length (strlen_s.s / strlen.rs)
Rust (src/bin/strlen.rs):
use std::ffi::CString;
extern "C" {
fn strlen_s(s: *const u8) -> usize;
}
fn strlen_c(s: &str) -> usize {
s.len()
}
fn main() {
let args: Vec<String> = std::env::args().collect();
let s: &str = if args.len() > 1 { &args[1] } else { "hello" }; // annotation lets &String coerce to &str
let r = strlen_c(s);
println!("Rust: {}", r);
let cs = CString::new(s).unwrap();
let r = unsafe { strlen_s(cs.as_ptr() as *const u8) };
println!("Asm: {}", r);
}
Assembly (asm/strlen_s.s):
.global strlen_s
.text
# a0 = pointer to string
# return length of string
strlen_s:
li t0, 0 # length = 0
strlen_s_loop:
lb t1, 0(a0) # load byte
beqz t1, strlen_s_done # if null terminator, done
addi t0, t0, 1 # length++
addi a0, a0, 1 # advance pointer
j strlen_s_loop
strlen_s_done:
mv a0, t0 # return length
ret
Algorithm: Initialize a counter to 0. Load each byte; if it's zero (null terminator), return the counter. Otherwise, increment both the counter and the pointer, and loop.
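The same loop can be written in Rust over a raw pointer. A minimal sketch, where `strlen_model` is a hypothetical name and the input is assumed to be NUL-terminated (just as the assembly assumes):

```rust
use std::ffi::CString;

/// Mirror of strlen_s: walk a raw byte pointer until the NUL terminator.
/// Safety: `p` must point to a NUL-terminated byte string.
unsafe fn strlen_model(mut p: *const u8) -> usize {
    let mut len = 0; // length = 0
    loop {
        let b = unsafe { *p }; // load byte
        if b == 0 {
            break; // null terminator: done
        }
        len += 1; // length++
        p = unsafe { p.add(1) }; // advance pointer
    }
    len
}

fn main() {
    let cs = CString::new("hello").unwrap();
    let n = unsafe { strlen_model(cs.as_ptr() as *const u8) };
    assert_eq!(n, cs.as_bytes().len()); // bytes before the NUL
    println!("{}", n);
}
```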
String Copy (strcpy_s.s / strcpy.rs)
Rust (src/bin/strcpy.rs):
use std::ffi::{CStr, CString};
extern "C" {
fn strcpy_s(dest: *mut u8, src: *const u8) -> *mut u8;
}
fn strcpy_c(dest: &mut [u8], src: &str) {
let bytes = src.as_bytes();
dest[..bytes.len()].copy_from_slice(bytes);
dest[bytes.len()] = 0;
}
fn main() {
let args: Vec<String> = std::env::args().collect();
let s: &str = if args.len() > 1 { &args[1] } else { "hello" }; // annotation lets &String coerce to &str
// Rust version
let mut buf = vec![0u8; 256];
strcpy_c(&mut buf, s);
let result = CStr::from_bytes_until_nul(&buf).unwrap();
println!("Rust: {}", result.to_str().unwrap());
// Asm version
let mut buf = vec![0u8; 256];
let cs = CString::new(s).unwrap();
unsafe { strcpy_s(buf.as_mut_ptr(), cs.as_ptr() as *const u8) };
let result = CStr::from_bytes_until_nul(&buf).unwrap();
println!("Asm: {}", result.to_str().unwrap());
}
Assembly (asm/strcpy_s.s):
.global strcpy_s
.text
# a0 = dest, a1 = src
# copy string from src to dest, return dest
strcpy_s:
mv t0, a0 # save dest
strcpy_s_loop:
lb t1, 0(a1) # load byte from src
sb t1, 0(a0) # store byte to dest
beqz t1, strcpy_s_done # if null terminator, done
addi a0, a0, 1 # advance dest
addi a1, a1, 1 # advance src
j strcpy_s_loop
strcpy_s_done:
mv a0, t0 # return original dest
ret
Algorithm: Copy each byte from source to destination, including the null terminator. We copy first, then check — this ensures the null terminator itself gets copied. Return the original destination pointer.
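A Rust sketch of the copy-then-check order on byte slices. `strcpy_model` is a hypothetical helper; unlike the assembly, it panics on an undersized destination instead of writing past the end:

```rust
/// Mirror of strcpy_s on byte slices: copy each byte, including the NUL,
/// and stop right after copying it. Returns the index where the NUL landed.
/// Panics if `dest` is too small or `src` has no NUL.
fn strcpy_model(dest: &mut [u8], src: &[u8]) -> usize {
    let mut i = 0;
    loop {
        let b = src[i]; // load byte from src
        dest[i] = b; // store byte to dest (copy first...)
        if b == 0 {
            return i; // ...then check, so the terminator itself is copied
        }
        i += 1; // advance both positions
    }
}

fn main() {
    let src = b"hello\0";
    let mut dest = [0xFFu8; 8]; // non-zero fill makes a missing terminator visible
    let nul_at = strcpy_model(&mut dest, src);
    println!("{:?}", &dest[..=nul_at]);
}
```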
4. Bitwise Operations
Two's Complement Review
Before working with bitwise operations, recall how signed integers are represented:
Rust (src/bin/twos.rs):
fn main() {
let vals: Vec<i8> = vec![0, 1, -1, 127, -128, 42, -42];
for v in vals {
println!("{:4} = 0x{:02x}", v, v as u8);
}
}
Output:
   0 = 0x00
   1 = 0x01
  -1 = 0xff
 127 = 0x7f
-128 = 0x80
  42 = 0x2a
 -42 = 0xd6
Key insight: -1 is 0xFF (all ones), -128 is 0x80 (just the sign bit). The MSB (bit 7 for 8-bit values) is the sign bit.
Bitwise Operations in Assembly (bits_s.s / bits.rs)
Rust (src/bin/bits.rs):
extern "C" {
fn and_s(a: u32, b: u32) -> u32;
fn or_s(a: u32, b: u32) -> u32;
fn xor_s(a: u32, b: u32) -> u32;
fn not_s(a: u32) -> u32;
fn sll_w(a: u32, n: u32) -> u32;
fn srl_w(a: u32, n: u32) -> u32;
fn sra_w(a: i32, n: u32) -> i32;
}
fn prbin(v: u32, bits: u32) {
for i in (0..bits).rev() {
print!("{}", (v >> i) & 1);
}
println!();
}
fn main() {
let a: u8 = 0b11001100;
let b: u8 = 0b10101010;
println!("a = 0x{:02x}", a);
prbin(a as u32, 8);
println!("b = 0x{:02x}", b);
prbin(b as u32, 8);
println!();
// AND, OR, XOR, NOT demonstrations...
let r = a & b;
print!("Rust: a & b = 0x{:02x} ", r);
prbin(r as u32, 8);
let r = unsafe { and_s(a as u32, b as u32) } as u8;
print!("Asm: a & b = 0x{:02x} ", r);
prbin(r as u32, 8);
// ... (similar for OR, XOR, NOT, shifts)
}
Assembly (asm/bits_s.s):
.global and_s
.global or_s
.global xor_s
.global not_s
.global sll_w
.global srl_w
.global sra_w
.text
# a0 = a, a1 = b
# return a & b
and_s:
and a0, a0, a1
ret
# a0 = a, a1 = b
# return a | b
or_s:
or a0, a0, a1
ret
# a0 = a, a1 = b
# return a ^ b
xor_s:
xor a0, a0, a1
ret
# a0 = a
# return ~a
not_s:
not a0, a0
ret
# a0 = a, a1 = n
# return a << n
sll_w:
sllw a0, a0, a1
ret
# a0 = a, a1 = n
# return a >> n (logical)
srl_w:
srlw a0, a0, a1
ret
# a0 = a, a1 = n
# return a >> n (arithmetic)
sra_w:
sraw a0, a0, a1
ret
Bitwise Operation Summary
| Operation | Instruction | Description | Example (8-bit) |
|---|---|---|---|
| AND | `and rd, rs1, rs2` | 1 only if both bits are 1 | 11001100 & 10101010 = 10001000 |
| OR | `or rd, rs1, rs2` | 1 if either bit is 1 | 11001100 \| 10101010 = 11101110 |
| XOR | `xor rd, rs1, rs2` | 1 if bits differ | 11001100 ^ 10101010 = 01100110 |
| NOT | `not rd, rs` | Flip all bits (pseudoinstruction for `xori rd, rs, -1`) | ~11001100 = 00110011 |
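The table's example column can be reproduced in Rust. A small sketch with a hypothetical helper `bitwise_rows`; note that Rust spells bitwise NOT as `!` on integers rather than `~`:

```rust
/// Returns (a & b, a | b, a ^ b, !a), one value per row of the table.
fn bitwise_rows(a: u8, b: u8) -> (u8, u8, u8, u8) {
    (a & b, a | b, a ^ b, !a) // Rust writes bitwise NOT as `!`
}

fn main() {
    let (r_and, r_or, r_xor, r_not) = bitwise_rows(0b1100_1100, 0b1010_1010);
    println!("a & b = {:08b}", r_and);
    println!("a | b = {:08b}", r_or);
    println!("a ^ b = {:08b}", r_xor);
    println!("!a    = {:08b}", r_not);
}
```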
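Logical vs Arithmetic Shifts
| Instruction | Operation | Fill Bits | Use Case |
|---|---|---|---|
| `sllw rd, rs1, rs2` | Shift left logical (32-bit) | Zeros on right | Multiply by powers of 2 |
| `srlw rd, rs1, rs2` | Shift right logical (32-bit) | Zeros on left | Unsigned divide by powers of 2 |
| `sraw rd, rs1, rs2` | Shift right arithmetic (32-bit) | Sign bit on left | Signed divide by powers of 2 (rounds toward negative infinity, unlike C's `/`) |
On RV64, these w-suffixed forms operate on the low 32 bits of rs1, use only the low 5 bits of rs2 as the shift amount, and sign-extend the 32-bit result into the 64-bit rd.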
The key difference: srlw fills vacated bits with zeros, while sraw fills them with copies of the sign bit. This matters for negative numbers:
0xF0000000 >> 4 (logical): 0x0F000000 (positive result)
0xF0000000 >> 4 (arithmetic): 0xFF000000 (stays negative)
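In Rust, the shift kind follows the operand type: `>>` on `u32` is logical and `>>` on `i32` is arithmetic. A short sketch (the helpers `srl32` and `sra32` are hypothetical) that reproduces both results above, plus the rounding difference between an arithmetic shift and signed division:

```rust
/// Logical right shift, as srlw does: zeros enter from the left.
fn srl32(x: u32, n: u32) -> u32 {
    x >> n
}

/// Arithmetic right shift, as sraw does: copies of the sign bit enter from the left.
fn sra32(x: i32, n: u32) -> i32 {
    x >> n
}

fn main() {
    let x: u32 = 0xF000_0000;
    println!("logical:    0x{:08X}", srl32(x, 4));
    println!("arithmetic: 0x{:08X}", sra32(x as i32, 4) as u32);
    // An arithmetic shift rounds toward negative infinity; `/` truncates toward zero.
    println!("-7 >> 1 = {}, -7 / 2 = {}", sra32(-7, 1), -7 / 2);
}
```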
5. Bit Sequence Extraction
The Problem
Extract a contiguous range of bits from a value. Given a number, a start bit position, and an end bit position (both inclusive), return the bits in that range as an unsigned integer.
The Algorithm
- Shift right by `start` to move the desired bits to position 0
- Create a mask of len = end - start + 1 ones: `(1 << len) - 1`
- AND with the mask to zero out all other bits
Step-by-Step Worked Example
Extract bits 3–5 from 552:
552 = 0b1000101000
Step 1: Shift right by 3 (start position)
552 >> 3 = 69 = 0b1000101
Step 2: Create mask for 3 bits (end - start + 1 = 5 - 3 + 1 = 3)
(1 << 3) - 1 = 8 - 1 = 7 = 0b111
Step 3: AND with mask
69 & 7 = 0b1000101 & 0b111 = 0b101 = 5
Result: 5
Assembly Implementation (get_bitseq_s.s / get_bitseq.rs)
Rust (src/bin/get_bitseq.rs):
extern "C" {
fn get_bitseq_s(num: u32, start: u32, end: u32) -> u32;
}
fn get_bitseq(num: u32, start: u32, end: u32) -> u32 {
let shifted = num >> start;
let mask = (1u32 << (end - start + 1)) - 1;
shifted & mask
}
fn main() {
let args: Vec<String> = std::env::args().collect();
let (num, start, end) = if args.len() > 3 {
(
args[1].parse::<u32>().unwrap(),
args[2].parse::<u32>().unwrap(),
args[3].parse::<u32>().unwrap(),
)
} else {
(552, 3, 5)
};
let r = get_bitseq(num, start, end);
println!("Rust: {}", r);
let r = unsafe { get_bitseq_s(num, start, end) };
println!("Asm: {}", r);
}
Assembly (asm/get_bitseq_s.s):
.global get_bitseq_s
.text
# a0 = num, a1 = start, a2 = end
# extract bits [start:end] from num
get_bitseq_s:
srl a0, a0, a1 # shift right by start
sub t0, a2, a1 # end - start
addi t0, t0, 1 # end - start + 1
li t1, 1
sll t1, t1, t0 # 1 << (end - start + 1)
addi t1, t1, -1 # mask = (1 << (end - start + 1)) - 1
and a0, a0, t1 # num & mask
ret
Note that this in-class version is a leaf function (no call instructions), so it needs no stack frame. Compare with the 2024 lecture version that handles the special case of extracting all 64 bits.
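The Rust `get_bitseq` above computes `1u32 << (end - start + 1)`, which overflows when all 32 bits are requested (start = 0, end = 31) and panics in debug builds. One possible way to cover that case, sketched with a hypothetical `get_bitseq_full`, is to build the mask with `u32::checked_shl`, which returns `None` for shift amounts of 32 or more:

```rust
/// Variant of get_bitseq that also accepts len == 32 (start = 0, end = 31).
fn get_bitseq_full(num: u32, start: u32, end: u32) -> u32 {
    assert!(start <= end && end < 32, "need start <= end <= 31");
    let len = end - start + 1;
    let mask = match 1u32.checked_shl(len) {
        Some(bit) => bit - 1, // len < 32: (1 << len) - 1
        None => u32::MAX,     // len == 32: every bit selected
    };
    (num >> start) & mask
}

fn main() {
    println!("{}", get_bitseq_full(552, 3, 5));
    println!("0x{:08X}", get_bitseq_full(0xDEAD_BEEF, 0, 31));
}
```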
Signed Bit Sequence Extraction
Sometimes we need to treat the extracted bits as a signed (two's complement) value. The technique is:
- Call `get_bitseq` to get the unsigned value
- Shift left by 32 - len to put the extracted sign bit at the MSB (bit 31) of a 32-bit word
- Arithmetic shift right by the same amount to propagate the sign bit
C-style pseudocode:
int32_t get_bitseq_signed(int32_t num, int start, int end) {
uint32_t val = get_bitseq(num, start, end);
int len = (end - start) + 1;
int shift_amt = 32 - len;
val = val << shift_amt; // Shift sign bit to MSB
int32_t val_signed = ((int32_t) val) >> shift_amt; // Arithmetic shift right (implementation-defined in C; arithmetic on common compilers)
return val_signed;
}
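Assembly:
.global get_bitseq_signed_s
.text
# a0 = num, a1 = start, a2 = end
# return bits [start:end] of num, sign-extended to a 32-bit signed value
get_bitseq_signed_s:
addi sp, sp, -16
sd ra, 0(sp)
sub t0, a2, a1 # end - start
addi t0, t0, 1 # len = end - start + 1
li t1, 32 # Word width
sub t1, t1, t0 # shift_amt = 32 - len
sd t1, 8(sp) # Save shift_amt: a1, a2, and t registers may be clobbered by the call
call get_bitseq_s # a0, a1, a2 still hold num, start, end; unsigned result in a0
ld t1, 8(sp) # Restore shift_amt
sllw t2, a0, t1 # Shift left to put sign bit at MSB (bit 31)
sraw a0, t2, t1 # Arithmetic shift right to sign-extend
ld ra, 0(sp)
addi sp, sp, 16
ret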
Example: Extract bits 4–7 from 94117 as signed:
Unsigned result: 10 = 0b1010 (MSB is 1 → negative in 4-bit signed)
shift_amt = 32 - 4 = 28
Step 1: 0b1010 << 28 = 0xA0000000
Step 2: 0xA0000000 >> 28 (arithmetic) = 0xFFFFFFFA = -6
Result: -6
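A Rust sketch of the same two-shift technique, assuming `len < 32` for the unsigned helper (function names mirror the pseudocode above but are otherwise illustrative). The cast `as i32` only reinterprets the bits, and `>>` on `i32` is arithmetic, like `sraw`:

```rust
/// Unsigned extraction, as in get_bitseq (assumes end - start + 1 < 32).
fn get_bitseq(num: u32, start: u32, end: u32) -> u32 {
    (num >> start) & ((1u32 << (end - start + 1)) - 1)
}

/// Signed extraction: move the field's top bit to bit 31, then shift back arithmetically.
fn get_bitseq_signed(num: u32, start: u32, end: u32) -> i32 {
    let val = get_bitseq(num, start, end);
    let shift_amt = 32 - (end - start + 1);
    ((val << shift_amt) as i32) >> shift_amt
}

fn main() {
    println!("unsigned: {}", get_bitseq(94117, 4, 7));
    println!("signed:   {}", get_bitseq_signed(94117, 4, 7));
}
```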
6. Byte Packing and Unpacking
Pack Bytes
Combine four individual bytes into a single 32-bit integer by shifting and OR-ing:
# pack_bytes_s: combine 4 bytes into 32-bit value
# a0 = b3 (MSB), a1 = b2, a2 = b1, a3 = b0 (LSB)
pack_bytes_s:
mv t0, a0 # val = b3
slli t0, t0, 8 # val <<= 8
or t0, t0, a1 # val |= b2
slli t0, t0, 8 # val <<= 8
or t0, t0, a2 # val |= b1
slli t0, t0, 8 # val <<= 8
or t0, t0, a3 # val |= b0
mv a0, t0 # return val
ret
How it works: Start with the most significant byte, shift left 8 to make room, OR in the next byte. Repeat until all four bytes are packed.
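The same packing order in Rust, as a hedged sketch with a hypothetical `pack_bytes`. Taking `u8` arguments guarantees each value fits in 8 bits, an assumption the assembly makes implicitly (an argument wider than 8 bits would set bits in neighboring bytes when OR-ed in):

```rust
/// Mirror of pack_bytes_s: start from the MSB, then shift left 8 and OR in each next byte.
fn pack_bytes(b3: u8, b2: u8, b1: u8, b0: u8) -> u32 {
    let mut val = b3 as u32; // val = b3
    for b in [b2, b1, b0] {
        val = (val << 8) | b as u32; // val <<= 8; val |= b
    }
    val
}

fn main() {
    let v = pack_bytes(0x22, 0xAA, 0x33, 0xCC);
    // Same result from the standard library: b3 is the most significant byte.
    assert_eq!(v, u32::from_be_bytes([0x22, 0xAA, 0x33, 0xCC]));
    println!("0x{:08X}", v);
}
```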
Unpack Bytes
Extract individual bytes from a 32-bit integer into an array:
# unpack_bytes_s: split value into byte array
# a0 = value
# a1 = array of uint32_t (4 elements)
unpack_bytes_s:
li t0, 0 # loop index
li t1, 4 # loop limit
unpack_loop:
beq t0, t1, unpack_done
andi t2, a0, 0xFF # mask off low byte
sw t2, (a1) # *arr = byte
srli a0, a0, 8 # shift to next byte
addi t0, t0, 1 # i++
addi a1, a1, 4 # advance array pointer
j unpack_loop
unpack_done:
ret
How it works: Use andi with 0xFF to isolate the lowest byte, store it, then shift right by 8 to bring the next byte into position. The loop processes bytes from LSB to MSB.
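A Rust mirror of the loop, using a hypothetical `unpack_bytes`. Because it emits the least significant byte first, its output lines up with `u32::to_le_bytes`, the in-memory order on a little-endian machine:

```rust
/// Mirror of unpack_bytes_s: mask the low byte, store it, shift right by 8, repeat.
/// arr[0] receives the least significant byte.
fn unpack_bytes(mut value: u32) -> [u32; 4] {
    let mut arr = [0u32; 4];
    for slot in arr.iter_mut() {
        *slot = value & 0xFF; // mask off low byte
        value >>= 8; // shift to next byte
    }
    arr
}

fn main() {
    let arr = unpack_bytes(0x22AA33CC);
    println!("{:02X?}", arr);
}
```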
Key Concepts
| Concept | Description |
|---|---|
| Endianness | Byte ordering in memory; RISC-V is little-endian (LSB at lowest address) |
| `lb` / `lbu` | Load byte with sign extension / load byte unsigned (zero extension) |
| `sb` | Store the low byte of a register to memory |
| Null termination | C strings end with a 0x00 byte; loops check for this to find the string end |
| `and`, `or`, `xor`, `not` | Bitwise operations: AND, OR, XOR, bitwise complement |
| Logical shift (`sll`, `srl`) | Shift that fills vacated bits with zeros |
| Arithmetic shift (`sra`) | Right shift that fills vacated bits with the sign bit |
| Bit mask | Pattern of bits used to extract specific bit positions: `(1 << len) - 1` |
| Sign extension | Extending a value to more bits while preserving its signed interpretation |
Practice Problems
Problem 1: Population Count (popcount)
Write RISC-V assembly to count the number of 1-bits in a 32-bit value.
Solution:
.global popcount_s
# a0 = value to count bits in
# Returns count in a0
popcount_s:
li t0, 0 # count = 0
li t1, 32 # loop limit
li t2, 0 # loop index
popcount_loop:
beq t2, t1, popcount_done
andi t3, a0, 1 # Check LSB
add t0, t0, t3 # count += LSB
srli a0, a0, 1 # Shift right
addi t2, t2, 1 # i++
j popcount_loop
popcount_done:
mv a0, t0 # Return count
ret
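A Rust model of the same loop for checking results; `popcount` is a hypothetical name, and the standard library's `u32::count_ones` serves as the reference:

```rust
/// Mirror of popcount_s: add the LSB 32 times, shifting right after each step.
fn popcount(mut v: u32) -> u32 {
    let mut count = 0;
    for _ in 0..32 {
        count += v & 1; // count += LSB
        v >>= 1; // shift right
    }
    count
}

fn main() {
    for v in [0u32, 1, 0xFF, 0xF0F0_F0F0, u32::MAX] {
        assert_eq!(popcount(v), v.count_ones());
        println!("popcount(0x{:08X}) = {}", v, popcount(v));
    }
}
```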
Problem 2: Extract Nibble
Extract the nibble (4 bits) at position n from a 32-bit value (n=0 is bits 0–3, n=1 is bits 4–7, etc.).
Solution:
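A minimal Rust sketch of the shift-and-mask approach; the helper name `extract_nibble` and its argument order are assumptions. In assembly, `n * 4` is a single `slli` by 2:

```rust
/// Hypothetical helper: nibble n of v (n = 0 is bits 0-3, n = 1 is bits 4-7, ...).
/// Valid for n in 0..8.
fn extract_nibble(v: u32, n: u32) -> u32 {
    (v >> (n * 4)) & 0xF // bit position = n * 4, then keep 4 bits
}

fn main() {
    let v: u32 = 0x1234_5678;
    for n in 0..8 {
        println!("nibble {} = 0x{:X}", n, extract_nibble(v, n));
    }
}
```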
Multiply the nibble index by 4 to get the bit position, shift right, then mask with `0xF`.
Problem 3: Case Conversion
Write assembly to convert a lowercase ASCII character to uppercase. Return non-lowercase characters unchanged.
Solution:
.global toupper_s
# a0 = character
# Returns uppercase version (or unchanged if not lowercase)
toupper_s:
li t0, 'a' # 97
blt a0, t0, toupper_done # if ch < 'a', return unchanged
li t0, 'z' # 122
bgt a0, t0, toupper_done # if ch > 'z', return unchanged
addi a0, a0, -32 # Convert: 'a' - 'A' = 32
toupper_done:
ret
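A Rust mirror of the range check plus subtraction, exhaustively compared against `u8::to_ascii_uppercase`; `toupper` here is a hypothetical helper:

```rust
/// Mirror of toupper_s: only bytes in 'a'..='z' change, by subtracting 32.
fn toupper(ch: u8) -> u8 {
    if ch < b'a' || ch > b'z' {
        ch // not lowercase: return unchanged
    } else {
        ch - 32 // 'a' - 'A' == 32
    }
}

fn main() {
    // Check every byte value against the standard library's ASCII-only uppercase.
    for ch in 0..=255u8 {
        assert_eq!(toupper(ch), ch.to_ascii_uppercase());
    }
    println!("{}", toupper(b'q') as char);
}
```

Since 32 is exactly bit 5, clearing that bit (`ch & !0x20`) after the range check gives the same result as subtracting 32.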
Problem 4: Sign-Extend 8-bit Value
Write assembly to sign-extend an 8-bit value (in the low byte of a0) to a full 64-bit register.
Solution:
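A Rust sketch of the shift-left-56, arithmetic-shift-right-56 technique on a 64-bit value; `sext8` is a hypothetical name, and the cast chain `x as u8 as i8 as i64` serves as a cross-check:

```rust
/// Sign-extend the low 8 bits of `x` to 64 bits, as `slli 56` then `srai 56` would.
fn sext8(x: u64) -> i64 {
    // Shift left 56: bit 7 lands in bit 63 (anything above bit 7 falls off).
    let shifted = (x << 56) as i64;
    // Arithmetic shift right 56: copies of bit 63 fill the upper 56 bits.
    shifted >> 56
}

fn main() {
    for x in [0x05u64, 0x7F, 0x80, 0xFF, 0x1234_5680] {
        assert_eq!(sext8(x), x as u8 as i8 as i64);
        println!("0x{:X} -> {}", x, sext8(x));
    }
}
```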
Shifting left by 56 puts bit 7 (the sign bit of the 8-bit value) at bit 63. The arithmetic right shift by 56 (`srai`) then copies this sign bit back through all 56 upper positions. Alternatively, if the byte is still in memory, loading it with `lb` performs the sign extension automatically.
Problem 5: Swap Bytes
Write assembly to swap the two low bytes of a 32-bit value. For example, 0x0000AABB becomes 0x0000BBAA.
Solution:
.global swap_low_bytes_s
# a0 = value
# Returns value with low two bytes swapped
swap_low_bytes_s:
andi t0, a0, 0xFF # t0 = low byte (byte 0)
srli t1, a0, 8 # shift right by 8
andi t1, t1, 0xFF # t1 = byte 1
slli t0, t0, 8 # move byte 0 to byte 1 position
or a0, t0, t1 # combine: byte 0 in position 1, byte 1 in position 0
ret
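A Rust mirror of the solution, with a hypothetical `swap_low_bytes`. Like the assembly, it drops bits 16 to 31, so the upper half of the result is always zero; `u16::swap_bytes` gives the same answer for the low half:

```rust
/// Mirror of swap_low_bytes_s. Like the assembly, bits 16-31 of the result are zero.
fn swap_low_bytes(v: u32) -> u32 {
    let byte0 = v & 0xFF; // low byte
    let byte1 = (v >> 8) & 0xFF; // byte 1
    (byte0 << 8) | byte1 // byte 0 moves up, byte 1 moves down
}

fn main() {
    let r = swap_low_bytes(0x0000_AABB);
    assert_eq!(r, 0xAABBu16.swap_bytes() as u32);
    println!("0x{:08X}", r);
}
```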
Further Reading
- RISC-V ISA Specification — official standard including byte load/store instructions
- RISC-V Assembly Programmer's Manual — practical reference
- The RISC-V Reader — Patterson & Waterman textbook
- Hacker's Delight (Warren) — bit manipulation techniques
Summary
- RISC-V is little-endian: the least significant byte is stored at the lowest memory address. This affects how multi-byte values appear when accessed byte-by-byte with `lb`.
- Byte load/store instructions (`lb`, `lbu`, `sb`) are essential for string processing. `lb` sign-extends; `lbu` zero-extends. Always use byte operations when working with characters.
- String operations follow a pointer-advancing loop pattern: load a byte, check for the null terminator, process, advance the pointer, repeat. The Rust FFI uses `CString` and `unsafe` blocks to pass strings to assembly.
- Bitwise operations (AND, OR, XOR, NOT) and shifts map directly to single RISC-V instructions. The critical distinction is between logical shifts (fill with zeros) and arithmetic shifts (fill with the sign bit).
- Bit sequence extraction uses a three-step process: shift right to position 0, create a mask with `(1 << len) - 1`, then AND. For signed extraction, use the shift-left-then-arithmetic-shift-right technique.
- Byte packing/unpacking combines shifts with OR/AND operations to assemble or disassemble multi-byte values from individual bytes.