Binary and Bases in C¶

Overview¶

This lecture covers the C-specific implementation of binary representation, bitwise operations, and base conversion functions for NTLang. We examine C's fixed-width integer types, bitwise and shift operators with their precedence rules, and the implementation of string-to-integer and integer-to-string conversion functions. We also cover width masking and signed output formatting.

Learning Objectives¶

Use fixed-width integer types from <stdint.h> for predictable behavior
Apply C's bitwise operators (~, &, |, ^) and understand precedence gotchas
Implement shift operations and understand signed vs unsigned shift behavior
Write string-to-integer conversion with base detection and overflow handling
Implement integer-to-string conversion for decimal, binary, and hexadecimal
Apply width masking and sign extension for NTLang output formatting

Prerequisites¶

Understanding of binary, hex, and two's complement (previous lecture)
C programming with pointers and arrays
Familiarity with the NTLang project structure

1. C Integer Types and `<stdint.h>`¶

Native Integer Types¶

C's native integer types have platform-dependent sizes:

Type	Typical Size	Guaranteed Minimum
`char`	1 byte	1 byte
`short`	2 bytes	2 bytes
`int`	4 bytes	2 bytes
`long`	4 or 8 bytes	4 bytes
`long long`	8 bytes	8 bytes

The size of int and long varies across platforms, which can cause bugs.

Fixed-Width Types from `<stdint.h>`¶

For systems programming, use fixed-width types that guarantee exact sizes:

#include <stdint.h>

uint32_t value = 42;      /* Exactly 32 bits, unsigned */
int32_t  signed_val = -5; /* Exactly 32 bits, signed */
uint8_t  byte = 0xFF;     /* Exactly 8 bits, unsigned */
int8_t   sbyte = -128;    /* Exactly 8 bits, signed */
uint16_t half = 0xFFFF;   /* Exactly 16 bits, unsigned */

Type Summary for NTLang¶

Type	Width	Range	NTLang Use
`uint32_t`	32 bits	0 to 4,294,967,295	Primary value type
`int32_t`	32 bits	-2,147,483,648 to 2,147,483,647	Signed interpretation
`uint8_t`	8 bits	0 to 255	Single byte/character

Using `sizeof`¶

printf("uint32_t is %zu bytes\n", sizeof(uint32_t));  /* Always 4 */
printf("int is %zu bytes\n", sizeof(int));             /* Platform-dependent */

2. Bitwise Operators in C¶

The Four Bitwise Operators¶

Operator	Name	Example	Result
`~`	Bitwise NOT	`~0x0F`	`0xFFFFFFF0`
`&`	Bitwise AND	`0xCA & 0xF0`	`0xC0`
`\\|`	Bitwise OR	`0xCA \\| 0x0F`	`0xCF`
`^`	Bitwise XOR	`0xCA ^ 0xFF`	`0x35`

C Code Examples¶

uint32_t a = 0xCA;      /* 1100 1010 */
uint32_t b = 0xF0;      /* 1111 0000 */

uint32_t not_a   = ~a;       /* 0xFFFFFF35 (inverts all 32 bits) */
uint32_t and_ab  = a & b;    /* 0xC0 */
uint32_t or_ab   = a | b;    /* 0xFA */
uint32_t xor_ab  = a ^ b;    /* 0x3A */

Operator Precedence Gotcha¶

Bitwise operators have lower precedence than comparison operators:

/* BUG: This checks (a) & (0x80 == 0x80), not (a & 0x80) == 0x80 */
if (a & 0x80 == 0x80) { ... }

/* CORRECT: Use parentheses */
if ((a & 0x80) == 0x80) { ... }

Rule: Always parenthesize bitwise operations in conditions.

Precedence (high to low)	Operators
Arithmetic	`+`, `-`, `*`, `/`
Shift	`<<`, `>>`
Comparison	`<`, `>`, `==`, `!=`
Bitwise	`&`, `^`, `\\|`
Logical	`&&`, `\\|\\|`

3. Shift Operators in C¶

Left Shift (`<<`)¶

Shifts bits left, filling with zeros:

uint32_t x = 1;
uint32_t shifted = x << 4;  /* 0x10 = 16 */
/* 0000 0001 → 0001 0000 */

Equivalent to multiplying by \(2^n\).

Right Shift (`>>`)¶

For unsigned types: logical shift (fills with 0):

uint32_t x = 0x80;         /* 1000 0000 */
uint32_t shifted = x >> 4; /* 0x08 = 0000 1000 */

For signed types: the behavior is implementation-defined. Most compilers perform an arithmetic shift (fills with sign bit), but this is not guaranteed by the C standard.

int32_t x = -16;           /* 0xFFFFFFF0 */
int32_t shifted = x >> 2;  /* Usually 0xFFFFFFFC = -4 (ASR) */
/* But technically implementation-defined! */

Casting for Arithmetic Shift¶

To ensure arithmetic shift behavior, cast to signed:

uint32_t value = 0xFFFFFFF0;
int32_t signed_result = ((int32_t)value) >> 2;  /* ASR */

To ensure logical shift, cast to unsigned:

int32_t value = -16;
uint32_t logical_result = ((uint32_t)value) >> 2;  /* LSR */

NTLang Shift Operators¶

NTLang defines three shift operators:

NTLang Op	Token	C Equivalent	Description
`<<`	`TK_LSL`	`<<`	Logical shift left
`>>`	`TK_LSR`	`>>` (unsigned)	Logical shift right
`>-`	`TK_ASR`	`>>` (signed cast)	Arithmetic shift right

Implementation of >- (ASR) in the evaluator:

case OP_ASR:
    result = (uint32_t)(((int32_t)left) >> right);
    break;

4. String-to-Integer in C¶

The `conv_str_to_uint32()` Function¶

This function converts a string representation (decimal, binary, or hex) to a uint32_t value.

Base Detection from Prefix¶

uint32_t conv_str_to_uint32(char *str) {
    int base = 10;      /* Default base */
    int start = 0;      /* Starting index */

    /* Detect base from prefix */
    if (str[0] == '0' && str[1] == 'b') {
        base = 2;       /* Binary: "0b1010" */
        start = 2;
    } else if (str[0] == '0' && str[1] == 'x') {
        base = 16;      /* Hex: "0xFF" */
        start = 2;
    }

    /* ... convert digits ... */
}

Character-to-Digit Conversion¶

int digit;
char c = tolower(str[i]);  /* Handle uppercase hex */

if (c >= '0' && c <= '9') {
    digit = c - '0';       /* '0'→0, '9'→9 */
} else if (c >= 'a' && c <= 'f') {
    digit = c - 'a' + 10;  /* 'a'→10, 'f'→15 */
} else {
    /* Invalid character */
    exit(-1);
}

Right-to-Left Place-Value Approach¶

uint32_t conv_str_to_uint32(char *str) {
    uint32_t value = 0;
    uint32_t place = 1;    /* Current place value (base^i) */
    int base, start;
    int len;

    /* ... base detection (above) ... */

    len = strlen(str);

    /* Process digits right-to-left */
    for (int i = len - 1; i >= start; i--) {
        int digit;
        char c = tolower(str[i]);

        if (c >= '0' && c <= '9') {
            digit = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            digit = c - 'a' + 10;
        } else {
            exit(-1);
        }

        value += digit * place;
        place *= base;
    }

    return value;
}

Overflow Consideration¶

For 32-bit values, the maximum input is 0xFFFFFFFF (4,294,967,295). Values larger than this will silently wrap around due to unsigned integer overflow in C, which is well-defined behavior.

5. Integer-to-String in C¶

Overview of Conversion Functions¶

NTLang needs three conversion functions:

Function	Output Format	Example
`conv_uint32_to_decstr()`	Decimal digits	`"255"`
`conv_uint32_to_binstr()`	`0b` prefix + binary	`"0b11111111"`
`conv_uint32_to_hexstr()`	`0x` prefix + hex	`"0xFF"`

Decimal Conversion¶

void conv_uint32_to_decstr(uint32_t value, char *str) {
    char temp[32];
    int i = 0;

    if (value == 0) {
        str[0] = '0';
        str[1] = '\0';
        return;
    }

    /* Extract digits (reverse order) */
    while (value > 0) {
        temp[i] = '0' + (value % 10);
        value /= 10;
        i++;
    }

    /* Reverse into output string */
    int len = i;
    for (int j = 0; j < len; j++) {
        str[j] = temp[len - 1 - j];
    }
    str[len] = '\0';
}

Binary Conversion¶

void conv_uint32_to_binstr(uint32_t value, char *str) {
    char temp[64];
    int i = 0;

    /* Add prefix */
    str[0] = '0';
    str[1] = 'b';

    if (value == 0) {
        str[2] = '0';
        str[3] = '\0';
        return;
    }

    /* Extract bits (reverse order) */
    while (value > 0) {
        temp[i] = '0' + (value % 2);
        value /= 2;
        i++;
    }

    /* Reverse into output after prefix */
    int len = i;
    for (int j = 0; j < len; j++) {
        str[2 + j] = temp[len - 1 - j];
    }
    str[2 + len] = '\0';
}

Hexadecimal Conversion¶

void conv_uint32_to_hexstr(uint32_t value, char *str) {
    char temp[32];
    char hex_digits[] = "0123456789ABCDEF";
    int i = 0;

    /* Add prefix */
    str[0] = '0';
    str[1] = 'x';

    if (value == 0) {
        str[2] = '0';
        str[3] = '\0';
        return;
    }

    /* Extract hex digits (reverse order) */
    while (value > 0) {
        temp[i] = hex_digits[value % 16];
        value /= 16;
        i++;
    }

    /* Reverse into output after prefix */
    int len = i;
    for (int j = 0; j < len; j++) {
        str[2 + j] = temp[len - 1 - j];
    }
    str[2 + len] = '\0';
}

6. Masking and Width in C¶

The `conv_width_mask()` Function¶

Compute a mask for a given bit width:

uint32_t conv_width_mask(int width) {
    if (width == 32) {
        return 0xFFFFFFFF;  /* Special case: 1 << 32 is undefined */
    }
    return (1 << width) - 1;
}

Why the special case? Shifting by the bit width of the type is undefined behavior in C. 1 << 32 for a 32-bit integer is undefined.

Applying the Mask¶

uint32_t value = 0xDEADBEEF;
uint32_t mask = conv_width_mask(8);   /* 0xFF */
uint32_t masked = value & mask;       /* 0xEF */

Sign Extension¶

When displaying a value as signed, check the MSB of the masked value and sign-extend:

/* Check MSB of the width-bit value */
int msb = width - 1;
if ((value >> msb) & 1) {
    /* MSB is 1: sign-extend by setting upper bits to 1 */
    for (int i = width; i < 32; i++) {
        value |= (1 << i);
    }
}

The `conv_uint32_to_str()` Dispatch Function¶

The main conversion function selects the appropriate format:

void conv_uint32_to_str(uint32_t value, char *str, int base) {
    switch (base) {
        case 2:
            conv_uint32_to_binstr(value, str);
            break;
        case 10:
            conv_uint32_to_decstr(value, str);
            break;
        case 16:
            conv_uint32_to_hexstr(value, str);
            break;
        default:
            exit(-1);
    }
}

7. Signed Output in C¶

NTLang Output Flags¶

NTLang supports several output modes controlled by command-line flags:

Flag	Meaning	Example Output
`-b 2`	Binary output	`0b1010`
`-b 10`	Decimal output (default)	`10`
`-b 16`	Hex output	`0xA`
`-w N`	Width in bits	(affects masking)
`-u`	Unsigned output	`4294967286`
(default)	Signed output	`-10`

MSB Check for Sign¶

/* After masking to width bits: */
uint32_t mask = conv_width_mask(width);
uint32_t masked = value & mask;

/* Check if MSB of masked value is set */
int msb = width - 1;
bool is_negative = (masked >> msb) & 1;

Signed Output Implementation¶

void conv_print_signed(uint32_t value, int width, int base) {
    uint32_t mask = conv_width_mask(width);
    value = value & mask;
    char str[64];

    int msb = width - 1;
    if ((value >> msb) & 1) {
        /* Negative: sign-extend to 32 bits, then negate */
        for (int i = width; i < 32; i++) {
            value |= (1 << i);
        }
        /* Now value is the 32-bit two's complement representation */
        uint32_t positive = -value;  /* Negate using unsigned arithmetic */
        conv_uint32_to_str(positive, str, base);
        printf("-%s", str);
    } else {
        /* Positive: just convert */
        conv_uint32_to_str(value, str, base);
        printf("%s", str);
    }
}

Complete NTLang Output Logic¶

uint32_t mask = conv_width_mask(width);
uint32_t result = eval(tree) & mask;
char str[64];

if (unsigned_flag) {
    /* -u flag: print as unsigned */
    conv_uint32_to_str(result, str, base);
    printf("%s\n", str);
} else {
    /* Default: print as signed */
    conv_print_signed(result, width, base);
    printf("\n");
}

8. C vs Rust Comparison¶

Aspect	C	Rust
Fixed-width types	`uint32_t` from `<stdint.h>`	`u32` built-in
NOT operator	`~`	`!`
Signed right shift	Implementation-defined	Always arithmetic for `i32`
Overflow behavior	Undefined (signed), wrapping (unsigned)	Panic in debug, wrapping in release
String conversion	Manual `char` array manipulation	`format!` macro or manual
Error handling	`exit(-1)`	`Result`/`Option` types
Type casting	`(int32_t)value`	`value as i32`

Key Concepts¶

Concept	Description
`<stdint.h>`	Fixed-width integer types (`uint32_t`, `int32_t`)
Bitwise operators	`~`, `&`, `\\|`, `^` operate on individual bits
Precedence gotcha	Bitwise ops have lower precedence than comparisons
`>>` on signed	Implementation-defined in C (usually ASR)
`conv_str_to_uint32()`	String to integer with base detection
`conv_uint32_to_*str()`	Integer to string (dec, bin, hex)
`conv_width_mask()`	`(1 << w) - 1` with special case for w=32
Sign extension	Set upper bits to 1 for negative values
`value = -value`	Unsigned negation wraps correctly

Practice Problems¶

Problem 1: Precedence Bug¶

What does this print? How do you fix it?

uint32_t flags = 0xFF;
if (flags & 0x10 != 0) {
    printf("bit 4 is set\n");
}

Click to reveal solution

**Bug**: `!=` has higher precedence than `&`, so this evaluates as `flags & (0x10 != 0)` which is `flags & 1` = `0xFF & 0x01` = `1` (truthy, but for the wrong reason). **Fix**: Add parentheses:

if ((flags & 0x10) != 0) {
    printf("bit 4 is set\n");
}

Both versions happen to print the message for `0xFF`, but for `flags = 0x20`, the buggy version would still print (checking bit 0), while the fixed version correctly wouldn't.

Problem 2: Implement Hex String-to-Integer¶

Write a function that converts a hex string (without 0x prefix) to uint32_t using the left-to-right algorithm.

Click to reveal solution

uint32_t hex_to_uint32(char *str) {
    uint32_t value = 0;

    for (int i = 0; str[i] != '\0'; i++) {
        int digit;
        char c = tolower(str[i]);

        if (c >= '0' && c <= '9') {
            digit = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            digit = c - 'a' + 10;
        } else {
            exit(-1);  /* Invalid character */
        }

        value = value * 16 + digit;
    }

    return value;
}

/* Test: hex_to_uint32("1A3") returns 419 */

Problem 3: Width Mask Edge Cases¶

What does conv_width_mask() return for widths 1, 4, 8, and 32?

Click to reveal solution

Width 1:  (1 << 1) - 1  = 2 - 1   = 1          = 0x00000001
Width 4:  (1 << 4) - 1  = 16 - 1  = 15         = 0x0000000F
Width 8:  (1 << 8) - 1  = 256 - 1 = 255        = 0x000000FF
Width 32: special case              = 4294967295 = 0xFFFFFFFF

The width-32 case must be handled separately because `1 << 32` is undefined behavior in C (shifting by the full width of the type).

Problem 4: Sign Extension by Hand¶

Apply sign extension to the value 0x0B with width 4. What is the 32-bit signed result?

Click to reveal solution

Value: 0x0B = 0000 1011
Width: 4, so we only care about bits 3-0: 1011

MSB of 4-bit value (bit 3) = 1 → negative

Sign extend: set bits 4-31 to 1
  Before: 0000 0000 0000 0000 0000 0000 0000 1011
  After:  1111 1111 1111 1111 1111 1111 1111 1011
        = 0xFFFFFFFB

As signed 32-bit: -5

Verify: in 4-bit two's complement, 1011 = -5 ✓

Summary¶

Fixed-width types from <stdint.h> (uint32_t, int32_t) ensure predictable sizes across platforms, unlike native int and long.
Bitwise operators (~, &, |, ^) work on individual bits. Always parenthesize them in conditions due to precedence rules.
Shift operators (<<, >>) multiply/divide by powers of 2. Right shift on signed types is implementation-defined in C — use explicit casts when you need guaranteed behavior.
String-to-integer conversion detects the base from prefix (0b, 0x), converts characters to digits, and accumulates the value using place values.
Integer-to-string conversion uses division/modulo to extract digits in reverse order, then reverses the result. Separate functions handle decimal, binary, and hex output.
Width masking with (1 << w) - 1 isolates the lower bits. The width-32 case needs special handling to avoid undefined behavior.
Signed output checks the MSB of the masked value, sign-extends if negative, negates to get the positive magnitude, and prepends a - sign.

Binary and Bases in C¶

Overview¶

Learning Objectives¶

Prerequisites¶

1. C Integer Types and <stdint.h>¶

Native Integer Types¶

Fixed-Width Types from <stdint.h>¶

Type Summary for NTLang¶

Using sizeof¶

2. Bitwise Operators in C¶

The Four Bitwise Operators¶

C Code Examples¶

Operator Precedence Gotcha¶

3. Shift Operators in C¶

Left Shift (<<)¶

Right Shift (>>)¶

Casting for Arithmetic Shift¶

NTLang Shift Operators¶

4. String-to-Integer in C¶

The conv_str_to_uint32() Function¶

Base Detection from Prefix¶

Character-to-Digit Conversion¶

Right-to-Left Place-Value Approach¶

Overflow Consideration¶

5. Integer-to-String in C¶

Overview of Conversion Functions¶

Decimal Conversion¶

Binary Conversion¶

Hexadecimal Conversion¶

6. Masking and Width in C¶

The conv_width_mask() Function¶

Applying the Mask¶

Sign Extension¶

The conv_uint32_to_str() Dispatch Function¶

7. Signed Output in C¶

NTLang Output Flags¶

MSB Check for Sign¶

Signed Output Implementation¶

Complete NTLang Output Logic¶

8. C vs Rust Comparison¶

Key Concepts¶

Practice Problems¶

Problem 1: Precedence Bug¶

Problem 2: Implement Hex String-to-Integer¶

Problem 3: Width Mask Edge Cases¶

Problem 4: Sign Extension by Hand¶

Summary¶

1. C Integer Types and `<stdint.h>`¶

Fixed-Width Types from `<stdint.h>`¶

Using `sizeof`¶

Left Shift (`<<`)¶

Right Shift (`>>`)¶

The `conv_str_to_uint32()` Function¶

The `conv_width_mask()` Function¶

The `conv_uint32_to_str()` Dispatch Function¶