The char type in Rust represents a single Unicode Scalar Value. Unlike C or C++, where a char is typically a single byte, a Rust char is always exactly 4 bytes (32 bits) in size. This fixed width lets it hold any Unicode scalar value, from standard ASCII to ideographs and single-code-point emoji. Note that what a reader perceives as one character (for example, a letter followed by a combining accent, or an emoji with a skin-tone modifier) may consist of several char values.
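
The size guarantee can be checked directly with std::mem::size_of. The following is a minimal sketch; scalar_sizes is a hypothetical helper name introduced only for this illustration.

```rust
use std::mem::size_of;

/// Returns the in-memory sizes of `char`, `u32`, and `u8`, in bytes.
/// (`scalar_sizes` is an illustrative helper, not a standard library function.)
fn scalar_sizes() -> (usize, usize, usize) {
    (size_of::<char>(), size_of::<u32>(), size_of::<u8>())
}

fn main() {
    let (char_size, u32_size, u8_size) = scalar_sizes();
    // A char occupies the same space as a u32, not a single byte.
    println!("char: {} bytes, u32: {} bytes, u8: {} byte", char_size, u32_size, u8_size);
}
```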

Syntax and Initialization

A char literal is written between single quotes ('). It can contain a literal character, a simple escape such as \n, an ASCII escape (strictly limited to the \x00 to \x7F range), or a Unicode escape of the form \u{...} with up to six hexadecimal digits.
let ascii: char = 'R';
let emoji: char = '🦀';
let newline: char = '\n';
let ascii_escape: char = '\x41';    // Represents 'A'
let unicode_hex: char = '\u{2764}'; // Represents '❤'

Memory Representation and Constraints

Because a char is a Unicode Scalar Value, only bit patterns that correspond to a valid scalar value are legal, even though the type occupies a full 32 bits.
  • Size: 4 bytes (32 bits).
  • Valid Range: U+0000 to U+D7FF and U+E000 to U+10FFFF.
  • Invalid Range: The range U+D800 to U+DFFF is reserved for the surrogate code points used by UTF-16, and values above U+10FFFF lie outside the Unicode code space. Neither can be held in a char. Using transmute or other unsafe code to construct a char with such a value results in undefined behavior.
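
The ranges above can be expressed as a simple predicate and compared against the standard library. This is a sketch only: is_scalar_value is a hypothetical helper written for illustration, while char::from_u32 and char::MAX are the real standard library items it is checked against.

```rust
/// Returns true if `n` falls inside the two valid ranges listed above.
/// (`is_scalar_value` is an illustrative helper, not part of std.)
fn is_scalar_value(n: u32) -> bool {
    matches!(n, 0x0000..=0xD7FF | 0xE000..=0x10FFFF)
}

fn main() {
    // Probe values on both sides of each boundary.
    for n in [0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000] {
        // The hand-written range check agrees with the standard library.
        assert_eq!(is_scalar_value(n), char::from_u32(n).is_some());
        println!("U+{:04X}: valid = {}", n, is_scalar_value(n));
    }
    // The largest valid char is exposed as a constant.
    assert_eq!(char::MAX, '\u{10FFFF}');
}
```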

Relationship with Strings

It is critical to distinguish char from Rust’s string types (String and &str).
  • A char is always a fixed 4-byte UCS-4/UTF-32 representation.
  • Strings in Rust are UTF-8 encoded, meaning their characters are variable-width (1 to 4 bytes).
Consequently, a string is not an array of chars in memory; it is an array of u8 bytes. Iterating over a string using .chars() performs on-the-fly decoding of UTF-8 bytes into 4-byte char values.
let text = "Rust🦀"; 

// Length in bytes: 8 (4 for "Rust", 4 for "🦀")
let string_bytes = text.len();

// Number of Unicode Scalar Values: 5
let char_count = text.chars().count(); 
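
To make the on-the-fly decoding visible, the sketch below pairs every decoded char with the byte offset at which its UTF-8 encoding begins. char_offsets is a hypothetical helper name; char_indices, len_utf8, and is_char_boundary are standard library methods.

```rust
/// Pairs each decoded `char` with the byte offset where its UTF-8 encoding starts.
/// (`char_offsets` is an illustrative helper, not part of std.)
fn char_offsets(text: &str) -> Vec<(usize, char)> {
    text.char_indices().collect()
}

fn main() {
    let text = "Rust🦀";
    for (offset, c) in char_offsets(text) {
        println!("byte {}: {:?} ({} UTF-8 bytes)", offset, c, c.len_utf8());
    }
    // String slicing works on byte offsets and must land on a char boundary.
    assert_eq!(&text[4..], "🦀");
    // Offset 5 falls inside the 4-byte encoding of the crab emoji.
    assert!(!text.is_char_boundary(5));
}
```

Because offsets are byte positions rather than char positions, finding the n-th char of a string requires walking the string from the start rather than indexing in constant time.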

Type Casting and Conversion

Because not all 32-bit integers are valid Unicode Scalar Values, converting between integers and char requires specific handling.

From char to Integer: Casting a char to a u32 with the as keyword is always lossless. Casting to a smaller integer type (like u8) silently truncates the upper bits, so the result is only meaningful when the code point fits in the target type.
let c = 'A';
let val_u32 = c as u32; // 65
let val_u8 = c as u8;   // 65
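
The truncation only becomes visible for code points above U+00FF. The sketch below contrasts the plain cast with a checked conversion; low_byte and checked_byte are hypothetical helper names, and the checked version is one possible approach built on u8::try_from applied to the u32 value.

```rust
use std::convert::TryFrom;

/// Plain cast: keeps only the lowest 8 bits of the code point.
/// (`low_byte` is an illustrative helper, not part of std.)
fn low_byte(c: char) -> u8 {
    c as u8
}

/// Checked alternative: `Some` only when the code point fits in a `u8`.
/// (`checked_byte` is an illustrative helper, not part of std.)
fn checked_byte(c: char) -> Option<u8> {
    u8::try_from(c as u32).ok()
}

fn main() {
    // 'A' is U+0041, '\u{E9}' is 'é', and '🦀' is U+1F980.
    for c in ['A', '\u{E9}', '🦀'] {
        println!(
            "{:?}: as u32 = {:#X}, as u8 = {:#X}, checked = {:?}",
            c,
            c as u32,
            low_byte(c),
            checked_byte(c)
        );
    }
}
```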
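From Integer to char: Casting a u32 directly to a char using as is not permitted because the compiler cannot guarantee the u32 falls within the valid Unicode Scalar Value range. The only integer type that can be cast to char with as is u8, since every u8 value maps to a valid code point (U+0000 to U+00FF). For a u32, use a fallible conversion such as the char::from_u32 associated function, which returns an Option<char>, or char::try_from, which returns a Result.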
let valid_char = char::from_u32(0x2764); // Returns Some('❤')
let invalid_char = char::from_u32(0xD800); // Returns None (surrogate code point)
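
When a Result is more convenient than an Option, the TryFrom<u32> implementation for char can be used instead. The following is a minimal sketch; parse_code_point and its error message are hypothetical and exist only for this example.

```rust
use std::convert::TryFrom;

/// Converts a raw code point to a `char`, reporting failure as a message.
/// (`parse_code_point` is an illustrative helper, not part of std.)
fn parse_code_point(n: u32) -> Result<char, String> {
    char::try_from(n).map_err(|_| format!("U+{:04X} is not a Unicode Scalar Value", n))
}

fn main() {
    println!("{:?}", parse_code_point(0x2764));
    println!("{:?}", parse_code_point(0xD800));

    // Every u8 maps to U+0000..=U+00FF, so these conversions cannot fail.
    let from_cast = 0x41u8 as char;
    let from_trait = char::from(0x41u8);
    assert_eq!(from_cast, from_trait);
}
```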

Encoding Introspection

While a char is always 4 bytes in memory, Rust provides built-in methods to determine how much space a specific char would occupy if it were encoded into variable-width formats like UTF-8 or UTF-16.
let c = '🦀';
let utf8_size = c.len_utf8();   // Returns 4 (bytes)
let utf16_size = c.len_utf16(); // Returns 2 (16-bit code units)
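
Beyond reporting lengths, a char can also be encoded into a caller-supplied buffer with encode_utf8 and encode_utf16. The sketch below collects the results into vectors for easy inspection; encodings is a hypothetical helper name.

```rust
/// Returns the UTF-8 bytes and UTF-16 code units that encode `c`.
/// (`encodings` is an illustrative helper, not part of std.)
fn encodings(c: char) -> (Vec<u8>, Vec<u16>) {
    let mut utf8_buf = [0u8; 4]; // 4 bytes is enough for any char
    let mut utf16_buf = [0u16; 2]; // 2 code units is enough for any char
    let utf8 = c.encode_utf8(&mut utf8_buf).as_bytes().to_vec();
    let utf16 = c.encode_utf16(&mut utf16_buf).to_vec();
    (utf8, utf16)
}

fn main() {
    let (utf8, utf16) = encodings('🦀');
    // The lengths match what len_utf8 and len_utf16 report.
    assert_eq!(utf8.len(), '🦀'.len_utf8());
    assert_eq!(utf16.len(), '🦀'.len_utf16());
    println!("UTF-8:  {:02X?}", utf8);
    println!("UTF-16: {:04X?}", utf16);
}
```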