A String in Rust is a heap-allocated, growable, mutable, UTF-8 encoded text type. It is an owned data structure, meaning the String instance holds exclusive ownership of the underlying text buffer and automatically deallocates that heap memory when the String goes out of scope.

Internal Representation

Under the hood, a String is implemented as a wrapper around a Vec<u8> (a vector of bytes). The standard library enforces a strict guarantee that the bytes contained within this vector always form valid UTF-8.
// Conceptual representation of Rust's String
pub struct String {
    vec: Vec<u8>,
}
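
The UTF-8 guarantee can be observed by moving between a String and its raw bytes. The following is a minimal sketch; `bytes_to_string` is a hypothetical helper introduced only for illustration, while `as_bytes`, `into_bytes`, and `String::from_utf8` are standard library methods.

```rust
// Hypothetical helper for illustration: tries to build a String from raw bytes.
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    // String::from_utf8 checks the bytes and returns Err if they are not valid UTF-8.
    String::from_utf8(bytes).ok()
}

fn main() {
    let s = String::from("héllo");

    // as_bytes exposes the underlying buffer as a byte slice; 'é' occupies two bytes.
    assert_eq!(s.as_bytes().len(), 6);
    assert_eq!(s.chars().count(), 5);

    // into_bytes consumes the String and returns its byte vector.
    let raw: Vec<u8> = s.into_bytes();
    assert_eq!(bytes_to_string(raw), Some(String::from("héllo")));

    // The byte 0xFF never appears in valid UTF-8, so this input is rejected.
    assert_eq!(bytes_to_string(vec![0x68, 0xFF]), None);
}
```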

Memory Layout

A String value itself consists of three machine words (24 bytes on a 64-bit architecture). These live wherever the String is stored, for example on the stack for a local variable, and manage the dynamically sized data on the heap:
  1. Pointer (ptr): A memory address pointing to the first byte of the string’s data on the heap.
  2. Length (len): The number of bytes (not characters) currently occupied by the string’s contents.
  3. Capacity (capacity): The total number of bytes allocated on the heap. When len equals capacity and more data is pushed, the String triggers a reallocation to request a larger contiguous memory block.
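
The three fields can be inspected from safe code. The sketch below assumes a current standard library implementation, where the size of a String equals three `usize` values; that size is an implementation detail rather than a documented guarantee. `len_and_capacity` is a hypothetical helper used only for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper for illustration: reports (len, capacity) in bytes.
fn len_and_capacity(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    // Pointer, length, and capacity: three machine words in practice.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    let mut s = String::with_capacity(8);
    s.push_str("héllo"); // 5 chars, but 6 bytes because 'é' takes two

    let (len, cap) = len_and_capacity(&s);
    assert_eq!(len, 6);
    assert!(cap >= 8); // with_capacity guarantees at least the requested amount

    // Appending past the current capacity makes the String grow its buffer.
    s.push_str(" and some more text");
    assert!(s.capacity() >= s.len());
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```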

Instantiation Syntax

A String can be initialized empty or derived from string literals (&str).
// Creates an empty String (no heap memory is allocated until data is pushed)
let mut empty = String::new();

// Creates an empty String with room for at least 10 bytes,
// avoiding reallocation until that capacity is exceeded
let mut pre_allocated = String::with_capacity(10);

// Creating a String from a string slice (&str)
let s1 = String::from("hello");
let s2 = "hello".to_string();
let s3 = "hello".to_owned();

UTF-8 Implications and Indexing

Because String guarantees UTF-8 encoding, a single Unicode scalar value (a char in Rust) occupies anywhere from 1 to 4 bytes, so the n-th character cannot be located in constant time. Consequently, String does not implement integer indexing: s[0] is a compile error. Byte-range slicing (e.g., &s[0..4]) is permitted but panics at runtime if either end of the range falls inside a multi-byte character. To visit individual elements, iterate explicitly over either bytes or Unicode scalar values:
let s = String::from("🦀rust");

// Iterates over the raw bytes (4 bytes for the crab, 4 bytes for "rust")
for b in s.bytes() {
    // b is u8
}

// Iterates over valid Unicode scalar values
for c in s.chars() {
    // c is char
}
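
When a specific character or substring is needed, the standard library offers non-panicking alternatives to indexing. The sketch below uses `chars().nth`, `get`, `is_char_boundary`, and `char_indices`; `nth_char` is a hypothetical helper introduced for illustration, and it runs in linear time because it must scan from the start.

```rust
// Hypothetical helper for illustration: returns the n-th Unicode scalar value.
// This is an O(n) scan, not a constant-time lookup.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let s = String::from("🦀rust");

    // 8 bytes in total, but only 5 Unicode scalar values.
    assert_eq!(s.len(), 8);
    assert_eq!(s.chars().count(), 5);
    assert_eq!(nth_char(&s, 0), Some('🦀'));
    assert_eq!(nth_char(&s, 1), Some('r'));

    // get returns None instead of panicking when a range splits a character.
    assert_eq!(s.get(0..4), Some("🦀"));
    assert_eq!(s.get(0..1), None);
    assert!(s.is_char_boundary(4));
    assert!(!s.is_char_boundary(1));

    // char_indices pairs each char with the byte offset where it starts.
    let offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    assert_eq!(offsets, vec![0, 4, 5, 6, 7]);
}
```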

Mutation

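As a growable buffer, a mutable String provides methods to append data. These methods need no UTF-8 validation of their own: push_str accepts a &str and push accepts a char, and both types are guaranteed by construction to hold valid Unicode text. Validation is required only when a String is built from raw bytes, for example with String::from_utf8.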
let mut s = String::from("foo");

// Appends a string slice (&str)
s.push_str("bar"); 

// Appends a single Unicode scalar value (char)
s.push('!');       
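
Beyond appending, a String can also be edited and shrunk in place. The following sketch exercises a few standard methods (`insert`, `pop`, `truncate`, `clear`); `append_tag` is a hypothetical helper that exists only for this illustration.

```rust
// Hypothetical helper for illustration: appends " (tag)" to an existing String.
fn append_tag(s: &mut String, tag: &str) {
    s.push_str(" (");
    s.push_str(tag);
    s.push(')');
}

fn main() {
    let mut s = String::from("foo");
    append_tag(&mut s, "bar");
    assert_eq!(s, "foo (bar)");

    // insert shifts every following byte, so it costs O(n).
    s.insert(0, '>');
    assert_eq!(s, ">foo (bar)");

    // pop removes and returns the last char, or None if the String is empty.
    assert_eq!(s.pop(), Some(')'));

    // truncate takes a byte length and panics if it is not on a char boundary.
    s.truncate(4);
    assert_eq!(s, ">foo");

    // clear drops the contents but keeps the allocated capacity for reuse.
    let cap = s.capacity();
    s.clear();
    assert!(s.is_empty());
    assert_eq!(s.capacity(), cap);
}
```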

Relationship with &str (Deref Coercion)

String is the owned text type, whereas &str (string slice) is the borrowed text type. String implements the Deref<Target = str> trait. This allows the compiler to implicitly coerce a &String into a &str, granting String access to all methods defined on the str primitive.
fn print_length(slice: &str) {
    println!("{}", slice.len());
}

let owned_string = String::from("data");

// The compiler automatically coerces &String into &str
print_length(&owned_string); 
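
A practical consequence of deref coercion is that functions are usually written to accept &str, which lets callers pass either a string literal or a borrowed String. The sketch below illustrates this; `shout` is a hypothetical function introduced for the example.

```rust
// Hypothetical function for illustration: accepts any borrowed text.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned = String::from("data");

    // &String is coerced to &str at the call site.
    assert_eq!(shout(&owned), "DATA");
    // A string literal is already a &str.
    assert_eq!(shout("literal"), "LITERAL");

    // Methods defined on str are callable directly on a String through Deref.
    assert!(owned.starts_with("da"));
    assert_eq!(owned.trim(), "data");

    // as_str performs the same conversion explicitly.
    let slice: &str = owned.as_str();
    assert_eq!(slice.len(), 4);

    // The String is only borrowed above, so it remains usable afterwards.
    println!("{owned}");
}
```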