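String and &str

Question

What is the difference between String and &str in Rust? Why are there two string types?

Answer

Rust has two core string types, related to each other the way Vec<T> and &[T] are:

Type	Where the data lives	Ownership	Mutability	Size (on 64-bit targets)
`String`	Heap buffer owned by the value	Owns its data	Mutable (when bound with `mut`)	3 fields: ptr + len + cap (24 bytes)
`&str`	Reference to bytes anywhere (binary, heap, or stack)	Borrowed, no ownership	Immutable	2 fields: ptr + len (16 bytes)

Both are guaranteed to contain valid UTF-8.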
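The UTF-8 guarantee is enforced wherever raw bytes become a string. The sketch below uses the standard library's `std::str::from_utf8` and `String::from_utf8`. The helper name `is_valid_str` is made up for illustration.

```rust
/// Illustrative helper: true if `bytes` is valid UTF-8 and could therefore back a &str.
fn is_valid_str(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // "你" is encoded as the three bytes E4 BD A0.
    assert!(is_valid_str(&[0xE4, 0xBD, 0xA0]));
    // A truncated sequence is rejected instead of producing a broken &str.
    assert!(!is_valid_str(&[0xE4, 0xBD]));

    // String::from_utf8 performs the same check for an owned Vec<u8>.
    let owned = String::from_utf8(vec![b'h', b'i']).unwrap();
    assert_eq!(owned, "hi");
    assert!(String::from_utf8(vec![0xFF]).is_err());
}
```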

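Memory layout

String: allocated on the heap, owns its data, and can grow. Conceptually it is { ptr: *mut u8, len: usize, cap: usize } (the standard library implements it as a wrapper around Vec<u8>).
&str: a fat pointer, i.e. a borrowed slice of UTF-8 bytes, laid out as { ptr: *const u8, len: usize }.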
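The field counts can be checked with `std::mem::size_of`. This is a sketch: the exact sizes reflect current Rust implementations, not a documented language guarantee. The helper names are illustrative.

```rust
use std::mem::size_of;

/// Size of `String` in machine words (ptr + len + cap).
fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

/// Size of `&str` in machine words (ptr + len).
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

fn main() {
    println!("String: {} bytes, &str: {} bytes", size_of::<String>(), size_of::<&str>());
    assert_eq!(string_words(), 3);
    assert_eq!(str_ref_words(), 2);
}
```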

Creating strings

fn main() {
    // String
    let s1 = String::from("hello");
    let s2 = "hello".to_string();
    let s3 = String::new();              // empty string; does not allocate
    let s4 = String::with_capacity(10);  // pre-allocates room for at least 10 bytes
    let s5 = format!("{} {}", "hello", "world");

    // &str
    let s6: &str = "hello";             // string literal, type is &'static str
    let s7: &str = &s1[0..3];           // slice of a String: "hel"
    let s8: &str = &s1;                 // &String → &str (Deref coercion)
}
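The two constructors at the top differ in whether they allocate. This small check relies only on documented behavior: `String::new` does not allocate, and `with_capacity` gives at least the requested capacity (possibly more).

```rust
fn main() {
    let empty = String::new();
    // No allocation yet, so the capacity starts at zero.
    assert_eq!(empty.capacity(), 0);

    let mut buf = String::with_capacity(10);
    // Capacity is at least the requested amount; length is still zero.
    assert!(buf.capacity() >= 10);
    assert_eq!(buf.len(), 0);

    buf.push_str("hello");
    assert_eq!(buf.len(), 5);
}
```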

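The type of string literals

The string literal "hello" has type &'static str: its bytes are compiled into the program's binary, so the reference stays valid for the whole time the program runs.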
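Because a literal lives for the whole program, it can be returned from a function or stored in a `static` without any owner. This is a sketch; `default_name` and `GREETING` are illustrative names.

```rust
/// Returning a literal needs no owner: the bytes live in the binary,
/// so the 'static lifetime is satisfied.
fn default_name() -> &'static str {
    "guest"
}

// A literal can also initialize a static item directly.
static GREETING: &str = "hello";

fn main() {
    let name = default_name();
    println!("{}, {}", GREETING, name);
    assert_eq!(name, "guest");
}
```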

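Common operations

fn main() {
    let mut s = String::from("hello");

    // Appending
    s.push(' ');                // append a single char
    s.push_str("world");        // append a string slice
    s += "!";                   // += (AddAssign) appends in place; s is NOT consumed
    assert_eq!(s, "hello world!");

    // Inserting: the index is a byte offset and must lie on a char boundary
    s.insert(5, ',');           // insert a char at byte index 5
    s.insert_str(12, " again"); // insert a &str at byte index 12
    assert_eq!(s, "hello, world again!");

    // Replacing (both return a new String; s itself is unchanged)
    let all = s.replace("world", "Rust");  // replaces every match
    let first = s.replacen("l", "L", 1);   // replaces only the first n matches (here n = 1)
    assert_eq!(all, "hello, Rust again!");
    assert_eq!(first, "heLlo, world again!");

    // Removing
    assert_eq!(s.pop(), Some('!')); // removes the last char and returns it as Option<char>
    s.truncate(5);              // keeps the first 5 bytes (panics if not on a char boundary)
    assert_eq!(s, "hello");
    s.clear();                  // empties the string but keeps its capacity
    assert!(s.is_empty());

    // Length
    let s = String::from("你好");
    assert_eq!(s.len(), 6);           // length in bytes (each of these Chinese characters takes 3 bytes in UTF-8)
    assert_eq!(s.chars().count(), 2); // number of chars (Unicode scalar values)
}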
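`insert`, `insert_str` and `truncate` take byte offsets. With non-ASCII text you usually need to convert a character position into a byte offset first. The sketch below does this with `char_indices`. `byte_offset_of_char` is a hypothetical helper, not a standard API.

```rust
/// Hypothetical helper: converts a char position into the byte offset that
/// String::insert / insert_str expect. Position == char count maps to the end.
fn byte_offset_of_char(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(s.len())) // one past the last char = end of string
        .nth(char_idx)
}

fn main() {
    let mut s = String::from("你好世界");
    // s.insert(2, ',') would panic: byte 2 is inside the 3-byte encoding of '你'.
    let at = byte_offset_of_char(&s, 2).unwrap(); // right after "你好"
    assert_eq!(at, 6);
    s.insert(at, ',');
    assert_eq!(s, "你好,世界");
}
```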

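UTF-8 encoding and indexing

Rust strings are UTF-8 encoded and cannot be indexed by an integer (s[0] is a compile error):

fn main() {
    let s = String::from("你好世界");

    // s[0];  // ❌ compile error: String cannot be indexed by an integer

    // Correct approaches:
    // 1. Byte-range slicing (the range must fall on char boundaries)
    let byte_slice = &s[0..3]; // "你" ('你' takes 3 bytes in UTF-8)
    assert_eq!(byte_slice, "你");
    // let bad = &s[0..2];     // ⚠️ panics at runtime: not on a char boundary

    // 2. Iterate over chars
    for c in s.chars() {
        print!("{} ", c); // 你 好 世 界
    }

    // 3. Iterate over bytes
    for b in s.bytes() {
        print!("{:02x} ", b); // e4 bd a0 e5 a5 bd e4 b8 96 e7 95 8c
    }
}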
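When a range comes from untrusted input, `str::get` returns `None` for a bad boundary instead of panicking. `str::is_char_boundary` lets you test an offset directly. This is a sketch; `safe_slice` is a hypothetical thin wrapper.

```rust
/// Byte range start..end of `s`, or None if it is out of bounds or
/// splits a multi-byte character (cases where &s[start..end] would panic).
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "你好世界";
    assert_eq!(safe_slice(s, 0, 3), Some("你"));
    assert_eq!(safe_slice(s, 0, 2), None); // byte 2 is inside '你'
    assert!(s.is_char_boundary(3));
    assert!(!s.is_char_boundary(2));
}
```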

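Why is indexing not supported?

UTF-8 is a variable-width encoding: ASCII characters take 1 byte, common Chinese characters 3 bytes, and emoji 4 bytes. So s[i] cannot return "the i-th character" in O(1) time.
What should an index return? A byte, a Unicode scalar value, or a grapheme cluster? Different use cases need different answers.
Rust makes the developer choose the access method explicitly, which avoids hidden performance traps.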
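The variable widths are visible through `char::len_utf8`. Because widths vary, finding the n-th char requires a linear scan. This is a sketch; `utf8_width` is an illustrative wrapper.

```rust
/// Number of bytes `c` occupies when encoded as UTF-8.
fn utf8_width(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    assert_eq!(utf8_width('a'), 1);  // ASCII
    assert_eq!(utf8_width('é'), 2);  // Latin-1 Supplement
    assert_eq!(utf8_width('你'), 3); // common Chinese character
    assert_eq!(utf8_width('😀'), 4); // emoji outside the Basic Multilingual Plane

    // Widths vary, so reaching the 3rd char means walking from the start.
    let s = "a你😀";
    assert_eq!(s.chars().nth(2), Some('😀')); // O(n) scan
    assert_eq!(s.len(), 1 + 3 + 4);
}
```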

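Three levels of "character"

fn main() {
    let s = "नमस्ते"; // Hindi

    // Bytes
    let bytes: Vec<u8> = s.bytes().collect();   // [224, 164, 168, 224, 164, 174, 224, 164, 184, ...]

    // Unicode scalar values (chars)
    let chars: Vec<char> = s.chars().collect(); // ['न', 'म', 'स', '्', 'त', 'े']

    println!("{} bytes, {} chars", bytes.len(), chars.len()); // 18 bytes, 6 chars

    // Grapheme clusters: need the external unicode-segmentation crate
    // ["न", "म", "स्", "ते"]  ← what a human perceives as "characters"
}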
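The first two levels can be counted with the standard library alone. Grapheme clusters need the external crate mentioned above, so they are left out. This is a sketch with an illustrative helper.

```rust
/// Returns (byte count, char count) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Each of the 6 Devanagari scalar values takes 3 bytes in UTF-8.
    assert_eq!(byte_and_char_counts("नमस्ते"), (18, 6));
    // For pure ASCII the two counts coincide.
    assert_eq!(byte_and_char_counts("hello"), (5, 5));
}
```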

Converting between String and &str

fn main() {
    // String → &str (no allocation or copy, via Deref)
    let s = String::from("hello");
    let r: &str = &s;          // Deref coercion
    let r: &str = s.as_str();  // explicit method
    let r: &str = &s[..];      // full-range slice

    // &str → String (requires a heap allocation)
    let s: String = "hello".to_string();
    let s: String = String::from("hello");
    let s: String = "hello".to_owned();
}
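One way to see that String → &str is free while &str → String allocates is to compare buffer addresses with `as_ptr`. This is a sketch; `shares_buffer` is an illustrative helper.

```rust
/// True if `view` starts at the same address as `owner`, i.e. no copy was made.
fn shares_buffer(owner: &str, view: &str) -> bool {
    owner.as_ptr() == view.as_ptr()
}

fn main() {
    let s = String::from("hello");

    // Borrowing: the &str points into the String's own heap buffer.
    let r: &str = s.as_str();
    assert!(shares_buffer(&s, r));

    // Owning: to_owned() copies the bytes into a newly allocated buffer.
    let copy: String = r.to_owned();
    assert_eq!(copy, s);
    assert!(!shares_buffer(&s, &copy));
}
```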

Which type should a function parameter use?

// ✅ Recommended: take &str, which accepts both String and &str
fn greet(name: &str) {
    println!("Hello, {}!", name);
}

// ❌ Not recommended: accepts only &String
fn greet_limited(name: &String) {
    println!("Hello, {}!", name);
}

fn main() {
    let owned = String::from("Alice");
    let borrowed = "Bob";

    greet(&owned);     // ✅ &String → &str (Deref coercion)
    greet(borrowed);   // ✅ &str passed directly

    greet_limited(&owned);   // ✅
    // greet_limited(borrowed); // ❌ type mismatch: expected &String, found &str
}
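Thanks to deref coercion, one `&str` parameter also accepts slices of a `String` and smart pointers such as `Box<str>`. This is a sketch with a made-up `count_vowels` function.

```rust
/// Counts ASCII vowels. Taking &str lets callers pass many kinds of string.
fn count_vowels(s: &str) -> usize {
    s.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

fn main() {
    let owned = String::from("Alice");
    let boxed: Box<str> = "Eve".into();

    assert_eq!(count_vowels("Bob"), 1);       // string literal
    assert_eq!(count_vowels(&owned), 3);      // &String, deref-coerced
    assert_eq!(count_vowels(&owned[1..]), 2); // slice of a String: "lice"
    assert_eq!(count_vowels(&boxed), 2);      // &Box<str>, deref-coerced
}
```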

If the function needs to own the string (for example, to store it in a struct), take a String:

struct User {
    name: String, // owns its data
}

impl User {
    // Accepts a String or any type that can be converted into one
    fn new(name: impl Into<String>) -> Self {
        User { name: name.into() }
    }
}

fn main() {
    let u1 = User::new("Alice");             // &str → String (allocates)
    let u2 = User::new(String::from("Bob")); // String is moved in directly
}
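When a `String` is passed to an `impl Into<String>` parameter, the conversion goes through the identity `From<T> for T` impl. The buffer is therefore moved, not copied. This sketch checks that with `as_ptr`. `take_name` is a hypothetical stand-in for `User::new`.

```rust
/// Hypothetical stand-in for User::new: accepts anything convertible into a String.
fn take_name(name: impl Into<String>) -> String {
    name.into()
}

fn main() {
    // &str argument: into() allocates a new String.
    assert_eq!(take_name("Alice"), "Alice");

    // String argument: into() is the identity conversion, so the heap buffer moves in unchanged.
    let bob = String::from("Bob");
    let before = bob.as_ptr();
    let stored = take_name(bob);
    assert_eq!(stored.as_ptr(), before);
}
```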

String concatenation

fn main() {
    let s1 = String::from("hello");
    let s2 = String::from(" world");

    // Option 1: the + operator (consumes s1)
    let s3 = s1 + &s2; // s1 is moved, s2 is borrowed
    // println!("{}", s1); // ❌ s1 has been moved

    // Option 2: the format! macro (takes references, moves nothing)
    let s1 = String::from("hello");
    let s3 = format!("{}{}", s1, s2); // ✅ s1 and s2 are both still usable

    // Option 3: push_str (appends to an existing String in place)
    let mut s = String::from("hello");
    s.push_str(" world");
}
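`+` can be chained. Each step takes the running `String` by value and borrows the right-hand side, which can be a `&str` or a `&String`. This is a sketch with an illustrative `join_date` helper.

```rust
/// Builds "YYYY-MM-DD" by chaining +; only `year` is consumed.
fn join_date(year: String, month: &str, day: &str) -> String {
    year + "-" + month + "-" + day
}

fn main() {
    let month = String::from("06");
    let date = join_date(String::from("2024"), &month, "15");
    assert_eq!(date, "2024-06-15");

    // month was only borrowed, so it is still usable; format! gives the same result.
    assert_eq!(date, format!("{}-{}-{}", "2024", month, "15"));
}
```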

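Heavy concatenation

When building a string piece by piece (for example in a loop), append in place with push_str (or +=) rather than calling format! repeatedly, because every format! call allocates a new String. If the total length is known in advance, also pre-allocate the capacity:

fn main() {
    let parts = vec!["hello", " ", "world", "!"];
    // Reserve the total byte length so push_str never has to reallocate
    let mut result = String::with_capacity(parts.iter().map(|s| s.len()).sum());
    for part in &parts {
        result.push_str(part);
    }
    assert_eq!(result, "hello world!");
}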
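Sometimes the pieces are formatted values rather than ready-made strings. `String` implements `std::fmt::Write`, so `write!` can format straight into one buffer instead of creating a temporary `String` per item with `format!`. This is a sketch; `numbers_csv` is illustrative.

```rust
use std::fmt::Write;

/// Builds "0,1,...,n-1" by formatting each number directly into one buffer.
fn numbers_csv(n: u32) -> String {
    let mut out = String::new();
    for i in 0..n {
        if i > 0 {
            out.push(',');
        }
        // String implements fmt::Write, so write! appends in place.
        // Writing into a String cannot fail, so unwrap() is safe here.
        write!(out, "{}", i).unwrap();
    }
    out
}

fn main() {
    println!("{}", numbers_csv(5));
}
```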

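Other string types

Type	Purpose
`OsString` / `&OsStr`	Native operating-system strings (not necessarily UTF-8)
`CString` / `&CStr`	C-style strings (terminated by `\0`)
`PathBuf` / `&Path`	File system paths (cross-platform)
`Cow<'a, str>`	Clone-on-write string that avoids unnecessary allocation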
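Here are a few conversions between these types, using only standard-library APIs. `c_bytes` and `ext_of` are illustrative helpers.

```rust
use std::ffi::{CString, OsStr};
use std::path::Path;

/// Bytes of `s` as a C string (with trailing nul), or None if `s` contains an interior nul.
fn c_bytes(s: &str) -> Option<Vec<u8>> {
    CString::new(s).ok().map(|c| c.into_bytes_with_nul())
}

/// File extension of `path` as &str, or None if absent or not valid UTF-8.
fn ext_of(path: &str) -> Option<&str> {
    Path::new(path).extension().and_then(OsStr::to_str)
}

fn main() {
    // CString appends the terminating \0 that C APIs expect.
    assert_eq!(c_bytes("hi"), Some(b"hi\0".to_vec()));
    // An interior nul byte cannot be represented, so CString::new fails.
    assert_eq!(c_bytes("a\0b"), None);

    // Path components are &OsStr; converting back to &str may fail on non-UTF-8 names.
    assert_eq!(ext_of("docs/readme.txt"), Some("txt"));
    assert_eq!(ext_of("Makefile"), None);
}
```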

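Common interview questions

Q1: Why does Rust have two string types, `String` and `&str`?

Answer:

It reflects the design philosophy of Rust's ownership system: distinguishing "owning" from "borrowing".

String: owns string data on the heap; it can be modified, grown, and have its ownership transferred
&str: borrows existing string data; creating one involves no allocation or copying, and it is immutable

It mirrors the relationship between Vec<T> and &[T]. This design lets the compiler know exactly who is responsible for freeing the memory and who is only reading it.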
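The owner/borrower split shows up directly in function signatures: a function can return a `&str` that points into data someone else owns. Here is a sketch of the classic `first_word` example.

```rust
/// Returns the first space-separated word of `s` as a borrow of `s`:
/// no allocation, and the result cannot outlive the owner.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

fn main() {
    let owner = String::from("hello world"); // owns the heap buffer
    let word = first_word(&owner);           // borrows part of it
    assert_eq!(word, "hello");
    // drop(owner); // would not compile while `word` is still used below
    println!("{word}");
}
```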

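Q2: Can a `String` be indexed directly? Why not?

Answer:

No; s[0] is a compile error. The reasons:

UTF-8 is a variable-width encoding, so s[n] cannot locate the n-th character in O(1)
What counts as a "character" is ambiguous (byte, Unicode scalar value, or grapheme cluster)
Rust refuses to hide O(n) work behind an operation that looks like O(1)

Alternatives:

s.chars().nth(n): the n-th Unicode scalar value, as an Option<char> (O(n))
&s[start..end]: a byte-range slice (both ends must lie on char boundaries, otherwise it panics)
s.as_bytes()[n]: the n-th byte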
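Here are the three alternatives side by side, on a string whose second character takes two bytes. This is a sketch: `nth_char` and `nth_byte` are illustrative helpers. `nth_byte` uses `get`, so an out-of-range index yields `None` instead of panicking.

```rust
/// n-th Unicode scalar value, found by scanning from the start (O(n)).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// n-th raw byte (O(1)).
fn nth_byte(s: &str, n: usize) -> Option<u8> {
    s.as_bytes().get(n).copied()
}

fn main() {
    let s = String::from("h\u{e9}llo"); // "héllo"; 'é' (U+00E9) is C3 A9 in UTF-8

    assert_eq!(nth_char(&s, 1), Some('\u{e9}'));
    assert_eq!(&s[0..3], "h\u{e9}");          // byte range ending right after 'é'
    assert_eq!(nth_byte(&s, 1), Some(0xC3)); // first byte of 'é', not a char
}
```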

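Q3: What is the difference between `to_string()`, `to_owned()`, and `String::from()`?

Answer:

For &str → String, all three produce the same result; the differences are subtle:

Method	Comes from	Notes
`String::from("hello")`	`From<&str>` trait	Most direct, with clear intent
`"hello".to_string()`	`ToString` trait	General: available on any type that implements `Display`
`"hello".to_owned()`	`ToOwned` trait	Means "create an owned version from a borrow", the closest fit to the ownership model

Recommendation: use String::from() or .to_string(); which one is a matter of style.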
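This quick check shows that the three agree for `&str`, and that `to_string()` also covers other `Display` types. This is a sketch; `describe` is an illustrative generic helper.

```rust
use std::fmt::Display;

/// Any Display value can be turned into a String via the ToString blanket impl.
fn describe<T: Display>(value: T) -> String {
    value.to_string()
}

fn main() {
    let a = String::from("hello");
    let b = "hello".to_string();
    let c = "hello".to_owned();
    assert!(a == b && b == c);

    assert_eq!(describe(42), "42");
    assert_eq!(describe('x'), "x");
    assert_eq!(describe(3.5_f64), "3.5");
}
```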

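Q4: What is `Cow<str>`, and when is it used?

Answer:

Cow (clone-on-write) lets you defer an allocation until the data actually has to change:

use std::borrow::Cow;

fn process(input: &str) -> Cow<'_, str> {
    if input.contains("bad") {
        // Needs modification: allocate a new String
        Cow::Owned(input.replace("bad", "good"))
    } else {
        // No modification needed: return the borrowed input, zero allocation
        Cow::Borrowed(input)
    }
}

When to use it: a function that usually returns its input unchanged and only occasionally needs to modify it. Returning Cow avoids an unnecessary clone() in the common case.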
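The standard library uses the same pattern. `String::from_utf8_lossy` returns `Cow<str>`: it borrows valid input and allocates only when it must replace invalid bytes. This is a sketch; `lossy_allocated` is an illustrative helper.

```rust
use std::borrow::Cow;

/// Reports whether from_utf8_lossy had to allocate for these bytes.
fn lossy_allocated(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    // Valid UTF-8: the input is borrowed as a &str, no allocation.
    assert!(!lossy_allocated(b"hello"));

    // Invalid byte 0xFF: a new String is built with U+FFFD in its place.
    let fixed = String::from_utf8_lossy(b"hi\xFF");
    assert!(matches!(fixed, Cow::Owned(_)));
    assert_eq!(fixed, "hi\u{FFFD}");

    // Either way, Cow<str> derefs to &str for reading.
    println!("{}", fixed.len());
}
```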

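Q5: Why does concatenating with `+` consume the left operand?

Answer:

The + operator calls fn add(self, s: &str) -> String. Note that self is not a reference: add takes ownership of the left operand.

fn main() {
    let s1 = String::from("hello");
    let s2 = String::from(" world");
    let s3 = s1 + &s2; // &String coerces to &str
    // s1 is moved into add, which reuses s1's heap buffer (growing it if needed) and appends s2's contents.
    // This is cheaper than allocating a brand-new String.
    println!("{}", s3);
}

If you need to keep the left operand, use the format! macro or clone() it first.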
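You can observe the buffer reuse by reserving spare capacity first and comparing addresses. This is a sketch: `plus` is an illustrative wrapper. The address stays the same only because the capacity is already large enough; otherwise `+` would grow the buffer.

```rust
/// Thin wrapper around `String + &str`.
fn plus(head: String, tail: &str) -> String {
    head + tail
}

fn main() {
    let mut s1 = String::with_capacity(32); // room for "hello world" without growing
    s1.push_str("hello");
    let before = s1.as_ptr();

    let s3 = plus(s1, " world");
    assert_eq!(s3, "hello world");
    // Same address: + appended into s1's existing buffer instead of allocating.
    assert_eq!(s3.as_ptr(), before);
}
```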

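Q6: How do you handle large amounts of string concatenation efficiently?

Answer:

fn main() {
    let items = vec!["apple", "banana", "cherry"]; // example input

    // Option 1: pre-allocate + push_str
    let total: usize = items.iter().map(|s| s.len()).sum();
    let mut result = String::with_capacity(total);
    for item in &items {
        result.push_str(item);
    }
    assert_eq!(result, "applebananacherry");

    // Option 2: collect an iterator of &str into a String
    let result: String = items.iter().copied().collect();
    assert_eq!(result, "applebananacherry");

    // Option 3: join, which also inserts a separator between items
    let result = items.join(", ");
    assert_eq!(result, "apple, banana, cherry");
}

Key point: pre-allocate capacity to avoid repeated reallocation. Once String::with_capacity() has reserved room for all the data being appended, push_str does not trigger a reallocation.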
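You can check that claim by recording the buffer address before the loop and comparing it afterwards. This is a sketch; `concat_checked` is an illustrative helper.

```rust
/// Concatenates `items` after reserving their exact total length.
/// Returns the result and whether the buffer ever moved (it should not).
fn concat_checked(items: &[&str]) -> (String, bool) {
    let total: usize = items.iter().map(|s| s.len()).sum();
    let mut out = String::with_capacity(total);
    let start = out.as_ptr();
    for s in items {
        out.push_str(s); // capacity already suffices, so no reallocation
    }
    let moved = out.as_ptr() != start;
    (out, moved)
}

fn main() {
    let (s, moved) = concat_checked(&["apple", "banana", "cherry"]);
    assert_eq!(s, "applebananacherry");
    assert!(!moved);
}
```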
