TL;DR

Rust’s character type char represents a single Unicode character, as a 32-bit value.

A value of type &String (pronounced “ref String”) is a reference to a String value, a &i32 is a reference to an i32, and so on.

The expression &x produces a reference to x; in Rust terminology, we say that it borrows a reference to x. Given a reference r, the expression *r refers to the value r points to.

Rust references come in 2 flavors:

  1. &T: an immutable, shared reference;
  2. &mut T: a mutable, exclusive reference;

A slice, written [T] without specifying the length, is a region of an array or vector. Since a slice can be any length, slices can’t be stored directly in variables or passed as function arguments. Slices are always passed by reference.

A reference to a slice is a fat pointer: a two-word value comprising a pointer to the slice’s first element, and the number of elements in the slice.

A reference to a slice is a non-owning pointer to a range of consecutive values in memory.

Since slices almost always appear behind references, we often just refer to types like &[T] or &str as “slices,” using the shorter name for the more common concept.

The types &[T] and &mut [T], called a shared slice of Ts and mutable slice of Ts, are references to a series of elements that are a part of some other value, like an array or vector.

A string literal with the b prefix is a byte string. Such a string is a slice of u8 values—that is, bytes—rather than Unicode text.

let method = b"GET";
assert_eq!(method, &[b'G', b'E', b'T']);

A String has a resizable buffer holding UTF-8 text. The buffer is allocated on the heap, so it can resize its buffer as needed or requested. Think of a String as a Vec<u8> that is guaranteed to hold well-formed UTF-8; in fact, this is how String is implemented.

A &str (pronounced “stir” or “string slice”) is a reference to a run of UTF-8 text owned by someone else: it “borrows” the text. A &str is a fat pointer, containing both the address of the actual data and its length. Think of a &str as being nothing more than a &[u8] that is guaranteed to hold well-formed UTF-8.

A string literal is a &str that refers to preallocated text, typically stored in read-only memory (in the executable) along with the program’s machine code.

&str is very much like &[T]: a fat pointer to some data. String is analogous to Vec<T>.

Rust has the raw pointer types *mut T and *const T.

Fixed-Width Numeric Types

Integer Types

The footing of Rust’s type system is a collection of fixed-width numeric types, chosen to match the types that almost all modern processors implement directly in hardware.

Fixed-width numeric types can overflow or lose precision, but they are adequate for most applications and can be thousands of times faster than representations like arbitrary-precision integers and exact rationals.

  • The rational numbers are the set of all numbers that can be written as fractions p/q, where p and q are integers.
Size (bits)    Unsigned integer   Signed integer   Floating-point
8              u8                 i8
16             u16                i16
32             u32                i32              f32
64             u64                i64              f64
128            u128               i128
Machine word   usize              isize
  • A machine word is a value the size of an address on the machine the code runs on: 32 bits long on 32-bit architectures, and 64 bits long on 64-bit architectures.

    • Rust requires array indices to be usize values. Values representing the sizes of arrays or vectors or counts of the number of elements in some data structure also generally have the usize type.
  • Rust’s unsigned integer types use their full range to represent positive values and zero.

  • Rust’s signed integer types use the two’s complement representation, using the same bit patterns as the corresponding unsigned type to cover a range of positive and negative values.

  • Rust uses the u8 type for byte values.

  • Rust treats characters as distinct from the numeric types: a char is not a u8, nor is it a u32 (though it is 32 bits long).

  • Integer literals in Rust can take a suffix indicating their type.

    • 42u8 is a u8 value; 1729isize is an isize.
  • If an integer literal lacks a type suffix, Rust puts off determining its type until it finds the value being used in a way that pins it down: stored in a variable of a particular type, passed to a function that expects a particular type, compared with another value of a particular type, or something like that. In the end, if multiple types could work, Rust defaults to i32 if that is among the possibilities. Otherwise, Rust reports the ambiguity as an error.

    println!("{}", (-4).abs());
    // can't call method `abs` on ambiguous numeric type `{integer}`
    
    println!("{}", (-4_i32).abs()); // method calls have a higher precedence than unary prefix operators
    println!("{}", i32::abs(-4)); 
    
    • Rust wants to know exactly which integer type a value has before it will call the type’s own methods. The default of i32 applies only if the type is still ambiguous after all method calls have been resolved, so that’s too late to help here.
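      • The i32 default only takes effect after all method calls have been resolved, but here the type that .abs() belongs to can never be determined, so the default never gets a chance to apply.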
  • The prefixes 0x, 0o, and 0b designate hexadecimal, octal, and binary literals.

  • To make long numbers more legible, you can insert underscores among the digits. The exact placement of the underscores is not significant.

    • 4_294_967_295; 0xffff_ffff; 127_u8.
  • Rust provides byte literals, character-like literals for u8 values.

    • b'A' represents the ASCII code for the character A, as a u8 value. Since the ASCII code for A is 65, the literals b'A' and 65u8 are exactly equivalent.
    • Byte literals are just another notation for u8 values. Only ASCII characters may appear in byte literals.
  • For characters that are hard to write or read, you can write their code in hexadecimal instead. A byte literal of the form b'\xHH', where HH is any two-digit hexadecimal number, represents the byte whose value is HH.

  • Convert from one integer type to another using the as operator.

    assert_eq!( 10_i8 as u16, 10_u16);       // in range
    assert_eq!( 2525_u16 as i16, 2525_i16);  // in range
    assert_eq!( -1_i16 as i32, -1_i32);      // sign-extended
    assert_eq!(65535_u16 as i32, 65535_i32); // zero-extended
    
    // Conversions that are out of range for the destination
    // produce values that are equivalent to the original modulo 2^N,
    // where N is the width of the destination in bits. This
    // is sometimes called "truncation."
    assert_eq!( 1000_i16 as u8, 232_u8);
    assert_eq!(65535_u32 as i16, -1_i16);
    assert_eq!( -1_i8 as u8, 255_u8);
    assert_eq!( 255_u8 as i8, -1_i8);
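
The literal notations described earlier (radix prefixes, underscores, byte literals) can be compared side by side. This is only a minimal sketch; the function names literal_forms and byte_literals are hypothetical and not part of the original notes.

```rust
/// Returns the same value, 255, written in four different literal notations.
/// (`literal_forms` is a hypothetical helper used only for illustration.)
fn literal_forms() -> [u32; 4] {
    [
        255,         // decimal
        0xff,        // hexadecimal
        0o377,       // octal
        0b1111_1111, // binary, with an underscore for legibility
    ]
}

/// Byte literals are just `u8` values written in character-like notation.
/// (`byte_literals` is likewise a hypothetical name.)
fn byte_literals() -> (u8, u8, u8) {
    (b'A', b'\x1b', b'\n') // 65, 27 (ASCII escape), 10 (newline)
}
```

All four entries of literal_forms() compare equal, since the notation affects only how the number is written, not its value.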
    

Checked, Wrapping, Saturating, and Overflowing Arithmetic

When an integer arithmetic operation overflows:

  1. In a debug build, Rust panics.
  2. In a release build, the operation wraps around: it produces the value equivalent to the mathematically correct result modulo the range of the value (truncation).
let mut i = 1i8; // i8: [-128, 127]
loop {
    i *= 10;
    println!("{}", i);
}
// cargo run
// 10
// 100
// thread 'main' panicked at 'attempt to multiply with overflow'

// cargo build --release
// ./target/release/ftypes > result
// 10
// 100
// -24  (100*10 = 1000 is 1111101000 in binary; keeping the low 8 bits gives 11101000, which is -24)
// 16   (-24*10 = -240; its low 8 bits in two’s complement are 00010000, which is 16)
// -96
// 64
// -128
// 0
// 0
// 0
  • The integer types provide methods to override default overflow behaviors.

    // panics in any build
    let mut i: i8 = 1;
    loop {
        i = i.checked_mul(10).expect("multiplication overflowed");
    }
    // thread 'main' panicked at 'multiplication overflowed'
    

These integer arithmetic methods fall in 4 general categories:

  1. Checked operations return an Option of the result: Some(v) if the mathematically correct result can be represented as a value of that type, or None if it cannot.

    assert_eq!(10_u8.checked_add(20), Some(30));
    // overflow
    assert_eq!(100_u8.checked_add(200), None);
    // Do the addition; panic if it overflows (unwrap panics on None).
    let (x, y) = (10_u8, 20_u8); // example operands, not from the original notes
    let sum = x.checked_add(y).unwrap();
    // Oddly, signed division can overflow too, in one particular case.
    // A signed n-bit type can represent -2ⁿ⁻¹, but not 2ⁿ⁻¹.
    assert_eq!((-128_i8).checked_div(-1), None);
    
  2. Wrapping operations return the value equivalent to the mathematically correct result modulo the range of the value (truncation).

    assert_eq!(100_u16.wrapping_mul(200), 20000);
    assert_eq!(500_u16.wrapping_mul(500), 53392); // 250000 modulo 2¹⁶
    // Operations on signed types may wrap to negative values.
    assert_eq!(500_i16.wrapping_mul(500), -12144);
    
    // In bitwise shift operations, the shift distance
    // is wrapped to fall within the size of the value.
    // So a shift of 17 bits in a 16-bit type is a shift
    // of 1.
    assert_eq!(5_i16.wrapping_shl(17), 10);
    
  3. Saturating operations return the representable value that is closest to the mathematically correct result. The result is “clamped” to the maximum and minimum values the type can represent.

    assert_eq!(32760_i16.saturating_add(10), 32767);
    assert_eq!((-32760_i16).saturating_sub(10), -32768);
    
  4. Overflowing operations return a tuple (result, overflowed), where result is what the wrapping version of the function would return, and overflowed is a bool indicating whether an overflow occurred.

    assert_eq!(255_u8.overflowing_sub(2), (253, false));
    assert_eq!(255_u8.overflowing_add(2), (1, true));
    
    // A shift of 17 bits is too large for `u16`, and 17 modulo 16 is 1.
    assert_eq!(5_u16.overflowing_shl(17), (10, true));
    
    • overflowing_shl and overflowing_shr return true for overflowed only if the shift distance was as large as or larger than the bit width of the type itself. The actual shift applied is the requested shift modulo the bit width of the type.

Floating-Point Types

Rust provides IEEE single- and double-precision floating-point types.

  • These types include positive and negative infinities (INFINITY, NEG_INFINITY), distinct positive and negative zero values, and a not-a-number (NAN) value.
  • Every part of a floating-point number after the integer part is optional, but at least one of the fractional part, exponent, or type suffix must be present, to distinguish it from an integer literal.
    • The fractional part may consist of a lone decimal point, so 5. is a valid floating-point constant.
  • If a floating-point literal lacks a type suffix, Rust checks the context to see how the values are used, much as it does for integer literals. If it ultimately finds that either floating-point type could fit, it chooses f64 by default.
    • For the purposes of type inference, Rust treats integer literals and floating-point literals as distinct classes: it will never infer a floating-point type for an integer literal, or vice versa.
Type   Precision                                            Range
f32    IEEE single precision (at least 6 decimal digits)    Roughly $$-3.4 \times 10^{38}$$ to $$+3.4 \times 10^{38}$$
f64    IEEE double precision (at least 15 decimal digits)   Roughly $$-1.8 \times 10^{308}$$ to $$+1.8 \times 10^{308}$$
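
A few of the floating-point rules above can be checked directly. The following is a small sketch under the assumption that f64 is used throughout; float_facts and infinities are hypothetical names chosen for illustration.

```rust
/// Collects a few facts about `f64` literals and special values.
/// (`float_facts` is a hypothetical name used only for this sketch.)
fn float_facts() -> (bool, bool, bool, bool) {
    let five = 5.; // a lone decimal point is enough to make this a float literal
    let nan = f64::NAN;
    (
        five == 5.0_f64,               // true
        nan == nan,                    // false: NaN is not equal to itself
        0.0_f64 == -0.0_f64,           // true: the two zeros compare equal...
        (-0.0_f64).is_sign_negative(), // ...but remain distinguishable
    )
}

/// Floating-point division by zero yields an infinity rather than a panic.
fn infinities() -> (f64, f64) {
    (1.0_f64 / 0.0, -1.0_f64 / 0.0)
}
```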

The bool Type

Control structures like if and while require their conditions to be bool expressions, as do the short-circuiting logical operators && and ||.

Rust’s as operator can convert bool values to integer types:

assert_eq!(false as i32, 0);
assert_eq!(true as i32, 1);
  • as won’t convert in the other direction, from numeric types to bool. You must write out an explicit comparison like x != 0.
  • Although a bool needs only a single bit to represent it, Rust uses an entire byte for a bool value in memory, so you can create a pointer to it.
    • The smallest addressable unit of memory is 1 byte.
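
The two points above (no integer-to-bool cast, one byte per bool) can be sketched as follows. The helper names is_nonzero and bool_size are hypothetical.

```rust
use std::mem::size_of;

/// `as` cannot turn an integer into a bool, so compare explicitly.
/// (`is_nonzero` is a hypothetical helper for illustration.)
fn is_nonzero(x: i32) -> bool {
    x != 0
}

/// Reports how many bytes a bool occupies in memory.
fn bool_size() -> usize {
    size_of::<bool>()
}
```

Writing something like 7 as bool is rejected by the compiler, which is why the comparison has to be spelled out.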

Characters

Rust’s character type char represents a single Unicode character, as a 32-bit value.

Rust uses the char type for single characters in isolation, but uses the UTF-8 encoding for strings and streams of text. A String represents its text as a sequence of UTF-8 bytes, not as an array of characters.

Character literals are characters enclosed in single quotes, like '8' or '!'.

You can write out a character’s Unicode code point in hexadecimal:

  • If the character’s code point is in the range U+0000 to U+007F (that is, if it is drawn from the ASCII character set), then you can write the character as '\xHH', where HH is a two-digit hexadecimal number.
  • You can write any Unicode character as '\u{HHHHHH}', where HHHHHH is a hexadecimal number up to six digits long, with underscores allowed for grouping as usual.

A char always holds a Unicode code point in the range 0x0000 to 0xD7FF, or 0xE000 to 0x10FFFF.

  • A char is never a surrogate pair half (that is, a code point in the range 0xD800 to 0xDFFF), or a value outside the Unicode codespace (that is, greater than 0x10FFFF).
  • Rust uses the type system and dynamic checks to ensure char values are always in the permitted range.

Rust never implicitly converts between char and any other type. You can use the as conversion operator to convert a char to an integer type; for types smaller than 32 bits, the upper bits of the character’s value are truncated.

assert_eq!('*' as i32, 42);
assert_eq!('ಠ' as u16, 0xca0);
assert_eq!('ಠ' as i8, -0x60); // U+0CA0 truncated to eight bits, signed

u8 is the only type the as operator will convert to char.

  • Rust intends the as operator to perform only cheap, infallible conversions, but every integer type other than u8 includes values that are not permitted Unicode code points, so those conversions would require run-time checks.
  • Instead, the standard library function std::char::from_u32 takes any u32 value and returns an Option<char>: if the u32 is not a permitted Unicode code point, then from_u32 returns None; otherwise, it returns Some(c), where c is the char result.
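
As a rough illustration of the conversions above, the sketch below wraps std::char::from_u32 and shows three notations for the same character. The names to_char and star_forms are hypothetical.

```rust
/// Converts a `u32` to a `char` if it is a permitted Unicode code point.
/// (`to_char` is a hypothetical wrapper around `std::char::from_u32`.)
fn to_char(code: u32) -> Option<char> {
    std::char::from_u32(code)
}

/// Three notations for the same character, '*' (U+002A).
fn star_forms() -> [char; 3] {
    ['*', '\x2A', '\u{2A}']
}
```

With these definitions, to_char(0xD800) and to_char(0x110000) both give None (a surrogate and a value past the end of the codespace), while 42_u8 as char needs no check at all.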

Tuples

A tuple is a pair, or triple, quadruple, quintuple, etc. (hence, n-tuple, or tuple), of values of assorted types.

  • You can write a tuple as a sequence of elements, separated by commas and surrounded by parentheses.
  • Given a tuple value t, you can access its elements as t.0, t.1, and so on.
  • Rust code often uses tuple types to return multiple values from a function.

Both tuples and arrays represent an ordered sequence of values.

  • Each element of a tuple can have a different type, whereas an array’s elements must be all the same type.
  • Tuples allow only constants as indices, like t.4. You can’t write t.i or t[i] to get the ith element.
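
Returning several values at once might look like the sketch below. min_max is a hypothetical function written for these notes, not a standard library API.

```rust
/// Returns the smallest and largest element of a slice as a pair,
/// or None if the slice is empty.
/// (`min_max` is a hypothetical example function.)
fn min_max(values: &[i32]) -> Option<(i32, i32)> {
    let first = *values.first()?;
    let mut result = (first, first);
    for &v in values {
        if v < result.0 {
            result.0 = v; // fields are accessed with constant indices
        }
        if v > result.1 {
            result.1 = v;
        }
    }
    Some(result)
}
```

A caller can take the pair apart with a pattern, for example let (lo, hi) = min_max(&[3, 1, 4]).unwrap();.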

Tuples can be used as a sort of minimal struct type.

fn write_image(filename: &str, pixels: &[u8], bounds: (usize, usize)) -> Result<(), std::io::Error>;
  • The type of the bounds parameter is (usize, usize), a tuple of two usize values.
  • We could just as well write out separate width and height parameters, and the machine code would be about the same either way. It’s a matter of clarity.

The other commonly used tuple type is the zero-tuple ().

  • This is traditionally called the unit type because it has only one value, also written ().

  • A function that returns no value has a return type of ().

    fn swap<T>(x: &mut T, y: &mut T);
    // shorthand for
    fn swap<T>(x: &mut T, y: &mut T) -> ();
    
  • Rust uses the unit type where there’s no meaningful value to carry, but context requires some sort of type nonetheless.

    fn write_image(filename: &str, pixels: &[u8], bounds: (usize, usize)) -> Result<(), std::io::Error>;
    
    • write_image returns a std::io::Error value if something goes wrong, but returns no value on success.

Rust consistently permits an extra trailing comma everywhere commas are used: function arguments, arrays, struct and enum definitions, and so on.

  • You may include a comma after a tuple’s last element.
  • For consistency’s sake, there are tuples that contain a single value.
    • The literal ("lonely hearts",) is a tuple containing a single string; its type is (&str,). Here, the comma after the value is necessary to distinguish the singleton tuple from a simple parenthetic expression.

Pointer Types

In Java, if class Rectangle contains a field Vector2D upperLeft;, then upperLeft is a reference to another separately created Vector2D object. Objects never physically contain other objects in Java.

Rust is designed to help keep allocations to a minimum. Values nest by default. The value ((0, 0), (1440, 900)) is stored as four adjacent integers. If you store it in a local variable, you’ve got a local variable four integers wide. Nothing is allocated in the heap.

  • This is great for memory efficiency, but as a consequence, when a Rust program needs values to point to other values, it must use pointer types explicitly.
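
One way to see that values nest is to compare sizes. The sketch below assumes a typical target where an i32 takes 4 bytes with no extra padding between tuple fields; nested_size and boxed_size are hypothetical helper names.

```rust
use std::mem::size_of;

/// Size in bytes of a nested tuple of four `i32` values.
/// (`nested_size` is a hypothetical helper for illustration.)
fn nested_size() -> usize {
    size_of::<((i32, i32), (i32, i32))>()
}

/// Size in bytes of a Box pointing at that same tuple type:
/// just one pointer, since the tuple itself lives on the heap.
fn boxed_size() -> usize {
    size_of::<Box<((i32, i32), (i32, i32))>>()
}
```

On such a target nested_size() is 16, the space of four adjacent integers, while boxed_size() equals the size of a single machine word.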

Three pointer types are covered here: references, boxes, and unsafe pointers.

References

A value of type &String (pronounced “ref String”) is a reference to a String value, a &i32 is a reference to an i32, and so on. At run time, a reference to an i32 is a single machine word holding the address of the i32, which may be on the stack or in the heap.

The expression &x produces a reference to x; in Rust terminology, we say that it borrows a reference to x. Given a reference r, the expression *r refers to the value r points to.

  • These are very much like the & and * operators in C and C++. Like a C pointer, a reference does not automatically free any resources when it goes out of scope.

Rust references are never null: there is simply no way to produce a null reference in safe Rust. Rust tracks the ownership and lifetimes of values, so mistakes like dangling pointers, double frees, and pointer invalidation are ruled out at compile time.

Rust references come in 2 flavors:

  1. &T: an immutable, shared reference.
    • You can have many shared references to a given value at a time, but they are read-only: modifying the value they point to is forbidden, as with const T* in C.
  2. &mut T: a mutable, exclusive reference.
    • You can read and modify the value it points to, as with a T* in C.
    • For as long as the reference exists, you may not have any other references of any kind to that value. In fact, the only way you may access the value at all is through the mutable reference.

Rust uses this dichotomy between shared and mutable references to enforce a “single writer or multiple readers” rule: either you can read and write the value, or it can be shared by any number of readers, but never both at the same time. This separation, enforced by compile-time checks, is central to Rust’s safety guarantees.
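
The “single writer or multiple readers” rule can be sketched as below. The functions read_twice, increment, and demo are hypothetical names used only for illustration.

```rust
/// Reads a value through two simultaneous shared references.
/// (`read_twice` is a hypothetical helper.)
fn read_twice(x: &i32) -> i32 {
    let r1 = x; // shared references can be copied freely
    let r2 = x;
    *r1 + *r2
}

/// Modifies the referent through an exclusive reference.
fn increment(x: &mut i32) {
    *x += 1;
}

fn demo() -> i32 {
    let mut n = 10;
    let doubled = read_twice(&n); // the shared borrow ends after the call
    increment(&mut n);            // so an exclusive borrow is now allowed
    doubled + n                   // 20 + 11
}
```

Holding a shared reference to n across the call to increment(&mut n), and using it afterward, would be rejected at compile time.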

Boxes

The simplest way to allocate a value in the heap is to use Box::new.

let t = (12, "eggs");
let b = Box::new(t); // allocate a tuple in the heap
  • Box::new allocates enough memory to contain the tuple on the heap.
    • The type of t is (i32, &str), so the type of b is Box<(i32, &str)>.
  • When b goes out of scope, the memory is freed immediately, unless b has been moved—by returning it, for example.
    • Moves are essential to the way Rust handles heap-allocated values.
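
To illustrate the point about moves, here is a small sketch in which a box is returned from a function instead of being freed. make_boxed and use_boxed are hypothetical names.

```rust
/// Allocates a tuple on the heap and returns the box,
/// moving ownership to the caller.
/// (`make_boxed` is a hypothetical example function.)
fn make_boxed() -> Box<(i32, &'static str)> {
    let t = (12, "eggs");
    Box::new(t) // the box is moved out, so the heap memory is not freed here
}

fn use_boxed() -> i32 {
    let b = make_boxed();
    b.0 + 1 // fields are reachable through the box
} // `b` goes out of scope here and the heap allocation is freed
```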

ChatGPT
Heap:

  • A region of memory used for dynamic memory allocation.
  • An unordered set of blocks that can be allocated and freed in any order.
  • Managed explicitly by the programmer or a memory allocator.
    • The programmer explicitly requests memory allocation and deallocation using functions like malloc or new to allocate memory and free or delete to deallocate memory.
  • Can be much larger than the stack and can grow dynamically as needed.
    • If memory fragmentation occurs or excessive memory is allocated without deallocation, it can lead to memory exhaustion or fragmentation issues.

ChatGPT
Stack:

  • A region of memory organized in Last-In-First-Out (LIFO) order.
  • Grows and shrinks automatically as functions are called and return.
    • Each function call creates a new stack frame, which contains information related to the function call, including:
      • Return address: The memory address where the program should return after the function call completes.
      • Function arguments
      • Local variables
      • Saved registers: The values of registers that need to be preserved during the function call.
      • Stack pointer: A pointer that keeps track of the current position on the stack.
  • Memory management on the stack is handled automatically by the compiler and the runtime system.
  • Typically limited in size. Its size is predetermined, and if it exceeds its capacity, a stack overflow can occur, resulting in program termination.
    • In most programming languages, the size of the stack is predetermined by the compiler or runtime system based on the architecture and configuration of the system. The stack typically has a fixed size set during the compilation or execution of the program. This predetermined size is often based on factors such as the operating system, hardware limitations, and the configuration of the program itself.
    • The stack size can vary depending on the programming language, compiler, operating system, and system settings. For example, in C and C++, you can sometimes specify the stack size during the program’s compilation using compiler flags or options. In other cases, the stack size may be determined by the operating system or runtime environment.

Raw Pointers

Rust has the raw pointer types *mut T and *const T. Raw pointers really are just like pointers in C++. Using a raw pointer is unsafe, because Rust makes no effort to track what it points to.

  • Raw pointers may be null, or they may point to memory that has been freed or that now contains a value of a different type.

You may only dereference raw pointers within an unsafe block. An unsafe block is Rust’s opt-in mechanism for advanced language features whose safety is up to you.
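
A minimal sketch of raw pointer use follows; it only shows where the unsafe block is needed and is not a recommended pattern. The names raw_roundtrip and null_is_allowed are hypothetical.

```rust
/// Reads and writes an `i32` through a raw pointer.
/// (`raw_roundtrip` is a hypothetical example function.)
fn raw_roundtrip() -> i32 {
    let mut x = 10;
    let p: *mut i32 = &mut x; // creating a raw pointer is safe
    unsafe {
        *p += 5; // dereferencing it requires an unsafe block
        *p
    }
}

/// Raw pointers, unlike references, may be null.
fn null_is_allowed() -> bool {
    let p: *const i32 = std::ptr::null();
    p.is_null()
}
```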

Arrays, Vectors, and Slices

Rust has 3 types for representing a sequence of values in memory:

  1. The type [T; N] represents an array of N values, each of type T.
    • An array’s size is a constant determined at compile time and is part of the type; you can’t append new elements or shrink an array.
  2. The type Vec<T>, called a vector of Ts, is a dynamically allocated (on the heap), growable sequence of values of type T.
    • A vector’s elements are allocated on the heap, so you can resize vectors at will: push new elements onto them, append other vectors to them, delete elements, and so on.
  3. The types &[T] and &mut [T], called a shared slice of Ts and mutable slice of Ts, are references to a series of elements that are a part of some other value, like an array or vector.
    • You can think of a slice as a pointer to its first element, together with a count of the number of elements you can access starting at that point.
    • A mutable slice &mut [T] lets you read and modify elements, but can’t be shared; a shared slice &[T] lets you share access among several readers, but doesn’t let you modify elements.

Given a value v of any of these three types, the expression v.len() gives the number of elements in v, and v[i] refers to the ith element of v. The first element is v[0], and the last element is v[v.len() - 1].

  • Rust checks that i always falls within this range; if it doesn’t, the expression panics.
  • The length of v may be zero, in which case any attempt to index it will panic.
  • i must be a usize value; you can’t use any other integer type as an index.
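
The indexing rules above can be sketched as follows. The standard slice method get is not mentioned in these notes; it is used here as a non-panicking alternative to v[i]. The wrapper names element_at and element_at_u32 are hypothetical.

```rust
/// Returns the element at `i`, or None if `i` is out of range.
/// (`element_at` is a hypothetical wrapper around the slice method `get`.)
fn element_at(v: &[i32], i: usize) -> Option<i32> {
    v.get(i).copied()
}

/// An index of another integer type needs an explicit conversion to usize.
fn element_at_u32(v: &[i32], i: u32) -> i32 {
    v[i as usize] // panics if the index is out of range
}
```

Because both functions take a &[i32], they accept arrays and vectors alike.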

Arrays

let lazy_caterer: [u32; 6] = [1, 2, 4, 7, 11, 16];
let taxonomy = ["Animalia", "Arthropoda", "Insecta"];
assert_eq!(lazy_caterer[3], 7);
assert_eq!(taxonomy.len(), 3);

let mut sieve = [true; 10000];

[V; N] produces an array of length N filled with value V.

  • [true; 10000] is an array of 10,000 bool elements, all set to true.
  • [0u8; 1024] can be a one-kilobyte buffer, filled with zeros.
  • Rust has no notation for an uninitialized array. In general, Rust ensures that code can never access any sort of uninitialized value.

The useful methods on arrays—iterating over elements, searching, sorting, filling, filtering, and so on—are all provided as methods on slices, not arrays. Rust implicitly converts an array to a slice when searching for methods, so you can call any slice method on an array directly:

let mut chaos = [3, 5, 4, 1, 2];
chaos.sort();
assert_eq!(chaos, [1, 2, 3, 4, 5]);
  • Rust implicitly produces a &mut [i32] slice referring to the entire array and passes that to sort to operate on.

Vectors

let mut primes = vec![2, 3, 5, 7];
assert_eq!(primes.iter().product::<i32>(), 210);
primes.push(11);
primes.push(13);
assert_eq!(primes.iter().product::<i32>(), 30030);

fn new_pixel_buffer(rows: usize, cols: usize) -> Vec<u8> {
    vec![0; rows * cols]
}

let mut pal = Vec::new();
pal.push("step");
pal.push("on");
pal.push("no");
pal.push("pets");
assert_eq!(pal, vec!["step", "on", "no", "pets"]);

let v: Vec<i32> = (0..5).collect();
assert_eq!(v, [0, 1, 2, 3, 4]);
  • The simplest way to create vectors is to use the vec! macro. It’s equivalent to calling Vec::new to create a new, empty vector and then pushing the elements onto it.
  • Another way is to build a vector from the values produced by an iterator.
    • You’ll often need to supply the type when using collect, because it can build many different sorts of collections, not just vectors.

As with arrays, you can use slice methods on vectors.

let mut palindrome = vec!["a man", "a plan", "a canal", "panama"];
palindrome.reverse();
assert_eq!(palindrome, vec!["panama", "a canal", "a plan", "a man"]);
  • The call implicitly borrows a &mut [&str] slice from the vector and invokes reverse on that.

A Vec<T> consists of three values:

  1. A pointer to the heap-allocated buffer for the elements, which is created and owned by the Vec<T>;
  2. The number of elements that buffer has the capacity to store;
    • A vector’s capacity method returns the number of elements it could hold without reallocation.
  3. The number it actually contains now (its length).
let mut v = Vec::with_capacity(2);
assert_eq!(v.len(), 0);
assert_eq!(v.capacity(), 2);
v.push(1);
v.push(2);
assert_eq!(v.len(), 2);
assert_eq!(v.capacity(), 2);
v.push(3);
assert_eq!(v.len(), 3);
// Typically prints "capacity is now 4"
// isn’t guaranteed to be exactly 4, but it will be at least 3
println!("capacity is now {}", v.capacity());
  • Use Vec::with_capacity to create a vector with a buffer of a specified capacity.
  • When the buffer has reached its capacity, adding another element to the vector entails allocating a larger buffer, copying the present contents into it, updating the vector’s pointer and capacity to describe the new buffer, and finally freeing the old one.
let mut v = vec![10, 20, 30, 40, 50];
// Make the element at index 3 be 35.
v.insert(3, 35);
assert_eq!(v, [10, 20, 30, 35, 40, 50]);
// Remove the element at index 1.
v.remove(1);
assert_eq!(v, [10, 30, 35, 40, 50]);

let languages: Vec<String> = std::env::args().skip(1).collect();
for l in languages {
    println!("{}: {}", l,
             if l.len() % 2 == 0 {
                 "functional"
             } else {
                 "imperative"
             });
}

let mut v = vec!["Snow Puff", "Glass Gem"];
assert_eq!(v.pop(), Some("Glass Gem"));
assert_eq!(v.pop(), Some("Snow Puff"));
assert_eq!(v.pop(), None);
  • pop removes the last element and returns it.
    • Popping a value from a Vec<T> returns an Option<T>: None if the vector was already empty, or Some(v) if its last element had been v.

Vec is an ordinary type defined in Rust, not built into the language.

Slices

A slice, written [T] without specifying the length, is a region of an array or vector. Since a slice can be any length, slices can’t be stored directly in variables or passed as function arguments. Slices are always passed by reference.

A reference to a slice is a fat pointer: a two-word value comprising a pointer to the slice’s first element, and the number of elements in the slice.

let v: Vec<f64> = vec![0.0, 0.707, 1.0, 0.707]; // heap
let a: [f64; 4] = [0.0, -0.707, -1.0, -0.707];
// Rust automatically converts the &Vec<f64> reference and the 
// &[f64; 4] reference to slice references that point directly to the data.
let sv: &[f64] = &v;
let sa: &[f64] = &a;

Whereas an ordinary reference is a non-owning pointer to a single value, a reference to a slice is a non-owning pointer to a range of consecutive values in memory. It’s a good choice when you want to write a function that operates on either an array or a vector.

fn print(n: &[f64]) {
    for elt in n {
        println!("{}", elt);
    }
}
print(&a); // works on arrays
print(&v); // works on vectors

You can get a reference to a slice of an array or vector, or a slice of an existing slice, by indexing it with a range.

print(&v[0..2]);    // print the first two elements of v
print(&a[2..]);     // print elements of a starting with a[2]
print(&sv[1..3]);   // print v[1] and v[2]
  • Trying to borrow a slice that extends past the end of the data results in a panic.
  • Since slices almost always appear behind references, we often just refer to types like &[T] or &str as “slices,” using the shorter name for the more common concept.

String Types

String Literals

String literals are enclosed in double quotes. They use the same backslash escape sequences as char literals.

  • In string literals, unlike char literals, single quotes don’t need a backslash escape, and double quotes do.

  • A string may span multiple lines.

    println!("In the room the women come and go,
        Singing of Mount Abora");
    
    • The newline character in that string literal is included in the string and therefore in the output. So are the spaces at the beginning of the second line.
  • If one line of a string ends with a backslash, then the newline character and the leading whitespace on the next line are dropped.

    println!("It was a bright, cold day in April, and \
        there were four of us—\
        more or less.");
    

In a few cases, the need to double every backslash in a string is a nuisance. For these cases, Rust offers raw strings. A raw string is tagged with the lowercase letter r. All backslashes and whitespace characters inside a raw string are included verbatim in the string. No escape sequences are recognized.

let default_win_install_path = r"C:\Program Files\Gorillas";
let pattern = Regex::new(r"\d+(\.\d+)*");

println!(r###"
    This raw string started with 'r###"'.
    Therefore it does not end until we reach a quote mark ('"')
    followed immediately by three pound signs ('###'):
"###);
  • You can’t include a double-quote character in a raw string simply by putting a backslash in front of it. The start and end of a raw string can be marked with pound signs. This way you can include double-quote characters in a raw string.
    • You can add as few or as many pound signs as needed to make it clear where the raw string ends.

Byte Strings

A string literal with the b prefix is a byte string. Such a string is a slice of u8 values—that is, bytes—rather than Unicode text.

let method = b"GET";
assert_eq!(method, &[b'G', b'E', b'T']);
  • The type of method is &[u8; 3]. It’s a reference to an array of three bytes.

Byte strings can span multiple lines, use escape sequences, and use backslashes to join lines. Raw byte strings start with br".

Byte strings can’t contain arbitrary Unicode characters. They must make do with ASCII and \xHH escape sequences.
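
The difference between escaped and raw byte strings can be sketched as below. escaped_get and raw_bytes_len are hypothetical helper names.

```rust
/// An escape sequence inside a byte string yields a single byte.
/// (`escaped_get` is a hypothetical helper.)
fn escaped_get() -> &'static [u8; 3] {
    b"\x47ET" // \x47 is the ASCII code for 'G'
}

/// In a raw byte string, the backslash and the 'n' are two separate bytes.
fn raw_bytes_len() -> usize {
    br"\n".len()
}
```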

Strings in Memory

Rust strings are sequences of Unicode characters, but they are not stored in memory as arrays of chars. Instead, they are stored using UTF-8, a variable-width encoding.

  • Each ASCII character in a string is stored in one byte.
  • Other characters take up multiple bytes.
let noodles = "noodles".to_string();
print!(
    "length: {}, capacity: {}\n",
    noodles.len(),
    noodles.capacity()
);
// length: 7, capacity: 7
// cargo 1.65.0 (4bc8f24d3 2022-10-20)
let oodles = &noodles[1..];
let poodles = "ಠ_ಠ";

  • A String has a resizable buffer holding UTF-8 text. The buffer is allocated on the heap, so it can resize its buffer as needed or requested. Think of a String as a Vec<u8> that is guaranteed to hold well-formed UTF-8; in fact, this is how String is implemented.

  • A &str (pronounced “stir” or “string slice”) is a reference to a run of UTF-8 text owned by someone else: it “borrows” the text. A &str is a fat pointer, containing both the address of the actual data and its length. Think of a &str as being nothing more than a &[u8] that is guaranteed to hold well-formed UTF-8.

    • A string literal is a &str that refers to preallocated text, typically stored in read-only memory (in the executable) along with the program’s machine code.
    • It is impossible to modify a &str. For creating new strings at run time, use the String type.
    • The type &mut str does exist, but it is not very useful, since almost any operation on UTF-8 can change its overall byte length, and a slice cannot reallocate its referent. The in-place text operations available on &mut str, such as make_ascii_uppercase and make_ascii_lowercase, affect only single-byte characters, by definition.
      • If the byte length changed, the fat pointer (address plus length) would have to change as well.
  • A String or &str’s .len() method returns its length. The length is measured in bytes, not characters.

    assert_eq!("ಠ_ಠ".len(), 7);
    assert_eq!("ಠ_ಠ".chars().count(), 3);
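
A small sketch that ties these points together: byte length versus character count, slicing on character boundaries, and in-place ASCII changes through a &mut str. The helper names byte_and_char_len and shout are assumptions made for this example.

```rust
// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Hypothetical helper: uppercases ASCII letters in place.
// The byte length cannot change, so a &mut str is enough.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

fn main() {
    let noodles = "noodles".to_string();
    // oodles borrows the last six bytes of noodles' heap buffer.
    let oodles = &noodles[1..];
    assert_eq!(oodles, "oodles");
    assert_eq!(oodles.len(), 6);

    let poodles = "ಠ_ಠ";
    assert_eq!(byte_and_char_len(poodles), (7, 3));
    // Each 'ಠ' takes three bytes in UTF-8; '_' takes one.
    assert_eq!('ಠ'.len_utf8(), 3);
    // Slicing works on byte offsets that fall on character boundaries.
    assert_eq!(&poodles[0..3], "ಠ");
    // An offset inside a character is rejected; get() returns None instead of panicking.
    assert!(poodles.get(0..1).is_none());

    let mut greeting = String::from("héllo");
    shout(&mut greeting); // &mut String coerces to &mut str
    // The two-byte 'é' is left untouched.
    assert_eq!(greeting, "HéLLO");
}
```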
    

String

&str is very much like &[T]: a fat pointer to some data. String is analogous to Vec<T>.

Like a Vec, each String has its own heap-allocated buffer that isn’t shared with any other String. When a String variable goes out of scope, the buffer is automatically freed, unless the String was moved.

There are several ways to create Strings:

let error_message = "too many pets".to_string();
assert_eq!(format!("{}°{:02}′{:02}″N", 24, 5, 23),
           "24°05′23″N".to_string());
let bits = vec!["veni", "vidi", "vici"];
assert_eq!(bits.concat(), "venividivici");
assert_eq!(bits.join(", "), "veni, vidi, vici");
  • The .to_string() method converts a &str to a String. This copies the string.
  • The format!() macro works just like println!(), except that it returns a new String instead of writing text to stdout, and it doesn’t automatically add a newline at the end.
  • Arrays, slices, and vectors of strings have two methods, .concat() and .join(sep), that form a new String from many strings.

A &str can refer to any slice of any string, whether it is a string literal (stored in the executable) or a String (allocated and freed at run time). This means that &str is more appropriate for function arguments when the caller should be allowed to pass either kind of string.
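
As a minimal illustration of that advice, the hypothetical function word_count below takes a &str, so callers can pass a literal, a borrowed String, or a slice of either.

```rust
// Hypothetical function for illustration: counts whitespace-separated words.
// Taking &str lets callers pass a literal, a String, or part of either.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let literal = "veni vidi vici";
    let owned = String::from("too many pets");

    assert_eq!(word_count(literal), 3);
    // &String is automatically coerced to &str.
    assert_eq!(word_count(&owned), 3);
    // A sub-slice of the String works too: "many pets".
    assert_eq!(word_count(&owned[4..]), 2);
}
```

Had the parameter been &String instead, the literal and the sub-slice could not be passed without first allocating a new String.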

Using Strings

assert!("ONE".to_lowercase() == "one");
assert!("peanut".contains("nut"));
assert_eq!("ಠ_ಠ".replace("ಠ", "■"), "■_■");
assert_eq!(" clean\n".trim(), "clean");
for word in "veni, vidi, vici".split(", ") {
    assert!(word.starts_with("v"));
}
  • Strings support the == and != operators. Two strings are equal if they contain the same characters in the same order (regardless of whether they point to the same location in memory).
    • Given the nature of Unicode, simple char-by-char comparison does not always give the expected answers. The Rust strings "th\u{e9}" and "the\u{301}" are both valid Unicode representations for thé, the French word for tea. Unicode says they should both be displayed and processed in the same way, but Rust treats them as two completely distinct strings.
  • Strings also support the comparison operators <, <=, >, and >=, as well as many useful methods and functions.
    • Rust’s ordering operators like < use a simple lexicographical order based on character code point values. This ordering only sometimes resembles the ordering used for text in the user’s language and culture.
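
A short sketch of both caveats, using only the standard library. The helper same_text is hypothetical and simply wraps ==.

```rust
// Hypothetical helper for illustration: plain == on string slices.
fn same_text(a: &str, b: &str) -> bool {
    a == b
}

fn main() {
    let precomposed = "th\u{e9}"; // 'é' as a single code point
    let decomposed = "the\u{301}"; // 'e' followed by a combining acute accent

    // Both display as "thé", but Rust compares the underlying characters.
    assert!(!same_text(precomposed, decomposed));
    assert_eq!(precomposed.chars().count(), 3);
    assert_eq!(decomposed.chars().count(), 4);

    // Ordering follows code point values, not dictionary conventions:
    // 'Z' (U+005A) sorts before 'a' (U+0061) ...
    assert!("Zebra" < "apple");
    // ... and 'é' (U+00E9) sorts after 'z' (U+007A).
    assert!("zoo" < "école");
}
```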

Other String-Like Types

Rust guarantees that strings are valid UTF-8. Sometimes a program really needs to be able to deal with strings that are not valid Unicode. This usually happens when a Rust program has to interoperate with some other system that doesn’t enforce any such rules.

Rust offers a few string-like types for these situations:

  • Stick to String and &str for Unicode text.
  • When working with filenames, use std::path::PathBuf and &Path instead.
  • When working with binary data that isn’t UTF-8 encoded at all, use Vec<u8> and &[u8].
  • When working with environment variable names and command-line arguments in the native form presented by the operating system, use OsString and &OsStr.
  • When interoperating with C libraries that use null-terminated strings, use std::ffi::CString and &CStr.
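
The following sketch touches each of these types once, using only the standard library. The function config_path and the file name settings.toml are made up for the example.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::path::{Path, PathBuf};

// Hypothetical helper for illustration: builds a path inside a directory.
fn config_path(dir: &Path) -> PathBuf {
    dir.join("settings.toml")
}

fn main() {
    // Filenames: Path and PathBuf.
    let path = config_path(Path::new("app"));
    assert_eq!(path.file_name(), Some(OsStr::new("settings.toml")));
    assert_eq!(path.extension(), Some(OsStr::new("toml")));

    // Binary data that is not UTF-8: Vec<u8> and &[u8].
    let data: Vec<u8> = vec![0xff, 0xfe, 0x00];
    assert!(std::str::from_utf8(&data).is_err());

    // Operating-system strings: OsString and &OsStr.
    let name = OsString::from("HOME");
    // to_str() succeeds only when the contents are valid Unicode.
    assert_eq!(name.to_str(), Some("HOME"));

    // Null-terminated strings for C: CString and &CStr.
    let c_owned = CString::new("hello").expect("no interior NUL byte");
    assert_eq!(c_owned.as_bytes_with_nul(), b"hello\0");
    let c_borrowed: &CStr = c_owned.as_c_str();
    assert_eq!(c_borrowed.to_str().unwrap(), "hello");
    // An interior NUL byte is rejected.
    assert!(CString::new("he\0llo").is_err());
}
```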

Type Aliases

type Bytes = Vec<u8>;
fn decode(data: &Bytes) {}

The type keyword can be used like typedef in C++ to declare a new name for an existing type.
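
An alias is only a new name, not a new type, so values of the alias and of the original type are interchangeable. A minimal sketch, where the function checksum is a hypothetical stand-in for decode:

```rust
type Bytes = Vec<u8>;

// Hypothetical function for illustration: sums all bytes.
fn checksum(data: &Bytes) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    // A plain Vec<u8> is accepted where &Bytes is expected.
    let plain: Vec<u8> = vec![1, 2, 3];
    assert_eq!(checksum(&plain), 6);

    // The alias can be used anywhere the original type name can.
    let aliased: Bytes = b"GET".to_vec();
    assert_eq!(checksum(&aliased), 224); // 71 + 69 + 84
}
```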

Beyond the Basics

Rust’s user-defined types give the language much of its flavor, because that’s where methods are defined. There are 3 kinds of user-defined types: structs, enums, and traits.


References