References in Rust
The simple Box<T> heap pointer, and the pointers internal to String and Vec values, are owning pointers: when the owner is dropped, the referent goes with it.
Rust also has non-owning pointer types called references, which have no effect on their referents’ lifetimes.
References must never outlive their referents. You must make it apparent in your code that no reference can possibly outlive the value it points to. To emphasize this, Rust refers to creating a reference to some value as borrowing the value: what you have borrowed, you must eventually return to its owner.
The references themselves are nothing special—under the hood, they’re just addresses. But the rules that keep them safe are novel to Rust. And although these rules are the part of Rust that requires the most effort to master, the breadth of classic, absolutely everyday bugs they prevent is surprising, and their effect on multithreaded programming is liberating. This is Rust’s radical wager, again.
TL;DR
Rust lets you borrow a reference to the value of any sort of expression at all.
fn factorial(n: usize) -> usize {
    (1..n+1).product()
}
let r = &factorial(6);
// Arithmetic operators can see through one level of references.
assert_eq!(r + &1009, 1729);
- In situations like this, Rust simply creates an anonymous variable to hold the expression’s value and makes the reference point to that.
References to Values
use std::collections::HashMap;
type Table = HashMap<String, Vec<String>>;
fn show(table: Table) {
    // iterates in no specific order;
    // the outer for loop takes ownership of the hash table and consumes it entirely
    for (artist, works) in table {
        println!("works by {}:", artist);
        // the inner for loop does the same to each of the vectors
        for work in works {
            println!(" {}", work);
        }
    }
}
fn main() {
    let mut table = Table::new();
    table.insert("Gesualdo".to_string(),
                 vec!["many madrigals".to_string(),
                      "Tenebrae Responsoria".to_string()]);
    table.insert("Caravaggio".to_string(),
                 vec!["The Musicians".to_string(),
                      "The Calling of St. Matthew".to_string()]);
    table.insert("Cellini".to_string(),
                 vec!["Perseus with the head of Medusa".to_string(),
                      "a salt cellar".to_string()]);
    show(table);
    assert_eq!(table["Gesualdo"][0], "many madrigals"); // error: borrow of moved value: `table`
}
HashMap is not Copy, since it owns a dynamically allocated table. So when the program calls show(table), the whole structure gets moved to the function, leaving the variable table uninitialized. The outer for loop takes ownership of the hash table and consumes it entirely, and the inner for loop does the same to each of the vectors. Because of move semantics, we’ve completely destroyed the entire structure simply by trying to print it out.
The right way to handle this is to use references. A reference lets you access a value without affecting its ownership. References come in two kinds:
- A shared reference lets you read but not modify its referent.
  - You can have as many shared references to a particular value at a time as you like.
  - The expression &e yields a shared reference to e’s value; if e has the type T, then &e has the type &T, pronounced “ref T.”
  - Shared references are Copy.
- If you have a mutable reference to a value, you may both read and modify the value.
  - You may not have any other references of any sort to that value active at the same time.
  - The expression &mut e yields a mutable reference to e’s value; you write its type as &mut T, which is pronounced “ref mute T.”
  - Mutable references are not Copy.
Think of the distinction between shared and mutable references as a way to enforce a multiple readers or single writer rule at compile time. In fact, this rule doesn’t apply only to references; it covers the borrowed value’s owner as well. As long as there are shared references to a value, not even its owner can modify it; the value is locked down. Similarly, if there is a mutable reference to a value, it has exclusive access to the value; you can’t use the owner at all, until the mutable reference goes away.
Keeping sharing and mutation fully separate turns out to be essential to memory safety.
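Below is a minimal sketch of both rules in action. The helper names shared_readers and single_writer are hypothetical and are not part of the original text. The sketch relies on the current compiler ending a borrow at the reference’s last use. That behavior is what lets the owner y be read again at the end.

```rust
/// Hypothetical helper: several shared references to `x` coexist, and
/// because shared references are `Copy`, `r1` is still usable after `r2 = r1`.
fn shared_readers() -> i32 {
    let x = 10;
    let r1 = &x;
    let r2 = r1; // copies the reference; it does not move it
    let r3 = &x; // a third shared borrow of the same value
    *r1 + *r2 + *r3
}

/// Hypothetical helper: a single mutable reference has exclusive access.
fn single_writer() -> i32 {
    let mut y = 32;
    let m = &mut y; // while `m` is in use, `y` itself may not be touched
    *m += 32;
    // `m` is not used again, so its borrow is over and the owner may be read.
    y
}
```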
The printing function in our example doesn’t need to modify the table, just read its contents. So the caller should be able to pass it a shared reference to the table.
fn show(table: &Table) {
    for (artist, works) in table {
        println!("works by {}:", artist);
        for work in works {
            println!(" {}", work);
        }
    }
}
show(&table);
fn sort_works(table: &mut Table) {
    for (_artist, works) in table {
        works.sort();
    }
}
sort_works(&mut table);
- References are non-owning pointers, so the table variable remains the owner of the entire structure; show has just borrowed it for a bit. The type of show’s parameter table has changed from Table to &Table: instead of passing the table by value (and hence moving ownership into the function), we’re now passing a shared reference.
  - Iterating over a shared reference to a HashMap is defined to produce shared references to each entry’s key and value: artist has changed from a String to a &String, and works from a Vec<String> to a &Vec<String>.
  - The inner loop is changed similarly. Iterating over a shared reference to a vector is defined to produce shared references to its elements, so work is now a &String.
- sort_works needs to modify the table, so it takes a mutable reference, &mut Table. Iterating over a mutable reference to a HashMap produces shared references to the keys and mutable references to the values, so each works is a &mut Vec<String> that can be sorted in place.
When we pass a value to a function in a way that moves ownership of the value to the function, we say that we have passed it (the value) by value. If we instead pass the function a reference to the value, we say that we have passed the value by reference.
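The borrowing versions of show and sort_works can be exercised together as sketched below. The helpers build_table and show_then_sort are hypothetical, added here only for illustration. Only two artists are inserted, to keep the sketch short. The point is that table is still owned, and still usable, after both calls.

```rust
use std::collections::HashMap;

type Table = HashMap<String, Vec<String>>;

fn show(table: &Table) {
    for (artist, works) in table {       // artist: &String, works: &Vec<String>
        println!("works by {}:", artist);
        for work in works {               // work: &String
            println!("  {}", work);
        }
    }
}

fn sort_works(table: &mut Table) {
    for (_artist, works) in table {       // works: &mut Vec<String>
        works.sort();
    }
}

// Hypothetical helper that builds a smaller version of the table.
fn build_table() -> Table {
    let mut table = Table::new();
    table.insert("Gesualdo".to_string(),
                 vec!["many madrigals".to_string(),
                      "Tenebrae Responsoria".to_string()]);
    table.insert("Caravaggio".to_string(),
                 vec!["The Musicians".to_string(),
                      "The Calling of St. Matthew".to_string()]);
    table
}

// Hypothetical helper: borrow shared, borrow mutably, then keep using `table`.
fn show_then_sort() -> Table {
    let mut table = build_table();
    show(&table);            // shared borrow: `table` keeps ownership
    sort_works(&mut table);  // mutable borrow: each vector is sorted in place
    show(&table);
    table                    // still valid, so it can be returned
}
```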
Working with References
Rust References Versus C++ References
Rust references and C++ references are both just addresses at the machine level.
In C++, references are created implicitly by conversion, and dereferenced implicitly too.
// C++ code!
int x = 10;
int &r = x; // initialization creates reference implicitly
assert(r == 10); // implicitly dereference r to see x's value
r = 20; // stores 20 in x, r itself still points to x
In Rust, references are created explicitly with the & operator, and dereferenced explicitly with the * operator.
let x = 10;
let r = &x; // &x is a shared reference to x
assert!(*r == 10); // explicitly dereference r
let mut y = 32;
let m = &mut y; // &mut y is a mutable reference to y
*m += 32; // explicitly dereference m to set y's value
assert!(*m == 64); // and to see y's new value
However, the . operator is an exception.
struct Anime { name: &'static str, bechdel_pass: bool }
let aria = Anime { name: "Aria: The Animation", bechdel_pass: true };
let anime_ref = &aria;
assert_eq!(anime_ref.name, "Aria: The Animation");
// Equivalent to the above, but with the dereference written out:
assert_eq!((*anime_ref).name, "Aria: The Animation");
- Since references are so widely used in Rust, the . operator implicitly dereferences its left operand, if needed.
- The println! macro used in the show function expands to code that uses the . operator, so it takes advantage of this implicit dereference as well.
The . operator can also implicitly borrow a reference to its left operand, if needed for a method call.
let mut v = vec![1973, 1968];
v.sort(); // implicitly borrows a mutable reference to v
(&mut v).sort(); // equivalent, but more verbose
In a nutshell, whereas C++ converts implicitly between references and lvalues (that is, expressions referring to locations in memory), with these conversions appearing anywhere they’re needed, in Rust you use the & and * operators to create and follow references, with the exception of the . operator, which borrows and dereferences implicitly.
Assigning References
In Rust, assigning a reference to a variable makes that variable point somewhere new.
let x = 10;
let y = 20;
let mut r = &x;
if b { r = &y; }
assert!(*r == 10 || *r == 20);
C++ references behave very differently: assigning a value to a reference in C++ stores the value in its referent. Once a C++ reference has been initialized, there’s no way to make it point at anything else.
// C++ code!
int x = 10;
int &r = x;
r = 20; // stores 20 in x, r itself still points to x
References to References
Rust permits references to references.
struct Point { x: i32, y: i32 }
let point = Point { x: 1000, y: 729 };
let r: &Point = &point;
let rr: &&Point = &r;
let rrr: &&&Point = &rr;
assert_eq!(rrr.y, 729);
- The . operator follows as many references as it takes to find its target.
Comparing References
Like the . operator, Rust’s comparison operators “see through” any number of references.
let x = 10;
let y = 10;
let rx = &x;
let ry = &y;
let rrx = &rx;
let rry = &ry;
assert!(rrx <= rry);
assert!(rrx == rry);
- The final assertion here succeeds, even though rrx and rry point at different values (namely, rx and ry), because the == operator follows all the references and performs the comparison on their final targets, x and y.
  - This is almost always the behavior you want, especially when writing generic functions.
- If you actually want to know whether two references point to the same memory, you can use std::ptr::eq, which compares them as addresses:
    assert!(rx == ry);              // their referents are equal
    assert!(!std::ptr::eq(rx, ry)); // but occupy different addresses
- The operands of a comparison must have exactly the same type, including the references:
    assert!(rx == rrx);  // error: type mismatch: `&i32` vs `&&i32`
    assert!(rx == *rrx); // this is okay
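The same comparisons can be written as runnable functions. The names values_equal and same_address are hypothetical, chosen only for this sketch. values_equal returns true because == follows the references down to the integers. same_address shows std::ptr::eq telling apart two different variables that hold equal values.

```rust
use std::ptr;

/// `==` on references compares the values they point to, however many
/// levels of reference are involved.
fn values_equal() -> bool {
    let x = 10;
    let y = 10;
    let (rx, ry) = (&x, &y);
    let (rrx, rry) = (&rx, &ry);
    rrx == rry && rx == *rrx
}

/// `std::ptr::eq` compares addresses instead of values.
fn same_address() -> (bool, bool) {
    let x = 10;
    let y = 10;
    let rx = &x;
    let also_rx = &x; // a second reference to the same variable
    (ptr::eq(rx, also_rx), ptr::eq(rx, &y))
}
```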
References Are Never Null
Rust references are never null. There’s no analogue to C’s NULL or C++’s nullptr. There is no default initial value for a reference (you can’t use any variable until it’s been initialized, regardless of its type), and Rust won’t convert integers to references (outside of unsafe code), so you can’t convert zero into a reference.
C and C++ code often uses a null pointer to indicate the absence of a value.
- For example, the malloc function returns either a pointer to a new block of memory or a null pointer if there isn’t enough memory available to satisfy the request.
In Rust, if you need a value that is either a reference to something or not, use the type Option<&T>. At the machine level, Rust represents None as a null pointer, and Some(r), where r is a &T value, as the nonzero address, so Option<&T> is just as efficient as a nullable pointer in C or C++, even though it’s safer: its type requires you to check whether it’s None before you can use it.
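Here is a small sketch of Option<&T> standing in for a nullable pointer. first_even and describe are hypothetical helpers. The size check reflects the null-pointer representation described above, which the standard library documents for Option<&T>.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns a reference to the first even element, or
/// `None` where C code might have returned a null pointer.
fn first_even(v: &[i32]) -> Option<&i32> {
    v.iter().find(|&&n| n % 2 == 0)
}

/// The caller cannot reach the referent without first handling `None`.
fn describe(v: &[i32]) -> String {
    match first_even(v) {
        Some(r) => format!("found {}", r),
        None => "no even element".to_string(),
    }
}

/// `None` is encoded as the null address, so no extra tag word is needed.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
}
```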
Borrowing References to Arbitrary Expressions
Whereas C and C++ only let you apply the & operator to certain kinds of expressions, Rust lets you borrow a reference to the value of any sort of expression at all.
fn factorial(n: usize) -> usize {
    (1..n+1).product()
}
let r = &factorial(6);
// Arithmetic operators can see through one level of references.
assert_eq!(r + &1009, 1729);
- In situations like this, Rust simply creates an anonymous variable to hold the expression’s value and makes the reference point to that. The lifetime of this anonymous variable depends on what you do with the reference:
  - If you immediately assign the reference to a variable in a let statement (or make it part of some struct or array that is being immediately assigned), then Rust makes the anonymous variable live as long as the variable the let initializes. In our example, Rust would do this for the referent of r.
  - Otherwise, the anonymous variable lives to the end of the enclosing statement. In our example, the anonymous variable created to hold 1009 lasts only to the end of the assert_eq! statement.
If you’re used to C or C++, this may sound error-prone. But remember that Rust will never let you write code that would produce a dangling reference. If the reference could ever be used beyond the anonymous variable’s lifetime, Rust will always report the problem to you at compile time. You can then fix your code to keep the referent in a named variable with an appropriate lifetime.
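Here is a sketch of the let rule, using only the factorial function shown earlier. extended_temporaries is a hypothetical name. Both temporaries are kept alive because each reference is bound directly by a let statement.

```rust
fn factorial(n: usize) -> usize {
    (1..n + 1).product()
}

/// Each `let` binds a reference to a temporary directly, so Rust extends
/// the temporary to live as long as the variable.
fn extended_temporaries() -> usize {
    let r = &factorial(6);                       // temporary 720 lives as long as `r`
    let s = &String::from("lives as long as s"); // same rule for a heap-owning String
    *r + s.len()
}
```

The statement-scoped case from the text also works as a plain expression: in &factorial(6) + &1009, both temporaries last only until the end of the statement that uses them.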
References to Slices and Trait Objects
Besides references that are simple addresses, Rust also includes two kinds of fat pointers, two-word values carrying the address of some value, along with some further information necessary to put the value to use.
- A reference to a slice is a fat pointer: a two-word value comprising a pointer to the slice’s first element, and the number of elements in the slice.
- Rust’s other kind of fat pointer is a trait object, a reference to a value that implements a certain trait.
- A trait object carries a value’s address and a pointer to the trait’s implementation appropriate to that value, for invoking the trait’s methods.
Aside from carrying this extra data, slice and trait object references behave just like the other sorts of references: they don’t own their referents, they are not allowed to outlive their referents, they may be mutable or shared, and so on.
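One way to see the extra word is to compare sizes with std::mem::size_of. The Shape trait and Square type are hypothetical, defined only so there is a trait object to measure. usize is pointer-sized on every Rust target. However, treat the two-word figures as what current compilers produce, not a formally promised layout.

```rust
use std::mem::size_of;

// Hypothetical trait and type, used only to create a trait object.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// A plain reference is a single address.
fn thin_words() -> usize { size_of::<&i32>() / size_of::<usize>() }

/// A slice reference carries (pointer to first element, length).
fn slice_words() -> usize { size_of::<&[i32]>() / size_of::<usize>() }

/// A trait object reference carries (data pointer, vtable pointer).
fn trait_object_words() -> usize { size_of::<&dyn Shape>() / size_of::<usize>() }

/// Calls through trait object references, using the vtable pointer.
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```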
Reference Safety
Borrowing a Local Variable
{
    let r;
    {
        let x = 1;
        r = &x;
    }
    assert_eq!(*r, 1);
}
// | r = &x;
// | ^^ borrowed value does not live long enough
// | }
// | - `x` dropped here while still borrowed
// | assert_eq!(*r, 1);
// | ----------------- borrow later used here
- The variables r and x both have a lifetime, extending from the point at which they’re initialized until the point that the compiler can prove they are no longer in use. The third lifetime is that of a reference type: the type of the reference we borrow to x and store in r.
A lifetime is some stretch of your program for which a reference could be safe to use: a statement, an expression, the scope of some variable, or the like. Lifetimes are entirely figments of Rust’s compile-time imagination. At run time, a reference is nothing but an address; its lifetime is part of its type and has no run-time representation.
Rust tries to assign each reference type a lifetime that meets the constraints imposed by how it is used.
Some constraints:
- If you have a variable x, then a reference to x (&x) must not outlive x itself.
  - Beyond the point where x goes out of scope, the reference &x would be a dangling pointer. Once x is dropped, the lifetime of &x must end as well.
  - The variable’s lifetime must contain or enclose that of the reference borrowed from it. This constraint limits how large a reference’s lifetime can be.
- If you store a reference in a variable r (giving r the type &i32), the reference’s type (&i32) must be good for the entire lifetime of the variable, from its initialization until its last use.
  - r holds &x, so r depends on &x; the lifetime of &x must therefore be at least as long as r’s.
  - If the reference (&x) can’t live at least as long as the variable does, then at some point r will be a dangling pointer: x is dropped, the lifetime of &x ends, and yet r still points at the memory x used to occupy.
  - The reference’s lifetime must contain or enclose the variable’s. This constraint limits how small a reference’s lifetime can be.
Rust simply tries to find a lifetime for each reference that satisfies all these constraints.
- The reference’s lifetime must be contained by x’s, but fully enclose r’s. There is no such lifetime.
- The following would work, because r’s lifetime now lies entirely within x’s:
    {
        let x = 1;
        {
            let r = &x;
            assert_eq!(*r, 1);
        }
    }
These rules apply in a natural way when you borrow a reference to some part of some larger data structure, like an element of a vector:
let v = vec![1, 2, 3];
let r = &v[1];
- Since v owns the vector, which owns its elements, the lifetime of v must enclose that of the reference type of &v[1].
- Similarly, if you store a reference in some data structure, its lifetime must enclose that of the data structure. If you build a vector of references, all of them must have lifetimes enclosing that of the variable that owns the vector.
The principle that Rust uses for all code: first, understand the constraints arising from the way the program uses references; then, find lifetimes that satisfy them.
Receiving References as Function Arguments
Suppose we have a function f that takes a reference and stores it in a global variable.
// This code has several problems, and doesn't compile.
static mut STASH: &i32;
fn f(p: &i32) { STASH = p; }
Rust’s equivalent of a global variable is called a static: it’s a value that’s created when the program starts and lasts until it terminates. Like any other declaration, Rust’s module system controls where statics are visible, so they’re only “global” in their lifetime, not their visibility.
- Every static must be initialized.
- Mutable statics are inherently not thread-safe. Any thread can access a static at any time. Even in single-threaded programs, they can fall prey to other sorts of reentrancy problems.
- For these reasons, you may access a mutable static only within an unsafe block.
A revised version (still not good enough):
static mut STASH: &i32 = &128;
fn f(p: &i32) {
unsafe {
STASH = p;
}
}
- The signature of f as written here is actually shorthand for the following:
    fn f<'a>(p: &'a i32) {}
  - The lifetime 'a (pronounced “tick A”) is a lifetime parameter of f. You can read <'a> as “for any lifetime 'a”. We’re defining a function that takes a reference to an i32 with any given lifetime 'a. Think of 'a as a name for some particular stretch of the program.
  - Since we must allow 'a to be any lifetime, things had better work out if it’s the smallest possible lifetime: one just enclosing the call to f. For a function parameter, the shortest possible lifetime is the duration of the call itself.
- The assignment STASH = p; raises a problem. Since STASH lives for the program’s entire execution, the reference type it holds must have a lifetime of the same length; Rust calls this the 'static lifetime. But the lifetime of p’s reference is some 'a, which could be anything, as long as it encloses the call to f. So, Rust rejects our code. At this point, it’s clear that our function can’t accept just any reference as an argument.
Another revised version:
static mut STASH: &i32 = &10;
fn f(p: &'static i32) {
unsafe {
STASH = p;
}
}
static WORTH_POINTING_AT: i32 = 1000;
f(&WORTH_POINTING_AT);
- f’s signature spells out that p must be a reference with lifetime 'static.
- We can only apply f to references to other statics, but that’s the only thing that’s certain not to leave STASH dangling anyway.
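A global stash does not have to be a static mut. As a swapped-in technique, not the approach above, a static Mutex lets f store the reference without any unsafe block. The 'static bound in f’s signature stays exactly the same, because the stored reference still has to outlive every call. STASH, WORTH_POINTING_AT, and the stashed helper are names assumed for this sketch.

```rust
use std::sync::Mutex;

// Swapped-in technique: a Mutex makes the global safe to mutate.
// `Mutex::new` is a `const fn`, so it can initialize a static.
static STASH: Mutex<Option<&'static i32>> = Mutex::new(None);

static WORTH_POINTING_AT: i32 = 1000;

/// The signature still announces that `p` may be kept forever.
fn f(p: &'static i32) {
    *STASH.lock().unwrap() = Some(p);
}

/// Hypothetical helper: reads back whatever was stashed.
fn stashed() -> Option<i32> {
    let guard = STASH.lock().unwrap();
    let stored: Option<&i32> = *guard; // Option<&i32> is Copy
    stored.copied()
}
```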
We were unable to write a function that stashed a reference in a global variable without reflecting that intention in the function’s signature. In Rust, a function’s signature always exposes the body’s behavior.
Conversely, if we see a function with a signature like g(p: &i32) (or with the lifetimes written out, g<'a>(p: &'a i32)), we can tell that it does not stash its argument p anywhere that will outlive the call. There’s no need to look into g’s definition; the signature alone tells us what g can and can’t do with its argument.
Passing References to Functions
fn g<'a>(p: &'a i32) {}
let x = 10;
g(&x);
- From g’s signature alone, Rust knows it will not save p anywhere that might outlive the call: any lifetime that encloses the call must work for 'a. So Rust chooses the smallest possible lifetime for &x: that of the call to g.
  - This meets all constraints: it doesn’t outlive x, and it encloses the entire call to g. So this code passes muster.
- Although g takes a lifetime parameter 'a, we didn’t need to mention it when calling g. You only need to worry about lifetime parameters when defining functions and types; when using them, Rust infers the lifetimes for you.
fn f(p: &'static i32) {}
let x = 10;
f(&x);
- This fails to compile: the reference &x must not outlive x, but by passing it to f, we constrain it to live at least as long as 'static. There’s no way to satisfy everyone here, so Rust rejects the code. Here x is an ordinary local variable, not a static, so its lifetime is shorter than 'static.
Returning References
// v should have at least one element.
// Shorthand for: fn smallest<'a>(v: &'a [i32]) -> &'a i32
fn smallest(v: &[i32]) -> &i32 {
    let mut s = &v[0];
    for r in &v[1..] {
        if *r < *s { s = r; }
    }
    s
}
let s;
{
    let parabola = [9, 4, 1, 0, 1, 4, 9];
    s = smallest(&parabola); // error[E0597]: `parabola` does not live long enough
}
assert_eq!(*s, 0);
// | s = smallest(&parabola); // error[E0597]: `parabola` does not live long enough
// | ^^^^^^^^^ borrowed value does not live long enough
// | }
// | - `parabola` dropped here while still borrowed
// | assert_eq!(*s, 0);
// | ----------------- borrow later used here
- When a function takes a single reference as an argument and returns a single reference, Rust assumes that the two must have the same lifetime.
- The argument &parabola must not outlive parabola itself, yet smallest’s return value must live at least as long as s. There’s no possible lifetime 'a that can satisfy both constraints. Moving s so that its lifetime is clearly contained within parabola’s fixes the problem:
    {
        let parabola = [9, 4, 1, 0, 1, 4, 9];
        let s = smallest(&parabola);
        assert_eq!(*s, 0);
    }
Lifetimes in function signatures let Rust assess the relationships between the references you pass to the function and those the function returns, and they ensure they’re being used safely.
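Here is the working version as a self-contained sketch, with the lifetime written out explicitly. min_of_parabola is a hypothetical wrapper. The returned reference is used only while parabola is still alive.

```rust
/// Explicit form of the elided signature: the result borrows from `v`.
/// Assumes `v` is non-empty (indexing `v[0]` panics otherwise).
fn smallest<'a>(v: &'a [i32]) -> &'a i32 {
    let mut s = &v[0];
    for r in &v[1..] {
        if *r < *s {
            s = r;
        }
    }
    s
}

/// `s` lives entirely inside `parabola`'s scope, so both constraints hold.
fn min_of_parabola() -> i32 {
    let parabola = [9, 4, 1, 0, 1, 4, 9];
    let s = smallest(&parabola);
    *s
}
```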
Structs Containing References
// This does not compile.
struct S {
    r: &i32 // put the reference inside a structure
}
let s;
{
    let x = 10;
    s = S { r: &x };
}
assert_eq!(*s.r, 10); // bad: reads from dropped `x`
- The safety constraints Rust places on references apply to S as well. They don’t disappear just because the reference is inside a struct.
- Whenever a reference type appears inside another type’s definition, you must write out its lifetime. One option is 'static:
    struct S { r: &'static i32 }
  This says r can only refer to i32 values that will last for the lifetime of the program. The more flexible option is a lifetime parameter:
    struct S<'a> { r: &'a i32 }
  This gives the type S a lifetime parameter 'a and uses that for r.
- Each value you create of type S gets a fresh lifetime 'a, which becomes constrained by how you use the value. The lifetime of any reference &x you store in r had better enclose 'a, and 'a must outlast the lifetime of wherever you store the S.
- The expression S { r: &x } creates a fresh S value with some lifetime 'a. When you store &x in the r field, you constrain 'a to lie entirely within x’s lifetime.
  - In other words, S and its field r carry the lifetime 'a, which is determined by the reference &x assigned to r. For as long as 'a lasts, the &x stored in r must remain valid, which requires x itself to still exist.
- The assignment s = S { ... } stores this S in a variable whose lifetime extends to the end of the example code, constraining 'a to outlast the lifetime of s. And now Rust has arrived at the same contradictory constraints as before: 'a must not outlive x, yet must live at least as long as s.
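For contrast, here is a sketch of uses of S<'a> that do compile. The names read_through_struct, TEN, and static_struct are invented for this example. The S value never outlives the i32 it borrows, and a reference to a static item satisfies any 'a.

```rust
/// The lifetime parameter `'a` ties every `S` to the data `r` points at.
struct S<'a> {
    r: &'a i32,
}

/// Works: `s` is created and used entirely within `x`'s lifetime.
fn read_through_struct() -> i32 {
    let x = 10;
    let s = S { r: &x };
    *s.r
}

static TEN: i32 = 10;

/// A reference to a static lives for the whole program, so `'a = 'static`.
fn static_struct() -> S<'static> {
    S { r: &TEN }
}
```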
struct S<'a> {
    r: &'a i32,
}
struct D {
    s: S,
}
// | s: S,
// | ^ expected named lifetime parameter
// |
// help: consider introducing a named lifetime parameter
// |
// ~ struct D<'a> {
// ~ s: S<'a>,
- When a type (S) with a lifetime parameter is placed inside some other type (D), you also must specify its lifetime.
- We can’t leave off S’s lifetime parameter here: Rust needs to know how D’s lifetime relates to that of the reference in its S in order to apply the same checks to D that it does for S and plain references. There are two ways to fix this:
  - We could give s the 'static lifetime:
        struct D { s: S<'static> }
    With this definition, the s field may only borrow values that live for the entire execution of the program. That’s somewhat restrictive, but it does mean that D can’t possibly borrow a local variable; there are no special constraints on D’s lifetime.
  - We could give D its own lifetime parameter and pass that to S:
        struct D<'a> { s: S<'a> }
    By taking a lifetime parameter 'a and using it in s’s type, we’ve allowed Rust to relate a D value’s lifetime to that of the reference its S holds.
- A type’s lifetime parameters always reveal whether it contains references with interesting (that is, non-'static) lifetimes and what those lifetimes can be. For example:
    fn parse_record<'i>(input: &'i [u8]) -> Record<'i> { ... }
  Without looking into the definition of the Record type at all, we can tell that, if we receive a Record returned from parse_record, whatever references it contains must point into the input buffer we passed in, and nowhere else (except perhaps at 'static values).
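The text leaves Record and parse_record undefined, so the following is a hypothetical sketch. It assumes a Record with key and value fields and a simple key=value input format. Whatever the real definition looks like, the signature guarantees the same thing: both slices point into input.

```rust
/// Hypothetical record type: both fields borrow from the parsed input.
#[derive(Debug)]
struct Record<'i> {
    key: &'i [u8],
    value: &'i [u8],
}

/// Hypothetical parser for `key=value`, split at the first `=`.
fn parse_record<'i>(input: &'i [u8]) -> Record<'i> {
    match input.iter().position(|&b| b == b'=') {
        Some(eq) => Record { key: &input[..eq], value: &input[eq + 1..] },
        // No '=': the whole input is the key, and the empty value is still
        // a slice of `input`.
        None => Record { key: input, value: &input[input.len()..] },
    }
}
```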
This exposure of internal behavior is the reason Rust requires types that contain references to take explicit lifetime parameters.
- There’s no reason Rust couldn’t simply make up a distinct lifetime for each reference in the struct and save you the trouble of writing them out. Early versions of Rust actually behaved this way, but developers found it confusing: it is helpful to know when one value borrows something from another value, especially when working through errors.
It’s not just references and types like S that have lifetimes. Every type in Rust has a lifetime, including i32 and String. Most are simply 'static, meaning that values of those types can live for as long as you like.
- For example, a Vec<i32> is self-contained and needn’t be dropped before any particular variable goes out of scope.
- A type like Vec<&'a i32> has a lifetime that must be enclosed by 'a: it must be dropped while its referents are still alive.
Distinct Lifetime Parameters
struct S<'a> {
    x: &'a i32,
    y: &'a i32,
}
let x = 10;
let r;
{
    let y = 20;
    {
        let s = S { x: &x, y: &y };
        r = s.x;
    }
}
println!("{}", r);
// | let s = S { x: &x, y: &y };
// | ^^ borrowed value does not live long enough
// ...
// | }
// | - `y` dropped here while still borrowed
// | println!("{}", r);
// | - borrow later used here
- Both references of S use the same lifetime 'a.
- This code doesn’t create any dangling pointers. The reference to y stays in s, which goes out of scope before y does. The reference to x ends up in r, which doesn’t outlive x.
- Both fields of S are references with the same lifetime 'a, so Rust must find a single lifetime that works for both s.x and s.y. Assigning r = s.x requires 'a to enclose r’s lifetime. Initializing s.y with &y requires 'a to be no longer than y’s lifetime. No lifetime is shorter than y’s scope but longer than r’s.
- The problem arises because both references in S have the same lifetime 'a. Changing the definition of S to let each reference have a distinct lifetime fixes everything:
    struct S<'a, 'b> { x: &'a i32, y: &'b i32 }
  With this definition, s.x and s.y have independent lifetimes. What we do with s.x has no effect on what we store in s.y. 'a can simply be r’s lifetime, and 'b can be s’s. (y’s lifetime would work too for 'b, but Rust tries to choose the smallest lifetime that works.)
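With the two-parameter definition, the earlier example compiles. The sketch below wraps it in a hypothetical function, keep_x_drop_y, so that the result can be checked.

```rust
/// Each field gets its own lifetime parameter.
struct S<'a, 'b> {
    x: &'a i32,
    y: &'b i32,
}

/// `r` keeps `s.x` after both `s` and `y` are gone: `'a` covers `r`,
/// while `'b` only needs to cover `s`.
fn keep_x_drop_y() -> i32 {
    let x = 10;
    let r;
    {
        let y = 20;
        {
            let s = S { x: &x, y: &y };
            assert_eq!(*s.y, 20);
            r = s.x;
        }
    }
    *r
}
```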
Function signatures can have similar effects.
fn f<'a>(r: &'a i32, s: &'a i32) -> &'a i32 { r } // perhaps too tight
fn f<'a, 'b>(r: &'a i32, s: &'b i32) -> &'a i32 { r } // looser
- The downside to this is that adding lifetimes can make types and function signatures harder to read. Authors tend to try the simplest possible definition first and then loosen restrictions until the code compiles. Since Rust won’t permit the code to run unless it’s safe, simply waiting to be told when there’s a problem is a perfectly acceptable tactic.
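Here is a sketch of what the looser signature buys. outlives_second_argument is a hypothetical name, and the second parameter is renamed _s only to silence the unused-variable warning. The result is used after y has gone out of scope, which the tighter single-'a signature would reject.

```rust
/// Looser signature: the result borrows only from `r`.
fn f<'a, 'b>(r: &'a i32, _s: &'b i32) -> &'a i32 {
    r
}

fn outlives_second_argument() -> i32 {
    let x = 7;
    let result;
    {
        let y = 99;
        result = f(&x, &y); // `&y` only has to last for the call
    } // `y` is dropped here, but `result` borrows from `x`
    *result
}
```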
Omitting Lifetime Parameters
Rust lets us omit lifetimes for function parameters and return values when it’s reasonably obvious what they should be.
- In the simplest cases, Rust just assigns a distinct lifetime to each spot that needs one:
    struct S<'a, 'b> { x: &'a i32, y: &'b i32 }
    fn sum_r_xy(r: &i32, s: S) -> i32 { r + s.x + s.y }
    // shorthand for
    // fn sum_r_xy<'a, 'b, 'c>(r: &'a i32, s: S<'b, 'c>) -> i32
- If you do return references or other types with lifetime parameters, Rust still tries to make the unambiguous cases easy. If there’s only a single lifetime that appears among your function’s parameters, then Rust assumes any lifetimes in your return value must be that one:
    fn first_third(point: &[i32; 3]) -> (&i32, &i32) { (&point[0], &point[2]) }
    // shorthand for
    // fn first_third<'a>(point: &'a [i32; 3]) -> (&'a i32, &'a i32)
- If there are multiple lifetimes among your parameters, then there’s no natural reason to prefer one over the other for the return value, and Rust makes you spell out what’s going on.
- If a function is a method on some type and takes its self parameter by reference, then Rust assumes that self’s lifetime is the one to give everything in the return value:
    struct StringTable {
        elements: Vec<String>,
    }
    impl StringTable {
        fn find_by_prefix(&self, prefix: &str) -> Option<&String> {
            for i in 0..self.elements.len() {
                if self.elements[i].starts_with(prefix) {
                    return Some(&self.elements[i]);
                }
            }
            None
        }
        // shorthand for
        // fn find_by_prefix<'a, 'b>(&'a self, prefix: &'b str) -> Option<&'a String>
    }
  - Rust assumes that whatever you’re borrowing, you’re borrowing from self.
These are just abbreviations, meant to be helpful without introducing surprises. When they’re not what you want, you can always write the lifetimes out explicitly.
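The following sketch shows what the self rule permits. It uses an iterator-based variant of find_by_prefix with the same elided signature, plus a hypothetical lookup function. Because the result borrows only from self, the prefix string can be dropped before the result is used.

```rust
struct StringTable {
    elements: Vec<String>,
}

impl StringTable {
    /// Elided form; the returned reference borrows from `self` only.
    fn find_by_prefix(&self, prefix: &str) -> Option<&String> {
        self.elements.iter().find(|e| e.starts_with(prefix))
    }
}

/// Hypothetical helper: the prefix is a local `String` dropped at the end of
/// the inner block, yet the result stays usable because it borrows `table`.
fn lookup(table: &StringTable) -> Option<&String> {
    let found = {
        let prefix = String::from("gr");
        table.find_by_prefix(&prefix)
    }; // `prefix` is dropped here
    found
}
```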
Sharing Versus Mutation
One way to introduce dangling pointers:
let v = vec![4, 8, 19, 27, 34, 10];
let r = &v;
let aside = v; // move vector to aside
r[0]; // bad: uses `v`, which is now uninitialized
// | let r = &v;
// | -- borrow of `v` occurs here
// | let aside = v;
// | ^ move out of `v` occurs here
// | r[0];
// | - borrow later used here
- The assignment to aside moves the vector, leaving v uninitialized, and turns r into a dangling pointer.
  - Although v stays in scope for r’s entire lifetime, the problem here is that v’s value gets moved elsewhere, leaving v uninitialized while r still refers to it.
- Throughout its lifetime, a shared reference makes its referent read-only: you may not assign to the referent or move its value elsewhere. In this code, the attempt to move the vector falls within r’s lifetime, so Rust rejects the program.
- The following version works:
    let v = vec![4, 8, 19, 27, 34, 10];
    {
        let r = &v;
        r[0]; // ok: vector is still there
    }
    let aside = v;
  Because r goes out of scope earlier, the reference’s lifetime ends before v is moved aside.
Another way to introduce dangling pointers:
// extend a vector with the elements of a slice
fn extend(vec: &mut Vec<f64>, slice: &[f64]) {
    for elt in slice {
        vec.push(*elt);
    }
}
let mut wave = Vec::new();
let head = vec![0.0, 1.0];
let tail = [0.0, -1.0];
extend(&mut wave, &head); // extend wave with another vector
extend(&mut wave, &tail); // extend wave with an array
assert_eq!(wave, vec![0.0, 1.0, 0.0, -1.0]);
extend(&mut wave, &wave); // error[E0502]: cannot borrow `wave` as immutable because it is also borrowed as mutable
assert_eq!(wave, vec![0.0, 1.0, 0.0, -1.0,
                      0.0, 1.0, 0.0, -1.0]);
// error[E0502]: cannot borrow `wave` as immutable because it is also borrowed as mutable
// |
// | extend(&mut wave, &wave);
// | ------ --------- ^^^^^ immutable borrow occurs here
// | | |
// | | mutable borrow occurs here
// | mutable borrow later used by call
-
When we add an element to a vector, if its buffer is full, it must allocate a new buffer with more space. Suppose
wave
starts with space for four elements and so must allocate a larger buffer whenextend
tries to add a fifth (extend(&mut wave, &wave);
). Theextend
function’svec
argument borrowswave
(owned by the caller), which has allocated itself a new buffer with space for eight elements. Butslice
continues to point to the old four-element buffer, which has been dropped.- 随着
extend
函数中vec.push
的执行,vec
指向新分配的更大空间,而slice
仍然指向旧的小空间。
- This sort of problem isn’t unique to Rust: modifying collections while pointing into them is delicate territory in many languages.
- In C++, the std::vector specification cautions you that “reallocation [of the vector’s buffer] invalidates all the references, pointers, and iterators referring to the elements in the sequence.”
- Java says, of modifying a java.util.Hashtable object: “If the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException.”
- What’s especially difficult about this sort of bug is that it doesn’t happen all the time. In testing, your vector might always happen to have enough space, the buffer might never be reallocated, and the problem might never come to light.
- Rust reports the problem with our call to extend at compile time. We may borrow a mutable reference to the vector, and we may borrow a shared reference to its elements, but those two references’ lifetimes must not overlap. In our case, both references’ lifetimes contain the call to extend, so Rust rejects the code.
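If the goal really is to append a vector to itself, the overlapping borrow has to go. The sketch below shows two ways to do that. The original definition of extend is not shown in this section, so the body here is an assumption: it takes a &mut Vec<f64> and a &[f64], matching the error messages above. The first workaround copies the elements so that the shared borrow points into a separate buffer. The second uses the standard Vec::extend_from_within, which copies from a range of the vector itself without handing out an outside reference. The helper names double_by_copy and double_in_place are hypothetical.

```rust
// Assumed shape of the extend function used above:
// append every element of `slice` to `vec`, one push at a time.
fn extend(vec: &mut Vec<f64>, slice: &[f64]) {
    for elt in slice {
        vec.push(*elt);
    }
}

// Workaround 1: clone first, so the shared borrow points into a
// separate buffer that no push on `wave` can reallocate.
fn double_by_copy(wave: &mut Vec<f64>) {
    let copy = wave.clone();
    extend(wave, &copy);
}

// Workaround 2: let Vec copy from its own range. The method handles
// any reallocation internally, so no outside reference can dangle.
fn double_in_place(wave: &mut Vec<f64>) {
    wave.extend_from_within(..);
}

fn main() {
    let mut a = vec![0.0, 1.0, 0.0, -1.0];
    double_by_copy(&mut a);
    let mut b = vec![0.0, 1.0, 0.0, -1.0];
    double_in_place(&mut b);
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

The clone costs an extra allocation. extend_from_within avoids that allocation but requires the element type to implement Clone.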
These errors both stem from violations of Rust’s rules for mutation and sharing:
- Shared access is read-only access.
- Values borrowed by shared references are read-only. Across the lifetime of a shared reference, neither its referent, nor anything reachable from that referent, can be changed by anything. There exist no live mutable references to anything in that structure (ownership tree), its owner is held read-only, and so on. It’s really frozen.
- It is OK to reborrow a shared reference from a shared reference (as the word “shared” suggests), but not a mutable one:
let mut w = (107, 109);
let r = &w;
let r0 = &r.0;     // ok: reborrowing shared as shared
let m1 = &mut r.1; // error: can't reborrow shared as mutable
// error[E0596]: cannot borrow `r.1` as mutable, as it is behind a `&` reference
// |
// | let r = &w;
// |         -- help: consider changing this to be a mutable reference: `&mut w`
// | let r0 = &r.0;
// | let m1 = &mut r.1;
// |          ^^^^^^^^ `r` is a `&` reference, so the data it refers to cannot be borrowed as mutable
println!("{}", r0); // r0 gets used here
- Mutable access is exclusive access.
- A value borrowed by a mutable reference is reachable exclusively via that reference. Across the lifetime of a mutable reference, there is no other usable path to its referent or to any value reachable from there. The only references whose lifetimes may overlap with a mutable reference are those you borrow from the mutable reference itself.
- It’s OK to reborrow from a mutable reference, either as mutable or as shared:
let mut v = (136, 139);
let m = &mut v;
let m0 = &mut m.0; // ok: reborrowing mutable from mutable
*m0 = 137;
let r1 = &m.1;     // ok: reborrowing shared from mutable,
                   // and doesn't overlap with m0 (it refers to a different element)
v.1;               // error: access through other paths still forbidden
// error[E0503]: cannot use `v.1` because it was mutably borrowed
// |
// | let m = &mut v;
// |         ------ borrow of `v` occurs here
// ...
// | v.1;
// | ^^^ use of borrowed `v`
// | println!("{}", r1); // r1 gets used here
// |                -- borrow later used here
println!("{}", r1); // r1 gets used here
- Rust reported the extend example as a violation of the second rule: since we’ve borrowed a mutable reference to wave, that mutable reference must be the only way to reach the vector or its elements. The shared reference to the slice is itself another way to reach the elements, violating the second rule. Rust could also have treated our bug as a violation of the first rule: since we’ve borrowed a shared reference to wave’s elements, the elements and the Vec itself are all read-only. You can’t borrow a mutable reference to a read-only value.
- Each kind of reference affects what we can do with the values along the owning path to the referent, and the values reachable from the referent.
- In both cases, the path of ownership leading to the referent cannot be changed for the reference’s lifetime. For a shared borrow, the path is read-only; for a mutable borrow, it’s completely inaccessible. So there’s no way for the program to do anything that will invalidate the reference.
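The following is a compiling counterpart to the reborrowing examples above, written as a sketch with hypothetical function names. In reborrow_demo, the mutable and shared reborrows through m coexist because they reach different fields. Once every borrow has had its last use, the original owner v is usable again. off_path_demo shows the other side of the owning-path rule: a field that is not on the path to a live reference can still be mutated.

```rust
// Reborrowing through a mutable reference: the borrow checker tracks
// each field path separately, and a borrow ends at its last use.
fn reborrow_demo() -> (i32, i32) {
    let mut v = (136, 139);
    let m = &mut v;
    let m0 = &mut m.0; // mutable reborrow of field 0
    *m0 = 137;
    let r1 = &m.1; // shared reborrow of field 1: a different path
    let seen = *r1; // last use of r1, and therefore of m
    v.1 = seen + 1; // ok: no borrow of v is live any more
    v
}

// Only the owning path to a live reference is frozen.
fn off_path_demo() -> (i32, Vec<i32>) {
    let mut pair = (vec![1, 2, 3], vec![10]);
    let r = &pair.0[1]; // shared borrow along the path pair -> pair.0
    pair.1.push(20); // ok: pair.1 is not on that path
    // pair.0.push(4); // would be rejected: pair.0 is frozen while r lives
    (*r, pair.1)
}

fn main() {
    println!("{:?}", reborrow_demo());
    println!("{:?}", off_path_demo());
}
```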
Paring these principles down to the simplest possible examples:
let mut x = 10;
let r1 = &x;
let r2 = &x; // ok: multiple shared borrows permitted
x += 10; // error: cannot assign to `x` because it is borrowed
let m = &mut x; // error: cannot borrow `x` as mutable because it is
// also borrowed as immutable
println!("{}, {}, {}", r1, r2, m); // the references are used here,
// so their lifetimes must last
// at least this long
let mut y = 20;
let m1 = &mut y;
let m2 = &mut y; // error: cannot borrow as mutable more than once
let z = y; // error: cannot use `y` because it was mutably borrowed
println!("{}, {}, {}", m1, m2, z); // references are used here
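The same operations compile once the borrows stop overlapping. In this sketch, each reference is used for the last time before the conflicting access happens. Under Rust’s non-lexical lifetimes, each borrow has therefore already ended by the time the conflicting access runs. The function name is hypothetical.

```rust
fn sequential_borrows() -> (i32, i32) {
    let mut x = 10;
    let r1 = &x;
    let r2 = &x; // multiple shared borrows
    println!("{}, {}", r1, r2); // last use of r1 and r2
    x += 10; // ok: no shared borrow of x is live any more
    let m = &mut x;
    *m += 1; // last use of m

    let mut y = 20;
    let m1 = &mut y;
    *m1 += 1; // last use of m1
    let m2 = &mut y; // ok: m1 is no longer live
    *m2 += 1; // last use of m2
    let z = y; // ok: m2 is no longer live
    (x, z)
}

fn main() {
    let (x, z) = sequential_borrows();
    println!("x = {}, z = {}", x, z);
}
```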
Rust applies these rules everywhere: if we borrow, say, a shared reference to a key in a HashMap, we can’t borrow a mutable reference to the HashMap until the shared reference’s lifetime ends. Designing collections to support unrestricted, simultaneous iteration and modification is difficult and often precludes simpler, more efficient implementations.
- Java’s Hashtable and C++’s vector don’t bother, and neither Python dictionaries nor JavaScript objects define exactly how such access behaves.
- Other collection types in JavaScript do, but require heavier implementations as a result.
- C++’s std::map promises that inserting new entries doesn’t invalidate pointers to other entries in the map, but by making that promise, the standard precludes more cache-efficient designs like Rust’s BTreeMap, which stores multiple entries in each node of the tree.
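In Rust, the usual ways to change a map based on its own contents are either to finish reading before starting to write, or to let the collection do both under a single mutable borrow. This sketch uses only std::collections::HashMap. The function name tidy and the sample data are hypothetical.

```rust
use std::collections::HashMap;

fn tidy(scores: &mut HashMap<String, i32>) {
    // 1. Change values in place: values_mut() holds the one &mut borrow,
    //    and the map's structure (its set of keys) stays untouched.
    for v in scores.values_mut() {
        *v *= 10;
    }

    // 2. Structural change driven by the contents: copy out what we need
    //    first, so the shared borrow of the keys ends before any insert.
    let keys: Vec<String> = scores.keys().cloned().collect();
    for k in keys {
        let total = scores[&k];
        scores.insert(format!("{}_copy", k), total);
    }

    // 3. Removal while iterating: retain() visits and removes entries
    //    under a single mutable borrow.
    scores.retain(|_, v| *v >= 20);
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert("a".to_string(), 1);
    scores.insert("b".to_string(), 2);
    tidy(&mut scores);
    println!("{:?}", scores); // iteration order is unspecified
}
```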
Another example of the kind of bug these rules catch:
// c++ version of managing a file descriptor
struct File {
int descriptor;
File(int d) : descriptor(d) { }
// assignment operator
File& operator=(const File &rhs) {
close(descriptor);
descriptor = dup(rhs.descriptor);
return *this;
}
};
File f(open("foo.txt", ...));
// ...
f = f;
- If we assign a File to itself, both rhs and *this are the same object, so operator= closes the very file descriptor it’s about to pass to dup. We destroy the same resource we were meant to copy.
The analogous code in Rust:
// rust version of managing a file descriptor
struct File {
descriptor: i32
}
fn new_file(d: i32) -> File {
File { descriptor: d }
}
fn clone_from(this: &mut File, rhs: &File) {
close(this.descriptor);
this.descriptor = dup(rhs.descriptor);
}
let mut f = new_file(open("foo.txt", ...));
// ...
clone_from(&mut f, &f);
// error[E0502]: cannot borrow `f` as immutable because it is also borrowed as mutable
// |
// | clone_from(&mut f, &f);
// | - ^- mutable borrow ends here
// | | |
// | | immutable borrow occurs here
// | mutable borrow occurs here
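The Rust clone_from above relies on C’s close, dup, and open, so it cannot run as written. The sketch below is runnable because it replaces the operating system with a hypothetical FdTable, which is just a set of open descriptor numbers and not a real OS API. The sketch makes the hazard concrete. close runs before dup, so if this and rhs were the same File, dup would be asked to duplicate a descriptor that had just been closed; here that shows up as a failing assert!. In Rust, that self-assigning call cannot be written, because this and rhs would have to be a &mut borrow and a & borrow of the same value at once.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the operating system's descriptor table.
struct FdTable {
    live: HashSet<i32>,
    next: i32,
}

impl FdTable {
    fn new() -> Self {
        FdTable { live: HashSet::new(), next: 3 } // pretend 0-2 are stdio
    }
    fn open(&mut self) -> i32 {
        let d = self.next;
        self.next += 1;
        self.live.insert(d);
        d
    }
    fn dup(&mut self, d: i32) -> i32 {
        // Duplicating a closed descriptor is exactly the self-assignment bug.
        assert!(self.live.contains(&d), "dup of closed descriptor {}", d);
        self.open()
    }
    fn close(&mut self, d: i32) {
        self.live.remove(&d);
    }
}

struct File {
    descriptor: i32,
}

fn clone_from(fds: &mut FdTable, this: &mut File, rhs: &File) {
    fds.close(this.descriptor); // would destroy rhs's descriptor if this == rhs
    this.descriptor = fds.dup(rhs.descriptor);
}

fn main() {
    let mut fds = FdTable::new();
    let f = File { descriptor: fds.open() };     // descriptor 3
    let mut g = File { descriptor: fds.open() }; // descriptor 4
    clone_from(&mut fds, &mut g, &f);            // closes 4, dups 3 into 5
    // clone_from(&mut fds, &mut g, &g);         // rejected at compile time (E0502)
    println!("f = {}, g = {}", f.descriptor, g.descriptor);
}
```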
By requiring mutable access to be exclusive, Rust has fended off a wide class of everyday mistakes.
- Two classic C++ bugs—failure to cope with self-assignment and using invalidated iterators—are the same underlying kind of bug. In both cases, code assumes it is modifying one value while consulting another, when in fact they’re both the same value.
- If you’ve ever accidentally let the source and destination of a call to memcpy or strcpy overlap in C or C++, that’s yet another form the bug can take.
- The immiscibility of shared and mutable references really demonstrates its value when writing concurrent code. A data race is possible only when some value is both mutable and shared between threads—which is exactly what Rust’s reference rules eliminate. A concurrent Rust program that avoids
unsafe
code is free of data races by construction.
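Here is a sketch of that guarantee using scoped threads (std::thread::scope, stable since Rust 1.63). In parallel_sum, several threads read through shared references at the same time. In parallel_double, mutable access is split into two disjoint halves with split_at_mut, so each thread has exclusive access to its own part. Giving both threads a &mut to the same slice would be rejected at compile time. The function names are hypothetical.

```rust
use std::thread;

// Many readers: both threads hold shared references into `data`.
fn parallel_sum(data: &[i64]) -> i64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let handle = s.spawn(|| left.iter().sum::<i64>());
        let right_sum: i64 = right.iter().sum();
        handle.join().unwrap() + right_sum
    })
}

// One writer per element: split_at_mut yields two non-overlapping &mut halves.
fn parallel_double(data: &mut [i64]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| {
            for x in left.iter_mut() {
                *x *= 2;
            }
        });
        for x in right.iter_mut() {
            *x *= 2;
        }
    }); // every thread spawned in the scope is joined before scope returns
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    parallel_double(&mut v);
    println!("{:?} sum = {}", v, parallel_sum(&v));
}
```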
Rust’s Shared References Versus C’s Pointers to const
Rust’s rules for shared references are much stricter than C’s rules for pointers to const.
// c code
#include <assert.h>

int main(void) {
    int x = 42;          // int variable, not const
    const int *p = &x;   // pointer to const int
    assert(*p == 42);
    x++;                 // change variable directly
    assert(*p == 43);    // “constant” referent's value has changed
    return 0;
}
- The fact that p is a const int * means that you can’t modify its referent via p itself: (*p)++ is forbidden. But you can also get at the referent directly as x, which is not const, and change its value that way. The C family’s const keyword has its uses, but constant it is not.
In Rust, a shared reference forbids all modifications to its referent, until its lifetime ends:
let mut x = 42; // non-const i32 variable
let p = &x; // shared reference to i32
assert_eq!(*p, 42);
x += 1; // error: cannot assign to x because it is borrowed
assert_eq!(*p, 42); // if you take out the assignment, this is true
- To ensure a value is constant, we need to keep track of all possible paths to that value and make sure that they either don’t permit modification or cannot be used at all. C and C++ pointers are too unrestricted for the compiler to check this. Rust’s references are always tied to a particular lifetime, making it feasible to check them at compile time.
Taking Arms Against a Sea of Objects
Since the rise of automatic memory management in the 1990s, the default architecture of all programs has been the sea of objects.
This is what happens if you have garbage collection and you start writing a program without designing anything.
- Advantages of this architecture: initial progress is rapid, it’s easy to hack stuff in.
- Disadvantages of this architecture: everything depends on everything else; it’s hard to test, evolve, or even think about any component in isolation.
Because of the ownership model, it takes a bit of effort to make a cycle in Rust: two values such that each one contains a reference pointing to the other. You have to use a smart pointer type, such as Rc, and interior mutability.
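The following sketch shows what that effort looks like: two nodes that point at each other. Rc provides the shared ownership, and RefCell provides the interior mutability needed to close the loop after both nodes exist. The Node type and helper names are hypothetical. An Rc cycle is never freed on its own. This sketch breaks the cycle explicitly, but real code more often makes one direction a std::rc::Weak.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type: `other` can be filled in after construction.
struct Node {
    name: &'static str,
    other: RefCell<Option<Rc<Node>>>,
}

fn make_cycle() -> (Rc<Node>, Rc<Node>) {
    let a = Rc::new(Node { name: "a", other: RefCell::new(None) });
    // b can point at a right away, because a already exists...
    let b = Rc::new(Node {
        name: "b",
        other: RefCell::new(Some(Rc::clone(&a))),
    });
    // ...but a -> b needs interior mutability, since a is already shared.
    *a.other.borrow_mut() = Some(Rc::clone(&b));
    (a, b)
}

fn partner_name(n: &Node) -> Option<&'static str> {
    let link = n.other.borrow();
    link.as_ref().map(|p| p.name)
}

fn main() {
    let (a, b) = make_cycle();
    println!("a -> {:?}, b -> {:?}", partner_name(&a), partner_name(&b));
    println!("strong counts: {} {}", Rc::strong_count(&a), Rc::strong_count(&b));
    // Break the cycle so both nodes are freed when a and b go out of scope.
    *a.other.borrow_mut() = None;
}
```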
Rust prefers for pointers, ownership, and data flow to pass through the system in one direction.
The cure to a sea of objects is to do some up-front design and build a better program. Rust is all about transferring the pain of understanding your program from the future to the present. It works unreasonably well: not only can Rust force you to understand why your program is thread-safe, it can even require some amount of high-level architectural design.
References