Structs in Rust
Rust structs resemble `struct` types in C and C++, classes in Python, and objects in JavaScript. A struct assembles several values of assorted types together into a single value so you can deal with them as a unit. Given a struct, you can read and modify its individual components. And a struct can have methods associated with it that operate on its components.
Rust has three kinds of struct types, named-field, tuple-like, and unit-like, which differ in how you refer to their components: a named-field struct gives a name to each component, whereas a tuple-like struct identifies them by the order in which they appear. Unit-like structs have no components at all.
Named-Field Structs
/// A rectangle of eight-bit grayscale pixels.
struct GrayscaleMap {
pixels: Vec<u8>,
size: (usize, usize)
}
The convention in Rust is for all types, structs included, to have names that capitalize the first letter of each word, like `GrayscaleMap`, a convention called CamelCase (or PascalCase). Fields and methods are lowercase, with words separated by underscores. This is called snake_case.
A struct expression starts with the type name and lists the name and value of each field, all enclosed in curly braces. There’s also shorthand for populating fields from local variables or arguments with the same name. You can use `key: value` syntax for some fields and shorthand for others in the same struct expression.
let width = 1024;
let height = 576;
let image = GrayscaleMap {
pixels: vec![0; width * height],
size: (width, height)
};
fn new_map(size: (usize, usize), pixels: Vec<u8>) -> GrayscaleMap {
assert_eq!(pixels.len(), size.0 * size.1);
// short for
// GrayscaleMap { pixels: pixels, size: size }.
GrayscaleMap { pixels, size }
}
To access a struct’s fields, use the `.` operator.
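As a minimal sketch of field access, reusing the `GrayscaleMap` type from above (the helpers `pixel_count` and `blank_map` are hypothetical names introduced only for illustration):

```rust
/// A rectangle of eight-bit grayscale pixels.
struct GrayscaleMap {
    pixels: Vec<u8>,
    size: (usize, usize),
}

/// Hypothetical helper: reads both fields with the `.` operator.
fn pixel_count(map: &GrayscaleMap) -> usize {
    // `map.size` is a tuple, so `.0` and `.1` select its elements.
    assert_eq!(map.pixels.len(), map.size.0 * map.size.1);
    map.size.0 * map.size.1
}

/// Hypothetical helper: fields of a `mut` binding can be assigned
/// through the same operator.
fn blank_map(width: usize, height: usize) -> GrayscaleMap {
    let mut map = GrayscaleMap { pixels: Vec::new(), size: (0, 0) };
    map.size = (width, height);
    map.pixels = vec![0; width * height];
    map
}
```

Assigning to a field requires the struct binding itself to be `mut`; Rust has no per-field mutability marker.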
Like all other items, structs are private by default, visible only in the module where they’re declared and its submodules. You can make a struct visible outside its module by prefixing its definition with `pub`. The same goes for each of its fields, which are also private by default.
Even if a struct is declared `pub`, its fields can be private:
/// A rectangle of eight-bit grayscale pixels.
pub struct GrayscaleMap {
pixels: Vec<u8>,
size: (usize, usize)
}
- Other modules can use this struct and any public associated functions it might have, but can’t access the private fields by name or use struct expressions to create new `GrayscaleMap` values.
Creating a struct value requires all the struct’s fields to be visible. This is why you can’t write a struct expression to create a new `String` or `Vec`. These standard types are structs, but all their fields are private. To create one, you must use public type-associated functions like `Vec::new()`.
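The visibility rules can be sketched with a small module. The module name `imaging`, the accessors, and `make_map` are assumptions made for this example, not part of the original code:

```rust
mod imaging {
    /// Public type with private fields.
    pub struct GrayscaleMap {
        pixels: Vec<u8>,
        size: (usize, usize),
    }

    impl GrayscaleMap {
        /// Public constructor: the only way for other modules to build a value.
        pub fn new(width: usize, height: usize) -> GrayscaleMap {
            GrayscaleMap { pixels: vec![0; width * height], size: (width, height) }
        }

        /// Read-only access to the private `size` field.
        pub fn size(&self) -> (usize, usize) {
            self.size
        }

        /// Number of pixels stored in the private `pixels` field.
        pub fn pixel_count(&self) -> usize {
            self.pixels.len()
        }
    }
}

fn make_map() -> imaging::GrayscaleMap {
    // Outside `imaging`, a struct expression would be rejected because
    // the fields are private:
    // imaging::GrayscaleMap { pixels: vec![], size: (0, 0) }
    imaging::GrayscaleMap::new(1024, 576)
}
```

Keeping the fields private lets the constructor guarantee that `pixels.len()` always matches `size`.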
When creating a named-field struct value, you can use another struct of the same type to supply values for fields you omit. In a struct expression, if the named fields are followed by `.. EXPR`, then any fields not mentioned take their values from `EXPR`, which must be another value of the same struct type.
// In this game, brooms are monsters. You'll see.
struct Broom {
name: String,
height: u32,
health: u32,
position: (f32, f32, f32),
intent: BroomIntent
}
/// Two possible alternatives for what a `Broom` could be working on.
#[derive(Copy, Clone)]
enum BroomIntent { FetchWater, DumpWater }
// Receive the input Broom by value, taking ownership.
fn chop(b: Broom) -> (Broom, Broom) {
// Initialize `broom1` mostly from `b`, changing only `height`. Since
// `String` is not `Copy`, `broom1` takes ownership of `b`'s name.
let mut broom1 = Broom { height: b.height / 2, .. b };
// Initialize `broom2` mostly from `broom1`. Since `String` is not
// `Copy`, we must clone `name` explicitly.
let mut broom2 = Broom { name: broom1.name.clone(), .. broom1 };
// Give each fragment a distinct name.
broom1.name.push_str(" I");
broom2.name.push_str(" II");
(broom1, broom2)
}
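A smaller sketch of the same `.. EXPR` syntax may help isolate what is moved and what is copied. The `Settings` type and the `widen` function are hypothetical, chosen only to keep the example short:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    title: String,
    width: u32,
    height: u32,
}

fn widen(base: Settings) -> (Settings, u32) {
    // Only `width` is given explicitly. `title` is moved out of `base`
    // and `height` is copied from it.
    let wide = Settings { width: base.width * 2, ..base };
    // `base.title` is gone, but the `Copy` field `width` is still readable.
    let old_width = base.width;
    (wide, old_width)
}
```

After the struct expression, `base` is only partially moved: using `base.title` again would be a compile-time error, while its `Copy` fields remain usable.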
Tuple-Like Structs
The second kind of struct type is called a tuple-like struct, because it resembles a tuple. The values held by a tuple-like struct are called elements, just as the values of a tuple are.
struct Bounds(usize, usize);
pub struct Bounds(pub usize, pub usize);
- Individual elements of a tuple-like struct may be public or not.
You construct a value of this type much as you would construct a tuple, except that you must include the struct name:
let image_bounds = Bounds(1024, 768);
- The expression `Bounds(1024, 768)` looks like a function call, and in fact it is: defining the type also implicitly defines a function: `fn Bounds(elem0: usize, elem1: usize) -> Bounds { ... }`
You access the elements of a tuple-like struct just as you would a tuple’s:
assert_eq!(image_bounds.0 * image_bounds.1, 786432);
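Because the struct name doubles as a function, it can be stored or passed like any other function. The following sketch also shows pattern matching as an alternative to `.0` and `.1`; the helper names `all_bounds` and `area` are assumptions for illustration:

```rust
#[derive(Debug, PartialEq)]
struct Bounds(usize, usize);

fn all_bounds(sizes: &[(usize, usize)]) -> Vec<Bounds> {
    // `Bounds` names a function of type `fn(usize, usize) -> Bounds`.
    let make: fn(usize, usize) -> Bounds = Bounds;
    sizes.iter().map(|&(w, h)| make(w, h)).collect()
}

fn area(b: &Bounds) -> usize {
    // Pattern matching gets at the elements without `.0` / `.1`.
    let Bounds(width, height) = b;
    width * height
}
```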
At the most fundamental level, named-field and tuple-like structs are very similar. The choice of which to use comes down to questions of legibility, ambiguity, and brevity. If you will use the `.` operator to get at a value’s components much at all, identifying fields by name provides the reader more information and is probably more robust against typos. If you will usually use pattern matching to find the elements, tuple-like structs can work nicely.
Tuple-like structs are good for newtypes, structs with a single component that you define to get stricter type checking.
struct Ascii(Vec<u8>);
- Using this type for your ASCII strings is much better than simply passing around `Vec<u8>` buffers and explaining what they are in the comments. The newtype helps Rust catch mistakes where some other byte buffer is passed to a function expecting ASCII text.
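One way the newtype might be put to work is sketched below. The checked constructor `from_bytes` and the function `shout` are hypothetical; the source only defines the `Ascii` struct itself:

```rust
/// Newtype wrapper: bytes that have been checked to be ASCII.
struct Ascii(Vec<u8>);

impl Ascii {
    /// Hypothetical checked constructor; hands the bytes back on failure.
    fn from_bytes(bytes: Vec<u8>) -> Result<Ascii, Vec<u8>> {
        if bytes.iter().all(|b| b.is_ascii()) {
            Ok(Ascii(bytes))
        } else {
            Err(bytes)
        }
    }
}

/// Accepts only `Ascii`; passing a plain `Vec<u8>` is a type error.
fn shout(text: &Ascii) -> String {
    // ASCII is a subset of UTF-8, so this conversion cannot lose data.
    String::from_utf8_lossy(&text.0).to_ascii_uppercase()
}
```

A call such as `shout(&vec![104, 105])` does not compile, which is exactly the stricter checking the newtype is meant to buy.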
Unit-Like Structs
A value of a unit-like struct type occupies no memory, much like the unit type `()`.
Rust doesn’t bother actually storing unit-like struct values in memory or generating code to operate on them, because it can tell everything it might need to know about the value from its type alone. But logically, an empty struct is a type with values like any other—or more precisely, a type of which there is only a single value:
struct Onesuch;
let o = Onesuch;
// std::ops::RangeFull
pub struct RangeFull;
assert_eq!((..), std::ops::RangeFull);
// std::ops::Range
pub struct Range<Idx> {
pub start: Idx,
pub end: Idx,
}
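The zero-size claim can be checked with `std::mem::size_of`. This is a small sketch; the function names are made up for the example:

```rust
use std::mem::size_of;

struct Onesuch;

fn unit_like_sizes() -> (usize, usize) {
    let _o = Onesuch; // the single value of the type
    // Neither type carries any data, so both sizes are zero.
    (size_of::<Onesuch>(), size_of::<()>())
}

fn is_full_range() -> bool {
    // The expression `..` is shorthand for the unit-like value `RangeFull`.
    (..) == std::ops::RangeFull
}
```

By contrast, `Range<Idx>` is a named-field struct because it has `start` and `end` values to store.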
Struct Layout
In memory, both named-field and tuple-like structs are the same thing: a collection of values, of possibly mixed types, laid out in a particular way in memory.
struct GrayscaleMap {
pixels: Vec<u8>,
size: (usize, usize)
}
- Unlike C and C++, Rust doesn’t make specific promises about how it will order a struct’s fields or elements in memory; this diagram shows only one possible arrangement.
  - You can ask Rust to lay out structures in a way compatible with C and C++, using the `#[repr(C)]` attribute.
- Rust does promise to store fields’ values directly in the struct’s block of memory. Whereas JavaScript, Python, and Java would put the `pixels` and `size` values each in their own heap-allocated blocks and have `GrayscaleMap`’s fields point at them, Rust embeds `pixels` and `size` directly in the `GrayscaleMap` value. Only the heap-allocated buffer owned by the `pixels` vector remains in its own block.
  - The two 786432 values in the diagram are the vector’s capacity and length.
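The embedding of fields can be observed through `size_of`. The sketch below adds `#[repr(C)]` as an assumption so that the field order is fixed; the two function names are hypothetical:

```rust
use std::mem::{size_of, size_of_val};

#[repr(C)] // assumption for this sketch: lay the fields out in declaration order
struct GrayscaleMap {
    pixels: Vec<u8>,
    size: (usize, usize),
}

fn map_size_matches_fields() -> bool {
    // The struct embeds the vector's header and the tuple directly;
    // the pixel bytes themselves live in a separate heap buffer.
    size_of::<GrayscaleMap>() == size_of::<Vec<u8>>() + size_of::<(usize, usize)>()
}

fn size_is_independent_of_pixels() -> bool {
    let small = GrayscaleMap { pixels: vec![0; 4], size: (2, 2) };
    let large = GrayscaleMap { pixels: vec![0; 786_432], size: (1024, 768) };
    // Only the heap buffers differ in size, not the struct values.
    size_of_val(&small) == size_of_val(&large)
}
```

Both functions are expected to return `true` on common platforms, where a `Vec<u8>` header and a `usize` share the same alignment.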
Defining Methods with impl
Rather than appearing inside the struct definition, as in C++ or Java, Rust methods appear in a separate `impl` block.
An `impl` block is simply a collection of `fn` definitions, each of which becomes a method on the struct type named at the top of the block.
/// A first-in, first-out queue of characters.
pub struct Queue {
older: Vec<char>, // older elements, eldest last.
younger: Vec<char> // younger elements, youngest last.
}
impl Queue {
/// Push a character onto the back of a queue.
pub fn push(&mut self, c: char) {
self.younger.push(c);
}
/// Pop a character off the front of a queue. Return `Some(c)` if there
/// was a character to pop, or `None` if the queue was empty.
pub fn pop(&mut self) -> Option<char> {
if self.older.is_empty() {
if self.younger.is_empty() {
return None;
}
// Bring the elements in younger over to older, and put them in
// the promised order.
use std::mem::swap;
swap(&mut self.older, &mut self.younger);
self.older.reverse();
}
// Now older is guaranteed to have something. Vec's pop method
// already returns an Option, so we're set.
self.older.pop()
}
}
impl Queue {
pub fn is_empty(&self) -> bool {
self.older.is_empty() && self.younger.is_empty()
}
}
impl Queue {
pub fn split(self) -> (Vec<char>, Vec<char>) {
(self.older, self.younger)
}
}
let mut q = Queue { older: Vec::new(), younger: Vec::new() };
q.push('0');
q.push('1');
assert_eq!(q.pop(), Some('0'));
q.push('∞');
assert_eq!(q.pop(), Some('1'));
assert_eq!(q.pop(), Some('∞'));
assert_eq!(q.pop(), None);
q.push('P');
q.push('D');
assert_eq!(q.pop(), Some('P'));
q.push('X');
let (older, younger) = q.split();
// q is now uninitialized.
assert_eq!(older, vec!['D']);
assert_eq!(younger, vec!['X']);
- Rust passes a method the value it’s being called on as its first argument, which must have the special name `self`. Since `self`’s type is obviously the one named at the top of the `impl` block, or a reference to that, Rust lets you omit the type, and write `self`, `&self`, or `&mut self` as shorthand for `self: Queue`, `self: &Queue`, or `self: &mut Queue`.
  - You can use the longhand forms if you like, but almost all Rust code uses the shorthand, as shown before.
- Unlike C++ and Java, where the members of the “this” object are directly visible in method bodies as unqualified identifiers, a Rust method must explicitly use `self` to refer to the value it was called on, similar to the way Python methods use `self`, and the way JavaScript methods use `this`.
- Since `push` and `pop` need to modify the `Queue`, they both take `&mut self`. However, when you call a method, you don’t need to borrow the mutable reference yourself; the ordinary method call syntax takes care of that implicitly. Simply writing `q.push(...)` borrows a mutable reference to `q`, as if you had written `(&mut q).push(...)`, since that’s what the `push` method’s `self` requires.
- If a method wants to take ownership of `self`, it can take `self` by value.
Functions defined in an `impl` block are called associated functions, since they’re associated with a specific type. The opposite of an associated function is a free function, one that is not defined as an `impl` block’s item.
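To make the `self` shorthand and the implicit borrow concrete, here is a sketch using a hypothetical `Counter` type, written with the longhand `self` forms:

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Longhand for `&mut self`.
    fn bump(self: &mut Counter) {
        self.count += 1;
    }

    // Longhand for `&self`.
    fn current(self: &Counter) -> u32 {
        self.count
    }

    // Longhand for `self`: takes ownership of the counter.
    fn into_count(self: Counter) -> u32 {
        self.count
    }
}

fn count_three_ways() -> u32 {
    let mut c = Counter { count: 0 };
    c.bump(); // implicit `&mut c`
    (&mut c).bump(); // the same borrow, written out
    Counter::bump(&mut c); // a method is also an associated function
    assert_eq!(c.current(), 3);
    c.into_count() // `c` is moved here and is unusable afterward
}
```

All three call forms reach the same function; the method call syntax simply picks the borrow that the `self` parameter requires.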
Passing Self as a Box, Rc, or Arc
Sometimes, taking `self` by value, or even by reference, isn’t enough, so Rust also lets you pass `self` via smart pointer types.
A method’s `self` argument can also be a `Box<Self>`, `Rc<Self>`, or `Arc<Self>`. Such a method can only be called on a value of the given pointer type. Calling the method passes ownership of the pointer to it.
You won’t usually need to do this. A method that expects `self` by reference works fine when called on any of those pointer types:
let mut bq = Box::new(Queue::new());
// `Queue::push` expects a `&mut Queue`, but `bq` is a `Box<Queue>`.
// This is fine: Rust borrows a `&mut Queue` from the `Box` for the
// duration of the call.
bq.push('■');
- For method calls and field access, Rust automatically borrows a reference from pointer types like `Box`, `Rc`, and `Arc`, so `&self` and `&mut self` are almost always the right thing in a method signature, along with the occasional `self`.
The smart-pointer forms become useful when a method’s purpose involves managing ownership of the pointer itself. Suppose we have a tree of nodes, some sort of drastically simplified XML:
use std::rc::Rc;
struct Node {
tag: String,
children: Vec<Rc<Node>>
}
impl Node {
fn new(tag: &str) -> Node {
Node {
tag: tag.to_string(),
children: vec![],
}
}
}
- Each node has a tag, to indicate what sort of node it is, and a vector of children, held by reference-counted pointers to permit sharing and make their lifetimes a bit more flexible.
Usually, markup nodes have a method that appends a child to its own list, but for the moment, let’s reverse the roles and give `Node` a method that appends it to some other `Node`’s children:
impl Node {
fn append_to(self, parent: &mut Node) {
parent.children.push(Rc::new(self));
}
}
- This is unsatisfying. This method calls `Rc::new` to allocate a fresh heap location and move `self` into it, but if the caller (the markup node appending itself) already has an `Rc<Node>`, all that is unnecessary: we should just increment the reference count and push the pointer onto the vector. The whole point of `Rc` is to enable sharing.
Instead, we can write this:
impl Node {
fn append_to(self: Rc<Self>, parent: &mut Node) {
parent.children.push(self);
}
}
- If the caller has an `Rc<Node>` at hand, it can call `append_to` directly, passing the `Rc` by value:
  let shared_node = Rc::new(Node::new("first"));
  shared_node.append_to(&mut parent);
  - This passes ownership of `shared_node` to the method: no reference counts are adjusted, and there’s certainly no new allocation.
- If the caller needs to retain a pointer to the node for later use, then it can clone the `Rc` first:
  shared_node.clone().append_to(&mut parent);
  - Cloning an `Rc` just bumps its reference count: there’s still no heap allocation or copying. But when the call returns, both `shared_node` and `parent`’s vector of children are pointing to the same `Node`.
- Finally, if the caller actually owns the `Node` outright, then it must create the `Rc` itself before passing it:
  let owned = Node::new("owned directly");
  Rc::new(owned).append_to(&mut parent);
Type-Associated Functions
An `impl` block for a given type can also define functions that don’t take `self` as an argument at all. These are still associated functions, since they’re in an `impl` block, but they’re not methods, since they don’t take a `self` argument. To distinguish them from methods, we call them type-associated functions.
They’re often used to provide constructor functions, like this:
impl Queue {
pub fn new() -> Queue {
Queue { older: Vec::new(), younger: Vec::new() }
}
}
It’s conventional in Rust for constructor functions to be named `new`. There’s nothing special about the name `new`.
Although you can have many separate `impl` blocks for a single type, they must all be in the same crate that defines that type.
- Rust does let you attach your own methods to types defined elsewhere, by way of traits.
There are several advantages to separating a type’s methods from its definition:
- It’s always easy to find a type’s data members. In large C++ class definitions, you might need to skim hundreds of lines of member function definitions to be sure you haven’t missed any of the class’s data members; in Rust, they’re all in one place.
- Although one can imagine fitting methods into the syntax for named-field structs, it’s not so neat for tuple-like and unit-like structs. Pulling methods out into an impl block allows a single syntax for all three.
  - In fact, Rust uses this same syntax for defining methods on types that are not structs at all, such as `enum` types and primitive types like `i32`. The fact that any type can have methods is one reason Rust doesn’t use the term object much, preferring to call everything a value.
- The same `impl` syntax also serves neatly for implementing traits.
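As a sketch of `impl` on non-struct types: an inherent `impl` works directly on an enum you define, while a method on a primitive such as `i32` goes through a trait. The `Direction` enum and the `IsEven` trait are hypothetical names used only here:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    North,
    East,
    South,
    West,
}

impl Direction {
    /// An ordinary method, defined on an enum rather than a struct.
    fn turn_right(self) -> Direction {
        match self {
            Direction::North => Direction::East,
            Direction::East => Direction::South,
            Direction::South => Direction::West,
            Direction::West => Direction::North,
        }
    }
}

/// Hypothetical trait used to add a method to a type we did not define.
trait IsEven {
    fn is_even(&self) -> bool;
}

impl IsEven for i32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}
```

With the trait in scope, `4_i32.is_even()` reads just like a call to a method the type always had.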
Associated Consts
Another feature of languages like C# and Java that Rust adopts in its type system is the idea of values associated with a type, rather than a specific instance of that type. In Rust, these are known as associated consts.
Associated consts are constant values. They’re often used to specify commonly used values of a type.
pub struct Vector2 {
x: f32,
y: f32,
}
impl Vector2 {
const ZERO: Vector2 = Vector2 { x: 0.0, y: 0.0 };
const UNIT: Vector2 = Vector2 { x: 1.0, y: 0.0 };
}
let scaled = Vector2::UNIT.scaled_by(2.0);
- These values are associated with the type itself, and you can use them without referring to another instance of `Vector2`.
Nor does an associated const have to be of the same type as the type it’s associated with; we could use this feature to add IDs or names to types.
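The snippet above calls `scaled_by` without showing it. The following sketch supplies one plausible definition (an assumption, not the original author’s code) and adds two associated consts of other types, `NAME` and `ID`, whose values are arbitrary:

```rust
#[derive(Debug, PartialEq)]
pub struct Vector2 {
    x: f32,
    y: f32,
}

impl Vector2 {
    const ZERO: Vector2 = Vector2 { x: 0.0, y: 0.0 };
    const UNIT: Vector2 = Vector2 { x: 1.0, y: 0.0 };

    // Associated consts need not have the type they are attached to.
    const NAME: &'static str = "Vector2";
    const ID: u32 = 18;

    /// Assumed definition of `scaled_by`: multiply each component.
    fn scaled_by(&self, factor: f32) -> Vector2 {
        Vector2 { x: self.x * factor, y: self.y * factor }
    }
}
```

Like associated functions, the constants are reached through the type name, as in `Vector2::UNIT` or `Vector2::NAME`.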
Generic Structs
Rust structs can be generic, meaning that their definition is a template into which you can plug whatever types you like.
pub struct Queue<T> {
older: Vec<T>,
younger: Vec<T>
}
- You can read the `<T>` in `Queue<T>` as “for any element type `T`…”. So this definition reads, “For any type `T`, a `Queue<T>` is two fields of type `Vec<T>`.” `Vec` itself is a generic struct, defined in just this way.
In generic struct definitions, the type names used in `<angle brackets>` are called type parameters. An `impl` block for a generic struct looks like this:
impl<T> Queue<T> {
pub fn new() -> Queue<T> {
Queue { older: Vec::new(), younger: Vec::new() }
}
pub fn push(&mut self, t: T) {
self.younger.push(t);
}
pub fn is_empty(&self) -> bool {
self.older.is_empty() && self.younger.is_empty()
}
}
- You can read the line `impl<T> Queue<T>` as something like, “for any type `T`, here are some associated functions available on `Queue<T>`.” Then, you can use the type parameter `T` as a type in the associated function definitions.
- As another shorthand, every `impl` block, generic or not, defines the special type parameter `Self` to be whatever type we’re adding methods to. In the preceding code, `Self` would be `Queue<T>`, so we can abbreviate `Queue::new`’s definition a bit further:
  pub fn new() -> Self { Queue { older: Vec::new(), younger: Vec::new() } }
  - `Self` can also be used in the construction expression; we could have written `Self { ... }` instead.
- In the body of `new`, we didn’t need to write the type parameter in the construction expression; simply writing `Queue { ... }` was good enough. This is Rust’s type inference at work: since there’s only one type that works for that function’s return value, namely `Queue<T>`, Rust supplies the parameter for us. However, you’ll always need to supply type parameters in function signatures and type definitions. Rust doesn’t infer those; instead, it uses those explicit types as the basis from which it infers types within function bodies.
For associated function calls, you can supply the type parameter explicitly using the ::<> (turbofish) notation. But in practice, you can usually just let Rust figure it out for you:
let mut q = Queue::<char>::new();
let mut q = Queue::new();
let mut r = Queue::new();
q.push("CAD"); // apparently a Queue<&'static str>
r.push(0.74); // apparently a Queue<f64>
q.push("BTC"); // USD per Bitcoin, 2019-6
r.push(13764.0); // Rust fails to detect irrational exuberance
- This is exactly what we’ve been doing with `Vec`, another generic struct type.
Enums can take type parameters as well.
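A generic enum is declared and implemented with the same `<T>` syntax as a generic struct; the standard library’s `Option<T>` and `Result<T, E>` are defined this way. A minimal sketch, using a hypothetical `Fetch<T>` type:

```rust
/// Hypothetical generic enum: a result that may not have arrived yet.
enum Fetch<T> {
    Pending,
    Done(T),
}

impl<T> Fetch<T> {
    /// Convert into the standard `Option<T>`.
    fn into_option(self) -> Option<T> {
        match self {
            Fetch::Pending => None,
            Fetch::Done(value) => Some(value),
        }
    }
}
```

As with `Queue`, the type parameter is usually inferred (`Fetch::Done(3)`), but it can be spelled out with the turbofish when no value pins it down, as in `Fetch::<&str>::Pending`.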
Structs with Lifetime Parameters
If a struct type contains references, you must name those references’ lifetimes.
struct Extrema<'elt> {
greatest: &'elt i32,
least: &'elt i32
}
You can think of a declaration like `struct Queue<T>` as meaning that, given any specific type `T`, you can make a `Queue<T>` that holds that type. Similarly, you can think of `struct Extrema<'elt>` as meaning that, given any specific lifetime `'elt`, you can make an `Extrema<'elt>` that holds references with that lifetime.
// fn find_extrema(slice: &[i32]) -> Extrema {
fn find_extrema<'s>(slice: &'s [i32]) -> Extrema<'s> {
let mut greatest = &slice[0];
let mut least = &slice[0];
for i in 1..slice.len() {
if slice[i] < *least { least = &slice[i]; }
if slice[i] > *greatest { greatest = &slice[i]; }
}
Extrema { greatest, least }
}
let a = [0, -3, 0, 15, 48];
let e = find_extrema(&a);
assert_eq!(*e.least, -3);
assert_eq!(*e.greatest, 48);
- Since `find_extrema` borrows elements of `slice`, which has lifetime `'s`, the `Extrema` struct we return also uses `'s` as the lifetime of its references. Rust always infers lifetime parameters for calls, so calls to `find_extrema` needn’t mention them.
- Because it’s so common for the return type to use the same lifetime as an argument, Rust lets us omit the lifetimes when there’s one obvious candidate, as in the commented-out signature at the top of the example.
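The elided form might look like the sketch below. Writing `Extrema<'_>` rather than a bare `Extrema` is a stylistic choice made here to keep the borrowed lifetime visible in the signature; the loop is also rewritten to iterate over the slice directly:

```rust
struct Extrema<'elt> {
    greatest: &'elt i32,
    least: &'elt i32,
}

// With a single reference argument, its lifetime is the one obvious
// candidate for the return type; `'_` marks where it is filled in.
fn find_extrema(slice: &[i32]) -> Extrema<'_> {
    // Like the original, this panics on an empty slice.
    let mut greatest = &slice[0];
    let mut least = &slice[0];
    for item in &slice[1..] {
        if item < least {
            least = item;
        }
        if item > greatest {
            greatest = item;
        }
    }
    Extrema { greatest, least }
}
```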
Deriving Common Traits for Struct Types
Structs can be very easy to write:
struct Point {
x: f64,
y: f64
}
Using this `Point` type is a bit of a pain. As written, `Point` is not copyable or cloneable. You can’t print it with `println!("{:?}", point);` and it does not support the `==` and `!=` operators. Each of these features has a name in Rust: `Copy`, `Clone`, `Debug`, and `PartialEq`. They are called traits.
In the case of these standard traits, and several others, you don’t need to implement them by hand unless you want some kind of custom behavior. Rust can automatically implement them for you, with mechanical accuracy. Just add a `#[derive]` attribute to the struct:
#[derive(Copy, Clone, Debug, PartialEq)]
struct Point {
x: f64,
y: f64
}
Each of these traits can be implemented automatically for a struct, provided that each of its fields implements the trait. We can ask Rust to derive `PartialEq` for `Point` because its two fields are both of type `f64`, which already implements `PartialEq`.
Rust can also derive `PartialOrd`, which would add support for the comparison operators `<`, `>`, `<=`, and `>=`.
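A sketch of what deriving `PartialOrd` would give `Point`. The derived comparison goes field by field in declaration order, which may or may not be the ordering you want for points; the function names are hypothetical:

```rust
#[derive(Copy, Clone, Debug, PartialEq, PartialOrd)]
struct Point {
    x: f64,
    y: f64,
}

fn compare_points() -> (bool, bool, bool) {
    let a = Point { x: 1.0, y: 5.0 };
    let b = Point { x: 2.0, y: 0.0 };
    let c = Point { x: 1.0, y: 7.0 };
    // Derived ordering compares `x` first, then `y` only when the
    // `x` values are equal.
    (a < b, a < c, a == Point { x: 1.0, y: 5.0 })
}

fn nan_is_unordered() -> bool {
    let p = Point { x: f64::NAN, y: 0.0 };
    // `PartialOrd` is partial: NaN is neither less than, equal to,
    // nor greater than anything, including itself.
    p.partial_cmp(&p).is_none()
}
```

The NaN case is why `f64`, and therefore this `Point`, can implement `PartialOrd` but not the total-order trait `Ord`.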
Interior Mutability
pub struct SpiderRobot {
species: String,
web_enabled: bool,
leg_devices: [fd::FileDesc; 8],
}
use std::rc::Rc;
pub struct SpiderSenses {
robot: Rc<SpiderRobot>, // <-- pointer to settings and I/O
eyes: [Camera; 32],
motion: Accelerometer,
}
- The spider robot control system has a central struct, `SpiderRobot`, that contains settings and I/O handles. It’s set up when the robot boots, and the values never change. Every major system of the robot is handled by its own struct, such as `SpiderSenses`, and each one has a pointer back to the `SpiderRobot`.
- A value in an `Rc` box is always shared and therefore always immutable. Now suppose you want to add a little logging to the `SpiderRobot` struct, using the standard `File` type. There’s a problem: a `File` has to be `mut`. All the methods for writing to it require a `mut` reference.
- What we need is a little bit of mutable data (a `File`) inside an otherwise immutable value (the `SpiderRobot` struct). This is called interior mutability.
The two most straightforward types for interior mutability are `Cell<T>` and `RefCell<T>`, both in the `std::cell` module.
A `Cell<T>` is a struct that contains a single private value of type `T`. The only special thing about a `Cell` is that you can get and set the field even if you don’t have `mut` access to the `Cell` itself:
- `Cell::new(value)`: Creates a new `Cell`, moving the given value into it.
- `cell.get()`: Returns a copy of the value in the `cell`.
- `cell.set(value)`: Stores the given value in the `cell`, dropping the previously stored value.
  - This method takes `self` as a non-`mut` reference:
    fn set(&self, value: T) // note: not `&mut self`
  - This is unusual for methods named `set`. Rust has trained us to expect that we need `mut` access if we want to make changes to data. But by the same token, this one unusual detail is the whole point of `Cell`s. They’re simply a safe way of bending the rules on immutability, no more, no less.
A `Cell` would be handy if you were adding a simple counter to your `SpiderRobot`. Then even non-`mut` methods of `SpiderRobot` can access that `u32`, using the `.get()` and `.set()` methods:
use std::cell::Cell;
pub struct SpiderRobot {
// ...
hardware_error_count: Cell<u32>,
// ...
}
impl SpiderRobot {
/// Increase the error count by 1.
pub fn add_hardware_error(&self) {
let n = self.hardware_error_count.get();
self.hardware_error_count.set(n + 1);
}
/// True if any hardware errors have been reported.
pub fn has_hardware_errors(&self) -> bool {
self.hardware_error_count.get() > 0
}
}
- This is easy enough, but it doesn’t solve our logging problem. `Cell` does not let you call `mut` methods on a shared value. The `.get()` method returns a copy of the value in the cell, so it works only if `T` implements the `Copy` trait. For logging, we need a mutable `File`, and `File` isn’t copyable.
The right tool in this case is a `RefCell`. Like `Cell<T>`, `RefCell<T>` is a generic type that contains a single value of type `T`. Unlike `Cell`, `RefCell` supports borrowing references to its `T` value:
- `RefCell::new(value)`: Creates a new `RefCell`, moving `value` into it.
- `ref_cell.borrow()`: Returns a `Ref<T>`, which is essentially just a shared reference to the value stored in `ref_cell`.
  - This method panics if the value is already mutably borrowed.
- `ref_cell.borrow_mut()`: Returns a `RefMut<T>`, essentially a mutable reference to the value in `ref_cell`.
  - This method panics if the value is already borrowed.
- `ref_cell.try_borrow()`, `ref_cell.try_borrow_mut()`: Work just like `borrow()` and `borrow_mut()`, but return a `Result`. Instead of panicking when the borrow would conflict with an existing one, they return an `Err` value.
The two `borrow` methods panic only if you try to break the Rust rule that `mut` references are exclusive references:
use std::cell::RefCell;
let ref_cell: RefCell<String> = RefCell::new("hello".to_string());
let r = ref_cell.borrow(); // ok, returns a Ref<String>
let count = r.len(); // ok, returns "hello".len()
assert_eq!(count, 5);
let mut w = ref_cell.borrow_mut(); // panic: already borrowed
w.push_str(" world");
- To avoid panicking, you could put these two borrows into separate blocks. That way, `r` would be dropped before you try to borrow `w`.
- This is a lot like how normal references work. The only difference is that normally, when you borrow a reference to a variable, Rust checks at compile time to ensure that you’re using the reference safely. If the checks fail, you get a compiler error. `RefCell` enforces the same rule using run-time checks. So if you’re breaking the rules, you get a panic (or an `Err`, for `try_borrow` and `try_borrow_mut`).
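The separate-blocks fix, along with the non-panicking `try_borrow_mut`, can be sketched as follows (the function names are hypothetical):

```rust
use std::cell::RefCell;

fn append_world() -> String {
    let ref_cell: RefCell<String> = RefCell::new("hello".to_string());
    {
        let r = ref_cell.borrow();
        assert_eq!(r.len(), 5);
    } // `r` is dropped here, ending the shared borrow.
    {
        let mut w = ref_cell.borrow_mut(); // ok: no other borrow is live
        w.push_str(" world");
    }
    ref_cell.into_inner()
}

fn conflicting_borrow_is_err() -> bool {
    let ref_cell = RefCell::new(0_u32);
    let _r = ref_cell.borrow();
    // A shared borrow is still live, so this yields `Err` instead of panicking.
    let conflict = ref_cell.try_borrow_mut().is_err();
    conflict
}
```

An explicit `drop(r)` before the mutable borrow would work as well as the inner block; what matters is that the `Ref` is gone by the time `borrow_mut` runs.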
Put `RefCell` to work in the `SpiderRobot` type:
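use std::cell::RefCell;
use std::fs::File;
use std::io::Write; // brings `write_fmt` into scope for `writeln!`
pub struct SpiderRobot {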
// ...
log_file: RefCell<File>,
// ...
}
impl SpiderRobot {
/// Write a line to the log file.
pub fn log(&self, message: &str) {
let mut file = self.log_file.borrow_mut();
// `writeln!` is like `println!`, but sends
// output to the given file.
writeln!(file, "{}", message).unwrap();
}
}
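To try the pattern without touching the file system, the sketch below swaps the `File` for an in-memory `Vec<u8>`, which also implements `std::io::Write`. The field name `log`, the constructor, and `log_contents` are assumptions made for this example:

```rust
use std::cell::RefCell;
use std::io::Write;

pub struct SpiderRobot {
    // Stand-in for `RefCell<File>`: a `Vec<u8>` is also a writer.
    log: RefCell<Vec<u8>>,
}

impl SpiderRobot {
    pub fn new() -> SpiderRobot {
        SpiderRobot { log: RefCell::new(Vec::new()) }
    }

    /// Write a line to the log through a shared reference.
    pub fn log(&self, message: &str) {
        let mut sink = self.log.borrow_mut();
        writeln!(sink, "{}", message).unwrap();
    }

    /// Copy out everything logged so far.
    pub fn log_contents(&self) -> String {
        let bytes = self.log.borrow();
        let text = String::from_utf8_lossy(&bytes).into_owned();
        text
    }
}
```

Each call to `log` holds its mutable borrow only until the method returns, so successive calls through `&self` never overlap.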
Cells are easy to use. Having to call `.get()` and `.set()` or `.borrow()` and `.borrow_mut()` is slightly awkward, but that’s just the price we pay for bending the rules. The other drawback is less obvious and more serious: cells, and any types that contain them, are not thread-safe. Rust therefore will not allow multiple threads to access them at once.