Enums and Patterns in Rust
C++ and C# have enums; you can use them to define your own type whose values are a set of named constants. Rust takes enums much further. A Rust enum can also contain data, even data of varying types. For example, Rust’s Result<String, io::Error> type is an enum; such a value is either an Ok value containing a String or an Err value containing an io::Error. This is beyond what C++ and C# enums can do. It’s more like a C union, but unlike unions, Rust enums are type-safe.
Enums are useful whenever a value might be either one thing or another. The “price” of using them is that you must access the data safely, using pattern matching.
Patterns, too, may be familiar if you’ve used unpacking in Python or destructuring in JavaScript, but Rust takes patterns further. Rust patterns are a little like regular expressions for all your data. They’re used to test whether or not a value has a particular desired shape. They can extract several fields from a struct or tuple into local variables all at once. And like regular expressions, they are concise, typically doing it all in a single line of code.
Enums
// C-style enums
enum Ordering {
Less,
Equal,
Greater,
}
- This declares a type Ordering with three possible values, called variants or constructors: Ordering::Less, Ordering::Equal, and Ordering::Greater. This particular enum is part of the standard library, so Rust code can import it, either by itself or with all its constructors:
use std::cmp::Ordering;
fn compare(n: i32, m: i32) -> Ordering {
    if n < m {
        Ordering::Less
    } else if n > m {
        Ordering::Greater
    } else {
        Ordering::Equal
    }
}
// Importing the constructors is less explicit; it’s generally considered better style not to
// import them except when it makes your code much more readable.
// `*` imports all the variants; `self` imports Ordering itself, which the return type uses.
use std::cmp::Ordering::{self, *};
fn compare(n: i32, m: i32) -> Ordering {
    if n < m {
        Less
    } else if n > m {
        Greater
    } else {
        Equal
    }
}
To import the constructors of an enum declared in the current module, use a self import:
enum Pet {
Orca,
Giraffe,
// ...
}
use self::Pet::*;
In memory, values of C-style enums are stored as integers. Rust will assign the numbers for you, starting at 0. Occasionally it’s useful to tell Rust which integers to use:
enum HttpStatus {
Ok = 200,
NotModified = 304,
NotFound = 404,
// ...
}
By default, Rust stores C-style enums using the smallest built-in integer type that can accommodate them. Most fit in a single byte:
use std::mem::size_of;
assert_eq!(size_of::<Ordering>(), 1);
assert_eq!(size_of::<HttpStatus>(), 2); // 404 doesn't fit in a u8
- You can override Rust’s choice of in-memory representation by adding a #[repr] attribute to the enum.
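As a minimal sketch of the #[repr] attribute, the two enums below pin the integer type used for storage. The names Compact and Wide and the helper repr_sizes are made up for illustration.

```rust
use std::mem::size_of;

// Hypothetical enums, used only to illustrate #[repr].
#[repr(u8)]
#[derive(Copy, Clone, Debug, PartialEq)]
enum Compact {
    A = 1,
    B = 2,
}

// Same variants, but stored as a 32-bit integer.
#[repr(u32)]
#[derive(Copy, Clone, Debug, PartialEq)]
enum Wide {
    A = 1,
    B = 2,
}

/// Returns the sizes, in bytes, of the two enums above.
fn repr_sizes() -> (usize, usize) {
    (size_of::<Compact>(), size_of::<Wide>())
}
```

With these declarations, Compact occupies one byte and Wide four, regardless of how small the variant values are.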
Casting a C-style enum to an integer is allowed. However, casting in the other direction, from the integer to the enum, is not. Unlike C and C++, Rust guarantees that an enum value is only ever one of the values spelled out in the enum declaration. An unchecked cast from an integer type to an enum type could break this guarantee, so it’s not allowed. You can either write your own checked conversion, or use the enum_primitive crate. It contains a macro that autogenerates this kind of conversion code for you.
assert_eq!(HttpStatus::Ok as i32, 200);
fn http_status_from_u32(n: u32) -> Option<HttpStatus> {
    match n {
        200 => Some(HttpStatus::Ok),
        304 => Some(HttpStatus::NotModified),
        404 => Some(HttpStatus::NotFound),
        // ...
        _ => None,
    }
}
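The same checked conversion can also be expressed through the standard TryFrom trait, so callers can write HttpStatus::try_from(n). This is only a sketch: the enum is cut down to three variants, and returning the rejected integer as the error value is an arbitrary choice made here.

```rust
use std::convert::TryFrom;

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum HttpStatus {
    Ok = 200,
    NotModified = 304,
    NotFound = 404,
}

impl TryFrom<u32> for HttpStatus {
    // Assumption for this sketch: hand the unrecognized integer back as the error.
    type Error = u32;

    fn try_from(n: u32) -> Result<Self, Self::Error> {
        match n {
            200 => Ok(HttpStatus::Ok),
            304 => Ok(HttpStatus::NotModified),
            404 => Ok(HttpStatus::NotFound),
            other => Err(other),
        }
    }
}
```

Because every integer goes through the match, no value outside the declared variants can ever be produced.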
As with structs, the compiler will implement features like the == operator for you, but you have to ask. Enums can have methods, just like structs.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum TimeUnit {
Seconds, Minutes, Hours, Days, Months, Years,
}
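A sketch of what such methods might look like for TimeUnit follows. The method names plural and singular match the calls that appear in the pattern-matching example later in these notes; the bodies are an assumption.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum TimeUnit {
    Seconds, Minutes, Hours, Days, Months, Years,
}

impl TimeUnit {
    /// Returns the plural noun for this time unit.
    fn plural(self) -> &'static str {
        match self {
            TimeUnit::Seconds => "seconds",
            TimeUnit::Minutes => "minutes",
            TimeUnit::Hours => "hours",
            TimeUnit::Days => "days",
            TimeUnit::Months => "months",
            TimeUnit::Years => "years",
        }
    }

    /// Returns the singular noun, derived by trimming the trailing "s".
    fn singular(self) -> &'static str {
        self.plural().trim_end_matches('s')
    }
}
```

Since TimeUnit is Copy, both methods can take self by value without consuming the caller’s copy.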
Enums with Data
/// A timestamp that has been deliberately rounded off, so our program
/// says "6 months ago" instead of "February 9, 2016, at 9:49 AM".
#[derive(Copy, Clone, Debug, PartialEq)]
enum RoughTime {
InThePast(TimeUnit, u32),
JustNow,
InTheFuture(TimeUnit, u32),
}
- The InThePast and InTheFuture variants take arguments. These are called tuple variants. Like tuple structs, these constructors are functions that create new RoughTime values:
let four_score_and_seven_years_ago =
    RoughTime::InThePast(TimeUnit::Years, 4 * 20 + 7);
let three_hours_from_now =
    RoughTime::InTheFuture(TimeUnit::Hours, 3);
Enums can also have struct variants, which contain named fields, just like ordinary structs:
enum Shape {
Sphere { center: Point3d, radius: f32 },
Cuboid { corner1: Point3d, corner2: Point3d },
}
let unit_sphere = Shape::Sphere {
center: ORIGIN,
radius: 1.0,
};
In all, Rust has three kinds of enum variant, echoing the three kinds of struct. Variants with no data correspond to unit-like structs. Tuple variants look and function just like tuple structs. Struct variants have curly braces and named fields. A single enum can have variants of all three kinds:
enum RelationshipStatus {
Single,
InARelationship,
ItsComplicated(Option<String>),
ItsExtremelyComplicated {
car: DifferentialEquation,
cdr: EarlyModernistPoem,
},
}
All constructors and fields of an enum share the same visibility as the enum itself.
Enums in Memory
In memory, enums with data are stored as a small integer tag, plus enough memory to hold all the fields of the largest variant. The tag field is for Rust’s internal use. It tells which constructor created the value and therefore which fields it has.
As of Rust 1.50, RoughTime fits in 8 bytes.
Rust makes no promises about enum layout, however, in order to leave the door open for future optimizations. In some cases, it would be possible to pack an enum more efficiently than the figure suggests. For instance, some generic enums can be stored without a tag at all.
Rich Data Structures Using Enums
Enums are also useful for quickly implementing tree-like data structures.
In memory, any JSON document can be represented as a value of this Rust type:
use std::collections::HashMap;
enum Json {
Null,
Boolean(bool),
Number(f64),
String(String),
Array(Vec<Json>),
Object(Box<HashMap<String, Json>>),
}
- A very similar enum can be found in serde_json, a serialization library for Rust structs.
- In memory, values of type Json take up four machine words. String and Vec values are three words, and Rust adds a tag byte. Null and Boolean values don’t have enough data in them to use up all that space, but all Json values must be the same size. The extra space goes unused.
- The Box around the HashMap that represents an Object serves only to make all Json values more compact. A HashMap is larger still. If we had to leave room for it in every Json value, they would be quite large, eight words or so. But a Box<HashMap> is a single word: it’s just a pointer to heap-allocated data. We could make Json even more compact by boxing more fields.
- What’s remarkable here is how easy it was to set this up.
Generic Enums
Enums can be generic. The syntax for generic enums is the same as for generic structs.
enum Option<T> {
None,
Some(T),
}
enum Result<T, E> {
Ok(T),
Err(E),
}
- Rust can eliminate the tag field of Option<T> when the type T is a reference, Box, or other smart pointer type. Since none of those pointer types is allowed to be zero, Rust can represent Option<Box<i32>>, say, as a single machine word: 0 for None and nonzero for Some pointer. This makes such Option types close analogues to C or C++ pointer values that could be null. The difference is that Rust’s type system requires you to check that an Option is Some before you can use its contents. This effectively eliminates null pointer dereferences.
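The tag elimination described above can be observed with size_of. The helper names below are hypothetical; the sketch only compares sizes and shows that the contents cannot be read without handling None.

```rust
use std::mem::size_of;

/// True when wrapping a pointer type in `Option` costs no extra space.
fn option_pointer_is_free() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
        && size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// The contents are reachable only after checking for `None`.
fn deref_or_zero(p: Option<Box<i32>>) -> i32 {
    match p {
        Some(boxed) => *boxed,
        None => 0,
    }
}
```

By contrast, an Option<i32> needs room for a tag, because every bit pattern of i32 is already a valid value.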
// An ordered collection of `T`s.
enum BinaryTree<T> {
Empty,
NonEmpty(Box<TreeNode<T>>),
}
// A part of a BinaryTree.
struct TreeNode<T> {
element: T,
left: BinaryTree<T>,
right: BinaryTree<T>,
}
- Each BinaryTree value is either Empty or NonEmpty. If it’s Empty, then it contains no data at all. If NonEmpty, then it has a Box, a pointer to a heap-allocated TreeNode.
- Each TreeNode value contains one actual element, as well as two more BinaryTree values. This means a tree can contain subtrees, and thus a NonEmpty tree can have any number of descendants.
- Rust eliminates the tag field, so a BinaryTree value is just one machine word.
The tag field of an enum costs a little memory, up to eight bytes in the worst case, but that is usually negligible. The real downside to enums is that Rust code cannot throw caution to the wind and try to access fields regardless of whether they are actually present in the value:
let r = shape.radius; // error: no field `radius` on type `Shape`
The only way to access the data in an enum is the safe way: using patterns.
Patterns
enum RoughTime {
InThePast(TimeUnit, u32),
JustNow,
InTheFuture(TimeUnit, u32),
}
Suppose you have a RoughTime value. You need to access the TimeUnit and u32 fields inside the value. Rust doesn’t let you access them directly, by writing rough_time.0 and rough_time.1, because after all, the value might be RoughTime::JustNow. Use a match expression instead:
// pattern matching with enums
fn rough_time_to_english(rt: RoughTime) -> String {
    match rt {
        RoughTime::InThePast(units, count) =>
            format!("{} {} ago", count, units.plural()),
        RoughTime::JustNow =>
            format!("just now"),
        RoughTime::InTheFuture(unit, 1) =>
            format!("a {} from now", unit.singular()),
        RoughTime::InTheFuture(units, count) =>
            format!("{} {} from now", count, units.plural()),
    }
}
- The patterns are the parts that appear before the => symbol. Patterns that match RoughTime values look just like the expressions used to create RoughTime values. This is no coincidence. Expressions produce values; patterns consume values. The two use a lot of the same syntax.
- Pattern matching an enum, struct, or tuple works as though Rust is doing a simple left-to-right scan, checking each component of the pattern to see if the value matches it. If it doesn’t, Rust moves on to the next pattern.
- When a pattern contains simple identifiers like units and count, those become local variables in the code following the pattern. Whatever is present in the value is copied or moved into the new variables.
Rust patterns are their own little language.
Literals, Variables, and Wildcards in Patterns
When you need something like a C switch statement, use match with an integer value. Integer literals like 0 and 1 can serve as patterns:
match meadow.count_rabbits() {
0 => {} // nothing to say
1 => println!("A rabbit is nosing around in the clover."),
n => println!("There are {} rabbits hopping about in the meadow", n),
}
- The third pattern, n, is just a variable name. It can match any value, and the matched value is moved or copied into a new local variable.
Other literals can be used as patterns too, including Booleans, characters, and even strings:
let calendar = match settings.get_string("calendar") {
"gregorian" => Calendar::Gregorian,
"chinese" => Calendar::Chinese,
"ethiopian" => Calendar::Ethiopian,
other => return parse_error("calendar", other),
};
other serves as a catchall pattern like n in the previous example. These patterns play the same role as a default case in a switch statement, matching values that don’t match any of the other patterns.
If you need a catchall pattern, but you don’t care about the matched value, you can use a single underscore _ as a pattern, the wildcard pattern:
let caption = match photo.tagged_pet() {
Pet::Tyrannosaur => "RRRAAAAAHHHHHH",
Pet::Samoyed => "*dog thoughts*",
_ => "I'm cute, love me", // generic caption, works for any pet
};
The wildcard pattern matches any value, but without storing it anywhere. Since Rust requires every match expression to handle all possible values, a wildcard is often required at the end. Even if you’re very sure the remaining cases can’t occur, you must at least add a fallback arm, perhaps one that panics:
// There are many Shapes, but we only support "selecting"
// either some text, or everything in a rectangular area.
// You can't select an ellipse or trapezoid.
match document.selection() {
Shape::TextSpan(start, end) => paint_text_selection(start, end),
Shape::Rectangle(rect) => paint_rect_selection(rect),
_ => panic!("unexpected selection type"),
}
Tuple and Struct Patterns
Tuple patterns match tuples. They’re useful any time you want to get multiple pieces of data involved in a single match:
fn describe_point(x: i32, y: i32) -> &'static str {
use std::cmp::Ordering::*;
match (x.cmp(&0), y.cmp(&0)) {
(Equal, Equal) => "at the origin",
(_, Equal) => "on the x axis",
(Equal, _) => "on the y axis",
(Greater, Greater) => "in the first quadrant",
(Less, Greater) => "in the second quadrant",
_ => "somewhere else",
}
}
Struct patterns use curly braces, just like struct expressions. They contain a subpattern for each field:
match balloon.location {
Point { x: 0, y: height } =>
println!("straight up {} meters", height),
Point { x: x, y: y } =>
println!("at ({}m, {}m)", x, y),
}
- Patterns like Point { x: x, y: y } are common when matching structs, and the redundant names are visual clutter, so Rust has a shorthand for this: Point {x, y}. The meaning is the same. This pattern still stores a point’s x field in a new local x and its y field in a new local y.
Even with the shorthand, it is cumbersome to match a large struct when we only care about a few fields. To avoid this, use .. to tell Rust you don’t care about any of the other fields:
Some(Account { name, language, .. }) =>
language.show_custom_greeting(name),
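To see the .. shorthand in a complete function, here is a small sketch. The Account field set and the greeting function are assumptions made for illustration; only name and language come from the fragment above.

```rust
// Hypothetical struct; `id` and `balance_cents` exist only to be ignored.
struct Account {
    name: String,
    language: String,
    id: u32,
    balance_cents: i64,
}

fn greeting(account: Option<Account>) -> String {
    match account {
        // Bind two fields by name and skip every other field with `..`.
        Some(Account { name, language, .. }) => {
            format!("Hello, {} ({})", name, language)
        }
        None => String::from("Hello, guest"),
    }
}
```

Adding more fields to Account later would not require touching this match, since .. already covers them.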
Array and Slice Patterns
Array patterns match arrays. They’re often used to filter out some special-case values and are useful any time you’re working with arrays whose values have a different meaning based on position.
fn hsl_to_rgb(hsl: [u8; 3]) -> [u8; 3] {
match hsl {
[_, _, 0] => [0, 0, 0],
[_, _, 255] => [255, 255, 255],
// ...
}
}
Slice patterns are similar, but unlike arrays, slices have variable lengths, so slice patterns match not only on values but also on length. .. in a slice pattern matches any number of elements:
fn greet_people(names: &[&str]) {
match names {
[] => { println!("Hello, nobody.") },
[a] => { println!("Hello, {}.", a) },
[a, b] => { println!("Hello, {} and {}.", a, b) },
[a, .., b] => { println!("Hello, everyone from {} to {}.", a, b) }
}
}
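Slice patterns can also bind the skipped elements to a name by writing name @ .. inside the brackets. The two functions below are hypothetical helpers, sketched only to show the syntax on a slice of integers.

```rust
/// Sums a slice recursively by splitting off the first element.
fn sum(values: &[i32]) -> i32 {
    match values {
        [] => 0,
        // `rest` is bound to the subslice after the first element.
        [first, rest @ ..] => *first + sum(rest),
    }
}

/// Returns the first and last elements, if the slice is not empty.
fn ends(values: &[i32]) -> Option<(i32, i32)> {
    match values {
        [] => None,
        [only] => Some((*only, *only)),
        [first, .., last] => Some((*first, *last)),
    }
}
```

Because the patterns also match on length, the compiler can check that the empty, one-element, and longer cases are all covered.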
Reference Patterns
Rust patterns support two features for working with references. ref patterns borrow parts of a matched value. & patterns match references.
Matching a noncopyable value moves the value.
match account {
Account { name, language, .. } => {
ui.greet(&name, &language);
ui.show_settings(&account); // error: borrow of moved value: `account`
}
}
- The fields account.name and account.language are moved into local variables name and language. The rest of account is dropped. That’s why we can’t borrow a reference to it afterward.
We need a kind of pattern that borrows matched values instead of moving them. The ref keyword does just that:
match account {
Account { ref name, ref language, .. } => {
ui.greet(name, language);
ui.show_settings(&account);
}
}
Use ref mut to borrow mut references:
match line_result {
Err(ref err) => log_error(err), // `err` is &Error (shared ref)
Ok(ref mut line) => { // `line` is &mut String (mut ref)
trim_comments(line); // modify the String in place
handle(line);
}
}
- The pattern Ok(ref mut line) matches any success result and borrows a mut reference to the success value stored inside it.
A pattern starting with & matches a reference:
match sphere.center() {
&Point3d { x, y, z } => // ...
}
- Suppose sphere.center() returns a reference to a private field of sphere, a common pattern in Rust. The value returned is the address of a Point3d. If the center is at the origin, then sphere.center() returns &Point3d { x: 0.0, y: 0.0, z: 0.0 }.
- This is a bit tricky because Rust is following a pointer here, an action we usually associate with the * operator, not the & operator. The thing to remember is that patterns and expressions are natural opposites.
  - The expression (x, y) makes two values into a new tuple, but the pattern (x, y) does the opposite: it matches a tuple and breaks out the two values.
  - It’s the same with &. In an expression, & creates a reference. In a pattern, & matches a reference.
- Lifetimes are enforced. You can’t get mut access via a shared reference. And you can’t move a value out of a reference, even a mut reference. When we match &Point3d { x, y, z }, the variables x, y, and z receive copies of the coordinates, leaving the original Point3d value intact. It works because those fields are copyable. If we try the same thing on a struct with noncopyable fields, we’ll get an error:
match friend.borrow_car() {
    Some(&Car { engine, .. }) => // error: can't move out of borrow
    Some(&Car { ref engine, .. }) => // ok, engine is a reference
    // ...
    None => {}
}
- You can use a ref pattern to borrow a reference to a part. You just don’t own it.
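Both kinds of reference pattern can be tried in a compilable sketch. The definitions of Point3d, Sphere, and Car below are assumptions (the notes never show them), as are the helper functions center_sum and engine_name.

```rust
#[derive(Debug, PartialEq)]
struct Point3d { x: f32, y: f32, z: f32 }

// Hypothetical type with a private-style field exposed through an accessor.
struct Sphere { center: Point3d, radius: f32 }

impl Sphere {
    fn center(&self) -> &Point3d { &self.center }
}

/// `&` pattern: copies the three f32 fields out from behind the reference.
fn center_sum(sphere: &Sphere) -> f32 {
    match sphere.center() {
        &Point3d { x, y, z } => x + y + z,
    }
}

// Hypothetical type with a noncopyable field.
struct Car { engine: String }

/// `ref` pattern: borrows the String instead of trying to move it.
fn engine_name(car: Option<&Car>) -> &str {
    match car {
        Some(&Car { ref engine }) => engine.as_str(),
        None => "no car",
    }
}
```

Removing ref from the Car pattern would make the compiler reject the function, because the String would have to be moved out of a shared reference.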
Match Guards
Sometimes a match arm has additional conditions that must be met before it can be considered a match.
Suppose we’re implementing a board game with hexagonal spaces, and the player just clicked to move a piece. To confirm that the click was valid, we might try something like this:
fn check_move(current_hex: Hex, click: Point) -> game::Result<Hex> {
    match point_to_hex(click) {
        None =>
            Err("That's not a game space."),
        Some(current_hex) => // try to match if user clicked the current_hex (it doesn't work)
            Err("You are already there! You must click somewhere else."),
        Some(other_hex) =>
            Ok(other_hex)
    }
}
- This fails because identifiers in patterns introduce new variables. The pattern Some(current_hex) here creates a new local variable current_hex, shadowing the argument current_hex. The last arm of the match is unreachable. One way to fix this is simply to use an if expression in the match arm.
- Rust also provides match guards, extra conditions that must be true in order for a match arm to apply, written as if CONDITION, between the pattern and the arm’s => token. If the pattern matches, but the condition is false, matching continues with the next arm.
match point_to_hex(click) {
    None => Err("That's not a game space."),
    Some(hex) if hex == current_hex =>
        Err("You are already there! You must click somewhere else"),
    Some(hex) => Ok(hex)
}
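A self-contained version of the guarded match might look like the sketch below. Hex is reduced to a pair of coordinates and the click is assumed to be already converted to an Option<Hex>; both simplifications are assumptions, since the game types are not shown in these notes.

```rust
// Hypothetical stand-in: a hex is identified by two integer coordinates.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Hex(i32, i32);

fn check_move(current_hex: Hex, clicked: Option<Hex>) -> Result<Hex, &'static str> {
    match clicked {
        None => Err("That's not a game space."),
        // The guard compares against the argument instead of shadowing it.
        Some(hex) if hex == current_hex => {
            Err("You are already there! You must click somewhere else.")
        }
        Some(hex) => Ok(hex),
    }
}
```

When the guard is false, the value falls through to the final Some(hex) arm, so that arm is now reachable.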
Matching Multiple Possibilities
The vertical bar (|) can be used to combine several patterns in a single match arm:
let at_end = match chars.peek() {
Some(&'\r') | Some(&'\n') | None => true,
_ => false,
};
In an expression, | is the bitwise OR operator, but here it works more like the | symbol in a regular expression.
Use ..= to match a whole range of values. Range patterns include the begin and end values, so '0' ..= '9' matches all the ASCII digits:
match next_char {
'0'..='9' => self.read_number(),
'a'..='z' | 'A'..='Z' => self.read_word(),
' ' | '\t' | '\n' => self.skip_whitespace(),
_ => self.handle_punctuation(),
}
Binding with @ Patterns
x @ pattern matches exactly like the given pattern, but on success, instead of creating variables for parts of the matched value, it creates a single variable x and moves or copies the whole value into it.
match self.get_selection() {
Shape::Rect(top_left, bottom_right) => {
optimized_paint(&Shape::Rect(top_left, bottom_right))
}
other_shape => {
paint_outline(other_shape.get_outline())
}
}
- The first case unpacks a Shape::Rect value, only to rebuild an identical Shape::Rect value on the next line. This can be rewritten to use an @ pattern:
rect @ Shape::Rect(..) => {
    optimized_paint(&rect)
}
@ patterns are also useful with ranges:
match chars.next() {
Some(digit @ '0'..='9') => read_number(digit, chars),
// ...
},
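The range form of @ can be exercised in isolation. The two functions below are hypothetical and stand in for the parser code hinted at above.

```rust
/// Converts an ASCII digit to its numeric value; anything else yields None.
fn digit_value(c: Option<char>) -> Option<u32> {
    match c {
        // `digit` holds the whole matched char, not a part of it.
        Some(digit @ '0'..='9') => digit.to_digit(10),
        _ => None,
    }
}

/// Describes a number, keeping the matched value available by name.
fn describe(n: u32) -> String {
    match n {
        0 => "zero".to_string(),
        small @ 1..=9 => format!("single digit {}", small),
        big => format!("large number {}", big),
    }
}
```

Without @, the range arm could test membership but would have no name for the value that matched.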
Where Patterns Are Allowed
Although patterns are most prominent in match expressions, they are also allowed in several other places, typically in place of an identifier. The meaning is always the same: instead of just storing a value in a single variable, Rust uses pattern matching to take the value apart.
// ...unpack a struct into three new local variables
let Track { album, track_number, title, .. } = song;
// ...unpack a function argument that's a tuple
fn distance_to((x, y): (f64, f64)) -> f64 { ... }
// ...iterate over keys and values of a HashMap
for (id, document) in &cache_map {
println!("Document #{}: {}", id, document.title);
}
// ...automatically dereference an argument to a closure
// (handy because sometimes other code passes you a reference
// when you'd rather have a copy)
let sum = numbers.fold(0, |a, &num| a + num);
- Each of these saves two or three lines of boilerplate code. The same concept exists in some other languages: in JavaScript, it’s called destructuring, while in Python, it’s unpacking.
- In all four examples, we use patterns that are guaranteed to match. The pattern Track { album, track_number, title, .. } matches every possible value of the Track struct type, (x, y) matches any (f64, f64) pair, and so on. Patterns that always match are special in Rust. They’re called irrefutable patterns, and they’re the only patterns allowed in the four places shown here (after let, in function arguments, after for, and in closure arguments).
- A refutable pattern is one that might not match, like Ok(x), which doesn’t match an error result, or '0' ..= '9', which doesn’t match the character 'Q'. Refutable patterns can be used in match arms, because match is designed for them: if one pattern fails to match, it’s clear what happens next. The four preceding examples are places in Rust programs where a pattern can be handy, but the language doesn’t allow for match failure.
Refutable patterns are also allowed in if let and while let expressions:
// ...handle just one enum variant specially
if let RoughTime::InTheFuture(_, _) = user.date_of_birth() {
user.set_time_traveler(true);
}
// ...run some code only if a table lookup succeeds
if let Some(document) = cache_map.get(&id) {
return send_cached_response(document);
}
// ...repeatedly try something until it succeeds
while let Err(err) = present_cheesy_anti_robot_task() {
log_robot_attempt(err);
// let the user try again (it might still be a human)
}
// ...manually loop over an iterator
while let Some(_) = lines.peek() {
read_paragraph(&mut lines);
}
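The fragments above depend on types that are not shown, so here is a small sketch of if let and while let using only standard collections. The function names lookup_len and drain_sum are made up for illustration.

```rust
use std::collections::HashMap;

/// Returns the length of a cached document, or 0 if the lookup fails.
fn lookup_len(cache: &HashMap<u32, String>, id: u32) -> usize {
    // Run the body only when the refutable pattern matches.
    if let Some(document) = cache.get(&id) {
        return document.len();
    }
    0
}

/// Pops values off a stack until it is empty, summing them.
fn drain_sum(stack: &mut Vec<i32>) -> i32 {
    let mut total = 0;
    // The loop ends the first time `pop` returns None.
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}
```

In both functions the failed-match case simply skips the body, which is why a refutable pattern is acceptable here but not after a plain let.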
Populating a Binary Tree
impl<T: Ord> BinaryTree<T> {
    fn add(&mut self, value: T) {
        match *self {
            BinaryTree::Empty => {
                *self = BinaryTree::NonEmpty(Box::new(TreeNode {
                    element: value,
                    left: BinaryTree::Empty,
                    right: BinaryTree::Empty,
                }))
            }
            BinaryTree::NonEmpty(ref mut node) => {
                if value <= node.element {
                    node.left.add(value);
                } else {
                    node.right.add(value);
                }
            }
        }
    }
}
let mut tree = BinaryTree::Empty;
tree.add("Mercury");
tree.add("Venus");
// ...
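To check that add keeps the tree ordered, the sketch below repeats the definitions and adds an in-order traversal. The walk method is an addition made for this sketch and is not part of the code above; it clones each element into a Vec.

```rust
enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>),
}

struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>,
}

impl<T: Ord> BinaryTree<T> {
    fn add(&mut self, value: T) {
        match *self {
            BinaryTree::Empty => {
                *self = BinaryTree::NonEmpty(Box::new(TreeNode {
                    element: value,
                    left: BinaryTree::Empty,
                    right: BinaryTree::Empty,
                }))
            }
            BinaryTree::NonEmpty(ref mut node) => {
                if value <= node.element {
                    node.left.add(value);
                } else {
                    node.right.add(value);
                }
            }
        }
    }
}

impl<T: Clone> BinaryTree<T> {
    /// Hypothetical helper: visits left subtree, element, right subtree.
    fn walk(&self) -> Vec<T> {
        match *self {
            BinaryTree::Empty => vec![],
            BinaryTree::NonEmpty(ref node) => {
                let mut result = node.left.walk();
                result.push(node.element.clone());
                result.extend(node.right.walk());
                result
            }
        }
    }
}
```

Whatever order the values are added in, walk returns them sorted, because smaller-or-equal values always go to the left subtree.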
The Big Picture
For a programming language designer, combining variants, references, mutability, and memory safety is extremely challenging. Functional programming languages dispense with mutability. C unions, by contrast, have variants, pointers, and mutability, but are so spectacularly unsafe that even in C, they’re a last resort. Rust’s borrow checker is the magic that makes it possible to combine all four without compromise.
Programming is data processing. Getting data into the right shape can be the difference between a small, fast, elegant program and a slow, gigantic tangle of duct tape and virtual method calls. This is the problem space enums address.
Enums are a design tool for getting data into the right shape. For cases when a value may be one thing, or another thing, or perhaps nothing at all, enums are better than class hierarchies on every axis: faster, safer, less code, easier to document.
The limiting factor is flexibility. End users of an enum can’t extend it to add new variants. Variants can be added only by changing the enum declaration. And when that happens, existing code breaks. Every match expression that individually matches each variant of the enum must be revisited—it needs a new arm to handle the new variant.
- In some cases, trading flexibility for simplicity is just good sense. After all, the structure of JSON is not expected to change.
- And in some cases, revisiting all uses of an enum when it changes is exactly what we want. For example, when an enum is used in a compiler to represent the various operators of a programming language, adding a new operator should involve touching all code that handles operators.
But sometimes more flexibility is needed. For those situations, Rust has traits.