C++ and C# have enums; you can use them to define your own type whose values are a set of named constants. Rust takes enums much further. A Rust enum can also contain data, even data of varying types. For example, Rust’s Result<String, io::Error> type is an enum; such a value is either an Ok value containing a String or an Err value containing an io::Error. This is beyond what C++ and C# enums can do. It’s more like a C union, but unlike unions, Rust enums are type-safe.

Enums are useful whenever a value might be either one thing or another. The “price” of using them is that you must access the data safely, using pattern matching.

Patterns, too, may be familiar if you’ve used unpacking in Python or destructuring in JavaScript, but Rust takes patterns further. Rust patterns are a little like regular expressions for all your data. They’re used to test whether or not a value has a particular desired shape. They can extract several fields from a struct or tuple into local variables all at once. And like regular expressions, they are concise, typically doing it all in a single line of code.

Enums

// C-style enums

enum Ordering {
    Less,
    Equal,
    Greater,
}
  • This declares a type Ordering with three possible values, called variants or constructors: Ordering::Less, Ordering::Equal, and Ordering::Greater. This particular enum is part of the standard library, so Rust code can import it, either by itself or with all its constructors:

    use std::cmp::Ordering;
    
    fn compare(n: i32, m: i32) -> Ordering {
        if n < m {
            Ordering::Less
        } else if n > m {
            Ordering::Greater
        } else {
            Ordering::Equal
        }
    }
    
    // Importing the constructors is less explicit. It’s generally considered
    // better style not to import them except when it makes your code much
    // more readable.
    use std::cmp::Ordering::{self, *}; // `*` to import all children
    
    // The return type `-> Ordering` relies on the `self` import.
    fn compare(n: i32, m: i32) -> Ordering {
        if n < m {
            Less
        } else if n > m {
            Greater
        } else {
            Equal
        }
    }
    

To import the constructors of an enum declared in the current module, use a self import:

enum Pet {
    Orca,
    Giraffe,
    // ...
}

use self::Pet::*;

In memory, values of C-style enums are stored as integers. Rust will assign the numbers for you, starting at 0. Occasionally it’s useful to tell Rust which integers to use:

enum HttpStatus {
    Ok = 200,
    NotModified = 304,
    NotFound = 404,
    // ...
}

By default, Rust stores C-style enums using the smallest built-in integer type that can accommodate them. Most fit in a single byte:

use std::mem::size_of;

assert_eq!(size_of::<Ordering>(), 1);
assert_eq!(size_of::<HttpStatus>(), 2); // 404 doesn't fit in a u8
  • You can override Rust’s choice of in-memory representation by adding a #[repr] attribute to the enum.
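As a rough illustration of the #[repr] attribute, the sketch below declares two hypothetical enums, Flag and Direction (neither appears in the text above), and reports their sizes. The helper enum_sizes is likewise only an illustrative name.

```rust
use std::mem::size_of;

// Hypothetical enum: two variants would fit in one byte by default,
// but #[repr(u16)] asks Rust to store the value as a u16.
#[allow(dead_code)]
#[repr(u16)]
enum Flag {
    Off,
    On,
}

// Hypothetical enum pinned to a single byte, for instance to match
// a one-byte field in some external format.
#[allow(dead_code)]
#[repr(u8)]
enum Direction {
    North = 1,
    East = 2,
    South = 3,
    West = 4,
}

/// Report the in-memory sizes, in bytes, of Flag and Direction.
fn enum_sizes() -> (usize, usize) {
    (size_of::<Flag>(), size_of::<Direction>())
}
```

With these attributes, enum_sizes() returns (2, 1): Flag is widened to two bytes even though one would do, and Direction stays at one.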

Casting a C-style enum to an integer is allowed. However, casting in the other direction, from the integer to the enum, is not. Unlike C and C++, Rust guarantees that an enum value is only ever one of the values spelled out in the enum declaration. An unchecked cast from an integer type to an enum type could break this guarantee, so it’s not allowed. You can either write your own checked conversion, or use the enum_primitive crate. It contains a macro that autogenerates this kind of conversion code for you.

assert_eq!(HttpStatus::Ok as i32, 200);

fn http_status_from_u32(n: u32) -> Option<HttpStatus> {
    match n {
        200 => Some(HttpStatus::Ok),
        304 => Some(HttpStatus::NotModified),
        404 => Some(HttpStatus::NotFound),
        // ...
        _ => None,
    }
}
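One possible variation on the checked conversion above is to implement the standard TryFrom trait, so callers can write HttpStatus::try_from(n). This is only a sketch: the choice of u32 as the error type (handing back the rejected number) is an assumption made for illustration, not something the text prescribes.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HttpStatus {
    Ok = 200,
    NotModified = 304,
    NotFound = 404,
}

impl TryFrom<u32> for HttpStatus {
    // Illustrative choice: return the number that could not be converted.
    type Error = u32;

    fn try_from(n: u32) -> Result<Self, Self::Error> {
        // Inside this function, a bare `Ok` is Result::Ok;
        // the enum variant is always written HttpStatus::Ok.
        match n {
            200 => Ok(HttpStatus::Ok),
            304 => Ok(HttpStatus::NotModified),
            404 => Ok(HttpStatus::NotFound),
            other => Err(other),
        }
    }
}
```

HttpStatus::try_from(404u32) yields Ok(HttpStatus::NotFound), while an unlisted number such as 500 comes back as Err(500). The cast in the other direction, HttpStatus::NotFound as u32, needs no checking.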

As with structs, the compiler will implement features like the == operator for you, but you have to ask. Enums can have methods, just like structs.

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum TimeUnit {
    Seconds, Minutes, Hours, Days, Months, Years,
}
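To make "enums can have methods" concrete, here is a minimal sketch of two methods on TimeUnit. The names plural and singular match the calls used in a later example in these notes; the bodies shown here are one plausible way to write them, not a definitive implementation.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum TimeUnit {
    Seconds, Minutes, Hours, Days, Months, Years,
}

impl TimeUnit {
    /// Return the plural noun for this time unit.
    fn plural(self) -> &'static str {
        match self {
            TimeUnit::Seconds => "seconds",
            TimeUnit::Minutes => "minutes",
            TimeUnit::Hours => "hours",
            TimeUnit::Days => "days",
            TimeUnit::Months => "months",
            TimeUnit::Years => "years",
        }
    }

    /// Return the singular noun for this time unit,
    /// obtained by trimming the trailing 's' from the plural.
    fn singular(self) -> &'static str {
        self.plural().trim_end_matches('s')
    }
}
```

Because TimeUnit derives Copy, both methods can take self by value. The derived PartialEq is what makes TimeUnit::Days == TimeUnit::Days compile.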

Enums with Data

/// A timestamp that has been deliberately rounded off, so our program
/// says "6 months ago" instead of "February 9, 2016, at 9:49 AM".
#[derive(Copy, Clone, Debug, PartialEq)]
enum RoughTime {
    InThePast(TimeUnit, u32),
    JustNow,
    InTheFuture(TimeUnit, u32),
}
  • Two of the variants, InThePast and InTheFuture, take arguments. These are called tuple variants. Like tuple structs, these constructors are functions that create new RoughTime values:

    let four_score_and_seven_years_ago =
        RoughTime::InThePast(TimeUnit::Years, 4 * 20 + 7);
    
    let three_hours_from_now =
        RoughTime::InTheFuture(TimeUnit::Hours, 3);
    

Enums can also have struct variants, which contain named fields, just like ordinary structs:

enum Shape {
    Sphere { center: Point3d, radius: f32 },
    Cuboid { corner1: Point3d, corner2: Point3d },
}

let unit_sphere = Shape::Sphere {
    center: ORIGIN,
    radius: 1.0,
};

In all, Rust has three kinds of enum variant, echoing the three kinds of struct. Variants with no data correspond to unit-like structs. Tuple variants look and function just like tuple structs. Struct variants have curly braces and named fields. A single enum can have variants of all three kinds:

enum RelationshipStatus {
    Single,
    InARelationship,
    ItsComplicated(Option<String>),
    ItsExtremelyComplicated {
        car: DifferentialEquation,
        cdr: EarlyModernistPoem,
    },
}

All constructors and fields of an enum share the same visibility as the enum itself.

Enums in Memory

In memory, enums with data are stored as a small integer tag, plus enough memory to hold all the fields of the largest variant. The tag field is for Rust’s internal use. It tells which constructor created the value and therefore which fields it has.

As of Rust 1.50, RoughTime fits in 8 bytes:

[Figure: RoughTime values in memory]

Rust makes no promises about enum layout, however, in order to leave the door open for future optimizations. In some cases, it would be possible to pack an enum more efficiently than the figure suggests. For instance, some generic enums can be stored without a tag at all.

Rich Data Structures Using Enums

Enums are also useful for quickly implementing tree-like data structures.

In memory, any JSON document can be represented as a value of this Rust type:

use std::collections::HashMap;

enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}
[Figure: Json values in memory]
  • A very similar enum can be found in serde_json, a serialization library for Rust structs.
  • In memory, values of type Json take up four machine words.
    • String and Vec values are three words, and Rust adds a tag byte.
    • Null and Boolean values don’t have enough data in them to use up all that space, but all Json values must be the same size. The extra space goes unused.
  • The Box around the HashMap that represents an Object serves only to make all Json values more compact. A HashMap is larger still. If we had to leave room for it in every Json value, they would be quite large, eight words or so. But a Box<HashMap> is a single word: it’s just a pointer to heap-allocated data. We could make Json even more compact by boxing more fields.
  • What’s remarkable here is how easy it was to set this up.
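To give a feel for how little code such a tree needs, the sketch below builds a small document and walks it recursively. The functions sample_document and count_values are hypothetical helpers invented for this illustration, and the match expression they use is explained later in these notes.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Json {
    Null,
    Boolean(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Box<HashMap<String, Json>>),
}

/// Build {"name": "Rust", "stable": true, "tags": ["enum", "pattern"]}.
fn sample_document() -> Json {
    let mut fields = HashMap::new();
    fields.insert("name".to_string(), Json::String("Rust".to_string()));
    fields.insert("stable".to_string(), Json::Boolean(true));
    fields.insert(
        "tags".to_string(),
        Json::Array(vec![
            Json::String("enum".to_string()),
            Json::String("pattern".to_string()),
        ]),
    );
    Json::Object(Box::new(fields))
}

/// Count every Json value in a document, including nested ones.
fn count_values(json: &Json) -> usize {
    match json {
        Json::Array(items) => 1 + items.iter().map(count_values).sum::<usize>(),
        Json::Object(fields) => 1 + fields.values().map(count_values).sum::<usize>(),
        _ => 1, // Null, Boolean, Number, and String have no children
    }
}
```

For the sample document, count_values returns 6: the object itself, its three field values, and the two strings inside the array.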

Generic Enums

Enums can be generic. The syntax for generic enums is the same as for generic structs.

enum Option<T> {
    None,
    Some(T),
}

enum Result<T, E> {
    Ok(T),
    Err(E),
}
  • Rust can eliminate the tag field of Option<T> when the type T is a reference, Box, or other smart pointer type. Since none of those pointer types is allowed to be zero, Rust can represent Option<Box<i32>>, say, as a single machine word: 0 for None and nonzero for Some pointer. This makes such Option types close analogues to C or C++ pointer values that could be null. The difference is that Rust’s type system requires you to check that an Option is Some before you can use its contents. This effectively eliminates null pointer dereferences.
// An ordered collection of `T`s.
enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>),
}

// A part of a BinaryTree.
struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>,
}
  • Each BinaryTree value is either Empty or NonEmpty. If it’s Empty, then it contains no data at all. If NonEmpty, then it has a Box, a pointer to a heap-allocated TreeNode.
  • Each TreeNode value contains one actual element, as well as two more BinaryTree values. This means a tree can contain subtrees, and thus a NonEmpty tree can have any number of descendants.
  • Rust eliminates the tag field, so a BinaryTree value is just one machine word.
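The size claims above can be checked with std::mem::size_of. In this sketch, pointer_like_sizes is a hypothetical helper name. The Option<Box<i32>> result is documented behavior; the BinaryTree result reflects what current compilers do and, as noted earlier, is not a layout promise.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>),
}

#[allow(dead_code)]
struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>,
}

/// Sizes in bytes of: a machine word, Option<Box<i32>>, BinaryTree<i32>.
fn pointer_like_sizes() -> (usize, usize, usize) {
    (
        size_of::<usize>(),
        size_of::<Option<Box<i32>>>(),
        size_of::<BinaryTree<i32>>(),
    )
}
```

On a typical 64-bit target all three numbers come out as 8, meaning neither enum spends any space on a tag.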

The tag field of an enum costs a little memory, up to eight bytes in the worst case, but that is usually negligible. The real downside to enums is that Rust code cannot throw caution to the wind and try to access fields regardless of whether they are actually present in the value:

let r = shape.radius; // error: no field `radius` on type `Shape`

The only way to access the data in an enum is the safe way: using patterns.
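As a small preview of what the safe way looks like, the sketch below reads the radius out of a Shape. The Point3d definition, the ORIGIN constant, and the radius_of helper are assumptions filled in for illustration, since the notes use Point3d and ORIGIN without defining them.

```rust
// Assumed definition; the notes use Point3d without showing it.
#[derive(Debug, Clone, Copy, PartialEq)]
#[allow(dead_code)]
struct Point3d {
    x: f32,
    y: f32,
    z: f32,
}

const ORIGIN: Point3d = Point3d { x: 0.0, y: 0.0, z: 0.0 };

#[allow(dead_code)]
enum Shape {
    Sphere { center: Point3d, radius: f32 },
    Cuboid { corner1: Point3d, corner2: Point3d },
}

/// Return the radius if the shape is a sphere, and None otherwise.
fn radius_of(shape: &Shape) -> Option<f32> {
    match shape {
        // Only this arm can see `radius`, because only here is the
        // value known to be a Sphere.
        Shape::Sphere { radius, .. } => Some(*radius),
        Shape::Cuboid { .. } => None,
    }
}
```

The caller gets an Option back, so the "no radius" case has to be handled explicitly rather than being read from memory that might hold a Cuboid.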

Patterns

enum RoughTime {
    InThePast(TimeUnit, u32),
    JustNow,
    InTheFuture(TimeUnit, u32),
}

Suppose you have a RoughTime value. You need to access the TimeUnit and u32 fields inside the value. Rust doesn’t let you access them directly, by writing rough_time.0 and rough_time.1, because after all, the value might be RoughTime::JustNow. Use a match expression instead:

// pattern matching with enums
fn rough_time_to_english(rt: RoughTime) -> String {
    match rt {
        RoughTime::InThePast(units, count) =>
            format!("{} {} ago", count, units.plural()),
        RoughTime::JustNow =>
            format!("just now"),
        RoughTime::InTheFuture(unit, 1) =>
            format!("a {} from now", unit.singular()),
        RoughTime::InTheFuture(units, count) =>
            format!("{} {} from now", count, units.plural()),
    }
}
  • The patterns are the parts that appear before the => symbol. Patterns that match RoughTime values look just like the expressions used to create RoughTime values. This is no coincidence. Expressions produce values; patterns consume values. The two use a lot of the same syntax.
  • Pattern matching an enum, struct, or tuple works as though Rust is doing a simple left-to-right scan, checking each component of the pattern to see if the value matches it. If it doesn’t, Rust moves on to the next pattern.
  • When a pattern contains simple identifiers like units and count, those become local variables in the code following the pattern. Whatever is present in the value is copied or moved into the new variables.

Rust patterns are their own little language.

Literals, Variables, and Wildcards in Patterns

When you need something like a C switch statement, use match with an integer value. Integer literals like 0 and 1 can serve as patterns:

match meadow.count_rabbits() {
    0 => {} // nothing to say
    1 => println!("A rabbit is nosing around in the clover."),
    n => println!("There are {} rabbits hopping about in the meadow", n),
}
  • The third pattern, n, is just a variable name. It can match any value, and the matched value is moved or copied into a new local variable.

Other literals can be used as patterns too, including Booleans, characters, and even strings:

let calendar = match settings.get_string("calendar") {
    "gregorian" => Calendar::Gregorian,
    "chinese" => Calendar::Chinese,
    "ethiopian" => Calendar::Ethiopian,
    other => return parse_error("calendar", other),
};
  • other serves as a catchall pattern like n in the previous example. These patterns play the same role as a default case in a switch statement, matching values that don’t match any of the other patterns.

If you need a catchall pattern, but you don’t care about the matched value, you can use a single underscore _ as a pattern, the wildcard pattern:

let caption = match photo.tagged_pet() {
    Pet::Tyrannosaur => "RRRAAAAAHHHHHH",
    Pet::Samoyed => "*dog thoughts*",
    _ => "I'm cute, love me", // generic caption, works for any pet
};

The wildcard pattern matches any value, but without storing it anywhere. Since Rust requires every match expression to handle all possible values, a wildcard is often required at the end. Even if you’re very sure the remaining cases can’t occur, you must at least add a fallback arm, perhaps one that panics:

// There are many Shapes, but we only support "selecting"
// either some text, or everything in a rectangular area.
// You can't select an ellipse or trapezoid.
match document.selection() {
    Shape::TextSpan(start, end) => paint_text_selection(start, end),
    Shape::Rectangle(rect) => paint_rect_selection(rect),
    _ => panic!("unexpected selection type"),
}

Tuple and Struct Patterns

Tuple patterns match tuples. They’re useful any time you want to get multiple pieces of data involved in a single match:

fn describe_point(x: i32, y: i32) -> &'static str {
    use std::cmp::Ordering::*;
    match (x.cmp(&0), y.cmp(&0)) {
        (Equal, Equal) => "at the origin",
        (_, Equal) => "on the x axis",
        (Equal, _) => "on the y axis",
        (Greater, Greater) => "in the first quadrant",
        (Less, Greater) => "in the second quadrant",
        _ => "somewhere else",
    }
}

Struct patterns use curly braces, just like struct expressions. They contain a subpattern for each field:

match balloon.location {
    Point { x: 0, y: height } =>
        println!("straight up {} meters", height),
    Point { x: x, y: y } =>
        println!("at ({}m, {}m)", x, y),
}
  • Patterns like Point { x: x, y: y } are common when matching structs, and the redundant names are visual clutter, so Rust has a shorthand for this: Point {x, y}. The meaning is the same. This pattern still stores a point’s x field in a new local x and its y field in a new local y.

Even with the shorthand, it is cumbersome to match a large struct when we only care about a few fields. To avoid this, use .. to tell Rust you don’t care about any of the other fields:

Some(Account { name, language, .. }) =>
    language.show_custom_greeting(name),

Array and Slice Patterns

Array patterns match arrays. They’re often used to filter out some special-case values and are useful any time you’re working with arrays whose values have a different meaning based on position.

fn hsl_to_rgb(hsl: [u8; 3]) -> [u8; 3] {
    match hsl {
        [_, _, 0] => [0, 0, 0],
        [_, _, 255] => [255, 255, 255],
        // ...
    }
}

Slice patterns are similar, but unlike arrays, slices have variable lengths, so slice patterns match not only on values but also on length. In a slice pattern, .. matches any number of elements:

fn greet_people(names: &[&str]) {
    match names {
        [] => { println!("Hello, nobody.") },
        [a] => { println!("Hello, {}.", a) },
        [a, b] => { println!("Hello, {} and {}.", a, b) },
        [a, .., b] => { println!("Hello, everyone from {} to {}.", a, b) }
    }
}
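To underline that slice patterns match on length, here is a minimal sketch with a hypothetical function, first_and_last. Its three arms cover lengths zero, one, and two or more, so no wildcard arm is needed.

```rust
/// Return the first and last elements of a slice, if it has any.
fn first_and_last(values: &[i32]) -> Option<(i32, i32)> {
    match values {
        [] => None,                                  // length 0
        [only] => Some((*only, *only)),              // length 1
        [first, .., last] => Some((*first, *last)),  // length 2 or more
    }
}
```

The pattern [first, .., last] cannot match a one-element slice, because first and last must bind to two distinct positions. That is why the [only] arm is required for the match to be exhaustive.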

Reference Patterns

Rust patterns support two features for working with references.

  1. ref patterns borrow parts of a matched value.
  2. & patterns match references.

Matching a noncopyable value moves the value.

match account {
    Account { name, language, .. } => {
        ui.greet(&name, &language);
        ui.show_settings(&account); // error: borrow of moved value: `account`
    }
}
  • The fields account.name and account.language are moved into local variables name and language. The rest of account is dropped. That’s why we can’t borrow a reference to it afterward.

We need a kind of pattern that borrows matched values instead of moving them. The ref keyword does just that:

match account {
    Account { ref name, ref language, .. } => {
        ui.greet(name, language);
        ui.show_settings(&account);
    }
}

Use ref mut to borrow mut references:

match line_result {
    Err(ref err) => log_error(err), // `err` is &Error (shared ref)
    Ok(ref mut line) => {           // `line` is &mut String (mut ref)
        trim_comments(line);        // modify the String in place
        handle(line);
    }
}
  • The pattern Ok(ref mut line) matches any success result and borrows a mut reference to the success value stored inside it.
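The sketch below puts ref side by side with a common alternative: matching on a reference to the value. When the matched value is itself a reference, Rust binds the pattern's variables as references automatically, so ref is not needed. The Account fields and both function names are hypothetical.

```rust
#[allow(dead_code)]
struct Account {
    name: String,
    language: String,
    id: u32,
}

/// Takes the account by value, borrows two fields with `ref`,
/// and hands the intact account back to the caller.
fn label_with_ref(account: Account) -> (String, Account) {
    let label = match account {
        Account { ref name, ref language, .. } => format!("{} ({})", name, language),
    };
    (label, account) // still usable: nothing was moved out of it
}

/// Takes a reference instead. Because the matched value is a
/// reference, `name` and `language` are bound as &String without `ref`.
fn label_by_reference(account: &Account) -> String {
    match account {
        Account { name, language, .. } => format!("{} ({})", name, language),
    }
}
```

Both functions produce the same label. Which style reads better is a judgment call; the second avoids the ref keyword, at the cost of making the borrow less visible in the pattern.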

A pattern starting with & matches a reference:

match sphere.center() {
    &Point3d { x, y, z } => // ...
}
  • Suppose sphere.center() returns a reference to a private field of sphere, a common pattern in Rust. The value returned is the address of a Point3d. If the center is at the origin, then sphere.center() returns &Point3d { x: 0.0, y: 0.0, z: 0.0 }.

  • This is a bit tricky because Rust is following a pointer here, an action we usually associate with the * operator, not the & operator. The thing to remember is that patterns and expressions are natural opposites.

    • The expression (x, y) makes two values into a new tuple, but the pattern (x, y) does the opposite: it matches a tuple and breaks out the two values.
    • It’s the same with &. In an expression, & creates a reference. In a pattern, & matches a reference.
    [Figure: Pattern matching with references]
  • Lifetimes are enforced. You can’t get mut access via a shared reference. And you can’t move a value out of a reference, even a mut reference. When we match &Point3d { x, y, z }, the variables x, y, and z receive copies of the coordinates, leaving the original Point3d value intact. It works because those fields are copyable. If we try the same thing on a struct with noncopyable fields, we’ll get an error:

    match friend.borrow_car() {
        Some(&Car { engine, .. }) =>        // error: can't move out of borrow
        Some(&Car { ref engine, .. }) =>    // ok, engine is a reference
        // ...
        None => {}
    }
    
    • You can use a ref pattern to borrow a reference to a part. You just don’t own it.
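Here is a self-contained sketch of the & pattern in the sphere example. The Sphere struct, its center method, and distance_from_origin are assumed definitions written for this illustration. The fields of Point3d are f32, which is Copy, so the pattern can copy them out of the reference.

```rust
#[derive(Clone, Copy)]
struct Point3d {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)]
struct Sphere {
    center: Point3d,
    radius: f32,
}

impl Sphere {
    /// Return a reference to the private `center` field.
    fn center(&self) -> &Point3d {
        &self.center
    }
}

/// Distance from the origin to the center of the sphere.
fn distance_from_origin(sphere: &Sphere) -> f32 {
    match sphere.center() {
        // `&` in the pattern matches the reference; x, y, and z
        // receive copies of the three coordinates.
        &Point3d { x, y, z } => (x * x + y * y + z * z).sqrt(),
    }
}
```

For a sphere centered at (3, 4, 0), distance_from_origin returns 5.0, and the sphere itself is left untouched.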

Match Guards

Sometimes a match arm has additional conditions that must be met before it can be considered a match.

Suppose we’re implementing a board game with hexagonal spaces, and the player just clicked to move a piece. To confirm that the click was valid, we might try something like this:

fn check_move(current_hex: Hex, click: Point) -> game::Result<Hex> {
    match point_to_hex(click) {
        None =>
            Err("That's not a game space."),
        Some(current_hex) =>    // meant to match a click on current_hex, but it doesn't
            Err("You are already there! You must click somewhere else."),
        Some(other_hex) =>
            Ok(other_hex)
    }
}
  • This fails because identifiers in patterns introduce new variables. The pattern Some(current_hex) here creates a new local variable current_hex, shadowing the argument current_hex. The last arm of the match is unreachable. One way to fix this is simply to use an if expression in the match arm.

  • Rust also provides match guards, extra conditions that must be true in order for a match arm to apply, written as if CONDITION, between the pattern and the arm’s => token. If the pattern matches, but the condition is false, matching continues with the next arm.

    match point_to_hex(click) {
        None => Err("That's not a game space."),
        Some(hex) if hex == current_hex =>
            Err("You are already there! You must click somewhere else"),
        Some(hex) => Ok(hex)
    }
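The guarded version can be made runnable with a few stand-in definitions. Everything besides the match itself is an assumption: Hex and Point are minimal hypothetical types, point_to_hex uses a simple square grid as a placeholder for real hexagonal geometry, and Result<Hex, &'static str> stands in for game::Result<Hex>.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Hex {
    col: i32,
    row: i32,
}

struct Point {
    x: i32,
    y: i32,
}

/// Placeholder geometry: an 8x8 board of cells 10 units wide.
fn point_to_hex(click: Point) -> Option<Hex> {
    if click.x < 0 || click.y < 0 || click.x >= 80 || click.y >= 80 {
        return None;
    }
    Some(Hex { col: click.x / 10, row: click.y / 10 })
}

fn check_move(current_hex: Hex, click: Point) -> Result<Hex, &'static str> {
    match point_to_hex(click) {
        None => Err("That's not a game space."),
        // The guard compares against the argument; no shadowing occurs.
        Some(hex) if hex == current_hex => {
            Err("You are already there! You must click somewhere else.")
        }
        Some(hex) => Ok(hex),
    }
}
```

A click inside the current cell now hits the guarded arm, while any other on-board click falls through to the final arm and is accepted.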
    

Matching Multiple Possibilities

The vertical bar (|) can be used to combine several patterns in a single match arm:

let at_end = match chars.peek() {
    Some(&'\r') | Some(&'\n') | None => true,
    _ => false,
};

In an expression, | is the bitwise OR operator, but here it works more like the | symbol in a regular expression.

Use ..= to match a whole range of values. Range patterns include the begin and end values, so '0' ..= '9' matches all the ASCII digits:

match next_char {
    '0'..='9' => self.read_number(),
    'a'..='z' | 'A'..='Z' => self.read_word(),
    ' ' | '\t' | '\n' => self.skip_whitespace(),
    _ => self.handle_punctuation(),
}
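The same arms can be tried out in a standalone form. In this sketch, CharClass and classify are hypothetical names that replace the self.read_number() style method calls with plain return values.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CharClass {
    Digit,
    Letter,
    Whitespace,
    Punctuation,
}

/// Classify a character using range patterns and `|` alternatives.
fn classify(c: char) -> CharClass {
    match c {
        '0'..='9' => CharClass::Digit,
        'a'..='z' | 'A'..='Z' => CharClass::Letter,
        ' ' | '\t' | '\n' => CharClass::Whitespace,
        _ => CharClass::Punctuation,
    }
}
```

Both ends of each range are included, so '0' and '9' are digits and 'z' and 'Z' are letters. Anything else, such as '!', ends up in the wildcard arm.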

Binding with @ Patterns

x @ pattern matches exactly like the given pattern, but on success, instead of creating variables for parts of the matched value, it creates a single variable x and moves or copies the whole value into it. For example, consider this code, which does not yet use an @ pattern:

match self.get_selection() {
    Shape::Rect(top_left, bottom_right) => {
        optimized_paint(&Shape::Rect(top_left, bottom_right))
    }
    other_shape => {
        paint_outline(other_shape.get_outline())
    }
}
  • The first case unpacks a Shape::Rect value, only to rebuild an identical Shape::Rect value on the next line. This can be rewritten to use an @ pattern:

    rect @ Shape::Rect(..) => {
        optimized_paint(&rect)
    }
    

@ patterns are also useful with ranges:

match chars.next() {
    Some(digit @ '0'..='9') => read_number(digit, chars),
    // ...
}
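A compact way to see an @ pattern with a range is the sketch below. The function leading_digit is a hypothetical example; char::to_digit is the standard library method used to convert the bound character.

```rust
/// Return the numeric value of the first character, if it is an ASCII digit.
fn leading_digit(text: &str) -> Option<u32> {
    match text.chars().next() {
        // The range does the test; `digit` holds the matched character.
        Some(digit @ '0'..='9') => digit.to_digit(10),
        _ => None,
    }
}
```

Without the @, the arm would know that the character is a digit but would have no name for it. With it, leading_digit("7 days") returns Some(7), while an empty string or a string starting with a letter gives None.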

Where Patterns Are Allowed

Although patterns are most prominent in match expressions, they are also allowed in several other places, typically in place of an identifier. The meaning is always the same: instead of just storing a value in a single variable, Rust uses pattern matching to take the value apart.

// ...unpack a struct into three new local variables
let Track { album, track_number, title, .. } = song;

// ...unpack a function argument that's a tuple
fn distance_to((x, y): (f64, f64)) -> f64 { ... }

// ...iterate over keys and values of a HashMap
for (id, document) in &cache_map {
    println!("Document #{}: {}", id, document.title);
}

// ...automatically dereference an argument to a closure
// (handy because sometimes other code passes you a reference
// when you'd rather have a copy)
let sum = numbers.fold(0, |a, &num| a + num);
  • Each of these saves two or three lines of boilerplate code. The same concept exists in some other languages: in JavaScript, it’s called destructuring, while in Python, it’s unpacking.
  • In all four examples, we use patterns that are guaranteed to match. The pattern Track { album, track_number, title, .. } matches every possible value of the Track struct type, (x, y) matches any (f64, f64) pair, and so on. Patterns that always match are special in Rust. They’re called irrefutable patterns, and they’re the only patterns allowed in the four places shown here (after let, in function arguments, after for, and in closure arguments).
  • A refutable pattern is one that might not match, like Ok(x), which doesn’t match an error result, or '0' ..= '9', which doesn’t match the character 'Q'. Refutable patterns can be used in match arms, because match is designed for them: if one pattern fails to match, it’s clear what happens next. The four preceding examples are places in Rust programs where a pattern can be handy, but the language doesn’t allow for match failure.
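Three of the irrefutable positions described above can be exercised in a few lines. In this sketch the Track fields, the function bodies, and the names label and sum are assumptions added so that the fragments compile on their own.

```rust
// Hypothetical struct; the notes use Track without defining it.
#[allow(dead_code)]
struct Track {
    album: String,
    track_number: u32,
    title: String,
    length_secs: u32,
}

/// Pattern after `let`: unpack three fields, ignore the rest.
fn label(song: Track) -> String {
    let Track { album, track_number, title, .. } = song;
    format!("{} #{}: {}", album, track_number, title)
}

/// Pattern in a function argument: unpack a tuple.
fn distance_to((x, y): (f64, f64)) -> f64 {
    (x * x + y * y).sqrt()
}

/// Pattern in a closure argument: `&num` dereferences each item.
fn sum(numbers: &[i32]) -> i32 {
    numbers.iter().fold(0, |a, &num| a + num)
}
```

None of these patterns can fail, which is why the compiler accepts them in positions that have no "else" branch. Writing something like let Some(x) = maybe; in place of the first pattern would be rejected, because Some(x) is refutable.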

Refutable patterns are also allowed in if let and while let expressions:

// ...handle just one enum variant specially
if let RoughTime::InTheFuture(_, _) = user.date_of_birth() {
    user.set_time_traveler(true);
}

// ...run some code only if a table lookup succeeds
if let Some(document) = cache_map.get(&id) {
    return send_cached_response(document);
}

// ...repeatedly try something until it succeeds
while let Err(err) = present_cheesy_anti_robot_task() {
    log_robot_attempt(err);
    // let the user try again (it might still be a human)
}

// ...manually loop over an iterator
while let Some(_) = lines.peek() {
    read_paragraph(&mut lines);
}
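For a version of if let and while let that runs without the surrounding application, consider the sketch below. Both functions, drain_total and port_or_default, are hypothetical and use only standard library calls (Vec::pop and str::parse).

```rust
/// Sum a stack by popping values until it is empty.
fn drain_total(mut stack: Vec<u32>) -> u32 {
    let mut total = 0;
    // The loop ends the first time `pop` returns None.
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}

/// Parse a port number, falling back to a default when parsing fails.
fn port_or_default(text: &str, default: u16) -> u16 {
    if let Ok(port) = text.trim().parse::<u16>() {
        port
    } else {
        default
    }
}
```

In both cases the refutable pattern doubles as the condition: the body runs only when the pattern matches, and the failure case simply ends the loop or takes the else branch.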

Populating a Binary Tree

impl<T: Ord> BinaryTree<T> {
    fn add(&mut self, value: T) {
        match *self {
            BinaryTree::Empty => {
                *self = BinaryTree::NonEmpty(Box::new(TreeNode {
                    element: value,
                    left: BinaryTree::Empty,
                    right: BinaryTree::Empty,
                }))
            }
            BinaryTree::NonEmpty(ref mut node) => {
                if value <= node.element {
                    node.left.add(value);
                } else {
                    node.right.add(value);
                }
            }
        }
    }
}

let mut tree = BinaryTree::Empty;
tree.add("Mercury");
tree.add("Venus");
// ...
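To check that add keeps the tree ordered, the sketch below repeats the definitions and adds an in-order traversal. The walk method is an assumption introduced for this illustration; it collects cloned elements into a Vec, which is simple but not the most efficient way to traverse a tree.

```rust
enum BinaryTree<T> {
    Empty,
    NonEmpty(Box<TreeNode<T>>),
}

struct TreeNode<T> {
    element: T,
    left: BinaryTree<T>,
    right: BinaryTree<T>,
}

impl<T: Ord> BinaryTree<T> {
    fn add(&mut self, value: T) {
        match *self {
            BinaryTree::Empty => {
                *self = BinaryTree::NonEmpty(Box::new(TreeNode {
                    element: value,
                    left: BinaryTree::Empty,
                    right: BinaryTree::Empty,
                }))
            }
            BinaryTree::NonEmpty(ref mut node) => {
                if value <= node.element {
                    node.left.add(value);
                } else {
                    node.right.add(value);
                }
            }
        }
    }
}

impl<T: Clone> BinaryTree<T> {
    /// Collect the elements in order: left subtree, element, right subtree.
    fn walk(&self) -> Vec<T> {
        match self {
            BinaryTree::Empty => Vec::new(),
            BinaryTree::NonEmpty(node) => {
                let mut result = node.left.walk();
                result.push(node.element.clone());
                result.extend(node.right.walk());
                result
            }
        }
    }
}
```

Adding "Mercury", "Venus", and "Earth" in that order and then calling walk gives the three names in alphabetical order, since smaller values always go left and larger ones right.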

The Big Picture

For a programming language designer, combining variants, references, mutability, and memory safety is extremely challenging. Functional programming languages dispense with mutability. C unions, by contrast, have variants, pointers, and mutability, but are so spectacularly unsafe that even in C, they’re a last resort. Rust’s borrow checker is the magic that makes it possible to combine all four without compromise.

Programming is data processing. Getting data into the right shape can be the difference between a small, fast, elegant program and a slow, gigantic tangle of duct tape and virtual method calls. This is the problem space enums address.

Enums are a design tool for getting data into the right shape. For cases when a value may be one thing, or another thing, or perhaps nothing at all, enums are better than class hierarchies on every axis: faster, safer, less code, easier to document.

The limiting factor is flexibility. End users of an enum can’t extend it to add new variants. Variants can be added only by changing the enum declaration. And when that happens, existing code breaks. Every match expression that individually matches each variant of the enum must be revisited—it needs a new arm to handle the new variant.

  • In some cases, trading flexibility for simplicity is just good sense. After all, the structure of JSON is not expected to change.
  • And in some cases, revisiting all uses of an enum when it changes is exactly what we want. For example, when an enum is used in a compiler to represent the various operators of a programming language, adding a new operator should involve touching all code that handles operators.
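The operator case can be sketched in miniature. BinOp, apply, and symbol are hypothetical names. The point is that neither match has a wildcard arm, so the compiler checks that every variant is handled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinOp {
    Add,
    Sub,
    Mul,
}

/// Evaluate a binary operator on two integers.
fn apply(op: BinOp, a: i64, b: i64) -> i64 {
    match op {
        BinOp::Add => a + b,
        BinOp::Sub => a - b,
        BinOp::Mul => a * b,
    }
}

/// Return the source-code symbol for an operator.
fn symbol(op: BinOp) -> char {
    match op {
        BinOp::Add => '+',
        BinOp::Sub => '-',
        BinOp::Mul => '*',
    }
}
```

If a fourth variant such as Div were added to BinOp, both functions would stop compiling until each gained a matching arm. A catchall _ arm would silence that error, which is why exhaustive matches are often preferred when a new variant really should be reviewed at every use.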

But sometimes more flexibility is needed. For those situations, Rust has traits.

