Expressions in Rust
Most things in Rust are expressions. Expressions make up the body of Rust functions and thus the majority of Rust code. Control flow in Rust is entirely expression-oriented.
An Expression Language
In C, expressions have values. Statements don’t.
// expression
5 * (fahr-32) / 9
// statement
for (; begin != end; ++begin) {
    if (*begin == target)
        break;
}
if and switch are statements. They don’t produce a value, and they can’t be used in the middle of an expression.
Rust is what is called an expression language. It follows an older tradition, dating back to Lisp, where expressions do all the work.
In Rust, if and match can produce values.
// produces a numeric value
pixels[r * bounds.0 + c] =
    match escapes(Complex { re: point.0, im: point.1 }, 255) {
        None => 0,
        Some(count) => 255 - count as u8
    };
// initialize a variable
let status =
    if cpu.temperature <= MAX_TEMP {
        HttpStatus::Ok
    } else {
        HttpStatus::ServerError
    };
// passed as an argument to a function or macro
println!("Inside the vat, you see {}.",
    match vat.contents {
        Some(brain) => brain.desc(),
        None => "nothing of interest"
    });
Most of the control flow tools in C are statements. In Rust, they are all expressions.
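The snippets above depend on names that aren't defined here. The following is a minimal, self-contained sketch of the same idea. The `HttpStatus` enum, the `MAX_TEMP` value, and the `status_for` and `describe_contents` functions are hypothetical stand-ins.

```rust
// Hypothetical stand-ins for the names used in the snippets above.
const MAX_TEMP: u32 = 90;

#[derive(Debug, PartialEq)]
enum HttpStatus {
    Ok,
    ServerError,
}

// `if` is an expression: its value initializes `status`.
fn status_for(temperature: u32) -> HttpStatus {
    let status = if temperature <= MAX_TEMP {
        HttpStatus::Ok
    } else {
        HttpStatus::ServerError
    };
    status
}

// `match` is an expression too, so it can be passed straight to a macro.
fn describe_contents(contents: Option<&str>) -> String {
    format!(
        "Inside the vat, you see {}.",
        match contents {
            Some(desc) => desc,
            None => "nothing of interest",
        }
    )
}
```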
Precedence and Associativity
Like most programming languages, Rust has operator precedence to determine the order of operations when an expression contains multiple adjacent operators.
All of the operators that can usefully be chained are left-associative. That is, a chain of operations such as a - b - c is grouped as (a - b) - c, not a - (b - c).
# operators that can be chained
* / % + - << >> & ^ | && || as
The comparison operators, the assignment operators, and the range operators .. and ..= can’t be chained at all.
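This short sketch illustrates the associativity rules. The function names are illustrative only. A chained subtraction or shift groups from the left. A range check that would read `a < b < c` in math has to be written with `&&`, because comparisons don't chain.

```rust
// Left-associative chains group from the left: a - b - c == (a - b) - c.
fn left_assoc_demo() -> (i32, i32, u32) {
    let chained = 10 - 4 - 3; // (10 - 4) - 3 = 3
    let regrouped = 10 - (4 - 3); // parentheses change the result: 9
    let shifts = 1u32 << 4 >> 2; // (1 << 4) >> 2 = 4
    (chained, regrouped, shifts)
}

// `a < b < c` does not compile, so the check is spelled out with `&&`.
fn strictly_increasing(a: i32, b: i32, c: i32) -> bool {
    a < b && b < c
}
```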
Blocks and Semicolons
Blocks are the most general kind of expression.
A block produces a value and can be used anywhere a value is needed.
- The value of the block is the value of its last expression, provided that expression is not followed by a semicolon.
- If a block has semicolons in all the familiar places, as in C or Java, then its value will be ().
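Here is a small sketch of both rules for block values, using an illustrative `block_values` function. A block that ends in an expression evaluates to that expression. A block whose last item is a statement evaluates to `()`.

```rust
fn block_values() -> (i32, ()) {
    // The final expression has no semicolon, so the block evaluates to it.
    let area = {
        let width = 3;
        let height = 4;
        width * height
    };
    // This block ends with a statement, so its value is ().
    let nothing = {
        let width = 3;
        let _doubled = width * 2;
    };
    (area, nothing)
}
```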
This ability of blocks to contain declarations and also produce a value at the end is a neat feature. The one drawback is that it leads to an odd error message when you leave out a semicolon by accident.
// ...
if preferences.changed() {
page.compute_size() // oops, missing semicolon
}
// ...
// error[E0308]: mismatched types
// | page.compute_size() // oops, missing semicolon
// | ^^^^^^^^^^^^^^^^^^^- help: try adding a semicolon: `;`
// | |
// | expected (), found tuple
// |
// = note: expected unit type `()`
// found tuple `(u32, u32)`
- With the semicolon missing, the block’s value would be whatever page.compute_size() returns, but an if without an else must always return ().
Declarations
A block may contain any number of declarations, with the let declarations that declare local variables being the most common ones.
let name: type = expr;
- The type and initializer are optional.
- A let declaration can declare a variable without initializing it. The variable can then be initialized with a later assignment:
let name;
if user.has_nickname() {
    name = user.nickname();
} else {
    name = generate_unique_name();
    user.register(&name);
}
- Here there are two different ways the local variable name might be initialized, but either way it will be initialized exactly once, so name does not need to be declared mut.
- The semicolon is required.
It’s an error to use a variable before it’s initialized. This is closely related to the error of using a value after it’s been moved. Rust really wants you to use values only while they exist.
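The deferred-initialization example above can be made runnable with a few assumptions. The `User` type, its methods, and `generate_unique_name` below are hypothetical stand-ins. The placeholder name isn't actually unique. The point is that `name` is declared without `mut` and assigned exactly once on each path.

```rust
// Hypothetical user type standing in for `user` in the example above.
struct User {
    nickname: Option<String>,
    registered: Vec<String>,
}

impl User {
    fn has_nickname(&self) -> bool {
        self.nickname.is_some()
    }
    fn nickname(&self) -> String {
        self.nickname.clone().unwrap()
    }
    fn register(&mut self, name: &str) {
        self.registered.push(name.to_string());
    }
}

// Hypothetical: a real version would generate a genuinely unique name.
fn generate_unique_name() -> String {
    String::from("user-0001")
}

fn choose_name(user: &mut User) -> String {
    let name; // no initializer, and not `mut`
    if user.has_nickname() {
        name = user.nickname();
    } else {
        name = generate_unique_name();
        user.register(&name);
    }
    // Every path assigned `name` exactly once, so reading it here compiles.
    name
}
```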
Shadowing:
for line in file.lines() {
    let line = line?;
}
- The type of the first variable line is Result<String, io::Error>. The second line is a String. Its definition supersedes the first’s for the rest of the block.
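A runnable version of the shadowing pattern follows. It assumes the input is any `BufRead` source rather than a file. `count_nonblank` is an illustrative name. Inside the loop, `line?` replaces the `io::Result<String>` binding with a plain `String` of the same name.

```rust
use std::io::{self, BufRead};

// Counts lines that contain something other than whitespace.
fn count_nonblank<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        let line = line?; // shadows the Result with its String value
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

A byte slice implements `BufRead`, so `count_nonblank("alpha\n\n  \nbeta\n".as_bytes())` is a convenient way to try it.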
A block can also contain item declarations. An item is simply any declaration that could appear globally in a program or module, such as a fn, struct, or use.
- When an fn is declared inside a block, its scope is the entire block; that is, it can be used throughout the enclosing block.
use std::io;
use std::cmp::Ordering;
fn show_files() -> io::Result<()> {
    let mut v = vec![];
    // ...
    fn cmp_by_timestamp_then_name(a: &FileInfo, b: &FileInfo) -> Ordering {
        a.timestamp.cmp(&b.timestamp)   // first, compare timestamps
            .reverse()                  // newest file first
            .then(a.path.cmp(&b.path))  // compare paths to break ties
    }
    v.sort_by(cmp_by_timestamp_then_name);
    // ...
}
- But a nested fn cannot access local variables or arguments that happen to be in scope. For instance, cmp_by_timestamp_then_name could not use v directly.
- Closures in Rust do see into enclosing scopes.
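The following self-contained sketch assumes a hypothetical `FileInfo` with just a timestamp and a path. It calls the nested `fn` before declaring it, which works because an item's scope is the whole enclosing block.

```rust
use std::cmp::Ordering;

// Hypothetical file record standing in for `FileInfo` in the notes.
struct FileInfo {
    timestamp: u64,
    path: String,
}

fn sort_newest_first(files: &mut [FileInfo]) {
    // Used here, declared below: the nested fn is in scope for the whole block.
    files.sort_by(cmp_by_timestamp_then_name);

    // This nested fn sees only its own parameters; it could not refer to
    // `files`, even though `files` is in scope at this point.
    fn cmp_by_timestamp_then_name(a: &FileInfo, b: &FileInfo) -> Ordering {
        a.timestamp
            .cmp(&b.timestamp) // first, compare timestamps
            .reverse() // newest file first
            .then(a.path.cmp(&b.path)) // compare paths to break ties
    }
}
```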
A block can even contain a whole module.
if and match
if condition1 {
    block1
} else if condition2 {
    block2
} else {
    block_n
}
- Each condition must be an expression of type bool. Rust does not implicitly convert numbers or pointers to Boolean values.
- Parentheses are not required around conditions. rustc will emit a warning if unnecessary parentheses are present.
- The else if blocks, as well as the final else, are optional. An if expression with no else block behaves exactly as though it had an empty else block.
- All blocks of an if expression must produce values of the same type.
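Here is a brief sketch of the rules just listed, using illustrative function names. The conditions have no parentheses and are all `bool`. Every branch yields the same type, so the whole `if` can serve as the function's return value.

```rust
fn classify(n: i32) -> &'static str {
    // Each branch yields a &'static str, which becomes the type of the `if`.
    if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    }
}

fn is_nonzero(n: i32) -> bool {
    // `if n { ... }` would not compile: an i32 is never implicitly a bool.
    n != 0
}
```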
match expressions are something like the C switch statement, but more flexible.
// each pattern is a constant integer
match code {
    0 => println!("OK"),
    1 => println!("Wires Tangled"),
    2 => println!("User Asleep"),
    _ => println!("Unrecognized Error {}", code)
}
- The wildcard pattern _ matches everything.
- This is like the default: case in a switch statement, except that it must come last. Placing a _ pattern before other patterns means that it will have precedence over them, so those patterns will never match anything, and the compiler warns that they are unreachable.
- The compiler can optimize this kind of match using a jump table, just like a switch statement in C++. A similar optimization is applied when each arm of a match produces a constant value. In that case, the compiler builds an array of those values, and the match is compiled into an array access. Apart from a bounds check, there is no branching at all in the compiled code.
- ChatGPT: A jump table (also called a dispatch table or branch table) is an array of code addresses or offsets, one entry per case. To execute a switch statement or a similar construct, the program evaluates the controlling expression, uses its value as an index into the table, and jumps directly to the selected code, with no chain of conditional checks. Jump tables work best when the case values are dense (close to contiguous), because the table needs an entry for every value in its range. For sparse case values, compilers typically fall back to comparison trees. Jump tables are commonly used to implement switch statements in languages like C and C++.
The general form of a match expression is:
match value {
    pattern => expr,
    ...
}
- The comma after an arm may be dropped if the expr is a block.
- Rust checks the given value against each pattern in turn, starting with the first. When a pattern matches, the corresponding expr is evaluated, and the match expression is complete; no further patterns are checked.
- At least one of the patterns must match. Rust prohibits match expressions that do not cover all possible values:
let score = match card.rank {
    Jack => 10,
    Queen => 10,
    Ace => 11
};  // error: nonexhaustive patterns
- All arms of a match expression must have the same type. The same rule applies to the branches of an if expression:
let favorite_number =
    if user.is_hobbit() { "eleventy-one" } else { 9 };  // error
let best_sports_team =
    if is_hockey_season() { "Predators" };  // error: outside hockey season, the value of the if expression would be ()
The versatility of match stems from the variety of supported patterns that can be used to the left of => in each arm. A pattern can match a range of values. It can unpack tuples. It can match against individual fields of structs. It can chase references, borrow parts of a value, and more. Rust’s patterns are a mini-language of their own.
match params.get("name") {
    Some(name) => println!("Hello, {}!", name),
    None => println!("Greetings, stranger.")
}
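The following sketch shows a few pattern kinds. `params` is assumed here to be a `HashMap<&str, &str>`, and the function names are illustrative. The patterns are an `Option` pattern like the one above, tuple patterns that bind components, and range patterns.

```rust
use std::collections::HashMap;

// `Option` pattern, as in the `params.get("name")` example.
fn greet(params: &HashMap<&str, &str>) -> String {
    match params.get("name") {
        Some(name) => format!("Hello, {}!", name),
        None => String::from("Greetings, stranger."),
    }
}

// Tuple patterns: literals match exactly, names bind the components.
fn describe_point(point: (i32, i32)) -> String {
    match point {
        (0, 0) => String::from("origin"),
        (x, 0) => format!("on the x axis at {}", x),
        (0, y) => format!("on the y axis at {}", y),
        (x, y) => format!("at ({}, {})", x, y),
    }
}

// Range patterns, with `_` catching everything above 100.
fn grade(score: u32) -> char {
    match score {
        90..=100 => 'A',
        80..=89 => 'B',
        0..=79 => 'C',
        _ => '?',
    }
}
```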
if let
An if let expression is another form of if.
if let pattern = expr {
    block1
} else {
    block2
}
The given expr either matches the pattern, in which case block1 runs, or doesn’t match, and block2 runs.
Sometimes this is a nice way to get data out of an Option or Result:
if let Some(cookie) = request.session_cookie {
    return restore_session(cookie);
}
if let Err(err) = show_cheesy_anti_robot_task() {
    log_robot_attempt(err);
    politely_accuse_user_of_being_a_robot();
} else {
    session.mark_as_human();
}
It’s never strictly necessary to use if let, because match can do everything if let can do. An if let expression is shorthand for a match with just one pattern:
match expr {
    pattern => { block1 }
    _ => { block2 }
}
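To see that `if let` is shorthand for a one-pattern `match`, here is the same logic written both ways. The function names are illustrative. Each returns the first even number in a slice as text.

```rust
fn first_even_if_let(v: &[i32]) -> String {
    if let Some(n) = v.iter().find(|&&x| x % 2 == 0) {
        format!("found {}", n)
    } else {
        String::from("none")
    }
}

// The equivalent `match`: one real pattern plus a `_` fallback.
fn first_even_match(v: &[i32]) -> String {
    match v.iter().find(|&&x| x % 2 == 0) {
        Some(n) => format!("found {}", n),
        _ => String::from("none"),
    }
}
```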
Loops
while condition {
    block
}
while let pattern = expr {
    block
}
loop {
    block
}
for pattern in iterable {
    block
}
Loops are expressions in Rust. The value of a while or for loop is always ().
A loop expression can produce a value if you specify one.
The condition of a while loop must be of the exact type bool.
The while let loop is analogous to if let. At the beginning of each while let loop iteration, the value of expr either matches the given pattern, in which case the block runs, or doesn’t, in which case the loop exits.
Use loop to write infinite loops. It executes the block repeatedly forever (or until a break or return is reached or the thread panics).
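Here are two small sketches with illustrative names. A `while let` loop drains a stack and stops at the first `None` from `pop()`. A `loop` runs until an explicit `break`. The Collatz function assumes `n >= 1`, because `n == 0` would never reach 1.

```rust
// Runs while `pop()` returns `Some`; exits on the first `None`.
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

// Counts Collatz steps to reach 1 (assumes n >= 1).
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    loop {
        if n == 1 {
            break;
        }
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}
```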
A for
loop evaluates the iterable expression and then evaluates the block once for each value in the resulting iterator.
// c
for (int i = 0; i < 20; i++) {
    printf("%d\n", i);
}
// rust
for i in 0..20 {
    println!("{}", i);
}
Iterable types: all the standard collections like Vec and HashMap; arrays; slices.
The .. operator produces a range, a simple struct with two fields: start and end. 0..20 is the same as std::ops::Range { start: 0, end: 20 }. Ranges can be used with for loops because Range is an iterable type: it implements the std::iter::IntoIterator trait.
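This quick sketch, with an illustrative function name, checks both claims above. A range literal compares equal to the corresponding `Range` struct. The same kind of value drives a `for` loop.

```rust
use std::ops::Range;

fn range_demo() -> (bool, i32) {
    // A range literal is an ordinary struct value.
    let same = (0..5) == Range { start: 0, end: 5 };
    // Range implements Iterator (and therefore IntoIterator).
    let mut total = 0;
    for i in 0..5 {
        total += i; // 0 + 1 + 2 + 3 + 4
    }
    (same, total)
}
```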
A for loop over a value consumes the value:
let strings: Vec<String> = error_messages();
for s in strings {                       // each String is moved into s here...
    println!("{}", s);
}                                        // ...and dropped here
println!("{} error(s)", strings.len());  // error: use of moved value
To avoid this, loop over a reference to the collection instead. The loop variable is then a reference to each item:
for rs in &strings {
    println!("String {:?} is at address {:p}.", *rs, rs);
}
Iterating over a mut reference provides a mut reference to each element:
for rs in &mut strings {  // the type of rs is &mut String
    rs.push('\n');        // add a newline to each string
}
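All three iteration modes can appear in one function, as in this sketch with an illustrative name. First `&mut strings` edits in place, then `&strings` reads without moving, and finally iterating `strings` by value consumes the vector. The by-value loop comes last because `strings` can't be used after it.

```rust
fn iteration_modes() -> (Vec<String>, usize) {
    let mut strings = vec![String::from("one"), String::from("two")];

    // `&mut strings` yields `&mut String`: modify each element in place.
    for rs in &mut strings {
        rs.push('!');
    }

    // `&strings` yields `&String`: read without moving anything.
    let mut total_len = 0;
    for rs in &strings {
        total_len += rs.len();
    }

    // By value: each String is moved out, and `strings` is consumed.
    let mut collected = Vec::new();
    for s in strings {
        collected.push(s);
    }
    (collected, total_len)
}
```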
Control Flow in Loops
A break expression exits an enclosing loop.
In Rust, break works only in loops. It is not necessary in match expressions.
Within the body of a loop, you can give break an expression, whose value becomes that of the loop:
// Each call to `next_line` returns either `Some(line)`, where
// `line` is a line of input, or `None`, if we've reached the end of
// the input. Return the first line that starts with "answer: ".
// Otherwise, return "answer: nothing".
let answer = loop {
    if let Some(line) = next_line() {
        if line.starts_with("answer: ") {
            break line;
        }
    } else {
        break "answer: nothing";
    }
};
- Naturally, all the break expressions within a loop must produce values with the same type, which becomes the type of the loop itself.
A continue expression jumps to the next loop iteration.
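Here is a self-contained sketch of `continue`. `sum_numbers` is a hypothetical helper. It sums the numbers in its input and skips blank lines and `#` comments by jumping straight to the next iteration. As a simplifying assumption, unparsable lines count as 0.

```rust
fn sum_numbers(input: &str) -> i64 {
    let mut total = 0;
    for line in input.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue; // skip the rest of this iteration
        }
        total += trimmed.parse::<i64>().unwrap_or(0);
    }
    total
}
```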
A loop can be labeled with a lifetime-style label, such as 'search:
'search:
for room in apartment {
    for spot in room.hiding_spots() {
        if spot.contains(keys) {
            println!("Your keys are {} in the {}.", spot, room);
            break 'search;
        }
    }
}
- Labels can also be used with continue.
A break can have both a label and a value expression:
// Find the square root of the first perfect square
// in the series.
let sqrt = 'outer: loop {
    let n = next_number();
    for i in 1.. {
        let square = i * i;
        if square == n {
            // Found a square root.
            break 'outer i;
        }
        // `if` without an `else`
        if square > n {
            // `n` isn't a perfect square, try the next
            break;
        }
    }
};
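Below is a runnable adaptation of the perfect-square search. It assumes the numbers come from a slice instead of the hypothetical `next_number()`, and it returns `None` when the input runs out. A second function shows a labeled `continue`. Both function names are illustrative.

```rust
fn sqrt_of_first_square(numbers: &[u64]) -> Option<u64> {
    let mut iter = numbers.iter();
    'outer: loop {
        let n = match iter.next() {
            Some(&n) => n,
            None => break None, // innermost loop here is 'outer itself
        };
        for i in 1.. {
            let square = i * i;
            if square == n {
                break 'outer Some(i); // exit both loops with a value
            }
            if square > n {
                break; // not a perfect square; try the next number
            }
        }
    }
}

// Counts rows with no negative entries; `continue 'rows` abandons a row early.
fn count_nonnegative_rows(rows: &[Vec<i32>]) -> usize {
    let mut count = 0;
    'rows: for row in rows {
        for &x in row {
            if x < 0 {
                continue 'rows;
            }
        }
        count += 1;
    }
    count
}
```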
return Expressions
A return expression exits the current function, returning a value to the caller.
return without a value is shorthand for return ():
fn f() {     // return type omitted: defaults to ()
    return;  // return value omitted: defaults to ()
}
If a function body’s last expression isn’t followed by a semicolon, its value is the function’s return value. This is the preferred way to supply a function’s return value in Rust.
Like a break expression, return can abandon work in progress.
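This sketch, with an illustrative name, shows both ways of supplying a return value. An early `return` abandons the loop as soon as the target is found. The tail expression with no semicolon provides the value on the normal path.

```rust
fn position_of(items: &[i32], target: i32) -> Option<usize> {
    for (i, &item) in items.iter().enumerate() {
        if item == target {
            return Some(i); // abandon the loop and the rest of the function
        }
    }
    None // tail expression: the normal return value
}
```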
Why Rust Has loop
Several pieces of the Rust compiler analyze the flow of control through your program:
- Rust checks that every path through a function returns a value of the expected return type.
- To do this correctly, it needs to know whether it’s possible to reach the end of the function.
- Rust checks that local variables are never used uninitialized.
- This entails checking every path through a function to make sure there’s no way to reach a place where a variable is used without having already passed through code that initializes it.
- Rust warns about unreachable code.
- Code is unreachable if no path through the function reaches it.
These are called flow-sensitive analyses.
- They are nothing new; Java has had a “definite assignment” analysis, similar to Rust’s, for years.
When enforcing this sort of rule, a language must strike a balance between simplicity, which makes it easier for programmers to figure out what the compiler is talking about sometimes, and cleverness, which can help eliminate false warnings and cases where the compiler rejects a perfectly safe program.
Rust went for simplicity. Its flow-sensitive analyses do not examine loop conditions at all, instead simply assuming that any condition in a program can be either true or false. This causes Rust to reject some safe programs:
fn wait_for_process(process: &mut Process) -> i32 {
    while true {
        if process.wait() {
            return process.exit_code();
        }
    }
}  // error: mismatched types: expected i32, found ()
- This function only exits via the return statement, so the fact that the while loop doesn’t produce an i32 is irrelevant.
- The loop expression is offered as a “say-what-you-mean” solution to this problem.
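Here is the `loop` version of `wait_for_process`, made runnable with a hypothetical `Process` type. Its `wait` reports completion after a fixed number of polls and stands in for a real process handle. Because the compiler knows `loop` never finishes on its own, no final `i32` is required.

```rust
// Hypothetical process: finishes after `polls_left` calls to `wait`.
struct Process {
    polls_left: u32,
    code: i32,
}

impl Process {
    fn wait(&mut self) -> bool {
        if self.polls_left == 0 {
            true
        } else {
            self.polls_left -= 1;
            false
        }
    }
    fn exit_code(&self) -> i32 {
        self.code
    }
}

// Compiles, unlike the `while true` version: the end of the body is unreachable.
fn wait_for_process(process: &mut Process) -> i32 {
    loop {
        if process.wait() {
            return process.exit_code();
        }
    }
}
```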
Rust’s type system is affected by control flow, too. All branches of an if expression must have the same type. But it would be silly to enforce this rule on blocks that end with a break or return expression, an infinite loop, or a call to panic!() or std::process::exit(). What all those expressions have in common is that they never finish in the usual way, producing a value.
So in Rust, these expressions don’t have a normal type. Expressions that don’t finish normally are assigned the special type !, and they’re exempt from the rules about types having to match.
// std::process::exit()
fn exit(code: i32) -> !
- The ! means that exit() never returns. It’s a divergent function. You can write divergent functions of your own using the same syntax:
fn serve_forever(socket: ServerSocket, handler: ServerHandler) -> ! {
    socket.listen();
    loop {
        let s = socket.accept();
        handler.handle(s);
    }
}
- Rust considers it an error if a self-defined divergent function can return normally.
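This sketch uses a hypothetical divergent helper `fail` to show the `!` type being exempt from the type-matching rule. One arm of the `match` yields a `u16`. The other calls a function that never returns, and the `match` as a whole still has type `u16`.

```rust
// Divergent: the `!` return type promises this never returns normally.
fn fail(msg: &str) -> ! {
    panic!("fatal: {}", msg)
}

fn parse_port(s: &str) -> u16 {
    match s.parse::<u16>() {
        Ok(port) => port,
        // This arm has type `!`, so it need not match the u16 arm.
        Err(_) => fail("not a port number"),
    }
}
```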
Function and Method Calls
The syntax for calling functions and methods is the same in Rust:
let x = gcd(1302, 462); // function call
let room = player.location(); // method call
Rust usually makes a sharp distinction between references and the values they refer to. If you pass a &i32 to a function that expects an i32, that’s a type error. The . operator relaxes those rules a bit. In the method call player.location(), player might be a Player, a reference of type &Player, or a smart pointer of type Box<Player> or Rc<Player>. The .location() method might take the player either by value or by reference. The same .location() syntax works in all cases, because Rust’s . operator automatically dereferences player or borrows a reference to it as needed.
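A minimal sketch with a hypothetical `Player` type whose `location` method borrows `self`. The same `.location()` call compiles on a plain value, a reference, a `Box`, and an `Rc`.

```rust
use std::rc::Rc;

// Hypothetical player type.
struct Player {
    room: String,
}

impl Player {
    fn location(&self) -> &str {
        &self.room
    }
}

fn locations() -> Vec<String> {
    let player = Player { room: String::from("hall") };
    let by_ref: &Player = &player;
    let boxed = Box::new(Player { room: String::from("cellar") });
    let shared = Rc::new(Player { room: String::from("attic") });
    // `.` borrows `player` and dereferences `by_ref`, `boxed`, and `shared`.
    vec![
        player.location().to_string(),
        by_ref.location().to_string(),
        boxed.location().to_string(),
        shared.location().to_string(),
    ]
}
```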
A third syntax is used for calling type-associated functions:
let mut numbers = Vec::new(); // type-associated function call
Method calls can be chained:
server
    .bind("127.0.0.1:3000").expect("error binding server to address")
    .run().expect("error running server");
One quirk of Rust syntax is that in a function call or method call, the usual syntax for generic types, Vec<T>, does not work:
return Vec<i32>::with_capacity(1000);  // error: something about chained comparisons
let ramp = (0 .. n).collect<Vec<i32>>();  // same error
- In expressions, < is the less-than operator.
- The Rust compiler suggests writing ::<T> instead of <T> in this case, and that solves the problem:
return Vec::<i32>::with_capacity(1000);  // ok, using ::<
let ramp = (0 .. n).collect::<Vec<i32>>();  // ok, using ::<
- The symbol ::<...> is known in the Rust community as the turbofish.
- Alternatively, it is often possible to drop the type parameters and let Rust infer them:
return Vec::with_capacity(10);  // ok, if the fn return type is Vec<i32>
let ramp: Vec<i32> = (0 .. n).collect();  // ok, variable's type is given
- It’s considered good style to omit the types whenever they can be inferred.
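The following compilable sketch compares the turbofish forms with the inferred forms. The function names are illustrative. `ramp` and `ramp_inferred` build the same vector. The first names `collect`'s type argument explicitly, and the second relies on the return type.

```rust
fn ramp(n: i32) -> Vec<i32> {
    (0..n).collect::<Vec<i32>>() // turbofish supplies the type argument
}

fn ramp_inferred(n: i32) -> Vec<i32> {
    (0..n).collect() // inferred from the function's return type
}

fn big_buffer() -> Vec<i32> {
    Vec::<i32>::with_capacity(1000) // turbofish on a type-associated function
}

fn parse_all(words: &[&str]) -> Vec<u8> {
    words.iter().map(|w| w.parse::<u8>().unwrap_or(0)).collect()
}
```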
Fields and Elements
The fields of a struct are accessed using familiar syntax. Tuples are the same except that their fields have numbers rather than names:
game.black_pawns // struct field
coords.1 // tuple element
- If the value to the left of the dot is a reference or smart pointer type, it is automatically dereferenced, just as for method calls.
Square brackets access the elements of an array, a slice, or a vector:
pieces[i] // array element
- The value to the left of the brackets is automatically dereferenced.
Expressions like these three are called lvalues, because they can appear on the left side of an assignment:
game.black_pawns = 0x00ff0000_00000000_u64;
coords.1 = 0;
pieces[2] = Some(Piece::new(Black, Knight, coords));
Extracting a slice from an array or vector is straightforward:
// extracting a slice from an array or vector
let second_half = &game_moves[midpoint .. end];
game_moves may be either an array, a slice, or a vector; the result is a borrowed slice of length end - midpoint. game_moves is considered borrowed for the lifetime of second_half.
The ..
operator allows either operand to be omitted.
.. // RangeFull
a .. // RangeFrom { start: a }
.. b // RangeTo { end: b }
a .. b // Range { start: a, end: b }
- The latter two forms are end-exclusive (or half-open): the end value is not included in the range represented.
The ..= operator produces end-inclusive (or closed) ranges, which do include the end value.
..= b // RangeToInclusive { end: b }
a ..= b // RangeInclusive::new(a, b)
Only ranges that include a start value are iterable, since a loop must have somewhere to start. But in array slicing, all six forms are useful. If the start or end of the range is omitted, it defaults to the start or end of the data being sliced.
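This sketch slices one array with all six range forms. The function name is illustrative. An omitted start defaults to 0, and an omitted end defaults to the length of the data.

```rust
fn slice_forms(data: &[i32]) -> [&[i32]; 6] {
    [
        &data[..],    // RangeFull: the whole slice
        &data[1..],   // RangeFrom: index 1 to the end
        &data[..2],   // RangeTo: indices 0 and 1
        &data[1..3],  // Range: indices 1 and 2
        &data[..=2],  // RangeToInclusive: indices 0 through 2
        &data[1..=3], // RangeInclusive: indices 1 through 3
    ]
}
```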
Reference Operators
The unary * operator is used to access the value pointed to by a reference.
Rust automatically follows references when you use the . operator to access a field or method, so the * operator is necessary only when we want to read or write the entire value that the reference points to.
Arithmetic, Bitwise, Comparison, and Logical Operators
Rust has the usual arithmetic operators, +, -, *, /, and %. Integer overflow is detected, and causes a panic, in debug builds; in release builds it wraps around by default. The standard library provides methods like a.wrapping_add(b) for wrapping (modular) arithmetic that never panics.
Integer division rounds toward zero, and dividing an integer by zero triggers a panic even in release builds. Integers have a method a.checked_div(b) that returns an Option (None if b is zero) and never panics.
Unary - negates a number. It is supported for all the numeric types except unsigned integers. There is no unary + operator.
a % b computes the signed remainder, or modulus, of division rounding toward zero. The result has the same sign as the left-hand operand.
- % can be used on floating-point numbers as well as integers:
let x = 1234.567 % 10.0;  // approximately 4.567
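Here is a small sketch of the division and overflow rules above. The function names are illustrative, and the odometer is a hypothetical use of `wrapping_add`. Division truncates toward zero, `%` takes the sign of the left operand, and `checked_div` turns division by zero into `None`.

```rust
// Division truncates toward zero; `%` takes the sign of the left operand.
fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

// Returns None instead of panicking when `count` is zero.
fn average(total: i32, count: i32) -> Option<i32> {
    total.checked_div(count)
}

// Hypothetical odometer that rolls over instead of panicking on overflow.
fn advance(odometer: u16, miles: u16) -> u16 {
    odometer.wrapping_add(miles)
}
```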
Rust also inherits C’s bitwise integer operators, &, |, ^, <<, and >>. However, Rust uses ! instead of ~ for bitwise NOT.
Right shifts (>>) are always sign-extending on signed integer types and zero-extending on unsigned integer types.
Rust’s comparison operators are ==, !=, <, <=, >, and >=. The two values being compared must have the same type.
Rust has the two short-circuiting logical operators && and ||. Both operands must have the exact type bool.
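This sketch, with illustrative names, covers the operator rules above. It shows `!` as bitwise NOT, right shifts that sign-extend a signed value and zero-fill an unsigned one, and `&&` short-circuiting before a division by zero. It also compares integers of different types after an explicit conversion.

```rust
fn bits_demo() -> (u8, i32, u8) {
    let flipped = !0b0000_1111u8; // bitwise NOT: 0b1111_0000
    let signed = -16i32 >> 2; // sign-extending shift: -4
    let unsigned = 0b1000_0000u8 >> 7; // zero-extending shift: 1
    (flipped, signed, unsigned)
}

// `num / den` is evaluated only when `den != 0` is true.
fn ratio_above(num: i32, den: i32, limit: i32) -> bool {
    den != 0 && num / den > limit
}

// Comparing u32 with u64 needs an explicit conversion first.
fn same_value(a: u32, b: u64) -> bool {
    u64::from(a) == b
}
```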
Assignment
The = operator can be used to assign to mut variables and their fields or elements.
In Rust, assignment is not as common as in other languages. Variables are immutable by default.
If the value has a non-Copy type, assignment moves it into the destination. Ownership of the value is transferred from the source to the destination. The destination’s prior value, if any, is dropped.
Compound assignment is supported:
total += item.price;
Rust doesn’t support chaining assignment.
Rust does not have C’s increment and decrement operators ++ and --.
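Below is a sketch of move-on-assignment. It uses a hypothetical `Noisy` type that logs its name when dropped, which makes the drop order visible. Assigning `src` into `dest` drops the old value immediately. A compound assignment stands in for the missing `++`.

```rust
use std::cell::RefCell;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn assignment_drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let mut dest = Noisy { name: "old", log: &log };
        let src = Noisy { name: "new", log: &log };
        dest = src; // "old" is dropped now; "new" moves into `dest`
        assert_eq!(dest.name, "new");
    } // `dest` (holding "new") is dropped here; `src` was moved, so nothing else
    log.into_inner()
}

fn count_up(n: u32) -> u32 {
    let mut count = 0;
    for _ in 0..n {
        count += 1; // no `++` in Rust
    }
    count
}
```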
Type Casts
Converting a value from one type to another usually requires an explicit cast in Rust. Casts use the as keyword:
let x = 17; // x is type i32
let index = x as usize; // convert to usize
Several kinds of casts are permitted:
- Numbers may be cast from any of the built-in numeric types to any other.
- Casting an integer to another integer type is always well-defined.
- Converting to a narrower type results in truncation.
- A signed integer cast to a wider type is sign-extended, an unsigned integer is zero-extended, and so on.
- In short, there are no surprises.
- Converting from a floating-point type to an integer type rounds toward zero.
- The value of -1.99 as i32 is -1.
- If the value is too large to fit in the integer type, the cast produces the closest value that the integer type can represent: the value of 1e6 as u8 is 255.
- Values of type bool or char, or of a C-like enum type, may be cast to any integer type.
- Casting in the other direction is not allowed, as bool, char, and enum types all have restrictions on their values that would have to be enforced with run-time checks.
- Casting a u16 to type char is banned because some u16 values, like 0xd800, correspond to Unicode surrogate code points and therefore would not make valid char values. There is a standard method, std::char::from_u32(), which performs the run-time check and returns an Option<char>. The need for this kind of conversion has grown rare. We typically convert whole strings or streams at once, and algorithms on Unicode text are often nontrivial and best left to libraries.
- As an exception, a u8 may be cast to type char, since all integers from 0 to 255 are valid Unicode code points for char to hold.
- Some casts involving unsafe pointer types are also allowed.
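The following sketch exercises the cast rules above. `Color` is a hypothetical C-like enum, and the function names are illustrative. It shows truncation, sign extension, float-to-integer rounding and saturation, and enum and `char` conversions. It also checks a value with `char::from_u32` rather than casting it.

```rust
// Hypothetical C-like enum: Red = 0, Green = 5, Blue = 6.
#[derive(Clone, Copy)]
enum Color {
    Red,
    Green = 5,
    Blue, // one more than the previous discriminant
}

fn cast_examples() -> (u8, i32, i32, u8, u32, i32) {
    (
        300_i32 as u8,      // truncation: 300 - 256 = 44
        -1_i8 as i32,       // sign extension: -1
        -1.99 as i32,       // rounds toward zero: -1
        1e6 as u8,          // saturates at u8::MAX: 255
        'A' as u32,         // char to integer: 65
        Color::Blue as i32, // enum to integer: 6
    )
}

// Integers other than u8 need a checked conversion to char.
fn to_char(code: u32) -> Option<char> {
    std::char::from_u32(code)
}
```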
A conversion usually requires a cast. A few conversions involving reference types are so straightforward that the language performs them even without a cast.
- Values of a mut reference auto-convert to a non-mut reference without a cast.
- Values of type &String auto-convert to type &str without a cast.
- Values of type &Vec<i32> auto-convert to &[i32].
- Values of type &Box<Chessboard> auto-convert to &Chessboard.
These are called deref coercions, because they apply to types that implement the Deref built-in trait. The purpose of Deref coercion is to make smart pointer types, like Box, behave as much like the underlying value as possible.
- Using a Box<Chessboard> is mostly just like using a plain Chessboard, thanks to Deref.
- User-defined types can implement the Deref trait, too.
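Here is a sketch of deref coercions in action. `Label` is a hypothetical user-defined wrapper that implements `Deref<Target = String>`. Each call passes a reference of one type to a function that expects another, and the compiler inserts the conversions, chaining them when needed.

```rust
use std::ops::Deref;

fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

// Hypothetical wrapper that derefs to its inner String.
struct Label(String);

impl Deref for Label {
    type Target = String;
    fn deref(&self) -> &String {
        &self.0
    }
}

fn coercion_demo() -> (usize, i32, usize, i32) {
    let owned = String::from("hello");
    let numbers = vec![1, 2, 3];
    let label = Label(String::from("tag"));
    let boxed = Box::new(vec![4, 5]);
    (
        char_count(&owned), // &String -> &str
        total(&numbers),    // &Vec<i32> -> &[i32]
        char_count(&label), // &Label -> &String -> &str
        total(&boxed),      // &Box<Vec<i32>> -> &Vec<i32> -> &[i32]
    )
}
```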
Closures
Rust has closures, lightweight function-like values. A closure usually consists of an argument list, given between vertical bars, followed by an expression:
let is_even = |x| x % 2 == 0;
assert_eq!(is_even(14), true);
Rust infers the argument types and return type. You can also write them out explicitly. If you do specify a return type, then the body of the closure must be a block:
let is_even = |x: u64| -> bool x % 2 == 0; // error
let is_even = |x: u64| -> bool { x % 2 == 0 }; // ok
Calling a closure uses the same syntax as calling a function:
assert_eq!(is_even(14), true);
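Unlike a nested `fn`, a closure can read variables from its enclosing scope. This sketch uses illustrative function names. The first closure reads `threshold` from the surrounding function. The second shows the block body that's required once a return type is written.

```rust
fn count_above(values: &[i32], threshold: i32) -> usize {
    // The closure uses `threshold` from the enclosing function.
    let is_above = |x: i32| x > threshold;
    values.iter().filter(|&&x| is_above(x)).count()
}

fn evens_below(n: u64) -> Vec<u64> {
    // An explicit return type requires a block body.
    let is_even = |x: u64| -> bool { x % 2 == 0 };
    (0..n).filter(|&x| is_even(x)).collect()
}
```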