Internals of Solidity
TL;DR
State variables are the only place where the data location can be omitted.
Data Location
Currently, reference types comprise structs, arrays and mappings.
- Fixed-size byte arrays (`bytes1` to `bytes32`) are value types, not reference types.
Every reference type has an additional annotation, the “data location”, about where it is stored. If you use a reference type, you always have to explicitly provide the data area where the type is stored:
- `memory`: its lifetime is limited to an external function call;
- `storage`: the location where the state variables are stored, where the lifetime is limited to the lifetime of a contract;
- `calldata`: a non-modifiable, non-persistent area where function arguments are stored, and behaves mostly like memory.
  - Try to use `calldata` as data location if possible because it will avoid copies and also makes sure that the data cannot be modified.
  - Arrays and structs with `calldata` data location can also be returned from functions, but it is not possible to allocate such types.
A function parameter in Solidity can be stored either in `memory` or in `calldata`. If the function is an entry point to the contract, called directly from a user (using a transaction) or from a different contract, then the parameter’s value can be taken directly from the call data. If the function is called internally, its reference-type parameters are typically `memory` (or `storage` references). From the perspective of the called contract, `calldata` is read-only.

Value types such as `uint` or `address` have no data location at all, since they are always copied. Only for arrays, structs and mappings, which can be large and expensive to copy, do we specify where the data lives. Return values of reference type need a data location as well. This is usually `memory`, although internal functions can also return `storage` references, and `calldata` arrays and structs can be returned too.
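| case | data location |
|---|---|
| local variable | `memory` or `storage` (also `calldata` since 0.6.9) |
| state variable | only `storage` |
| parameter of internal function | `memory` or `storage` (also `calldata` since 0.6.9) |
| parameter of external function | `calldata` (also `memory` since 0.6.9) |

Before Solidity 0.6.9, reference-type parameters were limited to `calldata` in external functions, `memory` in public functions, and `memory` or `storage` in internal and private ones. Since 0.6.9, `memory` and `calldata` are allowed in all functions regardless of their visibility.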
Before Solidity 0.5.0, local variables of complex types such as structs and arrays defaulted to `storage`. Since 0.5.0 the data location of such variables must always be given explicitly.
An assignment or type conversion that changes the data location will always incur an automatic copy operation, while assignments inside the same data location only copy in some cases for storage types.
- Assignments between `storage` and `memory` (or from `calldata`) always create an independent copy.
- Assignments from `memory` to `memory` only create references.
  - This means that changes to one memory variable are also visible in all other memory variables that refer to the same data.
- Assignments from `storage` to a local storage variable also only assign a reference.
- All other assignments to `storage` always copy.
  - Examples for this case are assignments to state variables or to members of local variables of `storage` struct type, even if the local variable itself is just a reference.
```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;

contract C {
    // The data location of x (state variable) is storage.
    // This is the only place where the data location can be omitted.
    uint[] x;

    function f(uint[] memory memoryArray) public {
        x = memoryArray; // works, copies the whole array to storage
        uint[] storage y = x; // works, assigns a pointer, data location of y (local variable) is storage
        y[7]; // fine, returns the 8th element
        y.pop(); // fine, modifies x through y
        delete x; // fine, clears the array, also modifies y
        // The following does not work; it would need to create a new temporary unnamed (local) array in storage,
        // but storage is "statically" allocated:
        // y = memoryArray; // does not compile
        // Similarly, "delete y" is not valid, as assignments to local variables
        // referencing storage objects can only be made from existing storage objects.
        // It would "reset" the pointer, but there is no sensible location it could point to.
        // For more details see the documentation of the "delete" operator.
        // delete y; // does not compile
        g(x); // calls g, handing over a reference to x
        h(x); // calls h and creates an independent, temporary copy in memory
    }

    function g(uint[] storage) internal pure {}
    function h(uint[] memory) public pure {}
}
```
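Why `y = memoryArray` fails: the data `y` points to lives in storage for the whole lifetime of the contract, whereas `memoryArray` only lives for one function call. For `y = memoryArray` to work, a temporary copy of `memoryArray` would have to be created in storage and `y` made to reference it. However, “assignments to local variables referencing storage objects can only be made from existing storage objects”.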
Layout of State Variables in Storage
Except for dynamically-sized arrays and mappings (see below), data is stored contiguously item after item starting with the first state variable, which is stored in slot `0`.
- For each variable, a size in bytes is determined according to its type.
- State variables of contracts are stored in storage in a compact way such that multiple values sometimes use the same storage slot.
Multiple, contiguous items that need less than 32 bytes are packed into a single storage slot if possible, according to the following rules:
- The first item in a storage slot is stored lower-order aligned. It occupies the least significant (rightmost) bytes of the 32-byte word, and the following items fill the slot toward the higher-order bytes. This resembles little-endian ordering, although the EVM itself reads and writes each 32-byte word big-endian.
  - Endian and endianness (or “byte-order”) describe how computers organize the bytes that make up numbers.
    - Little-endian means storing bytes in order of least-to-most-significant (where the least significant byte takes the first or lowest address), comparable to a common European way of writing dates (e.g., 31 December 2050).
      - It is used by Intel x86 processors.
    - Big-endian is the opposite order, comparable to an ISO date (2050-12-31).
      - Big-endian is also often called “network byte order” because Internet standards usually require data to be stored big-endian, starting at the standard UNIX socket level and going all the way up to standardized Web binary data structures.
      - Older Mac computers using 68000-series and PowerPC microprocessors used big-endian.
    - For example, the number `0x01234567` stored at memory addresses `0x100` to `0x103`:

      | memory address | 0x100 | 0x101 | 0x102 | 0x103 |
      |---|---|---|---|---|
      | little endian | 67 | 45 | 23 | 01 |
      | big endian | 01 | 23 | 45 | 67 |
- Value types use only as many bytes as are necessary to store them.
- If a value type does not fit the remaining part of a storage slot, it is stored in the next storage slot.
- Structs and array data always start a new slot and their items are packed tightly according to these rules.
  - The elements of structs and arrays are stored after each other, just as if they were given as individual values.
- Items following struct or array data always start a new storage slot.
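The packing rules above can be modelled in a few lines of Python. This is a simplified sketch that only handles value types given by their size in bytes (no structs, arrays, mappings or inheritance). `layout`, `pack_slot` and `Placement` are hypothetical helper names, not part of any Solidity tooling.

```python
from dataclasses import dataclass

SLOT_BYTES = 32  # one storage slot holds a 256-bit (32-byte) word


@dataclass
class Placement:
    name: str
    slot: int
    offset: int  # bytes counted from the lowest-order (rightmost) end of the slot
    size: int


def layout(variables):
    """Assign a slot and in-slot offset to each (name, size_in_bytes) pair.

    Simplified model: value types only, no structs, arrays, mappings or inheritance.
    """
    placements = []
    slot, offset = 0, 0
    for name, size in variables:
        if offset + size > SLOT_BYTES:  # does not fit in the rest of this slot
            slot, offset = slot + 1, 0
        placements.append(Placement(name, slot, offset, size))
        offset += size
    return placements


def pack_slot(values):
    """Pack (value, size_in_bytes) pairs into one word, first item lowest-order."""
    word, offset = 0, 0
    for value, size in values:
        word |= (value & ((1 << (8 * size)) - 1)) << (8 * offset)
        offset += size
    if offset > SLOT_BYTES:
        raise ValueError("values do not fit into a single slot")
    return word


if __name__ == "__main__":
    # roughly: uint128 a; uint128 b; uint256 c; uint8 d; address e;
    for p in layout([("a", 16), ("b", 16), ("c", 32), ("d", 1), ("e", 20)]):
        print(p)
    word = pack_slot([(0x11, 1), (0x2233, 2)])  # uint8 then uint16 in one slot
    print(word.to_bytes(32, "big").hex())  # ends in 223311: the first item is rightmost
```

In the sample, `a` and `b` share slot 0. `c` needs a full slot and moves to slot 1, and `d` and `e` then share slot 2.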
For contracts that use inheritance, the ordering of state variables is determined by the C3-linearized order of contracts starting with the most base-ward contract. If allowed by the above rules, state variables from different contracts do share the same storage slot.
Some considerations:
- When using elements that are smaller than 32 bytes, your contract’s gas usage may be higher. This is because the EVM operates on 32 bytes at a time. Therefore, if the element is smaller than that, the EVM must use more operations in order to reduce the size of the element from 32 bytes to the desired size.
- It might be beneficial to use reduced-size types if you are dealing with storage values because the compiler will pack multiple elements into one storage slot, and thus, combine multiple reads or writes into a single operation.
- If you are not reading or writing all the values in a slot at the same time, this can have the opposite effect, though: When one value is written to a multi-value storage slot, the storage slot has to be read first and then combined with the new value such that other data in the same slot is not destroyed.
- When dealing with function arguments or memory values, there is no inherent benefit because the compiler does not pack these values.
- In order to allow the EVM to optimize for this, ensure that you try to order your storage variables and struct members such that they can be packed tightly.
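The read-modify-write cost of a partial write can be seen in a small Python model of a single 256-bit slot. This is an illustration, not compiler output. `field_mask`, `read_field` and `write_field` are hypothetical helpers, and the comments only indicate which EVM steps (loading the slot, masking, storing it back) they stand for.

```python
def field_mask(offset: int, size: int) -> int:
    """Bit mask covering `size` bytes starting `offset` bytes from the low-order end."""
    return ((1 << (8 * size)) - 1) << (8 * offset)


def read_field(word: int, offset: int, size: int) -> int:
    """Extract one packed value from a slot word."""
    return (word & field_mask(offset, size)) >> (8 * offset)


def write_field(word: int, offset: int, size: int, value: int) -> int:
    """Replace one packed value while keeping the rest of the slot.

    Mirrors the extra work of a partial write: the old slot must be read
    first, the target bytes cleared, and the new value merged in before
    the whole word is written back.
    """
    mask = field_mask(offset, size)
    cleared = word & ~mask  # keep the neighbouring values untouched
    return cleared | ((value << (8 * offset)) & mask)


if __name__ == "__main__":
    # hypothetical slot holding `uint128 a = 1; uint128 b = 2;`
    slot = 1 | (2 << 128)
    slot = write_field(slot, 16, 16, 7)  # b = 7
    print(read_field(slot, 0, 16), read_field(slot, 16, 16))  # 1 7
```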
The layout of state variables in storage is considered to be part of the external interface of Solidity due to the fact that storage pointers can be passed to libraries.
Mappings and Dynamic Arrays
Due to their unpredictable size, mappings and dynamically-sized array types cannot be stored “in between” the state variables preceding and following them. Instead, they are considered to occupy only 32 bytes with regards to the rules above and the elements they contain are stored starting at a different storage slot that is computed using a Keccak-256 hash.
Assume the storage location of the mapping or array ends up being a slot `p` after applying the storage layout rules.
- For dynamic arrays, this slot stores the number of elements in the array (byte arrays and strings are an exception).
  - Array data is located starting at `keccak256(p)` and it is laid out in the same way as statically-sized array data would be: one element after the other, potentially sharing storage slots if the elements are not longer than 16 bytes.
  - Dynamic arrays of dynamic arrays apply this rule recursively.
    - The location of element `x[i][j]`, where the type of `x` is `uint24[][]`, is computed as follows (again, assuming `x` itself is stored at slot `p`):
      - The slot is `keccak256(keccak256(p) + i) + floor(j / floor(256 / 24))`.
      - The element can be obtained from the slot data `v` using `(v >> ((j % floor(256 / 24)) * 24)) & type(uint24).max`.
      - A slot has 256 bits, so it can hold several elements. Here it holds `floor(256 / 24) = 10` `uint24` values.
- For mappings, the slot stays empty, but it is still needed to ensure that even if there are two mappings next to each other, their content ends up at different storage locations.
- The value corresponding to a mapping key `k` is located at `keccak256(h(k) . p)`, where `.` is concatenation and `h` is a function that is applied to the key depending on its type:
  - for value types, `h` pads the value to 32 bytes in the same way as when storing the value in memory;
  - for strings and byte arrays, `h(k)` is just the unpadded data.
- If the mapping value is a non-value type, the computed slot marks the start of the data.
- If the value is of struct type, you have to add an offset corresponding to the struct member to reach the member.
```solidity
// compute the storage location of data[4][9].c
contract C {
    struct S { uint16 a; uint16 b; uint256 c; }
    uint x;
    mapping(uint => mapping(uint => S)) data;
}
```
- The position of the mapping itself is `1` (the variable `x` with 32 bytes precedes it).
- `data[4]` (also a map) is stored at `keccak256(uint256(4) . uint256(1))`.
- The data for `data[4][9]` starts at slot `keccak256(uint256(9) . keccak256(uint256(4) . uint256(1)))`.
- The slot offset of the member `c` inside the struct `S` is `1` because `a` and `b` are packed in a single slot.
- The slot for `data[4][9].c` is `keccak256(uint256(9) . keccak256(uint256(4) . uint256(1))) + 1`.
  - The type of the value is `uint256`, so it uses a single slot.
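The slot formulas for dynamic arrays and mappings can be checked with a self-contained Python sketch. Python's `hashlib.sha3_256` uses the NIST SHA-3 padding and produces different digests than Ethereum's `keccak256`. The sketch therefore includes a small, slow Keccak-256 written after the structure of the Keccak team's compact reference code. All helper names (`word`, `hash_slot`, `array_element`, `uint24_nested`, `mapping_value`) are hypothetical, and the model assumes integer mapping keys and value-type array elements of at most 32 bytes.

```python
MASK64 = (1 << 64) - 1
WORD = 2 ** 256


def _rol64(a: int, n: int) -> int:
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def _keccak_f1600(lanes: list) -> list:
    """Keccak-f[1600] permutation on 25 lanes, lanes[x + 5 * y] being 64 bits."""
    r = 1  # LFSR state that generates the round constants
    for _ in range(24):
        # theta: mix each column parity into its neighbours
        c = [lanes[x] ^ lanes[x + 5] ^ lanes[x + 10] ^ lanes[x + 15] ^ lanes[x + 20]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [lanes[i] ^ d[i % 5] for i in range(25)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x + 5 * y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x + 5 * y] = lanes[x + 5 * y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        lanes = [lanes[x + 5 * y] ^ (~lanes[(x + 1) % 5 + 5 * y] & lanes[(x + 2) % 5 + 5 * y])
                 for y in range(5) for x in range(5)]
        # iota: add the round constant to lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (pad10*1 starting with 0x01, not SHA3-256's 0x06)."""
    rate = 136  # 1088-bit rate for a 512-bit capacity
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    lanes = [0] * 25
    for start in range(0, len(padded), rate):
        for i in range(rate // 8):
            lanes[i] ^= int.from_bytes(padded[start + 8 * i:start + 8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lane.to_bytes(8, "little") for lane in lanes[:4])


def word(n: int) -> bytes:
    """A uint256 (or two's-complement int256) as a 32-byte big-endian word."""
    return (n % WORD).to_bytes(32, "big")


def hash_slot(data: bytes) -> int:
    return int.from_bytes(keccak256(data), "big")


def array_element(p: int, index: int, elem_bytes: int) -> tuple:
    """(slot, bit shift) of element `index` of a dynamic array whose length is at slot p."""
    per_slot = 32 // elem_bytes  # e.g. floor(256 / 24) = 10 for uint24
    slot = (hash_slot(word(p)) + index // per_slot) % WORD
    return slot, (index % per_slot) * 8 * elem_bytes


def uint24_nested(p: int, i: int, j: int) -> tuple:
    """Location of x[i][j] for `uint24[][] x` at slot p; read it as (v >> shift) & 0xFFFFFF."""
    inner_p, _ = array_element(p, i, 32)  # slot holding the length of x[i]
    return array_element(inner_p, j, 3)


def mapping_value(p: int, key: int) -> int:
    """Slot of m[key] for a mapping at slot p with an integer key: keccak256(h(k) . p)."""
    return hash_slot(word(key) + word(p))


# data[4][9].c from the example: mapping at slot 1, member c at slot offset 1
data_4 = mapping_value(1, 4)
data_4_9_c = (mapping_value(data_4, 9) + 1) % WORD

if __name__ == "__main__":
    print(hex(data_4_9_c))
    print(uint24_nested(0, 2, 13))
```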
`bytes` and `string` are encoded identically. In general, the encoding is similar to `bytes1[]`, in the sense that there is a slot for the array itself and a data area that is computed using a `keccak256` hash of that slot’s position.
- For byte arrays that store data which is `32` or more bytes long, the main slot `p` stores `length * 2 + 1` and the data is stored as usual in `keccak256(p)`.
  - `length * 2 + 1` is odd, so its lowest bit is always 1. Since this number is the whole content of slot `p`, the lowest-order bit of the slot is 1.
- For short values (shorter than `32` bytes) the array elements are stored together with the length in the same slot.
  - If the data is at most `31` bytes long, the elements are stored in the higher-order bytes (left aligned) and the lowest-order byte stores the value `length * 2`.
  - `length * 2` is even, so its lowest bit is always 0. Because it sits in the lowest-order byte, the lowest-order bit of the slot is 0.
- You can distinguish a short array from a long array by checking if the lowest bit is set: short (not set) and long (set).
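The short/long encoding of the main slot can be sketched in Python as follows. The model only covers the value of slot `p` itself. For long values the actual bytes would start at `keccak256(p)`, which is not computed here. `encode_main_slot` and `decode_main_slot` are hypothetical names.

```python
def encode_main_slot(data: bytes) -> int:
    """Value of slot p for a `bytes` or `string` state variable holding `data`."""
    if len(data) <= 31:
        # short layout: data left-aligned, length * 2 in the lowest-order byte
        return int.from_bytes(data.ljust(31, b"\x00") + bytes([len(data) * 2]), "big")
    # long layout: slot p only holds length * 2 + 1; the data starts at keccak256(p)
    return len(data) * 2 + 1


def decode_main_slot(slot_value: int):
    """Return (is_long, length, inline_data); inline_data is None for long values."""
    if slot_value & 1:  # lowest bit set: long layout
        return True, (slot_value - 1) // 2, None
    length = (slot_value & 0xFF) // 2  # lowest bit clear: short layout
    return False, length, slot_value.to_bytes(32, "big")[:length]


if __name__ == "__main__":
    print(hex(encode_main_slot(b"hello")))
    print(decode_main_slot(encode_main_slot(b"x" * 100)))  # (True, 100, None)
```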
JSON Output
Layout in Memory
Solidity reserves four 32-byte slots, with specific byte ranges being used as follows:
- `0x00` - `0x3f` (64 bytes): scratch space for hashing methods
  - Scratch space can be used between statements (i.e. within inline assembly).
- `0x40` - `0x5f` (32 bytes): currently allocated memory size (aka. free memory pointer)
- `0x60` - `0x7f` (32 bytes): zero slot
  - The zero slot is used as initial value for dynamic memory arrays and should never be written to (the free memory pointer points to `0x80` initially).
Solidity always places new objects at the free memory pointer and memory is never freed (this might change in the future).
Elements in memory arrays in Solidity always occupy multiples of 32 bytes.
- This is even true for `bytes1[]`, but not for `bytes` and `string`.
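The allocation scheme described above (reserved words up to `0x7f`, a free memory pointer starting at `0x80`, memory never freed) behaves like a bump allocator. The Python sketch below models it under stated assumptions. The allocator rounds every object up to a multiple of 32 bytes, `bytes1[]` takes one word per element, and `bytes`/`string` data is packed. Temporary areas that do not move the pointer are not modelled. `Memory` and `dynamic_array_size` are hypothetical names.

```python
FREE_MEMORY_POINTER = 0x40   # the word at 0x40 holds the free memory pointer
ZERO_SLOT = 0x60             # must never be written
INITIAL_FREE_MEMORY = 0x80   # first byte after the four reserved words


def ceil32(n: int) -> int:
    return (n + 31) // 32 * 32


class Memory:
    """Bump allocator: new objects go at the free memory pointer, nothing is freed."""

    def __init__(self):
        self.free_ptr = INITIAL_FREE_MEMORY

    def allocate(self, size: int) -> int:
        address = self.free_ptr
        self.free_ptr += ceil32(size)  # assumption: the pointer stays 32-byte aligned
        return address


def dynamic_array_size(length: int, packed: bool) -> int:
    """A 32-byte length word plus the data.

    packed=False models `bytes1[]` (one 32-byte word per element);
    packed=True models `bytes`/`string` (bytes packed, rounded up to 32).
    """
    data = ceil32(length) if packed else 32 * length
    return 32 + data


if __name__ == "__main__":
    mem = Memory()
    print(hex(mem.allocate(dynamic_array_size(5, packed=False))))  # 0x80
    print(hex(mem.allocate(dynamic_array_size(5, packed=True))))   # 0x140
```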
There are some operations in Solidity that need a temporary memory area larger than 64 bytes and therefore will not fit into the scratch space. They will be placed where the free memory pointer points to, but given their short lifetime, the pointer is not updated.
- The memory may or may not be zeroed out. Because of this, one should not expect the free memory to point to zeroed out memory.
Differences to layout in storage:
- `uint8[4] a;` occupies 32 bytes (1 slot) in storage, but 128 bytes (4 items with 32 bytes each) in memory.
- The following struct occupies 96 bytes (3 slots of 32 bytes) in storage, but 128 bytes (4 items with 32 bytes each) in memory.

```solidity
struct S {
    uint a;
    uint b;
    uint8 c;
    uint8 d;
}
```
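The two size figures above can be reproduced with a short Python comparison. Storage packs value types tightly, while memory gives every member its own 32-byte word. This sketch assumes flat lists of value-type members given by byte size (nested structs and arrays are not handled). `storage_slots` and `memory_bytes` are hypothetical names.

```python
def storage_slots(member_sizes):
    """Slots used in storage by a struct or static array, packing value types tightly."""
    slots, used = 0, 32  # used = bytes taken in the current slot; 32 forces a new slot
    for size in member_sizes:
        if used + size > 32:
            slots, used = slots + 1, 0
        used += size
    return slots


def memory_bytes(member_sizes):
    """Bytes used in memory: every member or element takes a full 32-byte word."""
    return 32 * len(member_sizes)


UINT8_ARRAY_4 = [1, 1, 1, 1]  # uint8[4]
STRUCT_S = [32, 32, 1, 1]     # struct S { uint a; uint b; uint8 c; uint8 d; }

if __name__ == "__main__":
    for name, sizes in [("uint8[4]", UINT8_ARRAY_4), ("S", STRUCT_S)]:
        print(name, 32 * storage_slots(sizes), memory_bytes(sizes))
```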
Layout of Call Data
The input data for a function call is assumed to be in the format defined by the ABI specification.
- The ABI specification requires arguments to be padded to multiples of 32 bytes.
- The internal function calls use a different convention.
- Arguments for the constructor of a contract are directly appended at the end of the contract’s code, also in ABI encoding.
  - The constructor will access them through a hard-coded offset, and not by using the `codesize` opcode, since this of course changes when appending data to the code.
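To illustrate the 32-byte padding and the appended constructor arguments, here is a Python sketch for a few static types only. Dynamic types (which use offsets into a tail section) and the 4-byte function selector are deliberately left out. The helper names and the dummy creation code bytes are hypothetical.

```python
def enc_uint(n: int) -> bytes:
    """uintN/intN: big-endian, left-padded to 32 bytes (two's complement if negative)."""
    return (n % 2 ** 256).to_bytes(32, "big")


def enc_address(addr: str) -> bytes:
    hex_part = addr[2:] if addr.startswith("0x") else addr
    raw = bytes.fromhex(hex_part)
    if len(raw) != 20:
        raise ValueError("an address has 20 bytes")
    return raw.rjust(32, b"\x00")  # left-padded like a uint160


def enc_bool(flag: bool) -> bytes:
    return enc_uint(int(flag))


def enc_bytes_n(data: bytes) -> bytes:
    """bytes1..bytes32: left-aligned, right-padded with zeros."""
    if not 1 <= len(data) <= 32:
        raise ValueError("bytesN needs 1 to 32 bytes")
    return data.ljust(32, b"\x00")


def creation_payload(creation_code: bytes, *encoded_args: bytes) -> bytes:
    """Constructor arguments are ABI-encoded and appended after the code, so the
    constructor can find them at the fixed offset len(creation_code)."""
    return creation_code + b"".join(encoded_args)


if __name__ == "__main__":
    dummy_code = bytes.fromhex("6080604052")  # placeholder bytes, not a real contract
    print(creation_payload(dummy_code, enc_uint(7), enc_bool(True)).hex())
```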
Cleaning Up Variables
When a value is shorter than 256 bits, in some cases the remaining bits must be cleaned.
- The Solidity compiler is designed to clean such remaining bits before any operations that might be adversely affected by the potential garbage in the remaining bits.
- Before writing a value to memory, the remaining bits need to be cleared because the memory contents can be used for computing hashes or sent as the data of a message call.
- Before storing a value in the storage, the remaining bits need to be cleaned because otherwise the garbled value can be observed.
- Access via inline assembly is not considered such an operation: If you use inline assembly to access Solidity variables shorter than 256 bits, the compiler does not guarantee that the value is properly cleaned up.
- We do not clean the bits if the immediately following operation is not affected.
- The Solidity compiler cleans input data when it is loaded onto the stack.
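What "cleaning" means can be shown on plain integers. Unsigned types keep only their low bits, and signed types are sign-extended from their top bit. This is roughly the effect of an AND mask and the EVM's `SIGNEXTEND` opcode. The Python sketch below is a model, not the compiler's actual code, and `clean_uint` / `clean_int` are hypothetical names.

```python
WORD_MASK = (1 << 256) - 1


def clean_uint(word: int, bits: int) -> int:
    """Cleanup for uintN: zero everything above the low `bits` bits."""
    return word & ((1 << bits) - 1)


def clean_int(word: int, bits: int) -> int:
    """Cleanup for intN: sign-extend bit `bits - 1` over the upper bits."""
    low = word & ((1 << bits) - 1)
    if low >> (bits - 1):  # sign bit set: fill the upper bits with ones
        low |= WORD_MASK ^ ((1 << bits) - 1)
    return low


if __name__ == "__main__":
    dirty = 0xAB12  # a uint8 value 0x12 with garbage in the higher bits
    print(dirty == 0x12, clean_uint(dirty, 8) == 0x12)  # False True
```

The comparison shows why dirty bits matter: without cleanup, equality checks, hashes and stored values would see `0xAB12` instead of `0x12`.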
The Optimizer
Contract ABI Specification
An application binary interface is an interface between two program modules; often, between the operating system and user programs.
- An ABI defines how data structures and functions are accessed in machine code. It’s the primary way of encoding and decoding data into and out of machine code.
- Application programming interface defines this access in high-level, often human-readable formats as source code.
In Ethereum, the ABI is used to encode contract calls for the EVM and to read data out of transactions.
- The purpose of an ABI is to define the functions in the contract that can be invoked and describe how each function will accept arguments and return its result.
- A contract’s ABI is specified as a JSON array of function descriptions and events.
  - A function description is a JSON object with fields `type`, `name`, `inputs`, `outputs`, `constant`, and `payable`. Newer compilers describe mutability with a `stateMutability` field instead (`pure`, `view`, `nonpayable` or `payable`), and `constant` and `payable` are deprecated.
  - An event description object has fields `type`, `name`, `inputs`, and `anonymous`.
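As a small illustration of working with the JSON ABI, the Python sketch below parses a hypothetical ABI with one function and one event and derives their canonical signatures. The `keccak256` of such a signature gives the function selector (its first 4 bytes) or, for non-anonymous events, the first topic. Hashing is not shown here. Tuple (struct) parameters are not expanded in this sketch.

```python
import json

# A hypothetical ABI with one function and one event, using `stateMutability`.
ABI_JSON = """
[
  {"type": "function", "name": "transfer", "stateMutability": "nonpayable",
   "inputs": [{"name": "to", "type": "address"}, {"name": "amount", "type": "uint256"}],
   "outputs": [{"name": "", "type": "bool"}]},
  {"type": "event", "name": "Transfer", "anonymous": false,
   "inputs": [{"name": "from", "type": "address", "indexed": true},
              {"name": "to", "type": "address", "indexed": true},
              {"name": "value", "type": "uint256", "indexed": false}]}
]
"""


def signature(entry: dict) -> str:
    """Canonical signature, e.g. 'transfer(address,uint256)' (no tuple expansion)."""
    types = ",".join(param["type"] for param in entry["inputs"])
    return f"{entry['name']}({types})"


def signatures(abi: list, kind: str) -> list:
    """Signatures of all entries of one kind ('function' or 'event')."""
    return [signature(entry) for entry in abi if entry["type"] == kind]


abi = json.loads(ABI_JSON)

if __name__ == "__main__":
    print(signatures(abi, "function"))  # ['transfer(address,uint256)']
    print(signatures(abi, "event"))     # ['Transfer(address,address,uint256)']
```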
References