Learn Solidity: Variables (Part 2)

How data storage works

Wissal Haji
Better Programming


metal storage cabinet with many small, labelled drawers
Photo by jesse orrico on Unsplash

Welcome to another article in the Learn Solidity series. As I promised in the last article, we will see how data storage works in Solidity.

Ethereum Virtual Machine (EVM)

Before talking about data storage in Solidity, I would like to introduce a few things about the Ethereum virtual machine to make things clearer.

The internal workings of the EVM:

flow chart showing the Ethereum EVM environment
EVM context (Image source: fullstacks.org)

When we install an Ethereum client, it comes with the EVM, a lightweight virtual machine built specifically to run smart contracts. The EVM is a stack machine, which means that its instruction set works with a stack instead of registers. The EVM opcodes are specified in the Yellow Paper and are also listed in “Ethereum VM (EVM) Opcodes and Instruction Reference.”

The code execution starts as follows: When a transaction results in smart contract code execution, an EVM is instantiated, and the EVM’s ROM is loaded with the code of the contract being called. The program counter is set to zero, the storage is loaded from the contract account’s storage, the memory is all set to zeros, and all the block and environment variables are set. Then the code gets executed.

Data Location

Let’s go back now to the memory keyword. As mentioned in the Solidity docs, starting from version 0.5.0, every variable of a complex (reference) type, such as an array or a struct, must be given an explicit data location that says where it is stored. There are three data locations: memory, storage, and calldata.

Note: The only place where you can omit the data location is with state variables since they will always be stored in the account’s storage.
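
As a minimal sketch of where the keyword is and is not needed (the contract name DataLocationSketch and its members are hypothetical, chosen only for illustration, and a 0.8.x compiler is assumed):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract DataLocationSketch {
    // State variable: always lives in storage, so no data location is written.
    uint256[] numbers;

    // Reference-type parameter: the data location (here memory) is mandatory.
    function append(uint256[] memory extra) public {
        // Local variable of reference type: the location must be stated too.
        // "storage" makes target a reference to the state variable.
        uint256[] storage target = numbers;
        for (uint256 i = 0; i < extra.length; i++) {
            target.push(extra[i]);
        }
    }

    // Value types such as uint256 never take a data location.
    function count() external view returns (uint256) {
        return numbers.length;
    }
}
```

Removing either memory or storage from append should make the compiler reject the contract, which is a quick way to see the rule in action.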

1. storage

  • Data in storage is stored permanently. The storage is a key-value store.
  • Data in storage is written to the blockchain (hence it changes the state), and that’s why storage is very expensive to use.
  • Occupying a previously empty 256-bit slot costs 20,000 gas.
  • Changing the value of an already occupied slot costs 5,000 gas. (These figures reflect the gas schedule at the time of writing; network upgrades adjust them.)
  • When clearing a storage slot (i.e., setting a nonzero value to zero), a certain amount of gas is refunded.
  • Storage saves data in slots of 256 bits (32 bytes = one word). A cost is incurred for every used slot, even if it is not fully occupied.
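
One rough way to observe these three cases is to measure the gas consumed around a single storage write. The sketch below is an illustration only: the contract StorageCostSketch and the function write are hypothetical names, and the numbers you get depend on the compiler version and on the gas rules of the network or Remix VM you run it on.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract StorageCostSketch {
    // A single uint256 occupies exactly one 256-bit storage slot.
    uint256 slotValue;

    // Writes v to the slot and returns the gas spent around the write.
    function write(uint256 v) external returns (uint256 gasUsed) {
        uint256 gasBefore = gasleft();
        slotValue = v; // compiles to an SSTORE
        gasUsed = gasBefore - gasleft();
    }
}
```

Calling write with a nonzero value on a fresh contract exercises the "occupy a slot" case, a second call with a different nonzero value exercises the "change a value" case, and write(0) clears the slot. The refund for clearing is applied at the end of the transaction, so it does not show up in the returned figure.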

2. memory

  • memory is a byte array that is addressed in words of 256 bits (32 bytes). Data is stored here only during function execution. After that, it is deleted. It is not saved to the blockchain.
  • Reading or writing a word (256 bits) costs 3 gas.
  • To keep the nodes that execute the transaction from doing too much work, memory gets more expensive as it grows: once more than 22 words are in use, the cost per additional word starts to rise, because the expansion cost has a quadratic component.
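
The "22" in the last point comes from the memory cost formula in the Yellow Paper, which charges 3 gas per word plus the number of words squared, divided by 512 and rounded down. A small sketch of that formula follows (the contract and function names are hypothetical; the constants 3 and 512 are the Yellow Paper values):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper, for illustration only.
contract MemoryCostSketch {
    // Total gas charged for having `words` 32-byte words of memory in use:
    // a linear part (3 gas per word) plus a quadratic part (words^2 / 512).
    function memoryCost(uint256 words) external pure returns (uint256) {
        return 3 * words + (words * words) / 512;
    }
}
```

For 22 words the quadratic part is 484 / 512, which rounds down to 0, so the cost is 66 gas. For 23 words it is 529 / 512, which rounds down to 1, so the cost is 70 gas instead of 69. From there on, each extra word costs a little more than the previous one.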

3. calldata

  • calldata is a non-modifiable, non-persistent area where function arguments are stored, and it behaves mostly like memory.
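  • Before Solidity 0.6.9, calldata was required for reference-type parameters of external functions. From 0.6.9 on, both memory and calldata are allowed for parameters in all functions, and calldata can also be used for other variables.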
  • It avoids copies and also makes sure the data cannot be modified.
  • Arrays and structs with calldata data location can also be returned from functions, but it is not possible to allocate such types.
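
A short sketch of these points, assuming a 0.8.x compiler (CalldataSketch, sum, and tail are hypothetical names used only for this example):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract CalldataSketch {
    // The array is read straight from the call data: no copy into memory.
    function sum(uint256[] calldata values) external pure returns (uint256 total) {
        for (uint256 i = 0; i < values.length; i++) {
            total += values[i];
        }
        // values[0] = 1; // would not compile: calldata is read-only
    }

    // A calldata array can be returned, here as a slice of the input.
    // It cannot be allocated: there is no "new" for calldata.
    // Note: the slice reverts if values is empty.
    function tail(uint256[] calldata values) external pure returns (uint256[] calldata) {
        return values[1:];
    }
}
```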

Data location and assignment behavior

It’s very important to understand how data location assignment works if you don’t want to have unexpected behavior.

The following rules apply to assignments:

  • Assignments between storage and memory (or from calldata) always create an independent copy.
  • Assignments from memory to memory only create references. This means that changes to one memory variable are also visible in all other memory variables that refer to the same data.
  • Assignments from storage to a local storage variable also only assign a reference.
  • All other assignments to storage always copy. Examples for this case are assignments to state variables or to members of local variables of storage struct type, even if the local variable itself is just a reference.
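
The first two rules can be checked without spending gas on a transaction. This is a minimal sketch under assumed names (AssignmentSketch, data, memoryAliases, and storageToMemoryCopies are all hypothetical):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract AssignmentSketch {
    uint256[] data = [1, 2, 3];

    // memory to memory: b is a reference to the same array as a.
    function memoryAliases() external pure returns (uint256) {
        uint256[] memory a = new uint256[](2);
        uint256[] memory b = a;
        b[0] = 7;
        return a[0]; // 7: the write through b is visible through a
    }

    // storage to memory: copy is independent of the state variable.
    function storageToMemoryCopies() external view returns (uint256 inMemory, uint256 inStorage) {
        uint256[] memory copy = data;
        copy[0] = 99;
        return (copy[0], data[0]); // (99, 1): data is untouched
    }
}
```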

Let’s examine this in more detail with the Remix debugger.
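
The contract used in the walkthrough is not reproduced in this text, so here is a minimal sketch that matches the description that follows: a three-element state array stateVar and a function foo that goes through the three cases. The contract name, the variable z, and all the values are assumptions, and the gas figures you observe will differ from the ones quoted below depending on your compiler version and the gas rules of your environment.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Reconstruction sketch, not the original listing.
contract WalkthroughSketch {
    // Dynamic array with three elements; the values are placeholders.
    uint256[] stateVar = [1, 4, 5];

    function foo() public {
        // Case 1: storage to memory. y is an independent copy of stateVar.
        uint256[] memory y = stateVar;

        // Case 2: memory to storage. Modify the copy, then write it back.
        y[0] = 12;
        y[1] = 20;
        y[2] = 24;
        stateVar = y;

        // Case 3: storage to storage. z is only a reference to stateVar.
        uint256[] storage z = stateVar;
        z[0] = 38;
        z[1] = 89;
        z[2] = 72;
    }
}
```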

Create a new file in Remix, copy the contract code into it, then compile and deploy the contract.
Now call the function foo. You will see the details of the transaction in the console, with a Debug button next to them. Click on it.

Solidity console screen showing details of a transaction, with a Debug button on the right

You should now see the debugger area, which looks like this:

Solidity debugger area screen with the arrow that allows you to step over the code highlighted in red

To step over the code, click on the arrow that I have selected in red.

The first thing you should notice is that the storage was loaded with the content of stateVar, as we mentioned in the EVM part, and of course there are no local variables yet.

When you step over, you should see the variable y appear in the local variables section (Solidity locals). Keep stepping over, and you will notice that it takes many opcodes to allocate the necessary memory space, load each word from storage, and then copy it into memory. This means more gas to pay, and thus the assignment from storage to memory is expensive.

Let’s examine the second case: assignment from memory to storage.
You can use it when you are done modifying the copy stored in memory and want to save the changes back to storage. It also consumes a lot of gas: 17,083 gas, computed as the difference in remaining gas shown in the Step details section of the debugger. The operation took four SSTORE opcodes: the first one stored the size of the array, which remained unchanged (it consumed 800 gas), and the other three updated the values of the array (each one consumed 5,000 gas). That accounts for 15,800 gas; the remaining 1,283 gas goes to the other opcodes involved in the copy.

Now let’s look at case three: assignment from storage to storage. This time a new local variable is created, but it does not hold a copy: it refers to the same data as stateVar. If we look at the code execution, we notice that Solidity simply pushes onto the stack the position of the storage slot that contains the length of the array. According to the documentation, for dynamic arrays, the position of the slot containing the length is used to compute the positions of the slots containing the array’s data.
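
To make that slot computation concrete: for a dynamic array whose length sits in slot p, the documentation places the elements starting at slot keccak256(p). The sketch below reads both directly with inline assembly. SlotLayoutSketch, lengthSlotValue, and elementAt are hypothetical names, a compiler version of 0.7.0 or later is assumed for the .slot syntax, and the array is assumed to be the first state variable so that p is 0.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract SlotLayoutSketch {
    // Declared first, so its length lives in slot 0. Values are placeholders.
    uint256[] stateVar = [1, 4, 5];

    // Reads slot p directly: it holds the array length.
    function lengthSlotValue() external view returns (uint256 len) {
        assembly {
            len := sload(stateVar.slot)
        }
    }

    // Element i of a uint256[] is stored at slot keccak256(p) + i.
    function elementAt(uint256 index) external view returns (uint256 value) {
        uint256 p;
        assembly {
            p := stateVar.slot
        }
        uint256 slot = uint256(keccak256(abi.encode(p))) + index;
        assembly {
            value := sload(slot)
        }
    }
}
```

Note that elementAt does no bounds check: reading past the end of the array simply returns whatever is in that slot, which is normally zero.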

If we now compare the cost of copying the data to memory, updating it, and copying it back to storage (21,629 gas) with the cost of creating a reference and updating the state directly (5,085 gas), it’s very clear that the second approach is much cheaper.

But what if we just updated the state variable directly like this:

stateVar[0] = 12;

It’s also possible. But if you are dealing with mappings and nested data types (as we will see later), using a storage pointer can lead to more readable code.
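
As a small preview of why a storage pointer can read better with nested data, consider the sketch below. The struct Account, the mapping accounts, and both functions are hypothetical and exist only for this illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract StoragePointerSketch {
    struct Account {
        uint256 balance;
        uint256[] history;
    }

    mapping(address => Account) accounts;

    function deposit(uint256 amount) external {
        // One lookup, then a short name for the nested storage location.
        Account storage acct = accounts[msg.sender];
        acct.balance += amount;
        acct.history.push(amount);
        // Without the pointer, each line would repeat accounts[msg.sender].
    }

    function balanceOf(address who) external view returns (uint256) {
        return accounts[who].balance;
    }
}
```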

In order to make this article short and not overwhelm you with too much information, I decided to leave complex variables for the next article. I hope that this article was useful for you, and as usual, if you want to learn more, stay tuned for the upcoming articles.
