Database Migrations

Background

Nodes use RocksDB database to store blockchain information locally. Some node releases need to change the format of data stored in that database, for example to enable new protocol features. The process of converting the data to a new format is called database migration. Data formats are also numbered and this is called database version.

Node binary can print its supported protocol version and the database version it needs.

For example, here we can see that the node expects database version 28, as indicated by db 28:

$ ./target/release/neard
neard (release 1.22.0) (build 5a6fb2bd2) (protocol 48) (db 28)
NEAR Protocol Node

If an existing database version is lower that what the binary needs, the node performs a database migration at startup. Here is an example of running a node version 1.24.0 using an instance of database created by a node version 1.21.1. You can see several DB migrations triggering sequentially.

$ ./target/release/neard
INFO neard: Version: 1.24.0, Build: crates-0.11.0-80-g164991d7a, Latest Protocol: 51
INFO near: Opening store database at "/home/user/.near/data"
INFO near: Migrate DB from version 27 to 28
INFO near: Migrate DB from version 28 to 29
INFO near: Migrate DB from version 29 to 30
INFO near: Migrate DB from version 30 to 31

What can go wrong?

Sometimes a database migration gets interrupted. This can happen for many reasons, such as a machine failing during a long-running database migration, or the user accidentally stopping the process with Ctrl-C. The data stored in the database has no self-describing metadata for efficiency reasons, therefore it is impossible to determine which database items were already converted to the new format, making it impossible to resume or start the migration over. This means that interrupting a database migration gets the database irrecoverably corrupted.

Safe database migrations

Starting with neard release 1.26.0, the node includes a way to recover the database instance even if the database migration gets corrupted. This feature is enabled by default but requires a manual intervention if a database migration actually gets interrupted.

One of the possible ways to restore a database is to use a known good state of the database. Before 1.26.0, this was mostly done by downloading a node database snapshot. Starting with 1.26.0, it can be done locally, which is more convenient and much faster.

For the demonstration purposes, let's assume that the near home directory is /home/user/.near, and the database location is /home/user/.near/data. Then a safe database migration works the following way:

Creates an instant and free snapshot of the existing database in /home/user/.near/data/migration-snapshot using filesystem hard links.
If your filesystem doesn't support hardlinks (or you’ve configured the snapshot to be created on a different file system), this step can take significant time and double the space used by the database.
Runs the database migration.
Even though a newly created snapshot takes no additional space, the space taken by the snapshot will gradually increase as the database migration progresses.
Deletes the snapshot.
Runs the node normally.

If the migration step is interrupted, a snapshot will not be deleted. Upon restart, the node will detect the presence of the local snapshot, assume that a database migration was interrupted (thus corrupting the database) and ask the user to recover the database from that snapshot.

Recovery

Assuming the corrupted database is in /home/user/.near/data, and the snapshot is in its default location in the database directory ( /home/user/.near/data/migration-snapshot) a user may restore the database as follows:

# Delete files of the corrupted database
rm /home/user/.near/data/*.sst

# Move not only the .sst files, but all files, to the data directory
mv /home/user/.near/data/migration-snapshot/* /home/user/.near/data/

# Delete the empty snapshot directory
rm -r /home/user/.near/data/migration-snapshot

# Restart
./target/release/neard

Configuration

Starting with upcoming release 1.30, the safe database migrations feature is configured by a store.migration_snapshot option (i.e., a migration_snapshot property of a store object). It can be set to one of the following:

an absolute path (e.g. "/srv/neard/migration-snapshot") — the snapshot will be created in specified location;
a relative path (e.g. "migration-snapshot") — the snapshot will be created in specified sub-directory inside of the database directory;
true (the default) — equivalent to specifying "migration-snapshot" relative path; or
false — the safe migration feature will be disabled.

Note that the default location of the snapshot is inside the database directory. This ensures the snapshot is instant and free (so long as the filesystem supports hardlinks).

Prior to version 1.30, the feature was configured by use_db_migration_snapshot and db_migration_snapshot_path options. They are are now deprecated and if the node detects that they are set, it will fail a migration with message explaining how to migrate to new options.

Got a question?
Ask it on StackOverflow!

Database Migrations

Background​

What can go wrong?​

Safe database migrations​

Recovery​

Configuration​

Background

What can go wrong?

Safe database migrations

Recovery

Configuration