How to set up Failover/Backup node
Overviewโ
There are two main ways to set up a failover node, each with its own pros and cons. Read through both options and choose the one that works best for your situation.
Option 1 is best if you need a failover node that can also serve as an RPC node, or if you need one failover node for multiple validators. However, it requires a restart when the failover node becomes a validator.
Option 2 is best for minimizing downtime, as the failover happens almost instantly. However, the failover node must be dedicated to a single validator and cannot serve as an RPC node.
Important Noteโ
- Once a node becomes a validator node and begins tracking a specific shard, it cannot be reverted to track all shards.
- During the failover process, a node operator may lose one RPC node when the node transitions to a validator node (option 1). This is a known issue, and it is on the team's roadmap for resolution.
[Option 1] RPC node as a failover nodeโ
This is the traditional recovery plan which has been available on the mainnet, where you have a primary validator node and a secondary failover node.
Prosโ
- It is possible to use an RPC node as a failover node.
- Failover node can be used for multiple validator nodes, each tracking different shards.
Consโ
- A restart of neard is required before a failover node can be promoted to a new validator node.
Setup for the failover node while it is on standbyโ
In config.json
- Set
tracked_shards_config
to"AllShards"
- Set
store.load_mem_tries_for_tracked_shards
tofalse
Procedureโ
- Copy
validator_key.json
to the failover node. - In
config.json
file of the failover node:- Set
tracked_shards_config
to"NoShards"
- Set
store.load_mem_tries_for_tracked_shards
totrue
- Set
- Note: You donโt need to swap the
node_key.json
file on the failover node. The network identifies nodes by their key and IP address, so changing the IP address might prevent successful syncing. - Stop the primary validator node.
- Restart the failover node.
[Option 2] Validator key hot swapโ
This method allows you to quickly transfer the validator key to the failover node with very little downtime.
Prosโ
- No need to restart the failover node during transition.
- Failover can happen in seconds, minimizing downtime.
Consโ
- The failover node cannot be used as an RPC node while it is on standby.
- The failover node is dedicated to just one validator node.
Setup for the failover node while it is on standbyโ
- In
config.json
- Set
tracked_shards_config
to{ "ShadowValidator": "<validator_id>" }
(where<validator_id>
is the pool ID of the validator). - Set
store.load_mem_tries_for_tracked_shards
totrue
.
- Set
- Note: The failover node must be dedicated to a single validator and cannot be used as an RPC node during failover. Since mem_trie doesnโt work well with RPC nodes, the failover node wonโt be able to perform RPC functions.
Procedureโ
With the changes made to the config.json
file and the validator key hot swap procedure, the failover node can quickly take over validator responsibilities.
- Copy
validator_key.json
to the failover node. - [Optional] Set
tracked_shards_config
to"NoShards"
inconfig.json
file of the failover node. - Note: You donโt need to swap the
node_key.json
file on the failover node. The network identifies nodes by their key and IP address, so changing the key might prevent successful syncing. - Stop the primary node.
- Send a
SIGHUP
signal to the failover node (without restarting it). - The failover node will pick up the validator key and start validating.