How to set up a failover/backup node

Overview

There are two main ways to set up a failover node, each with its own pros and cons. Read through both options and choose the one that works best for your situation.

Option 1 is best if you need a failover node that can also serve as an RPC node, or if you need one failover node for multiple validators. However, it requires a restart when the failover node becomes a validator.

Option 2 is best for minimizing downtime, as the failover happens almost instantly. However, the failover node must be dedicated to a single validator and cannot serve as an RPC node.

Important Note

Once a node becomes a validator node and begins tracking a specific shard, it cannot be reverted to track all shards.
With Option 1, this means a node operator may lose one RPC node during failover, when that node transitions to a validator node. This is a known issue, and a fix is on the team's roadmap.

[Option 1] RPC node as a failover node

This is the traditional recovery plan, which is already available on mainnet: you run a primary validator node and a secondary failover node.

Pros

It is possible to use an RPC node as a failover node.
One failover node can be used for multiple validator nodes, each tracking different shards.

Cons

A restart of neard is required before a failover node can be promoted to a new validator node.

Setup for the failover node while it is on standby

In config.json

- Set tracked_shards to [0].
- Set store.load_mem_tries_for_tracked_shards to false.
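
As a rough illustration, the Option 1 standby settings could be applied to an already-parsed config.json with a few lines of Python. This is a minimal sketch, not an official tool: the function name apply_option1_standby is hypothetical, and it assumes the dotted name store.load_mem_tries_for_tracked_shards refers to a key nested under a "store" object.

```python
import copy


def apply_option1_standby(config: dict) -> dict:
    """Return a copy of the parsed config with the Option 1 standby settings."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated["tracked_shards"] = [0]
    # Assumption: the dotted setting name maps to a key nested under "store".
    updated.setdefault("store", {})["load_mem_tries_for_tracked_shards"] = False
    return updated
```

Returning a modified copy rather than editing in place makes it easy to diff the result against the current file before writing anything to disk.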

Procedure

Copy validator_key.json to the failover node.
In config.json file of the failover node:
- Set tracked_shards to []
- Set store.load_mem_tries_for_tracked_shards to true
Note: You don’t need to swap the node_key.json file on the failover node. The network identifies nodes by their key and IP address, so reusing the primary node's key from a different IP address might prevent successful syncing.
Stop the primary validator node.
Restart the failover node.
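
The config.json change in the procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: promote_option1_config is a hypothetical helper name, the path to config.json is supplied by the operator, and store.load_mem_tries_for_tracked_shards is assumed to be a key nested under "store". Copying validator_key.json, stopping the primary node, and restarting the failover node remain manual steps.

```python
import json
from pathlib import Path


def promote_option1_config(config_path: str) -> dict:
    """Rewrite config.json with the Option 1 promotion settings and return it."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["tracked_shards"] = []
    # Assumption: the dotted setting name maps to a key nested under "store".
    config.setdefault("store", {})["load_mem_tries_for_tracked_shards"] = True
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Because the file is rewritten in place, keeping a backup copy of the original config.json before running something like this is a sensible precaution.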

[Option 2] Validator key hot swap

This method allows you to quickly transfer the validator key to the failover node with very little downtime.

Pros

No need to restart the failover node during transition.
Failover can happen in seconds, minimizing downtime.

Cons

The failover node cannot be used as an RPC node while it is on standby.
The failover node is dedicated to just one validator node.

Setup for the failover node while it is on standby

In config.json
- Add "tracked_shadow_validator": "<validator_id>" (where <validator_id> is the pool ID of the validator).
- Set tracked_shards to [].
- Set store.load_mem_tries_for_tracked_shards to true.
Note: The failover node must be dedicated to a single validator. Because mem_trie doesn’t work well with RPC nodes, the failover node cannot perform RPC functions.
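
For illustration, the three Option 2 standby settings could be applied to a parsed config.json like this. It is a minimal sketch with assumptions: apply_option2_standby is a hypothetical name, tracked_shadow_validator is assumed to be a top-level key, and store.load_mem_tries_for_tracked_shards is assumed to be nested under "store".

```python
import copy


def apply_option2_standby(config: dict, validator_id: str) -> dict:
    """Return a copy of the parsed config with the Option 2 standby settings."""
    if not validator_id:
        raise ValueError("validator_id must be a non-empty pool ID")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    # Assumption: tracked_shadow_validator sits at the top level of config.json.
    updated["tracked_shadow_validator"] = validator_id
    updated["tracked_shards"] = []
    # Assumption: the dotted setting name maps to a key nested under "store".
    updated.setdefault("store", {})["load_mem_tries_for_tracked_shards"] = True
    return updated
```

The validator ID is passed in as an argument instead of being hard-coded, since each failover node shadows exactly one validator.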

Procedure

With the changes made to the config.json file and the validator key hot swap procedure, the failover node can quickly take over validator responsibilities.

Copy validator_key.json to the failover node.
[Optional] Remove "tracked_shadow_validator": "<validator_id>" from the config.json file of the failover node.
Note: You don’t need to swap the node_key.json file on the failover node. The network identifies nodes by their key and IP address, so changing the key might prevent successful syncing.
Stop the primary node.
Send a SIGHUP signal to the failover node (without restarting it).
The failover node will pick up the validator key and start validating.
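
The SIGHUP step above can be sketched in Python using the standard os and signal modules. This is a hedged example rather than part of the official procedure: send_sighup is a hypothetical helper, and the process ID of the running failover node is assumed to be known to the operator (how to look it up depends on how the node is launched).

```python
import os
import signal


def send_sighup(pid: int) -> None:
    """Send SIGHUP to the process with the given PID (POSIX systems only)."""
    if pid <= 0:
        # os.kill treats 0 and negative values as process-group or
        # broadcast targets, so refuse them here.
        raise ValueError("pid must be a positive process ID")
    os.kill(pid, signal.SIGHUP)
```

From a shell, the equivalent is kill -HUP <pid>. Either way, the signal should be sent only after validator_key.json is in place and the primary node has been stopped, following the order of the steps above.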
