Updating BLOCK-PRODUCING node without downtime

I’m setting up a stake pool and have questions below. Any help is appreciated:

  1. Exactly how can I safely update the block-producing node, without taking it offline?

  2. Exactly which keys should be on the “block producing node”, and which should be only on the “air-gapped machine” (with cold keys)? Please be specific so I don’t have the wrong keys on a machine connected to the Internet.

  3. Exactly how can I move the keys to a completely different machine? (i.e., do I just copy them over and restart the block-producing node, or is there something else?)

Please give answers like I’m a complete newbie. I’m quite new to all this. Literally I need “cut and paste commands”. I barely know how to delete a file via command line in Ubuntu.

Hello Bob,

and welcome to our community forum! Let me help you with these questions:

  1. It depends on what you are planning to update or upgrade, as well as how big (in terms of stake and pledge) your pool is.

Small pool, when a quick fix is needed:
If downtime is needed to update to a new version or to update certificates, then I would just restart the node, as 1.19 starts rather quickly (~15 sec), so the chances of missing a block are rather slim.

The only thing you should do to avoid a long startup is to shut down the node gracefully:

  • from LiveView / TUI (Text-based User Interface): hit the Q key and wait until the process exits
  • if you are running in SimpleView:

killall -SIGINT cardano-node

  • if you are running using systemd:

sudo service cardano-node restart
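For the SimpleView case, "waiting until the process exits" can be scripted. The following is only a sketch: `stop_and_wait` is a hypothetical helper name, the default signal is SIGINT as in the `killall` command above, and the 60-second default timeout is an arbitrary assumption.

```shell
# Send a signal to a process, then wait until it has actually exited.
# Usage: stop_and_wait PID [SIGNAL] [TIMEOUT_SECONDS]
# Returns 0 once the process is gone, 1 on timeout, 2 if it cannot be signalled.
stop_and_wait() {
  local pid="$1" sig="${2:-INT}" timeout="${3:-60}" waited=0
  if ! kill -s "$sig" "$pid" 2>/dev/null; then
    echo "cannot signal $pid (not running, or not permitted)" >&2
    return 2
  fi
  # kill -0 sends no signal; it only checks that the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

It could be called as `stop_and_wait "$(pgrep -o -x cardano-node)" INT 120`, and the node restarted only after it returns 0.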

Extensive / time-consuming repairs or downtime:

Basically, you boot up a copy of your BP node and do all the necessary upgrades there. When it’s done, point your relays to this server (by replacing the IP addresses in the topology file). That’s it.

If you are using a cloud VPS like AWS/GCP, it’s easier: you can boot a copy of your node and just reassign the IP to the new VM, so you don’t need to change anything on the relays.
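Replacing the BP address in a relay's topology file can be done by hand in an editor, or with a small helper like the one below. This is a sketch under assumptions: `swap_topology_addr` is a hypothetical name, the file is assumed to use the `"addr": "..."` entries of the JSON topology format, and `sed -i` is the GNU form found on Ubuntu.

```shell
# Replace one address with another in a topology file, keeping a backup copy.
# Usage: swap_topology_addr FILE OLD_ADDR NEW_ADDR
swap_topology_addr() {
  local file="$1" old="$2" new="$3" old_re
  # Escape the dots so sed matches the old address literally.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # Keep the previous version next to the file as FILE.bak.
  cp -p "$file" "$file.bak" || return 1
  # The closing quote is part of the pattern, so 10.0.0.5 does not match 10.0.0.50.
  sed -i "s/\"addr\": *\"$old_re\"/\"addr\": \"$new\"/g" "$file"
}
```

For example, `swap_topology_addr topology.json 10.0.0.5 10.0.0.9` (both addresses are placeholders). The relay still has to be restarted before it uses the new file.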

2. Keys you need on your BP node
You need only these 3 files on that server. Never EVER put any other key on the server (even temporarily).

  • KES Signing Key (hot KES key) - usually named hot.skey
  • VRF Signing Key - usually named vrf.skey
  • Node Operation Certificate - usually named op.cert
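One way to double-check the rule above is to scan the key directory for anything that looks like a key but is not one of the three files. This is only a sketch: `audit_bp_keys` is a hypothetical helper, it assumes the "usual" file names listed above, and it assumes that key material ends in `.skey`, `.vkey`, `.cert` or `.counter`.

```shell
# List key-like files in DIR that are not one of the three expected files.
# Usage: audit_bp_keys DIR
# Prints each offending path and returns 1 if any were found, 0 otherwise.
audit_bp_keys() {
  local dir="$1" found=0 f name
  for f in "$dir"/*.skey "$dir"/*.vkey "$dir"/*.cert "$dir"/*.counter; do
    [ -e "$f" ] || continue            # an unmatched glob stays literal; skip it
    name=$(basename "$f")
    case "$name" in
      hot.skey|vrf.skey|op.cert) ;;    # expected on the block producer
      *) echo "unexpected: $f"; found=1 ;;
    esac
  done
  return "$found"
}
```

Running `audit_bp_keys /path/to/keys` on the BP should print nothing. A line such as `unexpected: .../cold.skey` means a file is on the server that should not be there.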

3. Yes, that’s the advantage of the PoS approach. You just need your 3 files and a compiled cardano-node on a server to run a pool. All you need to do is:

  • get/compile the cardano-node / cardano-cli binaries on that new server
  • copy the 3 keys/certificates listed above
  • point your relay nodes to this server
    done.
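Once the three files have been transferred to the new server, putting them in place could look like the sketch below. `install_bp_keys` is a hypothetical helper, the file names are the usual ones from point 2, and the restrictive permissions (700 on the directory, 400 on the files) are an assumption about sensible defaults, not something the node is known to require.

```shell
# Copy the three BP files from SRC to DST and make them readable by the owner only.
# Usage: install_bp_keys SRC_DIR DST_DIR
install_bp_keys() {
  local src="$1" dst="$2" f
  mkdir -p "$dst" && chmod 700 "$dst" || return 1
  for f in hot.skey vrf.skey op.cert; do
    if [ ! -f "$src/$f" ]; then
      echo "missing: $src/$f" >&2
      return 1
    fi
    # -f lets cp replace an earlier read-only copy of the same file.
    cp -f "$src/$f" "$dst/$f" && chmod 400 "$dst/$f" || return 1
  done
}
```

For example, `install_bp_keys ~/incoming ~/cardano/keys` (both paths are placeholders), followed by pointing the node's start command at the destination directory.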

Let me know if you need any more assistance,
Lauris

Thanks! Very helpful. A few other questions:

1. Would there be any problem with having two instances of the same BP node connected at the same time?

Example: Let’s say I had 4 relays, an old BP and an updated BP. After I update a relay, I connect it to the updated BP. Will the relay or any devices on the mainnet reject the updated BP, or cause any other issues?

2. Once the updated node has connected to the updated BP, will the BP be able to produce blocks?
I ask because for a brief moment, the BP may only connect to my relay. But would that one connection be enough to avoid missing out on creating a block?

I know it’s very unlikely, but I’m taking this very seriously and am investing a lot. I have the redundant connections and rolling backups all sorted, and just have some last parts.

3. What is your favorite color?

  1. The only nodes that know your BP node’s IP address are your relays… nobody will even notice (and no one cares, except some hackers :) ) if you change to another server. As long as the updated node is connected to your relays (you have to make the changes in all topology files) and all keys/certificates are loaded, you will be fine. Just turn off the old node (and remove its IP from all the topology files on the relays).

  2. Yes, as soon as your nodes are in sync you are ready to produce blocks. It is possible to miss a block, but the chance is rather small.

  3. The color of ADA :)

Do adjustments to the topology file require restart of the node or do they automatically apply when the file is saved?

They require a restart.

Thanks for all the help. One more question for now:

  1. Is there a problem having an exact copy of my block-producing node connected to the network at the same time? Would this cause some kind of conflict?

  2. If it’s not possible, what do you suggest to enable quick restoration of the BP node if the physical hardware fails?

I ask because if the physical machine fails, then the BP node will be down for maybe 10+ minutes.

Running a duplicate block producer will create adversarial forks, and will be punished by the protocol in the future. This is not recommended.

Hardware redundancy is complicated. I would recommend taking the necessary precautions (RAID, battery backup, etc.), and also keeping a cloud node on standby, running passively and networked to your relays, which can be restarted, using a separate systemd service, as a temporary block producer.

You also might want to keep a relatively updated copy of the chain / database just in case.

Thanks. So say the original block-producing node crashed, exactly what do I need to do to get the passive node running as the block-producer?

Is it just a matter of moving the 3 keys and then restarting the node? Is it that simple?

If it is, how do I know the keys have been recognized and the BP is working properly?

Yes, move the 3 keys over and set up a separate systemd service in advance that starts the node using the block producer keys.

You will also want to keep a separate topology file on the standby relay that configures it as a block producer. Starting the alternative systemd service switches to that file (use a separate environment file for the alternate systemd service which points to the alternate topology file).
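The difference between the two services comes down to the command line they start the node with. The sketch below prints both variants so they can be compared. `node_command`, `NODE_HOME`, `NODE_PORT` and every file name and path are hypothetical. The flags are the ones `cardano-node run` accepts for the topology file and for the KES key, VRF key and operational certificate.

```shell
# Print the cardano-node command line for a given role.
# Usage: node_command ROLE        (ROLE is "relay" or "producer")
# NODE_HOME, NODE_PORT and all file names below are assumptions.
node_command() {
  local role="$1"
  local home="${NODE_HOME:-/opt/cardano}"
  local port="${NODE_PORT:-3001}"
  local common="cardano-node run --config $home/config.json"
  common="$common --database-path $home/db --socket-path $home/node.socket"
  common="$common --host-addr 0.0.0.0 --port $port"
  case "$role" in
    relay)
      # Passive standby: relay topology, no keys loaded.
      echo "$common --topology $home/topology-relay.json"
      ;;
    producer)
      # Temporary block producer: alternate topology plus the 3 key files.
      echo "$common --topology $home/topology-bp.json" \
        "--shelley-kes-key $home/keys/hot.skey" \
        "--shelley-vrf-key $home/keys/vrf.skey" \
        "--shelley-operational-certificate $home/keys/op.cert"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
}
```

The regular service would start the `relay` variant and the alternate service the `producer` variant. Only one of the two should ever be running on the standby, and the producer variant only while the original BP is confirmed to be off.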

From there you would just lock down your ports so the new BP can only receive inbound connections from your relays. You could go so far as to have these firewall commands ready to go, as I prefer to avoid firewall rule automation when possible.
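In the spirit of keeping the commands ready rather than automating them, the sketch below only prints the rules. It assumes `ufw` is the firewall in use (the thread does not say). `bp_firewall_rules`, the port and the relay addresses are placeholders.

```shell
# Print (not run) ufw rules that let only the given relays reach the node port.
# Usage: bp_firewall_rules PORT RELAY_IP [RELAY_IP...]
bp_firewall_rules() {
  local port="$1" ip
  shift
  # ufw evaluates rules in order, so the allow rules must be added first.
  for ip in "$@"; do
    echo "sudo ufw allow from $ip to any port $port proto tcp"
  done
  # Everything else that targets the node port is refused.
  echo "sudo ufw deny $port/tcp"
}
```

For example, `bp_firewall_rules 3001 10.0.0.11 10.0.0.12 > bp-firewall.txt` produces a file that can be reviewed and pasted when the standby takes over. Rules for SSH access are outside the scope of this sketch and need to be in place separately.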

Your friend, FROG

Hi, have your views evolved on automating the startup of a backup block producer?
I am using noip for my BP and was thinking this could be started on the backup upon failure of the main producer, to quickly update the topology on the running relays while changing the backup config to BP.
The main issue becomes having a heartbeat on the main BP and functioning scripts to detect and act when the main BP fails or restarts.
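The detection half of such a heartbeat might be sketched as follows. `bp_is_down` and `tcp_probe` are hypothetical helpers, and the probe relies on bash's `/dev/tcp` feature and the coreutils `timeout` command. The sketch only decides whether the BP looks unreachable. It does not start anything, because a network problem between the monitor and a healthy BP looks exactly like a failure, and promoting the backup in that case would put two block producers online at once, which is what the earlier reply warns against.

```shell
# Report the BP as down only if a probe fails ATTEMPTS times in a row.
# Usage: bp_is_down ATTEMPTS DELAY_SECONDS PROBE [ARGS...]
# Returns 0 if every probe failed, 1 as soon as one probe succeeds.
bp_is_down() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 1            # probe succeeded: the BP is reachable
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"      # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 0                # every probe failed
}

# Example probe: can a TCP connection to HOST PORT be opened within 5 seconds?
tcp_probe() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

A monitor could run `bp_is_down 5 10 tcp_probe bp.example.net 3001` (hostname and port are placeholders) and alert the operator when it returns 0. Requiring several consecutive failures keeps a single dropped connection or a quick node restart from triggering a switch.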