Updating BLOCK-PRODUCING node without downtime

Mr_Anderson · 26 August 2020 04:53

I’m setting up a stake pool and have questions below. Any help is appreciated:

1. Exactly how can I safely update the block-producing node without taking it offline?
2. Exactly which keys should be on the “block producing node”, and which should be only on the “air-gapped machine” (with cold keys)? Please be specific so I don’t have the wrong keys on a machine connected to the Internet.
3. Exactly how can I move the keys to a completely different machine? (i.e. do I just copy them over and restart the block-producing node, or is there something else?)

Please give answers like I’m a complete newbie. I’m quite new to all this. Literally I need “cut and paste commands”. I barely know how to delete a file via command line in Ubuntu.

lauris · 26 August 2020 07:22

Hello Bob,

and welcome to our community forum! Let me help you with these questions.

1. It depends on what you are planning to update or upgrade, as well as how big (in terms of stake and pledge) your pool is.

Small pool, when quick fix is needed:
If downtime is needed to install a new version or update certificates, I would just restart the node: 1.19 starts rather quickly (~15 sec), so the chance of missing a block is rather slim.

The only thing you should do to avoid a long startup is to shut down the node gracefully:

from LiveView / TUI (Text-based User Interface), hit the Q key and wait until the process exits
if you are running in SimpleView:

killall -SIGINT cardano-node

if you are running under systemd:

sudo service cardano-node restart
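For anyone scripting the SIGINT route, here is a minimal sketch of "send the signal, then wait until the process is really gone". The helper names and the 60-second default are illustrative assumptions, not part of cardano-node.

```shell
#!/bin/sh
# Sketch only: helper names and the default timeout are assumptions.

# wait_for_exit PID [TIMEOUT_SECONDS]
# Polls once per second. Returns 0 when the process is gone, 1 on timeout.
wait_for_exit() (
    pid=$1
    remaining=${2:-60}
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && exit 1
        sleep 1
        remaining=$((remaining - 1))
    done
    exit 0
)

# stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGINT (the same signal as `killall -SIGINT`) and waits for exit.
# A failed kill is treated as "already gone", so run this as the node's
# user or as root to avoid mistaking a permissions error for that case.
stop_gracefully() (
    pid=$1
    kill -s INT "$pid" 2>/dev/null || exit 0
    wait_for_exit "$pid" "${2:-60}"
)
```

A call could look like `stop_gracefully "$(pgrep -o -x cardano-node)" 120`. If it returns non-zero, the node is still shutting down and it is safer to keep waiting than to kill it harder.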

Extensive / time consuming repairs/downtime:

Basically, you boot up a copy of your BP node and do all the necessary upgrades there. When it’s done, point your relays to this server (by replacing the IP addresses in the topology file). That’s it.
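A rough sketch of the topology edit on a relay, assuming the JSON topology layout where each entry under "Producers" has an "addr" field. The function name and the example addresses are made up for illustration. The relay still has to be restarted afterwards for the change to take effect.

```shell
#!/bin/sh
# Sketch only: assumes entries of the form  "addr": "<ip>"  in the file.

# swap_bp_addr FILE OLD_IP NEW_IP
# Keeps a backup as FILE.bak, then rewrites FILE with the new address.
swap_bp_addr() (
    file=$1 old=$2 new=$3
    # Escape the dots so that "10.0.0.5" cannot match e.g. "10.0.0x5".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    cp "$file" "$file.bak" || exit 1
    # The closing quote in the pattern keeps 10.0.0.5 from matching 10.0.0.50.
    sed "s/\"addr\": *\"$old_re\"/\"addr\": \"$new\"/" "$file.bak" > "$file"
)
```

For example, `swap_bp_addr topology.json 10.0.0.5 10.0.0.9` on each relay, followed by a graceful restart of that relay.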

If you are using a cloud VPS like AWS/GCP, it’s easier: you can boot a copy of your node and just reassign the IP to the new VM, so you don’t need to change anything on the relays.

2. Keys you need on your BP node
You need only these 3 files on that server. Never EVER put any other key on the server (even temporarily).

KES Signing Key (hot KES Key) - usually named hot.skey
VRF Signing Key - usually named vrf.skey
Node Operation Certificate - usually named op.cert
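A small sanity check along these lines, assuming the three files use the usual names above and sit in one directory. It is only a sketch: it looks at *.skey files in that one directory, so it will not notice keys stored elsewhere or under other extensions.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical, file names are the
# "usual" ones from the list above.

# check_bp_keys DIR
# Prints one line per problem. Returns 0 only if nothing was reported.
check_bp_keys() (
    dir=$1
    status=0
    for required in hot.skey vrf.skey op.cert; do
        if [ ! -f "$dir/$required" ]; then
            echo "missing: $required"
            status=1
        fi
    done
    for key in "$dir"/*.skey; do
        [ -e "$key" ] || continue   # the glob matched nothing
        case "$(basename "$key")" in
            hot.skey|vrf.skey) ;;   # the two signing keys that belong here
            *) echo "unexpected signing key: $(basename "$key")"; status=1 ;;
        esac
    done
    exit "$status"
)
```

Running `check_bp_keys /path/to/keys` before starting the node gives a quick warning if a cold key was copied over by mistake.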

3. Yes, that’s the advantage of the PoS approach. You just need your 3 files and a compiled cardano-node on a server to run a pool. All you need to do is:

get/compile the cardano-node / cardano-cli binaries on that new server
copy the 3 key/certificate files listed above
point your relay nodes to this server
done.

Let me know if you need any more assistance,
Lauris

Mr_Anderson · 27 August 2020 00:01

Thanks! Very helpful. A few other questions:

1. Would there be any problem by having the same BP node connected at the same time?

Example: Let’s say I had 4 relays, an old BP and an updated BP. After I update a relay, I connect it to the updated BP. Will the relay or any devices in the mainnet reject the updated BP, or cause any other issues?

2. Once the updated node has connected to the updated BP, will the BP be able to produce blocks?
I ask because for a brief moment, the BP may only connect to my relay. But would that one connection be enough to avoid missing out on creating a block?

I know it’s very unlikely, but I’m taking this very seriously and am investing a lot. I have the redundant connections and rolling backups all sorted, and just have some last parts.

3. What is your favorite color?

lauris · 27 August 2020 06:27

1. The only nodes that know your BP node’s IP address are your relays… nobody will even notice (and no one cares, except some hackers) if you change to another server. As long as the updated node is connected to your relays (you have to make the changes in all topology files) and all keys/certificates are loaded, you will be fine. Just turn off the old node (and remove its IP from all the topology files on the relays).
2. Yes, as soon as your nodes are in sync you are ready to produce blocks. It is possible to miss a block, but the chance is rather small.
3. The color of ADA

TheAndy · 27 August 2020 19:58

Do adjustments to the topology file require restart of the node or do they automatically apply when the file is saved?

lauris · 28 August 2020 06:52

they require a restart

Mr_Anderson · 30 August 2020 00:17

Thanks for all the help. One more question for now:

Is there a problem having an exact copy of my block-producing node connected to the network at the same time? Would this cause some kind of conflict?
If it’s not possible, what do you suggest to enable quick restoration of the BP node if the physical hardware fails?

I ask because if the physical machine fails, then the BP node will be down for maybe 10+ minutes

ADAfrog · 31 August 2020 10:55

Running a duplicate block producer will create adversarial forks, and will be punished by the protocol in the future. This is not recommended.

Hardware redundancy is complicated. I would take the necessary precautions (RAID, battery backup, etc.), but would also recommend a cloud node on standby, running passively and networked to your relays, which can be restarted as a temporary block producer using a separate systemd service.

You also might want to keep a relatively updated copy of the chain / database just in case.

Mr_Anderson · 1 September 2020 23:18

Thanks. So say the original block-producing node crashed, exactly what do I need to do to get the passive node running as the block-producer?

Is it just move the 3 keys, then restart the node? Is it that simple?

If it is, how do I know the keys have been recognized and the BP is working properly?

ADAfrog · 2 September 2020 00:57

Yes, move the 3 keys over and set up a separate systemd service in advance that accounts for starting the node using the block producer keys.

You will also want to keep a separate topology file on the relay that configures it as a block producer. The alternative systemd service switches the two files out when you start it (use a separate environment file for the alternate service that points to the alternate topology file).
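To make the difference between the two services concrete: the passive node and the block producer run the same binary, and the producer variant additionally passes the three key/certificate flags and its own topology file. A sketch that prints both command lines follows. Every path, the port, and the topology file names are hypothetical placeholders, so adjust them to your own layout.

```shell
#!/bin/sh
# Sketch only: /opt/cardano, port 3001 and the file names are placeholders.

# node_command MODE
# MODE is "relay" (passive standby) or "bp" (block producer).
# Prints the command line; it does not start anything.
node_command() (
    mode=$1
    base=/opt/cardano
    cmd="cardano-node run --config $base/config.json"
    cmd="$cmd --topology $base/topology-$mode.json"
    cmd="$cmd --database-path $base/db --socket-path $base/node.socket"
    cmd="$cmd --host-addr 0.0.0.0 --port 3001"
    if [ "$mode" = bp ]; then
        # Only the block producer gets the hot keys and the certificate.
        cmd="$cmd --shelley-kes-key $base/keys/hot.skey"
        cmd="$cmd --shelley-vrf-key $base/keys/vrf.skey"
        cmd="$cmd --shelley-operational-certificate $base/keys/op.cert"
    fi
    printf '%s\n' "$cmd"
)
```

`node_command relay` and `node_command bp` show what each of the two systemd services would end up executing. Only one block producer should ever be running with these keys at a time.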

From there you would just lock down your ports so the new bp can only receive inbound connections from your relays - you could go so far as to have these firewall commands ready to go, as I prefer to avoid firewall rule automation when possible
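One way to keep such commands "ready to go" without automating them is to print them for review. The sketch below assumes ufw is the firewall in use (common on Ubuntu, but an assumption here), and the port and relay addresses are examples only.

```shell
#!/bin/sh
# Sketch only: prints ufw commands for review, it does not run them.
# NOTE: "default deny incoming" also blocks SSH unless a rule allows it,
# so add your own SSH rule before enabling the firewall.

# print_bp_firewall_rules PORT RELAY_IP...
print_bp_firewall_rules() (
    port=$1
    shift
    echo "sudo ufw default deny incoming"
    for relay in "$@"; do
        echo "sudo ufw allow proto tcp from $relay to any port $port"
    done
    echo "sudo ufw enable"
)
```

For example, `print_bp_firewall_rules 3001 10.0.0.11 10.0.0.12` prints four lines that can be checked by eye and pasted in when the standby takes over.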

Your friend, FROG

Lionel · 1 June 2021 15:40

Hi, Have your views evolved on automating the startup of a backup block producer?
I am using noip for my BP, was thinking this could be started on the backup upon failure of the main Producer to quickly update the topology on the running relays while changing the backup config to BP.
The main issue becomes having a heartbeat on the main BP and functioning scripts to detect and act when the main BP fails/restarts.
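A very rough sketch of the detection half of such a heartbeat, assuming bash (with /dev/tcp support) and the coreutils `timeout` command are available on the machine doing the check. The function names are hypothetical. A TCP connect only shows that the port answers, not that the node is in sync, and since two producers running at once create forks (see above), a false "down" is the dangerous case. That is why this only reports down after several failed attempts in a row and does not start anything by itself.

```shell
#!/bin/sh
# Sketch only: detection, not failover. Names and defaults are assumptions.

# bp_reachable HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened in time.
bp_reachable() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# bp_looks_down HOST PORT ATTEMPTS [PAUSE_SECONDS]
# Returns 0 ("looks down") only if every attempt fails, 1 otherwise.
bp_looks_down() (
    host=$1 port=$2 attempts=$3
    while [ "$attempts" -gt 0 ]; do
        bp_reachable "$host" "$port" 2 && exit 1
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "${4:-5}"
    done
    exit 0
)
```

Something like `bp_looks_down bp.example.org 3001 5 10 && echo "main BP unreachable"` could feed an alert. Whether to let it trigger the switch automatically is the judgment call discussed earlier in the thread.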
