HA Stake Pool Cluster

From what I have read about pool operations and how-tos from other operators, most stake pools seem to use a topology of one block-producing node and multiple relay nodes (shown here in the official docs). Some tutorials mention having a failover block-producing node to prevent forks, but those all seem to be from the Jormungandr days.

What I’m wondering is, how would someone go about achieving high availability with cardano-node? We mostly want to check whether any node is unhealthy, either by being in a forked or other failed state, and then remove that node if it is a relay, or fail over to another block-producing node if the block-producing node is the one at fault. Can a cluster have multiple block-producing nodes that are load balanced? If that isn’t possible, can we fail over to a new one (and how would that be done with cardano-node)? Can relays be added and removed dynamically?

I realize that the relays could just be added or removed via DNS, so I’m not really worried there. I’m just wondering how in the heck I can add some redundancy to the block-producing node. Thanks for any help!

Hi Jamison,

You need to be careful with adding redundancy to your block-producing nodes. If multiple BP nodes are running simultaneously, they will introduce adversarial forks to the chain when they mint - a behavior that will be punished in the near future. So it makes sense to have a node on standby that can be spun up as a producer - but caution should be taken to ensure you never have duplicate block-producing nodes running concurrently.
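To make the "never two producers at once" rule concrete, here is a minimal sketch of the decision logic only, in Python. Everything in it (`ClusterState`, `Action`, `next_action`) is a hypothetical name made up for illustration; it does not call cardano-node, and how you actually stop the primary or start the standby with the pool keys is left to your own tooling. The one idea it shows is ordering: the standby is promoted only after the primary is confirmed stopped.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    FENCE_PRIMARY = "fence_primary"      # stop the primary and confirm it is down
    PROMOTE_STANDBY = "promote_standby"  # start the standby as the producer


@dataclass
class ClusterState:
    # All three flags are assumptions supplied by your own monitoring.
    primary_healthy: bool     # result of whatever health check you trust
    primary_fenced: bool      # True only once the primary is confirmed stopped
    standby_producing: bool   # True if the standby already runs as producer


def next_action(state: ClusterState) -> Action:
    """Return the single next step, never promoting before fencing."""
    if state.primary_fenced:
        # Primary is confirmed down, so a second producer cannot exist.
        if state.standby_producing:
            return Action.NONE
        return Action.PROMOTE_STANDBY
    if state.primary_healthy and not state.standby_producing:
        # Normal operation: one healthy producer, standby idle.
        return Action.NONE
    # Primary is unhealthy, or both nodes might be producing: fence first.
    return Action.FENCE_PRIMARY
```

For example, `next_action(ClusterState(primary_healthy=False, primary_fenced=False, standby_producing=False))` returns `Action.FENCE_PRIMARY`, not `Action.PROMOTE_STANDBY`; promotion only comes back once `primary_fenced` is true.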

The good news is 1.19 has remarkable performance improvements (nodes are coming up in 15-25s), so that will change strategies for many.

The characteristics you monitor to assess node health will be up to you, and many are enabling automation based on particular factors while integrating custom alert triggering for other factors.
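As one possible example of such a factor, the sketch below flags nodes whose chain tip lags the rest of the cluster or that report nothing at all. It is only an illustration: `unhealthy_nodes`, the `tips` mapping, and the `max_lag` threshold are hypothetical, and it assumes you already collect each node's latest block number somehow. Note that it detects lag, not forks; catching a fork would also need a comparison of block hashes.

```python
from statistics import median
from typing import Dict, List, Optional


def unhealthy_nodes(tips: Dict[str, Optional[int]], max_lag: int = 5) -> List[str]:
    """Return names of nodes that are unreachable or lag the cluster tip.

    tips maps a node name to its latest block number, or None if the
    node did not answer. max_lag is an arbitrary illustrative threshold.
    """
    reported = [tip for tip in tips.values() if tip is not None]
    if not reported:
        # Nothing answered, so every node is suspect.
        return sorted(tips)
    # Use the median as the reference so one bad node cannot skew it much.
    reference = median(reported)
    return sorted(
        name
        for name, tip in tips.items()
        if tip is None or reference - tip > max_lag
    )
```

With `tips = {"relay1": 1000, "relay2": 1001, "relay3": 980, "bp": None}` this returns `["bp", "relay3"]`: the producer did not answer and relay3 is 20 blocks behind the median.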

Your friend, FROG

Thank you FROG.

Seems that it would be a “Shoot The Other Node In The Head” type problem. Do you have any information on the website or a whitepaper that talks about the adversarial fork feature?

Thanks!

I don’t offhand - but I’ve seen respected operators discuss in groups that it’s coming soon.

I know we had a hell of a time with adversarial forking during ITN with one particular party - so much so that pool operators banded together, identified the adversarial IP, and collectively firewalled it from our nodes. That effectively tanked the adversarial forker’s ability to land blocks on the chain, and he stopped immediately as a result. Adversarial forking is not something many operators take lightly.