Topology between Relays and Block Producers in a High-Availability Scenario

I would like to test a high-availability scenario. I'm not sure it is the right approach, but I thought I'd give it a try:

Imagine a setup with one active block producer node and one hot-standby backup (Blockproducer A and Backup-Blockproducer B).

Now suppose my relays automatically decide which block producer IP address they connect to: either Blockproducer A or Backup-Blockproducer B.
If Blockproducer A goes down, all relays automatically connect to Backup-Blockproducer B instead. I haven't tested this yet, but in theory the switch-over should work using virtual IP addresses and some health-check logic.
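The switch-over logic described above could look roughly like the following Python sketch. It is only an illustration under several assumptions: each producer listens on a TCP port (3001 is just an example), a successful TCP connect is treated as "alive" (a crude proxy, since an open port does not prove the node is synced or forging), and the names `Producer`, `is_reachable` and `FailoverSelector` are hypothetical. To avoid flapping between producers, it only fails over after several consecutive failed checks.

```python
import socket
from dataclasses import dataclass


@dataclass
class Producer:
    name: str
    host: str
    port: int


def is_reachable(p: Producer, timeout: float = 2.0) -> bool:
    """Crude liveness probe: can we open a TCP connection to the node's port?"""
    try:
        with socket.create_connection((p.host, p.port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverSelector:
    """Decides which producer the relays should point at (hypothetical helper).

    Switches from primary to backup only after `threshold` consecutive failed
    checks of the primary. The switch is sticky: failing back to the primary
    is left as a manual step.
    """

    def __init__(self, primary: Producer, backup: Producer, threshold: int = 3):
        self.primary = primary
        self.backup = backup
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def update(self, primary_ok: bool) -> Producer:
        if self.failed_over:
            return self.backup  # stay on the backup until reset manually
        # Count consecutive failures; any successful check resets the counter.
        self.failures = 0 if primary_ok else self.failures + 1
        if self.failures >= self.threshold:
            self.failed_over = True
        return self.backup if self.failed_over else self.primary
```

A relay-side script would call something like `selector.update(is_reachable(a))` every few seconds and repoint the relay whenever the returned producer changes. Automatic fail-back is deliberately left out, since bouncing between A and B is exactly the situation that could leave two producers active at once.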

My question:
All relays would have outgoing connections only to Blockproducer A or to Backup-Blockproducer B. They would never have simultaneous outgoing connections to both block producers.

BUT: Is it okay for Backup-Blockproducer B to keep outgoing connections to all relays continuously, regardless of the status of Blockproducer A? If so, this would reduce the complexity of the setup, since I would only need to control which IP the relays connect to.

My question focuses on the difference between outgoing and incoming connections. I could rephrase it like this: Is it harmful to the Cardano ecosystem to have a duplicate of my block producer running live on the internet that has outgoing connections to relays, but to which no relay has an outgoing connection?

Thanks so much for your time!

Cheers,
Chris

Good question. In my experience, if the block producer node (B) has no incoming connections, the blocks it creates are not validated. (I planned to ask in a separate topic what could cause that behavior.)

Maybe that's because a created block is not sent to the relay when the connection was initiated by the producer.
But if that is not true, and the block is sent to the relays over the connection the producer initiated, it could cause issues, since the other block producer node (A) also created a block for that slot.

In general, having two active, synced block producer nodes is not a good idea. I would try this setup on the testnet first.

Thanks for your answer and your feedback!

The benefit is that if Backup-Blockproducer B has permanent outgoing connections (but no incoming ones), it stays fully synced with the blockchain, so in a failover event it can quickly take over forging blocks. However, we need to be sure that a block producer with only outgoing connections (and no incoming ones) cannot harm the Cardano ecosystem. That's the big question here.

I know there are plenty of other solutions out there, such as VMware High Availability. But if HA can easily be achieved with tools like HAProxy, it would be good to share this knowledge with the community or write a guide, so that anyone who wants to can build a simple HA setup without investing lots of money!

Cheers,
Chris

I'm also interested in an HA solution and am experimenting with a backup block producer. I think it would work if the producer and backup producer are connected to each other and the relays use a heartbeat to switch their topology files over to the backup producer. I do think the issue would be whether the backup producer is able to stay in sync.
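Switching topology files on the relays could be sketched like this in Python. It assumes the legacy (non-P2P) `topology.json` layout with a `"Producers"` list of `addr`/`port`/`valency` entries; the helper names, addresses and the choice of `valency: 1` are illustrative assumptions, not a recommended configuration. The file is written to a temporary file and renamed into place so the node never sees a half-written topology. Depending on the node version, the relay may also need a restart to pick up the new file.

```python
import json
import os
import tempfile


def relay_topology(active_bp: tuple[str, int], other_peers: list[tuple[str, int]]) -> dict:
    """Build a legacy-style topology: the currently active block producer plus other peers."""
    peers = [active_bp] + other_peers
    return {
        "Producers": [
            {"addr": addr, "port": port, "valency": 1} for addr, port in peers
        ]
    }


def write_atomically(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything went wrong
        raise
```

On a heartbeat failure the relay would call, for example, `write_atomically("topology.json", relay_topology(("10.0.0.11", 3001), other_peers))` with the backup producer's (hypothetical) address.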

Can anyone confirm that the backup producer would sync if it has the relays in its topology but the relays DON'T have the producer in their topologies?

That could be one option. I saw another option on Reddit where a relay is switched over to producer mode, so it is already synced. I do have concerns about this method, though. Because it is a backup producer, it probably holds the keys needed to produce blocks, and in relay mode that would expose a producer to the network, unless that relay is hidden behind another relay and isn't used to communicate with other producers. Another problem that seems to come up is KES rotation.
I found a few links that also discuss this in other places. I put the links below.

If there are multiple block producer nodes, each of them will create the block. The question is whether the block from the backup node will be propagated to the relays when no relay has an incoming connection to it. In principle, a connection should not be treated differently depending on which side initiated it.

That's exactly the question! Looks like we'll need to find out by testing, or is there anybody who knows a bit more than we do? :wink: I also absolutely agree with your last sentence: in principle, it should make no difference who initiated the connection.

What if you set it up so that there wasn't an outgoing connection for your fallback node? I think this could be done with two floating IPs, where the backup one is reserved but not pointed at the relay node:

If I have this right (I might not), then both Block Producer A (BP_A) and BP_B should be able to download and keep a fully up-to-date blockchain. BP_A communicates out normally through Floating IP A (FIP_A), while BP_B can never reach the relay, since its floating IP is a dead end.

If BP_A fails, BP_B could call an API that unassigns FIP_A and assigns FIP_B to the relay (R). This lets you keep all topology files static while the topology itself changes.
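The reassignment step could be sketched in Python as below. Every cloud provider exposes floating IPs through its own SDK, so `FloatingIpApi` with its `unassign`/`assign` methods is a hypothetical interface, not a real library; `RecordingApi` is a stand-in that just records calls. The one design point the sketch tries to capture is ordering: FIP_A is detached before FIP_B is attached, so the relay is never reachable from both producers at the same moment.

```python
from typing import Protocol


class FloatingIpApi(Protocol):
    """Hypothetical provider client; real providers each have their own SDK."""

    def unassign(self, ip: str) -> None: ...
    def assign(self, ip: str, server: str) -> None: ...


def fail_over(api: FloatingIpApi, fip_a: str, fip_b: str, relay: str) -> None:
    # Detach BP_A's path to the relay first...
    api.unassign(fip_a)
    # ...then point BP_B's reserved floating IP at the relay.
    api.assign(fip_b, relay)


class RecordingApi:
    """Stand-in for illustration: records calls and tracks IP-to-server assignments."""

    def __init__(self, assignments: dict[str, str]):
        self.assignments = dict(assignments)
        self.calls: list[tuple] = []

    def unassign(self, ip: str) -> None:
        self.calls.append(("unassign", ip))
        self.assignments.pop(ip, None)

    def assign(self, ip: str, server: str) -> None:
        self.calls.append(("assign", ip, server))
        self.assignments[ip] = server
```

Starting from `RecordingApi({"FIP_A": "R"})`, calling `fail_over(api, "FIP_A", "FIP_B", "R")` leaves only FIP_B pointing at R, and the topology files on all nodes stay untouched.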