Topology between Relays and Blockproducers in a High Availability Scenario

I would like to test a high availability scenario. I’m not sure if that’s the correct plan, but I thought I should give it a try:

Let’s imagine a scenario with one active and one hot-standby backup block producer node. (Blockproducer A and Backup-Blockproducer B).

Let’s imagine my relays automatically select the block producer IP address they connect to: either Blockproducer A or Backup-Blockproducer B.
If Blockproducer A goes down, all relays automatically connect to Backup-Blockproducer B. I haven’t tested it yet, but in theory this automatic switch-over should work using virtual IP addresses and some health-check logic.
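A minimal Python sketch of the switch-over decision, to make the idea concrete. Everything here is an assumption for illustration: the `FailoverState` class, the `next_state` function and the threshold of three failed probes are hypothetical, and the actual health probe and virtual IP move are left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    active: str = "A"              # producer the relays currently point at
    consecutive_failures: int = 0  # failed probes of the active producer in a row


def next_state(state, active_is_healthy, threshold=3):
    """Return the state after one health probe of the active producer."""
    if active_is_healthy:
        return FailoverState(state.active, 0)
    failures = state.consecutive_failures + 1
    if failures >= threshold:
        # Switch the relays to the other producer. The caller should also
        # make sure the failed producer is really stopped or fenced off.
        other = "B" if state.active == "A" else "A"
        return FailoverState(other, 0)
    return FailoverState(state.active, failures)


state = FailoverState()
for healthy in [True, False, False, False]:
    state = next_state(state, healthy)
print(state.active)  # B
```

Requiring several failed probes in a row avoids flapping between the two producers on a single lost ping.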

My question:
All relays would have outgoing connections to only Blockproducer A or Backup-Blockproducer B. They will never have simultaneous outgoing connections to both block producers.

BUT: Is it okay if Backup-Blockproducer B has outgoing connections to all relays continuously, regardless of the status of Blockproducer A? If this is okay, it would reduce the complexity of the setup, since I would only need to control the IPs the relays connect to.

My question focuses on the difference between outgoing and incoming connections. I could rephrase it like this: Is it harmful to Cardano’s ecosystem if I have a duplicate of my block producer running live on the internet which has outgoing connections to relays, while no relay has an outgoing connection to this duplicate block producer node?

Thanks so much for your time!

Cheers,
Chris


Good question. In my experience, if block producer node B does not have any incoming connections, the block it created was not validated. (I planned to ask in a separate topic what could be the reason for that behavior.)

Maybe that is because the created block is not sent to the relay if the connection was initiated by the producer…
But if that is not true, and the block is sent to the relays over the connection the producer initiated, that could cause issues, since the other block producer node, A, also created the block.

In general, having 2 active/synced block producer nodes is not a good idea. I would try this setup on the testnet first…


Thanks for your answer and your feedback!

The benefit is that when backup block producer B has permanent outgoing connections (but no incoming connections), it stays fully synced with the blockchain, so in a failover event it can react quickly and take over forging blocks. However, it must be certain that a block producer with only outgoing connections (and no incoming connections) cannot harm the Cardano ecosystem. That’s the big question here.

I know there are lots of other solutions out there, like VMware High Availability. But if HA can be achieved easily with tools like HAProxy, it would be good for the community to share this knowledge or build a guide, so that everybody who wants to can build an easy HA setup without having to invest lots of money!

Cheers,
Chris


I’m also interested in an HA solution. I’m trying to experiment with a backup block producer. I think it would work if the producer and backup producer are connected and the relays use the heartbeat to switch their topology files over to the backup producer. I do think the issue would be whether the backup producer is able to sync.

Can anyone confirm that the backup producer would sync if it has the relays in its topology and the relays DON’T have the producer in their topologies?

That could be one option. I saw another option on reddit where a relay was used to switch over to producer mode so it’s already synced. I do have issues with this method though. It probably has keys on it to produce blocks because it’s a back up producer. In relay mode, that would expose a producer to the network. I guess unless that relay was hiding behind another relay and isn’t used to communicate to other producers. Another problem that seems to pop up is KES rotation.
I found a few links that also discuss this in other places. I put the links below.

If there is more than one block producer node, each of them will create the block. The question is whether the block from the backup node will be propagated to the relays if there is no incoming relay connection. Basically, a connection should not be treated differently depending on who initiated it.

That’s exactly the question! Looks like we need to learn this by testing, or is there anybody who knows a bit more than we do? :wink: And I absolutely agree with your last sentence: basically, it should make no difference who initiated the connection.

What if you set it up so that there wasn’t an outgoing connection for your fallback node? I think this could be accomplished with two floating IPs, where the backup one is reserved but not pointed at the relay node:

If I have this right (I might not), then both Block Producer A (BP_A) and BP_B should be able to download and keep a fully up-to-date blockchain. BP_A will communicate out normally through Floating IP A (FIP_A), while no matter what, BP_B cannot reach the relay since it’s a dead end.

In the event of a failure of BP_A, BP_B could call an API that unassigns FIP_A and reassigns FIP_B to R. This would let you keep all of the topology files static while letting the topology itself change.

I’ve added a cron job on each of my BPs that can add/remove firewall rules based on a flag provided by a separate VPS that is pinging all my BP nodes and decides which one to switch my relays to. My stake is small while I’m testing (no blocks so far) but I do see that the node with incoming relay connections is processing transactions while the other BP nodes aren’t. That is expected.

The switch-over takes about a minute after the cron job kicks in. I see my relays connecting to the new BP node. My script restarts the old BP node, because firewall rules don’t disconnect already established connections.

I’m seeing one problem though and I hope someone can chime in on that. The BP node I switch my relays to starts processing transactions after some delay. Sometimes it is a few minutes, sometimes it is a few hours. I don’t know if that is going to be a problem. The cncli ping returns “ok” for that node.

How do you see that the relays are connecting to the new BP?

There are incoming connections listed in the live view. The number starts increasing soon after I unblock the firewall. I’ve got 4 relays and the number goes from 0 to 4.

And the backup BP has outgoing connections to the relays all the time, right?

That is correct. My understanding is that as long as the backup node has no incoming connections, there is no interference with the core node. That way I can keep the backup node synced and ready.

I think any script-based solution involving IP and firewall changes could have side effects on the network. Most proof-of-stake blockchain networks are designed to work with one BP, because of the complexity of having multiple, potentially duplicate blocks transmitted to the network at the same time.

I’m not 100% sure what happens in Cardano when a duplicate block is sent by a cloned BP, but I assume that one block will go through and the other might create a fork and be rejected later. Anyhow, my point is that there is a reason there is no official config for this: the node is not designed to work that way.

I’m not trying to discourage innovation, just giving feedback. I know that Solana, for example, had some sort of backup producer config that acted like an active-passive system: both nodes were in sync with the network but had different status flags, and they maintained a heartbeat between them to determine which one should be promoted to active. They ended up discontinuing it because it was buggy.

MO

If your backup BP has connections with any relay which is connected to other Cardano relays then you will produce dual blocks and cause forks.

People are now running nodes with P2P capability on mainnet and these nodes establish bi-directional links (duplex and “full duplex” connections). This means that if your block producer connects to a relay, even if the block producer initiated the connection, the relay can pull data from the block producer if the link is upgraded to a duplex link.

Consider the situation where your block producer is behind a firewall with no open incoming ports:

If your block producer, with P2P capability, connects out through such a firewall to a relay, then that relay may pull blocks from the block producer, even if the firewall does not allow any incoming connections at all. This is because the block producer established the connection, the firewall allows packets for established connections, and the relay can upgrade the connection to a duplex or full-duplex connection. So both the block producer and the relay can pull blocks from each other.
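The firewall part of this argument can be illustrated with a toy Python model. It is only a sketch of how connection tracking behaves in principle; the `StatefulFirewall` class and the host names are hypothetical, and it does not model the Cardano network protocols themselves.

```python
class StatefulFirewall:
    """Toy model: blocks new inbound connections, allows traffic on tracked ones."""

    def __init__(self):
        # (local_host, remote_host) pairs for connections opened from inside
        self.tracked = set()

    def open_outbound(self, local, remote):
        self.tracked.add((local, remote))

    def allows_inbound(self, remote, local):
        # Inbound traffic passes only if it belongs to a connection
        # that the local host opened earlier.
        return (local, remote) in self.tracked


fw = StatefulFirewall()
print(fw.allows_inbound("relay1", "bp_b"))  # False: relay dials the producer
fw.open_outbound("bp_b", "relay1")          # producer dials the relay
print(fw.allows_inbound("relay1", "bp_b"))  # True: relay talks on that link
```

Once the producer has opened the link, the firewall no longer distinguishes whether a request travelling over it comes from the relay or the producer.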

The consequence of producing dual blocks is that they can often have different transactions in them depending on what the mempool of each block producer contained at the instant the block was forged. This means that each block will get a different hash and your pool key will have signed each block. It will therefore be obvious to other pools in Cardano that your pool produced “dual blocks” which is very frowned upon because it causes every other node to have to choose one even though both are perfectly valid. The next two block producers could each choose a different block to build upon and therefore one of these block producers will get their block orphaned.
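A small Python sketch of why the two blocks end up with different hashes. This is a simplified stand-in, not the real block format: the `toy_block_hash` function, the JSON encoding and the transaction IDs are assumptions made up for illustration.

```python
import hashlib
import json


def toy_block_hash(slot, prev_hash, tx_ids):
    """Hash a simplified block; real blocks are serialized differently."""
    body = json.dumps({"slot": slot, "prev": prev_hash, "txs": tx_ids},
                      sort_keys=True)
    return hashlib.blake2b(body.encode(), digest_size=32).hexdigest()


# Same slot and same parent block, but the mempools differ slightly.
mempool_a = ["tx1", "tx2", "tx3"]
mempool_b = ["tx1", "tx3"]  # tx2 had not reached producer B yet

hash_a = toy_block_hash(1000, "abc", mempool_a)
hash_b = toy_block_hash(1000, "abc", mempool_b)
print(hash_a != hash_b)  # True
```

A single missing transaction is enough for the two producers to sign two distinct blocks for the same slot.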

And, they will blame you for their loss. Social pressure from other pools complaining about your malicious and un-Cardano behaviour may cause your delegators to move their stake to other pools.

Just imagine how pissed off a small stake pool operator can be when they finally get their chance to mint a block and someone running dual block producers causes their only block for the month (or year) to be orphaned. Well, I wouldn’t want to be on the other side of that anger.

If you want to try running dual block producers, you had better hope that you only cause one of the Binance pools to get an orphaned block.

Well, I’m not sure how it is now. But when I tried block pulling about half a year ago, it didn’t work over an outgoing duplex connection from the BP to the relay, assuming the relay does not connect to the BP and only the BP connects to the relay.

I might be wrong, but my assumption is that duplex is just a fancy name for reusing the same ports: in reality there is still a connection from the relay to the BP and one from the BP to the relay, only they now use 2 ports instead of the 4 used in non-duplex mode.

In my current production setup, I have non-P2P connections between the relay and the BPs.

My backup BP has a constant connection to the relay, but not the other way around. I do not produce dual blocks.

Again, I personally don’t enable P2P between producers and relays. I haven’t had a chance to retest this, but I would assume that if the relay does not connect to the BP, it should not be able to pull blocks, even with duplex on.


If you watch this video at the 5 min mark:

Duncan Coutts outlines how a block producer can connect through a firewall, that blocks all incoming new connections, to a relay, and that relay will be able to pull blocks from the block producer over the duplex connection between them.

It is true that if you continue to run your block producer and relay in legacy mode, then you may be able to prevent these duplex connections.


Another solution is to run your backup node as a relay so that it syncs all the time. If the need arises, it is very easy to turn it into a BP; that just requires a restart. The switch-over takes just a bit longer. Considering possible duplex connections in P2P mode, that would be a bulletproof alternative. There are just 2 more lines of code to add to my script.

The only problem with that approach is that after the restart the node might enter DB validation mode, which takes about 40 minutes to complete. I see that happening sometimes, not sure why.
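A rough Python sketch of what the relay-to-producer restart could look like in a script. The `node_command` helper and all file paths are hypothetical; the three `--shelley-*` options are the ones `cardano-node run` takes for block production credentials, but check them against your node version.

```python
def node_command(base_args, kes_key=None, vrf_key=None, op_cert=None):
    """Build the cardano-node command line for relay or producer mode."""
    cmd = ["cardano-node", "run", *base_args]
    creds = (kes_key, vrf_key, op_cert)
    if all(creds):
        # With all three credentials the node starts as a block producer.
        cmd += [
            "--shelley-kes-key", kes_key,
            "--shelley-vrf-key", vrf_key,
            "--shelley-operational-certificate", op_cert,
        ]
    elif any(creds):
        raise ValueError("provide all three credentials or none")
    return cmd


# Hypothetical paths, for illustration only.
base = ["--config", "config.json", "--topology", "topology.json", "--port", "3001"]
print(node_command(base))                                         # relay mode
print(node_command(base, "kes.skey", "vrf.skey", "node.cert"))    # producer mode
```

Keeping the base arguments identical in both modes means the switch-over is only a restart with three extra options.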

I personally use haproxy for this and it works flawlessly.

So I have several relays connecting to haproxy, and haproxy decides where to send the traffic. Fail-over is almost instant. I do not use P2P between relays and BPs.

I took that idea from here:


But your BP still connects to your relays in order to stay synced, right? In legacy mode, to eliminate duplex connections? I’m wondering: if the relay is in P2P mode and the BP is in legacy mode, is duplex still possible?


For now, duplex has to be enabled explicitly in the config file using "TestEnableDevelopmentNetworkProtocols": true

So as long as it is not enabled, your node should not use it, even in P2P mode.
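If you want to check this on your own nodes, a small Python sketch like the following could read the setting from a node config. The `dev_protocols_enabled` helper is hypothetical, and it assumes the config is plain JSON with the key at the top level.

```python
import json


def dev_protocols_enabled(config_text):
    """Return True only if the key is present and explicitly set to true."""
    config = json.loads(config_text)
    return config.get("TestEnableDevelopmentNetworkProtocols", False) is True


sample = '{"TestEnableDevelopmentNetworkProtocols": true}'
print(dev_protocols_enabled(sample))  # True
print(dev_protocols_enabled("{}"))    # False
```

For a real node you would pass in the contents of the config file that the node is started with.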
