If you click on the various blocks they have produced, you will see that they are producing many forks. 5 of 11 blocks so far this epoch have resulted in forks.
For example, here are the first, second and fifth blocks “P2P Validator #3” produced this epoch:
Note that running backup (dual) block producers causes forks because each node maintains its own mempool. Therefore when both block producers produce a block, the blocks can differ by including different transactions.
Forking like this causes other pools to lose blocks because their nodes have to pick one fork to build upon and both forks are valid. The nodes resolve the chain in the end according to “longest chain wins”. Those pools that happened to build on the other fork get their blocks orphaned.
How do we bring attention to this so that pool operators don’t run dual block producers?
Hmm, I doubt that we can reach enough of this particular pool’s delegators and get them interested in these technical details to make a real difference.
Is there a legitimate reason why a pool would produce a fork with itself?
Perhaps a CIP would be worthwhile, to build into the protocol that such forks are punished by making the pool lose the block completely?
There was a discussion about this before. It turned out that many stake pool operators hadn’t actually configured their standby backup BP as a standby. After they realized this, they switched their backup BP to run as a regular node.
The concern was that running a second active BP helps them win block battles if they encounter any.
If anyone can contact that stake pool operator just let them know that both BPs are running. Maybe they are not aware of this.
This thread on how to set up back-up BP could be helpful:
The problem is that the following pool operator who receives one of the blocks doesn’t know this until they receive the other block. Depending on slot timing, they may have already produced their block on top.
So very annoying and incompetent running of a stake pool. They do not deserve their delegators.
I guess the only consolation is that “input endorsers” will fix the problem of other pool operators getting orphaned blocks from this incompetence.
It will be resolved soon after the full migration to new infrastructure. I was assigned to pool configuration and migration, and I needed to migrate to other infrastructure without downtime.
I have another question: why does the network accept blocks from unregistered relays? For now, there is only one block producer connected to the registered relays; the other one will be shut down after a while, but it is connected to different relays that are not registered.
Also, there is no good example of how to run a backup node.
When you run the backup BP as a relay, you must restart the node with the keys once you find that something is wrong with your active node.
It can take up to 2 hours for the BP to start working,
so you can lose several slots during this time while your node is just syncing and checking chunks.
Cardano was designed to be able to run as many nodes as possible, including several nodes with the same key, to improve your overall uptime.
I also noticed that the Cardano protocol handles forks very well and that forks are not a problem even if I run 100 nodes, so I don’t know why people are so afraid of them.
There is no other way to improve your node uptime than running several nodes.
When you produce several slots per hour, you really cannot update a node without losing slots.
So the only way to update a node is to run a second BP, wait until it is synced, then update the first one, and so on.
If your “not registered relay” is in the topology of any other node on the chain (including your registered ones), the block will be fetched from it and so it goes onto the chain. There are many “not-registered” relays on the chain.
Maybe your node has to migrate to a new db format; a node start typically takes between 5 and 10 minutes nowadays. If yours is taking way longer, you may not be shutting it down correctly via a SIGINT signal.
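For reference, a clean-shutdown sketch along those lines (the unit name and the use of `pidof` are assumptions; cardano-node systemd units commonly set `KillSignal=SIGINT`, but check your own unit file):

```shell
# Graceful shutdown so the node writes a clean state and the next
# start avoids a long re-validation. All names are illustrative.

# Under systemd (assuming the unit sets KillSignal=SIGINT):
sudo systemctl stop cardano-node

# For a manually started node, send SIGINT directly:
kill -INT "$(pidof cardano-node)"
```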
Oh, you’re wrong here: running more than one active BP node with the same key set will generate forks very often. And all the nodes that fetch blocks from your relays have problems with that, because they land on a forked piece of the chain. We have seen many block losses in the past because people don’t care and fork like champs. So please stop it!
There is, do it like the Pros are doing it. Monitor your main BP-Node via a monitoring tool and firewall off your backup BP-Node. After your monitoring tool has reported a failure on your main BP-Node for a set time, drop the firewall on your backup BP-Node and it will be online with no downtime. Continue to monitor your main BP-Node. As soon as it responds correctly again, you can do an automatic or manual fallback. There is absolutely no need to run two or more BP-Nodes active side by side for more than a minute or so. It is most unlikely that a block falls into that crossover window.
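A minimal sketch of that monitor-then-drop-the-firewall idea. Everything here is an assumption for illustration: the address, ports, threshold, the `nc`-based health check, and the iptables rule; a real setup would use proper monitoring and your own firewall tooling.

```shell
#!/bin/sh
# Hypothetical failover helper -- names, addresses, and ports are
# illustrative. Run monitor_step periodically (cron, loop, timer).

MAIN_BP="10.0.0.10"   # assumed address of the main BP
MAIN_PORT=4001        # assumed node port on the main BP
BACKUP_PORT=4001      # port the backup BP listens on
THRESHOLD=3           # consecutive failed checks before failover

check_main() {
    # Health check: does the main BP still accept TCP connections?
    nc -z -w 2 "$MAIN_BP" "$MAIN_PORT"
}

promote_backup() {
    # Drop the firewall rule blocking inbound connections, so relays
    # can reach the backup BP and fetch its blocks.
    iptables -D INPUT -p tcp --dport "$BACKUP_PORT" -j DROP
}

monitor_step() {
    # One monitoring pass: prints the updated consecutive-failure
    # count and promotes the backup once THRESHOLD is reached.
    fails=$1
    if check_main; then
        echo 0
    else
        fails=$((fails + 1))
        [ "$fails" -ge "$THRESHOLD" ] && promote_backup
        echo "$fails"
    fi
}
```

The fallback direction (re-inserting the DROP rule once the main BP responds again) would be the mirror image of `promote_backup`.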
The reason why other stake pools try to discourage this is that it may cause them to lose their produced blocks. If one of your forks gets propagated as a tip to another BP, they may produce many blocks and have them all orphaned when their fork gets ghosted.
Check out previous debates about this issue:
So it’s not anti-network behavior as much as it is anti-peer behavior. This is why community will usually try to reach out to stake pool operators that are causing this.
Check out examples of notifications/ debates on Twitter about this issue:
If you’d like to learn more about why dual BPs are bad, here is a very good YouTube video by Adam Westberg on dual leaders. He can explain it way better than I can.
Okay, I will test it. The current block producer runs in Docker, and a single Docker restart results in the node syncing for 50 minutes or more.
I care about forks and about the Cardano network as much as you do, but I also care about uptime. BUT this behavior is allowed by the protocol.
It is a blockchain; if you want to make people stop running several nodes, then make it unprofitable for those who run them, isn’t it? (By the way, I agree with the community and your fear of forks, so soon after the migration all block producers but one will be shut down.)
I just don’t understand how you can keep your BP synced if it cannot communicate with the network.
The BP connects to relays via TCP, so you cannot block only incoming or only outgoing traffic; once a connection is established, the nodes are allowed to exchange information in both directions.
Please share some detailed information from these Pros on how to implement this firewall-based standby/active backup-BP solution.
Incoming and outgoing connections are separated for now (this will change with p2p, so the failover strategy implementation might change then).
Say you run your failover on e.g. port 6000. If you block incoming connections to port 6000 on your failover, none of the blocks forged there will propagate. But your failover can still connect to your relays (on also e.g. port 6000), the port used on the failover side will be another port (assigned by the network software). So those connections aren’t blocked and you can still sync your failover.
You will have incoming connections from your relays to your BP; these will have the port of your BP as destination port and a random port number as source port.
You will have outgoing connections from your BP to your relays; these will have a random port number as source port and the relay’s port as destination port.
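You can see both directions with `ss` on the failover node (port 6000 assumed, matching the example above; requires iproute2 on Linux):

```shell
# Incoming connections (the ones the standby firewall blocks):
# local (source) port is the node's listening port, 6000.
ss -tn state established '( sport = :6000 )'

# Outgoing connections to your relays (still allowed in standby):
# remote (destination) port is 6000, local port is random.
ss -tn state established '( dport = :6000 )'
```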
With the current implementation you have to do the following:
Point your BP-Node to your Relay-Node by adding the relay’s ip:port to the topology of the BP-Node. This connection stays active all the time; the BP-Node stays on tip via this connection.
Point your Relay-Node to your BP-Node by adding the BP’s ip:port to the topology of the Relay-Node. Let’s say your BP-Node is listening on port 4001. Now you set up a firewall rule that disallows incoming connections to port 4001. You will see that your Relay-Node tries to connect to your BP-Node but can’t establish a connection. Your BP-Node is now in “hot-standby” mode. As soon as you remove/disable that firewall rule and allow incoming connections to port 4001 on your BP-Node, the Relay-Node will connect within a few seconds, and your backup BP-Node is live on the chain and can distribute its blocks. With the firewall active, the BP-Node just stays on tip and always synced. It will generate blocks, but no one will listen to them, so they don’t land on-chain.
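With iptables, the standby toggle described above might look like this (port 4001 and the use of a plain DROP rule are assumptions; adapt to your own firewall):

```shell
# Hold the backup BP in hot standby: relays cannot connect inbound,
# but the backup's own outbound connections to the relays still work,
# so it stays on tip.
iptables -I INPUT -p tcp --dport 4001 -j DROP

# Failover: delete the rule so relays can connect and the backup's
# blocks propagate to the chain.
iptables -D INPUT -p tcp --dport 4001 -j DROP
```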
With future nodes that also support P2P bidirectional connections, there will be other methods to enable/disable block production on a node. Most likely via reloading config files on the fly, triggered by signals, without the need to restart the node.
As this cannot be handled on the protocol level right now (there are discussions about penalties for such behavior), the past has shown that handling it on the social-media level works pretty well. SPOs are called out very quickly on the socials; we had those situations in the past.
@ATADA @brouwerQ @7.4d4 @Neo_Spank @HeptaSean @Triton-pool
Thank you for the help, and sorry for causing some trouble.
It was done on purpose, as I knew that it is allowed in Cardano to run several block producers to increase pool uptime and not lose slots. I decided, during the migration to new infrastructure, to keep the old pools running until the migration of all pools was completed.
I was also told that the network doesn’t accept blocks from unregistered relays, but that seems wrong, as the registered relays are now connected to only one BP.
Running two active block producers isn’t best practice.
Who told you that, or where did you find that info? And what would be the purpose of a non-registered relay if blocks weren’t accepted? Also, people or organizations can run relay nodes without running a pool, just to support the network. E.g., IOG now runs some relays to support the network. All Daedalus instances only connect to those relays at this moment, by the way (unless you change its config to explicitly point to another node).
We are all learning. Especially me. Sometimes there are things that can be done better another way. I appreciate @vitaly-p2p rectifying things quickly. I also respect him/her for creating an account on this forum and responding. Thanks.
I guess the main point is that causing chain forks is a type of sybil attack. Other staking protocols implement slashing but Cardano does not. Instead Cardano relies on its community shifting stake around in order to mitigate such sybil attacks.
Large pool operators get the privilege to make lots of blocks. Small pool operators absolutely cherish every block they get and it hurts if they get an orphan. They will analyse the reasons if they get an orphaned block.
Nevertheless, Cardano’s non-custodial staking with no slashing is a winning feature because it unlocks many other advantages.