Today my pool #ORPH lost a slot battle. Given it is approaching 200 produced blocks, this was bound to happen eventually, so kudos to the other pool for being luckier.
However, this raises an interesting question. While it is easy to figure out when you are on the losing end (lost slot battle: somebody else produced a block with the exact same slot number; lost height battle: somebody else produced a block with a higher but close slot number), is it possible to figure out when your pool is winning a slot battle or when your pool is causing somebody else to lose a height battle?
Is there a way to extract relevant information from the cardano-node logs (e.g. network fork events)?
Even if that were possible, according to my understanding, cardano-node only keeps track of network forks observed by individual relays/producers, so only part of the network will ever learn of the lost blocks. Is this correct?
If yes, would it make sense to send such observed network fork events to a centralized service (e.g. similar to how tips are sent to pooltool.io to calculate propagation delays) to create a searchable database? It seems to me we are lacking such a feature that could be used to monitor the health of the network and optimize the topology (in case of height battles). Any thoughts about this?
Thanks, I never realized pooltool.io keeps track of slot and height battles. While there is no convenient way to query the database for my question specifically (when does my pool win slot/height battles), I guess the authors can easily extend the web interface to allow that, or at least one can write a script to check all produced blocks individually.
On the other hand, would you mind explaining how you can “see them in the logs”? Let’s say producer A minted a winning block and producer B minted an orphaned block. Is it possible for producer A to learn it is a winner without relying on a centralized service like pooltool.io? Will the orphaned block be propagated through the whole network (in which case the answer is yes)? Or will it be propagated only by those relays that found out about the orphaned block sooner than the winning block, while the other relays quietly discard it (in which case the answer is no)?
Orphaned blocks are propagated through the entire network, but only a limited number of nodes adopt them. Once those nodes receive the competing block, they replace the orphaned block with the one that won the slot/height battle.
You can see in the attached screenshot how many nodes received the block that lost the battle and reported it to pooltool, and how many received the winning block (or both, but adopted the winning block before reporting to pooltool):
It is more interesting when height battles happen. I am not sure what exactly decides the winner in a situation like the one in the attached picture:
Looking at the number of nodes reporting the winning block, it seems that all nodes reported it, even those that first reported the orphaned block (all the blocks are reported by about 320 nodes). I am not sure whether the algorithm is the same as for slot battles (depending on the VRF output) or a different one. In this case, the block produced one second later won. But in the next screenshot, the block produced one second later lost the height battle:
The propagation times in the screenshots are so big because we are now in the rewards calculations period. Usually, the propagation times are under 1 second, most of them around 0.5 seconds.
In the logs, a slot/height battle looks like this:
The same height battle on pooltool looks like this:
In this case, both blocks propagated to a large number of reporting nodes, but the block that was created later and propagated more slowly won, which makes me think that the same algorithm (VRF output) applies in height battles, too. Unfortunately, I cannot read Haskell well enough to work out the algorithm from the code.
I also noticed that the BLOCK pool was really unlucky: within a few minutes it lost one slot battle and one height battle.
Thanks, it looks like we can’t figure out slot/height battles just from the perspective of a single BP/relay.
Regarding slot battles, the algorithm simply chooses based on smaller VRF proof (but gives an edge to the pool with smaller active stake). Regarding height battles, it’s more complicated: each BP will mint a new block based on the previous tip it knows of, so any block not reaching the BP in time will be orphaned. The network accepts the chain that is the most dense (largest number of blocks) and has the highest slot number. In practice, only two pools will likely be engaged in a height battle, which means the density remains the same and the winner is always the pool with the higher slot number.
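To make the two terms concrete, here is a minimal Python sketch that classifies a pair of competing blocks using only the definitions given in this thread: same slot number means a slot battle, same block number with different slot numbers means a height battle. The function name and the example numbers are hypothetical, and the sketch says nothing about which block wins.

```python
def classify_battle(slot_a: int, height_a: int, slot_b: int, height_b: int) -> str:
    """Classify two competing blocks by slot number and block number (height)."""
    if slot_a == slot_b:
        # Both pools were leader in the exact same slot.
        return "slot battle"
    if height_a == height_b:
        # Different slots, but both blocks claim the same block number.
        return "height battle"
    return "no battle"


# Hypothetical slot and block numbers, for illustration only.
print(classify_battle(1000, 50, 1000, 50))
print(classify_battle(1000, 50, 1001, 50))
print(classify_battle(1000, 50, 1001, 51))
```

The three calls print `slot battle`, `height battle` and `no battle`, in that order.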
For slot battles it is clear, I meant the same thing.
But now I am confused about height battles. According to the explanations in the link, “nodes will choose the block with the higher slot number and orphan the other”. That’s exactly what I thought, but it does not seem to be (entirely) true: @georgem1976 posted an example where the pool with the lower slot number won a height battle.
Based on my limited understanding and the examples above, the winner of a height battle seems to be closely connected to the number of relays that accepted the block (as @BEAVR wrote). In all the examples above, the height battle was won by the block with the higher slot. Only in one case was it lost, but in that case only 5 relays accepted that block versus 300+, so it seems everything is exactly as it should be. Or not?
Hmm, well, it may actually be a double height battle. ADV2 didn’t receive BNP’s block in time, but, at the same time, QUEEN didn’t receive ADV2’s block in time. So, in effect, ADV2 won a height battle against BNP but lost against QUEEN. In this case, our understanding of height battles (and the explanation of @BEAVR) is correct and PoolTool simply did not consider such scenarios. What do you think @georgem1976?
Well, assuming the higher slot number always wins, if QUEEN did not participate in a height battle, then there is no reason for it to ignore ADV2 and accept BNP. Therefore, either our understanding of height battle is wrong or there was a double height battle (which can explain what we observe with our current understanding) that PoolTool didn’t represent correctly. I am inclined to believe the latter is true.
I should clarify and probably update my blog post regarding Height Battles.
From my experience, height battles are actually won by whichever block was received first. The slot number has no effect. We discovered that relays accept which block comes first when looking into this problem:
Basically, whenever a block producer creates a valid block whose block number collides with that of another block, irrespective of slot number, a height battle is introduced. Whichever block the relays receive first is the one that is accepted. One might think that having the fastest block propagation would decrease the likelihood of height battles. This is true in one respect: you will not collide with the leaders of the following slots. But it does not cover the scenario where leaders of earlier slots have bad block propagation.
See this question I posted on Cardano stackexchange:
I couldn’t figure out why my block got orphaned.
Now I think what happened is as follows:
The previous slot leader’s block was delayed by a long time (30 seconds) but ended up propagating across the network just as my block producer produced its block. By the time my block producer produced its block, more than 50% of the network had received the previous slot leader’s block. Therefore my block got invalidated, because the majority of the network received it after the previous leader’s block.
Basically the previous slot leader got away with a 30 second delay because my block producer was only leader right at the end of this delay time. My producer didn’t receive the previous leader’s block in time. However 50% of the network did receive the previous leader’s block just before they received my block.
Both blocks had the same block number so first one received wins.
OK, now I understand. Thank you so much for that link. I had been trying to understand this. It seems now that my initial thoughts on the matter were correct, hence the title of my original stackexchange question.
I checked the pool that caused my block to be orphaned and it had only 100K delegation, so it got very lucky in producing a block. Thus, if my block and its block were judged on VRF values, it would be highly likely that its block would win on the VRF metric, even though its block was 30 seconds delayed.
I strongly agree with the change proposed by dcoutts that you linked to:
The current ordering rule is the lexicographic combination of the following, in order:
compare chain length
when (same slot number) then (compare if we produced the block ourselves) else equal
when (same block issuer) then (compare operational certificate issue number) else equal
compare (descending) leader VRF value
The suggestion is to change the last one:
when (both blocks’ slots are within Delta of now) then (compare (descending) leader VRF value) else equal
If this change were made, then the smaller pool’s block, which arrived 30 seconds late, would not fall within the 5 second (Delta) window, and thus my block would have been treated as equal to its block. In other words, whichever block propagated through the network fastest would likely win as other blocks got built on top of it, under the longest-chain-wins rule.
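The ordering rule quoted above, together with the suggested change, can be sketched in Python roughly as follows. This is only an illustration of the quoted text, not the actual cardano-node code: the `Tip` type, its field names and the example values are hypothetical, slots are treated as seconds, and I am assuming that a higher operational certificate issue number is preferred and that a node keeps the block it already has when the comparison comes out equal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tip:
    """Tip of a candidate chain; all field names are hypothetical."""
    chain_length: int     # block number of the tip
    slot: int             # slot in which the tip block was minted
    issuer: str           # pool that issued the tip block
    self_issued: bool     # True if this node produced the block itself
    opcert_issue_no: int  # operational certificate issue number
    leader_vrf: int       # leader VRF value, as an integer


def _cmp(x, y) -> int:
    """Three-way comparison: 1 if x > y, -1 if x < y, 0 if equal."""
    return (x > y) - (x < y)


def compare_tips(a: Tip, b: Tip, now: Optional[int] = None,
                 delta: Optional[int] = None) -> int:
    """Positive: prefer a. Negative: prefer b. Zero: treated as equal.

    Without now/delta this follows the current rule as quoted in the
    thread; with both set, the VRF step uses the suggested Delta window.
    """
    # 1. Compare chain length (longer chain preferred).
    order = _cmp(a.chain_length, b.chain_length)
    if order:
        return order
    # 2. Same slot number: prefer the block we produced ourselves.
    if a.slot == b.slot:
        order = _cmp(a.self_issued, b.self_issued)
        if order:
            return order
    # 3. Same issuer: compare the operational certificate issue number
    #    (assumption: the higher number is preferred).
    if a.issuer == b.issuer:
        order = _cmp(a.opcert_issue_no, b.opcert_issue_no)
        if order:
            return order
    # Suggested change: skip the VRF tie-break unless both blocks'
    # slots are within delta of now.
    if now is not None and delta is not None:
        if any(abs(now - tip.slot) > delta for tip in (a, b)):
            return 0
    # 4. Compare (descending) leader VRF value: lower value preferred.
    return _cmp(b.leader_vrf, a.leader_vrf)


# Hypothetical numbers modelled on the scenario described above.
mine = Tip(chain_length=100, slot=1030, issuer="BIG", self_issued=False,
           opcert_issue_no=1, leader_vrf=900)
late = Tip(chain_length=100, slot=1000, issuer="SMALL", self_issued=False,
           opcert_issue_no=1, leader_vrf=100)

print(compare_tips(mine, late))                     # current rule
print(compare_tips(mine, late, now=1031, delta=5))  # suggested rule
```

With these made-up values the first call prints `-1` (the late block wins on its lower VRF value) and the second prints `0` (the late block is outside the window, so the two are treated as equal and the outcome depends on which block a node saw first).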
If this change is not made then selfish mining by smaller pools is possible as follows:
Make a small pool
Every time it mints a block, deliberately delay the release of this block (e.g. using firewall rules to temporarily disconnect relays)
Wait until another block by another pool is produced (this pool is more likely to be bigger and thus have a larger VRF value).
When you see this other pool’s block appear, immediately reconnect your relays and allow your delayed block to be propagated.
Currently the smaller pool is likely to win on the VRF comparison and therefore the larger pool will get an orphaned block.
Smaller pool benefits because proportionally it will earn more rewards per block produced.
Iterate on this approach by making many small pools, all with the same agenda, but now cross-check leaderlogs across your army of small pools to ensure you won’t conflict with any of your own pools’ blocks.
Duncan Coutts’ proposed change needs to be adopted; otherwise this design flaw can be gamed as I described.