Increased number of Orphaned blocks

Hi Guys,
Over the past 3 epochs one of the pools I am running seems to have had a significant spike in the number of orphaned blocks. I understand slot battles are out of my control, but the increase in height battles is concerning.

Is there something obvious I am missing? Why do other large pools have nowhere near the same number of orphans?

I think you will find they do have the same amount of orphans. Everyone is seeing more.

Orphans are increasing because block transmission has slowed as block size and block utilisation have gone up. A block has to be verified before another relay can pull it, so more data to verify means each hop takes longer. I presume pipelining will help improve this.

Height battles are often only 1 block in height and result from slots being only 1 or 2 seconds apart.

Say you are the slot leader for slot 50000001 and another slot leader has slot 50000003 (2 seconds later). If your block is not received by the second slot leader before he makes his block, then his block will have the same block number and contain many transactions in common with your block. These 2 blocks will be in conflict, so only 1 can be accepted.
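To make the timing concrete, here is a minimal Python sketch of when two nearby slot leaders end up in a height battle. The function name and the single "propagation delay" number are assumptions for illustration only, not anything the node exposes. It also assumes 1-second slots, so the slot gap doubles as seconds.

```python
def causes_height_battle(first_slot: int, second_slot: int,
                         delay_to_second_leader_s: float) -> bool:
    """Return True if the second leader mints before seeing the first block.

    Assumes 1-second slots and one propagation delay figure
    (a hypothetical simplification of real network behaviour).
    """
    if second_slot <= first_slot:
        raise ValueError("second_slot must come after first_slot")
    gap_s = second_slot - first_slot
    # If the first block has not arrived when the second leader's slot
    # starts, the second leader builds on the old tip and both blocks
    # end up with the same block number.
    return delay_to_second_leader_s >= gap_s


# Slots from the example above, two seconds apart.
print(causes_height_battle(50000001, 50000003, 1.5))  # False
print(causes_height_battle(50000001, 50000003, 2.5))  # True
```

The point of the sketch is that the slot gap is fixed by the leader schedule, so the only variable is how fast the first block reaches the second leader.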

The consensus algorithm determines the winner by:

  • compare chain length
  • when (same slot number) then (compare if we produced the block ourselves) else equal
  • when (same block issuer) then (compare operational certificate issue number) else equal
  • compare (descending) leader VRF value

See dcoutts's message in "Consensus should favor expected slot height to ward off delay attack", Issue #2913, input-output-hk/ouroboros-network on GitHub.
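The four rules above can be sketched as an ordered tie-break in Python. This is only an illustration of the ordering, not the node's actual code: the `Tip` type and its field names are made up here, and the preference for the higher operational certificate issue number is my assumption, since the list does not state a direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tip:
    """Hypothetical summary of the tip block of a candidate chain."""
    chain_length: int
    slot: int
    issuer: str
    opcert_issue_number: int
    vrf_value: int
    produced_by_us: bool = False


def prefer(a: Tip, b: Tip) -> Tip:
    """Pick the preferred tip by applying the four rules in order."""
    # 1. Longer chain wins.
    if a.chain_length != b.chain_length:
        return a if a.chain_length > b.chain_length else b
    # 2. Same slot: prefer the block we produced ourselves.
    if a.slot == b.slot and a.produced_by_us != b.produced_by_us:
        return a if a.produced_by_us else b
    # 3. Same issuer: compare operational certificate issue numbers
    #    (assumed here: the higher number is preferred).
    if a.issuer == b.issuer and a.opcert_issue_number != b.opcert_issue_number:
        return a if a.opcert_issue_number > b.opcert_issue_number else b
    # 4. Otherwise the lower leader VRF value wins.
    if a.vrf_value != b.vrf_value:
        return a if a.vrf_value < b.vrf_value else b
    return a  # complete tie: keep the first candidate


# A 1 block height battle: equal chain length, different slots and issuers.
mine = Tip(chain_length=100, slot=50000001, issuer="big", opcert_issue_number=3, vrf_value=900)
theirs = Tip(chain_length=100, slot=50000003, issuer="small", opcert_issue_number=1, vrf_value=45)
print(prefer(mine, theirs).issuer)  # small
```

Note that in a height battle rules 2 and 3 do not apply (different slots, different issuers), so the decision falls straight through to the VRF comparison.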

Consequently, if you have a large amount of stake to your pool then you are more likely to lose slot battles, and 1 block height battles, to smaller pools. This is because the smaller pool will likely have a smaller VRF value for its block.


Thanks for the detailed response! Is there anything I can do as an operator to reduce these lost height battles? The average block propagation time of the pool is ~1.7s. Is it even possible to optimize topology to reduce to <1s?

There is nothing you can do personally. You can only ensure that your pool is well connected. However this only determines how quickly you pull blocks from the network. It doesn’t determine how many other relays connect into you and thus how well blocks are pulled from your relays.

Unfortunately it is completely out of your control how well other pools are connected and in particular how many relays connect in to pull blocks from these other relays. This determines how quickly their produced blocks propagate.

I was forced to understand why this was happening because I lost a 1 block height battle to a very small pool whose block was delayed by 30 seconds! I mean, come on… a 30 second delay is a bit much. My block was minted 28 seconds after this small pool's slot, but I didn't see their block until 30 seconds after their slot (2 seconds after my block was minted). Thus my block was in conflict with their block. I lost the fork decision (1 block high) on the VRF comparison. I run a small pool so every block is precious to me, but this other pool was even smaller.

I suspect what happens is this:

  • Small pool operators often leave their relays off-line or not updated since they may only win a couple of slots per year. But they keep checking their leader-logs to see if they win a slot for the next epoch.
  • When they are lucky enough to be awarded a slot, and just before they are due to mint their block, they start up their relays and block producer, mint the block, and then go off-line again.
  • This means that services like api.clio.one don’t register their relays as being on-line and consequently don’t provide their relays in topology.json files to other pool operators.
  • This in turn means that not many relays are connecting into their relays in order to pull blocks.
  • Resulting in the slow propagation of their blocks
  • Causing more orphaned blocks

Perversely, the larger, often better managed pools, get punished for the poor network connectivity of the smaller pool due to how the VRF value is calculated.

Pretty annoying isn’t it.

I can understand if the delay is only a few seconds - fair enough to advantage the smaller pool.

However a 30 second delay is ridiculous. This sort of massive delay should result in that pool’s block being dropped if there is a conflict with any other pool’s block I think.


Install chrony, in case you haven't, for better time sync with the network.

Cheers,

Thanks for the great info! Each orphaned block hurts a little, and I really appreciate the explanation.

Just wondering what tool you are using to see this detail showing Slot vs Height battle data?
Thanks

Hi Alex
How do you navigate to the view in pooltool.io? If I find my own pool and look at Orphan blocks tab, then drill down on it, I am not seeing the extra details. Is the view specific to one pool or network-wide?

Thank you

Maybe because you don't have any orphaned blocks for 325… try moving back to 324, 323, etc. until you find one.

Actually, WOOF pool has one in 325. I just don’t see the extra detail of slot battle vs height battle. For the block that was orphaned in 325, I can see it was lost to another pool called Everstake, but no other info.

Here, watch for a few more minutes and you will see… or maybe you are right and he's using another application.

That’s a pretty cool real-time view!

The real time view (recent blocks) has a search bar. You can input your orphaned block id there to see who won against you and obtain some comparative metrics (propagation delays, number of nodes reporting).

Thanks @orpheus-ant !

For WOOF pool’s most recent orphaned block, 6973600, here is what I’m seeing on pooltool. What insights can I glean from this? It appears WOOF propagation is faster, but I am not sure what the graph of red bars represents?

Another question, when there is a slot battle, does that mean each pool in the slot battle has the identical parameter value to make them show up as the slot leader? In other words, if I run the cncli-leaderlogs report, will there be other pools having the same slots as me?


Slot battle: the same slot was assigned to 2 pools (visible for both in their leaderlogs), and the pool with the smallest VRF value wins the battle. (This is what I understood about battles.)
cheers,


The example you show is a typical slot battle. The blocks produced by both pools were for the exact same slot. The protocol decides the outcome for who wins using the VRF score. The lower score wins. EVRST’s block must have had a lower VRF score than WOOF’s block, so EVRST won. EVRST block was accepted by the nodes and WOOF block was orphaned.

This particular slot battle is interesting because the smaller pool was WOOF. Usually the smaller pool will have a lower VRF score for its block but not always.

As I understand (but please correct the technicalities): The VRF score is calculated using the pool's secret key combined with the epoch nonce value and the slot number. In order to determine slot leadership, this VRF calculation is done for every slot in the epoch and compared to a threshold derived from the pool's share of the total stake. If the VRF value for a slot is lower than that threshold, then the pool is a slot leader for that slot. It is quite possible that two different pools calculate a low enough VRF score for the same slot and are thus both slot leaders. This is how a slot battle occurs, since both pools will create a block for that particular slot. The winner of this slot battle will be the pool that had the lower VRF score for that particular slot.
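Here is a rough Python sketch of the leader check described above. Loud caveats: `toy_vrf` is just a keyed hash standing in for the real VRF, the key and nonce bytes are invented, and I am assuming the Praos threshold 1 - (1 - f)^σ, where σ is the pool's share of total stake and f = 0.05 is the active slot coefficient. It shows the shape of the calculation only and is not something to run against real keys.

```python
import hashlib

ACTIVE_SLOT_COEFF = 0.05  # assumed value of f


def leader_threshold(relative_stake: float, f: float = ACTIVE_SLOT_COEFF) -> float:
    """Chance of leading a single slot: 1 - (1 - f) ** sigma."""
    return 1.0 - (1.0 - f) ** relative_stake


def toy_vrf(secret: bytes, epoch_nonce: bytes, slot: int) -> float:
    """Stand-in for the VRF: a keyed hash mapped to [0, 1). NOT the real VRF."""
    message = epoch_nonce + slot.to_bytes(8, "big")
    digest = hashlib.blake2b(message, key=secret, digest_size=32).digest()
    return int.from_bytes(digest, "big") / 2 ** 256


def leader_slots(secret: bytes, epoch_nonce: bytes, relative_stake: float,
                 first_slot: int, n_slots: int) -> list[int]:
    """Slots in the range where the toy VRF value falls under the threshold."""
    threshold = leader_threshold(relative_stake)
    return [slot for slot in range(first_slot, first_slot + n_slots)
            if toy_vrf(secret, epoch_nonce, slot) < threshold]


# Two made-up pools with deliberately exaggerated stake shares.
nonce = b"example-epoch-nonce"
small = leader_slots(b"small-pool-key", nonce, 0.02, 0, 20_000)
large = leader_slots(b"large-pool-key", nonce, 0.20, 0, 20_000)
battles = sorted(set(small) & set(large))
print(len(small), len(large), battles)
```

Any slot that shows up in both lists is a slot battle: each pool checks its own VRF value privately, so nothing stops two pools from qualifying for the same slot.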

Because smaller pools need to calculate a lower VRF score to get under their (lower) stake-based threshold in order to be a slot leader in the first place, it will be much more common that their VRF score will be lower than the bigger pool's in any slot battle. Thus the smaller pool will usually win slot battles. But sometimes the larger pool will have a very low value through random chance that just happened to be significantly below its own threshold.

To put numbers on it: Say one small pool had to get a score below 50 to be a slot leader. It might calculate a VRF score of 45 for a slot and so it is a slot leader. Another larger pool might have to get a VRF score under 1000 because it has more controlled stake. This larger pool could still happen to calculate a VRF score of 5 to be a slot leader. In this case the larger pool would win on the VRF comparison. However, if you looked at all the slots awarded to the larger pool, most of its slots would have VRF scores above 50, since the distribution of numbers between 0 and 1000 is mostly above 50.
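Those numbers can be turned into an actual probability. This Python sketch assumes each pool's winning score is uniformly distributed below its own threshold (50 and 1000, the made-up figures above). The function names are mine. The larger pool only wins if its score lands under 50 (a 50/1000 chance) and then also beats the small pool's score (a 50/50 chance), so the small pool wins with probability 1 - 50/(2 × 1000).

```python
import random


def small_pool_win_probability(small_threshold: float, large_threshold: float) -> float:
    """P(small pool has the lower score), scores uniform below each threshold."""
    if not 0 < small_threshold <= large_threshold:
        raise ValueError("expected 0 < small_threshold <= large_threshold")
    return 1.0 - small_threshold / (2.0 * large_threshold)


def simulate(small_threshold: float, large_threshold: float,
             trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the formula above using uniform random scores."""
    rng = random.Random(seed)
    wins = sum(
        rng.uniform(0, small_threshold) < rng.uniform(0, large_threshold)
        for _ in range(trials)
    )
    return wins / trials


print(f"{small_pool_win_probability(50, 1000):.3f}")  # 0.975
print(f"{simulate(50, 1000):.3f}")
```

So with these example thresholds the small pool takes about 39 out of every 40 slot battles between the two, which matches the "usually but not always" conclusion.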

So this is why smaller pools usually win slot battles.


Do you know what the chart showing a distribution of red bars means? I was thinking it might play into who wins the slot battle, but can’t figure out what it represents.


The red bars represent the distribution from reporting nodes of the delays until they received the block. A lot of pools send this block delay information to pooltool.io. (One of the scripts used for sending this block delay info to pooltool.io is called “sendtip”.)

The block delay has no impact on the outcome of the slot battle.


Awesome info. Thanks so much.
