Reduce frequency of leader slots to reduce network centralisation

Terminada · 8 December 2022 22:04

This is the idea for a CIP that I have partially written here. Please read and comment.

I want to credit @Michael.Liesenfelt for coming up with the solution for this problem here

Motivation

The Ouroboros proof-of-stake protocol seeks to balance access to it’s resources and benefits in proportion to each participant’s stake in the system. Indeed, Cardano’s security model depends on its stake being widely distributed and that control over the protocol is always proportional to stake weighting.

Cardano’s stake pools produce the blocks that form the chain representing it’s decentralised ledger. When a stake pool is selected as the slot leader to make a block, it’s block needs to be propagated across the internet to the other nodes. The current Ouroboros implementation can produce consecutive slot leaders which are closer together in time than acceptable network delay constraints. This means that it is not uncommon for the next slot leader to have not received the previous block when it becomes time to make it’s own block. When this happens, the next slot leader makes its block upon the same point in the chain as the previous block, causing a fork to occur. In Ouroboros, such forks are resolved by nodes preferring the block with the lower VRF (verified random function) value. The output of this VRF is deterministic and independent of the block’s contents, so it cannot be manipulated by block producers. The effect of this is that when a fork occurs, each pool has an equal chance of winning the “fork battle” for it’s block to be adopted. The pool that loses the contest will have it’s block dropped so that it’s block will never form part of the canonical chain.

The problem is that more decentralised pools suffer greater network delays and so they disproportionately experience more “fork battles” involving their blocks. This is because the vast majority of the network is located across Europe and USA or with fast fibre links to these areas. Only a minority of pools by stake weighting are located in remote areas that suffer greater network delays. This means that if a pool located near the majority gets awarded a consecutive slot with another pool, this will be unlikely to result in a fork battle because the other pool is most likely closely located. On the other hand, if a physically remote pool is awarded a consecutive slot with another pool then this will almost always result in a fork battle because it is extremely unlikely that the other pool is close.

To be clear, this means that more decentralised pools are subjected to disproportionately more fork battles. This represents a diversion from stake weighted participation in the current Ouroboros implementation for the following reasons:

More decentralised pools will get less of their blocks adopted to the chain than they should in proportion to their stake. This means they get less control over access to the Cardano ledger than is proportional to their stake.
More decentralised pools will earn a lower rate of rewards.

Quantification of the problem

The current Ouroboros implementation has 1 second slots with 5% chance that any particular slot will have an elected leader permitted to make a block. Consequently a new block is produced on average every 20 seconds. This results in the possibility of two leaders being elected for the same slot at approximately 5%. Similarly, the chance that a leader is elected for the next slot or the previous is likewise approximately 5% each.

Now consider a pool that is physically located near the centre of the majority in Europe or USA. It’s average propagation delays are going to be less than half a second. This means that the only “fork battles” it will face will be with those pools elected slot leader for the exact same slot. It will only very rarely face a fork battle due to transmission delay when one of the few decentralised remote pools is elected leader consecutively with it. So, this centralised pool will face slot battles for only around 5% of its blocks and so will have around 2.5% of its blocks dropped.

But, consider the more decentralised physically remote pool that has average propagation delays of just 1 or 2 seconds. This pool will face fork battles not only with leaders for the same slot, but also with leaders for the previous 1 or 2 slots, and the next 1 or 2 slots. Therefore, a decentralised pool with 1 second delays will face fork battles over 15% of its blocks. Whereas an even more decentralised pool suffering 2 second delays will face fork battles over 25% of its blocks. Given that half fork battles will be lost, this will result in these decentralised pools losing 7.5% and 12.5% of blocks respectively.

These are quite large numbers and are a significant force driving pools to be located close to the centre of the network. Eg: In a major data centre in Europe or USA.

This wrinkle in the current implementation goes against the core philosophy of Cardano to create a diverse, decentralised, physically dispersed, and therefore more resilient, financial operating system for the future. Rather it creates the opposite incentive to locate all stake pools close to the centre of the network in order to minimise dropped blocks due to forks.

Proposed solution

The current Ouroboros implementation determines the next block producer by having each stake pool do a leadership VRF calculation every slot (1 second) to see if it is a leader. This VRF leadership function is calibrated to produce a leader on average every 20 seconds but it is possible for any particular slot to have a leader.

Instead, the protocol could be changed so that the leadership calculation is done only every 20th slot (20 seconds) and the calibration of the leadership VRF function modified so that it produces more valid leaders every time. This will result in a slot battle each time but these are simple, deterministic, and most importantly fair, to resolve by nodes preferring the block with the lowest VRF score.

Such a change could provide the following advantages:

The number of fork battles faced by pools would be identical whether the pool was located in Europe, USA, or somewhere very remote. IE: Control over and access to the protocol, as well as rewards received, is proportional to stake irrespective of where the pool is physically located.
Less multi-block forks since there would always be 20 seconds between blocks for all nodes to agree on the canonical chain.
Reduced number of leadership checks for block producer nodes. This could allow the strict software timing requirements on the performance of the VRF leadership check to be relaxed so that “missed slot leader checks” are no longer a problem for stake pool operators. Such “missed slot leader checks” can happen if the Haskell garbage collector is running at the time a VRF leadership calculation is required.
Nodes could save some bandwidth because they wouldn’t need to transmit the entire block but only the header containing the VRF value. Once they work out which header has the lowest VRF then they can download the winning block body.
Block timing would be predictable. (Though this might have unknown security implications where random block production is an advantage?)
Blocks would be more evenly filled with transactions. At present the filling of blocks depends on how long ago the last block occurred. Sometimes when a block is produced only 1 second after the previous block it will contain zero transactions.
There would be back-up block producers since each 20 second battle would have several potential contestants so it won’t matter if one fails to produce its block.
This might provide some protection against MEV (“Maximum Extractable Value”) because individual pools would not know ahead of time if they will be the winner of the pending “slot battle” they will only know that they have a ticket to the slot battle. Though they could predict their odds of winning based on their calculated VRF value.
The same high level design could be used for the Ranking Blocks (RB) in the implementation of Input Endorsers.

Alternatives

The Ouroboros protocol has been designed around an imposed upper limit for network propagation delay of 5 seconds to guarantee the protocol’s safety. Instead of doing the leadership check every 20 slots, it could be done every 5. This would still allow ample time for blocks to be propagated from and to the very remote decentralised nodes. And, the leadership VRF function could still be calibrated to produce a leader on average every 20 seconds. This would result in less slot battles than the proposed solution and less “back-up” block producers. It would also introduce some randomness to individual block timing which might have some value in terms of security.

Terminada · 8 December 2022 22:13

Tagging @COSDpool for advice on managing CIPs.

@intutionismRus this might also provide a partial solution identified in this other thread

Terminada · 8 December 2022 22:28

@JaredCorduan

Michael.Liesenfelt · 9 December 2022 03:21

As reference:

COSDpool · 9 December 2022 09:09

sure I will keep an eye on the presentation, and if necessary can help format it for submission & review - preferably after there is more commentary here on the ideas themselves

brouwerQ · 9 December 2022 10:44

How did you calculate this? This would mean that 25% of blocks are within 2 seconds from each other, which isn’t the case I think?

I don’t think slot time should be 20 seconds. I do think that is should be 5 seconds or so, because that metric is used all the time as the ‘maximum allowed propagation time’. I don’t think nodes should be in a place where such a max propagation can’t be achieved.

Terminada · 9 December 2022 12:56

This is where I need to “phone a friend”. I need help from a maths whiz please.

I need someone to confirm if this is the correct way to do the calculation:

There is a 95% chance that no other pool will get the same slot as me.
Similarly there is a 95% chance that no pool will get the previous slot and a 95% chance that no pool with get the following slot.
Therefore there is a 0.95 * 0.95 * 0.95 chance that no other pool will get one of those three slots == 0.8574.
Therefore there is a 14.26% chance that at least one other pool will get one of the three slots.

So, not 15% chance but rather 14.26%.

That seems about right to me based on my personal experience. For example, over the last 4 epochs my pool has been slot leader for 11 blocks in total. Of these, 3 blocks resulted in forks because there was another pool awarded the slot either before or after my slot. So 3 out of 11 (27.3%) which is nearly double the calculated 14.26%. (All 3 were actually adjacent slots. None were the same slot.) And to make matters worse, my pool lost all 3 of those fork battles. So, I guess I have been extra unlucky recently.

The numbers are even worse if your pool suffers from 2 second network delays. 0.95^5 == 0.7738. So for such a pool the chance of a fork would be 22.62%.

That is certainly unfair and significantly higher than the 5% chance you will get if you house your block producer in a US data centre owned by Amazon or Google.

JaredCorduan · 9 December 2022 12:59

I am not the best person to review this. Someone from consensus, or one of the Ouroboros authors would be much better suited.

brouwerQ · 9 December 2022 13:16

You can argue that the locations where the pools are hosted don’t need to be evenly distributed geographically to achieve decentralisation and that it’s more important who own those pools.

Terminada · 9 December 2022 14:13

I disagree. It would certainly not be a very resilient network if Google or Amazon decided to shut off services for those running Cardano stake pools in their European and US warehouses. That is the very definition of a centralised network that is not anti-fragile.

Sure, you might say that the pools can then be moved to somewhere else. But then you are admitting that Cardano only really possesses “the threat to become decentralised” if forced to do so. And will suffer major network disruption whilst it moves all its centralised stake pools to more decentralised locations.

Also, I am not saying Cardano pools needs to, or even should be, “evenly” distributed geographically. What I would argue though is that it should be as geographically decentralised as its stake holders are. In other words, if 5% of total stake is held by Aussies, and is staked with pools located in Australia, then 5% of the blocks adopted on chain should have been created by these Aussie stake pools. That is certainly how the protocol is supposed to work. Participation is supposed to be weighted by stake. The security of Cardano is based upon that premise.

I am a bit sick of Aussies always getting the rough end of the pineapple just because they are on the other side of the world. It is possible to make a very decent argument for why housing more pools in Australia will actually make Cardano more anti-fragile.

brouwerQ · 9 December 2022 14:31

I didn’t say running all pools in the same few data centers…

Terminada · 9 December 2022 15:33

Please see the updated draft CIP at:

COSDpool · 11 December 2022 14:15

just a note on the current draft at this location: the sections

Acceptance Criteria
Implementation Plan

… are subheadings of Path to Active section: so they can (and should) be demoted to H3 (###).

Terminada · 12 December 2022 11:06

Thanks. I much appreciate your advice.

I was trying to see if there is a specified set of headings that should be included in a CIP. However, when I looked at some CIPs they all seemed to use different headings.

So, is it OK if you make up your own headings? Or will that be frowned upon?

By the way, I have decided that I am going to completely rewrite it with significant changes because I think I need to make it more “to the point” and technically oriented. I have spent the last several days trying to read and understand the Ouroboros papers. Several times. But, I think I need to go to university for many years and get a PHD in Mathematics first.

COSDpool · 12 December 2022 16:44

We’re trying to keep to standard headings because it makes it easier to scrape information out of the CIPs for other types of publishing & indexing.

cips.cardano.org was doing this for a while (suspended now; I’ve enquired about when, why & how long, with no response yet) but there are plans to do it better by another site in the works to replace it and/or the site building method. I think the more consistency we have between outlines of CIPs, the better the results of this new site or sites will be.

The standard headings also make it faster & more effective to do on-the-spot reviews like we have in our bi-weekly CIP review meetings. If editors & reviewers can find the information they need in 30 seconds rather than 2 minutes, this makes a big difference in the quality of the discussion.

Some of the older CIPs are less likely to follow standard formats & we’re actually working on reviewing these now… mainly for consistency with a new CIP header… and when we find a document with a non-standard outline we’ve been trying to standardise it (as time & reason permit) Update older CIPs for revised CIP-0001 · Issue #389 · cardano-foundation/CIPs · GitHub

Topic		Replies	Views
Understanding slot battles Education	1	757	5 August 2023
Orphaned block, three way Height Battle Operate a Stake Pool	4	424	30 October 2022
How does Shelley mainet handle "nothing at stake" and "adversarial fork" problems? Misc Dev Talk	1	484	30 May 2021
Ouroboros MPC Stake Delegation	3	561	25 May 2021
Problems with using VRF to decide fork battles General Discussions	8	541	2 October 2023

Reduce frequency of leader slots to reduce network centralisation

Motivation

Quantification of the problem

Proposed solution

Alternatives

Related topics