The current vote on mainnet is generating a lot of debate over how to improve decentralisation as well as the role of stake pool operators and what they should be paid for the service they provide. This has got me thinking about:
- What is the ideal number of pools?
- What is the ideal proportion of total stake for each pool?
- What is the ideal total cost to run all of the pools?
The real goals
People often talk about decentralisation as if it is the goal itself. However, decentralisation is a means of achieving the real features we want in our financial operating system:
- reliability
- availability
- immutability
- equitable access
- fair competition for resources
- lack of censorship
- transparency
- …
Cost of reliability and availability
Many of the discussions about the costs of running a stake pool get to the point where it is argued that you get what you pay for. The contention is that for Cardano to be a highly reliable network, it needs to pay for highly reliable stake pools. But is this true? How has this trade-off between cost and reliability been resolved elsewhere?
It seems that everywhere you look cheaper parts are used redundantly for cost effective reliability: RAID (Redundant Array of Inexpensive Disks) has proven to be both cheaper and more reliable than SLED (Single Large Expensive Disk). Multi-core processors run cooler and more reliably than single core processors which need to be run faster to hit the same benchmarks. The internet is built with redundant network pathways for increased throughput and reliability.
The Ouroboros protocol has made the same design choice. It expects pools to suffer network disruption and that their stake can shift over time. In fact, Ouroboros has taken redundancy to an even higher level, because it assumes that even the human masters of each pool can sometimes fail to honestly follow the protocol. Ouroboros mitigates these problems by having many independent pools provide the block creation service, so when some go off-line or act dishonestly, the protocol continues to function correctly.
How reliable does each pool need to be and how much should we pay for them?
Whenever more reliability is wanted, there are always costs and trade-offs.
Take the example of a modern home PC connected to the internet. It would seem that the least reliable part of the stack is the internet connection. Modern PCs are actually very reliable when running Linux. Here is the uptime output for my own PC:
$ uptime
18:18:01 up 78 days, 2:57, 1 user, load average: 0.98, 0.99, 0.99
78 days of continuous operation. These machines almost never fail and only go off-line when deliberately restarted. Note also that this computer is connected to the house 240V supply, so this supply has also been 100% reliable for all this time.
So what about the internet connection? If your internet connection has 99% uptime, it can be down for 7 hours 18 minutes and 17.5 seconds per month. That seems like quite a lot of time. I tried to find some “official” statistics for my home fibre internet connection but couldn’t. I do receive occasional emails, around 3 or 4 times a year, in which my provider warns that my connection could be interrupted for up to 1 hour by network maintenance or upgrades. Usually the actual downtime turns out to be much less than 1 hour. No doubt there are many short periods of disconnection which I don’t even notice. Some of these disconnections may be isolated to my individual provider and some may involve major connection pathways to large under-sea cables. I do know it is possible to pay for an “enterprise level” service, and this may result in fewer disconnections from my provider, but it might do little to mitigate problems with major under-sea cables. In any case, my internet connection is likely to be the cause of most of my unreliability. Nevertheless, I suspect I get more than 99% reliability, because I don’t believe my home fibre internet service is down for more than 7 hours 18 minutes and 17.5 seconds per month.
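The downtime figure quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes an "average month" of one twelfth of a 365.2425-day year, and the function name `allowed_downtime` is made up for illustration.

```python
from datetime import timedelta

# Assumption: an "average month" is 1/12 of a 365.2425-day Gregorian year.
AVERAGE_MONTH = timedelta(days=365.2425 / 12)


def allowed_downtime(uptime_percent: float,
                     period: timedelta = AVERAGE_MONTH) -> timedelta:
    """Return how long a service may be down in `period` at the given uptime."""
    return period * (1 - uptime_percent / 100)


# Compare a few uptime levels over one average month.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% uptime allows {allowed_downtime(pct)} of downtime per month")
```

At 99% this comes to a little over 7 hours 18 minutes per month, and each extra "nine" divides the allowance by ten.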
If we assume that my home fibre internet connection is 99% reliable, then how much is it worth to pay for more than this? Of course the answer to that question is unique to me and will depend upon what I use my internet connection for and how important that is to me. So, let’s think about this from the point of view of running a stake pool and from the point of view of a delegator paying for this service. Let’s assume the following:
- The operator is a true believer in Cardano and has amassed a pledge of 500,000 Ada (N.B. the pledge amount has little effect on the calculations.)
- The pool has a total stake of 34 million Ada (half saturated at the current k=500, but saturated at k=1000)
Such a pool would currently earn 1,185,835 Ada per year in staking rewards, if it was 100% reliable. You can check with this awesome reward calculator.
If such a pool is run over a standard home fibre internet connection then it might fail to produce 1% of its blocks. That is a loss of 11,858 Ada in total rewards (1% of 1,185,835), which at the current price (US$0.37/Ada) is US$4,388 per year. If the operator accepts this 99% reliability and sets his variable fee at 0%, with minPoolCost at the lowest permitted value (340 Ada), then his delegators will earn 3.41% yield. On the other hand, if the operator spends some money to increase his reliability beyond 99% and charges more than 1% variable fee as compensation, then his delegators will be worse off.
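As a rough check on the arithmetic, here is a minimal Python sketch of the cost of missed blocks. It assumes rewards fall in direct proportion to the fraction of blocks missed, which is a simplification, and the function name `lost_rewards` is hypothetical. The reward and price figures are the ones quoted in this post.

```python
ANNUAL_REWARDS_ADA = 1_185_835  # from the post: pool rewards at 100% reliability
ADA_PRICE_USD = 0.37            # from the post: price at the time of writing


def lost_rewards(annual_rewards_ada: float, reliability: float,
                 ada_price_usd: float) -> tuple[float, float]:
    """Return (Ada lost, USD lost) per year at the given reliability.

    Assumption: rewards scale linearly with the share of blocks produced.
    """
    missed_ada = annual_rewards_ada * (1 - reliability)
    return missed_ada, missed_ada * ada_price_usd


ada, usd = lost_rewards(ANNUAL_REWARDS_ADA, 0.99, ADA_PRICE_USD)
print(f"Lost per year at 99% reliability: {ada:,.0f} Ada, about US${usd:,.0f}")
```

The same function can be rerun with a smaller reward figure to see how the loss shrinks for a less saturated pool.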
From the delegators’ point of view, any increase in fees to improve reliability needs to be cost effective. For example, delegators might approve of the operator adding a backup 4/5G mobile network if this increased reliability from 99% to 99.3%, so long as the variable fee wasn’t increased by more than 0.3%. So this upgrade had better cost the operator less than US$1,316/yr (0.3 * 4,388), otherwise the operator will be worse off, spending more than the extra he earns. Obviously, though, improved reliability figures are hard to guesstimate, and in this particular example it might be quite common that when the fibre connection goes down, the mobile 4/5G network is down at the same time.
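The break-even reasoning can be sketched in Python as follows. The function name `break_even_budget_usd` is hypothetical, the reliability gains are guesses rather than measurements, and the sketch again assumes that rewards scale linearly with reliability.

```python
ANNUAL_REWARDS_USD = 1_185_835 * 0.37  # figures from the post, in US dollars


def break_even_budget_usd(annual_rewards_usd: float,
                          reliability_before: float,
                          reliability_after: float) -> float:
    """Most an upgrade may cost per year before it loses money overall.

    Assumption: the extra rewards are proportional to the reliability gain.
    """
    return annual_rewards_usd * (reliability_after - reliability_before)


# Hypothetical upgrades, each starting from a 99% reliable baseline.
for after in (0.993, 0.995, 0.999):
    budget = break_even_budget_usd(ANNUAL_REWARDS_USD, 0.99, after)
    print(f"99% to {after:.1%}: worth at most about US${budget:,.0f} per year")
```

Any upgrade costing more than the printed budget leaves either the operator or the delegators worse off, whichever of them ends up paying for it.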
Moreover, increasing fees for improved reliability gets harder to justify if the base level of reliability is higher than 99%. It gets exponentially more difficult to increase reliability the closer you get to 100% (which is actually impossible to achieve).
Furthermore, if the pool is smaller, with less than 34 million Ada of stake, then a 1% change in reliability equates to a smaller amount of rewards. This means that the operator can afford to spend less on attempting to improve reliability. This makes it much more likely that the best option is to accept the easily obtained 99% reliability and not levy extra fees on delegators. For delegators, it would be economically smarter, and more reliable, to spread their delegation risk across many similar small pools.
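To illustrate why spreading across pools can help, here is a small Python sketch of the chance that several pools are offline at the same moment. It leans heavily on one assumption: that pool outages are completely independent. That will not hold when pools share a provider, a data centre or an under-sea cable. The function name `prob_all_offline` is made up for this example.

```python
def prob_all_offline(reliability: float, n_pools: int) -> float:
    """Probability that n pools are all offline at the same moment.

    Assumption: each pool fails independently with probability
    (1 - reliability). Correlated outages would make the real figure higher.
    """
    return (1 - reliability) ** n_pools


# Several cheap 99% pools compared with a single "five nines" pool.
for n in (1, 2, 3, 5):
    print(f"{n} pool(s) at 99%: all offline with probability "
          f"{prob_all_offline(0.99, n):.0e}")
print(f"1 pool at 99.999%: offline with probability "
      f"{prob_all_offline(0.99999, 1):.0e}")
```

Under this independence assumption, three 99% pools are all down together less often than a single 99.999% pool is down. The expected share of missed blocks per pool is unchanged, so what spreading buys is a lower risk of losing everything at once.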
Other services provided by stake pools
It is true that some operators provide services other than simply running Cardano stake pools. This will be even more the case in the future when pool operators start running side chains, layer 2 roll-ups, Hydra heads, or even provide market making services. But such services are not what delegators are paying for today. When these services do eventuate, there will be additional incentives for operators to opt in and provide such services.
What is the ideal size and number of stake pools?
It may be possible to have too many pools and for pools to be too small. One concern is that too many pools will increase block propagation times. However, the efficiency of the P2P mechanism probably mitigates this risk because it seems possible that network delays might scale quasi-logarithmically with the number of nodes. Furthermore, the Cardano network is much bigger than just its stake pools and is growing every day. There are people running random relays and full node wallets and any stake pool might be using some of these as their P2P hot peers.
Another concern would be that operators of tiny pools may become disillusioned if their block producer is inactive and they are unsure if everything is working properly. But how small is too small, and how long is too long to wait between blocks, would be something for the individual to decide.
Summary
We need to view the costs of running stake pools through the eyes of delegators because these are the people paying for the service. Ordinary home fibre internet is already very reliable, so it may not be worth paying for better than this. It may be more cost effective to instead achieve greater reliability by decentralising further and having many small pools which are all independently owned. Such widespread independent ownership will also better deliver the other goals of censorship resistance, immutability, transparency, and equitable access.
I believe the total cost of running Cardano’s stake pools will be lower, and the service more reliable, by having lots of cheap 99% reliable small pools rather than a few 99.999% reliable large or multi-pools. If others agree, then we should remove the barriers for small pools to compete. We should remove the compulsory minPoolCost regulation and decentralise more.