The previous post was only the introduction. Let’s get into the details:
Scope
We are not talking about stolen blocks here. Stolen blocks are part of the Cardano/Ouroboros design and happen regularly. They are nothing to be concerned about, as their rate is not increasing over time.
What is a ghosted block?
A ghosted block is a block that was minted but then rejected by the other nodes, so it never becomes part of the chain. There are multiple potential reasons for this; here we focus purely on one specific scenario.
Scenario
Example - Expected Sequence:
Each pool that forges a block appends it to the chain, i.e. the new block references the previous block as its parent. In this example, everything works as expected.
| Slot | Forged Block | Previous block |
|---|---|---|
| 45881000 | Block A (by Pool x) | some other |
| … | | |
| 45881004 | Block B (by Pool y) | Block A |
| 45881005 | | |
| 45881006 | Block C (by Pool z) | Block B |
Example - Ghosted block
Now let’s assume that in the above scenario Block B takes more than 2s to propagate (i.e. until other pools know about it). Since mainnet slots last one second, Block C is forged only two seconds after Block B, so Pool z does not yet know about Block B when minting Block C and builds on Block A instead. Both blocks therefore get the same block number, and only one block per block number can survive (in this case Block B). Block C is discarded because a block with the same number was already minted in an earlier slot.
Note: This is the most commonly observed behavior. There are other scenarios that we are not considering in this article.
| Slot | Forged Block | Previous block | Result | Explanation |
|---|---|---|---|---|
| 45881000 | Block A (by Pool x) | some other | adopted | |
| … | | | | |
| 45881004 | Block B (by Pool y) | Block A | adopted | This block propagates slowly (>2s). |
| 45881005 | | | | |
| 45881006 | Block C (by Pool z) | Block A | ghosted | Pool z does not know about Block B yet and therefore uses Block A as the previous block. |
As this example shows, the slow propagation of Block B (by Pool y) causes Block C (by Pool z) to be ghosted.
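The scenario can be replayed with a small Python model. This is a simplified sketch, not the node's chain-selection code: it assumes every block is forged at the very start of its slot, that a block reaches all other pools after a fixed `delay` (a hypothetical parameter), and it applies the rule described above that, for a given block number, the block from the earlier slot survives. The delay values and the slot of the "some other" block are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SLOT_SECONDS = 1  # mainnet slots last one second (Shelley era onwards)


@dataclass
class Block:
    name: str
    slot: int
    delay: float  # hypothetical: seconds until all other pools have this block
    parent: Optional["Block"] = None
    height: int = 0  # block number


def forge(name: str, slot: int, delay: float, chain: List[Block]) -> Block:
    """Forge on top of the highest block the forging pool has already received."""
    visible = [b for b in chain if (slot - b.slot) * SLOT_SECONDS >= b.delay]
    # Highest block number first; on a tie, prefer the block from the earlier slot.
    parent = max(visible, key=lambda b: (b.height, -b.slot))
    block = Block(name, slot, delay, parent, parent.height + 1)
    chain.append(block)
    return block


def resolve(chain: List[Block]) -> Dict[str, str]:
    """Simplified rule from this post: per block number the earliest slot survives,
    and a block only survives if its parent survived too."""
    winner: Dict[int, Block] = {}
    for b in chain:
        if b.height not in winner or b.slot < winner[b.height].slot:
            winner[b.height] = b
    result: Dict[str, str] = {}
    for b in sorted(chain, key=lambda blk: blk.slot):  # parents come before children
        parent_ok = b.parent is None or result[b.parent.name] == "adopted"
        result[b.name] = "adopted" if parent_ok and winner[b.height] is b else "ghosted"
    return result


def run(delay_b: float) -> Dict[str, str]:
    """Replay the example; only Block B's propagation delay varies."""
    chain = [Block("some other", 45880990, 0.0)]  # hypothetical earlier block
    forge("Block A", 45881000, 0.5, chain)
    forge("Block B", 45881004, delay_b, chain)
    forge("Block C", 45881006, 0.5, chain)
    return resolve(chain)


print(run(2.5))  # Block B propagates in >2s, so Block C builds on Block A
print(run(1.0))  # Block B arrives in time, so Block C builds on Block B
```

Block B's delay is the only thing that differs between the two runs, and it alone decides whether Block C is adopted or ghosted.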
Impact
One block fewer ends up on the chain. The delegators of Pool z receive no rewards for Block C, even though the operator of Pool z cannot do anything about the propagation delays of Pool y.
For this reason, this issue requires education and a joint effort by SPOs to reduce propagation delays.
Ghosted blocks reduce chain density. Currently, we expect ~1.3% of blocks to be ghosted. This share would increase drastically if density were increased for scalability reasons.
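A rough way to see why higher density makes this worse: in Ouroboros Praos, each slot independently has at least one slot leader with probability f, the active slot coefficient (0.05 on mainnet). The sketch below estimates the chance that some pool forges a competing block while one slow block is still propagating. It is a simplification, not a derivation of the ~1.3% figure: it assumes the block is forged at the start of its slot and ignores that the slow block's own pool could be the next leader. `exposed_slots` and `conflict_probability` are hypothetical helper names.

```python
import math


def exposed_slots(delay_seconds: float, slot_seconds: float = 1.0) -> int:
    """Number of later slots that start before a block has fully propagated.

    Assumes the block is forged at the very start of its slot; a leader in slot
    s + j forges without it whenever j * slot_seconds < delay_seconds.
    """
    return max(0, math.ceil(delay_seconds / slot_seconds) - 1)


def conflict_probability(f: float, slots: int) -> float:
    """Chance that at least one slot leader exists in `slots` consecutive slots.

    With Praos leader election every slot independently has at least one
    leader with probability f (the active slot coefficient).
    """
    return 1.0 - (1.0 - f) ** slots


delay = 2.5  # hypothetical propagation delay of one slow block, in seconds
for f in (0.05, 0.10, 0.20):  # 0.05 is the mainnet active slot coefficient
    p = conflict_probability(f, exposed_slots(delay))
    print(f"f={f:.2f}: {p:.1%} chance a competing block is forged meanwhile")
```

For the same propagation delay, a larger f gives a larger probability, so a denser chain leaves less room for slow propagation.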
Reasons and Mitigations
Reason | Explanation | Mitigations |
---|---|---|
Bad Propagation Time | Too few IN connections on your own relays, so the block needs more hops (and therefore more latency) to propagate across the world | Validate propagation delays; multiple relays; geo-distributed relays; topology updater; more IN connections |
Delayed Forge | Insufficient or blocked CPU. If the BP’s CPU is blocked at minting time, forging is delayed, which adds latency | Validate forge timing; analyze missed slots; analyze slot timing |
Bad Timing | The BP’s system clock is out of sync, so the block is minted at the wrong time (too early or too late) | Use Chrony |
SPO Checklist to Avoid High Propagation Delays
Time Synchronization
To avoid minting your blocks at the wrong time:
- Install and configure Chrony (Stake Pool (Server) Time Synchronisation with Chrony - YouTube)
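To illustrate why clock drift matters: a node derives the current slot from its system clock, so a skewed clock shifts the moment it forges. The sketch below assumes mainnet's Shelley-era parameters (slot 4492800 began at 2020-07-29T21:44:51Z, and slots last one second) and uses hypothetical skew values; `slot_at` and `slot_start` are illustrative helpers, not part of any Cardano tool.

```python
import math
from datetime import datetime, timezone

# Mainnet values, assumed here: the Shelley era began at slot 4492800 at
# 2020-07-29T21:44:51Z, and every slot since then lasts one second.
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_UNIX = 1_596_059_091
SLOT_SECONDS = 1


def slot_at(unix_time: float) -> int:
    """Slot that a node whose clock reads `unix_time` believes is current."""
    return SHELLEY_START_SLOT + math.floor((unix_time - SHELLEY_START_UNIX) / SLOT_SECONDS)


def slot_start(slot: int) -> datetime:
    """Wall-clock (UTC) start of a Shelley-era slot."""
    offset = (slot - SHELLEY_START_SLOT) * SLOT_SECONDS
    return datetime.fromtimestamp(SHELLEY_START_UNIX + offset, tz=timezone.utc)


true_time = slot_start(45_881_004).timestamp()  # the moment Pool y should forge Block B
for skew in (0.0, -1.5, 1.5):  # hypothetical clock errors in seconds
    print(f"clock off by {skew:+.1f}s -> node thinks it is slot {slot_at(true_time + skew)}")
```

A clock running 1.5s behind only reaches slot 45881004 1.5s late, so Block B is forged late and has less time to propagate before the next leader's slot; a clock running ahead forges too early.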
Improve Topology
To get your block distributed globally as fast as possible, it’s important to keep the number of hops required to reach all pools as low as possible. The following aspects help achieve this:
- Run multiple geo-distributed relays
- Have 20+ IN connections on each relay. Ask pools you know to manually add your relays as peers in their topology. Having some manually configured connections is good practice anyway: if the topology updater were out of service, you would lose IN connections and, with them, the chance to propagate your blocks.
NOTE: Block propagation is based on a pull mechanism, so IN connections mean that other pools are fetching blocks from your pool.
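The effect of IN connections on the hop count can be sketched as a breadth-first search over a peer graph. Because propagation is pull-based, the graph is given as "who pulls from whom" and the search follows those edges in reverse. The topology below is entirely hypothetical (four pools p1 to p4 and one relay named `my_relay`); it only shows that two extra pools pulling directly from your relay cut the worst-case hop count from 4 to 2.

```python
from collections import deque
from typing import Dict, List


def hops_from(source: str, upstreams: Dict[str, List[str]]) -> Dict[str, int]:
    """Minimum number of hops for a block from `source` to reach each node.

    upstreams[node] lists the peers that `node` pulls blocks from. Because
    propagation is pull-based, a block moves from a peer to every node that
    has that peer as an upstream, i.e. along the peer's IN connections.
    """
    downstream: Dict[str, List[str]] = {}
    for node, peers in upstreams.items():
        for peer in peers:
            downstream.setdefault(peer, []).append(node)

    hops = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest hop counts
        current = queue.popleft()
        for nxt in downstream.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


# Hypothetical topologies: p1..p4 are other pools, "my_relay" is your relay.
few_in = {"p1": ["my_relay"], "p2": ["p1"], "p3": ["p2"], "p4": ["p3"]}
more_in = {"p1": ["my_relay"], "p2": ["p1"], "p3": ["p2", "my_relay"], "p4": ["p3", "my_relay"]}
print(max(hops_from("my_relay", few_in).values()))
print(max(hops_from("my_relay", more_in).values()))
```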
Node Configuration
To avoid delays in forging your blocks:
- Keep the number of missed slots during the epoch (excluding the epoch transition) to a minimum, preferably zero.
- Avoid enabling the TraceMempool setting: you don’t need it to operate your pool, and enabling it increases CPU load.
- (After changing this one setting, there were no more missed slots during the epoch.)
Validation
- No missed slots during the epoch (excluding the epoch transition)
- Propagation delays during the epoch < 1s
- Propagation delays during the rewards calculation (48h + 24h) < 2s
If you cannot meet these validation targets, consider improving your hardware and/or infrastructure. Heavily oversubscribed VPS providers will cause missed slots (when the CPU is blocked by another tenant). For poor propagation times, better CPU performance helps reduce the latency until the block is available for other nodes to synchronize.
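The three validation targets in the list above can be checked mechanically once you have the data. The sketch below is a hypothetical helper: how you collect propagation delays and missed slots is left open, `missed_slots` is expected to already exclude the epoch transition, and whether a block fell into the rewards calculation window is passed in as a flag rather than computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DelaySample:
    seconds: float            # observed propagation delay of one of your blocks
    during_reward_calc: bool  # True if the block fell into the rewards calculation window


def validate(samples: List[DelaySample], missed_slots: int) -> Dict[str, bool]:
    """Check the validation targets (hypothetical inputs).

    `missed_slots` must already exclude the epoch transition.
    """
    regular = [s.seconds for s in samples if not s.during_reward_calc]
    reward_calc = [s.seconds for s in samples if s.during_reward_calc]
    return {
        "no_missed_slots": missed_slots == 0,
        "delays_during_epoch_below_1s": all(d < 1.0 for d in regular),
        "delays_during_reward_calc_below_2s": all(d < 2.0 for d in reward_calc),
    }


report = validate([DelaySample(0.6, False), DelaySample(1.4, True)], missed_slots=0)
failed = [check for check, ok in report.items() if not ok]
print(failed or "all checks passed")
```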