Ghosted Blocks - How to optimize Propagation Times - Action Required

The previous post was just the introduction. Let’s get into some more details:

Delimitation
We are not talking about stolen blocks here. Stolen blocks are part of the Cardano/Ouroboros design and happen constantly. They are nothing to be concerned about, as they are not increasing over time.

What is a ghosted block?
A ghosted block is a block that was minted but rejected by another node. There are multiple potential reasons for this. Here we are purely focussing on one specific scenario.

Scenario

Example - Expected Sequence:

Each pool that forges a block appends it to the chain, that is, it builds on the previous block. In this example, everything works as expected.

| Slot | Forged Block | Previous block |
|---|---|---|
| 45881000 | Block A (by Pool x) | some other |
| 45881004 | Block B (by Pool y) | Block A |
| 45881005 | | |
| 45881006 | Block C (by Pool z) | Block B |

Example - Ghosted block

Now let’s assume that in the above scenario the propagation time of Block B (the time until other pools know about it) is >2s. Pool z would then not know about Block B when minting Block C, so it builds on Block A instead. Both blocks get the same block number, and only one block with a given number (in this case Block B) can survive. Block C is reverted because a block with that same number already exists that was minted in an earlier slot.
Note: This is the most commonly observed behavior. There are other scenarios that we are not considering in this article.

| Slot | Forged Block | Previous block | Result | Explanation |
|---|---|---|---|---|
| 45881000 | Block A (by Pool x) | some other | adopted | |
| 45881004 | Block B (by Pool y) | Block A | adopted | This block is propagating slowly (>2s) |
| 45881005 | | | | |
| 45881006 | Block C (by Pool z) | Block A | ghosted | Pool z does not know about Block B at this point in time and therefore used Block A as the previous block. |

As this example shows, a bad propagation time of Block B (by Pool y) is causing Block C (by Pool z) to be ghosted.
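The two tables above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: every pool sees a block exactly `delay` seconds after its slot, slots are 1 s long, and a later block loses whenever an earlier block already holds the same block number. The names `Forge` and `simulate` are hypothetical and not part of any Cardano tooling.

```python
from dataclasses import dataclass


@dataclass
class Forge:
    name: str     # label of the block, e.g. "A"
    slot: int     # slot in which the block is minted
    delay: float  # assumed seconds until other pools know the block


def simulate(forges):
    """Return {block name: (previous block, result)} for the toy model."""
    adopted = []  # (forge, block number) of adopted blocks, in slot order
    outcome = {}
    for f in sorted(forges, key=lambda x: x.slot):
        # Blocks that have already reached this pool at mint time.
        visible = [(b, n) for b, n in adopted if b.slot + b.delay <= f.slot]
        if visible:
            prev_name, prev_number = visible[-1][0].name, visible[-1][1]
        else:
            prev_name, prev_number = "some other", 0
        number = prev_number + 1
        # An earlier block with the same number wins; the new one is ghosted.
        clash = any(n == number for _, n in adopted)
        outcome[f.name] = (prev_name, "ghosted" if clash else "adopted")
        if not clash:
            adopted.append((f, number))
    return outcome


# Block B propagates slowly (>2s), so Block C builds on Block A and is ghosted.
slow = simulate([
    Forge("A", 45881000, 0.5),
    Forge("B", 45881004, 2.5),
    Forge("C", 45881006, 0.5),
])
for name, (prev, result) in slow.items():
    print(f"Block {name}: previous={prev}, result={result}")
```

Setting the delay of Block B back to a value below 2 s makes Block C build on Block B again, which matches the expected sequence.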

Impact
One block fewer makes it onto the chain. No rewards are generated for the delegators of Pool z for Block C, even though the operator of Pool z cannot do anything about propagation delays from Pool y.

For this reason, this issue requires education and a common effort to optimize propagation delays.

Ghosted blocks decrease network density. Currently, we expect ~1.3% of blocks to be ghosted. This number would increase drastically if the density were increased for scalability reasons.
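To get a feeling for why higher density makes this worse, here is a rough back-of-the-envelope sketch. It assumes each 1 s slot independently contains a block with probability `f`; the value 0.05 used as the baseline is an assumption and not taken from this post. The function only estimates how often a follow-up block falls inside the propagation window of a slow block. It does not reproduce the ~1.3% figure, which also depends on how many blocks actually propagate slowly.

```python
def p_next_block_within(window_slots: int, f: float) -> float:
    """Probability that at least one of the next `window_slots` slots has a block."""
    return 1.0 - (1.0 - f) ** window_slots


# Exposure of a block that needs 2 slots to propagate, for growing density.
for f in (0.05, 0.10, 0.20):  # 0.05 is an assumed baseline, the others are hypothetical
    print(f"f={f:.2f}: {p_next_block_within(2, f):.4f}")
```

Under these assumptions, doubling the density roughly doubles the chance that the next pool mints before it has seen a slow block.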

Reasons and Mitigations

| Reason | Cause | Mitigations |
|---|---|---|
| Bad Propagation Time | Too few In connections on own relays, causing latency because more hops are needed to propagate across the world | Validate Prop Delays; Multiple Relays; Geo Distributed Relays; Topology updater; In Connections |
| Delayed Forge | Low CPU / blocked CPU. If the BP’s CPU is blocked at minting time, the forge is delayed and introduces latency | Validate forge timing; Analyze missed slots; Analyze slot timing |
| Bad Timing | System clock of the BP is not in sync, so the block is minted at the wrong time (too early or too late) | Use Chrony |

SPO Check List to avoid high Propagation Delays
Time Synchronization

To avoid minting your blocks at the wrong time, keep the system clock of your BP in sync (for example with Chrony).
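If you use Chrony, the current clock offset can be read from the `System time` line of `chronyc tracking`. The following is a minimal sketch that parses that line; the helper name `system_time_offset_seconds` is hypothetical and the sample text is only illustrative.

```python
import re


def system_time_offset_seconds(tracking_output: str) -> float:
    """Parse the clock offset from `chronyc tracking` text (positive = clock is fast)."""
    match = re.search(
        r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
        tracking_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'System time' line found")
    offset = float(match.group(1))
    return offset if match.group(2) == "fast" else -offset


# Illustrative sample, shortened to the lines relevant here.
SAMPLE = """\
Reference ID    : CB00710F (foo.example.net)
Stratum         : 3
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
"""

print(system_time_offset_seconds(SAMPLE))
```

On a real BP you would feed it the live output instead, for example `subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout`.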

Improve Topology

To distribute your block globally as fast as possible, the number of hops required to reach all pools should be as low as possible. The following aspects can help:

  • Run multiple geo-distributed Relays
  • Have 20+ In connections on each Relay - Ask pools you know to add you as a peer manually in their topology. Having some custom connections is good practice anyway: if topology updater were out of service, you would otherwise lose In connections and with them the chance to propagate your blocks.
    NOTE: Block propagation is based on a pull mechanism, so In connections mean that other pools are fetching blocks from your pool.
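The effect of In connections on hop count can be illustrated with a breadth-first search over a made-up peer graph. This is a sketch only: the node names are hypothetical, and `fetches_from` simply maps each node to the peers it pulls blocks from.

```python
from collections import deque


def hops_from(origin, fetches_from):
    """Return {node: hops} needed for a block minted at `origin` to reach each node."""
    # Invert the pull relation: for each source, who fetches from it?
    fetchers = {}
    for node, sources in fetches_from.items():
        for source in sources:
            fetchers.setdefault(source, []).append(node)

    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for nxt in fetchers.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    return hops


# One In connection: the block has to travel along a chain of peers.
few_in = {"p1": ["my_relay"], "p2": ["p1"], "p3": ["p2"]}
# Three In connections: every pool fetches directly from my relay.
many_in = {"p1": ["my_relay"], "p2": ["my_relay"], "p3": ["my_relay", "p2"]}

print(max(hops_from("my_relay", few_in).values()))
print(max(hops_from("my_relay", many_in).values()))
```

More peers fetching directly from your relays means fewer hops, and each hop saved removes one round of fetch latency.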

Node Configuration

To avoid delays in forging your block

Validation

  • No missed slots during epoch (excluding epoch transition)
  • Propagation delays during Epoch < 1s
  • Propagation delays during rewards calculation (48h + 24h) < 2s
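The three checks above can be written down as a small helper. This is a sketch that assumes you have already collected the missed slot count and the propagation delays (in seconds) from your own monitoring; the function and parameter names are hypothetical.

```python
def validate(missed_slots, delays_epoch, delays_rewards):
    """Check the targets from the list above; delays are given in seconds."""
    return {
        "no_missed_slots": missed_slots == 0,
        "epoch_delays_below_1s": all(d < 1.0 for d in delays_epoch),
        "rewards_calc_delays_below_2s": all(d < 2.0 for d in delays_rewards),
    }


# Made-up measurements for illustration only.
report = validate(
    missed_slots=0,
    delays_epoch=[0.42, 0.61, 0.95],
    delays_rewards=[1.20, 2.30],
)
for check, passed in report.items():
    print(f"{check}: {'OK' if passed else 'FAILED'}")
```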

If you cannot meet these validation criteria, consider improving your hardware and/or infrastructure. Heavily oversubscribed VPS providers will cause missed slots (when the CPU is blocked by someone else). For bad propagation times, better CPU performance helps reduce the latency until the block is available for other nodes to fetch.
