Troubleshooting why my block got orphaned

I run a small pool that doesn’t get a lot of blocks so I check every block. My standard way of checking is to run my leadership logs every epoch and, following each scheduled slot, I check the chain data via pooltool.io to make sure my block got adopted.

I recently had a missed block and had to figure out why my block didn’t make it on-chain. After working through the details, I thought the information might be useful for others as an example:

Basically, if your leadership-schedule showed that your were supposed to produce a block then you can use this information to see what happened in your logs. The cardano-node process logs enough information for you to figure out what happened.

My block was for slot 75680041. These are my logs immediately starting at this time (19:58:52.00 UTC). I have trimmed the logs a bit for readability.

[bprod1:cardano.node.BlockFetchDecision:Info:174] [2022-10-31 19:58:52.00 UTC] [TraceLabelPeer (ConnectionId {localAddress = 192.168.27.9:2700, remoteAddress = 192.168.27.7:2700}) (Right [At (Block {blockPointSlot = SlotNo 75680040, blockPointHash = 37309a49ee1d77fa2005285e7ac3e12e69b4a5f79c082faa15ae14870ccc5628})]),TraceLabelPeer (ConnectionId {localAddress = 192.168.27.9:2>
[bprod1:cardano.node.LeadershipCheck:Info:172] [2022-10-31 19:58:52.00 UTC] {"chainDensity":5.006609e-2,"credentials":"Cardano","delegMapSize":1224897,"kind":"TraceStartLeadershipCheck","slot":75680041,"utxoSize":8878928}
[bprod1:cardano.node.Forge:Info:172] [2022-10-31 19:58:52.03 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeIsLeader"),("slot",Number 7.5680041e7)]))]
[bprod1:cardano.node.Forge:Info:172] [2022-10-31 19:58:52.08 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("block",String "7039443d09a1a7d0221e4b0d3c22a3937d291f6be1d2297cd8713086e85c38d5"),("blockNo",Number 7955544.0),("blockPrev",String "c995e6f6e0329b32d6f4950d384f585bcc9deab9ebe8285cd3363653bace5aca"),("kind",String "TraceForgedBlock"),("slot",>
[bprod1:cardano.node.ChainDB:Notice:162] [2022-10-31 19:58:52.10 UTC] Chain extended, new tip: 37309a49ee1d77fa2005285e7ac3e12e69b4a5f79c082faa15ae14870ccc5628 at slot 75680040
[bprod1:cardano.node.ChainDB:Info:162] [2022-10-31 19:58:52.11 UTC] Block fits onto some fork: 7039443d09a1a7d0221e4b0d3c22a3937d291f6be1d2297cd8713086e85c38d5 at slot 75680041

Line 1. shows that my node started fetching a block with hash “37309a49” at time 19:58:52.00. This was a block for slot 75680040.
Line 2. shows my node doing a leadership check at exactly the same time 19:58:52.00.
Line 3. shows my node completing its leadership check and determined that it was the leader at 19:58:52.03 and printed in the logs “TraceNodeIsLeader”. My node is a leader for slot 75680041.
Line 4. shows my node finished producing its block at 19:58:52.08 with hash “7039443d” (Block No:7955544).
Line 5. shows my node extending its tip at 19:58:52.10 with block hash “37309a49”. This is the block for slot 75680040.
Line 6. shows my node at 19:58:52.11 deciding that its own block with hash “7039443d” fits onto a fork. The message says: “Block fits onto some fork”.

So now we see what happened: My block producer did produce its block but this happened at the same time that it was receiving a block for the previous slot. My block was therefore a fork with the block from the previous slot.

I then did a query of the cncli database to look at the block and slot numbers for the previous and next blocks that were actually included in the chain:

sqlite3 /var/lib/cardano/cncli.db "select id, block_number, slot_number, pool_id, substr(hash,1,8), substr(prev_hash,1,8), substr(block_vrf_0,1,8), orphaned from chain where slot_number between 75679986 and 75680100;"

ID | Block_No | Slot_No | Pool_ID | Block_hash | Prev_hash | Block_VRF | Orphaned
3474922|7955543|75679986|2352f8c7d0b9f0bf5b342c135d5b25c910491deec1440bd9d27799a6|c995e6f6|fde4c3c6|9bba37a0|0
3474923|7955544|75680040|e455088c9164ae075f0300bc1a7234c30433556c6b88d4301a06cfe5|37309a49|c995e6f6|1aa0e765|0
3474924|7955545|75680083|db3ef28510cc87c6a2f2b83c50707c40d8024438c8a61f24ac16a33f|91c9ab39|37309a49|9e17fe88|0
3474925|7955546|75680087|a56e07cb5cd11a2d2b6bff3c6ecbc7ddc79205e2831560792f06e795|aaef816e|91c9ab39|080f88e4|0

You can see from this output that my block with hash “7039443d” (slot 75680041, block No: 7955544) is not on the chain. Also you will notice that the block for slot 75680040 had the same block No: of 7955544. So clearly my block producer didn’t receive this block in time before it produced its own block because it used the same block number.

Another thing to note is that the previous logs showed that my block was built upon a previous block with hash “c995e6f6”. Which from the SQL query above shows that this is the exact same block that block with hash “37309a49” was built upon.

So now we know what happened:

  1. Another stake pool produced a block for slot 75680040 (on top of previous block with hash “37309a49”) and gave it block number 7955544.
  2. My node produced its block correctly for slot 75680041 (on top of previous block with hash “37309a49”) and gave it the same block number of 7955544.
  3. My node finished receiving the block for slot 75680040 and accepted it as the new tip of the chain.
  4. My node then looked at its own just completed block, with the same block number, built upon the same previous block, and realised that it was a fork with the block it had just received.
  5. My node then decided the fork battle in the favour of the other pool.

The next question is why did I lose the “single block height” fork battle? Note that it is technically not a slot battle because the two blocks were for different slots. However, these single block height battles are decided the same way: based on lowest block VRF score wins. (It used to be based on lowest leader VRF score.)

From the SQL query results above you can see that the block for slot 75680040 had a “block VRF” value of “1aa0e765”. My block would have needed a value less than this to win. Basically, it would have needed a block VRF starting with a “0” which is a roughly 1 in 16 chance.

Unfortunately, the logs don’t show what my block’s VRF value was and it didn’t end up in the cncli database because my own block producer never accepted it as the preferred tip. So it never ended up in my block producer’s chain and it never got sent to any of my own relays.

In summary, my block producer did produce its block correctly but my block got orphaned because it caused a 1 block fork and lost the block VRF battle. So, I was just unlucky!

8 Likes

Thanks for the detailed explanation. Helps everyone.

1 Like