Relays out of sync: HeaderProtocolError > ChainTransitionError > VRFLeaderValueTooBig

COSDpool · 19 August 2020 08:34

Our relays lost sync after the epoch boundary and are now flooded with the error message below. As far as I can tell it appears for every node in our topology, at 10-second intervals:

[relay-ny:cardano.node.DnsSubscription:Error:53946]
[2020-08-19 08:15:50.04 UTC]
[String "Application Exception: 54.201.36.182:6000 HeaderError
  (At (Block {blockPointSlot = SlotNo {unSlotNo = 6221024}, blockPointHash = ...}))
  (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr
    {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFLeaderValueTooBig
      (OutputVRF {getOutputVRFBytes = \"..."}) (3334337931 % 4383606254572)
      (ActiveSlotCoeff {unActiveSlotVal = UnsafeUnitInterval (1 % 20), unActiveSlotLog = ...})
    )]}))))
  (Tip (SlotNo {unSlotNo = 6230299}) ... (BlockNo {unBlockNo = 4576507}))
  (Tip (SlotNo {unSlotNo = 6258650}) ... (BlockNo {unBlockNo = 4578327}))",
 String "SubscriptionTrace",
 String "\"node1.stakelove.com\""]
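
For readers trying to interpret the fields in that error: as far as I understand the Shelley leader check, VRFLeaderValueTooBig means the block's VRF output, read as a fraction of its maximum possible value, was not below the threshold 1 - (1 - f)^σ. Here σ is the pool's relative stake (the `3334337931 % 4383606254572` in the log) and f is the active slot coefficient (`1 % 20`). Below is a minimal Python sketch of that comparison. The function names are hypothetical, the 64-byte big-endian reading of the output is an assumption, and the node itself uses exact arithmetic rather than floats, so treat this as an illustration and not the node's code:

```python
from fractions import Fraction

def leader_threshold(sigma: Fraction, f: Fraction) -> float:
    """Approximate probability that a pool with relative stake sigma leads a slot."""
    return 1.0 - (1.0 - float(f)) ** float(sigma)

def is_slot_leader(vrf_output: bytes, sigma: Fraction, f: Fraction) -> bool:
    """True when the VRF output, scaled into [0, 1), is below the threshold."""
    # Assumption: the output is read as a big-endian natural number out of
    # 2^(8 * len), i.e. 2^512 for a 64-byte output.
    value = int.from_bytes(vrf_output, "big")
    scaled = Fraction(value, 2 ** (8 * len(vrf_output)))
    return scaled < Fraction(leader_threshold(sigma, f))

sigma = Fraction(3334337931, 4383606254572)  # relative stake from the log
f = Fraction(1, 20)                          # active slot coefficient from the log
print(f"threshold = {leader_threshold(sigma, f):.3e}")
```

Note that a relative stake of zero gives a threshold of zero, so under this check no VRF output could ever qualify. That is consistent with a node rejecting a block because its own view of the stake distribution differs from the block producer's.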

I can’t debug anything like this so I hope one of the other SPOs can tie this to an observed solution, or a developer can make a recommendation about the software.

The relays have only a few “new tip” messages since the boundary, with the last one coming about 8 hours ago. We’ve restarted both relays, including a reboot of the second one in case it was a network-layer problem.

We did upgrade briefly to 1.18.1 but rolled back a couple of days before the boundary. From what the developers were saying, I thought we were advised to stay back on 1.18.0 because of potential issues like this. If our relays are not syncing then we can’t wait for the newer version announced for next week… are there any recommendations?

COSDpool · 19 August 2020 09:04

Just popped the question into Telegram & no response there after 20 minutes: https://t.me/CardanoStakePoolWorkgroup/332907

ADAfrog · 19 August 2020 09:14

Hi Robert,

It looks like the chain is corrupted from the edge case associated with the use of 1.18.1. Please reference your error vs the original github issue here:

github.com/input-output-hk/cardano-node

[BUG] - cardano-node 1.18.1 failed on MC4

opened 03:02PM - 12 Aug 20 UTC

closed 05:53PM - 14 Aug 20 UTC

onyxstakepool

bug priority high

*External* otherwise.

**Summary**

Cardano node failed after upgrade to 1.18.1 on MC4.

```
cardano-node: reapplyLedgerBlock At (Block {blockPointSlot = SlotNo {unSlotNo = 44373}, blockPointHash = b189bf576c5f61774aeca20a9fe8402403d3ebb4323ff3f7a274a095167c4afb}): HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFLeaderValueTooBig (OutputVRF {getOutputVRFBytes = "\ENQ\203\161A\199JR\SI\203\197\228B\156\209\226\168\238\ng\154\246x9\t*\222Z<+\b\211\204\198\\*\DC3\229h'\rF\191 \177E}\216\214B\a\v\238\155\&95\204\&6\220)\166w\181\185\199"}) (0 % 1) (ActiveSlotCoeff {unActiveSlotVal = UnsafeUnitInterval (1 % 20), unActiveSlotLog = -512932943875505334261962382072846}))]})))
peers: 0
CallStack (from HasCallStack):
  error, called at src/Ouroboros/Consensus/Ledger/Extended.hs:200:19 in ouroboros-consensus-0.1.0.0-inplace:Ouroboros.Consensus.Ledger.Extended
  reapplyLedgerBlock, called at src/Ouroboros/Consensus/Ledger/Abstract.hs:95:7 in ouroboros-consensus-0.1.0.0-inplace:Ouroboros.Consensus.Ledger.Abstract
```

**Steps to reproduce**

Steps to reproduce the behavior:

1. Start cardano-node 1.18.1 as block producer on mainnet-candidate-4 network

**Expected behavior**

Normal operation of the block producing node.

**System info (please complete the following information):**

- OS: Ubuntu
- Version 20.04
- Node version

```
cardano-cli 1.18.1 - linux-x86_64 - ghc-8.6
git rev a4b6dae699fa21dc3c025c8a83d1718475cb3afc
cardano-node 1.18.1 - linux-x86_64 - ghc-8.6
git rev a4b6dae699fa21dc3c025c8a83d1718475cb3afc
```

**Additional context**

It could be a coincidence or different bug: After the incident above the stake pools in Daedalus 1.6.0-STN5 are again ordered by pool hash instead of performance. Did cardano-node 1.18.1 introduce a blockchain corruption?

I suggest stopping your relays and wiping the database:

cd
cd cardano-node
rm -R db

Then restart the relay and let it sync back up to the tip.

FROG

COSDpool · 19 August 2020 09:46

Thanks @ADAfrog, I thought it would come to that, since we did accumulate about 24 hours’ worth of DB data under 1.18.1. I’ll reference the GitHub issue once we’re rebuilding and will post any other observations here.

COSDpool · 19 August 2020 10:09

p.s. based on the comments in the issue above (now closed) and the related issue I’m also rebuilding DB on our core node, since I cannot be sure that the ledger corruption in the relays hasn’t been passed on to our core (it was running 1.18.1 during the same period our relays were).

I can see @ADAfrog why you don’t want to be the first one to upgrade to new node releases. I can rebuild our core now with impunity since we’re not eligible for block election till next epoch, but otherwise this would have been a real dilemma.

COSDpool · 19 August 2020 15:56

For anyone who finds themselves in the same situation as we did: the word on GitHub is that deleting the (corrupted) ledger folder is enough. The node then only has to rebuild the ledger from genesis, without re-syncing the entire chain (which is what is taking forever for us today).
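
A minimal Python sketch of that narrower cleanup, for anyone who prefers a script to typing `rm` by hand. It assumes the ledger folder is a subdirectory named `ledger` inside the node's database directory, and the helper name is hypothetical. Stop the node before running anything like this:

```python
import shutil
from pathlib import Path

def remove_ledger_state(db_dir: Path) -> bool:
    """Delete only the 'ledger' subfolder of a node database directory.

    Returns True if a folder was removed, False if there was nothing to remove.
    The rest of the database (the downloaded chain) is left untouched.
    """
    ledger_dir = db_dir / "ledger"  # assumed folder name, check your own db layout
    if not ledger_dir.is_dir():
        return False
    shutil.rmtree(ledger_dir)
    return True
```

With the database in the location used earlier in this thread, the call would be `remove_ledger_state(Path.home() / "cardano-node" / "db")`. On restart the node should then replay the chain it already has to rebuild the ledger, instead of downloading everything again.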

ADAfrog · 19 August 2020 21:40

IMPORTANT UPDATE -

It looks like using 1.18.1 in the epoch with d=1 will not cause future problems - but if you have ever used a node on 1.18.1 with d<1, you should absolutely rebuild your database from scratch.

danielimkk · 3 September 2020 17:36

Hey ADAfrog! Can you please explain what d=1 means?

ADAfrog · 3 September 2020 17:57

Hi Daniel,

“d” is the decentralization parameter, and will be decreased until community stake pools are minting 100% of blocks.

d = 1 implies 100% of all blocks are minted by federated nodes, and 0% minted by community stake pool operators.

d is currently set at 0.74 I believe, meaning 26% of the blocks are minted by community stake pool operators.
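
The arithmetic behind those percentages can be sketched in a few lines of Python. This only illustrates the split described above; the function name is hypothetical and 0.74 is the value quoted in this post, not something read from the chain:

```python
from fractions import Fraction

def block_split(d: Fraction) -> tuple[Fraction, Fraction]:
    """Return (federated_share, community_share) of blocks for a given d."""
    if not 0 <= d <= 1:
        raise ValueError("d must lie between 0 and 1")
    return d, 1 - d

for d in (Fraction(1), Fraction("0.74"), Fraction(0)):
    federated, community = block_split(d)
    print(f"d = {float(d):.2f}: federated {float(federated):.0%}, "
          f"community {float(community):.0%}")
```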

I hope this helps.

Your friend, FROG
