Relay node running good until stuck at one slot (invalid block)

ogre5000 · 8 March 2025 21:10

Hi
I’m running a Cardano relay node in P2P mode, and it ran for a long time without problems. I recently upgraded to 10.1.4 and it continued to work well until a few days ago, when it got stuck at a particular block with many errors of the following form:

2025-03-08T20:37:16.357204+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.InboundGovernor:Info:183]#033[0m [2025-03-08 19:37:16.35 UTC] TrMuxErrored (ConnectionId {localAddress = MYIP:3001, remoteAddress = 34.92.222.93:1338}) (InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156, blockPointHash = 0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) 0bc7ce20b6b5fc34a51300eadad64bb050db73b712632c4711be5d1cc9f511a4 (ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (S (S (S (Z (WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError (LedgersFailure (LedgerFailure (ConwayWdrlNotDelegatedToDRep (KeyHash {unKeyHash = "0521608ad5f86f6856beeafda4d77b2371c1521c117ae5955a79af27"} ))) ))})))))))))))

2025-03-08T20:37:16.357387+01:00 lstcr5 cardano-node[555]: #033[34m[SERVER:cardano.node.InboundGovernor:Info:183]#033[0m [2025-03-08 19:37:16.35 UTC] TrInboundGovernorCounters (InboundGovernorCounters {coldPeersRemote = 1, idlePeersRemote = 5, warmPeersRemote = 2, hotPeersRemote = 1})

2025-03-08T20:37:16.357481+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.PeerSelectionCounters:Info:187]#033[0m [2025-03-08 19:37:16.35 UTC] PeerSelectionView {viewRootPeers = 60, viewKnownPeers = 45, viewAvailableToConnectPeers = 13, viewColdPeersPromotions = 4, viewEstablishedPeers = 9, viewWarmPeersDemotions = 0, viewWarmPeersPromotions = 0, viewActivePeers = 2, viewActivePeersDemotions = 0, viewKnownBigLedgerPeers = 15, viewAvailableToConnectBigLedgerPeers = 4, viewColdBigLedgerPeersPromotions = 4, viewEstablishedBigLedgerPeers = 0, viewWarmBigLedgerPeersDemotions = 0, viewWarmBigLedgerPeersPromotions = 0, viewActiveBigLedgerPeers = 0, viewActiveBigLedgerPeersDemotions = 0, viewKnownLocalRootPeers = 1, viewAvailableToConnectLocalRootPeers = 1, viewColdLocalRootPeersPromotions = 0, viewEstablishedLocalRootPeers = 1, viewWarmLocalRootPeersPromotions = 0, viewActiveLocalRootPeers = 1, viewActiveLocalRootPeersDemotions = 0, viewKnownNonRootPeers = 0, viewColdNonRootPeersPromotions = 0, viewEstablishedNonRootPeers = 0, viewWarmNonRootPeersDemotions = 0, viewWarmNonRootPeersPromotions = 0, viewActiveNonRootPeers = 0, viewActiveNonRootPeersDemotions = 0, viewKnownBootstrapPeers = 0, viewColdBootstrapPeersPromotions = 0, viewEstablishedBootstrapPeers = 0, viewWarmBootstrapPeersDemotions = 0, viewWarmBootstrapPeersPromotions = 0, viewActiveBootstrapPeers = 0, viewActiveBootstrapPeersDemotions = 0}

2025-03-08T20:37:16.357595+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.PeerSelection:Info:187]#033[0m [2025-03-08 19:37:16.35 UTC] TracePromoteWarmDone 2 2 95.217.220.29:3001

2025-03-08T20:37:16.368972+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.ConnectionManager:Info:6604]#033[0m [2025-03-08 19:37:16.36 UTC] TrConnectionHandler (ConnectionId {localAddress = MYIP:3001, remoteAddress = 34.92.222.93:1338}) (TrConnectionHandlerError OutboundError (InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156, blockPointHash = 0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) 0bc7ce20b6b5fc34a51300eadad64bb050db73b712632c4711be5d1cc9f511a4 (ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (S (S (S (Z (WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError (LedgersFailure (LedgerFailure (ConwayWdrlNotDelegatedToDRep (KeyHash {unKeyHash = "0521608ad5f86f6856beeafda4d77b2371c1521c117ae5955a79af27"} ))) ))}))))))))))) ShutdownPeer)

All of these errors occur at slot number 149690156. The command cardano-cli query tip --mainnet returns:
{
    "block": 11567820,
    "epoch": 544,
    "era": "Conway",
    "hash": "aa71bda8b09756ef1535696c7e6ffcdc864d511547566700f4445945219e9271",
    "slot": 149690099,
    "slotInEpoch": 45299,
    "slotsToEpochEnd": 386701,
    "syncProgress": "99.91"
}

and it stays stuck there.
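To see how far the tip sits from the rejected block, the slot number can be pulled out of an InvalidBlock trace and compared with the slot reported by cardano-cli query tip. Below is a minimal Python sketch of that comparison, not something from the node tooling: the function names are hypothetical, the regular expression only assumes the trace layout shown in the log lines above, and the sample inputs are shortened copies of that output.

```python
import json
import re

# Matches the slot and block hash inside an InvalidBlock trace.
# Assumption: the trace keeps the layout seen in the log lines above.
INVALID_BLOCK_RE = re.compile(
    r"InvalidBlock \(At \(Block \{blockPointSlot = SlotNo (\d+), "
    r"blockPointHash = ([0-9a-f]+)\}\)\)"
)


def parse_invalid_block(log_line):
    """Return (slot, block_hash) from an InvalidBlock trace, or None if absent."""
    match = INVALID_BLOCK_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def slots_behind(tip_json, rejected_slot):
    """Number of slots between the local tip and the rejected block."""
    tip = json.loads(tip_json)
    return rejected_slot - tip["slot"]


# Shortened copy of the first error above (the tail of the trace is cut off).
log_line = (
    "TrMuxErrored (ConnectionId {localAddress = MYIP:3001, "
    "remoteAddress = 34.92.222.93:1338}) (InvalidBlock (At (Block "
    "{blockPointSlot = SlotNo 149690156, blockPointHash = "
    "0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) "
)
# Subset of the query tip output above.
tip_json = '{"block": 11567820, "epoch": 544, "slot": 149690099}'

slot, block_hash = parse_invalid_block(log_line)
print(slot, slots_behind(tip_json, slot))  # 149690156 57
```

With the values from this post, the rejected block is 57 slots ahead of the tip the node is stuck on.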

There are also a lot of other log entries (compared to when the node was running fine), but I think they come from the P2P-connected nodes that serve my node the block it considers invalid; my node always sets these peers to cold after the invalid block message. So in gLiveView there are always some cold, warm and hot peers, but the numbers change every few seconds. Only the hot count always stays at 2. There are always around 5-10 incoming and some 30-40 outgoing peers.

My Cardano DB partition has 23 GB left (it’s not much, but should be sufficient for now). I have a 32 GB RAM setup with a much better CPU than the minimum requirement, and I made sure my config files are exactly identical to those in “Mainnet - The Cardano Operations Book” to rule out config errors.

What would be the next steps to resolve this problem? Is it possible to roll the DB back to before this slot/block and see whether it goes well the next time? Or is my only option to delete the whole DB and restart from scratch (or from a snapshot), rather than deleting just part of it?

Thanks for your help!

s2cicf · 13 March 2025 17:25

Hi @ogre5000 I seem to be having the same problem. Did you find a resolution?

ogre5000 · 15 March 2025 08:58

@s2cicf I wasn’t able to resolve the real cause of this problem even after quite a bit of research on the internet, so here is what I did to resolve it with a workaround:

When this happened, I had 2 relays (one online and one offline for maintenance) and 1 block producer, all running the most recent mainnet release of cardano-node, 10.1.4. So I fired up my offline relay again and it synced to 100% without this problem. I then took it offline again, copied its db folder to the relay with the problem, and TADA, this relay worked again too. But the problem also reached my block producer, which was no longer able to connect to these working and synced relays. Unfortunately I don’t remember the exact error messages. Probably these were the errors, along with connection errors from trying to establish a hot connection with the relays and instantly going to cold again:

Mar 10 11:13:07 SERVER2 cardano-node[2098655]: #033[31m[lstcr4:cardano.node.Forge:Error:209]#033[0m [2025-03-10 10:13:07.05 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 1.50035296e8)]))]
Mar 10 11:13:08 SERVER2 cardano-node[2098655]: #033[31m[lstcr4:cardano.node.Forge:Error:209]#033[0m [2025-03-10 10:13:08.06 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 1.50035297e8)]))]

So I did the same on the block producer: I took one relay and the BP offline, deleted the BP’s old db folder, copied the relay’s db folder to the BP, and started the relay and BP again. Since then, everything has worked well…
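For anyone who wants to script that swap, here is a rough Python sketch of the delete-and-copy step only. It is an illustration under assumptions, not a tested procedure: the function name and paths are hypothetical, it assumes the synced db folder is already available as a local path on the stuck machine, and it does not stop or start cardano-node, so both nodes have to be offline before it runs.

```python
import shutil
from pathlib import Path


def replace_db(synced_db, target_db):
    """Delete target_db and replace it with a copy of synced_db.

    Both nodes must be stopped first; this function does not check that.
    """
    synced_db, target_db = Path(synced_db), Path(target_db)
    if not synced_db.is_dir():
        raise FileNotFoundError(f"no synced db folder at {synced_db}")
    if target_db.exists():
        shutil.rmtree(target_db)  # remove the stuck node's old db folder
    shutil.copytree(synced_db, target_db)  # copy the synced db in its place
    return target_db


# Hypothetical usage; both paths are placeholders:
# replace_db("/mnt/transfer/db", "/opt/cardano/db")
```

Because the old folder is deleted before the copy, a failed copy leaves the node without a db; keeping the synced source around until the node is running again is the safer order.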

Just out of curiosity did you have this happening at the exact same slot?: InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156

Mardusfolm · 22 April 2025 23:28

I’m running into a similar problem, with the exact same message at the exact same slot number as you! Thanks for posting about this. I’m not sure what I’ll do about it yet; maybe I’ll try resyncing. I was originally just trying to update to 10.1.4, and I haven’t been able to get my relay fully synced…

{
    "block": 11567820,
    "epoch": 544,
    "era": "Conway",
    "hash": "aa71bda8b09756ef1535696c7e6ffcdc864d511547566700f4445945219e9271",
    "slot": 149690099,
    "slotInEpoch": 45299,
    "slotsToEpochEnd": 386701,
    "syncProgress": "98.28"
}

fredrovicius · 25 April 2025 14:23

I ran into this issue today and was able to solve it by removing the files in the ledger and volatile directories of the db folder. This caused a replay of blocks and allowed my node to resume syncing.
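A cautious way to script this is to list the files first and delete only on a second pass. The Python sketch below is just one possible shape for that: the function name and the dry_run parameter are hypothetical, the db path is a placeholder, and the node is assumed to be stopped before anything is deleted. It removes only regular files directly inside the two directories and leaves any subdirectories alone.

```python
from pathlib import Path


def clear_ledger_and_volatile(db_dir, dry_run=True):
    """Remove the files under db_dir/ledger and db_dir/volatile.

    Returns the paths that were (or, with dry_run=True, would be) removed.
    Stop cardano-node before calling this with dry_run=False.
    """
    db_dir = Path(db_dir)
    removed = []
    for name in ("ledger", "volatile"):
        folder = db_dir / name
        if not folder.is_dir():
            continue  # nothing to clear if the directory is missing
        for path in sorted(folder.iterdir()):
            if path.is_file():
                removed.append(path)
                if not dry_run:
                    path.unlink()
    return removed


# Hypothetical usage; the path is a placeholder:
# for path in clear_ledger_and_volatile("/opt/cardano/db"):
#     print("would remove", path)
```

Running it once with the default dry_run=True shows what would go, and a second call with dry_run=False performs the removal.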
