Hi
I have been running a Cardano relay node in P2P mode for a long time without problems. I recently upgraded to 10.1.4, and it continued to work well until a few days ago, when it got stuck at a particular block and started logging many errors of the following form:
** 2025-03-08T20:37:16.357204+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.InboundGovernor:Info:183]#033[0m [2025-03-08 19:37:16.35 UTC] TrMuxErrored (ConnectionId {localAddress = MYIP:3001, remoteAddress = 34.92.222.93:1338}) (InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156, blockPointHash = 0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) 0bc7ce20b6b5fc34a51300eadad64bb050db73b712632c4711be5d1cc9f511a4 (ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (S (S (S (Z (WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError (LedgersFailure (LedgerFailure (ConwayWdrlNotDelegatedToDRep (KeyHash {unKeyHash = “0521608ad5f86f6856beeafda4d77b2371c1521c117ae5955a79af27”} )))
))})))))))))))
2025-03-08T20:37:16.357387+01:00 lstcr5 cardano-node[555]: #033[34m[SERVER:cardano.node.InboundGovernor:Info:183]#033[0m [2025-03-08 19:37:16.35 UTC] TrInboundGovernorCounters (InboundGovernorCounters {coldPeersRemote = 1, idlePeersRemote = 5, warmPeersRemote = 2, hotPeersRemote = 1})
2025-03-08T20:37:16.357481+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.PeerSelectionCounters:Info:187]#033[0m [2025-03-08 19:37:16.35 UTC] PeerSelectionView {viewRootPeers = 60, viewKnownPeers = 45, viewAvailableToConnectPeers = 13, viewColdPeersPromotions = 4, viewEstablishedPeers = 9, viewWarmPeersDemotions = 0, viewWarmPeersPromotions = 0, viewActivePeers = 2, viewActivePeersDemotions = 0, viewKnownBigLedgerPeers = 15, viewAvailableToConnectBigLedgerPeers = 4, viewColdBigLedgerPeersPromotions = 4, viewEstablishedBigLedgerPeers = 0, viewWarmBigLedgerPeersDemotions = 0, viewWarmBigLedgerPeersPromotions = 0, viewActiveBigLedgerPeers = 0, viewActiveBigLedgerPeersDemotions = 0, viewKnownLocalRootPeers = 1, viewAvailableToConnectLocalRootPeers = 1, viewColdLocalRootPeersPromotions = 0, viewEstablishedLocalRootPeers = 1, viewWarmLocalRootPeersPromotions = 0, viewActiveLocalRootPeers = 1, viewActiveLocalRootPeersDemotions = 0, viewKnownNonRootPeers = 0, viewColdNonRootPeersPromotions = 0, viewEstablishedNonRootPeers = 0, viewWarmNonRootPeersDemotions = 0, viewWarmNonRootPeersPromotions = 0, viewActiveNonRootPeers = 0, viewActiveNonRootPeersDemotions = 0, viewKnownBootstrapPeers = 0, viewColdBootstrapPeersPromotions = 0, viewEstablishedBootstrapPeers = 0, viewWarmBootstrapPeersDemotions = 0, viewWarmBootstrapPeersPromotions = 0, viewActiveBootstrapPeers = 0, viewActiveBootstrapPeersDemotions = 0}
2025-03-08T20:37:16.357595+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.PeerSelection:Info:187]#033[0m [2025-03-08 19:37:16.35 UTC] TracePromoteWarmDone 2 2 95.217.220.29:3001
2025-03-08T20:37:16.368972+01:00 SERVER cardano-node[555]: #033[34m[SERVER:cardano.node.ConnectionManager:Info:6604]#033[0m [2025-03-08 19:37:16.36 UTC] TrConnectionHandler (ConnectionId {localAddress = MYIP:3001, remoteAddress = 34.92.222.93:1338}) (TrConnectionHandlerError OutboundError (InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156, blockPointHash = 0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) 0bc7ce20b6b5fc34a51300eadad64bb050db73b712632c4711be5d1cc9f511a4 (ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (S (S (S (Z (WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError (LedgersFailure (LedgerFailure (ConwayWdrlNotDelegatedToDRep (KeyHash {unKeyHash = “0521608ad5f86f6856beeafda4d77b2371c1521c117ae5955a79af27”} )))
))}))))))))))) ShutdownPeer)**
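To check that every such error points at the same block, the relevant fields can be pulled out of each log line. The following is only a rough sketch: the regular expressions are written against the log format shown above, and `summarize` and `sample` are hypothetical names introduced for illustration (the sample is an excerpt of the first log line).

```python
import re

# Slot number and block hash inside an InvalidBlock trace (format assumed from the log above).
INVALID_BLOCK = re.compile(
    r"InvalidBlock \(At \(Block \{blockPointSlot = SlotNo (\d+), "
    r"blockPointHash = ([0-9a-f]+)\}\)\)"
)
# Name of the innermost ledger failure constructor.
LEDGER_FAILURE = re.compile(r"LedgerFailure \((\w+)")


def summarize(line):
    """Return (slot, block_hash, failure) for an InvalidBlock line, or None otherwise."""
    block = INVALID_BLOCK.search(line)
    if block is None:
        return None
    failure = LEDGER_FAILURE.search(line)
    return int(block.group(1)), block.group(2), failure.group(1) if failure else None


# Excerpt of the first log line from this post.
sample = (
    "TrMuxErrored (ConnectionId {localAddress = MYIP:3001, remoteAddress = 34.92.222.93:1338}) "
    "(InvalidBlock (At (Block {blockPointSlot = SlotNo 149690156, "
    "blockPointHash = 0c28ad6a7b445475d8c4961b0bdd8e7af41b94d25ee467a08f1b0fd02a5c30e7})) "
    "0bc7ce20b6b5fc34a51300eadad64bb050db73b712632c4711be5d1cc9f511a4 "
    "(ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (S (S (S (Z "
    "(WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError (LedgersFailure "
    "(LedgerFailure (ConwayWdrlNotDelegatedToDRep (KeyHash"
)

print(summarize(sample))
```

Running `summarize` over the whole log and collecting the distinct results would show whether more than one block or failure type is involved.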
All of these errors refer to slot number 149690156. The command cardano-cli query tip --mainnet returns
{
  "block": 11567820,
  "epoch": 544,
  "era": "Conway",
  "hash": "aa71bda8b09756ef1535696c7e6ffcdc864d511547566700f4445945219e9271",
  "slot": 149690099,
  "slotInEpoch": 45299,
  "slotsToEpochEnd": 386701,
  "syncProgress": "99.91"
}
and it stays stuck there.
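For reference, the distance between my tip and the rejected block can be worked out from the two numbers above. This is just a small sketch using the values from this post; `slots_behind` is a hypothetical helper name, and the JSON is the `cardano-cli query tip` output pasted above.

```python
import json

# Output of `cardano-cli query tip --mainnet`, copied from this post.
tip = json.loads("""
{
  "block": 11567820,
  "epoch": 544,
  "era": "Conway",
  "hash": "aa71bda8b09756ef1535696c7e6ffcdc864d511547566700f4445945219e9271",
  "slot": 149690099,
  "slotInEpoch": 45299,
  "slotsToEpochEnd": 386701,
  "syncProgress": "99.91"
}
""")

# Slot of the block that the log reports as InvalidBlock.
REJECTED_SLOT = 149690156


def slots_behind(tip, rejected_slot):
    """Number of slots between the node's current tip and the rejected block."""
    return rejected_slot - tip["slot"]


gap = slots_behind(tip, REJECTED_SLOT)
# Sanity check on the tip output: both epoch counters should add up to the epoch length.
epoch_length = tip["slotInEpoch"] + tip["slotsToEpochEnd"]

print(gap)           # 57
print(epoch_length)  # 432000
```

So the rejected block sits only 57 slots after my current tip, which fits the picture of the node stopping right before that block.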
There are also many more log entries than when the node was running fine, but I think they come from the P2P-connected peers that serve my node the block it considers invalid; my node always demotes these peers to cold after the invalid-block message. As a result, gLiveView always shows some cold, warm and hot peers, but the numbers change every few seconds; only the hot count stays at 2. There are always around 5-10 incoming and some 30-40 outgoing peers.
My Cardano DB partition has 23 GB free (not a lot, but it should be sufficient for now), and the machine has 32 GB of RAM and a much better CPU than the minimum requirement. To rule out config errors, I made sure my config files are identical to those in Mainnet - The Cardano Operations Book.
What would be the next steps to resolve this problem? Is it possible to roll the DB back to a point before this slot/block and see whether it syncs past it on the next attempt? Or is the only option to delete the whole DB and resync from scratch (or from a snapshot), rather than deleting just part of it?
Thanks for your help!