[BUG] - Connection Handshake Error and Promotion Failure of Cold Big Ledger Peers in Cardano Node v8.9.2 Docker Image

Internal Issue

Area: Node Connection Management

Summary:
The Cardano node experiences connectivity issues with establishing and maintaining peer connections. These issues result in handshake errors and connection failures. Additionally, promotion attempts of cold big ledger peers fail due to connection errors, indicating potential issues with the availability of remote nodes. These problems significantly affect the node’s performance, leading to a decrease in download speed by 90% and the validation of only one slot per second, particularly after reaching 120 million slots.

Steps to reproduce:

1.Download the Docker image Package cardano-node · GitHub.
2.Use Docker to create a container.
3.Monitor the logs for errors related to connection management and promotion of cold big ledger peers.

System info:

OS Name: Ubuntu
OS Version: 22.04.1
Node version /CLI version: : Docker image Package cardano-node · GitHub.
Docker version 26.1.0, build 9714adc

Additional context:
The logs indicate various errors, including handshake errors during connection establishment and connection refusal while attempting to promote cold big ledger peers. These errors suggest potential issues with the availability or accessibility of remote nodes, impacting the node’s overall performance.

Initial Logs:

Logs:
2024-04-23 18:46:27 [8c5f5845:cardano.node.ChainDB:Info:5] [2024-04-23 21:46:27.09 UTC] Closed db with immutable tip at 0c08287b680a798aa7176d83fbacfa8bb196e62b18bd904037d497bf237fd3a0 at slot 122296839 and tip 6c2a171daf4d0350e1995fd718e79dbeb170f1a5403764c1605dfa64a1e662b6 at slot 122342476
2024-04-23 18:46:27
2024-04-23 18:46:27 Shutting down…
2024-04-23 18:46:27 cardano-node: thread killed
2024-04-23 18:59:36 Starting: exec /nix/store/9ph9pp8d21y17k95mhgk8jk2z2036dhj-cardano-node-exe-cardano-node-8.9.2/bin/cardano-node run
2024-04-23 18:59:36 --config /nix/store/76zh2cy72zz8bxh7k5c4v7rgiba9qrk1-config-0-0.json
2024-04-23 18:59:36 --database-path /data/db
2024-04-23 18:59:36 --topology /nix/store/czj80vj5a9vg1xdg3sgqpfj0hc8zv3pb-topology.yaml
2024-04-23 18:59:36 --host-addr 0.0.0.0
2024-04-23 18:59:36 --port 3001
2024-04-23 18:59:36 --socket-path /ipc/node.socket
Connection Handshake Error:
Logs:

[8c5f5845:cardano.node.ConnectionManager:Info:209] [2024-04-23 21:27:11.24 UTC] TrConnectionHandler (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 13.41.234.173:3001}) (TrConnectionHandlerError OutboundError ExceededTimeLimit …)
Summary: A connection handshake error occurred while attempting to establish a connection with a remote node. The error message indicates that the handshake process exceeded the time limit.

Promotion Failure of Cold Big Ledger Peer:
Logs:

[8c5f5845:cardano.node.PeerSelection:Info:172] [2024-04-23 21:27:24.28 UTC] TracePromoteColdBigLedgerPeerFailed 10 9 3.216.77.109:3001 9.624079842171s Network.Socket.connect: <socket: 45>: does not exist (Connection refused)
Summary: The attempt to promote a cold big ledger peer failed due to a connection error. The connection was refused, indicating a possible issue with the availability of the remote node.

Is there any way to proceed? How can we address the issue to resume the chain synchronization process?

I’d like to provide additional information regarding an issue I encountered. It appears that whenever I attempt to reconnect with peers, my network experiences some form of restriction or blockage. Consequently, the connection becomes significantly slow. Yesterday, I closed the node and reopened it today. Initially, it functioned smoothly for approximately three minutes, during which time it successfully downloaded and validated 92,538 slots at a rate of 514 per second. However, shortly thereafter, the connection error with peers resurfaced, causing a drastic slowdown in both download and validation speeds to just one slot per second.

Furthermore, I should mention that I am utilizing a Ryzen 5 5600X processor and 32 GB of RAM. Below, I will include the logs from the last slot validated during the “fast mode” until the subsequent slot, which experienced a slowdown.

2024-04-24 22:36:50 [8c5f5845:cardano.node.GSM:Info:146] [2024-04-25 01:36:50.22 UTC] fromList [(“currentSelection”,Object (fromList [(“kind”,String “Tip”),(“tipBlockNo”,Number 1.0230401e7),(“tipHash”,String “722d22bb1f25e440d57b5e453d977a01f3be4960d90091420cea4aa0ce9dc254”),(“tipSlotNo”,Number 1.22442706e8)])),(“kind”,String “GsmEventEnterCaughtUp”),(“peerNumber”,Number 2.0)]
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelection:Notice:160] [2024-04-25 01:36:50.22 UTC] TraceLedgerStateJudgementChanged YoungEnough
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelectionCounters:Info:160] [2024-04-25 01:36:50.22 UTC] PeerSelectionCounters {coldPeers = 0, warmPeers = 1, hotPeers = 2, coldBigLedgerPeers = 0, warmBigLedgerPeers = 0, hotBigLedgerPeers = 0, localRoots = }
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:50.22 UTC] TraceBigLedgerPeersRequest 15 0
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.22 UTC] [String “FetchingNewLedgerState”,Number 529.0,Number 2842.0]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "13.208.79.46" 6000”,String “PickedBigLedgerPeer”,Number 8.216508481077716e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "relays.ektrp.com" 5859”,String “PickedBigLedgerPeer”,Number 1.5988289600208803e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "52.167.20.127" 4000”,String “PickedBigLedgerPeer”,Number 9.952803016943504e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "3.228.183.84" 3001”,String “PickedBigLedgerPeer”,Number 1.477863825118689e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "cardano-relay-1.upbit.com" 30800”,String “PickedBigLedgerPeer”,Number 1.213966274671122e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "52.177.36.96" 3000”,String “PickedBigLedgerPeer”,Number 2.0126865838078914e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "ca.relays.cardanians.io" 1000”,String “PickedBigLedgerPeer”,Number 3.0810352060967883e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "r1.isp-r1.wjg.jp" 3001”,String “PickedBigLedgerPeer”,Number 9.45254952409071e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "149.102.140.196" 6000”,String “PickedBigLedgerPeer”,Number 3.1429371641176667e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "34.249.11.89" 3001”,String “PickedBigLedgerPeer”,Number 6.547682720921369e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "cardano-relays-2.nu.fi" 3001”,String “PickedBigLedgerPeer”,Number 9.85369740943107e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "3.222.153.137" 3001”,String “PickedBigLedgerPeer”,Number 1.4786498276707937e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessAddress "54.36.178.85" 6000”,String “PickedBigLedgerPeer”,Number 2.773403186912947e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "relay1.ada-stake.com" 3001”,String “PickedBigLedgerPeer”,Number 1.06290992637487e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “RelayAccessDomain "relay1.kaizn.kaizencrypto.com" 6000”,String “PickedBigLedgerPeer”,Number 1.2318284900528509e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String “[RelayAccessDomain "relay1.kaizn.kaizencrypto.com" 6000,RelayAccessDomain "relay1.ada-stake.com" 3001,RelayAccessAddress "54.36.178.85" 6000,RelayAccessAddress "3.222.153.137" 3001,RelayAccessDomain "cardano-relays-2.nu.fi" 3001,RelayAccessAddress "34.249.11.89" 3001,RelayAccessAddress "149.102.140.196" 6000,RelayAccessDomain "r1.isp-r1.wjg.jp" 3001,RelayAccessDomain "ca.relays.cardanians.io" 1000,RelayAccessAddress "52.177.36.96" 3000,RelayAccessDomain "cardano-relay-1.upbit.com" 30800,RelayAccessAddress "3.228.183.84" 3001,RelayAccessAddress "52.167.20.127" 4000,RelayAccessDomain "relays.ektrp.com" 5859,RelayAccessAddress "13.208.79.46" 6000]”,Number 15.0,Number 15.0,String “PickedBigLedgerPeers”]
2024-04-24 22:36:51 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:51.50 UTC] [String “UseLedgerPeers”,Number 1.16812831e8]
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:51.50 UTC] TraceBigLedgerPeersResults (fromList [51.89.97.27:1000,52.177.36.96:3000,34.147.178.39:3001,207.154.243.52:3001,3.228.183.84:3001,34.249.11.89:3001,3.222.153.137:3001,62.112.8.253:3001,52.167.20.127:4000,194.99.21.198:5859,13.208.79.46:6000,54.36.178.85:6000,149.102.140.196:6000,54.38.153.200:6000,15.164.112.189:30800]) 0 5s
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelectionCounters:Info:160] [2024-04-25 01:36:51.50 UTC] PeerSelectionCounters {coldPeers = 0, warmPeers = 1, hotPeers = 2, coldBigLedgerPeers = 15, warmBigLedgerPeers = 0, hotBigLedgerPeers = 0, localRoots = }
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:51.50 UTC] TracePromoteColdBigLedgerPeers 10 0 (fromList [52.177.36.96:3000,34.147.178.39:3001,207.154.243.52:3001,3.228.183.84:3001,34.249.11.89:3001,62.112.8.253:3001,194.99.21.198:5859,13.208.79.46:6000,54.36.178.85:6000,149.102.140.196:6000])
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:482] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:483] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:485] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:484] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:486] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:488] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:487] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:489] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:491] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:490] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:490] [2024-04-25 01:36:51.72 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 4})
2024-04-24 22:36:51 [8c5f5845:cardano.node.Handshake:Info:492] [2024-04-25 01:36:51.72 UTC] WithMuxBearer (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 54.36.178.85:6000}) Send (ClientAgency TokPropose,MsgProposeVersions (fromList [(NodeToNodeV_7,TList [TInt 764824073,TBool False]),(NodeToNodeV_8,TList [TInt 764824073,TBool False]),(NodeToNodeV_9,TList [TInt 764824073,TBool False]),(NodeToNodeV_10,TList [TInt 764824073,TBool False]),(NodeToNodeV_11,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_12,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_13,TList [TInt 764824073,TBool False,TInt 0,TBool False])]))
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:483] [2024-04-25 01:36:51.73 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 5})
2024-04-24 22:36:51 [8c5f5845:cardano.node.Handshake:Info:493] [2024-04-25 01:36:51.73 UTC] WithMuxBearer (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 34.147.178.39:3001}) Send (ClientAgency TokPropose,MsgProposeVersions (fromList [(NodeToNodeV_7,TList [TInt 764824073,TBool False]),(NodeToNodeV_8,TList [TInt 764824073,TBool False]),(NodeToNodeV_9,TList [TInt 764824073,TBool False]),(NodeToNodeV_10,TList [TInt 764824073,TBool False]),(NodeToNodeV_11,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_12,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_13,TList [TInt 764824073,TBool False,TInt 0,TBool False])]))
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:487] [2024-04-25 01:36:51.73 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 6})```

The issue with Cardano node connectivity and peer connection failures seems related to network restrictions or unavailability of remote nodes. To troubleshoot, monitor logs for handshake errors and connection refusals, especially during peer promotions. Investigate network settings or consider optimizing docker container networking for improved stability and performance.

This has just had a response from the IOG development team so should be best to follow it here:

THANKS! i was desesperate waiting a response