Producer node loses connection to relay

cardanopool · 13 May 2021 06:09

producer node loses connection and throws errors (“ErrorPolicySuspendConsumer”, “closed socket to”, “desired valency 1 current valency 0”, etc; see log below)
when i restart the node it reconnects successfully and works as expected for a while
then (hours later) it loses connection again

what could it be? how to configure nodes so they keep reconnecting automatically?

both nodes are running cardano-node version 1.26.2

producer node logs:

May 13 01:21:37 XXX cardano: \^[[35m[XXX:cardano.node.ChainDB:Notice:34]\^[[0m [2021-05-13 01:21:37.18 UTC] Chain extended, new tip: 089267708a29fc4e4259c71317e8957fe556678602ebad2b143b38fb47c6f06a at slot 29302606
May 13 01:22:44 XXX cardano: \^[[31m[XXX:cardano.node.IpSubscription:Error:57]\^[[0m [2021-05-13 01:22:44.62 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Application Exception: XXX.XXX.XXX.XXX:XXXXX ExceededTimeLimit (KeepAlive) (ServerAgency TokServer)
May 13 01:22:44 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:57]\^[[0m [2021-05-13 01:22:44.62 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Closed socket to XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:44 XXX cardano: \^[[35m[XXX:cardano.node.ErrorPolicy:Notice:50]\^[[0m [2021-05-13 01:22:44.63 UTC] IP XXX.XXX.XXX.XXX:XXXXX ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTimeLimit (KeepAlive) (ServerAgency TokServer))) 20s
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Restarting Subscription after 41091.916323806s desired valency 1 current valency 0
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Trying to connect to XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:78]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt Start, destination XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:45 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Waiting 0.025s before attempting a new connection
May 13 01:24:00 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:78]\^[[0m [2021-05-13 01:24:00.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Closed socket to XXX.XXX.XXX.XXX:XXXXX

mainnet-topology.json on producer:

{
  "Producers": [
    {
      "addr": "XXX.XXX.XXX.XXX",
      "port": XXXXX,
      "valency": 1
    }
  ]
}

mainnet-topology.json on relay:

{
  "Producers": [
    {
      "addr": "YYY.YYY.YYY.YYY",
      "port": YYYYY,
      "valency": 1
    },
    {
      "addr": "relays-new.cardano-mainnet.iohk.io",
      "port": 3001,
      "valency": 2
    }
  ]
}

Alexd1985 · 13 May 2021 06:18

What is the hw configuration of the servers?
u must use topologyupdater for the relay (out of topic)

cardanopool · 13 May 2021 06:25

@Alexd1985, thanks for prompt response!

both nodes are the same vms: 4 core cpu with 8192 mb ram

(noted about topologyupdater. thanks)

Alexd1985 · 13 May 2021 06:26

try journal -e -f -u cardano-node.service

cardanopool · 13 May 2021 06:31

oops, forgot to mention it’s bsd. no systemd. sorry.

what should i be looking for in the logs?

Alexd1985 · 13 May 2021 06:54

strange, try when it happens if u can telnet from the producer to the relay

from producer:

telnet Relay_IP Relay_cnode_port

try to run the topology updater on Relay and check after… if this issue still exist

cardanopool · 13 May 2021 07:01

connection looks good:

# telnet XXX.XXX.XXX.XXX XXXXX
Trying XXX.XXX.XXX.XXX...
Connected to XXX.XXX.XXX.XXX.
Escape character is '^]'.
Connection closed by foreign host.
#

producer node log after restart looks good as well (“ConnectSuccessLast”):

May 13 04:11:16 XXX cardano: \^[[34m[XXX:cardano.node.ChainDB:Info:5]\^[[0m [2021-05-13 04:11:16.60 UTC] Opened lgr db
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.ChainDB:Info:5]\^[[0m [2021-05-13 04:11:33.17 UTC] Opened db with immutable tip at dac67fe42995dd85f10b660631f4b3d0fc145ba6ae9d4597b9bab789113f2ee4 at slot 29258724 and tip 089267708a29fc4e4259c71317e8957fe556678602ebad2b143b38fb47c6f06a at slot 29302606
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Trying to connect to XXX.XXX.XXX.XXX:XXXXX
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:55]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt Start, destination XXX.XXX.XXX.XXX:XXXXX
May 13 04:11:33 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:52]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Waiting 0.025s before attempting a new connection
May 13 04:11:33 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:55]\^[[0m [2021-05-13 04:11:33.18 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt End, destination XXX.XXX.XXX.XXX:XXXXX outcome: ConnectSuccessLast

will try topology updater on relay a bit later. anything else i can check before that?

@Alexd1985, thanks again for such quick replies! appreciate your help.

Alexd1985 · 13 May 2021 07:03

this was tested when the connection dropped?

cardanopool · 13 May 2021 07:05

no, i can’t catch that moment when connection is lost. that was tested now

Alexd1985 · 13 May 2021 07:06

but u said the connection drop and u need to restart the producer in order to reconnect right?
before to restart next time, check the connection

I don’t have any ideea why the producer loss the connectivity… try to add the IOHK relays also in topology files and check if u are losing the connectivity only with your Relay

and one question: are u losing the IN or OUT connection?

cardanopool · 13 May 2021 07:18

correct. i need to restart producer in order to reconnect. before i restart i did check with telnet and it was connecting okay.

i guess that network goes down and then up again somewhere between producer and relay, but producer can’t reconnect automatically.

let me check relay logs as well and see. thanks!

Topic		Replies	Views
[P2P] Block producer loosing connections to relays Operate a Stake Pool	0	114	15 July 2024
Node disconnects every 24 hours Operate a Stake Pool	3	443	18 June 2021
Relay Node unable to connect to Producer node Operate a Stake Pool	8	1369	25 February 2021
Block Producing node having issue connecting to relay node Setup a Stake Pool	3	1109	26 May 2021
Block producer node abruptly stops syncing after a while and stops listening on port Setup a Stake Pool cardano , cardano-node	3	572	26 July 2021

Producer node loses connection to relay

Related topics