- producer node loses connection and throws errors (“ErrorPolicySuspendConsumer”, “closed socket to”, “desired valency 1 current valency 0”, etc; see log below)
- when i restart the node it reconnects successfully and works as expected for a while
- then (hours later) it loses connection again
what could it be? how to configure nodes so they keep reconnecting automatically?
both nodes are running cardano-node
version 1.26.2
producer node logs:
May 13 01:21:37 XXX cardano: \^[[35m[XXX:cardano.node.ChainDB:Notice:34]\^[[0m [2021-05-13 01:21:37.18 UTC] Chain extended, new tip: 089267708a29fc4e4259c71317e8957fe556678602ebad2b143b38fb47c6f06a at slot 29302606
May 13 01:22:44 XXX cardano: \^[[31m[XXX:cardano.node.IpSubscription:Error:57]\^[[0m [2021-05-13 01:22:44.62 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Application Exception: XXX.XXX.XXX.XXX:XXXXX ExceededTimeLimit (KeepAlive) (ServerAgency TokServer)
May 13 01:22:44 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:57]\^[[0m [2021-05-13 01:22:44.62 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Closed socket to XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:44 XXX cardano: \^[[35m[XXX:cardano.node.ErrorPolicy:Notice:50]\^[[0m [2021-05-13 01:22:44.63 UTC] IP XXX.XXX.XXX.XXX:XXXXX ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTimeLimit (KeepAlive) (ServerAgency TokServer))) 20s
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Restarting Subscription after 41091.916323806s desired valency 1 current valency 0
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Trying to connect to XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:45 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:78]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt Start, destination XXX.XXX.XXX.XXX:XXXXX
May 13 01:22:45 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:52]\^[[0m [2021-05-13 01:22:45.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Waiting 0.025s before attempting a new connection
May 13 01:24:00 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:78]\^[[0m [2021-05-13 01:24:00.64 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Closed socket to XXX.XXX.XXX.XXX:XXXXX
mainnet-topology.json on producer:
{
"Producers": [
{
"addr": "XXX.XXX.XXX.XXX",
"port": XXXXX,
"valency": 1
}
]
}
mainnet-topology.json on relay:
{
"Producers": [
{
"addr": "YYY.YYY.YYY.YYY",
"port": YYYYY,
"valency": 1
},
{
"addr": "relays-new.cardano-mainnet.iohk.io",
"port": 3001,
"valency": 2
}
]
}
What is the hw configuration of the servers?
u must use topologyupdater for the relay (out of topic)
@Alexd1985, thanks for prompt response!
both nodes are the same vms: 4 core cpu with 8192 mb ram
(noted about topologyupdater. thanks)
try journal -e -f -u cardano-node.service
oops, forgot to mention it’s bsd. no systemd. sorry.
what should i be looking for in the logs?
strange, try when it happens if u can telnet from the producer to the relay
from producer:
telnet Relay_IP Relay_cnode_port
try to run the topology updater on Relay and check after… if this issue still exist
connection looks good:
# telnet XXX.XXX.XXX.XXX XXXXX
Trying XXX.XXX.XXX.XXX...
Connected to XXX.XXX.XXX.XXX.
Escape character is '^]'.
Connection closed by foreign host.
#
producer node log after restart looks good as well (“ConnectSuccessLast”):
May 13 04:11:16 XXX cardano: \^[[34m[XXX:cardano.node.ChainDB:Info:5]\^[[0m [2021-05-13 04:11:16.60 UTC] Opened lgr db
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.ChainDB:Info:5]\^[[0m [2021-05-13 04:11:33.17 UTC] Opened db with immutable tip at dac67fe42995dd85f10b660631f4b3d0fc145ba6ae9d4597b9bab789113f2ee4 at slot 29258724 and tip 089267708a29fc4e4259c71317e8957fe556678602ebad2b143b38fb47c6f06a at slot 29302606
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:52]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Trying to connect to XXX.XXX.XXX.XXX:XXXXX
May 13 04:11:33 XXX cardano: \^[[34m[XXX:cardano.node.IpSubscription:Info:55]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt Start, destination XXX.XXX.XXX.XXX:XXXXX
May 13 04:11:33 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:52]\^[[0m [2021-05-13 04:11:33.17 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Waiting 0.025s before attempting a new connection
May 13 04:11:33 XXX cardano: \^[[35m[XXX:cardano.node.IpSubscription:Notice:55]\^[[0m [2021-05-13 04:11:33.18 UTC] IPs: 0.0.0.0:0 [XXX.XXX.XXX.XXX:XXXXX] Connection Attempt End, destination XXX.XXX.XXX.XXX:XXXXX outcome: ConnectSuccessLast
will try topology updater on relay a bit later. anything else i can check before that?
@Alexd1985, thanks again for such quick replies! appreciate your help.
this was tested when the connection dropped?
no, i can’t catch that moment when connection is lost. that was tested now 
but u said the connection drop and u need to restart the producer in order to reconnect right?
before to restart next time, check the connection
I don’t have any ideea why the producer loss the connectivity… try to add the IOHK relays also in topology files and check if u are losing the connectivity only with your Relay
and one question: are u losing the IN or OUT connection?
correct. i need to restart producer in order to reconnect. before i restart i did check with telnet and it was connecting okay.
i guess that network goes down and then up again somewhere between producer and relay, but producer can’t reconnect automatically.
let me check relay logs as well and see. thanks!