ErrorPolicySuspendPeer after upgrade to 1.26.1

Hi all, I just updated a Cardano node to 1.26.1 and I'm getting these strange messages:

[relaynod:cardano.node.ErrorPolicy:Warning:88] [2021-04-09 12:05:41.92 UTC] IP 3.126.142.56:40741 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Broken pipe)) "(sendAll errored)"))) 20s 20s

Any clues?

I also see this with 1.25.1, so it's probably not node-version related:
IP 77.68.95.116:41529 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Connection reset by peer)) "(sendAll errored)"))) 20s 20s

Same here after upgrade

I also get an error like this:
ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (InvalidBlock (At…

I also get this error, as well as many others. I never got any errors on 1.25.1.

Hey guys, I upgraded to 1.26.2 and the ErrorPolicySuspendPeer message went away.


I still get “Failed to start all required subscriptions” and “ErrorPolicySuspendPeer” after upgrading to 1.26.2.

Launched testnet BP and relay nodes a couple of weeks ago on version 1.26.2, and I've been getting the error ever since on both nodes: at least 32 times on the BP node since May 3, and over 540 times on the relay node since April 29 (the BP node log had been overwritten a few times prior to May 3). Could this be a problem with the cloud provider I'm running on (DigitalOcean), or an issue with the overall Cardano network?

Latest error from the BP node:
[g-2vcpu-:cardano.node.ErrorPolicy:Notice:128] [2021-05-11 22:38:55.45 UTC] IP xxxxxxxxxx:xxxx ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) (' *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) (' *))))))) (ServerAgency TokNext TokMustReply))) 20s

At the same time as the above error, the relay node threw this:

[ubuntu-g:cardano.node.ErrorPolicy:Warning:130] [2021-05-11 22:38:56.49 UTC] IP xxxxxxxxxxxxxx:38037 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError MuxBearerClosed "<socket: 41> closed when reading data, waiting on next header True"))) 20s 20s

Same observations running BP and relay in Azure AKS. Docker container image inputoutput/cardano-node:1.27.0.
The message only appears on the relay node (14185 times in 24 hours). What is even stranger, the IP listed in the warning message is assigned to a container in a different namespace (kube-system), unrelated to the namespace the node is running in:

[foo-card:cardano.node.ErrorPolicy:Warning:60] [2021-07-22 20:25:17.81 UTC] IP 10.240.0.4:15817 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError MuxBearerClosed "<socket: 84> closed when reading data, waiting on next header True"))) 20s 20s
bash-5.1$ k get po -A -o wide
NAMESPACE      NAME                                       READY   STATUS    RESTARTS   AGE   IP            NODE
cert-manager   cert-manager-5cb457f4b-f6ssg               1/1     Running   0          35h   10.244.0.10   aks-agentpool-22638277-vmss000001
cert-manager   cert-manager-cainjector-69d885bf55-5x8pw   1/1     Running   0          35h   10.244.1.10   aks-agentpool-22638277-vmss000000
cert-manager   cert-manager-webhook-8d7495f4-ks4pm        1/1     Running   0          35h   10.244.1.11   aks-agentpool-22638277-vmss000000
kube-system    azure-ip-masq-agent-7jlvz                  1/1     Running   0          36h   10.240.0.4    aks-agentpool-22638277-vmss000000
kube-system    azure-ip-masq-agent-7t6sl                  1/1     Running   0          36h   10.240.0.5    aks-agentpool-22638277-vmss000001

10.240.0.4 is azure-ip-masq-agent.

The ip-masq-agent configures iptables rules to MASQUERADE traffic outside link-local (optional, enabled by default) and additional arbitrary IP ranges.
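
If you want to confirm what the agent has installed, you can list its NAT chain on a node (a sketch; IP-MASQ-AGENT is the chain name the upstream agent creates, and it may differ in your cluster):

# show the masquerade rules ip-masq-agent manages in the nat table
sudo iptables -t nat -L IP-MASQ-AGENT -n -v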

How this could cause the error messages in cardano-node is hard to say.

See: Using Source IP | Kubernetes

Packets sent to Services with Type=LoadBalancer are source NAT’d by default, because all schedulable Kubernetes nodes in the Ready state are eligible for load-balanced traffic. So if packets arrive at a node without an endpoint, the system proxies it to a node with an endpoint, replacing the source IP on the packet with the IP of the node (as described in the previous section).
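
You can check which policy the Service is currently using (relay-svc here is a placeholder for whatever the Service exposing your relay is called):

# prints "Cluster" (the SNAT default) or "Local"
kubectl get svc relay-svc -o jsonpath='{.spec.externalTrafficPolicy}'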

Solved by setting the service.spec.externalTrafficPolicy field to Local on the load-balancer Service that exposes the relay node.
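
For anyone else hitting this, a minimal sketch of the change (relay-svc is again a placeholder Service name):

# only route external traffic to pods on the node that received it,
# which preserves the client source IP instead of SNATing to a node IP
kubectl patch svc relay-svc -p '{"spec":{"externalTrafficPolicy":"Local"}}'

With Local, kube-proxy stops forwarding (and SNATing) load-balanced traffic across nodes, so the cluster-internal node IPs no longer show up as peers in the relay's logs.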