Investigating missed slots

The rest of Sunday and Monday went fine, with no missed slots, but today there were four missed slots. Still running on 4 CPUs/16 GB RAM. I found these log entries at about the two times the slots were missed.

[bp-node:cardano.node.IpSubscription:Error:2086] [2021-06-22 13:22:57.58 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Application Exception: 143.110.
219.143:8001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Al
legra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra Standa
rdCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext TokM
ustReply)
[bp-node:cardano.node.ErrorPolicy:Notice:83] [2021-06-22 13:22:57.58 UTC] IP 143.110.219.143:8001 ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTi
meLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto))
(’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (Shell
eyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext TokMustReply))) 20s
[bp-node:cardano.node.IpSubscription:Notice:87] [2021-06-22 13:22:58.58 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Waiting 0.025s before attempting
a new connection
[bp-node:cardano.node.IpSubscription:Notice:12641] [2021-06-22 13:22:58.59 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Connection Attempt End, desti
nation 143.110.219.143:8001 outcome: ConnectSuccessLast
[bp-node:cardano.node.ErrorPolicy:Warning:91] [2021-06-22 13:22:58.63 UTC] IP 143.110.219.143:44883 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError Mu
xBearerClosed “<socket: 43> closed when reading data, waiting on next header True”))) 20s 20s
[bp-node:cardano.node.ErrorPolicy:Warning:91] [2021-06-22 13:22:58.80 UTC] IP 165.232.132.156:42661 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError Mu
xBearerClosed “<socket: 27> closed when reading data, waiting on next header True”))) 20s 20s
[bp-node:cardano.node.IpSubscription:Error:4185] [2021-06-22 13:23:42.62 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Application Exception: 165.232.
132.156:8001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Al
legra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra Standa
rdCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext TokM
ustReply)
[bp-node:cardano.node.ErrorPolicy:Notice:83] [2021-06-22 13:23:42.62 UTC] IP 165.232.132.156:8001 ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTi
meLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto))
(’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (Shell
eyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext TokMustReply))) 20s
[bp-node:cardano.node.IpSubscription:Error:87] [2021-06-22 13:23:43.62 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Failed to start all required subs
criptions
[bp-node:cardano.node.IpSubscription:Notice:87] [2021-06-22 13:23:53.62 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Waiting 0.025s before attempting
a new connection
[bp-node:cardano.node.IpSubscription:Notice:12685] [2021-06-22 13:23:53.70 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Connection Attempt End, desti
nation 165.232.132.156:8001 outcome: ConnectSuccessLast

[bp-node:cardano.node.IpSubscription:Error:12685] [2021-06-22 14:15:36.50 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Application Exception: 165.232
.132.156:8001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘A
llegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra Stand
ardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext Tok
MustReply)
[bp-node:cardano.node.ErrorPolicy:Notice:83] [2021-06-22 14:15:36.50 UTC] IP 165.232.132.156:8001 ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTi
meLimit (ChainSync (Header (HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Allegra StandardCrypto))
(’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (Tip HardForkBlock (’: * ByronBlock (’: * (ShelleyBlock (ShelleyEra StandardCrypto)) (’: * (Shell
eyBlock (ShelleyMAEra ‘Allegra StandardCrypto)) (’: * (ShelleyBlock (ShelleyMAEra ‘Mary StandardCrypto)) (’ *))))))) (ServerAgency TokNext TokMustReply))) 20s
[bp-node:cardano.node.IpSubscription:Notice:87] [2021-06-22 14:15:37.50 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Waiting 0.025s before attempting
a new connection
[bp-node:cardano.node.IpSubscription:Notice:12894] [2021-06-22 14:15:37.58 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Connection Attempt End, desti
nation 165.232.132.156:8001 outcome: ConnectSuccessLast
[bp-node:cardano.node.ErrorPolicy:Warning:91] [2021-06-22 14:16:22.46 UTC] IP 143.110.219.143:33441 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError Mu
xBearerClosed “<socket: 29> closed when reading data, waiting on next header True”))) 20s 20s

Is there anything in the relay logs I need to look at?
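One way to line entries like these up against the missed-slot times is to reduce each entry to its timestamp, tracer and severity, then count the ChainSync timeouts per minute. The following is only a minimal sketch: it assumes the log keeps the header format pasted above (`[host:tracer:severity:thread] [timestamp UTC]`) and that wrapped continuation lines never start with such a header. The function names `parse_entries` and `timeouts_per_minute` are made up for this example and are not part of cardano-node.

```python
import re
from collections import Counter

# Header of one entry, e.g.
# [bp-node:cardano.node.ErrorPolicy:Notice:83] [2021-06-22 13:22:57.58 UTC]
ENTRY_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<tracer>[\w.]+):(?P<severity>\w+):(?P<thread>\d+)\]"
    r" \[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC\]"
)


def parse_entries(text):
    """Return one dict per log entry, joining hard-wrapped continuation lines."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entry = match.groupdict()
            entry["body"] = line[match.end():]
            entries.append(entry)
        elif entries and line.strip():
            # The terminal wrapped mid-word, so join without adding a space.
            entries[-1]["body"] += line
    return entries


def timeouts_per_minute(entries):
    """Count Error entries mentioning ExceededTimeLimit, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for entry in entries:
        if entry["severity"] == "Error" and "ExceededTimeLimit" in entry["body"]:
            counts[entry["ts"][:16]] += 1
    return counts
```

For example, `timeouts_per_minute(parse_entries(open("node.log").read()))` (the file name is a placeholder) gives a per-minute count that can be compared with the missed-slot times shown in Grafana. Running the same thing over the relay logs would show whether the relays logged anything at the same moments.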

Just to share what I have discovered over the past month of monitoring this problem. I have also had missed slots showing up in Grafana ever since I started using Grafana reports about 2 months ago. I was hoping the issue was a configuration problem, because if it’s not, that would mean a hardware upgrade, which is an additional cost. I was running 2 relays and 1 BP, each instance on 4 vCores and 8GB RAM.

About a month ago, with the increasing load on the Cardano network and the upcoming 1.29.0 Alonzo hard fork, I decided to try a hardware upgrade, since none of the config changes made a difference. Note that I was getting 2 to 6 missed slots every 2 to 4 hours. It was fairly consistent and somewhat unacceptable to me. I started by upgrading one of my relay servers to 8 cores and 16GB RAM. That already gave a dramatic improvement, down to 2 missed slots every 4 to 8 hours. But it wasn’t good enough. Next I upgraded my BP to the same specs and got 2 missed slots every 24 to 48 hours. Now I have just upgraded all my servers and also migrated to 1.29.0.
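To compare the three stages on one scale, the reported figures can be converted to missed slots per day. This is just a back-of-the-envelope sketch using the numbers quoted above. It assumes the slot counts and the intervals vary independently, so the low end pairs the fewest slots with the longest interval and the high end pairs the most slots with the shortest one.

```python
def per_day(slots, hours):
    """Convert 'slots missed every `hours` hours' into a daily rate."""
    return slots * 24 / hours


def daily_range(slots, hours):
    """slots and hours are (min, max) pairs; returns (low, high) missed slots per day."""
    low = per_day(slots[0], hours[1])   # fewest slots over the longest interval
    high = per_day(slots[1], hours[0])  # most slots over the shortest interval
    return low, high


# (stage, (min_slots, max_slots), (min_hours, max_hours)) as reported in the post
stages = [
    ("all nodes on 4 vCores / 8GB", (2, 6), (2, 4)),
    ("one relay on 8 cores / 16GB", (2, 2), (4, 8)),
    ("relay and BP on 8 cores / 16GB", (2, 2), (24, 48)),
]

for name, slots, hours in stages:
    low, high = daily_range(slots, hours)
    print(f"{name}: {low:g} to {high:g} missed slots per day")
```

That works out to roughly 12 to 72 per day before any upgrade, 6 to 12 after the relay upgrade, and 1 to 2 once the BP was upgraded as well.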

Still monitoring, but I’m expecting zero missed slots from now on. It’s unfortunate that the specs and associated costs of running a stake pool are about double what I had originally planned, but I guess it’s worth it. I don’t want to miss a block when the time comes to be assigned one. If any SPO out there is still on a Raspberry Pi setup, it might be a good time to consider a hardware and/or network upgrade. I don’t think a casual setup is going to work for much longer. Otherwise, it’s you and your delegators who lose out.

Hope this very limited anecdotal evidence helps.

[Ticker: SGCO]

My 5 cents:

  1. Also, do not allow more than one network switch between one relay and a BP.
  2. If possible, abandon virtual infrastructure.
  3. Use fast SSD (NVMe) drives.
  4. Use at least 16 GB of RAM per server.