Investigating missed slots

Hello everyone,

Last week I started seeing my missed-slots count increase in Grafana, and I’m trying to figure out why. Right now my core is missing a slot every 3-4 minutes, at what seems to be a constant rate.

  1. The core has 4 relays connected (in/out): 2 within the same LAN and 2 outside.
  2. Each of the 4 relays has ±20 peers (in and out)
  3. All machines (relays & core) are at ~5-6% CPU and ~40% RAM
  4. Nothing changed in the setup/configuration for weeks

The core logs:

Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:39.10 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171506e7)]))]
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.LeadershipCheck:Info:118] [2021-03-03 02:16:39.10 UTC] {"credentials":"Cardano","kind":"TraceStartLeadershipCheck","delegMapSize":299892,"slot":23171508,"chainDensity":4.7476634e-2,"utxoSize":1423622}
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []

----- SLOT NUMBER 2.3171507e7 SHOULD BE SOMEWHERE HERE BUT IT'S NOT -----

Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:39.10 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171508e7)]))]
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.LeadershipCheck:Info:118] [2021-03-03 02:16:40.00 UTC] {"credentials":"Cardano","kind":"TraceStartLeadershipCheck","delegMapSize":299892,"slot":23171509,"chainDensity":4.7476634e-2,"utxoSize":1423622}
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:40.00 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171509e7)]))]
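One way to confirm the gap programmatically is to scan the TraceStartLeadershipCheck lines for non-consecutive slot numbers. A minimal sketch, assuming the JSON-style log format shown above (the sample lines here are abbreviated, not real log output):

```python
import re

# Find gaps in the slot numbers reported by TraceStartLeadershipCheck lines:
# a gap means the node never even started the leadership check for that slot.
SLOT_RE = re.compile(r'"slot":(\d+)')

def find_missed_slots(log_lines):
    missed = []
    prev = None
    for line in log_lines:
        if "TraceStartLeadershipCheck" not in line:
            continue
        m = SLOT_RE.search(line)
        if not m:
            continue
        slot = int(m.group(1))
        if prev is not None and slot > prev + 1:
            # every slot between prev and slot was skipped
            missed.extend(range(prev + 1, slot))
        prev = slot
    return missed

# Abbreviated sample lines matching the log excerpt above:
logs = [
    '{"kind":"TraceStartLeadershipCheck","slot":23171506}',
    '{"kind":"TraceStartLeadershipCheck","slot":23171508}',
]
print(find_missed_slots(logs))  # -> [23171507]
```

Running something like this over `journalctl` output would list exactly which slots the leadership check skipped.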

I’m trying to figure out what causes these missed slots.

Anyone experienced this?


If it can help, here’s a screenshot of my Grafana dashboard… All systems seem stable, but the missed slots count increases…

s01-r01: 4 cores / 6 GB
s01-r02: 8 cores / 8 GB
s01-c01: 4 cores / 4 GB

Sorry, I can’t really help, but may I ask how you calculate the missed slots in Grafana? Thanks

It’s the cardano_node_metrics_slotsMissedNum_int metric (only available when your node is running as a core).
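For reference, here is a minimal sketch of reading that counter out of the node’s metrics endpoint (in Prometheus text format). The sample text below is hypothetical; in practice you would fetch it from the node’s metrics port, e.g. localhost:12798/metrics with the default config:

```python
# Parse a Prometheus-format metrics dump and read one metric by name.
# The counter only appears after the first miss, so default to 0 when absent.
def parse_metric(metrics_text: str, name: str, default: float = 0.0) -> float:
    for line in metrics_text.splitlines():
        if line.startswith(name):
            return float(line.split()[-1])
    return default

# Hypothetical sample of what the endpoint returns:
sample = """\
cardano_node_metrics_slotNum_int 23171509
cardano_node_metrics_slotsMissedNum_int 3
"""

print(parse_metric(sample, "cardano_node_metrics_slotsMissedNum_int"))  # -> 3.0
print(parse_metric("", "cardano_node_metrics_slotsMissedNum_int"))      # -> 0.0
```

In Grafana you would normally just graph the metric directly from Prometheus rather than parse it yourself; this is only to show what the raw line looks like.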

By the way, I fixed my problem. I reduced the number of relays connected to my block producer and I’m no longer missing slots!

Glad to hear!

Actually, I looked it up but there is no cardano_node_metrics_slotsMissedNum_int metric on my BP. Did you change anything in the config file other than the TraceFetchBlockDecision parameter? Best wishes

I think the metric won’t show until you have at least 1 missed slot. I currently have none, and I don’t see it when I curl localhost:12798/metrics (EKG metrics).


That’s true - the metric only shows up once at least one slot has been missed.

Hi, can you share how many relays you are allowing now?
I’ve got the same issue with slotsMissed (my current setting allows 20 peers). Cheers!

Missed slots are only reported on a block-producing node. You shouldn’t have 20 peers on a block producer.

I was using a Raspberry Pi as my block producer, and reducing to only 1 peer helped, but after 20-24 hours I had missed slots again. I ended up using a more powerful server (dedicated 4 cores + 16 GB RAM); I currently have 3 peers connected to my block producer and I never miss a slot anymore.

Thanks for sharing your config.
Sorry … I meant 20 peers on the relay nodes.
I had already limited the peers to my block producer to 3 (since we run 3 relay servers).

My block producer was running on 4 CPUs / 16 GB; I changed it to 8 CPUs / 32 GB but the results are the same.

I guess I’ll have to dig some more :wink:

Good people out there. Right now I have 80 missed slots over a 19h35min runtime. My gut feeling is that this is way too much, but I’d love to see how I am doing compared to others. Could you share your missed-slot numbers for reference?
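For comparison, those figures work out to roughly 4 missed slots per hour; a quick back-of-the-envelope check:

```python
# Missed-slot rate from the figures quoted above: 80 misses in 19 h 35 min.
runtime_hours = 19 + 35 / 60      # 19.58... hours
missed = 80
rate_per_hour = missed / runtime_hours
print(round(rate_per_hour, 2))    # -> 4.09
```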

0 missed slots on almost 44h runtime. Restarted with the epoch switch. Before that 0 missed slots for 5 days.

8 cores / 32 GB ECC RAM / 512 GB NVMe


IMO, it’s not so much the absolute number as the rate at which it increases once the first slots are missed. It’s as if the node runs into some critical state, after which it misses a slot roughly every 30 min.

ATM, the problem goes away with a restart of the BP node, which resets it out of that critical state. As I said, I’ll add another (second) relay later this month. If that does not mitigate the situation, then IMO it makes sense to file an issue on GitHub for cardano-node.