Last week I started seeing my missed-slot count increase in Grafana, and I’m trying to figure out why. Right now my core is missing a slot every 3-4 minutes, at what seems to be a constant rate.
The core has 4 relays connected (in and out): 2 within the same LAN and 2 outside.
Each of the 4 relays has about 20 peers (in and out).
All machines (relays & core) are at ~5-6% CPU and 40% RAM.
Nothing has changed in the setup/configuration for weeks.
The core logs:
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:39.10 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171506e7)]))]
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.LeadershipCheck:Info:118] [2021-03-03 02:16:39.10 UTC] {"credentials":"Cardano","kind":"TraceStartLeadershipCheck","delegMapSize":299892,"slot":23171508,"chainDensity":4.7476634e-2,"utxoSize":1423622}
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
----- SLOT NUMBER 2.3171507e7 SHOULD BE SOMEWHERE HERE BUT IT'S NOT -----
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:39.10 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171508e7)]))]
Mar 03 02:16:39 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:39.10 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.LeadershipCheck:Info:118] [2021-03-03 02:16:40.00 UTC] {"credentials":"Cardano","kind":"TraceStartLeadershipCheck","delegMapSize":299892,"slot":23171509,"chainDensity":4.7476634e-2,"utxoSize":1423622}
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.ForgeTime:Info:118] [2021-03-03 02:16:40.00 UTC] fromList []
Mar 03 02:16:40 cardano-mainnet-s01-c01 bash[166580]: [cardano-:cardano.node.Forge:Info:118] [2021-03-03 02:16:40.00 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNodeNotLeader"),("slot",Number 2.3171509e7)]))]
I’m trying to figure out what causes these missing slots.
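A rough sketch for spotting these gaps automatically, assuming the same JSON leadership-check log format as in the excerpt above (adjust the journalctl invocation to however your node is actually run):

```python
# Rough sketch (not part of the node): scan leadership-check log lines for
# gaps in the slot numbers, e.g. the missing 23171507 annotated above.
# Example usage, assuming the node logs to the journal:
#   journalctl -u cardano-node --no-pager | python3 find_slot_gaps.py
import re
import sys

SLOT_RE = re.compile(r'"kind":"TraceStartLeadershipCheck".*?"slot":(\d+)')

previous = None
for line in sys.stdin:
    match = SLOT_RE.search(line)
    if not match:
        continue
    slot = int(match.group(1))
    if previous is not None and slot > previous + 1:
        print(f"gap: no leadership check for slots {previous + 1}..{slot - 1}")
    previous = slot
```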
Actually, I looked it up, but there is no cardano_node_metrics_slotsMissedNum_int metric on my BP. Did you change anything in the config file other than the TraceFetchBlockDecision parameter? Best wishes
I think the metric won’t show until you have at least 1 missed slot. I currently have none, and I don’t see it when I curl localhost:12798/metrics (EKG metrics).
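If it helps, here is a minimal sketch of that check, assuming the node’s metrics endpoint is on the default localhost:12798 as above (adjust host/port to your own config):

```python
# Minimal sketch: check whether the missed-slot counter is exposed yet.
# Assumes the metrics endpoint at localhost:12798 mentioned above;
# the counter only appears once the node has actually missed a slot.
import urllib.request

METRICS_URL = "http://localhost:12798/metrics"
METRIC_NAME = "cardano_node_metrics_slotsMissedNum_int"

with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
    body = resp.read().decode("utf-8")

for line in body.splitlines():
    if line.startswith(METRIC_NAME):
        print("found:", line)
        break
else:
    print(f"{METRIC_NAME} not present yet (no missed slots so far?)")
```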
Hi, can you share how many relays you’re allowing now?
I’ve got the same issue with slotsMissed (my current setting is to allow 20 peers). Cheers!
Missed slots only occur on a block-producing node. You shouldn’t have 20 peers on a block producer.
I was using a Raspberry Pi as my block producer, and reducing to only 1 peer helped, but after 20-24 hours I had missed slots again. I ended up moving to a more powerful server (dedicated 4 cores + 16 GB RAM); I currently have 3 peers connected to my block producer and haven’t missed a slot since.
Thanks for sharing your config.
Sorry … I meant 20 peers to the relay nodes.
I already had the peers to my block producer limited to 3 (since we run 3 relay servers).
My block producer was running on 4 CPU / 16 GB; I changed it to 8 CPU / 32 GB, but the results are the same.
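For comparison, this is roughly what a block producer topology limited to just its own relays looks like (a sketch only; the hostnames and port are placeholders, not our real ones):

```python
# Sketch of a BP topology.json restricted to three relays (legacy topology
# format with a "Producers" list). Hostnames and port are placeholders.
import json

bp_topology = {
    "Producers": [
        {"addr": "relay1.example.internal", "port": 3001, "valency": 1},
        {"addr": "relay2.example.internal", "port": 3001, "valency": 1},
        {"addr": "relay3.example.internal", "port": 3001, "valency": 1},
    ]
}

with open("topology.json", "w") as f:
    json.dump(bp_topology, f, indent=2)
```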
Good people out there: right now I have 80 missed slots over a 19 h 35 min runtime. My gut feeling is that’s way too much, but I’d love to see how I’m doing compared to others. Could you share your missed-slot numbers for reference?
IMO, it’s not so much the number as the rate at which it increases over time once the first blocks are missed. It’s like the node runs into some critical state, after which it can no longer go more than about 30 minutes without missing slots.
ATM, the problem goes away with a restart of the BP node, which resets it out of that critical state. As I said, I’ll add another (second) relay later this month. If that does not mitigate the situation, then IMO it makes sense to file an issue on GitHub for cardano-node.
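To make the rate (rather than the raw total) easier to compare, something like this rough sketch could poll the counter periodically and log only the increments (same assumptions about the metrics endpoint as earlier in the thread):

```python
# Rough sketch: poll the missed-slot counter every 10 minutes and print how
# much it moved, to watch the rate rather than the absolute number.
# Assumes the Prometheus endpoint on localhost:12798 mentioned above.
import re
import time
import urllib.request

METRICS_URL = "http://localhost:12798/metrics"
PATTERN = re.compile(r"^cardano_node_metrics_slotsMissedNum_int\s+(\d+)", re.MULTILINE)

def missed_slots() -> int:
    with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    match = PATTERN.search(body)
    return int(match.group(1)) if match else 0  # counter absent until the first miss

previous = missed_slots()
while True:
    time.sleep(600)  # check every 10 minutes
    current = missed_slots()
    if current != previous:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp}  missed slots: {current} (+{current - previous})")
        previous = current
```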
And for 48 hours of runtime I had 0 missed slots, until this morning. Then I started getting 2 missed slots about every 2 or 3 hours. Two more missed slots a couple of minutes ago, but load averages are at 0 and memory usage looks normal (35%). Cloud metrics don’t show any spikes that coincide with the missed blocks. So it’s not an increasing rate of missed slots. Two relay nodes, with the BP running at 2 CPU / 16 GB RAM. Is that enough, or are the true minimum system requirements higher than that?
Thanks @QCPOLstakepool, I’ll give that a try. I’m not sure I want to pay $120 per month for the cloud service, but it will be interesting to see if it improves things. I did turn TraceMempool off and restarted; in the 16 hours since that change and the restart, 6 missed slots.
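For anyone trying the same thing, the change is just the TraceMempool flag in the node’s config file (a sketch; the file name here is a placeholder for whatever config your BP actually loads, and the node needs a restart to pick it up):

```python
# Sketch: flip TraceMempool off in the node config. The path is a
# placeholder; point it at the config file your block producer uses.
import json

CONFIG_PATH = "mainnet-config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

config["TraceMempool"] = False  # mempool tracing is chatty; this is the change discussed above

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```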
Thanks, I’ll look into Contabo. I’m currently running dedicated servers on DigitalOcean. I was influenced by https://docs.cardano.org/getting-started/guidelines-for-large-spos even though I’m a very small SPO, so I thought my nodes should not be on a VPS. However, if a VPS at 8 vCPU / 30 GB RAM is just as good as, if not better than, a dedicated 4 vCPU / 16 GB RAM, then it’s time to switch.
Except for a 3-minute anomaly on a relay node following a topology updater run (140 missed slots), there have been no missed slots since the 4 CPU resize. Over 24 hours have passed; I’ll keep monitoring. @Alexd1985, does your core node run on a Contabo VPS? Any issues with missed slots/blocks?