Actually, I checked the logs and couldn't find any entry under "Missed" or "Missing". Is there another keyword?
Is this the "in epoch" slot? It doesn't look like a missed-slot query, because relays don't have missed slots.
How did you find that you are missing slots? I mean what is the procedure to monitor that?
curl localhost:12798/metrics | grep "cardano_node_metrics_slotsMissedNum_int"
or use the metric "cardano_node_metrics_slotsMissedNum_int" in a Grafana dashboard
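For ad-hoc checking, the curl/grep above can be wrapped in a tiny script. A minimal sketch, assuming the default Prometheus port 12798 from the node config; note the metric is only emitted once at least one slot has been missed, so the parser defaults to 0 when it is absent:

```shell
#!/bin/sh
# parse_missed: read Prometheus metrics text on stdin and print the
# missed-slot counter. The metric name is taken from the node's own
# metrics output; it is absent until the first slot is missed, so we
# print 0 in that case instead of nothing.
parse_missed() {
  awk '$1 == "cardano_node_metrics_slotsMissedNum_int" { print $2; found = 1 }
       END { if (!found) print 0 }'
}

# Usage against a live block producer (port 12798 is an assumption,
# taken from the typical hasPrometheus setting in the node config):
#   curl -s http://localhost:12798/metrics | parse_missed
```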
Thanks Mina, works like a charm.
This is my output from the curl call:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2054    0  2054    0     0   668k      0 --:--:-- --:--:-- --:--:--  668k
What is the 0% on received? And what time frame are those slots? Is it in current epoch?
I see total and received are the same, does that mean I didn’t lose any slots?
Actually, it looks like you have no missed slots, as this metric only appears once you have at least one missed slot. (The table you pasted is just curl's transfer progress meter, not slot data.)
Good news for you
Can you please share whether you use a VPS for your BP node or cold metal? Also, do you restart your BP node periodically?
Thanks for the quick response. I don't know what VPS or cold metal is. I am running my nodes on Azure Cloud, 16 GB RAM, 2 cores, and almost never restart my nodes. The BP and one relay have been up for 7 days, another relay for 20 days. There were some memory issues right after the 1.27.0 upgrade, but now everything is smooth.
Regarding Amazon or Google: don't know, I'm using some other hosting provider.
Restarting the service is not a real option, because you're missing slots that way as well. It's just that the node doesn't register them while it's starting up.
VPS, aka virtual private server, is not a real server in hardware. It is a simulated server running alongside many other VPSes on a real (bare metal) server. That way cloud services and hosting providers save a lot of hardware by sharing real hardware among many customers.
Just FYI: the curl command is only meaningful on the BP node. Relay nodes don't lead slots and therefore cannot miss them. Furthermore, as you've seen, there is no entry in the metrics when the number of missed slots is 0.
You are right, I shall find another solution rather than Google. Do you have recommendations?
Perfect, thanks a lot
There is an issue open on GitHub for that; it seems that having TraceMempool enabled is the biggest factor here.
Checking on other missed-slot topics (I was following and posting on Investigating missed slots). I'm not missing as many slots as reported by @MinaFarahat, but enough to be concerning. I have missed two in the last 48+ hours, after resizing from 2 CPUs to 4 CPUs with 16 GB RAM (virtual dedicated, DigitalOcean). I don't know if it's an issue with the two relays (both limited to 15 peers via topology updater). The relay nodes are also virtual dedicated but are running at 2 CPUs / 8 GB RAM. I'm thinking these probably should be upgraded.
From my experience, 2 missed slots are not an issue; it is very unlikely that it happens while minting a block. Relay nodes are not that important regarding this issue.
A bigger issue is the number of missed slots during the epoch change. IMO this will be addressed in a future release of the node software.
Thanks for the feedback! I won't sweat it. Actually, it was four missed slots: two at around 13:20 GMT and then two at around 14:17 GMT.
I experienced the same issue after the 1.27 upgrade. Disabling TraceMempool on the BP and second relay decreased my missed slots from 670 to a constant 5 after 2 days. Unfortunately, disabling this option means you cannot see tx-related data.
Anyway, thank you for posting information about this.
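For reference, the switch lives in the node's configuration file (mainnet-config.json on a typical setup; the exact path depends on your deployment). The relevant key looks like this:

```json
{
  "TraceMempool": false
}
```

The node needs a restart afterwards for the change to take effect, which itself briefly shows up as missed slots.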
Has anyone had this happen with missed slots? You go for a few days with only a couple or no missed slots, then, in a span of 4 or 5 minutes, you miss over 100? It has happened to me twice in the last 10 days, most recently today. Between 21:43 GMT and 21:45 GMT I went from 0 missed slots to 26 missed, then two minutes later another 118 missed, then two minutes after that another 66 missed. Since then, nothing missed.
21:44 GMT was the last new tip message in the BP log, then nothing for 3 minutes until:
[bp-node:cardano.node.IpSubscription:Error:18936] [2021-06-24 21:47:49.40 UTC] IPs: 0.0.0.0:0 [22.214.171.124:8001,126.96.36.199:8001] Application Exception: 188.8.131.52:8001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (ServerAgency TokNext TokMustReply)
I checked the relay nodes and saw no blips in the bandwidth history during that time. I ran the network delay checker from @laplasz on the relay nodes:
2021-06-24 21:47:57.90 UTC 122.9
2021-06-24 21:48:09.01 UTC 11.01
2021-06-24 23:03:12.43 UTC 2.43
2021-06-24 23:58:38.18 UTC 2.18
Checking the relay node logs, at about the same time as the BP node, they threw the ExceededTimeLimit error.
Does this warrant some network monitoring that’s more detailed than the default graph offered by the cloud provider? Any good Linux tools that would collect and display network history?
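As a stopgap, you can log the cumulative per-interface byte counters from /proc/net/dev and correlate them with the missed-slot timestamps later; tools like vnstat or sar (from the sysstat package) keep richer history. A minimal sketch (the interface name eth0 and the log path in the usage comment are assumptions):

```shell
#!/bin/sh
# parse_iface: given an interface name and /proc/net/dev-formatted text
# on stdin, print the cumulative rx/tx byte counters for that interface.
# Field layout per proc(5): "iface: rx_bytes rx_packets errs drop fifo
# frame compressed multicast tx_bytes ...", so rx is the 2nd token and
# tx is the 10th.
parse_iface() {
  grep "^ *$1:" | {
    read -r _ rx _ _ _ _ _ _ _ tx _ && echo "rx=$rx tx=$tx"
  }
}

# Usage, e.g. appending one sample per minute from cron:
#   parse_iface eth0 < /proc/net/dev >> /var/log/netbytes.log
```

Graphing the deltas between samples then gives a bandwidth history independent of the cloud provider's dashboard.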
You will see missed slots when the epoch changes. I have them and everybody has them (it will be fixed soon).
Thanks! I’ve seen previous posts that talk about that and I completely forgot to consider that.