BP node missing slots

I will check this option. Is there a way to get a more responsive VPS out of Google or Amazon services?

Actually, I checked the logs and couldn't find any entry under "Missed" or "Missing"! Is there another keyword?

Is this the "in epoch" slot? It doesn't look like a missed-slot query, because relays don't have missed slots.

How did you find out that you are missing slots? I mean, what is the procedure for monitoring that?

curl localhost:12798/metrics | grep "cardano_node_metrics_slotsMissedNum_int"

or use the metric "cardano_node_metrics_slotsMissedNum_int" in a Grafana dashboard.
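
If you want to keep an eye on this over time without a dashboard, a minimal sketch (assuming the same default metrics port 12798 used above; the log file path is just an example) is to poll the metric once a minute:

# poll the missed-slots metric every 60 seconds and append it to a log
# (the grep only matches once at least one slot has been missed)
while true; do
  echo "$(date -u +%FT%TZ) $(curl -s localhost:12798/metrics | grep cardano_node_metrics_slotsMissedNum_int)" >> ~/missed-slots.log
  sleep 60
done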

Thanks Mina, works like a charm.
This is my output from the curl call:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2054    0  2054    0     0   668k      0 --:--:-- --:--:-- --:--:--  668k
What is the 0% on Received? And what time frame are those slots? Is it in the current epoch?
I see Total and Received are the same; does that mean I didn't lose any slots?
Thanks again

Actually, it looks like you have no missed slots, as this metric is only exposed once you have at least one missed slot. The table you pasted is just curl's transfer progress meter (bytes downloaded and transfer speed), not slot data.
Good news for you :)
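
If the progress meter gets in the way, curl's -s (silent) flag suppresses it, so you only see the metric line itself (or nothing, when no slots have been missed):

curl -s localhost:12798/metrics | grep cardano_node_metrics_slotsMissedNum_int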

Can you please share whether you use a VPS or bare metal for your BP node? Also, do you restart your BP node periodically?

Thanks for the quick response. I don't know what VPS or bare metal is. I am running my nodes on Azure Cloud, 16 GB RAM, 2 cores, and almost never restart my nodes. The BP and one relay have been up for 7 days, another relay for 20 days. There were some memory issues right after the 1.27.0 upgrade, but now everything is smooth.

Regarding Amazon or Google: I don't know, I'm using some other hosting provider.
Restarting the service is not a real option, because you're missing slots that way as well. It's just that the node doesn't register them while it's starting up.

VPS, aka virtual private server, is not a real server in hardware. It is a simulated server running together with many other VPSs on a real (bare metal) server. That way, cloud services and hosting providers save a lot of hardware by sharing real hardware among many customers.

Just FYI: the curl command only works on the BP node. Relay nodes don't process slots and therefore cannot miss them. Furthermore, as you've seen, there is no entry in the metrics if the number of missed slots is 0.
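
If you'd rather have the query always print a number (0 when the metric isn't exposed yet), a small sketch:

# print the missed-slots count, falling back to 0 when the metric is absent
curl -s localhost:12798/metrics | awk '/cardano_node_metrics_slotsMissedNum_int/ {print $2; found=1} END {if (!found) print 0}'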

You are right, I shall find another solution rather than Google. Do you have recommendations?

Perfect, thanks a lot.

There is an issue open on GitHub for that; it seems that having TraceMempool enabled is the biggest contributor here.

Checking in on other missed-slot topics (I was following and posting on Investigating missed slots). I'm not missing as many slots as reported by @MinaFarahat, but enough to be concerning. I have missed two in the last 48+ hours after I resized from 2 CPUs to 4 CPUs (16 GB RAM) (Virtual Dedicated, DigitalOcean). I don't know if it's an issue with the two relays (both limited to 15 peers via topology updater). The relay nodes are also virtual dedicated but are running at 2 CPUs/8 GB RAM. I'm thinking these probably should be upgraded.

From my experience, 2 missed slots are not an issue. It's very unlikely that one of them coincides with minting a block. Relay nodes are not that important regarding this issue.

A bigger issue is the number of slots missed during the epoch change. IMO this will be addressed in a future release of the node software.

Thanks for the feedback! I won't sweat it. Actually, it was four missed slots: two at around 13:20 GMT and then two at around 14:17 GMT.

Hi All,

I experienced the same issue after the 1.27 upgrade. Disabling TraceMempool on the BP and the second relay decreased missed slots from 670 to a constant 5 after 2 days. Unfortunately, disabling this option means that you cannot see tx-related data.
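
For anyone looking for the exact switch: it's the TraceMempool flag in the node's JSON configuration. A hedged sketch of flipping it off with jq (the config file name is an assumption; adjust it to your setup), after which the node must be restarted:

# set TraceMempool to false in the node config (file name is an example)
jq '.TraceMempool = false' mainnet-config.json > mainnet-config.json.tmp && mv mainnet-config.json.tmp mainnet-config.json
# restart cardano-node afterwards so the change takes effect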

Anyway, thank you for posting information about it.

Regards.

Has anyone had this happen with missed slots? You go for a few days with only a couple or no missed slots, then, in a span of 4 or 5 minutes, you miss over 100? It has happened to me twice in the last 10 days, most recently today. Between 21:43 GMT and 21:45 GMT I went from 0 missed slots to 26 missed, then two minutes later another 118 missed, then another two minutes later 66 missed. Since then, nothing missed.

21:44 GMT was the last new-tip message in the BP log, then nothing for 3 minutes until:
[bp-node:cardano.node.IpSubscription:Error:18936] [2021-06-24 21:47:49.40 UTC] IPs: 0.0.0.0:0 [165.232.132.156:8001,143.110.219.143:8001] Application Exception: 143.110.219.143:8001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (ServerAgency TokNext TokMustReply)

Checked the relay nodes, and there were no blips in the bandwidth history during that time. Ran the network delay checker from @laplasz on the relay nodes:

relay 1:
2021-06-24T21:48:00.52Z 125.52
2021-06-24T21:48:09.47Z 11.47
2021-06-24T23:03:12.66Z 2.66
2021-06-24T23:58:38.18Z 2.18
AVG= 0.50316

relay 2:
2021-06-24 21:47:57.90 UTC 122.9
2021-06-24 21:48:09.01 UTC 11.01
2021-06-24 23:03:12.43 UTC 2.43
2021-06-24 23:58:38.18 UTC 2.18
AVG= 0.49142

Checking the relay node logs, at about the same time as the bp node, they threw the ExceededTimeLimit error.

Does this warrant some network monitoring that’s more detailed than the default graph offered by the cloud provider? Any good Linux tools that would collect and display network history?
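
For reference, one stock option (my own suggestion, not something confirmed in this thread) is sysstat's sar, which can record network counters to a file and replay the history later:

# sample all network interfaces every 60 seconds, 60 times, recording to a file
sar -n DEV -o ~/netstats.bin 60 60
# replay the recorded history later
sar -n DEV -f ~/netstats.bin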

You will see missed slots when the epoch changes. I have them and everybody does (it will be fixed soon).

Thx
