Question 1: I’m getting about 0.6% missed slot leader checks. Is this normal, and can it be reduced?
Question 2: What percentage of missed slot leader checks are others here getting?
Question 3: How can we reduce missed checks?
We’ve reviewed our topology closely, and our nodes are well connected. The block producer has 20 GB of RAM, 4 GB of swap, and fast dedicated Ryzen CPUs. I’ve read that the problem is mostly caused by the garbage collector.
Hi,
The missed slot leader checks percentage should be close to 0%, but I don’t think 0.6% is something to be concerned about. I am usually at 0.3-0.4%, and it sometimes goes up when the network is busy.
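To put those percentages in absolute terms: a mainnet epoch is 432,000 one-second slots, so 0.6% over a full epoch is about 2,592 missed checks, while 0.3-0.4% is roughly 1,296 to 1,728. The sketch below just does that arithmetic. `missed_pct` is a hypothetical helper name, and the two counts are assumed to come from whatever monitoring you already use.

```shell
# Hypothetical helper: missed slot leader checks as a percentage of slots checked.
# Usage: missed_pct <missed_checks> <slots_checked>
missed_pct() {
  awk -v m="$1" -v t="$2" 'BEGIN {
    if (t <= 0) exit 1              # refuse to divide by zero
    printf "%.2f\n", 100 * m / t    # two decimal places, e.g. 0.60
  }'
}
```

For example, `missed_pct 2592 432000` prints `0.60`.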
Hardware specs, especially RAM, can influence this metric. You could increase your swap size to 8-10 GB and see whether it changes.
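Before and after resizing swap, it helps to confirm what the kernel actually sees. Here is a minimal sketch, assuming a Linux host. `swap_gib` is a hypothetical helper that reads `SwapTotal` (reported in kB) from `/proc/meminfo`. The commented commands show one common way to add a swap file on ext4. `/swapfile-extra` and the 4G size are example values: next to an existing 4 GB of swap, they would bring the total to about 8 GB.

```shell
# Hypothetical helper: print total swap in GiB from a meminfo-style file.
# Usage: swap_gib [path]   (defaults to /proc/meminfo)
swap_gib() {
  # SwapTotal is reported in kB (KiB); 1048576 KiB = 1 GiB.
  awk '/^SwapTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# One common way to add a 4 GiB swap file on ext4 (run as root; example path):
#   fallocate -l 4G /swapfile-extra && chmod 600 /swapfile-extra
#   mkswap /swapfile-extra && swapon /swapfile-extra
#   echo '/swapfile-extra none swap sw 0 0' >> /etc/fstab   # keep it after a reboot
```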
Using RTS flags in the start script can also improve the node’s performance:
/usr/local/bin/cardano-node run +RTS -c -N -A16m -RTS ...
More information is available here:
https://downloads.haskell.org/ghc/latest/docs/users_guide/runtime_control.html
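For reference, here is the same command wrapped in a small start-script function, with each runtime flag annotated according to the GHC user guide linked above. `start_node` and the `CARDANO_NODE` override are hypothetical names added for illustration. `"$@"` stands in for the node options elided as `...` in the original command.

```shell
# Path from the command above; can be overridden, e.g. CARDANO_NODE=echo for a dry run.
CARDANO_NODE="${CARDANO_NODE:-/usr/local/bin/cardano-node}"

start_node() {
  # +RTS ... -RTS brackets options for the GHC runtime, not for cardano-node itself:
  #   -c     compact the oldest generation in place instead of copying it
  #   -N     one capability per CPU core (the runtime detects the core count)
  #   -A16m  16 MB allocation area (nursery) for each capability
  "$CARDANO_NODE" run +RTS -c -N -A16m -RTS "$@"
}
```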
With good hosting, missed slot leader checks should be 0 except around the epoch transition. If they are not 0, your hosting is probably not of very good quality; if it is your own server or computer, some component is probably not good enough. You should have at least SSD storage (NVMe works too, but SSD is more than enough).
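To check whether the node’s database sits on an SSD or NVMe drive, Linux exposes a `rotational` flag for each block device in sysfs: 0 means non-rotational and 1 means a spinning disk. `lsblk -d -o NAME,ROTA` shows the same flag for every disk. The helper below is a hypothetical sketch. On virtual machines the flag may not reflect the physical storage behind the virtual disk, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: classify a block device using its sysfs "rotational" flag.
# Usage: disk_kind <device> [sysfs_root]   e.g. disk_kind nvme0n1
# Pass the whole disk (sda, nvme0n1), not a partition such as sda1.
disk_kind() {
  local dev="$1" root="${2:-/sys}" flag
  flag="$(cat "${root}/block/${dev}/queue/rotational")" || return 1
  if [ "$flag" = "0" ]; then
    echo "non-rotational (SSD/NVMe)"
  else
    echo "rotational (HDD)"
  fi
}
```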
If you do see missed slot leader checks during the epoch, there are a few things you can do to reduce them. I tried many of the RTS flags, and nothing seemed to really help except -N (the number of CPU cores used by cardano-node). I set it to the number of CPU cores of the server the node runs on. If the server has at least 32 GB of RAM, --nonmoving-gc might help.
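Here is a minimal sketch of that rule of thumb: pin -N to the server’s core count, and add --nonmoving-gc only on machines with at least 32 GB of RAM. `rts_opts` is a hypothetical helper. RAM is passed as the installed size in GB because `/proc/meminfo` reports slightly less than the installed amount, which could push a 32 GB machine just under the threshold.

```shell
# Hypothetical helper: build the RTS options from core count and installed RAM.
# Usage: rts_opts <cpu_cores> <ram_gb>
rts_opts() {
  local cores="$1" ram_gb="$2" opts
  opts="-N${cores}"                 # one capability per CPU core
  if [ "$ram_gb" -ge 32 ]; then
    opts="${opts} --nonmoving-gc"   # only suggested for servers with >= 32 GB
  fi
  printf '+RTS %s -RTS\n' "$opts"
}
```

On a 16-core server with 32 GB, `rts_opts "$(nproc)" 32` would print `+RTS -N16 --nonmoving-gc -RTS`. For the 20 GB block producer in the question, it would print only the `-N` option.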
Disabling TraceMempool on the block producer should significantly reduce its load, and it should also help with missed slot leader checks. I have never had it enabled on my mainnet block producers, so I cannot say how much disabling it helps.
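In the mainnet-config.json format discussed here, TraceMempool is a top-level boolean, so disabling it is a one-key edit. Here is a minimal sketch, assuming jq is installed. `disable_trace_mempool` is a hypothetical helper name. The node reads its config at startup, so restart it after the change.

```shell
# Hypothetical helper (requires jq): set "TraceMempool": false in a node config file.
# Usage: disable_trace_mempool mainnet-config.json
disable_trace_mempool() {
  local cfg="$1" tmp
  tmp="$(mktemp "${cfg}.XXXXXX")" || return 1
  # Write to a temp file first so a jq error cannot truncate the original config.
  if jq '.TraceMempool = false' "$cfg" > "$tmp"; then
    mv "$tmp" "$cfg"
  else
    rm -f "$tmp"
    return 1
  fi
}
```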
Another very effective change is setting SnapshotInterval to 86400 (for example) in the mainnet-config.json file (the key is absent by default, and the default value is 3600). This changes how often the ledger snapshot is written to disk from once an hour to once a day, and it helps a lot. The trade-off is that restarting the node takes a little longer, but in my experience it is only a matter of a few extra seconds, so it is not something to be concerned about.
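86400 seconds is 24 hours. Compared with 3600, the node therefore writes a ledger snapshot once a day instead of 24 times a day. Here is a minimal sketch of adding the key with jq, assuming jq is installed. `set_snapshot_interval` is a hypothetical helper, and the top-level key placement follows the reply above. As with any config change, the node needs a restart to pick it up.

```shell
# Hypothetical helper (requires jq): add or update SnapshotInterval (in seconds).
# Usage: set_snapshot_interval mainnet-config.json [seconds]   (default 86400)
set_snapshot_interval() {
  local cfg="$1" secs="${2:-86400}" tmp
  tmp="$(mktemp "${cfg}.XXXXXX")" || return 1
  # --argjson passes the value as a JSON number rather than a string.
  if jq --argjson secs "$secs" '.SnapshotInterval = $secs' "$cfg" > "$tmp"; then
    mv "$tmp" "$cfg"
  else
    rm -f "$tmp"
    return 1
  fi
}
```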
Apart from changing your hosting, these are the most important things that should help reduce missed slot leader checks during the epoch (if you have any).