Missed slot leader checks

After testing some of the tweaks posted in this forum, I still get a lot of missed slot leader checks and very high memory usage on my BP node. My first attempt was +RTS -N -H3G -qg -qb --nonmoving-gc -RTS, which worked fine for a few hours, but after roughly 15 hours I was getting a lot of missed checks again, and the node was killed by the Out of Memory Killer (OOM Killer) and restarted automatically by the system. I then set up 8GB of swap space and added --disable-delayed-os-memory-return, but the result was the same.

[gLiveView screenshot]

BP and relay specs:

6 vCPU
16GB RAM
400GB SSD

So is it time to upgrade my BP's memory? My relay node is only using 7.7GB of memory right now, and it has been up for 18 hours.

Update: After reading this post, it looks like --nonmoving-gc works really well for slot checks. However, because memory is not returned to the system, memory usage eventually gets very high and the node starts using swap space, which might be the reason for the missed checks.

Add -F1.1 to the RTS params - if it still OOMs then add some zramswap.
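A rough way to see why a smaller -F can help: the GHC user guide describes -F as the factor by which the old generation may grow relative to its live data before the next major collection, with a default of 2. The sketch below only does that arithmetic. The 7 GB live-data figure is a made-up number for illustration, not a measurement from any node, and actual peak memory also depends on the collector in use.

```python
# Rough illustration of GHC's -F (old-generation factor) on heap growth.
# The live-data figure below is hypothetical, not a measurement.

def old_gen_limit_gb(live_gb: float, factor: float) -> float:
    """Size the old generation may reach before the next major GC,
    following the GHC user guide's description of -F (live data * factor)."""
    return live_gb * factor

LIVE_GB = 7.0  # assumed live data of the node, for illustration only

for factor in (2.0, 1.1):  # 2.0 is the GHC default, 1.1 is the suggestion above
    limit = old_gen_limit_gb(LIVE_GB, factor)
    print(f"-F{factor}: old generation may grow to about {limit:.1f} GB")
```

On a 16GB machine, that difference is roughly what separates hitting swap from staying within RAM, under these assumed numbers.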


It’s been a day, and it looks like memory usage has improved since I started using -F1.1 as suggested by @weebl2000, with fewer missed checks too.

May I know what “Served” under BLOCK PROPAGATION and “Total Tx” are for? Why am I getting 0? Is this bad?

Hi,

You probably have TraceMempool set to false in the configuration file.
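A quick way to check this without opening the file by hand is sketched below. It assumes the node configuration is a JSON file with a top-level TraceMempool key; the sample text is a hypothetical excerpt, and treating a missing key as disabled is an assumption made here for the sketch.

```python
import json

def mempool_tracing_enabled(config_text: str) -> bool:
    """Return True if the node config JSON has TraceMempool set to true."""
    config = json.loads(config_text)
    # A missing key is treated as disabled; that default is an assumption.
    return config.get("TraceMempool", False) is True

# Hypothetical excerpt of a node configuration file, for illustration only.
sample_config = '{"TraceMempool": false}'

if mempool_tracing_enabled(sample_config):
    print("TraceMempool is on: Total Tx should be populated")
else:
    print("TraceMempool is off: Total Tx will stay at 0")
```

To use it on a real node, read the configuration file's contents into a string and pass that in instead of sample_config.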


Yeah, I had it set to false. After changing it, Total Tx is now starting to show on both my BP and relay nodes. Any reason why I’m getting 0 Served on my BP node?

Because you have not minted any blocks since the node restarted.

Yeah, I’m hoping to mint a block in maybe a month or so, because it’s just a week-old stake pool. :)

Why am I getting Served : 1910 on my relay node then?

Did you read above what all of these mean? All the explanations of the gLiveView output are there.

According to Guild Operators’ website:

  • Block propagation - Last delay measures the duration between when the last block was scheduled to be produced and when the node learned about it. Late blocks are blocks whose delay is larger than 5s. If the node is not synching, the number of late blocks needs to stay low. Within 1/3/5s estimates the chance of observing a delay of 1/3/5s (based on the delays observed for previous blocks). A healthy node needs to stay above 95% of blocks within 3s. Finally, served blocks counts how many blocks were fetched by “in” peers. If this does not increase for a long time, it means the “in” peers are learning about new blocks from somewhere else (and therefore this node is not contributing towards accelerating the propagation). Overall, these metrics are helpful in tweaking the topology and/or performance of the network links.

I don’t see any mention that a BP node needs to have a mint history in order to get the Served display on gLiveView. It makes sense for relay nodes to show more Served because they are connected to more “in” peers than the BP node, which is only connected to its relay node.
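For anyone trying to make sense of the Late and Within 1/3/5s figures in the quoted description, here is a minimal sketch of how such numbers could be derived from a list of observed block delays. It is not gLiveView's actual code. The sample delays are invented, and counting a delay exactly equal to a threshold as "within" is an assumption.

```python
def propagation_summary(delays_s):
    """Summarise block delays (in seconds) along the lines of the quoted text:
    share of blocks within 1/3/5s, count of late blocks (> 5s), and whether
    the 95%-within-3s health rule is met."""
    if not delays_s:
        raise ValueError("need at least one observed delay")
    n = len(delays_s)
    # Inclusive comparison (<=) is an assumption of this sketch.
    within = {t: sum(d <= t for d in delays_s) / n for t in (1, 3, 5)}
    late = sum(d > 5 for d in delays_s)
    healthy = within[3] >= 0.95
    return within, late, healthy

# Invented delays for illustration only, not real measurements.
sample_delays = [0.4, 0.8, 1.2, 0.6, 2.5, 0.9, 6.1, 0.7, 1.1, 0.5]

within, late, healthy = propagation_summary(sample_delays)
for threshold, share in within.items():
    print(f"within {threshold}s: {share:.0%}")
print(f"late blocks: {late}, healthy: {healthy}")
```

Note that Served is a separate counter and is not derived from these delays. Per the quoted text, it counts blocks fetched by “in” peers.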