Missed slot leader checks

stakeside · 4 September 2022 00:00

After testing some tweaks found in this forum, I still get a lot of missed slot leader checks and huge memory usage on my BP node. My first attempt was +RTS -N -H3G -qg -qb --nonmoving-gc -RTS, which worked fine for a few hours, but after maybe 15 hours I was still getting a lot of missed checks, and the Out of Memory Killer (OOM Killer) killed the node, which the system then restarted automatically. I tried setting up 8 GB of swap space and adding --disable-delayed-os-memory-return, but the result was the same.

[Screenshot: gLiveView]

BP and relay specs:

6 vCPU
16GB RAM
400GB SSD

So is it time to upgrade my BP's memory? My relay node is only using 7.7GB of memory right now and has been up for 18 hours.

Update: After reading this post, it looks like --nonmoving-gc works really well for slot checks. However, because memory is not returned to the system, memory usage eventually gets very high and the node starts using swap space, which might be the reason for the missed checks.
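To check whether the node really is dipping into swap, something like the sketch below can help. It only reads the standard MemTotal, MemAvailable, SwapTotal and SwapFree fields of /proc/meminfo; the function name `mem_summary` is just an illustrative choice, not part of any Cardano tooling.

```bash
# Summarise RAM and swap usage from a meminfo-style file.
# Defaults to /proc/meminfo, whose values are reported in kB.
# Usage: mem_summary [file]
mem_summary() {
  awk '
    $1 == "MemTotal:"     { total  = $2 }
    $1 == "MemAvailable:" { avail  = $2 }
    $1 == "SwapTotal:"    { stotal = $2 }
    $1 == "SwapFree:"     { sfree  = $2 }
    END {
      # Convert kB to MiB and print both figures on one line.
      printf "ram_used_mib=%d swap_used_mib=%d\n", (total - avail) / 1024, (stotal - sfree) / 1024
    }
  ' "${1:-/proc/meminfo}"
}
```

Running `mem_summary` every few minutes (for example from `watch` or cron) should show whether swap use climbs around the time the missed checks start.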

weebl2000 · 4 September 2022 20:10

Add -F1.1 to the RTS params - if it still OOMs then add some zramswap.
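For reference, a rough sketch of where -F1.1 would sit among the flags already mentioned in this thread. As far as I understand the GHC docs, -F sets the old-generation size factor (default 2), so a lower value trades more frequent collections for a smaller heap. The paths, port and CNODE_HOME default below are hypothetical placeholders, and the snippet only prints the command instead of starting a node.

```bash
# Dry run: print the command instead of starting a node.
# CNODE_HOME and every path below are hypothetical placeholders.
CNODE_HOME="${CNODE_HOME:-/opt/cardano/cnode}"

# RTS options quoted in this thread, with -F1.1 added.
RTS_OPTS="+RTS -N -H3G -qg -qb -F1.1 --nonmoving-gc --disable-delayed-os-memory-return -RTS"

# A block producer would also pass its KES key, VRF key and
# operational certificate options, omitted here.
NODE_CMD="cardano-node run \
--config ${CNODE_HOME}/files/config.json \
--topology ${CNODE_HOME}/files/topology.json \
--database-path ${CNODE_HOME}/db \
--socket-path ${CNODE_HOME}/sockets/node0.socket \
--port 6000 \
${RTS_OPTS}"

echo "${NODE_CMD}"
```

Everything between +RTS and -RTS goes to the Haskell runtime rather than to cardano-node itself, so the block can appear anywhere on the command line.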

stakeside · 5 September 2022 22:58

It’s been a day, and memory usage looks like it is improving since I added -F1.1 as suggested by @weebl2000, and there are few missed checks too.
[Screenshot: Capture]

May I know what “Served” under BLOCK PROPAGATION and “Total Tx” are for? Why am I getting 0? Is this bad?

Alexd1985 · 6 September 2022 06:12

Hi,

You probably have TraceMempool set to false in the configuration file.
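A quick way to confirm this is to look for the key in the node's config file. The sketch below assumes the usual JSON layout with a line like `"TraceMempool": false,`; the default path and both function names are hypothetical placeholders.

```bash
# Print the TraceMempool line(s) from a node config file.
# The default path is a hypothetical placeholder.
show_trace_mempool() {
  grep -E '"TraceMempool"[[:space:]]*:' "${1:-/opt/cardano/cnode/files/config.json}"
}

# Write a copy of the config with TraceMempool switched to true.
# The original file is left untouched.
# Usage: enable_trace_mempool <config.json> <output.json>
enable_trace_mempool() {
  sed -E 's/("TraceMempool"[[:space:]]*:[[:space:]]*)false/\1true/' "$1" > "$2"
}
```

The node reads its configuration at startup, so it needs a restart after the edited file is put in place.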

github.com

cardano-community/guild-operators/blob/alpha/docs/Scripts/gliveview.md

!!! info "Reminder !!"
    Ensure the [Pre-Requisites](../basics.md#pre-requisites) are in place before you proceed.

**Guild LiveView - gLiveView** is a local monitoring tool to use in addition to remote monitoring tools like Prometheus/Grafana, Zabbix or IOG's RTView. This is especially useful when moving to a systemd deployment - if you haven't done so already - as it offers an intuitive UI to monitor the node status.

The tool is independent from other files and can run as a standalone utility that can be stopped/started without affecting the status of `cardano-node`.

##### Download

If you've used [prereqs.sh](../basics.md#pre-requisites), you can skip this part, as this is already set up for you. The tool relies on the common `env` configuration file.
To get current epoch blocks, the [logMonitor.sh](../Scripts/logmonitor.md) script is needed (and can be combined with [CNCLI](../Scripts/cncli.md)). This is optional and **Guild LiveView** will function without it.

!!! info "Note"
    For those who follow guild's [folder structure](../basics.md#folder-structure) and do not wish to run `prereqs.sh`, you can run the below in `$CNODE_HOME/scripts` folder

To download the script:

```bash
curl -s -o gLiveView.sh https://raw.githubusercontent.com/cardano-community/guild-operators/master/scripts/cnode-helper-scripts/gLiveView.sh
curl -s -o env https://raw.githubusercontent.com/cardano-community/guild-operators/master/scripts/cnode-helper-scripts/env

```

stakeside · 6 September 2022 07:08

Yeah, I had it set to false. After changing it, Total Tx is starting to show on both my BP and relay nodes. Any reason why I’m getting 0 Served on my BP node?

Alexd1985 · 6 September 2022 09:03

Because you have not minted any blocks since the node restart.

stakeside · 6 September 2022 09:22

Yeah, I’m hoping to mint a block in maybe a month or so, because it’s just a week-old stake pool.

Why am I getting Served : 1910 on my relay node then?

Alexd1985 · 6 September 2022 09:27

Did you read the page linked above to see what all of these mean? All the explanations of the gLiveView output are there.

stakeside · 6 September 2022 09:36

According to Guild Operators’ website:

Block propagation - Last delay measures the duration between when the last block was scheduled to be produced and when the node learned about it. Late blocks are blocks whose delay is larger than 5s. If the node is not synching, the number of late blocks needs to stay low. Within 1/3/5s estimates the chance of observing a delay of 1/3/5s (based on the delays observed for previous blocks). A healthy node needs to stay above 95% of blocks within 3s. Finally, served blocks counts how many blocks were fetched by “in” peers. If this does not increase for a long time, it means the “in” peers are learning about new blocks from somewhere else (and therefore this node is not contributing towards accelerating the propagation). Overall, these metrics are helpful in tweaking the topology and/or performance of the network links.

I don’t see any mention that a BP node needs to have a mint history in order to get the Served display on gLiveView. It makes sense for relay nodes to show more Served because they are connected to more “in” peers than the BP node, which is connected only to its relay node.
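To compare the Served counter on the BP and the relay outside gLiveView, the raw value can be pulled from the node's Prometheus endpoint. This is only a sketch: the helper `metric_value` is an illustrative name, and the endpoint address and the metric name in the usage comment are assumptions that may differ on your setup.

```bash
# Read Prometheus-style text on stdin and print the value of one metric.
# Usage: metric_value <metric_name>
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Against a running node (endpoint and metric name are assumptions):
# curl -s http://127.0.0.1:12798/metrics | metric_value cardano_node_metrics_served_block_count_int
```

Running the same command on both nodes a few hours apart shows whether the counter is moving at all on the BP.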
