Good monitoring metrics for a "healthy" coreNode/relayNode?

ATADA · 5 June 2020 20:51

Hi,

To prepare some scripting for MainNet / HTN operation: which parameters do you think are worth monitoring automatically to decide whether a node is healthy? The goal is to produce warning messages and, for example, restart the node automatically.

Slot/block height growing within a given period of time
Block production of the coreNode itself
CPU and RAM usage
KES/Opcert periods valid

What else?
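To make the list above concrete, here is a minimal sketch of how such readings could be turned into warnings. Everything in it is an assumption for illustration: the field names, the `evaluate` function and the threshold values are hypothetical, and how the readings are collected from the node is left open. Block production by the coreNode itself is not covered here, since it depends on the pool's slot schedule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSnapshot:
    """One reading of the node's state; all field names are hypothetical."""
    seconds_since_tip_advance: float  # time since slot/block height last grew
    cpu_percent: float                # CPU usage of the node process
    ram_percent: float                # RAM usage of the machine
    kes_periods_remaining: int        # KES periods left before the opcert expires


@dataclass
class Limits:
    """Illustrative thresholds only, not recommendations."""
    max_tip_stall_seconds: float = 300.0
    max_cpu_percent: float = 90.0
    max_ram_percent: float = 90.0
    min_kes_periods_remaining: int = 5


def evaluate(snapshot: NodeSnapshot, limits: Limits) -> List[str]:
    """Return a list of warning labels; an empty list means 'healthy'."""
    warnings = []
    if snapshot.seconds_since_tip_advance > limits.max_tip_stall_seconds:
        warnings.append("tip stalled")
    if snapshot.cpu_percent > limits.max_cpu_percent:
        warnings.append("cpu high")
    if snapshot.ram_percent > limits.max_ram_percent:
        warnings.append("ram high")
    if snapshot.kes_periods_remaining < limits.min_kes_periods_remaining:
        warnings.append("kes expiring")
    return warnings
```

A script run from cron or a systemd timer could call `evaluate` on each reading and send an alert whenever the returned list is not empty.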

lauris · 10 June 2020 09:01

I would monitor rejected blocks and maybe the number of connected relay nodes.

ATADA · 10 June 2020 09:21

I would say that if the slot number/block number is not increasing within a given interval, that check would already cover this automatically?

Rejected blocks, yes, but isn't that more of a performance parameter than a health parameter?
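A rough sketch of such an interval check, assuming the script polls the node's current block number at regular intervals. The class name `TipStallDetector` and its parameters are hypothetical, and the clock is injectable only so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class TipStallDetector:
    """Flags a stall when the block number has not increased for too long."""

    def __init__(self, max_stall_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_stall_seconds = max_stall_seconds
        self._clock = clock
        self._last_block: Optional[int] = None
        self._last_advance = clock()

    def observe(self, block_number: int) -> bool:
        """Record one reading; return True if the tip is considered stalled."""
        now = self._clock()
        if self._last_block is None or block_number > self._last_block:
            # First reading or the tip moved forward: reset the timer.
            self._last_block = block_number
            self._last_advance = now
            return False
        # No progress: stalled only once the allowed interval has passed.
        return (now - self._last_advance) > self._max_stall_seconds
```

A node that has lost all its peers stops receiving new blocks, so it would eventually show up here as a stall. The check cannot tell why the tip stopped moving, though, so a separate peer count may still be useful for diagnosis.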

Umed_SKY · 11 June 2020 22:52

Here is what I track on the core node:

ATADA · 13 June 2020 19:59

I like your live view, but I am looking for parameters that monitor the node directly on the machine itself and trigger alarm actions.

leonfs · 29 March 2021 19:42

What did you end up using as a health metric, @ATADA?

kawan · 3 April 2021 11:14

Hello, do you have a guide for setting up this monitoring?

ATADA · 6 April 2021 19:42

I personally made my own monitoring tools, but they are not publicly available. You can also look at our StakePoolOperator Tools Alliance site here:

You can use the simpleLiveView from Adam or gLiveView from the Guild to check your running node.
