Monitoring server specs?

Cardano-Cal · 10 October 2021 16:53

What specs would you guys recommend for the monitoring server (memory, vcpu & disk space)?

The server will be responsible for ETL of all data related to the nodes (2 relays and 1 BP), their environments and the blockchain.

It will be running things like Prometheus, Grafana, DBSync (might require a node).

Any suggestions?

Guillaume · 10 October 2021 18:33

Hello,

I recommend you check out the guide "How to build a Cardano Stake Pool" on CoinCashew. It’s an excellent guide. Just today I finished configuring the pool on the testnet, with Grafana and Prometheus. The guide has a section dedicated to monitoring. NOTE: no dedicated node is required (Grafana is installed on one of the relay nodes).

Greetings !
Guillaume

hanswurst · 11 October 2021 05:50

With the side effect that if you lose this node for whatever reason, your monitoring (which should make you aware of this) is dead as well… doh

mcrio · 11 October 2021 06:40

I too suggest going with a separate, cheap node hosting Grafana and Prometheus. Reasons: mostly what @hanswurst mentioned, plus there is no need to expose additional ports or run additional software on the Cardano node, which might affect security (more attack surface).

Cardano-Cal · 11 October 2021 06:49

Thanks for the link.

I have nodes running in the TestNet with Prometheus and Grafana all up and working so all is good.

I am busy designing my MainNet architecture and would like to keep monitoring software off my nodes. However, I read somewhere that DBSync should run on the same server as the node because the amount of data written is too much for a LAN.

Cardano-Cal · 11 October 2021 06:51

Yeah agreed.

Do you think a cheap (low spec) node will suffice? Are these things not quite resource intensive?

mcrio · 11 October 2021 07:14

The comment above applies to Prometheus and Grafana, especially the Grafana frontend. The node would be just a metrics aggregator. My monitoring node consumes about 300MB of RAM. The load is very low too. The disk space may be a factor due to how much data is being stored. I suggest you give it a try and see how it goes. You still need to run the Prometheus metrics exporters on all nodes that are monitored.
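To make the "metrics aggregator" idea concrete, here is a minimal Python sketch of reading what an exporter returns in the Prometheus text format. The metric names and sample text are made up for illustration, and the parser only handles simple lines without labels; it is not how Prometheus itself ingests data.

```python
def parse_metrics(text):
    """Parse unlabeled Prometheus text-format lines into a {name: value} dict.

    Lines with labels (containing '{') and comment lines are skipped;
    this is a simplification for illustration only.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore values that are not numeric
    return metrics


# Hypothetical exporter output (metric names are assumptions).
sample = """# HELP example_tx_processed Transactions processed
example_tx_processed 42
example_mem_bytes 1.5e6
"""
print(parse_metrics(sample))
```

In practice the text would come from an HTTP request to each node's exporter port, which is why those ports only need to be reachable from the monitoring server.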

Cardano-Cal · 11 October 2021 07:38

Thanks, I will start low and see.

I am doing this in AWS. Do you think a server with 2 vCPUs and 4 GB of memory will be enough?

How much disk space does it use up?

mcrio · 11 October 2021 08:09

That should be more than enough to run Prometheus and Grafana. The disk space depends on the retention time and number of metrics. I don’t have disk space shortages but let’s say for a 15 day retention time I think you should be good with 15+GB (including the OS).
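For a rough feel of how retention time and number of metrics turn into disk usage, the Prometheus documentation gives the rule of thumb: disk space = retention seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. A small sketch of that arithmetic follows; the series count and scrape interval are assumptions, not measurements from this thread.

```python
def estimate_prometheus_disk_gb(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Estimate Prometheus TSDB size in GB using the rule of thumb:
    retention_seconds * samples_per_second * bytes_per_sample.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    total_bytes = retention_seconds * samples_per_second * bytes_per_sample
    return total_bytes / 1e9


# Assumed setup: 3 nodes, about 1000 time series each, scraped every 15 seconds.
nodes = 3
series_per_node = 1000   # assumption
scrape_interval_s = 15   # assumption
samples_per_second = nodes * series_per_node / scrape_interval_s

print(estimate_prometheus_disk_gb(15, samples_per_second))
```

Under these assumptions the metrics data itself stays well under 1 GB for 15 days, so most of a 15 GB disk would go to the OS and other software.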

jf3110 · 11 October 2021 10:45

Actually, I decided to use some simple Python scripts for monitoring, together with sending alarms by email. That way, I don’t need to watch Grafana graphs and can still step in if something fails.

Currently, I’m monitoring tx processed on the relay and the node, and return codes from topology-update. I used to monitor missed slots as well, but that’s no longer a major concern.
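A script-based check along these lines might look like the sketch below. It is not jf3110's actual script: the function names, addresses, and the "counter did not increase" rule are assumptions for illustration. Only the standard library is used.

```python
from email.message import EmailMessage


def tx_stalled(previous_count, current_count):
    """Return True if the tx-processed counter has not increased.

    Note: a counter that resets after a node restart would also look
    stalled here; a real script would need to handle that case.
    """
    return current_count <= previous_count


def build_alert(subject, body, sender, recipient):
    """Build an email alert message (sending is left to the caller)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


if tx_stalled(previous_count=1200, current_count=1200):
    alert = build_alert(
        "Relay: no new transactions processed",
        "The tx-processed counter has not increased since the last check.",
        "monitor@example.com",   # hypothetical sender
        "operator@example.com",  # hypothetical recipient
    )
    # To send, one could use smtplib, for example:
    #   with smtplib.SMTP("localhost") as smtp:
    #       smtp.send_message(alert)
    print(alert["Subject"])
```

Run from cron every few minutes, a check like this keeps the monitoring side very light, at the cost of having no history or graphs.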

Recently, I’ve been getting false alarms when the relay doesn’t restart within 5 minutes of a topology update taking effect. It seems that the Alonzo-enabled versions of cardano-node take longer to restart.
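One way to reduce such false alarms is to make the restart grace period configurable instead of fixed. The sketch below assumes the script already knows when the restart was triggered and whether the node is responding; the function name and the 10-minute value are illustrative assumptions, and only the 5-minute figure comes from the post above.

```python
def should_alert(seconds_since_restart, node_is_up, grace_seconds=300):
    """Alert only if the node is still down after the grace period."""
    if node_is_up:
        return False
    return seconds_since_restart > grace_seconds


# With the 5-minute window, a node still starting at 6 minutes raises an alarm.
print(should_alert(360, node_is_up=False))
# With a longer, assumed 10-minute window for slower restarts, it does not.
print(should_alert(360, node_is_up=False, grace_seconds=600))
```

The right window depends on how long the node actually takes to restart on the given hardware, so it is worth measuring a few restarts before picking a value.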

jf3110 · 11 October 2021 10:48

If anyone is interested, there’s an example Python script for topology-updater on GitHub:
