I recommend you check out "Guide: How to build a Cardano Stake Pool" on CoinCashew. It’s an excellent guide. Just today I finished configuring the pool on the testnet, with Grafana and Prometheus. The guide has a section dedicated to this last point. NOTE: no dedicated node is required (Grafana is installed on one of the relay nodes).
I too suggest going with a separate, cheap node hosting Grafana and Prometheus. Reasons: mostly what @hanswurst mentioned, plus there is no need to expose additional ports or run additional software on the Cardano node, which might affect security (more attack surface).
I have nodes running in the TestNet with Prometheus and Grafana all up and working so all is good.
I am busy designing my MainNet architecture and would like to keep monitoring software off my nodes. However, I read somewhere that DBSync should be run on the same server as the node because the amount of data written is too much for a LAN.
The comment above applies to Prometheus and Grafana, especially the Grafana frontend. The node would be just a metrics aggregator. My monitoring node consumes about 300MB of RAM. The load is very low too. The disk space may be a factor due to how much data is being stored. I suggest you give it a try and see how it goes. You still need to run the Prometheus metrics exporters on all nodes that are monitored.
That should be more than enough to run Prometheus and Grafana. The disk space depends on the retention time and number of metrics. I don’t have disk space shortages but let’s say for a 15 day retention time I think you should be good with 15+GB (including the OS).
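To put a rough number on the retention point above, here is a minimal sketch of the sizing rule from the Prometheus storage documentation (retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). The series count and scrape interval below are made-up example values, not measurements from a real Cardano node.

```python
def samples_per_second(num_series, scrape_interval_s):
    """Ingestion rate when every series is scraped once per interval."""
    return num_series / scrape_interval_s


def estimate_prometheus_disk_gb(retention_days, samples_per_sec, bytes_per_sample=2.0):
    """Rough on-disk size of the Prometheus data, in GB (10**9 bytes)."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_sec * bytes_per_sample / 1e9


# Hypothetical setup: 3 monitored nodes, 1500 series each, scraped every 15 s.
rate = samples_per_second(num_series=3 * 1500, scrape_interval_s=15)
print(round(estimate_prometheus_disk_gb(15, rate), 2))  # prints 0.78
```

Under these assumptions the metrics themselves stay well below 1 GB for 15 days, so most of a 15+GB disk would go to the OS and other software.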
Actually, I decided to use some simple Python scripts for monitoring, which also send alarms by email. That way, I don’t need to watch Grafana graphs and can still step in if something fails.
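For anyone wanting to try the same approach, the email part could look roughly like the sketch below. It uses only the Python standard library; the addresses, the SMTP host and the port are placeholders, and a real mail provider will usually also need TLS and a login.

```python
import smtplib
from email.message import EmailMessage


def build_alert(subject, body, sender, recipient):
    """Build a plain-text alert email."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_alert(msg, host="localhost", port=25):
    """Send the alert through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_alert(
        subject="[pool] relay check failed",
        body="The relay did not answer the last health check.",
        sender="monitor@example.com",    # placeholder address
        recipient="operator@example.com",  # placeholder address
    )
    print(alert)  # call send_alert(alert) once an SMTP server is available
```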
Currently, I’m monitoring tx processed on the relay and the node, plus the return codes from topology-update. I used to monitor missed slots as well, but that’s no longer a major concern.
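A check on processed transactions could be sketched as below: read the node's Prometheus text endpoint, pick out the counter, and raise an alarm if it has not moved since the previous run. The URL and the metric name are assumptions (check your node's config and its actual /metrics output), and the parser is deliberately naive: it ignores optional timestamps.

```python
from urllib.request import urlopen

# Assumed values; adjust to your node's Prometheus settings and metric names.
METRICS_URL = "http://127.0.0.1:12798/metrics"
TX_METRIC = "cardano_node_metrics_txsProcessedNum_int"


def parse_metrics(text):
    """Parse 'name value' lines of the Prometheus text format into a dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines this simple parser cannot read
    return metrics


def fetch_metrics(url=METRICS_URL):
    """Download and parse the metrics page."""
    with urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def tx_stalled(previous, current):
    """True if the processed-tx counter did not increase between two checks."""
    return current <= previous
```

A cron job could store the last value of `TX_METRIC` in a small file and compare it with the new reading on each run.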
Recently, I’ve been getting false alarms because the relay does not restart within 5 minutes after topology updates take effect. It seems that the Alonzo-enabled versions of cardano-node take longer to restart.
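One way to cut down on such false alarms is to poll the relay with a longer, configurable grace period instead of a fixed 5 minute check. The sketch below is only an illustration: the 15 minute timeout, the 30 second interval, and the host and port in the usage comment are assumed values, and an open TCP port is just a rough proxy for "the node is back".

```python
import socket
import time


def port_open(host, port, timeout_s=5):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_until_up(is_up, timeout_s=900, interval_s=30, sleep=time.sleep):
    """Poll is_up() until it returns True or timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_up():
            return True
        if time.monotonic() >= deadline:
            return False  # only now is it worth raising an alarm
        sleep(interval_s)


# Usage (hypothetical relay address and port):
# if not wait_until_up(lambda: port_open("relay1.example.com", 6000)):
#     ...send the alarm email...
```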