Can no longer get Grafana and Prometheus monitoring to work

Hi there - I have been able to use Grafana and Prometheus monitoring on all my nodes for the past year, but since upgrading them all to 1.29.0 I can no longer get the monitoring to work. The nodes themselves all run fine and their setup is OK. All my nodes are fresh installs, and I followed the same procedure as before to install Grafana/Prometheus using the ./setup_mon.sh script. I have double and triple checked all the IP, port and firewall settings and everything appears fine, but when I go to connect the Prometheus Data Source in Grafana and try to save the settings I keep getting the error “HTTP Error Bad Gateway”. It is really frustrating, as I have watched all the videos and checked the relevant community channels and cannot find anything that solves this. Any help would be much appreciated.

I don’t recall if this is the error I had, but my fix was in mainnet-config.json. On line 65 of my mainnet-config.json I had to set the listen IP to 0.0.0.0. The default config file only listens on 127.0.0.1; changing it was the one sed command I missed when I rebuilt my nodes.

"hasEKG": 12788,
"hasPrometheus": [
"0.0.0.0",
12798
],
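
For anyone who wants to check this without counting lines, here is a minimal Python sketch that reads the `hasPrometheus` entry and reports whether it is bound to a loopback address. The function names are made up for illustration, and the sample string only mirrors the fragment above; in practice you would pass in the contents of your own config file.

```python
import json

def prometheus_listener(config_text):
    """Return (host, port) from the hasPrometheus entry of a node config."""
    config = json.loads(config_text)
    host, port = config["hasPrometheus"]
    return host, port

def reachable_from_other_hosts(host):
    """A loopback bind address only accepts connections from the same machine."""
    return host not in ("127.0.0.1", "localhost", "::1")

# Sample mirroring the fragment above; replace with open(path).read() for a real file.
sample = '{"hasEKG": 12788, "hasPrometheus": ["0.0.0.0", 12798]}'
host, port = prometheus_listener(sample)
print(host, port, reachable_from_other_hosts(host))
```

With the 0.0.0.0 setting shown above, the last value printed is True; with the default 127.0.0.1 it would be False.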

Thanks - yes I have tried doing exactly what you say but still get the same error - I will try again though just to be sure and report back

Yes just checked - lines 53 to 56 in my config.json file are as you say

"hasEKG": 12788,
"hasPrometheus": [
"0.0.0.0",
12798
],

Not sure what I can try next?

Can you run ‘curl http://core.ip:12798/metrics’ from your relay node and get a result?
If not, can you run ‘curl http://127.0.0.1:12798/metrics’ from a relay or core node’s own command line and get results? If you can get data from the curl command on 127.0.0.1 but not from another node using the core’s IP address, then the node may only be listening on 127.0.0.1.
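
If you would rather script the same check, below is a rough Python equivalent of the curl test using only the standard library. It is a sketch, not part of any official tooling: `fetch_metrics` and `metric_names` are names I made up, port 12798 is taken from the config above, and the sample metric lines are invented purely to show the parsing.

```python
import urllib.request

def fetch_metrics(host, port=12798, timeout=5):
    """Return the metrics text, or None on a connection error or timeout."""
    url = f"http://{host}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"{url}: {exc}")
        return None

def metric_names(text):
    """Extract metric names from Prometheus text-format output."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Drop any {label="..."} part, then take the name before the value.
            names.append(line.split("{")[0].split()[0])
    return names

# Invented sample output, only to demonstrate the parsing step.
sample = '# HELP example_metric demo\nexample_metric 1\nother_metric{kind="x y"} 2\n'
print(metric_names(sample))
```

To use it, call `fetch_metrics("127.0.0.1")` on the node itself and then `fetch_metrics` with the node’s real IP address from the machine running Grafana; if the first returns text and the second returns None, the problem is between the two machines rather than in the node.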

Sure - I will try that now

When I try curl http://core.ip:12798/metrics I just get
curl: (6) Could not resolve host: core.ip

but when I try curl http://127.0.0.1:12798/metrics I get a whole bunch of info, which I guess is the result we want

Well, don’t use core.ip literally. Log in to your relay node and try http://(your core server’s IP):12798/metrics to see if you get the same data dump.

Yes when I run “curl http://My local IP:12798/metrics” I get the same data dump

If you can get the same data from all Cardano nodes when you run that curl command from the node that is running the Grafana server, then you at least know it’s not a Cardano-related problem. It has to do with Prometheus.

Yes the Cardano nodes are all running fine

Sorry, I just realised how dumb it was to use core.ip instead of my IP address - been a long night :slight_smile:

:slight_smile: When I do something like that, I just walk away until the next day. Once you start making basic mistakes, it’s time to take an overnight break pal. We all do it.

Ok - might leave it for now - so something to do with Prometheus then?

It must be. You’ve verified that communication with the Cardano node is working just fine. Maybe there is an extra space in the indentation of the prometheus.yaml file. Prometheus seems to be very, very picky about how its configuration file is laid out.
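
As a quick way to spot that kind of slip, here is a small Python sketch that flags suspicious indentation. It is only a heuristic under my own assumptions: tabs are never valid YAML indentation, but an odd number of spaces can still be legal, so a warning means "look here", not "this is wrong". The function name is made up, and the sample uses a placeholder job name and target address.

```python
def indentation_warnings(yaml_text, step=2):
    """Return (line_number, message) pairs for suspicious leading whitespace."""
    warnings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            warnings.append((number, "tab used for indentation (not allowed in YAML)"))
        elif len(indent) % step != 0:
            warnings.append(
                (number, f"indent of {len(indent)} spaces is not a multiple of {step}")
            )
    return warnings

# Placeholder config with one extra space on the last line.
sample = (
    "scrape_configs:\n"
    "  - job_name: 'example'\n"
    "    static_configs:\n"
    "     - targets: ['10.0.0.1:12798']\n"
)
for number, message in indentation_warnings(sample):
    print(f"line {number}: {message}")
```

Prometheus also ships a promtool binary, and `promtool check config` followed by the path to your config file gives a proper validation rather than a heuristic.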

Thanks - I will have a look at that

OK, so we know the node is working fine and all ports, IPs and firewall settings are OK. We can get metrics from the node, but for some reason Prometheus will not connect. The prometheus.yaml file looks good and I can’t see any issues there, so if anyone has any further ideas please let me know - thanks for your help