My relay keeps restarting every 24 hours

For some strange reason, some of my relays randomly crash. They run on a Google Cloud VPS and were fine since 1.26.1; when I upgraded to 1.26.2, I started again from scratch using the Alex CNTools tutorial because it makes the upgrade easier.
Given my configuration below, how can I check the logs to see what happened?
[Screenshot: node configuration, 2021-04-24 15:30]

You can check the logs:

journalctl -e -f -u cnode.service

Apr 24 14:38:17 cardano-relay-2-migrate systemd[1]: cnode.service: Main process exited, code=killed, status=2/INT

Apr 24 14:38:17 cardano-relay-2-migrate systemd[1]: cnode.service: Succeeded.

Apr 24 14:38:17 cardano-relay-2-migrate systemd[1]: Stopped Cardano Node.

Apr 24 14:38:23 cardano-relay-2-migrate systemd[1]: Started Cardano Node.

Apr 24 14:38:24 cardano-relay-2-migrate cnode[6690]: Failed to query protocol-parameters from node, not yet fully started?

Apr 24 14:38:24 cardano-relay-2-migrate cnode[6690]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.

Apr 24 14:38:25 cardano-relay-2-migrate cnode[6690]: Listening on http://0.0.0.0:12798

Apr 24 14:43:13 cardano-relay-2-migrate systemd[1]: cnode.service: Current command vanished from the unit file, execution of the command list won’t be resumed.

Apr 24 14:46:49 cardano-relay-2-migrate systemd[1]: Stopping Cardano Node…

Apr 24 14:46:49 cardano-relay-2-migrate cnode[6690]: Shutting down…

Apr 24 14:46:49 cardano-relay-2-migrate systemd[1]: cnode.service: Main process exited, code=killed, status=2/INT

Apr 24 14:46:49 cardano-relay-2-migrate systemd[1]: cnode.service: Succeeded.

Apr 24 14:46:49 cardano-relay-2-migrate systemd[1]: Stopped Cardano Node.

Apr 24 14:46:55 cardano-relay-2-migrate systemd[1]: Started Cardano Node.

Apr 24 14:46:55 cardano-relay-2-migrate cnode[9124]: Failed to query protocol-parameters from node, not yet fully started?

Apr 24 14:46:55 cardano-relay-2-migrate cnode[9124]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.

Apr 24 14:46:56 cardano-relay-2-migrate cnode[9124]: Listening on http://0.0.0.0:12798

Apr 24 15:19:22 cardano-relay-2-migrate cnode[9124]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 9197 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --port ${CNODE_PORT} "${host_addr[@]}"

Apr 24 15:19:22 cardano-relay-2-migrate systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a

Apr 24 15:19:22 cardano-relay-2-migrate systemd[1]: cnode.service: Failed with result ‘exit-code’.

Apr 24 15:19:28 cardano-relay-2-migrate systemd[1]: cnode.service: Service RestartSec=5s expired, scheduling restart.

Apr 24 15:19:28 cardano-relay-2-migrate systemd[1]: cnode.service: Scheduled restart job, restart counter is at 1.

Apr 24 15:19:28 cardano-relay-2-migrate systemd[1]: Stopped Cardano Node.

Apr 24 15:19:36 cardano-relay-2-migrate systemd[1]: Started Cardano Node.

Type top and check the CPU and memory usage.

I don't have problems with memory now. By the way, both relays have 8 GB of RAM; I can't understand what's going on. I only have Grafana and Prometheus installed besides cardano-node. Right now it has been running for 4 hours without crashing, but it usually happens after running for more than 12 hours. Is there any way to reduce the number of peers? I have more than 15 peers in the topology from the topology updater - is that really needed?
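For what it's worth, recent versions of the guild-operators topologyUpdater.sh expose a MAX_PEERS user variable that caps how many peers the fetched topology contains - a sketch, assuming that variable name and the default CNTools install path (check your copy of the script first):

```shell
# Assumption: topologyUpdater.sh lives at the CNTools default path and
# defines a MAX_PEERS variable. Lower the peer cap, e.g. to 10:
sed -i 's/^MAX_PEERS=.*/MAX_PEERS=10/' /opt/cardano/cnode/scripts/topologyUpdater.sh
```

The new cap takes effect the next time the updater fetches a topology.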

If both nodes have the same issue, try setting TraceMempool to false in the configuration file on one of them (and keep it under monitoring).
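For reference, one way to flip that flag - the config path here is the CNTools default and an assumption, and jq must be installed:

```shell
# Assumption: the node config lives at the CNTools default path; adjust as needed.
CONFIG=/opt/cardano/cnode/files/config.json
# Set TraceMempool to false, then restart the node so the change takes effect.
jq '.TraceMempool = false' "$CONFIG" > "$CONFIG.tmp" && mv "$CONFIG.tmp" "$CONFIG"
sudo systemctl restart cnode.service
```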

Good, I'll try that on one of them and see what happens. Thank you.

Hi!

Any update on this?

I had a similar issue with exit code 137 after upgrading the relay to 1.26.2. I had to upgrade to an 8 GB instance with 100 GB storage.

Yes, exit code 137 can indicate an out-of-memory kill.

I see you are using a Google Cloud machine, which by default comes without a swap partition! Having swap avoids the node crashing when it runs out of RAM (I tested this). I suggest either adding a bit more RAM to cope with the spikes, or finding out how to add a swap partition.
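In case it helps, a common way to add a swap file on a Linux VM (the 2 GiB size is my choice, not from this thread - tune it to your memory spikes):

```shell
# Create and enable a 2 GiB swap file (requires root).
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it permanent across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

You can verify it with `swapon --show` or `free -h` afterwards.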

I have no idea how to do that - can you show me how you did it?
I had 50 GB of disk and increased it to 100 GB; I already have 8 GB of RAM.

I'll accept that as the solution. I did it just now and haven't had any problems since.
Thanks


Sounds good - you may try rebooting the instance. 8 GB should be enough memory for the relay to boot; 4 GB will definitely throw exit code 137. I tested it on Amazon and Google.


Sorry, I don't get it. My relay runs fine for about 23 hours and then just restarts… my BP doesn't have the same issue even though it follows the same guide and the same version of node and CLI; the relay keeps restarting.
It has swap set up:
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       5.5Gi       1.5Gi       0.0Ki       822Mi       2.1Gi
Swap:         2.0Gi       0.0Ki       2.0Gi

Did you get any log about the reason for the crash?
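If it was the kernel OOM killer again (status 137 is a SIGKILL), the kernel log usually records it - a quick sketch for checking:

```shell
# Look for OOM-killer entries in the kernel ring buffer and the journal.
sudo dmesg -T | grep -iE 'oom|killed process'
journalctl -k --since "1 day ago" | grep -i 'out of memory'
```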

Sorry, I don't…

How do you run the cardano-node - as a service?

It's a service; it crashed after 23 hours of running… my Grafana shows me the time when it crashed.
I checked gLiveView and its uptime had reset, so the service restarted.
[Screenshot: Grafana dashboard, 2021-05-06 17:29]