All nodes keep restarting?

Ideal · 17 August 2021 00:40

Hi all,

For some reason two relay nodes of mine keep restarting now. I see them connect for a short time and then disappear again. I checked the hardware usage and all seemed fine (the 2 nodes are running on 8 vCPU, 8 GB memory, 80 GB disk each).

When I run the journal command I get the following on both nodes:

Aug 17 00:24:09 -: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1549347 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --port ${CNODE_PORT} "${host_addr[@]}"
Aug 17 00:24:24 -: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Aug 17 00:24:26 -: Listening on http://0.0.0.0:12798

I tried shutting down the node and exporting a new socket file (I checked the folder path and it is correct).

export CARDANO_NODE_SOCKET_PATH=/opt/cardano/cnode/sockets/node0.socket

This doesn’t seem to do the trick, however. I checked the CPU usage after a couple of minutes: both nodes are sitting at 100% when looking via top, although the dashboard of the host is saying 25~30%. I also see that the disks have been reading quite a bit since the nodes went offline. However, there is no weird incoming traffic or behavior on the server.

I also checked the gLiveView.sh. The nodes remain on status “starting…”. What am I overlooking? How might I fix this issue?

Help is much appreciated!

Ruslan_Sendecky · 17 August 2021 01:47

Hi,

It is very common for a node to restart if it lacks memory. Two nodes on 8 GB will definitely restart.
A node process will consume around 7.5 GB to 8.5 GB of RAM, so do the maths.
If you have multiple relays on a single VM, provision at least 8 GB per instance. And don’t forget that the OS needs to eat too.
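
A `Killed` line like the one in the journal output of the first post is what the shell prints when a process dies from SIGKILL, and the kernel's out-of-memory killer is one common sender of that signal. A minimal sketch for checking this, assuming the kernel log is readable through `journalctl -k` or `dmesg`; the helper name `find_oom_kills` is made up for illustration:

```shell
# Filter kernel log text on stdin and keep only lines that point at the
# OOM killer. find_oom_kills is a hypothetical helper name, not part of
# any Cardano tooling.
find_oom_kills() {
    grep -Ei 'out of memory|oom-kill|oom_reaper'
}
```

Typical use would be `journalctl -k --no-pager | find_oom_kills` or `sudo dmesg | find_oom_kills`. If a line names `cardano-node` around the time of the restart, memory is the likely cause; an empty result suggests the kill came from somewhere else.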

Ideal · 17 August 2021 01:50

Thanks for answering. Each node has its own VM with 8 GB. I don’t know how both nodes can run for months on end and then both start restarting at the same time. The nodes are in different places, so it can’t be a physical connection error either.

Ruslan_Sendecky · 17 August 2021 01:51

What does “free -h” say?

Ideal · 17 August 2021 01:52

Both present similar numbers

              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       287Mi       119Mi       0.0Ki       7.4Gi       7.3Gi
Swap:            0B          0B          0B

Right now it might have to do with the node socket again (restarted them a couple of minutes ago)

Ruslan_Sendecky · 17 August 2021 01:57

It has probably just restarted and is going through the usual DB scan. Once it runs for a bit it will average at around 7.5 GB. Mine currently runs at 7.973 GB.
In any case, my bet would be on memory. Even if it did run OK before, I’d beef it up. Do an experiment: double the RAM and see how it goes. You can always change it back.
If you are saying that they’d been running like that before then it could be something else…
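
One way to see whether memory use really climbs towards the figures mentioned above after a restart is to sample the resident memory of the node process. This is only a sketch: the helper names `kib_to_gib` and `node_rss_gib` are made up, and it assumes a Linux (procps) `ps` that supports `-C` and a process named `cardano-node`.

```shell
# Convert a size in KiB (the unit ps reports for RSS) to GiB, two decimals.
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib / 1048576 }'
}

# Print the summed resident memory, in GiB, of all processes with the
# given command name (default: cardano-node). Prints 0.00 if none run.
node_rss_gib() {
    total=$(ps -o rss= -C "${1:-cardano-node}" | awk '{ s += $1 } END { print s + 0 }')
    kib_to_gib "$total"
}
```

Running something like `while sleep 60; do date; node_rss_gib; done` next to `free -h` shows how close the process gets to the VM's total memory before it is killed.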

DevJohn · 17 August 2021 04:37

These are the numbers when your nodes are “relaxed”, but they actually consume around 7GB when running under stress. So you need to upgrade your hardware.

Alexd1985 · 17 August 2021 07:09

Go to the configuration file, set TraceMempool to false, save the file and restart the node.

Is it starting now?
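
For anyone who prefers to make that change from the command line, here is a minimal sketch. It assumes the config is a JSON file containing a line such as `"TraceMempool": true,`; the helper name `disable_trace_mempool` is made up, and the file path is whatever your own setup uses.

```shell
# Set "TraceMempool" to false in a cardano-node JSON config file.
# The original file is kept next to it with a .bak suffix.
# disable_trace_mempool is a hypothetical helper; the caller passes the path.
disable_trace_mempool() {
    sed -E -i.bak 's/("TraceMempool"[[:space:]]*:[[:space:]]*)true/\1false/' "$1"
}
```

Call it with the path to the node's config file (for example the file that the `CONFIG` variable in `cnode.sh` points to), check the result with `grep TraceMempool` on that file, then restart the node the way you normally do.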

Ideal · 17 August 2021 09:33

Just woke up (after a couple of hours of updating, upgrading, checking and restarting the nodes and letting them sync and take their time). Both nodes seem to be working again and are stable once more. The hardware seems to be sufficient too.

I will keep the TraceMempool change in mind in case the nodes start behaving weirdly again.

Anton_io · 26 October 2021 22:48

Hi, I had the same issue when one of my nodes started restarting by itself. I had 8 GB of memory and this is not enough at the current epoch (298). What helped me (and hopefully helps anyone in the future who has the same issue) was setting up a swap file.

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
sudo swapon --show
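
To confirm later, for instance after a reboot, that the fstab entry took effect, the kernel's own figure can be read from `/proc/meminfo`. A small sketch; the helper name `swap_total_gib` is made up for illustration:

```shell
# Read /proc/meminfo-style text on stdin and print total swap in GiB.
swap_total_gib() {
    awk '/^SwapTotal:/ { printf "%.1f\n", $2 / 1048576 }'
}
```

`swap_total_gib < /proc/meminfo` should report a value close to the size of the swap file; a value of 0.0 means no swap is active.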
