For some reason, two of my relay nodes keep restarting now. I see them connect for a short while and then disappear again. I checked the hardware usage and everything seemed fine (the two nodes are running on 8 vCPU, 8GB MEM, 80GB disk each).
When I run the journal command I get the following on both nodes:
Aug 17 00:24:09 -: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1549347 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --port ${CNODE_PORT} "${host_addr[@]}"
Aug 17 00:24:24 -: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Aug 17 00:24:26 -: Listening on http://0.0.0.0:12798
I tried shutting down the node and exporting a new socket file (I checked the folder path and it is correct).
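Roughly what I did, in case it matters (a sketch; cnode.service as the systemd unit name is an assumption based on my setup, and the socket path is whatever CARDANO_NODE_SOCKET_PATH points to in yours):

```bash
# stop the node service so nothing is holding the socket
sudo systemctl stop cnode.service

# remove the stale socket left behind by the killed process
rm -f "${CARDANO_NODE_SOCKET_PATH}"

# start the node again; it recreates the socket on startup
sudo systemctl start cnode.service
```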
This doesn’t seem to do the trick, however. I checked the CPU usage after a couple of minutes: both nodes are sitting at 100% according to top, while the host’s dashboard reports only 25~30%. I also see that since the nodes went offline the disks have been reading quite a bit. However, there is no unusual incoming traffic or behavior on the server.
I also checked gLiveView.sh. The nodes remain on status “starting…”. What am I overlooking, and how might I fix this?
It is very common for a node to restart if it lacks memory. Two nodes on 8GB will definitely restart.
A node process will consume around 7.5-8.5GB of RAM, so do the maths.
If you have multiple relays on a single VM, provision at least 8GB per instance. And don’t forget that the OS needs to eat too.
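You can confirm whether it is the OOM killer, and see what the node actually uses, with standard tools (nothing Cardano-specific here beyond the process name):

```bash
# kernel log: a "Killed" status like yours is usually the OOM killer
journalctl -k | grep -iE "out of memory|oom-kill"

# overall memory headroom on the VM
free -h

# resident memory (RSS, in KB) of the running cardano-node process
ps -o pid,etime,rss,comm -C cardano-node
```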
Thanks for answering. Each node has its own VM with 8GB. I don’t know how both nodes could run for months on end and then suddenly both keep restarting at the same time. The nodes are in different locations, so it can’t be a physical connection error either.
It has probably just restarted and is going through the usual DB scan. Once it runs for a bit it will average at around 7.5GB. Mine currently runs at 7.973GB.
In any case, my bet would be on memory. Even if it did run OK before, I’d beef it up. Do an experiment: double the RAM and see how it goes. You can always change it back.
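If you want numbers before resizing, something like this will log the resident set once a minute so you can see where it settles:

```bash
# log the node's resident memory once a minute (Ctrl-C to stop)
while true; do
  ps -o rss= -C cardano-node | awk -v ts="$(date '+%F %T')" '{printf "%s  cardano-node RSS: %.2f GB\n", ts, $1/1048576}'
  sleep 60
done
```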
If you are saying that they’d been running like that before, then it could be something else…
These are the numbers when your nodes are “relaxed”; they actually consume around 7GB when running under stress, so you need to upgrade your hardware.
Just woke up (after a couple of hours of updating, upgrading, checking and restarting the nodes and letting them sync and take their time). Both nodes seem to be working again and are stable once more. The hardware seems to be sufficient too.
I will keep the TraceMempool change in mind in case the nodes start behaving weirdly again.
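For anyone following along, as far as I understand that change just means turning the mempool tracer off in the node’s config JSON; a sketch (the config path is from my cnode setup and may differ on yours):

```bash
# turn the mempool tracer off (reduces tracing overhead; note gLiveView's mempool figures rely on this tracer)
CONFIG=/opt/cardano/cnode/files/config.json
jq '.TraceMempool = false' "$CONFIG" > "${CONFIG}.tmp" && mv "${CONFIG}.tmp" "$CONFIG"

# restart the node for the change to take effect
sudo systemctl restart cnode.service
```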
Hi, I had the same issue when one of my nodes started restarting by itself. I had 8GB of memory, and that is not enough at the current epoch (298). What helped me (and hopefully will help anyone in the future who has the same issue) was setting up a swap file.
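Roughly how I set it up, for anyone who wants to do the same (8GB here is just an example size):

```bash
# create and enable an 8GB swap file (use dd instead of fallocate if your filesystem doesn't support it)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# verify
swapon --show
free -h
```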