CNTools systemd service keeps crashing / restarting node

Hi everyone,

Been struggling with this issue for a little while. When I deploy the CNTools systemd service to my block producers (seems to work fine on the relays) the node will restart every ten minutes or so. The logs don’t report anything out of the ordinary and when I start the node manually it works fine.

The only error I get is the following: “Syntax error: standard in line 1” (or something like that), overlaid on gLiveView. Wondering if anyone has experienced the same.

Also, when the node crashes, it can take up to 45 min to restart. This time goes down to a few minutes if the node is properly shut down (with SIGINT), but that still seems abnormally long. Not sure if these issues are related.

Would appreciate any suggestions for debugging!

Thanks,
Aleks

journalctl -e -f -u cnode
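
Rather than scrolling the live view, the unit's log can also be filtered to count how often the node was killed. A minimal sketch, assuming the cnode.sh wrapper prints a line containing "Killed" for each crash, as it does in the logs later in this thread; count_kills is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# count_kills (hypothetical helper): read journal lines on stdin and count how
# many times the cnode.sh wrapper reported the cardano-node child as "Killed".
# grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps status 0.
count_kills() {
  grep -c ' Killed ' || true
}

# Usage:
#   journalctl -u cnode --since today --no-pager | count_kills
```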

What is the hardware configuration?

Hardware: 8 GB RAM, 4 CPUs and 160 GB SSD

Nothing in those logs, just me manually stopping the service:

Jun 04 15:17:28 blockproducer-toronto cnode[1164467]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:17:33 blockproducer-toronto cnode[1164705]: Listening on http://127.0.0.1:12798
Jun 04 15:28:14 blockproducer-toronto cnode[1164467]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1164705 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:28:24 blockproducer-toronto cnode[1179189]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:28:30 blockproducer-toronto cnode[1179559]: Listening on http://127.0.0.1:12798
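
The timestamps above make the "every ten minutes or so" pattern measurable: the node started listening at 15:17:33 and was killed at 15:28:14, about 641 seconds later. A rough sketch that computes this for every restart, assuming journalctl's default short timestamp format and that each start/kill pair falls on the same day (node_uptimes is a hypothetical name):

```bash
#!/usr/bin/env bash
# node_uptimes (hypothetical helper): for each "Listening on" line followed by a
# "Killed" line, print the number of seconds the node stayed up in between.
# Assumes lines look like "Mon DD HH:MM:SS host unit[pid]: message", so the
# time of day is field 3, and that both lines of a pair are on the same day.
node_uptimes() {
  awk '
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Listening on/      { start = secs($3); up = 1; next }
    / Killed / && up    { d = secs($3) - start; print d; up = 0 }
  '
}

# Usage:
#   journalctl -u cnode --since today --no-pager | node_uptimes
```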

Ok, start the node and open gLiveView; what is the output?

And what do you mean by “When I deploy the CNTools systemd service to my block producers”?

I meant when I use the deploy-as-systemd.sh script on my block producers (I manage two pools). I haven’t had this problem on any of the four relays.

So just now I was running ./cnode.sh in tmux. The node has been up for the last two hours. I stopped it using killall -s SIGINT cardano-node and restarted the cnode service with sudo systemctl restart cnode. The node went live in less than a minute and runs perfectly fine for about 4 minutes before the following suddenly displays in gLiveView and the node restarts:
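
One precaution when switching from the tmux instance to the service is to wait until the SIGINT-ed process has actually exited before starting cnode, so the two never overlap. A sketch of that, escalating to SIGKILL only after a timeout; stop_node, the 600-second timeout, and the assumption of exactly one cardano-node process are all illustrative:

```bash
#!/usr/bin/env bash
# stop_node (hypothetical helper): send SIGINT to PID, wait up to TIMEOUT seconds
# for it to exit, and only then fall back to SIGKILL.
# Returns 0 if the process exited on its own, 1 if SIGKILL was needed.
stop_node() {
  local pid=$1 timeout=${2:-300} waited=0
  kill -INT "$pid" 2>/dev/null || return 0   # no such process: nothing to stop
  # kill -0 sends no signal; it only checks whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (assumes a single cardano-node process):
#   stop_node "$(pgrep -x cardano-node)" 600 && sudo systemctl restart cnode
```

Because a forced kill returns 1, the && in the usage line skips the restart in that case and leaves the decision to you.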

[Screenshot from 2021-06-04 16-32-44]

It then takes anywhere from 10 to 40 minutes for the node to start again.

I’ve re-downloaded the scripts with ./prereqs.sh several times now and even rebuilt the block producer from scratch, but the problem persists.

Logs for the past day:

Jun 04 15:21:26 blockproducer-frankfurt cnode[3354]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:21:28 blockproducer-frankfurt cnode[3621]: Listening on http://127.0.0.1:12798
Jun 04 15:25:54 blockproducer-frankfurt cnode[3354]: /opt/cardano/cnode/scripts/cnode.sh: line 57:  3621 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:26:01 blockproducer-frankfurt cnode[12878]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:26:02 blockproducer-frankfurt cnode[13164]: Listening on http://127.0.0.1:12798
Jun 04 15:38:56 blockproducer-frankfurt cnode[12878]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 13164 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:39:02 blockproducer-frankfurt cnode[44073]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:39:04 blockproducer-frankfurt cnode[44362]: Listening on http://127.0.0.1:12798
Jun 04 15:48:53 blockproducer-frankfurt cnode[44362]: Shutting down..
Jun 04 20:07:53 blockproducer-frankfurt cnode[541857]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:07:54 blockproducer-frankfurt cnode[542123]: Listening on http://127.0.0.1:12798
Jun 04 20:13:01 blockproducer-frankfurt cnode[541857]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 542123 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:13:07 blockproducer-frankfurt cnode[548154]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:13:08 blockproducer-frankfurt cnode[548457]: Listening on http://127.0.0.1:12798
Jun 04 20:25:20 blockproducer-frankfurt cnode[562284]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:25:21 blockproducer-frankfurt cnode[562539]: Listening on http://127.0.0.1:12798
Jun 04 20:29:27 blockproducer-frankfurt cnode[562284]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 562539 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:29:33 blockproducer-frankfurt cnode[571320]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:29:34 blockproducer-frankfurt cnode[571643]: Listening on http://127.0.0.1:12798

Try journalctl -e -f -u cnode

So, how many nodes do you run on this server?

You said that you manage 2 pools… both on this server?

No, I run separate servers for each pool.

This is the output from journalctl -e -f -u cnode. It actually looks like something is killing the process. The last two times it died, it wasn’t anything I did.

Jun 04 20:25:21 blockproducer-frankfurt cnode[562539]: Listening on http://127.0.0.1:12798
Jun 04 20:29:27 blockproducer-frankfurt cnode[562284]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 562539 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:29:33 blockproducer-frankfurt cnode[571320]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:29:34 blockproducer-frankfurt cnode[571643]: Listening on http://127.0.0.1:12798
Jun 04 20:42:02 blockproducer-frankfurt cnode[571320]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 571643 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:42:08 blockproducer-frankfurt cnode[590145]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:42:10 blockproducer-frankfurt cnode[590404]: Listening on http://127.0.0.1:12798
Jun 04 20:54:16 blockproducer-frankfurt cnode[590145]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 590404 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:54:23 blockproducer-frankfurt cnode[600026]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:54:24 blockproducer-frankfurt cnode[600293]: Listening on http://127.0.0.1:12798
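
The word "Killed" in these lines is what bash prints when a child process dies from signal 9 (SIGKILL). SIGKILL cannot be caught, so the node gets no chance to clean up, which fits the "not cleanly shutdown" warning after every restart. The log does not say who sent the signal. The sketch below (signal_name is a hypothetical helper) shows the shell convention of reporting death by signal N as exit status 128 + N:

```bash
#!/usr/bin/env bash
# signal_name (hypothetical helper): a shell reports a child terminated by
# signal N with exit status 128+N; translate such a status back to a name.
signal_name() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    kill -l $((status - 128))
  else
    echo "none (normal exit, status $status)"
  fi
}

# Demonstration: SIGKILL a throwaway background process and inspect its status.
sleep 30 &
pid=$!
kill -KILL "$pid"
status=0
wait "$pid" || status=$?
echo "exit status $status -> SIG$(signal_name "$status")"   # prints: exit status 137 -> SIGKILL
```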

I'm familiar with Linux but not a sysadmin, so I'm not sure how to figure out what's killing it.

Could be memory issues… open top or htop and monitor memory usage while CNTools is starting/running.
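
To catch the jump without watching top by eye, available memory can be logged with timestamps and lined up against the journal afterwards. A minimal sketch, assuming a Linux kernel that reports MemAvailable in /proc/meminfo; mem_available_mb and watch_mem are hypothetical names:

```bash
#!/usr/bin/env bash
# mem_available_mb (hypothetical helper): print MemAvailable in MiB from a
# /proc/meminfo-style file (the kernel reports the value in kB).
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# watch_mem (hypothetical helper): print a timestamped sample every 10 seconds
# until interrupted, so spikes can be matched to the cnode journal timestamps.
watch_mem() {
  while :; do
    printf '%s MemAvailable=%sMiB\n' "$(date '+%H:%M:%S')" "$(mem_available_mb)"
    sleep 10
  done
}
```

Running watch_mem in a second tmux pane while the service starts should show whether MemAvailable approaches zero just before each "Killed" line. On procps-based systems, ps -o rss= -C cardano-node prints the node's own resident memory in KiB.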

Ok, it looks like you’re on to something. When the node starts, the server's memory usage is about 4.11 GB. Then it suddenly jumps to well over 7 GB.
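
Over 7 GB used on an 8 GB machine leaves very little headroom, and when memory runs out the kernel's OOM killer terminates a process with SIGKILL, which would match the "Killed" lines. One way to confirm is to look for OOM reports in the kernel log and compare their PIDs with the cnode lines. A sketch, assuming the kernel's "Out of memory: Kill(ed) process PID (name)" wording, which differs slightly between kernel versions; oom_victims is a hypothetical helper:

```bash
#!/usr/bin/env bash
# oom_victims (hypothetical helper): read kernel log lines on stdin and print the
# PIDs the OOM killer chose for cardano-node. Matches both the older
# "Kill process N (name)" and the newer "Killed process N (name)" wording.
oom_victims() {
  sed -n -E 's/.*Out of memory: Kill(ed)? process ([0-9]+) \(cardano-node\).*/\2/p'
}

# Usage (kernel messages for the current boot):
#   journalctl -k --no-pager | oom_victims
# or:
#   sudo dmesg -T | oom_victims
```

A PID such as 562539 showing up both here and in a cnode "Killed" line would point to memory exhaustion as the cause.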

Note that this doesn’t happen when NOT using the systemd service (i.e. when just running ./cnode.sh). I also noticed that fail2ban is using up to 10% CPU on the server that is running the systemd service right now.

cardano-node 1.27.0 uses over 7 GB of RAM.

Ahh that makes sense. This did start becoming an issue after the upgrade.

So the reason it works with just ./cnode.sh is because it doesn’t have other services running in the background?

With smart contracts coming out what do you think sufficient hardware configuration will be for the next 6 - 12 months?

Try increasing the size of your swap file. It won't be fast, but it might stop the crashing. Also, whenever you see the "was not cleanly shutdown" warning, make sure you kill the node process: ps -aux will show all processes, then kill -9 <pid> will kill the process. Then start the node again; you don't want that error message in your logs.
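
The swap-file suggestion as a sketch: the commands follow the usual fallocate/chmod/mkswap/swapon sequence, but the helper names, the 8G size, and the DRY_RUN switch are illustrative assumptions. It needs root; previewing with DRY_RUN=1 first prints the commands without running them:

```bash
#!/usr/bin/env bash
# run (hypothetical helper): execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# make_swap (hypothetical helper): create and enable a swap file of SIZE at PATH.
make_swap() {
  local size=${1:?size, e.g. 8G} path=${2:-/swapfile}
  run fallocate -l "$size" "$path"   # some filesystems do not support this for swap; dd is the usual fallback
  run chmod 600 "$path"              # swapon warns about world-readable swap files
  run mkswap "$path"
  run swapon "$path"
}

# Usage:
#   DRY_RUN=1 make_swap 8G /swapfile   # preview
#   sudo bash -c '. ./make_swap.sh && make_swap 8G /swapfile'
```

To keep the swap file across reboots, it is normally also listed in /etc/fstab with a line such as /swapfile none swap sw 0 0.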