CNTools systemd service keeps crashing / restarting node

adjuric · 4 June 2021 15:46

Hi everyone,

I’ve been struggling with this issue for a little while. When I deploy the CNTools systemd service to my block producers (it seems to work fine on the relays), the node restarts every ten minutes or so. The logs don’t report anything out of the ordinary, and when I start the node manually it works fine.

The only error I see is something like “Syntax error: standard in line 1”, overlaid on gLiveView. Has anyone experienced the same?

Also, when the node crashes, it can take up to 45 minutes to restart. This drops to a few minutes if the node is shut down properly (with SIGINT), but that still seems abnormally long. I’m not sure whether these issues are related.

Would appreciate any suggestions for debugging!

Thanks,
Aleks

Alexd1985 · 4 June 2021 17:22

journalctl -e -f -u cnode

What is the hardware configuration?

adjuric · 4 June 2021 17:36

Hardware: 8 GB RAM, 4 CPUs and 160 GB SSD

Nothing in those logs, just me manually stopping the service:

Jun 04 15:17:28 blockproducer-toronto cnode[1164467]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:17:33 blockproducer-toronto cnode[1164705]: Listening on http://127.0.0.1:12798
Jun 04 15:28:14 blockproducer-toronto cnode[1164467]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1164705 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:28:24 blockproducer-toronto cnode[1179189]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:28:30 blockproducer-toronto cnode[1179559]: Listening on http://127.0.0.1:12798

Alexd1985 · 4 June 2021 17:44

OK, start the node and open gLiveView; what is the output?

And what do you mean by “When I deploy the CNTools systemd service to my block producers”?

adjuric · 4 June 2021 20:37

I meant when I use the deploy-as-systemd.sh script on my block producers (I manage two pools). I haven’t had this problem on any of the four relays.

Just now I was running ./cnode.sh in tmux, and the node had been up for the last two hours. I stopped it with killall -s SIGINT cardano-node and restarted the cnode service with sudo systemctl restart cnode. The node went live in less than a minute and ran perfectly fine for about 4 minutes before the following suddenly appeared in gLiveView and the node restarted:

[Screenshot: gLiveView, 2021-06-04 16:32:44]

It then takes anywhere from 10 to 40 minutes for the node to start again.

I’ve re-downloaded the scripts with ./prereqs.sh several times now and even rebuilt the block producer from scratch, but the problem persists.

Logs for the past day:

Jun 04 15:21:26 blockproducer-frankfurt cnode[3354]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:21:28 blockproducer-frankfurt cnode[3621]: Listening on http://127.0.0.1:12798
Jun 04 15:25:54 blockproducer-frankfurt cnode[3354]: /opt/cardano/cnode/scripts/cnode.sh: line 57:  3621 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:26:01 blockproducer-frankfurt cnode[12878]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:26:02 blockproducer-frankfurt cnode[13164]: Listening on http://127.0.0.1:12798
Jun 04 15:38:56 blockproducer-frankfurt cnode[12878]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 13164 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 15:39:02 blockproducer-frankfurt cnode[44073]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 15:39:04 blockproducer-frankfurt cnode[44362]: Listening on http://127.0.0.1:12798
Jun 04 15:48:53 blockproducer-frankfurt cnode[44362]: Shutting down..
Jun 04 20:07:53 blockproducer-frankfurt cnode[541857]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:07:54 blockproducer-frankfurt cnode[542123]: Listening on http://127.0.0.1:12798
Jun 04 20:13:01 blockproducer-frankfurt cnode[541857]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 542123 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:13:07 blockproducer-frankfurt cnode[548154]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:13:08 blockproducer-frankfurt cnode[548457]: Listening on http://127.0.0.1:12798
Jun 04 20:25:20 blockproducer-frankfurt cnode[562284]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:25:21 blockproducer-frankfurt cnode[562539]: Listening on http://127.0.0.1:12798
Jun 04 20:29:27 blockproducer-frankfurt cnode[562284]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 562539 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:29:33 blockproducer-frankfurt cnode[571320]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:29:34 blockproducer-frankfurt cnode[571643]: Listening on http://127.0.0.1:12798
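To see how often this happens over a given period, here is a minimal sketch that counts the “Killed” reports in the cnode journal. The function name count_node_kills is hypothetical, and the pattern is an assumption based only on the cnode.sh log lines quoted above.

```shell
#!/usr/bin/env bash
# Count how many times cnode.sh reported its cardano-node child as "Killed".
# Reads journal text on stdin. The pattern assumes lines shaped like:
#   .../cnode.sh: line 57: 562539 Killed   cardano-node ...
count_node_kills() {
  # grep -c prints the number of matching lines; with no match it prints 0
  # but exits 1, so "|| true" keeps the function's exit status at 0.
  grep -cE '[0-9]+ Killed +cardano-node' || true
}
```

For example: journalctl -u cnode --since today --no-pager | count_node_kills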

Alexd1985 · 4 June 2021 20:41

Try journalctl -e -f -u cnode

So, how many nodes do you run on this server?

Alexd1985 · 4 June 2021 20:46

You said that you manage 2 pools… both on this server?

adjuric · 4 June 2021 21:03

No, I run separate servers for each pool.

This is the output from journalctl -e -f -u cnode. It actually looks like something is killing the process; the last two times it died, it wasn’t anything I did.

Jun 04 20:25:21 blockproducer-frankfurt cnode[562539]: Listening on http://127.0.0.1:12798
Jun 04 20:29:27 blockproducer-frankfurt cnode[562284]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 562539 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:29:33 blockproducer-frankfurt cnode[571320]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:29:34 blockproducer-frankfurt cnode[571643]: Listening on http://127.0.0.1:12798
Jun 04 20:42:02 blockproducer-frankfurt cnode[571320]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 571643 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:42:08 blockproducer-frankfurt cnode[590145]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:42:10 blockproducer-frankfurt cnode[590404]: Listening on http://127.0.0.1:12798
Jun 04 20:54:16 blockproducer-frankfurt cnode[590145]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 590404 Killed                  cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
Jun 04 20:54:23 blockproducer-frankfurt cnode[600026]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Jun 04 20:54:24 blockproducer-frankfurt cnode[600293]: Listening on http://127.0.0.1:12798

I’m familiar with Linux but not a sysadmin, so I’m not sure how to figure out what’s killing it.
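One hint is already in the logs: “Killed” is how bash reports a child process that died from SIGKILL, and the kernel’s out-of-memory (OOM) killer sends exactly that signal. Below is a minimal sketch for checking the kernel log, assuming the usual Linux OOM-killer wording (“invoked oom-killer”, “Out of memory: Killed process”); the function name find_oom_kills is hypothetical.

```shell
#!/usr/bin/env bash
# Print kernel log lines that indicate the OOM killer fired.
# Reads kernel log text on stdin (e.g. from journalctl -k or dmesg).
# Exit status follows grep: 0 if at least one line matched, 1 if none.
find_oom_kills() {
  # "oom-kill" also matches "invoked oom-killer" and the
  # "oom-kill:constraint=..." summary line; matching is case-insensitive.
  grep -iE 'out of memory|oom-kill'
}
```

For example, journalctl -k --no-pager | find_oom_kills checks the current boot (add -b -1 for the previous one), or use sudo dmesg -T | find_oom_kills. A matching line naming cardano-node would point to memory exhaustion rather than to systemd itself.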

Alexd1985 · 4 June 2021 21:21

Could be a memory issue… open top or htop and monitor memory usage while the node is starting and running.
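As a scriptable alternative to watching top or htop, here is a minimal sketch that logs the node’s resident memory and the system’s available memory at a fixed interval. It is Linux-only (it reads /proc), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Helpers to watch memory while the node starts; Linux-only (/proc).

# Resident memory of a process in MiB, from /proc/<pid>/status (VmRSS is in kB).
rss_mib() {
  awk '/^VmRSS:/ { printf "%d\n", $2 / 1024 }' "/proc/$1/status"
}

# Memory the kernel estimates is available for new allocations, in MiB.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Print a timestamped line every INTERVAL seconds (default 10) until Ctrl-C.
watch_node_mem() {
  local interval="${1:-10}" pid
  while true; do
    pid="$(pgrep -x cardano-node | head -n 1)"
    if [[ -n "$pid" ]]; then
      echo "$(date '+%F %T') cardano-node RSS=$(rss_mib "$pid") MiB available=$(mem_available_mib) MiB"
    else
      echo "$(date '+%F %T') cardano-node not running, available=$(mem_available_mib) MiB"
    fi
    sleep "$interval"
  done
}
```

For example, run watch_node_mem 5 in a second tmux pane while doing sudo systemctl restart cnode, and note how close "available" gets to zero just before the node is killed.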

adjuric · 4 June 2021 21:41

OK, it looks like you’re on to something. When the node starts, the server’s memory usage is about 4.11 GB. Then it suddenly jumps to well over 7 GB.

Note that this doesn’t happen when NOT using the systemd service (i.e., just running ./cnode.sh). I also noticed that fail2ban is using up to 10% CPU on the server running systemd right now.

Alexd1985 · 4 June 2021 21:52

1.27.0 uses over 7 GB of RAM.
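On an 8 GB server that leaves very little room for anything else. Here is a rough headroom calculation as a minimal sketch; the 7168 MiB in the example is just “over 7 GB” taken as an illustrative figure, not a measured requirement, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough headroom arithmetic: RAM + swap left over once the node takes NODE_MIB.
# Arguments: MemTotal in KiB, SwapTotal in KiB, expected node usage in MiB.
headroom_mib() {
  local mem_kib="$1" swap_kib="$2" node_mib="$3"
  echo $(( (mem_kib + swap_kib) / 1024 - node_mib ))
}

# Same calculation using this machine's /proc/meminfo (Linux-only).
headroom_here_mib() {
  local node_mib="$1" mem_kib swap_kib
  mem_kib="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
  swap_kib="$(awk '/^SwapTotal:/ { print $2 }' /proc/meminfo)"
  headroom_mib "$mem_kib" "$swap_kib" "$node_mib"
}
```

For example, headroom_mib 8388608 0 7168 prints 1024: about 1 GiB left for the OS, gLiveView, fail2ban and everything else on an 8 GiB machine with no swap. headroom_here_mib 7168 runs the same calculation against the local /proc/meminfo; MemTotal is usually a little below the nominal RAM size. Swap counts toward the total here, but it is far slower than RAM.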

adjuric · 4 June 2021 21:55

Ahh that makes sense. This did start becoming an issue after the upgrade.

So the reason it works with just ./cnode.sh is because it doesn’t have other services running in the background?

adjuric · 4 June 2021 22:21

With smart contracts coming out, what do you think a sufficient hardware configuration will be for the next 6 to 12 months?

santonode · 15 June 2021 17:23

Try increasing the size of your swap file. It won’t be fast, but it might stop the crashing. Also, whenever you see “not cleanly shutdown”, make sure you kill the leftover node process: ps -aux will show all processes, kill -9 <pid> will kill the process, and then start the node again. You don’t want that error message in your logs.
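Here is a minimal sketch of the swap-file step, assuming a Linux filesystem that accepts swap files created with fallocate (where it doesn’t, creating the file with dd if=/dev/zero is the usual alternative). The function names and the DRY_RUN switch are hypothetical; run it as root.

```shell
#!/usr/bin/env bash
# Create and enable a swap file. Run as root. Set DRY_RUN=1 to only print
# the commands instead of executing them.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_swapfile PATH SIZE   (SIZE as accepted by fallocate -l, e.g. 4G)
make_swapfile() {
  local path="$1" size="$2"
  run fallocate -l "$size" "$path" || return  # reserve the space
  run chmod 600 "$path" || return             # root-only; swapon warns about looser permissions
  run mkswap "$path" || return                # write the swap signature
  run swapon "$path"                          # enable now (not persistent across reboots)
}
```

For example, DRY_RUN=1 make_swapfile /swapfile 4G only prints the four commands. To keep the swap after a reboot, add the line /swapfile none swap sw 0 0 to /etc/fstab, and check the result with swapon --show. For spotting leftover node processes, pgrep -a cardano-node is a shorter alternative to scanning ps -aux.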
