Cnode service restarts every 10 min on BP after 1.27.0 upgrade

Hello guys, I upgraded to version 1.27 and everything was successful. The BP started fine and ran for 3 days. Last night, starting at 3 am PST, I noticed the cnode service restarting every 10 minutes or so. gLiveView shows "Starting…" for about 5 minutes, then the node comes up, processes 30-50 transactions, and goes back to "Starting…" again. Please advise where to start looking for why the service restarts itself.

Upgrade the nodes

BP and relays are all running 1.27

I meant upgrade the hardware for the nodes

What is the actual hardware configuration?

This is a VM in Azure, 4 vCPU, 8 GB RAM; everything was running fine for 3 months.

It ran, but starting with 1.27.0 you will need more resources. Perhaps the next version will consume less, but until then you will need the upgrade, or you can go to the configuration file and set TraceMempool to false.

You will not see processed transactions in gLiveView, but at least the node should not restart anymore.
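A minimal sketch of that config change, assuming a JSON-format node config at the guild-operators default path /opt/cardano/cnode/files/config.json (an assumption; adjust it to your layout). The helper name disable_trace_mempool is hypothetical. It only flips the TraceMempool value; the node picks up the change after the cnode service is restarted.

```shell
# disable_trace_mempool CONFIG: hypothetical helper that rewrites
# "TraceMempool": true to "TraceMempool": false in a JSON-format
# cardano-node config, leaving every other key untouched.
disable_trace_mempool() {
  cfg=$1
  # Edit into a temp file first, then copy it back over the original so
  # the config keeps its owner and permissions.
  sed -E 's/("TraceMempool"[[:space:]]*:[[:space:]]*)true/\1false/' "$cfg" > "$cfg.tmp" &&
    cat "$cfg.tmp" > "$cfg" &&
    rm -f "$cfg.tmp"
}

# Usage on the BP (the config path is an assumption):
#   disable_trace_mempool /opt/cardano/cnode/files/config.json
#   sudo systemctl restart cnode
```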

Do you have the specs?

I am looking at the release notes here: Releases · input-output-hk/cardano-node · GitHub
The requirements look the same. How do you know it needs more?

  • An Intel or AMD x86 processor with two or more cores, at 1.6GHz or faster (2GHz or faster for a stake pool or relay)
  • 8GB of RAM
  • 10GB of free storage (20GB for a stake pool)

@Alexd1985 Disabled it on the BP and restarted the service; waiting for it to come back up.


@Alexd1985 TraceMempool disabled. CPU at 17%, RAM at 37%. The service keeps restarting.

journalctl -e -f -u cardano-node

Which file are the logs going to?


-- Logs begin at Wed 2021-02-24 04:33:09 UTC. --
May 24 09:45:54 SPCORENODE01 cnode[1067]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1841 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 09:45:53 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 09:45:53 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 09:45:59 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 09:45:59 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 1.
May 24 09:45:59 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 09:45:59 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 09:45:59 SPCORENODE01 cnode[13596]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 09:46:01 SPCORENODE01 cnode[13596]: Listening on http://0.0.0.0:12798
May 24 09:55:27 SPCORENODE01 cnode[13596]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 14144 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 09:55:27 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 09:55:27 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 09:55:32 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 09:55:32 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 2.
May 24 09:55:32 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 09:55:32 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 09:55:33 SPCORENODE01 cnode[1974]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 09:55:35 SPCORENODE01 cnode[1974]: Listening on http://0.0.0.0:12798
May 24 10:05:05 SPCORENODE01 cnode[1974]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 2459 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:05:05 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:05:05 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:05:11 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:05:11 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 3.
May 24 10:05:11 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:05:11 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 10:05:11 SPCORENODE01 cnode[22898]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 10:05:13 SPCORENODE01 cnode[22898]: Listening on http://0.0.0.0:12798
May 24 10:14:42 SPCORENODE01 cnode[22898]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 23478 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:14:42 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:14:42 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:14:47 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:14:47 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 4.
May 24 10:14:47 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:14:47 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 10:14:48 SPCORENODE01 cnode[11440]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 10:14:50 SPCORENODE01 cnode[11440]: Listening on http://0.0.0.0:12798
May 24 10:24:08 SPCORENODE01 cnode[11440]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 11969 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:24:08 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:24:08 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:24:13 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:24:13 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 5.
May 24 10:24:13 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:24:13 SPCORENODE01 systemd[1]: Started Cardano Node.

May 24 15:18:46 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:18:47 SPCORENODE01 cnode[32392]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 15:18:48 SPCORENODE01 cnode[32392]: Listening on http://0.0.0.0:12798
May 24 15:28:20 SPCORENODE01 cnode[32392]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 459 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 15:28:20 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 15:28:20 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 15:28:25 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 15:28:25 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 8.
May 24 15:28:25 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 15:28:25 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:28:26 SPCORENODE01 cnode[29347]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 15:28:28 SPCORENODE01 cnode[29347]: Listening on http://0.0.0.0:12798
May 24 15:30:54 SPCORENODE01 systemd[1]: Stopping Cardano Node...
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: State 'stop-sigterm' timed out. Killing.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29347 (cnode.sh) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29966 (cardano-node) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=killed, status=9/KILL
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29966 (cardano-node) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'timeout'.
May 24 15:30:59 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 15:30:59 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:31:01 SPCORENODE01 cnode[3283]: Listening on http://0.0.0.0:12798
May 24 15:33:07 SPCORENODE01 systemd[1]: Stopping Cardano Node...
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: State 'stop-sigterm' timed out. Killing.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3283 (cnode.sh) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749 (cardano-node) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=killed, status=9/KILL
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749 (cardano-node) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'timeout'.
May 24 15:33:12 SPCORENODE01 systemd[1]: Stopped Cardano Node.
-- Reboot --
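To put a number on the "every 10 min" cycle, the journal can be measured directly. A minimal sketch with a hypothetical helper name, assuming the systemd journal format shown here (the HH:MM:SS timestamp is the third field) and that no run crosses midnight: it pairs each "Started Cardano Node." line with the next exit that reports status=137 and prints the run length in seconds.

```shell
# run_durations: read cnode journal lines on stdin and print, for each run,
# the seconds between "Started Cardano Node." and the next status=137 exit.
# Assumes all timestamps fall on the same day.
run_durations() {
  awk '
    # Convert an HH:MM:SS timestamp into seconds since midnight.
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Started Cardano Node\./ { start = secs($3) }
    /status=137/ && start != "" { print secs($3) - start; start = "" }
  '
}

# Usage on the node host:
#   journalctl -u cnode --no-pager | run_durations
```

Applied to the journal above, every run lasts between 561 and 574 seconds, so the node is killed a little over 9 minutes after each start. The manual stops that end in status=9/KILL are ignored because they do not match status=137.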

Something is killing the node

May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749
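For reading the status=137 lines in the journal: when a child process is terminated by signal N, the shell reports its exit status as 128 + N, and cnode.sh passes that status on to systemd. 137 - 128 = 9, which is SIGKILL. SIGKILL cannot be caught, so cardano-node never gets to shut down cleanly, which fits the "not cleanly shutdown, socket file still exists" warning on every restart. A minimal sketch of the decoding (the helper name is hypothetical):

```shell
# signal_from_status CODE: decode an exit status using the shell
# convention that a child killed by signal N is reported as 128 + N.
signal_from_status() {
  code=$1
  if [ "$code" -gt 128 ]; then
    # kill -l with a signal number prints that signal's name, e.g. KILL
    kill -l "$((code - 128))"
  else
    echo "not a signal exit"
  fi
}

# Usage:
#   signal_from_status 137   # prints KILL (signal 9)
```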

How do I know which process that is?
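Process 3749 is named in the journal itself: systemd lists it as cardano-node and sent that SIGKILL itself after the 'stop-sigterm' timeout during a manual stop. While a process is still alive, ps -p PID -o comm= prints its name. The more useful question is what sends the SIGKILL during the unattended ~10-minute cycles. On Linux a common sender for a memory-hungry process is the kernel OOM killer, which records its action in the kernel log rather than in the cnode unit's journal. A minimal sketch, assuming the kernel log is reachable via journalctl -k or dmesg; the message wording varies between kernel versions, so the pattern is deliberately loose, and the helper name is hypothetical.

```shell
# find_oom_kills: read kernel log text on stdin and print only the lines
# where the OOM killer fired. Wording differs between kernel versions, so
# several common phrasings are matched case-insensitively.
find_oom_kills() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Usage on the node host (kernel messages, not the cnode unit):
#   journalctl -k --no-pager | find_oom_kills
#   sudo dmesg -T | find_oom_kills
```

If this prints matches timed just before each status=137 exit, memory pressure is the likely cause; if it prints nothing, something else is sending the signal.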

Try restarting the server. I don't know why Cacti shows memory + swap as full.

I have rebooted many times; it does not help.
The swap file is N/A; I do not have one, so that part is fine.
One of the disks is full, but that is the temporary disk; that's just how Azure works.
My production drive is at 92%.
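On the swap point: with no swap configured, the VM has no headroom once cardano-node's resident memory approaches physical RAM, and a one-off reading such as "RAM at 37%" can miss a peak just before the kill. A minimal sketch for checking RAM and swap from /proc/meminfo (the helper name is hypothetical; the kernel reports values in kB, actually KiB, which are converted to MiB here):

```shell
# mem_summary [FILE]: print total RAM, available RAM and total swap in MiB
# from a /proc/meminfo-format file (defaults to the live one).
mem_summary() {
  awk '/^(MemTotal|MemAvailable|SwapTotal):/ {
         sub(":", "", $1)                  # strip the trailing colon
         printf "%s %d MiB\n", $1, $2 / 1024
       }' "${1:-/proc/meminfo}"
}

# Usage on the node host; sample once a minute to catch the climb before a kill:
#   while sleep 60; do date; mem_summary; done
```

A SwapTotal of 0 MiB confirms that no swap is configured.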

Do you have the possibility to update one node for a test?