Cnode service reboots every 10 min on BP after 1.27.0 upgrade

Hello guys, I upgraded to version 1.27 and all was successful. The BP started fine and was working for 3 days. Last night, starting at 3 am PST, I noticed the cnode service restarts every 10 min or so. Live view shows “Starting…”, stays like that for 5 min, then comes up, processes 30-50 transactions, then goes back to “Starting…” again. Please advise where to start looking for why the service restarts itself.

Upgrade the nodes

BP and relays are all running 1.27

I meant upgrade the hardware for the nodes

What is the actual hardware configuration?

This is a VM in Azure, 4 vCPU, 8GB RAM; all was running fine for 3 months.

It ran, but starting with 1.27.0 you will need more resources. Perhaps the next version will consume less, but until then you will need the upgrade, or go to the configuration file and set TraceMempool=false.

You will not see tx processed in gLiveView, but at least the server should not restart anymore.
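
For reference, the suggested config change can be sketched as a small shell helper. It assumes the node config is a JSON file containing a line like `"TraceMempool": true` and that a POSIX `sed` is available. The function name `disable_trace_mempool` and the path in the usage comment are hypothetical, so point it at whatever config file your node actually loads.

```shell
#!/bin/sh
# Sketch: flip "TraceMempool": true to false in a cardano-node JSON config.
# Assumption: the key appears on one line as "TraceMempool": true
disable_trace_mempool() {
    config_file="$1"
    # Keep a backup next to the original before editing.
    cp "$config_file" "$config_file.bak" || return 1
    # Rewrite only the TraceMempool entry; other keys are left untouched.
    sed 's/\("TraceMempool"[[:space:]]*:[[:space:]]*\)true/\1false/' \
        "$config_file.bak" > "$config_file"
}

# Usage (hypothetical path; restart the service afterwards):
#   disable_trace_mempool /path/to/config.json
```

The backup copy makes it easy to turn tracing back on once the node is stable again.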

Do you have specs?

I am looking at Releases · input-output-hk/cardano-node · GitHub
and the requirements look the same as before. How do you know it needs more?

  • An Intel or AMD x86 processor with two or more cores, at 1.6GHz or faster (2GHz or faster for a stake pool or relay)
  • 8GB of RAM
  • 10GB of free storage (20GB for a stake pool)

@Alexd1985 Disabled it on the BP and restarted the service; waiting for it to come back up.

@Alexd1985 TraceMempool disabled. CPU at 17%, RAM at 37%. The service keeps restarting.
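
A single CPU/RAM reading can miss a spike that builds up during the roughly 10 minutes between restarts, so it may help to sample the node's memory over time. A minimal sketch, assuming a Linux host with `/proc` mounted; `rss_mib` is a hypothetical helper name, and the loop in the usage comment additionally assumes `pgrep` is installed.

```shell
#!/bin/sh
# Sketch: print the resident memory (VmRSS) of a PID in MiB, read from /proc.
rss_mib() {
    pid="$1"
    # The status file disappears as soon as the process exits.
    [ -r "/proc/$pid/status" ] || return 1
    # VmRSS is reported in kB; convert to whole MiB.
    awk '/^VmRSS:/ { print int($2 / 1024) }' "/proc/$pid/status"
}

# Usage: log the node's memory every 30 seconds while it is running.
#   while pid=$(pgrep -o -x cardano-node); do
#       echo "$(date '+%H:%M:%S') $(rss_mib "$pid") MiB"
#       sleep 30
#   done
```

If the logged value climbs steadily until just before each restart, that points at memory rather than CPU.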

journalctl -e -f -u cardano-node

What file are the logs going to?


-- Logs begin at Wed 2021-02-24 04:33:09 UTC. --
May 24 09:45:54 SPCORENODE01 cnode[1067]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 1841 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 09:45:53 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 09:45:53 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 09:45:59 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 09:45:59 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 1.
May 24 09:45:59 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 09:45:59 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 09:45:59 SPCORENODE01 cnode[13596]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 09:46:01 SPCORENODE01 cnode[13596]: Listening on http://0.0.0.0:12798
May 24 09:55:27 SPCORENODE01 cnode[13596]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 14144 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 09:55:27 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 09:55:27 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 09:55:32 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 09:55:32 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 2.
May 24 09:55:32 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 09:55:32 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 09:55:33 SPCORENODE01 cnode[1974]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 09:55:35 SPCORENODE01 cnode[1974]: Listening on http://0.0.0.0:12798
May 24 10:05:05 SPCORENODE01 cnode[1974]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 2459 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:05:05 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:05:05 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:05:11 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:05:11 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 3.
May 24 10:05:11 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:05:11 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 10:05:11 SPCORENODE01 cnode[22898]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 10:05:13 SPCORENODE01 cnode[22898]: Listening on http://0.0.0.0:12798
May 24 10:14:42 SPCORENODE01 cnode[22898]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 23478 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:14:42 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:14:42 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:14:47 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:14:47 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 4.
May 24 10:14:47 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:14:47 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 10:14:48 SPCORENODE01 cnode[11440]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 10:14:50 SPCORENODE01 cnode[11440]: Listening on http://0.0.0.0:12798
May 24 10:24:08 SPCORENODE01 cnode[11440]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 11969 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 10:24:08 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 10:24:08 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 10:24:13 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 10:24:13 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 5.
May 24 10:24:13 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 10:24:13 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:18:46 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:18:47 SPCORENODE01 cnode[32392]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 15:18:48 SPCORENODE01 cnode[32392]: Listening on http://0.0.0.0:12798
May 24 15:28:20 SPCORENODE01 cnode[32392]: /opt/cardano/cnode/scripts/cnode.sh: line 57: 459 Killed cardano-node "${CPU_RUNTIME[@]}" run --topology "${TOPOLOGY}" --config "${CONFIG}" --database-path "${DB_DIR}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" --port ${CNODE_PORT} "${host_addr[@]}"
May 24 15:28:20 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=exited, status=137/n/a
May 24 15:28:20 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'exit-code'.
May 24 15:28:25 SPCORENODE01 systemd[1]: cnode.service: Service hold-off time over, scheduling restart.
May 24 15:28:25 SPCORENODE01 systemd[1]: cnode.service: Scheduled restart job, restart counter is at 8.
May 24 15:28:25 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 15:28:25 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:28:26 SPCORENODE01 cnode[29347]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
May 24 15:28:28 SPCORENODE01 cnode[29347]: Listening on http://0.0.0.0:12798
May 24 15:30:54 SPCORENODE01 systemd[1]: Stopping Cardano Node...
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: State 'stop-sigterm' timed out. Killing.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29347 (cnode.sh) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29966 (cardano-node) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=killed, status=9/KILL
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Killing process 29966 (cardano-node) with signal SIGKILL.
May 24 15:30:59 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'timeout'.
May 24 15:30:59 SPCORENODE01 systemd[1]: Stopped Cardano Node.
May 24 15:30:59 SPCORENODE01 systemd[1]: Started Cardano Node.
May 24 15:31:01 SPCORENODE01 cnode[3283]: Listening on http://0.0.0.0:12798
May 24 15:33:07 SPCORENODE01 systemd[1]: Stopping Cardano Node...
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: State 'stop-sigterm' timed out. Killing.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3283 (cnode.sh) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749 (cardano-node) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Main process exited, code=killed, status=9/KILL
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749 (cardano-node) with signal SIGKILL.
May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Failed with result 'timeout'.
May 24 15:33:12 SPCORENODE01 systemd[1]: Stopped Cardano Node.
-- Reboot --
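
A note on reading the log above: systemd reports `status=137` for cnode.sh, and by shell convention an exit status above 128 means the process ended on signal number (status - 128). Here that is 9 (SIGKILL), which matches the `Killed` message printed by cnode.sh. SIGKILL cannot be caught by the node, so it was sent from outside it. A minimal sketch of that arithmetic, with `signal_from_status` as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: map a shell exit status above 128 to the signal that ended the process.
signal_from_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix.
        kill -l "$((status - 128))"
    else
        echo "not a signal exit"
    fi
}

signal_from_status 137   # prints KILL
```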

Something is killing the node

May 24 15:33:12 SPCORENODE01 systemd[1]: cnode.service: Killing process 3749

How do I know which process that is?
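
In that quoted line, 3749 is the PID of the process being killed (the full log entry names it as cardano-node), and systemd is the one sending SIGKILL after the stop request timed out. The earlier `Killed` / `status=137` entries are a different case, and one common cause worth ruling out is the kernel's out-of-memory killer. A rough sketch for checking, assuming a systemd host where `journalctl -k` shows kernel messages; the `filter_oom` helper name and the grep patterns are my own and may need adjusting to the wording your kernel version uses.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that look like OOM-killer activity.
filter_oom() {
    grep -Ei 'out of memory|oom-kill|oom_reaper|killed process'
}

# Usage: scan kernel messages from the current boot (may need sudo).
#   journalctl -k -b | filter_oom
#   dmesg -T | filter_oom
```

If this prints lines naming cardano-node around the restart times, the node is being killed for memory; if it prints nothing, look for whatever else on the host could be sending SIGKILL.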

Try restarting the server. I don’t know why mem + swap shows as full on Cacti.

I have rebooted many times; it does not help.
The swap file shows N/A because I do not have one, so that part is fine.
One of the disks is full, but that is the temp disk; it is just the way Azure works.
My production drive is at 92%.

Do you have the possibility to upgrade one node as a test?