A couple of problems getting a mainnet node started

For about a month I had no issues running a BP and 2 relays on testnet (2 CPUs, 4 GB RAM), part of that time on 1.27.0. I’ve now set up a node (passive for now) to run against mainnet, using version 1.27.0 of cardano-node. First I noticed that while syncing, memory usage would grow until the OS OOM killer killed cardano-node and I had to start over. I was running on 2 dedicated CPUs and 4 GB RAM, so I upped the RAM to 8 GB. After that the process was not killed by the OOM killer, and I was able to make it to the Mary era according to the tip command (epoch 260).

I then decided to shut down the process so I could change the logging settings, using the command “killall -s SIGKILL cardano-node”, thinking that might be a graceful termination of the node. After restarting, it doesn’t work: there is no logging, and when running the tip command I get an error saying the socket doesn’t exist (db/node.socket disappeared and is not created when I restart). So, a couple of questions:

  1. I did see the thread “Relay node syncing issue - 1.27.0 - #6 by laplasz”, which talked about memory usage with version 1.27.0. Is it normal for cardano-node to use over 40% of memory (of 8 GB RAM) while it catches up on syncing, and will that memory usage go down once it has caught up?

  2. Is there a way to cleanly shut down cardano-node without causing problems on restart? I’m wondering if my killall SIGKILL method corrupted the DB.

Hi,

Quick question: which testnet are you on? https://explorer.cardano-testnet.iohkdev.io/en currently shows epoch 136.

8 GB should be the minimum; on mainnet I would go with 12 GB or more.

The best way to start/stop cardano-node is to run it as a service (check the forum for details, or set it up as described here: Launching your Cardano BP node! - Cardano Node Installation and Configuration Guide).

In the worst case, try killall with the SIGINT signal first before going for SIGKILL (after a SIGKILL you will need to re-check the DB).

Check the log files; they will give you a clearer picture of what is happening with the cardano-node process.

Hope this helps

Regarding shutdown: SIGINT is the correct way to shut down. Just for next time 🙂
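If the node is launched from a script rather than as a service, the same advice (SIGINT first, SIGKILL only as a last resort) can be sketched in Python. This is a minimal sketch that assumes you start cardano-node yourself with `subprocess.Popen`; `stop_gracefully` and its 300-second default timeout are hypothetical choices, not part of any Cardano tooling.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout: float = 300.0) -> bool:
    """Ask a child process to stop with SIGINT, escalating to SIGKILL only on timeout.

    Returns True if the process exited without needing SIGKILL, False otherwise.
    The 300 s default is an arbitrary assumption, not a cardano-node requirement.
    """
    if proc.poll() is not None:
        return True  # already exited, nothing to do
    proc.send_signal(signal.SIGINT)  # same signal as `killall -s SIGINT cardano-node`
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Last resort: after a SIGKILL the node re-checks its DB on the next start.
        proc.kill()  # sends SIGKILL on POSIX systems
        proc.wait()
        return False


# Hypothetical usage (fill in your usual cardano-node arguments):
# node = subprocess.Popen(["cardano-node", "run", ...])
# clean = stop_gracefully(node)
```

A `False` result means the node was force-killed, so expect a longer startup next time while it re-checks the DB.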

Thanks @jf3110 and @lauris for the quick responses. I had been running against testnet-magic 1097911063. I had shut down all three testnet nodes several days ago, then today fired up one of the relay nodes to see how it did catching up. It didn’t take long to catch up:

@RelayNode2:/var/log$ cardano-cli query tip --testnet-magic 1097911063
{
  "epoch": 136,
  "hash": "c5a2d8c0f6c8112f71eac9386aad50d4ccb8278d2152417165ff3155f579d9a8",
  "slot": 28545135,
  "block": 2645803,
  "era": "Mary"
}
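To watch sync progress from a script, the JSON printed by `cardano-cli query tip` can be parsed directly. A minimal sketch, assuming `cardano-cli` is on your PATH and `CARDANO_NODE_SOCKET_PATH` points at the running node's socket; the field names are the ones in the output above, and `query_tip`/`summarize_tip` are hypothetical helper names.

```python
from __future__ import annotations

import json
import subprocess


def query_tip(network_args: list[str]) -> dict:
    """Run `cardano-cli query tip` with the given network flags and parse its JSON output."""
    result = subprocess.run(
        ["cardano-cli", "query", "tip", *network_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if, e.g., the node socket is missing
    )
    return json.loads(result.stdout)


def summarize_tip(tip: dict) -> str:
    """One-line summary using the fields shown in the tip output above."""
    return f"era={tip['era']} epoch={tip['epoch']} slot={tip['slot']} block={tip['block']}"


# Example (testnet magic taken from the command above):
# print(summarize_tip(query_tip(["--testnet-magic", "1097911063"])))
```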

It’s running at 32% memory (4 GB RAM max), so maybe that much memory is typical for a testnet relay node. Back to the mainnet passive node (to become a BP node if I decide to make it real): I’ve left it running after the last restart and it’s at 16.7% of the 8 GB RAM. CPU usage is still high. It’s currently at epoch 223. I’m hoping CPU usage will go down after syncing completes and that memory usage stops inching upwards.
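Percentages of different RAM sizes are hard to compare at a glance, so here is the arithmetic for the readings quoted in this thread (taking "over 40%" as exactly 40%). `percent_to_gb` is just an illustrative helper.

```python
def percent_to_gb(percent: float, total_gb: float) -> float:
    """Convert a memory-usage percentage of total RAM into gigabytes (2 decimals)."""
    return round(percent / 100 * total_gb, 2)


# (percent of RAM, total RAM in GB) as reported in this thread
readings = {
    "mainnet node while syncing (question 1)": (40.0, 8),
    "testnet relay, caught up": (32.0, 4),
    "mainnet passive node at epoch 223": (16.7, 8),
}

for label, (pct, total) in readings.items():
    print(f"{label}: ~{percent_to_gb(pct, total)} GB")
```

So the 16.7% reading (about 1.34 GB) is roughly the same absolute amount as the testnet relay's 32% of 4 GB (about 1.28 GB), while the earlier 40%+ reading corresponds to at least 3.2 GB.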

I had run the testnet nodes as a service (systemd), but when I tried using systemd to run the mainnet node, I kept getting an error that a libsodium library could not be found. Something in the profile or settings must be causing that, so in the meantime, just to gauge performance, I’ve run the node as a background process with nohup.

Thanks for the link to the other guide. I had been using the cardano-node documentation (cardano-node Documentation 1.0.0).

For some reason IOG uses an extended version of libsodium from their own GitHub repo. Depending on your setup, you should set LD_LIBRARY_PATH to the directory containing this library. On some systems it might be in /usr/local/lib, on others in $HOME/.local/lib.
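To find which of those directories actually holds the library before setting LD_LIBRARY_PATH, a small check like the one below may help. It is a sketch under assumptions: the candidate list is just the two locations mentioned above, it only looks for files named `libsodium.so*`, and `find_libsodium_dir`/`env_with_libsodium` are hypothetical names.

```python
from __future__ import annotations

import glob
import os


def find_libsodium_dir(candidates: list[str]) -> str | None:
    """Return the first candidate directory containing a libsodium shared object."""
    for directory in candidates:
        directory = os.path.expanduser(directory)  # expand a leading "~"
        if glob.glob(os.path.join(directory, "libsodium.so*")):
            return directory
    return None


def env_with_libsodium(lib_dir: str, base: dict | None = None) -> dict:
    """Copy an environment and prepend lib_dir to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else lib_dir
    return env


# Example with the two locations mentioned above:
# lib_dir = find_libsodium_dir(["/usr/local/lib", "~/.local/lib"])
```

Under systemd, the directory found this way can go straight into the unit file as `Environment=LD_LIBRARY_PATH=/usr/local/lib` (adjust the path), which avoids depending on a shell profile being sourced.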

@jf3110 and @lauris, an update: I got the mainnet node running and it finally synced up. CPU usage dropped way down after it finished catching up; memory stayed high, but at least it stayed steady. I then switched from running ad hoc via nohup to running under systemd, and got past the libsodium-not-found error (I corrected the path to the .profile file in the startup script). However, nothing is being logged to node.log on this second startup, there are no errors in /var/log/syslog, and I get this error when trying to query the tip:

bp-node:~/cardano-node$ cardano-cli query tip --mainnet

cardano-cli: Network.Socket.connect: <socket: 11>: does not exist (No such file or directory)
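That error means cardano-cli could not find the node's socket file. A quick way to tell "the socket was never created" apart from "the path points at the wrong thing" is a check like this minimal sketch; it assumes the socket path comes from `CARDANO_NODE_SOCKET_PATH` (falling back to `db/node.socket`, the path from the original post), and `check_node_socket` is a hypothetical helper.

```python
import os
import stat


def check_node_socket(path: str) -> str:
    """Classify a node socket path as 'ok', 'missing', or 'not a socket'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"  # what cardano-cli reports as "does not exist"
    return "ok" if stat.S_ISSOCK(mode) else "not a socket"


if __name__ == "__main__":
    socket_path = os.environ.get("CARDANO_NODE_SOCKET_PATH", "db/node.socket")
    print(socket_path, check_node_socket(socket_path))
```

If it reports `missing` while the node process is running, the node may simply not have opened its socket yet; as far as I know it only does so after it has finished opening its database, which can take a long time after an unclean shutdown.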

Before the second start of the node from systemd, I had to stop the node via systemd to make a change, but it looks like the graceful shutdown failed and ended in a kill:
Jun  6 21:01:46 bp-node systemd[1]: Started Cardano Pool producer node.
Jun  6 21:05:47 bp-node systemd[1]: Stopping Cardano Pool producer node...
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: State 'stop-sigterm' timed out. Killing.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1085 (bash) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1087 (cardano-node) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1088 (ghc_ticker) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1090 (n/a) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1091 (cardano-node:w) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1092 (n/a) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1093 (cardano-node:w) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1094 (cardano-node:w) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Killing process 1095 (cardano-node:w) with signal SIGKILL.
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Main process exited, code=killed, status=9/KILL
Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: Failed with result 'timeout'.

Both the kill signal and the restart kill signal are set to SIGINT in the systemd unit file.
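The timestamps in the log above explain the kill. A worked check using the two log lines verbatim (the `seconds_between` helper is hypothetical and assumes syslog's `Mon DD HH:MM:SS` prefix, which has no year):

```python
from datetime import datetime


def seconds_between(start_line: str, end_line: str) -> float:
    """Seconds elapsed between two syslog lines with a 'Mon DD HH:MM:SS' prefix."""
    def parse(line: str) -> datetime:
        stamp = " ".join(line.split()[:3])  # e.g. "Jun 6 21:05:47"
        # syslog omits the year; a dummy year is fine for differences within one year
        return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")
    return (parse(end_line) - parse(start_line)).total_seconds()


stopping = "Jun  6 21:05:47 bp-node systemd[1]: Stopping Cardano Pool producer node..."
timed_out = ("Jun  6 21:07:17 bp-node systemd[1]: cardano-restart.service: "
             "State 'stop-sigterm' timed out. Killing.")
print(seconds_between(stopping, timed_out))  # 90.0
```

90 s is systemd's default stop timeout (DefaultTimeoutStopSec=90s), so most likely the configured SIGINT was sent but the node had not exited within 90 s, and systemd then escalated to SIGKILL (I believe the 'stop-sigterm' state name is used whatever KillSignal is set to). Raising `TimeoutStopSec=` in the unit's [Service] section, e.g. to a few minutes, should give the node time to finish shutting down; the right value is an assumption to tune for your machine.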

Does this mean I need to delete the db directory and start over and resync?

What is the hardware configuration? 1.27.0 requires ~8 GB of RAM.

16 GB RAM, 2 dedicated CPUs at 2.30 GHz. I had shut the service down for an hour to save on run costs while I worked on another node, and just now decided to restart, and it’s working. The server restart kicked off the cardano process and now I’m able to query the tip. However, it’s not logging anything, even though I have these settings in the config:

 "minSeverity": "Info",

  "setupScribes": [
    {
      "scFormat": "ScText",
      "scKind": "FileSK",
      "scName": "/home/xxxxxxx/cardano-node/logs/node.log"
    }
  ]

and these settings in the systemd unit file:

StandardOutput=append:/home/xxxxx/cardano-node/logs/node.log
StandardError=append:/home/xxxxxx/cardano-node/logs/node.log

Can you tell if I’m missing something in the file logging setup?

I figured out logging to file. I just had to go back to the testnet config file to check the settings, then made sure the config had these properties:

  "defaultScribes": [
    [
      "FileSK",
      "/home/xxxxxx/cardano-node/logs/node.log"
    ]
  ],

  "setupScribes": [
    {
      "scFormat": "ScText",
      "scKind": "FileSK",
      "scName": "/home/xxxxxx/cardano-node/logs/node.log"
    }
  ]
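What fixed it is consistent with how cardano-node's logging config appears to work: `setupScribes` only declares a scribe, while `defaultScribes` selects which scribes actually receive output. A minimal sketch of a check for that mismatch, assuming a config shaped like the snippets above (`unused_file_scribes` and `check_config_file` are hypothetical names):

```python
import json


def unused_file_scribes(config: dict) -> list:
    """Return FileSK paths declared in setupScribes but not listed in defaultScribes."""
    defaults = {tuple(entry) for entry in config.get("defaultScribes", [])}
    unused = []
    for scribe in config.get("setupScribes", []):
        if scribe.get("scKind") == "FileSK":
            if ("FileSK", scribe.get("scName")) not in defaults:
                unused.append(scribe.get("scName"))
    return unused


def check_config_file(path: str) -> list:
    """Load a cardano-node JSON config file and run the check above."""
    with open(path) as f:
        return unused_file_scribes(json.load(f))


# Example (hypothetical path): check_config_file("mainnet-config.json")
# returns [] when every FileSK scribe is also listed in defaultScribes.
```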