Node stops syncing after a while

After I start my node, it syncs for a while (10-20 minutes) and then just stops. I’m syncing a new node, so it still has a lot of blocks to catch up on.

I’ve run network stats and I see that the external connections just stop. Free memory is fine at that point, and CPU usage isn’t high. CPU temperature is around 50-55 °C.
I ran a pcap, and the last packet in the stream is an ACK from my Pi.

It’s literally as if the relays I’m syncing from just stop sending new data.
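
One way to pin down the moment the external connections drop is to count established TCP connections to non-loopback peers at regular intervals. The sketch below assumes `ss` (from iproute2) is installed and that `ss -tn` prints the state in column 1 and the peer address in column 5; the function name `count_external_estab` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count established TCP connections whose peer is
# not a loopback address. Reads `ss -tn` style output on stdin.
count_external_estab() {
  awk '$1 == "ESTAB" && $5 !~ /^127\./ && $5 !~ /^\[::1\]/ { n++ }
       END { print n + 0 }'
}

# Example (not run here): log the count once a minute while the node syncs.
#   while sleep 60; do echo "$(date -Is) $(ss -tn | count_external_estab)"; done
```

If the count falls to zero at the same time the sync stops, that at least confirms the peers are gone rather than idle.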

Setup:
Raspberry Pi 4, 8 GB RAM
20 GB swap
Cardano-Node 1.25.1

Nothing useful in the logs; they just show the last block synced.

Any ideas?

Possibly related: I’m running Daedalus on my home PC, and if I leave it open long enough (30 minutes or so), it eventually stops syncing as well and I get an error. Closing and restarting it clears the error.

Do you have systemctl / systemd set up?

Yes, @Anti.biz
@pi-relay-node:/opt/cardano/cnode/scripts# systemctl status cnode
● cnode.service - Cardano Node
Loaded: loaded (/etc/systemd/system/cnode.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2021-03-01 14:00:20 CST; 2h 14min ago
Main PID: 359320 (cnode.sh)
Tasks: 17 (limit: 4205)
CGroup: /system.slice/cnode.service
├─359320 /bin/bash /opt/cardano/cnode/scripts/cnode.sh
└─359415 cardano-node run --topology /opt/cardano/cnode/files/topology.json --config /opt/cardano/c>

Mar 01 14:00:20 pi-relay-node systemd[1]: Started Cardano Node.
Mar 01 14:00:23 pi-relay-node cnode[359320]: WARN: A prior running Cardano node was not cleanly shutdown, socket>
Mar 01 14:00:27 pi-relay-node cnode[359415]: Listening on http://127.0.0.1:12798

This is what mine looks like.

I’m not using cntools. What is your setup like? How many Raspberry Pis do you have: one for the block producer and one for the relay?

Post your env file, the bottom of your .bashrc, your ./startRelayNode.sh and ./startBlockProducer.sh, and your configs (make sure to hide IPs, usernames, etc.).

Have you tried syncing the node without the scripts? Just use the standard ./startBlockProducer.sh (I’m not sure if you have that; it’s in the CoinCashew guide. I don’t know where people get the cntools guide).

env

#!/usr/bin/env bash
# shellcheck disable=SC2034,SC2086,SC2230,SC2009,SC2206,SC2062,SC2059

######################################
# User Variables - Change as desired #
# Leave as is if unsure              #
######################################

#CCLI="${HOME}/.cabal/bin/cardano-cli"                  # Override automatic detection of path to cardano-cli executable
#CNCLI="${HOME}/.cargo/bin/cncli"                       # Override automatic detection of path to cncli executable (https://github.com/AndrewWestberg/cncli)
#CNODE_HOME="/opt/cardano/cnode"                        # Override default CNODE_HOME path (defaults to /opt/cardano/cnode)
CNODE_PORT=6001                                         # Set node port
#CONFIG="${CNODE_HOME}/files/config.json"               # Override automatic detection of node config path
#SOCKET="${CNODE_HOME}/sockets/node0.socket"            # Override automatic detection of path to socket
#TOPOLOGY="${CNODE_HOME}/files/topology.json"           # Override default topology.json path
#LOG_DIR="${CNODE_HOME}/logs"                           # Folder where your logs will be sent to (must pre-exist)
#DB_DIR="${CNODE_HOME}/db"                              # Folder to store the cardano-node blockchain db
#EKG_HOST=127.0.0.1                                     # Set node EKG host IP
#EKG_PORT=12788                                         # Override automatic detection of node EKG port
#PROM_HOST=127.0.0.1                                    # Set node Prometheus host IP
#PROM_PORT=12798                                        # Override automatic detection of node Prometheus port
#EKG_TIMEOUT=3                                          # Maximum time in seconds that you allow EKG request to take before aborting (node metrics)
#CURL_TIMEOUT=10                                        # Maximum time in seconds that you allow curl file download to take before aborting (GitHub update process)
#BLOCKLOG_DIR="${CNODE_HOME}/guild-db/blocklog"         # Override default directory used to store block data for core node
#BLOCKLOG_TZ="UTC"                                      # TimeZone to use when displaying blocklog - https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
#SHELLEY_TRANS_EPOCH=208                                # Override automatic detection of shelley epoch start, e.g 208 for mainnet
#TG_BOT_TOKEN=""                                        # Uncomment and set to enable telegramSend function. To create your own BOT-token and Chat-Id follow guide at:
#TG_CHAT_ID=""                                          # https://cardano-community.github.io/guild-operators/#/Scripts/sendalerts
#USE_EKG="N"                                            # Use EKG metrics from the node instead of Prometheus. Prometheus metrics (default) should yield slightly better performance

#WALLET_FOLDER="${CNODE_HOME}/priv/wallet"              # Root folder for Wallets
#POOL_FOLDER="${CNODE_HOME}/priv/pool"                  # Root folder for Pools
                                                        # Each wallet and pool has a friendly name and subfolder containing all related keys, certificates, ...
#POOL_NAME=""                                           # Set the pool's name to run node as a core node (the name, NOT the ticker, ie folder name)

#WALLET_PAY_VK_FILENAME="payment.vkey"                  # Standardized names for all wallet related files
#WALLET_PAY_SK_FILENAME="payment.skey"
#WALLET_HW_PAY_SK_FILENAME="payment.hwsfile"
#WALLET_PAY_ADDR_FILENAME="payment.addr"
#WALLET_BASE_ADDR_FILENAME="base.addr"
#WALLET_STAKE_VK_FILENAME="stake.vkey"
#WALLET_STAKE_SK_FILENAME="stake.skey"
#WALLET_HW_STAKE_SK_FILENAME="stake.hwsfile"
#WALLET_STAKE_ADDR_FILENAME="reward.addr"
#WALLET_STAKE_CERT_FILENAME="stake.cert"
#WALLET_STAKE_DEREG_FILENAME="stake.dereg"
#WALLET_DELEGCERT_FILENAME="delegation.cert"

#POOL_ID_FILENAME="pool.id"                             # Standardized names for all pool related files
#POOL_HOTKEY_VK_FILENAME="hot.vkey"
#POOL_HOTKEY_SK_FILENAME="hot.skey"
#POOL_COLDKEY_VK_FILENAME="cold.vkey"
#POOL_COLDKEY_SK_FILENAME="cold.skey"
#POOL_OPCERT_COUNTER_FILENAME="cold.counter"
#POOL_OPCERT_FILENAME="op.cert"
#POOL_VRF_VK_FILENAME="vrf.vkey"
#POOL_VRF_SK_FILENAME="vrf.skey"
#POOL_CONFIG_FILENAME="pool.config"
#POOL_REGCERT_FILENAME="pool.cert"
#POOL_CURRENT_KES_START="kes.start"
#POOL_DEREGCERT_FILENAME="pool.dereg"

bashrc

# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything
[ -z "$PS1" ] && return

# don't put duplicate lines in the history. See bash(1) for more options
# ... or force ignoredups and ignorespace
HISTCONTROL=ignoredups:ignorespace

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"

# set variable identifying the chroot you work in (used in the prompt below)
if [ -z "$debian_chroot" ] && [ -r /etc/debian_chroot ]; then
    debian_chroot=$(cat /etc/debian_chroot)
fi

# set a fancy prompt (non-color, unless we know we "want" color)
case "$TERM" in
    xterm-color) color_prompt=yes;;
esac

# uncomment for a colored prompt, if the terminal has the capability; turned
# off by default to not distract the user: the focus in a terminal window
# should be on the output of commands, not on the prompt
#force_color_prompt=yes

if [ -n "$force_color_prompt" ]; then
    if [ -x /usr/bin/tput ] && tput setaf 1 >&/dev/null; then
        # We have color support; assume it's compliant with Ecma-48
        # (ISO/IEC-6429). (Lack of such support is extremely rare, and such
        # a case would tend to support setf rather than setaf.)
        color_prompt=yes
    else
        color_prompt=
    fi
fi

if [ "$color_prompt" = yes ]; then
    PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
else
    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
fi
unset color_prompt force_color_prompt

# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
    PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"
    ;;
*)
    ;;
esac

# enable color support of ls and also add handy aliases
if [ -x /usr/bin/dircolors ]; then
    test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
    alias ls='ls --color=auto'
    #alias dir='dir --color=auto'
    #alias vdir='vdir --color=auto'

    alias grep='grep --color=auto'
    alias fgrep='fgrep --color=auto'
    alias egrep='egrep --color=auto'
fi

# some more ls aliases
alias ll='ls -alF'
alias la='ls -A'
alias l='ls -CF'

# Alias definitions.
# You may want to put all your additions into a separate file like
# ~/.bash_aliases, instead of adding them here directly.
# See /usr/share/doc/bash-doc/examples in the bash-doc package.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

# enable programmable completion features (you don't need to enable
# this, if it's already enabled in /etc/bash.bashrc and /etc/profile
# sources /etc/bash.bashrc).
#if [ -f /etc/bash_completion ] && ! shopt -oq posix; then
#    . /etc/bash_completion
#fi

pi-relay-node:/opt/cardano/cnode/scripts# cat cnode.sh

#!/bin/bash
# shellcheck disable=SC2086
#shellcheck source=/dev/null

. "$(dirname $0)"/env offline

######################################
# User Variables - Change as desired #
# Common variables set in env file   #
######################################

#placeholder section

######################################
# Do NOT modify code below           #
######################################

if [[ -S "${CARDANO_NODE_SOCKET_PATH}" ]]; then
  if pgrep -f "[c]ardano-node.*.${CARDANO_NODE_SOCKET_PATH}"; then
     echo "ERROR: A Cardano node is already running, please terminate this node before starting a new one with this script."
     exit 1
  else
    echo "WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up."
    unlink "${CARDANO_NODE_SOCKET_PATH}"
  fi
fi

[[ ! -d "${LOG_DIR}/archive" ]] && mkdir -p "${LOG_DIR}/archive"

[[ $(find "${LOG_DIR}"/*.json 2>/dev/null | wc -l) -gt 0 ]] && mv "${LOG_DIR}"/*.json "${LOG_DIR}"/archive/

if [[ -f "${POOL_DIR}/${POOL_OPCERT_FILENAME}" && -f "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" && -f "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" ]]; then
  cardano-node run \
    --topology "${TOPOLOGY}" \
    --config "${CONFIG}" \
    --database-path "${DB_DIR}" \
    --socket-path "${CARDANO_NODE_SOCKET_PATH}" \
    --host-addr 0.0.0.0 \
    --shelley-kes-key "${POOL_DIR}/${POOL_HOTKEY_SK_FILENAME}" \
    --shelley-vrf-key "${POOL_DIR}/${POOL_VRF_SK_FILENAME}" \
    --shelley-operational-certificate "${POOL_DIR}/${POOL_OPCERT_FILENAME}" \
    --port ${CNODE_PORT}
else
  cardano-node run \
    --topology "${TOPOLOGY}" \
    --config "${CONFIG}" \
    --database-path "${DB_DIR}" \
    --socket-path "${CARDANO_NODE_SOCKET_PATH}" \
    --host-addr 0.0.0.0 \
    --port ${CNODE_PORT}
fi

@Anti.biz
Setup:
2x Raspberry Pi 4, 8 GB RAM each
Running Cardano nodes on bare metal (no Docker)
500 GB SSD as the boot drive, running Ubuntu 20

Having said that, I’m up to about 55% synced, and it hasn’t had any failures in the past 5 hours. Makes me wonder if it was just some network glitches earlier. I’m going to get another Pi ready and take it over to my friend’s place (another ISP) just to see if there are any notable differences.

Hmm, maybe cntools works differently. I don’t see any echo lines at the bottom, and I don’t want to mess anything up because I don’t understand how the cntools configuration works.

I would say let your nodes run if they’re going. If you have any issues, don’t use the systemctl scripts; just use the direct command approach.

Yeah, I’ll let it run as long as it can. Hopefully I’ll track down the root cause at some point and post it here for posterity. Thanks for your help!

I kept having various issues, which mostly boiled down to files being in the wrong location and the paths to those files being written wrong in the configs. It just takes some messing around and asking questions once you have a clear error.

Try this command, then restart the node:


. "${HOME}/.bashrc"

testing now

Now, to check the logs, try

journalctl -e -f -u cnode.service

Also check in gLiveView whether your node is up and synced, whether it has peers, and whether it’s processing transactions.
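
If gLiveView isn’t handy, the same "is it still moving" check can be approximated from the node’s Prometheus endpoint (the `Listening on http://127.0.0.1:12798` line in the status output earlier in the thread). This is only a sketch: the metric name `cardano_node_metrics_blockNum_int` and the `/metrics` path are assumptions that may not match every node version, and `block_height` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the block height out of Prometheus text output.
# ASSUMPTION: the metric is named cardano_node_metrics_blockNum_int; check
# your own node's metrics output, since names can differ between versions.
block_height() {
  awk '$1 == "cardano_node_metrics_blockNum_int" { print $2 }'
}

# Example (not run here): sample twice, a few minutes apart, and compare.
#   curl -s http://127.0.0.1:12798/metrics | block_height
```

Two samples that return the same number while the node is far from the tip would point to a stalled sync.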

About to hit 100% synced. Last night I set up a cron job to restart the service every 2 hours.
Now I’ll let it run all day today and see how it does once it is synced.
Also, the journal didn’t have any smoking guns, just normal output.
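
For anyone copying the restart workaround, here is a rough sketch of what it could look like. The unconditional cron line is shown as a comment; the `is_stalled` function is a hypothetical addition that only signals a restart when the block height has not advanced between two samples. How the two heights are sampled is left open, and the `systemctl` path is an assumption that may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the block height did not
# advance between two samples taken some minutes apart.
is_stalled() {
  local prev="$1" cur="$2"
  [ "$cur" -le "$prev" ]
}

# Unconditional restart every 2 hours (root's crontab, not run here):
#   0 */2 * * * /usr/bin/systemctl restart cnode.service
#
# Conditional variant (prev_height and cur_height are placeholders):
#   is_stalled "$prev_height" "$cur_height" && systemctl restart cnode.service
```

The conditional form avoids interrupting a node that is syncing normally, but either way this is a stopgap, not a fix.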

Just providing a final update here.
Looks like the issue has resolved itself.

As for the root cause, I’m leaning toward rate limiting or DDoS protection from my ISP,
since the initial sync transfers tons of data from 1 or 2 sources.

I’ve been up and running with no issues since the initial sync.

Thanks, everyone, for the troubleshooting :)

Matt

Thought I would come back and provide a final final update.
The issue was actually caused by a badly built cardano-node binary. It was built with an old, unsupported GHC version, which caused the nodes to crash (unsafe threads).
With the correct, patched binaries it syncs without issues.
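
To see which compiler a given binary was built with, the version banner can be inspected. The sketch below assumes the banner contains a `ghc-X.Y` token, as in a line like `cardano-node 1.25.1 - linux-aarch64 - ghc-8.10`; that sample line is illustrative and not taken from this thread, and `ghc_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version part of the first "ghc-X.Y" token
# found in the text on stdin. Prints nothing if no such token is present.
ghc_version() {
  grep -oE 'ghc-[0-9]+(\.[0-9]+)*' | head -n 1 | cut -d- -f2
}

# Example (not run here):
#   cardano-node --version | ghc_version
```

Compare the result with the GHC version named in the build instructions for the node release you are running.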
