Issues after upgrading to 1.35.5: Missed Block

Hello community, I was producing blocks with version 1.35.3, but after upgrading to 1.35.5 I missed a block yesterday. While trying to figure out what happened, I noticed some oddities (Stakepool CTL). Firstly, my log is writing the wrong version: LiveView shows 1.35.5, but my logs show 1.35.3.


Logs:

/opt/cardano/cnode/logs$ cat node0.json

{"app":[],"at":"2023-02-03T01:14:07.11Z","data":{"credentials":"Cardano","val":{"kind":"TraceNodeNotLeader","slot":83820556}},"env":"1.35.3:950c4","host":"x9x","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"36609","sev":"Info","thread":"499"}
{"app":[],"at":"2023-02-03T01:14:08.00Z","data":{"chainDensity":4.853867e-2,"credentials":"Cardano","delegMapSize":1262642,"kind":"TraceStartLeadershipCheck","slot":83820557,"utxoSize":9641613},"env":"1.35.3:950c4","host":"x9x","loc":null,"msg":"","ns":["cardano.node.LeadershipCheck"],"pid":"36609","sev":"Info","thread":"499"}
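To see at a glance which version each log line was written by, the `env` field can be tallied. This is only a minimal sketch, assuming the JSON log layout shown above; `log_versions` is a made-up helper name, not part of any Cardano tooling.

```shell
# Tally the distinct "env" values (version:commit) found in a node JSON log.
# log_versions is a hypothetical helper name used for illustration only.
log_versions() {
  grep -o '"env":"[^"]*"' "$1" | sort | uniq -c
}

# Example: log_versions /opt/cardano/cnode/logs/node0.json
```

If both versions appear in the tally, the log simply contains lines from before and after the upgrade; if only the old version appears after a restart, the old binary is probably still the one being started.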


Secondly I cannot view the operational certificate via cardano-cli:

cardano-cli query kes-period-info --op-cert-file op.cert --mainnet

Command failed: query kes-period-info  Error: op.cert: op.cert: openFile: does not exist (No such file or directory)



Your log has this in it. I’m not too good at reading the logs, but it is odd that it references version 1.35.3, not 1.35.5…

Thanks Jeremy, any idea what would cause this and how I can fix it?

Did you set the full path to op.cert, in case you ran the command from a location other than the pool folder?

Try:

which cardano-cli
nano env

Inside env, do you see a path with /.cabal/bin or /.local/bin (in the first 2-3 lines)?

Cheers,


Hi again Alex,

my binaries live in:

which cardano-cli
/home/core/.cabal/bin/cardano-cli

I edited the env file and changed:

#CCLI="${HOME}/.cabal/bin/cardano-cli"                  # Override automatic detection of path to cardano-cli exec>
#CNCLI="${HOME}/.cargo/bin/cncli"  

to 
CCLI="${HOME}/.cabal/bin/cardano-cli"                  # Override automatic detection of path to cardano-cli exec>
CNCLI="${HOME}/.cargo/bin/cncli"  

I then restarted the node:

sudo systemctl restart cnode

but it still returns error:

 cardano-cli query kes-period-info --op-cert-file op.cert --mainnet
Command failed: query kes-period-info  Error: op.cert: op.cert: openFile: does not exist (No such file or directory)

I think I may have found the problem: the cnode service status shows it pointing to the old binary. I assume that the cnode.sh file contains a path variable pointing to the wrong binary, but shouldn’t those paths be in the env file, or are they located in the bashrc file?
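One way to confirm which binary a running process was actually started from is to read its `/proc` entry. A small sketch, assuming a Linux host; `exe_of_pid` is a made-up helper name, and reading another user's process may need sudo.

```shell
# Print the executable behind a running process (Linux only, reads /proc).
# exe_of_pid is a hypothetical helper name used for illustration only.
exe_of_pid() {
  readlink -f "/proc/$1/exe"
}

# Assumed usage: look up every process named cardano-node and show its binary.
# for pid in $(pgrep -x cardano-node); do exe_of_pid "$pid"; done
```

If the path printed is under .cabal/bin while the new build lives elsewhere, the service is still launching the old binary.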

Here is part of the env file (no further un-commented parts in the config):


CCLI="${HOME}/.cabal/bin/cardano-cli"                  # Override automatic detection of path to cardano-cli executable
CNCLI="${HOME}/.cargo/bin/cncli"                       # Override automatic detection of path to cncli executable (https://github.com/AndrewWestberg/cncli)
#CNODE_HOME="/opt/cardano/cnode"                        # Override default CNODE_HOME path (defaults to /opt/cardano/cnode)
CNODE_PORT=3001                                         # Set node port
#CONFIG="${CNODE_HOME}/files/config.json"               # Override automatic detection of node config path
#SOCKET="${CNODE_HOME}/sockets/node0.socket"            # Override automatic detection of path to socket
#TOPOLOGY="${CNODE_HOME}/files/topology.json"           # Override default topology.json path
#LOG_DIR="${CNODE_HOME}/logs"                           # Folder where your logs will be sent to (must pre-exist)
#DB_DIR="${CNODE_HOME}/db"                              # Folder to store the cardano-node blockchain db
#UPDATE_CHECK="Y"                                       # Check for updates to scripts, it will still be prompted before proceeding (Y|N).
#TMP_DIR="/tmp/cnode"                                   # Folder to hold temporary files in the various scripts, each script might create additional subfolders
#USE_EKG="Y"                                            # Use EKG metrics from the node instead of Prometheus. Prometheus metrics yield slightly better performance but>
#EKG_HOST=127.0.0.1                                     # Set node EKG host IP
#EKG_PORT=12788                                         # Override automatic detection of node EKG port
#PROM_HOST=127.0.0.1                                    # Set node Prometheus host IP
#PROM_PORT=12798                                        # Override automatic detection of node Prometheus port
#EKG_TIMEOUT=3                                          # Maximum time in seconds that you allow EKG request to take before aborting (node metrics)
#CURL_TIMEOUT=10                                        # Maximum time in seconds that you allow curl file download to take before aborting (GitHub update process)
#BLOCKLOG_DIR="${CNODE_HOME}/guild-db/blocklog"         # Override default directory used to store block data for core node
#BLOCKLOG_TZ="UTC"                                      # TimeZone to use when displaying blocklog - https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
#SHELLEY_TRANS_EPOCH=208                                # Override automatic detection of shelley epoch start, e.g 208 for mainnet
#TG_BOT_TOKEN=""                                        # Uncomment and set to enable telegramSend function. To create your own BOT-token and Chat-Id follow guide at:
#TG_CHAT_ID=""                                          # https://cardano-community.github.io/guild-operators/Scripts/sendalerts
TIMEOUT_LEDGER_STATE=3600                               # Timeout in seconds for querying and dumping ledger-state
#IP_VERSION=4                                           # The IP version to use for push and fetch, valid options: 4 | 6 | mix (Default: 4)
#DBSYNC_QUERY_FOLDER="${CNODE_HOME}/files/dbsync/queries" # [advanced feature] Folder containing DB-Sync chain analysis queries

#WALLET_FOLDER="${CNODE_HOME}/priv/wallet"              # Root folder for Wallets
#POOL_FOLDER="${CNODE_HOME}/priv/pool"                  # Root folder for Pools
                                                        # Each wallet and pool has a friendly name and subfolder containing all related keys, certificates, ...
POOL_NAME="stake_pool"                                           # Set the pool's name to run node as a core node (the name, NOT the ticker, ie folder name)
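Rather than reading the whole file by eye, the active (un-commented) settings can be filtered out. A minimal sketch; `active_settings` is a made-up helper name, and the path in the example is an assumption about where the env file lives. The script body further down the real env file may produce extra matches.

```shell
# List the top-level assignments in an env file that are not commented out.
# active_settings is a hypothetical helper name used for illustration only.
active_settings() {
  grep -E '^[A-Z_]+=' "$1"
}

# Example (path is an assumption): active_settings /opt/cardano/cnode/scripts/env
```

For the excerpt above this would list CCLI, CNCLI, CNODE_PORT, TIMEOUT_LEDGER_STATE and POOL_NAME.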


Here is part of the bashrc file:

PATH=/home/core/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
export NODE_CONFIG=mainnet
export LD_LIBRARY_PATH=/usr/local/lib:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:
export NODE_BUILD_NUM=5821110
#[ -f "/home/core/.ghcup/env" ] && source "/home/core/.ghcup/env" # ghcup-env
export CNODE_HOME=/opt/cardano/cnode
export CARDANO_NODE_SOCKET_PATH=/opt/cardano/cnode/sockets/node0.socket
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

[ -f "/home/core/.ghcup/env" ] && source "/home/core/.ghcup/env" # ghcup-env


With 1.35.5 the location of the binaries has changed: the new binaries should be in ${HOME}/.local/bin, not ${HOME}/.cabal/bin.

Check:

cd "${HOME}"/.cabal/bin/
ls -l
cd "${HOME}"/.local/bin/
ls -l
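The same check can be done in one pass, printing every copy of a binary found in the two folders so their versions can be compared. A rough sketch; `find_copies` is a made-up helper name.

```shell
# Print every executable copy of a binary found in the given directories.
# find_copies is a hypothetical helper name used for illustration only.
find_copies() {
  name=$1
  shift
  for dir in "$@"; do
    if [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Example: show the version reported by each copy of cardano-node.
# for bin in $(find_copies cardano-node "$HOME/.cabal/bin" "$HOME/.local/bin"); do
#   echo "$bin"; "$bin" --version
# done
```

Two copies reporting different versions would explain a log that shows one version while another tool shows a different one.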

Take a look at this.

Thanks Alex,

I have edited the env file and commented out the two lines pointing to the .cabal folder.

I have copied all the contents from the .cabal folder to the .local folder:

cd ${HOME}/.cabal/bin/
cp ./* ${HOME}/.local/bin/

Then removed the .cabal folder:

rm -r -f ${HOME}/.cabal/

I then rebooted the machine and viewed the log file. It now shows 1.35.5 in the env field. cardano-cli is now correctly resolved from the .local folder:

which cardano-cli
/home/core/.local/bin/cardano-cli

and the cnode service is running:

I thought I was in the clear but running the command still gives an error:

cardano-cli query kes-period-info --op-cert-file op.cert --mainnet
Command failed: query kes-period-info  Error: op.cert: op.cert: openFile: does not exist (No such file or directory)

OK, but from which directory are you running this command?
You need to be inside /priv/pool/pool_folder/ if you use it without a path…
If you are running it from another location, you will need to add the full path, like…

cardano-cli query kes-period-info --op-cert-file /opt/cardano/cnode/priv/pool/pool_folder/op.cert --mainnet
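A small guard in front of the query makes this kind of mistake easier to spot, because it reports the directory the command was run from. A minimal sketch; `require_file` is a made-up helper name, and pool_folder stands in for the actual pool folder as in the command above.

```shell
# Fail early, with a clearer message, if the certificate path is wrong.
# require_file is a hypothetical helper name used for illustration only.
require_file() {
  if [ ! -f "$1" ]; then
    echo "not found: $1 (current directory: $(pwd))" >&2
    return 1
  fi
}

# Example, using the path from the command above:
# cert=/opt/cardano/cnode/priv/pool/pool_folder/op.cert
# require_file "$cert" && cardano-cli query kes-period-info --op-cert-file "$cert" --mainnet
```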

Also, I hope you edited the env file again, as well as the topology file (add back the relays) and the config file (set TraceMempool to true)…

Hi Alex,

I have followed the instructions for migrating to guild-deploy.sh and have reset the config files after the update. I did this on both the relay and core machines, then rebooted both.

The nodes seem to be running OK, with 14 incoming and 14 outgoing connections on the relay.

The problem is with the core machine: there are 2 outgoing connections (expected) but 0 incoming connections. I double-checked that the env file has the correct port set, as per the previous env file.


For the BP you must also edit the topology file to add your relays, and for the relays edit topologyUpdater.sh to add your BP (and any other custom peers) to the custom peers line (and uncomment that line).

You can find the information in the old files, which were backed up.

Many thanks Alex,

It was the topologyUpdater.sh file on the relay node and once I amended custom peers as you suggested, the BP node now has incoming connections.

Back to the original problem of why I missed a block: does the operational certificate below look OK?

cardano-cli query kes-period-info --op-cert-file /opt/cardano/cnode/priv/pool/stake_pool/op.cert --mainnet
✓ Operational certificate's KES period is within the correct KES period interval
✓ The operational certificate counter agrees with the node protocol state counter
{
    "qKesCurrentKesPeriod": 647,
    "qKesEndKesInterval": 678,
    "qKesKesKeyExpiry": null,
    "qKesMaxKESEvolutions": 62,
    "qKesNodeStateOperationalCertificateNumber": 6,
    "qKesOnDiskOperationalCertificateNumber": 6,
    "qKesRemainingSlotsInKesPeriod": 4002447,
    "qKesSlotsPerKesPeriod": 129600,
    "qKesStartKesInterval": 616
}
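The numbers in this output can be cross-checked by hand. A rough sketch using the values printed above, assuming one slot lasts one second on mainnet; the variable names are mine, not part of cardano-cli.

```shell
start=616                # qKesStartKesInterval
max_evolutions=62        # qKesMaxKESEvolutions
current=647              # qKesCurrentKesPeriod
remaining_slots=4002447  # qKesRemainingSlotsInKesPeriod

# The interval should end at start + max_evolutions.
echo "end period:        $((start + max_evolutions))"
# Periods between the current period and the end of the interval.
echo "periods to end:    $((start + max_evolutions - current))"
# Whole days left, assuming 1 slot = 1 second.
echo "days left (whole): $((remaining_slots / 86400))"
```

This gives 678 (matching qKesEndKesInterval), 31 and 46, so by these figures the KES key was nowhere near expiry when the block was missed.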

Or do you think perhaps it was the server's configuration that prevented the node from minting the block?

I seem to have another problem. After updating, I tried to run the cncli.sh sync command and was told I had to upgrade to a later version. I upgraded to v5.3.0 (Releases · cardano-community/cncli · GitHub) as outlined in the instructions here (cncli/INSTALL.md at develop · cardano-community/cncli · GitHub), but now I am getting the following error.

This is not OK; it should be:

    "qKesNodeStateOperationalCertificateNumber": 6,
    "qKesOnDiskOperationalCertificateNumber": 7,

It is ok if it is the current op cert and there has been a block minted on it. But if it is a new op cert it should be +1 (as far as I understand).
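The rule as described here can be written down as a small check. This is only a sketch of that description, not an official validation; `check_counter` is a made-up helper name.

```shell
# Compare the on-disk counter with the node-state counter, following the
# rule described above. check_counter is a hypothetical helper name.
check_counter() {
  node=$1
  disk=$2
  if [ "$disk" -eq "$node" ]; then
    echo "same: fine if this certificate has already minted a block"
  elif [ "$disk" -eq $((node + 1)) ]; then
    echo "plus-one: expected for a newly issued certificate"
  else
    echo "mismatch: counter is neither equal nor +1"
  fi
}

check_counter 6 6   # the values from the output above
check_counter 6 7   # what a freshly rotated certificate would show
```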

I have minted blocks since the operational certificate was issued so perhaps the same number is correct.

For those who had the same problem with cncli.sh, I was able to fix the issue by upgrading the server with sudo do-release-upgrade -d


Did you search inside the logs?

I searched for a couple of keywords but nothing came up. To be honest, I really don’t know where to start when looking for clues in the log as to why a block was missed. Is there a guide somewhere with steps to determine why a block was missed?
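One possible starting point, if the slot number of the missed block is known, is to pull out everything the node logged for that slot. A minimal sketch, assuming the JSON log layout from the excerpt at the top of the thread; `kinds_for_slot` is a made-up helper name, and which `kind` values indicate a problem is not something this sketch decides.

```shell
# Print the "kind" values logged for one slot, so the sequence of events
# around a missed block can be read off.
# kinds_for_slot is a hypothetical helper name used for illustration only.
kinds_for_slot() {
  slot=$1
  shift
  grep -h "\"slot\":${slot}[,}]" "$@" | grep -o '"kind":"[^"]*"'
}

# Example: kinds_for_slot 83820556 /opt/cardano/cnode/logs/node0.json
```

For the slot in the first log line above this prints "kind":"TraceNodeNotLeader"; an empty result for the missed slot would suggest the node was not running, or not logging, at that moment.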